By expressing prior distributions as general stochastic processes, nonparametric Bayesian methods provide a flexible way to incorporate prior knowledge and constrain the latent structure in statistical inference. The Indian buffet process (IBP) is such an example that can be used to define a prior distribution on infinite binary features, where the exchangeability among subjects is assumed. The phylogenetic Indian buffet process (pIBP), a derivative of IBP, enables the modeling of non-exchangeability among subjects through a stochastic process on a rooted tree, which is similar to that used in phylogenetics, to describe relationships among the subjects. In this paper, we study the theoretical properties of IBP and pIBP under a binary factor model. We establish the posterior contraction rates for both IBP and pIBP and substantiate the theoretical results through simulation studies. This is the first work addressing the frequentist property of the posterior behaviors of IBP and pIBP. We also demonstrated its practical usefulness by applying pIBP prior to a real data example arising in the field of cancer genomics where the exchangeability among subjects is violated.
Sparse Canonical Correlation Analysis (CCA) has received considerable attention in high-dimensional data analysis to study the relationship between two sets of random variables. However, there has been remarkably little theoretical statistical foundation on sparse CCA in high-dimensional settings despite active methodological and applied research activities. In this paper, we introduce an elementary sufficient and necessary characterization such that the solution of CCA is indeed sparse, propose a computationally efficient procedure, called CAPIT, to estimate the canonical directions, and show that the procedure is rate-optimal under various assumptions on nuisance parameters. The procedure is applied to a breast cancer dataset from The Cancer Genome Atlas project. We identify methylation probes that are associated with genes, which have been previously characterized as prognosis signatures of the metastasis of breast cancer.