Incorporating covariates into integrated factor analysis of multi‐view data
Summary
In modern biomedical research, it is common to measure multiple data sets on the same set of samples from different views (i.e., multi‐view data). For example, in genetic studies, genomic data sets at different molecular levels or from different cell types are measured on a common set of individuals to investigate genetic regulation. Integrating and reducing multi‐view data can leverage information across data sets and reduce the size and complexity of the data for subsequent statistical analysis and interpretation. In this article, we develop a novel statistical model, called supervised integrated factor analysis (SIFA), for integrative dimension reduction of multi‐view data that incorporates auxiliary covariates. The model decomposes the data into joint and individual factors, which capture the joint variation across multiple data sets and the individual variation specific to each set, respectively. Moreover, both joint and individual factors are partially informed by auxiliary covariates via nonparametric models. We devise a computationally efficient Expectation–Maximization (EM) algorithm to fit the model under certain identifiability conditions. We apply the method to the Genotype‐Tissue Expression (GTEx) data and provide new insights into the variation decomposition of gene expression across multiple tissues. Extensive simulation studies and an additional application to a pediatric growth study demonstrate the advantages of the proposed method over competing methods.
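To make the joint/individual decomposition concrete, the following is a minimal sketch that simulates data from a simplified generative model in the spirit of the one summarized above. It assumes, for illustration only, that view k is generated as X_k = U V_k^T + U_k W_k^T + E_k, where the joint scores U (shared by all views) and the individual scores U_k (specific to view k) each equal a nonlinear function of one covariate y plus Gaussian noise. The function names `simulate_views` and `variance_shares`, the sine/cosine covariate effects, and all dimensions are hypothetical choices of ours. This is not the authors' estimation procedure and does not implement their EM algorithm; it only shows how covariate-driven joint and individual variation add up in each view.

```python
import numpy as np

def simulate_views(n=200, dims=(30, 40), r0=1, rk=1, noise_sd=0.5, seed=0):
    """Simulate K views that share joint scores U and each have individual scores U_k.

    Hypothetical SIFA-style generative model (illustration only):
        U   = f0(y) + F0        joint scores, shared across all views
        U_k = fk(y) + Fk        individual scores, specific to view k
        X_k = U V_k^T + U_k W_k^T + E_k
    """
    rng = np.random.default_rng(seed)
    y = rng.uniform(-np.pi, np.pi, size=(n, 1))  # one auxiliary covariate
    # Joint scores: nonlinear covariate effect plus covariate-independent variation.
    U = np.sin(y) + 0.5 * rng.standard_normal((n, r0))
    views, parts = [], []
    for k, p in enumerate(dims):
        # Individual scores for view k: a different nonlinear covariate effect.
        Uk = np.cos((k + 1) * y) + 0.5 * rng.standard_normal((n, rk))
        Vk = rng.standard_normal((p, r0))  # joint loadings for view k
        Wk = rng.standard_normal((p, rk))  # individual loadings for view k
        joint = U @ Vk.T
        indiv = Uk @ Wk.T
        noise = noise_sd * rng.standard_normal((n, p))
        views.append(joint + indiv + noise)
        parts.append((joint, indiv, noise))
    return y, U, views, parts

def variance_shares(parts):
    """Share of each view's component sum of squares due to joint, individual, and noise terms.

    Shares are normalized by SS_joint + SS_indiv + SS_noise, so they sum to 1;
    cross-products between components are ignored.
    """
    shares = []
    for components in parts:
        ss = [float(np.sum(c ** 2)) for c in components]
        total = sum(ss)
        shares.append(tuple(s / total for s in ss))
    return shares

# Usage: simulate two views and report the approximate variation decomposition.
y, U, views, parts = simulate_views()
shares = variance_shares(parts)
for k, (j, i, e) in enumerate(shares, start=1):
    print(f"view {k}: joint={j:.2f}, individual={i:.2f}, noise={e:.2f}")
```

Because the true components are known in a simulation, their shares give a reference decomposition against which estimates from any fitted integrative factor model could be compared.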