Volume 73, Issue 4
BIOMETRIC PRACTICE

Incorporating covariates into integrated factor analysis of multi‐view data

Gen Li

Corresponding Author

E-mail address: gl2521@cumc.columbia.edu

Department of Biostatistics, Mailman School of Public Health, Columbia University, New York 10032, New York, U.S.A.

Sungkyu Jung

Department of Statistics, University of Pittsburgh, Pittsburgh 15260, Pennsylvania, U.S.A.

First published: 13 April 2017
Citations: 6

Summary

In modern biomedical research, multiple data sets are routinely measured on the same set of samples from different views (i.e., multi‐view data). For example, in genetic studies, multiple genomic data sets at different molecular levels or from different cell types are measured for a common set of individuals to investigate genetic regulation. Integration and reduction of multi‐view data have the potential to leverage information in different data sets, and to reduce the magnitude and complexity of data for further statistical analysis and interpretation. In this article, we develop a novel statistical model, called supervised integrated factor analysis (SIFA), for integrative dimension reduction of multi‐view data while incorporating auxiliary covariates. The model decomposes data into joint and individual factors, capturing the joint variation across multiple data sets and the individual variation specific to each set, respectively. Moreover, both joint and individual factors are partially informed by auxiliary covariates via nonparametric models. We devise a computationally efficient Expectation–Maximization (EM) algorithm to fit the model under some identifiability conditions. We apply the method to the Genotype‐Tissue Expression (GTEx) data, and provide new insights into the variation decomposition of gene expression in multiple tissues. Extensive simulation studies and an additional application to a pediatric growth study demonstrate the advantage of the proposed method over competing methods.
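The decomposition described in the summary can be illustrated with a small simulation. The sketch below is not the authors' code and does not fit the model with the EM algorithm; it only generates multi-view data in which each view is the sum of a joint part, an individual part, and noise, with the factor scores partly driven by covariates. The function name `simulate_multiview`, all parameter names, the Gaussian distributions, and the linear covariate effect (used here in place of the nonparametric models mentioned in the summary) are assumptions made for illustration.

```python
import numpy as np


def simulate_multiview(n=50, dims=(8, 6), q=2, r0=2, ranks=(1, 1),
                       noise_sd=0.1, seed=0):
    """Simulate multi-view data with joint and individual factors.

    Illustrative assumptions (not taken from the article): Gaussian
    loadings and noise, and a linear covariate effect on the scores.
    """
    rng = np.random.default_rng(seed)

    # Auxiliary covariates, shared by all views (n samples, q covariates).
    Y = rng.normal(size=(n, q))

    # Joint factor scores: a covariate-driven part plus a random part.
    B0 = rng.normal(size=(q, r0))
    U0 = Y @ B0 + rng.normal(size=(n, r0))

    views, joint_parts, indiv_parts = [], [], []
    for p_k, r_k in zip(dims, ranks):
        # Joint loadings for this view; the scores U0 are shared across views.
        V0k = rng.normal(size=(p_k, r0))
        J_k = U0 @ V0k.T

        # Individual scores and loadings, specific to this view.
        Bk = rng.normal(size=(q, r_k))
        Uk = Y @ Bk + rng.normal(size=(n, r_k))
        Vk = rng.normal(size=(p_k, r_k))
        A_k = Uk @ Vk.T

        # Observed view = joint part + individual part + noise.
        E_k = noise_sd * rng.normal(size=(n, p_k))
        views.append(J_k + A_k + E_k)
        joint_parts.append(J_k)
        indiv_parts.append(A_k)

    return Y, views, joint_parts, indiv_parts


Y, views, joint_parts, indiv_parts = simulate_multiview()
for k, X_k in enumerate(views):
    print(f"view {k}: shape {X_k.shape}")
```

In this sketch the joint parts of all views share the same score matrix, so their column-wise concatenation has rank at most `r0`, while each individual part has rank at most its own `r_k`. A method of the kind described in the summary would aim to recover such a low-rank structure from the observed views and covariates.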

