Volume 36, Issue 22
Research Article

A Dirichlet process mixture model for clustering longitudinal gene expression data

Jiehuan Sun

Department of Biostatistics, Yale University, New Haven, CT 06520, U.S.A.

Jose D. Herazo‐Maya

Pulmonary, Critical Care and Sleep Medicine, Yale School of Medicine, New Haven, CT 06520, U.S.A.

Naftali Kaminski

Pulmonary, Critical Care and Sleep Medicine, Yale School of Medicine, New Haven, CT 06520, U.S.A.

Hongyu Zhao

Department of Biostatistics, Yale University, New Haven, CT 06520, U.S.A.

Joshua L. Warren

Corresponding Author

E-mail address: joshua.warren@yale.edu

Department of Biostatistics, Yale University, New Haven, CT 06520, U.S.A.

Correspondence to: Joshua L. Warren, Department of Biostatistics, Yale University, New Haven, CT 06520, U.S.A.

First published: 15 June 2017
Citations: 1

Abstract

Subgroup identification (clustering) is an important problem in biomedical research, and gene expression profiles are commonly used to define subgroups. Longitudinal gene expression profiles may carry more information about disease progression than baseline profiles alone, so subgroup identification could be more accurate and effective with the aid of longitudinal gene expression data. However, existing statistical methods cannot fully exploit these data for patient clustering. In this article, we introduce a novel Bayesian clustering method based on longitudinal gene expression profiles. This method, called BClustLonG, adopts a linear mixed-effects framework to model the trajectory of each gene over time, while clustering is conducted jointly on the regression coefficients obtained from all genes. To account for correlations among genes and to alleviate the challenges of high dimensionality, we adopt a factor analysis model for the regression coefficients. A Dirichlet process prior is placed on the means of the regression coefficients to induce clustering. Through extensive simulation studies, we show that BClustLonG outperforms other clustering methods. When applied to a dataset of severely injured (burn or trauma) patients, our model identifies interesting subgroups. Copyright © 2017 John Wiley & Sons, Ltd.
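To make the model structure described in the abstract more concrete, here is a minimal generative sketch in Python. It is not the BClustLonG implementation, and it shows only how data could be simulated, not how the posterior is fitted. It assumes, for illustration only, linear-in-time trajectories (one intercept and one slope per gene), a truncated stick-breaking approximation to the Dirichlet process, standard Gaussian priors, and fixed, made-up hyperparameters. The function names `stick_breaking_weights` and `simulate` and all their parameters are hypothetical.

```python
import random


def stick_breaking_weights(alpha, truncation, rng):
    """Truncated stick-breaking approximation to Dirichlet process weights."""
    weights, remaining = [], 1.0
    for _ in range(truncation - 1):
        v = rng.betavariate(1.0, alpha)  # V_k ~ Beta(1, alpha)
        weights.append(remaining * v)    # pi_k = V_k * prod_{l<k} (1 - V_l)
        remaining *= 1.0 - v
    weights.append(remaining)  # last atom takes the leftover mass, so weights sum to 1
    return weights


def simulate(n_subjects=30, n_genes=5, times=(0.0, 1.0, 2.0), alpha=1.0,
             truncation=10, n_factors=2, noise_sd=0.3, seed=0):
    """Hypothetical, simplified generative sketch (not the authors' exact model).

    Each subject i has an intercept and a slope for every gene, stacked into b_i:
        z_i ~ Categorical(pi),  f_i ~ N(0, I)
        b_i = mu[z_i] + Lambda f_i + e_i      (factor-analysis structure)
        y_igt = b_i[2g] + b_i[2g + 1] * t + noise
    """
    rng = random.Random(seed)
    p = 2 * n_genes  # intercept + slope per gene
    weights = stick_breaking_weights(alpha, truncation, rng)
    # Cluster-specific mean coefficient vectors (the DP atoms), drawn from N(0, 1).
    mu = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(truncation)]
    # Factor loadings shared by all subjects (p x n_factors); they induce
    # correlation among the coefficients of different genes.
    loadings = [[rng.gauss(0.0, 0.5) for _ in range(n_factors)] for _ in range(p)]

    labels, data = [], []
    for _ in range(n_subjects):
        z = rng.choices(range(truncation), weights=weights)[0]  # cluster label
        f = [rng.gauss(0.0, 1.0) for _ in range(n_factors)]      # latent factors
        b = [mu[z][j] + sum(loadings[j][k] * f[k] for k in range(n_factors))
             + rng.gauss(0.0, 0.1) for j in range(p)]
        # Observed expression: one linear trajectory per gene, plus measurement noise.
        y = [[b[2 * g] + b[2 * g + 1] * t + rng.gauss(0.0, noise_sd) for t in times]
             for g in range(n_genes)]
        labels.append(z)
        data.append(y)
    return labels, data


if __name__ == "__main__":
    labels, data = simulate(seed=1)
    print("occupied clusters:", sorted(set(labels)))
```

Because the stick-breaking weights decay quickly for small `alpha`, only a few of the `truncation` atoms are typically occupied. This mirrors how a Dirichlet process prior lets the number of clusters be inferred rather than fixed in advance. Fitting such a model, for example by Gibbs sampling over labels, factors, and coefficients, is beyond this sketch.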
