Volume 63, Issue 3

A Segmentation/Clustering Model for the Analysis of Array CGH Data

F. Picard

Corresponding Author

UMR INA P‐G/ENGREF/INRA MIA 518, Paris, France

email: picard@inapg.fr

S. Robin

UMR INA P‐G/ENGREF/INRA MIA 518, Paris, France

E. Lebarbier

UMR INA P‐G/ENGREF/INRA MIA 518, Paris, France

J.‐J. Daudin

UMR INA P‐G/ENGREF/INRA MIA 518, Paris, France

First published: 23 January 2007
Citations: 43

Abstract

Summary: Microarray‐CGH (comparative genomic hybridization) experiments are used to detect and map chromosomal imbalances. A CGH profile can be viewed as a succession of segments that represent homogeneous regions in the genome whose representative sequences share the same relative copy number on average. Segmentation methods constitute a natural framework for the analysis, but they do not provide a biological status for the detected segments. We propose a new model for this segmentation/clustering problem, combining a segmentation model with a mixture model. We present a new hybrid algorithm, dynamic programming–expectation maximization (DP–EM), which combines DP with the EM algorithm to estimate the parameters of the model by maximum likelihood. We also propose a model selection heuristic to select the number of clusters and the number of segments. An example of our procedure is presented, based on publicly available data sets. We compare our method to segmentation methods and to hidden Markov models, and we show that the new segmentation/clustering model is a promising alternative that can be applied in the more general context of signal processing.
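
The alternation between DP and EM described above can be illustrated with a minimal sketch. It is not the authors' implementation, and the abstract does not spell out the model, so several points are assumptions made for illustration: each segment is taken to belong to one of P Gaussian clusters (mean, variance, mixing weight), the EM step treats whole segments as the mixture units with breakpoints held fixed, and the DP step re-estimates the breakpoints with the mixture parameters held fixed. The function names (`log_gauss`, `em_step`, `dp_step`, `dp_em`), the quantile-based initialization, the equally spaced starting breakpoints, and the numerical floors are all hypothetical. The model selection heuristic for the number of clusters and segments is not covered.

```python
import numpy as np

def log_gauss(y, mean, var):
    """Pointwise Gaussian log-density."""
    return -0.5 * (np.log(2 * np.pi * var) + (y - mean) ** 2 / var)

def em_step(y, bounds, means, variances, weights, n_iter=50):
    """EM for a mixture whose units are whole segments (breakpoints fixed)."""
    segs = [y[a:b] for a, b in zip(bounds[:-1], bounds[1:])]
    sizes = np.array([len(s) for s in segs], dtype=float)
    sums = np.array([s.sum() for s in segs])
    for _ in range(n_iter):
        # E-step: posterior probability of each cluster for each segment.
        log_tau = np.array([[np.log(w) + log_gauss(s, m, v).sum()
                             for m, v, w in zip(means, variances, weights)]
                            for s in segs])
        log_tau -= np.logaddexp.reduce(log_tau, axis=1, keepdims=True)
        tau = np.exp(log_tau)
        # M-step: weighted updates; the floors are assumed safeguards
        # against degenerate clusters, not part of any published method.
        weights = np.clip(tau.mean(axis=0), 1e-6, None)
        weights /= weights.sum()
        mass = np.maximum(tau.T @ sizes, 1e-9)
        means = (tau.T @ sums) / mass
        sq = np.array([[((s - m) ** 2).sum() for m in means] for s in segs])
        variances = np.maximum((tau * sq).sum(axis=0) / mass, 1e-3)
    return means, variances, weights

def dp_step(y, K, means, variances, weights):
    """Dynamic programming for the best K segments (mixture fixed)."""
    n = len(y)
    logf = log_gauss(y[None, :], means[:, None], variances[:, None])
    # cum[p, t] = sum of log f_p(y_s) over s < t, so segment sums are O(1).
    cum = np.concatenate([np.zeros((len(means), 1)),
                          np.cumsum(logf, axis=1)], axis=1)
    log_pi = np.log(weights)

    def gain(i, j):
        # Log-likelihood of segment [i, j) under the mixture.
        return np.logaddexp.reduce(log_pi + cum[:, j] - cum[:, i])

    best = np.full((K + 1, n + 1), -np.inf)
    back = np.zeros((K + 1, n + 1), dtype=int)
    best[0, 0] = 0.0
    for k in range(1, K + 1):
        for j in range(k, n + 1):
            cands = [best[k - 1, i] + gain(i, j) for i in range(k - 1, j)]
            i_best = int(np.argmax(cands))
            best[k, j] = cands[i_best]
            back[k, j] = i_best + (k - 1)
    # Backtrack from the end of the signal to recover the breakpoints.
    bounds = [n]
    for k in range(K, 0, -1):
        bounds.append(int(back[k, bounds[-1]]))
    return bounds[::-1], float(best[K, n])

def dp_em(y, K, P, n_outer=10):
    """Alternate EM and DP until the breakpoints stop changing."""
    y = np.asarray(y, dtype=float)
    means = np.quantile(y, (np.arange(P) + 0.5) / P)   # assumed initialization
    variances = np.full(P, y.var())
    weights = np.full(P, 1.0 / P)
    bounds = np.linspace(0, len(y), K + 1).astype(int).tolist()
    for _ in range(n_outer):
        means, variances, weights = em_step(y, bounds, means, variances, weights)
        new_bounds, _ = dp_step(y, K, means, variances, weights)
        if new_bounds == bounds:
            break
        bounds = new_bounds
    return bounds, means, variances, weights
```

Calling `dp_em(y, K=3, P=2)` on a one-dimensional profile returns the segment boundaries (as indices into `y`, including 0 and `len(y)`) together with the fitted cluster means, variances, and weights. The DP step in this sketch costs on the order of K·n²·P operations, so it suits short profiles; the simple initialization can also settle in a poor local optimum on noisy data.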
