Bayesian hierarchical mixture modeling to assign copy number from a targeted CNV array

Authors

  • Niall Cardin,

    Corresponding author
    1. Department of Statistics, University of Oxford, Oxford, United Kingdom
    2. University of California, San Francisco, San Francisco, California
    • Department of Statistics, University of Oxford, 1 South Parks Road, Oxford OX1 3TG, UK
  • Chris Holmes,

    1. Department of Statistics, University of Oxford, Oxford, United Kingdom
    2. The Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
    3. MRC Harwell, United Kingdom
  • The Wellcome Trust Case Control Consortium,

  • Peter Donnelly,

    1. Department of Statistics, University of Oxford, Oxford, United Kingdom
    2. The Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom
  • Jonathan Marchini

    1. Department of Statistics, University of Oxford, Oxford, United Kingdom
    2. The Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom

Abstract

Accurate assignment of copy number at known copy number variant (CNV) loci is important both for increasing understanding of the structural evolution of genomes and for carrying out association studies of copy number with disease. As with calling SNP genotypes, the task can be framed as a clustering problem, but for a number of reasons assigning copy number is much more challenging. CNV assays have lower signal-to-noise ratios than SNP assays, often display heavy-tailed and asymmetric intensity distributions, contain outlying observations, and may exhibit systematic technical differences among cohorts. In addition, the number of copy-number classes at a CNV in the population may be unknown a priori. Because of these complications, automatic and robust assignment of copy number from array data remains a challenging problem. We have developed a copy number assignment algorithm, CNVCALL, for a targeted CNV array, such as that used by the Wellcome Trust Case Control Consortium's recent CNV association study. We use a Bayesian hierarchical mixture model that robustly identifies both the number of different copy number classes at a specific locus and the relative copy number of each individual in the sample. This approach is fully automated, which is a critical requirement when analyzing large numbers of CNVs. We illustrate the method's performance using real data from the Wellcome Trust Case Control Consortium's CNV association study and using simulated data. Genet. Epidemiol. 35:536-548, 2011. © 2011 Wiley-Liss, Inc.
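
To make the clustering framing concrete, the sketch below shows a much simpler stand-in for the task described in the abstract. It is not CNVCALL and not the Bayesian hierarchical model: it fits plain one-dimensional Gaussian mixtures by EM, picks the number of classes with BIC, and assigns each sample to its most probable class. It does nothing about heavy tails, outliers, or cohort effects. All function names (`fit_mixture`, `call_copy_classes`), the variance floor, and the simulated intensities are assumptions introduced for illustration only.

```python
import math
import random


def log_joint(v, weights, means, variances):
    """Return log(w_j * Normal(v | mean_j, var_j)) for every component j."""
    return [
        math.log(w) - 0.5 * math.log(2 * math.pi * s2) - 0.5 * (v - m) ** 2 / s2
        for w, m, s2 in zip(weights, means, variances)
    ]


def log_sum_exp(values):
    """Numerically stable log of a sum of exponentials."""
    top = max(values)
    return top + math.log(sum(math.exp(a - top) for a in values))


def fit_mixture(x, k, n_iter=100):
    """Fit a k-component 1-D Gaussian mixture by EM (illustrative only)."""
    n = len(x)
    xs = sorted(x)
    grand_mean = sum(x) / n
    total_var = sum((v - grand_mean) ** 2 for v in x) / n + 1e-12
    var_floor = 1e-3 * total_var  # assumed floor; stops components collapsing
    means = [xs[int((j + 0.5) * n / k)] for j in range(k)]  # quantile start
    variances = [max(total_var / k ** 2, var_floor)] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: posterior class probabilities for each sample.
        resp = []
        for v in x:
            lj = log_joint(v, weights, means, variances)
            norm = log_sum_exp(lj)
            resp.append([math.exp(a - norm) for a in lj])
        # M-step: re-estimate weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-9  # avoid empty components
            weights[j] = nj / (n + k * 1e-9)
            means[j] = sum(r[j] * v for r, v in zip(resp, x)) / nj
            var_j = sum(r[j] * (v - means[j]) ** 2 for r, v in zip(resp, x)) / nj
            variances[j] = max(var_j, var_floor)
    log_lik = sum(log_sum_exp(log_joint(v, weights, means, variances)) for v in x)
    return log_lik, weights, means, variances


def call_copy_classes(x, max_k=5):
    """Pick the number of classes by BIC, then hard-assign each sample."""
    n = len(x)
    best = None
    for k in range(1, max_k + 1):
        log_lik, weights, means, variances = fit_mixture(x, k)
        bic = -2.0 * log_lik + (3 * k - 1) * math.log(n)  # 3k - 1 free parameters
        if best is None or bic < best[0]:
            best = (bic, k, weights, means, variances)
    _, k, weights, means, variances = best
    order = sorted(range(k), key=lambda j: means[j])  # class 0 = lowest intensity
    rank = {j: r for r, j in enumerate(order)}
    labels = []
    for v in x:
        lj = log_joint(v, weights, means, variances)
        labels.append(rank[max(range(k), key=lambda j: lj[j])])
    return k, labels


# Simulated intensities for three well-separated classes (not real array data).
rng = random.Random(1)
intensities = (
    [rng.gauss(0.0, 0.1) for _ in range(120)]
    + [rng.gauss(1.0, 0.1) for _ in range(120)]
    + [rng.gauss(2.0, 0.1) for _ in range(60)]
)
n_classes, labels = call_copy_classes(intensities)
print(n_classes, labels[0], labels[150], labels[299])
```

The labels are relative class ranks ordered by mean intensity, mirroring the abstract's notion of relative copy number; mapping a rank to an absolute copy number would need additional information that this sketch does not model.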
