An expectation–maximization program for determining allelic spectrum from CNV data (CoNVEM): insights into population allelic architecture and its mutational history

Authors

  • Tom R. Gaunt,

    1. MRC Centre for Causal Analyses in Translational Epidemiology and Bristol Genetic Epidemiology Laboratories, Department of Social Medicine, University of Bristol, Oakfield House, Bristol, United Kingdom
  • Santiago Rodriguez,

    1. MRC Centre for Causal Analyses in Translational Epidemiology and Bristol Genetic Epidemiology Laboratories, Department of Social Medicine, University of Bristol, Oakfield House, Bristol, United Kingdom
  • Philip A.I. Guthrie,

    1. MRC Centre for Causal Analyses in Translational Epidemiology and Bristol Genetic Epidemiology Laboratories, Department of Social Medicine, University of Bristol, Oakfield House, Bristol, United Kingdom
  • Ian N.M. Day

    Corresponding author
    1. MRC Centre for Causal Analyses in Translational Epidemiology and Bristol Genetic Epidemiology Laboratories, Department of Social Medicine, University of Bristol, Oakfield House, Bristol, United Kingdom
    • MRC Centre for Causal Analyses in Translational Epidemiology and Bristol Genetic Epidemiology Laboratories, Department of Social Medicine, University of Bristol, Oakfield House, Oakfield Grove, Bristol BS8 2BN, UK

  • Communicated by A. Jamie Cuticchia

Abstract

Copy number variations (CNVs) are a common form of genetic variation in which the allelic population contains a distribution of copy numbers of a particular gene (or other large sequence/region). The simplest forms describe deletion (0 vs. 1 copy) or duplication (1 vs. 2) events. However, some CNV loci contain a much wider range of copy numbers, such as that seen for the CCL3L1 locus. CNV classification methods typically describe only the total (diploid) copy number, leaving the underlying genotypic and allelic frequency distribution unknown. We have developed an expectation–maximization approach for the analysis of data from tandem CNVs that enables estimation of both the allelic copy number frequency distribution and the expected copy number genotype and class distribution under Hardy-Weinberg equilibrium (HWE). The CNV expectation–maximization algorithm is available in a Web tool (CoNVEM, http://apps.biocompute.org.uk/convem/), which graphically and numerically presents CNV allele and genotype distributions. We have applied this approach to the analysis of salivary amylase (AMY1A, B, and C), CCL3L1, and SULT1A1 CNVs using published data, and present inferences about the evolutionary history of these loci based on CoNVEM results. Hum Mutat 31:1–7, 2010. © 2010 Wiley-Liss, Inc.
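
The estimation problem described in the abstract can be illustrated with a minimal sketch. It is not the CoNVEM implementation: the function names (`expected_class_freqs`, `em_allele_freqs`), the uniform starting frequencies, the stopping rule, and the example counts are all assumptions made for illustration. The only inputs are the number of individuals in each diploid copy number class and an assumed maximum per-allele copy number. Genotypes are treated as ordered allele pairs whose probabilities follow HWE.

```python
def expected_class_freqs(p):
    """Diploid copy number class frequencies implied by allele
    frequencies p (index = copies per allele) under HWE."""
    m = len(p) - 1
    classes = [0.0] * (2 * m + 1)
    for i, p_i in enumerate(p):
        for j, p_j in enumerate(p):
            # Ordered pairs: a heterozygote (i, j) is visited twice.
            classes[i + j] += p_i * p_j
    return classes


def em_allele_freqs(class_counts, max_allele_cn, max_iter=1000, tol=1e-10):
    """Estimate allele copy number frequencies by EM.

    class_counts: dict mapping diploid copy number -> number of individuals.
    max_allele_cn: assumed largest copy number carried by a single allele.
    """
    m = max_allele_cn
    if any(k < 0 or k > 2 * m for k in class_counts):
        raise ValueError("a diploid copy number is outside 0..2*max_allele_cn")
    n_people = sum(class_counts.values())
    if n_people <= 0:
        raise ValueError("class_counts must contain at least one individual")

    # Assumed starting point: all allele copy numbers equally frequent.
    p = [1.0 / (m + 1)] * (m + 1)

    for _ in range(max_iter):
        allele_counts = [0.0] * (m + 1)
        for k, n_k in class_counts.items():
            if n_k == 0:
                continue
            # E-step: split class k among the ordered genotypes (i, j)
            # with i + j == k, in proportion to p[i] * p[j].
            pairs = [(i, k - i) for i in range(m + 1) if 0 <= k - i <= m]
            weights = [p[i] * p[j] for i, j in pairs]
            norm = sum(weights)
            for (i, j), w in zip(pairs, weights):
                share = n_k * w / norm
                allele_counts[i] += share
                allele_counts[j] += share
        # M-step: expected allele counts divided by total alleles (2N).
        new_p = [c / (2.0 * n_people) for c in allele_counts]
        change = max(abs(a - b) for a, b in zip(new_p, p))
        p = new_p
        if change < tol:
            break
    return p


# Synthetic counts constructed from allele frequencies (0.2, 0.5, 0.3)
# for 0, 1, and 2 copies per allele, with 10,000 individuals.
counts = {0: 400, 1: 2000, 2: 3700, 3: 3000, 4: 900}
estimated = em_allele_freqs(counts, max_allele_cn=2)
print(estimated)
print(expected_class_freqs(estimated))
```

In this synthetic example only the diploid class with two copies is ambiguous (it can arise as 1+1 or 0+2), so the E-step has to apportion that class between the two genotypes on each iteration. Real data sets with wider copy number ranges contain many more ambiguous classes, and the choice of `max_allele_cn` then becomes a modelling assumption that should be examined.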
