An Integrative Segmentation Method for Detecting Germline Copy Number Variations in SNP Arrays

Authors


Correspondence to: Jianxin Shi, Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, MD 20854.

Abstract

Germline copy number variations (CNVs) are a major source of genetic variation in humans. In large-scale studies of complex diseases, CNVs are usually detected from data generated by single nucleotide polymorphism (SNP) genotyping arrays. In this paper, we develop an integrative segmentation method, SegCNV, that detects CNVs by integrating both log R ratio (LRR) and B allele frequency (BAF). In simulation studies, SegCNV had modestly better power to detect deletions and substantially better power to detect duplications than circular binary segmentation (CBS), which relies purely on LRRs; compared with PennCNV and QuantiSNP, it had better power to detect deletions and comparable performance in detecting duplications. In two HapMap subjects with deep sequence data available as a gold standard, SegCNV detected more true short deletions than PennCNV and QuantiSNP. For 21 short duplications validated experimentally in the AGRE dataset, SegCNV, QuantiSNP, and PennCNV detected all of them, while CBS detected only three. SegCNV is much faster than the hidden Markov model (HMM)-based methods, taking only several seconds to analyze genome-wide data for one subject. Genet. Epidemiol. 36:373–383, 2012. © 2012 Wiley Periodicals, Inc.
