The copyright line for this article was changed on 17th February 2016 after original online publication.
Identifying Rare Variants With Optimal Depth of Coverage and Cost-Effective Overlapping Pool Sequencing
Version of Record online: 28 OCT 2013
© 2013 The Authors. *Genetic Epidemiology* published by Wiley Periodicals, Inc.
This is an open access article under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs License, which permits use and distribution in any medium, provided the original work is properly cited, the use is non-commercial and no modifications or adaptations are made.
Volume 37, Issue 8, pages 820–830, December 2013
How to Cite
Cao, C.-C., Li, C., Huang, Z., Ma, X. and Sun, X. (2013), Identifying Rare Variants With Optimal Depth of Coverage and Cost-Effective Overlapping Pool Sequencing. Genet. Epidemiol., 37: 820–830. doi: 10.1002/gepi.21769
- Issue online: 13 NOV 2013
- Manuscript Accepted: 27 SEP 2013
- Manuscript Revised: 9 SEP 2013
- Manuscript Received: 23 APR 2013
- National Basic Research Program of China. Grant Number: 2012CB316501
- National Natural Science Foundation of China. Grant Number: 61073141
- group testing;
- overlapping pool sequencing;
- rare variants
Genome-wide association studies have identified hundreds of genetic variants associated with complex diseases, yet most variants identified so far explain only a small proportion of heritability, suggesting that rare variants are responsible for the missing heritability. Identifying rare variants through large-scale resequencing is becoming increasingly important but remains prohibitively expensive despite the rapid decline in sequencing costs. Overlapping pool sequencing based on group testing, in which pooled rather than individual samples are sequenced, can greatly reduce both the effort of sample preparation and the cost of screening for rare variants. Here, we propose an overlapping pool sequencing strategy to screen for rare variants with optimal sequencing depth, together with a corresponding cost model. We formulated a model to compute the optimal depth needed for sufficient observations of variants in pooled sequencing. Using the shifted transversal design algorithm, appropriate parameters for overlapping pool sequencing can be selected to minimize cost while guaranteeing accuracy. Because of the mixing constraint and the high depth required for pooled sequencing, the results showed that it is more cost-effective to divide a large population into smaller blocks, each tested independently with an optimized strategy. Finally, we conducted an experiment to screen for carriers of variants with a frequency of 1%. With simulated pools and publicly available human exome sequencing data, the experiment achieved 99.93% accuracy. With overlapping pool sequencing, the cost of screening for carriers of variants with a frequency of 1% in 200 diploid individuals dropped to at least 66% when the target sequencing region was set to 30 Mb.
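To make the pooling idea concrete, the following is a minimal sketch of how samples might be assigned to overlapping pools with a shifted transversal design and then decoded. It uses the standard polynomial construction of the design as we understand it; the abstract does not give the paper's parameters or decoding procedure. The function names (`std_pools`, `observe`, `decode`), the toy sizes (20 samples, q = 5, 3 layers), and the error-free pool readout are all assumptions made for illustration.

```python
def std_pools(n, q, k):
    """Assign n samples to k layers of q pools each.

    Sketch of a shifted transversal design; q is assumed to be prime
    and k may not exceed q + 1. Returns (layers, gamma), where
    layers[j][b] is the list of samples in pool b of layer j.
    """
    if k > q + 1:
        raise ValueError("at most q + 1 layers are available")
    # Compression power: smallest gamma with q**(gamma + 1) >= n.
    gamma = 0
    while q ** (gamma + 1) < n:
        gamma += 1
    layers = [[[] for _ in range(q)] for _ in range(k)]
    for i in range(n):
        # Base-q digits of the sample index, lowest digit first.
        digits = [(i // q ** c) % q for c in range(gamma + 1)]
        for j in range(k):
            if j < q:
                # Evaluate the digit polynomial at j, modulo q.
                pool = sum(d * j ** c for c, d in enumerate(digits)) % q
            else:
                # Extra layer j == q uses the leading digit only.
                pool = digits[gamma]
            layers[j][pool].append(i)
    return layers, gamma


def observe(layers, carriers):
    """Simulate an error-free readout: a pool is positive if it holds a carrier."""
    return {
        (j, b)
        for j, layer in enumerate(layers)
        for b, pool in enumerate(layer)
        if carriers & set(pool)
    }


def decode(layers, positive):
    """Keep only samples whose pool is positive in every layer."""
    n = sum(len(pool) for pool in layers[0])
    candidates = set(range(n))
    for j, layer in enumerate(layers):
        in_positive = set()
        for b, pool in enumerate(layer):
            if (j, b) in positive:
                in_positive.update(pool)
        candidates &= in_positive
    return candidates


# Toy example (hypothetical sizes): 20 samples, 15 pools, 2 carriers.
layers, gamma = std_pools(n=20, q=5, k=3)
carriers = {3, 17}
recovered = decode(layers, observe(layers, carriers))
print(gamma, sorted(recovered))
```

In this toy setting, any two distinct samples share at most `gamma` pools, so with `k >= t * gamma + 1` layers up to `t` carriers can be recovered when pool readouts are error-free. A real screen would also have to account for sequencing errors and for the depth needed to observe a variant allele in each pool, which this sketch does not model.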