Contract grant sponsor: NIH; Contract grant number: MH084678; Contract grant sponsor: NSF; Contract grant number: HG005927.
A Sample Selection Strategy for Next-Generation Sequencing
Version of Record online: 3 AUG 2012
© 2012 Wiley Periodicals, Inc.
Volume 36, Issue 7, pages 696–709, November 2012
How to Cite
Kang, C. J. and Marjoram, P. (2012), A Sample Selection Strategy for Next-Generation Sequencing. Genet. Epidemiol., 36: 696–709. doi: 10.1002/gepi.21664
- Issue online: 12 OCT 2012
- Manuscript Accepted: 13 JUN 2012
- Manuscript Revised: 29 MAY 2012
- Manuscript Received: 23 FEB 2012
- Keywords: SNP discovery
Next-generation sequencing technology provides vast amounts of sequence data. It is more efficient and cheaper than previous sequencing technologies, but deep resequencing of entire samples is still expensive. Therefore, sensible strategies for choosing subsets of samples to sequence are required. Here we describe an algorithm for selecting a subsample of an existing sample with either of two goals in mind: maximizing the number of new polymorphic sites that are detected, or improving the efficiency with which the types of the remaining unsequenced individuals can be imputed at newly discovered polymorphisms. We then describe a variation on our algorithm that is more focused on detecting rarer variants. We demonstrate the performance of our algorithm using simulated data and data from the 1000 Genomes Project.