Optimized Selection of Unrelated Subjects for Whole-Genome Sequencing Studies of Rare High-Penetrance Alleles

Authors

  • Todd L. Edwards

    1. Vanderbilt Epidemiology Center, Division of Epidemiology, Department of Medicine, Vanderbilt University, Nashville, Tennessee
    2. Center for Human Genetics Research, Vanderbilt University, Nashville, Tennessee
  • Chun Li

    Corresponding author
    1. Department of Biostatistics, Vanderbilt University, Nashville, Tennessee
    2. Center for Human Genetics Research, Vanderbilt University, Nashville, Tennessee

Correspondence to: Chun Li, Center for Human Genetics Research, 519 Light Hall, Vanderbilt University Medical Center, Nashville, TN 37212-0700. E-mail: chun.li@vanderbilt.edu

Abstract

Sequencing studies using whole-genome or exome scans are still more expensive than genome-wide association studies on a per-subject basis. As a result, only a subset of subjects from a larger study will be selected for sequencing. To perform an agnostic investigation of the entire genome, subjects can be selected to capture independent ancestral lineages, i.e., founder genomes, and thus avoid redundant information from regions that were inherited identical by descent (IBD) from a common ancestor. We present SampleSeq2, which can be used to select a subset of optimally unrelated subjects with minimal IBD sharing. It can also be used to estimate the number, GT, of founder chromosomes in a sample or to select the minimum number of subjects that will carry a target GT. We evaluated SampleSeq2 against a random draw of a small number of subjects, both by simulation and using the Anabaptist genealogy. SampleSeq2 provided an increase in GT relative to a random draw across a range of small sample sizes. This increase in founder chromosomes improves the power of association tests, mitigates the effect of cryptic relatedness on parameter estimates, increases the total yield of alleles from sequencing, and minimizes the average size of regions shared IBD around disease alleles in cases. Genet. Epidemiol. 36:472-479, 2012. © 2012 Wiley Periodicals, Inc.
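
The idea of choosing subjects with minimal IBD sharing can be illustrated with a small sketch. This is not the SampleSeq2 algorithm, which the abstract does not describe; it is a simple greedy heuristic under stated assumptions: a symmetric matrix of pairwise kinship coefficients is already available, and the function name `greedy_unrelated_subset` and its parameters are hypothetical.

```python
import numpy as np


def greedy_unrelated_subset(kinship, k):
    """Greedily pick k subjects while keeping summed pairwise kinship small.

    kinship: symmetric (n x n) array of pairwise kinship coefficients
             (an assumed input; the diagonal is ignored).
    k:       number of subjects to select.
    Illustrative heuristic only, not the SampleSeq2 method.
    """
    kinship = np.asarray(kinship, dtype=float)
    n = kinship.shape[0]
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of subjects")

    off_diag = kinship.copy()
    np.fill_diagonal(off_diag, 0.0)

    # Start with the subject least related to the sample as a whole.
    selected = [int(np.argmin(off_diag.sum(axis=1)))]
    # load[i] = summed kinship between subject i and the selected subjects.
    load = off_diag[selected[0]].copy()

    while len(selected) < k:
        candidates = load.copy()
        candidates[selected] = np.inf  # never pick the same subject twice
        nxt = int(np.argmin(candidates))
        selected.append(nxt)
        load += off_diag[nxt]
    return selected


# Toy sample: subjects 0/1 and 2/3 are sibling pairs, subject 4 is unrelated.
K = np.array([
    [0.50, 0.25, 0.00, 0.00, 0.00],
    [0.25, 0.50, 0.00, 0.00, 0.00],
    [0.00, 0.00, 0.50, 0.25, 0.00],
    [0.00, 0.00, 0.25, 0.50, 0.00],
    [0.00, 0.00, 0.00, 0.00, 0.50],
])
chosen = greedy_unrelated_subset(K, 3)
print(sorted(chosen))
```

In this toy sample the heuristic takes one member from each sibling pair plus the unrelated subject, so no two selected subjects share a recent common ancestor. A greedy pass does not guarantee an optimal subset in general.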
