• sequencing;
• ancestry;
• cryptic relatedness;
• study design;
• subject selection

Sequencing studies using whole-genome or exome scans are still more expensive than genome-wide association studies on a per-subject basis. As a result, only a subset of subjects from a larger study will be selected for sequencing. To perform an agnostic investigation of the entire genome, subjects may be selected to capture independent ancestral lineages, i.e., founder genomes, thereby avoiding redundant information from regions inherited identical by descent (IBD) from a common ancestor. We present SampleSeq2, which can be used to select a subset of optimally unrelated subjects with minimal IBD sharing. It can also be used to estimate the number, GT, of founder chromosomes in a sample or to select the minimum number of subjects that will carry a target GT. We evaluated SampleSeq2 against a random draw of a small number of subjects, both by simulation and using the Anabaptist genealogy. SampleSeq2 provided an increase in GT relative to a random draw across a range of small sample sizes. This increase in founder chromosomes improves the power of association tests, mitigates the effect of cryptic relatedness on parameter estimates, increases the total yield of alleles from sequencing, and minimizes the average size of regions shared IBD around disease alleles in cases. Genet. Epidemiol. 36:472-479, 2012. © 2012 Wiley Periodicals, Inc.
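
To illustrate the general idea of choosing subjects with little IBD sharing, the following is a minimal sketch of a greedy selection based on pairwise kinship coefficients. It is not the SampleSeq2 algorithm, which is not described in this abstract; the function name `greedy_unrelated_subset`, the dictionary-based kinship input, and the example pedigree values are all assumptions made for illustration.

```python
from typing import Dict, FrozenSet, List, Sequence


def greedy_unrelated_subset(
    subjects: Sequence[str],
    kinship: Dict[FrozenSet[str], float],
    k: int,
) -> List[str]:
    """Greedily pick up to k subjects with low summed pairwise kinship.

    Hypothetical illustration only, not the SampleSeq2 method.
    Pairs missing from `kinship` are treated as unrelated (0.0).
    """
    if k <= 0 or not subjects:
        return []

    def phi(a: str, b: str) -> float:
        return kinship.get(frozenset((a, b)), 0.0)

    remaining = list(subjects)
    # Seed with the subject least related to everyone else in the sample.
    first = min(
        remaining,
        key=lambda s: sum(phi(s, t) for t in remaining if t != s),
    )
    chosen = [first]
    remaining.remove(first)

    # Repeatedly add the subject least related to those already chosen.
    while len(chosen) < k and remaining:
        nxt = min(remaining, key=lambda s: sum(phi(s, c) for c in chosen))
        chosen.append(nxt)
        remaining.remove(nxt)
    return chosen


# Assumed toy data: A and B are full siblings, C and D are first cousins,
# and E is unrelated to everyone.
example_subjects = ["A", "B", "C", "D", "E"]
example_kinship = {
    frozenset(("A", "B")): 0.25,
    frozenset(("C", "D")): 0.0625,
}
example_selection = greedy_unrelated_subset(example_subjects, example_kinship, 3)
```

In this toy pedigree the sketch never selects both members of a related pair when three subjects are requested. A greedy pass of this kind is a heuristic and does not guarantee an optimal subset; it also uses expected kinship rather than a direct count of founder chromosomes such as GT.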