Optimal Two-Stage Design for Case-Control Association Analysis Incorporating Genotyping Errors

Authors

  • Y. Zuo,

    1. Department of Statistics and Probability, Michigan State University, East Lansing, MI 48824, USA
    • Equal contribution

  • G. Zou,

    1. Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100080, China
    2. Department of Biostatistics and Computational Biology, University of Rochester, Rochester, NY 14642, USA
    • Equal contribution

  • J. Wang,

    1. Academy of Mathematics and Systems Science, Chinese Academy of Sciences, Beijing 100080, China
    2. Department of Applied Mathematics and Physics, Graduate School of Informatics, Kyoto University, Kyoto 606-8501, Japan
  • H. Zhao,

    1. Department of Epidemiology and Public Health, Yale University School of Medicine, New Haven, CT 06520, USA
  • H. Liang

    Corresponding author
    1. Department of Biostatistics and Computational Biology, University of Rochester, Rochester, NY 14642, USA

*Corresponding author: H. Liang, Department of Biostatistics and Computational Biology, University of Rochester, 601 Elmwood Avenue, Box 630, Rochester, NY 14642, U.S.A. E-mail: hua_liang@urmc.rochester.edu

Summary

Two-stage design is a cost-effective approach for identifying disease genes in genetic studies, and it has received much attention recently. In general, there are two types of two-stage designs, which differ in the methods and samples used to measure allele frequencies in the first stage: (1) individual genotyping is used in the first stage; (2) DNA pooling is used in the first stage. In this paper, we focus on the latter. Zuo et al. (2006) investigated the statistical power of such a design, among other things, but did not take the cost of the study into account. The purpose of this paper is to study the optimal design under a given overall cost; that is, we investigate how to allocate resources between the two stages. In addition to the measurement errors associated with DNA pooling, genotyping errors are unavoidable with individual genotyping. We therefore discuss the optimal design while incorporating the genotyping errors associated with individual genotyping. The joint statistical distributions of the test statistics in the first and second stages are derived. For a fixed cost, our results show that the optimal design requires no additional samples in the second stage, only that the samples from the first stage be re-used. When the second stage uses an entirely independent sample, however, the optimal design under a given cost depends on the population allele frequency and the allele frequency difference between the case and control groups. At current genotyping costs, roughly 1/3 to 1/2 of the total sample size can be allocated to the first stage for screening.
