Testing for Hardy-Weinberg Equilibrium in National Household Surveys that Collect Family-Based Genetic Data

Authors

  • Yan Li,

    Corresponding author
    1. Department of Mathematics, University of Texas at Arlington, TX, USA
    2. Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, Maryland, USA
  • Zhaohai Li,

    1. Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, Maryland, USA
    2. Department of Statistics, The George Washington University, Washington, DC, USA
  • Barry I. Graubard

    1. Biostatistics Branch, Division of Cancer Epidemiology and Genetics, National Cancer Institute, Bethesda, Maryland, USA

Yan Li, Ph.D., Assistant Professor, Department of Mathematics, University of Texas at Arlington, 411 S. Nedderman Street, Arlington, TX 76019, USA. Tel: 817-272-5683; Fax: 817-272-5892; E-mail: liyanna@uta.edu

Summary

In population-based household surveys, such as the National Health and Nutrition Examination Survey (NHANES), blood-related individuals are often sampled from the same household. Genetic data collected in national household surveys are therefore correlated through two levels of clustering: one induced by multistage geographic cluster sampling, and the other induced by biological inheritance among participants from the same sampled household. In this paper, we develop efficient statistical methods for testing Hardy-Weinberg equilibrium that account for the weighting effect of differential selection probabilities in complex sample designs as well as both clustering effects described above. We examine and compare the magnitude of each level of clustering effect under different scenarios and identify the scenarios in which one level dominates the other. The proposed methods are evaluated in Monte Carlo simulation studies and illustrated using the Hispanic Health and Nutrition Examination Survey (HHANES) with simulated genotype data.
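To make the role of weighting and clustering concrete, the sketch below shows a generic design-based Wald test of Hardy-Weinberg equilibrium at one biallelic locus. It is not the authors' method: it uses the standard Taylor-linearization (ultimate-cluster) variance estimator and assumes a single stratum with primary sampling units (PSUs) sampled with replacement. The function name `weighted_hwe_test` and the `(psu_id, weight, n_A)` record layout are hypothetical. The statistic is the disequilibrium coefficient D = P_AA - p_A^2, which is zero under HWE.

```python
import math
from collections import defaultdict


def weighted_hwe_test(records):
    """Generic design-based Wald test of HWE (illustrative sketch only).

    records: iterable of (psu_id, weight, n_A), where n_A in {0, 1, 2} is
    the number of copies of allele A carried by the sampled person.
    Assumes one stratum and with-replacement sampling of PSUs.
    Returns (d, se, z, p_value).
    """
    records = list(records)
    total_w = sum(w for _, w, _ in records)

    # Weighted genotype and allele frequencies (ratio estimators).
    p_AA = sum(w for _, w, g in records if g == 2) / total_w
    p_A = sum(w * g / 2 for _, w, g in records) / total_w

    # Hardy-Weinberg disequilibrium coefficient; zero under HWE.
    d = p_AA - p_A ** 2

    # Linearized (influence) value of d for each person, summed within PSU.
    # Households are nested in PSUs, so within-household correlation is
    # absorbed into the PSU totals together with geographic clustering.
    psu_totals = defaultdict(float)
    for psu, w, g in records:
        is_AA = 1.0 if g == 2 else 0.0
        u = ((is_AA - p_AA) - 2 * p_A * (g / 2 - p_A)) / total_w
        psu_totals[psu] += w * u

    n_psu = len(psu_totals)
    if n_psu < 2:
        raise ValueError("need at least two PSUs")
    mean_t = sum(psu_totals.values()) / n_psu
    var_d = n_psu / (n_psu - 1) * sum(
        (t - mean_t) ** 2 for t in psu_totals.values()
    )
    if var_d <= 0:
        raise ValueError("estimated variance of d is zero")

    se = math.sqrt(var_d)
    z = d / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return d, se, z, p_value


if __name__ == "__main__":
    sample = [(1, 1.0, 2), (1, 1.0, 2), (2, 1.0, 0),
              (2, 1.0, 1), (3, 1.0, 2), (3, 1.0, 0)]
    print(weighted_hwe_test(sample))
```

Because the sketch pools both sources of correlation into a single PSU-level variance, it cannot separate the geographic and household clustering effects. Comparing those two levels is the question the paper itself addresses.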
