U-Statistics-based Tests for Multiple Genes in Genetic Association Studies

Authors

  • Zhi Wei,

    1. Department of Computer Science, New Jersey Institute of Technology, University Heights, Newark, NJ 07102, U.S.A.
  • Mingyao Li,

    1. Department of Biostatistics and Epidemiology, University of Pennsylvania School of Medicine, Philadelphia, PA 19104-6021, U.S.A.
  • Timothy Rebbeck,

    1. Department of Biostatistics and Epidemiology, University of Pennsylvania School of Medicine, Philadelphia, PA 19104-6021, U.S.A.
  • Hongzhe Li

    Corresponding author
    1. Department of Biostatistics and Epidemiology, University of Pennsylvania School of Medicine, Philadelphia, PA 19104-6021, U.S.A.
      *Address for correspondence: Hongzhe Li, Ph.D., Department of Biostatistics and Epidemiology, University of Pennsylvania School of Medicine, 423 Guardian Drive, 920 Blockley Hall, Philadelphia, PA 19104-6021. Tel: (215) 573-5038, Fax: (215) 573-4865, E-mail: hongzhe@mail.med.upenn.edu

Summary

As our understanding of biological pathways and the genes that regulate them grows, pathway-level analysis has become an increasingly important part of genetic and molecular epidemiology. Pathway-based genetic association studies typically genotype variants in genes that act in particular biological pathways. Such studies can potentially capture the highly heterogeneous nature of many complex traits, which may involve multiple causative loci and multiple alleles at some of those loci. In this paper, we develop two nonparametric test statistics that consider the effects of multiple markers simultaneously. Our approach, which is based on data-adaptive U-statistics, can handle both qualitative phenotypes, such as case-control status, and quantitative continuous phenotypes. Simulations demonstrate that the proposed methods are more powerful than standard methods, especially when there are multiple risk loci, each with a small genetic effect. When the number of disease-predisposing genes is small, the data-adaptive weighting of the U-statistics over all markers yields power similar to that of commonly used single-marker tests. We further illustrate the potential merits of the proposed tests by analyzing a data set from a pathway-based candidate gene association study of breast cancer and hormone metabolism pathways. Finally, we discuss potential applications of the proposed tests to genome-wide association studies.
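To make the notion of a U-statistic on marker data concrete, the following is a minimal sketch of a generic two-sample (Mann-Whitney type) U-statistic computed separately for each marker in a case-control sample. It is not the data-adaptive statistic developed in the paper, whose kernel and weighting are not given in this summary. The genotype coding (0, 1, 2 copies of an allele), the kernel, and all function names are assumptions made purely for illustration.

```python
from typing import List, Sequence


def pairwise_kernel(case_genotype: int, control_genotype: int) -> float:
    """Assumed kernel: 1 if the case carries more allele copies than the
    control, 0.5 on a tie, 0 otherwise (Mann-Whitney type, illustrative only)."""
    if case_genotype > control_genotype:
        return 1.0
    if case_genotype == control_genotype:
        return 0.5
    return 0.0


def two_sample_u(cases: Sequence[int], controls: Sequence[int]) -> float:
    """Average the kernel over every (case, control) pair at one marker.

    With no association, the expected value is 0.5.
    """
    if not cases or not controls:
        raise ValueError("both groups must be non-empty")
    total = sum(pairwise_kernel(c, d) for c in cases for d in controls)
    return total / (len(cases) * len(controls))


def per_marker_u(case_matrix: Sequence[Sequence[int]],
                 control_matrix: Sequence[Sequence[int]]) -> List[float]:
    """Compute the U-statistic at each marker.

    Rows are individuals and columns are markers (hypothetical layout).
    """
    n_markers = len(case_matrix[0])
    return [
        two_sample_u([row[k] for row in case_matrix],
                     [row[k] for row in control_matrix])
        for k in range(n_markers)
    ]


if __name__ == "__main__":
    # Toy genotypes: 3 cases and 3 controls typed at 2 markers.
    toy_cases = [[2, 0], [1, 1], [2, 2]]
    toy_controls = [[0, 0], [1, 1], [0, 2]]
    print(per_marker_u(toy_cases, toy_controls))
```

A multi-marker test would combine such per-marker quantities into one statistic; how the paper chooses the data-adaptive weights for that combination is not specified here, so the sketch stops at the per-marker step.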
