Keywords:

  • Association studies;
  • extreme sampling;
  • genetic models;
  • genotype relative risks;
  • replication

Summary

  1. Top of page
  2. Summary
  3. Introduction
  4. Methods
  5. Discussion
  6. Acknowledgements
  7. References

Using extreme phenotypes for association studies can improve statistical power. We study the impact of using samples with extremely high or low trait values on the alternative model space, the genotype relative risks, and the genetic models in association studies. We prove the following results: when the risk allele causes high trait values, the more extreme the high traits, the larger the genotype relative risks, which is not always true when using extreme low traits; we also prove that a genetic model theoretically changes with more extreme traits, except for the recessive or dominant models. Practically, however, the impact of deviations from the true genetic model at a functional locus due to selective sampling is virtually negligible. The implications of our findings are discussed. Numerical values are reported for illustration.


Introduction

Designs that use extreme high and low traits have been employed in both genetic linkage and association studies (Risch & Zhang, 1995; Forest & Feingold, 2000; Zheng et al., 2006). It has been shown that using extreme traits can improve statistical power for genetic association studies compared to the random sampling of all traits (Slatkin, 1999; Abecasis et al., 2001; Xiong et al., 2002; Chen et al., 2005; Chen & Li, 2011). Extreme high and low traits can be used to design a case-control association study based on a threshold model. One such example is described in Sims et al. (2008), who studied Wnt pathway genes for bone mineral density (BMD) using 170 cases and 174 controls selected with high BMDs (Z-score from 1.51 to 3.97) and low BMDs (Z-score from −1.5 to −3.33), respectively, and demonstrated that extreme sampling can robustly detect genes of relevant effect sizes. A comparison between the use of extreme sampling and case-control data was reported by Yang et al. (2010). Extreme sampling has recently been applied to the detection of rare variants (Guey et al., 2011; Li et al., 2011).

In this paper, we study the impact of extreme sampling on the alternative model space, the genotype relative risks (GRRs), and the modes of inheritance (genetic models). The GRRs are used in designing genetic association studies, and genetic models play important roles in testing associations. The additive model, which counts the number of minor alleles in the genotype, is commonly used. Common non-additive models include the recessive and dominant models. When a genetic model is correctly specified, an optimal test can be obtained and applied; that test is sub-optimal when the model is mis-specified (Freidlin et al., 2002). If the genetic model changes substantially under extreme sampling, which is plausible because the model is defined by both phenotypes and genotypes, the change may affect the power to detect association even when samples with extreme traits are used. Moreover, it may also affect the interpretation of replication results when extreme traits are used in the replication.

Methods

Notation

Assume the marker of interest is in complete linkage disequilibrium (LD) with a disease locus. The alleles of the marker are denoted as b and B. Without loss of generality (WLG), when the marker is associated with a quantitative trait, let b be the risk allele, which causes a higher trait value. The three genotypes are denoted as G0, G1 and G2, where the subscript counts the copies of allele b in the genotype. The frequency of Gj is denoted as inline image (j = 0, 1, 2). Let X be a quantitative trait, given by X = μ + g + e, where μ is the overall mean, g is the genetic value for the genotype G, and e is a non-genetic random error with mean inline image and variance inline image (WLG). Assume that G and e are independent. The value of g is given by −a, d and a (a > 0) for G0, G1 and G2, respectively, where d = −a, 0, or a if the genetic model under random sampling is recessive, additive or dominant. We do not consider any under-dominant (d < −a) or over-dominant (d > a) models. WLG, let inline image. Denote inline image and inline image. Denote the conditional distribution of X given inline image as inline image. The marginal distribution of X is inline image. The null hypothesis of no association is given by inline image, under which inline image.

We consider a threshold model with the truncation points u and v, that is, we only sample individuals with X < u or X > v, where u and v are pre-specified at the design stage. If the distribution of X can be estimated using previous data (Xu et al., 1999), u and v can be chosen as its 100c1th and 100(1 − c2)th percentiles, respectively, where c1 (c2) can be, say, 0.01, 0.10 or 0.20. If such an estimate is not available, one may consider using extreme rank selection, which does not require an underlying distribution of X (Chen et al., 2005; Zheng et al., 2006). Define the study population as inline image. Only individuals whose traits belong to inline image are genotyped. We call the design with inline image extreme sampling. Sampling only high traits is a special case obtained by letting u = −∞. Denote the penetrance for inline image as inline image (inline image), where inline image for inline image. The GRRs are given by inline image (inline image). Under H0, inline image is equivalent to inline image or inline image.
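The penetrances and GRRs of this threshold design can be evaluated numerically. The following is a minimal sketch, not the authors' code. It assumes Hardy-Weinberg genotype frequencies, μ = 0, a standard normal error, and a case definition in which sampled individuals with X > v are cases and those with X < u are controls, so that the penetrance of a genotype is Pr(X > v | genotype, X < u or X > v). The function names and the values a = 0.1, d = 0 and p = 0.3 are illustrative assumptions, since the parameter values used in the source are not recoverable here.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # CDF of the error term, assumed standard normal


def genotype_means(a, d):
    """Trait means of G0, G1, G2 when mu = 0 (genetic values -a, d, a)."""
    return (-a, d, a)


def genotype_freqs(p):
    """Frequencies of G0, G1, G2 under assumed Hardy-Weinberg equilibrium,
    where p is the frequency of the risk allele b."""
    q = 1.0 - p
    return (q * q, 2.0 * p * q, p * p)


def trait_quantile(prob, means, freqs):
    """Quantile of the marginal (mixture) trait distribution, by bisection."""
    lo, hi = -20.0, 20.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        cdf = sum(w * PHI(mid - m) for m, w in zip(means, freqs))
        if cdf < prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def penetrances(a, d, p, c1, c2):
    """Pr(X > v | Gj, X < u or X > v) for j = 0, 1, 2, with u and v set to
    the 100*c1-th and 100*(1 - c2)-th percentiles of the trait."""
    means, freqs = genotype_means(a, d), genotype_freqs(p)
    u = trait_quantile(c1, means, freqs)
    v = trait_quantile(1.0 - c2, means, freqs)
    pens = []
    for m in means:
        low = PHI(u - m)          # Pr(X < u | genotype)
        high = 1.0 - PHI(v - m)   # Pr(X > v | genotype)
        pens.append(high / (low + high))
    return pens


def grrs(a, d, p, c1, c2):
    """Genotype relative risks (lambda1, lambda2) relative to G0."""
    f0, f1, f2 = penetrances(a, d, p, c1, c2)
    return f1 / f0, f2 / f0


if __name__ == "__main__":
    for c in (0.50, 0.10, 0.01):
        lam1, lam2 = grrs(a=0.1, d=0.0, p=0.3, c1=c, c2=c)
        print(f"c1 = c2 = {c:.2f}: lambda1 = {lam1:.3f}, lambda2 = {lam2:.3f}")
```

With these assumed values the sketch gives λ2 of roughly 1.17 for c1 = c2 = 0.50 and roughly 1.61 for c1 = c2 = 0.01, so the GRR grows as both cutoffs become more extreme.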

The Model Space Under Extreme Sampling

Under the alternative hypothesis H1, one is mainly interested in the three common genetic models: recessive (inline image), additive (inline image and inline image) and dominant (inline image), and any model between the recessive and dominant models. That is, without any under- and over-dominant models, the model space formed by inline image under the random sampling of all traits can be written as inline image, which includes the three common genetic models. Define inline image under H1. Hence, inline image and M can be indexed by θ as inline image. The model space M under random sampling is constrained because θ only belongs to [0, 1]. Test statistics are more powerful under the constrained model space than under the unconstrained model space: inline image (Zheng et al., 2009a). We first study whether the model space in terms of GRRs inline image or penetrances inline image would be further constrained under extreme sampling.

Denote the density function of X given inline image as inline image (inline image) and inline image. Under extreme sampling, conditional on G, we assume (i) inline image is an increasing function of x, (ii) inline image is symmetric with respect to inline image, i.e. inline image, and (iii) inline image is a decreasing function of inline image. These assumptions are all satisfied if the trait (after log or power transformation) follows a normal distribution. It can be shown that (iv) inline image is a decreasing function of inline image as follows. For inline image, inline imageinline image due to inline image and (i).

Let inline image belong to M. Then, under extreme sampling, we have inline image and inline image. For inline image, inline image and inline image. It follows that inline image, which is equivalent to inline image. Thus, if we define inline image and inline image, then for any inline image, inline image under extreme sampling, which leads to Result 1.

Result 1. The constraints on the mean traits inline image under random sampling also hold for the GRRs under extreme sampling. Thus, extreme sampling does not further reduce the model space compared to random sampling.
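Result 1 can be checked numerically under stated assumptions. The sketch below reads the result as saying that the ordering of the genotype means (no under- or over-dominance, so −a ≤ d ≤ a) carries over to 1 ≤ λ1 ≤ λ2 for any cutoffs u ≤ v. That reading, the standard normal error, the penetrance Pr(X > v | genotype, X < u or X > v), and all function names are assumptions made for illustration.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # error term assumed standard normal


def penetrance(mean, u, v):
    """Pr(X > v | X < u or X > v) for a genotype with the given trait mean."""
    low = PHI(u - mean)
    high = 1.0 - PHI(v - mean)
    return high / (low + high)


def grrs_from_thresholds(a, d, u, v):
    """(lambda1, lambda2) for genetic values -a, d, a and fixed cutoffs u, v."""
    f0, f1, f2 = (penetrance(m, u, v) for m in (-a, d, a))
    return f1 / f0, f2 / f0


def ordering_holds(a, u, v, steps=20, tol=1e-12):
    """Check 1 <= lambda1 <= lambda2 for d on an even grid over [-a, a]."""
    for k in range(steps + 1):
        d = -a + 2.0 * a * k / steps
        lam1, lam2 = grrs_from_thresholds(a, d, u, v)
        if not (1.0 - tol <= lam1 <= lam2 + tol):
            return False
    return True


if __name__ == "__main__":
    for a, u, v in [(0.1, -1.0, 1.5), (0.5, -2.0, 0.5), (0.3, 0.0, 0.0)]:
        print(a, u, v, ordering_holds(a, u, v))
```

The check passes because, under these assumptions, the penetrance is an increasing function of the genotype mean: Pr(X > v) rises and Pr(X < u) falls as the mean increases.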

Impact on the GRRs Under Extreme Sampling

Why would using extreme traits improve the power of association studies? Result 1 shows that it is not due to a smaller model space. Hence, we study how the GRRs change under extreme sampling. Write the GRRs as

  • display math

Taking the partial derivatives of inline image with respect to u and v, we have, for any inline image,

  • display math

We prove inline image under some conditions and inline image for any inline image. For any inline image, by property (iv), inline image. Then inline image. Thus, there exists inline image such that for any inline image and the given inline image, inline image. For the second one, since inline image and inline image, we have inline image for any inline image. In the above results, either u or v is fixed. When they both change, let inline image and denote inline image. Then,

  • display math

In order to have inline image, we only need inline image inline image. That is, u cannot decrease much faster than v increases. This leads to Result 2.

Result 2. When inline image is an increasing function of x and inline image is symmetric about μ, the GRRs monotonically increase either when v increases for a fixed inline image, or when u decreases for a fixed inline image that is large enough, or when u decreases and v increases simultaneously provided that inline image is large enough.

The result shows that when v is not extreme enough, going more extreme on u alone (extreme low trait values) does not necessarily increase the GRRs. Practically, it implies that, if one chooses either the 100c2th upper percentile for extremely high traits or the 100c1th lower percentile for extremely low traits for association studies, one should let c2 be smaller than c1 unless the cost of screening is a concern. On the other hand, using the thresholds u and v, the probability that an individual will be selected for genotyping decreases as u becomes smaller and/or v becomes larger, and converges to 0 as u → −∞ and v → ∞. Hence, there is a trade-off between the power/sample size and the cost of screening extreme samples.
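The screening side of this trade-off is simple arithmetic. As a sketch, assume u and v are set to the 100c1th and 100(1 − c2)th percentiles of the trait distribution, so that a randomly screened individual is selected with probability c1 + c2. The function names and the target of 1000 genotyped individuals are hypothetical.

```python
def selection_probability(c1, c2):
    """Pr(X < u or X > v) when u and v are the 100*c1-th and
    100*(1 - c2)-th percentiles of the trait distribution."""
    return c1 + c2


def expected_screened(n_genotyped, c1, c2):
    """Expected number of individuals to phenotype in order to obtain
    n_genotyped individuals with extreme traits."""
    return n_genotyped / selection_probability(c1, c2)


if __name__ == "__main__":
    for c in (0.20, 0.10, 0.01):
        n = expected_screened(1000, c, c)
        print(f"c1 = c2 = {c:.2f}: screen about {n:.0f} individuals")
```

Under this assumption, moving from c1 = c2 = 0.10 to c1 = c2 = 0.01 multiplies the expected screening effort by ten for the same number of genotyped individuals.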

We plot the values of GRR λ2 due to extreme sampling with inline image and inline image and inline image, where inline image, p is the frequency of allele b, and inline image. We choose inline image as the standard normal distribution. The GRR inline image given inline image, inline image and inline image is presented in Figure 1, where inline image (more extreme), 0.10, 0.20 (moderately extreme), 0.30 and 0.50. Figure 1 shows that the GRR increases with more extreme high traits (larger v), but not necessarily with more extreme low traits (smaller u); the latter depends on how extreme v is. The plots for other parameter values, including various allele frequencies (e.g. p = 0.15 or 0.45) and other values of a and d, are similar (results not presented here).

Figure 1. A plot of GRR λ2 given inline image, inline image, inline image, and inline image and inline image for inline image (extreme truncation), 0.10, 0.20, 0.30 and 0.50.

The numerical values are also reported in Table 1, focusing on three scenarios: (a) c1 = c2 and both change, (b) c1 is fixed but c2 changes, and (c) c2 is fixed but c1 changes. Numerical values show that, when inline image, inline image and inline image (the row c1 = 0.20 of Table 1), λ2 increases monotonically from 1.126 with c2 = 0.50 to 1.496 with c2 = 0.10, and up to 2.154 with c2 = 0.01. In contrast, for fixed c2 = 0.10, λ2 is 1.517 with c1 = 0.50, rises to 1.527 with c1 = 0.30, and then falls to 1.072 with c1 = 0.01. Thus, choosing more extreme low traits for a given v does not necessarily increase the GRRs.

Table 1. GRR λ2 under extreme sampling given inline image, inline image, inline image, and inline image and inline image. The smaller c1, the more extreme the low-trait values. The smaller c2, the more extreme the high-trait values. Results are reported for the three different scenarios (a)–(c).

GRR λ2, rows indexed by c1
c1      c2 = c1    c2 = 0.01    c2 = 0.10    c2 = 0.50
0.01    1.614      1.614        1.072        1.012
0.10    1.387      2.201        1.387        1.082
0.20    1.303      2.154        1.496        1.126
0.30    1.248      2.087        1.527        1.150
0.50    1.167      1.966        1.517        1.167

GRR λ2, rows indexed by c2
c2      c1 = 0.01    c1 = 0.10    c1 = 0.50
0.01    1.614        2.201        1.966
0.10    1.072        1.387        1.517
0.20    1.034        1.214        1.358
0.30    1.022        1.144        1.268
0.50    1.012        1.082        1.167
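The two patterns in Table 1 (monotone in the high cutoff, not monotone in the low cutoff) can be explored with a short script. This is a sketch under assumptions rather than the computation used for the table: Hardy-Weinberg genotype frequencies, μ = 0, a standard normal error, penetrance defined as Pr(X > v | genotype, X < u or X > v), and illustrative defaults a = 0.1, d = 0 and p = 0.3. All names are hypothetical.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # error term assumed standard normal


def trait_quantile(prob, means, freqs):
    """Quantile of the genotype-mixture trait distribution, by bisection."""
    lo, hi = -20.0, 20.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if sum(w * PHI(mid - m) for m, w in zip(means, freqs)) < prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def lambda2(c1, c2, a=0.1, d=0.0, p=0.3):
    """GRR of G2 relative to G0 when the cutoffs u and v are the
    100*c1-th and 100*(1 - c2)-th trait percentiles."""
    means = (-a, d, a)
    q = 1.0 - p
    freqs = (q * q, 2.0 * p * q, p * p)  # assumed Hardy-Weinberg equilibrium
    u = trait_quantile(c1, means, freqs)
    v = trait_quantile(1.0 - c2, means, freqs)

    def pen(mean):
        high = 1.0 - PHI(v - mean)
        return high / (PHI(u - mean) + high)

    return pen(a) / pen(-a)


LEVELS = (0.50, 0.30, 0.20, 0.10, 0.01)  # from least to most extreme

# c1 fixed at 0.20 while the high cutoff v becomes more extreme.
fixed_low = [lambda2(0.20, c2) for c2 in LEVELS]
# c2 fixed at 0.10 while the low cutoff u becomes more extreme.
fixed_high = [lambda2(c1, 0.10) for c1 in LEVELS]

if __name__ == "__main__":
    print("c1 = 0.20:", [round(x, 3) for x in fixed_low])
    print("c2 = 0.10:", [round(x, 3) for x in fixed_high])
```

Under these assumptions the first list increases at every step, while the second ends lower than it starts, which mirrors the behaviour described for Result 2.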

Impact on Genetic Models Under Extreme Sampling

The genetic model under random sampling is indexed by θ such that inline image (i.e. inline image). Under the recessive, additive and dominant models, θ = 0, 1/2 and 1, respectively. We study how the genetic models change from random sampling to extreme sampling. When inline image (or inline image), i.e., inline image (or inline image), it is straightforward to show inline image (or inline image). However, in general, when inline image for inline image, we do not have inline image with the same genetic model θ. The above arguments are summarized in Result 3.

Result 3. For any inline image, the recessive (θ = 0) or dominant (θ = 1) models under random sampling will be retained under extreme sampling. However, the genetic model indexed by θ ∈ (0, 1) under random sampling will not, in general, be retained under extreme sampling.

The above result shows that the additive model under the random sampling would not be the additive model under the extreme sampling, which depends on how inline image is chosen and the shape of the distribution of X. In particular, to retain the additive model under the extreme sampling from the random sampling, inline image and inline image have to satisfy inline image, under which inline image  inline image (as inline image under the additive model) and inline image and inline image. Thus, inline image, i.e., inline image, the additive model under the extreme sampling.

To examine the deviation of the genetic model under extreme sampling from the genetic model under random sampling, we define an induced model as inline image and compare it to θ. The numerical values are reported in Table 2, which shows that inline image is actually quite close to θ, especially when inline image, which corresponds to inline image and inline image. Allele frequency has little impact on the genetic models under random sampling and extreme sampling. Hence, the power of testing association using extreme traits would be affected little, if at all, when θ is used as the genetic model under extreme sampling even though inline image is the true model.

Table 2. The induced genetic model inline image based on the GRRs under extreme sampling given p = 0.15, 0.30 and 0.45, and inline image and inline image. The entries are the values of inline image given θ.

p       c1      c2      θ = 0.1   θ = 0.3   θ = 0.5   θ = 0.7   θ = 0.9
0.15    0.01    0.01    0.105     0.316     0.524     0.724     0.911
0.15    0.01    0.30    0.136     0.381     0.593     0.774     0.930
0.15    0.01    0.50    0.132     0.371     0.582     0.766     0.927
0.15    0.30    0.01    0.071     0.230     0.413     0.624     0.816
0.15    0.30    0.30    0.101     0.305     0.507     0.707     0.903
0.15    0.30    0.50    0.109     0.321     0.525     0.721     0.909
0.15    0.50    0.01    0.074     0.236     0.421     0.631     0.869
0.15    0.50    0.30    0.093     0.284     0.481     0.685     0.894
0.15    0.50    0.50    0.100     0.301     0.502     0.702     0.901
0.30    0.01    0.01    0.102     0.308     0.514     0.715     0.908
0.30    0.01    0.30    0.136     0.380     0.592     0.774     0.930
0.30    0.01    0.50    0.132     0.371     0.581     0.766     0.927
0.30    0.30    0.01    0.071     0.229     0.411     0.623     0.865
0.30    0.30    0.30    0.100     0.302     0.503     0.704     0.902
0.30    0.30    0.50    0.108     0.320     0.524     0.720     0.909
0.30    0.50    0.01    0.073     0.235     0.420     0.630     0.869
0.30    0.50    0.30    0.093     0.283     0.480     0.684     0.893
0.30    0.50    0.50    0.100     0.301     0.501     0.701     0.901
0.45    0.01    0.01    0.098     0.299     0.503     0.707     0.905
0.45    0.01    0.30    0.136     0.379     0.591     0.773     0.930
0.45    0.01    0.50    0.132     0.371     0.581     0.765     0.927
0.45    0.30    0.01    0.070     0.228     0.410     0.621     0.865
0.45    0.30    0.30    0.099     0.300     0.501     0.702     0.901
0.45    0.30    0.50    0.108     0.318     0.522     0.719     0.908
0.45    0.50    0.01    0.073     0.235     0.420     0.630     0.869
0.45    0.50    0.30    0.092     0.282     0.479     0.682     0.893
0.45    0.50    0.50    0.100     0.300     0.500     0.701     0.900
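The comparison between θ and the induced model can be sketched in code. The source's definition of the induced model is not recoverable here, so this example assumes it is (λ1 − 1)/(λ2 − 1), the value of θ that satisfies λ1 = 1 − θ + θλ2. It further assumes d = a(2θ − 1), so that θ = 0, 1/2 and 1 give the recessive, additive and dominant models, together with Hardy-Weinberg frequencies, μ = 0, a standard normal error, and illustrative defaults a = 0.1 and p = 0.3. All names are hypothetical.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # error term assumed standard normal


def trait_quantile(prob, means, freqs):
    """Quantile of the genotype-mixture trait distribution, by bisection."""
    lo, hi = -20.0, 20.0
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if sum(w * PHI(mid - m) for m, w in zip(means, freqs)) < prob:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def induced_theta(theta, c1, c2, a=0.1, p=0.3):
    """Assumed induced model (lambda1 - 1) / (lambda2 - 1) under extreme
    sampling, for a random-sampling model indexed by theta in [0, 1]."""
    d = a * (2.0 * theta - 1.0)          # assumed link between theta and d
    means = (-a, d, a)
    q = 1.0 - p
    freqs = (q * q, 2.0 * p * q, p * p)  # assumed Hardy-Weinberg equilibrium
    u = trait_quantile(c1, means, freqs)
    v = trait_quantile(1.0 - c2, means, freqs)

    def pen(mean):
        high = 1.0 - PHI(v - mean)
        return high / (PHI(u - mean) + high)

    f0, f1, f2 = pen(means[0]), pen(means[1]), pen(means[2])
    lam1, lam2 = f1 / f0, f2 / f0
    return (lam1 - 1.0) / (lam2 - 1.0)


if __name__ == "__main__":
    for c1, c2 in [(0.50, 0.50), (0.30, 0.01), (0.01, 0.30)]:
        row = [round(induced_theta(t, c1, c2), 3) for t in (0.1, 0.3, 0.5, 0.7, 0.9)]
        print(f"c1 = {c1:.2f}, c2 = {c2:.2f}:", row)
```

Under these assumptions the recessive and dominant models map to exactly 0 and 1, the additive model stays near 0.5 when c1 = c2 = 0.50, and asymmetric cutoffs pull the induced value below 0.5 (c2 more extreme) or above it (c1 more extreme).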

Discussion

We have shown that, when using extreme high or low traits for association studies, the power is increased because the GRRs are increased and not because the model space (the space for the alternative hypothesis of association) is reduced. Although the genetic models generally change from random sampling to extreme sampling, except for the recessive or dominant models, the changes are small, so the statistical power of association studies is not expected to be affected. Therefore, when the true genetic model is known, the same statistic used for analyzing the association under random sampling should also be used under extreme sampling. On the other hand, the true genetic model for many complex traits is rarely known, and association tests based on a single genetic model may not be robust. Hence, robust tests (Freidlin et al., 2002; So & Sham, 2011) may be applied to detect association. Our results imply that the same robust tests used under random sampling can also be applied under extreme sampling. For replication studies, as far as the modes of inheritance and the model space are concerned, our results imply that it is valid to use samples with extreme traits, especially the higher traits, to replicate results obtained from random samples or less extreme traits. Finally, a limitation of our results is that they are derived when the marker is in complete LD with a functional locus. In practice, however, markers are more likely to be in incomplete LD with the functional loci. In this case, the genetic models at the markers of interest are more complicated and may not be the same as those of the functional loci (Zheng et al., 2009b).

Acknowledgements

The work of J Xu was partially supported by Research Grant No. R-155-000-112-112 of the National University of Singapore. The work of A Yuan was supported in part by the National Center for Research Resources at NIH grant 2G12RR003048. The authors have no conflict of interest.

References
