Robust Quantitative Trait Association Tests in the Parent-Offspring Triad Design: Conditional Likelihood-Based Approaches

Authors


*Corresponding author: Dr. John Jen Tai, Division of Biostatistics, College of Public Health, National Taiwan University, Room 538, No. 17, Xu-Zhou Road, Taipei, 10055 Taiwan. Fax: +886-2-33668042. E-mail: jjtai@ntu.edu.tw

Summary

Association studies, based on either population data or familial data, have been widely applied to mapping genes underlying complex diseases. In family-based association studies using case-parent triad families, the widely used transmission/disequilibrium test (TDT) was proposed to avoid spurious association caused by confounders such as population stratification. The TDT was originally developed for the analysis of binary disease data. Extensions that allow for quantitative trait analysis of complex diseases, and for analysis of binary diseases that is robust to uncertainty about the mode of inheritance, have been thoroughly discussed. Nevertheless, robust analysis of quantitative traits for complex diseases has received relatively little attention. In this paper, we use parent-offspring triad families to demonstrate the feasibility of establishing robust candidate-gene association tests for quantitative traits. We first introduce the score statistics from the conditional likelihoods based on parent-offspring triad data under various genetic models. By applying two existing robust procedures, we then construct robust association tests for the analysis of quantitative traits. Simulations are conducted to evaluate the empirical type I error rates and powers of the proposed robust tests. The results show that the power of these robust association tests is stable against misspecification of the underlying genetic model.

Introduction

With the large number of genetic variants now available for association analyses, association studies, designed using either population data or familial data, have proved useful in mapping genes underlying complex diseases (Kruglyak & Lander, 1995; Risch & Merikangas, 1996; Schork et al., 2001; Cordell & Clayton, 2005). Data for traditional population-based studies are easier to collect in practice, but such studies are often challenged by population stratification, which can cause spurious results. To adjust for population stratification, methods that use information from unlinked markers, for example, have been proposed by several researchers (Devlin & Roeder, 1999; Pritchard & Rosenberg, 1999; Reich & Goldstein, 2001). Family-based association studies, on the other hand, avoid the stratification problem not by statistical adjustment but by using the matched data structure generated from the familial data. For example, for case-parent triad families a matched case-pseudocontrol data structure can be created (Falk & Rubinstein, 1987). Based on this data structure, a family-based association test, the transmission/disequilibrium test (TDT), was proposed to avoid the stratification problem (Spielman et al., 1993). Methods extending the TDT to various situations, such as multi-allele loci (Sham & Curtis, 1995; Bickeboller & Clerget-Darpoux, 1995; Kaplan et al., 1997), missing parental data (Spielman & Ewens, 1998; Horvath & Laird, 1998; Weinberg, 1999), or genotyping errors (Gordon et al., 2001; Douglas et al., 2002), have been widely proposed. The pros and cons of the population-based and family-based association approaches have been extensively discussed in the literature (Gauderman et al., 1999; Teng & Risch, 1999; McGinnis et al., 2002; Tabor et al., 2002; Cardon & Palmer, 2003).

By specifying the genetic model (mode of inheritance) of a disease, Schaid & Sommer (1993) showed that the TDT is identical to the score test derived from a conditional likelihood under the additive model. Because score tests are powerful against local alternatives (Cox & Hinkley, 1974), the TDT can be regarded as an optimal test for the additive model. Analogously, for dominant and recessive models, the optimal score tests can be derived from the corresponding conditional likelihoods. In practice, however, the genetic model for a complex disease under investigation is usually unknown, so a robust test that has relatively stable power over all plausible genetic models is required.

For binary disease traits, Zheng et al. (2002) developed TDT-type robust association tests using case-parent triad data. When dealing with complex diseases, the phenotype of an individual is likely to be measured as a quantitative trait, such as bone mineral density used in the diagnosis of bone disorders (Deng et al., 2002) and bronchial responsiveness or the number of eosinophils in airway tissues used in allergic asthma studies (Zhang et al., 1999). For these quantitative traits, several association tests have been proposed (Allison, 1997; Rabinowitz, 1997; Xiong et al., 1998; Abecasis et al., 2000; Monks & Kaplan, 2000; Sun et al., 2000; Liu et al., 2002; Alcais & Abel, 2003; Diao & Lin, 2006). In particular, the likelihood-based QTDT/orthogonal model proposed by Abecasis et al. (2000) and the TDT-based method proposed by Monks & Kaplan (2000) are applicable to a wider range of family structures than other methods and are widely used (Li et al., 2008). As noted by Li et al. (2008), the two tests are allele-based association tests: they set genotype scores according to the count of alleles and so provide evidence of association for an allele at a marker locus. They are therefore expected to be more powerful under the additive model than under other models. In practical studies, when the genetic model cannot be specified with certainty, the additive model may be taken as an intermediate solution. However, this choice can still lose power in some situations. A method whose power is robust to misspecification of the genetic model is therefore required. In this paper, we apply two robust procedures to establish robust candidate-gene association tests for quantitative traits using parent-offspring triad data.

We will first demonstrate the feasibility of using the conditional likelihood of parent-offspring triad data to extract association information between a candidate gene and the quantitative trait, and then derive the optimal score tests under three typical genetic models. Based on the robust procedures suggested by Gastwirth (1966, 1985), Davies (1977) and Freidlin et al. (1999), we will construct two robust statistics for assessing putative association between a candidate gene and a quantitative trait. The quantitative trait under investigation is assumed to follow a distribution in the exponential family and is not restricted to the normal distribution. The statistical powers of the proposed robust tests are compared with that of the optimal score test under the correct model and with those of the score tests under incorrect models. According to the simulation results, the proposed robust tests do exhibit robustness with acceptable powers.

Methods

Notations

Suppose that, in order to extract information on the association between a bi-allelic candidate locus and a quantitative trait, n independent parent-offspring triad families are sampled. In each family, the genotypes of the parents and their offspring at the candidate locus and the quantitative trait of the offspring are recorded. Let M denote the mutant allele and m the normal allele at the candidate locus. These data can be classified into 10 categories by parental mating type and offspring genotype, as shown in Table 1. Let gp = i indicate parents with the ith mating type, gc = j an offspring with the jth genotype, and yijk the quantitative trait of the kth offspring in category (i, j), where i = 1, … , 6, j = 0, 1, 2, k = 1, … , nij, and nij is the number of offspring in category (i, j). The number of triad families with parental mating type i in the sample is denoted by ni, so that n = n1 + … + n6.

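Table 1.  Classification of parent-offspring triad families according to the parental mating types and the offspring genotypes.

| Parental mating type (gp) | Offspring genotype (gc) | Offspring quantitative traits (yijk) | Number of triad families |
|---|---|---|---|
| MM×MM (i = 1) | MM (j = 2) | $y_{121}, \ldots, y_{12n_{12}}$ | $n_{12}$ |
| MM×Mm (i = 2) | MM (j = 2) | $y_{221}, \ldots, y_{22n_{22}}$ | $n_{22}$ |
| | Mm (j = 1) | $y_{211}, \ldots, y_{21n_{21}}$ | $n_{21}$ |
| MM×mm (i = 3) | Mm (j = 1) | $y_{311}, \ldots, y_{31n_{31}}$ | $n_{31}$ |
| Mm×Mm (i = 4) | MM (j = 2) | $y_{421}, \ldots, y_{42n_{42}}$ | $n_{42}$ |
| | Mm (j = 1) | $y_{411}, \ldots, y_{41n_{41}}$ | $n_{41}$ |
| | mm (j = 0) | $y_{401}, \ldots, y_{40n_{40}}$ | $n_{40}$ |
| Mm×mm (i = 5) | Mm (j = 1) | $y_{511}, \ldots, y_{51n_{51}}$ | $n_{51}$ |
| | mm (j = 0) | $y_{501}, \ldots, y_{50n_{50}}$ | $n_{50}$ |
| mm×mm (i = 6) | mm (j = 0) | $y_{601}, \ldots, y_{60n_{60}}$ | $n_{60}$ |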
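
The classification in Table 1 can be sketched in a few lines of Python. This is only an illustration: the 0/1/2 genotype coding (number of M alleles) and the names `MATING_TYPE`, `classify_triad` and `count_categories` are assumptions introduced here, not part of the original method.

```python
from collections import Counter

# Genotypes are coded as the number of M alleles: MM = 2, Mm = 1, mm = 0.
# The mating-type index i follows Table 1; parent order does not matter.
MATING_TYPE = {
    (2, 2): 1,  # MM x MM
    (1, 2): 2,  # MM x Mm
    (0, 2): 3,  # MM x mm
    (1, 1): 4,  # Mm x Mm
    (0, 1): 5,  # Mm x mm
    (0, 0): 6,  # mm x mm
}


def classify_triad(father, mother, child):
    """Return the category (i, j) of one triad, as laid out in Table 1."""
    i = MATING_TYPE[tuple(sorted((father, mother)))]
    # Each parent transmits one allele, so the child carries at least
    # father // 2 + mother // 2 and at most (father > 0) + (mother > 0)
    # copies of M.
    fewest = father // 2 + mother // 2
    most = (father > 0) + (mother > 0)
    if not fewest <= child <= most:
        raise ValueError("offspring genotype is not Mendelian-compatible")
    return i, child


def count_categories(triads):
    """Tally n_ij over an iterable of (father, mother, child) genotypes."""
    return Counter(classify_triad(f, m, c) for f, m, c in triads)
```

Enumerating every Mendelian-compatible combination of parental and offspring genotypes with `classify_triad` yields exactly the 10 categories of Table 1.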

Construction of Conditional Likelihood for Triad Families

Schaid & Sommer (1993) proposed a conditional likelihood method for testing association between a candidate gene and a disease. This method was developed to eliminate spurious association due to population stratification and to accommodate departures from Hardy-Weinberg equilibrium at the candidate-gene locus. Following their method, we also take the conditional approach to extract association information for the analysis of quantitative traits. For an offspring member k in category (i, j) of Table 1, the conditional probability of observing the offspring genotype gc = j given the parental mating type gp = i and the quantitative trait yijk is

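$$\Pr(g_c = j \mid g_p = i, y_{ijk}) = \frac{\Pr(y_{ijk} \mid g_p = i, g_c = j)\,\Pr(g_c = j \mid g_p = i)}{\sum_{l} \Pr(y_{ijk} \mid g_p = i, g_c = l)\,\Pr(g_c = l \mid g_p = i)} \tag{1}$$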

Assume that, if the studied candidate-gene locus is itself the quantitative trait locus (QTL) responsible for susceptibility to the disease, then the genotypic values of MM, Mm and mm, denoted by uj, j = 2, 1, 0, respectively, are associated with the variation of the quantitative trait. Let d be the displacement effect between the genotypes MM and mm, and t the degree of dominance; then u2 and u1 can be expressed as

$$u_2 = u_0 + d, \qquad u_1 = u_0 + t\,d \tag{2}$$

When t = 0, the underlying genetic model is recessive; when t = 0.5, the model is additive; when t = 1, the model is dominant. These three typical genetic models are considered in the following derivation of association tests for quantitative traits. Without loss of generality, we assume that d is greater than 0 if the candidate-gene association is present. Testing the association between the candidate gene and the quantitative trait is therefore equivalent to testing whether d is equal to 0 (i.e., u2 = u1 = u0). Because the parameter t vanishes when d = 0 but is present together with d when d > 0, t acts as a nuisance parameter in testing the effect of d. This type of hypothesis was discussed by Davies (1977).

Assume that the distribution of the quantitative trait belongs to the exponential family, as in the study of Liu et al. (2002). Then, given the offspring genotype gc = j, the probability density function Pr(yijk | gp = i, gc = j) in (1) has the form (McCullagh & Nelder, 1989)

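$$\Pr(y_{ijk} \mid g_p = i, g_c = j) = \exp\left\{ \frac{y_{ijk}\,\theta_j - b(\theta_j)}{a(\phi)} + c(y_{ijk}, \phi) \right\} \tag{3}$$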

for some specific functions a(·), b(·), and c(·), where φ is the dispersion parameter and θj is the canonical parameter, which is a function of the genotypic value uj. The distribution defined in (3) implies that the quantitative traits of all subjects in the population share the same distributional form but have different means, which depend on the subjects' genotypes. If the quantitative trait is normally distributed, then θj = uj, a(φ) = σ², b(θj) = uj²/2 and c(yijk, φ) = −[yijk²/σ² + log(2πσ²)]/2, where uj and σ² are the mean and variance of yijk given gc = j. If the quantitative trait follows a gamma distribution, then θj = −1/uj, a(φ) = 1/v, b(θj) = log(uj) and c(yijk, φ) = v log(v) + (v − 1) log(yijk) − log(Γ(v)), where v is the shape parameter of the gamma distribution, and the mean and variance of yijk given gc = j are uj and uj²/v, respectively. These two distributions are used to evaluate the power of the proposed methods in Simulations and Results. Let pj|i denote the transmission probability Pr(gc = j | gp = i) and g(·) a specific link function between the canonical parameter and the genotypic value, i.e., θj = g(uj) for j = 0, 1, 2. Because u2 and u1 are determined by d, t and u0, according to (1) the conditional likelihood of the offspring members is

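$$L = \prod_{i=1}^{6} \prod_{j} \prod_{k=1}^{n_{ij}} \frac{p_{j|i}\,\Pr(y_{ijk} \mid g_p = i, g_c = j)}{\sum_{l} p_{l|i}\,\Pr(y_{ijk} \mid g_p = i, g_c = l)} \tag{4}$$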
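
As an illustration of the conditional probability in (1) and the conditional likelihood built from it, the following minimal Python sketch assumes a normally distributed trait with known variance and Mendelian transmission probabilities. The function names and the `(i, j, y)` data layout are hypothetical, chosen only for this example.

```python
import math

# Mendelian transmission probabilities p_{j|i} = Pr(gc = j | gp = i),
# indexed by the mating types of Table 1 (j = number of M alleles).
TRANSMISSION = {
    1: {2: 1.0},
    2: {2: 0.5, 1: 0.5},
    3: {1: 1.0},
    4: {2: 0.25, 1: 0.5, 0: 0.25},
    5: {1: 0.5, 0: 0.5},
    6: {0: 1.0},
}


def genotypic_values(u0, d, t):
    """Genotypic means with u1 = u0 + t*d and u2 = u0 + d."""
    return {0: u0, 1: u0 + t * d, 2: u0 + d}


def normal_pdf(y, mean, sigma2):
    """Normal density, used here as one member of the exponential family."""
    return math.exp(-(y - mean) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)


def conditional_prob(j, i, y, u, sigma2=1.0):
    """Pr(gc = j | gp = i, y) by Bayes' rule over the genotypes possible under i."""
    weights = {l: normal_pdf(y, u[l], sigma2) * p for l, p in TRANSMISSION[i].items()}
    return weights[j] / sum(weights.values())


def conditional_loglik(data, u0, d, t, sigma2=1.0):
    """Sum of log Pr(gc | gp, y) over triads given as (i, j, y) tuples."""
    u = genotypic_values(u0, d, t)
    return sum(math.log(conditional_prob(j, i, y, u, sigma2)) for i, j, y in data)
```

With d = 0 the conditional probability reduces to the transmission probability, and families with two homozygous parents (mating types 1, 3 and 6) contribute nothing to the log-likelihood, which is why only mating types 2, 4 and 5 are informative.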

The Score Tests for Recessive, Additive and Dominant Models

Through routine calculation (Appendix I), we derive the general form of the score statistic as

image(5)

where μ = u2 Pr(MM) + u1 Pr(Mm) + u0 Pr(mm) is the overall population trait mean, g′(μ) is the first derivative of g(μ) with respect to μ, and ∂uj/∂d is the derivative of uj with respect to d for j = 0, 1, 2, with ∂u2/∂d = 1, ∂u1/∂d = t and ∂u0/∂d = 0. Furthermore, the variance of this score statistic is

image(6)

Based on (5) and (6), we obtain the score test statistic by substituting the sample mean ȳ for the population trait mean μ. In addition, by setting t = 0 (recessive), 0.5 (additive) and 1 (dominant), the corresponding score statistics for testing association under each of the three genetic models are approximated by

image(7)
image(8)

and

image(9)

where the sample variance of the quantitative trait is used as the variance estimate. The three test statistics are all asymptotically distributed as the standard normal distribution, as verified by the law of large numbers and Slutsky's theorem (Appendix I).

Note that the TADD can also be expressed simply by (Appendix II)

image

This result indicates that, in the candidate-gene association test for a quantitative trait, the TADD detects the distortion between the observed and expected distributions of offspring genotypes, with each offspring weighted by the deviation of its trait value from the sample mean, yijk − ȳ. In fact, the proposed score statistics TREC, TADD and TDOM are all identical to the score statistics derived by Schaid and Sommer (1994) for binary traits, except that additional weights are included here.
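
The weighted structure described above can be sketched in Python. The exact expressions (7) to (9) are not reproduced here, so the form below is an assumption: each offspring contributes the deviation of its genotype score (1, t, 0 for MM, Mm, mm) from its Mendelian expectation given the parental mating type, weighted by the deviation of its trait from the sample mean, and the sum is standardized by the sample variance times the summed null variances of the genotype scores. The name `score_test` and the `(i, j, y)` data layout are hypothetical.

```python
import math
import statistics

# Mendelian transmission probabilities Pr(gc = j | gp = i) for Table 1.
TRANSMISSION = {
    1: {2: 1.0},
    2: {2: 0.5, 1: 0.5},
    3: {1: 1.0},
    4: {2: 0.25, 1: 0.5, 0: 0.25},
    5: {1: 0.5, 0: 0.5},
    6: {0: 1.0},
}


def score_test(data, t):
    """Standardized score statistic for dominance t (assumed form, see lead-in).

    data is a list of (i, j, y): mating type, offspring genotype, trait.
    t = 0, 0.5, 1 correspond to the recessive, additive and dominant models.
    """
    x = {2: 1.0, 1: t, 0: 0.0}  # derivative of u_j with respect to d
    ys = [y for _, _, y in data]
    ybar = statistics.fmean(ys)
    s2 = statistics.variance(ys)
    numerator = 0.0
    var_sum = 0.0
    for i, j, y in data:
        probs = TRANSMISSION[i]
        mean_x = sum(p * x[g] for g, p in probs.items())
        var_x = sum(p * x[g] ** 2 for g, p in probs.items()) - mean_x ** 2
        numerator += (y - ybar) * (x[j] - mean_x)  # trait-weighted distortion
        var_sum += var_x
    if var_sum == 0.0:
        raise ValueError("no informative mating types for this model")
    return numerator / math.sqrt(s2 * var_sum)
```

Under this coding the genotype scores satisfy x(REC) + x(DOM) = 2 x(ADD) for every genotype, so the unstandardized recessive and dominant scores always add up to twice the additive score.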

Robust Association Tests

When the underlying genetic model for the quantitative trait locus is known, the correct score test can be selected for association analysis. When the model is unknown, however, it is unclear which test should be applied, and a robust test that can be used in this situation is needed. Here, we adopt the maximin efficiency robust procedure proposed by Gastwirth (1966) to establish such a test. For each plausible model, this procedure first calculates the asymptotic relative efficiency (ARE) of a given test relative to the optimal test for that model. The minimum ARE of the given test over all plausible models is then determined. The Maximin Efficiency Robust Test (MERT) is defined as the test that has the highest minimum ARE across all plausible models.

To apply the maximin efficiency robust procedure to our study we need first to calculate the pairwise correlations of the three score statistics. Under the null hypothesis of no association, when n is sufficiently large, the pairwise correlation coefficients of the three statistics are shown to converge to ρ12, ρ13 and ρ23, respectively, as follows (Appendix III)

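$$\rho_{12} = \frac{2(n_2 + n_4)}{\sqrt{(4n_2 + 3n_4)(n_2 + 2n_4 + n_5)}}, \quad \rho_{13} = \frac{n_4}{\sqrt{(4n_2 + 3n_4)(3n_4 + 4n_5)}}, \quad \rho_{23} = \frac{2(n_4 + n_5)}{\sqrt{(n_2 + 2n_4 + n_5)(3n_4 + 4n_5)}} \tag{10}$$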

The asymptotic results for these correlations are identical to those obtained by Zheng et al. (2002) for binary traits. This is because, as mentioned above, the three score statistics in (7), (8) and (9) have the same formulations as those for binary traits except that an extra weight is added for each offspring. Among the three correlation coefficients in (10), it can be immediately verified that ρ13 is the smallest. That is, the recessive and dominant models are the extreme pair among the plausible genetic models. Therefore, based on the result of Gastwirth (1970, 1985), the MERT can be developed as a linear combination of the two extreme test statistics, TREC and TDOM, provided that the necessary and sufficient condition ρ12 + ρ23 ≥ 1 + ρ13 holds. The condition can be verified by a short calculation. We thus obtain the MERT for testing association with a quantitative trait as follows

$$\mathrm{MERT} = \frac{T_{REC} + T_{DOM}}{\sqrt{2(1 + \hat{\rho}_{13})}}$$

which is asymptotically distributed as the standard normal distribution.
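
The correlation step and the MERT combination can be sketched numerically. The closed forms in `null_correlations` are an assumption of this example: they are derived here from the genotype scores (1, t, 0) and Mendelian transmission within mating types 2, 4 and 5, and the combination in `mert` is the standard maximin form for two extreme statistics. Both function names are hypothetical.

```python
import math


def null_correlations(n2, n4, n5):
    """Null correlations (rho12, rho13, rho23) of (T_REC, T_ADD, T_DOM).

    n2, n4, n5 are the numbers of families with mating types MM x Mm,
    Mm x Mm and Mm x mm; at least one of them must be positive in each
    variance term below.
    """
    v_rec = 4 * n2 + 3 * n4       # 16 x null variance of the recessive score
    v_add = n2 + 2 * n4 + n5      # 16 x null variance of the additive score
    v_dom = 3 * n4 + 4 * n5       # 16 x null variance of the dominant score
    rho12 = 2 * (n2 + n4) / math.sqrt(v_rec * v_add)
    rho13 = n4 / math.sqrt(v_rec * v_dom)
    rho23 = 2 * (n4 + n5) / math.sqrt(v_add * v_dom)
    return rho12, rho13, rho23


def mert(t_rec, t_dom, rho13):
    """Combine the two extreme statistics so the result has unit null variance."""
    return (t_rec + t_dom) / math.sqrt(2 * (1 + rho13))
```

For example, with equal counts n2 = n4 = n5 (the expected pattern when p = 0.5), `null_correlations` gives rho12 = rho23 and rho13 = 1/7, so rho13 is the smallest correlation and the condition rho12 + rho23 ≥ 1 + rho13 is satisfied.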

On the other hand, Davies (1977) proposed to choose the maximum of the score statistics over all plausible models as a robust statistic. Freidlin et al. (1999) and Zheng and Chen (2005) suggested that a preferable way to select a robust statistic in practice is to simply take the maximum of the score statistics at two extreme models and an additional intermediate model. In the light of their approaches, here we consider

$$\mathrm{MAX} = \max\{T_{REC}, T_{ADD}, T_{DOM}\}$$

for testing association between a candidate gene and the quantitative trait. In practice, the asymptotic null distribution of MAX has to be simulated to obtain the critical value of the test. For this simulation, we assume that TREC, TADD and TDOM jointly follow a multivariate normal distribution with mean vector 0 and covariance matrix Σ, whose diagonal elements are 1 (i.e., the variances of the three statistics are all 1) and whose off-diagonal elements (i.e., the covariances of the three statistics) are ρ12, ρ13 and ρ23. This assumption is consistent with each of TREC, TADD and TDOM being asymptotically normal with mean 0 and variance 1, as discussed above. In the simulation, for a given allele frequency p, the numbers of families with parental mating types 2, 4 and 5 (i.e., n2, n4 and n5) are generated, and ρ12, ρ13 and ρ23 are then determined using (10). From the multivariate normal distribution with mean vector 0 and covariance matrix Σ, the null distribution of MAX is simulated and the critical value for a prescribed significance level α is determined.
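
The simulation of the null distribution of MAX can be sketched as follows, using only the Python standard library. This is a minimal illustration, not the authors' implementation: the function name, the number of draws and the `two_sided` switch (taking the maximum of absolute values) are assumptions introduced here.

```python
import math
import random


def max_critical_value(r12, r13, r23, alpha=0.05, draws=100_000, seed=1,
                       two_sided=True):
    """Monte Carlo critical value of MAX under the null hypothesis.

    (T_REC, T_ADD, T_DOM) is drawn from a trivariate normal distribution
    with zero means, unit variances and correlations r12, r13, r23.
    Requires |r12| < 1.
    """
    rng = random.Random(seed)
    # Cholesky factor of the 3x3 correlation matrix, written out by hand.
    l22 = math.sqrt(1.0 - r12 ** 2)
    l32 = (r23 - r13 * r12) / l22
    # Guard against a tiny negative value when the matrix is (near) singular.
    l33 = math.sqrt(max(0.0, 1.0 - r13 ** 2 - l32 ** 2))
    sims = []
    for _ in range(draws):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        z3 = rng.gauss(0.0, 1.0)
        stats = (z1, r12 * z1 + l22 * z2, r13 * z1 + l32 * z2 + l33 * z3)
        sims.append(max(abs(s) for s in stats) if two_sided else max(stats))
    sims.sort()
    # Empirical (1 - alpha) quantile of the simulated MAX values.
    return sims[math.ceil((1.0 - alpha) * draws) - 1]
```

Because MAX is the largest of three correlated statistics, its two-sided critical value at α = 0.05 lies above the single-test value of 1.96 and below the Bonferroni bound for three tests (about 2.39).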

In practical situations where the genetic model is unknown, the parameter t (degree of dominance) can range from 0 to 1, and the two robust statistics MERT and MAX proposed above can be applied. However, if t can be restricted to a narrower region, these robust tests should be modified. For example, if prior information from other studies allows the dominant model to be excluded, then a modified robust statistic MERTRA corresponding to the MERT should be used. Denote MERTRA = (TREC + TADD)/[2(1 + ρ12)]^(1/2) and MAXRA = max{TREC, TADD} as the robust test statistics developed by the two robust procedures when the dominant model is excluded, and MERTAD = (TADD + TDOM)/[2(1 + ρ23)]^(1/2) and MAXAD = max{TADD, TDOM} as the robust test statistics when the recessive model is excluded. The robustness and power of the tests mentioned in this section are compared in the following section.

Simulations and Results

Simulation Settings

To investigate the robustness and power of our proposed tests and other existing tests, we conducted a series of simulations under two scenarios. Two existing tests were selected: the likelihood-based QTDT/orthogonal model (OM) proposed by Abecasis et al. (2000), and the TDT-based association test (MK) proposed by Monks & Kaplan (2000). In both scenarios, three frequencies p = 0.1, 0.2, and 0.5 of the candidate allele M were evaluated, and the population was assumed to be in Hardy-Weinberg equilibrium. For each allele frequency, the parental genotypes of a family were first generated under random mating, and the offspring genotype was then assigned according to Mendelian transmission. In the first scenario, we assumed that, conditional on the offspring genotype MM (j = 2), Mm (j = 1), or mm (j = 0), the quantitative trait of the offspring followed a normal distribution with mean u2, u1 or u0 and variance 1. Setting u0 = 0, u2 and u1 were determined by u2 = d and u1 = td under a specified genetic model (t = 0 for recessive models, t = 0.5 for additive models and t = 1 for dominant models). For given values of t and d, together with the genotypic value u0 and the allele frequency p, the variance of the genotypic value can be calculated. Dividing this genotypic variance by the total variance of the quantitative trait (the genotypic variance plus the within-genotype variance, here set to 1) gives the corresponding heritability H2. Because power differs across genetic models, to compare the robust tests on a fair basis across the three models we selected the values of d so that the powers of the three optimal tests TREC, TADD and TDOM were around 80%. For example, for allele frequency p = 0.1, we set d = 2.8 (H2 = 7.2%) for the recessive model, d = 1.12 (H2 = 5.3%) for the additive model, and d = 0.6 (H2 = 5.2%) for the dominant model, as shown in Table 2. With these settings, the powers of the three optimal tests are 0.801, 0.808 and 0.810, respectively (see Table 3).
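
The heritability calculation for the normal scenario can be sketched as below. The formula is an assumption of this example (genotypic variance under Hardy-Weinberg proportions divided by genotypic plus within-genotype variance); it is used because it reproduces the normal-distribution entries quoted in the text. The function name and argument names are hypothetical.

```python
def heritability(p, d, t, residual_var=1.0, u0=0.0):
    """Heritability H2 for allele frequency p, displacement d and dominance t."""
    q = 1.0 - p
    # Genotype frequencies under Hardy-Weinberg equilibrium.
    freqs = {2: p * p, 1: 2 * p * q, 0: q * q}
    # Genotypic values: u2 = u0 + d, u1 = u0 + t*d.
    u = {2: u0 + d, 1: u0 + t * d, 0: u0}
    mean = sum(freqs[g] * u[g] for g in freqs)
    var_g = sum(freqs[g] * (u[g] - mean) ** 2 for g in freqs)
    return var_g / (var_g + residual_var)


# Settings quoted in the text for p = 0.1 (values in percent).
for label, d, t in [("recessive", 2.8, 0.0), ("additive", 1.12, 0.5), ("dominant", 0.6, 1.0)]:
    print(label, round(100 * heritability(0.1, d, t), 1))
```

The three printed values agree with the 7.2%, 5.3% and 5.2% quoted for p = 0.1.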

Table 2.  Simulation settings for the displacement effect d and corresponding heritability H2 under the three frequencies of allele M and various genetic models for quantitative traits with normal or gamma distributions.

| Trait distribution | Genetic model | p = 0.1: d | p = 0.1: H2 (%) | p = 0.2: d | p = 0.2: H2 (%) | p = 0.5: d | p = 0.5: H2 (%) |
|---|---|---|---|---|---|---|---|
| Normal | Null | 0 | 0 | 0 | 0 | 0 | 0 |
| Normal | Recessive | 2.80 | 7.2 | 1.10 | 4.4 | 0.50 | 4.5 |
| Normal | Additive | 1.12 | 5.3 | 0.84 | 5.3 | 0.66 | 5.2 |
| Normal | Dominant | 0.60 | 5.2 | 0.48 | 5.0 | 0.50 | 4.5 |
| Gamma | Null | 0 | 0 | 0 | 0 | 0 | 0 |
| Gamma | Recessive | 8.5 | 8.2 | 3.2 | 4.7 | 1.5 | 5.0 |
| Gamma | Additive | 3.2 | 5.4 | 2.4 | 5.4 | 2.0 | 5.9 |
| Gamma | Dominant | 1.7 | 5.3 | 1.4 | 5.3 | 1.5 | 5.0 |

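Table 3.  Comparisons of empirical type I error rates and powers among three score tests, six robust tests and two popularly used tests (OM and MK) in the scenario that the quantitative trait is normally distributed and the number of triad families is 300 for three allele frequencies 0.1, 0.2, and 0.5 of the candidate gene.

| Allele frequency | True model | TREC | TADD | TDOM | MERT | MAX | MERTRA | MERTAD | MAXRA | MAXAD | OM (1) | MK (2) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.1 | NULL | 0.048 | 0.051 | 0.052 | 0.048 | 0.051 | 0.048 | 0.051 | 0.050 | 0.052 | 0.051 | 0.051 |
| 0.1 | REC | 0.801 | 0.323 | 0.058 | 0.685 | 0.765 | 0.732 | - | 0.771 | - | 0.315 | 0.241 |
| 0.1 | ADD | 0.223 | 0.808 | 0.781 | 0.683 | 0.754 | 0.614 | 0.804 | 0.739 | 0.808 | 0.821 | 0.791 |
| 0.1 | DOM | 0.076 | 0.777 | 0.810 | 0.534 | 0.743 | - | 0.803 | - | 0.804 | 0.789 | 0.765 |
| 0.2 | NULL | 0.048 | 0.051 | 0.050 | 0.051 | 0.051 | 0.050 | 0.050 | 0.050 | 0.049 | 0.050 | 0.051 |
| 0.2 | REC | 0.799 | 0.327 | 0.061 | 0.592 | 0.725 | 0.701 | - | 0.746 | - | 0.322 | 0.304 |
| 0.2 | ADD | 0.307 | 0.807 | 0.745 | 0.762 | 0.757 | 0.680 | 0.795 | 0.750 | 0.801 | 0.822 | 0.805 |
| 0.2 | DOM | 0.067 | 0.733 | 0.804 | 0.543 | 0.732 | - | 0.786 | - | 0.787 | 0.740 | 0.733 |
| 0.5 | NULL | 0.049 | 0.052 | 0.046 | 0.051 | 0.050 | 0.049 | 0.050 | 0.052 | 0.049 | 0.051 | 0.051 |
| 0.5 | REC | 0.805 | 0.565 | 0.067 | 0.564 | 0.722 | 0.750 | - | 0.766 | - | 0.572 | 0.566 |
| 0.5 | ADD | 0.563 | 0.802 | 0.565 | 0.802 | 0.745 | 0.750 | 0.753 | 0.767 | 0.767 | 0.814 | 0.804 |
| 0.5 | DOM | 0.063 | 0.562 | 0.802 | 0.562 | 0.721 | - | 0.751 | - | 0.765 | 0.571 | 0.563 |

  1. the likelihood-based QTDT/orthogonal model proposed by Abecasis et al. (2000).

  2. the TDT-based association test proposed by Monks & Kaplan (2000).

A hyphen indicates that no value is reported.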

In the second scenario, we assumed that, conditional on the offspring genotype MM, Mm, or mm, the quantitative trait was generated from a gamma distribution with the scale parameter v = 1 and the mean value u2, u1 or u0. Here, u0 was set to 8.0, and u2 and u1 were determined by u2 = u0 + d and u1 = u0 + td given the values of d and t. For the three allele frequencies p = 0.1, 0.2, and 0.5 of the candidate allele M, the values of d and the corresponding heritability H2 were again selected so that the powers of the three optimal score tests were around 80% under the true models (Table 2).

Empirical type I error rates and powers of all tests were evaluated by 10,000 replicates at significance level 0.05. In each replicate, 300 parent-offspring families were generated. The asymptotic critical values of the MAX were obtained by simulating the multivariate normal distributions of (TREC, TADD, TDOM). Asymptotic critical values of the other tests were 1.96.
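
One replicate of the first (normal) scenario can be sketched as follows. This is a minimal illustration of the data-generating steps described above (Hardy-Weinberg parental genotypes, Mendelian transmission, normal trait given the offspring genotype); the function name `simulate_triads`, its arguments and the 0/1/2 genotype coding are assumptions of this example.

```python
import random


def simulate_triads(n, p, d, t, rng, u0=0.0, sigma=1.0):
    """Return n triads as (father, mother, child, y); genotypes count M alleles."""

    def parent():
        # Two independent alleles, each M with probability p (HWE, random mating).
        return (rng.random() < p) + (rng.random() < p)

    def transmit(g):
        # Homozygotes transmit their only allele type; a heterozygote
        # transmits M with probability 1/2.
        if g == 2:
            return 1
        if g == 0:
            return 0
        return int(rng.random() < 0.5)

    means = {0: u0, 1: u0 + t * d, 2: u0 + d}
    triads = []
    for _ in range(n):
        father, mother = parent(), parent()
        child = transmit(father) + transmit(mother)
        y = rng.gauss(means[child], sigma)  # normal trait given offspring genotype
        triads.append((father, mother, child, y))
    return triads


# One replicate of 300 families under a recessive model with p = 0.2, d = 1.1.
replicate = simulate_triads(300, 0.2, 1.1, 0.0, random.Random(2024))
```

Repeating such replicates and recording how often a test statistic exceeds its critical value gives the empirical type I error rate (d = 0) or power (d > 0).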

Empirical Type I Error Rates and Powers of the Proposed Tests

Table 3 summarizes the empirical type I error rates and powers of our proposed tests and of the OM and MK tests when the quantitative traits were generated from normal distributions. The empirical type I error rates of these tests are all close to the nominal significance level of 0.05. Each optimal score test outperforms the others in power under the model for which it is derived. For example, in Table 3 with p = 0.2, TREC has the best power, 0.799, when the true model is recessive, but its power drops drastically to 0.307 when the true model is additive, and further to 0.067 when the true model is dominant. Similar results are observed when the quantitative trait was generated from gamma distributions (Table 4). In sum, as expected, TREC has the lowest power when the true model is dominant, and TDOM has the lowest power when the true model is recessive. The two robust tests MERT and MAX, in contrast, have relatively stable power over the three genetic models. For example, when p = 0.1 in Table 3, the powers of the MERT are all above 0.5, and the powers of the MAX are at least 0.7 under all three genetic models. The MAX is generally more powerful than the MERT. The TADD also shows robustness in most situations, but its power under the recessive model is low compared with MERT and MAX. Both OM and MK have power comparable to that of TADD because they were constructed under the additive model assumption. In general, the power of OM was slightly higher than the powers of MK and TADD.

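Table 4.  Comparisons of empirical type I error rates and powers among three score tests, six robust tests and two popularly used tests (OM and MK) in the scenario that the quantitative trait follows a gamma distribution and the number of triad families is 300 for three allele frequencies 0.1, 0.2, and 0.5 of the candidate gene.

| Allele frequency | True model | TREC | TADD | TDOM | MERT | MAX | MERTRA | MERTAD | MAXRA | MAXAD | OM (1) | MK (2) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.1 | NULL | 0.054 | 0.050 | 0.050 | 0.051 | 0.053 | 0.051 | 0.049 | 0.053 | 0.050 | 0.050 | 0.048 |
| 0.1 | REC | 0.801 | 0.349 | 0.060 | 0.694 | 0.769 | 0.738 | - | 0.774 | - | 0.343 | 0.246 |
| 0.1 | ADD | 0.225 | 0.794 | 0.771 | 0.660 | 0.744 | 0.585 | 0.793 | 0.727 | 0.796 | 0.810 | 0.758 |
| 0.1 | DOM | 0.090 | 0.772 | 0.804 | 0.533 | 0.738 | - | 0.796 | - | 0.797 | 0.782 | 0.737 |
| 0.2 | NULL | 0.054 | 0.049 | 0.047 | 0.052 | 0.050 | 0.051 | 0.049 | 0.051 | 0.049 | 0.049 | 0.050 |
| 0.2 | REC | 0.791 | 0.339 | 0.060 | 0.596 | 0.724 | 0.697 | - | 0.740 | - | 0.338 | 0.310 |
| 0.2 | ADD | 0.300 | 0.792 | 0.725 | 0.734 | 0.739 | 0.650 | 0.779 | 0.726 | 0.784 | 0.804 | 0.777 |
| 0.2 | DOM | 0.077 | 0.747 | 0.808 | 0.556 | 0.739 | - | 0.795 | - | 0.797 | 0.754 | 0.731 |
| 0.5 | NULL | 0.050 | 0.051 | 0.048 | 0.051 | 0.052 | 0.050 | 0.051 | 0.053 | 0.053 | 0.052 | 0.051 |
| 0.5 | REC | 0.817 | 0.591 | 0.064 | 0.593 | 0.743 | 0.771 | - | 0.785 | - | 0.597 | 0.593 |
| 0.5 | ADD | 0.556 | 0.800 | 0.561 | 0.800 | 0.740 | 0.739 | 0.749 | 0.761 | 0.765 | 0.811 | 0.802 |
| 0.5 | DOM | 0.068 | 0.554 | 0.801 | 0.552 | 0.718 | - | 0.745 | - | 0.763 | 0.560 | 0.557 |

  1. the likelihood-based QTDT/orthogonal model proposed by Abecasis et al. (2000).

  2. the TDT-based association test proposed by Monks & Kaplan (2000).

A hyphen indicates that no value is reported.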

If the dominant or recessive model can be excluded from the plausible genetic models, a more specific robust test (MERTRA, MAXRA, MERTAD or MAXAD) can replace the MERT or MAX to increase power. This idea is supported by the simulation results. For example, for p = 0.5 under the recessive model in Table 3, the power of MERTRA is 0.750, which is higher than the 0.564 of the MERT. This shows that if the dominant model can be excluded from the plausible models, testing association with the MERTRA gives higher power than testing with the MERT. Similarly, the MAXAD is overall more powerful than the MAX when the recessive model can be excluded.

Power Performances of the Proposed Tests Over the Whole Range of Degree of Dominance

To investigate the characteristic of power performance of each proposed test over the range of degree of dominance t, we considered two situations of p= 0.2 and 0.5. For each situation, values of t in the whole range [0, 1] with a step size 0.1 were investigated. The displacement effect d was set subject to the assumption that the value of heritability was around 5% for each fixed value of t.

Figure 1 shows that TREC or TDOM loses power rapidly when the presumed value of t is far from the true value (i.e., the true genetic model). The TADD loses substantial power to detect a putative association when t is close to 0 (i.e., the recessive model). In contrast, the MERT and especially the MAX have more stable power over the whole range of t. The result for p = 0.5, shown in Figure 2, is similar to that for p = 0.2 and displays a symmetric pattern. In particular, the power curve of the MERT coincides with that of the TADD. Further comments on this result are given in the Discussion section.

Figure 1.

Power performances of the TREC, TADD, TDOM, MERT and MAX over the range of degree of dominance t from 0 to 1 by controlling heritability around 5% for each genetic model when p= 0.2.

Figure 2.

Power performances of the TREC, TADD, TDOM, MERT and MAX over the range of degree of dominance t from 0 to 1 by controlling heritability around 5% for each genetic model when p= 0.5.

Discussion

Most family-based association tests may lose power when the underlying genetic model of the studied disease does not match the model under which the test was developed. Association tests based on allele-counting methods, such as the TDT, can essentially be regarded as derived under the additive model, and thus have better power under the additive model than under others. However, these methods can lose substantial power if the additive model is not true. For example, the TDT has little power under recessive models if the frequency of the mutant allele is low, either for binary disease traits (Camp, 1997; Lange & Laird, 2002) or quantitative traits (Lange et al., 2002; Purcell et al., 2003). Ideally, we would specify an appropriate genetic model for the quantitative trait so that an optimal test can be used. In real studies, however, this is usually difficult, and such an approach runs the risk of model misspecification. Our simulation results show that once the genetic model is misspecified, a test can lose much of its power to detect an associated candidate gene. Hence, the development of robust association tests that are applicable in practice deserves more attention. Robustness has been widely addressed in statistical inference when nuisance parameters interfere with the power of the tests under study. In genetic studies, this issue has also been investigated widely (Wang et al., 1999; Gastwirth & Freidlin, 2000; Kraft, 2001; Zheng et al., 2002; Xu et al., 2003; Diao & Lin, 2005; Tai & Hou, 2006; Yan et al., 2008; Wang et al., 2008). To obtain robust statistics over all plausible genetic models, we have referred to the studies of Davies (1977) and Zheng & Chen (2005). According to Davies (1977), the maximum score statistic over the whole range of a nuisance parameter should be taken as the robust statistic. Nevertheless, according to Zheng & Chen (2005), the maximum of the score statistics at the two most extreme models and an intermediate model performs almost as well as the maximum statistic proposed by Davies (1977). Based on these studies, we construct the robust statistic MAX by taking the maximum of the three score statistics under the recessive, additive and dominant models only. In addition, we use the maximin efficiency robust procedure (Gastwirth, 1966, 1985) to establish the other robust statistic, MERT, as an alternative to the MAX. This statistic can be regarded as a combination of the two score statistics under the recessive and dominant models. The robustness of the test comes from the compromise in power between the two extreme models.

Simulation results show that the two proposed robust tests have relatively stable and acceptable power under various genetic models when testing for association between a candidate gene and a quantitative trait. Although MERT is somewhat easier to compute, its power is overall slightly lower than that of MAX. Note that the association tests TADD, OM and MK, which are constructed under the additive model assumption, also show robustness in most situations. In Figure 2, it can be seen that the power curve of TADD coincides with that of MERT. This may be because TADD is the optimal test derived at the midpoint t = 0.5 of the degree of dominance. Note that when p < 0.5, the correlation of (TREC, TADD) is lower than that of (TADD, TDOM). Conversely, when p > 0.5 the correlation of (TREC, TADD) is higher than that of (TADD, TDOM). Only when p = 0.5 are the correlation coefficients of (TREC, TADD) and (TADD, TDOM) equal. Therefore, MERT and TADD would have almost identical power when the true model is additive and, in particular, when the gene frequency is p = 0.5. However, this issue still needs more investigation.

The estimated powers in Tables 3 and 4 show that TADD, OM and MK perform comparably to, or even better than, MAX or MERT when the true model is additive or dominant. The merit of MAX and MERT is the stability of their power. However, their advantage is largely confined to the recessive model. These results indicate that association tests based on the additive model assumption remain a valid method even though the underlying genetic model may not be additive. In other words, in practical studies researchers are encouraged to first calculate a test statistic based on the additive model assumption. If that test statistic is distinctly low, the robust tests proposed here may serve as a useful tool for validating the results.

When analyzing a quantitative trait, parent-offspring triad data may be collected not by random sampling but through an ascertainment procedure based on prescribed thresholds, which are determined by the upper and/or lower tails of the offspring trait distribution. As mentioned by Liu et al. (2002), under these circumstances an estimate of the population trait mean can be obtained by adjusting the sample mean with an offset term, which is available from a priori knowledge about the central tendency of the quantitative trait distribution. Note that the proposed tests remain valid if the estimate of the population trait mean is biased. However, a biased estimate will reduce the power of these tests. The robust methods proposed in this paper focus on triad families; their extension to more general family data is under investigation. In addition, the construction of a robust association test based on linear or nonlinear models, so that other covariates of interest can be included as well, may also deserve further effort.

Acknowledgments

The authors would like to thank the editor and two anonymous reviewers for their constructive comments that greatly improved this paper. This work was supported in part by grants AU-97-H-03 (JYW) from Asia University and NSC-96-2628-M-002–025-MY3 (JJT) from National Science Council, Taiwan.

Appendices

Appendix I

Derivation of the Score Statistics Under Recessive, Additive and Dominant Models

For the exponential family setting in equation (3), the canonical parameter θj is a function of the genotypic value uj. Given an offspring genotype j, the quantitative trait yijk of an offspring follows a distribution with mean uj as defined in equation (2). Based on the data structure in Table 1, taking the natural logarithm of the conditional likelihood in equation (4), we obtain

image((A1))

where inline image. The score statistic becomes

image((A2))

where b′(θj) is the first derivative of b(θj) with respect to θj. Suppose that θj and uj are linked through a function g(·), i.e., θj = g(uj). For j = 0, 1 and 2, ∂θj/∂d = g′(uj)(∂uj/∂d), where g′(uj) is the first derivative of g(uj) with respect to uj, ∂u2/∂d = 1, ∂u1/∂d = t and ∂u0/∂d = 0. Under the null hypothesis H0: d = 0, u2 = u1 = u0 = u and θ2 = θ1 = θ0, so that the overall population trait mean μ becomes μ = E(y) = u2 Pr(MM) + u1 Pr(Mm) + u0 Pr(mm) = u, and thus g′(uj) = g′(u) = g′(μ) for j = 0, 1, 2. In addition, since it is known that b′(θj) = uj (McCullagh & Nelder, 1989), here we have b′(θj) = uj = u = μ for all j under H0. Therefore, the score statistic in (A2) under H0 turns out to be

image
image((A3))

where inline image and inline image is the conditional expectation of Dj given the parental mating type i. Note that in the composition of inline image, the yijk-part is regarded as fixed and the Dj-part as random, varying according to the corresponding offspring genotype. The information about the parameter of interest (viz., parameter d) is contained in the Dj-part. The yijk-part serves as a weight for the Dj-part in inline image. This idea for the composition of inline image has been adopted by other researchers (Rabinowitz, 1997; Sun et al., 2000; Liu et al., 2002). Based on this idea and the assumption that the offspring genotypes in all triad families are independent, the variance of inline image is obtained as

image((A4))

where Vari(D) is the conditional variance of D given the parental mating type i. The values of Ei(D) and Vari(D) for each parental mating type i under the recessive, additive and dominant models, together with the transmission probabilities pj|i, are shown in Table A1.

Table A1. The transmission probabilities and the conditional expectations and variances of Dj given each of the parental mating types under the recessive (REC), additive (ADD) and dominant (DOM) models.

     pj|i                   REC                ADD                DOM
i    j=2    j=1    j=0      Ei(D)   Vari(D)    Ei(D)   Vari(D)    Ei(D)   Vari(D)
1    1      0      0        1       0          1       0          1       0
2    1/2    1/2    0        1/2     1/4        3/4     1/16       1       0
3    0      1      0        0       0          1/2     0          1       0
4    1/4    1/2    1/4      1/4     3/16       1/2     1/8        3/4     3/16
5    0      1/2    1/2      0       0          1/4     1/16       1/2     1/4
6    0      0      1        0       0          0       0          0       0

1. The six parental mating types are defined in Table 1.
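
The entries of Table A1 follow directly from the transmission probabilities once the scores D2 = 1, D1 = t and D0 = 0 are fixed. The short Python sketch below recomputes them with exact fractions. The dictionary and function names are hypothetical, and the mating-type numbering 1 to 6 is simply that of the table rows.

```python
from fractions import Fraction as F

# Transmission probabilities p_{j|i} for offspring genotype j = 2, 1, 0
# given parental mating type i = 1..6, copied from Table A1.
TRANSMISSION = {
    1: (F(1), F(0), F(0)),
    2: (F(1, 2), F(1, 2), F(0)),
    3: (F(0), F(1), F(0)),
    4: (F(1, 4), F(1, 2), F(1, 4)),
    5: (F(0), F(1, 2), F(1, 2)),
    6: (F(0), F(0), F(1)),
}

def conditional_moments(i, t):
    """Return (E_i(D), Var_i(D)) for scores D_2 = 1, D_1 = t, D_0 = 0."""
    scores = (F(1), F(t), F(0))
    probs = TRANSMISSION[i]
    mean = sum(p * d for p, d in zip(probs, scores))
    second_moment = sum(p * d * d for p, d in zip(probs, scores))
    return mean, second_moment - mean ** 2

# Print one line per model and mating type: model, i, E_i(D), Var_i(D).
for name, t in (("REC", 0), ("ADD", F(1, 2)), ("DOM", 1)):
    for i in range(1, 7):
        mean, var = conditional_moments(i, t)
        print(name, i, mean, var)
```

Setting t to any other value in [0, 1] gives the corresponding moments for an intermediate degree of dominance.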

Using the results in (A3) and (A4), the score statistic for testing H0: d = 0 is constructed as

image((A5))

in which the sample mean inline image is taken as an estimate of the population trait mean μ. Since Dj takes the value 0, t or 1 for each offspring, the parameter t needs to be determined according to the assigned genetic model before the score test statistic can be computed. Consider the following three situations:

(1) When the genetic model is recessive, t= 0, D2= 1, D1=D0= 0; given each of the parental mating types, E1(D) = 1, E2(D) = 1/2, E4(D) = 1/4 and Ei(D) = 0 for i= 3, 5, 6. Besides, the conditional variances of Dj are Var2(D) = 1/4, Var4(D) = 3/16 and Vari(D) = 0 for i= 1, 3, 5, 6 (as shown in Table A1). Thus the score statistic in (A5) becomes

image((A6))

which is asymptotically distributed as the standard normal distribution. Under the null hypothesis, inline image converges in probability to Var(y) by the law of large numbers, provided that nij is sufficiently large. Thus, the first term in (A6) converges to inline image when n is sufficiently large, where Var(y) can be estimated by the sample variance inline image. Consequently, the score test statistic for the recessive model is approximated by

image

According to Slutsky's theorem, the asymptotic distribution of TREC is the standard normal distribution.

(2) When the genetic model is additive, t= 1/2, D2= 1, D1= 1/2, D0= 0; based on Table A1, the score test statistic is calculated by

image
image

which can be approximated by

image((A7))

The asymptotic distribution of TADD is also the standard normal distribution.

(3) When the genetic model is dominant, t= 1, D2=D1= 1, D0= 0; according to the results in Table A1, the score test statistic is obtained as

image

and is approximated by

image

which is asymptotically distributed as the standard normal distribution as well.
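
As an illustration of the three cases above, the following Python sketch computes a score statistic for a chosen t from triad records. It is reconstructed from the verbal description only: the numerator weights the centered trait values by Dj − Ei(D), and the variance weights the squared centered trait values by Vari(D), or, in the approximated form, multiplies the sample variance by the sum of Vari(D) over families. The record layout (mating type, offspring genotype, trait value) and all names are hypothetical.

```python
import math

# Transmission probabilities p_{j|i} for j = 2, 1, 0 (rows of Table A1).
TRANSMISSION = {
    1: (1.0, 0.0, 0.0),
    2: (0.5, 0.5, 0.0),
    3: (0.0, 1.0, 0.0),
    4: (0.25, 0.5, 0.25),
    5: (0.0, 0.5, 0.5),
    6: (0.0, 0.0, 1.0),
}

def score_statistic(triads, t, use_sample_variance=False):
    """Score statistic for H0: d = 0 with scores D_2 = 1, D_1 = t, D_0 = 0.

    triads: sequence of (mating_type, offspring_genotype, trait_value).
    use_sample_variance: if True, use the large-sample approximation in
    which squared deviations are replaced by the sample variance.
    """
    triads = list(triads)
    n = len(triads)
    y_bar = sum(y for _, _, y in triads) / n
    score = {2: 1.0, 1: float(t), 0: 0.0}

    # Conditional mean and variance of D for each mating type.
    moments = {}
    for i, (p2, p1, p0) in TRANSMISSION.items():
        mean = p2 * score[2] + p1 * score[1] + p0 * score[0]
        second = p2 * score[2] ** 2 + p1 * score[1] ** 2 + p0 * score[0] ** 2
        moments[i] = (mean, second - mean ** 2)

    numerator = sum((y - y_bar) * (score[j] - moments[i][0])
                    for i, j, y in triads)
    if use_sample_variance:
        s2 = sum((y - y_bar) ** 2 for _, _, y in triads) / (n - 1)
        variance = s2 * sum(moments[i][1] for i, _, _ in triads)
    else:
        variance = sum((y - y_bar) ** 2 * moments[i][1] for i, _, y in triads)
    if variance <= 0.0:
        raise ValueError("no informative families for this genetic model")
    return numerator / math.sqrt(variance)

# Toy data (made up): two families of mating type 2.
toy = [(2, 2, 1.0), (2, 1, -1.0)]
print(score_statistic(toy, t=0))    # recessive scores
print(score_statistic(toy, t=0.5))  # additive scores
```

Families whose mating type has Vari(D) = 0 under the chosen model contribute nothing to either the numerator or the variance, which is why mating types 1, 3 and 6 are uninformative under all three models.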

Appendix II

Derivation of the Simple form of the TADD

If the genetic model is additive, t= 1/2, Dj=∂uj/∂d=j/2 for j= 0, 1, 2, and thus inline image, where inline image is the conditional expectation of the offspring genotype index given the parental mating type i, i.e.,

image

Substituting the above result into (A5) and based on (A7), we obtain

image
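
The identity behind this simplification, namely that under the additive model Dj − Ei(D) equals half of the offspring genotype index centered at its conditional mean, can be checked numerically. The sketch below is only such a check, with hypothetical names, using the transmission probabilities of Table A1; the common factor 1/2 appears in both the numerator and the standard deviation of the score statistic and therefore cancels.

```python
from fractions import Fraction as F

# Transmission probabilities p_{j|i} for j = 2, 1, 0 (rows of Table A1).
TRANSMISSION = {
    1: (F(1), F(0), F(0)),
    2: (F(1, 2), F(1, 2), F(0)),
    3: (F(0), F(1), F(0)),
    4: (F(1, 4), F(1, 2), F(1, 4)),
    5: (F(0), F(1, 2), F(1, 2)),
    6: (F(0), F(0), F(1)),
}

def centered_additive_score(i, j):
    """D_j - E_i(D) under the additive model, where D_j = j/2."""
    p2, p1, _ = TRANSMISSION[i]
    return F(j, 2) - (p2 * F(1) + p1 * F(1, 2))

def centered_genotype_index(i, j):
    """j - E_i(j): genotype index minus its conditional expectation."""
    p2, p1, _ = TRANSMISSION[i]
    return F(j) - (2 * p2 + p1)

# Check the identity for every genotype that can occur in each mating type.
for i, probs in TRANSMISSION.items():
    for j, p in zip((2, 1, 0), probs):
        if p > 0:
            assert centered_additive_score(i, j) == centered_genotype_index(i, j) / 2
```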

Appendix III

Pairwise Correlations Among the Three Score Statistics

Recall that the three score test statistics derived in Appendix I are

image
image

and

image

where the subscripts REC, ADD and DOM are used to distinguish the values of (Dj − Ei(D)) under the recessive, additive and dominant models. Because the offspring genotypes in all triad families are independent, there is no correlation between different offspring for any two of the three statistics; correlation arises only through Dj within the same offspring under different models. Let inline image, inline image, inline image and inline image. Under the null hypothesis, the correlation coefficient between TREC and TADD is

image

Since EY(Zijk²) = 1, by the law of large numbers, when n is large, inline image converges in probability to ni for each i. Therefore, the correlation coefficient converges in probability to

image

Similarly, the correlation coefficients between TREC and TDOM and between TADD and TDOM are

image

and

image
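
The limiting correlations can be evaluated numerically from the mating-type counts ni and the conditional covariances of the model-specific scores. The Python sketch below assumes that the limit takes the form of a ratio, with the sum over i of ni times the conditional covariance in the numerator and the square root of the product of the two corresponding variance sums in the denominator, which is what the convergence argument above suggests. The helper that generates mating-type proportions under random mating, with p taken as the frequency of allele M, is a hypothetical addition used only for illustration.

```python
import math

# Transmission probabilities p_{j|i} for j = 2, 1, 0 (rows of Table A1).
TRANSMISSION = {
    1: (1.0, 0.0, 0.0),
    2: (0.5, 0.5, 0.0),
    3: (0.0, 1.0, 0.0),
    4: (0.25, 0.5, 0.25),
    5: (0.0, 0.5, 0.5),
    6: (0.0, 0.0, 1.0),
}

# Scores (D_2, D_1, D_0) under each genetic model.
SCORES = {"REC": (1.0, 0.0, 0.0), "ADD": (1.0, 0.5, 0.0), "DOM": (1.0, 1.0, 0.0)}

def conditional_cov(i, model_a, model_b):
    """Covariance of the two models' scores given mating type i."""
    probs = TRANSMISSION[i]
    a, b = SCORES[model_a], SCORES[model_b]
    mean_a = sum(p * x for p, x in zip(probs, a))
    mean_b = sum(p * z for p, z in zip(probs, b))
    return sum(p * x * z for p, x, z in zip(probs, a, b)) - mean_a * mean_b

def limiting_correlation(counts, model_a, model_b):
    """Large-sample null correlation of two score statistics.

    counts: dict mapping mating type i to the number (or proportion) n_i.
    """
    cov = sum(n * conditional_cov(i, model_a, model_b) for i, n in counts.items())
    var_a = sum(n * conditional_cov(i, model_a, model_a) for i, n in counts.items())
    var_b = sum(n * conditional_cov(i, model_b, model_b) for i, n in counts.items())
    return cov / math.sqrt(var_a * var_b)

def random_mating_proportions(p):
    """Hypothetical helper: mating-type proportions under random mating."""
    q = 1.0 - p
    return {1: p**4, 2: 4 * p**3 * q, 3: 2 * p**2 * q**2,
            4: 4 * p**2 * q**2, 5: 4 * p * q**3, 6: q**4}

for p in (0.2, 0.5, 0.8):
    counts = random_mating_proportions(p)
    print(p,
          limiting_correlation(counts, "REC", "ADD"),
          limiting_correlation(counts, "ADD", "DOM"),
          limiting_correlation(counts, "REC", "DOM"))
```

Under these assumptions the correlation of (TREC, TADD) is lower than that of (TADD, TDOM) for p < 0.5, higher for p > 0.5, and the two are equal at p = 0.5, in line with the pattern described in the Discussion.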