Keywords:

  • Interaction testing;
  • parametric bootstrap;
  • permutation methods

Summary

  1. Top of page
  2. Summary
  3. Introduction
  4. Methods
  5. Results
  6. Discussion
  7. References

Permutation tests are widely used in genomic research as a straightforward way to obtain reliable statistical inference without making strong distributional assumptions. However, in this paper we show that in genetic association studies it is not typically possible to construct exact permutation tests of gene-gene or gene-environment interaction hypotheses. We describe an alternative to the permutation approach in testing for interaction, a parametric bootstrap approach. Using simulations, we compare the finite-sample properties of a few often-used permutation tests and the parametric bootstrap. We consider interactions of an exposure with single and multiple polymorphisms. Finally, we address when permutation tests of interaction will be approximately valid in large samples for specific test statistics.


Introduction

Permutation tests (Ernst, 2004; Higgins, 2004) are very popular in genomic research (Hu et al., 2008; Faulkner et al., 2009; Leak et al., 2009). They are simple to compute where analytic approaches may be intractable, and can be exact where analytic results may be only approximate. Rather than comparing the observed value of a test statistic to its distribution under repeated sampling, a permutation test compares the observed value to a distribution generated by a group of permutations that would not affect the distribution of the data if the null hypothesis were true (Cox & Hinkley, 1997, chap. 6.2). The main limitation of permutation tests is that they are only applicable when the null hypothesis being tested specifies a suitable group of permutations under which the distribution of the data would be unaffected.

The use of permutation methods for testing in the regression model with one main effect (or, more simply, in tests of association of two variables) dates back at least to Fisher's exact test (Fisher, 1935). From data vectors G and Y we create a new data set either by permuting the entries of G to give data (G*, Y) or permuting the entries of Y to give data (G, Y*). The test statistic is evaluated on the new data to give a sample from the permutation distribution, and this procedure is repeated to estimate the permutation distribution as accurately as is desired. A p-value of the test statistic is computed based on the permutation distribution. The procedure is the same whether the predictor variable is continuous or categorical (Ernst, 2004).
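The procedure just described can be sketched in a few lines. This is a minimal illustration, not code from the paper: the function names, the choice of an absolute difference in means as the test statistic, and the convention of counting the observed data as one of the N + 1 arrangements are all assumptions made for the example.

```python
import random


def mean_difference(g, y):
    """Absolute difference in mean outcome between the G = 1 and G = 0 groups."""
    y1 = [yi for gi, yi in zip(g, y) if gi == 1]
    y0 = [yi for gi, yi in zip(g, y) if gi == 0]
    return abs(sum(y1) / len(y1) - sum(y0) / len(y0))


def permutation_pvalue(g, y, statistic=mean_difference, n_perm=999, seed=0):
    """Permutation p-value for association between G and Y, permuting Y."""
    rng = random.Random(seed)
    s_obs = statistic(g, y)
    y_star = list(y)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(y_star)                  # new data set (G, Y*)
        if statistic(g, y_star) >= s_obs:
            hits += 1
    # The observed data count as one of the N + 1 arrangements (an assumed convention).
    return (1 + hits) / (n_perm + 1)
```

Permuting G instead of Y would give the same test here, since only the pairing of the two vectors matters.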

When there are two predictors G and Z, permutation testing can become more complicated (Anderson & Robinson, 2001). Testing for both main effects being zero is possible, by permuting the outcome Y and leaving G and Z unchanged, and using datasets (G, Z, Y*) to compute the permutation distribution of a test statistic. However, an exact test for one specific main effect being zero, i.e., testing partial regression coefficients, typically does not exist, as it would require the true value of the other main effect to be known. Anderson & Robinson (2001) compare four approximate permutation tests for partial regression coefficients in models with two main effects, highlighting the Freedman & Lane (1983) method. They note that, typically, the exact test for both main effects is not even approximately valid for testing one main effect. One special case of an available exact test for a main effect of G is when Z is categorical, with several replicates of each of the fixed values. In this case, permutations of Y or G can be done within the groups defined by Z. In genetic applications, a binary covariate Z such as treatment or a categorical genotype at a single nucleotide polymorphism can be used in this way.

A summary of permutation testing in regression for a non-statistical audience can be found in Anderson (2001). The article summarises permutation testing in models with one and two main effects, and notes that, in general, there is no exact permutation method for testing the interaction term in a model with one or two main effects and an interaction. This holds even when G and Z are both categorical, because permutation of Y within levels of G and levels of Z generates new data with the interaction effect unchanged, not removed as we require for testing.

Though well established in the statistical literature on experimental design, this result is not widely known in genetic epidemiology or pharmacogenetics. Permutation-based tests for interaction have in fact been used frequently without any rationale given for their exact or approximate validity (Chase et al., 2005; Andrulionyte et al., 2007; Mei et al., 2007; Rana et al., 2007). In this paper we show that these permutation tests need not even be approximately valid. We describe an alternative, the parametric bootstrap, which can give valid tests with moderate sample sizes, and which requires similar computational effort to a permutation test. Parametric bootstrap techniques have been correctly used in a genetic setting, e.g. in Chen et al. (2007). We will discuss the choice of test statistic and show that a standardised statistic, such as a z-score or p-value instead of a difference in means, can improve the accuracy of the parametric bootstrap, and improve adherence of the Type I error rate to the nominal level.

The rest of the paper is organised as follows. In the next section we introduce models with an interaction term, and permutation concepts. We contrast the problem of testing for interaction with the problem of testing for overfitting in a model including interactions, where methods such as logic regression and multifactor-dimensionality reduction (MDR) do validly use permutation tests. We subsequently describe a parametric bootstrap approach to testing for interaction, and evaluate the performance of the parametric bootstrap compared to two types of permutations used commonly in interaction testing. Finally, we consider scenarios where permutation tests of interaction will be approximately valid in large samples for specific test statistics. These scenarios include some of the practical applications of permutation tests for interaction in genetic association studies.

Methods

Models and Permutation Tests

Interaction is a complex phenomenon, as described in an extensive review by Cox (1984). We first consider a test for interaction between the effects of a single genetic polymorphism G and an environmental exposure E on an outcome Y. The null hypothesis is that the interaction term is zero. An alternative statement of the null is that while G and E may have effects, these are specifically additive on the scale given by the model.

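If Y is binary, as in a case-control study, the typical null hypothesis is that

  • $\operatorname{logit} P[Y = 1 \mid G, E] = \alpha + \beta_G G + \beta_E E$   (1)

If Y is continuous, a typical null hypothesis is that

  • $\mathbb{E}[Y \mid G, E] = \alpha + \beta_G G + \beta_E E$   (2)

An alternative hypothesis of interest in the binary case may be that

  • $\operatorname{logit} P[Y = 1 \mid G, E] = \alpha + \beta_G G + \beta_E E + \gamma G E$   (3)

and, in the continuous case, that

  • $\mathbb{E}[Y \mid G, E] = \alpha + \beta_G G + \beta_E E + \gamma G E$   (4)

Thus, the null hypothesis of no interaction is that γ = 0 in models (3) and (4).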

For either type of Y and a single genetic polymorphism, two natural test statistics are γ̂, the estimate of the interaction parameter γ, and the z statistic obtained by dividing γ̂ by its estimated standard error. Although γ̂ may appear to test the null hypothesis more directly, it is well established that the bootstrap performs better for statistics such as the z statistic, whose null distribution is approximately pivotal (Davison & Hinkley, 1997). For this reason we investigate both the parameter estimate and the z statistic.

When considering multiple genetic polymorphisms, there may be many polymorphism-specific estimates (γ̂i) and corresponding zi. For an omnibus test of no interaction between E and any polymorphism, we use as test statistics the maximum γ̂i, and the maximum |zi| or, equivalently, the minimum pi value. While similar properties hold as for a single genetic polymorphism, we defer extended discussion of testing with multiple genetic polymorphisms until our simulation study.

A simple permutation test would fix G and E and permute all outcomes Y to give Y*, as used in Andrulionyte et al. (2007) and Rana et al. (2007). Fixing G and E and permuting Y generates data in which Y* is independent of G and E. However, in equations (1) and (2), Y is not independent of G and E unless βG = βE = 0, so the permuted data satisfy a much more restrictive null hypothesis than no interaction. The permutation test will therefore be exact only for this more restrictive null hypothesis, that βG = βE = γ = 0. Our simulations (in the Results section) show this permutation test being anti-conservative when equation (1) holds, and conservative when equation (2) holds but βG or βE is non-zero.

Null hypothesis of one main effect

No difficulty arises in constructing a permutation test for the null hypothesis of one categorical main effect. For example, if we know that drug E (presumed binary) has an effect on binary outcome Y we may be interested in comparing the null hypothesis

  • $\operatorname{logit} P[Y = 1 \mid G, E] = \alpha + \beta_E E$   (5)

to the full alternative (3), testing βG=γ= 0.

Permuting Y within individual strata defined by E maintains the difference between Y|E= 1 and Y|E= 0. The estimates for α and βE under the null hypothesis model in equation (5) will be the same in the permuted data as in the observed data. This permutation test examines whether G affects Y, without making any prior restriction on how E and G might interact. For example, if G and E are both genetic polymorphisms, a test such as this may be useful in building models of genetic effects in biological pathways where epistasis is likely to be important.
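A stratified permutation of this kind might be sketched as follows. The helper names and the test statistic (the sum over levels of E of the absolute difference in mean outcome between the two G groups) are assumptions chosen for illustration; the statistic requires both values of G to occur within every level of E.

```python
import random
from collections import defaultdict


def permute_within_strata(y, strata, rng):
    """Return a copy of y shuffled separately inside each level of `strata`."""
    index = defaultdict(list)
    for i, s in enumerate(strata):
        index[s].append(i)
    y_star = list(y)
    for positions in index.values():
        values = [y[i] for i in positions]
        rng.shuffle(values)
        for i, v in zip(positions, values):
            y_star[i] = v
    return y_star


def stratum_differences(g, e, y):
    """Sum over levels of E of |mean(Y | G = 1) - mean(Y | G = 0)|."""
    total = 0.0
    for level in set(e):
        y1 = [yi for gi, ei, yi in zip(g, e, y) if ei == level and gi == 1]
        y0 = [yi for gi, ei, yi in zip(g, e, y) if ei == level and gi == 0]
        total += abs(sum(y1) / len(y1) - sum(y0) / len(y0))
    return total


def stratified_pvalue(g, e, y, n_perm=999, seed=0):
    """Permutation p-value for 'G has no effect on Y at any level of E'."""
    rng = random.Random(seed)
    s_obs = stratum_differences(g, e, y)
    hits = sum(
        stratum_differences(g, e, permute_within_strata(y, e, rng)) >= s_obs
        for _ in range(n_perm)
    )
    return (1 + hits) / (n_perm + 1)
```

Because outcomes never move between levels of E, the number of events within each level is the same in every permuted data set, which is why the fitted null model is unchanged.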

If there is only a single variable G to be considered, a permutation approach may not be necessary for reliable testing, as the usual χ2 approximation to the likelihood ratio test is likely to be adequate at any sample size where there is useful power. We note that in logistic regression in some cases the likelihood approximation may not work well, a feature often referred to as the Hauck-Donner phenomenon (Hauck & Donner, 1977).

The particular value of the permutation test in this context is that it is applicable with multiple polymorphisms. For example, computing the likelihood ratio p-value for testing βGi = γi = 0 for each of several polymorphisms Gi and taking the minimum gives a test statistic for the null hypothesis that no G has an effect on Y adjusted for E. This minimum p-value will not itself have a uniform distribution, but it can be compared to its permutation distribution to give a valid test.
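The idea of comparing an omnibus statistic to its own permutation distribution can be sketched as below. To keep the example free of dependencies, a standardised score statistic per polymorphism stands in for the likelihood ratio p-value, and its maximum stands in for the minimum p-value; these substitutions, and all function names, are assumptions of this sketch rather than the method described above.

```python
import random
from collections import defaultdict


def centre_within(values, strata):
    """Subtract from each value the mean of its stratum."""
    groups = defaultdict(list)
    for v, s in zip(values, strata):
        groups[s].append(v)
    means = {s: sum(vs) / len(vs) for s, vs in groups.items()}
    return [v - means[s] for v, s in zip(values, strata)]


def max_score(snps_centred, y_centred):
    """Largest standardised |sum(G_i * Y)| over the polymorphisms."""
    best = 0.0
    for g in snps_centred:
        norm = sum(x * x for x in g) ** 0.5
        if norm > 0:
            score = abs(sum(x * v for x, v in zip(g, y_centred))) / norm
            best = max(best, score)
    return best


def max_stat_pvalue(snps, e, y, n_perm=999, seed=0):
    """Omnibus test that no polymorphism affects Y, permuting Y within levels of E."""
    rng = random.Random(seed)
    snps_c = [centre_within(g, e) for g in snps]
    y_c = centre_within(y, e)          # stratum means are unchanged by the permutation
    s_obs = max_score(snps_c, y_c)
    strata = defaultdict(list)
    for i, s in enumerate(e):
        strata[s].append(i)
    hits = 0
    for _ in range(n_perm):
        y_star = list(y_c)
        for idx in strata.values():
            vals = [y_c[i] for i in idx]
            rng.shuffle(vals)
            for i, v in zip(idx, vals):
                y_star[i] = v
        if max_score(snps_c, y_star) >= s_obs:
            hits += 1
    return (1 + hits) / (n_perm + 1)
```

The same permuted outcome vector is used for every polymorphism, so the correlation among the per-polymorphism statistics is carried into the reference distribution of the maximum.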

A permutation testing approach along these lines is used in MDR (Ritchie et al., 2001), a method for reducing the dimensionality of multilocus information. Another example is logic regression (Ruczinski et al., 2003), which constructs predictors from Boolean combinations of binary covariates, and avoids overfitting using permutation applied to models that may contain many interaction terms. Permutation tests are also a very useful tool in situations of multiple testing problems when testing thousands of SNPs.

With a single environmental variable E, a maximum z-statistic or minimum p-value can be computed across all SNPs. Comparing this test statistic to its distribution under permutation of Y within individual strata defined by E controls the family-wise error rate for testing that no SNPs have effects on Y (Dudoit et al., 2003). However, this permutation test is not valid for testing specifically no interaction, i.e., γ = 0, when βG is non-zero, and may give Type I error rates that are too large or too small (Anderson, 2001).

Null hypothesis of two main effects

For a valid permutation test of the hypothesis of no interaction, we would require a group of permutations that exactly preserves both βG and βE in equations (1) and (2), but also ensures γ = 0. In general it is impossible to construct such a group of permutations, as demonstrated by Edgington (1987, chap. 6). If the permutations fix G and E they will not give γ = 0, and if they do not fix G and E they will not preserve the relationship between G and E and so will not preserve βG and βE.

In situations where G and E are known to be independent, however, it is possible to construct valid permutation tests for interaction in certain models. A linear model can be reparametrised by centring G and E at their means

  • $\mathbb{E}[Y \mid G, E] = \alpha + \beta_G (G - \bar{G}) + \beta_E (E - \bar{E}) + \gamma (G - \bar{G})(E - \bar{E})$

so that the estimated interaction γ̂ is uncorrelated with β̂G and β̂E if the model errors are independent and identically distributed. Permuting G and E independently will then give a valid permutation test for γ = 0.
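To see that centring changes the main effects but not the interaction coefficient, the uncentred and centred parametrisations can be matched term by term. The starred symbols below are introduced here for illustration only, and $\bar{G}$ and $\bar{E}$ are taken to be the sample means.

```latex
\begin{align*}
\alpha + \beta_G G + \beta_E E + \gamma G E
  &= \alpha^{*} + \beta_G^{*}(G - \bar{G}) + \beta_E^{*}(E - \bar{E})
     + \gamma (G - \bar{G})(E - \bar{E}), \\
\beta_G^{*} &= \beta_G + \gamma \bar{E}, \\
\beta_E^{*} &= \beta_E + \gamma \bar{G}, \\
\alpha^{*}  &= \alpha + \beta_G \bar{G} + \beta_E \bar{E} + \gamma \bar{G}\bar{E}.
\end{align*}
```

Expanding the right-hand side and collecting the coefficients of G, E and the constant recovers the left-hand side. Under γ = 0 the two sets of main effects coincide, so the hypothesis being tested is the same in either parametrisation.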

An approach like this is used in the Family Based Association Tests (FBAT; FBAT Toolkit Team, 2004), based on Laird et al. (2000): the FBAT-I permutation test assesses gene-environment interaction on a multiplicative scale in case-parent trios. Laird et al. assume that genetic variant G does not affect environmental exposure E, and condition on parental genotypes to remove any correlation between G and E due to population admixture. Their test statistic is

  • image

where s indexes parental genotypes and i indexes cases within a parental genotype stratum. They then permute the genotypes and the exposures of the cases independently, fixing the stratum s. This is an exact test of the null hypothesis that G and E are uncorrelated in cases, which is equivalent to the hypothesis of no interaction under the log-relative-risk model assumed in Laird et al. (2000):

  • image

This approach cannot be used to construct exact tests in a logistic regression model, as independence of G and E in the population then implies dependence among cases. However, if the event being studied is sufficiently rare, the logistic regression model will be well approximated by a log-relative-risk model and the FBAT-I test will be approximately valid. The same approach of permuting E and G separately within strata defined by Y can be used in a case-only or case-control study of unrelated individuals when the disease is rare and G and E are independent. In studies of unrelated individuals, however, the test lacks the resistance to confounding by population admixture. In addition, the assumption that E and G are independent, which is unavoidable in case-parent trio studies, is restrictive in case-control studies (Mukherjee & Chatterjee, 2008, sec. 5).

Parametric Bootstrap

Testing in a regression model framework requires computing the distribution of the test statistic under sampling from the null-hypothesis model. For instance, when testing the interaction term in a logistic regression model (3) with two main effects and an interaction term, the null hypothesis is

  • $\operatorname{logit} P[Y = 1 \mid G, E] = \alpha + \beta_G G + \beta_E E$

as in equation (1). In moderate to large sample sizes, a good approximation to the distribution of the test statistic under sampling from the true null-hypothesis model is the distribution of the test statistic under sampling from the fitted null-hypothesis model. That is, we fix G and E and generate Y* for each individual as a binary variable satisfying

  • $\operatorname{logit} P[Y^{*} = 1 \mid G, E] = \hat{\alpha} + \hat{\beta}_G G + \hat{\beta}_E E$   (6)

where α̂, β̂G and β̂E are estimated from the original data, under the null model (1). We then compute the test statistic for this simulated sample, and repeat this process many times. The empirical distribution of these simulated statistics is an estimate of the test statistic's distribution under the null. Correspondingly, p-values are calculated as the proportion of simulated test statistics that are more extreme than the observed value.

If the distribution of the test statistic depends smoothly on the regression parameter values, which is true in all standard examples, this ‘parametric bootstrap’ approach gives an asymptotically valid test (Davison & Hinkley, 1997, sec. 4.2.3). Like the classical bootstrap, it samples from a distribution based on the observed data, but the simulations are from a fitted parametric model rather than the empirical distribution. To obtain a valid test, the fitted parametric model is chosen so that the null hypothesis is satisfied.

The algorithm for the parametric bootstrap can be summarised in the following steps:

  1. Obtain parameter estimates from the original data by fitting a null-hypothesis model, such as equation (1).
  2. Sample responses from the model obtained in Step 1.
  3. Compute the test statistic, based on fitting the alternative-hypothesis model such as equation (3) to the samples obtained in Step 2.
  4. Repeat Steps 2 and 3 many times, to obtain an approximate distribution of the test statistic.
  5. Compute the test statistic for the original data, based on fitting the alternative-hypothesis model such as equation (3).
  6. Compute the p-value, by comparing the test statistic in Step 5 to the distribution in Step 4.
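The six steps can be sketched for the continuous-outcome case, where the fitting can be written out without a statistics library. This is an illustration under stated assumptions, not the code used in the paper: it assumes a linear model with Normal errors, binary G and E with all four (G, E) combinations present, |γ̂| as the test statistic, and a p-value that counts the observed statistic among the N + 1 values. All function names are hypothetical.

```python
import random
from statistics import mean


def solve(a, b):
    """Solve the small linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * p for x, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_additive(g, e, y):
    """Step 1: least-squares fit of the null model with main effects only."""
    rows = [(1.0, gi, ei) for gi, ei in zip(g, e)]
    xtx = [[sum(r[j] * r[k] for r in rows) for k in range(3)] for j in range(3)]
    xty = [sum(r[j] * yi for r, yi in zip(rows, y)) for j in range(3)]
    beta = solve(xtx, xty)
    fitted = [sum(b * x for b, x in zip(beta, r)) for r in rows]
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    sigma = (rss / (len(y) - 3)) ** 0.5      # residual standard deviation
    return fitted, sigma


def gamma_hat(g, e, y):
    """Interaction estimate from the full model: a difference of differences of cell means."""
    def cell(a, b):
        return mean(yi for gi, ei, yi in zip(g, e, y) if gi == a and ei == b)
    return (cell(1, 1) - cell(1, 0)) - (cell(0, 1) - cell(0, 0))


def bootstrap_pvalue(g, e, y, n_boot=999, seed=0):
    rng = random.Random(seed)
    fitted, sigma = fit_additive(g, e, y)                 # Step 1
    s_obs = abs(gamma_hat(g, e, y))                       # Step 5
    hits = 0
    for _ in range(n_boot):                               # Step 4
        y_star = [rng.gauss(f, sigma) for f in fitted]    # Step 2
        if abs(gamma_hat(g, e, y_star)) >= s_obs:         # Step 3
            hits += 1
    return (1 + hits) / (n_boot + 1)                      # Step 6
```

The only difference from a permutation test is Step 2: outcomes are drawn afresh from the fitted null model, so the simulated data keep the estimated main effects while having no interaction. For a binary outcome the same structure applies, with a logistic fit in Step 1 and Bernoulli draws in Step 2.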

Results

Simulations

Using simulation, we explore tests of no interaction in regression models. These are for univariate outcomes, in samples of unrelated individuals.

We consider two types of genetic data (single and multiple polymorphisms), and of outcome (binary and continuous). We compare use of three resampling approaches, for a range of sample sizes.

Data generated for single polymorphisms

We assume G to be a binary exposure, such as a genetic polymorphism with dominant or recessive inheritance. It is assumed independent between subjects, and we denote P[G = 1] = pG. We also assume binary E, independent between subjects, where P[E = 1] = pE and logit(pE) = a + bG. Hence, b denotes the log odds ratio of association between G and E.

For binary outcomes, we generate data using the model

  • $\operatorname{logit} P[Y = 1 \mid G, E] = \alpha + \beta_G G + \beta_E E + \gamma G E$

We set pG= 0.4, a= logit(0.2), b= log(2), resulting in pE|G=0= 0.2, pE|G=1= 0.333 and marginal pE= 0.253. We set α= 0.6, βE= 0.3 and βG= 3, resulting in marginal pY= 0.770. To simulate data under the null hypothesis of no interaction, we set γ= 0.

We generate the continuous outcomes as Y ∼ N(μY, 1), where the model for the mean μY is

  • $\mu_Y = \alpha + \beta_G G + \beta_E E + \gamma G E$

We set pG= 0.8, a= logit(0.2), b= log(2), resulting in pE|G=0= 0.2, pE|G=1= 0.333 and marginal pE= 0.307. We set α= 2, βE= 2 and βG= 3, resulting in marginal μY= 5.014. Again, γ= 0.
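The exposure probabilities quoted above can be checked directly from a, b and pG, and the continuous-outcome design can be simulated in a few lines. This is a sketch: the function names are hypothetical, and the mean model is assumed to be the linear form with interaction term used elsewhere in the paper.

```python
import math
import random


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    return math.log(p / (1.0 - p))


a, b = logit(0.2), math.log(2)
p_e_g0 = expit(a)          # P[E = 1 | G = 0]
p_e_g1 = expit(a + b)      # P[E = 1 | G = 1]


def marginal_p_e(p_g):
    """Marginal P[E = 1] when P[G = 1] = p_g."""
    return (1 - p_g) * p_e_g0 + p_g * p_e_g1


# Marginal mean of Y in the continuous design (alpha = 2, beta_G = 3, beta_E = 2, gamma = 0).
mu_y = 2 + 3 * 0.8 + 2 * marginal_p_e(0.8)


def simulate_continuous(n, rng, p_g=0.8, alpha=2.0, beta_g=3.0, beta_e=2.0, gamma=0.0):
    """Draw n triples (G, E, Y) with Y ~ N(mu, 1) and mu linear in G, E and G*E."""
    data = []
    for _ in range(n):
        g = int(rng.random() < p_g)
        e = int(rng.random() < expit(a + b * g))
        mu = alpha + beta_g * g + beta_e * e + gamma * g * e
        data.append((g, e, rng.gauss(mu, 1.0)))
    return data


print(round(p_e_g0, 3), round(p_e_g1, 3))                          # conditional P[E = 1 | G]
print(round(marginal_p_e(0.4), 3), round(marginal_p_e(0.8), 3))    # marginal pE for pG = 0.4, 0.8
print(round(mu_y, 2))                                              # marginal mean of Y
```

The computed marginal mean agrees with the quoted 5.014 to within rounding of pE.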

Data generated for multiple polymorphisms

Here, the genetic data consist of five polymorphisms G1, G2, … , G5. To induce correlation among the various Gi, for each subject they were generated from the following hierarchical model:

  • image

Hence, conditional on the latent polymorphism G0, the individual Gi are independent and identically distributed, but they are marginally dependent.

We again assume binary E, independent between subjects, where P[E= 1]=pE and inline image, with a= logit(0.2), b= log(2). We generated independent binary outcomes Y where

  • image

We generated continuous outcomes with Y ∼ N(μ, 1) where

  • image

For both the binary and the continuous outcome we set inline image and βE= 2. We set α= 0.6 for binary outcome and α= 2 for continuous outcome. To simulate under the null hypothesis of no interaction we set inline image.

Resampling approaches

We compare three resampling approaches:

  • A: Keep covariate pairs (G, E) in the single polymorphism situation or covariate 6-tuples (G1, G2, … , G5, E) in the multiple polymorphism situation and permute Y;
  • B: Keep covariate pairs (G, E) in the single polymorphism situation or covariate 6-tuples (G1, G2, … , G5, E) in the multiple polymorphism situation and permute Y within levels of E;
  • C: Follow the parametric bootstrap algorithm.

Approach C is the parametric bootstrap; approaches A and B are non-parametric permutation methods that may often be used in practice, perhaps erroneously.

In approach C, the null-model used in Steps 1 and 2 of the parametric bootstrap algorithm differs for single and multiple polymorphisms, and also for binary and continuous outcomes. For a single polymorphism, we use the models given by fitting equation (1) for binary outcomes, and fitting the classical linear model with mean as in equation (2) for continuous outcomes. For multiple polymorphisms, we fit five separate models under the null. The models for the mean of binary outcome are

  • image(7)

and for continuous outcomes we fit classical linear models with mean

  • image(8)

For each simulated dataset, the sample responses for Step 2 of the parametric bootstrap are then simulated under these fitted models.

Test statistics and significance

Under approaches A, B and C, and for single and multiple polymorphisms, we compare two types of test statistic. Both are obtained by fitting models which include interaction terms.

For a single polymorphism, we first fit the model specified in equation (3) for binary outcomes, and equation (4) for continuous outcomes. We then consider the test statistic γ̂ and its corresponding z-statistic, in both cases.

For multiple polymorphisms, we first fit five separate models. For binary outcomes, each model has mean

  • image(9)

For continuous outcomes we fit classical linear models, with mean

  • image(10)

The test statistics considered are the maximum γ̂i, i.e. the maximum among the five estimates of the interaction parameters γi obtained above, and the minimum pi, i.e. the smallest of the p-values obtained by testing each γi.

In permutation testing, the empirical p-value is calculated as

  • image

where so is the test statistic from the (unpermuted) original data, si is the statistic from permutation i, and N is the number of permutations performed.

Under the parametric bootstrap, the empirical p-value is

  • image

where so is the test statistic from the original data, si is the statistic under bootstrap i from the fitted data, and N is the number of bootstrapped datasets.

For valid tests, under the null hypothesis the empirical p-values should be uniformly distributed on the set i/(N+ 1), which for large N is close to the uniform distribution on (0, 1). We will use quantile-quantile plots to compare the distribution of the computed p-values to the continuous uniform (0, 1) ideal. These are plotted on the − log10 scale to emphasise the area of interest, i.e. the small p-values. We highlight p-values of 0.05 and 0.01.
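A sketch of how such a calibration check might be computed is given below. The p-value convention (observed statistic counted among the N + 1 values, which yields p-values on the grid i/(N + 1)), the plotting positions i/(n + 1) and the function names are assumptions of this example.

```python
import math


def empirical_pvalue(s_obs, resampled):
    """Share of the N + 1 statistics (observed included) at least as large as s_obs."""
    hits = sum(1 for s in resampled if s >= s_obs)
    return (1 + hits) / (len(resampled) + 1)


def qq_points(pvalues):
    """Pairs (expected, observed) of -log10 p-values for a QQ plot against uniform(0, 1)."""
    n = len(pvalues)
    observed = sorted(-math.log10(p) for p in pvalues)
    expected = sorted(-math.log10((i + 1) / (n + 1)) for i in range(n))
    return list(zip(expected, observed))


def type1_error(pvalues, level):
    """Proportion of simulated null data sets rejected at the given level."""
    return sum(p <= level for p in pvalues) / len(pvalues)
```

Points above the diagonal at the upper end of the plot correspond to p-values that are too small (an anti-conservative test); points below it correspond to a conservative test. `type1_error` evaluated at 0.05 and 0.01 gives the rejection rates at the two highlighted levels.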

Our results are based on 10000 simulations for single polymorphisms, and on 1000 simulations for multiple polymorphisms. Within each simulation we took N= 1000 resamples. Reported results are for sample size n= 20, 100 and 500. The patterns of results were similar for n= 50 and 200, and are omitted. Simulations were performed in R (R Development Core Team, 2007).

Simulation results

Figures 1, 2 and 3 show the results for a single polymorphism and binary outcomes. The parametric bootstrap approach is systematically conservative for n = 20 (i.e. too-big p-values and too-small Type I error rate) but provides acceptable performance for sample sizes of n = 50 and higher. The γ̂ statistic slightly outperforms the z statistic. The performance of the parametric bootstrap further improves with increasing sample size, for both statistics γ̂ and z. Permutation method A, using statistic γ̂, results in poor performance across the range of sample sizes: for n = 20 it is conservative, and for larger sample sizes it is anti-conservative (i.e. too-small p-values and too-large Type I error rate). Using the z statistic, method A is conservative up until n = 200, when the size is approximately nominal. Permutation method B, using statistic γ̂, results in poorly behaved anti-conservative tests across the whole range of sample sizes. Using the z statistic, it provides invalid answers up to a sample size of n = 200, where it starts being approximately correct.

Figure 1. QQ plots for −log10 of p-values of the γ̂ and z statistics for binary outcome, with a single polymorphism, under two permutations and a parametric bootstrap. Sample size of 20.

Figure 2. QQ plots for −log10 of p-values of the γ̂ and z statistics for binary outcome, with a single polymorphism, under two permutations and a parametric bootstrap. Sample size of 100.

Figure 3. QQ plots for −log10 of p-values of the γ̂ and z statistics for binary outcome, with a single polymorphism, under two permutations and a parametric bootstrap. Sample size of 500.

For multiple polymorphisms and binary outcomes, seen in Figure 4, the parametric bootstrap performance was acceptable for n = 100 and larger under either test statistic. Method A gave acceptable performance for n = 500 and above using z but not γ̂, as did Method B.

Figure 4. QQ plots for −log10 of p-values of the maximum γ̂ and minimum p statistics for binary outcome, with multiple polymorphisms, under two permutations and a parametric bootstrap. Sample size of 500.

For single polymorphisms and continuous outcomes, shown in Figures 5, 6 and 7, the parametric bootstrap proves to be a valid approximate approach throughout, using either γ̂ or z. The accuracy of the p-values increases with sample size, though the performance is fairly good even for n = 20. For methods A and B, using test statistic γ̂ is again inappropriate over the whole range of sample sizes, being systematically conservative. Using z provides approximately valid answers over the whole range of sample sizes.

Figure 5. QQ plots for −log10 of p-values of the γ̂ and z statistics for Normally distributed outcome, with a single polymorphism, under two permutations and a parametric bootstrap. Sample size of 20.

Figure 6. QQ plots for −log10 of p-values of the γ̂ and z statistics for Normally distributed outcome, with a single polymorphism, under two permutations and a parametric bootstrap. Sample size of 100.

Figure 7. QQ plots for −log10 of p-values of the γ̂ and z statistics for Normally distributed outcome, with a single polymorphism, under two permutations and a parametric bootstrap. Sample size of 500.

For multiple polymorphisms and continuous outcomes, the results were similar to the single polymorphism setting. The parametric bootstrap provides approximately valid tests using both statistics, with the accuracy increasing with sample size. Using the maximum γ̂ test statistic, methods A and B both provide conservative tests for all sample sizes. Using methods A and B, the minimum p statistic provides approximately valid tests for all studied sample sizes. Figure 8 illustrates this for a sample size of 500.

Figure 8. QQ plots for −log10 of p-values of the maximum γ̂ and minimum p statistics for Normally distributed outcome, with multiple polymorphisms, under two permutations and a parametric bootstrap. Sample size of 500.

Rationale

The difficulty in constructing permutation tests of interaction arises because the distribution of the test statistic will typically depend on βG and βE and the association between G and E, and it is not usually possible to construct permutations that preserve these main effects. The parametric bootstrap evaluates the distribution of the test statistic at the estimated β̂G and β̂E and the observed G and E, and so will be valid when these estimates are close to the true values and the observations are representative of the population. In our simulations, it appears that this is practically the case, even with modest sample sizes.

If the test statistic had a distribution that was exactly or approximately the same for all (βG, βE), a permutation test that was valid for one value of (βG, βE) would be exactly or approximately valid for all (βG, βE). This is the case for the FBAT-I test. Permuting G and E separately will preserve the association between G and E when they are independent, and the distribution of the test statistic does not depend on (βG, βE).

In a linear or logistic regression model, the parameter estimate γ̂ has an asymptotically Normal distribution. Dividing γ̂ by its estimated standard error gives a test statistic that, asymptotically, has a standard Normal distribution under the null, regardless of the value of other parameters in the model. This result extends to multiple parameters such as the set of k estimated interaction coefficients between E and many polymorphisms G1, G2, … , Gk, which asymptotically have a multivariate Normal distribution. Dividing each parameter estimate by its standard error gives a multivariate Normal distribution where each standardised parameter estimate is N(0, 1) under the null, and with correlation between estimates depending on the correlations among E, G1, … , Gk.

If, with multiple polymorphisms, the test statistic is the minimum p-value, its distribution will depend only on the distribution of the standardised parameter estimates. In large samples this distribution will be the same for all (βG, βE), and even in small samples it will be approximately free of (βG, βE). Now, a permutation test that fixes E and G and permutes Y will have the correct associations among E, G1, … , Gk but not the correct (βG, βE); since the test statistic's distribution does not depend on (βG, βE) in large samples, the tests will be asymptotically valid.

For tests of main effects, permutation approaches are exact in any sample size, and for arbitrary test statistics. As we have seen, permutation tests for interaction are only approximately valid, and departures from accuracy will depend on both sample size and choice of test statistic.

We believe that the permutation tests are conservative for linear regression because setting the main effects to zero puts the variation explained by the main effects back into the residual variance. Therefore, the variance is too large and this makes the p-value too large, resulting in a conservative test. We believe the permutation tests tend to be liberal for logistic regression because of the non-collapsibility of the logistic model.

Discussion

The statistical literature shows that exact permutation tests for interactions are not available in most situations. We have described two permutation tests that have been used in practice for interaction testing, and showed that they are not exact. The error in the tests can be substantial. A practical alternative for interaction testing is the parametric bootstrap. In our simulations the parametric bootstrap, while not exact, always outperformed the invalid permutation tests. Since the parametric bootstrap performs better and does not require greater computational effort, it could be recommended for scenarios similar to our simulations.

We contrasted two types of test statistics, based on approximately pivotal quantities (i.e. based on z) and non-pivotal quantities (i.e. based on γ̂). For the parametric bootstrap these test statistics gave similar performance in our simulations, but the Type I error rate using permutation methods was substantially less accurate when using non-pivotal quantities. Permutation methods did perform acceptably when the sample size was large, and when approximately pivotal test statistics were used.

It is important to remember that neither the parametric bootstrap nor the permutation tests are exact tests for interaction in small samples. It is also important to remember that, in contrast to the hypothesis of no association, the hypothesis of no interaction is intrinsically dependent on the form of the model. Any approach to testing for interaction must therefore be model-based to some extent.
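A small worked example, using hypothetical cell means chosen purely for illustration, shows how the presence of interaction depends on the scale of the model. Let $\mu_{ge}$ denote the mean outcome for genotype $G = g$ and exposure $E = e$:

```latex
\begin{array}{c|cc}
\mu_{ge} & e = 0 & e = 1 \\ \hline
g = 0 & 1 & 2 \\
g = 1 & 2 & 4
\end{array}
\qquad
\begin{aligned}
\text{additive scale:} \quad
  & \mu_{11} - \mu_{10} - \mu_{01} + \mu_{00} = 4 - 2 - 2 + 1 = 1 \neq 0, \\
\text{log scale:} \quad
  & \log\mu_{11} - \log\mu_{10} - \log\mu_{01} + \log\mu_{00}
    = \log 4 - 2\log 2 + \log 1 = 0 .
\end{aligned}
```

The same means show an interaction under an additive model and none under a multiplicative (log-linear) model, so a test of "no interaction" is only defined once the model form has been chosen.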

References

  • Anderson, M. J. (2001) Permutation tests for univariate and multivariate analysis of variance and regression. Can. J. Fish. Aquat. Sci. 58, 626–639.
  • Anderson, M. J. & Robinson, J. (2001) Permutation tests for linear models. Australian & New Zealand Journal of Statistics 43, 75–88.
  • Andrulionyte, L., Kuulasmaa, T., Chiasson, J.-L. & Laakso, M. (2007) Single nucleotide polymorphisms of the peroxisome proliferator activated receptor-α gene (PPARA) influence the conversion from impaired glucose tolerance to type 2 diabetes. Diabetes 56, 1181–1186.
  • Chase, K., Carrier, D. R., Adler, F. R., Ostrander, E. A. & Lark, K. G. (2005) Interaction between the X chromosome and an autosome regulates size sexual dimorphism in Portuguese water dogs. Genome Res 15, 1820–1824.
  • Chen, J., Yu, K., Hsing, A. & Therneau, T. M. (2007) A partially linear tree-based regression model for assessing complex joint gene–gene and gene–environment effects. Genetic Epidemiology 31, 238–251.
  • Cox, D. R. (1984) Interaction (with discussion). International Statistical Review 52, 1–31.
  • Cox, D. R. & Hinkley, D. V. (1997) Theoretical Statistics. CRC Press.
  • Davison, A. C. & Hinkley, D. V. (1997) Bootstrap Methods and Their Application. Cambridge University Press.
  • Dudoit, S., Shaffer, J. P. & Boldrick, J. C. (2003) Multiple hypothesis testing in microarray experiments. Statistical Science 18, 71–103.
  • Edgington, E. S. (1987) Randomization Tests. Marcel Dekker, New York.
  • Ernst, M. D. (2004) Permutation methods: A basis for exact inference. Statistical Science 19, 676–685.
  • Faulkner, G. J., Kimura, Y., Daub, C. O., Wani, S., Plessy, C., Irvine, K. M., Schroder, K., Cloonan, N., Steptoe, A. L., Lassmann, T., Waki, K., Hornig, N., Arakawa, T., Takahashi, H., Kawai, J., Forrest, A. R. R., Suzuki, H., Hayashizaki, Y., Hume, D. A., Orlando, V., Grimmond, S. M. & Carninci, P. (2009) The regulated retrotransposon transcriptome of mammalian cells. Nature Genetics 41, 563–571.
  • FBAT Toolkit Team (2004) Family Based Association Testing software. URL http://www.biostat.harvard.edu/~fbat/default.html
  • Fisher, R. A. (1935) The Design of Experiments. Edinburgh: Oliver and Boyd.
  • Freedman, D. & Lane, D. (1983) A nonstochastic interpretation of reported significance levels. J. Bus. Econom. Statist. 1, 292–298.
  • Hauck, W. W. & Donner, A. (1977) Wald's test as applied to hypotheses in logit analysis. Journal of the American Statistical Association 72, 851–853.
  • Higgins, J. J. (2004) An Introduction to Modern Nonparametric Statistics. Pacific Grove, CA: Thomson, Brooks/Cole.
  • Hu, Y., Li, L., Seidelmann, S. B., Timur, A. A., Shen, P. H., Driscoll, D. J. & Wang, Q. K. (2008) Identification of association of common AGGF1 variants with susceptibility for Klippel-Trenaunay syndrome using the structure association program. Annals of Human Genetics 72, 636–643.
  • Laird, N. M., Horvath, S. & Xu, X. (2000) Implementing a unified approach to family-based tests of association. Genetic Epidemiology 19(Suppl 1), 36–42.
  • Leak, T. S., Perlegas, P. S., Smith, S. G., Keene, K. L., Hicks, P. J., Langefeld, C. D., Mychaleckyj, J. C., Rich, S. S., Kirk, J. K., Freedman, B. I., Bowden, D. W. & Sale, M. M. (2009) Variants in intron 13 of the ELMO1 gene are associated with diabetic nephropathy in African Americans. Annals of Human Genetics 73, 152–159.
  • Mei, L., Li, X., Yang, K., Cui, J., Fang, B., Guo, X. & Rotter, J. I. (2007) Evaluating gene × gene and gene × smoking interaction in rheumatoid arthritis using candidate genes in GAW15. BMC Proceedings 17.
  • Mukherjee, B. & Chatterjee, N. (2008) Exploiting gene-environment independence for analysis of case–control studies: An empirical Bayes-type shrinkage estimator to trade-off between bias and efficiency. Biometrics 64, 685–694.
  • R Development Core Team (2007) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. ISBN 3-900051-07-3. URL http://www.R-project.org
  • Rana, B. K., Insel, P. A., Payne, S. H., Abel, K., Beutler, E., Ziegler, M. G., Schork, N. J. & O’Connor, D. T. (2007) Population-based sample reveals gene-gender interactions in blood pressure in white Americans. Hypertension 49, 96–106.
  • Ritchie, M. D., Hahn, L. W., Roodi, N., Bailey, L. R., Dupont, W. D., Parl, F. F. & Moore, J. H. (2001) Multifactor-dimensionality reduction reveals high-order interactions among estrogen-metabolism genes in sporadic breast cancer. Am J Hum Genet 69, 138–147.
  • Ruczinski, I., Kooperberg, C. & LeBlanc, M. L. (2003) Logic regression. Journal of Computational and Graphical Statistics 12, 475–511.