Statistical Tests for Detecting Rare Variants Using Variance-Stabilising Transformations


  • Kai Wang,

    Corresponding author
    1. Department of Biostatistics, College of Public Health, The University of Iowa, Iowa City, IA, USA
      KAI WANG, PhD, Department of Biostatistics, N322 CPHB, College of Public Health, University of Iowa, Iowa City, IA 52242. Tel: (319) 384-1594; Fax: 319 384 1591; E-mail:
  • John H. Fingert

    1. Department of Ophthalmology and Visual Sciences, Carver College of Medicine, The University of Iowa, IA, USA



Next generation sequencing holds great promise for detecting rare variants underlying complex human traits. Because the allele frequencies of such variants are extremely low, the normal approximation for a proportion no longer works well. The Fisher’s exact test appears to be suitable, but it is conservative. We investigate the utility of various variance-stabilising transformations in single-marker association analysis of rare variants. Unlike the variance of a proportion itself, the variance of a transformed proportion no longer depends on the proportion, which makes such transformations extremely appealing for rare variant association analysis. Simulation studies demonstrate that tests based on such transformations are more powerful than the Fisher’s exact test while controlling the type I error rate. Based on theoretical considerations and results from simulation studies, we recommend the test based on the Anscombe transformation over tests based on the other transformations.


With the recent advances in high-throughput sequencing technologies, the time is ripe to unveil the genetic variants that are missed by conventional genome-wide association studies, which target common variants. These variants are believed to be extremely rare in the general population, with disease allele frequencies between 1% and 5% or even lower (Schork et al., 2009). Such a low frequency makes it challenging to conduct valid and powerful statistical inference. The normal approximation to a proportion no longer works well (Zar, 1999, Section 13.3) and the Fisher’s exact test is conservative (Altham, 1969; Howard, 1998; see also our simulation results presented later).

The prevailing methods for association analysis of rare variants are gene- or set-based. Typically, single-nucleotide polymorphisms (SNPs) in a gene or a set are collapsed in various ways (Li & Leal, 2008; Sun et al., 2011) to enrich the genetic information content so that conventional association techniques are applicable. The kernel-based association test (Kwee et al., 2008; Wu et al., 2010) provides an alternative to combining multiple rare variants.

Although testing for gene- or set-based association is of great interest, it is of even greater interest to pinpoint the variants with high precision, for example, through single-marker analysis. However, single-marker methods specifically designed for rare variants are surprisingly scarce. The Uniform-test and the Beta-test (Li et al., 2010) might be the only ones available. These two tests adopt a sequential Bayesian approach: the prior distribution of the parameter is trained using the control sample and then fed into the case sample. One disadvantage of this approach is that switching the case sample and the control sample leads to a different result.

The goal of this report is to explore an alternative approach to association testing of rare variants in case-control studies. This approach builds upon the rich literature on variance-stabilising transformations of a proportion such that the distribution of the transformed proportion is much better approximated by a normal distribution than that of the proportion itself, especially when the proportion is close to 0 or 1.


Let r be the probability that an individual carries at least one copy of a variant. If there are X such carriers among n individuals, then a natural estimate of r is $\hat r = X/n$. An estimate of the variance of $\hat r$ is $\hat r(1-\hat r)/n$. Note that this quantity depends on the value of $\hat r$. If the variant is rare, r is very close to 0, so that $\hat r = 0$ with a non-negligible probability. The distribution of $\hat r$ is therefore skewed and the normal approximation fails. To get around this problem, one approach is to use the angular transformation, defined as (Mosteller & Tukey, 1949; Zar, 1999)

$$\arcsin\sqrt{\hat r} = \arcsin\sqrt{X/n}.$$


This transformation is very natural (Mosteller & Tukey, 1949) considering that, for an angle $\theta$, there is $\sin^2\theta + \cos^2\theta = 1$. The appealing feature of this transformation is that the variance of $\arcsin\sqrt{\hat r}$ can be estimated by $1/(4n)$. That is, it no longer depends on $\hat r$.

A better transformation that has the same feature is the Anscombe transformation (Anscombe, 1948)

$$\arcsin\sqrt{\frac{X + 3/8}{n + 3/4}}.$$


The Anscombe transformation is motivated by optimising the variance of the following transformation over d1, d2, and c

$$\sqrt{n + d_2}\,\arcsin\sqrt{\frac{X + c}{n + d_1}}.$$


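When $d_1 = 3/4$, $d_2 = 1/2$, and $c = 3/8$, its variance is $1/4 + O(n^{-2})$, which is of the lowest order. The variance of the Anscombe-transformed proportion is therefore estimated by $1/[4(n + 1/2)]$. Another variation is the Freeman and Tukey transformation (Freeman & Tukey, 1950)

$$\arcsin\sqrt{\frac{X}{n+1}} + \arcsin\sqrt{\frac{X+1}{n+1}}.$$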


It has been found that its variance is within 6% of 1/(n+1/2) (Freeman & Tukey, 1950).

Although they are much better approximated by a normal distribution than the angular transformation, especially for low frequency r, the Anscombe transformation and the Freeman–Tukey transformation are not invariant to proportional changes in X and n. For instance, the transformed value for X=2 and n=100 differs from that for X=20 and n=1000, although the ratio X/n is the same. The following transformation by Chanter (1975) avoids this phenomenon

$$\arcsin\sqrt{(1 - 2d)\,\frac{X}{n} + d},$$


where $d = 1/(4n)$. Its variance is estimated by $1/(4n)$.
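The angular, Anscombe, and Freeman–Tukey transformations described above can be written down directly. The following Python sketch (the function names are illustrative and are not taken from any package) evaluates them at X=2, n=100 and at X=20, n=1000 to show which values change when X and n are scaled together.

```python
import math

def angular(x, n):
    """Angular transformation arcsin(sqrt(X/n)); variance estimate 1/(4n)."""
    return math.asin(math.sqrt(x / n))

def anscombe(x, n):
    """Anscombe transformation; variance estimate 1/(4(n + 1/2))."""
    return math.asin(math.sqrt((x + 3 / 8) / (n + 3 / 4)))

def freeman_tukey(x, n):
    """Freeman-Tukey transformation; variance roughly 1/(n + 1/2)."""
    return (math.asin(math.sqrt(x / (n + 1)))
            + math.asin(math.sqrt((x + 1) / (n + 1))))

# Same ratio X/n = 0.02 at two sample sizes.
for transform in (angular, anscombe, freeman_tukey):
    small, large = transform(2, 100), transform(20, 1000)
    print(f"{transform.__name__:14s} {small:.5f} {large:.5f}")
```

Only the angular transformation gives the same value in both columns, because it depends on X and n through the ratio X/n alone.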

All these transformations share the feature that their variances do not depend on the value of r. Their distributions are better approximated by normal distributions than the distribution of $\hat r$. Application of these transformations to case-control association testing is straightforward. Let A denote the minor allele and a the other allele. Using subscript 1 for cases and subscript 2 for controls, the data from a case-control study can be summarised as in Table 1.

Table 1.  Summary of data from a case-control study.

                    Genotype
Disease status      aa           aA or AA     Total
Case                n1 − X1      X1           n1
Control             n2 − X2      X2           n2

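Because the cases are independent of the controls, the variance of the difference of the transformed values between cases and controls is the sum of the variances in the two samples. Four tests can be defined as follows, one for each of the aforementioned transformations (angular, Anscombe, Freeman–Tukey, and Chanter, with $d_i = 1/(4n_i)$):

$$T_{\mathrm{angular}} = \frac{\left(\arcsin\sqrt{X_1/n_1} - \arcsin\sqrt{X_2/n_2}\right)^2}{1/(4n_1) + 1/(4n_2)},$$

$$T_{\mathrm{Anscombe}} = \frac{\left(\arcsin\sqrt{\frac{X_1 + 3/8}{n_1 + 3/4}} - \arcsin\sqrt{\frac{X_2 + 3/8}{n_2 + 3/4}}\right)^2}{1/(4n_1 + 2) + 1/(4n_2 + 2)},$$

$$T_{\mathrm{FT}} = \frac{\left(\arcsin\sqrt{\frac{X_1}{n_1+1}} + \arcsin\sqrt{\frac{X_1+1}{n_1+1}} - \arcsin\sqrt{\frac{X_2}{n_2+1}} - \arcsin\sqrt{\frac{X_2+1}{n_2+1}}\right)^2}{1/(n_1 + 1/2) + 1/(n_2 + 1/2)},$$

$$T_{\mathrm{Chanter}} = \frac{\left(\arcsin\sqrt{(1-2d_1)X_1/n_1 + d_1} - \arcsin\sqrt{(1-2d_2)X_2/n_2 + d_2}\right)^2}{1/(4n_1) + 1/(4n_2)}.$$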




Under the null hypothesis that there is no association between the variant and affection status, all four tests asymptotically follow a chi-square distribution with 1 degree of freedom.
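A minimal Python sketch of these tests follows. It assumes the squared-difference-over-summed-variance form described in the text and, for the Chanter test, assumes the transformation $\arcsin\sqrt{(1-2d)X/n + d}$ with $d = 1/(4n)$. The names `vst_test` and `TRANSFORMS` are illustrative and are not the interface of the iGasso package. The chi-square tail probability with 1 degree of freedom is obtained from the complementary error function.

```python
import math

def chi2_sf_1df(t):
    """P(chi-square with 1 df > t), computed from the standard normal tail."""
    return math.erfc(math.sqrt(t / 2))

# (transformation, variance estimate) pairs; "chanter" uses an assumed form.
TRANSFORMS = {
    "angular": (lambda x, n: math.asin(math.sqrt(x / n)),
                lambda n: 1 / (4 * n)),
    "anscombe": (lambda x, n: math.asin(math.sqrt((x + 3 / 8) / (n + 3 / 4))),
                 lambda n: 1 / (4 * n + 2)),
    "ft": (lambda x, n: math.asin(math.sqrt(x / (n + 1)))
           + math.asin(math.sqrt((x + 1) / (n + 1))),
           lambda n: 1 / (n + 1 / 2)),
    "chanter": (lambda x, n: math.asin(
        math.sqrt((1 - 1 / (2 * n)) * x / n + 1 / (4 * n))),
                lambda n: 1 / (4 * n)),
}

def vst_test(x1, n1, x2, n2, method="anscombe"):
    """Return (statistic, p-value) for x1 carriers of n1 cases vs x2 of n2 controls."""
    transform, variance = TRANSFORMS[method]
    diff = transform(x1, n1) - transform(x2, n2)
    stat = diff ** 2 / (variance(n1) + variance(n2))
    return stat, chi2_sf_1df(stat)

# Hypothetical counts: 12 carriers among 500 cases, 3 among 500 controls.
for method in TRANSFORMS:
    stat, p_value = vst_test(12, 500, 3, 500, method)
    print(f"{method:9s} statistic={stat:.3f} p={p_value:.4g}")
```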

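Substituting $n_1 - X_1$ and $n_2 - X_2$ (the numbers of noncarriers) for $X_1$ and $X_2$ in these four tests, respectively, does not change their values, as it changes only the sign of the difference in the numerators. We show this below by using the identity $\arcsin\sqrt{1-x} = \pi/2 - \arcsin\sqrt{x}$ for $0 \le x \le 1$. For the angular test,

$$\arcsin\sqrt{\frac{n_1 - X_1}{n_1}} - \arcsin\sqrt{\frac{n_2 - X_2}{n_2}} = -\left(\arcsin\sqrt{\frac{X_1}{n_1}} - \arcsin\sqrt{\frac{X_2}{n_2}}\right).$$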


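The derivation for the Anscombe test and the Chanter test is similar because

$$\frac{(n - X) + 3/8}{n + 3/4} = 1 - \frac{X + 3/8}{n + 3/4}$$

and

$$(1 - 2d)\,\frac{n - X}{n} + d = 1 - \left[(1 - 2d)\,\frac{X}{n} + d\right].$$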




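For the FT test, note that

$$\arcsin\sqrt{\frac{n - X}{n+1}} + \arcsin\sqrt{\frac{n - X + 1}{n+1}} = \pi - \left(\arcsin\sqrt{\frac{X+1}{n+1}} + \arcsin\sqrt{\frac{X}{n+1}}\right).$$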
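The invariance can also be checked numerically. The sketch below computes each statistic from hypothetical carrier counts and again from the corresponding noncarrier counts. As an assumption, the Chanter transformation is taken to be $\arcsin\sqrt{(1-2d)X/n + d}$ with $d = 1/(4n)$, and all names are illustrative.

```python
import math

def statistic(transform, variance, x1, n1, x2, n2):
    """Squared difference of transformed values over the summed variance estimates."""
    diff = transform(x1, n1) - transform(x2, n2)
    return diff ** 2 / (variance(n1) + variance(n2))

TESTS = {
    "angular": (lambda x, n: math.asin(math.sqrt(x / n)),
                lambda n: 1 / (4 * n)),
    "anscombe": (lambda x, n: math.asin(math.sqrt((x + 3 / 8) / (n + 3 / 4))),
                 lambda n: 1 / (4 * n + 2)),
    "ft": (lambda x, n: math.asin(math.sqrt(x / (n + 1)))
           + math.asin(math.sqrt((x + 1) / (n + 1))),
           lambda n: 1 / (n + 1 / 2)),
    # Assumed form of the Chanter transformation.
    "chanter": (lambda x, n: math.asin(
        math.sqrt((1 - 1 / (2 * n)) * x / n + 1 / (4 * n))),
                lambda n: 1 / (4 * n)),
}

x1, n1, x2, n2 = 7, 250, 2, 250  # hypothetical carrier counts
results = {}
for name, (transform, variance) in TESTS.items():
    carriers = statistic(transform, variance, x1, n1, x2, n2)
    noncarriers = statistic(transform, variance, n1 - x1, n1, n2 - x2, n2)
    results[name] = (carriers, noncarriers)
    print(f"{name:9s} {carriers:.6f} {noncarriers:.6f}")
```

The two columns should agree up to floating-point rounding, since swapping carriers for noncarriers only flips the sign of the difference in the numerator.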


Simulation and Application

The simulation setup is identical to that used by Li et al. (2010). Hardy–Weinberg equilibrium is assumed for the general population. Let p denote the population frequency of the minor allele A, and let a denote the major allele. The population frequencies of genotypes aa, aA, and AA are $(1-p)^2$, $2p(1-p)$, and $p^2$, respectively. Because allele A is assumed to be rare, genotypes for the controls are simulated using these population frequencies. The genotype frequencies in cases are $(1-p)^2/s$, $2p(1-p)\gamma_1/s$, and $p^2\gamma_2/s$, respectively. Here, $\gamma_1$ is the relative risk of genotype aA versus genotype aa, $\gamma_2$ is the relative risk of genotype AA versus genotype aa, and $s = (1-p)^2 + 2p(1-p)\gamma_1 + p^2\gamma_2$. The values of p used in the simulation are 0.005, 0.006, 0.007, 0.008, 0.009, 0.01, 0.02, 0.03, 0.04, and 0.05. Given a significance level, the rejection rate is computed as the proportion of rejections over 10,000 simulation replicates.
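The genotype frequencies used in such a simulation can be sketched as follows. This assumes the standard construction in which each Hardy–Weinberg population frequency is weighted by the relative risk of its genotype and the weights are rescaled to sum to 1. The function names and the relative risk of 2 in the example are hypothetical.

```python
def case_genotype_freqs(p, rr_het, rr_hom):
    """Genotype frequencies (aa, aA, AA) among cases.

    Each Hardy-Weinberg population frequency is weighted by its relative
    risk (1 for genotype aa) and the weights are rescaled to sum to 1.
    """
    weights = [(1 - p) ** 2, 2 * p * (1 - p) * rr_het, p ** 2 * rr_hom]
    total = sum(weights)
    return [w / total for w in weights]

def carrier_probability(freqs):
    """Probability of carrying at least one copy of allele A."""
    return freqs[1] + freqs[2]

# Relative risks of 1 reproduce the population (control) frequencies.
controls = case_genotype_freqs(0.01, 1.0, 1.0)
# Hypothetical dominant model with relative risk 2 for aA and AA.
cases = case_genotype_freqs(0.01, 2.0, 2.0)
print(carrier_probability(controls), carrier_probability(cases))
```

Because the tests operate on carrier counts, only the carrier probability in each group is needed to simulate X1 and X2.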

To study the type I error rate, we set both relative risks to 1. The rejection rates of the angular test, the FT test, the Anscombe test, the Chanter test, and the Fisher’s exact test are reported in Table 2 for 250 cases and 250 controls and in Table 3 for 500 cases and 500 controls. It is clear that (1) the Fisher’s exact test is too conservative, especially when the minor allele frequency is low; (2) the angular test tends to be anticonservative; and (3) the FT test, the Chanter test, and the Anscombe test have similar performance. All three have size close to the nominal significance level. However, the FT test and the Chanter test are sometimes slightly anticonservative when the significance level is low (i.e., 0.001, 0.005, and 0.01) and the allele frequency is at the lower end of the range.
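A type I error simulation of this kind can be sketched in a few lines of Python. The sketch draws carrier counts for cases and controls from the same binomial distribution and records how often the Anscombe test rejects. The replicate count (2,000 rather than 10,000), the seed, and all names are choices made for this illustration only.

```python
import math
import random

def anscombe_pvalue(x1, n1, x2, n2):
    """P-value of the Anscombe test (chi-square reference, 1 degree of freedom)."""
    def transform(x, n):
        return math.asin(math.sqrt((x + 3 / 8) / (n + 3 / 4)))
    diff = transform(x1, n1) - transform(x2, n2)
    stat = diff ** 2 / (1 / (4 * n1 + 2) + 1 / (4 * n2 + 2))
    return math.erfc(math.sqrt(stat / 2))

def binomial(n, prob, rng):
    """Number of successes in n independent trials."""
    return sum(rng.random() < prob for _ in range(n))

def type1_error(p, n_cases, n_controls, alpha, replicates, rng):
    """Rejection rate when cases and controls share the same carrier probability."""
    carrier = 1 - (1 - p) ** 2  # 2p(1-p) + p^2 under Hardy-Weinberg equilibrium
    rejections = 0
    for _ in range(replicates):
        x1 = binomial(n_cases, carrier, rng)
        x2 = binomial(n_controls, carrier, rng)
        rejections += anscombe_pvalue(x1, n_cases, x2, n_controls) < alpha
    return rejections / replicates

rng = random.Random(1)
rate = type1_error(p=0.01, n_cases=250, n_controls=250,
                   alpha=0.05, replicates=2000, rng=rng)
print(f"estimated type I error rate: {rate:.3f}")
```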

Table 2.  Type I error rate computed as the proportion of rejections over 10,000 simulation replicates. There are 250 cases and 250 controls.
Significance level    Allele frequency    Test statistic

Table 3.  Type I error rate computed as the proportion of rejections over 10,000 simulation replicates. There are 500 cases and 500 controls.
Significance level    Allele frequency    Test statistic

To study the power, a dominant model (inline image) and a multiplicative model (inline image) are considered. The rejection rates at significance level 0.05 are graphed in Figure 1 for the dominant model and in Figure 2 for the multiplicative model. The rejection rates at significance level 0.01 are also graphed for these two models, respectively (Figures 3 and 4). Because the angular test has inflated type I error rate, it is not included in the power analysis. For both models, the FT test, the Chanter test, and the Anscombe test have very similar power performance. All are more powerful than the Fisher’s exact test.
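A power comparison of this kind can be sketched as follows. The relative risk of 3 under a dominant model, the allele frequency, the sample size, the replicate count, and the seed are all hypothetical values chosen for illustration and are not the settings behind the figures. The two-sided Fisher p-value is computed as the total hypergeometric probability of tables no more likely than the observed one, which is one common convention.

```python
import math
import random

def anscombe_pvalue(x1, n1, x2, n2):
    """P-value of the Anscombe test (chi-square reference, 1 degree of freedom)."""
    def transform(x, n):
        return math.asin(math.sqrt((x + 3 / 8) / (n + 3 / 4)))
    diff = transform(x1, n1) - transform(x2, n2)
    stat = diff ** 2 / (1 / (4 * n1 + 2) + 1 / (4 * n2 + 2))
    return math.erfc(math.sqrt(stat / 2))

def fisher_pvalue(x1, n1, x2, n2):
    """Two-sided Fisher exact p-value from the hypergeometric distribution."""
    total, carriers = n1 + n2, x1 + x2
    denom = math.comb(total, n1)
    def pmf(k):
        return math.comb(carriers, k) * math.comb(total - carriers, n1 - k) / denom
    support = range(max(0, carriers - n2), min(carriers, n1) + 1)
    observed = pmf(x1)
    return min(1.0, sum(q for q in map(pmf, support) if q <= observed * (1 + 1e-7)))

def carrier_probability(p, rr_het, rr_hom):
    """Carrier probability when genotype frequencies are weighted by relative risk."""
    aa, het, hom = (1 - p) ** 2, 2 * p * (1 - p) * rr_het, p ** 2 * rr_hom
    return (het + hom) / (aa + het + hom)

def binomial(n, prob, rng):
    """Number of successes in n independent trials."""
    return sum(rng.random() < prob for _ in range(n))

def power(p, rr_het, rr_hom, n, alpha, replicates, rng):
    """Rejection rates of both tests with n cases and n controls."""
    case_prob = carrier_probability(p, rr_het, rr_hom)
    control_prob = carrier_probability(p, 1.0, 1.0)
    hits = {"anscombe": 0, "fisher": 0}
    for _ in range(replicates):
        x1, x2 = binomial(n, case_prob, rng), binomial(n, control_prob, rng)
        hits["anscombe"] += anscombe_pvalue(x1, n, x2, n) < alpha
        hits["fisher"] += fisher_pvalue(x1, n, x2, n) < alpha
    return {name: count / replicates for name, count in hits.items()}

rng = random.Random(7)
result = power(p=0.01, rr_het=3.0, rr_hom=3.0, n=250,
               alpha=0.05, replicates=1000, rng=rng)
print(result)
```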

Figure 1.

Power at significance level 0.05. Dominant model (inline image).

Figure 2.

Power at significance level 0.05. Multiplicative model (inline image).

Figure 3.

Power at significance level 0.01 for the dominant model (inline image). There are 10,000 simulation replicates.

Figure 4.

Power at significance level 0.01 for the multiplicative model (inline image). There are 10,000 simulation replicates.

Like Li et al. (2010), we also used the data from two empirical studies. In a study of association between copy number variation and schizophrenia (Need et al., 2009), 14 copy number variants were observed in 1013 cases and three in 1084 controls. The p-values are 0.003831 for the FT test, 0.003621 for the Anscombe test, 0.003358 for the Chanter test, and 0.005915 for the Fisher’s exact test. The FT test, the Anscombe test, and the Chanter test all lead to a smaller p-value than the Fisher’s exact test. The other example is on the association between microduplications and schizophrenia (McCarthy et al., 2009). There are nine subjects with microduplications among 2645 cases and one subject among 2420 controls. The p-values are 0.01183 for the FT test, 0.01107 for the Anscombe test, 0.009577 for the Chanter test, and 0.02258 for the Fisher’s exact test. Again, the former three tests have smaller p-values.


In this report, we have investigated the utility of variance-stabilising transformations in detecting rare variants in case-control genetic association studies. Because the frequency of rare variants is extremely low, the usual normal approximation used to derive the limiting distribution of a test statistic no longer works well. Even the Fisher’s exact test deviates from the nominal significance level (by being conservative), as demonstrated here and elsewhere (Altham, 1969; Howard, 1998). With a variance-stabilising transformation, the variance of the effect size no longer depends on the sample proportion and the accuracy of the normal approximation is dramatically improved.

The p-value of the Fisher’s exact test sums the probabilities of only those outcomes that are as extreme as or more extreme than the observed outcome. When the observed outcome is itself extreme, the number of such outcomes is very limited. The p-values are thus highly “discretised” and become conservative. In contrast, the proposed methods interpolate p-values between two possible outcomes. The ordinary normal approximation, which is an alternative to the Fisher’s exact test, also performs such interpolation but performs worse than the proposed tests.
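The discreteness can be made concrete by listing every p-value the Fisher’s exact test can produce for fixed margins. The sketch below uses a hypothetical study with 250 cases, 250 controls, and five carriers in total, and computes the two-sided p-value as the total hypergeometric probability of tables no more likely than the observed one (one common convention).

```python
import math

def fisher_two_sided(x1, n1, x2, n2):
    """Two-sided Fisher exact p-value from the hypergeometric distribution."""
    total, carriers = n1 + n2, x1 + x2
    denom = math.comb(total, n1)
    pmf = {k: math.comb(carriers, k) * math.comb(total - carriers, n1 - k) / denom
           for k in range(max(0, carriers - n2), min(carriers, n1) + 1)}
    observed = pmf[x1]
    return min(1.0, sum(q for q in pmf.values() if q <= observed * (1 + 1e-7)))

n1 = n2 = 250
carriers = 5  # hypothetical total number of carriers

# Every split of the carriers between cases and controls, and its p-value.
attainable = sorted({round(fisher_two_sided(k, n1, carriers - k, n2), 10)
                     for k in range(carriers + 1)})
print(attainable)
```

With these margins only three distinct p-values are attainable, and the smallest one (all five carriers in one group) is about 0.061, so the test cannot reject at the 0.05 level whatever the data.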

Current methods for rare variant association studies avoid the issue of rarity by using various collapsing methods within a gene or a SNP set. The only single marker analysis method we know of that is specifically designed for association studies on rare variants is the Bayesian method (Li et al., 2010). However, it is not symmetric about the case sample and the control sample—switching the samples will result in a different test result. In comparison, the tests based on variance-stabilising transformations are computationally very efficient due to their explicit expressions.

Variance-stabilising transformations for a proportion have long been studied and have been used extensively in fields such as ecology. Their application to association studies of rare variants is not only conceptually appealing but also, as demonstrated by our research, very promising. The performance of the proposed tests is much better than that of the Fisher’s exact test in terms of type I error rate and power. We note that their type I error rate is less satisfactory when the allele frequency is at the lower end of the range considered in the simulation. It has been demonstrated that variance-stabilising transformations do not work well for proportions close to 0 (Chanter, 1975). This observation supports the importance of variance stabilisation in conducting valid inference.

Our simulation findings confirm previous reports (Chanter, 1975) that the Anscombe transformation achieves variance stabilisation starting at a lower proportion. We recommend the Anscombe test over the angular test, the FT test, the Chanter test, and the Fisher’s exact test for association studies on rare variants.

It should be noted that, although variance-stabilising transformations lead to statistics that have higher power than the Fisher’s exact test, the power increase is moderate and diminishes as the minor allele frequency or the sample size increases. When the variant is rare, the power of these transformation-based methods is also low, as with any other method. This is a limitation of single-marker analysis of rare variants. A large number of “collapsing” or “burden” methods address it by considering a larger genomic unit. A recent review of these methods is provided in Dering et al. (2011). Another review is Bansal et al. (2010).

We focused on single marker analyses as the main goal of this report is to promote the use of variance-stabilising transformations in association studies of rare variants. In addition, the single marker analysis is by itself a very important case but its usefulness in rare variant association studies is under-estimated because previous studies are typically based on untransformed allele frequencies and are under powered. However, the application of the variance-stabilising transformations to gene-based analysis is straightforward if the rare variants are simply collapsed using either indicator coding or dosage coding (Dering et al., 2011). In collapsing, a variant is often weighted inversely to its variance inline image, where inline image is its estimated frequency either in the controls or in the combined sample of cases and controls. It is interesting to note that the first-order derivative of the arcsin function is

$$\frac{d}{dp}\arcsin\sqrt{p} = \frac{1}{2\sqrt{p(1-p)}}.$$


Applying Taylor expansion, inline image can be approximated by inline image when p is close to 0. If the variance-stabilising transformations are applied to each rare variant before collapsing, a weighting is automatically achieved.
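The link between the transformation and inverse-variance weighting can be illustrated numerically. The sketch below assumes that the relevant derivative is that of $\arcsin\sqrt{p}$, namely $1/(2\sqrt{p(1-p)})$, and compares a change in the transformed frequency with its first-order approximation. The frequencies 0.010 and 0.012 are hypothetical.

```python
import math

def g(p):
    """Angular transformation of a frequency p."""
    return math.asin(math.sqrt(p))

def g_prime(p):
    """Analytic derivative of arcsin(sqrt(p)) with respect to p."""
    return 1 / (2 * math.sqrt(p * (1 - p)))

p, h = 0.01, 1e-6

# Central finite difference as a check on the analytic derivative.
numeric = (g(p + h) - g(p - h)) / (2 * h)
print(numeric, g_prime(p))

# A difference on the transformed scale is approximately the raw difference
# divided by 2 * sqrt(p(1-p)), i.e. scaled by the inverse binomial standard
# deviation up to a constant factor.
p_hat = 0.012
exact = g(p_hat) - g(p)
linear = (p_hat - p) * g_prime(p)
print(exact, linear)
```

The rarer the variant, the larger the factor $1/(2\sqrt{p(1-p)})$, which is the sense in which the transformation up-weights rare variants before collapsing.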

All the tests based on the variance-stabilising transformations discussed in this report have been implemented in the iGasso package which is available at the R repository (URL:


We thank two anonymous reviewers and a handling editor for their helpful comments. This research was supported in part by the National Institutes of Health (NEI R01EY018825).