Population genetic simulation study of power in association testing across genetic architectures and study designs

Abstract While it is well established that genetics can be a major contributor to population variation of complex traits, the relative contributions of rare and common variants to phenotypic variation remains a matter of considerable debate. Here, we simulate genetic and phenotypic data across different case/control panel sampling strategies, sequencing methods, and genetic architecture models based on evolutionary forces to determine the statistical performance of rare variant association tests (RVATs) widely in use. We find that the highest statistical power of RVATs is achieved by sampling case/control individuals from the extremes of an underlying quantitative trait distribution. We also demonstrate that the use of genotyping arrays, in conjunction with imputation from a whole‐genome sequenced (WGS) reference panel, recovers the vast majority (90%) of the power that could be achieved by sequencing the case/control panel using current tools. Finally, we show that for dichotomous traits, the statistical performance of RVATs decreases as rare variants become more important in the trait architecture. Our results extend previous work to show that RVATs are insufficiently powered to make generalizable conclusions about the role of rare variants in dichotomous complex traits.

validation simulations, these simulations are generally not comparable and have their own flaws. (Moutsianas et al., 2015) systematically characterized the performance of commonly used gene-based RVATs under a range of genetic architectures, sample sizes, variant effect sizes, and significance thresholds, and found that MiST, SKAT-O, and KBAC have the highest mean power across simulated data, but that these tests had overall low power even in the cases of loci with relatively large effect sizes.
It is well-known in the population genetics literature that population expansions and contractions (i.e., demography) can dramatically affect genome-wide patterns of genetic variation in a population (Auton et al., 2009;Bhaskar, Wang, & Song, 2015;Gravel et al., 2011;Uricchio, Zaitlen, Ye, Witte, & Hernandez, 2016), and that the action of natural selection can amplify or inhibit the frequency of functional alleles (Boyko et al., 2008;Eyre-Walker, Woolfit, & Phelps, 2006;Lohmueller et al., 2011). Together, these evolutionary forces shape the genetic architecture of complex traits (Lohmueller, 2014;Uricchio et al., 2016), and are critical components to understand in the pursuit of identifying the genetic basis for the bevy of human phenotypes under study. Inferred demographic models of non-African human populations show a serial bottleneck model as populations migrated in waves across the globe, followed by explosive exponential growth since the dawn of agriculture. Moreover, studies of selection have found that most amino acid changes in proteins and changes in conserved noncoding loci are weakly deleterious (Boyko et al., 2008;Torgerson et al., 2009). Together, growth and selection have resulted in a preponderance of ultra-rare mutations (MAF < 0.1%), which contribute a plurality of heritability in gene expression (Hernandez et al., 2019), BMI (Wainschtein et al., 2019), and possibly other traits. Accounting for demographic and selective effects on the frequency spectrum of causal variation is therefore crucial in characterizing the statistical power of RVATs. However, while previous evaluations of RVAT power have attempted to mimic the frequency spectrum of observed variants, they typically use phenotype models (or genetic architectures) that do not directly account for evolutionary forces like demography and natural selection and are often biologically unrealistic (for example, effect sizes that are simple functions of the minor allele frequency; Wu et al., 2011), limited to specific relative risks (Wray & Goddard, 2010), or lack pleiotropy (Moutsianas et al., 2015).
Another vital component of designing genetic association studies is the method of acquiring genetic data. Although the gold standard for capturing rare variation remains deep whole-genome sequencing (WGS), the $1000 per genome cost still means performing WGS on any sizeable group of individuals remains prohibitively expensive for all but the largest consortia. Genotyping arrays make acquiring genetic data for a large number of individuals significantly less expensive, but lack coverage of rare variation. With larger WGS reference panels like the Haplotype Reference Consortium (HRC; McCarthy et al., 2016), large numbers of genotyped samples can be imputed to gain some insight into rare variation. With such large reference panels, imputation accuracy of genetic variation down to MAF ≅ 0.1% is near perfect in European individuals (Quick et al., 2019). As more diverse reference panels become available (for example, TOPMed; Taliun et al., 2019), imputation in non-European and admixed populations will also improve, particularly for rare variants. Capturing these rare variants using genotyping arrays and imputation is more cost-effective and can lead to many more individuals in a study. However, imputation is limited by the variants that are carried by the individuals in the reference panel, and by the accuracy of the algorithm being used. Imputation accuracy falls off at lower minor allele frequencies (MAF), but the use of large WGS reference panels reduces the threshold of acceptable imputation quality (r 2 > 0.3) to~0.004-0.006% (Taliun et al., 2019) in European and African populations. Despite these limitations, imputation has been used to identify rare variant associations in acute macular degeneration (Helgason et al., 2013), lipid levels in type 2 diabetes patients (Marvel et al., 2017), systemic lupus erythematosus (Martínez-Bueno & Alarcón-Riquelme, 2019), among others. It is possible that additional rare variant association signals can be found in imputed data as imputation quality improves, but it is unclear what the statistical properties of RVATs in this setting are.
Here, we evaluate the statistical power of rare variant association tests in a simulation study under different genetic architectures, methods of acquiring genetic data, and methods of selecting individuals to be a part of the casecontrol cohort. We demonstrate how statistical power of RVATs is dependent on genetic architecture as well as a sampling strategy for the case/control cohort. In particular, we find that sampling the extremes of a quantitative phenotype has the highest RVAT power, but power erodes quickly for all sampling strategies as the amount of genetic variance explained by rare variants increases.

| Simulating genomic sequence data
We simulate neutral genetic sequence data under a coalescent model using msprime (Kelleher, Etheridge, & McVean, 2016) with a European and African demographic history (Tennessen et al., 2012). Under this demographic model, the European population experienced a series of bottlenecks as they moved out of Africa and into Europe. These bottlenecks were followed by super-exponential growth in the European population and recent exponential growth in the African population, along with bidirectional migration. Using this neutral demographic model, we generate a 5 Mb region with a mutation rate of 1e−8 and with genetic map arbitrarily chosen to mimic chr22:17000000-22000000 in hg19.

| Simulating genotype data
Some analyses are based on genotype array data. To simulate a genotyping array, we downsample the simulated neutral sequence data above to match the allele frequency spectrum and the average distance between variants of the Illumina OmniExpress2.5 genotyping chip, used in the GoT2D study (Fuchsberger et al., 2016).

| Simulating quantitative phenotypes
We transform our simulated neutral genetic data into quantitative phenotypes using a three-step procedure, following Uricchio et al (Uricchio et al., 2016). First, we simulate functional variants using the forward simulator SFS_CODE (Hernandez, 2008;Uricchio et al., 2015) under the same demographic model as above, but with purifying selection. Specifically, we generate 2000 independent loci of length 100kbp (for a total of 200 Mb) with 100,000 individuals, where new mutations receive a fitness effect drawn from a gamma distribution (as inferred for nonsynonymous sites; Boyko et al., 2008). This procedure generates a large table of functional variants, with corresponding derived allele counts and fitness effects.
The second step is to project the allele frequencies of our list of functional variants down to the desired sample size (using a binomial model), and transform fitness effects to phenotypic effect sizes using the Uricchio et al. (2016) model. This model parameterized the correlation between fitness effects and phenotypic effect sizes (through ⍴) and the functional relationship between fitness effects and phenotypic effect sizes (through τ and δ). In particular, a causal variant with fitness effect s will have effect size z s as follows: Under this model, with probability ρ, the effect size z s of a site is a direct function of the site's fitness effect (s), otherwise, the effect size z s is a function of a randomly sampled fitness effect (s r ) drawn from the entire list of functional variants generated by the first step above. In this model, δ is +1 or −1 with equal probability to enable trait-increasing and trait-decreasing effects.
The third step for generating quantitative phenotypes is to identify the desired number of causal loci in our 5 mb simulated sequence. For each variant within the causal loci, we sample a random variant from our list of functional variants generated in step two with the exact same allele frequency and assign derived alleles at this causal site the effect size of the sampled functional variant. The quantitative phenotype of each individual (Y i , for the ith individual) is then generated under an additive model by summing the effect sizes of all causal alleles that they carry Where z j is the effect size of causal variant j, X ij is the number of causal alleles carried by individual i at site j, and ϵ is a Normal random variable with mean 0 and variance σ environment 2 (which ensures the desired level of heritability of the trait). See Table 1 for the specific values of ⍴, τ, and heritability that are evaluated in this study. In contrast to previous work with this phenotypic model (Uricchio et al., 2016), we will focus on dichotomous traits, and describe our sampling strategy for such traits below. (Tables 2 and 3) The quantitative phenotypes can be dichotomized to simulate three different sampling strategies: random, 50/ 50, and extremes. In the extreme sampling strategy, we sample the desired number of individuals from the top and bottom of the quantitative phenotype distribution. For the random and 50/50 sampling strategy, we first define the individuals with quantitative phenotypes in the top P% to be our population of cases (where P represents the prevalence of our trait of interest), and the remaining individuals to be our population of controls. We then sample cases and controls from their respective populations. For the random sampling strategy, we sample cases in proportion to the prevalence of the trait, while for the 50/50 sampling strategy we sample equal numbers of cases and controls. The random sampling strategy is used as a worst-case scenario to establish the worst possible power under that sampling strategy.

| Imputing genotyped data
In some analyses, we evaluate the effectiveness of genotype imputation. Such analyses require two data sets: the phenotype sample (e.g., case/control or continuous phenotype), and an imputation reference panel. The dichotomized phenotype individuals are generated as above, with their genetic data down-sampled to mimic a genotype array platform. We then sample an additional set of individuals from the total population to form the imputation panel. The down-sampled genotype data is then prephased using SHAPEIT2 (Delaneau, Marchini, & Zagury, 2012) and imputed using IMPUTE4 (Bycroft et al., 2018).

| Running tests of association on simulated data
We ran rare variant association tests (RVATs) using the rvtests software (Zhan, Hu, Li, Abecasis, & Liu, 2016). We focus on SKAT (Wu et al., 2011), SKAT-O (Lee et al., 2012), and KBAC (Liu & Leal, 2010), which were found to be most powerful in detecting disease-associated variation in a previous study (Moutsianas et al., 2015). We applied each RVAT to nonoverlapping analysis blocks of 10kbp across the simulated region, and computing power and false-positive rates for each test as the proportion of simulations with p-values below 2.5e −6. We ran logistic regression on each variant above MAF = 1% to determine associations with the phenotype using PLINK. The detection threshold was set at 5e−8. To compare GWAS to RVAT power, we evaluate if there is a variant under the GWAS p-value threshold within the 10 kb analysis block. If there is such a variant, we deem the GWAS to have found that analysis block to be causal for comparisons with RVAT.

| Calculating the cumulative genetic variance
We follow (Uricchio et al., 2016) in calculating V x , the genetic variance due to variants at or below allele frequency x, which is given by: Where f y ( ) is the site frequency spectra (SFS), that is the proportion of sampled alleles at frequency y, and E z |y ( ) 2 is the mean-squared effect size of variants at frequency y.
We pool 20 simulations of 300kbp in 50k African individuals using msprime to obtain an accurate measure of the SFS and the expected effect size of variants at frequency x. To normalize across genetic architectures, we divide by V 1 , which is the total additive genetic variance. The V V / 0.01 1 values (denoted as just V 0.01 below) are used to denote the degree to which rare variants (variants with MAF ≤ 1%) matter under a particular pair of parameters under the Uricchio genetic architectures.  Table 1). In Figure 1, we show that the proportion of genetic variance explained as a function of MAF. We focus on the genetic variance explained by variants with MAF < 1% (V 0.01 ), which varies dramatically between 99% when ⍴ = 1, τ = 1, to less than 1% when ⍴ = 0, τ = 0.5. We note that when τ = 1 and ⍴ ≠ 0, rare variants constitute a substantial fraction of the genetic variance (V 0.01 > 40%), and the majority of the rare variant contribution is explained by singletons in this simulated sample of 50,000 individuals. In contrast, when τ = 0.5 and ⍴ ≠ 0, V 0.01 ranges from~20%-60% but singletons are expected to make a more subtle contribution to the genetic architecture of the trait.

| Statistical power varies
dramatically across different study designs, genetic architectures, and polygenicity, but not across RVATs Figure 1 shows that rare variants can contribute substantial heritability to a trait under certain genetic architectures. Now we ask if we can detect the loci that harbor the causal rare variants using existing RVATs. To quantify the effects of genetic architecture and study design on the statistical power of RVATs, we focus on KBAC, SKAT, and SKAT-O, which represent each of the three major categories of RVATs and have been shown to be among the most powerful (Moutsianas et al., 2015). For the 5 Mb region we simulated (see Methods), we raster over parameters in genetic architecture (heritability, number of causal loci, and the relationship between selection and phenotypic effect sizes; Table 1) and in study design (sequencing vs genotype imputation and selection of individuals in the case/control vs extreme phenotype panels). In Figure 2, we show the global overview of statistical power across all simulations. We find that the statistical power of all three RVATs is similar regardless of simulated parameters but tend to be highest with SKAT As expected, power is higher when the causal signal is more concentrated (e.g., when heritability is high or the effect sizes are large due to few causal loci). Given the correspondence among tests, we will focus on SKAT-O in further analyses.

| As rare variants explain more genetic variance of the trait, SKAT-O power decreases
We then ask how SKAT-O power changes as a function of genetic architecture. In Figure 3, we show that as V 0.01 increases (i.e., as rare variants explain increasing amounts of the genetic variance of the trait), the power of SKAT-O decreases. This pattern holds across all sampling strategies and for different levels of polygenicity ( Figure S3). These results show patterns that will repeat in future sections: the extremes study design demonstrates the best overall power, followed by 50/50 and then random. Further, a more concentrated signal (higher heritability and/or lower number of causal loci, see Supporting Information Figures) improves power. We found that as the functional form relating effect size to selection coefficient changes from τ = 0.5 to τ = 1, power increases slightly again, suggesting that V 0.01 may be an overly simplistic characterization of the genetic architecture. Finally, applying SKAT-O to imputed data (bottom facet) reproduces all of the patterns we see when RVATs are applied to sequencing data (top facet), albeit with slightly worse power.
3.4 | Using extreme cases and controls as a sampling strategy improves the statistical power of SKAT-O The number of individuals sequenced as part of a study is a key design parameter of that study. To understand how increasing the number of individuals improves the statistical power of SKAT-O, we simulate across genetic architectures and study designs to find the increase in power per individual using SKAT-O from 2,500 to 20,000 individuals ( Figure 4). In the extremes study design, where half of the individuals in the panel are selected from the extreme cases and half of the individuals are selected from the extreme controls (from a total population of 50,000 individuals), we find that mean power gain is zero. Increasing the number of individuals in this design means more individuals are drawn from closer to the mean of the distribution, so power is already maximized with a smaller sample of 2,500 individuals (and may actually decrease under some scenarios). In the random and 50/50 study designs, increasing the size of the case/control panel increases the number of relevant individuals, and so mean power gain is approximately 2e−5 per individual added. This increase is highly dependent on the genetic architecture underlying the trait of interest.
3.5 | RVATs perform nearly as well on imputed data as they do on sequence data Most genetic association studies have started with genotyping arrays to collect genomic data, followed by imputation against WGS reference panels to maximize discovery potential with single variant analyses. As WGS cost falls, more studies will conduct large-scale WGS, but here we ask if there is a potential opportunity to discover rare variant associations with imputed data. In Figure 5, we compare the mean power of SKAT-O when applied to genotypedthen-imputed samples to the mean power of SKAT-O applied to sequencing data from the same samples. We find that the decrease in power is minimal. Indeed, we find a robust linear relationship between RVAT power with sequencing versus imputed data, suggesting that for all scenarios evaluated here, imputation loses 10% power, on average, compared to sequencing data.

| RVATs under a GWAS peak
The general process of discovering genetic associations typically begins with genotyping and imputing a sample of individuals, followed by GWAS. The (typically unknown) genetic architecture of the trait determines the likelihood that a common variant will be detected with GWAS, and whether a rare variant association signal should be expected.
Rastering over parameters of our phenotype model, a genome-wide significant single marker association (GWAS) was identified at 44.4% of causal loci. Figure 6 shows the power of SKAT-O using sequencing or imputed data conditional on seeing (circles) or not seeing (x's) a statistically significant GWAS hit at a causal locus. We find that under all phenotype model parameters and sampling strategies evaluated, when a GWAS hit is identified, SKAT-O has at least 70% power to detect a rare variant signal with sequence data (and slightly less power with imputed data). If no GWAS peak is identified, there is considerably less power to identify a rare variant signal (and power further erodes as the genetic variance explained by rare variants increases). We then mimic the process of first doing locus discovery on a sample of imputed individuals followed by sequencing for different sampling strategies. In Figure 7, we show that sequencing data has at least 75% power to replicate causal loci identified with imputed data (regardless of the genetic architecture and case-control sampling strategy). However, when no association is found with imputed data, the power to identify causal loci with sequencing data is highly dependent on the case-control sampling strategy, and the overall heritability and genetic architecture of the trait (with power generally decreasing as V 0.01 increases).

| Window of discovery around causal loci
In Figure 8, we plot the probability of SKAT-O detecting an association signal as a function of the distance from a  Table 1 using 50,000 African individuals simulated under an Out-of-Africa demographic model. Each point represents a genetic architecture tested with 10 independent simulations under the RVAT indicated; lines connect the same simulated parameters across RVATs to show that, generally speaking, the rank of statistical power is preserved across RVATs causal locus. To benchmark the width of this discovery window, we use the full-width half-maximum statistic, which is the distance at which the probability of a significant association crosses below 50% of its maximum value (i.e., falls below 50% of the power estimated at the causal locus). Consistent with previous results, the fullwidth half-maximum is largest when there is a large amount of heritability concentrated in few causal loci and under the extremes study design. The larger points in Figure 8 represent this window of discovery, which is, on average, 34.3 kb (sd 18.4 kb) in the random study design, 42.8 kb (sd 19.3 kb) in the 50/50 design, and 64.3 kb (sd 34.2 kb) in the extremes design.

| DISCUSSION
GWAS so far have produced thousands of SNP associations for hundreds of traits (Witte, 2010). However, in these GWAS, the associated SNPs do not recapitulate the estimated heritability of the trait, leading to the problem of "missing heritability". Though there are many proposed sources of this missing heritability, one popular hypothesis is that this missing heritability resides in rare variants. This has led to the development of RVATs and massive investment in large whole-genome sequencing studies. With these tests and this data becoming more and more prevalent, we look at how to optimize the F I G U R E 3 The statistical power of SKAT-O across different sampling strategies (columns) and across different sequencing methods (rows), as a function of the proportion of genetic variance explained by that genetic architecture at MAF = 1%. Each point represents 20 independent simulations of 100 causal loci of 10 kb each across a 5 Mb simulated region for a given genetic architecture for a 50,000 individual African population design of a rare variant association study to maximize power.
It is clear that RVATs can be very powerful for detecting associations under simple genetic architectures (like when the effect size is proportional to log 10 (MAF) as proposed by Wu et al., 2011). Such phenotype models do not take into account evolutionary forces like natural selection and demography, and it is well appreciated that genetic architectures are sensitive to these nonequilibrium evolutionary forces (Gazave, Chang, Clark, & Keinan, 2013;Simons, Turchin, Pritchard, & Sella, 2014). Uricchio et al. (2016) presented a phenotype model that accounts for selection and pleiotropy and showed that existing RVATs struggle at realistic variance explained in genes across different human demographic histories. The Uricchio model captures modularity through the parameter ρ and the relationship between selection and effect size through τ, which enables a thorough exploration of different genetic architectures a trait could have (Figure 1).
We showed analytically that there is a significant amount of genetic variance explained in rare variants across different τ (ρ, ) parameterizations under the Uricchio model (Figure 1), particularly when τ is equal to 1. These results are not surprising, as it has been shown that a substantial amount of heritability derives from rare variants in real traits like gene expression (Hernandez et al., 2019), height and BMI (Wainschtein et al., 2019). Taken together, the significant amount of heritability explained by rare variants under different parameterizations of the Uricchio model shows that RVATs have the potential to associate much of the causal variation underlying a complex trait. However, this model has only been studied in the context of continuous traits. We extend this model to study dichotomous traits (with case/control and extreme phenotype sampling strategies).
Many existing RVATs were thoroughly characterized by (Moutsianas et al., 2015). We chose the most powerful representatives of the three classes of RVATs to use in our study: A variance-component test (SKAT), a burden method (KBAC), and a combined method (SKAT-O). Across all genetic architectures and study designs, we found that SKAT-O is the best performer, so we used SKAT-O in all further analyses on RVAT power in a case/ control association study.
To run a case/control association study, the first step is to determine which individuals to select for your study, and how to acquire their genetic data. We simulated three different sampling strategies: randomly sampling cases and controls proportional to the trait prevalence; sampling half of your study size from cases and half from your controls; and sampling individuals from the extreme tails of a quantitative distribution. Our results show that genetic architectures using imputed data compared to using sequence data. Each point represents a different simulated genetic architecture where we vary the number of causal bins (10 or 100), heritability (0.2 or 0.8), sampling strategy, (⍴, τ) for the underlying phenotype distribution, and the number of simulated case/control individuals in the study choosing from the tails of an underlying quantitative distribution produces the best power (such as sequencing individuals with the highest/lowest high-density lipoprotein cholesterol; Cohen, 2004, or bronchodilator response;Spear et al., 2018). This means for any case/ control association study, spending some time to find the extreme tails of an underlying quantitative distribution for a trait will likely produce the best possible RVAT power (as previously argued using more constrained simulations; Barnett, Lee, & Lin, 2013).
We considered two ways of acquiring genetic data: using a genotyping array followed by imputation against a large reference panel, and direct sequencing of your study sample. Although a $1,000 whole genome is now possible, over the sample sizes required for an effective rare variant association study, the cost is prohibitive except for the largest consortia. Using genotyping arrays then imputing is still much less expensive than WGS (Quick et al., 2019), which could enable more than 5× more genotyped samples than WGS samples.
Applying SKAT-O to imputed data is expected to have lower power for several reasons. First, imputation accuracy decreases as MAF decreases (Howie, Marchini, Stephens, & Chakravarti, 2011;Quick et al., 2019), meaning fewer rare variants will be accurately imputed and correctly identified in the study sample. Second, imputation accuracy is highest when the study sample population and the reference panel population match, F I G U R E 6 The statistical power of GWAS given the results of SKAT-O, across different sequencing methods (rows) and across different sampling strategies (columns), as a function of the cumulative genetic variance explained by variants under 1% minor allele frequency. The shape shows the prediction of SKAT-O; the colors show the underlying number of causal loci and heritability of the trait. GWAS, genome-wide association studies and this is not guaranteed to be the case, particularly when the study sample is from a minority population or an admixed population. Third, a majority of rare variants carried by the imputed samples are unlikely to be carried by the reference panel.
Comparing SKAT-O power across genetic architectures and study designs, we show that genotyping then imputing is about 90% as powerful as WGS using the same number of individuals. This implies that using genotyping then imputing with a larger sample size could produce as much if not more power than a smaller WGS sample. For most current rare variant association studies, our results suggest that using genotyping then imputing is the best way to start. We also looked at the increase in SKAT-O power using WGS after running a genotyping and imputation study; there is a boost in SKAT-O power when using WGS data following imputed data, but the trade-off between cost and power is something to be considered on an individual study basis. The next step in characterizing RVAT power is to consider the genetic architecture of the trait of interest. Though complex trait architectures are not thoroughly understood, we used the Uricchio model to simulate different architectures and label these architectures using the amount of cumulative genetic variance explained by all variants under 1% minor allele frequency (V 0.01 ). We show that SKAT-O power decreases as V 0.01 increases, meaning SKAT-O performance is worst when rare alleles make the largest contributions to trait variance. Although counterintuitive, as one would expect RVATs are best F I G U R E 7 SKAT-O power using sequencing data, given the results of SKAT-O applied to imputed data. The shape indicates whether SKAT-O applied to imputed data correctly identified the causal locus (circles) or missed it (x). The colors show the underlying causal number of loci and heritability of the trait tuned for the scenarios where rare variants matter most in the genetic architecture, our result mirrors the findings of Uricchio et al. (2016). One explanation is that as V 0.01 goes up, the proportion of V 0.01 due to singletons and other ultra-rare alleles increases as well, and statistically associating these ultra-rare alleles is difficult in the RVAT frameworks we evaluated here. We also note that the explosive exponential growth of the Tennessen demographic model used to simulate genetic data leads to an excess of ultra-rare alleles compared to the neutral expectation, such that both cases and controls harbor many ultra-rare variants (thereby confounding RVAT power).
With the decrease in power as rare variants mattered more, we wondered whether nearby regions in rare variant-dominated architectures would provide additional information. We looked at how the probability of SKAT-O detecting a causal region decreases as a function of distance from a causal region. The results suggest that in an unbiased window-based approach to scanning the genome with SKAT-O, positive hits that are not in causal regions may be useful in helping identify true causal regions, although again only in genetic architectures where rare variants do not contribute the majority of genetic variance. Interestingly, the power ranking of study designs is inverse of the ranking of precision, meaning that with a higher power comes a larger window of discovery.
We also looked at the statistical properties of a common analytical path from GWAS to RVATs, and F I G U R E 8 The window of discovery around causal loci, shown as the fraction of simulations that result in a statistically significant RVAT p-value as a function of distance from the nearest causal locus. Different sampling strategies are shown in columns, and V 0.01 thresholds are shown in rows. Error bars are binomial standard errors of the mean. Bigger points represent full-width half-maximum points from imputed data to sequence data. We found that GWAS and SKAT-O are generally concordant, with causal regions identified by GWAS being identified by SKAT-O, whereas a smaller proportion (~15%) of causal regions are identified by SKAT-O and not by GWAS. We see little downside in testing for causal regions using SKAT-O following GWAS, with the ability to pick up additional causal regions on the same data. We caution that this effect declines significantly as rare variants explain more of the genetic variance.
Finally, the number of loci contributing to a trait (or its polygenicity) may be another important component of the trait's architecture. It is not surprising that we found that for a fixed heritability of a trait, RVAT discovery power is higher when there are fewer true causal loci (as effect sizes are concentrated into fewer variants). However, it is possible that the polygenicity of a trait could be constraining the possible range of genetic architectures.
This study has a few limitations. It is based on simulated data that matches inferred human evolutionary history (including selection and demographic history) but these models and simulations are incomplete representations of nature. We do not explore the effects of gene size, mutation rate, haplotype length, or degree of linkage disequilibrium between causal regions. We do not consider the differences between coding and noncoding regions, which have different selection coefficient distributions and potentially different contributions to the genetic architectures for a trait. Future work should consider a phenotype model where the function of a region is taken into account, as ENCODE (The ENCODE Project Consortium, 2012) and other consortia are rapidly adding more dimensions to genomic data. One major shortcoming is that we analyze only African and European populations in this study. With significant growth in admixed populations already happening-the US Census in 2014-15 predicts that the US will be a "majority-minority" country by 2050 (Projections of the Size & Composition of the U.S. Population: to, 2014Population: to, , 2014Population: to, , 2060Population: to, , 2014, meaning significant growth in African American and Latino populations-it will be important to study association testing power in admixed populations. We also believe that incorporating functional annotations, evolutionary forces, and admixture into rare variant association tests would significantly improve statistical power.