Strategies in Aggregation Tests for Rare Variants

Genome-wide association studies (GWAS) have successfully identified numerous common variants involved in complex diseases, but these findings explain only a limited share of heritability. Advances in high-throughput sequencing technology have made it possible to assess the contribution of rare variants to common diseases. However, the study of rare variants introduces challenges due to their low frequency: well-established common variant methods are underpowered to identify rare variants in GWAS. To address this challenge, several new methods have been developed to examine the role of rare variants in complex diseases. These approaches are based on testing the aggregate effect of multiple rare variants in a predefined genetic region. Provided here is an overview of statistical approaches, along with protocols explaining step-by-step analysis of aggregation tests, with hands-on experience using R scripts, in four categories: burden tests, adaptive burden tests, variance-component tests, and combined tests. Also explained are the concepts of rare variants, permutation tests, kernel methods, and genetic variant annotation. At the end, we discuss the relevant topics of bioinformatics tools for annotation, family-based design of rare-variant analysis, population stratification adjustment, and meta-analysis. © 2023 The Authors. Current Protocols published by Wiley Periodicals LLC.


INTRODUCTION
During the past two decades, genome-wide association studies (GWAS) identified thousands of genetic factors associated with common diseases and traits (Visscher et al., 2017). The design of GWAS was suited to testing the common disease/common variant hypothesis, under which disease associations were expected to be driven by common alleles with small effect sizes. However, GWAS findings in almost all common diseases explain only a small proportion of disease heritability (Emmanuelle, 2020). The current prevailing opinion is that this "missing heritability" is hidden in rare variants with relatively large effects on the risk of disease (Kosmicki et al., 2016; Zuk et al., 2014).
"common variant" association tests were underpowered for "rare variants" unless sample size or effect sizes are large (Lee et al., 2014).To increase the power in rare-variant association studies new approaches have been developed.These approaches are based on testing the cumulative effect of rare variants by grouping them in a specified genetic region (Chen & Wang, 2019;Dering et al., 2011;Lee et al., 2014;Sazonovs & Barrett, 2018).Notably, annotation of variants plays a critical role in selecting variants and specifying the appropriate genetic "testing" regions (SNP sets, genes) to test the aggregate effect of rare variants (Watanabe et al., 2017).Aggregation tests with the grouped rare variants reduce the multiple testing burden and increase the signal by the accumulation of the individual variant signals (Povysil et al., 2019).
In this article, we provide an overview of statistical methods for rare-variant association testing in complex diseases. We discuss the selection of rare variants and the determination of testing units using the variants' functional annotation. We review commonly used methods for rare-variant analysis, grouped into four categories: burden tests, adaptive burden tests, variance-component tests, and combined tests. We describe the application of rare-variant association tests in simulated data. Finally, we provide protocols explaining step-by-step analysis of aggregation tests using R scripts and an open-source software package, SKAT, written for R (statistical software environment) (Lee et al., 2020; R Core Team, 2022). In the commentary, we address significant topics related to bioinformatics tools for annotation, family-based designs for rare-variant analysis, population stratification adjustments, and meta-analysis techniques.

Rare Variants
The definition of "rare variant" differs from one study to another. For example, Cirulli and Goldstein (2010) classified variant alleles based on their population minor allele frequency: variants with allele frequencies over 5% were classified as "common variants", those with allele frequencies between 1% and 5% as "less common", those with allele frequencies less than 1% as "rare variants", and variants restricted to probands or immediate relatives as "private variants". On the other hand, Zuk et al. (2014) categorized variants as "common" or "rare" based on whether their frequency makes it practical to test the variants individually within a given study.

Permutation Tests
A permutation test is a statistical tool for building sampling distributions by resampling the observed data under the strong null hypothesis that an individual variant or a set of variants has no effect on the trait/disease. The permutation approach provides an efficient way to test a null hypothesis when the data do not conform to distributional assumptions. Specifically, the sampling distribution of the test statistic is estimated by randomly shuffling the trait/disease outcome in the observed dataset. If the null hypothesis is false, the real data should look different from the datasets created by shuffling the outcome under the null hypothesis (Wasserman, 2004).
Formally, consider two independent samples X_1, X_2, ..., X_n and Y_1, Y_2, ..., Y_m that follow the distributions D_X and D_Y, respectively. The null hypothesis to be tested is that the two samples are identically distributed, H_0: D_X = D_Y. Let T be a test statistic and t_obs be the value of the test statistic for the observed data, T(X_1, ..., X_n, Y_1, ..., Y_m). The samples from both groups are then pooled, and for each permutation of the data (N = (n + m)! permutations in total) the test statistic is calculated, giving T_1, T_2, ..., T_N. Under the null hypothesis, all of these test statistic values are equally likely. If the test statistic of the observed data (t_obs) is greater than (1 − α) × 100 percent of the permuted test statistics (T_i), we can reject the null hypothesis (α is a predefined significance level). The statistical significance (p-value) of the permutation test can be calculated as in equation 1, where I(·) is an indicator function that counts the permuted test statistics greater than or equal to the observed test statistic:
p = (1/N) Σ_{i=1}^{N} I(T_i ≥ t_obs)   Equation 1
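As an illustration of these steps, the following minimal sketch in base R tests a difference in means between two simulated samples; the sample sizes, the statistic, and the number of random permutations (used in place of all (n + m)! permutations) are illustrative choices, not part of the protocol.

set.seed(123)
x.smp <- rnorm(30, mean = 0)               ## sample of size n = 30 from D_X
y.smp <- rnorm(25, mean = 0.5)             ## sample of size m = 25 from D_Y
t.obs <- abs(mean(x.smp) - mean(y.smp))    ## observed test statistic t_obs
pooled <- c(x.smp, y.smp)                  ## pool both groups
t.perm <- replicate(10000, {
  idx <- sample(length(pooled))            ## shuffle which observations fall in each group
  abs(mean(pooled[idx[1:30]]) - mean(pooled[idx[-(1:30)]]))
})
p.value <- mean(t.perm >= t.obs)           ## equation 1
p.value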

Kernel Method
The kernel method is a way to transform data with a complex nonlinear relationship into a higher-dimensional feature space where the relationship between data points can be modeled easily. For example, the dataset illustrated in Figure 1A shows a nonlinear relationship between red and blue dots, which cannot be linearly separated in two-dimensional space. We can transfer the two-dimensional dataset to a three-dimensional feature space using a map function φ (φ: R^2 → R^3), as illustrated in Figure 1B.
To better understand the kernel method, let us examine the example illustrated in Figure 1, where we have mapped our observed data from R^2 to R^3 using a map function φ, for example φ(a) = (a_1^2, √2 a_1 a_2, a_2^2). The kernel function κ(a, b) is a function that corresponds to the inner product of any two points φ(a) and φ(b) in the feature space R^3, κ(a, b) = φ(a)^T φ(b). If we take two points φ(a) and φ(b) from our feature space and calculate their inner product, we find that it equals the square of the inner product in the input space R^2, φ(a)^T φ(b) = (a^T b)^2. So, instead of explicitly mapping our dataset through the map function φ and calculating the inner product, we can use a kernel function directly on the input-space data.
Formally, consider a statistical model with n observed data points. Let x_i be the multidimensional observation (x ∈ R^q) and y_i be the corresponding quantitative trait for the i-th subject. Then a linear regression model can be formulated as in equation 2:
f(x) = w^T x + b   Equation 2
Here f(x) is a function that models the relationship between x and y, and w and b are the parameters of the model: w is a multidimensional weight vector (w ∈ R^q) and b is an intercept. The relationship between the observed data points and the quantitative trait can be highly nonlinear, in which case the linear model fails to capture the existing nonlinearity. To overcome this problem, we can transform the observations {x_i} from their original input space to a higher-dimensional feature space by a feature map φ. Then, by substituting w^T x in our model with w_φ^T φ(x), where w_φ is the weight vector in the feature space, we can predict y_i by linearly combining basis functions φ(x). However, this can be computationally expensive when the observations are transformed into a very high-dimensional space. Kernel methods use the "kernel trick" to avoid this problem, allowing us to work with the original R^q data without ever computing the coordinates of the data in the feature space.
We can define the weight vector as a linear combination of the observations in the kernel feature space Φ(x) = {φ(x_1), φ(x_2), ..., φ(x_n)} with the coefficient vector α, that is, w_φ = Σ_{i=1}^{n} α_i φ(x_i) = Φ(x)α. If we substitute w_φ into equation 2, we get:
f(x) = Σ_{i=1}^{n} α_i φ(x_i)^T φ(x) + b   Equation 3
Now we can apply the "kernel trick": we can replace φ(x_i)^T φ(x) with a kernel function κ(x_i, x), which is the inner product of φ(x_i) and φ(x) (equation 4):
f(x) = Σ_{i=1}^{n} α_i κ(x_i, x) + b   Equation 4
We can see that f(x) can be derived as a linear function in the kernel space, while the kernel function is a function of the observed data.
Thus, by using an appropriate kernel function, we can map our dataset onto a kernel space where the nonlinear pattern of the original data can be handled using linear approaches.
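As a quick numerical check of this idea, the sketch below uses the canonical map φ(a) = (a_1^2, √2 a_1 a_2, a_2^2) discussed above and verifies that the kernel κ(a, b) = (a^T b)^2 reproduces the feature-space inner product without ever mapping to R^3; the two example points are arbitrary.

phi <- function(a) c(a[1]^2, sqrt(2) * a[1] * a[2], a[2]^2)  ## explicit map R^2 -> R^3
kappa <- function(a, b) sum(a * b)^2                         ## kernel evaluated in R^2
a <- c(1.5, -2); b <- c(0.3, 4)
sum(phi(a) * phi(b))   ## inner product computed explicitly in the feature space R^3
kappa(a, b)            ## same value, computed without leaving the input space R^2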

STRATEGIC APPROACHES

Target Region-Based Aggregation Tests
The primary idea behind aggregation tests is that if multiple variants in a gene or region are associated with the disease, then evaluating their cumulative effects will boost the power of the association analysis. Recently, several powerful statistical approaches have been developed to assess the cumulative effect of rare variants in a predefined genetic region/gene (Derkach et al., 2013; Ionita-Laza et al., 2011; Lee et al., 2012; Madsen & Browning, 2009; Morgenthaler & Thilly, 2007; Morris & Zeggini, 2010; Wu et al., 2011). We categorize them into four groups: burden tests, adaptive burden tests, variance-component tests, and combined tests (Dering et al., 2011; Lee et al., 2014). The power of these tests depends on the assumptions underlying the genetic model.
In this section we give a brief overview of each category of gene/region-based aggregation tests, discuss their pros and cons, and provide protocols explaining step-by-step analysis of commonly used tests using R scripts and an open-source software package, SKAT, written for R (Lee et al., 2020; R Core Team, 2022).
Throughout the article, different colors are used in the R code sections to enhance understanding and readability. Comments and explanatory notes are highlighted in magenta, offering contextual insights without being part of the executed code. The core structure, including variable assignments and general syntax, is presented in black. Built-in R functions and their associated parameters are emphasized in bold in a lighter shade of blue, guiding readers through specific operations. Additionally, constants and specific numerical values within the code are marked in a darker shade of blue, helping delineate the exact values used in computations. This color-coding system is applied consistently across all R code segments in the document, aiding in breaking down the code for those who may be less familiar with scripting in R.
A computer running the R software environment, available for Unix, Windows, and macOS from http://www.r-project.org, is required [at least 4 GB RAM (more is recommended for larger datasets), a 2 GHz or faster processor, and at least 2 GB of disk space]. The SKAT package can be installed directly from the Comprehensive R Archive Network (CRAN) within R:

install.packages("SKAT")
The dependency packages "Matrix" and "SPAtest" may be installed automatically when the SKAT package is installed. Otherwise, they can be installed separately using the function "install.packages()". As an example, we first introduce a dataset generated for the various aggregation tests. We generated a genetic region with 20 variant sites in 2000 subjects with equal numbers of cases and controls.
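The published protocol generates this dataset with its own scripts, which are not reproduced here; the following is a minimal stand-in sketch, assuming a null region (no built-in disease effect) and illustrative minor allele frequencies between 0.1% and 3%. The objects Z (genotype matrix) and y (case/control status) defined here are reused by the sketches in the later sections.

set.seed(2023)
n.var <- 20                                ## number of variant sites in the region
n.sub <- 2000                              ## total subjects (1000 cases, 1000 controls)
maf <- runif(n.var, 0.001, 0.03)           ## illustrative rare-variant frequencies
Z <- sapply(maf, function(q) rbinom(n.sub, size = 2, prob = q))  ## genotypes coded 0/1/2
y <- rep(c(1, 0), each = n.sub / 2)        ## 1 = case, 0 = control
dim(Z); table(y)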

Burden tests
Burden tests combine the information on the rare variants in the target region into a single genetic score and then perform a statistical test to assess the association between the score and disease. The burden approaches make the strong assumption that a large proportion of variants in the target region are causal and have similar effects in the same direction. If the selected rare variants include both risk and protective variants, or if only a small fraction of the variants are causal, then collapsing all variants in the target region introduces noise into the combined score, which weakens the association signal and causes a loss of power (Lee et al., 2014). Here we show how to implement three commonly used burden tests: the cohort allelic sums test (CAST), the rare variant test (RVT; also called the MZ test), and the weighted sum statistic (WSS) (Madsen & Browning, 2009; Morgenthaler & Thilly, 2007; Morris & Zeggini, 2010).

Cohort allelic sums test (CAST)
The CAST approach assumes that the presence of any rare variant in the target region increases risk. First, it collapses the genotype information across the rare variants into a genetic score (S), setting the genetic score to 0 if no rare variant is present (S = 0) and to 1 if at least one rare variant is present (S > 0). Finally, the number of individuals with one or more mutations in the target region is compared between cases and controls, for example with Fisher's exact test (Morgenthaler & Thilly, 2007), as sketched below.
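A minimal CAST sketch, assuming the simulated Z and y from above; cntgTable and f.ex follow the variable names used in the protocol's fisher.test() call.

S <- as.integer(rowSums(Z) > 0)        ## 1 if the subject carries at least one rare allele
cntgTable <- table(S, y)               ## 2 x 2 carrier-by-status contingency table
f.ex <- fisher.test(cntgTable)         ## Fisher's exact test of carrier status vs disease
f.ex$p.value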

MZ test
The MZ test is based on a regression framework that tests the accumulation of minor alleles across the rare variants in the target region. First, the MZ approach selects the rare variants using a predefined minor allele frequency threshold and calculates the proportion of minor alleles for each individual. There are two ways to calculate this proportion: (1) using the "count" of minor alleles ({0, 1, 2}) at each rare variant, or (2) using the "presence" of minor alleles ({0, 1}) at each rare variant. Finally, a regression approach is used to test the association between the genetic score and disease (Morris & Zeggini, 2010).
1. Define a minor allele frequency threshold to select rare variants (see the sketch after this step).
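A hedged sketch of the MZ workflow on the simulated data, assuming Z and y from above; the 1% MAF threshold and the "count" coding of minor alleles are illustrative choices.

maf.hat <- colMeans(Z) / 2                 ## sample minor allele frequency per variant
rare <- which(maf.hat < 0.01)              ## step 1: apply the MAF threshold
score <- rowSums(Z[, rare, drop = FALSE]) / (2 * length(rare))  ## proportion of minor alleles
fit <- glm(y ~ score, family = binomial)   ## regression of disease on the genetic score
summary(fit)$coefficients["score", ]       ## estimate, SE, z value, p-value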

Weighted sum statistic (WSS)
The WSS method is based on the assumption that variants with a lower frequency have a larger effect on disease risk. WSS assigns weighting terms to the variants within the target region and aggregates them into a genetic score for each individual. First, WSS calculates the weighting terms (w_i) and genetic scores (s_j) as defined in equations 5 and 6, where m_i is the number of minor alleles at variant i in the controls, n is the total number of controls, L is the total number of variants within the target region, and I_ij represents the number of mutations at variant i for individual j (I_ij ∈ {0, 1, 2}):
w_i = sqrt(n q_i (1 − q_i)), with q_i = (m_i + 1)/(2n + 2)   Equation 5
s_j = Σ_{i=1}^{L} I_ij / w_i   Equation 6
Then, all cases and controls are ranked together according to the genetic score (s). Next, the sum of the ranks of the cases (x) is calculated. Finally, a permutation test is performed to obtain the p-value. Figure 2 illustrates how the weighting term decreases as the minor allele frequency increases in our generated dataset with 1000 controls (Madsen & Browning, 2009).
1. Calculate the weighting terms for each variant (the sketch after this step covers the full workflow).
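A hedged WSS sketch following equations 5 and 6, assuming Z and y from above. For brevity, the weights are held fixed across permutations; the original method recomputes them from the permuted control group.

ctrl <- y == 0
m <- colSums(Z[ctrl, ])                    ## minor alleles per variant in controls (m_i)
n <- sum(ctrl)                             ## number of controls
q <- (m + 1) / (2 * n + 2)                 ## smoothed control allele frequency q_i
w <- sqrt(n * q * (1 - q))                 ## equation 5: weighting terms
s <- as.vector(Z %*% (1 / w))              ## equation 6: per-individual genetic score
x.obs <- sum(rank(s)[y == 1])              ## rank sum of the cases (x)
perm <- replicate(5000, sum(rank(s)[sample(y) == 1]))  ## permute case/control labels
p.value <- mean(perm >= x.obs)
p.value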

Adaptive burden tests
Burden test approaches are based on the strong assumption that the variants in the target region are causative with similar magnitudes of effect. This scenario is not always true, and settings in which risk, neutral, and protective variants are present together result in a substantial loss of power for burden tests. Adaptive burden test approaches were proposed to overcome this limitation. Adaptive approaches modify the burden tests in a data-adaptive way, making them less sensitive to the presence of neutral and protective variants and maintaining high power across most scenarios (Han & Pan, 2010; Ionita-Laza et al., 2011; Lin & Tang, 2011). We show the adaptive burden test called the replication-based test (RBT), which builds on WSS and uses an adaptive-weighting strategy to be less sensitive to variants with opposite directions of effect in the target region (Ionita-Laza et al., 2011). Other adaptive burden test approaches, e.g., the eSum test (Han & Pan, 2010) and the EREC test (Lin & Tang, 2011), have also been published and used, although less frequently than those described here.

Replication-based test (RBT)
The RBT modifies the WSS approach to deal with variants that may have different directions of association by classifying the variants into two distinct groups: those more likely to confer risk and those more likely not to. First, the RBT calculates the counts of minor alleles in cases and controls. It then calculates the data summary n_{k′k}, which records the number of variants with k′ minor alleles in cases and k minor alleles in controls. For example, n_{32} represents the number of variants with 3 minor allele counts in cases and 2 in controls. Then, the RBT defines a set of data-dependent weights w_{k′k} for each variant as shown in equation 7, where the number of mutations at the rare-variant sites is assumed to follow a Poisson distribution.
Then, the RBT calculates two test statistics, as shown in equation 8, for two different scenarios (S_risk, S_protective), based on the inclusion of only risk variants (k′ > k) or only protective variants (k′ < k). In equation 8, N_r stands for the upper threshold on the number of occurrences of the rare variants in controls. Because of the lack of prior information on the direction of association, the max-statistic S = max(S_risk, S_protective) is used to select the correct statistic. Finally, the RBT uses a permutation test to calculate the significance of the test statistic S (Ionita-Laza et al., 2011).
1. Calculate the minor allele counts in cases and controls (see the simplified sketch below).
Adaptive burden tests address the limitations of regular burden tests by choosing data-based weights that relax the "same direction and similar magnitude of effect" assumption. However, data-based weight calculations require permutation tests to assess statistical significance. Most adaptive burden tests estimate the p-value using a permutation test and are therefore not computationally feasible for genome-wide studies.
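The exact Poisson-based weights of equation 7 are not reproduced here; the following simplified, hedged sketch illustrates the RBT-style ingredients only: per-variant minor allele counts in cases and controls, unweighted risk and protective sums, and a permutation test of the max-statistic.

count.by.group <- function(Z, y) {
  k.case <- colSums(Z[y == 1, ])           ## minor allele counts in cases (k')
  k.ctrl <- colSums(Z[y == 0, ])           ## minor allele counts in controls (k)
  cbind(k.case, k.ctrl)
}
rbt.stat <- function(Z, y) {
  k <- count.by.group(Z, y)
  s.risk <- sum(k[k[, 1] > k[, 2], 1])     ## variants enriched in cases (k' > k)
  s.prot <- sum(k[k[, 2] > k[, 1], 2])     ## variants enriched in controls (k' < k)
  max(s.risk, s.prot)                      ## max-statistic S
}
s.obs <- rbt.stat(Z, y)
perm <- replicate(2000, rbt.stat(Z, sample(y)))  ## permute case/control labels
p.value <- mean(perm >= s.obs)
p.value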

Variance-component tests
Variance-component approaches were developed to overcome the burden tests' loss of power when the direction of effect of rare alleles differs or when non-causal variants are included within the test's target region. The variance-component approach evaluates the distribution of genetic effects between cases and controls and allows for testing of rare variants with opposite effects (Neale et al., 2011; Pan, 2009; Wu et al., 2011). The sequence kernel association test (SKAT) is a widely used approach that applies the variance-component test within a mixed regression model (Wu et al., 2011). SKAT allows for covariate adjustment and weighting of rare variants. Here we implement the SKAT approach using a mixed-model framework for dichotomous data.

Sequence kernel association test (SKAT)
To relate the rare variants for the i-th individual to the phenotype (y_i), SKAT considers the logistic regression model for a dichotomous disease shown in equation 9:
logit P(y_i = 1) = α^T X_i + β^T g_i   Equation 9
In the model for individual i, X_i = (1, x_i1, ..., x_ik) is a vector of covariates, g_i = (g_i1, ..., g_il) represents the allele counts (0, 1, 2) for the l variants, α is a covariate coefficient vector of size k + 1 (the first element is the intercept), and β is a variant coefficient vector of size l. SKAT assumes that the coefficients β_1, ..., β_l are independent and normally distributed with mean 0 and variances w_1^2 τ, ..., w_l^2 τ. The null hypothesis for equation 9 is therefore H_0: τ = 0 (equivalently, H_0: β = 0), which can be tested with a variance-component score test. The SKAT statistic Q_S (equation 10) is a weighted sum of the individual variant score statistics, where the weight w_j can be calculated as a function of minor allele frequency and functional annotation in the observed dataset:
Q_S = Σ_{j=1}^{l} w_j^2 (Σ_{i=1}^{n} g_ij (y_i − ŷ_i))^2   Equation 10
In equation 10, n is the total number of individuals and ŷ_i is the estimated mean of y_i under the null hypothesis, obtained by regressing y on the covariates alone (logit ŷ_i = α̂^T X_i).
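The following lines fit SKAT on the simulated data, assuming Z and y from above; SKAT_Null_Model() and SKAT() are functions of the SKAT package, and the package's default beta(1,25) weights stand in for the annotation-informed weights discussed above.

library(SKAT)
obj <- SKAT_Null_Model(y ~ 1, out_type = "D")   ## null model; "D" = dichotomous outcome
skat.res <- SKAT(Z, obj)                        ## SKAT with default beta(1,25) variant weights
skat.res$p.value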
Combined tests

Fisher method test
The Fisher method test combines the p-values of a variance-component test (p_VC) and a burden test (p_Burden) into a single statistic, W_F = −2 log(p_VC) − 2 log(p_Burden) (equation 11; see Figure 3). Because the two component tests are not independent, the Fisher method performs a "novel" permutation distribution approach to assess statistical significance; there are several steps in the permutation approach, described in the following implementation lines. We use the SKAT approach for the variance-component test and the WSS approach for the burden test (Derkach et al., 2013).
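A hedged sketch of the Fisher combined statistic, assuming Z and y from above. For a self-contained example, the burden p-value is obtained from the SKAT package's burden test (r.corr = 1) as a stand-in for the WSS p-value used in the protocol, and the number of permutations is kept deliberately small.

library(SKAT)
fisher.W <- function(Z, y) {
  obj <- SKAT_Null_Model(y ~ 1, out_type = "D")
  p.vc <- SKAT(Z, obj)$p.value                   ## variance-component (SKAT) p-value
  p.burden <- SKAT(Z, obj, r.corr = 1)$p.value   ## burden-test p-value (r.corr = 1)
  -2 * log(p.vc) - 2 * log(p.burden)             ## equation 11
}
W.obs <- fisher.W(Z, y)
perm <- replicate(200, fisher.W(Z, sample(y)))   ## small run; increase in practice
p.value <- mean(perm >= W.obs)
p.value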
Optimal sequence kernel association test (SKAT-O)
SKAT-O proposes a unified test approach that combines a burden test and SKAT in one framework by taking weighted averages of the two statistics. Equation 12 shows the test statistic of SKAT-O, where ρ is a weight coefficient ranging from 0 to 1:
Q_ρ = ρ Q_Burden + (1 − ρ) Q_SKAT   Equation 12
The SKAT-O test reduces to SKAT (Q_SKAT) when ρ = 0 and to the burden test (Q_Burden) when ρ = 1.
To maximize the power of SKAT-O, the optimal weight coefficient ρ is estimated as the value that minimizes the p-value of the test. The SKAT-O method in the SKAT library performs a simple grid search over eight values, ρ = (0, 0.1^2, 0.2^2, 0.3^2, 0.4^2, 0.5^2, 0.5, 1). The significance of Q_SKAT-O can be calculated analytically using one-dimensional numerical integration. Here we use the SKAT package to calculate the SKAT-O test significance in the generated dataset (Lee et al., 2012).
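SKAT-O on the simulated data, assuming Z, y, and the SKAT package from above; method = "optimal.adj" is the package's documented setting for the SKAT-O grid search over ρ.

library(SKAT)
obj <- SKAT_Null_Model(y ~ 1, out_type = "D")
skato.res <- SKAT(Z, obj, method = "optimal.adj")  ## SKAT-O: grid search over rho
skato.res$p.value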


COMMENTARY

Background Information
The efficiency of aggregation-based approaches relies on the weighting and/or filtering criteria that enhance the signals of causal variants and minimize the effect of neutral variants in the target region (Bocher & Genin, 2020; Moutsianas et al., 2015; Richardson et al., 2016). The immense number of variants produced by sequencing and imputation approaches introduces challenges in prioritizing the functional variants in the target region and in testing their association with disease. Many bioinformatic tools with different annotation strategies have been developed to predict the functional "importance" of variants (Adzhubei et al., 2010; Kircher et al., 2014; Kumar et al., 2009; Yourshaw et al., 2015). Further studies are needed to estimate the right thresholds for different annotation approaches, to prioritize the variants that confer risk or protection, and to improve the statistical power of rare-variant analysis. It is important to note that applying hard thresholds to estimate and define functionally "important" variants may cause a loss of sensitivity in aggregation tests (Povysil et al., 2019).

Family-based studies
In this article we discussed rare-variant aggregation-based approaches that were initially formulated for unrelated case-control studies. On the other hand, many current complex disease sequencing studies focus on family-based designs. Multiplex families with several affected individuals have an increased genetic burden conferring risk of disease. Thus, family-based approaches provide a significant statistical boost in power to identify highly penetrant rare risk or protective variants. Over recent years, aggregation-based approaches that were primarily designed for population-based studies have been adapted to investigate the association of sets of rare variants with disease in family-based designs (Choi et al., 2014; He et al., 2017; Schaid et al., 2013; Sul et al., 2016; Wang et al., 2013; Wu et al., 2011; Yan et al., 2015). Fernandez et al. (2018) evaluated the performance of the most common aggregation-based rare-variant tests, such as EPACTS, FSKAT, GSKAT, RVGDT, SKAT, FarVAT, FarVAT-BLUP, and Rare-IBD, using a real dataset. To achieve optimum performance, the authors suggested combining different methods and looking for common signatures across the tests (Fernandez et al., 2018).

Ancestral diversity
Population stratification is a confounding factor that arises when association studies are conducted among ancestrally diverse individuals (Hellwege et al., 2017). Systematic differences in allele frequencies between distant ancestral populations may lead to spurious associations for both common and rare variants. Confounding by genetic ancestry is more likely to be a problem in ancestrally admixed populations, such as Hispanics and African Americans, due to their high ancestral diversity. Different strategies have been developed to control for population substructure (Luo et al., 2018). The use of principal components as covariates is the most prevalent approach to adjust for population substructure in association studies. Accordingly, in studies with ancestrally diverse populations, approaches that can incorporate covariates need to be considered to control for population substructure.

Combining studies with meta-analyses
Although aggregation-based approaches boost statistical power in rare-variant association studies, statistical tests often suffer from type II error because of small sample sizes. Performing a meta-analysis is a way to overcome sample size limitations by combining summary statistics from multiple studies. Recently, various meta-analysis approaches have been developed for aggregation-based association studies, such as MetaSKAT, MASS, RAREMETAL, seqMeta, metaFARVAT, and Trans-Meta Rare (Feng et al., 2014; Lee et al., 2013; Shi et al., 2019; Tang & Lin, 2013; Voorman et al., 2017; Wang et al., 2019). RAREMETAL, seqMeta, and metaFARVAT support family-based studies (Feng et al., 2014; Voorman et al., 2017; Wang et al., 2019). RAREMETAL can only be applied to quantitative phenotypes (Feng et al., 2014), and Trans-Meta Rare was developed to meta-analyze trans-ethnic datasets (Shi et al., 2019). Michailidou (2018) provides a detailed overview of the methods used for meta-analysis in rare-variant aggregation tests.
This article highlights critical concepts of rare-variant aggregation testing approaches needed to successfully perform a rare-variant analysis. To select and conduct an optimal rare-variant analysis, a basic understanding of the theory behind aggregation tests is essential. We presented an overview of some commonly used aggregation tests, together with step-by-step protocols for applying them.

Testing Approach Considerations
See Table 1 for a list of the various rare-variant aggregation testing approaches and their advantages and disadvantages.

Figure 1
Figure 1 Illustration of the idea of kernel methods. (A) The original data, which are not linearly separable. (B) The data mapped into a higher-dimensional space, where they are linearly separable.

Figure 3
Figure 3 The contour plot illustrates how the Fisher W score changes according to the p_VC and p_Burden values: W_F takes large values when one or both of p_VC and p_Burden are small, where W_F = −2 log(p_VC) − 2 log(p_Burden) (Equation 11).

Table 1
Advantages and Disadvantages of Rare Variant Aggregation Testing Approaches