General Framework for Meta‐Analysis of Haplotype Association Tests

ABSTRACT For complex traits, most associated single nucleotide variants (SNV) discovered to date have a small effect, and detection of association is only possible with large sample sizes. Because of patient confidentiality concerns, it is often not possible to pool genetic data from multiple cohorts, and meta‐analysis has emerged as the method of choice to combine results from multiple studies. Many meta‐analysis methods are available for single SNV analyses. As new approaches allow the capture of low frequency and rare genetic variation, it is of interest to jointly consider multiple variants to improve power. However, for the analysis of haplotypes formed by multiple SNVs, meta‐analysis remains a challenge, because different haplotypes may be observed across studies. We propose a two‐stage meta‐analysis approach to combine haplotype analysis results. In the first stage, each cohort estimate haplotype effect sizes in a regression framework, accounting for relatedness among observations if appropriate. For the second stage, we use a multivariate generalized least square meta‐analysis approach to combine haplotype effect estimates from multiple cohorts. Haplotype‐specific association tests and a global test of independence between haplotypes and traits are obtained within our framework. We demonstrate through simulation studies that we control the type‐I error rate, and our approach is more powerful than inverse variance weighted meta‐analysis of single SNV analysis when haplotype effects are present. We replicate a published haplotype association between fasting glucose‐associated locus (G6PC2) and fasting glucose in seven studies from the Cohorts for Heart and Aging Research in Genomic Epidemiology Consortium and we provide more precise haplotype effect estimates.


Introduction
In recent years, genome-wide association studies (GWAS) have identified multiple common variants associated with disease and disease-related traits. In a typical GWAS, associa- tion between a trait and genetic variants is tested one variant at a time, and variants with weak association routinely fail to be detected, especially in small cohorts. Therefore, metaanalysis is often used by large consortia to increase statistical power [Dupuis et al., 2010, Scott et al., 2012, Stram, 1996, Zeggini et al., 2008 to detect variants with a moderate to weak association with the trait of interest. Even with large meta-analyses, variants identified to date only explain a small proportion of the total heritability. In order to identify the source of the unexplained heritability, emerging approaches have attempted to account for multiple variants at once when evaluating association with a trait. Such approaches include penalized regression methods [Li et al., 2011, Wu et al., 2009], pathway analysis [Holden et al., 2008], gene-based tests such as burden [Madsen and Browning, 2009] and SKAT [Wu et al., 2010], and haplotype analysis [Liu et al., 2008, Schaid et al., 2002, Tregouet et al., 2004. The power of these approaches can be enhanced by increasing sample size or combining multiple studies. Methods for meta-analysis of gene-based tests are well established and widely used [Hu et al., 2013, Liu et al., 2014, but there are no widely used methods for the meta-analysis of haplotype association tests.
In this article, we propose a meta-analysis approach to combine haplotype association results from multiple studies. In the first step of our method, each study provides regression estimates and covariance matrices of haplotype effects, with adjustment for familial correlation to accommodate familial samples or cryptic relatedness. In our second step, cohort-specific haplotype effect estimates are pooled using a multivariate generalized least square meta-analysis approach. A global association test and evaluation of the effect of each haplotype can be obtained within our framework. We perform a simulation study to evaluate our approach, comparing results with more traditional meta-analysis of single-variant association tests and gene-based tests. Finally, we replicate a published haplotype association between a fasting glucose-associated locus (G6PC2) and fasting glucose in seven studies from the Cohorts for Heart and Aging Research in Genomic Epidemiology (CHARGE) Consortium and are able to provide more precise haplotype effect estimates than the prior report involving haplotype estimates from a single cohort [Mahajan et al., 2015]. Code implementing the novel approach, along with a tutorial, is available at http://sites.bu.edu/fhspl/publications/metahaplo.

Haplotype Association Test at Cohort Level
Our approach is based on Zaykin et al.'s [2002] haplotype analysis method for unrelated samples. We incorporate random effects to account for family structure, making the approach applicable to family-based cohorts, unrelated samples, or a mix of the two. We assume that a total of n subjects from a study are sequenced in a region with q SNVs and as a result, K haplotypes are observed. We assume a general linear (mixed-effect) model, written as: where Y is an n × 1 quantitative trait vector, X is an n × p matrix of covariates (without intercept) including, for example, age, sex, and associated genetic principal components controlling for potential population stratification, α is a p × 1 coefficient vector for the p adjustment variables, each n × 1 vector h m (m = 1, · · · , K ) is the expected haplotype dosage, b is an n × 1 random effect vector that accounts for the relatedness within families, and is an n × 1 vector of the ran-dom error terms. When haplotype m of the j th (j = 1, · · · , n) subject is observed, h mj , the j th entry in h m is either 0, 1, or 2, that is, the number of copies of haplotype m the j th subject carries. Otherwise, expected haplotype dosages E [h mj |G j ] are inferred from G j , the q × 1 genotype vector of the j th subject, using statistical algorithms such as the expectationmaximization (EM) algorithm [Dempster et al., 1977]. For the j th subject, the sum of the K haplotype dosages m=K m=1 h mj is always equal to 2. The n × 1 random effect vector b is assumed to follow a normal distribution N(0, σ 2 a ), where σ 2 a is the additive variance and is the relationship matrix (with entries equal to twice the kinship coefficient for related pairs and 0 for unrelated pairs) derived from pedigree structure or genome-wide information; in unrelated samples, the matrix reduces to I, the n × n identity matrix. Finally, we assume the vector of error terms follows a normal distribution N(0, σ 2 e I), where σ 2 e is the variance of the error term. Let X o = (X, h 1 , · · · , h K ) denote the overall design matrix of size n × (p + K ), and define the overall variance matrix as = σ 2 a + σ 2 e I. The parameters α and β k (k = 1, · · · , K ) are estimated as (X T oˆ -1 X o ) -1 X T oˆ -1 Y, whereˆ is evaluated at the maximum likelihood estimatesσ 2 a andσ 2 e , which can be obtained using the lmekin function in R's coxme package [Therneau, 2012]. The estimated variance of the effect estimates is (X T oˆ -1 X o ) -1 . The method reduces to an ordinary linear regression when applied to unrelated samples.

Meta-Analysis
We assume a total of N cohorts participate in the metaanalysis and the i-th (i = 1, · · · , N) cohort provides the estimatesβ i and the covariance matrixv ar(β i ) of the haplotype effects for K i haplotypes, and a total of K haplotypes are observed in at least one cohort. We propose a multivariate meta-analysis approach [Becker and Wu, 2007] based on generalized weighted least squares to combine the length K i haplotype effect estimates from each cohort, denoted byβ i for studies i = 1, · · · , N, into a single effect estimate vectorβ of length K . The generalized weighted least square approach is formulated as: whereβ i (i = 1, · · · , N) is the K i × 1 haplotype coefficient vector for cohort i; β is the stacked haplotype coefficient vector fromβ i (i = 1, · · · , N); β is the K × 1 coefficient vector of the haplotype effects; with zeros and one in each row indicating which haplotype effect is observed by cohort i; e is the error term which is assumed to have a multivariate normal distribution with a mean of 0 and a covariance matrix of Note that in the meta-analysis stage, cohort haplotypes are reordered to match the order assigned to the K haplotypes observed in at least one cohort, and the design matrix W reflects this reordering. Furthermore, because is unknown, in our method, we substitute the sample estimatê

Hypothesis Testing
The global null hypothesis of no association of any haplotype with the trait is expressed as To construct a test statistic to test for haplotype association, we reparameterize it into the equivalent null hypothesis, where β 1 is chosen from commonly observed haplotypes: The null hypothesis can be tested using a Wald test statistic of the form whereγ is estimated fromβ and V * is the covariance matrix ofγ, with a dimension of (K -1) × (K -1) and the jj th element having the form V * jj = V jj -V j 1 -V j 1 + V 11 . Under the null hypothesis, the Wald test statistic follows a χ 2 K -1 distribution asymptotically.

Cohorts for Heart and Aging Research in Genetic Epidemiology Consortium
The CHARGE consortium comprises multiple studies with the common goal of identifying genes and loci associated with cardiovascular-related traits. Seven CHARGE cohorts contributed to a meta-analysis evaluating the association between genetic variants and fasting glucose in 25,305 nondiabetic participants (Table 1). Fasting glucose levels in millimole per liter were analyzed in participants free of type-2 diabetes. Type-2 diabetes was defined by cohorts referring to at least one of the following criteria: a physician diagnosis of type-2  diabetes, on the antidiabetic treatment of type-2 diabetes, fasting plasma glucose ≥7 mmol/l, random plasma glucose ≥11.1 mmol/l, or hemoglobin A1C ≥ 6.5%. Study-specific sample exclusions were detailed in [Wessel et al., 2015]. Genotypes were obtained from the Illumina HumanExome BeadChip [Grove et al., 2013], a genotyping array containing 247,870 variants discovered through exome sequencing in ∼ 12,000 individuals, in which ∼ 75% of the variants are lowfrequency variants (Minor Allele Frequency (MAF) < 0.5%). The main content of the chip comprises protein-altering variants (nonsynonymous coding, splice-site, and stop gain or loss codons) seen at least three times in a study and in at least two studies providing information to the chip design. We selected four G6PC2 variants previously studied for their haplotype association with fasting glucose [Mahajan et al., 2015].

Simulation Studies
To evaluate the validity and power of our approach, we perform a simulation study varying the number of cohorts included in the meta-analysis (5 or 10), and the type of samples (unrelated, family-based, mix of the two). We also vary the sample size from 400 up to 1,600 subjects per cohort. See Table 2 for a description of the various study designs investigated in type-I error rate and power.
Simulated trait values are dependent on sex, age, and haplotypes/genetic variants (power evaluation only). Sex of  mothers and fathers (founders) are fixed in a heterosexual marriage but are randomly assigned to offspring, with equal probability. The age for unrelated individuals and the first offspring in a family are generated from a uniform distribution over the range 30 to 50. Additional offspring's ages are set to be within 5 years of the first offspring with at least a 1 year gap (no twins), using a uniform distribution. For family samples, the age of the mother is restricted to be 20-45 years older than her offspring, and the father's age to be within 5 years of the mother's age, with a restriction that the age be at least 20 years older than the older offspring. We select the known T2D-associated genes G6PC2 (chromosome 2; Tables 3 and 4) and JAZF1 (chromosome 7; Tables 5 and 6) to generate the reference panel haplotypes (Tables 3 and 4). We use the observed haplotypes and frequencies estimated by EM algorithm from 6561 participants from the Framingham Heart Study. For example, in JAZF1 no single haplotype has a frequency greater than 25% and eight haplotypes have frequency greater than 1% (Table 6).
Genotypes are simulated by randomly assigning a pair of haplotypes to founders, and by dropping randomly selected haplotypes to offspring assuming no recombination within haplotypes. Although phasing information is available in our simulation setting, we do not use the phase information when implementing our approach because such informa- tion is not typically available in real datasets. We use the EM algorithm to infer expected haplotype dosage conditional on genotypes via R package haplo.stats [Sinnwell and Schaid, 2013]. When estimating haplotype effects at the cohort-level, rare haplotypes (frequency< 0.1%) are collapsed to stabilize the computation and to avoid potential singularities due to high LD among SNVs.

Type-I Error Rate
For evaluating the type-I error rate of our new approach, a trait unassociated with the haplotypes is simulated using a multivariate normal distribution with meanμ = 0.02 × age + 0.5 × sex (sex is set to 1 for males and to 2 for females) and a covariance matrix σ 2 a + σ 2 e I , with σ 2 a = σ 2 e = 0.5. Age and sex explained about 10% and 5% of the trait variance, respectively, resulting in a trait with moderate heritability (h 2 = σ 2 a Var(Y) ≈ 0.42). Cohort-specific analyses are performed by first estimating haplotypes using the EM algorithm implemented in the R package haplo.stats, followed by regression analysis using haplotype dosages and covariates as independent variables. Cohort results are then meta-analyzed using the novel approach previously described, and the global association test P values are recorded. Ten thousand simulations are performed to assess the type-I error rate in all scenarios at the nominal threshold α = 0.01 (Table 2).

Power Evaluation
The power of our novel haplotype meta-analysis approach is evaluated in a total of 16 scenarios (phenotype datasets) divided into four study designs (study design 1-4 from Table 2), with varying haplotype or SNV effects. For each scenario, we first compute the meta-analysis haplotype global test statistic, and then compare to meta-analysis of both single variant tests and gene-based tests. For single variant tests, we compute the meta-analysis test statistic using inverse-variance weighted method that has been shown to be the most powerful when the effect size is constant across cohorts [Zhou et al., 2011]. We then select the SNP with the minimum meta-analysis P-value P min = min {1≤i≤K } P i (K = 4 for G6PC2; K = 5 for JAZF1) and adjust the meta-analysis P-value for multiple testing using a Bonferroni correction for the effective number of independent variants [Gao et al., 2008]. We denote the result for the best SNP in the single variant analysis by "min P". For gene-based tests, we choose SKAT and Burden test with Wu weights and perform the analysis using R package seqMeta [Voorman et al., 2014]. We use α = 0.001 to evaluate the power of all four approaches. For each scenario, the phenotype is simulated using a multivariate normal distribution with meanμ and a covariance matrix σ 2 a + σ 2 e I , with σ 2 a = σ 2 e = 0.5, but unlike the type-I error scenarios, the value ofμ depends on genotypes/haplotypes in addition to the covariates of age and sex. We investigate four genetic effect scenarios: one or two causal genetic variants, or one or two causal haplotypes. For the causal variant scenario,μ = 0.02 × age + 0.5 × sex + b g 1 × g 1 + b g 2 × g 2 where g j (j = 1, 2) is a vector containing the number of minor alleles (0, 1, or 2) carried by individuals in the sample, and b g j is the effect of variant j , set to R 2 4MAF j (1-MAF j ) , where MAF j is the minor allele frequency of variant j and R 2 = 0.01 is the proportion of variance explained by this specific variant (haplotype). When only one causal variant is included in the model, b g 2 = 0 and b g 1 is multiplied by √ 2. For the causal haplotype models, μ = 0.02 × age + 0.5 × sex + b h1 × h 1 + b h2 × h 2 , where h j is a vector containing the number (conditional dosage) of haplotype j carried by individuals in the sample, and b hj is the effect of haplotype j , set to R 2 2h j (1-h j /2) , whereh j is mean haplotype dosage of haplotype j and R 2 = 0.01. When only one causal haplotype is included in the model, b h2 = 0 and b h1 is multiplied by √ 2. For the JAZF1 gene, we select two haplotypes, GTATA (the most frequent haplotype) and GCGCG (the third most frequent haplotype), to have an effect on the phenotype while all other haplotypes have no effect on the phenotype. For models with single variant effects, we select rs849134 and rs38523 to have nonzero effect on the trait, while all other genetic variants have no effect. For the G6PC2 gene, we select CCAC and TCAG, the two most frequent haplotypes to have an effect on the phenotype. For models with single variant effects, we select rs560887 and rs2232323 to have nonzero effect on the trait.
A thousand simulations with five independent cohorts are performed to compare the power of our approach to the single variant method adjusted for multiple testing and gene-based methods.

Meta-Analysis of Four Coding Variants on G6PC2 Region
G6PC2 is a known locus to affect fasting glucose level. Among the 17 exonic variants on the exome chip, 15 are rare variants (MAF<1%) and two are common variants (rs560887 with MAF = 25.4%; rs492594 with MAF = 43.7%). Previous GWAS have identified the A allele of rs560887, one of the two common variants to be associated with lower fasting glucose level ([Bouatia-Naji et al., 2008]: β = -0.07 mmol/l, p = 6 × 10 -16 ; [Dupuis et al., 2010]: β = 0.075 ± 0.003 mmol/l, p = 8.5 × 10 -122 ). A recent large-scale exome-chip analysis indicated that these 15 rare variants also had a joint effect on fasting glucose [Wessel et al., 2015]. Our approach is applied to study the association between the haplotype structure of four coding variants rs560887, rs138726309, rs2232323, and rs492594 and fasting glucose, using CHARGE exome-chip data. We perform a meta-analysis of seven studies comprising of three family-based and four population-based cohorts with up to 25,305 non-diabetic European participants, to better understand how the overall haplotype structure as well as how the single haplotype affect fasting glucose level. With a meta-analysis sample size of 25,305, we have successfully replicated a previous reported haplotype analysis of four coding variants on G6PC2 region [Mahajan et al., 2015], but with higher precision (Table 7). Our effect size estimates are consistent with previously published estimates, in terms of both direction and magnitude. However, prior results were based on a single population-based cohort with 4,442 participants. In contrast, our analysis is based on seven cohorts with over 25,000 participants. Among the five haplotypes shared by all seven studies, one copy of the most significant haplotype, TCAG, decreases fasting glucose levels by 0.074 (95% confidence interval (CI): 0.063,0.085) mmol/l, on average; one copy of the second most significant haplotype, CCAG, increases the average fasting glucose levels by 0.039 (95% CI: 0.028,0.050) mmol/l; and one copy of the third most significant haplotype, TCCG, decreases fasting glucose levels by an average of 0.12 (95% CI: 0.065,0.18) mmol/l. Most Figure 1. Power of the haplotype meta-analysis approach compared to gene-based methods and single SNV meta-analysis (min P) adjusted for multiple testing in the G6PC2 region, evaluated at α = 0.001 in four study designs. Description of the four study designs used in the simulation can be found in Table 2 (study design 1-4). The labels on the x axes denote that 1 (SNV) or 2 (2SNVs) SNVs are influencing the phenotypes, or 1 (1HAP) or 2 (2HAPs) haplotypes have an effect on the phenotypes. haplotype effect estimates reported in Mahajan et al. [2015] fall within our 95 % CI, with the exception of estimates for TCCG (Mahajan et al.'s [2015] estimates = 0.205), which fall just outside our reported CI.

Simulations
Ten scenarios with increasing diversity in the study designs of the cohorts included in the meta-analysis are simulated to evaluate type-I error rate of our approach. The type-1 error rate is well controlled in all scenarios investigated (Table 2).
In the simulations to evaluate power, our approach is shown to be almost as powerful as the single SNV approach when SNVs are influencing the trait, but much more powerful to detect true haplotype effects. For example, in the family based design scenarios, our approach is 40% more powerful than single SNV analyses when two haplotypes have nonzero effect on the phenotypes (Figures 1 and 2). A similar pattern is observed for designs with a mix of unrelated and related samples. The gain in power is smaller when a single haplotype is influencing the trait, but present for all scenarios evaluated. When compared to the gene-based tests, our approach is uniformly more powerful in all scenarios across all study designs (Figures 1 and 2) because of the Wu (default) weighing scheme that downweights common variants.

Discussion
We have proposed a general meta-analysis approach to combine the haplotype association results from multiple cohorts. Our approach imposes no restrictions on the haplotypes observed across cohorts. Instead, our approach can incorporate information from haplotypes observed in a single cohort in addition to haplotypes observed in multiple cohorts. In the first stage of our approach, haplotype association analysis is performed at the cohort level. Information about the haplotype structure, frequencies, effect estimates, and covariance of effect estimates is collected, and metaanalyzed in the second stage using a generalized weighted least square approach. The association between a trait and any single or multiple haplotypes can be easily evaluated within our framework.
We evaluated the type-I error rate in a variety of scenarios with different cohort designs that included a mix of unrelated and family samples. Type-I error rate was controlled in all scenarios investigated. We also compared the power of Genetic Epidemiology, Vol. 40, No. 3, 244-252, 2016 Figure 2. Power of the haplotype meta-analysis approach compared to gene-based methods and single SNV meta-analysis (min P) adjusted for multiple testing in the JAZF1 region, evaluated at α = 0.001 in four study designs. Description of the four study designs used in the simulation can be found in Table 2 (study design 1-4). The labels on the x axes denote that 1 (SNV) or 2 (2SNVs) SNVs are influencing the phenotypes, or 1 (1HAP) or 2 (2HAPs) haplotypes have an effect on the phenotypes. our approach with single variant tests corrected for multiple testing (min P approach), and demonstrated that our approach had equivalent power when variants, not haplotypes, influenced the trait, but was more powerful in the presence of true haplotype effects. Our haplotype approach also provided more evidence for association compared to gene-based tests applied with the default weighting scheme, as exemplified in a recent large-scale exome-chip project [Wessel et al., 2015] applied to the G6PC2 region comprising 15 rare variants (MAF<1%). Our simulations also illustrated that the haplotype effect size estimates obtained from metaanalysis were unbiased, even when family-based cohorts were included.
While our approach cannot serve as the only tool for the discovery of associated variants and regions, it is a complementary tool to single-variant and gene-based tests. Mahajan et al. [2015] demonstrated the usefulness of haplotype analysis in their investigation of the effect of G6PC2 variants on fasting glucose. In 4,442 nondiabetic subjects from the Oxford Biobank, the G allele from the coding variant rs492594 appears to significantly decrease fasting glucose levels. However, when conditioning on the variant with the largest effect (rs560887) on fasting glucose, the effect estimates of the G-allele from rs492594 is reversed, and the G allele appears to decrease fasting glucose, an apparent paradox. However, looking at the haplotype estimates elucidates the mystery: the rs492594 G allele is most frequently observed on the same haplotype as the glucose raising allele (T) from the strongest associated variant (rs560887), giving the impression that the G allele also increases fasting glucose. Our analysis supports this conclusion, and refines the effect estimates provided by Mahajan et al. [2015] by increasing the number of samples used to obtain effect estimates via meta-analysis, providing more precise estimates, as reflected in the smaller standard errors.
Our approach has some limitations. The variants included in the haplotype analysis must be genotyped or imputed in all cohorts. In other words, all cohorts must include the same set of variants in their analysis. Moreover, when using imputed genotypes, best-guess genotypes must be used because the approach does not currently handle genotypes in the form of dosage. The EM algorithm currently employed for inferring haplotypes works best for a moderate number of variants (< 15), and very rare haplotypes (frequency< 0.1%) are recommended to be collapsed to ensure computation stability. Despite these limitations, our approach has the potential to shed some light on the relationship between traits and multiple associated SNVs in a region.