Haplotype reference consortium panel: Practical implications of imputations with large reference panels

Recently, the Haplotype Reference Consortium (HRC) released a large imputation panel that allows more accurate imputation of genetic variants. In this study, we compared a set of directly assayed common and rare variants from an exome array to imputed genotypes, that is, 1000 genomes project (1000GP) and HRC. We showed that imputation using the HRC panel improved the concordance between assayed and imputed genotypes at common, and especially, low-frequency variants. Furthermore, we performed a genome-wide association meta-analysis of vertical cup-disc ratio, a highly heritable endophenotype of glaucoma, in four cohorts using 1000GP


INTRODUCTION
Progress in understanding the genetic etiology of complex traits is largely determined by the success of genome-wide association studies (GWAS) (McCarthy et al., 2008;Price, Spencer, & Donnelly, 2015).Imputation, that is, statistical inference of genotypes not directly assayed by arrays (Li, Willer, Sanna, & Abecasis, 2009), is crucial to the success of GWAS.Imputation increases the number of variants tested per study and allows combination of multiple studies assayed in different arrays, boosting the power of GWAS (Hao, Chudin, McElwee, & Schadt, 2009;Huang et al., 2015;Marchini & Howie, 2010).Recently, the Haplotype Reference Consortium (HRC) released the first version of a panel encompassing 64,976 haplotypes (McCarthy et al., 2016).This resource combines other widely used panels such as the 1000 Genomes Project (1000GP) (Genomes Project et al., 2015), the Genome of the Netherlands (GoNL) (Boomsma et al., 2014; Genome of the Netherlands, 2014), the UK10K (Consortium et al., 2015), and also includes sequencing data from 17 cohorts (McCarthy et al., 2016).
Previous studies have shown that larger reference panels considerably increase imputation accuracy, particularly for low-frequency variants, and that this gain in imputation accuracy results in increased statistical power (Browning & Browning, 2009;Deelen et al., 2014;McCarthy et al., 2016).To quantify the advantage of the HRC reference panel in the imputation of common and rare variants, we first assessed the concordance rate between directly assayed genotypes from exome array data and best-guess genotypes from 1000GP and HRC imputations.Then, we evaluated any difference in statistical power by comparing the meta-analysis of GWAS of vertical cup-disc ratio (VCDR) using 1000GP imputation versus HRC imputation, in the same samples.VCDR is a well-recognized endophenotype of primary open-angle glaucoma (Charlesworth et al., 2010) and a highly heritable trait (h 2 = 0.66) (Charlesworth et al., 2010) used for clinically assessing glaucoma patients.Here, we assessed the impact of the imputation panel on a meta-analysis of GWAS of VCDR by analyzing four cohorts: the Rotterdam Study (RS-I, RS-II, RS-III) (Hofman et al., 2015) and the Erasmus Rucphen Family (ERF) (Aulchenko et al., 2004;Pardo, MacKay, Oostra, van Duijn, & Aulchenko, 2005) study.

Study descriptions
The description of the participating studies can be found in the Supporting Information.All studies adhered to the tenets of the Declaration of Helsinki and were approved by their local Medical Ethics Committees.Written, informed consent was obtained from all participants.

Genotype and imputations
The Rotterdam Study cohorts (RS-I, RS-II, and RS-III) and the ERF study were genotyped using commercially available genotyping arrays, and genotyping quality control (QC) was done for each cohort individually.Detailed information about genotyping and imputations for each study is presented in Supp.Table S1.The genome-wide arrays were used to impute variants twice, once with the 1000GP (Genomes Project et al., 2015) and once with HRC (McCarthy et al., 2016).Imputations with the 1000GP were described previously (van Leeuwen et al., 2015).For HRC, file preparation was done using scripts provided online (HRC Imputation preparation and checking: http://www.well.ox.ac.uk/~wrayner/tools/; v4.2.1).Imputation with HRC was facilitated by the Michigan Imputation server (Das et al., 2016).Filtered genotypes were uploaded.The server uses SHAPEIT2 (v2.r790) to phase the data and Minimac 3 for imputation to the HRC reference panel (v1.0).We used the imputed dosages returned by the service (McCarthy et al., 2016).

Genotyping on the exome array
In total 3,159 participants from RS-I were genotyped in the HumanExome BeadChip v1.0 from Illumina (Illumina, Inc., San Diego, CA).To increase the quality of the rare variant genotype calls on the exome array, the genotypes were jointly called with 62,266 samples from 11 studies at the University of Texas HSC at Houston (UT Houston) (Grove et al., 2013).QC procedures for the genotype data were done both centrally at UT Houston and locally.The central QC procedures have been described previously (Grove et al., 2013).Locally, additional QC included removal of: (1) individuals with low genotype completion rate (<90%), (2) variants with low genotype call rate (<95%), (3) individuals with sex-mismatches, (4) one individual from duplicate pairs, and (5) variants not called in over 5% of the individuals and those that deviated significantly form the expected Hardy-Weinberg Equilibrium proportions (P < 1 × 10 −06 ).

Variant selection and calculation of concordance
A total of 236,756 variants on the exome array passed QC.Of these, we selected variants that fulfilled the following four criteria: (1) were imputed in both the 1000GP and HRC, (2) were polymorphic, (3) had non-ambiguous allele coding, and (4) showed no differences in reference allele frequency.In summary, 82,281 variants on the exome array were imputed in both 1000GP and HRC panels.Of these, 31,022 did not fulfill our criteria (n = 29,819 monomorphic, n = 1,141 with ambiguous allele coding, and n = 62 with large differences [>10%] in reference allele frequency).The final set consisted of 51,259 variants, for which we calculated either the concordance of the alternate allele calls (for the low-frequency and rare variants) or the concordance for both reference and alternate allele (for common variants).Calculation of concordance was performed using the exome array genotypes as benchmark and the best-guess genotypes from either 1000GP or HRC.
For the common variants, the reported concordance is the percentage of correctly imputed alleles divided by the number of exome array alleles according to the following formula (n = number of individuals): For the rare variants, the reported concordance is the percentage of correctly imputed alternate alleles divided by the number of exome array alternate alleles according to the following formula:

Exome array (benchmark) # of alternate
3 that showed a concordance rate <90% (n = 44) were examined in the list of vari-ants located in inaccessible regions of the genome from the GoNL (Boomsma et al., 2014).

Genome-wide association analyses in RS-I-II-III and ERF
We performed a GWAS in four Dutch cohorts: RS-I, RS-II, RS-III, and ERF (n = 12,441).GWAS were conducted twice per cohort, once using 1000GP imputations and once using HRC imputations.Association of directly genotyped and imputed dosages with VCDR was tested using linear regression under an additive model.All analyses were adjusted for age, sex, and the first five principal components in RS-I, RS-II, and RS-III, or family structure in ERF.The 1000GP imputations GWAS analyses were conducted using ProbABEL (Aulchenko, Struchalin, & van Duijn, 2010) software.To account for family relationships in ERF, the mmscore function implemented in GenABEL (Aulchenko, Ripke, Isaacs, & van Duijn, 2007) was used.Association analyses with HRC imputations were performed in RvTest-meta score option; to adjust for familial relationships in ERF, the kinship matrix estimated from the genotyped data was used.Variants with MAF < 0.01 in 1000GP and MAF ≤ 0.001 in HRC, and with low imputation quality score (R 2 < 0.3) were excluded from the analyses.The EasyQC software was used for QC at the study level, as described elsewhere (Winkler, et al., 2014).The inflation factor () of each individual study using 1000GP and HRC panels can be found in the Supp.Table S2.

Meta-analysis of RS-I-II-III and ERF
Meta-analyses for both 1000GP and HRC association results were conducted using the inverse variance-weighted fixed-effect method in METAL (Willer, Li, & Abecasis, 2010).Genomic control was applied in METAL and heterogeneity was calculated.After meta-analyses, we excluded the variants that were not present in at least two of the four cohorts.Given that HRC does not include INDELs, we excluded the INDELs from the 1000GP meta-analysis.The remaining variants were used for comparison of the results between 1000GP and HRC.To calculate the  in each meta-analysis we used those variants present in both meta-analysis (n = 7,654,440). in 1000GP was 1.096, and in HRC 1.102, see QQ plot in Supp. Figure S1.
We then subtracted the chi-square in HRC meta-analysis from the chi-square in 1000GP, and calculated the proportion of variants that showed a positive sign (i.e., larger chi-square in HRC).We then used a proportion Z test to evaluate if the proportion of variants with a positive sign was significantly different from 0.5.

Comparison 1000GP versus HRC
We first compared the R 2 value derived from the imputation software between 1000GP and HRC imputations (see Fig. 1).In this study, R 2 1000GP refers to the R 2 value in 1000GP imputations and R 2 HRC to the value in HRC.As reported by the HRC consortium (McCarthy et al., 2016), we confirmed that the R 2 HRC was higher than the R 2 1000GP , particularly for rare (0.001 < MAF < 0.01) and low-frequency variants (0.01 < MAF < 0.05) variants.
In 3,159 individuals from the Rotterdam Study-I (RS-I) assayed on the HumanExome BeadChip v1.0 (exome array), we determined the concordance between directly assayed genotypes and imputed genotypes.Of the 51,259 studied variants (see Materials and Methods section), 21,230 were common (MAF > 0.05 in the exome array), 8,006 were low-frequency variants (0.01 < MAF < 0.05) and 11,090 were rare variants (0.001 < MAF < 0.01).A total of 10,933 very rare variants (i.e., MAF < 0.001) were excluded as the minor allele count for the variants were six or less.The concordance rate of very rare variants can be found in Supp.Table S3 and Supp. Figure S2.
Figure 2 shows the concordance between exome array and 1000GP in red and the concordance between exome array and HRC in blue (plots are shown per MAF and R 2 bins; for comparison purposes, we used the R 2 1000GP to define the bins).Supp.Table S4 shows the number of variants per bin and the concordance rates for both 1000GP and HRC.
For low-frequency variants, a remarkable improvement in concordance was observed as shown in Figure 2D-F.Overall (N variants = 8,006), the concordance increased from 44% in 1000GP to 66% in HRC.For variants with R 2 1000GP > 0.8 (N variants = 3,844), the percentage of concordance improved from 85% in 1000GP to 98% in HRC.Improvement was most pronounced for variants with R 2 1000GP 0.3-0.8 (Fig. 2E; N variants = 3,301), in which the percentage of concordant variants improved from 7% in 1000GP to 44% in HRC.Despite the overall improvement, the concordance is relatively low for variants with R 2 1000GP < 0.3 (Fig. 2F; N variants = 861); concordance improved from 1% in 1000GP to 5% in HRC (see details in Supp. Table S4).

Impact of HRC imputations on VCDR GWAS
To evaluate the improvement in statistical power, we compared the results of a meta-analysis of GWAS using 1000GP imputations to a meta-analysis of GWAS using HRC imputations.Both meta-analyses included 12,441 individuals from four independent Dutch cohorts (RS-I, RS-II, RS-III, and ERF).Table 1 describes the baseline characteristics of the study populations.The inflation factor lambda (), an indicator of potential population stratification and false positive rate, was comparable in the two meta-analyses with values of 1.096 in the 1000GP meta-analysis and 1.102 in the HRC meta-analysis (Supp.Fig. S2), suggesting no increase in false positives in the HRC analyses.

S1 and Table
To assess the improvement in P values, we calculated the proportion of variants for which a stronger association was found in the HRC meta-analysis.62% of variants that showed suggestive association in 1000GP or HRC meta-analyses (i.e., 5.0 × 10 −08 < P < 1.0 × 10 −04 ; n variants = 4,683) showed smaller P values in the HRC meta-analysis (P = 3.07 × 10 −61 ; Table 2) suggesting improved sensitivity in the HRCbased meta-analysis.For variants below suggestive significance (i.e., 1.0 × 10 −04 < P < 1), the P values did not improve, suggesting better specificity.No statistically significant differences were observed for the genome-wide significant variants.
Based on the data of the Rotterdam Study and ERF, we confirmed in our 1000GP meta-analysis seven loci reported by the International Glaucoma Genetics Consortium (IGGC) (Springelkamp et al., 2014(Springelkamp et al., , 2016)), which is 2.6 times the size of this study.Using the HRC imputation, we confirmed one additional common variant (rs6539763, MAF = 0.46, close to TMTC2), previously reported by the IGGC (Springelkamp et al., 2014).When comparing the top associated variants from both 1000GP and HRC meta-analyses, we found that in three  out of the seven confirmed loci, the top associated variant was the same (rs1192415 close to TGFBR3, rs7916410 close to ATOH7, and rs9613667 in CHEK2).For the other four loci, top variants were different in each meta-analysis (Supp.Table S5).In all cases, the identified variants were in LD (0.74 < r 2 < 1) with the variant previously reported by the IGGC.

DISCUSSION
We showed that the HRC imputations improve the concordance between directly assayed and imputed genotypes over a large range of MAFs.In addition, our VCDR analyses show that the P values obtained with HRC imputations compared with 1000GP imputations are on Variants from 1000GP and HRC meta-analyses were divided into four categories, that is, genome-wide associated, suggestive, nominally associated and not associated.
* Z score and P value from a proportion z test, in which H 0 = 50%.
average significantly smaller for suggestive variants.The improvements were also seen with common variants, predicting that HRC imputations may also be relevant for finding new common variants.
As described by the HRC (McCarthy et al., 2016), the imputation accuracy (measured by the R 2 ) is higher when using the HRC panel.
In our comparisons, 93.27% (n = 19,802 out of 21,230) of the common variants imputed with 1000GP had a R 2 > 0.8; this percentage increased to 98.74% in HRC (n = 20,962 out of 21,230; Supp.Table S6).Despite the overall improvement, there is still a group of common variants with R 2 < 0.3 that showed a low concordance rate (i.e., <90%).Of these, about 18% (eight out of 44) are located in inaccessible regions of the genome that represent a challenge for short-read technologies (Deelen et al., 2014;Genome of the Netherlands, 2014;Genomes Project et al., 2010;Genomes Project et al., 2012) (Supp. Table S7).Investment in long-read sequencing and single-molecule mapping, along with improvements in the generation of high-quality de novo assemblies of genomes, will advance our understanding of the genome and its variation, and will further improve the accuracy of genotype imputations (Berlin et al., 2015;Chaisson, et al., 2015).
In this study, we focused on the assessment of genotype concordance rate rather than on the R 2 changes as described by the HRC in McCarthy et al.We calculated and assessed the concordance rate between high-quality genotypes (Grove et al., 2013) et al., 2016).A marked improvement was observed for variants with R 2 1000GP 0.3-0.8 (Fig. 2E).Thus, low-frequency variants commonly filtered out in meta-analyses of GWAS at present might be studied reliably using HRC imputations.For rare variants (0.001 < MAF < 0.01), there is also a relevant improvement in concordance that facilitates in silico validation of findings of exome array or sequencing results, and gene discovery.Finally, for rare variation we observed that the concordance of well-imputed variants increased by 20% (65% in 1000GP vs. to 86% in HRC).
The HRC panel incorporates 64,976 haplotypes, including 998 Dutch haplotypes from the GoNL (Genome of the Netherlands, 2014; van Leeuwen et al., 2015); previous studies have shown that sample size of the reference panel is often more important than population matching (Huang et al., 2015;Marchini & Howie, 2010).This finding was also observed in a previous study (Deelen et al., 2014), in which a combined reference panel including GoNL and 1000GP yielded the best imputation results.This suggests that the improvement in genotype concordance observed in our study is related to the increase sample size of the HRC.
Our analysis of VCDR in the Rotterdam and ERF cohorts used the same group of individuals, thus allowing an informative comparison between meta-analysis results performed with genotypes imputed with 1000GP versus genotypes imputed with HRC.We showed that for over a large range of MAFs, including common variants, a significant improvement in P values can be achieved with HRC imputations.
The same trend was observed when P values from both meta-analyses were compared (Supp.Fig. S3).Table 2 shows that the percentage of genome-wide significant, and particularly, suggestive variants in the GWAS performed using the genetic data imputed with the HRC reference panel is larger compared with the GWAS using the 1000GP reference panel.Given the same sample size (n = 12,441), we confirmed in the 1000GP meta-analysis seven previously reported VCDR-loci (Springelkamp et al., 2014(Springelkamp et al., , 2016)), whereas in the HRC meta-analysis an additional known VCDR-locus (close to TMTC2) was confirmed.
A potential limitation in this study is that we used exome array data as benchmark, and genotype calling may be subject to error.However, data were jointly called with around 62,000 other samples, thereby improving the accuracy of our genotype calls (Grove et al., 2013).
Assessment of concordance using exome-array genotypes allowed us to compare the performance of the both reference panels independently of the post-imputation metric (R 2 ) derived from imputation software.Our sample size limits the assessment of very rare variants (MAF < 0.001); however, the same trend was observed for these variants (Supp.Table S3).We analyzed a genetically homogeneous Dutch population, thus we cannot predict the performance of HRC in non-Europeans.Assessment in other populations will reveal the performance of HRC and 1000GP panels in non-European samples.Our findings provide further evidence for the gain in power when a large reference panel as released by the HRC is used for imputations.We have shown that concordance between data directly assayed in the exome array and HRC imputations is high, particularly, for variants with an R 2 > 0.8 across a wide range of allele frequencies.Furthermore, our analyses of VCDR data in 12,441 subjects showed smaller P values for common and rare variants when using the HRC panel.Thus, imputations with HRC and other larger reference panels improves the statistical power to discover both common and low-frequency variants, opening new avenues for fine-mapping and gene discovery.

F
I G U R E 1 Imputation accuracy 1000GP versus HRC.Postimputation quality metric based on the R 2 value from 1000GP (in gray) and HRC (in black).MAF, minor allele frequency

F
Comparison of concordance between exome-array and imputed data (1000GP and HRC).Histograms are presented by bins selected based on minor allele frequency (MAF) and R 2 1000GP values.For each histogram, the concordance rate is on the x-axis and the percentage of variants (N total of variants) is on the y-axis.Concordance between exome-array genotypes and best-guess genotypes from 1000GP are shown in red and from HRC in blue.Overlapped regions of the histograms are in dark-red.Panels A-C show comparison of concordance for common variants; panels D-F show comparison of concordance for low-frequency variants, and panels G-I show comparison of concordance for rare variants.Based on the R 2 1000GP threshold, panels A, D, and G) show the comparison of concordance for variants with high R 2 1000GP (>0.8), panels B, E, and H show the comparison of concordance for variants with R 2 1000GP between 0.3 and 0.8, and panels C, F, and I show the comparison of concordance for variants with low R 2 1000GP (<0.3)