Susceptibility loci for metabolic syndrome and metabolic components identified in Han Chinese: a multi‐stage genome‐wide association study

Abstract Metabolic syndrome (MetS), a cluster of metabolic disturbances that increase the risk for cardiovascular disease and diabetes, results from both genetic susceptibility and environmental risk factors. To identify the genetic variants associated with MetS and metabolic components, we conducted a genome‐wide association study followed by replications in a total of 12,720 participants from northern, north‐eastern and eastern China. In combined analyses, independent of the top known signal at rs651821 on APOA5, we newly identified a secondary triglyceride‐associated signal at rs180326 on BUD13 (P combined = 2.4 × 10−8). Notably, by an integrated analysis of the genotypes and the serum levels of APOA5, BUD13 and triglyceride, we observed that BUD13 was another potential mediator, besides APOA5, of the association between rs651821 and serum triglyceride. rs671 (ALDH2), an East Asian‐specific common variant, was found to be associated with MetS (P combined = 9.7 × 10−22) in Han Chinese. The effects of rs671 on metabolic components were more prominent in drinkers than in non‐drinkers. The replicated loci provided information on the genetic basis and mechanisms of MetS and metabolic components in Han Chinese.


Introduction
Metabolic syndrome (MetS) is characterized by a cluster of metabolic disorders including obesity, dyslipidemia, elevated fasting plasma glucose and elevated blood pressure [1]. MetS increases the risks for diabetes mellitus, cardiovascular diseases and cancers, as well as all-cause mortality [2][3][4][5]. The prevalence of MetS (as defined by the International Diabetes Federation consensus in 2005) was reported to be 16.5% in a cross-sectional study of Chinese adults aged 35-74 years in 2000-2001 [6]. The estimated prevalence had increased to 18.2% according to the China Health and Nutrition Survey in 2009 [7]. Given that the prevalence of MetS is high and increasing quickly in the Chinese population, strategies for its early detection and effective intervention are urgently needed.
It is generally accepted that the current criteria for MetS are integrated assessment strategies that yield only a binary outcome rather than precise levels of metabolic components. Screening for genetic susceptibility loci using these simplified definitions relies on the common phenomenon of pleiotropy, in which one gene or one variant affects multiple phenotypes. Pleiotropy has been reported for genetic variants associated with high-density lipoprotein cholesterol (HDL-C), triglyceride (TG) and low-density lipoprotein cholesterol (LDL-C) [29]. A systematic review of pleiotropy from a broader viewpoint suggested that a large number of genes and SNPs show pleiotropic effects in common complex diseases and traits [30]. These findings suggest that the combined analysis of metabolic components as a whole is a critical supplement to metabolic component-specific screening.
In the current study, we searched for genetic susceptibility loci for MetS and metabolic components using a multistage GWAS and aimed to understand the mechanism behind the associations in Han Chinese.

Participants
In the genome-wide discovery stage, 998 participants with MetS and 996 healthy controls were recruited from a community-based survey in 2010-2011 in Linpu town, Xiaoshan District, Hangzhou, Zhejiang Province, China. For replication, seven independent cohorts with a total of 5514 cases and 5464 healthy controls were recruited from north-eastern China (Shenyang cohort), northern China (Beijing cohort) and eastern China (Hangzhou, Daicun, Wenzhou, Zhoushan and Zhejiang cohorts) (see Table S1 for further details). All participants were of Han Chinese ethnicity. Individuals were excluded if they had received metabolic-related interventions or had cancer or serious chronic liver, lung, heart or kidney disorders. The study protocol was approved by the Research Ethics Committee at the School of Medicine, Zhejiang University. Each participant gave informed consent.

Anthropometric measurements and epidemiological investigation
Anthropometric indices (weight, height, waist circumference and hip circumference) and blood pressure were measured following standard protocols. Body mass index (BMI) was calculated as body weight in kilograms divided by the square of height in metres. Waist-to-hip ratio (WHR) was calculated as waist circumference divided by hip circumference (both in centimetres). Serum biochemical parameters including fasting blood glucose (FBG), TG, HDL-C and LDL-C were measured after overnight fasting. The serum levels of APOA5 and BUD13 were measured using enzyme-linked immunosorbent assay (ELISA) kits from Cusabio (Wuhan, Hubei, China; codes CSB-E11901h and CSB-EL002885HU). Alcohol consumption (classified as non-drinker, light drinker or heavy drinker) was assessed in face-to-face interviews. MetS was defined as the presence of three or more of the following: BMI ≥ 25 kg/m²; systolic/diastolic blood pressure (SBP/DBP) ≥ 140/90 mmHg; FBG ≥ 6.1 mmol/l; TG ≥ 1.7 mmol/l; HDL-C < 0.9 mmol/l (men) or < 1.0 mmol/l (women) [31]. Healthy controls were free of the above metabolic disorders. BMI, WHR, FBG, TG, HDL-C and LDL-C were treated as continuous variables in analyses.
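The anthropometric formulas and the component thresholds listed above can be sketched in a few lines of Python. This is an illustration only, not the authors' code: the function names are hypothetical, each listed item is assumed to count as one component, and the blood pressure criterion is assumed to be met when either SBP or DBP reaches its threshold.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms divided by height in metres squared."""
    return weight_kg / height_m ** 2


def whr(waist_cm: float, hip_cm: float) -> float:
    """Waist-to-hip ratio; both circumferences must be in the same unit."""
    return waist_cm / hip_cm


def count_abnormal_components(bmi_value, sbp, dbp, fbg, tg, hdl_c, is_male):
    """Count how many of the listed thresholds a participant meets."""
    hdl_cutoff = 0.9 if is_male else 1.0  # mmol/l, sex-specific
    flags = [
        bmi_value >= 25.0,        # kg/m^2
        sbp >= 140 or dbp >= 90,  # mmHg; "either" is an assumption
        fbg >= 6.1,               # mmol/l
        tg >= 1.7,                # mmol/l
        hdl_c < hdl_cutoff,
    ]
    return sum(flags)


def has_mets(bmi_value, sbp, dbp, fbg, tg, hdl_c, is_male):
    """Binary case status: three or more abnormal components."""
    return count_abnormal_components(
        bmi_value, sbp, dbp, fbg, tg, hdl_c, is_male
    ) >= 3
```

A participant with a BMI of 27 kg/m², blood pressure of 150/85 mmHg and TG of 2.0 mmol/l, but normal FBG and HDL-C, would meet three components under these assumptions and be classified as a case.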

Genotyping, imputation and quality controls
Genomic DNA was isolated from whole blood using a TACO automatic nucleic acid extraction apparatus (GeneReach Biotechnology Corp., Taichung, Taiwan). Genotyping of the GWAS samples was conducted using Illumina HumanOmniExpress 760k chips (Illumina, San Diego, CA, USA) in the Bio-X Center, Shanghai Jiao Tong University, according to the manufacturer's protocol. Nineteen randomly selected samples were genotyped in duplicate, and the results showed ~99.9% concordance with the corresponding samples in the discovery stage. The case and control samples were distributed evenly in each plate. Negative controls (without DNA template) were included on every plate.
Systematic quality control was conducted in the discovery stage (Fig. S1). SNPs were excluded if (i) they did not map to autosomal chromosomes, (ii) they had a minor allele frequency (MAF) < 0.05 in the current samples, (iii) their genotype distribution in controls deviated from Hardy-Weinberg equilibrium (P < 1.0 × 10−4) or (iv) the call rate was < 95%. Samples were excluded from analyses if they (i) had overall successful genotyping call rates < 95%, (ii) were population outliers according to the smartPCA program from EIGENSOFT [32] or (iii) had probable relatives (PI_HAT > 0.25). After the quality control procedure, a total of 862 participants with MetS, 880 healthy controls and 533,059 SNPs were included in the discovery stage analyses.
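The SNP-level filters can be illustrated with a short Python sketch. It uses the asymptotic 1-df chi-square test for Hardy-Weinberg equilibrium, which is a simplification chosen for brevity and not necessarily the test used in the study; the function names and the genotype-count inputs are hypothetical.

```python
import math


def minor_allele_frequency(n_aa, n_ab, n_bb):
    """MAF from genotype counts (AA, AB, BB)."""
    n = n_aa + n_ab + n_bb
    freq_a = (2 * n_aa + n_ab) / (2 * n)
    return min(freq_a, 1.0 - freq_a)


def hwe_chi2_p_value(n_aa, n_ab, n_bb):
    """P-value of the 1-df chi-square test for Hardy-Weinberg equilibrium."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    # Upper tail of a chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    return math.erfc(math.sqrt(chi2 / 2.0))


def snp_passes_qc(sample_counts, control_counts, call_rate, autosomal=True):
    """Apply the four SNP exclusion rules listed in the text."""
    return (
        autosomal
        and minor_allele_frequency(*sample_counts) >= 0.05
        and hwe_chi2_p_value(*control_counts) >= 1.0e-4
        and call_rate >= 0.95
    )
```

Here `sample_counts` are the genotype counts in all discovery samples and `control_counts` those in controls only, matching the text's distinction between the MAF and Hardy-Weinberg filters.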
The post-quality control GWAS data were used for imputation. We imputed ungenotyped SNPs via IMPUTE2 [33] with the haplotype reference data of 1092 individuals from the 1000 Genomes Project Phase I Integrated Variant Set Release (v3, March 2012) in NCBI build 37 (hg19) coordinates. SNPs with info score quality estimates of <0.8 were excluded from analyses. Finally, 4,642,479 SNPs were used for fine mapping and SNP function prediction.
In replication stage I, genotyping was performed with SNPscan™ (Genesky Biotechnologies Inc., Shanghai, China). The TaqMan genotyping platform (ABI 7900HT Real Time PCR system, Applied Biosystems, Foster City, CA, USA) was used in replication stages II and III. To evaluate the accuracy of SNPscan and the TaqMan platform, an additional 5% of samples were genotyped by Sanger sequencing. Genotyping call rate control (more than 95%) and Hardy-Weinberg equilibrium control (P > 1.0 × 10−3) were implemented in the replication stages.

Strategies for signal selection, replication and statistical analyses
We considered MetS as a binary outcome and performed the Cochran-Armitage trend test in a logistic regression model with age, gender and the first two principal components as covariates using PLINK 1.07 [34] in the discovery stage. Signals with a P-value < 5.0 × 10−5 were selected for replication. Metabolic components [BMI, WHR, FBG, TG (log-transformed), HDL-C and LDL-C] were also considered as quantitative outcomes for screening. Multiple linear regressions were performed for quantitative variables with age, sex and the first two principal components as covariates. BMI was included as an additional covariate when screening SNPs affecting FBG, TG, HDL-C and LDL-C. Component-associated signals with a P-value < 1.0 × 10−5 in the discovery phase were selected for replication.
To prune candidate SNPs sharing the same potential biological effects, conditional analysis was performed for any two candidate SNPs within ~1 Mb of each other. We kept only one of the two statistically significant SNPs if the other SNP had a P-value > 0.05 in conditional analyses. As a result, 39 SNPs with independent effects were selected (Fig. S2). Assays were successfully designed for 32 of the 39 SNPs in replication phase I. Analyses were conducted in the replication stages with age and gender as covariates using PLINK 1.07 (the same as in the discovery stage). Combined effects were calculated with meta-analyses using Stata 12.0 (STATA Corp, College Station, TX, USA) for the SNPs with P-values < 0.05 in the replication stage and a direction of effect consistent with the discovery stage. SNPs with a combined P-value < 5.0 × 10−8 were regarded as significant at the genome-wide level. Replications II and III were conducted if the P-value of an SNP was < 0.05 while the combined P-value did not reach the genome-wide significance level (5.0 × 10−8) (Fig. S2). Meta-analyses were applied to combine the results from different cohorts and stages with fixed-effect models.
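A fixed-effect meta-analysis of per-stage estimates is commonly computed by inverse-variance weighting. The Python sketch below shows that calculation under the assumption that each stage supplies an effect estimate (a regression beta or a log odds ratio) and its standard error; the function name and the example inputs are hypothetical, and the study itself used Stata.

```python
import math


def fixed_effect_meta(estimates, std_errors):
    """Inverse-variance weighted fixed-effect meta-analysis.

    estimates  : per-stage effect sizes (beta or log odds ratio)
    std_errors : matching standard errors
    Returns (pooled estimate, pooled SE, z statistic, two-sided P-value).
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled / pooled_se
    # Two-sided normal P-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled, pooled_se, z, p_value


def is_genome_wide_significant(p_value, threshold=5.0e-8):
    """Genome-wide significance rule used for the combined P-value."""
    return p_value < threshold
```

For example, `fixed_effect_meta([0.2, 0.2], [0.1, 0.1])` pools two hypothetical stages with identical estimates: the pooled effect stays at 0.2 while the standard error shrinks by a factor of √2.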
Linear regressions were conducted for the associations among tagSNPs (in the additive model), lifestyle (alcohol consumption) and serum levels of APOA5, BUD13, TG and HDL-C using SAS for Windows (version 9.2, SAS Institute Inc., Cary, NC, USA).
The Manhattan plots and quantile-quantile (Q-Q) plots were drawn using the R package 'gap'. The genetic inflation factors were calculated using PLINK 1.07. The genome-wide significant regions were plotted using the online tool LocusZoom based on the ASN population in hg19 coordinates [35]. SNP function predictions were conducted after imputation-based fine mapping. The genetic architectures surrounding replicated SNPs were assessed using the ENCODE database from the UCSC genome browser [36].

Results
After quality control, 1742 participants (862 MetS cases and 880 healthy controls) and 533,059 genotyped SNPs were included for analyses in the discovery stage. Replication samples consisted of 656 cases and 933 controls for phase I, 709 cases and 1921 controls for phase II, and 4149 cases and 2610 controls for phase III. The characteristics of these participants in each stage are shown in Table S2. The inflation factors ranged from 1.01 to 1.02 for MetS and metabolic components, suggesting little evidence of population stratification after quality control.
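The genetic inflation factor is usually defined as the median of the observed 1-df chi-square statistics divided by the median expected under the null (about 0.455). A minimal Python sketch of that definition, assuming two-sided P-values from 1-df tests as input (the function name is hypothetical):

```python
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom
CHI2_1DF_MEDIAN = 0.4549364


def genomic_inflation_factor(p_values):
    """Median-based genomic inflation factor (lambda).

    Each two-sided P-value is converted back to a 1-df chi-square
    statistic via the squared normal quantile. P-values must lie
    strictly between 0 and 1.
    """
    normal = NormalDist()
    chi2_stats = [normal.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2_stats) / CHI2_1DF_MEDIAN
```

Values close to 1, such as the 1.01 to 1.02 reported here, indicate that the test statistics follow the null distribution over most of the genome.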

Association analyses for MetS and metabolic components
To screen MetS-associated variants, logistic regressions were performed for each SNP adjusted for age, gender and the first two principal components (Fig. S3) in the discovery stage. The genetic inflation factor was 1.02. The Q-Q and Manhattan plots are shown in Figure 1. Sixteen independent SNPs were found to be associated with MetS (P < 5.0 × 10−5) in the discovery stage. These SNPs were then genotyped in replication stage I. Two SNPs (rs651821 and rs671) were associated with MetS (P < 0.05, Table S3), and their effects were consistent with the results in the discovery stage. rs651821 and rs671 were further genotyped in the subsequent replication stages (Table 1). The combined analyses showed that the C allele of rs651821 increased the risk of MetS, with an odds ratio (OR) and 95% confidence interval (CI) of 1.28 (1.20, 1.36) and a combined P = 4.2 × 10−17. The A allele of rs671 decreased the risk of MetS, with an OR and 95% CI of 0.71 (0.67, 0.76) and a combined P = 9.7 × 10−22. The effect sizes in each stage are presented in Table 1 and Table S3. Considering the age difference between cases and controls in some cohorts, we also classified the samples into three age groups (≤ 30, 31-60 and > 60 years) and then performed age-stratified analyses for the MetS-associated SNPs. The effects of rs651821 and rs671 were stable among different age groups, and the results were consistent with the overall analyses (Table S4).
To screen for SNPs associated with metabolic components, linear regressions were conducted using BMI, WHR, FBG, TG (log-transformed), HDL-C and LDL-C as dependent variables. Q-Q plots and Manhattan plots are shown in Fig. S4. The genetic inflation factors ranged from 1.01 to 1.02 for these quantitative outcomes. In the discovery stage, 24 SNPs had independent effects on the metabolic components with a P-value < 1.0 × 10−5 after conditional analyses. In replication stage I, rs1506525, rs4532958 and rs445925 were associated with BMI, WHR and LDL-C (P < 0.05), respectively. A marginally significant association was found between rs180326 and TG after controlling for the top signal at rs651821 (P = 0.063). Combining the results of the discovery stage and replication stage I, the association of rs445925 with LDL-C reached significance at the genome-wide level (P combined = 1.1 × 10−13). The A allele of rs445925 was associated with a decreased level of LDL-C [beta (95% CI) = −0.22 (−0.28, −0.16)]. rs1506525, rs4532958 and rs180326 were further replicated because their combined P-values did not reach the genome-wide threshold for significance. We further genotyped rs651821 in replication stages II and III to determine whether the signal at rs180326 was independent of rs651821.

A novel secondary TG-associated signal at rs180326 on BUD13
The top known TG-associated signal at rs651821 was localized in the 5′ UTR of APOA5, which belongs to the apolipoprotein gene cluster on chromosome 11q23 (Figure 2). A novel secondary TG-associated signal at rs180326 on BUD13 was replicated after controlling for the effect of rs651821. The combined effect size of the minor allele of rs180326 was −0.04 (95% CI: −0.05, −0.03; P = 2.4 × 10−8) with and 0.06 (95% CI: 0.05, 0.06; P = 1.9 × 10−44) without controlling for the top signal at rs651821. The opposite effects of rs180326 were stable across the different stages (Table 2).
To uncover the mechanism behind the opposite effects of the novel secondary signal, we measured the serum levels of APOA5 and BUD13 using ELISA kits as described in the Methods. As shown in Figure 3, the minor allele C of rs651821 was associated with a lower level of APOA5 than the T allele (P = 7.4 × 10−9). The serum level of APOA5 was inversely associated with TG (beta = −0.35, P = 3.3 × 10−12). The minor allele C of rs180326 showed a marginal association with a decreased serum level of BUD13 (P = 0.07). The serum level of BUD13 was associated with an increased level of serum TG (beta = 0.10, P = 7.6 × 10−3) and explained 14.2% of the serum TG variance. From these associations, we speculated that the association between rs180326 and serum TG was masked by the LD between rs180326 and rs651821 before adjusting for the top signal. This would explain why the combined effect of the minor allele C of rs180326 differed in direction with and without controlling for the top signal at rs651821. Additionally, to determine whether APOA5 and BUD13 mediate the associations between rs651821 and metabolic components, we performed linear regressions and found that the association between rs651821 and TG was partly independent of the mediator APOA5 (P < 0.05 after controlling for the serum level of APOA5). No statistical association between rs651821 and TG was found when we added the serum level of BUD13 as a covariate in the regression model (P = 0.259).
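The change in sign of the rs180326 effect with and without adjustment for rs651821 can be illustrated with a toy calculation. In the Python sketch below, two correlated genotype vectors stand in for a primary and a secondary SNP in LD; the genotypes, the true effects (0.3 and −0.1) and the function names are all hypothetical and are not taken from the study data.

```python
def _mean(values):
    return sum(values) / len(values)


def marginal_slope(x, y):
    """Ordinary least-squares slope of y on a single predictor x."""
    mx, my = _mean(x), _mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def conditional_slopes(x1, x2, y):
    """OLS slopes of y on x1 and x2 jointly (intercept included)."""
    m1, m2, my = _mean(x1), _mean(x2), _mean(y)
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    det = s11 * s22 - s12 ** 2
    return (s22 * s1y - s12 * s2y) / det, (s11 * s2y - s12 * s1y) / det


# Hypothetical additive genotype codes (0, 1, 2) for two SNPs in LD
primary = [0, 0, 0, 1, 1, 1, 2, 2, 2, 1]
secondary = [0, 0, 1, 1, 1, 0, 2, 2, 1, 2]
# Noise-free trait: strong positive primary effect, weak negative secondary effect
trait = [0.3 * g1 - 0.1 * g2 for g1, g2 in zip(primary, secondary)]

unadjusted = marginal_slope(secondary, trait)            # positive
_, adjusted = conditional_slopes(primary, secondary, trait)  # negative
```

In this toy data set the unadjusted slope for the secondary SNP is positive because it borrows the effect of the correlated primary SNP, whereas the adjusted slope recovers the negative effect that was built into the trait.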

Associations among rs671 (on ALDH2), alcohol consumption and MetS
rs671 is known to be a non-synonymous variant in ALDH2. As shown in Table S6, the minor allele frequency of rs671 is much higher in Asians (0.20), especially in the Han Chinese population (0.29), than in Europeans (<0.01). rs671 was strongly associated with alcohol consumption (P combined = 1.7 × 10−58) in 4295 participants from the Beijing, Shenyang, Xiaoshan and Hangzhou cohorts. An interaction between the rs671 genotype and alcohol consumption status was found for MetS (P = 0.014). Stratified analysis showed that rs671 was significantly associated with MetS in drinkers (P = 7.5 × 10−6), whereas only marginal significance was observed in non-drinkers (P = 0.097). Similar results were obtained for the associations with obesity and the other metabolic-related components BMI, WHR, SBP and TG (Table 3). We then adjusted for alcohol consumption level among drinkers in the regression models. Our results suggested that these associations were partly independent of alcohol consumption level.

Discussion
In this multistage GWAS, using samples from northern, north-eastern and eastern China, we identified two SNPs (rs651821 on APOA5 and rs671 on ALDH2) associated with MetS. Independent of the top signal at rs651821 in the APOA cluster, rs180326 on BUD13 was observed as a novel secondary signal associated with serum TG.
Fig. 2 The regional plots of the top signal at rs651821 and the secondary signal at rs180326 for triglyceride. The regional plots were plotted via the online tool LocusZoom using the ASN population as reference for LD calculations in hg19 coordinates. P-values used for the regional plots were estimated from the discovery stage. The combined P-values are given for the two signals. For rs180326, the conditional analysis was performed adjusting for the top signal at rs651821. Panel annotations: rs651821 (primary signal), P discovery = 7.8 × 10−16, P combined = 2.2 × 10−117; rs180326 (secondary signal), P discovery = 3.9 × 10−7, P combined = 1.9 × 10−44; rs180326 (conditional analysis), P discovery = 1.3 × 10−2, P combined = 2.4 × 10−8.
The top TG-associated signal was located at rs651821 on APOA5. This result was consistent with previous studies [37][38][39], as shown in Table S7. Recently, rs2266788, which is highly correlated with rs651821 (LD r² = 0.83), was demonstrated to be a functional point mutation [25]. That study proposed that the effect of rs2266788 (localized in the 3′ UTR of APOA5) is mediated by the microRNA miR-485-5p expressed in the human liver. We performed conditional analyses in the discovery and replication stages and found a novel secondary signal at rs180326 on BUD13 in this region. From the associations among rs651821, rs180326 and the serum levels of APOA5, BUD13 and TG, our results indicated that the effect of rs180326 was masked by the strong LD between rs180326 and rs651821 before controlling for the top signal in this region, which means that a spurious effect of rs180326 would be found without conditional analysis.
Additionally, our results suggested that BUD13 was another potential mediator, besides APOA5, of the association between the top signal at rs651821 and TG. Taken together with previous evidence, the signal at rs651821 may affect lipid metabolism via at least two causal variants. One of the causal variants is rs2266788 (on APOA5), which has been reported previously [25]. Another causal variant (rs180326 or another locus) probably affects the serum TG level via BUD13. BUD13 is located in the APOC3/A4/A5 gene cluster on chromosome 11q23.3. Genetic variants within this region are known to be associated with serum lipid components. A significant association between serum BUD13 and TG levels was observed in the current study, whereas the role of BUD13 and the molecular mechanisms for these effects remain to be determined.
The non-synonymous variant rs671 is localized in exon 12 of ALDH2, which encodes a key enzyme of alcohol metabolism. The rs671 substitution of lysine for glutamate in ALDH2 results in an enzyme that is rendered essentially inactive in vivo [26]. Recent GWAS indicated that rs671 is associated with daily alcohol consumption [40,41]. Our results were consistent with these previous findings. The possible mechanism is that carriers of the A allele have a reduced capacity to metabolise acetaldehyde, and this leads to immediate and unpleasant symptoms, such as the flushing response and nausea, which probably result in reduced alcohol consumption. In addition, a significant interaction between alcohol consumption status and rs671 was found in the current study. In the analysis stratified by alcohol consumption status, the A allele of rs671 was significantly associated with a reduced risk of MetS in drinkers. In non-drinkers, however, there was no significant difference in the genotype frequency of rs671 between MetS cases and healthy controls. Furthermore, apart from its influence on alcohol consumption, rs671 showed additional effects on metabolic components. Interactions between rs671 and alcohol consumption have been reported previously for serum TG [37] and other phenotypes including oesophageal cancer and the acute-phase inflammation marker alpha-1 antitrypsin [42,43].
In conclusion, we performed a multistage GWAS for MetS and metabolic components in Han Chinese. A novel secondary TG-associated signal at rs180326 on BUD13 was replicated. rs651821 on APOA5 was validated as a pleiotropic locus associated with MetS. In addition, our evidence suggested that, besides APOA5, BUD13 was another potential mediator of the association between rs651821 and serum TG. Interactions between rs671 and alcohol consumption status were found for MetS and metabolic components. The results of the current
BMI, body mass index; DBP, diastolic blood pressure; FBG, fasting blood glucose; HDL-C, high-density lipoprotein cholesterol; LDL-C, low-density lipoprotein cholesterol; MetS, metabolic syndrome; N, number of participants; SBP, systolic blood pressure; TC, total cholesterol; TG, triglyceride; WHR, waist-to-hip ratio.
The levels of obesity and metabolic components in the rs671 genotypes are shown as mean ± SD or median (25th percentile, 75th percentile) in drinkers and non-drinkers.
*Logistic and linear regressions for the association between rs671 and MetS and metabolic components were conducted separately in drinkers and non-drinkers. Age, gender and study were set as covariates in the regression models. † Then, we adjusted for the alcohol consumption levels in drinkers. ‡ In addition, we tested the interaction between rs671 and alcohol consumption status for each trait.

Conflicts of interest
The authors declare that they have no competing interests.

Supporting information
Additional Supporting Information may be found online in the supporting information tab for this article: Figure S1 Flow-chart of the quality-control procedure in the discovery phase. Figure S2 Flow-chart of the study design.

Table S1
The cohorts in the discovery and replication stages