Genomewide meta‐analysis identifies loci associated with IGF‐I and IGFBP‐3 levels with impact on age‐related traits

Summary
The growth hormone/insulin‐like growth factor (IGF) axis can be manipulated in animal models to promote longevity, and IGF‐related proteins including IGF‐I and IGF‐binding protein‐3 (IGFBP‐3) have also been implicated in risk of human diseases including cardiovascular diseases, diabetes, and cancer. Through genomewide association study of up to 30 884 adults of European ancestry from 21 studies, we confirmed and extended the list of previously identified loci associated with circulating IGF‐I and IGFBP‐3 concentrations (IGF1, IGFBP3, GCKR, TNS3, GHSR, FOXO3, ASXL2, NUBP2/IGFALS, SORCS2, and CELSR2). Significant sex interactions, which were characterized by different genotype–phenotype associations between men and women, were found only for associations of IGFBP‐3 concentrations with SNPs at the loci IGFBP3 and SORCS2. Analyses of SNPs, gene expression, and protein levels suggested that interplay between IGFBP3 and genes within the NUBP2 locus (IGFALS and HAGH) may affect circulating IGF‐I and IGFBP‐3 concentrations. The IGF‐I‐decreasing allele of SNP rs934073, which is an eQTL of ASXL2, was associated with lower adiposity and higher likelihood of survival beyond 90 years. The known longevity‐associated variant rs2153960 (FOXO3) was observed to be a genomewide significant SNP for IGF‐I concentrations. Bioinformatics analysis suggested enrichment of putative regulatory elements among these IGF‐I‐ and IGFBP‐3‐associated loci, particularly of rs646776 at CELSR2. In conclusion, this study identified several loci associated with circulating IGF‐I and IGFBP‐3 concentrations and provides clues to the potential role of the IGF axis in mediating effects of known (FOXO3) and novel (ASXL2) longevity‐associated loci.


Meta-Analysis
Individual studies' result files underwent extensive quality control before meta-analysis.
File format, plausibility, and the distributions of association results, including effect estimates, their standard errors, allele frequencies, and SNP imputation quality, were checked with the gwasqc() function of the GWAtoolbox package v1.01 (Fuchsberger et al. 2012). All cohort-specific λGC values were between 0.99 and 1.07 for the IGFBP-3 outcome and between 0.98 and 1.08 for the IGF-I outcome. Additionally, the known IGFBP-3 association of rs11977526 in IGFBP3 was checked for a consistent effect direction and size in each study. To account for relatedness in the sex-combined analyses, the family-based cohorts FHS and MICROS conducted additional analyses in the combined sample of men and women, additionally adjusting for sex.
In total, GWAS results were available for ~2.6 million SNPs that were present in at least 50% of the studies.
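As an illustration of the inflation check, the genomic control factor can be computed from a cohort's p-values as the median of the corresponding 1-df chi-square statistics divided by the median of the null chi-square distribution. The following is a minimal sketch using only the Python standard library; the function name `lambda_gc` is our own and is not part of GWAtoolbox, and two-sided p-values strictly between 0 and 1 are assumed.

```python
from statistics import NormalDist, median

_ND = NormalDist()

# Median of a chi-square distribution with 1 degree of freedom (about 0.455),
# obtained as the squared 75th percentile of the standard normal.
CHI2_1DF_MEDIAN = _ND.inv_cdf(0.75) ** 2


def lambda_gc(p_values):
    """Genomic inflation factor from two-sided association p-values.

    Assumes every p-value lies strictly between 0 and 1.
    """
    # A two-sided p-value maps to a 1-df chi-square statistic z**2,
    # where z is the normal quantile at p / 2.
    chi2 = [_ND.inv_cdf(p / 2.0) ** 2 for p in p_values]
    return median(chi2) / CHI2_1DF_MEDIAN
```

Values close to 1, such as the cohort-specific range reported above, indicate little inflation of the test statistics.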
Genome-wide significance was defined as a P-value < 5×10⁻⁸, correcting for one million independent tests.
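The threshold corresponds to a Bonferroni correction. Assuming a family-wise error rate of 0.05 (a conventional value that is not stated explicitly in the text) and one million independent tests:

```latex
\alpha_{\mathrm{GW}} = \frac{0.05}{10^{6}} = 5 \times 10^{-8}
```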
Because several studies joined the project after the initial GWAS was finished, we implemented a multi-stage design with two GWAS stages and an additional stage with de novo genotyping data to confirm novel loci associated with circulating levels (IGF-I only). After the stage 1 GWAS, all 19 lead SNPs from all traits with P < 10⁻⁶ were taken forward to stage 2. SNPs were selected only once if the smallest P-value was in the overall stratum without adjustment for IGF-I or IGFBP-3, respectively, and if the association had the same effect direction after this adjustment or in the respective sex stratum. One additional cohort with IGF-I measurements but without genome-wide data was available for stage 3. Therefore, all IGF-I lead SNPs of novel loci that had a combined stage 1 and stage 2 P < 10⁻⁸ (except GCKR) were selected for de novo replication. The lead SNP rs780093 of the GCKR locus was already genome-wide significant in stage 1 (P = 1.0×10⁻⁹) and had a P-value of 4.8×10⁻⁶ in stage 2, which is significant after Bonferroni correction for 19 tests; because of limited funding, it was therefore not included in the de novo replication. De novo genotyping failed for one of the seven SNPs selected for replication (rs1065656 of the NUBP2 locus).
All lead SNPs that were genome-wide significant after the final stage were considered replicated.
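The stage 1 filter and the Bonferroni argument for the GCKR lead SNP can be checked with a few lines of arithmetic. This is a sketch only: the family-wise error rate of 0.05 is an assumption, the helper names are ours, and the two p-values are those quoted above for rs780093.

```python
ALPHA = 0.05  # assumed family-wise error rate; not stated in the text


def bonferroni_threshold(n_tests, alpha=ALPHA):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / n_tests


def passes_stage1(p_value, cutoff=1e-6):
    """Stage 1 filter: lead SNPs with P < 10^-6 are taken forward to stage 2."""
    return p_value < cutoff


stage2_threshold = bonferroni_threshold(19)  # 19 lead SNPs taken forward

gckr_stage1_p = 1.0e-9  # rs780093, stage 1 (quoted in the text)
gckr_stage2_p = 4.8e-6  # rs780093, stage 2 (quoted in the text)

gckr_forwarded = passes_stage1(gckr_stage1_p)
gckr_stage2_significant = gckr_stage2_p < stage2_threshold
```

Under these assumptions the stage 2 threshold is 0.05/19, roughly 2.6×10⁻³, so the stage 2 p-value for rs780093 passes by a wide margin.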
Among all SNPs in linkage disequilibrium with one another (see Methods) for which results were available from at least 50% of the studies, the SNP with the smallest P-value was selected as the lead SNP for the respective locus and trait. Of note, the combined stage 1 and stage 2 GWAS meta-analysis did not reveal any additional loci with a P-value < 10⁻⁸. However, if a SNP in high LD with a lead SNP had a smaller P-value than that lead SNP after this combined analysis, it was selected as the new lead SNP of the respective locus.
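The lead SNP selection rule can be sketched as follows. The sketch assumes that each SNP has already been assigned to a locus on the basis of LD (that step is described in the Methods and is not reproduced here); the record keys `snp`, `locus`, `p`, and `n_studies_with_data` are hypothetical names chosen for illustration.

```python
def select_lead_snps(results, n_studies, min_fraction=0.5):
    """Pick the SNP with the smallest p-value per locus.

    `results` is a list of dicts with the hypothetical keys
    'snp', 'locus', 'p', and 'n_studies_with_data'.
    SNPs with results from fewer than `min_fraction` of the
    studies are ignored.
    """
    leads = {}
    for row in results:
        # Require results from at least 50% of the studies by default.
        if row["n_studies_with_data"] < min_fraction * n_studies:
            continue
        best = leads.get(row["locus"])
        if best is None or row["p"] < best["p"]:
            leads[row["locus"]] = row
    return leads
```

Running the same function on the combined stage 1 and stage 2 results would also cover the case in which a SNP in high LD with the earlier lead SNP overtakes it.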

Assessment of Independent Signals
The analysis of secondary signals in the NUBP2 locus was performed using the software GCTA (Yang et al. 2011) with the genotypes of the SHIP cohort, and was verified by a second analysis using the genotypes of the NHS/HPFS cohorts. The GC-corrected and QC-filtered meta-analysis results and a condition list containing the lead SNPs of the final loci were used as input for the conditional analysis. An additional hit was declared if the conditional P-value was below the genome-wide significance threshold. This SNP was then added to the condition list, and the conditional analysis was repeated until no additional significant independent association was found.
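The iterative procedure can be outlined in code. In this sketch, `conditional_pvalues` is a hypothetical callable standing in for one run of the conditional analysis (for example, one GCTA run); it takes the current condition list and returns conditional p-values per SNP. Taking the SNP with the smallest conditional p-value at each step is our assumption, as is the function name; no GCTA options are implied.

```python
def stepwise_conditional(lead_snps, conditional_pvalues, threshold=5e-8):
    """Iteratively collect secondary signals.

    `conditional_pvalues(condition_list)` is a hypothetical callable
    returning a dict {snp_id: conditional p-value} for one run of the
    conditional analysis.
    """
    condition_list = list(lead_snps)
    secondary = []
    while True:
        pvals = conditional_pvalues(condition_list)
        # SNPs already conditioned on cannot be new hits.
        candidates = {s: p for s, p in pvals.items() if s not in condition_list}
        if not candidates:
            break
        best_snp = min(candidates, key=candidates.get)
        # Stop once nothing is genome-wide significant after conditioning.
        if candidates[best_snp] >= threshold:
            break
        secondary.append(best_snp)
        condition_list.append(best_snp)
    return secondary
```

The loop ends as soon as a run yields no conditional P-value below the threshold, which mirrors the stopping rule described above.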

Gene Expression and eQTL Analyses
For each lead SNP of the loci that were significant after the final stage, significant cis-eQTL associations in whole blood were looked up in the publicly available association result database (Westra et al. 2013). We also looked up cis-eQTL associations for lead SNPs in "MuTHER" (http://www.muther.ac.uk/) (Grundberg et al. 2012), in which lymphocytes (LCLs and, in some cases, fresh lymphocytes), subcutaneous fat, and muscle and skin biopsies were obtained from ~856 twins (1/3 monozygotic, 2/3 dizygotic) from the well-characterised TwinsUK BioResource.
Association analysis of whole blood gene expression data with serum IGF-I and IGFBP-3 measurements was conducted in 986 samples of the SHIP-TREND cohort. The gene expression levels were obtained and normalized as described previously (Schurmann et al. 2012). Gene expression levels were used as the outcome in a linear regression model with the serum IGF-I or IGFBP-3 values as the independent variable, adjusted for sex, age, red and white blood cell counts, RNA integrity number, sample storage time, and RNA amplification and labeling batch (as a factor). For sensitivity analyses, BMI was included as an additional covariate in the model. The SHIP-TREND expression dataset is available in the GEO (Gene Expression Omnibus) public repository under accession GSE36382.
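The structure of the per-transcript regression can be illustrated with a small ordinary least squares sketch. It is not the software used for the analysis: it uses only the Python standard library, solves the normal equations directly, and returns coefficients without standard errors or p-values. The function names and the covariate layout are assumptions; batch would additionally need to be dummy-coded as a factor.

```python
def ols(X, y):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    n, k = len(X), len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
        + [sum(X[r][i] * y[r] for r in range(n))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [a - f * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]


def fit_expression_model(expression, igf, covariates):
    """Regress expression on IGF-I (or IGFBP-3) plus covariates.

    Returns [intercept, igf_effect, covariate effects...].
    `covariates` holds one tuple per sample, e.g. (sex, age, ...).
    """
    X = [[1.0, x] + list(c) for x, c in zip(igf, covariates)]
    return ols(X, expression)
```

The second returned coefficient is the estimated change in expression per unit of IGF-I (or IGFBP-3) at fixed covariate values; the BMI sensitivity analysis corresponds to appending BMI to each covariate tuple.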

Association with Plasma Protein Levels
The complexity of the plasma proteome presents an analytical challenge. Therefore, we subjected the plasma samples to immunoaffinity subtraction using the Multiple Affinity
Bone mineral density: publicly available data of the GEFOS Consortium (http://www.gefos.org/?q=content/data-release) (Estrada et al. 2012).
Type 2 diabetes: publicly available data of the DIAGRAM Consortium (http://diagramconsortium.org/downloads.html) (Morris et al. 2012).
Diabetes-related traits: publicly available data of the MAGIC Consortium (http://www.magicinvestigators.org/downloads/) (Saxena et al. 2010; Soranzo et al. 2010).
Coronary artery disease: publicly available data of the CARDIoGRAMplusC4D Consortium (http://www.cardiogramplusc4d.org/downloads/) (Coronary Artery Disease Genetics 2011; Schunkert et al. 2011; Consortium et al. 2013).
Survival beyond age 90 years: results were obtained from a GWAS of longevity provided by Linda Broer et al. (Broer et al. 2014).

Supplementary Figure 1: Manhattan Plots of Men and Women Strata
Manhattan plots of the combined stage 1 and 2 meta-analysis results of IGF-I and IGFBP-3 traits in men and women strata. SNPs are plotted on the x-axis according to their position on each chromosome with the -log10 association p-value on the y-axis. The upper solid horizontal line indicates the threshold for genome-wide significance. Known hits are colored in orange, new findings in blue. Plots are truncated on the y-axis to 20.

Supplementary Figure 2: QQ Plots of Meta-Analysis Results
Quantile-quantile plots of the combined stage 1 and 2 meta-analysis results of the IGF-I and IGFBP-3 traits across all strata. The observed p-values are plotted on the y-axis against the expected p-values under no association on the x-axis.
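The coordinates of such a plot can be derived as in the following sketch. Two points are assumptions rather than details taken from the legend: the expected p-values under no association use the plotting positions (i − 0.5)/n, and both axes are drawn on the -log10 scale, as is conventional for GWAS results. The function name is ours.

```python
import math


def qq_points(p_values):
    """Return (expected, observed) pairs on the -log10 scale.

    The i-th smallest observed p-value is paired with the expected
    quantile (i - 0.5) / n of a uniform distribution.
    """
    n = len(p_values)
    observed = sorted(p_values)
    expected = [(i + 0.5) / n for i in range(n)]
    return [(-math.log10(e), -math.log10(o)) for e, o in zip(expected, observed)]
```

Points that follow the diagonal indicate agreement with the null distribution, whereas systematic upward departure across the whole range suggests inflation.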

Supplementary Figure 3: Regional Association Plots for IGF-I Traits
Regional association plots of all genome-wide significantly associated SNPs with IGF-I as outcome after final stage. The -log10 association p-values of all SNPs in a 500 kb vicinity are shown on the y-axis. The color corresponds to the correlation with the lead SNP based on the HapMap II CEU data. Plots are ordered by association p-value of the lead SNP.

Supplementary Figure 4: Regional Association Plots for IGFBP-3 Traits
Regional association plots of all genome-wide significantly associated SNPs with IGFBP-3 as outcome after final stage. The -log10 association p-values of all SNPs in a 500 kb vicinity are shown on the y-axis. The color corresponds to the correlation with the lead SNP based on the HapMap II CEU data. Plots are ordered by association p-value of the lead SNP.

Supplementary Figure 5: Results of Bivariate Analysis of IGF-I and IGFBP-3
Manhattan plot (upper panel) and quantile-quantile plot (lower panel) of the genome-wide bivariate analysis on IGF-I and IGFBP-3. In the Manhattan plot, SNPs are plotted on the x-axis according to their position on each chromosome with the -log10 association p-value on the y-axis. The upper solid horizontal line indicates the threshold for genome-wide significance. Genome-wide significant loci are colored in blue. The plot is truncated on the y-axis to 20. The quantile-quantile plot shows the observed p-values on the y-axis against the expected p-values under no association on the x-axis.

Supplementary Figure 6: Flow chart of the study's design.
Loci significantly associated with circulating IGF-I and IGFBP-3 concentrations at each analysis stage.
"-": effect allele associated with lower IGF-I levels.
Results for 13 independent SNPs, defined based on LD (settings: r² > 0.01, 1 Mb distance), are indicated by bold text.