Ischemic Stroke Is Associated with the ABO Locus: The EuroCLOT Study

Objective: End-stage coagulation and the structure/function of fibrin are implicated in the pathogenesis of ischemic stroke. We explored whether genetic variants associated with end-stage coagulation in healthy volunteers account for the genetic predisposition to ischemic stroke and examined their influence on stroke subtype.
Methods: Common genetic variants identified through genome-wide association studies of coagulation factors and fibrin structure/function in healthy twins (n = 2,100, Stage 1) were examined in ischemic stroke (n = 4,200 cases) using 2 independent samples of European ancestry (Stage 2). A third clinical collection having stroke subtyping (total 8,900 cases, 55,000 controls) was used for replication (Stage 3).
Results: Stage 1 identified 524 single nucleotide polymorphisms (SNPs) from 23 linkage disequilibrium blocks having significant association (p < 5 × 10⁻⁸) with 1 or more coagulation/fibrin phenotypes. The most striking associations included SNP rs5985 with factor XIII activity (p = 2.6 × 10⁻¹⁸⁶), rs10665 with FVII (p = 2.4 × 10⁻⁴⁷), and rs505922 in the ABO gene with both von Willebrand factor (p = 4.7 × 10⁻⁵⁷) and factor VIII (p = 1.2 × 10⁻³⁶). In Stage 2, the 23 independent SNPs were examined in stroke cases/noncases using MOnica Risk, Genetics, Archiving and Monograph (MORGAM) and Wellcome Trust Case Control Consortium 2 collections. SNP rs505922 was nominally associated with ischemic stroke (odds ratio = 0.94, 95% confidence interval = 0.88–0.99, p = 0.023). Independent replication in MetaStroke confirmed the rs505922 association with stroke, beta (standard error, SE) = 0.066 (0.02), p = 0.001, a finding specific to large-vessel and cardioembolic stroke (p = 0.001 and p < 0.001, respectively) but not seen with small-vessel stroke (p = 0.811).
Interpretation: ABO gene variants are associated with large-vessel and cardioembolic stroke but not small-vessel disease. This work sheds light on the different pathogenic mechanisms underpinning stroke subtype.
Ann Neurol 2013

Ischemic stroke is among the leading causes of death and disability in high-income countries. 1 EuroCLOT is a European Union-funded multicenter study established to identify the genetic variants contributing to end-stage coagulation, as a means of exploring whether the same variants contribute to risk of ischemic stroke. It is known that genetic factors account for approximately 60% of the risk of thrombosis, 2 and studies have demonstrated the influence of genetic factors on the individual components of coagulation and fibrinolysis. Furthermore, ex vivo measures of fibrin structure and fibrinolysis have been shown to be heritable. 3 The structure and function of fibrin have been shown to influence clot behavior, and earlier work by the EuroCLOT consortium has demonstrated heritability of fibrin clot phenotypes measured by a high-throughput turbidimetric assay, as well as several regions of linkage. 4 The goal of this study was to extend these observations by using the genome-wide association (GWA) approach to identify common genetic loci associated with coagulation phenotypes and to determine whether associated loci were further associated with the clinically important phenotype ischemic stroke and its different subtypes. GWA studies have identified common genetic loci of small effect associated with clinical phenotypes such as coronary artery disease. 5 The GWA method allows an agnostic study of variation within the genome, unbiased by prior knowledge of the cellular pathways involved or the use of candidate genes, and has been successful in finding hundreds of gene loci to date. 6 The overall aim was to determine whether genetic variants associated with coagulation and fibrin structure/function were risk factors for ischemic stroke and, if so, whether such associations differed between stroke subtypes.

Subjects and Methods
We used a 3-stage study design to identify common variants influencing coagulation and fibrin structure/function in the normal population and then tested genome-wide significant independent single nucleotide polymorphisms (SNPs) for association with stroke in subjects of Northern European extraction (Fig 1).
To study the broad range of hemostatic variables contributing to end-stage coagulation, GWA studies of fibrin structure/function ex vivo, fibrin turnover (D-dimer) in vivo, and individual hemostatic components were performed in a healthy volunteer cohort of twins (Stage 1). In Stage 2, those variants found to be independently associated with coagulation or fibrin structure/function were assessed as risk factors for ischemic stroke in cases and controls. In Stage 3, the top 4 SNPs from the meta-analysis of ischemic stroke were examined for replication in a third clinical collection of stroke having information on whether the stroke resulted from large-vessel disease, small-vessel disease, or cardiac emboli. Detailed methods are provided below. Written informed consent was obtained from participants in the study, and each individual study group obtained local ethics approval.

Phenotyping the Cohorts
TwinsUK. The subjects were obtained from the TwinsUK (TUK) registry (www.twinsuk.ac.uk) at King's College London, United Kingdom, which has been ascertained by a national media campaign. 7 For historical reasons, the majority of twin volunteers are female. TUK subjects have been shown to be representative of the wider general population for genetic and lifestyle factors associated with a variety of traits. 8 TUK subjects were phenotyped for fibrin structure/function, D-dimer, and hemostatic factors, according to methods described in detail elsewhere. [9][10][11][12] In brief, fibrin structure/function was assessed using a turbidimetric assay, whereas D-dimer (as a measure of in vivo fibrin turnover), coagulation factors (F) VII, VIII, FXII, FXIII A and B subunits (FXIIIA, FXIIIB), prothrombin, and von Willebrand Factor (vWF) were quantified by enzyme-linked immunosorbent assay, and fibrinogen, FVII, and FXIII by functional activity assays.
The MOnica Risk, Genetics, Archiving and Monograph (MORGAM) Cohort. The cohorts of the MORGAM project consist of the respondents of representative adult population samples. 13 This study includes cohorts from a variety of centers, including Finland (FINRISK, ATBC), France (Lille, Strasbourg, Toulouse), Italy (Brianza), Northern Sweden, and Northern Ireland (Belfast) as described at http://www.ktl.fi/publications/morgam/cohorts. The participants were examined and DNA was collected at baseline, and they were followed up for stroke and acute coronary events. Genotyping was carried out in a case-cohort setting. 14 In MORGAM cohorts, the end-point used was the subject presenting with first ischemic stroke. For some events the diagnosis was based on validation, and for some on the clinical or death certificate diagnosis (International Classification of Diseases [ICD]-9 codes 433 or 434, or ICD-10 code I63).
Wellcome Trust Case Control Consortium 2. The Wellcome Trust Case Control Consortium 2 (WTCCC2) ischemic stroke study comprises ischemic stroke cases recruited from 3 centers in the United Kingdom (St George's London, Oxford, and Edinburgh) and 1 center in Munich, Germany. In all cases, ischemic stroke was defined as a focal neurological deficit lasting >24 hours; in 1 cohort (St George's), cases of transient ischemic attack with associated recent brain infarction were also included. Cerebral infarction was confirmed on brain imaging with computed tomography (CT) or magnetic resonance (MR) imaging, which was performed in 100% of cases, and extensive phenotyping was performed to allow stroke subtyping using a modified TOAST classification. 15 Full details of the populations and investigations performed have been previously published. 16 Imaging of the cerebral arteries using carotid and vertebral duplex ultrasound and/or MR angiography or CT angiography was performed in >95%, echocardiography in 59.7%. Controls for the UK cases were the shared WTCCC2 controls drawn from the National Blood Service or the 1958 Birth Cohort Study (http://www.b58cgene.sgul.ac.uk). German controls were from the population-based KORAgen study (http://www.helmholtz-muenchen.de/en/kora-en/kora-homepage/index.html). This study group was used primarily in Stage 2 but also for subgroup analysis in Stage 3.
MetaStroke. MetaStroke is a project of the International Stroke Genetics Consortium and comprises ischemic stroke cases whose DNA has been collected and undergone GWA scan, recruited from centers in Europe ( Ischemic stroke was defined clinically as a focal neurological deficit lasting >24 hours. In almost all case-control studies, a high level of brain imaging and extensive phenotyping was performed, although this was less detailed in some of the prospective studies. In those studies with adequate investigations to allow stroke subtyping, this was performed using a modified TOAST classification. 15 Controls were collected by the individual groups.

Genotyping and Within-Cohort Analysis
TUK. Genotyping was performed in 3 different genotypic batches using Human Hap 300 k Duo and Human Hap610 Quad array (Illumina, San Diego, CA). Genotyping results from the different arrays were collated and quality control was performed as described previously, 17 including retention of those SNPs with sufficiently high genotyping rates (95% or above) and Hardy-Weinberg equilibrium (p > 0.0001). Imputation of nongenotyped SNPs was performed to HapMap2 Caucasian population haplotypes using IMPUTE version 2. 18 Population substructure and admixture was excluded in TUK using Eigenvector analysis.
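The SNP-level filters described above (genotyping rate of 95% or above, Hardy-Weinberg equilibrium p > 0.0001) can be illustrated with a minimal sketch. The text does not state which Hardy-Weinberg test was used, so the chi-square goodness-of-fit test with 1 degree of freedom is an assumption here, and the function names `hwe_chi2_pvalue` and `passes_qc` are hypothetical.

```python
import math


def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """Chi-square (1 df) goodness-of-fit p-value for Hardy-Weinberg equilibrium.

    Assumption: a chi-square test is used; the study may have used an exact test.
    """
    n = n_aa + n_ab + n_bb
    if n == 0:
        raise ValueError("no genotypes observed")
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p
    if p == 0.0 or q == 0.0:
        return 1.0  # monomorphic SNP: no departure can be tested
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper tail of a chi-square distribution with 1 degree of freedom
    return math.erfc(math.sqrt(chi2 / 2.0))


def passes_qc(n_aa, n_ab, n_bb, n_samples, min_call_rate=0.95, min_hwe_p=1e-4):
    """Apply the two SNP-level thresholds quoted in the text."""
    call_rate = (n_aa + n_ab + n_bb) / n_samples
    return call_rate >= min_call_rate and hwe_chi2_pvalue(n_aa, n_ab, n_bb) > min_hwe_p
```

For example, `passes_qc(360, 480, 160, 1000)` describes a fully genotyped SNP whose counts match Hardy-Weinberg proportions, whereas a SNP with no heterozygotes at all, such as `passes_qc(500, 0, 500, 1000)`, would be excluded.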
MORGAM. Four SNPs (rs10665, rs2022309, rs5985, and rs651007) were genotyped at the National Institute for Health and Welfare in Finland. Several sample- and plate-specific quality control measures were implemented to minimize errors, and in addition genotyping quality was assessed from 5% blind duplicate samples in each 96-well plate. For 234 samples with low DNA yield, DNA was amplified and genotyped as previously described. 19 Genotyping was performed using the MassARRAY System and iPLEX Gold chemistry (Sequenom, San Diego, CA) with standard protocol. Genotype clusters were manually reviewed using Typer 4.0 software (Sequenom), and genotype calls were corrected where necessary. Genotyping success rate was >95% for all but 1 SNP (rs2022309, 91.3%), with an average success rate of 95.7%. No discrepancies were identified among a total of 1,256 successful blind duplicate genotype pairs. Cox regression analysis adapted for the case-cohort data was used to assess the association between the genotypes and ischemic stroke in the MORGAM cohorts, assuming an additive genotypic effect. The analysis was stratified by cohort and sex.
WTCCC2. Stroke cases were genotyped using the Illumina 660Q platform. Shared WTCCC2 controls were genotyped using the Illumina 1M Duo platform. German controls were genotyped using the Illumina 550 platform. Analysis of the UK and German cohorts was performed independently using PLINK 20 after quality control using a genotyping call rate threshold of 98%, a Hardy-Weinberg equilibrium p-value threshold of 1e-20, and checks for individual relatedness and population stratification. The UK and German cases were then meta-analyzed using METAL. 21 Samples were identified and removed if their genome-wide patterns of diversity differed from those of the collection at large, as such differences were interpreted as likely to be due to biases or artifacts. To do so, we used a Bayesian clustering approach to infer outlying individuals on the basis of call rate, heterozygosity, ancestry, and average probe intensity. We used a hidden Markov model to infer identity by descent along the genome and removed individuals iteratively to obtain a set with pair-wise identity by descent <5%. Samples were also removed if their inferred gender was discordant with the recorded gender or if <90% of the SNPs typed by Sequenom (Sequenom iPLEX assay for 4 gender SNPs) were concordant with the genome-wide data. For the EuroCLOT study, individual UK and German cohort and meta-analysis results were examined for the 23 available genotypes. This was performed for the phenotype of all ischemic stroke, together with the ischemic stroke subtypes of small-vessel disease, large-vessel disease, and cardioembolic stroke.
MetaStroke. Genotyping of the 13 MetaStroke contributors was performed independently by each group, using either Illumina or Affymetrix (Santa Clara, CA) platforms. Further details on cases and controls, genotyping, and imputation are available in Supplementary Table 3.

Statistical Analysis
Stage 1. We used multiple linear regression models to assess association between genotypes and phenotypes, using age as a covariate. The phenotypes examined in the TUK cohort were inverse-normal transformed to satisfy the assumption of normality of trait distribution of the linear models. Association analysis was carried out using Merlin 22 to control for family structure within the dataset. Independence of the effects conferred by SNPs in the same region was assessed by means of a backward stepwise regression analysis on the trait with which they were associated. This yielded 23 statistically independent significant SNPs (p < 5 × 10⁻⁸), associated with at least 1 quantitative outcome, which were taken forward for examination in the clinical groups at Stage 2. This stage of the analysis was performed using Stata for Windows version 10 (StataCorp, College Station, TX) with adjustment for the twins' relatedness.
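As an illustration of the inverse-normal transformation mentioned above, the following is a minimal sketch of a rank-based version. The text does not specify the exact variant used, so the rank-based approach, the Blom offset (c = 3/8), and the function name `inverse_normal_transform` are assumptions made for illustration only.

```python
from statistics import NormalDist


def inverse_normal_transform(values, c=0.375):
    """Rank-based inverse-normal transform (assumed Blom offset c = 3/8).

    Each value is replaced by the standard normal quantile of its
    offset-adjusted rank; tied values share the average of their ranks.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of values tied with values[order[i]]
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    std_normal = NormalDist()
    return [std_normal.inv_cdf((r - c) / (n - 2.0 * c + 1.0)) for r in ranks]


# Hypothetical right-skewed trait values, for illustration only
trait = [3.1, 0.2, 15.0, 0.9, 2.2]
transformed = inverse_normal_transform(trait)
```

The transformed values keep the original ordering of the subjects but follow an approximately normal distribution, which is what the linear models assume for the trait.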
Stage 2. The 23 independent SNPs remaining significant after multiple regression were carried through to investigation of association with ischemic stroke in MORGAM and WTCCC2. Results for each were meta-analyzed using a fixed effects inverse variance weighting implemented in METAL. 21
Stage 3. The 4 most significantly associated SNPs from Stage 2 were tested for association with overall ischemic stroke in MetaStroke. This international collaboration brings together GWA studies in ischemic stroke and (depending on SNP) includes 8,900 cases of ischemic stroke and 55,000 controls. In addition, subgroup analysis was possible (in MetaStroke and WTCCC2), as stroke events had been subphenotyped into large-vessel, small-vessel, and cardioembolic stroke by many of the contributing study groups, using the TOAST classification. 15 Within MetaStroke, samples were excluded from analysis if they had call rates <80% or if reported gender was discordant with gender-specific markers. We removed pairs of samples showing concordance indicative of being duplicates. MetaStroke genotyping results were imputed to HapMap2 using MACH2. Where SNPs were imputed, r² values were >0.9. Four SNPs analyzed in these cohorts were meta-analyzed using a fixed effects model with the metan module in Stata version 10.

Results
The characteristics of the 2,128 twin participants are shown for TUK in Table 1 and Supplementary Table 1 (Stage 1). The mean age of the twins was 50.4 years, and the sample included 87 (4.4%) males. All were of North European descent. The sample size varied between assays; for clarity, the number of subjects is included in the tables for each phenotype. Details of the clinical collections of stroke cases and controls are shown in Table 2.

Stage 1
There were a number of strikingly strong genotype-phenotype associations identified in the TUK discovery group, and in total 524 associations were found having p < 5 × 10⁻⁸. The 524 SNPs identified as significant genome-wide were mostly associated with coagulation factor phenotypes; there was 1 association with lag time to fibrin clot formation. After the interdependence of the SNPs had been established by backward stepwise regression analysis, 23 statistically independent SNPs were identified for examination in Stage 2 (shown in Table 3). The strongest signals were observed for SNP rs5985 in the F13A1 gene (encoding the FXIII A subunit) and FXIII activity (p = 2.6 × 10⁻¹⁸⁶), followed by rs2731672 in the F12 gene (encoding FXII) associated with FXII concentration (p = 1.3 × 10⁻¹¹⁵; Supplementary Table 2) and rs505922 in the ABO gene with vWF (p = 4.7 × 10⁻⁵⁷; see Table 3) and factor VIII (p = 1.2 × 10⁻³⁶; see Supplementary Table 2). Further coagulation-related phenotype-SNP associations were identified for rs10665 in F7/MCF2L and FVII clotting activity (p = 2.4 × 10⁻⁴⁷), and rs2022309 in the F3 gene (encoding tissue factor) with D-dimer concentration (p = 4.3 × 10⁻⁸). A clear relationship was found between plasma FXIII A subunit and SNP rs12137359 (p = 1.0 × 10⁻²⁷) lying within the gene ZBTB41 (zinc finger and BTB domain containing 41, a highly conserved gene). However, this region on chromosome 1q is rich with candidate genes, and the SNP in question lies downstream of the CFH and CFHR1-5 genes (encoding complement factor H and CFH-related proteins 1 to 5) as well as F13B (encoding FXIII B subunit). There is also an association in this same region between rs800292 in the CFH gene and FXIIIA concentration (p = 1.5 × 10⁻¹²).
Note to Table 3: Of the 524 genome-wide significant associations identified in Stage 1, only independent SNPs are shown, and where associated with multiple traits, the most significant result is given (all Stage 1 results are listed in Supplementary Table 2). The effect size (Effect) and SE are expressed in terms of standard deviation for each phenotype. The probabilities for association (p) are from multivariate models using single SNP genotypes as independent variables and age as covariate.

Stage 2
In the MORGAM study, 6 of the 23 independent SNPs were available for lookup. None of the SNPs was significantly associated with ischemic stroke in this study group or in WTCCC2, although there was a suggestion of an effect for rs505922 in both MORGAM (T allele, beta = −0.126, p = 0.067) and WTCCC2 (T allele, beta = −0.054, p = 0.097). In the meta-analysis of WTCCC2 and MORGAM, SNP rs505922 in the ABO gene was associated with ischemic stroke (beta for T allele = −0.067, p = 0.023), with the major T allele being protective against stroke (Table 4).
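The fixed effects inverse variance weighting used to combine the two Stage 2 collections can be sketched as follows for rs505922. This is an illustration rather than the METAL implementation: the standard errors are not reported in the text, so they are back-calculated here from each beta and its two-sided p value under the assumption of a normal (Wald) test, and the function names are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def se_from_beta_and_p(beta, p_two_sided):
    """Back-calculate a standard error, assuming a two-sided Wald test."""
    z = _STD_NORMAL.inv_cdf(1.0 - p_two_sided / 2.0)
    return abs(beta) / z


def fixed_effects_meta(estimates):
    """Inverse-variance-weighted pooling of (beta, se) pairs.

    Returns the pooled beta, its standard error, and a two-sided p value.
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total
    se = math.sqrt(1.0 / total)
    p = 2.0 * (1.0 - _STD_NORMAL.cdf(abs(beta / se)))
    return beta, se, p


# Per-collection betas and p values for the rs505922 T allele, as quoted in the text
morgam = (-0.126, se_from_beta_and_p(-0.126, 0.067))
wtccc2 = (-0.054, se_from_beta_and_p(-0.054, 0.097))

beta, se, p = fixed_effects_meta([morgam, wtccc2])
odds_ratio = math.exp(beta)           # odds ratio per T allele
ci_low = math.exp(beta - 1.96 * se)   # approximate 95% confidence limits
ci_high = math.exp(beta + 1.96 * se)
```

Under these assumptions the pooled estimate should land close to the reported values (beta of about −0.067, p of about 0.023, odds ratio of about 0.94), which also shows how two individually nonsignificant results can combine into a nominally significant one when their effects point in the same direction.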
Finally, to determine whether the genetic influence was acting through known risk factors, we performed subgroup analysis in the sample having this information, WTCCC2-Munich.
Adjusting for hypertension,

Discussion
Ischemic stroke accounts for considerable morbidity and mortality in Western countries, and treatment is limited at present. Our 3-stage study design optimized power for discovery of common genetic variants predisposing to ischemic stroke and stroke subtype. We performed a GWA study of intermediate coagulation and fibrinolytic phenotypes in healthy volunteers to examine the genetic determinants of end-stage coagulation and went on to study their influence on stroke and stroke subtype. We identified a large number of genetic variants associated with measures of coagulation factors, both functional and antigenic, some of which have been included in GWA meta-analyses of coagulation. 23 We confirmed that polymorphisms in the ABO gene were significantly associated with vWF and FVIII levels in healthy volunteers. Significant associations between SNPs in ABO and levels of vWF (rs505922, rs643434, rs8176743) and/or FVIII (rs505922, rs651007) were identified; we went on to demonstrate significant associations between ABO SNPs, in particular rs505922, and ischemic stroke (see forest plot in Fig 2). The associations between FVIII levels and the ABO gene variant rs505922, and between ABO and coronary disease, suggest a possible mechanism behind the well-documented association between the ABO blood group and risk of vascular disorders. Non-O blood groups are at increased risk of stroke, 24 peripheral vascular disease, and myocardial infarction (MI) but not coronary artery disease (as assessed by angina, summarized by Wu et al 25 ), and this suggests that end-stage coagulation is the critical determinant. The association we found with FVIII levels may account for this. Recent GWA studies of MI have identified variants within the ABO gene that predispose to MI, 26,27 and this relationship appears to hold for common forms of thrombotic stroke; we found evidence of association in large-vessel and cardioembolic stroke, but there was no association with small-vessel disease.
At present, none of the SNPs significantly associated with stroke is reported to be associated with known risk factors such as hypertension, hyperlipidemia, diabetes, or propensity to drink alcohol or smoke. Subgroup analysis of the study group having risk factor information (WTCCC2-Munich) attenuated the strength of the association but did not suggest that the action of the genetic variation was predominantly through 1 of these risk factors. SNP rs505922 represents a single base pair change from T to C at position 135,139,050 and lies within the first intron of the ABO gene, although its haplotype block contains the promoter and introns 1 and 2. The minor allele frequency of this SNP is 36% in Northern Europeans. The ABO gene encodes a glycosyltransferase enzyme that catalyses the transfer of different carbohydrate groups onto the H antigen, thus forming A and B antigens of the ABO system. In support of a functional role in thrombosis (as opposed to atherosclerosis), the non-O blood group has also been shown to be a risk factor for venous thrombosis, 28 and in a large prospective study, pulmonary embolism. 29 A previous GWA study identified the same SNP, rs505922, to be associated with venous thromboembolism, 30 and a recent GWA study of blood metabolites suggests that this locus may act via an effect on fibrinogen phosphorylation. 31 Our results demonstrate that the association between ABO SNPs and ischemic stroke is limited to large-artery and cardioembolic stroke, but absent in small-vessel stroke. Thromboembolism plays an important role in pathogenesis of both cardioembolic and large-artery stroke, with thrombus arising in the heart and on larger-artery atherosclerotic plaques, respectively, which may break off and embolize into the cerebral circulation. In both stroke subtypes, cerebral emboli can be detected in the cerebral circulation using transcranial Doppler, 32 and antithromboembolic therapy reduces stroke risk.
Recently, vWF inhibition has been shown to reduce cerebral thromboembolism in man, 33 a clinical observation that is in keeping with our findings. In contrast, the pathogenesis of small-artery stroke is unclear, and the role of thrombosis remains uncertain. 34 Our results suggest that thrombosis may be less important for this stroke subtype and explain why antithromboembolic medication is less effective. The subtype specificity we have identified is consistent with others' results; of 5 GWA studies identified and replicated, 2 have been studies of cardioembolic stroke, 35,36 2 of large-vessel stroke, 37 and 1 of small-vessel stroke. 38 Taken together, these data highlight that the clinical endpoint of ischemic stroke represents a varied phenotype likely resulting from multiple pathogenic mechanisms.
Other associations between SNPs and intermediate phenotypes included rs12137359 and FXIII activity and rs800292 and FXIIIA subunit levels. Both variants are found close to the gene encoding the FXIIIB subunit, which acts as a carrier protein for FXIIIA in the circulation and stabilizes FXIIIA to regulate activation; however, these SNPs were not associated with MI or ischemic stroke. We also identified associations between SNPs in the vicinity of the F7 gene and FVII:C, consistent with a number of studies that have previously identified relationships between variation in the structural genes for FVII and circulating levels. 39,40 No other SNPs significantly associated with coagulation intermediate phenotypes were significantly associated with ischemic stroke.
There are a number of limitations to this work. First, TUK is predominantly female in its composition, for historical reasons. Although TUK subjects are representative of the general population variation 8 and there is no evidence of an effect of gender on the ABO predisposition to cardiovascular disease, the associations identified in Stage 1 are pertinent to females from Northern Europe. Second, the clinical studies used for Stage 2 were heterogeneous in many respects. We decided that it was of overriding importance to obtain a large sample, so we combined prospective and cross-sectional studies. One of the main strengths of the study design was the use of multiple novel intermediate phenotypes, as well as having the power to investigate stroke subtypes. The Stage 3 study groups had differing methods of genotyping and imputation, but methods have been shown to be broadly comparable. 41
In conclusion, using end-stage coagulation intermediate traits in healthy volunteers, we identified 23 genome-wide independent coagulation-associated SNPs, which were investigated in a number of clinical collections of stroke. Genetic variant rs505922 in the ABO locus was found to be associated with ischemic stroke, and in particular the subtypes large-vessel and cardioembolic stroke, but not small-vessel disease. This SNP was highly associated with vWF and FVIII in the discovery phase, and this observation throws light on possible mechanisms underlying end-stage coagulation in cardiovascular disease. It seems that common genetic variants exert some of their influence on end-stage stroke through coagulation, and further work is needed to tease apart these complex networks of interactions. The identification of the ABO locus through its association with vWF and FVIII points the way for mechanistic work to understand better the role of these 2 coagulation factors in end-stage arterial thrombosis.

Acknowledgment
None of the funding bodies given below played any role in the design, writing, or decision to publish this article.
TwinsUK The Heart and Vascular Health study research was supported by NIH National Heart, Lung, and Blood Institute (NHLBI) grants R01 HL085251 and R01 HL073410.
The Atherosclerosis Risk in Communities Study was carried out as a collaborative study supported by NHLBI contracts (HHSN268201100005C, HHSN268201100006C, HHSN268201100007C, HHSN268201100008C, HHSN268201100009C, HHSN268201100010C, HHSN268201100011C, and HHSN268201100012C) and grants (R01HL087641, R01HL59367, and R01HL086694); National Human Genome Research Institute contract U01HG004402; and NIH contract HHSN268200625226C. Infrastructure was partly supported by grant number UL1RR025005, a component of the NIH and NIH Roadmap for Medical Research. Atherosclerosis Risk in Communities analyses performed as part of the MetaStroke project were supported by grant HL093029 to M.Fa. The authors thank the staff and participants of the Atherosclerosis Risk in Communities study for their important contributions.