Lessons from genome-wide association studies in venous thrombosis
Pierre-Emmanuel Morange, Laboratory of Haematology, CHU Timone, 246, rue Saint-Pierre, 13385 Marseille Cedex 05, France.
Tel.: +33 4 91 38 60 49; fax: +33 4 91 94 23 32.
Summary. Since the first genome-wide association study (GWAS), conducted on age-related macular degeneration in 2005, hundreds of studies have applied this strategy to identify novel genetic loci associated with hundreds of human diseases and related quantitative risk factors. As the field begins to shift from GWAS towards next-generation sequencing, it is important to illustrate how genetic research in venous thrombosis has benefited from the GWAS paradigm.
The research landscape in the field of human disease genetics has been completely revolutionised over the last decade. The revolution started with the completion of the Human Genome and HapMap projects, the former cataloguing all common single nucleotide polymorphisms (SNPs) of the genome, i.e. those with minor allele frequency (MAF) > 0.05, and the latter the extent of their linkage disequilibrium (LD) [1,2], together with the concomitant development of DNA chips allowing the genotyping of hundreds of thousands of SNPs covering the whole genome. Genetic research then entered the era of genome-wide association studies (GWAS), which consist of testing the association of a very large number of SNPs with a phenotype in large samples, through agnostic searches that do not rely on prior biological hypotheses. This strategy has been impressively successful, with about 2000 new loci (end of 2010) found associated with human diseases and their quantitative risk factors (http://www.genome.gov/gwastudies/), and successes were observed for both common complex and rare familial forms of human diseases. Venous thrombosis (VT), including deep vein thrombosis (DVT) and pulmonary embolism (PE), was no exception, and several novel susceptibility genes for VT have recently been identified by studies adopting GWAS-based research strategies. This report reviews these recent findings and summarises potential research paths that might be explored.
Known genetic risk factors for VT before entering the GWAS era
Until the end of the 20th century, classical approaches for dissecting the genetic component of human diseases were association and linkage studies. Compared to the number of studies performed, successes in VT were relatively modest despite some notable discoveries.
Well-established susceptibility genes for VT are SERPINC1, PROC, PROS1, F5, F2, ABO and FGG (Table 1). The first three code for the natural coagulation inhibitors antithrombin, protein C (PC) and protein S (PS), respectively, and harbour multiple private mutations (i.e. present in < 0.001 of the population) responsible for deficiencies in their associated proteins. The association of these deficiencies with VT risk dates back more than 25 years [5–7]. These deficiencies are relatively rare, affecting < 1% of the general population, and increase VT risk approximately 10-fold in heterozygous carriers.
Table 1. Known susceptibility genes for VT before the GWAS era
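| Gene | Variant | Alleles | Risk allele frequency | OR for VT | Associated phenotype |
| ABO | | [O,A2] vs. [A1,B] | 0.30 | 1.50 | ↑ FVIII, VWF |
| F5 | rs6025 | G/A | 0.05 | 3.00 | Resistance to activated protein C |
| FGG | rs2066865 | C/T | 0.25 | 1.47 | ↓ Fibrinogen γ’ |
| PROC | Multiple private mutations | | | ∼ 10 | Protein C deficiency |
| PROS1 | Multiple private mutations | | | ∼ 10 | Protein S deficiency |
| SERPINC1 | Multiple private mutations | | | ∼ 10 | Antithrombin deficiency |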
Conversely, the known genetic contribution of the latter four genes, F5, F2, ABO and FGG, to VT susceptibility could be attributable to more frequent SNPs. For F5, the culprit polymorphism is rs6025, known as FV Leiden. This arginine to glutamine substitution at amino acid 506 of the protein (R506Q) leads to resistance to activated protein C. The Q506 allele, with a frequency of approximately 5% in Caucasians, is associated with an approximately 3-fold increased risk in heterozygous carriers. A slightly lower risk (approximately 2.5-fold) is associated with the rs1799963-A allele of the F2 gene, which has a frequency of approximately 2% in the general population. This polymorphism, generally referred to as F2 G20210A, is located in the 3′UTR of the gene and is associated with elevated plasma prothrombin (FII) levels. The higher FII concentrations arise through more efficient processing of the 20210A-containing pre-mRNA [11,12]. More frequent (approximately 30%) are the ABO blood groups A1 and B, which are associated with increased risk of VT. However, the associated odds ratio (OR) for VT is even smaller, ranging between 1.5 and 2. This association is partially explained by higher concentrations of the coagulation factors von Willebrand factor and factor VIII, presumably because of decreased clearance, but the exact mechanisms relating ABO to VT are not completely characterised (see below). The last SNP that was robustly associated with VT prior to the GWAS revolution is FGG rs2066865, located in the 3′UTR of the gene. The rs2066865 risk allele, with a frequency of approximately 25%, is associated with an increased OR of approximately 1.4 [14,15]. Several arguments suggest that, by affecting a polyadenylation site, this SNP modulates the levels of fibrinogen γ’, a spliced isoform of fibrinogen γ, and thereby the risk of VT [16,17].
Even though no recent work has been performed to correctly estimate the percentage of idiopathic VT (i.e. without any known environmental origin) that is explained by these genetic factors, it is widely accepted that additional genetic factors are still to be identified and the emergence of the GWAS strategy brought great expectations for identifying them. These genes could also contribute to the incomplete penetrance of known genetic variants and the widespread clinical heterogeneity of the disease.
Novel susceptibility genes for VT discovered through GWAS approaches
Contribution from association studies on VT
The first work related to VT that rode the wave of GWAS was that of Bezemer et al., who launched a large-scale association study on DVT concentrating on approximately 200 000 SNPs, mainly non-synonymous with minor allele frequency (MAF) > 5%, located within known genes. Using a multi-stage strategy relying on both pooled and individual DNA analyses performed in the LETS and MEGA studies, two novel susceptibility loci were identified:
GP6: In this work, the rs1613662-G allele was found associated with a decreased OR for DVT of 0.87. This association was further observed in three independent samples of French origin and in an American population. Rs1613662 is an A/G substitution located in exon 4 of the GP6 gene. It results in a serine to proline substitution at amino acid 219 and identifies two common isoforms, GP6a and GP6b, with frequencies of 0.85 and 0.15, respectively. GP6 encodes the receptor glycoprotein (GP) VI, which has a major role in collagen-induced platelet signalling. Very recently, the rs1613662-G allele was found associated with reduced platelet function, including decreased cross-linked collagen-related peptide (CRP-XL)-induced P-selectin expression and CRP-XL-induced platelet aggregation.
F11: Bezemer et al. also observed that the C allele of CYP4V2 rs13146272 (A/C) was less frequent in DVT patients than in controls (0.31 vs. 0.36), an association further confirmed in both French and American populations [15,19]. It was tempting to hypothesise that this polymorphism could be functional as it results in a glutamine to lysine (Q259K) substitution. However, CYP4V2 maps close to the F11 gene coding for FXI, an obvious candidate for VT susceptibility, and further work showed that the association of rs13146272 with DVT was due to its linkage disequilibrium with two F11 SNPs, rs2289252 and rs2036914, which act additively to influence DVT risk through a modulation of FXI levels. The allelic ORs for DVT were approximately 1.35 for both F11 SNPs.
The last result of this pioneering work was the identification of a common DVT-associated SNP, rs2227589, in the SERPINC1 gene discussed above. Even though this association was not observed in all cohorts that tried to replicate the finding [15,19], the risk allele has been associated with mildly reduced anticoagulant activity. SERPINC1 is not a novel locus, but this finding suggests that, in addition to rare mutations, this gene could also harbour common susceptibility alleles.
A second large-scale association study on VT risk has since been conducted. Because of the lack of power related to the use of stringent statistical thresholds and moderate sample sizes (only 418 VT cases compared to 1228 healthy subjects, all of French origin), no new susceptibility locus for VT was detected. In this work, where about 300 000 SNPs with MAF > 0.05 spread over the whole genome were studied, the strongest associations were observed at two known loci, the F5 and ABO genes. However, a multi-stage strategy was then applied in which all SNPs that did not reach genome-wide statistical significance (approximately P < 10⁻⁷) but that were nevertheless quite highly significant (P < 10⁻⁵) in the original GWAS were investigated in different replication cohorts (MARTHA, FARIVE and MEGA). This resulted in the identification of one new susceptibility locus on chromosome 6p24.1. More precisely, the rs169713-C allele was found more frequently in this collection of approximately 6000 VT cases than in approximately 7000 healthy individuals (0.24 vs. 0.21) and was associated with an increased OR for VT of 1.20. Rs169713 lies approximately 100 kb downstream of the HIVEP1 gene, a gene about which little is known except that it belongs to a family of genes participating in the transcriptional regulation of a variety of inflammatory genes. Functional studies are on-going to characterise the role of this genomic region in the susceptibility to VT.
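The two figures quoted for this study can be checked with simple arithmetic. The sketch below, in Python, derives a crude allelic OR from the reported allele frequencies and a Bonferroni-type threshold for about 300 000 tests. The function names are illustrative, and the use of a plain Bonferroni correction at a family-wise level of 0.05 is an assumption, since the review only states the threshold approximately.

```python
def allelic_odds_ratio(freq_cases: float, freq_controls: float) -> float:
    """Crude odds ratio for an allele, from its frequency in cases and controls."""
    odds_cases = freq_cases / (1.0 - freq_cases)
    odds_controls = freq_controls / (1.0 - freq_controls)
    return odds_cases / odds_controls


def bonferroni_threshold(n_tests: int, alpha: float = 0.05) -> float:
    """Per-test P-value threshold keeping the family-wise error rate at alpha."""
    return alpha / n_tests


# rs169713-C: frequency 0.24 in VT cases vs. 0.21 in controls (values from the text)
or_rs169713 = allelic_odds_ratio(0.24, 0.21)

# About 300 000 SNPs were tested in the French GWAS; alpha = 0.05 is assumed
threshold = bonferroni_threshold(300_000)

print(f"crude allelic OR = {or_rs169713:.2f}")
print(f"Bonferroni threshold = {threshold:.1e}")
```

The crude OR comes out close to the reported value of 1.20 (the published estimate presumably comes from a combined analysis of the cohorts), and the threshold is of the same order as the approximate 10⁻⁷ quoted above.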
Contribution from association studies on quantitative intermediate phenotypes
It has been known for a long time that investigating the genetic determinants of quantitative biomarkers suspected to be involved in the physiopathology of a disease could aid in identification of disease-associated genes. The underlying hypothesis is that polymorphisms associated with increased (decreased) levels of the quantitative risk factor should be associated with increased (decreased) risk of the disease. The application of such a strategy to results obtained from GWAS on biological phenotypes considered as biomarkers for VT also successfully contributed to the identification of novel susceptibility loci for the disease.
Contribution from studies on activated partial thromboplastin time (aPTT)
Shortened aPTT has been shown to be a reliable predictor of VT. Three new SNPs were reported to influence aPTT levels in a GWAS carried out on a sample of 1477 healthy individuals. These three SNPs, rs27431672 (F12), rs9898 (HRG) and rs710446 (KNG1), were then tested for association with VT in a case–control study of 1542 VT patients and 1110 healthy individuals, all of French origin. Among the three tested SNPs, only rs710446 was associated with VT risk, and the association was also observed in an independent French sample of 590 controls and 596 patients. The rs710446-C allele, originally found in the GWAS to be associated with decreased aPTT levels, was associated with increased risk of VT (OR approximately 1.20), an effect compatible with that expected from the known relation between aPTT and VT. Rs710446 refers to the non-synonymous Ile581Thr variant of the KNG1 gene encoding high molecular weight kininogen (HK), another obvious candidate for VT physiopathology. HK plays an important role in blood coagulation by positioning prekallikrein and FXI near factor XII. In addition, KNG1 knockout mice demonstrate prolonged aPTT and delayed arterial thrombosis. Moreover, antibodies against mouse FXI that directly interfere with the FXI-HK interaction prevented arterial occlusion induced by FeCl3 to a similar degree as complete FXI deficiency. Research is on-going to test whether the observed association with VT could be mediated by FXI levels.
Contribution from studies on PS related phenotypes
Given the importance of PS in the susceptibility to VT (see above), any SNP contributing to the plasma variability of PS or its cofactors could be a good candidate for VT risk. PS circulates in plasma either in a free form or complexed with the C4b-binding protein (C4BP), a heterodimeric molecule existing as three isoforms, α7β1, α6β1 and α7β0. A GWAS was conducted in 352 individuals as part of the GAIT study to identify SNPs that could influence plasma levels of PS or C4BP isoforms. None of the approximately 280 000 SNPs tested was strongly associated with PS levels, but three, in strong LD with each other, were found associated with plasma levels of α7β0. One of these SNPs, rs3813948, was further found to explain 11% of C4BPA monocyte mRNA expression in the Gutenberg Heart Study. The rs3813948-C allele, associated with increased levels of both α7β0 and C4BPA expression, was further found associated with increased risk of VT in the MARTHA and FARIVE studies, gathering 1706 VT cases and 1739 controls, with a corresponding OR for VT of approximately 1.20. Interestingly, this SNP was not associated with PS levels (free or total) or with C4BP expression, designating the circulating form of C4BP that is unable to bind PS as a new marker involved in VT susceptibility and strengthening the previously noted possibility that C4BP can be independently active in the coagulation pathway. C4BP belongs to the complement inactivator proteins, which are thought to be involved in immune response and inflammation, and this could favour a role of inflammation in VT-related mechanisms.
Contribution from studies on von Willebrand factor (VWF)
In a GWAS project carried out in approximately 23 000 participants on behalf of the CHARGE consortium, eight genes stood out that may regulate VWF plasma levels, elevated levels of which are known to be a risk factor for arterial and venous thrombosis. Two were genes already known to be associated with VWF levels, ABO and VWF. The association of the former with VT risk was discussed above, while that of the latter had not been previously observed. The VWF polymorphism showing the strongest association with VWF levels, rs1063856, a non-synonymous Thr789Ala substitution, was further investigated in two case–control studies for VT, MEGA and HVH. In both studies, gathering 5123 VT cases and 5569 controls, the rs1063856-C allele that had been found associated with increased VWF levels was associated with an increased OR for VT of approximately 1.15. Rs1063856 is located in VWF exon 18 at a site encoding the D’ domain, which is involved in binding of FVIII. It could be hypothesised that the amino acid change at position 789 increases the efficiency with which the VWF molecule transports or releases FVIII into the circulation, thereby increasing the risk of VT.
The six other genes identified by the CHARGE consortium and unsuspected from previous work were STXBP5 (rs9390459), SCARA5 (rs2726953), STAB2 (rs4981022), STX2 (rs7978987), TC2N (rs10133762) and CLEC4M (rs868875). In the MEGA and HVH studies, rs1039084, serving as a proxy for STXBP5 rs9390459, was further found associated with VT, the rs1039084-G allele being associated with decreased risk of the disease (OR approximately 0.90). A similar trend was observed in the MARTHA study for the rs9390459-A allele (OR approximately 0.93), but it did not reach statistical significance. This lack of significance could be due to the under-powered nature of the MARTHA study, which relied on a sample of 1150 VT cases and 801 controls. Conversely, TC2N rs1884841, serving as a proxy for rs10133762, was associated with VT in the French GWAS sample, MARTHA and FARIVE, while no association was observed in the HVH study. In the three French collections, gathering 2163 VT patients and 2617 controls, the rs1884841-T allele, with a frequency of approximately 0.44, was associated with an increased OR for VT of 1.27, this allele corresponding to that associated with increased VWF levels according to the CHARGE results. TC2N codes for the tandem C2 domains nuclear protein, whose relation with VT remains to be understood.
No association with VT was observed in either HVH or MARTHA for STX2 rs7978987, CLEC4M rs868875, SCARA5 rs2726953 and STAB2 rs4981022. With regard to STAB2, it is noteworthy that the gene is located within a linkage signal detected in a pedigree analysis on VWF levels and that another SNP, rs1593812, lying in the promoter region of the gene but not in LD with rs4981022, showed some suggestive evidence for association with VT in the French GWAS study. In-depth investigation of the genetic variability at the STAB2 locus would be warranted.
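The hypothesis underlying this section, that an allele shifting a biomarker in the risk-increasing direction should itself increase VT risk, can be expressed as a simple sign check. The Python sketch below applies it to three findings reported above; the data structure and function names are illustrative assumptions, and only the directions and ORs are taken from the text.

```python
# Each entry: (allele, biomarker,
#              sign of the allele's effect on the biomarker,
#              sign of the biomarker's relation to VT risk,
#              OR for VT reported in this review)
FINDINGS = [
    ("KNG1 rs710446-C", "aPTT", -1, -1, 1.20),  # shorter aPTT, shortened aPTT predicts VT
    ("VWF rs1063856-C", "VWF", +1, +1, 1.15),   # higher VWF, elevated VWF is a risk factor
    ("TC2N rs1884841-T", "VWF", +1, +1, 1.27),
]


def expected_direction(allele_effect: int, biomarker_risk: int) -> int:
    """+1 if the allele is expected to raise VT risk, -1 if expected to lower it."""
    return allele_effect * biomarker_risk


def is_concordant(allele_effect: int, biomarker_risk: int, odds_ratio: float) -> bool:
    """True when the observed OR lies on the expected side of 1."""
    observed = 1 if odds_ratio > 1.0 else -1
    return observed == expected_direction(allele_effect, biomarker_risk)


concordance = {
    allele: is_concordant(effect, risk, odds_ratio)
    for allele, _biomarker, effect, risk, odds_ratio in FINDINGS
}
for allele, ok in concordance.items():
    print(f"{allele}: {'concordant' if ok else 'discordant'}")
```

A concordant sign supports, but does not prove, that the biomarker mediates the association; the SNP could act on VT through another pathway.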
Additional information on VT-linked mechanisms brought by GWAS approaches
Other GWAS have been performed on quantitative traits related to the pathophysiology of VT, but the SNPs they identified have not yet been thoroughly investigated in relation to the risk of VT. However, their identification may provide new insights into the complex pathways involved in the biological mechanisms of the disease and may generate new hypotheses to be further assessed. A few examples are discussed below.
In addition to PC deficiency, which is discussed above and is responsible for some familial VT cases, low PC levels have also been shown to be associated with risk of VT in the general population. A GWAS on PC concentrations was performed in the ARIC Study, composed of approximately 8000 individuals, and identified five independent loci associated with plasma levels of PC. The strongest association was observed at the PROCR locus encoding the endothelial protein C receptor (EPCR). In this study, the PROCR polymorphism showing the strongest association with PC levels was rs867186, which explained about 10% of their variability. This association was consistent with a previous report in which rs867186 was significantly associated with plasma PC in a sample of 336 participants. Interestingly, rs867186 was also associated with plasma FVII antigen and activity in two other studies [36,44]. Rs867186 results in a serine to glycine substitution at amino acid 219, located in the transmembrane domain of the EPCR protein. The Gly219 allele renders the receptor more sensitive to cleavage and also leads to a truncated mRNA through alternative splicing [45,46]. The Ser219Gly substitution was shown to explain 75% of the variability of plasma soluble EPCR, increased levels of which were found associated with increased risk of VT [48,49]. However, the direct association of this variant with VT risk is still debated. The additional PC-associated loci identified in the ARIC study were EDEM2, BAZ1B, PROC and GCKR, and their genetic variability in relation to VT deserves further investigation. The GCKR gene has very recently been identified in a GWAS project on more than 80 000 subjects as a novel locus influencing C-reactive protein levels. This association adds further arguments in favour of a tight link between inflammation and coagulation.
Plasminogen activator inhibitor-1 (PAI-1) and mean platelet volume (MPV) are two additional phenotypes suspected to play a modest role in venous thrombosis that have been studied by GWAS. Even if common SNPs in SERPINE1, the main locus controlling PAI-1 levels, hardly contribute to VT susceptibility, one cannot exclude that SNPs in other loci involved in PAI-1 regulation and identified in GWAS, such as HABP2, could be associated with the disease. Similarly, ARHGEF3, TAOK1 and WDR66, recently identified as loci associated with MPV [55,56], could be good candidates whose genetic variability deserves to be further assessed in relation to VT risk.
New mechanistic information can also be obtained from GWAS, not by identifying a novel susceptibility locus for a given phenotype, but by identifying a locus shared by several GWAS. The GCKR locus mentioned above is an example, and ABO is another important one. In addition to its association with VT risk and VWF levels outlined above, this locus emerged from GWAS on lipids, inflammatory markers, type 2 diabetes and coronary atherosclerosis. ABO(H) blood group carbohydrate structures are expressed on a wide variety of human tissues, including platelets and vascular endothelium. However, in spite of these association studies, the exact mechanisms linking all these phenotypes are still obscure.
What to do now with findings from GWAS?
All novel SNPs discussed in this report have MAF > 0.05 and confer a modest increase in the risk of VT, with corresponding ORs ranging between 1.10 and 1.35 (Table 2). Some of these SNPs could be functional and others may only be markers in LD with functional variant(s) that are still to be identified. Even though no recent family studies have been performed with these SNPs, it is unlikely that the risk alleles identified so far explain a major proportion of the familial risk of VT and of its clinical variability, as observed for most other human diseases investigated through GWAS. Risk factors with this level of relative risk generally do a poor job of distinguishing people who will develop a common disease from those who will not. Nevertheless, in the context of arterial thrombosis, there have been some attempts to develop genetic scores based on common SNPs identified by GWAS for predicting the occurrence of the disease, and it would be interesting, at least from an intellectual standpoint, to assess the validity of a similar score derived from the SNPs listed in Table 2.
Table 2. New variants associated with VT identified by GWAS approaches
| Gene | SNP | Alleles | Risk allele frequency | OR for VT | Associated phenotype or pathway |
| GP6 | rs1613662 | A/G | 0.82 | 1.15 | Platelet activation and aggregation |
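A genetic score of the kind mentioned above is commonly built as a sum of risk-allele counts weighted by the logarithm of each per-allele OR. The Python sketch below illustrates the idea with approximate ORs quoted in this review. It assumes that the loci act independently and multiplicatively on the odds scale, which has not been established for VT, and the individual genotype shown is hypothetical.

```python
import math

# Approximate per-allele ORs quoted in this review. Treating them as
# independent and multiplicative (additive in log-odds) is an assumption
# of this sketch, not a result reported by the studies.
PER_ALLELE_OR = {
    "GP6 rs1613662-A": 1.15,
    "HIVEP1 rs169713-C": 1.20,
    "KNG1 rs710446-C": 1.20,
    "C4BPA rs3813948-C": 1.20,
    "VWF rs1063856-C": 1.15,
    "TC2N rs1884841-T": 1.27,
}


def log_odds_score(risk_allele_counts: dict) -> float:
    """Sum of risk-allele counts (0, 1 or 2 per SNP) weighted by ln(OR)."""
    score = 0.0
    for snp, count in risk_allele_counts.items():
        if count not in (0, 1, 2):
            raise ValueError(f"{snp}: allele count must be 0, 1 or 2")
        score += count * math.log(PER_ALLELE_OR[snp])
    return score


# Hypothetical individual: one HIVEP1 risk allele, two KNG1 risk alleles
genotype = {"HIVEP1 rs169713-C": 1, "KNG1 rs710446-C": 2, "TC2N rs1884841-T": 0}
score = log_odds_score(genotype)

# Odds of VT relative to someone carrying no risk allele at the scored loci
relative_odds = math.exp(score)
print(f"log-odds score = {score:.3f}, relative odds = {relative_odds:.2f}")
```

With weights this small, even carriers of several risk alleles differ only modestly from non-carriers, which is consistent with the limited discriminative value noted above.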
It is also very important to emphasise that the additional common VT-associated SNPs that are yet to be identified will likely have smaller genetic effects and smaller MAF than those detected thus far. As a consequence, joint efforts combining much larger GWAS will be required to identify them. To illustrate this point, more than 140 000 individuals had to be studied to robustly identify ten novel SNPs associated with an OR for coronary artery disease of approximately 1.12. We are far from reaching such a sample size in the field of venous thrombosis, which stresses the need for collaborative consortia similar to those set up for arterial thrombosis.
We have also highlighted that a gene harbouring a rare mutation responsible for strong familial VT cases could carry common SNPs with a mild effect on VT. The inverse could also apply. All novel VT-associated loci discussed here could carry as yet unidentified mutations associated with stronger genetic effects, making these loci good candidates for deep sequencing of their genomic regions. The recent sequencing of the F9 gene illustrates the value of such a deep-sequencing strategy, as it permitted the identification of a rare mutation (R338L) responsible for high levels of FIX activity and familial cases of early-onset VT.
All elements collected so far on the genetic susceptibility to VT strongly confirm, if need be, that VT is a complex disease in which both multiple common SNPs with modest effects and rare variants with stronger impact are involved. However, deep sequencing is not a panacea, and there is a panel of alternative strategies that could be pursued to accurately disentangle the genetic architecture underlying susceptibility to VT. These include the study of copy-number variations, mRNA and miRNA expression, DNA methylation and histone modifications. A recent review on this topic could provide initial reading for those interested in these themes.
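The dependence of sample size on effect size can be illustrated with a standard normal-approximation formula for comparing allele frequencies between cases and controls. The Python sketch below is a rough illustration only: the control allele frequency of 0.25, the significance level of 10⁻⁷, the 80% power and the equal numbers of cases and controls are all assumptions, and the calculation concerns a single SNP rather than the discovery of ten loci.

```python
import math
from statistics import NormalDist


def cases_needed(odds_ratio: float, control_freq: float,
                 alpha: float = 1e-7, power: float = 0.80) -> int:
    """Approximate number of cases (with as many controls) needed to detect
    an allelic OR with a two-sided two-proportion z-test on allele counts."""
    # Allele frequency in cases implied by the OR and the control frequency
    odds_controls = control_freq / (1.0 - control_freq)
    odds_cases = odds_ratio * odds_controls
    case_freq = odds_cases / (1.0 + odds_cases)

    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    z_beta = NormalDist().inv_cdf(power)

    variance = case_freq * (1.0 - case_freq) + control_freq * (1.0 - control_freq)
    alleles_per_group = (z_alpha + z_beta) ** 2 * variance / (case_freq - control_freq) ** 2
    return math.ceil(alleles_per_group / 2.0)  # each individual carries two alleles


# Assumed control allele frequency of 0.25 for both scenarios
n_small_effect = cases_needed(1.12, 0.25)
n_larger_effect = cases_needed(1.35, 0.25)
print(f"OR 1.12: about {n_small_effect} cases and as many controls")
print(f"OR 1.35: about {n_larger_effect} cases and as many controls")
```

Under these assumptions the required sample grows several-fold as the OR falls from 1.35 to 1.12, and it grows further for rarer alleles, which is the reason larger consortia are needed.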
Disclosure of Conflict of Interests
The authors state that they have no conflicts of interest.