Anders Hamsten, Atherosclerosis Research Unit, Center for Molecular Medicine, Building L8:03, Karolinska University Hospital Solna, S-171 76 Stockholm, Sweden. (fax: +46 8 311298; e-mail: email@example.com).
The genetic basis of coronary artery disease (CAD) is complex, and the fact that an alarmingly high proportion of reported associations between genetic variants and CAD are not replicated has generated uncertainty as to whether molecular genetics is ever going to deliver on the promises made in the late 1990s. However, during 2007, the first generation of large-scale genome-wide association studies using high-density, single nucleotide polymorphism genotyping arrays has revealed genetic variants that are robustly associated with CAD and CAD-related traits such as type 2 diabetes and obesity. In particular, a robust susceptibility locus for CAD has been identified on chromosome 9p21. Also, evidence has been obtained that multiple rare alleles with fairly strong phenotypic effects may contribute to the genetic heritability of CAD, in addition to common variants with a modest impact on risk. Furthermore, new mechanistic connections have been discovered between different common complex diseases including CAD. This review focuses on the challenges and recent advances of molecular genetics in dissecting the molecular pathophysiology of atherothrombosis and defining novel targets for treatment.
Thousands of novel genes and millions of gene variants that might influence human health have been revealed in the past several years. The human genome is estimated to contain more than 10 million nucleotide positions that show common variation between individuals in a population. Atherosclerosis and thrombosis, the two primary processes leading to clinically manifest coronary artery disease (CAD), involve several different cells, organs and distinct pathophysiological processes. It therefore comes as no surprise that the genetic basis of CAD is complex, but a paucity of reproducible molecular genetic findings across studies and populations has generated uncertainty as to whether contemporary molecular genetics is ever going to fulfill the promises made in the late 1990s. However, tempered by slow progress and initial failures, CAD genetics is now gathering momentum. During 2007, the first generation of genome-wide association studies using high-density, single nucleotide polymorphism (SNP) genotyping arrays has reported several genetic variants that are robustly associated with CAD [5–8] and CAD-related traits such as type 2 diabetes and obesity [9–13], and other approaches including comparative genomics and mapping of quantitative traits in rodent models have delivered interesting findings in the past few years. Nevertheless, it can be rightly claimed that molecular genetics has not yet revealed gene variants that are useful in the everyday management of persons at risk of or already having manifest CAD, and that the optimistic forecasts of the future of DNA testing have not materialized. Against this background, we will review the challenges and recent advances of the molecular genetic approaches in understanding and predicting CAD.
Our conclusion is that prediction and prognostication are not the primary targets of contemporary molecular genetics, which instead should be regarded as a powerful toolbox for dissecting the molecular pathophysiology of the atherothrombotic diseases and defining novel targets for treatment. Whether multilocus genotyping will ever be applicable to risk assessment, in our view, is not a key issue.
Genetic basis of susceptibility to CAD
Family history of atherothrombotic manifestations is associated with a substantially increased risk of CAD [14, 15], and the genetic heritability of CAD is estimated to be in the range of 40–60%. Currently known risk factors for CAD do not appear to account for all the variance in risk of CAD, and the genetic effect persists even when corrected for risk factors known to have a genetic component (such as plasma lipoproteins and blood pressure) [14, 17, 18]. The likelihood is thus high that hitherto unknown susceptibility genes exist that, once identified, may implicate novel disease pathways and potential targets for preventative therapy.
The vast majority of instances of CAD are thought to be multifactorial and result from the effects of many genes, each with a relatively small effect, as well as from multifaceted interactions between heritable and environmental factors. However, several Mendelian disorders caused by rare mutations with a high impact on risk also contribute to CAD, the best-known examples being familial forms of hypercholesterolemia. Most sequence variation studied so far is in the form of common variants [minor allele frequency (MAF) >0.01], predominantly SNPs, and the common disease/common variant (CDCV) hypothesis retains support as a model for the genetic architecture of CAD. Under this model, susceptibility to CAD results from the joint actions of multiple common variants, a significant proportion of which are shared by unrelated affected individuals. However, as will be discussed in greater detail later, there is evidence of multiple rare alleles with a large phenotypic effect, and the extreme alternative to the CDCV hypothesis is the multiple rare-variant hypothesis, also referred to as the genetic heterogeneity model. The latter holds that disease susceptibility arises from distinct genetic variants with low population frequencies (MAF < 0.01) in different individuals. For the time being, it seems reasonable to assume that the allelic spectrum of CAD comprises a complex mixture of allele frequencies and that hundreds of common and rare variants account for the genetic heritability of CAD.
An alarmingly high proportion of reported associations between genetic variants and complex diseases including CAD are not replicated [24, 25]. Many circumstances are likely to contribute to this fact (reviewed in [26–29]). In the first place, delineating the genetics of a common and multifactorial disease like CAD is a formidable task because susceptibility is heterogeneous, involving multiple genetic and environmental factors, genes of modest individual effects, and gene-gene and gene-environment interactions. However, whilst acknowledging the challenges confronting replication, it should equally be realized that frequent deviations from the features characterizing a good association study [28, 30, 31] have contributed substantially to the problems of obtaining robust replication of initial association findings. Lack of statistical power along with bias (confounding, population substructure or stratification and measurement error) and phenotypic heterogeneity probably account for a significant proportion of failures in replication. In fact, the majority of genetic association studies have probably been underpowered. On the other hand, the general biomedical standards of statistical proof are not appropriate for handling the multiple-testing problem of genome-wide association studies, and consensus on an alternative has not yet been reached. In most camps, however, the Bayesian paradigm, particularly the use of empirical Bayes methods [32, 33], or calculation of the false discovery rate is considered preferable to classical post hoc Bonferroni correction for multiple testing, which does not take proper account of the correlation between SNPs that are in linkage disequilibrium (LD). Undetected population stratification, which leads to spurious associations when allele frequencies vary across subpopulations, is a major concern in case–control studies and requires genotyping of large random panels of SNPs for assessment and correction.
Population-specific LD and conceptual as well as computational differences in the way haplotypes are inferred from unphased genotype data also contribute to the lack of reproducibility afflicting case–control studies. Imprecisely defined and heterogeneous clinical phenotypes represent a significant problem. For example, coronary plaque erosion without plaque rupture precipitating thrombosis is more frequently seen in younger individuals and women than in middle-aged or elderly men, and these lesions produce less severe luminal stenosis and contain fewer inflammatory cells than do ruptured plaques.
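The contrast between Bonferroni correction and control of the false discovery rate can be illustrated with a minimal Python sketch. It uses the Benjamini-Hochberg step-up rule as one common way of controlling the false discovery rate; this choice, the function names and the P values are illustrative assumptions and are not taken from the cited studies. The Benjamini-Hochberg rule itself assumes independent or positively dependent tests, so it does not fully resolve the LD problem discussed above.

```python
def bonferroni(pvalues, alpha=0.05):
    """Indices of tests that remain significant after Bonferroni correction."""
    m = len(pvalues)
    return [i for i, p in enumerate(pvalues) if p <= alpha / m]


def benjamini_hochberg(pvalues, q=0.05):
    """Indices of tests declared significant by the Benjamini-Hochberg rule."""
    m = len(pvalues)
    # Rank the tests from smallest to largest P value.
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k for which p_(k) <= (k / m) * q.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * q:
            cutoff = rank
    # All tests up to and including that rank are declared significant.
    return sorted(order[:cutoff])


# Hypothetical P values for eight SNPs (illustration only).
pvals = [0.0001, 0.004, 0.012, 0.03, 0.2, 0.5, 0.7, 0.9]
print("Bonferroni:", bonferroni(pvals))
print("Benjamini-Hochberg:", benjamini_hochberg(pvals))
```

With these made-up values the false discovery rate procedure retains one more SNP than Bonferroni correction, which is the less conservative behaviour that motivates its use in large scans.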
For several reasons, gene-gene and gene-environment interactions remain poorly characterized in relation to CAD. Epistasis designates an interaction between genes that renders the phenotypic effect of a specific allele dependent on which alleles are present at other loci. The progressive nature of atherosclerosis, in which each stage depends on the previous one, strongly indicates that gene-gene interactions are important and may explain some inconsistencies in genotype-CAD associations. On the other hand, environmental influences that may differ widely between populations are likely to induce epigenetic changes through effects on DNA methylation that affect transcriptional activity. Examples of gene-environment interactions indicated to influence risk of CAD include interactions between smoking and apolipoprotein E ε2/ε3/ε4 genotype [36, 37] and between moderate alcohol consumption and alcohol dehydrogenase 3 γ1/γ2 genotype.
The completion of the human genome sequence and the subsequent identification of its haplotype structure [40, 41], along with rapid advances in large-scale genotyping and gene expression technologies, have greatly facilitated the discovery of susceptibility genes for all complex diseases. The main practical objective of the International HapMap project was to identify SNPs that, because of their relationship to the haplotype structure, would allow more efficient genotyping. In 2005, a public catalogue of more than 3.9 million validated SNPs existed, and information on the LD between them was at hand. This and other database resources, which are continuously updated, enable selection of a subset of ‘tagging’ markers as proxies for the entire validated SNP set, taking the discontinuous ‘block-like’ structure of LD into account [44, 45]. Today, two systems exist for high-density genome-wide genotyping: the GeneChip system [46, 47] and the BeadArray system [48, 49]. The latest generation of dense genotyping arrays of >500 000 SNPs, covering the majority of the human genome as well as specific regions that have proved to be overrepresented in complex diseases, is now available at increasingly affordable costs. Collectively, these advances have set the stage for genome-wide association studies of common diseases and complex traits.
The completion of the reference sequence of the human genome and the emergence of new technologies to detect genomic alterations have also demonstrated that large fragments of our genome can be deleted or duplicated, leading to changes in the copy number of genes and in gene regulation. This represents a substantial source of genomic variation [50–52], encompassing two to three times the number of nucleotides affected by SNPs and accounting for more than 15% of the assembled human genome sequence. However, copy number variations (CNVs) seem to be associated with a significantly smaller proportion of the total genetic variation in gene expression than SNPs. It now needs to be explored to what extent CNVs influence the risk of complex diseases such as CAD.
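The idea that one SNP can serve as a proxy for another rests on a pairwise measure of LD. A minimal Python sketch of the widely used r² statistic, computed from phased haplotype counts for two biallelic SNPs, is given below; the function name and the counts are hypothetical, and real tag-SNP selection works on far larger panels with haplotypes that usually have to be inferred.

```python
def r_squared(n_AB, n_Ab, n_aB, n_ab):
    """Pairwise LD (r^2) between two biallelic SNPs from haplotype counts.

    n_AB, n_Ab, n_aB and n_ab are counts of the four possible haplotypes
    formed by alleles A/a at the first SNP and B/b at the second.
    """
    total = n_AB + n_Ab + n_aB + n_ab
    p_AB = n_AB / total                 # frequency of the A-B haplotype
    p_A = (n_AB + n_Ab) / total         # frequency of allele A
    p_B = (n_AB + n_aB) / total         # frequency of allele B
    d = p_AB - p_A * p_B                # disequilibrium coefficient D
    denom = p_A * (1 - p_A) * p_B * (1 - p_B)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    return d * d / denom


# Hypothetical haplotype counts (illustration only).
print(r_squared(40, 10, 10, 40))   # partial LD
print(r_squared(50, 0, 0, 50))     # complete LD: one SNP fully tags the other
```

A SNP pair with r² close to 1 carries nearly redundant information, so genotyping one of the two is sufficient; the cut-off used to declare a SNP 'tagged' is a design choice and is not specified in the text.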
The candidate gene approach
Until recently, most progress in understanding the genetic basis of CAD has come from studies of candidate genes, i.e. genes whose functions are known and suggest a likely role in CAD. Accordingly, most of the genetic variations identified during the first two decades of molecular genetic research in the CAD area influence established risk factors, such as plasma concentrations of lipoprotein fractions, blood pressure, blood glucose control, haemostasis and matrix remodelling (see [55–57] for comprehensive reviews). However, comparatively few polymorphisms in biological candidate genes have been consistently replicated for CAD, not least because much of the early work was carried out on too small a scale. Accordingly, meta-analyses have been required to demonstrate robust statistical significance. Considering the many challenges confronting molecular genetic studies of complex diseases discussed above, this is really not surprising. Also, high-throughput genotyping allowing investigation of a large number of genes has so far only been conducted in a limited number of candidate gene association studies [58, 59].
Genome-wide linkage studies
Genome-wide approaches are applied to identify novel susceptibility genes without the restrictions of our current limited biological insights. In this sense, they constitute hypothesis-generating methods aimed at identifying chromosomal regions that might harbour genes influencing a certain trait. Before the recent emergence of high-density SNP genotyping arrays, linkage analysis, also referred to as positional cloning, using families with multiple affected individuals and dense maps of highly polymorphic markers covering the human genome was considered the most useful route to detect susceptibility loci for CAD and other complex diseases. However, as early as 1996 it was realized that the power to detect a given size of genetic effect and the resolution of location are lower in linkage studies than in association studies, and so study size and stringency of ascertainment criteria are critical. Most often, sibs with precocious CAD have been compared to identify locations at which they share significantly more alleles identical by descent than would be expected from straightforward Mendelian segregation and chance alone. The logarithm of the odds (LOD score) is routinely calculated for each marker to estimate the probability of linkage between the marker and a putative susceptibility locus, using criteria proposed by Lander and Kruglyak. Fine mapping of the regions of linkage using additional markers then follows, in combination with scrutiny of the corresponding gene maps to positionally identify the causal genes. However, success with this approach proved to be limited for CAD, at least partly because studies were too small to detect genes of modest effects, particularly when using a binary phenotype. Very few susceptibility genes have ultimately been identified on the basis of linkage studies, and even replication of linkage to specific chromosomal regions has been scarce.
The identification of ALOX5AP, the gene encoding the 5-lipoxygenase activating protein (FLAP), basically represents the one successful attempt in the CAD field in detecting a robust novel candidate from a linkage finding. Furthermore, amongst observed linkage regions on chromosomes 1, 2, 3, 13, 14, 16, 17 and X [63–70], only the one identified by the PROCARDIS consortium on chromosome 17 has been convincingly replicated, whereas the susceptibility gene contained in this chromosomal region remains to be discovered. One lesson from the genome-wide linkage studies thus is that there are no CAD loci with strong effects, a situation comparable with those of other complex diseases.
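The allele-sharing comparison in affected sib pairs can be sketched in Python as a simple 'mean test'. The sketch assumes that identity-by-descent (IBD) status is known without ambiguity for every pair at the marker, which is rarely true in practice, and it converts the resulting normal deviate to a LOD score with the standard relation LOD = Z²/(2 ln 10). The function name and counts are hypothetical, and the linkage programs actually used in the cited studies rely on likelihood-based multipoint methods rather than this simplification.

```python
import math


def sib_pair_mean_test(n_ibd0, n_ibd1, n_ibd2):
    """Mean allele-sharing test for affected sib pairs at one marker.

    n_ibd0, n_ibd1, n_ibd2: numbers of pairs sharing 0, 1 or 2 alleles IBD.
    Returns (mean proportion shared, Z statistic, approximate LOD score).
    """
    n = n_ibd0 + n_ibd1 + n_ibd2
    if n == 0:
        raise ValueError("at least one sib pair is required")
    # Proportion of alleles shared IBD: 0, 0.5 or 1 per pair.
    mean_sharing = (0.5 * n_ibd1 + 1.0 * n_ibd2) / n
    # Without linkage, sibs share 0/1/2 alleles with probability 1/4, 1/2, 1/4,
    # so the proportion shared has mean 1/2 and variance 1/8 per pair.
    z = (mean_sharing - 0.5) / math.sqrt(1.0 / (8.0 * n))
    # Only excess sharing counts as evidence for linkage (one-sided test).
    lod = z * z / (2.0 * math.log(10)) if z > 0 else 0.0
    return mean_sharing, z, lod


# Hypothetical counts for 100 affected sib pairs (illustration only).
mean_sharing, z, lod = sib_pair_mean_test(20, 50, 30)
print(round(mean_sharing, 3), round(z, 3), round(lod, 3))
```

Even a visible excess of sharing in 100 pairs yields a LOD score well below 1 in this example, which illustrates why small linkage studies struggle to detect loci of modest effect.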
Genome-wide association studies
Genome-wide association studies, which utilize large numbers of ‘tagging’ SNPs (markers selected on the basis of LD that capture most of the variation across the human genome), have now come to replace linkage studies for reasons of statistical power, a shift that has led to the recruitment of large samples. The challenge lies in attaining adequate power in the context of multiple hypothesis testing whilst minimizing the amount of genotyping required. Of note, to achieve convincing statistical support for disease association (statistical power of 80% and significance level of P < 10⁻⁶) for a susceptibility allele with a MAF of less than 0.1 and an effect corresponding to an odds ratio of less than 1.3, more than 10 000 cases and 10 000 controls would be required [23, 71]. However, calculations of sample size requirements based on single risk alleles and loci are overly conservative and biologically fairly uninteresting, considering the fact that multiple susceptibility alleles have been detected for most complex diseases. Current strategies include a two-stage discovery phase where a fairly relaxed threshold for ‘passing’ markers as positive is adopted in the evaluation of the initial screen. Markers that pass the threshold are then tested in an independent sample of larger size, which is seen as a replication study. Alternatively, the two stages are subjected to a joint analysis, an approach leading to an increase in power. It should be emphasized in this context that current strategies for analysis of genome-wide association studies tend to focus on a locus-by-locus approach, which fails to identify interactions between unlinked loci. However, analytical methods that target interactions between loci do exist; they add information to single-locus searches, are computationally feasible and seem to be more powerful than traditional interaction analyses.
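How sample size requirements grow as allele frequency and effect size shrink can be explored with a rough Python sketch. It uses the textbook normal approximation for comparing two proportions, applied to allele counts under a multiplicative (allelic) model with equal numbers of cases and controls; these modelling choices and the function names are assumptions, so the figures it produces need not coincide with the calculations cited above [23, 71].

```python
import math
from statistics import NormalDist


def case_allele_freq(control_freq, odds_ratio):
    """Risk-allele frequency in cases implied by an allelic odds ratio."""
    return odds_ratio * control_freq / (1 - control_freq + odds_ratio * control_freq)


def cases_needed(control_freq, odds_ratio, alpha=1e-6, power=0.80):
    """Approximate number of cases (with as many controls) for an allelic test."""
    p0 = control_freq
    p1 = case_allele_freq(p0, odds_ratio)
    p_bar = (p0 + p1) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # two-sided significance level
    z_beta = NormalDist().inv_cdf(power)
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p0 * (1 - p0) + p1 * (1 - p1))) ** 2
    alleles_per_group = numerator / (p1 - p0) ** 2
    # Each individual contributes two alleles.
    return math.ceil(alleles_per_group / 2)


# Illustrative scenarios only.
for maf, odds in [(0.10, 1.3), (0.10, 1.2), (0.05, 1.3), (0.05, 1.2)]:
    print(maf, odds, cases_needed(maf, odds))
```

The point of the sketch is the trend rather than the absolute numbers: the required sample size rises steeply as either the allele frequency or the odds ratio decreases.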
The first success in identifying a novel candidate gene for myocardial infarction (MI), the lymphotoxin-α (LTA) gene, using the genome-wide association study design was reported in 2002. The year 2007 has since seen numerous proofs of concept that large-scale, genome-wide association studies can identify novel susceptibility loci. Thus, in the past 12 months, systematic genome-wide searches have delivered a range of entirely novel susceptibility loci and some tentative candidate genes for CAD [5–8, Table 1] and other complex diseases such as type 2 diabetes and obesity [9–13]. The design of these studies is quite straightforward: the frequencies of a large number of SNPs genotyped on arrays are compared between cases and controls, and sites that show significant differences between groups are then validated in independent samples. The associated variants discovered so far are in some instances part of well-known pathways; in others, the indicated regions contain either genes of unknown function or no annotated genes. As expected, effects are quite modest, with odds ratios typically <1.5. However, the causal gene variants remain to be defined by re-sequencing and subsequent functional analyses. Accordingly, the strength of the genotype-phenotype associations may have been underestimated. Of note, strong associations between disease-associated SNPs and quantitative intermediate phenotypes such as established risk indicators have not been detected. This indicates that the genes harboured in the CAD susceptibility loci may operate through novel mechanisms. Also, considering the limited statistical power of the reported genome-wide scans, many more disease-causing loci remain to be discovered. However, success in this endeavour will require even larger studies. It should furthermore be emphasized that the value of the biological insight obtained through the identification of a novel locus is unlikely to depend on the strength of the genotype-phenotype association.
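The basic case-control comparison performed for each SNP can be sketched in Python as an allelic chi-square test on a 2 × 2 table of allele counts, together with the corresponding odds ratio. The counts and the function name are hypothetical, and the published scans used more elaborate procedures (genotype-based or trend tests, quality control and adjustment for stratification) that are not reproduced here.

```python
import math


def allelic_association(case_risk, case_other, control_risk, control_other):
    """Allelic odds ratio, chi-square statistic (1 df) and P value.

    Arguments are allele counts: risk and non-risk alleles in cases,
    then risk and non-risk alleles in controls.
    """
    a, b, c, d = case_risk, case_other, control_risk, control_other
    n = a + b + c + d
    odds_ratio = (a * d) / (b * c)
    # Pearson chi-square for a 2 x 2 table, without continuity correction.
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # Upper tail probability of a chi-square distribution with 1 df.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return odds_ratio, chi2, p_value


# Hypothetical allele counts from 500 cases and 500 controls (illustration only).
odds_ratio, chi2, p_value = allelic_association(300, 700, 250, 750)
print(round(odds_ratio, 3), round(chi2, 3), p_value)
```

An odds ratio of this modest size gives a nominally significant P value in a single test but falls far short of the thresholds required after correction for several hundred thousand SNPs, which is why independent validation samples are essential.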
Table 1. Susceptibility loci indicated in four genome-wide association studies of CAD
[Table body not recoverable from the source; the only surviving column heading is 'SNPs on array'.]
CAD, coronary artery disease; SNP, single nucleotide polymorphism.
a Regions showing moderate evidence of association.
The currently strongest and most robust susceptibility locus identified for CAD is located on chromosome 9p21 [5–8]. As the most tightly associated SNPs did not map to an annotated gene sequence, the neighbouring CDKN2A, CDKN2B and MTAP genes were initially suggested as candidate genes (Fig. 1). The same region has also been implicated in some [11–13] but not all [75, 76] genome-wide association studies of type 2 diabetes, albeit involving different SNP markers. The PROCARDIS consortium subsequently performed a thorough genetic analysis of this CAD/type 2 diabetes susceptibility locus on chromosome 9 to dissect the pattern of association in the context of European haplotype diversity and establish whether known risk factors, particularly diabetes, influence the susceptibility effect for CAD. No evidence of heterogeneity of susceptibility to CAD was observed across the four populations examined, and the strong consistent association with CAD was found to be accounted for by a pair of high and low risk haplotypes constituting a ‘yin-yang’ haplotype pattern spanning at least 53 kb. No gene-environment interactions were apparent. Gender, age at onset of clinically manifest CAD, smoking history, obesity, self-reported history of diabetes and hypertension did not influence the association between the most informative single SNP (rs2891168) and CAD, nor were there any associations between SNP rs2891168 and plasma concentrations of LDL cholesterol, HDL cholesterol, lipoprotein(a), fibrinogen and homocysteine. Paradoxically, however, the high-CAD risk G-allele was associated with a significantly lower plasma triglyceride concentration. This may well be a false-positive result, and it has not been recorded previously. Importantly, simultaneous tests of susceptibility to CAD and diabetes conferred by CAD- and type 2 diabetes-associated SNPs in this population indicated that these associations were independent of each other.
Taken together, the observation of no association with established intermediate phenotypes for CAD suggests that the underlying gene(s) at the CAD susceptibility locus may operate through a novel mechanism.
An additional key finding of this study was the demonstration of collocation of the high-risk haplotype with ANRIL, a newly annotated, large antisense noncoding RNA gene, here shown to be expressed in tissues and cell types affected by atherosclerosis. ANRIL, thus now identified as the prime candidate gene for the susceptibility locus on chromosome 9p21, is shown to be consistently associated with CAD in European and North American populations. At this stage, the function of ANRIL is virtually unknown, but noncoding RNAs are generally considered to be operating in the transcriptional control repertoire of the cell. Indeed, some experimental evidence of co-ordinated transcriptional regulation of ANRIL, CDKN2A and CDKN2B exists, allowing the possibility that any ANRIL effects can be mediated by the CDKN2A and CDKN2B genes or by a regulatory region located elsewhere that is common to these genes (Fig. 1).
Interestingly, the rs10757278 G-allele, initially reported to be associated with CAD [5–8], has now also been discovered to relate to increased risk of both abdominal aortic aneurysm (AAA) and intracranial aneurysm. A likely interpretation of this finding is that the ‘tagged’ functional variant promotes abnormal vascular remodelling, a process common to CAD, AAA and intracranial aneurysm [81, 82].
Integrated use of rodent models
Quantitative trait locus (QTL) mapping in rodent models has recently proved to be an interesting route to identify novel potential susceptibility genes for CAD in humans. This is not unexpected, as less than 1% of mouse genes lack a human homologue. Derivatives of inbred strains such as congenic and recombinant inbred strains provide efficient means to identify QTLs and the underlying causative genes, which can subsequently be confirmed with knock-out mice [84, 85]. The fact that mouse and human genes are arranged syntenically allows cross-identification of genes of interest, and QTLs for atherosclerosis or atherosclerosis-related traits are located in homologous regions in mice and humans [87, 88]. First, work based on a mouse model of diet-induced atherosclerosis led to the positional identification of tumour necrosis factor superfamily member 4 (TNFSF4), encoding OX40 ligand, as a gene that influences atherosclerosis susceptibility in both mouse and man. Secondly, the major histocompatibility complex (MHC) class II transactivator (MHC2TA) gene, the first example of a gene with an effect on both MI and classical autoimmune disorders [rheumatoid arthritis (RA) and multiple sclerosis (MS)], was identified on the basis of fine mapping of a rat QTL regulating MHC class II expression on microglia. The finding by Mehrabian et al. that variants in the ALOX5 gene underlie differences in susceptibility to atherosclerosis between mouse strains, and the subsequent demonstration in humans of a relationship between ALOX5 gene polymorphisms and carotid artery intima-media thickness, constitute a similar example.
Because of the extensive knowledge of mouse genetics and the availability of inbred strains that differ in susceptibility to atherosclerosis, the mouse is a good animal model for the localization and identification of genes that are likely to be involved also in the atherosclerotic process in humans. Thus, using various inbred mouse strains, several atherosclerosis susceptibility loci have been identified, some of them have been delimited further, and the emerging genes can subsequently be explored as candidate genes for human CAD. However, it should be kept in mind that although some recent evidence is encouraging, genetically manipulated mouse models and inbred mouse strains cannot really be expected to be fully informative on the main clinical complication of CAD in humans, namely MI, which at least in men presupposes plaque rupture or plaque fissuring. The fact that the TNFSF4 gene has been implicated in atherosclerosis in female mice and in MI in women but not in men may thus convey important information on the pathophysiological role of OX40L molecule.
As chronic inflammation is a feature of most common multifactorial diseases including CAD, and it is reasonable to assume that some susceptibility genes and pathogenic mechanisms are shared between the different disease entities, rodent models for chronic inflammation can also be used to identify inflammatory genes that are likely to be involved also in the atherosclerotic process in humans. For many years, congenic rat strains rather than mice have been used to define QTLs for MS-like neuroinflammation and RA as the rat appears to offer models that are more similar to the corresponding disease in man than do mouse models.
Significance of inflammatory and immune regulatory genes
It is notable that many of the genes that have been implicated in CAD until now, either based on small- to intermediate-size candidate gene association studies or work originating from QTL mapping in rodent models, are components of the innate and adaptive immune systems. The innate immune system comprises nonspecific and phylogenetically ancient mechanisms forming the first line of defence against infection, whereas adaptive immunity is concerned with long-lasting defence and memory for challenges encountered repeatedly. Interactions between lipoproteins and immunity are central to atherosclerosis. LDL accumulation in the artery wall has multiple consequences, including complement deposition, toll-like receptor expression, T-cell activation, cytokine secretion, antimicrobial peptide production and generation of a local inflammatory state. The observation of inflammatory responses common to atherosclerosis and chronic infections, and the notion of a possible triggering role of infectious agents in plaque formation, generated considerable interest in candidate genes involved in innate immunity. A range of association studies indicated the genes encoding CD14, the membrane-bound glycoprotein receptor for bacterial lipopolysaccharide, and toll-like receptor 4, a co-receptor for CD14, as novel candidate genes for CAD, but not surprisingly, results have not been consistent across studies [94–97]. Other inflammatory genes that have surfaced in candidate gene association studies of atherosclerosis include the ones coding for arachidonate 5-lipoxygenase (ALOX5) and leukotriene A4 hydrolase (LTA4H), both being part of the leukotriene pathway [92, 98]. ALOX5, which is activated by FLAP, catalyses the biosynthesis of leukotriene A4, and LTA4H catalyses the rate-limiting step in leukotriene B4 synthesis.
To these can be added the LTA gene encoding a member of the TNF ligand family, variants of which have been reported to be associated with MI, and galectin 2 (LGALS2), the protein product of which interacts with LTA. Further links to immunological pathways were provided by the TNFSF4 and MHC2TA genes that were derived from QTL studies in rodent models [89, 90]. The OX40 ligand – OX40 pathway leads to enhanced lymphocyte proliferation and survival. Antigen presentation to T-cells by MHC molecules is central to adaptive immune responses. Interestingly, the disease-related G-allele of the type III promoter of the MHC2TA gene was found to be associated with lower expression of MHC2TA and MHC class II transcripts after stimulation with interferon-γ, which supposedly results in reduced expression of MHC molecules.
Mechanistic connections between diseases uncovered
Interestingly, new mechanistic connections between distinct disease entities have also been discovered in the recent genome-wide association studies. For example, IL23R has been identified as a susceptibility gene in both Crohn’s disease and psoriasis. The protein tyrosine phosphatase, non-receptor type 22 (PTPN22) gene, the protein product of which is an intracellular tyrosine phosphatase setting the threshold for T-cell receptor signalling, and IL2RA are both associated with type 1 diabetes and RA. As inflammation is increasingly recognized as a key component of most, if not all, complex diseases, this is really what would be expected. In particular, CAD and RA show several interesting links. Whereas inflammation plays a key role in CAD [2, 93], death from cardiovascular disease is increased two- to threefold in patients with RA. However, although there is now proof of principle that susceptibility genes and pathogenic mechanisms are shared between cardiovascular and inflammatory diseases, the mechanisms behind these associations have not yet been analyzed in a systematic manner and new genomic approaches have so far not been used.
Awareness of context-specificity of genetic effects
As eloquently elaborated upon by Charles Sing et al., CAD develops as a result of complex interactions between a multitude of susceptibility genes and environmental influences that vary over time and that are integrated in the individual by dynamic regulatory networks at levels above the genome. As a consequence, different sets of genes constitute susceptibility genes in different individuals and the genetic architecture of CAD is population-specific. Whereas the effects of a limited number of genes may be invariant across populations and environmental strata, most are likely to be context-dependent, e.g. limited to a segment of the population defined by age, gender and various lifestyle characteristics such as smoking and exercise habits. Additional complexity is secondary to the fact that technologies allowing the same precision as DNA measurements are lacking for measuring both the internal and the external environment.
Sex should also be considered as an environmental factor that can modify the expression of genes and genetic variants through known differences in the hormonal milieu. In fact, there is evidence of sex-specific genetic architectures of a wide range of intermediate and clinical phenotypes, including human MI. This needs to be taken into account when searching for novel susceptibility genes underlying complex traits like CAD.
Evidence of ethnicity-specific disease risks has also been presented. As observed in a candidate gene association study of MI, ethnicity is at least occasionally one important constituent of the complex network of interacting factors determining the risk of CAD. Thus, although the frequency of ethnicity-specific effects remains poorly known, susceptibility genes need to be evaluated on a population-by-population basis. Also, multi-ethnic studies are likely to confer a broader view of disease aetiology.
The importance of context-specificity notwithstanding, replication of findings made in association studies remains an important way of discerning true positive results from artefacts or chance phenomena.
Zooming in on gene function and functional gene variants
Only rarely will the SNPs originally indicating a disease susceptibility locus prove to be functional. Exhaustive re-sequencing will thus be a prerequisite to identify potentially functional candidates for subsequent studies in vitro and in vivo. Expression profiling and genome-wide mapping studies have clearly demonstrated that strong heritable factors govern differences in gene expression levels in both mouse and man [106, 107]. This suggests that a systematic search for genetic variants affecting gene regulation could lead to identification of alleles modifying susceptibility to CAD. Such regulatory polymorphisms are commonly located in gene promoter regions and function by altering gene transcription, but they can also be distributed in exons, introns and 5′ and 3′ untranslated regions and influence gene expression by modulating different mRNA properties (reviewed in ). Polymorphisms in protein-coding regions, on the other hand, appear to have lower MAFs than the genome in general, and thus susceptibility variants causing nonsynonymous changes would be expected to contribute to the multiple rare variant model. Identifying a true regulatory variant is complicated by strong linkage disequilibrium among SNPs as well as by epigenetic mechanisms modulating gene expression.
In vitro approaches for detection of allele-specific expression of a transcript most often involve transient transfection assays in pre-existing animal or human cell lines. The possibility of trans-acting influences on allelic expression from the host cell and concerns about the relevance of the observed data for human tissues constitute obvious limitations for these assays. In vivo monitoring of allelic RNA transcripts in tissues or cells from individuals who are heterozygous for an informative marker has several advantages [108, 110]. Only cis-acting effects are detected, and even subtle differences are measurable. Drawbacks of the allelic imbalance approach include the limitation to heterozygous samples and sensitivity to epigenetic influences. The HaploChIP (haplotype-specific chromatin immunoprecipitation) assay, which is based on isolation of transcriptionally active DNA fragments, instead measures the relative transcriptional activity of the alleles as a surrogate for relative allelic expression [111, 112]. In all, more refined tools need to be developed for the downstream functional analysis of the many genes and genetic variants that are likely to be detected in the next couple of years.
Examples of putatively functional promoter polymorphisms in candidate genes for CAD that have recently been discovered and characterized with use of the current imperfect armamentarium of in vitro techniques include variants in the genes encoding plasminogen activator inhibitor-1 (PAI-1), coagulation factor VII, apolipoprotein (apo) B, apo A-II, a range of matrix metalloproteinases, cystatin C and TNF-α [113–123].
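The logic of the allelic imbalance approach can be illustrated with a minimal sketch. It assumes that transcripts from the two alleles of a heterozygous individual have been counted, and it asks whether the counts depart from the 50:50 ratio expected in the absence of cis-acting effects. The exact binomial test and the function name `allelic_imbalance_pvalue` are assumptions made for illustration; the studies cited above may have used other assays and statistics.

```python
from math import comb


def allelic_imbalance_pvalue(ref_count: int, alt_count: int) -> float:
    """Two-sided exact binomial test against a 50:50 allelic ratio.

    ref_count and alt_count are hypothetical transcript counts for the two
    alleles of one heterozygous sample (illustrative inputs, not real data).
    """
    n = ref_count + alt_count
    if n == 0:
        raise ValueError("at least one transcript must be counted")
    # Under the null hypothesis each transcript is equally likely to derive
    # from either allele, so the count of one allele is Binomial(n, 0.5).
    k = min(ref_count, alt_count)
    lower_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # The null distribution is symmetric, so the two-sided p-value is twice
    # the smaller tail, capped at 1.
    return min(1.0, 2 * lower_tail)


if __name__ == "__main__":
    print(allelic_imbalance_pvalue(60, 40))
```

A small p-value would suggest unequal expression of the two alleles in that sample. Because of the sensitivity to epigenetic influences noted above, such a result would still need to be confirmed across several heterozygous individuals.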
For any newly identified candidate gene, the biological pathways in which the gene product takes part need to be established so that the effects of changes in expression levels or function of the gene can be anticipated and specific experiments designed to evaluate its role. Once a putative role has been identified, the result of changes in expression levels [up by transient transfection, down by short interfering RNA (siRNA) techniques] and/or variations in the protein sequence (if different isoforms have been identified) has to be explored. Use of siRNA is considered of particular importance in this context. siRNA-directed ‘knockdown’ allows the inexpensive and rapid analysis of gene function in cell culture systems and whole animal studies [124, 125]. The final evaluation of the biological roles of a limited number of novel genes and gene variants then needs to be conducted in gene-targeted mouse models.
Gene expression profiling, which monitors mRNA levels in relevant tissue samples to measure gene activity, may prove to be particularly helpful by indicating which genes are involved in disease pathogenesis secondary to both genetic and environmental factors as well as to gene-environment interactions. Obstacles include access to appropriate vascular tissue from carefully phenotyped individuals, assay reproducibility and the need for more refined analytical tools for data processing.
Both common and rare variants matter
The relative contribution of rare versus common variants to CAD remains an open question. Only large-scale resequencing of carefully phenotyped individuals will provide an unbiased view of the importance of all types of DNA variants, common as well as rare. However, it is notable that the known genetic variants that have a major impact on CAD risk are relatively rare, such as the variants causing LDL receptor deficiency or dysfunctional apo B. At present, population-based whole-genome resequencing remains prohibitively expensive, but more focused studies in subjects with low HDL cholesterol have already been completed and provide further evidence for a limited but significant contribution of rare alleles to CAD. Sequence variants in the PCSK9 gene have also been found to be associated with lower LDL cholesterol concentration and protection from CAD. Recently, the adipokine gene ANGPTL4 was resequenced in a total of 3,551 participants in the Dallas Heart Study, and variants were found to influence the plasma triglyceride concentration. The common susceptibility gene variants identified so far in SNP-based association studies explain only a small fraction of the genetic heritability of CAD. The next generation of even larger genome-wide association studies is likely to add further common variants with quite small effects (<1%). It is open to speculation whether large-scale resequencing will lead to detection of numerous additional rare gene variants with strong effects on CAD risk, collectively accounting for a substantial proportion of the genetic heritability of this important trait. Considering that the resequencing technology is progressing rapidly [130–132], future systematic efforts in this area may be directed towards genes outside the constellation of already established candidate genes influencing currently measurable quantitative traits.
The collaboration imperative
There is strong evidence that sample size is the key determinant of quality in an association study. As evidenced by the recently reported genome-wide association studies and their subsequent replication studies, identification of novel susceptibility genes for CAD will require very large case–control collections, as the effect sizes of all but one or two of the underlying genes are expected to be quite small (genotype risk ratios <1.2–1.3). Thus, access of national or international consortia such as the Wellcome Trust Case–Control Consortium, Cardiogenics and PROCARDIS to large samples of cases and controls has been one of the reasons for the recent successes in susceptibility gene identification for complex diseases. A custom 50K Vascular Disease SNP array (the ‘IBC chip’) has recently been developed within the framework of a transatlantic collaboration (details of genes and probes can be found at http://bioinf.itmat.upenn.edu/cvdsnp). This SNP array is likely to be widely used in cardiovascular programmes worldwide and will promote and facilitate future pooled analyses of approximately 2150 genes selected for their potential involvement in a broad range of vascular disease processes. However, pooling the results of analyses made in different samples and populations may not necessarily increase the power to detect susceptibility genes with a modest effect for a disease as phenotypically complex as CAD, as the phenotypic diversity of the study may increase in parallel.
The theoretical promise of the genetic association study in the analysis of CAD and related intermediate traits has now been proven, and the time has come for propagating a mature and balanced view of its merits and limitations, refraining from a tendency to hyperbole. Combined with functional studies of individual susceptibility genes in vivo and in vitro, genetic association studies have greatly expanded our knowledge of the pathophysiology of atherothrombosis and identified a number of genes and gene variants that appear to be consistently associated with an increased risk of CAD, some of which are considered as novel drug targets. The hitherto consistently replicated susceptibility locus on chromosome 9, harbouring the noncoding RNA gene ANRIL, represents a region of high priority for further functional studies aiming at dissection of the molecular basis for this intriguing association. Admittedly, the population-attributable risks of identified CAD-related genetic variants are small, but again, they provide invaluable information about causal pathways.
In contrast, the potential of genetic testing to predict CAD remains unclear, as elaborated upon by, for example, Humphries, Ridker and Talmud, and will ultimately have to be evaluated in the format of integrated analyses of large-scale population-based cohort studies such as the UK Biobank and other similar projects. International harmonization of biobanks and data-pooling strategies will, not least, be required, as all initiated or planned biobank projects are predicated on the CDCV hypothesis and generally lack a specific disease focus. Of note, a minimum of 5000 cases would be needed for each disease of interest in a national biobank, and ideally 10000 cases to detect a moderately strong interaction effect [135, 136]. Nevertheless, a number of recently identified genes, including ANRIL contained in the chromosome 9p21 locus, seem to be independent of measured traits, which may indicate a predictive value over and above established risk factors. It can also be speculated that gene variants associated with some established risk factors like plasma lipoprotein concentrations, the latter being influenced by lifestyle, diurnal and random factors, are better markers of life-long dyslipoproteinemia than single lipid measurements. Furthermore, rare variants with high penetrance may turn out to be more frequent and account for a greater proportion of the genetic heritability of CAD than previously anticipated. Be that as it may, more imminent developments are likely to be seen in the neighbouring field of pharmacogenomics, dealing with the tailoring of pharmacological treatment of disease or high-risk states to possession of specific genetic variants that influence response.
Finally, it should be reiterated that the design and outcome of molecular genetic studies of CAD are context-specific, which means that some inconsistency will prevail and that no single study design or analytical approach will be the best for all settings. A greater focus needs to be placed on attaining phenotypic specificity to reduce pathogenic and genetic heterogeneity. More refined clinical phenotypic classifications are warranted, which could form the basis for subsequent identification of different molecular aetiologies. Until this has been achieved, more emphasis might have to be placed on intermediate phenotypes that are more amenable to genetic analysis. Furthermore, future genome-wide association studies should be combined with transcriptomic and functional genomic studies of genetic variants as well as with proteomics. The usefulness of integrating QTL and global gene expression analyses in the pursuit of pathways for atherosclerosis has already been convincingly demonstrated in mice, and the tools for genome-wide association studies of global gene expression are at hand [139–141].
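Why small effect sizes call for very large collections can be illustrated with a rough sample-size sketch. It uses the standard normal-approximation formula for comparing risk-allele frequencies between equal numbers of cases and controls. Several inputs are assumptions made for illustration and are not taken from the studies discussed here: the genotype risk ratio is treated as an allelic odds ratio, the significance threshold defaults to 5e-8, the power to 80%, and the function name `cases_needed` is hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def cases_needed(control_freq: float, odds_ratio: float,
                 alpha: float = 5e-8, power: float = 0.8) -> int:
    """Approximate number of cases (with as many controls) for an allelic test.

    control_freq: assumed risk-allele frequency in controls.
    odds_ratio:   assumed allelic odds ratio (stand-in for the risk ratio).
    alpha, power: assumed two-sided significance level and target power.
    """
    if not 0 < control_freq < 1 or odds_ratio <= 0 or odds_ratio == 1:
        raise ValueError("need 0 < control_freq < 1 and odds_ratio > 0, != 1")
    p0 = control_freq
    # Risk-allele frequency in cases implied by the allelic odds ratio.
    p1 = odds_ratio * p0 / (1 + p0 * (odds_ratio - 1))
    p_bar = (p0 + p1) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    # Two-proportion z-test sample size, counted in alleles per group.
    numerator = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * sqrt(p0 * (1 - p0) + p1 * (1 - p1))) ** 2
    alleles_per_group = numerator / (p1 - p0) ** 2
    # Each individual contributes two alleles.
    return ceil(alleles_per_group / 2)


if __name__ == "__main__":
    for ratio in (1.5, 1.3, 1.2):
        print(ratio, cases_needed(0.3, ratio))
```

Under these assumed inputs, an odds ratio of 1.2 at a control allele frequency of 0.3 requires several thousand cases and as many controls, and the requirement rises steeply as the effect size shrinks. The sketch ignores imperfect tagging of the causal variant and phenotypic heterogeneity, both of which would raise the numbers further.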
Conflict of interest statement
No conflict of interest was declared.
The work performed in the authors’ laboratory was supported by the Swedish Medical Research Council (project 8691), the Swedish Heart-Lung Foundation, the Knut and Alice Wallenberg Foundation, the European Commission (LSHM-2007-037273), the Stockholm County Council (project 560183) and the Leducq Transatlantic Network of Excellence on Atherothrombosis Research (LENA).