Dr T. Ahmad, Gibson Laboratories, Radcliffe Infirmary, Woodstock Road, Oxford OX2 6HE, UK. E-mail: email@example.com
Recent epidemiological, clinical and molecular studies have provided strong evidence that inherited predisposition is important in the pathogenesis of chronic inflammatory bowel diseases. The model most consistent with the epidemiological data suggests that Crohn’s disease and ulcerative colitis are related polygenic diseases, sharing some but not all susceptibility genes. Investigators throughout the world have applied the complementary techniques of genome-wide scanning and candidate gene analysis. Four areas of linkage have been widely replicated: on chromosomes 16 (IBD1), 12 (IBD2) and 6 (IBD3, the HLA region), and most recently on chromosome 14. Fine mapping of these regions is underway. Of the ‘positional’ candidate genes, most attention has centred on the genes of the major histocompatibility complex. Genes within this region may determine disease susceptibility, behaviour, complications and response to therapy. There is continuing hope that studies of inflammatory bowel disease genetics will provide fresh insight into disease pathogenesis and soon deliver clinical applications.
Interest in the genetic predisposition to inflammatory bowel diseases has grown progressively since Kirsner drew attention to the familial occurrence of inflammatory bowel disease 37 years ago.1 The last decade, in particular, has seen an explosion in the volume of literature in this field. This has been fuelled by the realization that identifying the relevant susceptibility genes might be the key step to understanding the primary pathophysiology of inflammatory bowel disease. Furthermore, the suggestion that genetic influence may determine not only overall susceptibility but also phenotype has generated optimism that unravelling it may provide a more accurate molecular classification of disease, offering individuals a better prognostic evaluation and perhaps permitting more specific, targeted medical and surgical treatment. The huge increase in DNA sequence information provided by the recent publication of two rough drafts of the human genome sequence,2,3 combined with rapid advances in genotyping technologies and in the analytical techniques needed to exploit these data, has undoubtedly determined the pace of scientific progress over the last few years. Now, as we enter the ‘post-genomic era’, future prospects will depend critically upon complementary advances in RNA- and protein-based measurements.
In genetic terms the inflammatory bowel diseases are ‘complex’ because they do not exhibit classical Mendelian inheritance attributable to a single gene locus. Several obstacles challenge scientists involved in the genetic dissection of such ‘complex’ disorders. First, these disorders are likely to be polygenic, involving the interaction of several gene mutations. Second, mutations in any one of several genes may result in identical phenotypes; this is demonstrated by experiments in transgenic mouse models, where multiple diverse genetic defects may lead to similar phenotypes. This genetic heterogeneity hampers genetic mapping because susceptibility genes may differ from patient to patient. Third, penetrance is incomplete, suggesting that a genetically predisposed individual requires exposure to additional environmental factors to develop the disease. The final obstacle, disease heterogeneity, must be considered in the genetic dissection of any complex disorder and is well demonstrated in inflammatory bowel disease.
Disease heterogeneity in inflammatory bowel disease
Patients with inflammatory bowel disease have traditionally been classified as having either ulcerative colitis or Crohn’s disease on the basis of established clinical, endoscopic, radiological and histological features. However, it is clear that there is great variation in disease presentation within this basic classification. Thus, clinical sub-groups have been defined by anatomical location, extent (diffuse or localized), behaviour (primary inflammatory, fistulizing or fibrostenotic) and by operative history.4 There are increasing data, discussed later, to suggest that this disease heterogeneity is genetically determined. From these data one could speculate that different clinical sub-groups are caused by different genetic defects that require different environmental antigens to become functionally significant. If this hypothesis is correct, the current classification of inflammatory bowel disease will need to be modified to encompass a family of diseases that differ at the molecular level, but are similar at the clinical phenotypic level, all requiring the crucial interaction with a specific environmental agent.
A useful comparison can be made with the disease heterogeneity of non-insulin-dependent diabetes mellitus (NIDDM). Traditionally, NIDDM was regarded as a single distinct diagnosis, both pathologically and clinically. It has recently become clear that NIDDM encompasses a group of disorders, including a number caused by single gene defects leading to isolated beta cell dysfunction, insulin resistance or syndromes that include diabetes.5 Greater insight into NIDDM disease heterogeneity is already being used to predict disease progression and complications, to individualize treatment and to screen family members.6 A similar molecular classification of inflammatory bowel disease may be desirable to predict which patients are more likely to have extensive disease requiring surgery, to develop carcinoma,7–10 or to relapse after surgery, so that appropriate medical and surgical management can be planned.
The role of genetics in inflammatory bowel disease susceptibility has been suggested by epidemiological data (Table 1) including twin studies, ethnic differences in disease prevalence, studies of familial aggregation, and by association with recognized genetic syndromes.
Table 1. Epidemiological evidence for genetic susceptibility to inflammatory bowel disease
Twin studies provide the most powerful tool for disentangling the relative contributions of genetics and the environment to the pathogenesis of inflammatory bowel disease. Greater concordance for a disease in monozygotic twin pairs (genetically identical, and so expected to be concordant for any genetically determined character) than in dizygotic twin pairs (which share on average only half of their genes) suggests that genetic factors are responsible for the difference, assuming that the environmental influence is the same for both types of twin pair. This assumption may not hold, however, because monozygotic twins, who are always of the same sex, are more likely to be treated alike and thus to share more of their environment than dizygotic twins. Three twin studies have examined the relative genetic and environmental contributions to inflammatory bowel disease in a total of 322 twin pairs.11–14 Combining these results gives concordance rates for Crohn’s disease of 37% and 7% in monozygotic and dizygotic twins, respectively, with corresponding rates for ulcerative colitis of 10% and 3%. From these concordance rates an estimate of the relative contribution of genetics to disease aetiology (the coefficient of heritability) can be calculated. The heritability estimate for Crohn’s disease in particular is greater than that for IDDM, schizophrenia or asthma, diseases in which the importance of genetics in determining susceptibility has been established by epidemiological data for many years.11 These data may underestimate concordance, because the possibility that the unaffected twin will develop inflammatory bowel disease later in life cannot be excluded.
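The text does not state which heritability estimator was used. As a minimal sketch of the arithmetic only, the snippet below computes the ratio of monozygotic to dizygotic concordance and Holzinger’s index, H = (C_MZ − C_DZ)/(1 − C_DZ), one simple heritability index for binary traits, from the pooled rates quoted above. Choosing Holzinger’s index is our assumption; published heritability coefficients are often derived from liability-threshold models and will not match these values.

```python
def mz_dz_ratio(c_mz: float, c_dz: float) -> float:
    """Ratio of monozygotic to dizygotic concordance."""
    return c_mz / c_dz


def holzinger_h(c_mz: float, c_dz: float) -> float:
    """Holzinger's heritability index for a binary trait (an illustrative choice)."""
    return (c_mz - c_dz) / (1.0 - c_dz)


# Pooled concordance rates quoted in the text, as proportions (MZ, DZ).
pooled = {"Crohn's disease": (0.37, 0.07), "ulcerative colitis": (0.10, 0.03)}

for disease, (c_mz, c_dz) in pooled.items():
    print(f"{disease}: MZ/DZ ratio = {mz_dz_ratio(c_mz, c_dz):.1f}, "
          f"Holzinger H = {holzinger_h(c_mz, c_dz):.2f}")
```

Whatever index is used, the larger MZ/DZ contrast for Crohn’s disease than for ulcerative colitis is what points to a stronger genetic component in Crohn’s disease.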
Considerable differences exist in the prevalence of inflammatory bowel disease between ethnic groups, with the highest rates in Whites, lower rates in Black Americans and the lowest rates in Asians.15 The prevalence in the Jewish population, 2–4 times higher than in any other ethnic group, is maintained irrespective of geographical location and time period. Within the Jewish population, risk is greatest in Ashkenazi Jews and lower in Sephardic or Oriental Jews.16
Familial aggregation of disease
Between 6% and 32% of patients with inflammatory bowel disease have an affected first- or second-degree relative.17–27 A family history is most prevalent in Jewish patients and in patients with early onset of disease.17,25,28 There may therefore be a greater genetic contribution to disease in patients with early onset, a hypothesis supported by studies that have demonstrated stronger evidence for linkage to markers on chromosomes 1, 3, 16 and 12 in families with at least one member aged less than 20 years at diagnosis of inflammatory bowel disease.29,30 Conversely, patients with Crohn’s disease who have a family history are more likely than those without to be younger at diagnosis and to have small bowel disease. Disease severity, however, judged by the need for surgery (or, in ulcerative colitis, for rescue cyclosporin), does not differ between patients with a family history and those with sporadic disease.26,27
The strongest known risk factor for developing inflammatory bowel disease is having a relative with the disease. The magnitude of this risk relative to that in the general population, λR, is greatest for siblings and lowest for offspring.23,24,31 The relative risk to a sibling, λS, of a patient with Crohn’s disease is 13–36, and that for ulcerative colitis is 7–17. This compares with a λS of 500 for cystic fibrosis, 15 for Type 1 diabetes and 8.6 for schizophrenia.
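λR is the risk of disease in relatives of type R divided by the population prevalence. The sketch below shows this calculation; every number in it is a hypothetical placeholder chosen for illustration, not an estimate from the cited studies.

```python
def relative_risk_ratio(risk_in_relatives: float, population_prevalence: float) -> float:
    """Return lambda_R = K_R / K: risk to relatives of type R over population prevalence."""
    if population_prevalence <= 0:
        raise ValueError("population prevalence must be positive")
    return risk_in_relatives / population_prevalence


# Hypothetical illustrative values (NOT taken from the cited studies).
population_prevalence = 0.001  # assumed K = 0.1%
sibling_risk = 0.02            # assumed risk to siblings of an affected person
offspring_risk = 0.01          # assumed risk to offspring of an affected person

lambda_s = relative_risk_ratio(sibling_risk, population_prevalence)
lambda_o = relative_risk_ratio(offspring_risk, population_prevalence)
print(f"lambda_S = {lambda_s:.0f}, lambda_O = {lambda_o:.0f}")
```

Because the population prevalence is the denominator, λR values estimated in populations with different prevalences are not directly comparable.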
Epidemiological studies in multiply affected families report high concordance rates for disease type (ulcerative colitis or Crohn’s disease), site, behaviour (inflammatory, fibrostenotic, fistulizing), and presence of extraintestinal manifestations of disease, suggesting that disease phenotype as well as overall inflammatory bowel disease susceptibility is influenced by genetics (Table 2).24,32–34
Table 2. Putative sites at which genetic variation may influence IBD susceptibility
The interaction between genetics and the environment
Numerous abnormalities involving many components of the immune response have been identified in patients with ulcerative colitis and Crohn’s disease (reviewed by Fiocchi).35 It is increasingly recognized that these abnormalities may be determined by an individual’s genetic make-up. Thus, genes that directly or indirectly influence the immune-mediated inflammatory response may interact to confer genetic susceptibility to an unknown environmental trigger. Genetic studies in humans and in animal models suggest that the homeostasis of the mucosal immune system may be disrupted by mutations in any one of a number of different genes (Table 3). Inflammatory bowel disease genetic research has focused in particular on genes involved in several areas of the mucosal immune system: antigen presentation; the balance between pro- and anti-inflammatory cytokines; the abrogation of oral tolerance; maintenance of intestinal barrier integrity (including genes involved in intestinal permeability and mucus production); and abnormal leucocyte homing.
Table 3. Future inflammatory bowel disease genetic research—avenues for further progress
The genetic model
The currently favoured genetic model of inflammatory bowel disease, proposed by Satsangi et al., suggests that ulcerative colitis and Crohn’s disease are related polygenic diseases that share some susceptibility loci but differ at others.36 The existence of ‘common inflammatory bowel disease’ susceptibility genes is suggested by the fact that both forms of inflammatory bowel disease coexist in families at a frequency greater than expected by chance. Linkage studies (discussed later) support this concept, because some loci appear to influence susceptibility to inflammatory bowel disease overall, whilst others confer susceptibility specifically to Crohn’s disease or ulcerative colitis. Interestingly, several of these loci have also been implicated in other diseases characterized by a dysregulated immune system. Becker et al. compared genome wide scans from a number of human autoimmune conditions and found that peak linkage areas map in a non-random fashion to 18 distinct clusters.37 Five of these clusters were particularly well defined, including two regions on 7q and 12p that have been implicated in inflammatory bowel disease genome wide scans.38,39
This model proposes a hierarchy of genetic influence in inflammatory bowel disease. Some genes confer susceptibility to inflammatory bowel disease generally; disease-specific genes determine whether a susceptible individual develops ulcerative colitis or Crohn’s disease; and phenotype genes are likely to influence the site, behaviour, natural history and response to different medical therapies. In this model, the different clinical phenotypes arise from different genetic mutations at each of these levels, which require a crucial interaction with a specific environmental antigen to become functionally significant.
METHODS EMPLOYED IN THE IDENTIFICATION OF SUSCEPTIBILITY GENES
The main tools exploited by geneticists are regions of DNA that differ between individuals, known as polymorphic loci. In some instances, changes in the nucleotide sequence may affect the product of a gene, either in its structure or in the regulation of its expression. However, the majority of polymorphic loci have no effect on gene structure or expression and therefore do not contribute to phenotypic variability. Nevertheless, they serve as important markers for tracing the segments of DNA that are transmitted from parents to their offspring.
Two approaches, genome wide scanning and candidate gene association studies, have been used in the search for the genetic determinants of inflammatory bowel disease susceptibility and behaviour. The principles of each technique are first discussed separately; their complementary application is then illustrated with examples from the relevant chromosomes.
Genome wide screening in inflammatory bowel disease—principles
Genome wide scanning uses linkage studies to determine whether a microsatellite (VNTR) marker, composed of a series of short tandem repeats (typically di-, tri- or tetranucleotide units), is inherited together with a disease within families. This is possible because these markers are ‘highly informative’: a large number of variants (alleles) of each marker exist in the population. If the disease-causing gene lies close enough to the marker, the two are unlikely to be separated by recombination at meiosis and so will be co-inherited (non-recombinant). Markers closest to the disease gene therefore show the strongest correlation with the pattern of disease in families. The main advantage of this approach is that no prior assumption is made about the nature of the susceptibility genes. A main drawback of classical linkage studies is the need to specify a precise genetic model, detailing the mode of inheritance, gene frequencies and penetrance of each genotype. Genome wide scans attempt to avoid this problem by using a model-free approach that examines chromosomal segments shared by affected individuals, either affected sibling pairs or affected relative pairs. These studies were made possible by the development of microsatellite maps with markers covering 90% of the human genome and of semi-automated genotyping techniques.40 Affected families are genotyped for about 400 such markers spaced at approximately 10 centimorgan intervals across the whole genome. Sub-chromosomal regions of interest, which may contain a disease gene, are identified by markers at which allele sharing between affected individuals exceeds that expected by chance. The degree of excess sharing can be expressed statistically as the ‘LOD’ score (Z): the base-10 logarithm of the odds that the marker and disease gene are linked rather than unlinked.
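As a minimal sketch of an allele-sharing LOD score, the function below computes Risch’s unrestricted maximum LOD score (MLS) for affected sib pairs, assuming fully informative markers so that the number of parental alleles each pair shares identical by descent (IBD) is known exactly. Under no linkage, sibs share 0, 1 or 2 alleles IBD with probabilities 1/4, 1/2 and 1/4. The counts in the example are hypothetical. Real scans use multipoint software that handles partially informative markers and often restricts the estimates to biologically possible sharing values.

```python
import math

# Expected probabilities of sharing 0, 1 or 2 alleles IBD for sibs at an unlinked marker.
NULL_SHARING = (0.25, 0.5, 0.25)


def mls_lod(n0: int, n1: int, n2: int) -> float:
    """Unrestricted maximum LOD score for fully informative affected sib pairs.

    n0, n1, n2: numbers of pairs sharing 0, 1 and 2 alleles IBD at the marker.
    The maximum-likelihood sharing probabilities are the observed proportions,
    so the LOD is sum(n_i * log10(observed_i / expected_i)).
    """
    counts = (n0, n1, n2)
    total = sum(counts)
    lod = 0.0
    for n, expected in zip(counts, NULL_SHARING):
        if n > 0:  # an empty sharing class contributes nothing
            lod += n * math.log10((n / total) / expected)
    return lod


# Hypothetical counts for 100 fully informative affected sib pairs.
print(round(mls_lod(25, 50, 25), 2))  # 0.0: sharing exactly as expected without linkage
print(round(mls_lod(15, 50, 35), 2))  # 1.79: excess sharing of alleles IBD
```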
The regions identified are typically large, varying between 100 and several thousand kilobases, encompassing several hundred potential candidate genes. Narrowing down these regions is a major current challenge facing investigators.
To account for the problem of multiple testing, Lander and Kruglyak have suggested strict significance levels for declaring linkage, based upon the number of times a result would be expected to occur at random in a dense genome scan.41 Much confusion surrounds significance levels for genome wide scans, and reported results should be scrutinized carefully. A ‘significant linkage’ threshold allows one false positive in every 20 genome wide scans (maximum LOD score > 3.6 for sibling pair studies, P < 2 × 10−5). Confirmation of a ‘significant’ linkage in a replication study requires P < 0.01 (based on five markers). ‘Suggestive’ linkage (LOD score > 2.2, P < 7 × 10−4) is expected to occur by chance once in each genome wide scan. It has been proposed, although this is frequently forgotten, that such ‘suggestive’ results should be published with a warning label and without claims of linkage.
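The correspondence between the LOD thresholds and the pointwise P-values quoted above can be checked with a standard asymptotic approximation, χ² = 2 ln(10) × LOD with one degree of freedom, taking the one-sided tail. This is a sketch under that assumption; exact genome-wide thresholds also depend on marker density and on the sharing statistic used.

```python
import math


def lod_to_pointwise_p(lod: float) -> float:
    """Approximate one-sided pointwise P-value for a LOD score.

    chi-square = 2 * ln(10) * LOD (1 df); the one-sided tail of the
    corresponding normal deviate is 0.5 * erfc(sqrt(chi_square / 2)).
    """
    chi_square = 2.0 * math.log(10.0) * lod
    return 0.5 * math.erfc(math.sqrt(chi_square / 2.0))


for threshold in (3.6, 2.2):
    print(f"LOD {threshold}: P ~ {lod_to_pointwise_p(threshold):.1e}")
```

The values returned for LOD 3.6 and 2.2 come out close to the 2 × 10−5 and 7 × 10−4 thresholds quoted for ‘significant’ and ‘suggestive’ linkage.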
Over the last 15 years, linkage analysis has proved successful in identifying gene mutations in a number of Mendelian disorders. Such discoveries have, however, been limited almost exclusively to rare mutations exerting a large effect, with strong genotype–phenotype correlations. Several of these genes account for an uncommon subset of generally more common disorders, such as colon cancer (familial adenomatous polyposis, FAP, and hereditary non-polyposis colorectal cancer, HNPCC), breast cancer (BRCA-1 and -2) and NIDDM (maturity-onset diabetes of the young, MODY-1, -2 and -3).
However, this success has not been repeated for the large number of common familial polygenic disorders that do not show obvious Mendelian inheritance. Rather disappointingly, many regions of linkage have not been replicated, even in the same population. The main reason is the relatively low power of linkage studies, which is a particular problem if such disorders are underlain by numerous common alleles of small effect, measured as the genotypic risk ratio (GRR: the relative risk attributable to possession of a specific allele). When the GRR is low, computer models of polygenic disease in a virtual population show that confidence intervals are wide, so that large numbers of affected sibling pairs are needed to replicate linkage and identify a disease gene. Thus, a disease allele with a population frequency of 10% and a GRR of 2 would be identified by linkage only if 5382 families were studied, a number that is clearly not feasible, even with international collaboration.42
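Why so many families are needed becomes clearer when the locus effect is expressed as a sibling relative risk. The sketch below assumes a multiplicative two-allele model (disease risks 1, GRR and GRR² for 0, 1 and 2 copies of the allele), the kind of model used in such power calculations. It returns the locus-specific λS and the expected proportion of alleles shared IBD by affected sib pairs, Y, which is 0.5 without linkage. It does not attempt to reproduce the cited figure of 5382 families, which also depends on the significance level and power assumed in that work.

```python
def multiplicative_locus_effects(p: float, grr: float) -> tuple[float, float]:
    """Locus-specific sibling relative risk and expected IBD sharing in affected sib pairs.

    p   : population frequency of the susceptibility allele
    grr : genotypic risk ratio per copy (risks 1, grr, grr**2 for 0, 1, 2 copies)
    Returns (lambda_S, Y); Y = 0.5 when the locus has no effect.
    """
    q = 1.0 - p
    # Relative variance of the per-allele risk factor under the multiplicative model.
    w = p * q * (grr - 1.0) ** 2 / (p * grr + q) ** 2
    lambda_s = (1.0 + w / 2.0) ** 2
    y = (1.0 + w) / (2.0 + w)
    return lambda_s, y


lambda_s, y = multiplicative_locus_effects(p=0.10, grr=2.0)
print(f"lambda_S = {lambda_s:.3f}, expected IBD sharing Y = {y:.3f}")
```

For an allele frequency of 10% and a GRR of 2, this gives λS of about 1.08 and Y of about 0.518, barely above the 0.5 expected by chance, which is why linkage has so little power for alleles of this kind.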
Candidate gene association studies—principles
Any gene with a putative role in inflammatory bowel disease in which polymorphisms (variations or mutations in the DNA sequence at a given locus) can be found is a potential candidate gene. An association study uses a case–control design and is based on the hypothesis that different alleles of a polymorphic gene confer different susceptibility to disease by altering the structure or expression of the encoded protein. Supporting experimental evidence from functional or animal studies makes a positive finding more plausible, although it may be difficult to determine whether changes in the expression of a gene represent the primary defect or a secondary phenomenon.
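A case–control association study at a biallelic polymorphism compares allele counts in patients and controls. The following is a minimal sketch of one common analysis, an allelic 2 × 2 table with an odds ratio and a Pearson χ² test (1 degree of freedom, no continuity correction). The allele counts are hypothetical. Counting alleles rather than individuals assumes Hardy–Weinberg equilibrium.

```python
import math


def allelic_association(case_a: int, case_b: int,
                        ctrl_a: int, ctrl_b: int) -> tuple[float, float, float]:
    """Allelic odds ratio, Pearson chi-square (1 df) and P-value from a 2x2 table.

    Rows are cases and controls; columns are counts of allele A and allele B
    (each individual contributes two alleles).
    """
    table = [[case_a, case_b], [ctrl_a, ctrl_b]]
    n = case_a + case_b + ctrl_a + ctrl_b
    row_totals = [sum(row) for row in table]
    col_totals = [case_a + ctrl_a, case_b + ctrl_b]
    chi_square = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi_square += (table[i][j] - expected) ** 2 / expected
    odds_ratio = (case_a * ctrl_b) / (case_b * ctrl_a)
    p_value = math.erfc(math.sqrt(chi_square / 2.0))  # upper tail of chi-square, 1 df
    return odds_ratio, chi_square, p_value


# Hypothetical counts: 200 cases (400 alleles) and 200 controls (400 alleles).
odds_ratio, chi_square, p_value = allelic_association(120, 280, 80, 320)
print(f"OR = {odds_ratio:.2f}, chi-square = {chi_square:.2f}, P = {p_value:.4f}")
```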
A positive association may imply that the polymorphism is itself the susceptibility locus. Alternatively, it may be artefactual, which is commonly ascribed (although virtually never proven) to population stratification: any trait that is more frequent in a given ethnic group will appear to be associated with any allele that is more common in that group. It is far more likely, however, that a positive association is due to linkage disequilibrium, which occurs when the marker allele being studied and the disease allele are co-inherited because of their close proximity. Linkage disequilibrium is seen in populations in which most affected individuals are descended from a common affected ancestor (the founder effect). The ability to detect association between marker alleles and disease depends critically on the nature of the linkage disequilibrium operating between them. Failure to replicate an association in a different population is likely to be due to different linkage disequilibrium, a consequence of a different founder effect. Replication studies in different populations can thus be used to resolve whether an association is truly causal or due to linkage disequilibrium. Previous simulations suggested that linkage disequilibrium is unlikely to extend beyond 3 kb, encouraging scientists to test other candidate genes located nearby.43 More recently, however, Moffatt et al. have suggested that linkage disequilibrium may operate over much larger distances, implying that a vast number of positional candidate genes will need to be considered. Using 26 markers within the TCR α/δ locus on chromosome 14q, they found significant linkage disequilibrium between markers to be relatively common at 250 kb and detectable beyond 500 kb.44
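Linkage disequilibrium between two biallelic loci is usually quantified from haplotype and allele frequencies. The sketch below computes the standard measures D, |D′| (D scaled by its maximum possible value given the allele frequencies) and r². The frequencies used are hypothetical, and in practice haplotype frequencies must themselves be estimated, which is not shown here.

```python
def ld_measures(p_ab: float, p_a: float, p_b: float) -> tuple[float, float, float]:
    """D, |D'| and r^2 for two biallelic loci.

    p_ab : frequency of the haplotype carrying allele A (locus 1) and allele B (locus 2)
    p_a  : frequency of allele A at locus 1 (must lie strictly between 0 and 1)
    p_b  : frequency of allele B at locus 2 (must lie strictly between 0 and 1)
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("both loci must be polymorphic")
    d = p_ab - p_a * p_b  # departure from independence of the two alleles
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max
    r_squared = d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r_squared


# Hypothetical frequencies: haplotype AB at 30%, allele A at 40%, allele B at 50%.
d, d_prime, r_squared = ld_measures(0.30, 0.40, 0.50)
print(f"D = {d:.3f}, |D'| = {d_prime:.2f}, r^2 = {r_squared:.3f}")
```

When the haplotype frequency equals the product of the allele frequencies, all three measures are zero and the marker carries no information about a nearby disease allele.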
Functional candidate genes
Traditionally, candidate genes have been selected on the basis of their function and putative role in the pathogenesis of disease. Because there is no reliable hypothesis for the pathological mechanism of inflammatory bowel disease, the number of possible candidates is vast.
Positional candidate genes
More recently, positional candidate genes have been selected on the basis of their location within an area of replicated linkage. To select novel positional candidates in regions that have not been sequenced, scientists create a physical map (contig) of overlapping clones across the linkage area. Information about neighbouring microsatellite markers, known genes, and expressed sequence tags which have putatively been mapped to the linkage area by radiation hybrid mapping, can be sought from a website.45 Yeast artificial chromosomes (YAC) and P1-derived artificial chromosome (PAC) libraries can then be screened for these transcripts to create a map across the linkage area. Candidates identified in this way are discussed later. It should not be forgotten that in complex diseases the susceptibility gene may well lie outside the region of maximal linkage, in contrast to traits with mendelian inheritance where the gene always lies within the region of maximal linkage.46
Association studies, although more powerful than linkage studies, are also vulnerable to criticism, as highlighted by two recent editorials.47,48 Most importantly, incorrect selection of controls may lead to spurious associations due to population stratification. To avoid this problem, family-based designs such as the transmission–disequilibrium test have been developed that use genotypes from an affected individual and both parents.49 The parental alleles that are not transmitted to the affected child serve as internal matched controls. Several large collections of such inflammatory bowel disease patient–parent trios have been established worldwide, facilitated by the relatively early age of onset of inflammatory bowel disease.
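The transmission–disequilibrium test uses only heterozygous parents, since only they can transmit either allele. As a minimal sketch for a biallelic marker, the statistic compares how often such parents transmit allele 1 (b) versus allele 2 (c) to an affected child, giving the McNemar-type statistic (b − c)²/(b + c) with 1 degree of freedom. The transmission counts in the example are hypothetical.

```python
import math


def tdt(b: int, c: int) -> tuple[float, float]:
    """Transmission/disequilibrium test for a biallelic marker.

    b : transmissions of allele 1 from heterozygous parents to affected offspring
    c : transmissions of allele 2 from the same heterozygous parents
    Returns the chi-square statistic (1 df) and its P-value.
    """
    if b + c == 0:
        raise ValueError("no informative (heterozygous) parents")
    chi_square = (b - c) ** 2 / (b + c)
    p_value = math.erfc(math.sqrt(chi_square / 2.0))  # upper tail of chi-square, 1 df
    return chi_square, p_value


# Hypothetical data: allele 1 transmitted 60 times and not transmitted 40 times.
chi_square, p_value = tdt(60, 40)
print(f"TDT chi-square = {chi_square:.1f}, P = {p_value:.3f}")
```

Because each transmitted allele is compared with the untransmitted allele from the same parent, population stratification cannot by itself produce a positive result.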
Further problems include population heterogeneity, poor phenotypic definition of disease, failure to adjust significance levels for multiple-hypothesis testing, and publication bias favouring positive studies. A well-conducted association study should therefore test a polymorphism of plausible functional significance in rigorously phenotyped patients and controls. Results should achieve low corrected P-values and be replicated in both case–control and family-based studies. Studies of ethnically diverse individuals may not only support an association but may also help to distinguish causal relationships from non-causal ones due to linkage disequilibrium, and permit the identification of other important genetic and/or environmental modifying factors. Finally, it must be emphasized that a negative study of one polymorphism within a gene does not exclude involvement of that gene, since other polymorphisms may subsequently be discovered. Only when all polymorphisms have been tested can the gene be confidently excluded as a candidate.
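Adjusting for multiple testing can be as simple as a Bonferroni or Šidák correction. The sketch below illustrates both; choosing these particular corrections is ours, since the text does not prescribe a method, and the number of polymorphisms in the example is hypothetical. Both corrections are conservative when the polymorphisms tested are in linkage disequilibrium with one another, because the tests are then correlated.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance level keeping the family-wise error rate at or below alpha."""
    return alpha / n_tests


def sidak_threshold(alpha: float, n_tests: int) -> float:
    """Per-test level giving a family-wise error rate of exactly alpha for independent tests."""
    return 1.0 - (1.0 - alpha) ** (1.0 / n_tests)


def bonferroni_adjust(p_values: list[float]) -> list[float]:
    """Bonferroni-corrected P-values: each multiplied by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


# Hypothetical example: 20 polymorphisms tested within one candidate gene.
print(bonferroni_threshold(0.05, 20))
print(round(sidak_threshold(0.05, 20), 5))
print(bonferroni_adjust([0.001, 0.02, 0.5]))
```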
REPLICATED SIGNIFICANT INFLAMMATORY BOWEL DISEASE LINKAGE REGIONS AND POSITIONAL CANDIDATES
Eight genome wide scans and numerous replication studies have been published since 1996.38,39,50–55 In contrast to many other common polygenic diseases, the data from these studies in inflammatory bowel disease have been remarkably consistent. ‘Significant’ areas of replicated linkage have been found on chromosomes 6p (IBD3), 12q (IBD2), 14q, and 16q (IBD1) and are discussed in detail below. Importantly, these data support the polygenic model of disease, as IBD3 appears to confer susceptibility to both ulcerative colitis and Crohn’s disease, whilst IBD1 and IBD2 appear to determine susceptibility specifically to Crohn’s disease and ulcerative colitis, respectively.
A further area of significant linkage on chromosome 19 awaits replication. In addition, eight areas of ‘suggestive’ linkage on chromosomes 1p, 1q, 3p, 3q, 4q, 5p, 7q, and 10p, have been identified. Although none of these have reached the strict criteria for ‘significant’ linkage, several have been replicated. Figure 1 illustrates these linkage areas and highlights relevant positional candidates found within their boundaries.
The first genome wide scan, reported from Paris, used a two-stage approach involving Crohn’s disease affected sib-pairs only.50 The first stage involved 40 sib-pair equivalents from 25 families. Analysis of allele sharing revealed three markers on chromosome 16 and one on chromosome 1 with P-values below 0.01. In a second set of 53 families comprising 70 sib-pair equivalents, suggestive linkage was found only at the chromosome 16 locus, which was designated IBD1. The sibling risk ratio (λS) specific to this locus was estimated to be 1.3. The power of this first study was severely limited by sample size. In addition, a P-value of 1.5 × 10−5 did not strictly meet Lander’s criteria for significant linkage, although it has been argued that such strict criteria need not be applied because P < 0.01 was found in both stages. Nevertheless, this linkage in Crohn’s disease has since been widely replicated, in the genome wide scans from Oxford, Baltimore and Europe and in replication studies from the United States and Australia.51,52,56–59 The latter study is particularly noteworthy: it used only completely informative sib-pairs (DNA from both parents analysed to confirm allelic identity by descent) and achieved the highest LOD score yet reported for Crohn’s disease, 6.3. Linkage in ulcerative colitis only has been reported from England and France, and in inflammatory bowel disease overall from Italy.60,61 This year the inflammatory bowel disease consortium confirmed linkage to IBD1 in Crohn’s disease, with a maximum LOD score of 5.2, by typing 581 affected sibling pairs pooled from 11 centres around the world.62
Chromosome 16 positional candidates.
The interleukin (IL)-4 receptor α-gene (IL-4R), located within IBD1, is both a functional and a positional candidate gene, because inflammatory bowel disease patients demonstrate reduced IL-4 production and impaired IL-4-mediated down-regulation of pro-inflammatory cytokines.63,64 However, a recent transmission–disequilibrium test study found no association between four identified single nucleotide polymorphisms and either Crohn’s disease or ulcerative colitis.65 E-cadherin is a transmembrane glycoprotein that mediates cell–cell adhesive interactions in the intestinal epithelium. In vivo this molecule appears to be critical in maintaining epithelial barrier function.66 Up-regulation occurs both in overt inflammatory bowel disease and in sub-clinical inflammation in spondyloarthropathy patients.67 Mutation screening of this candidate gene is awaited. CD19 and CD43 are involved in B cell function and intercellular adhesion molecule 1 (ICAM1) interactions, respectively. A transmission–disequilibrium test study found no significant association between a missense mutation in exon 3 of CD19 and Crohn’s disease.68
The second genome wide scan, from Oxford, identified significant linkage with a 41-cM region on chromosome 12, termed IBD2, with a peak LOD score of 5.47 at the marker D12S83.38 In contrast to IBD1, this locus achieved significance in patients with both ulcerative colitis and Crohn’s disease. A locus-specific relative risk of 2.0 for IBD2 was calculated from this study. Replication for ulcerative colitis and Crohn’s disease was reported in the genome wide scan from Europe, and for Crohn’s disease in that from Los Angeles.39,52 Further confirmation of linkage of ulcerative colitis and Crohn’s disease to IBD2 was provided by the replication study from Northern Europe, and for Crohn’s disease only by the studies from the United States.69–72 It is noteworthy, however, that this result has not been replicated in all data sets, including that of the inflammatory bowel disease consortium, which found maximum LOD scores for inflammatory bowel disease, ulcerative colitis and Crohn’s disease of only 1.8, 1.2 and 1.1, respectively.62 A recent study from Oxford and Pittsburgh that examined linkage at 12 markers across IBD2 in 581 affected relative pairs has suggested that this discrepancy may reflect the relatively small number of affected relative pairs with ulcerative colitis included in previous studies. A multipoint LOD score of 3.91 was detected for pairs from 138 ulcerative colitis-only families, compared with 1.66 for Crohn’s disease-only and 1.29 for mixed families. The difference between the LOD scores for ulcerative colitis and Crohn’s disease was significant (P = 0.0057) in a formal test for heterogeneity. This study suggests that IBD2 makes a major contribution to ulcerative colitis susceptibility, but has only a relatively minor effect on susceptibility to Crohn’s disease.73
Chromosome 12 positional candidates.
Vitamin D has multiple immune functions, including suppression of lymphocyte proliferation and inhibition of several pro-inflammatory cytokines. The Taq1 t allele and tt genotype of the vitamin D receptor gene, previously associated with susceptibility to HIV infection and resistance to Mycobacterium tuberculosis, have been significantly associated with Crohn’s disease but not with ulcerative colitis.74 Natural resistance-associated macrophage protein 2 (NRAMP2) regulates metal ion transport across the apical cell membrane. No significant findings were obtained by either transmission–disequilibrium test or linkage analysis when three restriction fragment length polymorphisms were studied.75 A subsequent transmission–disequilibrium test analysis in 350 ulcerative colitis and Crohn’s disease families again found no association with seven novel single nucleotide polymorphisms.76 Several novel genes with a putative role in the pathogenesis of inflammatory bowel disease, including STAT-6 and MMP-18, have recently been mapped to IBD2 using high-density transcript mapping.77
The role of the HLA region in susceptibility to inflammatory bowel disease has attracted great interest since 1972.78 This region, one of the first multi-megabase regions to be sequenced (in 1999), contains 224 densely packed gene loci, many of which appear to be involved in the immune system.79 Their high degree of polymorphism, thought to be driven by the battle for supremacy between the immune system and infectious pathogens, makes these genes attractive candidates.
HLA molecules present processed peptides to the T cell receptor. Class I molecules, expressed on all nucleated cells, consist of a polymorphic heavy chain, encoded by one of three genes (HLA-A, -B, -C), associated with β2 microglobulin. Class II molecules, expressed mainly on specialized antigen-presenting cells, are made up of an α and a β chain encoded at three loci (HLA-DP, -DQ, -DR). HLA-DP and -DQ are polymorphic for both chains, whilst HLA-DR is polymorphic for the β chain only. Because this polymorphism is concentrated in the peptide-binding groove, HLA associations with disease may relate to differences in peptide binding. Several other hypotheses have been proposed to account for the association between HLA type and disease susceptibility, including roles for the HLA in T cell repertoire selection and in antigen-induced Th1/Th2 selection.
Linkage studies and the HLA region.
Since 1980, numerous studies have examined class I and II allele sharing between affected sib or relative pairs in an attempt to demonstrate linkage between the HLA region and inflammatory bowel disease. The early studies were limited by the use of serological typing methods in small cohorts.80–83 Using modern molecular typing methods, however, Satsangi et al. demonstrated significant allele sharing in ulcerative colitis when the DRB1 locus was genotyped in only 29 affected sib pairs.84 More recently, linkage in Crohn’s disease was found using several forms of non-parametric analysis in 323 individuals from 49 Crohn’s disease multiplex families.85 The resurgence of interest in the HLA region owes much to the results of two genome-wide scans published in the last 12 months. Significant linkage to this region was found in a European study of a large number of sibling pairs and was supported by suggestive linkage in a genome-wide scan from Los Angeles.52,53 Using the method described by Risch, which combines data from these linkage studies with epidemiological data, the relative contribution of the HLA region to overall genetic risk has been estimated at 64–100% for ulcerative colitis and 10–33% for Crohn’s disease.84–86
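The Risch approach mentioned above can be sketched in two steps. First, excess identity-by-descent sharing in affected sib pairs is converted into a locus-specific sibling relative risk, using Risch’s relation z0 = 0.25/λs, where z0 is the proportion of pairs sharing no alleles identical by descent. Second, assuming a multiplicative model in which the overall sibling relative risk is the product of the locus-specific values, the locus’s share of genetic risk is the ratio of logarithms. The function names and all numbers below are hypothetical; this is not a reconstruction of the published calculations.

```python
import math


def locus_lambda_s(z0: float) -> float:
    """Locus-specific sibling relative risk from sib-pair IBD sharing.

    Uses Risch's relation z0 = 0.25 / lambda_s, where z0 is the proportion
    of affected sib pairs sharing zero alleles identical by descent at the
    locus (z0 = 0.25 means no excess sharing, lambda_s = 1).
    """
    if z0 <= 0:
        raise ValueError("z0 must be positive")
    return 0.25 / z0


def share_of_genetic_risk(lambda_locus: float, lambda_total: float) -> float:
    """Fraction of the overall sibling risk attributable to one locus.

    Assumes a multiplicative model across loci, so log(lambda_s) adds up
    over loci and each locus contributes log(lambda_locus) / log(lambda_total).
    """
    return math.log(lambda_locus) / math.log(lambda_total)


# Hypothetical example: 12.5% of affected sib pairs share no alleles IBD,
# and the overall sibling relative risk for the disease is taken to be 4.
lam = locus_lambda_s(0.125)
print(f"locus lambda_s = {lam:.2f}")
print(f"share of genetic risk = {share_of_genetic_risk(lam, 4.0):.0%}")
```

The wide ranges quoted in the text reflect, among other things, uncertainty in both the observed sharing and the overall sibling relative risk that enter these two formulas.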
Association studies and the HLA region.
Most association studies of class I and II molecules in inflammatory bowel disease have examined association with individual alleles. Identifying which allele within an HLA haplotype is the relevant one is difficult because of tight linkage disequilibrium. Trans-racial mapping and HLA haplotype-matched analysis are techniques used in an attempt to resolve this problem. Further complexity is added by the hypothesis that combinations of alleles may interact on the same (cis) or the opposing (trans) haplotype to cause disease. A detailed allele association study will therefore need large numbers of patients to retain power after correction for the many statistical comparisons involved.
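The tight linkage disequilibrium that complicates HLA allele association studies can be quantified with the standard pairwise measures D, D′ and r². A minimal sketch, assuming the two-locus haplotype frequency is already known; in practice it usually has to be estimated from unphased genotypes (for example with an EM algorithm), which is not shown. The function name and the frequencies in the usage lines are hypothetical.

```python
def ld_measures(p_ab: float, p_a: float, p_b: float) -> tuple[float, float, float]:
    """Pairwise linkage disequilibrium between two biallelic loci.

    p_ab -- frequency of haplotypes carrying allele A at locus 1 and B at locus 2
    p_a  -- frequency of allele A;  p_b -- frequency of allele B
    Returns (D, |D'|, r^2).
    """
    for f in (p_a, p_b):
        if not 0 < f < 1:
            raise ValueError("both loci must be polymorphic")
    d = p_ab - p_a * p_b  # deviation from random association
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = abs(d) / d_max  # D scaled to its maximum possible value
    r2 = d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, d_prime, r2


# Hypothetical frequencies
print(ld_measures(0.4, 0.5, 0.5))  # moderate LD
print(ld_measures(0.2, 0.2, 0.5))  # complete LD (|D'| = 1) but r^2 only 0.25
```

The second example shows why the two measures answer different questions: |D′| = 1 means no historical recombination between the alleles has been detected, whereas a low r² means one allele is a poor proxy for the other in an association study.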
More than 20 HLA class I association studies were reported between 1972 and 1996, all using serological typing methods. After correction for multiple comparisons, significant allele associations have been demonstrated in approximately half of these studies, but the results are inconsistent.78,87–96 The encouraging data from the recent chromosome 6 linkage studies may lead investigators to revisit the class I region using modern molecular typing methods.
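A common way to correct for multiple comparisons in HLA association studies is the Bonferroni method, in which the nominal p-value is multiplied by the number of alleles tested (the corrected value is often written Pc). A minimal sketch; the function name and the numbers in the usage line are hypothetical.

```python
def corrected_p(p: float, n_comparisons: int) -> float:
    """Bonferroni-corrected p-value, capped at 1."""
    if n_comparisons < 1:
        raise ValueError("need at least one comparison")
    return min(1.0, p * n_comparisons)


# Hypothetical: a nominal p of 0.01 for one of 20 alleles tested
print(corrected_p(0.01, 20))
```

A nominally significant result can therefore disappear after correction, which helps explain why studies of small cohorts that type many alleles so often disagree.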
Most interest in the HLA region has focused on association with class II alleles. A meta-analysis of 29 studies from 1966 to 1998, compiled by Stokkers et al., made a most valuable contribution to this area.97 Its main limitations were, first, the concessions needed to allow for changes in nomenclature so that data from both serological and molecular studies could be combined and, second, the problems caused by including studies of different ethnic populations. Significant positive associations in ulcerative colitis were found with DR2 (OR 2.00, CI: 1.5–2.63), DRB1*1502 (OR 3.74, CI: 2.20–6.38), DR9 (OR 1.54, CI: 1.06–2.24) and DRB1*0103 (OR 3.42, CI: 1.52–3.69), and a negative association with DR4 (OR 0.54, CI: 0.43–0.68). The DR2 association appeared to be due to the subtype DRB1*1502, a result that probably reflects the high frequency of this allele in the Japanese population. When Japanese studies were excluded from the analysis, the DR2 association remained significant but that with DRB1*1502 did not, suggesting that other alleles were responsible for the association. The meta-analysis highlighted a novel association with DR9, which may have been missed in previous studies because of its low frequency in the general population. Significant positive associations in Crohn’s disease were found with DR7 (OR 1.42, CI: 1.16–1.74), DRB3*0301 (OR 2.18, CI: 1.25–3.80) and DQ4 (OR 1.88, CI: 1.16–3.05), and negative associations with DR3 (OR 0.71, CI: 0.56–0.90) and DR2 (OR 0.83, CI: 0.70–0.98).
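The odds ratios and confidence intervals above summarize 2×2 tables of allele carriage in patients and controls. The sketch below computes an odds ratio with a Woolf (log-based) 95% confidence interval for a single table. It does not reproduce the pooling across studies performed in the meta-analysis, and the function name and counts are hypothetical.

```python
import math


def odds_ratio_ci(a: int, b: int, c: int, d: int, z: float = 1.96):
    """Odds ratio with a Woolf (log-based) confidence interval.

    a -- cases carrying the allele      b -- cases not carrying it
    c -- controls carrying the allele   d -- controls not carrying it
    z -- normal quantile (1.96 gives a 95% interval)
    """
    if min(a, b, c, d) == 0:
        raise ValueError("zero cell: apply a continuity correction first")
    or_ = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of ln(OR)
    lo = math.exp(math.log(or_) - z * se_log)
    hi = math.exp(math.log(or_) + z * se_log)
    return or_, lo, hi


# Hypothetical counts: 40/100 cases and 20/100 controls carry the allele
or_, lo, hi = odds_ratio_ci(40, 60, 20, 80)
print(f"OR {or_:.2f}, 95% CI {lo:.2f}-{hi:.2f}")
```

Because the interval is symmetric on the log scale, a reported interval that is strongly lopsided around its odds ratio after taking logarithms is worth checking against the original data.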
Association of HLA class I and II genes with inflammatory bowel disease phenotype.
The majority of studies examining genetic association by inflammatory bowel disease phenotype have involved HLA class I and II alleles. Indeed, it seems likely that the HLA has a greater role in modifying inflammatory bowel disease phenotype than in determining overall disease susceptibility. Patients must be classified into homogeneous phenotypic groups of sufficient size if associations are not to be overlooked. However, this approach is not without risk. Disease extent and behaviour must be accurately defined, with attention paid to whether patients are classified by macroscopic or histological criteria, which often conflict in an individual patient.98 Furthermore, classification by extent is limited because disease distribution may change with time.98,99 The most replicated association in ulcerative colitis is with the rare class II allele DRB1*0103, which is present in 0.2–3.2% of the general population but in 6–10% of patients with ulcerative colitis overall, 15.8% of those with extensive disease and 14.1–25% of those with severe disease requiring colectomy.7–9,84,100 The association is stronger when the extra-intestinal manifestations uveitis, arthritis and erythema nodosum are present: 22.8% of patients with extra-intestinal manifestations possess this allele, compared with 3.2% of controls (P < 0.0001).7 A further replicated association has been found between the common HLA DR3 DQ2 haplotype and extensive ulcerative colitis. In a study from Oxford, 32.9% of patients with extensive ulcerative colitis carried DR3 DQ2, compared with 10.7% of those with distal disease.84
HLA influence on phenotype is most clearly demonstrated in inflammatory bowel disease-associated arthritis. A non-erosive, seronegative peripheral arthritis occurs in 5–20% of patients with inflammatory bowel disease. A new classification recognizes two distinct clinical forms with specific genetic associations.101 Type 1 peripheral arthropathy (defined as acute, self-limiting inflammation affecting fewer than five joints, lasting less than 5 weeks, and associated with inflammatory bowel disease relapses and the presence of extra-intestinal manifestations) is associated with HLA-B27, HLA-B35 and, in the strongest association reported to date, HLA-DRB1*0103 (40% vs. 3%). Type 2 arthritis (affecting five or more joints, with a median duration of symptoms of 3 years, and associated with uveitis but not erythema nodosum) is associated with HLA-B44.102 Several other phenotypic association studies involving small numbers of patients with Crohn’s disease have been reported but await replication. These include protection against perianal fistulating Crohn’s disease conferred by DRB1*03 and the association of HLA DR7 with ileal rather than colonic disease.103,104
Association of class III and non-classical class I HLA genes with inflammatory bowel disease.
The tumour necrosis factor alpha (TNFα) gene (6p21) is located 250 kb centromeric to HLA-B. It encodes a proinflammatory cytokine that is found in increased concentrations in the mucosa of patients with Crohn’s disease.105 The importance of TNFα is shown by the dramatic clinical response to infusion of an anti-TNFα monoclonal antibody.106,107 TNF expression appears to be partly genetically determined, because the promoter polymorphisms at positions -238, -308, -863, -857 and -1031 are associated with increased TNF production in vitro.108–110 The frequency of TNF-308 allele 2 has been found to be significantly reduced in ulcerative colitis and Crohn’s disease.111,112 However, these results have not been consistently replicated, even by the same centres.103 Recent data suggest that TNF polymorphisms may be more important in determining susceptibility to Crohn’s disease than to ulcerative colitis. A significant association between TNFα-1031 and Crohn’s disease alone has been found in case–control studies of both Japanese and British patients and, more recently, in a transmission–disequilibrium test study from the same British centre (personal communication).113,114 A significant association has also been found in Crohn’s disease with the haplotype a2b1c2d4e1, constructed from five microsatellite markers across the TNF locus.115 The clinical significance of these associations has been explored in two recent studies that investigated whether TNF genetic markers could predict the response to infusion of the chimeric anti-TNFα antibody. Data from Belgium suggest that both homozygotes and heterozygotes for TNFα-308 allele 1 have higher response rates than patients with other TNF genotypes, whilst the TNF microsatellite haplotypes have been shown to have no predictive value.116,117 Further work is under way to investigate the predictive value of other TNF markers.
The major histocompatibility complex class I chain-related gene A (MICA) is located 46 kb centromeric to HLA-B. Its product has a structure similar to that of class I molecules, although it is expressed at the cell surface without β2 microglobulin.118 MICA expression, which is induced by cellular stress, appears to be almost exclusively restricted to the gastrointestinal epithelium.119 Polymorphisms in the MICA gene appear not to affect the binding groove, which is likely to remain empty, but may affect interaction with NK cells and γδ T cells, which express the MICA receptor NKG2D.120 Two forms of polymorphism have been identified in the MICA gene: GCT repeats in exon 5, which encodes the transmembrane portion, and polymorphisms in the extracellular domains encoded by exons 2–4. Significant associations have been found between the number of GCT repeats and both susceptibility to ulcerative colitis and protection from Crohn’s disease.121,122 When the extracellular polymorphisms were studied, significant associations were identified between MICA*007 and susceptibility to ulcerative colitis, and between MICA*008 and Type 2 inflammatory bowel disease peripheral arthropathy.123,124 Future studies will need to disentangle these effects from those of neighbouring HLA-B, with which MICA is in linkage disequilibrium.
The Crohn’s disease-specific linkage region on chromosome 14 was first identified in the genome-wide scan from Los Angeles, which involved a total of 65 sib pairs with Crohn’s disease from 46 families.39 This suggestive linkage was largely due to the contribution of non-Jewish families. Confirmation of linkage was subsequently provided by a genome-wide scan from Pittsburgh, which reached a ‘significant’ threshold in 127 Crohn’s disease-affected relative pairs, and by a replication study from Belgium.54,125
Chromosome 14 positional candidate genes.
Positional candidates include the T-cell receptor alpha and delta genes, a cluster of proteasome genes involved in MHC class I antigen presentation, and the leukotriene B4 receptor gene. Association studies of polymorphisms in these genes in inflammatory bowel disease are awaited.
FROM LINKAGE TO GENE
The chromosomal regions identified by genome-wide scanning in inflammatory bowel disease are large. They span more than 20 million base pairs, each potentially containing several hundred candidate genes. The presence of more than one susceptibility gene on a chromosome may further complicate refinement efforts.51 Several strategies have been proposed to improve the prospects of identifying these positional candidate genes.
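Region sizes are reported in centimorgans (genetic distance) but candidate genes are counted along the physical sequence. A rough conversion is sketched below under two stated assumptions: Haldane’s mapping function (which ignores crossover interference; Kosambi’s function is a common alternative) to turn a recombination fraction into map distance, and the genome-average rule of thumb of about 1 Mb per cM given in the glossary. Local recombination rates vary widely, so the physical estimate is only a guide. The function names are hypothetical.

```python
import math


def haldane_cm(theta: float) -> float:
    """Map distance in centimorgans from a recombination fraction.

    Haldane's function, assuming no crossover interference:
    d (Morgans) = -0.5 * ln(1 - 2 * theta), multiplied by 100 for cM.
    """
    if not 0 <= theta < 0.5:
        raise ValueError("theta must lie in [0, 0.5)")
    return -50.0 * math.log(1.0 - 2.0 * theta)


def approx_megabases(cm: float, mb_per_cm: float = 1.0) -> float:
    """Rough physical size using an average of ~1 Mb per cM."""
    return cm * mb_per_cm


print(f"theta 0.10 -> {haldane_cm(0.10):.1f} cM")
print(f"41-cM IBD2 region ~ {approx_megabases(41):.0f} Mb")
```

At small recombination fractions the mapping function is almost linear (1% recombination is about 1 cM), so the correction matters mainly for loosely linked markers.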
Fine mapping attempts to narrow these linkage regions by saturating them with a higher density of markers. Linkage and association analysis can then be carried out at each new marker. This approach has been applied to regions on chromosomes 3, 6, 12 and 16 (personal communication).126–128
Larger numbers of families must be studied to increase the power of linkage studies, because the genotypic relative risk at each locus is likely to be small. Improved international collaboration, directed by the inflammatory bowel disease Consortium, aims to achieve this through a meta-analysis of results from 11 centres around the world. This group recently presented its linkage results for IBD1 and IBD2 (discussed earlier) from a total of 581 families.62
The chances of identifying a disease allele may also be increased by studying probands with more affected relatives, probands with a young age of disease onset or probands with greater disease severity. The study of specific sub-groups defined by clinical criteria or sub-clinical markers such as ANCA may reduce disease heterogeneity and thus also lead to disease allele enrichment.
The recent study by Cho et al. of American Chaldeans with inflammatory bowel disease demonstrates further powerful strategies for refining linkage regions and identifying disease loci.55 The study attempted to counter the problems associated with genetic heterogeneity by studying a population that, as a result of religious, cultural and language differences, has remained genetically isolated. In a small number of multiply affected pedigrees, the authors applied linkage disequilibrium mapping to assess ancestral sharing of closely spaced markers, in order to refine a region of suggestive linkage previously identified on chromosome 1p in an outbred population.51 They replicated the chromosome 1p linkage with an increased multipoint logarithm of the odds score of 3.01 and refined the region to less than 1 cM.
Despite the application of these strategies, success in narrowing the linkage regions in inflammatory bowel disease has been limited. The near future is likely to see the application of several alternative approaches (Table 3). Scientists are returning to the study of candidate genes, testing thousands of single nucleotide polymorphism markers simultaneously as part of genome-wide association and linkage disequilibrium mapping studies. Single nucleotide polymorphism markers may be selected randomly or on the basis of their likely functional significance; thus single nucleotide polymorphisms that cause non-conservative amino-acid substitutions in coding regions, or that lie in promoter regions, may be favoured. These studies have been made possible by the rapid development of chip technology and the generation of human single nucleotide polymorphism maps. The complete human single nucleotide polymorphism map compiled by the single nucleotide polymorphism consortium will be available in mid 2001. A single nucleotide polymorphism lying close to a disease-causing mutation will tend to be co-inherited with it, and so will be in linkage disequilibrium with it. Linkage disequilibrium mapping exploits this phenomenon by correlating phenotype with large numbers of closely spaced single nucleotide polymorphisms. Therefore, to predict disease susceptibility or phenotype by linkage disequilibrium mapping, it is not necessary to identify the exact disease-causing mutation, because the single nucleotide polymorphism profile will suffice.
The use of closely spaced single nucleotide polymorphisms in an association study was recently illustrated by Martin et al., who set out to determine whether an established susceptibility gene for Alzheimer’s disease, APOE4 (apolipoprotein E allele 4) on chromosome 19q, could be identified from within a region of linkage. Previous studies had found no association between APOE alleles and microsatellite markers, suggesting that APOE would provide a rigorous practical test of the value of single nucleotide polymorphism typing. Ten single nucleotide polymorphisms very close to APOE were typed in 1093 patients with disease onset at over 59 years of age, using both case–control and family-based association designs. The two markers flanking APOE, spanning 40 kb, showed the strongest associations.130
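The case–control arm of a single nucleotide polymorphism association study of this kind can be reduced to testing allele counts in cases against controls at each marker and seeing where the signal peaks. The sketch below uses a Pearson chi-square on a 2×2 table of allele counts, which treats the two alleles of each person as independent (an assumption of Hardy–Weinberg equilibrium); it does not cover the family-based analyses. The SNP names and counts are hypothetical.

```python
def allelic_chi_square(case_alt: int, case_ref: int,
                       ctrl_alt: int, ctrl_ref: int) -> float:
    """Pearson chi-square (1 df) for a 2x2 table of allele counts."""
    a, b, c, d = case_alt, case_ref, ctrl_alt, ctrl_ref
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        raise ValueError("empty row or column in the table")
    return n * (a * d - b * c) ** 2 / denom


def scan(snps: dict[str, tuple[int, int, int, int]]) -> list[tuple[str, float]]:
    """Rank markers by their allelic association statistic, strongest first."""
    results = [(name, allelic_chi_square(*counts)) for name, counts in snps.items()]
    return sorted(results, key=lambda item: item[1], reverse=True)


# Hypothetical allele counts: (cases alt, cases ref, controls alt, controls ref)
example = {
    "snp_1": (52, 48, 50, 50),
    "snp_2": (60, 40, 40, 60),
    "snp_3": (55, 45, 48, 52),
}
for name, chi2 in scan(example):
    print(f"{name}: chi-square = {chi2:.2f}")
```

In a real scan the strongest statistic would still need correction for the number of markers tested, and neighbouring markers in linkage disequilibrium with each other will tend to show correlated signals.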
Linkage and linkage disequilibrium analysis has been described as the ‘reverse genetic approach’ to the identification of susceptibility genes. Developments in the new millennium are more likely to follow the ‘forward genetic approach’.131 The starting point of this approach, the identification of all the genes in the human genome, will soon be reached. Advances in RNA and protein-based technologies will then enable the functional variation within all of these genes to be identified. The development of the high-density array, in particular, has generated much excitement. Nucleic acid arrays containing more than 250 000 different oligonucleotide probes per square centimetre can now be produced, allowing vast numbers of RNA and DNA molecules to be detected simultaneously and quantitatively. Under this forward genetic approach, the impact of each of these variants on disease phenotype, once identified, would be studied as part of a large-scale single nucleotide polymorphism association study.
The field of molecular genetics has consumed a vast amount of research money and time over the last 10 years. Clinicians and patients are increasingly impatient to see a return on this huge investment in individual patient care. Although many of the genes that cause Mendelian diseases have been identified, these diseases are rare and collectively affect only a few per cent of the population. In contrast, the vast majority of susceptibility genes for common ‘complex’ disorders, estimated to affect 60% of the western population, remain unmapped.132 In comparison with many ‘complex’ disorders, however, there has been tremendous progress over the last 5 years in unravelling the role of genetics in inflammatory bowel disease. The clinical impact of this work is now becoming apparent (Table 4). The ultimate aim of discovering the primary pathophysiology of inflammatory bowel disease, and thus effecting a cure, through the identification of the relevant susceptibility genes is still far off. More immediate clinical applications are likely to follow from a greater understanding of disease heterogeneity and the application of pharmacogenomics.
Table 4. Future clinical applications of IBD genetic research
Tariq Ahmad is supported by a project grant from the National Association for Colitis and Crohn’s disease.
Allele. One of several alternative forms of a gene or DNA sequence at a specific chromosomal position (locus).
ANCA. Antineutrophil cytoplasmic antibodies, a reported serological marker for ulcerative colitis.
Centimorgan (cM). A unit of genetic distance equivalent to a 1% probability of recombination during meiosis. 1 cM is equivalent to a physical distance of approximately 1 megabase.
Coefficient of heritability. Relative contribution of genetics to disease aetiology.
Expressed sequence tags (ESTs). A short partial sequence of a cDNA clone, for which a PCR assay is available.
Founder effect. The presence of a mutation or DNA variant in a population through descent from a common ancestor who was one of the founders of that population.
Genotype relative risk (GRR). Relative risk attributed to possession of a specific allele at a given locus vs. another.
Haplotype. A collection of alleles found at different, linked loci on a single chromosome.
Linkage. The tendency of genes or other DNA sequences at specific loci to be inherited together as a consequence of their physical proximity on a chromosome.
Linkage disequilibrium (LD). The occurrence of two alleles at different loci together on the same chromosome more often than would be expected by chance. Alleles situated in close proximity are less likely to be separated at meiosis and so are more likely to be co-inherited.
LOD score. A measure of the likelihood of genetic linkage between loci. See text for significance levels.
Microsatellite marker. A tandemly repeated sequence of 2, 3 or 4 nucleotides, used as an informative marker in linkage studies.
Phenotype. The observable characteristics of a cell or individual, as determined by genetic constitution and environment.
Promoter. A short sequence of DNA at the 5′ end of a gene to which RNA polymerase binds in order to initiate transcription of the gene.
Radiation hybrid mapping. Human chromosomal fragments generated by lethal irradiation of a somatic cell hybrid are fused with a rodent cell. A panel of such radiation hybrids is then used to interrogate DNA clones to produce a linear map.
Restriction fragment length polymorphisms (RFLPs). DNA sequence variation that creates or abolishes a restriction enzyme cleavage site, visualized as different patterns of fragment sizes.
Sibling relative risk. Disease risk for a sibling of an affected individual compared to the disease risk in the general population.
Single nucleotide polymorphism (SNP). DNA sequence variation resulting from a change in a single nucleotide.