The known genetic risk factors do not account for most of the heritability observed in many complex genetic traits.1, 2 Both Crohn's disease (CD) and ulcerative colitis (UC) are complex genetic traits: they occur in families, usually in non-Mendelian inheritance patterns, and have both environmental and genetic components that contribute to their pathogenesis. Moreover, multiple genome-wide association studies (GWAS) have identified common and low-penetrance genetic variants that are associated with CD, UC, or both disorders.3–14 Like many complex genetic disorders, inflammatory bowel disease (IBD) also demonstrates missing heritability: identified genetic risk factors explain only a fraction of the estimated heritability. For example, it is estimated that the 32 loci associated with CD identified via GWAS explain ≈20% of the overall genetic risk.3 The source of this “missing heritability”1 remains unknown, although rare kindred-specific genetic variants may explain this for IBD.2, 15, 16
Because of missing heritability, there is renewed interest in studying families with IBD. Standard linkage analysis has been used to map highly penetrant genetic mutations in families in whom an IBD-like phenotype demonstrated Mendelian inheritance patterns.17 However, such families are rare. It is unclear how best to identify rare genetic variants that contribute to complex disease and may explain missing heritability. In this study we used the Utah Population Database (UPDB), a large genealogical database containing healthcare records, to identify families at high risk for developing IBD. We calculated the familial standardized incidence ratio (FSIR), a statistic developed for family cancer studies in Utah kindreds.18 We identified multiple kindreds with a statistical excess of individuals with UC, CD, or both disorders. Given the need for novel genetic mapping strategies to explain the apparent missing heritability in IBD, further studies of these high-risk kindreds are justified.
A total of 4935 cases of CD and 5448 cases of UC were identified in UPDB. Among CD-affected individuals, 3601 (73%) had at least one relative in UPDB. Among UC-affected individuals, 3976 (73%) had at least one relative in UPDB. These cases constituted the study sample. In all, 7247 subjects were considered to have IBD. Compared to excluded subjects, included subjects were much more likely to have been born in Utah, were younger, and a smaller proportion were females (Table 1). The observed differences between included versus excluded cases were explained by the differences in birth state: in the logistic regression model, only “born in Utah” was statistically significant, and is reflective of the inherent sampling scheme of the UPDB. Birth state drove most of the observed differences between included and excluded cases for both UC and CD.
Among the 3601 subjects with CD, 178 (4.9%) had first-degree relatives who were also affected with CD (Table 2). Among 3976 subjects with UC, 164 (4.1%) had UC-affected first-degree relatives. Among 7247 IBD subjects, 455 (6.3%) had IBD-affected first-degree relatives. The adjusted odds ratios for first-degree relatives of individuals with CD, UC, or IBD were 5.72 (95% confidence interval [CI]: 4.61–7.1), 3.92 (3.2–4.81), and 3.37 (2.78–4.08), respectively. While effect sizes were smaller, there were statistically significant excess risks for all three conditions in second-degree relatives and in first cousins. The adjusted population attributable risks for familial CD, UC, and IBD were 0.20 (95% CI: 0.17–0.23), 0.17 (0.14–0.21), and 0.19 (0.17–0.22), respectively.
Table 2. Risk of Crohn's Disease, Ulcerative Colitis, and IBD to Degree Relatives
|Disease||Relatives||Relatives of Cases: Affected/Unaffected||Relatives of Cases: Percent Affected||Relatives of Controls: Affected/Unaffected||Relatives of Controls: Percent Affected||OR (95% CI)||P-value|
|Crohn's disease||First-degree||178/17,506||1.02%||155/87,526||0.18%||5.72 (4.61–7.1)||<10⁻⁹|
| ||Second-degree||84/30,763||0.27%||242/145,526||0.17%||1.64 (1.28–2.1)||<10⁻⁴|
| ||First cousins||102/32,052||0.32%||309/147,621||0.21%||1.53 (1.22–1.91)||<10⁻³|
|Ulcerative colitis||First-degree||164/19,215||0.85%||207/95,588||0.22%||3.92 (3.2–4.81)||<10⁻⁹|
| ||Second-degree||98/34,909||0.28%||273/141,487||0.19%||1.45 (1.15–1.83)||0.00162|
| ||First cousins||126/36,410||0.35%||391/144,802||0.27%||1.29 (1.05–1.57)||0.0134|
|Inflammatory bowel disease||Second-degree||324/63,365||0.51%||197/63,072||0.31%||1.64 (1.37–1.95)||<10⁻⁷|
| ||First cousins||362/65,158||0.56%||280/64,682||0.43%||1.29 (1.1–1.51)||0.00145|
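As a rough check on Table 2, a crude (unadjusted) odds ratio can be computed directly from the affected/unaffected counts. The sketch below does this for first-degree relatives of CD cases and adds a Wald 95% confidence interval. The function name `crude_odds_ratio` is illustrative, and the Wald interval is an assumption rather than the method used in the study. Because the published estimates are adjusted, the crude value should be close to, but not identical with, the reported 5.72 (4.61–7.1).

```python
import math

def crude_odds_ratio(case_aff, case_unaff, ctrl_aff, ctrl_unaff, z=1.96):
    """Crude odds ratio with a Wald confidence interval on the log scale.

    Illustrative only: the study reports adjusted odds ratios, which this
    unadjusted calculation does not reproduce exactly.
    """
    odds_ratio = (case_aff * ctrl_unaff) / (case_unaff * ctrl_aff)
    # Standard error of log(OR) from the four cell counts.
    se_log = math.sqrt(1 / case_aff + 1 / case_unaff + 1 / ctrl_aff + 1 / ctrl_unaff)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper

# Counts for first-degree relatives, Crohn's disease row of Table 2.
or_cd, lo_cd, hi_cd = crude_odds_ratio(178, 17506, 155, 87526)
print(f"crude OR = {or_cd:.2f} ({lo_cd:.2f}-{hi_cd:.2f})")
```

The same function can be applied to any other row of the table by substituting its four counts.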
Using the familial standardized incidence ratios, we identified multiple kindreds with excess risk of developing CD, UC, or IBD. We limited the analysis to kindreds that have a greater than expected excess of disease (P < 0.05; see Materials and Methods for details) and that contain five or more living affected individuals. We identified 655 high-risk CD kindreds with a median FSIR of 2.8 and a range of 1.4–19.4. In all, 615 UC kindreds were identified, with a median FSIR value of 2.7 and a range of 1.4–13.8. A total of 1177 high-risk IBD kindreds were identified, with a median FSIR value of 2.5 and a range of 1.3–10.4. The FSIR distributions for each of these sets of kindreds are shown in Figure 1. Table 3 shows the characteristics of the kindreds with the five highest FSIRs for CD, UC, and IBD. These kindreds have FSIRs ranging from 8.7 to 19.4, 332 to 828 living relatives, and founder birth years between 1804 and 1870. Figure 2 is an example of a pedigree with an FSIR of 3.8 (for IBD) and multiple affected individuals descended from a common ancestor.
Figure 1. Distribution of FSIR values for (A) CD, (B) UC, and (C) IBD. For each panel, bars represent number of kindreds for each value of FSIR. A box plot shows median and interquartile range. Points represent statistical outliers with FSIR values >1.5 times the interquartile range. FSIR is the familial standardized incidence ratio (see Materials and Methods for details).
Table 3. Kindreds with the Highest Five FSIR Values by Disorder
|Disease||Founder Birth Year||No. Descendants||FSIR||Obs||Exp|
|Inflammatory bowel disease||1870||332||10.35||5||0.48|
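The Obs and Exp columns of Table 3 suggest how an FSIR can be read: as the ratio of observed to expected cases in a kindred. The sketch below assumes exactly that simplified definition and, as a further assumption, uses a one-sided Poisson tail probability to judge whether the excess is larger than expected. The study's actual FSIR calculation and significance test are described in its Materials and Methods and may differ (for example, in how expected counts are stratified). The function names are illustrative.

```python
import math

def fsir(observed, expected):
    """Simplified FSIR: observed cases divided by expected cases (an assumption)."""
    return observed / expected

def poisson_upper_tail(observed, expected):
    """P(X >= observed) for X ~ Poisson(expected); an assumed significance test."""
    below = sum(
        math.exp(-expected) * expected**k / math.factorial(k)
        for k in range(observed)
    )
    return 1.0 - below

# Obs and Exp for the IBD kindred listed in Table 3.
obs_cases, exp_cases = 5, 0.48
ratio = fsir(obs_cases, exp_cases)
p_value = poisson_upper_tail(obs_cases, exp_cases)
print(f"FSIR = {ratio:.2f}, one-sided P = {p_value:.2e}")
```

With the rounded expected count of 0.48 the ratio comes out slightly above the tabulated 10.35, presumably because the table rounds the expected value.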
Figure 2. Kindred 6210. A trimmed pedigree (showing only affected individuals) depicts UC-affected individuals (right-shaded shapes) and CD-affected individuals (left-shaded shapes). Circles represent females and squares represent males. The founders were born in 1815 and 1818. The IBD familial standardized incidence ratio for this kindred is 3.8.
Our findings can be summarized as follows. First, we identified multiple high-risk Utah kindreds with CD, UC, and IBD. Each kindred has 1) a statistical excess of diseased individuals when compared to the general population, and 2) genealogical evidence that members are descended from a common ancestor (i.e., a founder). Second, we characterized the familial clustering of IBD using healthcare data and a population-based sample, demonstrating risks for CD, UC, and IBD to relatives of affected individuals that closely resemble previously reported estimates. Given the need for novel genetic mapping strategies to explain the apparent missing heritability in IBD, further studies of these high-risk kindreds are justified.
Both common and rare genetic variants that cause IBD have been most successfully identified using distinct but complementary study designs. GWAS have demonstrated that multiple genetic variants contribute to CD, UC, or both conditions. IBD-associated genetic variants identified in GWAS are common, with most risk variants having allele frequencies greater than 5% in individuals of European ancestry. These IBD susceptibility loci contribute modestly to disease susceptibility, with odds ratios between 1.1 and 1.5 (CARD15 variants are a notable exception). Rare genetic variants causing IBD-like phenotypes also have been described. Relevant examples include mutations in genes encoding interleukin (IL)-10 receptors, which cause infantile-onset IBD, and the IBD/primary sclerosing cholangitis phenotype associated with CD40 ligand deficiency. These disorders segregate in Mendelian patterns, and were mapped by identifying pedigrees with multiple affected children using linkage analysis.17, 30
An emerging theme from the genetics of complex traits such as IBD is that the observed heritability is not entirely explained by known genetic variants. One potential explanation for missing heritability is that genetic variants that are kindred-specific, and therefore rare in the general population, increase risk for disease. Existing study designs are unable to identify such variants. GWAS are inefficient, as these kindred-specific variants are likely not genotyped on existing microarrays, and it is unlikely that genotyping microarrays ascertain a variant that is coinherited (i.e., in linkage disequilibrium) with a kindred-specific genetic variant to a degree sufficient to be detected in a GWAS. Classic linkage studies require families with multiple affected members; such families are rare, and mutations identified using this approach are unlikely to explain the observed heritability. Moreover, linkage studies for complex genetic traits like IBD are inefficient because of 1) lack of statistical power given the low penetrance of risk alleles; 2) phenotypic and locus heterogeneity among families/affected relative pairs under study; and 3) the multiple genetic variants (as opposed to single-gene mutations) that underlie complex genetic traits.31 Taken together, novel approaches are needed to understand the source of missing heritability for complex genetic traits in general, and IBD in particular.
In this study we used genealogical and healthcare data to identify multiple high-risk IBD kindreds. More specifically, within each kindred there were multiple IBD-affected individuals for whom a common ancestor could be identified. These kindreds potentially allow for identification of specific genetic variants that are unique to a kindred and not identifiable using GWAS or linkage analysis. We speculate that these kindreds are particularly amenable to genetic mapping strategies using shared genomic segment analysis.32, 33 Genomic segments represent long stretches of multiple genetic variants inherited together on a single chromosome (a haplotype). The number of shared segments and their size depend primarily on the number of meioses separating related individuals. Closely related individuals share numerous long segments, while distantly related individuals share shorter and fewer genomic segments. If multiple affected and distantly related individuals share long genomic segments, those segments are likely related to disease. Shared genomic segment analysis has been used successfully to map benign recurrent intrahepatic cholestasis using four individuals,34 and action myoclonus-renal failure syndrome using three individuals.35 Given the number of affected individuals in our high-risk kindreds, previous successes with Mendelian disorders, and the ease and relatively low cost of genotyping microarrays, shared genomic segment analysis is an attractive genetic mapping strategy to use for the kindreds identified in this study.
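The dependence of segment length on the number of meioses can be illustrated with a standard textbook approximation that is not taken from this study. If crossovers occur as a Poisson process at a rate of one per Morgan per meiosis, the length of a segment inherited intact through m meioses is approximately exponentially distributed with a mean of 100/m centimorgans (cM). The sketch below, with illustrative function names, shows why a long segment shared by distantly related affected individuals is unlikely to arise by chance.

```python
import math

def expected_segment_cm(meioses):
    """Approximate mean length (cM) of a segment inherited through `meioses` meioses.

    Assumes crossovers follow a Poisson process at 1 per Morgan per meiosis.
    """
    return 100.0 / meioses

def prob_segment_longer_than(length_cm, meioses):
    """Approximate probability that such a segment exceeds `length_cm` centimorgans."""
    return math.exp(-meioses * length_cm / 100.0)

# Compare close relatives (few meioses) with distant relatives (many meioses).
for m in (2, 10, 20):
    mean_len = expected_segment_cm(m)
    p_long = prob_segment_longer_than(10.0, m)
    print(f"meioses={m:2d}  mean length={mean_len:5.1f} cM  P(>10 cM)={p_long:.3f}")
```

Under this approximation, the chance that a segment longer than 10 cM survives falls steeply as the number of separating meioses grows.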
We see several strategies to refine genetic mapping studies in these families. First, kindreds can be prioritized for study using, for example, phenotypic extremes such as high FSIR values (such as the outliers shown in Fig. 1); kindreds with an excess of childhood-onset IBD cases; kindreds with more severe disease; or kindreds containing individuals with multiple immune-mediated disorders in addition to individuals with IBD. Mechanisms are in place to contact individuals within these kindreds and subsequently enroll them. Second, phenotypes should be better refined, as undoubtedly some of the individuals were misclassified as having disease when they do not, or having UC when they in fact have CD (the converse is also true). Third, multiplex IBD families can be ascertained in a clinical setting. Fourth, in an effort to identify kindred-specific genetic variants, shared genomic segment analysis using genotyping microarrays can be coupled with next-generation sequencing of observed shared segments, resequencing of positional or candidate loci based on data from GWAS, and/or whole exome sequencing.36–38 Such approaches are possible only because of the demographic data, healthcare data, and the extensive genealogies contained in the UPDB.
We found, among first-degree relatives, adjusted odds ratios of 5.7, 3.9, and 3.4 for CD, UC, and IBD, respectively; that 4%–6% of IBD-affected individuals had an affected first-degree relative; and statistically significant excess risk to relatives as distant as first cousins. Prior studies have reported risk estimates of 3.4 to 15 for UC among first-degree relatives of individuals with UC.39, 40 Similarly, reported risk estimates range from 5 to 35 for CD among relatives of individuals with CD.41, 42 Our risk estimates to first-degree relatives are consistent with some reports, and lower than others. This observation highlights a recurring theme in family studies of IBD: there is significant variation in the reported risk to first-degree relatives of affected individuals. There are several potential explanations for the disparity of reported risk to relatives of IBD-affected individuals.
First, variation in prevalence among cases and control groups may explain variation in risk to relatives. Most family studies report risk ratios. Higher risk ratios are therefore possible if 1) individuals have many affected relatives, or 2) the prevalence in the control population is low. For example, Probert et al41 reported a sibling recurrence risk for CD of 35, an estimate that is widely reported in genetic studies of CD. This estimate was calculated using a CD prevalence of 1,931/100,000 among siblings, and a CD prevalence of 75.8/100,000 in the general population. However, studies from a similar geographic region reported a CD population prevalence of 147 to 214 per 100,000 individuals,43, 44 suggesting that the Probert study may have overestimated the CD sibling recurrence risk. A second potential explanation for the wide disparity in reported risk to family members is provided by our data here: IBD risk to relatives is variable and dependent on kindred. Thus, if individuals in a study are members of “high-risk” kindreds, then prevalence estimates will be greater, and result in a greater risk ratio.
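The sensitivity of a risk ratio to the comparison prevalence can be shown with the prevalence figures quoted above. The sketch below simply divides the sibling prevalence by each candidate population prevalence; `risk_ratio` is an illustrative helper. These crude ratios need not reproduce the published estimate of 35, whose exact calculation is not given here.

```python
def risk_ratio(prev_relatives, prev_population):
    """Crude risk ratio: prevalence in relatives divided by population prevalence."""
    return prev_relatives / prev_population

# Prevalences per 100,000, as quoted in the text.
sibling_prev = 1931.0
population_prevs = (75.8, 147.0, 214.0)

for pop_prev in population_prevs:
    ratio = risk_ratio(sibling_prev, pop_prev)
    print(f"population prevalence {pop_prev:6.1f} per 100,000 -> risk ratio {ratio:.1f}")
```

Roughly doubling or tripling the assumed population prevalence cuts the ratio to about a half or a third, which is the point the paragraph makes about the choice of comparison population.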
In interpreting case–control studies assessing the risk of disease to family members, it is critically important for disease prevalence in controls to approximate the general population. In our study, the crude prevalence of UC and CD among relatives of controls closely mirrors previous North American estimates,45–47 even in light of the pitfalls of calculating prevalence estimates using healthcare encounter data.48 Thus, our study's large sample size, the likelihood that it is population-based, the careful matching of cases and controls, and the close approximation of IBD prevalence in controls to other North American studies all suggest that we accurately quantified disease risk to relatives and validated UPDB as a resource for future studies.
Importantly, our findings are likely generalizable to other populations of Western and Northern European ancestry. Early genetic studies demonstrated that the population of Utah is neither a genetically isolated population nor genetically distinct from other populations of European ancestry,49 and we recently reaffirmed this using large-scale genotyping arrays.50 Indeed, Utah subjects are the source of European samples in both the HapMap Project and the 1000 Genomes Project. Thus, to the extent that genetics contributes to IBD pathogenesis, findings in Utah families are generalizable to other populations of European ancestry. It should be pointed out, however, that environmental exposures that modify IBD pathogenesis are likely different: the smoking rate in Utah is 9.3%, the lowest smoking rate of any state, and approximately half of the national average.51
The main limitation of our study is the method we used to ascertain IBD cases. We used healthcare encounter data and defined cases as those with a single ICD-9-coded encounter for CD or UC, which may have resulted in misclassification bias. In previous epidemiologic studies, the accuracy of ICD-9 codes to identify IBD cases depended primarily on the nature of the encounter data and the number of encounters over time. Studies using ICD-9 codes for studying the epidemiology of IBD are usually performed in regions with vertically integrated healthcare data such as Canadian Provincial data45, 52 or health maintenance organizations.46 Of note, even using validated algorithms, encounter data can lead to disparate disease rate estimates.48 We selected a single encounter for several reasons. First, an IBD-affected individual may not have multiple encounters over time if they are well, and would therefore be misclassified if multiple encounters were required for the diagnosis. Second, IBD-affected individuals may have a single encounter in the administrative dataset simply because of referral patterns in Utah. Third, for future studies we want to find as many possibly affected individuals as possible, to refine their phenotype. A second potential limitation is that the healthcare data utilized were available only to the early 1990s, which may limit the number of relatives that can be observed. As a result, relatives may be misclassified as unaffected when they are in fact affected. We suspect that misclassification rates among relatives of cases and relatives of controls are similar since they were drawn from the same sample, although it is possible that close relatives may be more likely to have an IBD-coded encounter. This may have resulted in overestimation of the familial risk.
In summary, we identified a large number of kindreds with excess risk of CD, UC, and IBD. We hypothesize that IBD is caused by kindred-specific genetic variants that can be identified using extended kindreds. Genetic mapping studies using these kindreds are justified, and may improve our understanding of the genetic architecture of these disorders.