• Crohn's disease;
  • ulcerative colitis;
  • inflammatory bowel disease;
  • genetic epidemiology;
  • genetic predisposition to disease;
  • pedigree;
  • genealogy and heredity




The observed heritability of inflammatory bowel disease (IBD) is incompletely explained by known genetic risk factors. Kindred-specific genetic variants that cause IBD may be a source of “missing heritability.” Because such variants have previously been difficult to identify, we sought to identify high-risk IBD kindreds.


We used a large population-based database—the Utah Population Database (UPDB)—which contains genealogical and healthcare data to characterize the risk of Crohn's disease (CD), ulcerative colitis (UC), and IBD in kindreds. We identified CD and UC cases using ICD-9 codes. We calculated the adjusted relative risk to relatives of affected individuals. We calculated the familial standardized incidence ratio (FSIR) to quantify the kindred-specific disease risk.


In all, 3601 CD cases and 3976 UC cases met inclusion criteria. A total of 655 CD kindreds and 615 UC kindreds had a statistical excess of disease. Risk of disease varied among kindreds, with some kindreds demonstrating ≈20-fold elevated risk. For CD, UC, and IBD, relative risks were significantly elevated for first- and second-degree relatives and first cousins. The adjusted population attributable risks for familial CD, UC, and IBD were 0.20 (95% confidence interval [CI]: 0.17–0.23); 0.17 (0.14–0.21); and 0.19 (0.17–0.22), respectively.


We identified multiple kindreds with a statistical excess of CD, UC, and IBD, and validated the UPDB as a resource for family studies in IBD. Given the need for novel genetic mapping strategies to explain the apparent missing heritability in IBD, further studies of these high-risk kindreds are justified. (Inflamm Bowel Dis 2011)

The known genetic risk factors do not account for most of the heritability observed in many complex genetic traits.1, 2 Both Crohn's disease (CD) and ulcerative colitis (UC) are complex genetic traits: they occur in families, usually in non-Mendelian inheritance patterns, and have both environmental and genetic components that contribute to their pathogenesis. Moreover, multiple genome-wide association studies (GWAS) have identified common and low-penetrance genetic variants that are associated with CD, UC, or both disorders.3–14 Like many complex genetic disorders, inflammatory bowel disease (IBD) also demonstrates missing heritability: identified genetic risk factors explain only a fraction of the estimated heritability. For example, it is estimated that the 32 loci associated with CD identified via GWAS explain ≈20% of the overall genetic risk.3 The source of this “missing heritability”1 remains unknown, although rare kindred-specific genetic variants may explain this for IBD.2, 15, 16

Because of missing heritability, there is renewed interest in studying families with IBD. Standard linkage analysis has been used to map highly penetrant genetic mutations in families in whom an IBD-like phenotype demonstrated Mendelian inheritance patterns.17 However, such families are rare. It is unclear how best to identify rare genetic variants that contribute to complex disease and may therefore explain missing heritability. In this study we used the Utah Population Database (UPDB), a large genealogical database containing healthcare records, to identify families at high risk for developing IBD. We calculated the familial standardized incidence ratio (FSIR), a statistic developed for family cancer studies in Utah kindreds.18 We identified multiple kindreds with a statistical excess of individuals with UC, CD, or both disorders. Given the need for novel genetic mapping strategies to explain the apparent missing heritability in IBD, further studies of these high-risk kindreds are justified.



Data Sources

The primary source of genealogical data was the UPDB. The foundation of the genealogical portion of UPDB is Utah family histories containing pedigree information. UPDB has been described extensively elsewhere.19 Briefly, in the 1970s ≈185,000 Utah families were identified from archives at the Utah Family History Library, a data source maintained by the Church of Jesus Christ of Latter-day Saints (LDS or Mormons). Each family was identified from Family Group Sheets, which contain demographic and pedigree data on three generations. Families were selected if at least one member of the family had a vital event (birth, marriage, death) either on the Mormon pioneer trail or in Utah. These core families comprise 1.6 million individuals born from the early 1800s to the mid-1900s. UPDB has been extended by record-linking all Utah birth certificates from 1945 to the present; other vital statistics including marriage and death certificates; census data; cancer registry data from both Utah and Idaho; and drivers' license data. As a result of linking data from original Utah founders and vital statistics on past and current Utah residents, UPDB contains pedigree data on kindreds as small as two and three generations to very large kindreds incorporating as many as 11 generations.

UPDB also contains a master subject index from two separate healthcare entities, the University of Utah's University Healthcare and Intermountain Healthcare, which allows for cross linkage with health care administrative records in electronic data warehouses (EDW). These two EDWs include diagnostic and procedural codes for patient encounters with their respective healthcare facilities. Intermountain Healthcare is the largest healthcare system in Utah and operates multiple hospitals, outpatient clinics, ambulatory surgery centers, laboratories, and health insurance plans. Intermountain Healthcare also operates Primary Children's Medical Center (PCMC), the sole tertiary pediatric center for an estimated pediatric population of >1 million children20 and the primary children's hospital in an urban area with >270,000 pediatric patients.21 Based on referral patterns in the State, it is estimated that these healthcare datasets encompass ≈75% of the population of Utah, and also represent a population-based sample.22

Study Design and Subjects

We used a retrospective case–control study design. Cases were included if they had encounters from 1990 to 2008 at University Healthcare or from 1984 to 2008 at Intermountain Healthcare. We defined a case of CD (ICD-9 codes 555.0–555.9) or UC (ICD-9 codes 556.0–556.9) as an individual with at least one encounter carrying the corresponding ICD-9 code. We defined a case of IBD as an individual with at least one UC encounter or one CD encounter. In all, 375 individuals had at least one CD encounter and one UC encounter, and were included in both CD and UC analyses. They were counted only once in analyses for IBD. In addition to satisfying disease definitions, a case had to have at least one first-degree relative in UPDB who was either a parent or a child, and that relative also had to be in the EDW of either University Healthcare or Intermountain Healthcare. Subjects who failed to meet these case criteria were excluded and constitute the “excluded cases” in the analyses described below. In total, 1472 (27%) subjects with UC and 1334 (27%) subjects with CD were excluded. The most common reason for exclusion was incomplete pedigree data (Table 1).
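The case definitions above can be illustrated with a minimal Python sketch. It assumes encounter codes arrive as plain strings such as "555.1"; the function names `classify_icd9` and `case_status` are hypothetical and are not part of UPDB or of either data warehouse.

```python
from typing import Dict, Iterable, Optional


def classify_icd9(code: str) -> Optional[str]:
    """Map one ICD-9 code to 'CD' (555.x), 'UC' (556.x), or None.

    Assumption: codes are strings whose part before the dot is the
    three-digit category.
    """
    category = code.strip().split(".")[0]
    if category == "555":
        return "CD"
    if category == "556":
        return "UC"
    return None


def case_status(codes: Iterable[str]) -> Dict[str, bool]:
    """Apply the single-encounter rule: one qualifying code is enough."""
    labels = {classify_icd9(c) for c in codes}
    has_cd = "CD" in labels
    has_uc = "UC" in labels
    # IBD is defined as at least one CD or one UC encounter.
    return {"CD": has_cd, "UC": has_uc, "IBD": has_cd or has_uc}


if __name__ == "__main__":
    # A subject with both kinds of encounter counts in the CD and UC
    # analyses but only once for IBD.
    print(case_status(["555.0", "556.9", "250.00"]))
```

The sketch covers only the disease definition; the additional requirement of a parent or child in UPDB and in one of the data warehouses would be a separate filter.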

Table 1. Characteristics of the Study Sample

Ulcerative Colitis | Included (n = 3,976) | Excluded (n = 1,472) | Total (n = 5,448) | P-value
Mean age at diagnosis (SD) | 42.6 (19.4) | 46.4 (19.3) | 43.6 (19.4) | <0.001
Median age | 41 | 46 | 43 |
Female | 1,943 (48.9) | 781 (53.1) | 2724 (50) | 0.006
Born in Utah* | 2,923 (73.5) | 86 (5.8) | 3,009 (55.2) | <0.001
Data source | | | | <0.001
  Intermountain Healthcare | 2475 (62.5%) | 806 (55.5%) | 3281 (60.6%) |
  University of Utah | 976 (24.6%) | 493 (34.0%) | 1469 (27.1%) |
  Both | 512 (12.9%) | 153 (10.5%) | 665 (12.3%) |
Genealogical data | | | | <0.001
  None | 0 | 190 (12.9) | 190 (3.5) |
  Parent-child relationship only | 43 (1.1) | 1126 (76.5) | 1169 (21.5) |
  2 generations, ≥4 members | 177 (4.5) | 11 (0.75) | 188 (3.5) |
  Multigenerational families | 3756 (94.5) | 145 (9.85) | 3901 (71.6) |
Birth year | | | | <0.001
  <1920 | 115 (2.9) | 51 (3.5) | 166 (3.1) |
  1921–1950 | 1178 (29.6) | 554 (37.6) | 1732 (31.8) |
  1951–1980 | 2120 (53.3) | 694 (47.2) | 2814 (51.7) |
  >1980 | 563 (14.2) | 173 (11.8) | 736 (13.5) |

Crohn's Disease | Included (n = 3,601) | Excluded (n = 1,334) | Total (n = 4,935) | P-value
Mean age (SD) | 39.6 (18.8) | 43.8 (18.9) | 40.8 (18.9) | <0.001
Median age | 37 | 44 | 39 |
Female | 1,947 (54.1) | 791 (59.3) | 2,738 (55.5) | 0.001
Born in Utah* | 2,715 (75.4) | 75 (5.6) | 2,790 (56.5) | <0.001
Genealogical data | | | | <0.001
  None | 0 | 140 (10.6) | 140 (2.9) |
  Parent-child relationship only | 35 (0.97) | 1051 (79.6) | 1086 (22.1) |
  2 generations, ≥4 members | 170 (4.7) | 6 (0.45) | 176 (3.6) |
  Multigenerational families | 3388 (94.3) | 123 (9.3) | 3511 (71.5) |
Encounter location | | | | <0.001
  Intermountain | 2245 (62.5%) | 769 (58.3%) | 3014 (61.4%) |
  University of Utah | 765 (21.3%) | 373 (28.6%) | 1138 (23.2%) |
  Both | 583 (16.2%) | 178 (13.5%) | 761 (15.5%) |
Birth year | | | | <0.001
  <1920 | 67 (1.9) | 32 (2.4) | 99 (2.0) |
  1921–1950 | 876 (24.3) | 437 (32.8) | 1,313 (26.6) |
  1951–1980 | 1,970 (54.7) | 649 (48.7) | 2,619 (53.1) |
  >1980 | 688 (19.1) | 216 (16.2) | 904 (18.3) |

* Significant in the multivariate analysis.

We defined controls as individuals selected from the UPDB who had at least one first-degree relative in UPDB who was a parent or a child, were alive at the time of the IBD diagnosis of their matching case, and had no healthcare encounter with the ICD codes for IBD. We selected about five such individuals for each case for UC and CD, and one control individual for each case for IBD. We matched cases and controls on birth year, birth state (Utah versus non-Utah), and sex. Relatives (first-degree relatives, second-degree relatives, and first cousins) were identified and classified as affected if they had coded encounters for CD, UC, or either disorder. Institutional Review Board approval was obtained from the University of Utah, from Intermountain Healthcare, and from Primary Children's Medical Center. Additionally, the Resource for Genetic Epidemiology, a regulatory body that approves all studies utilizing UPDB, approved this study, and also approved publication of the pedigree shown in Figure 2.
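As an illustration of the matching criteria, the following Python sketch selects up to five controls per case. The `Person` record, its field names, and the random sampling step are assumptions made for this example; the text does not describe how controls were drawn from the eligible pool.

```python
import random
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Person:
    # Hypothetical record layout, not the UPDB schema.
    person_id: int
    birth_year: int
    born_in_utah: bool
    sex: str
    has_ibd_code: bool
    has_parent_or_child_in_db: bool
    death_year: Optional[int] = None


def eligible(control: Person, case: Person, case_dx_year: int) -> bool:
    """Check the control criteria and the three matching variables."""
    # Assumption: someone who died in the diagnosis year counts as alive.
    alive = control.death_year is None or control.death_year >= case_dx_year
    return (
        not control.has_ibd_code
        and control.has_parent_or_child_in_db
        and alive
        and control.birth_year == case.birth_year
        and control.born_in_utah == case.born_in_utah
        and control.sex == case.sex
    )


def select_controls(case: Person, case_dx_year: int, pool: List[Person],
                    k: int = 5, seed: int = 0) -> List[Person]:
    """Randomly draw up to k eligible controls for one case."""
    rng = random.Random(seed)
    candidates = [p for p in pool if eligible(p, case, case_dx_year)]
    return rng.sample(candidates, min(k, len(candidates)))
```

Passing `k=1` would correspond to the one-control-per-case design described for the IBD analysis.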

Statistical Analysis

Four analyses were performed: 1) the differences between included and excluded cases; 2) the risk to relatives of an IBD-affected individual; 3) the population attributable risk for familial IBD; and 4) the familial standardized incidence ratio for identified kindreds. Included cases were compared to excluded cases across several demographic variables including gender, age (in years) defined as the age at first encounter, the proportion of subjects born in Utah (yes/no), the quality of pedigree information available, the source of the encounter data (Intermountain Healthcare, University of Utah, or both), and birth year. We used Student's t-test to compare means for continuous variables and the chi-square test of independence for categorical variables. Because the vast majority of subjects included in the analysis were born in Utah, we tested whether this was the primary determinant of demographic differences using logistic regression. Here the outcome variable was included case (yes/no). We included age (in years, continuous), gender, born in Utah (yes/no), and data source (University of Utah, Intermountain Healthcare, or both systems, coded as multiple dichotomous variables) as predictor variables in the logistic regression model. We used a backwards method for variable selection, retaining variables with a P-value < 0.1.

To determine the risk of disease to relatives, we determined the proportion of disease-affected relatives of cases and relatives of controls. We counted all affected relatives of each selected case and control, even if that relative had been counted previously. Bai et al23 demonstrated that failure to do so results in a biased estimate of the familial risk. We calculated odds ratios using conditional logistic regression adjusting for the number of biological relatives, their degree of relatedness to the proband, and their person-years at risk for detection. Observations within individual families are correlated with one another (intra-cluster correlation). We accounted for the clustered correlation structure of these data using the sandwich method of robust standard error estimation.24, 25

We calculated population attributable risks (PAR) for familial CD, UC, and IBD using the technique described by Bruzzi et al.26 Here, for each case the probability that disease is caused by kindred membership (probability of causation, or PAC) is calculated, where PAC = (RR-1)/RR. RR, the relative risk, is estimated for each individual from FSIR, and PAR is the mean PAC across all cases.
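The PAC and PAR definitions can be written out as a short Python sketch. It assumes that a per-case relative risk has already been estimated (in the text, from FSIR); the function names are hypothetical, and the handling of relative risks below 1 is not specified in the text, so the sketch leaves such values unmodified.

```python
from typing import Sequence


def probability_of_causation(rr: float) -> float:
    """PAC = (RR - 1) / RR for a single case."""
    if rr <= 0:
        raise ValueError("relative risk must be positive")
    return (rr - 1.0) / rr


def population_attributable_risk(rrs: Sequence[float]) -> float:
    """PAR as the mean PAC across all cases."""
    if not rrs:
        raise ValueError("at least one case is required")
    return sum(probability_of_causation(rr) for rr in rrs) / len(rrs)


if __name__ == "__main__":
    # Illustrative relative risks only, not study data.
    print(population_attributable_risk([1.0, 2.0, 4.0]))
```

A case with RR = 1 contributes a PAC of 0, so cases in kindreds with no excess risk pull the mean toward zero.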

We subsequently identified common ancestors (i.e., founders) for cases using a software suite developed specifically for the Utah Population Database.18, 27–29 In studies of large pedigrees, it is not unusual to encounter multiple cases of specific diseases simply due to chance. The FSIR provides a statistical test of the probability of chance occurrence of multiple affected individuals within a pedigree.18 FSIR is the ratio of the observed number of disease cases in a kindred to the expected number of cases in a series of matched control kindreds, with statistical significance based on a Poisson probability distribution. We define “high-risk” IBD pedigrees as those kindreds with a P-value < 0.05 and at least five living affected family members. For all analyses, reported P-values were two-sided, results were considered statistically significant if P < 0.05, and confidence intervals were 95%.
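A minimal Python sketch of the FSIR calculation follows. It assumes the expected count for a kindred is already available from the matched control kindreds, and it uses a one-sided upper-tail Poisson probability as the test of excess; the exact test used by the UPDB software is not described here, so that choice and the function names are assumptions.

```python
import math


def fsir(observed: int, expected: float) -> float:
    """Familial standardized incidence ratio: observed / expected."""
    if expected <= 0:
        raise ValueError("expected count must be positive")
    return observed / expected


def poisson_upper_tail(observed: int, expected: float) -> float:
    """P(X >= observed) for X ~ Poisson(expected)."""
    below = sum(
        math.exp(-expected) * expected ** k / math.factorial(k)
        for k in range(observed)
    )
    return max(0.0, 1.0 - below)


def is_high_risk(observed: int, expected: float, living_affected: int,
                 alpha: float = 0.05, min_living: int = 5) -> bool:
    """High-risk rule from the text: P < 0.05 and >= 5 living affected."""
    p_value = poisson_upper_tail(observed, expected)
    return p_value < alpha and living_affected >= min_living


if __name__ == "__main__":
    # Counts reported in Table 3 for the top Crohn's disease kindred.
    print(fsir(5, 0.26), poisson_upper_tail(5, 0.26))
```

With the rounded expected count of 0.26, the ratio comes out slightly below the tabulated 19.40, presumably because the published value was computed from an unrounded expectation.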



A total of 4935 cases of CD and 5448 cases of UC were identified in UPDB. Among CD-affected individuals, 3601 (73%) had at least one relative in UPDB. Among UC-affected individuals, 3976 (73%) had at least one relative in UPDB. These cases constituted the study sample. In all, 7247 subjects were considered to have IBD. Compared to excluded subjects, included subjects were much more likely to have been born in Utah, were younger, and were less often female (Table 1). For both UC and CD, the observed differences between included and excluded cases were explained by differences in birth state: in the logistic regression model, only “born in Utah” was statistically significant, which reflects the inherent sampling scheme of the UPDB.

Among the 3601 subjects with CD, 178 (4.9%) had first-degree relatives who were also affected with CD (Table 2). Among 3976 subjects with UC, 164 (4.1%) had UC-affected first-degree relatives. Among 7247 IBD subjects, 455 (6.3%) had IBD-affected first-degree relatives. The adjusted odds ratios for first-degree relatives of individuals with CD, UC, or IBD were 5.72 (95% confidence interval [CI]: 4.61–7.1), 3.92 (3.2–4.81), and 3.37 (2.78–4.08), respectively. While effect sizes were smaller, there were statistically significant excess risks for all three conditions in second-degree relatives and in first cousins. The adjusted population attributable risks for familial CD, UC, and IBD were 0.20 (95% CI: 0.17–0.23), 0.17 (0.14–0.21), and 0.19 (0.17–0.22), respectively.

Table 2. Risk of Crohn's Disease, Ulcerative Colitis, and IBD to Degree Relatives

Disease | Relatives | Relatives of Cases: Affected/Unaffected | Relatives of Cases: Percent Affected | Relatives of Controls: Affected/Unaffected | Relatives of Controls: Percent Affected | OR (95% CI) | P-value
Crohn's disease | First-degree | 178/17,506 | 1.02% | 155/87,526 | 0.18% | 5.72 (4.61–7.1) | <10⁻⁹
Crohn's disease | Second-degree | 84/30,763 | 0.27% | 242/145,526 | 0.17% | 1.64 (1.28–2.1) | <10⁻⁴
Crohn's disease | First cousins | 102/32,052 | 0.32% | 309/147,621 | 0.21% | 1.53 (1.22–1.91) | <10⁻³
Ulcerative colitis | First-degree | 164/19,215 | 0.85% | 207/95,588 | 0.22% | 3.92 (3.2–4.81) | <10⁻⁹
Ulcerative colitis | Second-degree | 98/34,909 | 0.28% | 273/141,487 | 0.19% | 1.45 (1.15–1.83) | 0.00162
Ulcerative colitis | First cousins | 126/36,410 | 0.35% | 391/144,802 | 0.27% | 1.29 (1.05–1.57) | 0.0134
IBD | First-degree | 455/35,059 | 1.30% | 137/35,624 | 0.38% | 3.37 (2.78–4.08) | <10⁻⁹
IBD | Second-degree | 324/63,365 | 0.51% | 197/63,072 | 0.31% | 1.64 (1.37–1.95) | <10⁻⁷
IBD | First cousins | 362/65,158 | 0.56% | 280/64,682 | 0.43% | 1.29 (1.1–1.51) | 0.00145
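For orientation, a crude (unadjusted) odds ratio can be computed directly from the counts in Table 2. The Python sketch below uses a standard Woolf-type confidence interval and treats the second number in each pair as the count of unaffected relatives, as the column heading states. It ignores matching, the adjustment covariates, and the within-family correlation handled by the conditional logistic regression, so it is not expected to reproduce the published adjusted estimates exactly. The function name is hypothetical.

```python
import math
from typing import Tuple


def crude_odds_ratio(a: int, b: int, c: int, d: int,
                     z: float = 1.96) -> Tuple[float, float, float]:
    """Crude OR with a Woolf (log-scale) confidence interval.

    a, b: affected and unaffected relatives of cases
    c, d: affected and unaffected relatives of controls
    """
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


if __name__ == "__main__":
    # First-degree relatives, Crohn's disease (Table 2).
    print(crude_odds_ratio(178, 17506, 155, 87526))
```

For first-degree relatives of CD cases the crude ratio is close to the adjusted odds ratio reported in the table, which suggests the adjustment had little effect on that particular estimate.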

Using the familial standardized incidence ratios, we identified multiple kindreds with excess risk of developing CD, UC, or IBD. We restricted the analysis to kindreds that had a greater than expected excess of disease (P < 0.05; see Materials and Methods for details) and that contained five or more living affected individuals. We identified 655 high-risk CD kindreds with a median FSIR of 2.8 and a range of 1.4–19.4. In all, 615 UC kindreds were identified, with a median FSIR value of 2.7 and a range of 1.4–13.8. A total of 1177 high-risk IBD kindreds were identified, with a median FSIR value of 2.5 and a range of 1.3–10.4. The FSIR distributions for these kindreds are shown in Figure 1. Table 3 shows the characteristics of kindreds with the five highest FSIRs for CD, UC, and IBD. These kindreds have FSIRs ranging from 8.7 to 19.4, between 332 and 828 living relatives, and founder birth years between 1804 and 1870. Figure 2 is an example of a pedigree with an FSIR of 3.8 (for IBD) and multiple affected individuals descended from a common ancestor.


Figure 1. Distribution of FSIR values for (A) CD, (B) UC, and (C) IBD. For each panel, bars represent number of kindreds for each value of FSIR. A box plot shows median and interquartile range. Points represent statistical outliers with FSIR values >1.5 times the interquartile range. FSIR is the familial standardized incidence ratio (see Materials and Methods for details).


Table 3. Kindreds with the Highest Five FSIR Values by Disorder

Disease | Founder Birth Year | No. Descendants | FSIR | Obs | Exp
Crohn's disease | 1843 | 387 | 19.40 | 5 | 0.26
Ulcerative colitis | 1845 | 701 | 13.84 | 8 | 0.58
Inflammatory bowel disease | 1870 | 332 | 10.35 | 5 | 0.48

Figure 2. Kindred 6210. A trimmed pedigree (showing only affected individuals) is depicted demonstrating UC-affected individuals (right-shaded shapes) and CD-affected individuals (left-shaded shapes). Circles represent females and squares represent males. The founders were born in 1815 and 1818. The IBD familial standardized incidence ratio for this kindred is 3.8.




Our findings can be summarized as follows. First, we identified multiple high-risk Utah kindreds with CD, UC, and IBD. Each kindred has 1) a statistical excess of diseased individuals when compared to the general population, and 2) genealogical evidence that members are descended from a common ancestor (i.e., founder). Second, we characterized the familial clustering of IBD using healthcare data and a population-based sample, demonstrating risks for CD, UC, and IBD to relatives of affected individuals that closely resemble previously reported estimates. Given the need for novel genetic mapping strategies to explain the apparent missing heritability in IBD, further studies of these high-risk kindreds are justified.

Both common and rare genetic variants that cause IBD have been most successfully identified using distinct but complementary study designs. GWAS have demonstrated that multiple genetic variants contribute to CD, UC, or both conditions. IBD-associated genetic variants identified in GWAS are common, with most risk variants having allele frequencies greater than 5% in individuals of European ancestry. These IBD susceptibility loci contribute modestly to disease susceptibility, with odds ratios between 1.1 and 1.5 (CARD15 variants are a notable exception). Rare genetic variants causing IBD-like phenotypes also have been described. Relevant examples include mutations in genes encoding interleukin (IL)-10 receptors, which cause infantile-onset IBD, and the IBD/primary sclerosing cholangitis phenotype associated with CD40 ligand deficiency. These disorders segregate in Mendelian patterns, and were mapped by identifying pedigrees with multiple affected children using linkage analysis.17, 30

An emerging theme from the genetics of complex traits such as IBD is that the observed heritability is not entirely explained by known genetic variants. One potential explanation for missing heritability is that kindred-specific genetic variants, which are therefore rare in the general population, increase the risk of disease. Existing study designs are unable to identify such variants. GWAS are inefficient, as these kindred-specific variants are likely not genotyped on existing microarrays, and it is unlikely that genotyping microarrays ascertain a variant that is coinherited (i.e., in linkage disequilibrium) with a kindred-specific genetic variant to a degree sufficient to be detected in a GWAS. Classic linkage studies require families with multiple affected members; such families are rare, and mutations identified using this approach are unlikely to explain the observed heritability. Moreover, linkage studies for complex genetic traits like IBD are inefficient because of 1) lack of statistical power given the low penetrance of risk alleles; 2) phenotypic and locus heterogeneity among families/affected relative pairs under study; and 3) the multiple genetic variants (as opposed to single-gene mutations) that underlie complex genetic traits.31 Taken together, these limitations indicate that novel approaches are needed to understand the source of missing heritability for complex genetic traits in general, and IBD in particular.

In this study we used genealogical and healthcare data to identify multiple high-risk IBD kindreds. More specifically, within each kindred there were multiple IBD-affected individuals for whom a common ancestor could be identified. These kindreds potentially allow for identification of specific genetic variants that are unique to a kindred and not identifiable using GWAS or linkage analysis. We speculate that these kindreds are particularly amenable to genetic mapping strategies using shared genomic segment analysis.32, 33 Genomic segments represent long stretches of multiple genetic variants inherited together on a single chromosome (a haplotype). The number of shared segments and their size depend primarily on the number of meioses separating related individuals. Closely related individuals share numerous long segments, while distantly related individuals share shorter and fewer genomic segments. If multiple affected and distantly related individuals share long genomic segments, those segments are likely related to disease. Shared genomic segment analysis has been used successfully to map benign recurrent intrahepatic cholestasis using four individuals,34 and action myoclonus-renal failure syndrome using three individuals.35 Given the number of affected individuals in our high-risk kindreds, previous successes with Mendelian disorders, and the ease and relatively low cost of genotyping microarrays, shared genomic segment analysis is an attractive genetic mapping strategy to use for the kindreds identified in this study.

We see several strategies to refine genetic mapping studies in these families. First, kindreds can be prioritized for study using, for example, phenotypic extremes such as high FSIR values (such as the outliers shown in Fig. 1); kindreds with an excess of childhood-onset IBD cases; kindreds with more severe disease; or kindreds containing individuals with multiple immune-mediated disorders in addition to individuals with IBD. Mechanisms are in place to contact individuals within these kindreds and subsequently enroll them. Second, phenotypes should be better refined, as undoubtedly some of the individuals were misclassified as having disease when they do not, or having UC when they in fact have CD (the converse is also true). Third, multiplex IBD families can be ascertained in a clinical setting. Fourth, in an effort to identify kindred-specific genetic variants, shared genomic segment analysis using genotyping microarrays can be coupled with next-generation sequencing of observed shared segments, resequencing of positional or candidate loci based on data from GWAS, and/or whole exome sequencing.36–38 Such approaches are possible only because of the demographic data, healthcare data, and the extensive genealogies contained in the UPDB.

We found, among first-degree relatives, a relative risk of 5.7, 3.9, and 3.4 for CD, UC, and IBD, respectively; that 4%–6% of IBD-affected individuals had an affected first-degree relative; and statistically significant excess risk to relatives as distant as first cousins. Prior studies39, 40 have demonstrated that if an individual has UC, the risk of UC to first-degree relatives varies from 3.4 to 15. Similarly, prior studies41, 42 have shown that if an individual has CD, the risk of CD to relatives varies from 5 to 35. Our risk estimates to first-degree relatives are consistent with some reports, and lower than others. This observation highlights a recurring theme in family studies of IBD: there is significant variation in the reported risk to first-degree relatives of affected individuals. There are several potential explanations for the disparity of reported risk to relatives of IBD-affected individuals.

First, variation in prevalence among cases and control groups may explain variation in risk to relatives. Most family studies report risk ratios. Higher risk ratios are therefore possible if 1) individuals have many affected relatives, or 2) the prevalence in the control population is low. For example, Probert et al41 reported a sibling recurrence risk for CD of 35, an estimate that is widely reported in genetic studies of CD. This estimate was calculated using a CD prevalence of 1,931/100,000 among siblings, and a CD prevalence of 75.8/100,000 in the general population. However, studies from a similar geographic region reported a CD population prevalence of 147 to 214 per 100,000 individuals,43, 44 suggesting that the Probert study may have overestimated the CD sibling recurrence risk. A second potential explanation for the wide disparity in reported risk to family members is provided by our data here: IBD risk to relatives is variable and dependent on kindred. Thus, if individuals in a study are members of “high-risk” kindreds, then prevalence estimates will be greater, and result in a greater risk ratio.

In interpreting case–control studies assessing the risk of disease to family members, it is critically important for disease prevalence in controls to approximate the general population. In our study, the crude prevalence of UC and CD among relatives of controls closely mirrors previous North American estimates,45–47 even in light of the pitfalls of calculating prevalence estimates using healthcare encounter data.48 Thus, our study's large sample size, the likelihood that it is population-based, the careful matching of cases and controls, and the close approximation of IBD prevalence in controls to other North American studies all suggest that we accurately quantified disease risk to relatives and validated UPDB as a resource for future studies.

Importantly, our findings are likely generalizable to other populations of Western and Northern European ancestry. Early genetic studies demonstrated that the population of Utah is neither a genetically isolated population nor genetically distinct from other populations of European ancestry,49 and we recently reaffirmed this using large-scale genotyping arrays.50 Indeed, Utah subjects are the source of European samples in both the HapMap Project and the 1000 Genomes Project. Thus, to the extent that genetics contributes to IBD pathogenesis, findings in Utah families are generalizable to other populations of European ancestry. It should be pointed out, however, that environmental exposures that modify IBD pathogenesis are likely different: the smoking rate in Utah is 9.3%, the lowest smoking rate of any state, and approximately half of the national average.51

The main limitation of our study is the method we used to ascertain IBD cases. We used healthcare encounter data and defined cases as those with a single ICD-9 coded encounter for CD or UC, which may have resulted in misclassification bias. In previous epidemiologic studies, the accuracy of ICD-9 codes for identifying IBD cases depended primarily on the nature of the encounter data and the number of encounters over time. Studies using ICD-9 codes for studying the epidemiology of IBD are usually performed in regions with vertically integrated healthcare data such as Canadian Provincial data45, 52 or health maintenance organizations.46 Of note, even using validated algorithms, encounter data can lead to disparate disease rate estimates.48 We selected a single encounter for several reasons. First, an IBD-affected individual may not have multiple encounters over time if they are well, and would therefore be misclassified if multiple encounters were required for the diagnosis. Second, IBD individuals may have a single encounter in the administrative dataset simply because of referral patterns in Utah. Third, for future studies we want to find as many potentially affected individuals as possible so that their phenotypes can be refined. A second potential limitation is that the healthcare data were available only as far back as the early 1990s, which may limit the number of relatives that can be observed. Thus, relatives may be misclassified as unaffected when they are in fact affected. We suspect that misclassification rates among relatives of cases and relatives of controls are similar since they were drawn from the same sample, although it is possible that close relatives of cases are more likely to have an IBD-coded encounter. This may have resulted in overestimation of the familial risk.

In summary, we identified a large number of kindreds with excess risk of CD, UC, and IBD. We hypothesize that IBD is caused by kindred-specific genetic variants that can be identified using extended kindreds. Genetic mapping studies using these kindreds are justified, and may improve our understanding of the genetic architecture of these disorders.



The authors thank Alison Fraser, Cheri Hunter, and Grant Wood for bioinformatics expertise.

