It is well known that genetic variability affects the risk of many cancers, but details of the patterning of inherited cancer risk across different sites and age groups still are not well quantified.
The authors conducted a nested case–control study of the familial risk of 40 cancers based on a cohort of 662,515 individuals from the Utah Population Database. From 1 to 10 controls selected from the cohort were matched individually on gender, birth year, and birthplace to each cancer case; and familial standardized incidence ratios (FSIR) were calculated for both cases and controls. Conditional logistic regression was used to estimate relative risks and population-attributable risks (PARs) of cancer in relation to FSIR. Relative risks of cancer in first-degree through fifth-degree relatives of cases, compared with controls, were calculated using the proportional hazards methods. All analyses were adjusted for spouse affection status and Latter Day Saints church affiliation.
Thirty-five of 40 cancers exhibited positive associations between risk and FSIR, and 21 of those associations were statistically significant. PAR estimates were strikingly high for prostate carcinoma (57%), breast carcinoma (39%), colon carcinoma (32%), lip carcinoma (31%), chronic lymphocytic leukemia (35%), and melanoma (32%). Both the proportion and the number of all cancers attributable to family history peaked at 32% in the group ages 65–84 years and remained high in the group age ≥ 85 years.
A substantial portion of cancer risk was attributable to familial factors. The patterns of familial cancer recurrence among distant relatives suggested that simple genetic mechanisms may explain much of the familiality of cancer. Cancer 2005. © 2005 American Cancer Society.
Studies of families with disease histories traditionally have been key research elements in establishing the genetic basis of disease. In recent years, however, well characterized phenotypes that segregate among families in simple Mendelian fashion have become the narrow example of genetic disease. The larger and more compelling problem is one of discovering the particular genetic dysfunctions underlying complex traits that involve multiple genes and functional pathways, heterogeneous expression, and probably larger roles for environmental, behavioral, and developmental factors.
Expanded sequence information for the human genome and high-throughput technologies have fueled enthusiasm for studies of cancer phenotypes in “high-risk families.” At the same time, however, wide variation has developed in how such families are ascertained, and growing confusion surrounds the entire nomenclature of “hereditary disease” as it applies to cancer and assessment of disease risk.1, 2 Whereas most studies of common cancers document increased risks for individuals with a family history, risk estimates vary considerably depending on the particular family configuration studied. Moreover, although the attributable proportion of particular cancers due to familial factors has been reported as high as 53%,3, 4 the familial susceptibility syndromes with known genetic causes account for only ≈ 1% of all cancers.5, 6 The much larger category, “sporadic” cancer, occurs with no obvious family history or associated genotype.
Previous studies of familial cancer based on data from the Utah Population Database (UPDB) have been either tests of familial aggregation across many cancer sites,7–9 detailed analyses of familial risks for single cancers,10–12 or studies of cancer risks in first-degree relatives of patients.13 For the current study, we undertook to establish characteristic features of disease risk among cancers and families in the Utah population. We began with a basic comparison among 40 types of cancer documented in the Utah Cancer Registry (UCR) since 1966 with regard to the level of familial aggregation demonstrated for each cancer type. Rather than simply testing for nonrandom familial clusters of disease,7, 9 we used the familial standardized incidence ratio (FSIR) method14 to calculate disease risk for individuals from disease status among all members of an individual's family. Of particular interest was the population-attributable risk (PAR), an estimate of the proportion of each cancer caused by familial factors. We also estimated the PAR separately for each of several age groups to highlight patterns of variation in familial risk as we moved through the age structure of the population. In addition, for each type of cancer, we demonstrated the pattern of familial recurrence risk over multiple classes of relatives in UPDB families.
The UPDB was assembled originally from “Family Group Sheet” records contributed by the Utah Family History Library. The majority of these records were filled out by members of the Church of Jesus Christ of Latter-Day Saints (LDS) and were linked to form the original genealogic data base. For the past 30 years, these core UPDB records have been developed as a resource for biomedical research in conjunction with methods to collect data systematically and with minimal duplication from external sources, such as the Utah Department of Health, the UCR, the Drivers License Bureau, and the Health Care Finance Administration (HCFA), using probabilistic record-linking approaches.15, 16
The result of UPDB data developments over the last decades has been to capture the population of the state of Utah during the pioneer migration and settlement phase, then extend and add records through the current period. UPDB currently contains information pertaining to > 5.5 million individuals.17, 18 More than 1.7 million of these records describe a core genealogic set whose members and founders span Utah's pioneer settlement period, including documented points of emigration from other nations. These individuals were born between 1800 and 1970 and form genealogies of > 4 generations' depth on average.19 Probabilistic record linking, with the use of names and birth dates of children and parents, has allowed us to connect almost 50 years of Utah birth certificate data to the existing genealogic data, so that the largest families in the data base now span 8–10 generations.
The population represented in the UPDB is not inbred, closed, or isolated compared with other northern European Caucasian populations.20 Whereas these core genealogies do represent a relatively homogeneous population in terms of religious affiliation, this alone does not imply restrictions to gene flow, particularly not given the active member-recruitment history of the LDS church. Many genealogic relationships are known among UPDB members, but there is not necessarily more relatedness among them than among members of similar populations. The population of the UPDB reflects the growth that has taken place in Utah from the time of initial European immigration until the present.
Cancer incidence data for Utah residents come from the records of the UCR. The UCR has been a statewide cancer registry since 1966. Since 1972, it has been a member of the National Cancer Institute's Surveillance, Epidemiology, and End Results (SEER) Program. SEER registries are monitored continually for quality, accuracy, and completeness of data. Cancer data for the current study consist of incident cancers diagnosed between 1966 and 1996 and linked by probabilistic methods and software to records of individuals in the UPDB genealogy. Approximately 40% of all cancers diagnosed in Utah during this period were linked directly to an individual represented in the UPDB genealogic data.
In 1997, we linked 240,000 UPDB records of individuals born between 1852 and 1931 to the vital status file of the HCFA, which contains data for individuals ages 65 years and older. In total, 139,061 UPDB records matched HCFA records and provided follow-up information on 96,812 living individuals and 42,249 deceased individuals. No individual who was born before 1870 was linked positively to an HCFA record. Individuals age < 65 years and individuals age > 65 years who did not link to an HCFA record were linked to Utah driver license (DL) data. This source yielded an additional 418,742 individuals who were born prior to 1985 and who lived in Utah during the risk period. In addition, UPDB records were linked to State death certificates (DCs) for the years 1903–1999; approximately 60% of Utah DCs for these years linked to existing UPDB records.
Using the combined UPDB, HCFA, DL, and DC data, we identified a cohort of 662,515 individuals who were born between 1870 and 1984 for whom we had vital status follow-up in the form of either a UPDB death record (from the genealogic data or DCs), an HCFA vital status record that placed them in the state of Utah between 1966 (the first year the UCR collected statewide incidence information) and 2000, or a current Utah driver's license. The Institutional Review Board of the University of Utah approved this study.
The cohort we constructed was left-truncated, which means that no cancers diagnosed prior to 1966 could be observed, although many cancers certainly did occur among cohort members prior to that date. To avoid underestimating the risk to individuals at younger ages, no one was counted as at risk until 1966 regardless of age. Each individual was added to the set of those at risk at his or her age in 1966 (or at birth, if after 1966) and was followed until diagnosis or censoring. Boucher and Kerber21 showed that this approach yields unbiased estimates of survival parameters.
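The delayed-entry rule described above can be illustrated with a minimal Python sketch. The `Person` record and its field names are hypothetical and chosen only for illustration; the 1966 start year comes from the text, and the whole-year, inclusive risk-set convention is an assumption rather than the authors' implementation.

```python
from dataclasses import dataclass

REGISTRY_START = 1966  # first year of statewide UCR incidence data (from the text)


@dataclass
class Person:
    birth_year: int
    exit_year: int   # hypothetical field: year of diagnosis or censoring
    diagnosed: bool  # True if follow-up ended with a cancer diagnosis


def entry_age(p: Person) -> int:
    """Age at which the person first counts as at risk.

    Nobody is at risk before 1966; people born later enter at birth (age 0).
    """
    return max(0, REGISTRY_START - p.birth_year)


def exit_age(p: Person) -> int:
    """Age at diagnosis or censoring."""
    return p.exit_year - p.birth_year


def at_risk(p: Person, age: int) -> bool:
    """True if the person belongs to the risk set at the given age.

    Assumption: whole-year ages with inclusive bounds on both ends.
    """
    return entry_age(p) <= age <= exit_age(p)
```

Under these assumptions, a person born in 1900 and diagnosed in 1980 contributes time at risk only from age 66 to age 80, so years lived before 1966 are not counted as cancer-free follow-up.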
Although a cohort of > 660,000 individuals provides a powerful tool for cancer epidemiology, repeated analysis of much of the data represents a waste of human and computational resources. In particular, young members of the cohort contribute little to our understanding of cancer risk, because many of the cancers of interest mostly affect individuals age ≥ 60 years. This is especially true in Utah, where sustained high birth rates have led to the youngest population in the United States. A nested case–control study22, 23 is an efficient alternative method for analyzing epidemiologic cohort data.
We constructed nested case–control data sets by identifying patients with each of 40 different types of cancer. We drew controls randomly from among cohort members matched according to year of birth, gender, place of birth (Utah, Idaho, or other), and presence in the risk set at the time the case was diagnosed. In a nested case–control design, cases who are diagnosed at later ages are eligible to be selected as controls for cases who were diagnosed at younger ages.22, 23 Variable numbers of controls were matched to cases, depending on the frequency of each kind of cancer in the cohort. Table 1 shows the number of cases of each kind of cancer, and the number of controls selected. For a small number of cases, the desired number of controls could not be matched, because the stratum of candidates was exhausted.
|Cancer type||Description||No. of cases||No. of controls||Match ratio|
|AGL||Acute granulocytic leukemia||475||4708||10|
|ALL||Acute lymphocytic leukemia||179||1780||10|
|CGL||Chronic granulocytic leukemia||238||2366||10|
|CLL||Chronic lymphocytic leukemia||575||5694||10|
|CNS||Central nervous system malignancies (including brain)||920||9158||10|
|Endometrium||Uterine corpus (endometrial) carcinoma||2411||12,024||5|
|Kidney||Kidney or renal pelvis carcinoma||999||9900||10|
|Lung||Lung, bronchial, or tracheal malignancies||3180||3180||1|
|Melanoma||Melanoma of the skin||2667||2667||1|
|Monocytic leukemia||Monocytic leukemia||96||952||10|
|Mouth||Oral or pharyngeal malignancies||819||8124||10|
|Other endocrine||Pituitary and other endocrine carcinomas (not thyroid)||65||641||10|
|Other leukemias||Other leukemias||254||2515||10|
|Rectum||Rectal and anal carcinoma||1903||9469||5|
|Small intestine||Small intestine carcinoma||193||1916||10|
|Soft tissue||Soft tissue carcinoma||363||3609||10|
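The control-selection procedure described before Table 1 can be sketched roughly as follows. This is a simplified illustration, not the authors' software: the `Member` record, its field names, and the strict inequality used to decide who is still under observation are all assumptions.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Member:
    id: int
    birth_year: int
    gender: str
    birthplace: str  # "Utah", "Idaho", or "other" (categories from the text)
    entry_age: int   # age when follow-up began (hypothetical field)
    exit_age: int    # age at diagnosis or censoring (hypothetical field)
    is_case: bool    # True if follow-up ended with the cancer of interest


def sample_controls(case: Member, cohort: List[Member], ratio: int,
                    rng: random.Random) -> List[Member]:
    """Draw up to `ratio` controls from the case's matched risk set."""
    candidates = [
        m for m in cohort
        if m.id != case.id
        and m.birth_year == case.birth_year
        and m.gender == case.gender
        and m.birthplace == case.birthplace
        # Under observation and undiagnosed at the case's age at diagnosis.
        # Members who become cases at a later age remain eligible.
        and m.entry_age <= case.exit_age < m.exit_age
    ]
    # If the stratum is exhausted, return fewer controls than requested.
    k = min(ratio, len(candidates))
    return rng.sample(candidates, k)
```

Returning fewer than `ratio` controls when the candidate list runs short mirrors the situation noted in the text, in which the desired number of controls could not be matched for a small number of cases.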
All relatives of patients (cases) and of matched controls with follow-up information were included in the familial recurrence risk calculations. In sibships that contained multiple cases of the same cancer, for example, each case was employed as a separate “proband,” and the risk among all siblings of each case was tabulated separately. Bai et al.24 showed that this approach leads to unbiased estimates of familial recurrence risks. Familial recurrence risk ratios (RRs) within each category of relative were estimated by proportional hazards methods after adjusting for gender and year of birth.
Because sibships and individuals were represented multiple times in the familial recurrence risk data, the observations were not independent, and the usual variance estimates for proportional hazards models do not apply. Thus, we applied the Huber–White sandwich estimator of variance,25 which is robust to nonindependence of observations and, in this instance, adjusts for nonindependence within sibship clusters. Because sibships sometimes were represented multiple times, we ensured that all siblings were linked together in the same clusters for estimating the variance, regardless of how many times they appeared in the data. Although this adjustment does not address the lack of independence between clusters, the inflation of variance estimates introduced by the correlation among first-degree relatives (within clusters) is very small, and we expect that any inflation introduced by more distant relatives (between clusters) is even smaller. Currently, bootstrap or other Monte Carlo estimates of variance are not feasible computationally on this scale. Familial recurrence risks among spouses of cancer cases, controls, and their kin also were calculated according to the procedure described above.
FSIRs were calculated according to the method described by Kerber.14 Two further refinements of FSIR are used in the analysis. The first of these is a simple logarithmic transformation, LFSIR = ln(FSIR + 1), which improves the behavior of FSIR as a covariate in a regression model. The second transformation, which has been described in detail by Boucher and Kerber,26 is an empirical-Bayes (EB) adjustment for uncertainty of the type suggested by Greenland and Robins.27 Each FSIR estimate has an associated standard error, a function of both variation in risk among an individual's relatives and the number of family members observed. We adjust for this measurement error by iteratively moving the FSIR estimates toward the mean in proportion to the magnitude of the standard error, using an expectation-maximization (EM) algorithm.28 Thus, the least certain estimates (those with the highest standard errors) are moved close to the mean, whereas estimates with smaller variances remain closer to their original values. We refer to this adjusted FSIR as the AFSIR. Both measures were used as covariates in conditional logistic regression models of cancer incidence in the cohort, adjusted for gender and year of birth. The sandwich estimator of variance25 was used to correct for nonindependence of observations due to the inclusion of siblings among cohort members. From the resulting effect estimates, we calculated population attributable risks according to the method of Bruzzi et al.29
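The transformations named above can be illustrated with a short Python sketch. The logarithmic transform follows the formula given in the text. The shrinkage function is a one-step normal-normal approximation standing in for the iterative EM adjustment, which the text does not specify in detail, so its form and parameter names are assumptions. The attributable-risk function uses the case-based formula commonly attributed to Bruzzi et al., PAR = 1 − Σ ρ_j / RR_j, where ρ_j is the fraction of cases in exposure stratum j; how strata are formed from FSIR is not stated here and is left to the caller.

```python
import math
from typing import Sequence


def lfsir(fsir: float) -> float:
    """Logarithmic transform from the text: LFSIR = ln(FSIR + 1)."""
    return math.log(fsir + 1.0)


def shrink_toward_mean(estimate: float, std_error: float,
                       prior_mean: float, prior_var: float) -> float:
    """One-step shrinkage of an estimate toward the population mean.

    Assumption: a simple normal-normal model, not the authors' EM procedure.
    Estimates with large standard errors receive a small weight and move
    close to the mean; precise estimates stay near their original values.
    """
    weight = prior_var / (prior_var + std_error ** 2)
    return prior_mean + weight * (estimate - prior_mean)


def bruzzi_par(case_fractions: Sequence[float],
               relative_risks: Sequence[float]) -> float:
    """Population-attributable risk: 1 - sum_j rho_j / RR_j.

    case_fractions: fraction of cases in each exposure stratum (sums to 1).
    relative_risks: relative risk of each stratum versus the reference.
    """
    if len(case_fractions) != len(relative_risks):
        raise ValueError("inputs must have the same length")
    if abs(sum(case_fractions) - 1.0) > 1e-9:
        raise ValueError("case fractions must sum to 1")
    return 1.0 - sum(rho / rr for rho, rr in zip(case_fractions, relative_risks))
```

As a purely hypothetical illustration, if half of all cases fall in a stratum with a relative risk of 2 and the other half in the reference stratum, `bruzzi_par([0.5, 0.5], [1.0, 2.0])` gives an attributable fraction of 0.25.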
Family history results are presented in Table 2 for 40 cancers. Thirty-five cancers showed a positive association between LFSIR and relative risk. Twenty-one of those associations were statistically significant at the 0.05 level. AFSIR showed a similar pattern of association, typically with larger effect estimates and larger variances.
In Table 3 we report the proportion of each cancer attributable to familial factors based on LFSIR and AFSIR. For most cancers, the AFSIR-based estimate of attributable risk is substantially larger (≈ 50% on average) compared with the corresponding LFSIR-based estimate.
|Cancer type||No. of cases analyzed||No. of familial cases||PAR (%)||95%CI (%)|
Table 4 shows the PAR estimates for all cancers broken down by age at diagnosis. For the majority of cancers, the proportion attributable to familial factors decreases slowly with increasing age; although, for most late-onset cancers, the total number of cancers attributable to familial causes increases through the age category 65–84 years. Figure 1 shows the age-specific burden of familial cancers in a different manner. Stacked bars indicate the total number of cancers in each age category and the number estimated to be of familial origin broken down by type. Solid bars indicate familial cases, whereas hatched bars of the corresponding color represent nonfamilial “sporadic” cancers.
|Cancer type||PAR (%), 0–44 yrs||No. of patients, 0–44 yrs||PAR (%), 45–64 yrs||No. of patients, 45–64 yrs||PAR (%), 65–84 yrs||No. of patients, 65–84 yrs||PAR (%), ≥ 85 yrs||No. of patients, ≥ 85 yrs|
Table 5 displays familial recurrence risk estimates for each type of cancer for first-degree through fifth-degree relatives. For each category of relative, the relative risk of recurrence is given along with a 95% confidence interval (95%CI) based on the robust variance. No penis cancers were observed among first-degree through fifth-degree relatives of cases, so penis cancer was dropped from Table 5. For seven other cancer types, familial recurrence risks could not be estimated for first-degree relatives, because no relatives with cancer were observed for either cases or controls. All 32 remaining cancer types yielded a risk of recurrence > 1 of the same cancer in a first-degree relative. Of these, 23 associations were statistically significant. The largest relative risks of recurrence were observed for monocytic leukemia (RR = 12.99; 95%CI, 1.76–96.07), chronic lymphocytic leukemia (RR = 7.85; 95%CI, 4.65–13.24), and thyroid cancer (RR = 6.93; 95%CI, 4.12–11.67). Only eight cancer types (breast, colon, chronic lymphocytic leukemia, Hodgkin lymphoma, lip, melanoma, non-Hodgkin lymphoma, and prostate) had significantly elevated relative risks of recurrence among second-degree relatives. Eleven cancer types (breast, colon, chronic lymphocytic leukemia, gallbladder, lip, liver, melanoma, ovary, prostate, stomach, and testis) exhibited elevated familial recurrence risks among third-degree relatives. Familial recurrence risks for breast, colon, liver, lung, endocrine (excluding thyroid), peritoneum, prostate, stomach, and testis cancers were elevated significantly among fourth-degree relatives. Among fifth-degree relatives of cases, risks of breast, colon, larynx, lung, pancreas, and prostate cancer were elevated significantly. Familial recurrence RRs for each of the 4 most common cancers in the UPDB cohort and the weighted average of the familial recurrence RRs for the 36 less common cancers studied are shown in Figure 2.
It is well known that many cancers aggregate in families. However, quantification of the degree to which different types of cancers cluster in families and the proportion of cases attributable to familial aggregation has not progressed very far. For this study, we made an effort to provide improved estimates of these quantities based on the unique resources of the UPDB. We employed a cohort study design to minimize biases caused by loss to follow-up, which can greatly affect estimates of familial cancer risk. We adjusted our estimates to reflect risks associated with shared adult environment reflected by spouses diagnosed with the same disease. In addition, we adjusted for LDS church affiliation among individuals, a factor that has been associated with reduced mortality from cancer and other diseases, presumably due to reduced alcohol and tobacco consumption.30, 31
In principle, models based on LFSIR should produce estimates of relative risk (and, hence, estimates of the attributable fraction) that are biased downward by misclassification of exposure. Models based on AFSIR should be less prone to such bias, although the correctness of the normality assumption will affect the efficacy of the adjustment. Relative risk estimates for the nested case–control analyses of AFSIR and LFSIR indicate that the strongest familial effects overall were observed for chronic lymphocytic leukemias and for cancers of the prostate, lip, and thyroid. Other investigators have reported strong familiality among cancers of the prostate, lip, and thyroid using data from the UPDB and elsewhere.3, 4, 9 Strong patterns of familial recurrence also have been reported previously for lymphocytic leukemias.13, 32
Our estimates of population-attributable risks of familial cancers are similar in spirit to heritability estimates from studies of concordance in twins and other relative pairs,3, 4 although our estimates are derived from observations of distantly related kin as well as close relatives. Our findings are broadly similar to the results of twin and first-degree relative studies, indicating that complex, multifactorial interactions need not be invoked to explain the familial recurrence patterns of many cancers.33
The striking persistence of increased risk among relatively distant kin of patients, not only for such cancers as breast and colon but for less obviously familial cancers, such as chronic lymphocytic leukemia and cancers of the testis and liver, suggests that relatively simple mechanisms of shared susceptibility are at work in these families. If elevated familial recurrence risks beyond close relatives were based on interactions among multiple factors (independent genes and/or specific environmental exposures), then the probability of sharing these with third-degree, fourth-degree, or fifth-degree relatives would be small and indistinguishable from baseline. Autosomal-dominant gene effects are among the possible explanations for such persistence of risk, but not the only explanation. A single environmental exposure with a large effect on risk and highly correlated among relatives also could enhance positive familial recurrence risks among more distantly related kin.
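The sharing argument above can be made concrete with a small calculation. The sketch below rests on simplifying assumptions that are not stated in the text: each susceptibility factor is a rare allele at an independent locus, a relative of degree d carries a given allele of the proband with probability of roughly (1/2)^d, and shared environment is ignored.

```python
def prob_share_all(degree: int, n_factors: int) -> float:
    """Approximate probability that a relative shares every factor.

    degree: degree of relationship (1 = first-degree, 2 = second-degree, ...).
    n_factors: number of independent rare susceptibility alleles required.
    Assumption: each allele is shared with probability (1/2) ** degree,
    independently of the others.
    """
    if degree < 1 or n_factors < 1:
        raise ValueError("degree and n_factors must be positive")
    return (0.5 ** degree) ** n_factors


if __name__ == "__main__":
    for d in range(1, 6):
        single = prob_share_all(d, 1)
        triple = prob_share_all(d, 3)
        print(f"degree {d}: one factor {single:.5f}, three factors {triple:.7f}")
```

Under these assumptions, a third-degree relative shares a single factor with probability 0.125 but shares three independent factors with probability below 0.002, which is why risk that persists in distant kin points toward simpler mechanisms.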
The estimated proportion of cancers attributable to familial factors at various ages shows how some known trends in cancer incidence vary noticeably among cancer sites. In general, we have shown again that there is a strong association between cancer incidence and advancing age and that some types of cancers are outliers with respect to this trend. Together with the familial factor, the dynamic between incidence and age helps to distinguish various cancers according to better known and less well known aspects of their etiologies. For example, we demonstrate that some cancers indeed conform to the expectation that the highest proportion of familial cases occurs at the earliest ages of onset: Bladder, breast, colon, melanoma, testicular, and prostate cancers are representative. However, this association looks very different at a more subtle level, depending on the type of cancer. Prostate cancer, for instance, is the cancer with the highest incidence in the population, and breast cancer is second. Both are highly familial, both increase dramatically in incidence between the ages of 45 years and 85 years, and both show small declines in familial proportion with age. Compared with breast cancers, however, prostate cancers occur very rarely before age 45 years, an age at which most are attributable to familial factors. Breast and prostate cancers share the status of most common cancers, with incidence increasing to age 85 years and a relatively high and sustained familial proportion throughout.
Cervical and testicular cancers occur more frequently before age 45 years than at any other age and, in the case of cervical cancer, with the highest incidence of any cancer before age 45 years. Despite this, these cancers are rare at older ages and are only moderately familial if at all. This pattern suggests less genetic involvement and more exposure to factors confined to younger ages and possibly moderated by immune function, as suggested by the association between HPV and cervical cancer. Hodgkin lymphoma shares the same pattern (uncommon, mostly early in onset, and not significantly familial) and also may suggest an immunologic response to exposures encountered earlier in life.
In a third pattern, early-onset cancers, such as ovarian and CNS malignancies, have distinct etiologies that may have less to do with familial susceptibility than later onset forms. The age-specific attributable risks of thyroid cancers are particularly interesting, with a PAR of 22% in the group age 0–44 years, increasing steadily to 57% in the group age ≥ 85 years, despite the very large drop in baseline risk over the same age groups. The pattern suggests that recent birth cohorts have experienced an increase in thyroid cancer risk unrelated to familial factors, perhaps because of environmental changes. A similar, but less striking pattern occurs for non-Hodgkin lymphoma. Endometrial carcinoma exhibits an inverted-U pattern, with relatively high PAR in the youngest and oldest age groups, but much lower values between ages 45 years and 84 years. The familial fraction of most cancers, however, typically is highest at younger ages and declines gradually with increasing age. Because the risk of cancer rises much more rapidly than the attributable fraction falls, the total number of familial cancers increases substantially in each age category except the oldest.
The largely ignored mass of familial cancers among the elderly has important implications both for the designers of diagnostic screening protocols and for investigators who seek to identify genes that affect cancer susceptibility and to model their effects. The facts that family history is a well defined risk factor for cancer that can be measured with reasonable accuracy, even in an interview setting,34, 35 and that the majority of familial cancers among the elderly are cancers for which early detection is possible (prostate, breast, and colorectal malignancies) indicate that individual family histories of disease have an important role to play in targeting cancer screening procedures to those who would benefit most from them.
Ultimately, family histories of disease are simply proxies for genomic data, which someday will prove to be more reliable and readily available as a source of information about individual variation in disease susceptibility. To progress from our current level of understanding to the next, we will need to know much more about the effects of genetic variation on disease risk. Family studies, which have been critical in advancing our knowledge to date, are likely to remain valuable tools for studying complex diseases if nontraditional approaches to gathering and using such data are employed, as shown in the recent spate of successful gene mapping studies from Iceland.36–39
The authors are indebted to Geraldine Mineau and Alison Fraser of the Huntsman Cancer Institute for continuing expansion and quality-control work on the Utah Population Database and to Charles Wiggins of the Utah Cancer Registry for assistance in classifying cancers.