Genetic epidemiology is a discipline elucidating and characterizing the contribution of genetic factors in disease causation, starting from familial clustering of a disease and ending in identification of the underlying genetic mechanisms. Genetic epidemiology contributes to the assessment of cancer risks between family members, providing the scientific basis for clinical counseling and helping therapeutic decisions and cancer prevention. Modeling of cancer in families is informative of the modes of inheritance and etiologic apportioning to hereditary and environmental causes. Genetic epidemiologic data are crucial for gene identification strategies and characterization of genotype-specific risks. A further advocacy for the search of heritable cancer genes is that they may also be important in common sporadic cancers: a promise fulfilled for some but not all heritable cancer susceptibility genes known to date.1
Genetic epidemiology is a relatively new and interdisciplinary science, which originally dealt with modeling, design and statistical issues in human genetics. At times the integration of genetics and epidemiology was viewed suspiciously by scientists in both disciplines, as described in the introduction to the book Fundamentals of Genetic Epidemiology.2 In the cancer area, the differing views have culminated in the question of heritable and environmental causes of cancer, in which epidemiologists vigorously argued for the latter. On the other hand, the literature on heritable cancers is full of overstatements regarding their prevalence. Until the end of the 1990s, genetic epidemiology was a small science, but since then it has greatly benefited from the interest in the human genome and genetic diseases. With the emergence of single nucleotide polymorphisms (SNPs) as versatile tools, studies on gene-environment interactions have become popular, although they carry many controversial issues. Mass publication is going on, and results are reported in subgroups lacking any biologic rationale, without consideration of the functionality of the SNPs or the tissue of expression of the relevant genes.3 It is ironic that the historical roles of epidemiologists and geneticists appear to have been reversed: the proponents of gene-environment interactions appear to trust the overwhelming importance of heritable factors when they act in concert with environmental factors, while geneticists are raising concerns.4, 5 There is a common failure to recognize that SNPs are inherited and that the related studies focus essentially on heritable effects. Thus, any cancer with a small familial effect cannot show large effects in genetic association studies on common polymorphisms. Analogously, nesting of an association study in familial cancers is beneficial for statistical power.6, 7, 8 We hope that those who plan association studies will take notice of the familial risks and proportions presented in this review.
In this review, we discuss familial risks in the main types of cancer, based on results from the Swedish Family-Cancer Database, which is the largest data set of familial cancer ever used. For unbiased risk estimates, it is important that all data on cancers are medically verified and that family structures are derived from registered sources. This review will not be a synopsis of the 150 publications from the Database. Rather, we will focus on examples of the clinical implications of the results and on their guidance to gene identification efforts. We will point out cancers and population subgroups that would appear particularly amenable to molecular analysis. We use the term “familial” to denote cancers in 2 or more first-degree relatives and “heritable” when an inherited gene defect is known or inferred due to a high risk.9
CONCEPTS AND DATA SOURCES
Familial risk of a disease is a measure of its clustering in family members. Commonly, familial risk is defined by comparing those who have a relative (e.g., parent or sibling) with cancer to those whose relatives are free from cancer. The risk can be given as a relative risk or standardized incidence ratio (SIR). The SIRs presented here have been adjusted for age, socioeconomic status and many other variables specified in the original studies. In general, with the exception of age, none of the variables available in the Family-Cancer Database appear to confound familial risk. However, no data are available on tobacco smoking, which is likely to confound familial risks for lung cancer and perhaps for other tobacco-related sites. Population attributable fraction (PAF) is an epidemiologic concept, which shows the proportion of cancer that could be prevented if familial causes could be removed.
Genetic epidemiologists working on cancer carry a double responsibility in ensuring that the obtained results are reliable: first, in guaranteeing the progress of cancer research and second, in influencing clinical counseling, with potential consequences for the quality of life of the patients and their families. Of the estimated 600 global studies reporting familial risks of cancer during the last 2 decades, about 500 are case-control studies. The main problem in these studies may be the possible inaccuracy of data on cancer in family members who had died a long time before the study. Most recent case-control studies have medically verified diagnoses of the cases, but very few make sure that the diagnoses of the relatives are also verified. Even many cohort studies lack verification of cancers in family members.
There is ample literature illustrating the problem of false reporting, the consequences of which, however, are largely ignored. False reporting appears to be low for breast cancer, intermediary for prostate cancer and melanoma and worst for other internal neoplasms, for which the accuracy of reporting may be less than 50%.10, 11, 12, 13, 14, 15, 16 This level of inaccuracy may cause severe bias in the derived risk estimates. Curiously, the tendency has been that the bias has led to increased familial risks, with a probable indication that a patient with a certain type of cancer falsely reports the same cancer in a relative.17, 18, 19, 20 Failure to consider the level of false reporting is the basis of common misquotations on the exaggerated proportion of familial cancer among all cases, which is common even among experts in the field. For example, an authoritative body representing the World Health Organization classification of tumors recently commented on gastric carcinoma that “familial clustering occurs in 12 to 25% with a dominant inheritance pattern.”21 Case-control studies were cited in this work but even their results were misquoted. Our study with medically verified cases of gastric cancer observed only a clustering of 1.9% and an unclear inheritance pattern.18
Reliable familial risks can only be derived when all the component sources of data are reliable and of full coverage. The Swedish Family-Cancer Database fulfills these criteria, except that even currently a small proportion of the offspring population lacks a link to parents, as discussed elsewhere.22 The Database is a compilation of existing data sets, including the Multigeneration Register from Statistics Sweden and cancer cases from the Swedish Cancer Registry (started in 1958). Parents of each offspring have been registered at the time of birth of the child. Thus, it is possible to track biologic parents and half-siblings in spite of divorce and remarriage. The national personal identification code has been deleted from the Database, barring access to data on a specified individual. In the latest update of the Database in 2002, over 10 million individuals and over 1 million tumors are included. For comparison, the Utah Population Database, successfully used in many cancer studies, has a different structure, containing more than 2 generations, and included 42,000 cancers in 1994.23
FAMILIAL RISKS AND PAFS
In a recent publication, the Family-Cancer Database was used to calculate familial risks and PAFs for a 0- to 66-year-old population.24 Table I shows familial SIRs for offspring when a parent had the same cancer. Table I also shows the number of observed cases, familial proportion, i.e., the percentage of all affected offspring who have an affected parent, and PAF. Of the 28 shown SIRs, all but those for small intestinal carcinoids, liver, pancreas, other female genitals and connective tissue were significantly increased and they are shown in Table I. The SIR (12.69) was highest for anal cancer but only based on 2 cases; other high SIRs were for the thyroid (7.13), testis (4.58) and esophagus (3.82). Endometrial, ovarian, prostate, skin (squamous cell carcinoma) and nonthyroid endocrine gland tumors, melanoma and myeloma had SIRs in excess of 2.00; the remaining significant increases ranged from 1.50 to 2.00.
Table I. SIRs, Familial Proportions (% of Affected Offspring with Affected Parent) and PAFs for Offspring with Parental History
Familial proportion ranged from 0.33% for connective tissue tumors to 15.34% for prostate cancer. SIR and familial proportion were used to calculate familial PAF; the calculation was done for each cancer site, irrespective of the significance of the familial risk. The PAFs ranged from 0.10% for connective tissue tumors to 9.01% for prostate cancer. Other cancers with a large PAF were colorectal adenocarcinoma (5.15) and breast (3.67), lung (2.93) and skin (2.11) cancer. PAFs have been commonly calculated for environmental exposures but for familial cancer limited data are available.25, 26 If environmental factors have been excluded or quantified, the PAF values give an estimate on the heritable effects for cancer when only nuclear families can be studied. However, because of low penetrance, familial PAFs underestimate true heritable effects. The available data suggest that apart from prostate, breast and colorectal cancer, familial risks observable in nuclear families contribute a small etiologic proportion.
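As an illustration of how a familial PAF follows from the SIR and the familial proportion, the sketch below assumes the standard case-based formula, PAF = familial proportion × (SIR − 1)/SIR. The text does not state which formula was used, so this is an assumption, and the function names are hypothetical. Inverting the formula for the prostate figures quoted above (familial proportion 15.34%, PAF 9.01%) implies an SIR of about 2.4, which agrees with the SIR in excess of 2.00 reported for that site.

```python
def familial_paf(sir, familial_proportion_pct):
    """Familial PAF in percent, assuming PAF = proportion * (SIR - 1) / SIR.

    sir: familial standardized incidence ratio (must be positive).
    familial_proportion_pct: percent of affected offspring with an affected parent.
    """
    if sir <= 0:
        raise ValueError("SIR must be positive")
    return familial_proportion_pct * (sir - 1.0) / sir


def implied_sir(paf_pct, familial_proportion_pct):
    """Invert the assumed formula: the SIR that yields the given PAF and proportion."""
    if not 0 <= paf_pct < familial_proportion_pct:
        raise ValueError("PAF must be non-negative and below the familial proportion")
    return 1.0 / (1.0 - paf_pct / familial_proportion_pct)


# Prostate cancer figures quoted in the text: proportion 15.34%, PAF 9.01%.
prostate_sir = implied_sir(9.01, 15.34)
print(f"implied prostate SIR: {prostate_sir:.2f}")
print(f"PAF recomputed from it: {familial_paf(prostate_sir, 15.34):.2f}%")
```

Because the formula multiplies the familial proportion by a factor below 1, the PAF can never exceed the familial proportion, and a site with a high SIR but few familial cases still has a small PAF.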
Histopathology may be an important variable in familial cancer but it has not been considered in most studies on familial cancer.22 In Figure 1, we show an example of ovarian cancer in daughters who presented with a specified histology (data not shown). The histology of ovarian cancer in the mother could not be specified. The SIRs for daughters of all ages were highest for papillary serous cystadenocarcinoma (SIR = 3.42; n = 27; 95% CI = 2.25–4.98), followed by serous carcinoma (SIR = 3.18; n = 8; 95% CI = 1.36–6.30), and lowest for the endometrioid type (SIR = 2.73; n = 11; 95% CI = 1.36–4.91).
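The SIR is the ratio of observed to expected cases, and its confidence interval is usually derived by treating the observed count as a Poisson variable. The following sketch assumes exact Poisson limits, which the text does not state explicitly; the function names are hypothetical and the expected count is back-calculated from the reported SIR and n. Under these assumptions, the interval reported for papillary serous cystadenocarcinoma (n = 27, SIR = 3.42) is reproduced closely.

```python
import math


def poisson_cdf(k, mu):
    """P(X <= k) for X ~ Poisson(mu); intended for small counts (mu well below 700)."""
    if k < 0:
        return 0.0
    term = math.exp(-mu)
    total = term
    for i in range(1, k + 1):
        term *= mu / i
        total += term
    return total


def _solve_increasing(func, lo, hi, iterations=200):
    """Bisection root of an increasing function with func(lo) < 0 < func(hi)."""
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if func(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def exact_poisson_limits(observed, alpha=0.05):
    """Exact two-sided confidence limits for the mean of a Poisson count."""
    bound = 2.0 * observed + 50.0  # safely above the upper limit for any count
    if observed == 0:
        lower = 0.0
    else:
        # Lower limit: mean at which P(X >= observed) equals alpha / 2.
        lower = _solve_increasing(
            lambda mu: (1.0 - poisson_cdf(observed - 1, mu)) - alpha / 2.0, 0.0, bound)
    # Upper limit: mean at which P(X <= observed) equals alpha / 2.
    upper = _solve_increasing(
        lambda mu: alpha / 2.0 - poisson_cdf(observed, mu), 0.0, bound)
    return lower, upper


def sir_with_ci(observed, expected, alpha=0.05):
    """Return (SIR, lower limit, upper limit) for observed and expected case counts."""
    lower, upper = exact_poisson_limits(observed, alpha)
    return observed / expected, lower / expected, upper / expected


# Expected count back-calculated from the reported values (n = 27, SIR = 3.42).
sir, low, high = sir_with_ci(27, 27 / 3.42)
print(f"SIR = {sir:.2f}, 95% CI = {low:.2f} to {high:.2f}")
```

The width of the interval depends only on the number of observed cases, which is why the estimate based on 8 cases is so much less precise than the one based on 27.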
HERITABLE VERSUS ENVIRONMENTAL
The review by Doll and Peto27 on the causes of cancer from 1981 concluded that cancer is mainly an environmental disease. This conclusion has remained unchallenged through the golden epoch of molecular biology and the Human Genome Project. Many migrant studies have provided ever stronger evidence that the environment is able to change the incidence of cancer in one generation.28, 29, 30, 31, 32 However, there is only fragmentary understanding of what the environment actually is and of the mechanisms through which the environment is able to cause cancer. We have tried to estimate the degree of environmental contribution to the familial risk by comparing cancer risks between spouses. Spouse concordance, which does not generally exceed an SIR of 1.4, can be noted only for cancers with known strong environmental risk factors: lung and genital cancers and early-onset gastric and pancreatic cancer and melanoma.33, 34 Spouse correlation does not consider environmental sharing early in life; this has been estimated by comparing cancer risks between siblings with a small or large age difference.35 Thus, for most other sites, including breast and colorectal cancers, heritability is likely to be the main contributor.36, 37 Environmental factors are probably the main contributor to the familial aggregation of cervical, lung and upper aerodigestive tract cancers, and a minor contributor to familial risks for melanoma and squamous cell skin cancer.38, 39
Mendelian hereditary causes of cancer have been estimated to account for 1% of all cancer, and these include syndromes of high penetrance.40 Low penetrance, polygenic effects and gene-environment interactions obscure the distinction between heritable and environmental causes, and there is increasing evidence that these causes are important in common cancers.41, 42, 43
Heritability can be estimated by comparing concordance of cancer between family members and by considering the degree of genetic and environmental sharing between the family members. We will present 2 estimates of the heritable causes of cancer, derived from studies on familial clustering of cancer. The first study used the classic twin design, i.e., comparison of correlation of cancer in monozygotic and dizygotic twins from 3 Nordic countries.44 In this model, it is assumed that both types of twins share equally the environmental effects; monozygotic twins are genetically identical whereas dizygotic twins are like any siblings, sharing on average 50% of their genes. The second study was based on the nationwide Swedish Family-Cancer Database on 3 million families.45 It compared correlation of cancers between all family members using the same statistical model that was used in the twin study. It had a much higher statistical power than the twin study because the whole Swedish population and its 1 million tumors were scrutinized. The twin study gave statistically significant heritability estimates (where the 95% confidence interval did not include zero) only for cancers of the colorectum (35%), breast (27%) and prostate (42%). The family study gave an identical estimate for the breast but a lower estimate for the colorectum. Heritability of cervical cancer was 22% but that of lung and bladder cancer and leukemia was below 10%.
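The logic of the twin design can be sketched with a simplified moment-based decomposition. It is not the model fitted in the cited studies, which estimated the components statistically from the data; it only applies the two assumptions stated above (equal environmental sharing in both twin types, and full versus half genetic sharing) to a pair of correlations in liability. The function name and the input correlations are hypothetical.

```python
def twin_variance_components(r_mz, r_dz):
    """Split liability variance into heritable, shared and nonshared environmental parts.

    Assumes r_mz = a2 + c2 and r_dz = a2 / 2 + c2, where a2 is the additive
    genetic share and c2 the shared environmental share of the variance.
    """
    a2 = 2.0 * (r_mz - r_dz)   # heritability
    c2 = 2.0 * r_dz - r_mz     # shared environment
    e2 = 1.0 - r_mz            # nonshared environment (includes chance)
    return a2, c2, e2


# Hypothetical correlations in liability, chosen for illustration only.
a2, c2, e2 = twin_variance_components(r_mz=0.5, r_dz=0.3)
print(f"heritable {a2:.2f}, shared environment {c2:.2f}, nonshared {e2:.2f}")
```

The three components sum to 1 by construction, so a low monozygotic correlation leaves most of the variance in the nonshared environmental component, whatever the heritability.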
An interesting question is the correspondence of familial PAFs, discussed in the previous section, to the heritability estimates derived from twin studies or family relationships modeled from this Database.44, 45 In the twin study, the heritability estimates for colorectal, breast and prostate cancer were some 2 to 5 times higher than the familial PAFs. One obvious reason for the difference between these estimates is the inability of the PAF calculations fully to cover recessive, low-penetrant and polygenic effects. Low-penetrant gene effects will make familial patterns difficult to observe but they may be detected in twin studies.
For counseling of individuals from cancer families, empirical data on familial risks are necessary. Risk models based on genetic epidemiologic data are being used in counseling individuals from breast and prostate cancer families. The 2 commonly applied models for breast cancer risk assessment, the Gail and the Claus models, were devised before the identification of BRCA1 and BRCA2. The Gail model, developed from a large follow-up study, predicts a woman's risk for breast cancer based on her individual risk factors, including a family history.46 The Claus model, developed from a case-control study, has been useful particularly for the estimation of heritable risks of breast cancer.47
Familial risks may be independent of the cause, but the recommendations should of course consider whether the causes are environmental or heritable. There is a strong familial risk for lung cancer but the primary recommendation to a family member of a lung cancer patient is to stop smoking.39 Similarly, clinical counseling on sexually transmitted cancers is concerned mainly with the avoidance of risk factors rather than with family history. In the case of melanoma and squamous cell skin cancer, counseling has to consider both; an affected sibling with one of these skin neoplasms signals a 3-fold risk, and if additionally a parent was affected, the risk is 10-fold or more.48
The diagnosis of cancer in a sibling, particularly at an early age, raises concerns about the risks of the remaining siblings. The concern is even deeper if a parent has also been diagnosed, and the familial risk in such rare families is substantial.48 In Figures 2–4, we show sibling risk for breast, colorectal and ovarian cancer by age of onset as an example of the clinically relevant data available from the large Family-Cancer Database. Familial risk for early-onset (age 31–40 years) breast cancer among sisters is about 3.1 and it declines to about 2.0 at a higher age (Fig. 2). For colorectal cancer, the SIR among siblings is over 4.0 until age 50 years and declines thereafter to less than 2.0 (Fig. 3). The SIR for ovarian cancer among sisters is 65 at ages before 30 years; it declines to about 10 premenopausally and further postmenopausally (Fig. 4). These kinds of age-incidence tabulations can be developed for most cancers, similar to the above Gail and Claus models, and they can be used in clinical counseling in the absence of any genetic data.
There are no commonly accepted clinical counseling guidelines or action plans for familial cancer at large. For familial breast and ovarian cancer, guidelines have been developed for genetic testing for BRCA1/2, and these consider age of onset and sex of the affected family members and their tumors.49, 50 Similar guidelines have been developed for hereditary colorectal cancer, also encompassing endometrial cancer.49, 51, 52 We have estimated that the American Cancer Society guidelines on familial colorectal cancer consider a familial risk of 2.2 for an action level.48 Although many types of data need to be considered for an action level of familial cancer, the data in Table I show that familial SIRs exceeded 2.2 for most cancers, at least in some age groups. These data urgently call for site-specific or more uniform guidelines on clinical counseling and a comprehensive action plan for familial cancer.
CANCERS FOR GENE IDENTIFICATION
Linkage analysis has been the main tool in gene identification. Although association studies may be more practical than linkage studies for complex diseases, linkage is still a powerful tool if informative families are available.53, 54, 55 Before family-based studies are undertaken, familial clustering of cancer has to be demonstrated and known syndromes have to be excluded. Gene identification has been most successful for genes that cause a high risk, such as BRCA1/2 and mismatch repair genes. Results can be obtained even for rare cancer syndromes, such as Peutz-Jeghers or skin leiomyomas, if the families are homogeneous and the risk is high.56, 57
We have been able to define population-level risks in known syndromes, such as von Hippel-Lindau, hereditary nonpolyposis colorectal cancers and multiple endocrine neoplasia 2, when histopathology has been considered.58, 59, 60 It would be important to estimate the extent of the familial clustering that can be ascribed to these syndromes, and complementarily, the proportion of familial clustering that has other causes and probably other yet unknown susceptibility genes. This question needs to be addressed when new gene identification strategies are being planned. In Table II, we list germline mutations known to underlie heritable cancers based on a recent review on human cancer genes.1 For some cancers, we also give an estimate for the proportion of familial risks that can be explained by the genes considered. The proportion in Table II is an estimate of the familial risk that the listed germline mutations could explain. For example, for breast cancer, the listed genes are thought to explain 30% of a familial risk of 1.80. The familial excess risk is 1.80 − 1.00 = 0.80; 30% of 0.80 is 0.24. Thus, the explained familial excess risk for breast cancer is 0.24 and the unexplained excess risk is 0.56. For colorectal, breast and ovarian cancer and for melanoma, unselected populations have been screened and the estimates may be accurate; the relevant literature is cited in our recent publications.36, 37, 61, 62 For other genes, no representative population-level screening data are available and we refrain from giving any estimates for the proportion; however, the proportions in these cases are likely to be less, even much less, than the cited proportions.
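The breast cancer arithmetic above can be written as a small helper and applied to any familial risk and explained proportion. The function name is hypothetical; the inputs are the values given in the text.

```python
def split_familial_excess(familial_risk, explained_fraction):
    """Split the familial excess risk (risk - 1) into explained and unexplained parts.

    familial_risk: familial relative risk or SIR (e.g., 1.80).
    explained_fraction: share of the excess attributed to known genes (e.g., 0.30).
    """
    if not 0.0 <= explained_fraction <= 1.0:
        raise ValueError("explained_fraction must lie between 0 and 1")
    excess = familial_risk - 1.0
    explained = explained_fraction * excess
    return excess, explained, excess - explained


# Breast cancer example from the text: familial risk 1.80, 30% explained.
excess, explained, unexplained = split_familial_excess(1.80, 0.30)
print(f"excess {excess:.2f}, explained {explained:.2f}, unexplained {unexplained:.2f}")
```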
Table II. Estimated Proportion of Familial Risk that can be Explained by Mutations in Known Heritable Cancer Genes
We have shown examples of familial clustering even in cancers for which genes are largely unknown, including squamous cell carcinoma of the skin, intestinal carcinoids, thyroid papillary tumors, brain astrocytomas and pituitary adenomas.63, 64, 65, 66 Some of the new findings are shown in Table III. In some common cancers, such as lung and kidney cancer, an early-onset recessive component appears to cause a high risk among siblings in families where parents are unaffected. In the case of familial melanoma-breast cancer clustering, the susceptibility was ascribed to p16 mutations.67, 68 These examples illustrate how family studies may facilitate identification of the susceptibility genes by groups who have access to biologic samples from affected families.
Table III. Evidence of Familial Risks in Cancers Where Heritable Genes are Probably Unknown. Based on the Swedish Family-Cancer Database
The prerequisite for all gene identification and quantification studies is an assurance that the type of cancer has a genetic component.69 Although this condition has been shown for most cancers at the population level,23, 70 there are others for which the evidence may be based on rare or extremely rare syndromes, and an attempt to genotype such cancers drawn from the whole population may be a futile enterprise. Reliable data for familial risks, proportions of familial cases and PAFs of familial cancer can be used to calculate the maximal gene effect (gene frequency and risk) that can be expected for heritable susceptibility genes. For example, knowing the familial risk for a certain cancer, it is possible to calculate the heritable gene effect, which can account for the risk, assuming a single gene or, alternatively, any number of genes.43, 53 Such data will provide empirically based expectations to gene identification efforts. Genotype relative risk (GRR) is a measure commonly used in genetic epidemiologic calculations, and it is identical to odds ratio (OR) of association studies. GRR is defined as the risk of cancer for individuals with the susceptibility genotype(s) divided by the risk of cancer for individuals without the susceptibility genotype(s).54, 71, 72 The relationship between SIR and GRR is complex even if SIR only depended on heritable factors and we refer to the above sources for a detailed presentation of the methods used below.
We apply a single-locus additive model to develop relationships between GRR and the frequency of the susceptibility allele (p) using the SIR data obtained for breast, ovarian and colorectal cancer from the Family-Cancer Database (Fig. 5). We make no allowance for known genetic components (such as BRCA1/2 or mismatch repair genes) or possible shared environmental effects. GRR was defined as the cancer risk for the variant homozygote divided by the cancer risk for the wild-type homozygote. All the curves are U-shaped, showing high (> 100) GRRs at low gene frequencies (< 0.001) and again at higher gene frequencies (at p > 0.1 for colorectal and ovarian cancer and at p > 0.2 for breast cancer). Similar data have recently been presented for a number of SIR values by Wang et al.55 The high GRRs for rare genes are typical of the high-penetrance genes, such as BRCA1/2. On the other hand, the high GRR values at high gene frequencies are only theoretical (because they would account for excessive familial PAF) and, in reality, common alleles will show low GRRs, i.e., low-penetrance genes would cover such gene frequencies.
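The U-shape can be illustrated with a closed-form sketch. It assumes that the SIR equals the recurrence risk ratio in first-degree relatives, that this ratio is due entirely to one biallelic locus in Hardy-Weinberg equilibrium, that the heterozygote risk lies midway between the two homozygote risks, and that the standard variance-component relation holds (risk ratio = 1 + additive variance / (2 × squared population risk)). These assumptions give SIR − 1 = p(1 − p)(GRR − 1)² / (4[1 + p(GRR − 1)]²). This is not necessarily the computation behind Figure 5, and the function names are hypothetical.

```python
import math


def sir_from_grr(grr, p):
    """Familial risk ratio implied by a homozygote GRR and allele frequency p."""
    x = grr - 1.0
    return 1.0 + p * (1.0 - p) * x * x / (4.0 * (1.0 + p * x) ** 2)


def grr_from_sir(sir, p):
    """Homozygote GRR needed to produce a given familial risk ratio.

    Returns math.inf when no finite GRR can account for the familial risk,
    which under this model happens once p reaches 1 / (1 + 4 * (sir - 1)).
    """
    if sir < 1.0 or not 0.0 < p < 1.0:
        raise ValueError("requires sir >= 1 and 0 < p < 1")
    s = math.sqrt(sir - 1.0)
    denominator = math.sqrt(p * (1.0 - p)) - 2.0 * p * s
    if denominator <= 0.0:
        return math.inf
    return 1.0 + 2.0 * s / denominator


# Familial risk of 1.80, the breast cancer value quoted in the text.
for p in (0.0001, 0.001, 0.01, 0.05, 0.1, 0.2):
    print(f"p = {p:<6} GRR = {grr_from_sir(1.80, p):.1f}")
```

Under these assumptions the required GRR grows without bound as p approaches 1/(1 + 4(SIR − 1)), which is about 0.24 for an SIR of 1.80 and about 0.11 for an SIR of 3. This is consistent with the rise of the curves at p > 0.2 for breast cancer and at p > 0.1 for the cancers with higher familial risks.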
U-shaped relationships between the frequency of the susceptibility allele and GRR are also found for dominant, multiplicative and recessive single-locus models (results not shown). However, the convexity of the curve, its minimum and the corresponding gene frequency depend on the genetic model (dominant > additive > multiplicative > recessive), increasing with increasing values of SIR. The representation of the relationship between susceptibility allele frequency and GRR for other kinds of relatives, e.g., for mothers, may be used to discriminate between genetic models.
All the main neoplasms appear to have a familial component that ranges from below 1% to 15% for prostate, depending on the particular site. Familial risks observed among twins and patients with multiple primary cancers provide support for the multistage carcinogenesis in human cancers at a population level.41, 43 There are some practical implications from such findings. One is that in the search for new susceptibility factors in cancer, low-penetrance genes may be better identified in association studies with a case-control design than in linkage studies.72, 73, 74 However, even in association studies familial cases would be preferable for statistical power.6 The second implication is that in clinical counseling polygenic and recessive conditions imply uncertainty.43 The disease strikes apparently randomly even though there is an inherited background. However, the age-incidence tables that can be derived for most familial cancers would be helpful for clinical counseling, analogous to the Gail and Claus risk estimation tables used for breast cancer.
There are no available data on the etiology of cancer that would refute the predominant role of environment as a causative factor. However, since the epochal review by Doll and Peto27 in 1981, only modest progress has taken place in the search for new causes of environmental carcinogenesis. One likely reason is that environmental carcinogenesis is due to the interaction of external and host factors that cannot be unraveled by epidemiologic or molecular biologic means alone. There is hope that merging of these approaches into molecular epidemiology or, even better, into molecular genetic epidemiology will make it possible to tackle the exogenous/endogenous interface of human carcinogenesis. Gene-environment interactions are a promising area for cancer research, but so far many obstacles remain regarding the approaches to be used, and the daunting problem of false positive results has not been solved. It may be doubtful whether gene-environment studies should be conducted if the direct effect of the gene on the relevant cancer has not been demonstrated. Another requirement for hypothesis-driven research would be a priori knowledge of the functional effects of the SNP to be selected for study. The availability of powerful genomics technologies and samples from large, well-characterized populations, preferentially of familial cases, should advance our knowledge of the main heritable underpinnings of cancer.
The Family-Cancer Database was created by linking registers maintained at Statistics Sweden and the Swedish Cancer Registry.