Both environmental and hereditary factors cause cancer. Studies of familial cancer aggregation have been the main approach to assessing hereditary effects in cancer. Population-based studies have provided reliable quantitative estimates of familial risks, particularly when the family relationships and cancers in relatives have been confirmed.1–6 Such data increase the understanding of cancer etiology and provide a rationale for identifying the genes involved. These studies are, however, less suited to separating the effects of shared genes from the effects of shared environment, as they use only one order of genetic relationship. Family members share many environmental factors, such as lifestyle, including diet and habits, which may increase or decrease exposure to cancer-causing or protective factors. Experience from the Swedish Family-Cancer Database suggests that lifestyle-related factors explain, at least in part, the familial aggregation observed in some cancers.6, 7
Comparison of monozygotic and dizygotic twins has been used in modeling studies to apportion the genetic and environmental effects on susceptibility to cancer,8–14 and similar modeling has recently been applied in a general family setting.15, 16 Another study assessed the effect of shared environment by comparing cancer risks between spouses.17 Among all the main cancers, only 2 sites, stomach and lung, exhibited spouse concordance. Despite this, family studies are still rarely used to identify and quantify genetic or shared environmental factors in cancer susceptibility. Based on our twin data,14 Risch18 recently argued that single-locus or additive models of inheritance explain familial risks of cancer better than the multifactorial threshold models used in the twin studies. We have questioned Risch's approach of pooling cancers of different etiology for interpreting genetic models.19 However, most of the 100 studies published from the Swedish Family-Cancer Database have tested or assumed single-locus/additive models, and in the present article we compare results from these 2 models.
Here we apply structural models, originally developed for twin studies, to assess the genetic and environmental components in the main types of cancer. Family and cancer data were obtained from the nationwide Swedish Family-Cancer Database (the 1999 update), with more than 700,000 cancers and a population of 9.6 million, the largest dataset ever used for family studies. It offers unique possibilities for reliable modeling because all the data on family relationships and cancers were obtained from registered sources of practically complete coverage. Results on colon cancer, melanoma and lung cancer using an earlier version of the Database have been published previously.15
MATERIAL AND METHODS
The Swedish Family-Cancer Database includes persons born after 1934 (“offspring”) and their biological parents, totalling more than 9.6 million individuals.20 Cancers were retrieved from the nationwide Swedish Cancer Registry for the years 1958–1996. In addition to invasive cancers, cervical in situ cancers were included. Of the invasive cancers, more than 600,000 were diagnosed among parents and 92,000 among offspring. The Database is otherwise practically complete but has a gap among those born in 1935–1940 who died before the end of follow-up. These individuals lack links to parents in the Database, which probably causes a deficit of some 10,000 cancer cases. The effect is a somewhat inflated estimate for fatal cancers, particularly among siblings.
Offspring were diagnosed with their first primary cancer in the years 1958–1996 at ages 0–61 years. A 4-digit diagnostic code according to the 7th revision of the International Classification of Diseases (ICD-7) was used for cancer-type identification. The following ICD-7 codes were pooled: 162 and 163 (“lung”), 204–207 (“leukemia”), 208 and 209 (polycythemia vera and myelofibrosis, respectively). Rectal cancer (ICD-7 code 154) was restricted to mucosal rectum (154.0). The primary sites included in the category of endocrine glands were the adrenal, parathyroid and pituitary glands.
Family pairs (parent-child [both biological and adoptive], sib-sib and spouse-spouse) were characterised for each specific cancer as having no member with cancer, one member with cancer or both members with cancer. An individual may be entered more than once into the analysis, e.g., a parent appears in the spouse comparison once and in the parent-offspring comparison as many times as he or she has children. Each individual may thus contribute cancers to as many pairs as he or she is a part of. A sibship that contains 1 affected and 1 unaffected sib counts as a discordant pair; a sibship with 2 affected sibs results in 1 concordant pair; a sibship with 3 affected individuals (sib A, sib B and sib C) counts as 3 concordant pairs (AB, AC and BC); a sibship with 2 affected and 1 unaffected individual is counted as 1 concordant and 2 discordant pairs, and so on.
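The pair-counting rule described above can be sketched in Python as follows. This is a minimal illustration and not the authors' code; the function name `sibship_pairs` and the layout of its return value are assumptions made for this example.

```python
from math import comb


def sibship_pairs(n_affected, n_unaffected):
    """Count unordered sib pairs in one sibship by disease status.

    Hypothetical helper (not from the paper). Returns a tuple:
    (concordant affected pairs, discordant pairs, unaffected pairs).
    """
    concordant_affected = comb(n_affected, 2)      # both sibs affected
    discordant = n_affected * n_unaffected         # exactly one sib affected
    concordant_unaffected = comb(n_unaffected, 2)  # neither sib affected
    return concordant_affected, discordant, concordant_unaffected


# The cases spelled out in the text
print(sibship_pairs(1, 1))  # 1 affected + 1 unaffected sib
print(sibship_pairs(3, 0))  # 3 affected sibs: pairs AB, AC and BC
print(sibship_pairs(2, 1))  # 2 affected + 1 unaffected sib
```

Counting all unordered pairs within a sibship is what makes a large affected sibship contribute several concordant pairs.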
In the case of cervical cancer, melanoma and lung cancer, we performed additional analyses separately for 2 distinct birth-cohort groups. The birth cohorts were divided into (i) both sibs born 1941–1950 and the offspring in the parent-offspring relations born 1941–1950; (ii) both sibs born 1951–1970 and the offspring in the parent-offspring relations born 1951–1970. In the case of cervix in situ, because of the earlier age of onset, the cohorts were 1941–1955 and 1956–1975, respectively. The restriction that both sibs had to belong to the same birth cohort to be included in the calculation reduced the number of eligible sib pairs.
For statistical analysis, cancer types were selected for which sufficient numbers of concordant pairs in the offspring generation were observed, thus excluding some common cancers such as prostate cancer and nonmelanoma skin cancer. In the model-fitting analysis, we treated cancer as a binary trait with the assumption of an underlying normal distribution of liability, with additive contribution of multiple factors. This liability is the sum of genetic and environmental effects, and the liability distribution has a threshold value that discriminates between affected and unaffected individuals.21 The thresholds are estimated from the prevalence of disease in each type of relative separately. Thus, an individual affected by a specific cancer would exceed the (unmeasured) threshold value of the liability distribution for that type of relative (e.g., father or daughter).
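As a hedged illustration of the liability-threshold idea, the sketch below converts a prevalence into a threshold on a standard normal liability scale. The function name `liability_threshold` and the prevalences shown are hypothetical; they are not taken from the Database.

```python
from statistics import NormalDist


def liability_threshold(prevalence):
    """Threshold on a standard normal liability scale.

    Individuals whose liability exceeds the threshold are affected, so the
    upper-tail area beyond the threshold equals the prevalence.
    """
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    return NormalDist().inv_cdf(1.0 - prevalence)


# Hypothetical prevalences, for illustration only
for prevalence in (0.001, 0.01, 0.05):
    print(prevalence, round(liability_threshold(prevalence), 3))
```

A rarer cancer corresponds to a higher threshold, which is why separate thresholds are needed for relatives with different prevalences.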
The importance of genetic effects (G) is indicated by a higher correlation among relatives who are more closely related to each other genetically. Shared environmental effects (S) result in a familial resemblance among family members who are not genetically related to each other. Childhood environmental effects (F) form a part of shared environment and result in greater environmental resemblance among siblings than between parents and children. Finally, nonshared environmental effects (E) are evidenced by within-pair differences.
To estimate the parameters, we can write the structural equation for one individual as D = G + S + F + E, where D is the liability to cancer and G, S, F and E are the genotype, shared environment, childhood environment and nonshared environment, respectively. The correlations between any pair of relatives for environmental and genetic effects are set to fixed values according to their degree of genetic and environmental relationship. Thus, first-degree relatives (parent-offspring and full siblings) are correlated 0.5 for the genetic (G) factors, whereas the correlation among half-siblings is 0.25. All types of family members are correlated 1.0 for the shared environment (S) factor. For the childhood environmental (F) factor only siblings are correlated (1.0), whereas parent-offspring pairs have no correlation for this factor. The expectations in the model used here will therefore be the following equations:
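The expectations implied by the fixed correlations stated above can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' Mx script: the function name, the example values and, in particular, the treatment of half siblings for the childhood factor (exposed as a hypothetical switch, because the text does not spell it out) are assumptions.

```python
def expected_correlations(g, s, f, half_sibs_share_childhood=True):
    """Expected liability correlations under D = G + S + F + E.

    g, s and f are the proportions of liability variance due to genetic,
    shared and childhood environmental effects. Whether half siblings
    share the childhood factor is an ASSUMPTION controlled by the flag.
    """
    if min(g, s, f) < 0.0 or g + s + f > 1.0:
        raise ValueError("g, s, f must be non-negative and sum to at most 1")
    half_sib_f = f if half_sibs_share_childhood else 0.0
    return {
        "spouses": s,                              # no shared genes
        "parent-offspring": 0.5 * g + s,           # first degree, no F
        "full siblings": 0.5 * g + s + f,          # first degree plus F
        "half siblings": 0.25 * g + s + half_sib_f,
        "adoptive parent-offspring": s,            # no shared genes
    }


# Hypothetical variance proportions, for illustration only
for relation, r in expected_correlations(0.20, 0.10, 0.05).items():
    print(f"{relation}: {r:.3f}")
```

Because spouses and adoptive pairs load only on S, while siblings add F and biological relatives add a fraction of G, the different relationship types together identify the three parameters.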
The analysis proceeded in 2 steps: (i) similarity in cancer between family members (parent-children, siblings and spouses) was calculated as tetrachoric correlations, which measure correlation in disease liability;22 (ii) two-by-two contingency tables of disease status in the different pairs of relatives were entered simultaneously into a structural equation model-fitting program, Mx,23 to provide estimates of the relative importance of genetic and environmental effects. The types of relationships that were evaluated included (a) mother-father, (b) sisters, (c) brothers, (d) sister-brother, (e) half siblings, (f) mother-daughter, (g) mother-son, (h) father-daughter, (i) father-son, (j) father-offspring (adoptive). In case of small numbers in relationships [(b) sisters, (c) brothers or (d) sister-brother], these 3 groups were pooled (full siblings). Gender-specific thresholds for the parent and offspring generations, respectively, were estimated in the models. The degrees of freedom were the difference between the number of observed statistics and the number of parameters in the model. In the full model (all 10 categories of relationships) there were 30 observed statistics (3 from each type of relationship described above) and 14 parameters (3 parameter estimates [G, S and F; E is calculated as 1-G-S-F] and 11 gender- and age-specific thresholds), resulting in 16 degrees of freedom. Significance of parameters was evaluated by 95% CI. The root mean square error of approximation (RMSEA) was also used to assess the fit of the models. Values below 0.10 indicate a good fit and values below 0.05 indicate a very good fit.23
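The degrees-of-freedom bookkeeping for the full model can be checked with a few lines of Python. The function and argument names are hypothetical; the counts are those given in the text (a two-by-two table with a fixed total contributes 3 free statistics).

```python
def model_degrees_of_freedom(n_relationship_types, n_variance_params,
                             n_thresholds, stats_per_table=3):
    """Return (observed statistics, parameters, degrees of freedom)."""
    observed = n_relationship_types * stats_per_table
    parameters = n_variance_params + n_thresholds
    return observed, parameters, observed - parameters


# Full model: 10 relationship types, G/S/F, 11 thresholds
print(model_degrees_of_freedom(10, 3, 11))
```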
The numbers of family member pairs were calculated from the Swedish Family-Cancer Database that includes 9.6 million persons. In Table I, we give only the numbers of concordant family pairs, i.e., both members affected. However, an example for 2 different relationships is given on stomach: For the spouses (mother-father), there were 3.16 million, 30,699 and 203 pairs with none, 1 or 2 affected members, respectively. For mother-daughter pairs, the respective numbers were 3.1 million, 10,376 and 5. All contingency tables are available upon request.
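To illustrate how a tetrachoric correlation is obtained from such a two-by-two table, the sketch below solves for the correlation of a bivariate normal liability whose upper-right quadrant probability matches the proportion of concordant pairs. It is a plain-Python approximation, not the procedure used in the study. The text gives only the total number of discordant spouse pairs, so splitting them equally between husbands and wives is an assumption made purely for illustration, and the count of 3.16 million is rounded; the result should therefore not be expected to reproduce Table I exactly.

```python
import math
from statistics import NormalDist

_STD = NormalDist()


def bvn_density(h, k, r):
    """Standard bivariate normal density at (h, k) with correlation r."""
    q = 1.0 - r * r
    expo = -(h * h - 2.0 * r * h * k + k * k) / (2.0 * q)
    return math.exp(expo) / (2.0 * math.pi * math.sqrt(q))


def upper_orthant(h, k, rho, steps=400):
    """P(X > h, Y > k), using dP/d(rho) = bvn_density(h, k, rho)."""
    base = (1.0 - _STD.cdf(h)) * (1.0 - _STD.cdf(k))  # value at rho = 0
    if rho == 0.0:
        return base
    width = rho / steps  # Simpson's rule on [0, rho]; steps is even
    total = bvn_density(h, k, 0.0) + bvn_density(h, k, rho)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * bvn_density(h, k, i * width)
    return base + total * width / 3.0


def tetrachoric(n00, n01, n10, n11, tol=1e-6):
    """Tetrachoric correlation from a 2x2 table of pair counts.

    n00: neither affected, n10: only member 1, n01: only member 2,
    n11: both affected.
    """
    n = n00 + n01 + n10 + n11
    h = _STD.inv_cdf(1.0 - (n10 + n11) / n)  # threshold for member 1
    k = _STD.inv_cdf(1.0 - (n01 + n11) / n)  # threshold for member 2
    target = n11 / n
    lo, hi = -0.99, 0.99
    while hi - lo > tol:  # bisection; upper_orthant increases with rho
        mid = 0.5 * (lo + hi)
        if upper_orthant(h, k, mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Stomach cancer, spouses: 3.16 million, 30,699 and 203 pairs (from the text).
# ASSUMPTION: discordant pairs split (almost) equally between the sexes.
print(round(tetrachoric(3_160_000, 15_350, 15_349, 203), 3))
```

If the true split of discordant pairs is unequal, as it would be when one sex has a higher prevalence, the thresholds and hence the estimate change.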
The observed tetrachoric correlations are shown in Table I. The highest correlations for genetically related individuals were noted for thyroid cancer, ranging from 0.34 to 0.51 for sibs. Half-sibs had a correlation of 0.19, and parent-child correlations ranged from 0.22 to 0.33 for this cancer type. Conversely, the correlation between spouses, who are not genetically related, was the lowest (0.04), indicating a strong contribution of genetic factors in thyroid cancer. The highest correlations for spouses were noted for lung and stomach cancer (0.15), indicating the importance of shared environmental factors.
The pairwise observations were subjected to structural equation modeling, which provided parameter estimates for genetic and environmental effects (Table II). The environmental effects were estimated separately for effects shared by all family members (“shared”), shared environmental effects limited to the childhood period (“child”) and nonshared environmental effects (“nonshared”). For illustration, heritability was estimated to account for 1% of the variation in susceptibility to stomach cancer, shared environmental effects for 15%, childhood environmental effects for 13% and nonshared environmental effects for the remaining 71%. Our statistical model provided an excellent fit to the observed data for this cancer (χ2 = 4.5, with 11 df; p = 0.95). The RMSEA was <0.001 (the smaller the value of RMSEA, the better the fit).23
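The stomach cancer figures above can be checked with a short sketch. The nonshared share follows from E = 1-G-S-F. For the RMSEA, the code uses the common single-sample point estimate, sqrt(max(χ2 − df, 0) / (df (N − 1))); whether Mx used exactly this form for the multi-group data is an assumption, and the sample sizes passed in are placeholders. Because χ2 is smaller than the degrees of freedom here, this estimate is 0 for any sample size, consistent with the reported value of <0.001.

```python
import math


def nonshared_share(g, s, f):
    """Nonshared environmental proportion, E = 1 - G - S - F."""
    return 1.0 - g - s - f


def rmsea(chi2, df, n):
    """Common point estimate of RMSEA (assumed form, see lead-in)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


# Stomach cancer estimates quoted in the text
print(round(nonshared_share(0.01, 0.15, 0.13), 2))

# chi2 = 4.5 with 11 df; n is a placeholder and does not affect the result
print(rmsea(4.5, 11, 1_000), rmsea(4.5, 11, 1_000_000))
```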
Table II. Effects of Genetic and Environmental Factors in Cancers at Various Sites
In the case of cervical cancer, lung cancer and melanoma, our original analysis did not provide a very good fit to the observed data (p < 0.2; data not shown). Additional analyses using 2 distinct birth-cohort groups resulted in estimates of genetic and environmental effects similar to those of the earlier analysis (without birth cohorts) but improved the fit of the model (Table II).
Statistically significant proportions of susceptibility to cancer accounted for by genetic effects (i.e., for which the 95% confidence interval did not include zero) were obtained for all studied cancers except for leukemia. The estimates ranged from 1% (stomach and leukemia) to 53% (thyroid). For shared environmental effects, the estimates ranged from 0% (cervix) to 15% (stomach) and for childhood shared environmental effects from 2% (nervous system) to 17% (testis) (Table II).
Results from the present model were compared to a standard analysis of familial risks calculated as standardized incidence ratios. In Figure 1, the proportion of variance explained by familial effects (genetic + shared environmental + childhood shared environmental effects) is presented together with previously published risks for offspring of affected parents from the same Database.24 Spearman correlation analysis of data gave a correlation coefficient of 0.87 and p < 0.001.
There is a broad consensus on the predominant importance of environmental factors and somatic events in human cancer. Previous twin and family studies agree that some 60–90% of the studied cancers can be explained by environmental factors.8, 11, 13–16 Our results support these estimates, as the contribution of genetic factors was shown to exceed that of environment only in thyroid cancer. For all other cancer types, the main contribution to the total liability (from 58–88%) was assigned to nonshared environmental effects. These factors include any unique environmental causes of cancer that are not inherited and not shared among family members, and they can be interpreted as purely sporadic causation. Many, if not most, of the environmental causes of cancer remain unidentified.25 According to a recent estimate, only 33% of male and 29% of female cancers in the Nordic countries can be ascribed to presently identified environmental causes.26 The most important identified environmental causes of cancer are tobacco smoking, infections, ultraviolet and ionizing radiation, alcohol and occupational exposures.25–28 The unexplained fraction is often ascribed to identified or suggested dietary factors.28 In cancers with no known environmental causes, the proportion of sporadic (chance) mutations is likely to be high.
Shared environmental effects
Previously, the effect of shared environment was assessed from this Database by comparing cancer risks between spouses.17 Among all the main cancers, only 2 sites, stomach and lung, showed spouse concordance. These results are in agreement with our present study, as the highest tetrachoric correlations for spouses among all cancers were estimated for stomach (0.15) and lung (0.15) (Table I). A recent twin study14 estimated the effects of shared environment as 0–20%, but none of these values was statistically significant.
Our model revealed a significant contribution of the environment shared among family members to many cancer types. The effect of shared environment was reflected by nonzero correlations between spouses and between parents and adoptive children. The highest proportion of shared environment was found for gastric cancer (15%), where the correlation between spouses was 0.15. One of the most likely causative agents of gastric cancer is chronic infection. Up to 60% of gastric cancers are attributed to Helicobacter pylori in developed countries, and the infections run in families.29, 30 Still, one cannot exclude the modification of the familial cancer risk or bacterial infection by, e.g., dietary factors. The high correlation of lung cancer between spouses (0.15), also reflected by a 9% proportion of shared environment, is most probably caused by shared smoking habits (or environmental tobacco smoke) among the spouses. Bladder cancer, another tobacco-related site, showed a 12% proportion of shared environment. The same was true for colon cancer (12% shared environment), most probably due to the common diet of family members. Cervical cancer, on the other hand, showed no proportion of shared environment.
Childhood shared environmental effects
Our study shows the significance of shared childhood environment for cancers of the testis (17%), stomach (13%), cervix in situ (13%), endocrine glands (11%) and melanoma (8%). For melanoma, the childhood effect of 8% may be due to sunburns in childhood and adolescence, suggested to be critical for melanoma.31, 32 This result is in line with a previous study31 that suggested the importance of sun exposure for melanoma prior to the establishment of families, implying high sharing in childhood and adolescence and lower sharing among spouses. As to in situ cervical cancer, one risk factor shared by sisters might be human papillomavirus infection due to similar sexual behavior. Interestingly, the shared proportion was higher for in situ than for invasive cervical cancer. Apparently, different environmental and/or genetic factors are important for developing an in situ cancer than for developing an invasive cancer.
The total contribution of hereditary factors to the causation of sporadic cancer is unclear; previous assessments have mostly estimated the proportion of cancers caused by monogenic syndromes. Estimates of the hereditary contribution vary depending on the definition. “Unmistakable hereditary cancer syndromes” are thought to account for about 1%,33 “highly penetrant single-gene mutation” for 5%34 and “primary genetic factors” for 5–10%35 of all cancer cases. Apart from highly penetrant single-gene mutations, the estimation of the hereditary contribution is extremely difficult, and the risks posed by low-penetrance single-gene mutations, polygenes and recessive genes are poorly understood.
Contribution of inherited factors to the causation of cancer has been previously quantified in modeling studies among twins.11, 14 However, the rarity of twinning has limited this approach, even for the common forms of cancer. In a sample of more than 90,000 Swedish, Danish and Finnish twins, statistically significant effects of heritable factors were observed only for prostate (42%), colorectal (35%) and breast cancer (27%). Similar modeling has recently been applied to a general family setting and the estimated genetic components were 10–18% in colorectal cancer, lung cancer and melanoma15 and 27% in cervical cancer (invasive and in situ combined).16
In our study, inherited genetic factors accounted for 1–53% of the causation of cancer. The highest contribution of heritable factors (53%) to the total variation in liability to the disease was shown for thyroid cancer. Other cancers with a high proportion of genetic factors were those of the endocrine system (28%), testis (25%), breast (25%), cervix (22%, but only 13% for in situ cervical cancer) and melanoma (20%). A low to moderate contribution of hereditary factors was estimated for cancers of the colon (13%), nervous system (13%), rectum (12%), non-Hodgkin lymphoma (10%), lung (8%), urinary bladder (7%) and kidney (7%). The importance of shared genes was found to be very low for stomach cancer (1%) and leukemia (1%). Interestingly, the genetic effect in nervous system cancers was equally high whether or not childhood cancers were included; in leukemia the inclusion of childhood neoplasms decreased the genetic effect (Table II). This is consistent with a previous study in which the familial risks in adult and childhood nervous system tumors were equally high, but in childhood leukemia there was no familial effect.6, 36
Even though candidate genes have been described at many cancer sites, they are likely to explain only part of the familial cancer aggregation. Thyroid and other endocrine cancers, which have shown the highest heritability in our study, are likely to be linked to the RET gene in multiple endocrine neoplasia 2.37, 38 Mutations in the p16 gene have been described in families with melanoma, particularly those with multiple affected individuals.33 More than 100 melanoma kindreds were sequenced for germline p16 mutations in Sweden and only 8% were found to carry a mutation.39 Thus it is likely that other, yet unidentified susceptibility genes are involved in melanoma. Similarly, it has been estimated that germline BRCA1 and BRCA2 mutations each account for some 11% of breast cancer in families with 3 or more affected relatives.40 Most familial cancers, however, encompass only families with 2 affected relatives; thus the proportion attributable to BRCA1 and BRCA2 is lower, 2% of all familial breast cancer.41, 42 The population attributable proportion of the recessive ATM gene to breast cancer is equally large.43
A number of studies have reported associations of HLA class II alleles with susceptibility to cervical cancer.44–46 However, these associations (with different grades of neoplasia or invasive cervical cancer) appear to be relatively weak, and it remains to be determined whether they account for a significant proportion of the heritability of the disease. Equally little is known about how the genetic predisposition to cervical tumors is affected by susceptibility or sensitivity to HPV infection.
Our model also revealed cancer sites with a low to moderate magnitude of heritability; candidate genes for these traits have been previously identified. The genetic basis of colorectal cancer has been characterized for 2 syndromes: polyposis coli, a rare disease involving numerous colonic polyps in which germline mutations in the APC gene have been noted,33 and hereditary nonpolyposis colorectal cancer (HNPCC), in which mutations in the DNA mismatch repair genes have been detected.47 HNPCC may account for 1–3% of all colorectal cancer cases.48, 49 For gastric, renal and bladder cancers, some dominant susceptibility genes have been identified, including E-cadherin, VHL and Rb, but they are rare and explain only a small proportion of familial cancers.37, 38 The risk of nervous system cancer is increased in several known dominant cancer syndromes.37, 38 Much less is known about the genetic basis of lung cancer; common polymorphisms in genes coding for xenobiotic metabolism have been suggested to play a role in susceptibility to lung cancer.50
One possible bias of our study model is the inability to cope with nonadditive (recessive) genetic effects, as these would be completely masked by childhood environmental effects. For example, the high proportion of shared childhood environment found for cancer of the testis could actually be due to a recessive mode of inheritance. Epidemiologic studies and segregation analysis have suggested that family data on testicular cancer fit a recessive or X-linked pattern.51, 52 Another potential limitation of our study is that all similarities among spouses were assumed to stem solely from shared environmental effects after marriage. There is, however, the possibility of assortative mating, which, if present in the cancers discussed here, would lead to an overestimation of the shared environmental parameter and a concomitant underestimation of the genetic influences.
Another limitation of our study is that, except for the cohort analyses that we used for cervical cancer (invasive and in situ), melanoma and lung cancer, the model used cannot take age-dependent genetic effects into account. However, our results correlated well (Spearman correlation coefficient = 0.87; p < 0.001) with previously published risks for offspring of affected parents from the same Database,24 where age is taken into consideration. Therefore, our models seem to identify the magnitude of the familial effects reasonably well. However, the limitations of our study design suggest that the heritability estimates provided by our study are lower limits of the importance of genetic effects. In addition, it is likely that statistical analyses detect genetic effects better for early-onset cancers than for late-onset cancers, where the offspring have not yet had the chance to develop the disease.
We have used structural equation modeling developed for twin studies to study the importance of genetic and environmental effects for cancer susceptibility. Our statistical model provided a good fit to the data from the Database, but whether the model is biologically correct remains to be established.18, 19 Statistically significant estimates of heritability were obtained for all studied cancers except leukemia. These estimates ranged from 1% to 53% and for most cancers clearly exceeded the estimated proportions due to single-gene defects, which may imply the operation of low-penetrance genes or polygenes. Interestingly, the combined estimates for heritability and shared environmental effects correlated highly with the parent-offspring familial effects, even though such a comparison gives no information about how well the absolute values of the estimates from these 2 different models agree. Nonshared environment accounted for the main effect, but shared and childhood environments could also be apportioned. The model and results presented herein will be useful in the interpretation of familial aggregation of cancer.