The Swedish Family-Cancer Database comprises a total of 11.8 million individuals, covering the Swedish population of the past 100 years. Version VIII of the Database is described in the present article. Cancer cases were retrieved from the Swedish Cancer Registry for the period 1958–2006, including more than 1 million first primary cancers. In the offspring generation, there were 14,000 familial cancers in which a parent had been diagnosed with a concordant (same) cancer, and the number of siblings with concordant cancer was 6,000. From 1993 onwards, histopathological data according to the SNOMED classification were used, which offers advantages for certain cancers, such as breast cancer. Even though this specific morphological classification covers only a limited number of years, it does cover most familial cancers in the offspring generation. The Database records the country of birth of each subject. A total of 1.79 million individuals were foreign born, Finns and other Scandinavians being the largest immigrant groups. The cancer incidence in first-generation immigrants was compared to that in native Swedes using standardised incidence ratios (SIRs) as a measure of relative risk. The SIRs varied widely between the immigrant groups: the ratio of the highest to the lowest SIR ranged from 1.9-fold for myeloma to 25-fold for melanoma. The differences in SIRs were smaller in second-generation immigrants. The usefulness and possible applications of the Family-Cancer Database have grown with increasing numbers of cases, and its numerous applications have been described in some 300 publications. Familial cancer studies lie at the stimulating interface of the flourishing disciplines of genetics and epidemiology.
Studies on familial cancer clustering have been fundamental for the understanding of heritable determinants of cancer and for discovering cancer-related genes.1 Improved understanding of cancer genetics has been helpful in clinical genetic counselling, targeted screening activities and genetic testing.2, 3 In the clinical setting, familial cancer clustering has been studied by identifying probands and multiply affected individuals in their multigeneration pedigrees. Many forms of cancer in which a single gene confers a high risk have been identified, including at least 440 single-gene traits in which cancer is a complication.4–6 Case ascertainment in multigeneration families and the availability of biological specimens are usually feasible in clinical settings. The disadvantages include difficulties in obtaining large numbers of cases and in securing unbiased risk estimates. It has been estimated that clinical observation probably works for dominant diseases with relative risks over 10. For recessive conditions, however, clinical observation is less sensitive, and most results for recessive conditions have come from isolated populations with high rates of consanguineous marriage. Another approach to studying familial cancer has been to analyse the cancer risks of relatives of index cases in analytical epidemiological studies. Although the number of cases may be large in such studies, the information, usually based on recall, tends to be less reliable.
However, when registered data on family relationships are available and when they can be coupled to cancer register data, an unbiased retrieval of familial cases is possible, as has been shown for the Swedish family data by Leu, Reilly and Czene.7, 8 Such population-based datasets have been constructed in some countries and geographical areas, including the Utah Mormon population database and the Icelandic genealogical database.9–11 Denmark, Norway, Finland and Sweden can establish family relationships from multigenerational registers, and these have been used in cancer and other disease studies.12–15
The Swedish Family-Cancer Database was first assembled from the national registers in 1996, and it has since been updated periodically.16 It has been used by us and by others, sometimes under various aliases, in probably some 300 cancer studies. These publications have been the main global source of unbiased data on familial risks. The Database is the largest in the world for familial cancer; here we describe the 2009 update, which is version VIII in the sequence. In addition to the present contents of the Database, we show some unique opportunities that the current data offer in terms of histological specification and immigrant data.
The focus of the present article is on the structure and applications of a population-based family register. Space does not allow a survey of other types of family or pedigree datasets, which are usually based in a clinical setting and which have been the major approach to identifying Mendelian diseases. The advantages of such clinic-based studies are diagnostic accuracy and access to biological material, which enables genetic studies. With the exception of the Icelandic population records used by DeCode, the population-based family databases have contributed only indirectly to genetic studies on human diseases. The ethical framework of many databases has hampered, if not prevented, contact with the registered individuals or access to biological material. Thus, the databases have allowed excellent studies on the genetic epidemiology of diseases but have not allowed access to samples. The success of DeCode in disease genetics demonstrates the lost opportunities for the other population databases and may call into question the ethical motivations for limiting the scientific use of population data.
Structure of the Database version VIII
Statistics Sweden created a family database, the ‘Second Generation Register’, in 1995; it was later renamed the ‘Multigeneration Register’ because it contained more than 2 generations. We have linked this register to the Swedish Cancer Registry (started in 1958) to create the Family-Cancer Database, in which the families are composed of offspring (second generation) born in 1932 and later together with their parents. Statistics Sweden has replaced the personal identifiers with unique serial numbers, so that no individual can be identified in the Database. However, Statistics Sweden retains the codes, and on special focused requests it has been possible to identify individuals, for example, to collect data within the health care system; no contact with the individuals has been allowed. Some of the previous updates of the Database have been reported, and these give more details on the types and organisation of the data.16–19 The Family-Cancer Database is supplemented with longitudinal demographic and socio-economic data from each national census for 1960, 1970, 1980 and 1990; causes of death are obtained from the Causes of Death Register. Although the offspring generation is ageing (all of its members are under 75 years of age), its age structure is such that the median age at diagnosis of cancers is still under 60 years (in the Cancer Registry the median is about 71 years for men and 70 years for women). Thus, we are not fully able to address questions about familial late-onset cancers; in particular, the identification of late-onset affected sibling pairs is limited.
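The linkage described above can be pictured as a join on the pseudonymous serial numbers. The Python sketch below is illustrative only: the record layouts (`Person`, `CancerRecord`) and their field names are hypothetical assumptions, not the actual file formats of Statistics Sweden or the Cancer Registry. It builds offspring-parent families for offspring born in 1932 or later and reports the share of offspring for whom no parent could be linked, the quantity discussed in the next paragraph.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Person:
    # Hypothetical layout of a Multigeneration Register record.
    pid: int                        # pseudonymous serial number, not the real personal ID
    birth_year: int
    mother: Optional[int] = None    # None when no parent link is recorded
    father: Optional[int] = None


@dataclass
class CancerRecord:
    # Hypothetical layout of a Cancer Registry record (first primary cancers only).
    pid: int
    diagnosis_year: int
    icd7: str                       # 4-digit ICD-7 site code


def build_families(people: List[Person], cancers: List[CancerRecord],
                   first_offspring_cohort: int = 1932) -> List[Dict]:
    """Link each offspring (born in or after first_offspring_cohort) to parents and diagnoses."""
    registered = {p.pid for p in people}
    diagnoses: Dict[int, List[CancerRecord]] = {}
    for rec in cancers:
        diagnoses.setdefault(rec.pid, []).append(rec)

    families = []
    for p in people:
        if p.birth_year < first_offspring_cohort:
            continue  # persons born earlier appear only in the parental generation
        parents = [x for x in (p.mother, p.father) if x is not None and x in registered]
        families.append({
            "offspring": p.pid,
            "parents": parents,
            "offspring_cancers": diagnoses.get(p.pid, []),
            "parent_cancers": [r for q in parents for r in diagnoses.get(q, [])],
        })
    return families


def share_without_parents(families: List[Dict]) -> float:
    """Fraction of offspring for whom no parent could be linked."""
    if not families:
        return 0.0
    return sum(1 for f in families if not f["parents"]) / len(families)
```

Keeping the parental generation unrestricted by birth year, while restricting only the offspring cohort, mirrors the asymmetry described in the text between the offspring and parental populations.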
An article from Utah pointed out the fallacy of the current thinking that familial risks affect only relatively young individuals: most familial cancers were diagnosed at ages over 65.20 Even hereditary cancers can be found at a late age; when mutation screening for hereditary non-polyposis colorectal cancer (HNPCC) was extended to older patients, mutation-positive individuals were detected at a median age of 61, compared to the long-held median age of 44.21 Age truncation may bias risk estimates, and this possibility needs to be considered in each analysis. Another limitation has been the lack of parental information for a proportion of offspring, even among those born in Sweden (still 2.1% in Database version VIII). However, as this deficit mainly affects those born in the 1930s, new updates and the complementation work carried out at Statistics Sweden will eventually phase out this already small problem, as has been shown.7, 8
The total number of first invasive cancers is 1.06 million. Table 1 shows the effect of past updates on the numbers of familial cancers in offspring over the period since the Multigeneration Register assumed its current structure, covering offspring born after 1931 and their parents. In earlier versions of the Database, the offspring population was defined as those born after 1940 or after 1934. In Table 1, the numbers of familial cases refer to offspring whose family members were diagnosed with concordant cancers. In a short period of 8 years, the number of offspring of affected parents increased from 5,200 to 14,000 (2.7-fold) and the number of affected siblings increased from 1,400 to 6,000 (4.3-fold). The large number of parental probands is due to the unlimited age range of the parental population: about three times as many cancer cases are recorded in the parental as in the offspring population. Nevertheless, there are increasing possibilities to analyse cancer over 3 generations; for example, for breast cancer close to 1,000 familial cases among second-degree relatives can be identified.
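The familial case counts in Table 1 depend on how concordance between relatives is operationalised, and the exact counting rules are not given here. As a hedged illustration, not the Database's actual algorithm, the Python sketch below counts offspring diagnoses matched by the same site in a parent or in a full sibling. Sibships are defined, as an assumption, by sharing both linked parents (half-siblings are ignored), and the input dictionaries and site labels are hypothetical. The helper `fold_increase` reproduces the 2.7-fold and 4.3-fold figures quoted above.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set, Tuple

Parents = Tuple[Optional[int], Optional[int]]  # (mother_id, father_id); None = no link


def count_familial_cases(offspring_parents: Dict[int, Parents],
                         sites: Dict[int, Set[str]]) -> Tuple[int, int]:
    """Count offspring diagnoses matched by the same site in a parent / in a full sibling.

    Each (offspring, site) pair is counted at most once per relationship type.
    """
    # Full sibships: offspring sharing both linked parents (half-siblings ignored).
    sibships: Dict[Tuple[int, int], List[int]] = defaultdict(list)
    for child, (mother, father) in offspring_parents.items():
        if mother is not None and father is not None:
            sibships[(mother, father)].append(child)

    with_affected_parent = 0
    with_affected_sibling = 0
    for child, (mother, father) in offspring_parents.items():
        child_sites = sites.get(child, set())

        # Union of sites diagnosed in either linked parent.
        parent_sites: Set[str] = set()
        for parent in (mother, father):
            if parent is not None:
                parent_sites |= sites.get(parent, set())
        with_affected_parent += len(child_sites & parent_sites)

        # Union of sites diagnosed in the child's full siblings.
        sibling_sites: Set[str] = set()
        if mother is not None and father is not None:
            for sib in sibships[(mother, father)]:
                if sib != child:
                    sibling_sites |= sites.get(sib, set())
        with_affected_sibling += len(child_sites & sibling_sites)
    return with_affected_parent, with_affected_sibling


def fold_increase(before: float, after: float) -> float:
    """Relative growth of a case count between two Database versions."""
    return after / before
```

For example, `round(fold_increase(5200, 14000), 1)` gives 2.7 and `round(fold_increase(1400, 6000), 1)` gives 4.3, matching the text.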
Table 1. Updates of the Swedish Family-Cancer Database with offspring born after 1931. The numbers of familial cases refer to concordant cancers among family members
Increasing case numbers benefit not only familial risk estimation: updating the longitudinal data also enlarges some other important datasets. The detailed topographical-histopathological SNOMED classification has been used uniformly in the Cancer Registry since 1993. Thus, each update from the Cancer Registry has increased the number of histopathology-specific familial cancers in the Database. Another advantage of the updates is the vast increase in cancers among immigrants who have come to Sweden relatively recently.
A 4-digit diagnostic code according to the 7th revision of the International Classification of Diseases (ICD-7) has been used in the Swedish Cancer Registry since cancer registration started in 1958.22 At the 4-digit level, the ICD-7 code identifies tumour topography. A histological type code (WHO/HS/CANC/24.1 Histology Code, ‘PAD’) has also been used since 1958; it describes the histology in main subgroups, such as adenocarcinoma and squamous cell carcinoma. From 1993 onwards, ICD-O-2/ICD codes with histopathological data according to the Systematized Nomenclature of Medicine (SNOMED, http://snomed.org) have been used. This coding system gives a detailed histology-topography of tumours. Even though the specific morphological classification covers only a limited number of years, it covers most familial cancers in the offspring generation (of the order of 90%), because the offspring population is reaching the high-risk age for cancer.
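Because the coding systems cover different periods, an analysis has to pick the most detailed histology available for each tumour. A minimal Python sketch follows, assuming a hypothetical record layout in which the SNOMED field is empty for diagnoses made before 1993; the field names are illustrative, not the Cancer Registry's, and no real PAD or SNOMED codes are used.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple

SNOMED_START_YEAR = 1993  # SNOMED histology used in the Cancer Registry from 1993 onwards


@dataclass
class TumourRecord:
    # Hypothetical layout; real registry field names differ.
    diagnosis_year: int
    icd7: str                      # 4-digit ICD-7 code, available since 1958
    pad: str                       # WHO/HS/CANC/24.1 ('PAD') histology code, since 1958
    snomed: Optional[str] = None   # detailed histology, only for diagnoses from 1993 onwards


def best_histology(rec: TumourRecord) -> Tuple[str, str]:
    """Return (coding system, code) for the most detailed histology available."""
    if rec.diagnosis_year >= SNOMED_START_YEAR and rec.snomed:
        return ("SNOMED", rec.snomed)
    return ("PAD", rec.pad)  # fall back to the coarser PAD histology


def snomed_coverage(records: Iterable[TumourRecord]) -> float:
    """Fraction of records whose histology can be taken from SNOMED."""
    records = list(records)
    if not records:
        return 0.0
    return sum(best_histology(r)[0] == "SNOMED" for r in records) / len(records)
```

Applied to the familial cases in the offspring generation, a calculation of this kind would yield the sort of coverage figure quoted above; the ICD-7 code is carried along unchanged because it remains the common topographical key across all years.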
We have assessed the usefulness of the SNOMED histology as available from the Cancer Registry. For some sites, this coding system offers no advantage over the PAD codes because no further specification is provided where one type of histology prevails. Examples are colorectal and prostate cancers, for which the overwhelmingly predominant SNOMED histology is ‘adenocarcinoma, not otherwise specified’. Some contrasting examples are collected in Table 2. In lung cancer, the PAD code already gave extensive specification, and the SNOMED codes extended this by separating large cell carcinoma from undifferentiated cancer. Concordance was defined as the percentage of cases with a given SNOMED histology that had the same or a related designation in the PAD histology. Confirmation of the SNOMED codes was close to 100%. The SNOMED codes were much more detailed for breast and thyroid cancers and for melanoma, while for testicular cancer only 1 additional histological type, ‘embryonal carcinoma’, was provided. For all the sites examined, the SNOMED histologies were well confirmed by the PAD histologies.
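The concordance measure defined above is a simple per-histology percentage. The Python sketch below computes it; which PAD codes count as the ‘same or related designation’ for each SNOMED type is an input the analyst must supply, and the labels in the example are placeholders rather than actual codes.

```python
from collections import Counter
from typing import Dict, Iterable, Set, Tuple


def pad_concordance(pairs: Iterable[Tuple[str, str]],
                    compatible: Dict[str, Set[str]]) -> Dict[str, float]:
    """Percentage of cases per SNOMED histology confirmed by the PAD histology.

    pairs: (snomed_code, pad_code) recorded for the same tumour.
    compatible: SNOMED code -> PAD codes regarded as the same or a related designation.
    """
    total: Counter = Counter()
    confirmed: Counter = Counter()
    for snomed, pad in pairs:
        total[snomed] += 1
        if pad in compatible.get(snomed, set()):
            confirmed[snomed] += 1
    return {s: 100.0 * confirmed[s] / total[s] for s in total}
```

For instance, with two of three ‘large cell’ cases coded as ‘undifferentiated’ in PAD and that pairing declared compatible, the function returns about 66.7% for ‘large cell’. A SNOMED type absent from `compatible` simply scores 0%, which makes gaps in the mapping easy to spot.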
Table 2. Concordance for histologies between PAD and SNOMED codes in the Swedish Family-Cancer Database from years 1993–2006
The extent to which histology is an important familial trait remains to be established. Familial risks do vary by histological type23; for example, the risk of concordant nervous system tumours ranges from about 2.0 for glioma to 130 for hemangioblastoma,15 testicular cancers show histology-specific familial risks,24 and, most dramatically, for familial eye cancer, the risk of ocular melanoma is barely significant while the familial risk of retinoblastoma is 1,000.25 Furthermore, many cancer syndromes display a specific set of histologically defined tumours in defined organs, such as hemangioblastoma in von Hippel-Lindau syndrome and retinoblastoma, but whether the histology is pathognomonic may not be obvious. For example, syndromes such as BRCA1/2 for the breast, HNPCC for the colorectum and von Hippel-Lindau for the kidneys show minor specific histological features,1, 26–29 but these are far from replacing mutation testing in accuracy. Environmental factors appear to be able to influence histology, as witnessed by the major shift in lung cancer from squamous cell carcinoma towards adenocarcinoma as the tar content of tobacco changed. Different histologies of lung cancer appear among family members, but the risk of concordant histology exceeds that of discordant histology.30
‘Classical’ cancer studies on immigrants to the USA and Australia have shown that the incidence of common cancers changes to the level of the new host country within 1 or 2 generations.31 These findings were fundamental to the understanding of the environmental aetiology of human cancer.32 Studies in Sweden have shown that second-generation immigrants, those born in Sweden, have already converged to the Swedish cancer incidence.33, 34 However, these results hold specifically for immigrants who arrived from European countries during 1940–70; these immigrants, who typically integrated quickly into Swedish society, came from countries whose indigenous cancer incidence rates were not drastically different from those in Sweden. The cancer experience of the large groups of Balkan and non-European immigrants has not been extensively studied, partly because of the young age of this immigrant population, which arrived mainly after 1970 and is only now entering the risk age for adult cancer. Although most immigrant groups, including the European and Chilean groups, settled in various parts of Sweden and mixed with the native population, the Eastern Mediterranean populations (e.g., Turks, Kurds and Iraqis) tend to live in colonies apart from other groups. Nevertheless, Sweden is an excellent country for studying the cancer experience of immigrants because of its uniform cancer registration and health care system and the large number of immigrants from practically all around the world: of the 11.8 million individuals in the Database, 1.79 million have a foreign background. Both birth country and dates of immigration/emigration are available in the Database. Table 3 shows the numbers of immigrants in the Database by country of origin. Finns and other Scandinavians are the largest immigrant groups, and Finns dominate the numbers of diagnosed cancers (30% of all cancers in immigrants) because of their long median residence period of 26 years in Sweden.
The 30 most populous immigrant groups in Table 3 account for 85% of all immigrants and 93% of cancers in immigrants.
Table 3. Data on immigrants in the Family-Cancer Database and on their cancers from years 1958–2006
Many immigrants have arrived in Sweden as young couples, so that their Sweden-born children carry entirely the genotype of their parents' population of origin.35 Such data led us to conclude that the childhood environment, rather than genotype, is very important in determining the individual's cancer destiny.33, 34 Many immigrant groups originate from countries with no cancer registration, and thus the Swedish data may provide estimates of the indigenous cancer rates. Some surprising findings in previous studies have been, for example, high stomach cancer rates in Romanians, among the world's highest testicular cancer rates in Chileans, high non-Hodgkin's lymphoma rates in Greeks and high thyroid cancer rates in Yugoslav and Turkish women, which may signal truly elevated cancer risks in these countries lacking reliable cancer registration.
In Figure 1, we update the immigrant studies carried out on Database version IV, using data from version VII based on 1.55 million first-generation immigrants.34 Figure 1 shows standardised incidence ratios (SIRs, standardised for age, period and region) for first-generation immigrants compared with native Swedes (SIR for Swedes = 1.0). Large differences (note the logarithmic scale) exist between the immigrant groups and native Swedes, as indicated by the index ‘Highest/lowest’, the ratio of the highest to the lowest SIR. The smallest difference (1.9-fold) was for myeloma and the largest was for melanoma (25-fold; East Asians had a rate of only a few percent of the Swedish rate). For both myeloma and melanoma, native Swedes had the highest incidence (SIR = 1.00). Native Swedes did not have the lowest rate for any cancer. For some immigrant groups, rates far exceeding the Swedish rates were observed, e.g., for liver cancer (‘Other Africans’) and thyroid cancer (Southeast Asians), and Chileans showed high rates not only for testicular cancer but also for stomach cancer.
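SIRs of this kind are obtained by indirect standardisation: the expected number of cases in an immigrant group is the sum, over strata of age, period and region, of the group's person-years multiplied by the native Swedish incidence rate in the same stratum, and the SIR is the observed number divided by the expected number. The Python sketch below shows that calculation together with the ‘Highest/lowest’ index, which is assumed here to include native Swedes at SIR = 1.0 alongside the immigrant groups. The strata and rates in any example are invented for illustration, and confidence intervals, which a real analysis would report, are omitted.

```python
from typing import Dict, Hashable


def expected_cases(person_years: Dict[Hashable, float],
                   reference_rates: Dict[Hashable, float]) -> float:
    """Sum over strata of person-years x native Swedish rate (cases per person-year)."""
    return sum(py * reference_rates[stratum] for stratum, py in person_years.items())


def sir(observed: int, person_years: Dict[Hashable, float],
        reference_rates: Dict[Hashable, float]) -> float:
    """Standardised incidence ratio = observed / expected (1.0 = the Swedish rate)."""
    expected = expected_cases(person_years, reference_rates)
    if expected <= 0:
        raise ValueError("expected number of cases must be positive")
    return observed / expected


def highest_lowest(group_sirs: Dict[str, float], include_swedes: bool = True) -> float:
    """Ratio of the highest to the lowest SIR; native Swedes enter with SIR = 1.0."""
    values = list(group_sirs.values())
    if include_swedes:
        values.append(1.0)
    return max(values) / min(values)
```

With native Swedes at 1.00 and the lowest immigrant group at 0.04, `highest_lowest` returns 25, which is the melanoma pattern described above, where the Swedish rate was the highest.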
Far fewer cases were observed among the second-generation immigrants because the population was smaller, 993,742 individuals, and relatively young compared with the parental generation (Figure 2). In line with our previous publication based on Database version IV, the differences from native Swedes were smaller than in the parental generation. The largest differences (Highest/lowest) were observed for bladder (4.9-fold) and testicular cancer (6.4-fold), but the Swedish rates were intermediate for these cancers. The largest difference between second-generation immigrants and Swedes was 2.8-fold, for nervous system tumours.
The usefulness and possible applications of the Family-Cancer Database have grown with increasing numbers of cases, and it is far beyond the scope of this article to describe the themes of the some 300 papers that have emanated from the Database (or its aliases). Familial risk is a population measure of potential heritability; it will guide gene identification studies and help to evaluate the impact of identified genes in terms of the familial risk they explain.32 Survival in certain cancers has been found to be familial, implying as yet unidentified genes that regulate survival in cancer.36–39 The other major applications of the familial risk data are the provision of clinical genetic risk estimates40 and the advancement of the aetiological understanding of cancer, both of which are briefly discussed below.
The two main problems in the literature on familial cancer before the Swedish Family-Cancer Database were the inaccurate reporting of diagnostic data on family members and the small size of the published studies, which meant that rare cancers were not covered. This inaccuracy, particularly for internal cancers, may severely bias the derived risk estimates, with a tendency towards exaggerated risks because cases overreport and controls underreport cancers in family members.41 Inaccurate risk estimates are unacceptable for clinical genetic counselling.2 Familial cancer has become an issue in oncology clinics because of the success in implementing genetic testing and screening methods for many cancer syndromes. Public awareness of familial risks and the demand for counselling of patients and their family members have increased. A family history is a risk factor for which advice and management may bring both medical and psychosocial benefits. However, in order to provide advice, counsellors and caregivers along the entire medical referral system need to be aware of the true familial risks, particularly for the many cancers that are not covered by current familial risk management guidelines. Whether it is useful to incorporate tumour histology into familial risk algorithms will become clear when histology-specific data become available in the near future.
The global cancer incidence varies extensively. According to Cancer Incidence in Five Continents, the highest overall male incidence rates, about 500/100,000, are recorded among several US black populations.42 The highest female incidence rates, about 300/100,000, are recorded among US white populations. The lowest overall male and female incidence rates are below 100/100,000 in some African and Indian populations.42 However, differences in incidence rates at individual sites are usually much larger than differences in overall incidence rates, because developing countries, with low incidence rates for most cancers, show very high risks for certain cancers, such as liver, oesophageal, stomach and cervical cancers.32, 42 These high-risk cancers are ascribed to microbial infections, nutritional imbalances and toxins, while the reasons for most high-risk cancers in developed countries remain unknown beyond ‘Western lifestyle’ and ‘affluence’.32, 43, 44 Although the overall incidence rates of cancer have increased in developed countries over the past half century, the vast increases in prostate and breast cancers, melanoma and non-Hodgkin lymphoma have been partially offset by decreases in stomach and cervical cancers.45 The reasons for the differing time trends of the various cancers are complex and only partially understood; the increases in prostate and breast cancers, melanoma and non-Hodgkin lymphoma in particular have remained largely unexplained and constitute one of the main enigmas of cancer aetiology.32
Explaining the high cancer incidence in the developed countries is the main challenge to aetiological cancer research, and immigrant studies are one of the most powerful tools for addressing it. The clues from the previous Swedish studies, updated here, suggest that the origins of many cancers lie in the first 2 decades of life; otherwise, it is difficult to explain the large differences in incidence in first-generation immigrants compared with the small differences in their Sweden-born children.33, 34 The caveat so far has been the low number of cancers in second-generation immigrants. Support for the early origins of cancer also comes from studies on spouse correlations in various cancers.46 Spouse correlation has been found only for cancers with strong environmental risk factors, such as lung and stomach cancer, suggesting that the environment shared in adulthood contributes little to the risk of most other cancers.
The first 13 years of the Swedish Family-Cancer Database have been characterised by expansion to a size at which even relatively rare and histology-specific familial risks can be assessed for various family relationships. In the future, it should be possible to define some moderately penetrant familial clusters and to formulate clinically useful algorithms for all major cancers that can be transferred to cancer clinics. Data on immigrants should stimulate the formulation of new aetiological hypotheses and allow them to be tested in a multiethnic environment. Familial cancer lies at the interface of genetics and epidemiology, whereby mutual interactions should enhance aetiological understanding and provide practical benefits for cancer prevention and treatment.
The Family-Cancer Database was created by linking registers maintained at Statistics Sweden and the Swedish Cancer Registry.