The Cancer Incidence in Five Continents (CI5) series comprises nine volumes that bring together peer-reviewed results from population-based cancer registries worldwide. The aim of each is to make available comparable data on cancer incidence from as wide a range of geographical locations as possible. In addition, the existence of long time series of data allows the evolution of risk in different populations over time to be studied. The CI5 I–IX database brings together the results from all nine volumes, spanning a period of some 50 years. In addition, unpublished annual data, with more diagnostic detail, are made available for many cancer registries with 15 or more years of recent data. We describe the construction and composition of the CI5 databases, and provide examples of how they can be used to prepare tables and graphs comparing incidence rates between populations. This is the classical role of descriptive statistics: to allow formulation of hypotheses that might explain the observed differences (geographically, over time, in population subgroups) and that can be tested by further study. Such statistics are also essential components in the planning and evaluation of cancer control programmes.
The Cancer Incidence in Five Continents (CI5) series, started in the 1960s, brings together incidence data meeting acceptable quality criteria from population-based cancer registries throughout the world. In the foreword to volume I, the editors explained: ‘The most valuable data [amongst the sources available for the comparative study of cancer incidence], are undoubtedly, the rates obtained by recording the occurrence of every case of cancer over a specified period’.1 The overall objective of the series has been, therefore, to make available comparable data on cancer incidence from as wide a range of geographical locations worldwide as possible. At the same time, no attempt was made at exhaustive inclusion of all eligible data sets—indeed, it was specifically noted that some were excluded, if a particular country was already well represented. The series has continued to develop according to these broad principles—stressing geographic (and ethnic) diversity of the data included, while adhering to various criteria of quality, which have gradually evolved in complexity and stringency. The volumes include tabulations of cancer incidence rates in three basic formats:
registry-specific tables showing incidence rates according to sex, age group and cancer site;
tables of summary rates for each cancer site, permitting comparisons between registries;
tables presenting certain simple indices of the validity and completeness of the different contributions.
A description of each contributing registry is provided, but there is no commentary on the results.
The volumes have been published at approximately 5-year intervals, the first two by the International Union Against Cancer,1, 2 while the next seven were a collaboration between the International Agency for Research on Cancer and the International Association of Cancer Registries.3–9 Volume I of CI5 presented data on 32 registries from 29 countries; the most recent, the ninth, includes data from 225 registries in 60 countries (Table 1). Each volume has seen substantial innovation with a view to providing more information while preserving the basic format. Volume VI was the first to provide the data on computer-readable media (diskette), and volumes VII and VIII were accompanied by CDs containing software allowing extraction and further analysis of the data, including creation of groups of cancer sites or registries, and facilities for performing statistical comparisons between registries, together with standard graphical representations.10, 11
Table 1. Coverage in nine volumes of Cancer Incidence in Five Continents
In 2005, an electronic database (CI5 I–VIII) was made available on CD-ROM.12 This presented all the published data from volumes I through VIII (and allowed those registries that were still functioning to substitute updated and corrected datasets). In addition, a new database was made available—the annual detailed dataset (CI5-ADDS). CI5-ADDS contained data from 83 registries appearing in the most recent volume, with a minimum of 15 consecutive years of data, and allowed their analysis as annual rates. For the vast majority of these, this was possible not only for a limited range of cancer sites (27) defined by codes on the 10th revision of the International Classification of Diseases (ICD-10)13 but also for a detailed list (181 entities) defined both by ICD-10 and, for 15 cancers, by histological subtype. The CI5 I–VIII CD included analytic software, allowing considerable flexibility in analysis and presentation, including creation of combinations of populations and sites, statistical analyses and graphical outputs.
In this article, we describe web-based tools that update and extend these earlier publications and which now include the most recent volume (IX) of CI5.
Material and Methods
CI5 I–IX comprises two public domain websites, CI5 I–IX and CI5plus (available at ci5.iarc.fr).
The database accessible through the CI5 I–IX application contains the data exactly as they were published in the nine volumes of CI5. In order to present these data within the same application, a common cancer dictionary has been defined. Because the data in the first three volumes of the series were tabulated according to the International Classification of Diseases, 7th revision (ICD-7),14 this necessarily forms the basis of the site dictionary, although the equivalent ICD-10 codes are provided (Table 2).
Table 2. Diagnostic entities in the CI5 databases and their corresponding ICD codes
Tables (in PDF format) are available showing some indices of data quality: the percentage of cases microscopically verified (MV%), the percentage of cases registered from a death certificate only (DCO%) and the ratio of the number of deaths to the number of cases registered, expressed as a percentage (MI%). These can be examined by volume, cancer site and sex, for each registry supplying the necessary data in volumes V–IX (in earlier volumes they were presented in a different format).
An online analysis option allows creation of tables of incidence rates for a selected population or cancer. The results can be sorted and presented in PDF format or exported as a text file.
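The three indices are simple ratios to the number of registered cases, and can be recomputed from counts. The following is a minimal sketch; the function name, argument names and counts are hypothetical and are not taken from the CI5 files.

```python
def quality_indices(cases, microscopically_verified, death_certificate_only, deaths):
    """Return MV%, DCO% and MI% for one registry, cancer site and sex.

    All arguments are counts for the same registry, site, sex and period.
    """
    if cases <= 0:
        raise ValueError("the number of registered cases must be positive")
    return {
        "MV%": 100.0 * microscopically_verified / cases,
        "DCO%": 100.0 * death_certificate_only / cases,
        "MI%": 100.0 * deaths / cases,
    }


# Hypothetical counts, for illustration only.
indices = quality_indices(
    cases=400,
    microscopically_verified=360,
    death_certificate_only=8,
    deaths=220,
)
```

With these illustrative counts, 90% of cases are microscopically verified, 2% are registered from a death certificate only, and the mortality to incidence ratio is 55%.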
A download page gives access to the corresponding data (number of cases and rates by sex, 5-year age group and cancer and the corresponding population at risk), by volume, for all contributing registries, as CSV text files. In volumes VII–IX, the standard three-character ICD-10 anatomical sites have been replaced by a set of 252 categories (244 in volume IX) based on a combination of ICD-10 three- or four-character site codes and morphological groups based on codes of the International Classification of Diseases for Oncology (ICD-O).15 The entire contents of volumes VIII and IX are available as PDF files (as well as the age-specific tables that had appeared in printed form in previous volumes).
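As an illustration of how downloaded files of this kind might be used, the sketch below computes age-specific rates per 100,000 from the numbers of cases and the population at risk. The column names, codes and values are assumptions made for the example; the layout of the actual CI5 files should be checked against the documentation supplied with the download.

```python
import csv
import io

# Hypothetical layout and values: the real CI5 files may use different
# column names and coding schemes.
SAMPLE = """registry,sex,cancer,age_group,cases,person_years
R1,1,C16,50-54,12,40000
R1,1,C16,55-59,18,36000
R1,1,C16,60-64,21,30000
"""


def age_specific_rates(csv_text):
    """Map (registry, sex, cancer, age_group) to a rate per 100,000 person-years."""
    rates = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["registry"], row["sex"], row["cancer"], row["age_group"])
        cases = int(row["cases"])
        person_years = float(row["person_years"])
        # A cell without population at risk has no defined rate.
        rates[key] = 1e5 * cases / person_years if person_years > 0 else None
    return rates


rates = age_specific_rates(SAMPLE)
```

To read a file from disk, the same function could be given the result of `open(path, newline="").read()`.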
The CI5plus database contains annual incidence for 101 selected populations from 86 cancer registries published in CI5, for the longest period available (up to 2002), and for 28 major cancer sites—shown in bold type in Table 2. In addition, combined groups of cancer registries in the same country have been added for 11 countries (Canada, China, India, Japan, France, Italy, Poland, Spain, Switzerland, United Kingdom and Australia).
An online analysis option allows creation of tables of incidence rates for a selected population or cancer similar to those of CI5 I–IX, except that it is possible to tabulate summary rates by year (rather than a period corresponding to a particular volume). A graphics option is also available, allowing creation of age-specific incidence curves, or time trends in summary or age-specific rates (by period of diagnosis or birth cohort).
A download page gives access to two databases:
CI5plus, summary: contains the tabulated annual data used in the online application [101 populations, 28 cancer sites (Table 2)].
CI5plus, detailed: contains a subset of the summary database, including incidence data for 88 selected populations for which histological data were available for a minimum of 15 consecutive years (up to 2002). The cancer site dictionary contains 181 diagnostic units, which comprise cancer sites at the third digit level of the ICD-10 categories, and, for some 16 cancer sites, by histological subtype (final column of Table 2).
The ‘main tables’ of CI5 have become a conventional presentation of cancer registry data. They show age-specific rates, by site and sex, together with some summary indices. Table 3 is a typical example, generated with CI5plus (so that any period of time can be selected, rather than those defined in the actual volumes). It shows the rates for males in Goiania (Brazil) for the period 1992–2002, and, as well as the age-specific rates, includes the numbers of cases, crude and age-standardised incidence rates, for 24 sites of cancer.
Table 3. A Cancer Incidence in Five Continents “main table”: Age-specific rates, by site and sex, with crude and age standardized incidence rates, for Goiania (Brazil) 1992–2002
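The crude and age-standardised rates in a main table can be derived from the age-specific numbers of cases and person-years. The sketch below uses direct standardisation with the weights of the world standard population (Segi) for 18 five-year age groups; the choice of this standard is an assumption here, since the text above does not name the standard used, and the function names are hypothetical.

```python
# World standard population (Segi), weights per 100,000 for the age groups
# 0-4, 5-9, ..., 80-84, 85+. Assumed here; not stated in the text above.
WORLD_STANDARD = [12000, 10000, 9000, 9000, 8000, 8000, 6000, 6000, 6000,
                  6000, 5000, 4000, 4000, 3000, 2000, 1000, 500, 500]


def crude_rate(cases, person_years):
    """Crude rate per 100,000: all cases divided by all person-years."""
    return 1e5 * sum(cases) / sum(person_years)


def age_standardised_rate(cases, person_years, weights=WORLD_STANDARD):
    """Directly standardised rate per 100,000.

    Each age-specific rate is weighted by the share of the standard
    population in that age group.
    """
    if not (len(cases) == len(person_years) == len(weights)):
        raise ValueError("cases, person_years and weights must have equal length")
    weighted = sum(w * c / p for c, p, w in zip(cases, person_years, weights))
    return 1e5 * weighted / sum(weights)


# Hypothetical population in which cases occur only at ages 85 and over.
cases = [0] * 17 + [5]
person_years = [1000.0] * 18
crude = crude_rate(cases, person_years)
asr = age_standardised_rate(cases, person_years)
```

In this illustrative population the standardised rate is much lower than the crude rate, because the oldest age group carries little weight in the standard population. A truncated rate for a restricted age range could be obtained by passing only the corresponding slices of the three lists.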
This property of the CI5 databases, providing a repository for the incidence rates of all the registries that have been published, is a valuable resource. The real value of the CI5 project, however, is allowing comparisons between registries. In each volume, this was done using the ‘summary tables’, which show the age-standardised incidence rates for a given period in the different populations. With the CI5 applications, the period of interest can be chosen, as a CI5 volume (in CI5 I–IX) or as any selected period of years, using CI5plus (Table 4). Note that the web application enables this list to be sorted by alphabetical order or according to the incidence rates (the ASR in Table 4). Since both web applications include several entries per registry, a summary table can be produced for a given registry, showing the evolution of incidence rates over time (Table 5).
Table 4. A “summary table”: Age standardized incidence rates of stomach cancer in 2000–2002, in different populations
Table 5. A “summary table”: Incidence rates of breast cancer, at ages 0–74, in females in Denmark, in different time periods
Tables are an excellent method for presenting numerical data (and they illustrate well the values that would be included if the selected data were downloaded by the user). However, time trends are much more vivid, and the possible reasons underlying them more apparent, when they are presented as graphs. The graphic options in the web applications are relatively limited in scope and confined to CI5plus (although users can export any data and use their own selected software). Figure 1a shows trends in age-standardised rates of cancer of the cervix uteri in age group 25–64 years in three English registries combined. A graph including only ‘cervix cancer’ could have been generated with CI5plus, but Figure 1a shows, in addition, trends for three histological subtypes: squamous cell carcinomas, adenocarcinomas and other histological types (either carcinomas or neoplasms, with no further detail as to subtype). Within the overall category, it is clear that the decline in incidence since 1989 (the year that the national screening programme was redesigned16) has been entirely a result of a decline in squamous cell tumours. There has been almost no change in incidence of adenocarcinoma over the same time period. The same phenomenon has been noted elsewhere, for example, in Finland17 and the United States.18 Cervical cytology in the context of an organised screening programme, especially when the cytological specimen is taken from the exocervix using an Ayre's spatula, is successful in detecting preinvasive squamous cell lesions but relatively ineffective in detecting the glandular precursors of adenocarcinoma.19
Figure 1b shows another example—the striking increase in the incidence of cancer of the thyroid in young women in France. Analysis by histological subtype shows that most of this is due to an increase in papillary carcinomas. The fact that the biggest increase in incidence concerns very small tumours supports the hypothesis of the role of medical practice (an increased rate of biopsy of thyroid nodules, together with their careful histological examination) in a context of high prevalence.20 A third example (Fig. 1c) shows the trends in incidence of oesophageal cancer in white males in the nine SEER registries. It clearly illustrates that the increasing incidence of oesophageal cancer is due to adenocarcinomas not squamous cell tumours. The incidence of squamous cell carcinoma in this population has, in fact, been declining. Similar increases in oesophageal adenocarcinoma rates have been observed in many countries.21 The increase has been linked to factors favouring gastro-oesophageal reflux such as the increased prevalence of obesity. The increasing use of a variety of medications that can relax the lower oesophageal sphincter may also have an impact on Barrett's oesophagus and the consequent increased risk of adenocarcinomas.
Figure 2 shows a graph generated by CI5plus of the trends in cancer of the breast by age, in Norway. Because of the relatively small numbers upon which some of the rates are based (individual years, 5-year age groups), the curves have been smoothed using a 3-year moving average of the rates (an option in CI5plus). The moderate increase seen in most age groups in the earlier years gives way, at the ages targeted by screening (50–69 years), to a more rapid rise after screening was introduced: a temporary peak, followed by a fall to a level somewhat above that of the prescreening phase. Mammographic screening started in four counties of Norway in 1995–1996 and was steadily expanded nationwide.22, 23
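A 3-year moving average of annual rates can be sketched as follows. How CI5plus treats the first and last years of a series is not described above, so leaving the end points unsmoothed is an assumption of this example, and the rates are hypothetical.

```python
def moving_average_3(rates):
    """Centred 3-year moving average of a series of annual rates.

    Assumption: the first and last values, which lack one neighbour,
    are returned unchanged.
    """
    smoothed = list(rates)
    for i in range(1, len(rates) - 1):
        smoothed[i] = (rates[i - 1] + rates[i] + rates[i + 1]) / 3.0
    return smoothed


# Hypothetical annual rates per 100,000 for one age group.
annual = [60.0, 66.0, 63.0, 72.0, 69.0]
smoothed = moving_average_3(annual)
```

Averaging rates, as here, gives each year equal weight; pooling cases and person-years over the three years before dividing would be an alternative that weights years by population size.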
Figure 3 shows age-specific liver cancer rates, plotted by period of birth (birth cohort), for males aged 45–79 years in Osaka prefecture, Japan. There is clearly a peak in incidence coinciding with birth between 1930 and 1935. The prevalence of chronic infection with Hepatitis C virus (HCV) among males in Japan is highest in this generation. An epidemic of HCV infection in Japan started in the 1940s, coinciding with an outbreak of parenteral amphetamine use in the devastated society after the Second World War, and the spread was probably amplified by blood transfusions and parenteral medical procedures in the 1950s and 1960s. It ended by the early 1990s at the latest, as evidenced by the very low incidence of HCV infection among repeat blood donors.24
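Plotting by birth cohort relies on the identity that the year of birth is the year of diagnosis minus the age at diagnosis. The sketch below derives the range of possible birth years for a cell of annual data in a 5-year age group and regroups rates into one curve per age group, indexed by central birth year. The function names and the rates are hypothetical.

```python
from collections import defaultdict


def birth_year_range(year, age_lower, width=5):
    """Earliest and latest possible birth year for a case diagnosed in
    `year` at an age between age_lower and age_lower + width - 1."""
    age_upper = age_lower + width - 1
    return year - age_upper - 1, year - age_lower


def rates_by_cohort(cells):
    """Group (year, age_lower, rate) cells into curves keyed by age group.

    Each point is (central birth year, rate), sorted by birth year.
    """
    curves = defaultdict(list)
    for year, age_lower, rate in cells:
        earliest, latest = birth_year_range(year, age_lower)
        curves[age_lower].append(((earliest + latest) / 2.0, rate))
    for points in curves.values():
        points.sort()
    return dict(curves)


# Hypothetical rates per 100,000, for illustration only.
cells = [(1985, 50, 40.0), (1980, 45, 20.0), (1980, 50, 30.0)]
curves = rates_by_cohort(cells)
```

For example, a case diagnosed in 1980 at ages 45–49 was born between 1930 and 1935, so cells for successive age groups in successive periods line up along the same cohort.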
CI5 is a unique repository of statistical information, which can be used for comparative studies between different populations (defined by geography or ethnicity) and over time. The data in the CI5 applications do not consist of individual records of cancer cases. Rather, they are in the form of a matrix that includes the numbers of cases and population at risk by cancer type, sex, age group, population and period. In CI5 I–IX, periods are generally about 5 years in length; in CI5plus, they are 1-year intervals. To preserve confidentiality, data are only made available when the population at risk in any given cell is greater than 500, making identification of individual patients impossible.
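The matrix structure and the confidentiality rule can be pictured with a small sketch. The field names and values are hypothetical; only the dimensions of the matrix and the threshold of 500 are taken from the description above.

```python
from dataclasses import dataclass

# Threshold stated in the text: a cell is released only when its
# population at risk is greater than 500.
MIN_POPULATION_AT_RISK = 500


@dataclass(frozen=True)
class Cell:
    """One cell of the aggregated matrix (hypothetical field names)."""
    population: str
    period: str
    sex: str
    age_group: str
    cancer: str
    cases: int
    population_at_risk: float


def releasable(cells, threshold=MIN_POPULATION_AT_RISK):
    """Keep only cells whose population at risk exceeds the threshold."""
    return [c for c in cells if c.population_at_risk > threshold]


# Hypothetical cells, for illustration only.
cells = [
    Cell("Registry A", "2000", "M", "50-54", "Stomach", 3, 450.0),
    Cell("Registry A", "2000", "M", "55-59", "Stomach", 5, 500.0),
    Cell("Registry A", "2000", "M", "60-64", "Stomach", 7, 1200.0),
]
kept = releasable(cells)
```

Only the third cell is kept, since the comparison is strict and a population at risk of exactly 500 does not meet the stated condition.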
In the study of differences in risk of cancer between populations, or over time, statistics on cancer incidence have several advantages over mortality data. Mortality differences can only provide an accurate proxy for risk differentials if the probability of dying from cancer is constant in the populations being compared; this is unlikely to be the case for most types of cancer, as comparative survival studies show.25, 26 Deciding what condition actually causes death is a complex undertaking,13 especially in older individuals with multiple chronic diseases, and innumerable studies have documented the inaccuracy of cause of death statements (and their coding) in vital statistics.27–29 On the other hand, incidence data from registries are frequently criticised as being ‘incomplete’, as it is always possible for the registration process to fail to identify diagnosed cancers, particularly if they are non-fatal. The whole point of CI5, though, is that the data are of high quality and comparable, as was stressed in the introduction to the very first volume.1 “Incidence” of a cancer is synonymous with its initial diagnosis. If diagnostic methodology changes, especially as a consequence of screening (which may also identify ‘latent’ cancers that might otherwise never have been found), changes in incidence may reflect effectiveness and timing of diagnosis, and not just changes in exposure to risk factors. This observation has led to scepticism concerning the value of registry data.30 However, comparative studies of incidence may have more than a single goal, and the consequences of change in diagnostic methods, or the introduction of screening, may also be of legitimate concern.31
An important feature of the CI5 series is that users can be confident that differences in rates between populations or over time are unlikely to be due to variability in the quality of the data. Although differences over time in registration practices and coding may make it necessary to interpret trends with caution, possible pitfalls of this nature are signalled in the Notes accompanying the numerical data.
The editorial process includes a careful scrutiny of submitted datasets to ensure that they meet objective criteria of completeness and validity. While the aim is to provide maximum geographic coverage, perfectly acceptable datasets from some countries are not included, on the grounds that national data (or a representative sample of it) are already present, and CI5 is not aiming to produce detailed subnational analyses (the task of national investigators). Nevertheless, publication in CI5 is seen as recognition of a cancer registry's maturity and quality, and many use this fact in order to secure funding for their activities; thus, for many countries, multiple contributors are included.
Since volume V, the inclusion of some registries has been qualified by an asterisk (*), which implies that some care is required in interpretation of the results for some or all cancer sites (the reasons for which are provided). This relates principally to registries which the editors considered showed some evidence of questionable quality or completeness of information on cases or the population at risk. The warning is not available for data published in the first four volumes and the user should be aware of possible pitfalls. For example, the data from Singapore in volume I comprised histologically diagnosed cases only and cannot be properly compared to the Singapore data in succeeding volumes.
The level of detail with respect to cancer types has varied over time, especially as a consequence of changes in the revisions of the ICD. The online applications of CI5 make use of a cancer site dictionary based on the 7th revision,14 the classification that was used for the first three volumes of the series and which is the lowest common denominator of the four classifications used in the series. Nevertheless, a few sites are not fully compatible between different versions of the ICD and the study of time trends for these sites should consider possible discrepancies (potential problems are indicated in the Notes page).
CI5plus includes downloadable data with histological subtype for 16 cancers (the choice was for sites where histology is thought to imply differences in etiology). Grouping into these subtypes is based upon the allocated histology code (usually, the morphology codes from ICD-O), and is therefore available only for registries that were able to provide at least 15 years' worth of data coded to ICD-O (or a schema convertible to it). Hence, some very high-quality registries, with very long time series, may be omitted (e.g., Finland). Some care is required when using the detailed database and when comparing incidence rates for histological subtypes of cancers. The values for the specified histological types will be greatly influenced by the proportion of ‘unspecified’ cases at a given site. The user is advised either to select datasets with few cases in the ‘missing’ categories or to make an appropriate adjustment before comparing rates. Thus, in the examples shown in this article, trends in specified histological subtypes are accompanied by a review of those in other, or unspecified types, to ensure that observed trends are not simply the consequence of changes in completeness of ascertainment or coding.
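The form of an appropriate adjustment is not prescribed above. One simple possibility, shown here as a sketch only, is to reallocate the unspecified cases to the specified subtypes in proportion to their observed numbers, which assumes that unspecified cases have the same subtype distribution as specified ones. The function name and counts are hypothetical.

```python
def reallocate_unspecified(counts, unspecified_key="unspecified"):
    """Distribute unspecified cases across specified subtypes pro rata.

    Assumption: unspecified cases share the subtype distribution of the
    specified cases. This is one possible adjustment, not necessarily
    the one intended by the CI5 editors.
    """
    specified = {k: v for k, v in counts.items() if k != unspecified_key}
    total_specified = sum(specified.values())
    if total_specified == 0:
        raise ValueError("no specified cases to reallocate to")
    factor = 1.0 + counts.get(unspecified_key, 0) / total_specified
    return {k: v * factor for k, v in specified.items()}


# Hypothetical case counts for one site, population and period.
observed = {"squamous cell carcinoma": 60, "adenocarcinoma": 20, "unspecified": 20}
adjusted = reallocate_unspecified(observed)
```

The adjusted counts sum to the original total, so rates computed from them remain consistent with the rate for the site as a whole. The adjustment is only as good as its assumption, which is why selecting datasets with few unspecified cases remains the safer option.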
The CI5 database includes data on cancer incidence collected by population-based cancer registries over some 50 years. Such data are fundamental elements of research into causation. Traditionally, observed differences in incidence (geographically, over time, in population subgroups) are used to formulate hypotheses that might explain them and that can be tested by further study. Such statistics are also essential components in the planning and evaluation of cancer control programmes.32 The work of the worldwide network of cancer registries is recognised in this database, and merits gratitude not only from investigators who use the information they provide but also from all those who may benefit from the increase in our knowledge of cancer patterns and trends.
The authors wish to thank all the contributing cancer registries and, in particular, those who agreed to make their results available through CI5plus, at a level of detail not hitherto possible in the printed volumes. The authors also would like to recognise and acknowledge the work of the 22 editors who were involved in collecting, evaluating and preparing the data appearing in one or more of the nine volumes of CI5. Particular thanks are due to the late Sir Richard Doll, who was in large measure responsible for bringing to fruition the idea of creating a source of comparable international data on cancer incidence for the purposes of research into cancer cause and prevention.