Abstract

The American Cancer Society (ACS) publishes annual estimates of the numbers of new cancer cases and deaths expected in the current year in the United States; these estimates are commonly used by cancer control planners and the media. The methods used to produce them have changed over the years as incidence data and statistical models have improved. In this paper we present a new method that uses statistical models of cancer incidence that incorporate potential predictors of spatial and temporal variation of cancer occurrence and that account for delay in case reporting; the estimated numbers of cases are then projected ahead 4 years using a piecewise linear (joinpoint) regression method. Based on evidence presented here that the new method produces more accurate estimates of the number of new cancer cases for years and areas for which data are available for comparison, the ACS has elected to use it to estimate the number of new cancer cases in Cancer Facts & Figures 2007 and in Cancer Statistics, 2007.


INTRODUCTION

The American Cancer Society (ACS) has published the estimated number of new cancer cases and deaths in the current year for the total United States and individual states in its annual Cancer Facts & Figures (CFF) publication since 1960 and in Cancer Statistics in this journal since the early 1970s.1,2 These estimates for the current year provide important information for cancer control planners, public policy analysts, and state health departments, who need to know the relative burden of different types of cancer and for different geographic regions.

The method to produce the ACS estimates has been refined as more incidence data have become available and statistical methods have improved. Beginning with the 1998 estimates, the statistical projection methods for cancer cases and deaths were changed from linear projections to an autoregressive quadratic time trend model.3 The projection method for deaths was further changed to a state-space model (SSM) beginning with the 2004 estimates, after a study demonstrated that the SSM produced more accurate predictions than the autoregressive quadratic time trend model.4

In order for the methods now used by the ACS to project accurate estimates of new cases and deaths to the current year, long-term data (8 or more years) must be available for all US states or for a subset of states that are representative of the entire United States. Long-term cancer mortality data exist for all US states since 1933, while long-term incidence data are available since 1975 only from the original registries included in the National Cancer Institute's Surveillance, Epidemiology, and End Results (SEER) Program (SEER9), covering about 10% of the population.5 The ACS method projects the total number of cases in the United States to the current year by a two-step process. First, the annual age-specific rates in the 9 oldest SEER areas are applied to the corresponding age-specific population from 1979 to the most current year for which data are available to estimate the number of new cancer cases diagnosed in each of those years. Then, a quadratic autoregressive time series model is applied to these estimates to project 4 years ahead to produce the projected total number of cases in the current year. State estimates are derived by apportioning the total US case estimates by state, based on the distribution of estimated cancer deaths. Underlying assumptions of this method are that age-specific incidence rates from the combined 9 oldest SEER cancer registries are representative of the US population and that the incidence-to-mortality ratios are constant across all states.
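The rate-application and apportioning steps of the previous method can be illustrated with a minimal sketch. All numbers, age groups, and function names below are hypothetical, and the quadratic autoregressive projection step is deliberately omitted.

```python
def expected_cases(rates_per_100k, population):
    """Apply age-specific incidence rates (per 100,000) to the
    matching age-specific population counts and sum the results."""
    return sum(rates_per_100k[age] * population[age] / 100_000
               for age in rates_per_100k)


def apportion_by_deaths(us_cases, state_deaths):
    """Split a national case total across states in proportion to each
    state's share of estimated cancer deaths. This encodes the assumption
    of a constant incidence-to-mortality ratio across states."""
    total_deaths = sum(state_deaths.values())
    return {state: us_cases * deaths / total_deaths
            for state, deaths in state_deaths.items()}


# Hypothetical inputs, for illustration only.
rates = {"0-49": 50.0, "50-64": 400.0, "65+": 1500.0}
population = {"0-49": 2_000_000, "50-64": 500_000, "65+": 300_000}

us_cases = expected_cases(rates, population)
state_cases = apportion_by_deaths(us_cases, {"A": 300, "B": 100})
```

Because the state split depends only on deaths, any state whose incidence-to-mortality ratio differs from the national ratio will be misestimated, which is the limitation the new method is meant to address.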

Cancer registries have now been established in every state and territory in the United States, and high-quality incidence data are available for several years for most, providing the opportunity to improve the ACS case projections by taking geographic variability of incidence rates into account. However, since only about half of states outside the SEER9 areas have incidence data that have met national criteria of high quality and completeness for 8 or more years,6 with no data available at all for some states, a new method for case projection was developed.

The new method uses statistical models of cancer incidence that incorporate potential predictors and spatial and temporal variation of cancer occurrence and that account for delay in case reporting. This paper describes the new method and compares its case projections for 2007 to those using the existing ACS method. Based on evidence that the new method produces more accurate estimates of the number of new cancer cases for years and areas for which data are available for comparison, the ACS has elected to use it to estimate the number of new cancer cases in CFF 2007 and in Cancer Statistics, 2007.7,8

MATERIALS AND METHODS

The detailed methodology used for the spatial component of the method has been presented elsewhere,9 where it was used to produce state- and county-level maps of estimated cancer incidence in 1999; a simpler ecologic regression model was shown to be a reasonable method for projecting breast cancer incidence rates from SEER areas to the entire United States for a single time period.10 The number of cases for all US counties was estimated based on the association of demographic and lifestyle profiles with cancer occurrence in the 480 SEER counties (SEER17) in 1999 using a hierarchical Poisson regression model (L. W. P., unpublished data, 2006). This method was found to produce accurate estimates of state incidence counts and rates for all cancer sites combined, for the four most frequent cancers, and for all other cancer sites combined.

To validate the proposed methods for estimating the numbers of new cases in 2007, the spatial and temporal components of the method were tested separately. First, the spatial model described above was used to estimate the numbers of new cases in every US state for four major cancer sites (breast, prostate, lung and bronchus, colon and rectum) in each year for which state-specific results were available in the U.S. Cancer Statistics Report (USCS).11–13 USCS reports included the numbers of cases for 25 types of cancer reported by 42 states in 1999 and 2000 and by 44 states in 2001. This test was based on the 17 SEER registries with data available for each test year. Output from this model consisted of the numbers of cases estimated for each state that year; these are either modeled estimates for states that have data or “spatial projections,” ie, estimates for states that have no observed data for a given year, based on data available from other registries. For comparison, the numbers of cases were also estimated for each state and year using the previous ACS method. Results from each method were compared with the observed numbers of cases as published in the USCS reports either by the squared deviation of the total (the square of the estimated minus observed counts, each summed over available states) or by the sum of the squared deviations for each state.
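The two comparison criteria described above behave differently, because state-level errors of opposite sign cancel in the total. A minimal sketch with hypothetical state labels and counts:

```python
def squared_deviation_of_total(estimated, observed):
    """Square of (estimated total minus observed total),
    summing over the states that have observed data."""
    est_total = sum(estimated[state] for state in observed)
    obs_total = sum(observed.values())
    return (est_total - obs_total) ** 2


def sum_of_state_squared_deviations(estimated, observed):
    """Sum over states of (estimated minus observed) squared."""
    return sum((estimated[state] - observed[state]) ** 2
               for state in observed)


# Hypothetical counts: one state is overestimated and one is
# underestimated by the same amount.
observed = {"A": 100, "B": 200}
estimated = {"A": 120, "B": 180}

total_criterion = squared_deviation_of_total(estimated, observed)
state_criterion = sum_of_state_squared_deviations(estimated, observed)
```

In this example the total matches exactly while each state is off by 20 cases, so a method can look accurate on the summed count and still perform poorly state by state.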

As a second step in the validation process, output from the spatial projection model applied to each of a number of years was used to find which temporal projection method was best for projecting incidence counts 4 years ahead in time. This study was based on observed numbers of malignant cases from the SEER registries beginning in 1988 (with varying numbers of registries over time as SEER expanded from SEER9 to SEER17).5 Data from 1988 to 1995 were used to predict the 1999 estimated number of new cases, from 1988 to 1996 to predict 2000, and from 1988 to 1997 to predict 2001.
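The data windows used in this validation follow a simple rule: the last data year is 4 years before the target year. A small sketch (the function and constant names are illustrative, not from the paper):

```python
PROJECTION_HORIZON = 4   # years projected ahead
FIRST_DATA_YEAR = 1988   # first year of SEER data used in the validation


def training_window(target_year, first_year=FIRST_DATA_YEAR,
                    horizon=PROJECTION_HORIZON):
    """Return the data years used to predict `target_year`."""
    return list(range(first_year, target_year - horizon + 1))


# Target years 1999, 2000, and 2001 use data through 1995, 1996, and 1997.
windows = {target: training_window(target) for target in (1999, 2000, 2001)}
```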

Four different methods for temporal projection of model-based estimates were tested: the previous ACS quadratic time series method (PROC FORECAST [PF]), a state-space method (SSM) currently used to project mortality counts ahead in time for CFF,4 a piecewise linear regression method (joinpoint method [JP])14,15 currently used to describe trends in incidence and mortality in many cancer registry reports,16 and a newly proposed semiparametric Dirichlet process method (DIR).17 Each of these methods was used to determine the time trends in the estimated counts across the available data years, then to project the number of cases 4 years ahead. The projected state-specific numbers of cases from each method were compared with the observed numbers of cases as published in the USCS reports on the basis of the sums of squared deviations.

Following the determination of the best spatial models and temporal projection method, the model was extended to incorporate time trends over the data period (L. W. P., unpublished data, 2006). The time trend was modeled as a quadratic function, similar to the previous ACS method, but the temporal effect could vary by geographic region or by county characteristic (eg, time trends could differ in urban and rural counties). The model included extra variation due to correlation of the numbers of cases over time and place (county, state, and region) and an additional term to account for any remaining “overdispersion,” ie, greater than expected variation in Poisson-distributed counts. This model was implemented using SAS PROC GLIMMIX software with its optional spline-based approximation for spatial and temporal autocorrelation18 (also L. W. P., O. Schabenberger, A. Stephens, unpublished data, 2006). One advantage of this more complex spatio-temporal model is that only a single application of the model to data for the entire time span is required, rather than separate applications of the model to each year's data. More importantly, the spatio-temporal model shares information across nearby points of time and place simultaneously to provide the best results.

The spatial projection component of the model, ie, estimation of numbers of new cases in states without observed data, requires good spatial coverage in all regions of the United States, so utilizing data from a large and geographically dispersed portion of the United States was critical. For the 2007 projection, an incidence database covering 1995 to 2003 was obtained through an agreement with the North American Association of Central Cancer Registries (NAACCR). The data source was the response to the NAACCR Call for Data submissions as of December 2005. US cancer registries reporting data to NAACCR participate in the National Cancer Institute's Surveillance, Epidemiology, and End Results (SEER) Program or the Centers for Disease Control and Prevention (CDC) National Program of Cancer Registries (NPCR), or both, and receive support from the state, province, or territory where they are located. Registries for 40 states, the District of Columbia (DC), and the Detroit metropolitan area (Figure 1) met NAACCR registry certification standards as providing complete, accurate, and timely data for at least 3 consecutive years during 1995 to 2003,6 and agreed to release county-level incidence data for this project. Together, these registries cover 86% of the US population, although not every state included in this modeling effort had data for every year.

FIGURE 1. Shaded Areas Indicate Registries Included in 1995 to 2003 Cancer Incidence Database Available From the North American Association of Central Cancer Registries (NAACCR).

Projections were made initially at the county level to capture within-state variation of incidence counts. However, for 19 of the rarest sites included in CFF at the national level, projections were made at the Health Service Area (HSA) level to provide more stable counts for analysis. The HSA units are groups of counties defined on the basis of where county residents aged 65 years and over obtained hospital care, and have been used in other geographic analyses.19,20 Case records without a valid county identifier were randomly assigned to a county in proportion to the distribution of cancer cases with known county of residence in that state.
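The random assignment of cases lacking a valid county identifier can be sketched as weighted sampling. This is an illustration under assumptions: the function name, the fixed seed, and the example counts are hypothetical, and the paper does not describe its implementation.

```python
import random
from collections import Counter


def assign_unknown_counties(known_county_counts, n_unknown, seed=0):
    """Randomly assign `n_unknown` case records to counties, with
    probabilities proportional to the state's cases of known county."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    counties = list(known_county_counts)
    weights = [known_county_counts[county] for county in counties]
    draws = rng.choices(counties, weights=weights, k=n_unknown)
    return Counter(draws)


# Hypothetical state with two counties and 50 records of unknown county.
extra_cases = assign_unknown_counties({"X": 900, "Y": 100}, n_unknown=50)
```

The assigned counts would then be added to the observed county counts before modeling.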

The cancer site was coded according to the SEER Program recodes in the same manner used for previous CFF reports.21 Race was grouped as White, Black, and Other. Although the NAACCR file identifies much finer race categories, the numbers of cases observed among Hispanics and Asian American/Pacific Islanders, for example, were too low in most regions of the United States to permit stratification of individual cases beyond three broad categories. However, the percentages of Hispanics, Asian American/Pacific Islanders, and American Indian/Alaskan Natives in each county were included in the model to capture variations in incidence due to different racial mixes of the population. Age at diagnosis was initially coded to age groups 0 to 4 years, 5 to 14 years, 15 to 24 years, 25 to 34 years, 35 to 44 years, 45 to 54 years, 55 to 64 years, 65 to 74 years, 75 to 84 years, and 85+ years; younger age groups were usually aggregated to ensure adequate numbers of cases in each stratum for analysis, typically age 0 to 34 years, depending on the cancer site. Input to the models consisted of numbers of new cases stratified by site, sex, race, age group, county or HSA of residence, and year of diagnosis rather than individual case records. Similarly stratified populations were obtained from the Census Bureau.22
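Aggregating the younger age groups into a single stratum, as described above, amounts to summing counts across those groups. A minimal sketch, assuming the typical 0 to 34 years grouping (the actual cut point varied by cancer site, and the names here are illustrative):

```python
YOUNG_GROUPS = ("0-4", "5-14", "15-24", "25-34")


def collapse_young_ages(counts_by_age, young=YOUNG_GROUPS, label="0-34"):
    """Merge the listed young age groups into one stratum and
    leave all other age groups unchanged."""
    collapsed = {label: sum(counts_by_age.get(group, 0) for group in young)}
    for group, count in counts_by_age.items():
        if group not in young:
            collapsed[group] = count
    return collapsed


# Hypothetical stratum counts for one site, sex, race, county, and year.
counts = {"0-4": 1, "5-14": 2, "15-24": 3, "25-34": 4, "35-44": 10}
collapsed = collapse_young_ages(counts)
```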

Approximately 35 covariates were considered as potential predictors of incidence in the new models. Only age, sex, race, county of residence, and type of cancer were available for the individual cases. All other predictors were population characteristics for the county or HSA, including measures of income, education, housing, racial distribution, urban/rural status, availability of physicians and cancer screening facilities, health insurance coverage, cigarette smoking, obesity, cancer screening rates, and mortality rates. These covariates were available for every US county from a variety of sources, including the Census Bureau, Area Resource File,23 CDC,24 and the National Center for Health Statistics.25 Behavioral risk factor and screening variables from the CDC Behavioral Risk Factor Surveillance System were calculated as mean proportions at the state level for each year. Differences between each county's calculated proportion and its state value for the aggregated period 1994 to 2003 were also calculated to measure within-state variation of the risk and cancer screening behaviors. Annual values for all other covariates were calculated by linear interpolation between available data years and linear extrapolation to 2003 beyond the last available year.
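The filling of annual covariate values by linear interpolation and extrapolation can be sketched as follows. The function name and example values are hypothetical; extrapolation before the first data year is included only for completeness and is an assumption, since the text mentions extrapolation beyond the last available year only.

```python
def interpolate_covariate(known, year):
    """Return a covariate value for `year` from a dict of
    {data_year: value} with at least two data years, using linear
    interpolation between neighboring data years and linear
    extrapolation from the nearest two data years outside the range."""
    years = sorted(known)
    if year <= years[0]:
        lo, hi = years[0], years[1]
    elif year >= years[-1]:
        lo, hi = years[-2], years[-1]
    else:
        lo = max(y for y in years if y <= year)
        hi = min(y for y in years if y > year)
    slope = (known[hi] - known[lo]) / (hi - lo)
    return known[lo] + slope * (year - lo)


# Hypothetical covariate measured in two census years.
census_values = {1990: 10.0, 2000: 20.0}
value_1995 = interpolate_covariate(census_values, 1995)  # interpolated
value_2003 = interpolate_covariate(census_values, 2003)  # extrapolated
```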

Results of the spatio-temporal models are cancer- and sex-specific smoothed annual estimates for registries that provided data and annual modeled estimates for registries with missing data for each year (1995 to 2003). The assumed spatial and temporal autocorrelation plus covariates included in the model result in a sharing of information across areas that are similar in location, time, and county characteristics. For example, the number of new cases for registries with no input data at all will be estimated using several years of data from neighboring states and from other states and counties with similar sociodemographic and lifestyle profiles; estimated numbers for a registry with a single missing year of data are based on observations from that registry before and after the missing time point, as well as from states that are neighbors or have similar characteristics.

Model estimates were added over age, race, and county to produce state-year-cancer-specific estimates for the time span of the available incidence data. These estimated numbers were then adjusted to account for the delay expected in reporting cancer cases to the registry.26 The numbers of new cases reported to the SEER registries in the most recent data year are on average 3.5% to 4.5% below what they eventually will be after case finding by the registry is complete, but the shortfall can range as high as 21% (for leukemia), depending on the type of cancer and the sex, race, and age of the patient. The delay adjustment modifies the observed numbers more in the most recent reporting years to account for future anticipated corrections to the data.27 To date, delay adjustment estimates have only been developed for the long-running SEER9 registries. However, results from all registries, not just SEER9, were delay adjusted, assuming that these SEER-derived factors hold for the entire United States. As longer incidence time series are available from more registries, more appropriate delay factors can be developed. Although the factors used in this new method are not ideal, without any adjustment at all the number of new cases could falsely appear to be trending downward in the most recent years, impacting the projected trend into the future. The delay-adjusted numbers were then projected ahead to 2007.
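The effect of delay adjustment on a recent trend can be illustrated with a small sketch. The shortfall fractions and counts below are hypothetical and are not the published SEER delay factors; the adjustment is written here as division by one minus the expected shortfall, which is one simple way to express it.

```python
def delay_adjust(reported, shortfall):
    """Inflate a reported count that is expected to be `shortfall`
    (a fraction, eg, 0.04 for 4%) below its eventual complete value."""
    return reported / (1.0 - shortfall)


# Hypothetical reported counts and shortfalls; the shortfall is
# largest in the most recent data year.
reported = {2001: 9_900, 2002: 9_800, 2003: 9_600}
shortfall = {2001: 0.01, 2002: 0.02, 2003: 0.04}

adjusted = {year: delay_adjust(count, shortfall[year])
            for year, count in reported.items()}
```

In this example the reported counts decline from 2001 to 2003, but each adjusted value is approximately 10,000, so the apparent downward trend is entirely a reporting artifact.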

RESULTS

Table 1 and Table 2 present the results of the validation study of the spatial and temporal components of the model, respectively. Table 1 presents the numbers of new cases reported for the 4 most common cancer sites by sex for the sum of all states in USCS for each year from 1999 to 2001, as estimated by the previous ACS method and the new spatial projection model. The spatial model estimates for female breast and male lung cancers were closest to the observed values for all 3 estimated years (1999 to 2001), whereas for prostate cancer, the estimates from the ACS method were the closest to the observed values. Estimates for colon cancer were equally discrepant from the observed values, with the ACS method underestimating and the new method overestimating the reported numbers. Overall, estimates based on the new method were closest to the observed values for 8 of the 18 estimates. However, for state-specific estimates, the new method provided closer estimates to the observed values than the old method, occasionally by a wide margin. For example, the sum of state-specific squared deviations averaged over 1999 to 2001 for female breast cancer was 20 times higher in the ACS method compared with the new method (Table 1).

TABLE 1. Validation of Spatial Model: Observed and Predicted Numbers of New Cases for 1999 to 2001 by Cancer Site and Sex, With Sums of State-specific Squared Deviations From Observed Number of New Cases for Registries Reporting in U.S. Cancer Statistics Report (USCS), Averaged Over 1999, 2000, and 2001

| Sex / Cancer Site / Year | USCS Observed | Estimated Cases, ACS Method | Estimated Cases, Spatial Model | Squared Deviations, ACS Method | Squared Deviations, Spatial Model |
|---|---|---|---|---|---|
| Female | | | | | |
|    Breast | | | | 36,774,000 | 1,817,554* |
|       1999 | 174,912 | 178,478 | 176,045* | | |
|       2000 | 164,895 | 166,989 | 165,721* | | |
|       2001 | 180,038 | 184,321 | 178,833* | | |
|    Lung and bronchus | | | | 5,371,894* | 8,333,344 |
|       1999 | 72,999 | 68,120* | 80,369 | | |
|       2000 | 69,316 | 64,064* | 77,550 | | |
|       2001 | 77,221 | 68,939 | 82,624* | | |
|    Colon and rectum | | | | 3,306,224* | 3,573,888 |
|       1999 | 67,186 | 64,692* | 70,159 | | |
|       2000 | 63,907 | 60,876* | 66,977 | | |
|       2001 | 68,513 | 65,237* | 72,932 | | |
| Male | | | | | |
|    Prostate | | | | 49,125,857 | 21,639,454* |
|       1999 | 173,263 | 181,768* | 183,717 | | |
|       2000 | 164,753 | 177,619* | 179,034 | | |
|       2001 | 183,643 | 194,975* | 195,966 | | |
|    Lung and bronchus | | | | 12,702,367 | 11,499,802* |
|       1999 | 96,468 | 82,596 | 100,557* | | |
|       2000 | 89,489 | 77,397 | 98,826* | | |
|       2001 | 98,687 | 83,728 | 105,589* | | |
|    Colon and rectum | | | | 3,306,695* | 3,443,079 |
|       1999 | 68,166 | 64,868* | 73,074 | | |
|       2000 | 64,967 | 60,967 | 67,537* | | |
|       2001 | 69,822 | 65,314* | 74,432 | | |

Estimated cases are summed over USCS registries. Squared deviations are sums of state-specific squared deviations, averaged over 1999 to 2001.

* Indicates best results: estimated number of cases closest to observed values or lowest average sum of squared deviations.

Source of input to models is incidence data from the National Cancer Institute's Surveillance, Epidemiology, and End Results (SEER) Program registries 1988–2001.5

TABLE 2. Validation of Temporal Projection Method: Observed and Predicted Numbers of New Cases for 2001 by Cancer Site and Sex, With Sums of State-specific Squared Deviations From Observed Number of New Cases for Registries Reporting in U.S. Cancer Statistics Report (USCS), Averaged Over 1999, 2000, and 2001

| Cancer Site | 2001 USCS Observed | Estimated Cases, SSM | Estimated Cases, JP | Estimated Cases, PF | Estimated Cases, DIR | Squared Deviations, SSM | Squared Deviations, JP | Squared Deviations, PF | Squared Deviations, DIR |
|---|---|---|---|---|---|---|---|---|---|
| Female | | | | | | | | | |
|    Breast | 180,038 | 188,673 | 181,154* | 189,430 | 186,987 | 62,039,098 | 7,515,979* | 11,812,999 | 9,426,462 |
|    Lung and bronchus | 77,221 | 69,011 | 89,945 | 44,531 | 80,389* | 137,883,200 | 22,217,849 | 541,501,075 | 17,540,282* |
|    Colon and rectum | 68,513 | 83,447 | 72,586 | 70,468* | 76,833 | 31,986,025 | 2,541,926* | 9,061,901 | 7,334,034 |
|    Non-Hodgkin lymphoma | 22,399 | 28,617 | 23,858* | 27,915 | 27,888 | 8,839,760 | 150,428* | 1,638,051 | 1,833,078 |
|    Melanoma | 18,019 | 22,461 | 16,865* | 21,793 | 20,314 | 7,890,147 | 479,010* | 2,055,440 | 1,562,063 |
|    Esophagus | 2,951 | 2,637 | 3,142 | 3,682 | 2,816* | 270,614 | 23,378* | 145,671 | 98,089 |
| Male | | | | | | | | | |
|    Prostate | 183,643 | 223,541 | 162,521 | 39,000 | 192,557* | 1,334,558,296 | 365,082,240 | 1,648,765,768 | 117,864,930* |
|    Lung and bronchus | 98,687 | 132,850 | 98,854* | 94,658 | 104,606 | 1,204,137,074 | 6,415,843* | 185,851,928 | 84,090,958 |
|    Colon and rectum | 69,822 | 87,012 | 77,278* | 82,779 | 83,933 | 40,230,943 | 9,281,194* | 18,002,668 | 14,532,673 |
|    Non-Hodgkin lymphoma | 24,978 | 23,086 | 25,133* | 23,723 | 24,428 | 6,097,704 | 718,596* | 1,579,681 | 1,041,678 |
|    Melanoma | 26,325 | 28,898 | 24,379 | 22,991 | 27,600* | 54,544,440 | 17,315,126* | 23,280,777 | 19,094,648 |
|    Esophagus | 9,585 | 10,811 | 9,604* | 10,339 | 9,751 | 4,313,903 | 71,510* | 771,490 | 565,223 |
|    Testis | 6,912 | 3,303 | 9,307 | 7,415* | 5,351 | 20,256,415 | 439,004* | 1,761,048 | 646,233 |

Estimated cases are 2001 estimates summed over USCS registries. Squared deviations are sums of state-specific squared deviations, averaged over 1999 to 2001.

SSM = state-space model. JP = joinpoint method. PF = PROC FORECAST; previous ACS quadratic time series method. DIR = semiparametric Dirichlet process method.

* Indicates best results: estimated number of cases closest to observed values or lowest average sum of squared deviations.

Source of input to models is incidence data from the National Cancer Institute's Surveillance, Epidemiology, and End Results (SEER) Program registries 1988–2001.5

For the temporal method validation, non-Hodgkin lymphoma, melanoma, esophageal cancer, and testicular cancer were added to test performance for rarer sites. The spatial projection model was used to estimate the numbers of new cancer cases for each year, beginning in 1988, and these estimates were then used to project the numbers 4 years ahead using 4 temporal projection methods. As shown in Table 2, the JP provided projections closest to the observed total number of new cases for 7 of the 13 comparisons, DIR was best for 4, and PF for 2. Similar results were found for the other 2 years the projections were done (1999 and 2000). The JP was by far the best projection method when state-specific squared errors were compared.

The JP is more flexible than the PF because it fits multiple linear segments to the time series, and thus is more sensitive to sudden changes in trend than the presumed quadratic time trend used by PF. The semiparametric method (DIR) and the SSM apparently require a longer time series than was available in order to project several years ahead and cannot provide state-specific estimates for missing data states. On the basis of this validation study, the JP is the preferred method to project the number of new cases ahead in time, at least until a much longer time series is available for most states. Therefore, the projected numbers of cancer cases in 2007 for each sex/cancer site combination were produced by the following steps:

  • apply the spatio-temporal model to data from registries available from NAACCR for 1995 to 2003 to estimate the numbers of cases by sex, race, age group, and year for every US county;

  • sum these estimates over race, age, and county to calculate estimated numbers by sex and year for every US state and for the total United States, then adjust these results to account for case reporting delay;

  • apply the JP to extrapolate each state's or US time series of the delay-adjusted estimated number of new cases to 2007.
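The final extrapolation step can be illustrated with a simplified piecewise linear fit. This is a sketch under stated assumptions, not the joinpoint method as implemented in the software cited in the paper: it allows exactly one joinpoint, restricts it to observed years, fits counts directly by ordinary least squares, and selects the joinpoint by minimum squared error rather than by a formal test. It uses NumPy, and the input series is hypothetical.

```python
import numpy as np


def fit_one_joinpoint(years, counts, min_pts=3):
    """Fit a continuous two-segment linear trend by least squares,
    trying each interior year as the joinpoint and keeping the best."""
    t = np.asarray(years, dtype=float)
    y = np.asarray(counts, dtype=float)
    candidates = t[min_pts - 1: len(t) - min_pts + 1]
    if len(candidates) == 0:
        raise ValueError("time series too short for a joinpoint fit")
    best = None
    for knot in candidates:
        # Columns: intercept, slope of first segment, change in slope after knot.
        X = np.column_stack([np.ones_like(t), t - t[0],
                             np.maximum(t - knot, 0.0)])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        sse = float(np.sum((X @ coef - y) ** 2))
        if best is None or sse < best[0]:
            best = (sse, knot, coef)
    return best[1], best[2]


def project(years, counts, target_year):
    """Extend the final fitted segment to `target_year`."""
    knot, coef = fit_one_joinpoint(years, counts)
    return (coef[0] + coef[1] * (target_year - years[0])
            + coef[2] * max(target_year - knot, 0.0))


# Hypothetical delay-adjusted series whose slope changes in 1999.
years = list(range(1995, 2004))
counts = [1000, 1100, 1200, 1300, 1400, 1700, 2000, 2300, 2600]
knot, _ = fit_one_joinpoint(years, counts)
projected_2007 = project(years, counts, 2007)
```

Because the projection follows the most recent segment only, a change in trend late in the series carries forward into the projected year, which is the flexibility attributed to the JP in the text above.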

Table 3 presents the results projected for the total United States by cancer site using the new and old methods for 2007. The new method projects that there will be 1,444,913 new cancer cases among men and women in 2007, which is 1.8% higher than the 2007 projection using the previous ACS method. The total numbers of cases estimated by the old and new methods are quite similar over the period of 1995 to 2003, although the faster increase over time estimated by the new method leads to the slightly higher projected number of cases in 2007 (Figure 2). However, there are substantial differences between the two methods in the number of cases projected by site. Among the 4 most common cancer sites, projections from the new method compared with the old method are 15.3% higher for lung cancer, 3.7% higher for colorectal cancer, 5.5% lower for prostate cancer, and 15.2% lower for female breast cancer (Table 3). Cancer site groupings where the estimates of new cases are more than 10% higher than the previous CFF method predicted are oral cavity and pharynx (+11.1%), with a notable increase in pharyngeal cancer (+29.6%); respiratory system cancers (+17.1%), with notable increases in all 3 cancer site categories; urinary system (+11.7%), with a notable increase in cancer of the kidney and renal pelvis (+23.1%); multiple myeloma (+18.4%); and leukemia (+23.4%), with notable excesses in all 4 major subtypes of leukemia. There were 2 cancer site groupings where the estimates of new cases are more than 10% lower than the previous CFF method predicted; these are bones and joints (−11.1%) and female breast cancer (−15.2%), as previously noted. Estimates more than 10% lower were also observed for several relatively uncommon cancer sites (Table 3).
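The percent differences quoted above can be reproduced from the Table 3 counts. In this short check, the function name is illustrative, and the colorectal figure is obtained by summing the colon and rectum rows:

```python
def pct_diff(new, old):
    """Percent difference of the new-method projection relative
    to the previous ACS method projection."""
    return 100.0 * (new - old) / old


all_sites = round(pct_diff(1_444_913, 1_419_000), 1)                   # 1.8
lung = round(pct_diff(213_374, 184_990), 1)                            # 15.3
colorectal = round(pct_diff(112_347 + 41_418, 108_000 + 40_210), 1)    # 3.7
prostate = round(pct_diff(218_885, 231_550), 1)                        # -5.5
breast = round(pct_diff(180_506, 212_950), 1)                          # -15.2
```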

FIGURE 2. Comparison of Estimates and Projections of the Total Number of New Cancer Cases for All Sites, for Males and Females Combined by Year, Produced by the Previous Cancer Facts & Figures (CFF) Method and the New Method of Spatio-temporal Model Estimates Plus Projection Ahead in Time by the Joinpoint Method (JP). Vertical line is shown at midpoint between the 2003 estimate and the 2004 projection. Source of input to models is incidence data from the North American Association of Central Cancer Registries (NAACCR) Call for Data submissions as of December 2005.6

TABLE 3. Comparison of Predicted Number of New Cancer Cases by Method for All Sites, Both Sexes, in 2007

| Cancer Site | New Method | Previous ACS Method | Percent (%) Difference |
|---|---|---|---|
| All sites | 1,444,913 | 1,419,000 | 1.8 |
| Oral cavity and pharynx | 34,346 | 30,920 | 11.1 |
|    Tongue | 9,798 | 9,410 | 4.1 |
|    Mouth | 10,652 | 9,930 | 7.3 |
|    Pharynx | 11,798 | 9,100 | 29.6 |
|    Other oral cavity | 2,099 | 2,480 | −15.4 |
| Digestive system | 271,254 | 265,230 | 2.3 |
|    Esophagus | 15,558 | 14,170 | 9.8 |
|    Stomach | 21,259 | 21,960 | −3.2 |
|    Small intestine | 5,637 | 6,610 | −14.7 |
|    Colon | 112,347 | 108,000 | 4.0 |
|    Rectum | 41,418 | 40,210 | 3.0 |
|    Anus, anal canal, and anorectum | 4,653 | 4,840 | −3.9 |
|    Liver and intrahepatic bile duct | 19,155 | 20,800 | −7.9 |
|    Gallbladder and other biliary | 9,251 | 8,470 | 9.2 |
|    Pancreas | 37,176 | 34,130 | 8.9 |
|    Other digestive organs | 4,800 | 6,040 | −20.5 |
| Respiratory system | 229,393 | 195,840 | 17.1 |
|    Larynx | 11,295 | 8,330 | 35.6 |
|    Lung and bronchus | 213,374 | 184,990 | 15.3 |
|    Other respiratory organs | 4,724 | 2,520 | 87.5 |
| Bones and joints | 2,373 | 2,670 | −11.1 |
| Soft tissue (including heart) | 9,220 | 9,550 | −3.5 |
| Skin (excluding basal and squamous) | 65,062 | 68,420 | −4.9 |
|    Melanoma-skin | 59,944 | 61,850 | −3.1 |
|    Other nonepithelial skin | 5,118 | 6,570 | −22.1 |
| Breast | 180,506 | 212,950 | −15.2 |
| Genital system | 306,388 | 318,940 | −3.9 |
|    Uterine cervix | 11,152 | 10,100 | 10.4 |
|    Uterine corpus | 39,079 | 40,580 | −3.7 |
|    Ovary | 22,433 | 20,550 | 9.2 |
|    Vulva | 3,494 | 4,220 | −17.2 |
|    Vagina and other genital, female | 2,145 | 2,410 | −11.0 |
|    Prostate | 218,885 | 231,550 | −5.5 |
|    Testis | 7,917 | 8,050 | −1.7 |
|    Penis and other genital, male | 1,284 | 1,480 | −13.2 |
| Urinary system | 120,396 | 107,820 | 11.7 |
|    Urinary bladder | 67,154 | 63,710 | 5.4 |
|    Kidney and renal pelvis | 51,190 | 41,590 | 23.1 |
|    Ureter and other urinary organs | 2,053 | 2,520 | −18.5 |
| Eye and orbit | 2,336 | 2,490 | −6.2 |
| Brain and other nervous system | 20,496 | 19,570 | 4.7 |
| Endocrine system | 35,521 | 35,020 | 1.4 |
|    Thyroid | 33,555 | 32,610 | 2.9 |
|    Other endocrine | 1,966 | 2,410 | −18.4 |
| Lymphoma | 71,371 | 68,240 | 4.6 |
|    Hodgkin disease | 8,189 | 7,600 | 7.7 |
|    Non-Hodgkin lymphoma | 63,182 | 60,640 | 4.2 |
| Multiple myeloma | 19,907 | 16,810 | 18.4 |
| Leukemia | 44,236 | 35,840 | 23.4 |
|    Acute lymphocytic leukemia | 5,206 | 4,070 | 27.9 |
|    Chronic lymphocytic leukemia | 15,335 | 11,640 | 31.7 |
|    Acute myeloid leukemia | 13,404 | 11,560 | 16.0 |
|    Chronic myeloid leukemia | 4,572 | 4,050 | 12.9 |
|    Other leukemia | 5,719 | 4,520 | 26.5 |
| Other and unspecified primary sites | 32,108 | 28,690 | 11.9 |

Results were produced by the previous Cancer Facts & Figures (CFF) method (“previous ACS method”) and the new method of spatio-temporal model estimates, plus projection ahead in time by the joinpoint method (JP) (“new method”).

Note the figures presented in Table 3 do not exactly equal those published in Cancer Statistics, 2007 or CFF, where the numbers of new cases are rounded to the nearest 10.

Source of input to models is incidence data from the North American Association of Central Cancer Registries (NAACCR) Call for Data submissions as of December 2005.6

In an effort to understand the reasons for these differences, estimates from the previous and new methods were plotted over the available time span. Using breast cancer as an example, Figure 3 shows that the estimated numbers of cases were fairly close during 1995 to 2003, but that the projected trend from 2003 to 2007 differed between the methods. Separating the data by registry group indicates that there were lower female breast cancer incidence rates in the other cancer registries in the United States compared with the rates in the nine oldest SEER registries (Figure 4). However, during 1999 to 2003, when 40 of the 41 states had observed numbers of cases for comparison, estimates from the new spatio-temporal model matched the observed data well (Figure 5).

FIGURE 3. Comparison of Estimates and Projections of the Total Number of New Cases of Female Breast Cancer by Year Produced by the Previous Cancer Facts & Figures (CFF) Method and the New Method of Spatio-temporal Model Estimates Plus Projection Ahead in Time by the Joinpoint Method (JP). Vertical line is shown at midpoint between the 2003 estimate and the 2004 projection. Source of input to models is incidence data from the North American Association of Central Cancer Registries (NAACCR) Call for Data submissions as of December 2005.6

FIGURE 4. Estimated Age-adjusted Rates of Female Breast Cancer Incidence by Year Produced by the Spatio-temporal Model, According to Source of Data and Whether Data Were Available From the Registry. Source of input to models is incidence data from the North American Association of Central Cancer Registries (NAACCR) Call for Data submissions as of December 2005.6

FIGURE 5. Numbers of New Cases of Breast Cancer Among Women Observed in Registries With Data Included in the NAACCR 1995–2003 File (Circles) Compared With Estimates Produced by the Spatio-temporal Model (Line). Estimates are not adjusted for potential delay in reporting. Data are only shown for 1999 to 2003 when nearly all registries provided data. Source of input to models is incidence data from the North American Association of Central Cancer Registries (NAACCR) Call for Data submissions as of December 2005.6

Plots of observed and estimated numbers of cases for other types of cancer also suggested that the new model fit the observed data well (data not shown). Tables that compare the number of cancer cases projected by state by the old and the new method are available on the Web site (http://www.cancer.org/docroot/stt/stt_0.asp), along with a comparison of model-projected and observed cancer cases for 2003, the most recent year for which observed data are available.

DISCUSSION

The new spatio-temporal method with JP projection 4 years ahead shows a slightly greater increase in the total numbers of cases from 2006 to 2007 than would have been observed using the previous ACS method. The estimated number of cancer cases in 2006 was 1,399,790; the 2007 estimate using the old method is 1,419,000, a 1.4% increase. Thus, the 3.2% increase between the estimates of total new cases published in Cancer Statistics2 and CFF1 in 2006 and the estimate based on the new method in 2007 (1,444,913) reflects in part the discontinuity resulting from introduction of the new method and in part the underlying continuing increase in estimated cases resulting from growth and aging of the population.
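
The percentages quoted above follow directly from the three published totals. As a quick check, the short sketch below recomputes them; the function and variable names are illustrative only, and the third figure (the gap between the two 2007 estimates) is derived here from the quoted totals rather than stated in the text.

```python
def percent_increase(old: float, new: float) -> float:
    """Relative change from `old` to `new`, expressed in percent."""
    return (new - old) / old * 100.0


# Totals quoted in the text.
estimate_2006 = 1_399_790              # published 2006 estimate
estimate_2007_old_method = 1_419_000   # 2007 estimate, previous ACS method
estimate_2007_new_method = 1_444_913   # 2007 estimate, new method

# Year-over-year change under each method.
old_change = percent_increase(estimate_2006, estimate_2007_old_method)
new_change = percent_increase(estimate_2006, estimate_2007_new_method)

# Derived here (not stated in the text): how far apart the two 2007 estimates are.
method_gap = percent_increase(estimate_2007_old_method, estimate_2007_new_method)

print(f"old method: {old_change:.1f}%")   # 1.4%
print(f"new method: {new_change:.1f}%")   # 3.2%
print(f"2007 new vs old: {method_gap:.1f}%")
```

Read this way, roughly 1.4 percentage points of the 3.2% rise correspond to the increase the old method would also have projected, and the remainder reflects the change of method.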

Projections for specific cancer sites vary more substantially than projections of total cases. There are several reasons why the projections from the new method are likely to be more accurate than those from the earlier method:

  • The new model allows for geographic variation in the underlying cancer incidence rates, consistent with observed incidence and mortality patterns, while the previous ACS method assumes that the age-specific incidence rates from the combined nine oldest SEER cancer registries are representative of the US population. Substantial geographic variability in cancer incidence rates has been demonstrated for many cancer sites.13,28

  • The observed incidence data on which the 2007 estimates are based cover 86% of the US population, compared with 10% coverage by SEER9 registries. These expanded data are more representative of the United States; eg, the aggregated SEER9 registries were more urban and affluent and had lower proportions of tobacco users than the United States as a whole, so cancers related to these factors might not be well estimated by the previous ACS method. Another consequence of greater geographic coverage is that fewer missing states need to be estimated.

  • The previous ACS method assumes that the incidence-to-mortality ratios are constant across all states. However, mortality is poorly linked to incidence for less fatal cancers, and survival rates for most cancers have been demonstrated to vary geographically,29 depending on screening patterns, the stage of disease at diagnosis, availability of treatment options, affordability of cancer care, comorbidity, socioeconomic factors, etc.

  • The new spatio-temporal model includes many sociodemographic, medical facility, lifestyle, and cancer screening behavior variables as predictors of incidence; these factors were not considered in the previous ACS method.

  • The new model includes spatial and temporal autocorrelation, ie, the method “borrows information” from other observations close in time and/or space. Thus, it can fill in “holes” in a state's time series, eg, for years before the state became a certified high-quality registry, or fill in “holes” in the map for a year when some states did not report their number of new cases. It has been shown that when autocorrelation is present, it must be accounted for in the model or results can be incorrect.30

  • The new method inflates the reported number of cases to account for expected delays in case reporting. Thus, these new figures should better reflect the actual number of cases in the US population, once case finding is complete.
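
The concern about a constant incidence-to-mortality ratio can be illustrated with a toy calculation. This is a deliberately simplified sketch, not the ACS procedure: it assumes, purely for illustration, that a ratio-based method estimates a state's cases as its death count multiplied by one shared ratio. The state names, death counts, and ratios are all hypothetical.

```python
def cases_from_deaths(deaths: float, im_ratio: float) -> float:
    """Estimate new cases as deaths times an incidence-to-mortality ratio."""
    return deaths * im_ratio


# Hypothetical shared ratio applied to every state.
SHARED_RATIO = 2.0

# Hypothetical states: identical deaths, different underlying ratios
# (eg, because survival differs between the two areas).
states = {
    "State A": {"deaths": 1000, "true_ratio": 1.8},
    "State B": {"deaths": 1000, "true_ratio": 2.4},
}

errors = {}
for name, s in states.items():
    estimated = cases_from_deaths(s["deaths"], SHARED_RATIO)
    actual = cases_from_deaths(s["deaths"], s["true_ratio"])
    # Percent error of the shared-ratio estimate relative to the "true" count.
    errors[name] = (estimated - actual) / actual * 100.0
    print(f"{name}: estimated {estimated:.0f}, actual {actual:.0f}, "
          f"error {errors[name]:+.1f}%")
```

In this made-up example both states receive the same estimate even though their underlying case counts differ, which is the kind of error a model with state-level variation is meant to reduce.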

Like any method for projecting the number of new cancer cases 4 years ahead from observed data, the new method also has some limitations. Not all states and cancer sites are predicted equally well. The accuracy of the model results is dependent on inclusion of a sufficient set of covariates to explain the incidence patterns across the United States. The numbers of new cancer cases can be adequately predicted for most states using the new model, even without observations from them, but the presence of unmeasured risk factors or effective cancer control programs can affect the number of cases in ways that cannot be predicted. For example, a model using data from NAACCR 1995 to 2002, which did not include data from Pennsylvania, substantially underestimated the number of new lung cancer cases among males in Pennsylvania, but when Pennsylvania data were included in the expanded dataset used for the 2007 projections, its predicted count was very close to the observed count for 2003. North Carolina, on the other hand, was well estimated whether or not its observed data were included as input to the model.

Another limitation of the new model as implemented for 2007 is its assumption of a quadratic time trend over the short time span of data (1995 to 2003). Although no evidence was seen for a lack of fit, this assumption may impose a curvature on the time trend that is not present in the observed data and that limits the sensitivity of the model to short-term variations or sudden changes in the trend. In the future, as the time span of the data available from most state registries lengthens, improved time series models can be used.

Inaccurate projections of the numbers of cases to 2007 may result from applying delay-adjustment factors that are based on case finding patterns in SEER registries to all registry data. When additional information on cumulative reporting patterns is available for other areas, more appropriate factors can be used.

Large differences in projections by the old and the new methods for the major cancer sites are of special importance since they have the greatest impact on the cancer burden. The 15.3% increase in estimates of lung cancer cases in the new compared with the old method most likely results from recognized differences in tobacco use patterns between the SEER9 areas and the fuller geographic data set used in the new model. Average annual age-standardized lung cancer incidence rates (1999 to 2003) for males and for females in the 41 states providing input to the new method are 11% (male) and 5% (female) higher than those in the 9 oldest SEER areas used by the old method. Several other smoking-related cancers showed similar patterns.

The greater number of cases projected for leukemia and all of its subtypes appears to be due to the effect of delay adjustment, which was not included in the previous ACS method. Before projection to 2007, model estimates of the number of leukemia cases in 2003 were inflated by 10% for cases under age 45 years, by 21% for age 45 to 64 years, and by 18% for cases over age 64 years, resulting in a 12% greater total number of leukemia cases estimated in 2003 and 23% greater in 2007 by the new method. These factors have been used for several years to adjust SEER incidence rates; for leukemia, the adjustment can produce an apparent increasing trend when the observed rate trend is declining.28 The long estimated delay in case reporting is due to the nature of cancers of the hematopoietic system. Because no surgery is required for diagnosis or treatment of leukemia, many cases are not seen in a hospital, making case finding more difficult for the cancer registry. Also, children and young adults are diagnosed with acute leukemia more often than with chronic leukemia. These younger cases often initially present with a medical crisis and so are identified by a hospital record more often than older cases with chronic disease. Because of the new adjustment for these expected delays in case finding, the number of cases projected by the new method should better reflect the actual number of new leukemia cases.
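
The age-specific inflation described above amounts to multiplying each age group's count by one plus its delay percentage. The sketch below applies the three leukemia percentages quoted in the text to hypothetical reported counts. The counts, the age-group labels, and the function name are assumptions made for illustration, so the resulting overall percentage is not a figure from this study.

```python
# Inflation fractions quoted in the text for leukemia, by age group.
DELAY_INFLATION = {"<45": 0.10, "45-64": 0.21, "65+": 0.18}


def delay_adjust(reported: dict) -> dict:
    """Inflate reported counts by the delay fraction for each age group."""
    return {age: n * (1.0 + DELAY_INFLATION[age]) for age, n in reported.items()}


# Hypothetical reported counts (NOT data from the study).
reported = {"<45": 1_000, "45-64": 2_000, "65+": 5_000}

adjusted = delay_adjust(reported)
total_reported = sum(reported.values())
total_adjusted = sum(adjusted.values())

# The overall inflation is a case-weighted average of the age-specific
# percentages, so it depends on the age mix of the reported cases.
overall_inflation_pct = (total_adjusted / total_reported - 1.0) * 100.0

for age in reported:
    print(f"{age}: {reported[age]} reported -> {adjusted[age]:.0f} adjusted")
print(f"overall inflation: {overall_inflation_pct:.2f}%")
```

Because the overall figure is a weighted average, a population with a different age distribution of cases would show a different total inflation from the same three factors.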

For breast cancer, the reasons for the 15.2% decrease in projected cases for 2007 using the new compared with the old method may be somewhat more complex. Age-adjusted rates in SEER9 registries, which are the basis for the previous ACS method, were about 6% higher than similarly adjusted rates in the geographic areas used for the spatio-temporal model (Figure 4), suggesting that use of an expanded registry database is at least partly responsible for the lower projected number of breast cancer cases. Another factor that may contribute to the differences is the uncertainty in projecting ahead in time when the underlying incidence trends appear to be changing. Trends in breast cancer incidence rates in most geographic areas used as input to the spatio-temporal model have shown a recent stabilization, possibly even a downturn, after increasing for several years.16 These changes have been modeled differently by the methods used to project numbers of cases to 2007 (Figure 3), and at the present time it is unclear which method is more accurate. However, for 1999 to 2003, the observed numbers of new cases in the geographic areas whose incidence data were used in the spatio-temporal model were well fit by the new model (Figure 5).

The lower case estimate (by 5.5%) for prostate cancer by the new method is in part due to differences in prostate cancer incidence rates between the regions covered by the 2 methods. Average annual age-standardized prostate cancer incidence rates for 1999 to 2003 in the 41 states providing input to the new method are 8.8% lower than those in the 9 oldest SEER areas used by the old method, perhaps reflecting regional differences in utilization of prostate-specific antigen testing.

Despite some limitations, the new spatio-temporal model plus JP regression for temporal extrapolation appears to provide improved estimates of the numbers of new cases, both for individual states and for the nation, even for the less common cancers. Based on these results, the ACS has decided to use this method to project the number of new cases for CFF 2007.

References

  • 1
    American Cancer Society. Cancer Facts & Figures 2006. Atlanta, GA: American Cancer Society; 2006.
  • 2
    Jemal A, Siegel R, Ward E, et al. Cancer statistics, 2006. CA Cancer J Clin 2006; 56:106–130.
  • 3
    Wingo PA, Landis S, Parker S, et al. Using cancer registry and vital statistics data to estimate the number of new cancer cases and deaths in the United States for upcoming year. J Reg Management 1998; 25:43–51.
  • 4
    Tiwari RC, Ghosh K, Jemal A, et al. A new method of predicting US and state-level cancer mortality counts for the current calendar year. CA Cancer J Clin 2004; 54:30–40.
  • 5
    National Cancer Institute, U.S. National Institutes of Health. Surveillance, Epidemiology, and End Results (SEER) Program (http://www.seer.cancer.gov) SEER*Stat Database: Incidence-SEER 18 Regs Public Use.
  • 6
    North American Association of Central Cancer Registries. NAACCR Incidence—CiNA Analytic File, 1995–2002.
  • 7
    Jemal A. Cancer Facts & Figures 2007. Atlanta, GA: American Cancer Society; 2007. In press.
  • 8
    Jemal A. Cancer statistics, 2007. CA Cancer J Clin 2007; 57:43–66.
  • 9
    Pickle LW, Feuer EJ, Edwards BK. U.S. predicted cancer incidence, 1999: complete maps by county and state from spatial projection models. NCI Cancer Surveillance Monograph Series, number 5. Bethesda, MD: National Cancer Institute; 2003.
  • 10
    Frey CM, Feuer EJ, Timmel MJ. Projection of incidence rates to a larger population using ecologic variables. Stat Med 1994; 13:1755–1770.
  • 11
    United States Cancer Statistics Working Group. United States Cancer Statistics: 1999 Incidence. Atlanta, GA: Department of Health and Human Services, Centers for Disease Control and Prevention, and National Cancer Institute; 2002.
  • 12
    United States Cancer Statistics Working Group. United States Cancer Statistics: 2000 Incidence. Atlanta, GA: Department of Health and Human Services, Centers for Disease Control and Prevention, and National Cancer Institute; 2003.
  • 13
    United States Cancer Statistics Working Group. United States Cancer Statistics: 2001 Incidence and Mortality. Atlanta, GA: Department of Health and Human Services, Centers for Disease Control and Prevention, and National Cancer Institute; 2004.
  • 14
    Kim HJ, Fay MP, Feuer EJ, Midthune DN. Permutation tests for joinpoint regression with applications to cancer rates. Stat Med 2000; 19:335–351.
  • 15
    Joinpoint Regression Program [computer program]. Version 3.0. Washington, DC: National Cancer Institute, U.S. National Institutes of Health; 2005.
  • 16
    Howe HL, Wu X, Ries LA, et al. Annual report to the nation on the status of cancer, 1975–2003, featuring cancer among U.S. Hispanic/Latino populations. Cancer 2006; 107:1711–1742.
  • 17
    Ghosh K, Tiwari RC. Prediction of U.S. cancer mortality counts using semiparametric Bayesian techniques. J Am Stat Assoc. In press.
  • 18
    SAS Institute. SAS/STAT GLIMMIX Procedure. June 2006. Available at: http://www.sas.com/apps/demosdownloads/sasstatglimmix_PROD_sysdep.jsp?packageID=000353. Accessed August 3, 2006.
  • 19
    Makuc DM, Haglund B, Ingram DD, et al. Health service areas for the United States. Hyattsville, MD: National Center for Health Statistics, Vital and Health Statistics; 1991.
  • 20
    Pickle LW, Mungiole M, Jones GK, White AA. Atlas of United States Mortality. Hyattsville, MD: National Center for Health Statistics; 1996.
  • 21
    National Cancer Institute, U.S. National Institutes of Health. Surveillance, Epidemiology, and End Results (SEER) Program. SEER Cause of Death Recode. Available at: http://seer.cancer.gov/codrecode/. Accessed November 7, 2006.
  • 22
    National Cancer Institute, U.S. National Institutes of Health. Surveillance, Epidemiology, and End Results (SEER) Program. Population Estimates Used in NCI's SEER*Stat Software. Available at: http://seer.cancer.gov/popdata/methods. Accessed November 7, 2006.
  • 23
    Office of Research and Planning, Bureau of Health Professions, Health Resources and Services Administration. Area Resource File, 1999. Available at: http://bhpr.hrsa.gov/healthworkforce/data/arf.htm. Accessed November 7, 2006.
  • 24
    United States Department of Health and Human Services, Centers for Disease Control and Prevention, National Center for Chronic Disease Prevention and Health Promotion, Division of Adult and Community Health. Behavioral Risk Factor Surveillance System, 2003. Available at: http://www.cdc.gov/brfss/technical_infodata/index.htm. Accessed November 7, 2006.
  • 25
    United States Department of Health and Human Services, Centers for Disease Control and Prevention, National Center for Health Statistics. Mortality Data from the National Vital Statistics System. Available at: http://www.cdc.gov/nchs/deaths.htm. Accessed November 7, 2006.
  • 26
    Clegg L, Feuer EJ, Midthune DN, et al. Impact of reporting delay and reporting error on cancer incidence rates and trends. J Natl Cancer Inst 2002; 94:1537–1545.
  • 27
    National Cancer Institute, U.S. National Institutes of Health. Cancer Query Systems: Delay-Adjusted SEER Incidence Rates. Available at: http://srab.cancer.gov/delay/canques. Accessed November 7, 2006.
  • 28
    National Cancer Institute. SEER Cancer Statistics Review, 1975-2003. Available at: http://seer.cancer.gov/csr/1975_2003/results_merged/sect_04_breast.pdf. Accessed November 7, 2006.
  • 29
    Huang L, Pickle LW, Stinchcomb DG, Feuer EJ. Spatial survival clusters of patients diagnosed with lung cancer and late stage colorectal cancer in California. Epidemiology. In press.
  • 30
    Waller LA, Gotway CA. Applied Spatial Statistics for Public Health Data. Hoboken, NJ: John Wiley & Sons, Inc.; 2004.