Centers for Disease Control and Prevention, National Program of Cancer Registries, Atlanta, Georgia
Cancer Surveillance Branch, Division of Cancer Prevention and Control, National Center for Chronic Disease Prevention and Health Promotion, Centers for Disease Control and Prevention, 4770 Buford Highway, Mailstop K-53, Atlanta, GA 30341-3717
Audit data were collected by cancer registries participating in the National Program of Cancer Registries (NPCR) of the Centers for Disease Control and Prevention (CDC); Annual Program Evaluation Instrument data were collected by CDC-NPCR staff. New York State Department of Health and University of Missouri-Columbia participated under CDC cooperative agreements U55/CCU222012-03 and U55/CCU721904-04, respectively.
The findings and conclusions in this report are those of the authors and do not necessarily represent the views of the CDC.
Issues of case completeness (CC) and data quality within the National Program of Cancer Registries (NPCR)-Cancer Surveillance System (NPCR-CSS) are assessed in part by the NPCR Technical Assistance and Audit Program (NPCR-TAA). In addition, the NPCR Annual Program Evaluation Instrument (NPCR-APEI) provides information about NPCR-supported central cancer registries (CCRs). The current report includes a unique, national-level analysis of NPCR-TAA results linked with NPCR-APEI data and other covariates.
NPCR-TAA results for 34 CCRs were aggregated across diagnosis years 1998 to 2001 for analysis of average CC rates and site-specific data accuracy (DA) rates by covariates obtained from the NPCR-APEI, United States Cancer Statistics (USCS) publications, and the North American Association of Central Cancer Registries (NAACCR) Web site. Site-specific DA rates were calculated for the 13 data elements examined in the audit program. Small-sample Student t tests were used to determine statistically significant differences in covariates (α = .05).
Overall, the average CC and DA rates were 96.4% and 95%, respectively. Both site- and data element-specific DA issues were highlighted. Higher CC and DA rates were observed for CCRs that were staffed with more certified tumor registrars, had supplementary sources reporting, and met USCS publication standards and/or achieved NAACCR certification.
Cancer surveillance is the ongoing, systematic collection, analysis, and interpretation of data essential to the planning, implementation, and evaluation of the public health practices used to address cancer morbidity and mortality.1, 2 Surveillance data are used for various purposes in cancer prevention and control, including quantifying the overall burden of disease, assessing trends in incidence over time among different populations and in different geographic areas, defining priorities and allocating resources, improving quality of care, evaluating prevention and treatment programs, and advancing research.3–5 Because cancer incidence and associated mortality rates in the United States vary significantly by state due to differing population behavior and genetic and demographic make-up,6, 7 the aggregation of standardized, high-quality data from population-based central cancer registries (CCRs) is vital to the cancer surveillance process.8, 9
The Cancer Registries Amendment Act (Public Law 102-515) of 1992 authorized the Centers for Disease Control and Prevention (CDC) to establish the National Program of Cancer Registries (NPCR), which provides resources and guidance to help state health departments establish and enhance CCRs.9–11 The NPCR began funding states in 1994 and currently supports CCRs in 45 states; the District of Columbia; and the territories of Puerto Rico, the Republic of Palau, and the Virgin Islands.
The NPCR-Cancer Surveillance System (CSS) was established in 2000, and in 2001 NPCR-supported CCRs began submitting data (from their NPCR reference year onward) to the NPCR-CSS on patient demographics, tumor characteristics, and treatment.12 Between the NPCR-CSS and the National Cancer Institute's Surveillance, Epidemiology, and End Results (SEER) Program, which collects data on cancer cases diagnosed in the remaining states, cancer incidence data are collected for the entire United States population.12 These data are collected first by hospitals and other cancer diagnostic and treatment facilities and then are reported to CCRs, which edit the data, perform linkages, and consolidate cases before submitting the information to the NPCR-CSS. Upon receipt, the NPCR-CSS evaluates and compares the data with program standards for case completeness (CC), percentage of death-certificate-only (DCO) cases, percentage of unknown/missing values for selected required data elements, and percentage of cases passing critical edits for data quality (DQ) (Table 1). Submission evaluation results are communicated to participating registries in the form of NPCR-CSS Data Evaluation Reports, which include information on quality, completeness, and timeliness of the data.13
Table 1. Data Evaluation Standards—National Program of Cancer Registries-Cancer Surveillance System, United States Cancer Statistics Publication, and North American Association of Central Cancer Registries Gold and Silver Certification
Population-based public health surveillance systems depend on the completeness of case reporting and the accuracy of submitted information.14 In addition to evaluating the NPCR-CSS submissions for both CC and DQ (Table 1), it is important to assess the completeness and accuracy of the data at the level of the facility reporting to the CCR (ie, where patient diagnosis and treatment occurred).15, 16 Consequently, NPCR implemented the NPCR Technical Assistance and Audit Program (TAA), which conducts audits to evaluate both CC and data accuracy (DA) at facilities reporting to the CCRs. While focusing on several common primary cancer sites, the audits use case-finding techniques and the reabstraction of a sample of records from selected facilities. NPCR also has implemented the Annual Program Evaluation Instrument (APEI), a Web-based survey that is completed by all NPCR-supported CCRs to assess whether they meet the NPCR's objectives and to monitor their operations and use of data.11
Many studies have used incidence data from the NPCR-CSS.17–20 To provide further insight regarding the completeness and accuracy of the data, in the current study, we included a first-time analysis of aggregate NPCR-TAA results for the diagnosis years 1998 to 2001 linked with information obtained from the NPCR-APEI and information on whether the CCRs achieved United States Cancer Statistics18 (USCS) publication standards and certification from the North American Association of Central Cancer Registries (NAACCR) (Table 1).
MATERIALS AND METHODS
Because lung and bronchus, colorectal, prostate, and female breast cancers are the most common cancers in the United States and represent >50% of the incident cancer cases and deaths among the United States population,17, 18 the NPCR-TAA selected these primary sites for auditing. Audits were conducted in hospitals, which are the source of the majority of cases in CCRs. Federal hospitals (eg, Department of Defense, Veterans Administration) and children's hospitals were excluded. Because the study analysis represents program evaluation, institutional review board approval was not required.
NPCR-TAA audits follow guidelines established by the CDC and NAACCR. All CCRs that receive NPCR funds are eligible for auditing. The criteria for the selection of CCRs for each audit year varied; for example, CCRs that are newly supported by NPCR may not be audited immediately. CC is assessed by independent case finding for the selected cancer sites in a sample of reporting hospitals. For each CCR, eligible hospitals are ranked from the highest to the lowest caseload in the state and are placed into 1 of 3 caseload strata (low, medium, or high) based on the total number of analytic cases for the selected cancer sites in the audit diagnosis year. Sample hospitals are selected using the probabilities proportional to size method, so that the probability of selecting a hospital is proportional to its caseload. Because the volumes of the 3 strata are not consistent across states, these strata were not used in the current analysis.
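The probabilities-proportional-to-size selection described above can be illustrated with a short sketch. This is a minimal systematic PPS draw, not the NPCR-TAA's actual sampling code; the hospital names and caseloads are hypothetical:

```python
import random

# Hypothetical hospital caseloads (analytic cases for the audited sites).
hospitals = {"Hospital A": 1200, "Hospital B": 450, "Hospital C": 300,
             "Hospital D": 90, "Hospital E": 40}

def pps_sample(caseloads, k, seed=0):
    """Draw k hospitals with probability proportional to caseload.

    Systematic PPS: cumulate the caseloads, lay k equally spaced
    selection points over the total, and select the hospital whose
    cumulative interval contains each point. Larger caseloads are
    proportionally more likely to be hit.
    """
    rng = random.Random(seed)
    names = list(caseloads)
    weights = [caseloads[n] for n in names]
    step = sum(weights) / k
    points = [rng.uniform(0, step) + i * step for i in range(k)]
    selected, cum = [], 0.0
    units = iter(zip(names, weights))
    name, w = next(units)
    for p in points:
        while cum + w < p:          # advance until the interval covers p
            cum += w
            name, w = next(units)
        selected.append(name)
    return selected
```

Note that a hospital whose caseload exceeds the sampling interval can be selected more than once, which is standard behavior for systematic PPS.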
The audit team examines various case-finding data sources in the sample hospitals, including medical records disease indices (MRDI), which are hospital-maintained databases of cases by diagnosis code; pathology reports; radiation therapy logs; cytology reports; autopsy records; and other sources. The level of DA is assessed by reabstracting a sample of the 4 types of cancer cases at each sample hospital and comparing reabstracted values with those originally recorded in the CCR. For each hospital that is selected for a case-finding audit, a fixed sample of 33 cases is selected randomly for reabstraction, resulting in total reabstraction sample sizes of approximately 2000 to 3000 cases per audited diagnosis year (Table 2).
Table 2. Central Cancer Registries Included in the Analysis
No. of CCRs
Diagnosis year of NPCR audit
Case-finding sample size
Reabstraction sample size
Year of NPCR-APEI
CCRs indicates central cancer registries; NPCR, National Program of Cancer Registries; APEI, Annual Program Evaluation Instrument.
Because APEI information from 2002 was not available, information from the 2003 APEI was used.
Results for 34 of 45 NPCR-supported state central registries that had been audited at the time of analysis were included (Table 2). To satisfy confidentiality agreements between the NPCR and the registries it supports, only ORC Macro (a CDC contractor) and CDC staff accessed or analyzed data for this project, and no state was identified individually in the results. All data were analyzed using SAS21 software. Data were aggregated for a descriptive analysis of average CC rates and cancer site-specific DA rates by selected covariates that were obtained from the NPCR-APEI, USCS publications, and information from the NAACCR Web site (Table 3).
NPCR indicates National Program of Cancer Registries; TAA, Technical Assistance and Audit Program; APEI, Annual Program Evaluation Instrument; USCS, United States Cancer Statistics; NAACCR, North American Association of Central Cancer Registries.
Variables are for each state and audit diagnosis year.
Not examined by primary site because of unavailability of denominator data by primary site.
Ratio of full-time equivalent positions to central registry caseload
Ratio of certified tumor registrars to central registry caseload
Supplementary reporting sources report cases
Case-finding audits at reporting facilities
Reabstracting audits at reporting facilities
Annual report issued
Met USCS publication standards (1999–2001 only)
Certified by NAACCR
NAACCR Web site
The NPCR-APEI information that we used described CCR operations at the time these registries collected the audited cases. Because most cases are reported within 24 months of diagnosis, and many of the NPCR-APEI questions inquire about the past 12 or 24 months, information from the NPCR-APEI that was administered 2 years after the audit diagnosis year was used. Because information from 2002 was not available, information from the 2003 NPCR-APEI was used for CCRs that were audited for the diagnosis year 2000 (Table 2).
Responses to the following questions from the NPCR-APEI were included in the analysis: type of NPCR-funding support (enhancement or planning); ratios of full-time equivalent positions (FTEs) and certified tumor registrars (CTRs) to registry caseload; whether supplementary sources report incident cases; whether the CCR conducts case-finding and reabstraction at reporting facilities; and whether the registry issues an annual report of cancer data (Table 3). Information on whether the CCR met USCS publication standards for the audited diagnosis year was obtained from USCS publications. CCR certification for the audited diagnosis year (yes or no) and certification level (gold or silver) were determined from the NAACCR Web site.
Average CC and site-specific DA rates were calculated. CC rates were not examined by primary site, because site-specific denominator data were not available. DA rates also were computed by primary site for each of the 13 data elements examined under the NPCR-TAA: date of birth (DOB), race, sex, state of residence at the time of diagnosis, diagnosis date, sequence number, primary site, subsite, laterality, histology, behavior, grade, and SEER summary stage. Detailed descriptions of these data elements are provided elsewhere.22 Discrepancies for DOB and diagnosis date were categorized as major (ie, discrepancies in the year or >30 days in the month and day) or minor (ie, discrepancies ≤30 days in the month and day). Only major discrepancies were used in calculating overall average DA rates and DA rates by primary site. CC and DA rates were calculated as follows: 1) CC rates (%) = 100 − (no. of missed cases/total no. of cases identified) × 100; 2) DA rates, overall and site-specific (%) = (no. of data elements with no discrepancies/total no. of data elements reabstracted) × 100; and 3) DA rates, audited data elements (%) = (no. of reabstracted cases with no discrepancies on data element/total no. of cases reabstracted) × 100.
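The three rate formulas above translate directly into code. The following sketch applies them to the aggregate counts reported in the Results section (1503 missed cases of 41,521 identified; 123,599 of 130,130 data elements with no discrepancies):

```python
def completeness_rate(missed, total_identified):
    """CC (%) = 100 - (no. of missed cases / total no. identified) x 100."""
    return 100 - (missed / total_identified) * 100

def accuracy_rate(no_discrepancy, total):
    """DA (%) = (elements or cases with no discrepancies / total) x 100.

    The same formula serves both the site-specific rates (denominator =
    total data elements reabstracted) and the data element-specific rates
    (denominator = total cases reabstracted).
    """
    return (no_discrepancy / total) * 100

cc = completeness_rate(missed=1503, total_identified=41_521)
da = accuracy_rate(no_discrepancy=123_599, total=130_130)
print(round(cc, 1), round(da, 1))  # 96.4 95.0
```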
Covariate data (Table 3) were linked by state and diagnosis year to the NPCR-TAA data. Average CC and site-specific DA rates were examined across the covariates. In addition, data element-specific DA rates for each audited site were stratified by whether a CCR had attained NAACCR certification and, if it had, by the type of certification received. Average CC also was examined by level of NAACCR certification. Small-sample Student t tests were used to determine significant differences in average CC and DA rates by covariates using the CCR as the unit of analysis (α = .05). Continuous covariates were dichotomized at the median when possible.
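The median-split comparison described above can be sketched as follows. This illustrates the approach (pooled-variance Student t test after dichotomizing a continuous covariate at its median), not the actual analysis; the registry-level rates and covariate values are made up for illustration:

```python
from statistics import mean, median, stdev
import math

def two_sample_t(a, b):
    """Pooled-variance (Student) two-sample t statistic."""
    na, nb = len(a), len(b)
    sp2 = ((na - 1) * stdev(a) ** 2 + (nb - 1) * stdev(b) ** 2) / (na + nb - 2)
    return (mean(a) - mean(b)) / math.sqrt(sp2 * (1 / na + 1 / nb))

# Hypothetical registry-level DA rates paired with a continuous covariate
# (e.g., CTRs per 10,000 cases); all values are illustrative only.
covariate = [1.1, 1.8, 2.0, 2.3, 2.9, 3.4, 3.8, 4.5]
da_rates  = [93.2, 94.0, 94.5, 95.1, 95.8, 96.0, 96.7, 97.2]

# Dichotomize at the median, as in the analysis.
cut = median(covariate)
low  = [r for c, r in zip(covariate, da_rates) if c <= cut]
high = [r for c, r in zip(covariate, da_rates) if c > cut]

t = two_sample_t(high, low)
# Two-sided critical value for alpha = .05 with df = 6 is 2.447.
print(f"t = {t:.2f}; significant at alpha = .05: {abs(t) > 2.447}")
```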
Across the 34 CCRs that were audited for the diagnosis years 1998 to 2001, in total, 41,521 cancers of the lung and bronchus, colorectum, prostate, and female breast were identified in sample hospitals. Of these cases, 1503 were identified as originally missed by CCRs, resulting in an overall CC rate of 96.4%. Cases of female breast and prostate cancer accounted for 61.3% of all missed cases (Fig. 1); this finding was not unexpected, because the incidence of female breast and prostate cancers is higher than the incidence of lung and bronchus and colorectal cancers.
Most of the missed cases (91%) were identified through a single case-finding source, whereas 8.8% of cases were identified in 2 sources, and 0.2% of cases were identified in ≥3 sources; this distribution generally was similar across primary sites (Table 4). The overwhelming majority of missed cases were identified in MRDI or pathology reports; however, the number of missed cases and the distribution of sources varied by primary site (Fig. 2). In particular, missed lung and bronchus cases were identified through pathology reports less often than missed cases for other sites.
Table 4. Number of Missed Cases by Case-finding Sources and Primary Site of Cancer*
No. of cases (%)
Identified in 1 source
Identified in 2 sources
Identified in 3 sources
Case-finding sources included medical records disease indices, pathology and cytology reports, radiation therapy clinic logs, autopsy records, and other sources.
Lung and bronchus
Colon and rectum
In total, 130,130 data elements (10,010 records × 13 data elements) were examined. After reabstraction, no discrepancies were observed for 123,599 data elements, resulting in an overall accuracy rate of 95%. DA rates for individual data elements varied by both element and primary site. The lowest overall DA rate was observed for SEER summary stage (88.8%), followed by grade (89.4%), subsite (90%), histology (91%), and date of diagnosis (96.3%; major). Remaining data elements all had DA rates ≥97.4%. DA rates for minor discrepancies in diagnosis date and DOB were 88.7% and 99.2%, respectively.
Lung and bronchus
The data element with the lowest percent agreement among lung and bronchus cases was SEER summary stage (83.5%), followed by histology (89.1%), subsite (89.1%), and grade (89.7%) (Fig. 3). DA rates for the remaining data elements all were >95%, and rates for sex, state of residence, DOB, primary site, and behavior approached 100%. No minor discrepancies in diagnosis date were observed in 82% of cases. The overall DA rate for lung and bronchus cases was 93.6%.
Colon and rectum
Like the lung and bronchus cases, SEER summary stage was the data element with the lowest percent agreement for colon and rectum cases (84.3%) (Fig. 3). Other elements with DA rates <95% were histology (88.3%), subsite (93.6%), and grade (93.9%). DA rates for the remaining data elements all were >97%, and rates for sex, state of residence, DOB, and laterality approached 100%. No minor discrepancies in diagnosis date were observed in 90.6% of cases. The overall DA rate for colon and rectum cases was 95.2%.
Prostate
With the exception of DA rates for grade (89.9%) and SEER summary stage (92%) (Fig. 3), DA rates for prostate cancer cases were high, ranging from 95.3% for diagnosis date (major) to 100% for sex, primary site, and subsite. No minor discrepancies in diagnosis date were observed in 94.4% of cases. The overall DA rate for prostate cases was 97.1%.
Female breast
The data elements with the lowest DA rates for female breast cancers were subsite (81.2%), grade (85.9%), histology (89.7%), and SEER summary stage (93.5%) (Fig. 3). DA rates for the remaining data elements were all ≥96.2%, and rates for sex, state of residence, DOB, primary site, and laterality approached 100%. No minor discrepancies in diagnosis date were observed in 88.4% of cases. The overall DA rate for female breast cases was 94.3%.
Several significant findings were obtained in the covariate analysis of average overall CC and site-specific DA rates (Table 5).
Table 5. Overall Case Completeness and Site-specific Data Accuracy Rates by Selected Covariates, 1998–2001
NPCR indicates National Program of Cancer Registries; FTEs, full-time equivalent positions; CTRs, certified tumor registrars; USCS, United States Cancer Statistics; NAACCR, North American Association of Central Cancer Registries.
N < 34 when ≥1 registry did not respond to the question examined.
P value from Student t test.
Type of current funding from NPCR
Ratio of FTEs to caseload
>6.62 FTEs/10,000 cases
≤6.62 FTEs/10,000 cases
Ratio of CTRs to caseload
>2.18 CTRs/10,000 cases
≤2.18 CTRs/10,000 cases
Supplementary sources report cases
Yes (pathology laboratories or radiation therapy centers)
Case-finding audits at reporting facilities
Reabstracting audits at reporting facilities
Annual report issued
Yes (hardcopy or electronic/Web)
Met data standards for publication in USCS (1999, 2000, and 2001 only)
Certified by NAACCR
Yes (gold or silver)
Ratio of FTEs and CTRs to CCR caseload
When average CC and site-specific DA rates were stratified by the ratio of FTEs to the CCR caseload, higher CC and DA rates were observed for the better staffed registries, but the results were not significant (P > .05). When site-specific DA rates were stratified by the ratio of CTRs to the CCR caseload, significantly higher rates were found for the colorectal and prostate sites (P = .03 and P = .04, respectively) for registries that were staffed with a greater number of CTRs, and a similar but nonsignificant pattern (P > .05) was observed across the other primary sites. In addition, a higher CC rate was observed for registries that were staffed with a greater number of CTRs, although the pattern was nonsignificant (P > .05).
Supplementary reporting sources
When average CC rates were stratified by whether the CCR had pathology laboratories or radiation therapy centers that reported to it, a significantly higher rate was observed for registries with either or both facilities reporting (P < .01). In addition, when average site-specific DA rates were stratified by the same variable, a higher although nonsignificant average DA rate (P > .05) was observed across all 4 primary sites for registries that procured information from these sources.
Met USCS publication standards/achieved NAACCR certification
In analyses that were stratified according to whether the CCR met USCS publication standards and whether it had achieved NAACCR certification, although higher CC and average site-specific DA rates were observed for CCRs that had achieved these 2 milestones, only the findings for colon and rectum cancer DA rates were significant (P = .02 for USCS publication; P = .04 for NAACCR certification).
When data element-specific DA rates for each audited primary site were stratified by NAACCR certification, certified CCRs generally had higher accuracy rates than noncertified CCRs for the data elements across the primary sites; the results were significant for diagnosis date in lung and bronchus and colon and rectum cases (P = .04 and P = .03, respectively), SEER summary stage and grade in female breast cases (P = .03 and P = .02, respectively), and subsite in colorectal cases (P = .02; results not shown).
When average CC and data element-specific DA rates were stratified by the type of NAACCR certification, gold-certified CCRs had higher CC rates (P = .04; results not shown). However, although average DA rates generally were higher for gold-certified CCRs, this pattern was consistent across cancer sites for diagnosis date, SEER summary stage, and subsite only. Unexpectedly, gold-certified CCRs had lower average DA rates for prostate cases (P = .04); and a similar but nonsignificant pattern (P > .05) was observed across the other sites.
NPCR-CSS standards for estimated CC in a CCR require that 95% of reportable cancer cases be reported within 24 months of the end of the diagnosis year (Table 1). For the 34 CCRs that were audited for the diagnosis years 1998 to 2001, the average CC rate determined through a case-finding audit was 96.4%, which was higher than the NPCR-CSS CC standard.
An encouraging outcome of the case-finding audit was that 91% of missed cases were identified in a single case-finding source and that the majority of missed cases were identified in the MRDI or in pathology reports. These results indicate that CC rates can be increased from reporting hospitals by assisting hospital staff in routinely accessing these 2 sources to find cases. In 2004, the NPCR launched the Modeling Electronic Reporting Project, NPCR–MERP, an endeavor to develop recommendations and guidelines for the electronic transmission of data from a hospital's electronic health records and other data sources (eg, pathology laboratories) to hospital and state CCRs. The transition from manual processes to automated, electronic reporting may result in more complete, timely, and accurate cancer surveillance data.23
The overall DA rate of 95% indicates that NPCR-CSS data for the primary sites of lung and bronchus, colon and rectum, prostate, and female breast accurately represent what was recorded in the medical records. Although average data element-specific DA rates generally were high, the lower rates found for SEER summary stage, histology, subsite, and grade indicate a need for increased education and continued reinforcement of coding standards. Repeated changes in coding standards for some of these data elements, such as histology (the transition from International Classification of Diseases for Oncology, version 2 [ICD-O-2] to ICD-O-3 and the complex histology coding rules), likely contribute to this need for additional training on their abstraction.
The numerous strengths of the NPCR-CSS (eg, it is comprised of national, population-based data; it is among the largest sources of cases for most cancers; and it provides easily accessible data) make it an invaluable resource to researchers. The high DA rates observed for demographic data elements and primary site lend considerable confidence in incidence rates generated from the data, particularly those published by the USCS. The identified site-specific DQ issues will aid researchers in appropriately interpreting the findings from studies using NPCR-CSS incidence data.
Although the site-specific DA rates observed for some of the data elements (eg, stage for lung and bronchus cancer and subsite for female breast cancer) may warrant attention, all data have inherent errors. If discrepancies occur randomly, then findings tend to be biased toward the null. Consequently, if, for example, differences in stage are observed by race, then the findings may be a conservative estimate of disparities.
Opportunities for further improving CC and DA have been identified from the current study. The case-finding audits found missed cases for inclusion in the registries, and the reabstraction audits identified specific problem areas in need of improvement. Although the NPCR-TAA currently is conducted for selected cancer sites, in the future, the audits will be expanded to include all reportable cancers.
In addition, the current study demonstrates that the linkage of results from the NPCR-TAA with indicators of CCR operations from the NPCR-APEI enables an examination of the association between particular practices of a CCR and higher or lower CC and DA from that registry. The findings from the covariate analysis underscore the importance of CCRs having adequate, well-trained staff; procuring case information from supplemental reporting sources, such as pathology laboratories and radiation therapy centers; and attaining compliance with national data standards. The findings also illustrate the positive effect of these practices on CCR CC and DA.
Among the limitations of the current study is that the results may not be generalizable to all NPCR-supported registries. We compared the states represented by the CCRs that were included in the analysis with all of the states supported by NPCR in 2000 and with the entire United States population in 2000. The states of the CCRs that were included in this study had slightly lower proportions of Hispanics and of individuals with long duration of residency in the United States (>10 years) and a slightly higher proportion of whites than the populations represented by all NPCR-supported registries or the entire United States population (results not shown). However, results were comparable across these 3 groupings with regard to sex, age, education, poverty level, and marital status.
A second limitation is the lack of results for average CC rates by primary site. This limitation has been addressed, because NPCR-TAA audits now retain denominator data by primary site. Although site-specific CC rates could not be calculated in the current study, it is known that issues of CC by primary site exist. A well-documented example is the under-reporting of prostate cancers, because CCRs historically have relied on hospital-based reporting rather than reporting from nonhospital facilities, where a large number of prostate cancers are diagnosed and treated. Although the CC rates reported in this study are not affected by the extent of missed cases from nonhospital reporting sources (audits were conducted in hospitals only), studies need to be conducted on the effect of missing such nonhospital cases on overall CCR CC.
The fact that the NPCR-APEI is based on CCR self-reports also may be considered a study limitation. In addition, dichotomizing continuous covariates at the median resulted in small numbers within some of the covariate categories. Still, the numbers represented true counts and not samples.
In conclusion, results from the summary analysis of NPCR-TAA data 1) underscore the importance and effectiveness of conducting CC and DA audits at reporting hospitals; 2) enable the identification of general and site-specific case-finding and abstracting issues; 3) demonstrate the overall high accuracy and completeness of NPCR-CSS incidence data on cancers of the lung and bronchus, colon and rectum, prostate, and female breast; and 4) provide guidance to users of the data. Results from the covariate analysis show that combining data from the NPCR-TAA and the NPCR-APEI provides additional valuable information that neither program can provide individually, offering a global perspective on how CCR operations affect CC and DQ. Because NPCR funding and technical assistance help CCRs in developing and enhancing effective registry operations—especially in areas such as staffing, training, and monitoring and in improving the completeness and quality of registry data—the current results indicate the positive outcome of NPCR support for a high-quality, statewide, population-based CCR.
We thank the ORC Macro, Inc. staff for the provision of the SAS data file of NPCR-Technical Assistance and Audit Program data for analysis.