A goal of the NCDB is to lower the morbidity and mortality of cancer by providing information regarding cancer management and outcomes. Its major products include 1) hospital benchmark summaries, 2) hospital data edit reports, 3) community and state reports, 4) clinical publications based on the national data, and 5) a Web page (http://www.facs.org). The NCDB annually collects data for all forms of cancer throughout the country, maintaining longitudinal surveillance. These data are based on cases abstracted and computerized by hospital cancer registries. Participating hospitals submit all analytic cases seen at their hospital for a given data year. All submissions of data were voluntary from 1985 through 1995, and hospital participation in the NCDB increased steadily during that period. Beginning with diagnosis year 1996, cancer program approval by the COC required submission of data to the NCDB.
Sources of Data
Seven calls for data have been issued. Mailings routinely were sent to 2100 hospitals (1450 ACoS Approval Programs, 650 others), and to all known central/state registries and software vendors/suppliers. Care has been taken to solicit data from all known computerized hospital cancer registries, both with and without COC approval status. The cumulative data received include 305,566 cases from 683 hospitals for 1985 (34% of estimated U.S. cases), 235,696 cases from 525 hospitals for 1986 (25%), 359,650 cases from 790 hospitals for 1987 (37%), 465,508 cases from 977 hospitals for 1988 (47%), 470,351 cases from 935 hospitals for 1989 (47%), 600,885 cases from 1169 hospitals for 1990 (58%), 512,414 cases from 891 hospitals for 1991 (47%), 640,738 cases from 1167 hospitals for 1992 (57%), 620,748 cases from 1107 hospitals for 1993 (53%), 691,206 cases from 1324 hospitals for 1994 (57%), and 655,627 cases from 1114 hospitals for 1995 (52%), totaling 5,558,389 cases. A total of 1849 hospitals contributed data for at least 1 diagnosis year.
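The yearly figures above can be checked against the stated total with a few lines of arithmetic. The following is a minimal sketch in Python; the variable names are illustrative and are not part of any NCDB specification.

```python
# (cases, hospitals) by diagnosis year, copied from the text above.
SUBMISSIONS = {
    1985: (305_566, 683),
    1986: (235_696, 525),
    1987: (359_650, 790),
    1988: (465_508, 977),
    1989: (470_351, 935),
    1990: (600_885, 1169),
    1991: (512_414, 891),
    1992: (640_738, 1167),
    1993: (620_748, 1107),
    1994: (691_206, 1324),
    1995: (655_627, 1114),
}

# Cases are counted once per diagnosis year, so they can be summed directly.
total_cases = sum(cases for cases, _ in SUBMISSIONS.values())

# Hospitals cannot be summed the same way: a hospital reporting in several
# years is counted once per year here, so this is a count of hospital-years.
hospital_years = sum(hospitals for _, hospitals in SUBMISSIONS.values())

print(total_cases)
print(hospital_years)
```

The hospital-year count is much larger than the 1849 distinct hospitals reported above because most hospitals contributed data for more than one diagnosis year.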
The baseline data items of the NCDB include: 1) patient characteristics (gender, age/date of birth, race/ethnicity, zip code of residence at first diagnosis, admission date, discharge date, class or analytic status); 2) tumor characteristics (primary site, laterality, histology, grade, regional lymph nodes positive/examined, tumor size, general summary stage, cAJCC stage group, and pAJCC stage group); 3) first course of treatment (surgery, radiation, chemotherapy, hormone therapy, biologic modifier, and other, as well as reconstruction); and 4) follow-up (date/type of recurrence, sites of distant metastasis, last contact date, vital status, and tumor status). These data were transmitted to the NCDB following a standard data transfer specification.7 The case data for each patient were coded in the traditional manner by trained cancer registrars in their respective hospitals before being transmitted.4, 8, 9
Duplicate reports of the same person-cancer were identified as cases with exact matches on 5-digit zip code, 8-digit birthdate, gender, and 3-digit primary site code. These duplicates accounted for 4.5% of the original reports and were removed from analysis. The NCDB sample of cases is estimated to include approximately 52% of all incident cancers. Thus, this 4.5% rate in the 52% sample can be projected to be the equivalent of 4.5% divided by 0.52, or approximately 8.7%, in the total population.
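The exact-match rule and the projection can be illustrated with a short sketch. The record layout (dictionary keys `zip`, `birthdate`, `gender`, `site`), the function names, and the choice to keep the first report for each key are assumptions made for illustration; the text does not specify how the NCDB implemented the match.

```python
def match_key(report):
    """Exact-match key: 5-digit zip, 8-digit birthdate, gender, 3-digit site.

    Assumes string fields; the first 3 characters of the site code are
    taken as the 3-digit primary site (an assumption about the coding).
    """
    return (
        report["zip"][:5],
        report["birthdate"],   # assumed to be an 8-digit string, e.g. YYYYMMDD
        report["gender"],
        report["site"][:3],
    )


def drop_duplicates(reports):
    """Keep the first report per key; return (kept reports, number dropped)."""
    seen = set()
    kept = []
    dropped = 0
    for report in reports:
        key = match_key(report)
        if key in seen:
            dropped += 1
        else:
            seen.add(key)
            kept.append(report)
    return kept, dropped


def projected_rate(observed_rate, coverage):
    """Scale a rate observed in a partial sample to the full population."""
    return observed_rate / coverage


# Hypothetical records, for illustration only.
reports = [
    {"zip": "60611", "birthdate": "19300115", "gender": "F", "site": "1749"},
    {"zip": "60611", "birthdate": "19300115", "gender": "F", "site": "1744"},
    {"zip": "60611", "birthdate": "19300115", "gender": "M", "site": "1749"},
]
kept, dropped = drop_duplicates(reports)
print(len(kept), dropped)

# Projection from the text: 4.5% in a 52% sample.
print(round(100 * projected_rate(0.045, 0.52), 1))
```

In this example the first two records share a key because only the first three characters of the site code are compared, so the second is treated as a duplicate.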
The six geographic regions used herein are defined as follows: Northeast: Maine, Vermont, New Hampshire, Massachusetts, Rhode Island, Connecticut, New York, Pennsylvania, and New Jersey; Southeast: Delaware, District of Columbia, Maryland, West Virginia, Virginia, North Carolina, South Carolina, Georgia, and Florida; Midwest: Wisconsin, Michigan, Illinois, Indiana, Ohio, Minnesota, North Dakota, South Dakota, Iowa, Nebraska, Kansas, and Missouri; South: Kentucky, Tennessee, Mississippi, Alabama, Oklahoma, Arkansas, Texas, and Louisiana; Mountain: Montana, Idaho, Wyoming, Nevada, Utah, Colorado, Arizona, and New Mexico; and Pacific: Washington, Oregon, California, Alaska, and Hawaii.
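For analysis, the regional definitions above can be expressed as a lookup table. The sketch below is illustrative only; the names `REGIONS`, `STATE_TO_REGION`, and `region_of` are assumptions and do not come from the NCDB.

```python
# Region membership exactly as listed in the text above.
REGIONS = {
    "Northeast": [
        "Maine", "Vermont", "New Hampshire", "Massachusetts", "Rhode Island",
        "Connecticut", "New York", "Pennsylvania", "New Jersey",
    ],
    "Southeast": [
        "Delaware", "District of Columbia", "Maryland", "West Virginia",
        "Virginia", "North Carolina", "South Carolina", "Georgia", "Florida",
    ],
    "Midwest": [
        "Wisconsin", "Michigan", "Illinois", "Indiana", "Ohio", "Minnesota",
        "North Dakota", "South Dakota", "Iowa", "Nebraska", "Kansas",
        "Missouri",
    ],
    "South": [
        "Kentucky", "Tennessee", "Mississippi", "Alabama", "Oklahoma",
        "Arkansas", "Texas", "Louisiana",
    ],
    "Mountain": [
        "Montana", "Idaho", "Wyoming", "Nevada", "Utah", "Colorado",
        "Arizona", "New Mexico",
    ],
    "Pacific": ["Washington", "Oregon", "California", "Alaska", "Hawaii"],
}

# Invert the table so each state maps to its single region.
STATE_TO_REGION = {
    state: region for region, states in REGIONS.items() for state in states
}


def region_of(state):
    """Return the region for a state name; raises KeyError if unknown."""
    return STATE_TO_REGION[state]


print(region_of("Texas"))
print(len(STATE_TO_REGION))
```

The inverted table has one entry for each of the 50 states plus the District of Columbia, which confirms that every state is assigned to exactly one region.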
Income was inferred for each case from the average family income, as reported to the U.S. Census, of the zip code of residence at the time of first diagnosis.10 Three income groups were defined: a small low income group ($0-19,999; 11.6%), a large middle income group ($20,000-46,999; 77.8%), and a small high income group (≥$47,000; 10.6%).
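The three income bands can be written as a simple classification rule. This is a minimal sketch; the function name and group labels are illustrative, and the input is assumed to be the zip code-level average family income in dollars.

```python
def income_group(avg_family_income):
    """Classify a zip code-level average family income (in dollars).

    Cut points follow the text: $0-19,999 is low, $20,000-46,999 is middle,
    and $47,000 or more is high.
    """
    if avg_family_income < 0:
        raise ValueError("income must be non-negative")
    if avg_family_income < 20_000:
        return "low"
    if avg_family_income < 47_000:
        return "middle"
    return "high"


# Boundary values from the stated ranges.
for income in (19_999, 20_000, 46_999, 47_000):
    print(income, income_group(income))
```

Because income is assigned from the zip code of residence, every case in the same zip code falls into the same group regardless of the individual patient's income.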
Hospitals were grouped into nine categories using a combination of American Hospital Association (AHA) codes11 and the approval categories of the COC.8 First, the NCI-recognized hospitals were identified. Then, government and for-profit hospitals were categorized using AHA codes. The remaining hospitals then were grouped into six approval categories including teaching, large community, medium/small community, other approved, and nonapproved.
An important aspect of the usefulness of the data is the quality of the case reports sent to the NCDB. The accuracy of the NCDB data collected has been discussed previously.12-14 To provide adequate accuracy, we have relied on the hospital cancer committee or its equivalent to supervise the quality control of casefinding and abstracting, on internal reviews of abstracts by registry staff, on hospital-based computer data edits, and on the editing checks of regional or state registries. In addition, we have made editing checks for inconsistent or impossible codes available to assist hospital registrars in correcting their data.15