Regional diversity and breadth of the National Cancer Data Base




The National Cancer Data Base (NCDB), a joint project of the Commission on Cancer of the American College of Surgeons and the American Cancer Society, is a cancer management and outcomes data base for health care organizations. It provides a comparative summary of patient care that is used by participating hospitals and communities for self-assessment. This article describes the most current (1995) data.


Since 1989, 7 calls for data have been issued, yielding a total of 5,558,389 cancer patient reports for the years 1985-1995. A total of 1849 hospital cancer registries have participated in at least 1 of the calls for data.


One thousand one hundred and fourteen hospitals from 50 states and the District of Columbia reported 655,627 cases for the diagnosis year 1995. The hospitals represented a wide range of sizes (187 [16.8%] with 1000+ cases annually, 405 [36.4%] with 500-999 cases annually, 255 [22.9%] with 300-499 cases annually, 211 [18.9%] with 100-299 cases annually, and 56 [5%] with < 100 cases annually) and types (21 [1.9%] National Cancer Institute [NCI]-recognized cancer centers, 119 [10.7%] government hospitals, 102 [9.2%] teaching hospitals, 256 [23.0%] large community hospitals, 297 [26.7%] medium/small community hospitals, and 257 [23.1%] nongovernmental hospitals without approval status from the Commission on Cancer or NCI recognition). Remarkably similar distributions of cases by primary site and age were reported from each of six U.S. geographic regions. In addition, within each of these six regions, the cases were reported from a wide range of income strata and ethnicities. For several states, relatively few cancer cases were reported. For several relatively rare patient and tumor groups, the cases reported between 1985 and 1995 included potentially useful numbers of patients, enough to warrant further study of such special groups.


The authors conclude that the reported cases most likely are representative at the regional (but not state) level of cancer patients diagnosed and treated at U.S. hospitals with regard to types of cancer and ages of the patients. They conclude further that cancer reporting may be quite diverse within each region with regard to other known patient and reporting institution characteristics. Cancer 1998;83:2649-2658. © 1998 American Cancer Society.

Surveillance of patterns of care and outcome from cancer for clinical purposes increasingly has been recognized as an important function of cancer registration. Achievement of this goal on a national basis requires both a clinical data set and nationally representative data coverage. Thus such a cancer registry ideally would include data items such as staging, other tumor characteristics, treatment, recurrence, and survival, and would be collected nationally in a population-based or representative sample schema. Such a clinical national cancer registry currently is not operative.

Three national registries currently are operating, including the Surveillance, Epidemiology, and End Results Program of the National Cancer Institute (NCI),1 the combined statewide activities of the National Program for Cancer Registries of the Centers for Disease Control,2 and the National Cancer Data Base (NCDB) of the American College of Surgeons (ACoS) Commission on Cancer (COC) and the American Cancer Society.3 The operating procedures of these three registries were developed to closely follow the goals and objectives of their different sponsors and constituents and are somewhat different.

The NCDB is a community-oriented clinical surveillance mechanism for healthcare organizations. Its purpose is to perform clinical surveillance of patterns of care and outcomes. In addition to the data base itself, the NCDB is comprised of 16 site-disease teams of clinicians and 2 oversight committees. The NCDB is linked closely with a national network of hospital liaison physicians representing >2000 U.S. hospitals and the COC hospital approvals program, which currently includes nearly 1500 hospitals. The surveillance effort of the NCDB is planned annually by the site-disease teams on a cancer specific basis. The NCDB collects its data from cancer registries of U.S. hospitals and is not population-based. The NCDB estimates that it collected 57% of all 1994 cancer data and 52% of all 1995 cancer data, and that it will collect >70% of cancer data starting with the diagnosis year 1996.3

The NCDB data set is clinical in nature and includes patient characteristics (nonpatient identifying), diagnosis, other tumor characteristics, treatment, recurrence, and survival. To achieve consistency with clinical use of the American Joint Committee on Cancer (AJCC) classification schema, the data set includes TNM staging.4 It publishes peer-reviewed scientific articles, and returns individualized benchmark summaries to the participating hospitals. The NCDB is run by cancer registry experts and multidisciplinary clinicians drawn from the COC. The majority of the work of the NCDB is performed on a voluntary basis.

Because no comprehensive and complete mechanism for collecting clinical information on all cancer patients in the U.S. yet exists, the feasibility of a national sampling schema for a statistically representative cancer registry is being explored,5 and may provide a future option. In the absence of a population-based cancer registry or random sample thereof, the NCDB may serve a useful and unique role in approaching the desirable goal of a clinically relevant mechanism for surveillance of patterns of care and outcome.

Interest in better understanding the regional diversity and demographic and disease inclusiveness of the NCDB, as well as the potential for state by state analysis, led us to review recent hospital reporting patterns within the 50 U.S. states and the District of Columbia, having aggregated the states into 6 geographic areas of the country. Were the data suggestive of national, regional, and/or state inclusiveness? What missing or underrepresented groups could be identified? What data limitations could be identified?


A goal of the NCDB is to lower the morbidity and mortality of cancer by providing information regarding cancer management and outcomes. Its major products include 1) hospital benchmark summaries, 2) hospital data edit reports, 3) community and state reports, 4) clinical publications based on the national data, and 5) a Web page (http:/ The NCDB annually collects data for all forms of cancer throughout the country, maintaining longitudinal surveillance. These data are based on cases abstracted and computerized by the hospital cancer registries. Participating hospitals submit all analytic cases seen at their hospital for any particular data year. All submissions of data were voluntary between 1985 and 1995, and hospital participation in the NCDB increased steadily over that period. Beginning with the diagnosis year 1996, cancer program approval by the COC required submission of data to the NCDB.

Sources of Data

Seven calls for data have been issued. Mailings routinely were sent to 2100 hospitals (1450 ACoS Approval Programs, 650 others), and all known central/state registries and software vendors/suppliers. Care has been taken to solicit data from all known computerized hospital cancer registries both with and without COC approval status. The cumulative data received includes 305,566 cases from 683 hospitals for 1985 (34% of estimated U.S. cases), 235,696 cases from 525 hospitals in 1986 (25%), 359,650 cases from 790 hospitals in 1987 (37%), 465,508 cases from 977 hospitals in 1988 (47%), 470,351 cases from 935 hospitals in 1989 (47%), 600,885 cases from 1169 hospitals in 1990 (58%), 512,414 cases from 891 hospitals in 1991 (47%), 640,738 cases from 1167 hospitals in 1992 (57%), 620,748 cases from 1107 hospitals in 1993 (53%), 691,206 cases from 1324 hospitals in 1994 (57%), and 655,627 cases from 1114 hospitals in 1995 (52%), totalling 5,558,389 cases. A total of 1849 hospitals contributed data for at least 1 diagnosis year.
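The yearly figures above can be checked with a few lines of arithmetic. The following is a minimal sketch in Python; the variable names are illustrative and are not part of any NCDB tooling. It sums the annual case counts and, as a rough back-calculation, divides each count by its reported share to recover the implied estimate of U.S. cases for that year. Because the published percentages are rounded to whole numbers, the implied estimates are approximate.

```python
# Annual NCDB submissions as listed in the text:
# diagnosis year -> (cases, reporting hospitals, percent of estimated U.S. cases)
ncdb_by_year = {
    1985: (305_566, 683, 34),
    1986: (235_696, 525, 25),
    1987: (359_650, 790, 37),
    1988: (465_508, 977, 47),
    1989: (470_351, 935, 47),
    1990: (600_885, 1169, 58),
    1991: (512_414, 891, 47),
    1992: (640_738, 1167, 57),
    1993: (620_748, 1107, 53),
    1994: (691_206, 1324, 57),
    1995: (655_627, 1114, 52),
}

# Sum of the eleven annual case counts.
total_cases = sum(cases for cases, _hospitals, _pct in ncdb_by_year.values())
print(f"Total cases, 1985-1995: {total_cases:,}")

# Back-calculate the U.S. case estimate implied by each rounded percentage.
implied_us_cases = {
    year: round(cases / (pct / 100))
    for year, (cases, _hospitals, pct) in ncdb_by_year.items()
}
for year, estimate in implied_us_cases.items():
    print(f"{year}: about {estimate:,} estimated U.S. cases")
```

The hospital counts are not summed here because the same hospital can report in several years; the 1849 figure in the text counts distinct hospitals.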

The baseline data items of the NCDB include: 1) patient characteristics (gender, age/date of birth, race/ethnicity, zip code of residence at first diagnosis, admission date, discharge date, class or analytic status); 2) tumor characteristics (primary site, laterality, histology, grade, regional lymph nodes positive/examined, tumor size, general summary stage, cAJCC stage group, and pAJCC stage group); 3) first course of treatment (surgery, radiation, chemotherapy, hormone therapy, biologic modifier, and other, as well as reconstruction); and 4) follow-up (date/type of recurrence, sites of distant metastasis, last contact date, vital status, and tumor status). These data were transmitted to the NCDB following a standard data transfer specification.7 The case data for each patient were coded in the traditional manner by trained cancer registrars in their respective hospitals before being transmitted.4, 8, 9

Duplicate reports of the same person-cancer were identified by cases with exact matches on 5-digit zip code, 8-digit birthdate, gender, and 3-digit primary site code, and included 4.5% of the original reports, which then were removed from analysis. The NCDB sample of cases is estimated to include approximately 52% of all incident cancers. Thus, this 4.5% rate in the 52% sample can be projected to be the equivalent of 4.5% divided by 0.52, equaling 8.7% in the total population.
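The matching rule and the projection described above can be sketched as follows. This is an illustration in Python under stated assumptions, not the NCDB's actual procedure: the record layout (`CaseReport`), the `hospital_id` field, the birthdate format, and the sample values are all hypothetical, and keeping the first report of each matched set is an assumption, since the text says only that duplicates were removed from analysis.

```python
from typing import Iterable, List, NamedTuple, Tuple


class CaseReport(NamedTuple):
    """Hypothetical record layout holding only the fields used for matching."""
    zip_code: str      # 5-digit zip code of residence
    birthdate: str     # 8-digit birthdate (format assumed here: YYYYMMDD)
    gender: str
    primary_site: str  # 3-digit primary site code
    hospital_id: str   # illustrative extra field, not part of the match key


def match_key(report: CaseReport) -> Tuple[str, str, str, str]:
    """Exact-match key: zip code, birthdate, gender, and primary site."""
    return (report.zip_code, report.birthdate, report.gender, report.primary_site)


def remove_duplicates(reports: Iterable[CaseReport]) -> List[CaseReport]:
    """Keep the first report for each key (an assumption) and drop later matches."""
    seen = set()
    unique = []
    for report in reports:
        key = match_key(report)
        if key in seen:
            continue
        seen.add(key)
        unique.append(report)
    return unique


# Made-up example: the first two reports describe the same person-cancer.
reports = [
    CaseReport("60611", "19300415", "F", "174", "hospital-A"),
    CaseReport("60611", "19300415", "F", "174", "hospital-B"),
    CaseReport("60611", "19300415", "F", "162", "hospital-A"),
]
unique_reports = remove_duplicates(reports)
print(len(reports) - len(unique_reports), "duplicate report(s) removed")

# Projection from the text: a 4.5% duplicate rate observed in a 52% sample.
observed_rate = 0.045
sample_coverage = 0.52
projected_rate = observed_rate / sample_coverage
print(f"Projected duplicate rate at full coverage: {projected_rate:.1%}")
```

An exact-match key of this kind is simple and fast, but it misses duplicates in which any one field differs (for example, a patient who moved between reports) and can merge distinct patients who share all four values.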

The definition of the six geographic regions used herein is as follows: Northeast: Maine, Vermont, New Hampshire, Massachusetts, Rhode Island, Connecticut, New York, Pennsylvania, and New Jersey; Southeast: Delaware, District of Columbia, Maryland, West Virginia, Virginia, North Carolina, South Carolina, Georgia, and Florida; Midwest: Wisconsin, Michigan, Illinois, Indiana, Ohio, Minnesota, North Dakota, South Dakota, Iowa, Nebraska, Kansas, and Missouri; South: Kentucky, Tennessee, Mississippi, Alabama, Oklahoma, Arkansas, Texas, and Louisiana; Mountain: Montana, Idaho, Wyoming, Nevada, Utah, Colorado, Arizona, and New Mexico; and Pacific: Washington, Oregon, California, Alaska, and Hawaii.
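For analysis, the regional definition above amounts to a lookup table from state to region. The sketch below is one way to encode it in Python; the names `REGIONS`, `STATE_TO_REGION`, and `region_of` are illustrative.

```python
# The six geographic regions exactly as defined in the text.
REGIONS = {
    "Northeast": [
        "Maine", "Vermont", "New Hampshire", "Massachusetts", "Rhode Island",
        "Connecticut", "New York", "Pennsylvania", "New Jersey",
    ],
    "Southeast": [
        "Delaware", "District of Columbia", "Maryland", "West Virginia",
        "Virginia", "North Carolina", "South Carolina", "Georgia", "Florida",
    ],
    "Midwest": [
        "Wisconsin", "Michigan", "Illinois", "Indiana", "Ohio", "Minnesota",
        "North Dakota", "South Dakota", "Iowa", "Nebraska", "Kansas", "Missouri",
    ],
    "South": [
        "Kentucky", "Tennessee", "Mississippi", "Alabama", "Oklahoma",
        "Arkansas", "Texas", "Louisiana",
    ],
    "Mountain": [
        "Montana", "Idaho", "Wyoming", "Nevada", "Utah", "Colorado",
        "Arizona", "New Mexico",
    ],
    "Pacific": ["Washington", "Oregon", "California", "Alaska", "Hawaii"],
}

# Invert the mapping so that each state points to its region.
STATE_TO_REGION = {
    state: region for region, states in REGIONS.items() for state in states
}


def region_of(state: str) -> str:
    """Return the region for a state name; raises KeyError if the name is unknown."""
    return STATE_TO_REGION[state]


print(region_of("Texas"))
print(len(STATE_TO_REGION), "states and districts assigned")
```

Every one of the 50 states and the District of Columbia appears in exactly one region, so the inverted mapping has 51 entries.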

Income was inferred for each case based on the average family income of the zip code of residence at time of first diagnosis. Three income groups were defined: a small low income group ($0-19,999) (11.6%), a small high income group (≥$47,000) (10.6%), and a large middle income group ($20,000-46,999) (77.8%), based on income as reported to the U.S. Census.10
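The income assignment can be sketched as a two-step lookup: zip code to average family income, then income to group. The cut points below come from the text; the function names, the treatment of zip codes with no income value, and the sample values are assumptions made for illustration.

```python
from typing import Dict, Optional


def income_group(avg_family_income: float) -> str:
    """Classify a zip code's average family income using the cut points in the text."""
    if avg_family_income < 20_000:
        return "low"      # $0-19,999
    if avg_family_income < 47_000:
        return "middle"   # $20,000-46,999
    return "high"         # $47,000 or more


def income_group_for_zip(zip_code: str, zip_income: Dict[str, float]) -> Optional[str]:
    """Look up a case's group from its zip code of residence at first diagnosis.

    Returns None when the zip code has no income value (an assumed convention;
    the text does not say how such cases were handled).
    """
    income = zip_income.get(zip_code)
    return None if income is None else income_group(income)


# Hypothetical zip-level incomes, for illustration only.
zip_income = {"00001": 18_500.0, "00002": 32_000.0, "00003": 61_250.0}
for zip_code in ("00001", "00002", "00003", "99999"):
    print(zip_code, income_group_for_zip(zip_code, zip_income))
```

Because income is assigned at the zip code level, every patient in a zip code receives the same group regardless of individual circumstances.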

Hospitals were grouped into nine categories using a combination of American Hospital Association (AHA) codes11 and the approval categories of the COC.8 First, the NCI-recognized hospitals were identified. Then, government and profit hospitals were categorized using AHA codes. The remaining hospitals then were grouped into six approval categories including teaching, large community, medium/small community, other approved, and nonapproved.

An important aspect of the usefulness of the data is the quality of case reports sent to the NCDB. The accuracy of the NCDB data collected has been discussed previously.12-14 We have relied on the hospital cancer committee or its equivalent to supervise the quality control of casefinding and abstracting, internal reviews of abstracts by registry staff, hospital-based computer data edits, and the editing checks of regional or state registries to provide adequate accuracy. In addition we have made editing checks for inconsistent or impossible codes available to assist hospital registrars in correcting their data.15


The primary thrust of the NCDB is the reporting of patterns of care and outcome. Geographic region of the U.S. frequently is presented as a covariable of interest.16, 17 Of potential analytic interest would be state of residence. Some requests for data received by the NCDB from healthcare organizations have sought information regarding patterns of care and outcome for their state, or a subpart of their state, stratified by size and type of hospitals so that benchmarking to comparable hospitals in or near their community could be accomplished.


Hospitals by State

Hospitals reported to the NCDB from each of the 50 states and the District of Columbia. The number of hospitals from an individual state differed from a low of 1 in Vermont and Alaska to 81 in California. Clearly, the reporting hospitals were not statistically representative of each state. However, in general the most populous states had the largest number of hospitals reporting. The 10 states with the largest number of hospitals reporting were, in order, California (81), Illinois (72), Pennsylvania (69), Ohio (68), Texas (63), New York (52), Florida (48), New Jersey (42), Massachusetts (40), and Washington (37).

Hospitals by Region

Reporting hospitals numbered 308 (28%) from the Midwest, 243 (22%) from the Northeast, 184 (16%) from the Southeast, 172 (15%) from the South, 144 (13%) from the Pacific, and 63 (6%) from the Mountain region, totalling 1114 hospitals. These regional totals suggest a fair degree of inclusiveness and likely diversity per region. In 5 of the regions, >100 different hospital environments were represented (Fig. 1).

Figure 1. A map of the number of National Cancer Data Base reporting hospitals by region, 1995.

Hospitals by Size

A very diverse group of different sized hospitals reported to the NCDB. Of the 1114 hospitals reporting in 1995, 56 (5%) reported an annual cancer caseload <100, 211 (18.9%) reported 100-299 cases, 255 (22.9%) reported 300-499 cases, 405 (36.4%) reported 500-999 cases, and 187 (16.8%) reported ≥1000.

The average number of cases per hospital reported to the NCDB for the 1995 diagnosis year was 589, or 49 cases per month. In each of the six regions there were hospitals from each of the five size categories analyzed. Some regional caseload differences were apparent. The largest average caseload, 679 cases per year, was reported from the Southeast. The smallest average caseload, 495, was reported by the Mountain region.

Type of Hospitals

A wide range of different types of hospitals reported to the NCDB. Of the 1114 hospitals, 857 (76.9%) had gained approval status from the COC; 257 (23.1%) were not in the Approvals Program. Of the 1114 hospitals, 21 (1.9%) were NCI-recognized (comprehensive cancer centers, clinical cancer centers), and 119 (10.7%) were government hospitals (military, Veterans Administration). In addition to these, 102 (9.2%) were teaching hospitals.

Reporting community hospitals with approval status included 256 (23.0%) categorized as large, and 297 (26.7%) categorized as medium or small. Thus 553 (49.7%), or nearly half, of the hospitals reporting to the NCDB were approved community hospitals; 242 (21.8%) were approved NCI, government, or teaching hospitals; and the remainder (28.5%) were largely nongovernmental hospitals without approval status or NCI recognition.

Cases by State

More than 1000 cases were reported from each of the 50 states, with the exception of Wyoming (326 cases) and Alaska (43 cases) (Table 1). The largest numbers of cases were reported from California (45,466), Pennsylvania (43,547), Florida (43,747), New York (41,514), Texas (36,818), Illinois (34,882), and Ohio (31,644). More than 30,000 cases were reported from each of the 6 regions, including 159,426 cases from the Northeast, 124,930 from the Southeast, 163,380 from the Midwest, 100,125 from the South, 31,195 from the Mountain region, and 76,571 from the Pacific (Fig. 2). The relation between reporting hospitals per region and reported cases is shown in Figure 3.

Figure 2. A map of the number of cancer cases reported to the National Cancer Data Base by region, 1995.

Figure 3. The number of hospitals and cases reported to the National Cancer Data Base by region, 1995.
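The relation between hospitals and cases can be reduced to an average caseload per reporting hospital. The short Python sketch below recomputes it from the regional hospital and case counts given in the text; the variable names are illustrative.

```python
# Reporting hospitals and reported cases by region, 1995, as given in the text.
hospitals = {
    "Northeast": 243, "Southeast": 184, "Midwest": 308,
    "South": 172, "Mountain": 63, "Pacific": 144,
}
cases = {
    "Northeast": 159_426, "Southeast": 124_930, "Midwest": 163_380,
    "South": 100_125, "Mountain": 31_195, "Pacific": 76_571,
}

# Average annual caseload per reporting hospital in each region.
avg_caseload = {region: cases[region] / hospitals[region] for region in hospitals}
for region, value in sorted(avg_caseload.items(), key=lambda item: -item[1]):
    print(f"{region:<10} {value:6.0f} cases per hospital")

# National average, per year and per month.
overall = sum(cases.values()) / sum(hospitals.values())
print(f"U.S.       {overall:6.0f} cases per hospital ({overall / 12:.0f} per month)")
```

The regional counts sum to the national totals of 1114 hospitals and 655,627 cases, and the computed averages agree with the figures quoted under Hospitals by Size: 589 nationally, 679 for the Southeast, and 495 for the Mountain region.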

Table 1. Number of Reported Cancer Cases by State and Region, 1995

State                     Cases     State           Cases     State         Cases
New Hampshire             2148      Ohio            31,644    Montana       1859
Rhode Island              4029      Iowa            7260      New Mexico    1414
New Jersey                25,681    Kansas          4413      Utah          4281
New York                  41,514    Minnesota       10,753    Wyoming       326
                                    North Dakota    2142
                                    South Dakota    2060
Maryland/Washington DC    13,485    Tennessee       13,379    Oregon        9092
North Carolina            19,762    Arkansas        5514      Washington    17,945
South Carolina            7371      Louisiana       9463
West Virginia             6390      Texas           36,818    U. S.         655,627

  1. Cases were categorized by state of reporting hospital.

Cases by Age

For the U.S., 3517 childhood cancers were reported, including at least 250 cases per region (Table 2). Numerous adolescent cancers were reported from each region, from a high of 636 cases from the Midwest to a low of 171 cases from the Mountain region. Numerous adult cancers from each of seven age groups were reported from each region. In the U.S., 65% of the adults were age ≥60 years. This percent of patients age 60+ years varied little by region, from a low of 63% in the South to 68% in the Northeast.

Table 2. Percentage of Reported Cancer Cases by Region, Age, and Neighborhood Income, 1995

Age (yrs)             Northeast  Southeast  Midwest   South     Mountain  Pacific   Total     %
Children, 0-14        743        601        772       587       292       522       3517      0.1
Adolescents, 15-19    616        481        636       438       171       356       2698      0.0
Total adult           159,596    125,043    163,479   100,226   31,249    76,654    656,247   100.0
Under $20,000         7464       11,888     15,342    27,072    3263      3079      68,108    10.4

Cases by Zip Code Income

Numerous cases were reported from each of the income categories for the six regions (Table 2). In the U.S. 10.4% were reported living in zip codes with an average family income of less than $20,000, 70.6% were from zip codes of $20,000 to $46,999, and 10.5% were from zip codes with income greater than $47,000. However, regional income patterns varied markedly, ranging from the Northeast, where only 4.7% of the patients were reported residing in a zip code of income <$20,000 to the South, where fully 27.0% were reported from zip codes with an average family income <$20,000.
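The low income shares quoted above can be reproduced from the "Under $20,000" row of Table 2. The Python sketch below is a rough check that makes one assumption worth stating: it uses the regional case totals reported under Cases by State as denominators, whereas the text does not say which denominator the authors used (for example, whether cases with unknown income were excluded). Values for regions not quoted in the text may therefore differ from the authors' own figures.

```python
# Cases from zip codes with average family income under $20,000 (Table 2).
low_income_cases = {
    "Northeast": 7_464, "Southeast": 11_888, "Midwest": 15_342,
    "South": 27_072, "Mountain": 3_263, "Pacific": 3_079,
}
# Regional case totals for 1995 (assumed denominators, taken from Cases by State).
region_cases = {
    "Northeast": 159_426, "Southeast": 124_930, "Midwest": 163_380,
    "South": 100_125, "Mountain": 31_195, "Pacific": 76_571,
}

# Percentage of each region's cases that came from low income zip codes.
low_income_share = {
    region: 100 * low_income_cases[region] / region_cases[region]
    for region in region_cases
}
us_share = 100 * sum(low_income_cases.values()) / sum(region_cases.values())

for region, share in low_income_share.items():
    print(f"{region:<10} {share:5.1f}%")
print(f"U.S.       {us_share:5.1f}%")
```

Under this assumption the Northeast, South, and U.S. values round to the 4.7%, 27.0%, and 10.4% given in the text.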

Cases by Ethnicity

A wide range of ethnicities were reported to the NCDB in 1995 including 59,119 (9.0%) African-Americans, 20,292 (3.1%) Hispanics, 1025 (0.2%) Native Americans, 2262 (0.3%) Japanese, 1863 (0.3%) Filipinos, and 1782 (0.3%) Chinese (Fig. 4). Some regional pattern of reporting by ethnicity was apparent (Table 3). For example, higher percentages of African-Americans were reported from the Southeast (14.3%) and the South (12.0%) than from the Mountain (1.8%) or Pacific (4.2%) regions. Reports of various Asian groups were concentrated heavily in the Pacific. Despite these regional patterns, the two most populous minorities were each reported in the thousands of cases: Hispanics in all regions, and African-Americans in all but the Mountain region.

Figure 4. The number of cases reported to the National Cancer Data Base by selected ethnicity, 1995.

Table 3. Percentage of Reported Cancer Cases by Ethnicity and Region, 1995

Ethnicity             Northeast  Southeast  Midwest   South     Mountain  Pacific   Total     %
Non-Hisp white        138,722    100,415    146,351   81,208    27,672    61,353    555,721   84.8
  Puerto Rican        1065       228        146       43        20        63        1565      0.2
  South/Cent Am       483        471        67        48        22        298       1389      0.2
  Spanish, NOS        2250       2433       770       2098      1418      1777      10,746    1.6
Native American       60         102        241       235       222       165       1025      0.2
Micronesian, NOS      0          0          0         0         0         11        11        0.0
Polynesian, NOS       1          1          0         0         3         6         11        0.0
Fiji Islander         0          0          0         0         0         6         6         0.0
Asian/Pac, NOS        625        374        371       267       89        591       2317      0.4

  1. Hisp: Hispanic; Cent: Central; Am: American; NOS: not otherwise specified; Pac: Pacific Islander.

Cases by Primary Site

The percentages of different types of cancer generally were remarkably similar by region (Table 4). Breast cancer, for example, accounted for 17.0% of cases from the Northeast, 16.6% of cases from the Southeast, 16.1% of cases from the Midwest, 15.5% of cases from the South, 17.3% of cases from the Mountain region, and 18.0% of cases from the Pacific. Lung cancer accounted for 12.7% of cases from the Northeast, 15.5% from the Southeast, 14.0% from the Midwest, 16.8% from the South, 13.8% from the Mountain region, and 13.7% from the Pacific. Differences by region were reported in the form of a possible excess of mouth and salivary gland cancer from the Midwest, and an excess of melanoma from the Pacific.

Table 4. Percentage of Reported Cancer Cases by Primary Site and Region, 1995

Primary site                Northeast  Southeast  Midwest   South     Mountain  Pacific   Total     %
Head and neck
  Salivary glands           399        357        882       263       95        223       2219      0.3
  Other oral/pharynx        126        110        112       94        30        55        527       0.1
  Small intestine           437        367        461       329       87        213       1894      0.3
  Other digestive           326        220        352       200       85        207       1390      0.2
  Nasal cavity/sinus        250        200        230       161       48        126       1015      0.2
  Other respiratory         878        518        828       393       130       384       3131      0.5
Soft tissues                1033       767        956       603       262       513       4134      0.6
Other skin                  3623       1928       3579      2515      348       794       12,787    2.0
Breast and female genital
  Vagina, vulva             1127       1201       1185      910       213       655       5291      0.8
Male genital
Nervous system              2937       2443       2831      2277      1184      1320      12,992    2.0
  Hodgkin's disease         1064       716        1000      601       232       486       4099      0.6
  Non-Hodgkin's lymphoma    5652       4283       5915      3382      1138      2772      23,142    3.5
  Multiple myeloma          1344       1141       1514      1077      262       639       5977      0.9
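The regional comparison behind Table 4 can be illustrated by converting two of its rows into within-region percentages. The Python sketch below is only an illustration: it assumes that the regional case totals reported under Cases by State are the appropriate denominators, and the helper names `site_share` and `spread` are hypothetical. It contrasts a site with a possible regional excess (salivary glands) with one that is distributed evenly (non-Hodgkin's lymphoma).

```python
from typing import Dict

REGION_ORDER = ["Northeast", "Southeast", "Midwest", "South", "Mountain", "Pacific"]

# Regional case totals for 1995 (assumed denominators).
region_cases = dict(zip(
    REGION_ORDER, [159_426, 124_930, 163_380, 100_125, 31_195, 76_571]
))

# Two rows of Table 4, in the same region order.
site_counts = {
    "Salivary glands": dict(zip(REGION_ORDER, [399, 357, 882, 263, 95, 223])),
    "Non-Hodgkin's lymphoma": dict(
        zip(REGION_ORDER, [5_652, 4_283, 5_915, 3_382, 1_138, 2_772])
    ),
}


def site_share(counts: Dict[str, int]) -> Dict[str, float]:
    """Percentage of each region's cases attributed to one primary site."""
    return {region: 100 * counts[region] / region_cases[region] for region in counts}


def spread(shares: Dict[str, float]) -> float:
    """Ratio of the highest to the lowest regional share (1.0 means identical)."""
    return max(shares.values()) / min(shares.values())


shares_by_site = {site: site_share(counts) for site, counts in site_counts.items()}
for site, shares in shares_by_site.items():
    top_region = max(shares, key=shares.get)
    print(f"{site}: highest share in {top_region}, "
          f"max/min ratio {spread(shares):.2f}")
```

A max/min ratio near 1 indicates a nearly identical share in every region. A ratio well above 1, as for the salivary gland row, flags a site that merits a closer look, although the ratio alone cannot distinguish a true regional excess from a difference in reporting.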

Infrequent Cancers/Patient Groups

Because the NCDB currently holds nearly 6 million cases, even rare types of cancer and infrequent demographic groups are present in possibly useful quantities for some purposes. Counts for a sample of infrequent groups are summarized in Table 5. Selecting on reported birthplace, 140 cases from Bermuda, 165 cases from Nigeria, 604 cases from Egypt, and 949 cases from Peru were reported. Selecting by reported religion, 20,324 Mormons, 2478 Seventh Day Adventists, and 52,724 Jews were classified in the NCDB file.

Table 5. Number of Reported Cancer Cases in a Sample of Normally Infrequent Groups, 1985-1995

Group                                              No. of cases
  Seventh Day Adventist                            2478
Primary site
  Ampulla of Vater                                 6231
  Hairy cell leukemia                              2945
  Breast carcinoma, combined ductal and lobular    24,056

Included in the cancers reported were 18,824 cancers of the tonsil, 2351 cancers of the jejunum, 6231 cancers of the ampulla of Vater, 1559 cancers of the trachea, 2945 hairy cell leukemias, and 24,056 breast cancers, including both ductal and lobular histology.


The NCDB cases in 1 year, 1995, were reported from 50 states and the District of Columbia; 1114 hospitals drawn from an estimated 39,338 physician practices; different size hospitals (from <100 to 1000+ cases per year); different types of hospitals (NCI-recognized, teaching, government, large, medium, and small community hospitals); and hospitals with and without COC approval status. The patients included all age groups, all income groups, and 27 enumerated ethnic groups. Their diagnoses included the full range of cancers.

Within each of 6 geographic regions there was a remarkably similar distribution of cancers by 43 primary sites and by 9 age categories.18 This is statistically revealing. It suggests that either 1) cancer is evenly distributed between the six regions with regard to site and gender, and that our sample is equally representative of each region with regard to site and gender, 2) cancer is not evenly distributed between the six regions, but that it appears evenly distributed in our findings because of a bias in our sample that perfectly offsets the possible differences, or 3) we have an equal and consistent bias in each region causing the omission of some form or forms of cancer.

We believe the second possibility to be unlikely. With regard to the third possibility, we do speculate that the NCDB could be most representative of cancer patients who receive definitive diagnosis and/or treatment at a U.S. hospital. The NCDB is less likely to be representative of patients who fell through the care net, or who had diseases not likely to be diagnosed or treated in a hospital environment or by a physician whose office has a close relation to a hospital cancer registry or research center. Some evidence of this nonhospital effect has been reported previously.19 For example, a reporting deficit has been described for melanoma, which can be underreported significantly by a hospital-based collection schema because dermatopathologists pathologically diagnose and treat significant numbers of patients outside the hospital setting, without any relation to a hospital cancer registry.

In addition to the regional gender and primary tumor site similarity, there also was great diversity and breadth of reporting within each of the regions. Within the six regions there were cases reported from each state, and a wide distribution and large numbers of cases reported with regard to different hospital sizes, individual medical practices, types of hospital, zip code income levels, and ethnicities, although the distribution often was different between regions for these covariables. The differences in these covariables were not unexpected.

The diversity of these data with regard to region is important. We previously have reported on regional differences in breast cancer surgery.16, 17 In these studies, the NCDB data were used to assess comparatively time trends in AJCC stage specific use of tissue-sparing surgery for breast cancer. Thus, inferences could be drawn regarding geographic factors influencing patterns of care.

Although the regions may have been well reported within the NCDB, some states clearly are not well represented by the NCDB because of low levels of hospital participation within those states combined with a relatively sparse resident population.

Because of the mandatory COC reporting requirement in effect since 1996, it has been estimated that NCDB inclusiveness will increase to >70% of U.S. cancer cases.3 It can be expected that this increase will improve the representativeness of the NCDB at the national, regional, and state level.

The theoretic question can be asked, is there some point of inclusiveness at which the hospital-based nature of NCDB reporting becomes representative of the general population at risk? Is that point 80%, 85%, 90%, or 95%? We have found that this question is subject to different perceptions. Some researchers are committed firmly to the need for population-based or, alternatively, random sample health data. For them, anything short of these criteria may lack merit and may be unusable. In contrast, clinicians are used to convenience samples of various kinds.

Clinical trials are conducted on convenience samples, which then are randomized for control purposes. Reports of an interesting series of cases collected by one physician, or diagnosed and treated at one institution, frequently are published and scrutinized. These are understood to be not statistically representative. This possible difference in perception between representative data versus best available data may be related to the need of clinicians to use any and all available data, representative or not, to assist their often urgent daily regimen of decision making in the presence of uncertainty.

We conclude that the NCDB data are quite diverse. They most likely are representative of cancer in the six regions described herein with regard to types of cancer and ages of the patients, but not at the state level. The data's representativeness is strongest for cancer patients who actually have been definitively diagnosed and treated. We further conclude that the cancer reporting is quite diverse within each region with regard to many other patient characteristics and known reporting institution characteristics. We speculate that the possibility that incorrect inferences may be drawn based on the possible nonrepresentativeness of the NCDB is dwarfed by its potential usefulness in describing patterns of care and outcomes for U.S. cancer patients, and in its possible role in positively influencing patient care in America. However, there is a need for the NCDB to expand its quality review, and to validate its comparability with other data systems. Efforts in this regard have been initiated, and will be expanded further.