Underascertainment of radiotherapy receipt in Surveillance, Epidemiology, and End Results registry data


  • Presented in preliminary form at the American Society of Clinical Oncology Annual Meeting, Health Services Research Oral Presentation Session, June 7, 2010; Chicago, IL.

  • We thank our project staff: Barbara Salem, MS, MSW and Ashley Gay, BA (University of Michigan); Ain Boone, BA, Cathey Boyer, MSA, and Deborah Wilson, BA (Wayne State University); and Alma Acosta, Mary Lo, MS, Norma Caldera, Marlene Caldera, Maria Isabel Gaeta, and Meryl Leventhal (University of Southern California). (All of these individuals received compensation for their assistance.) We also thank the breast cancer patients who responded to our survey.



Surveillance, Epidemiology, and End Results (SEER) registry data have been used to suggest underuse and disparities in receipt of radiotherapy. Prior studies have cautioned that SEER may underascertain radiotherapy but lacked adequate representation to assess whether underascertainment varies by geography or patient sociodemographic characteristics. The authors sought to determine rates and correlates of underascertainment of radiotherapy in recent SEER data.


The authors evaluated data from 2290 survey respondents with nonmetastatic breast cancer, aged 20 to 79 years, diagnosed from June 2005 to February 2007 in Detroit and Los Angeles and reported to SEER registries (73% response rate). Survey responses regarding treatment and sociodemographic factors were merged with SEER data. The authors compared radiotherapy receipt as reported by patients versus SEER records. The authors then assessed correlates of radiotherapy underascertainment in SEER.


Of 1292 patients who reported receiving radiotherapy, 273 were coded as not receiving radiotherapy in SEER (underascertained). Underascertainment was more common in Los Angeles than in Detroit (32.0% vs 11.25%, P < .001). On multivariate analysis, radiotherapy underascertainment was significantly associated in each registry (Los Angeles, Detroit) with stage (P = .008, P = .026), income (P < .001, P = .050), mastectomy receipt (P < .001, P < .001), chemotherapy receipt (P < .001, P = .045), and diagnosis at a hospital that was not accredited by the American College of Surgeons (P < .001, P < .001). In Los Angeles, additional significant variables included younger age (P < .001), nonprivate insurance (P < .001), and delayed receipt of radiotherapy (P < .001).


SEER registry data as currently collected may not be an appropriate source for documentation of rates of radiotherapy receipt or investigation of geographic variation in the radiation treatment of breast cancer.

The National Cancer Institute's Surveillance, Epidemiology, and End Results (SEER) program began collecting cancer registry data in 1973. Currently, SEER collects data from regional cancer registries that cover 26% of the US population.1 These data include information on cancer incidence, patient demographics, clinical and treatment factors, and survival, information of considerable relevance to those pursuing the agenda for comparative effectiveness research in health care.

SEER data have been used to answer a variety of research questions.2 Several influential studies have relied upon SEER data alone to determine the appropriateness of care delivered to breast cancer patients, including rates of receipt of radiotherapy (RT) after breast-conserving surgery (BCS).3, 4 These studies have suggested underuse of RT as well as disparities in use by race, age, and geography.

The SEER program issues a standardized coding manual that indicates that all treatments administered as part of the first course (which is defined in detail in the manual and is no longer limited to a 4-month period of time) are to be considered for the radiation treatment summary field.5 However, some studies have suggested that registry data may be incomplete, particularly for treatments like RT that may be delivered in the outpatient setting.6, 7 These studies have led to some increased caution in the use of SEER data alone, but have not convinced researchers to abandon publishing studies of RT use based on analyses of SEER data alone,8-10 nor even always to acknowledge potential limitations. Furthermore, existing studies assessing the adequacy of ascertainment of treatments in SEER registry data have lacked adequate sample diversity by race, age, and geography to assess whether ascertainment varies by these subgroups. In addition, rates of RT underascertainment may have risen in recent years, with increasing use of chemotherapy before RT leading to the delay of RT beyond a time period readily ascertained by registrars.

In light of these gaps in the literature, we conducted a study comparing SEER data on RT receipt from 2 large regional registries to self-report by patients recently treated for breast cancer, to answer 3 questions: First, how well do these SEER registries ascertain RT receipt in the current era? Second, do rates of RT ascertainment differ between SEER sites? Third, if RT underascertainment exists, does it vary systematically by clinical or sociodemographic factors?


Sampling and Data Collection

Details of the study design have been published elsewhere.11-13 Women in the metropolitan areas of Los Angeles and Detroit aged 21 to 79 years, diagnosed with stage 0 to 3 primary ductal carcinoma in situ or invasive breast cancer14 from June 2005 through February 2007, were eligible for sample selection. Latina (in Los Angeles) and African American (in both Los Angeles and Detroit) patients were oversampled.

Eligible patients were selected using rapid case ascertainment as they were reported to the Los Angeles Cancer Surveillance Program and the Metropolitan Detroit Cancer Surveillance System-SEER program registries. This method was used to obtain a representative sample of cases sooner after diagnosis than routine ascertainment can provide. We selected all African American patients at both sites and all Hispanic patients in Los Angeles, followed by a random sample of non-African American/non-Hispanic patients in both regions, to achieve the target sample size.12 Asian women in Los Angeles were excluded because they were being enrolled in other studies.

Physicians were notified of our intent to contact patients; survey materials and a $10 incentive were then mailed to eligible subjects. The questionnaire was translated into Spanish,15 and the Dillman method was used to encourage response.16 The protocol was approved by the institutional review boards of the University of Michigan, University of Southern California, and Wayne State University.

We accrued 3252 eligible patients, including approximately 70% of Latina and African American patients and 30% of non-Latina white patients diagnosed in Los Angeles and Detroit during the study period. After initial selection, another 119 were excluded because 1) physician refused permission to contact (n = 20); 2) patient did not speak English or Spanish (n = 17); 3) patient was too ill or incompetent to participate (n = 59); and 4) patient denied having cancer (n = 23). Of the 3133 eligible patients included in the final accrued sample, 2290 (73.1%) completed surveys, and 2268 (72.4%) were later able to be matched to quality-controlled incident cases ascertained by the SEER registries. On average, patients were surveyed 10 months after diagnosis.

As shown in Figure 1, 2179 patients responded to a question asking whether they had received or planned to receive RT. We excluded the 237 patients who indicated that they had yet to begin RT at the time of the survey; we also excluded the 15 patients missing SEER data on RT receipt, leaving 1927 patients for analysis.

Figure 1. Flow diagram showing the derivation of the analytic sample. MD, medical doctor; RT, radiotherapy; SEER, Surveillance, Epidemiology, and End Results.


We measured RT receipt by asking: “Did you or are you going to have radiation therapy to treat your breast cancer?” We also asked about the timing of treatment (completed, started, or planned), as well as whether initiation of RT had been delayed for any reason. As noted above, those who reported that they planned to receive RT but had yet to start were excluded from analysis, so that the self-reported measure of RT receipt in this study was considered positive only for patients who reported already receiving radiation treatment.

We determined the final surgical procedure by asking about the initial surgical procedure after biopsy and whether subsequent procedures were performed. We also assessed age, race/ethnicity, comorbidities, insurance status at time of diagnosis, total household yearly income at time of diagnosis, and educational attainment through separate survey questions. For age and race/ethnicity, we used SEER data for the few patients (<1%) missing data by self-report. We used SEER data for clinical information on tumor size and nodal status and to identify hospital of diagnosis, which we then categorized based upon whether that hospital was accredited by the American College of Surgeons.

To determine the RT receipt status in SEER registry data, we used the radsum variable, which indicates any receipt of RT as part of initial therapy in the SEER database. Those whom SEER coded as 0 (none) or 7 (refused) were coded as not receiving RT; those whom SEER assigned codes 1 through 6 (codes for various modalities of RT) were coded as receiving RT; the few who were coded as 9 (unknown) or 8 (recommended; unknown if given) were excluded from analysis as noted above.

We defined underascertainment as patient report of RT receipt among patients coded in SEER as not having received RT.
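To make this recoding concrete, the sketch below expresses the logic in Python/pandas. It is a minimal illustration rather than the study's actual programs, and the file and column names (merged_survey_seer.csv, radsum, pt_reports_rt) are hypothetical.

```python
import pandas as pd

# Hypothetical merged survey + SEER analytic file (one row per patient).
# radsum: SEER radiation summary code; pt_reports_rt: 1 if the patient
# reported having already received RT, 0 if she reported no RT.
df = pd.read_csv("merged_survey_seer.csv")

# Exclude records coded 8 (recommended; unknown if given) or 9 (unknown).
df = df[~df["radsum"].isin([8, 9])].copy()

# Codes 1-6 denote some modality of RT; 0 (none) and 7 (refused) count as no RT.
df["seer_rt"] = df["radsum"].isin([1, 2, 3, 4, 5, 6]).astype(int)

# Underascertainment: the patient reports RT receipt but SEER records none.
df["underascertained"] = ((df["pt_reports_rt"] == 1) & (df["seer_rt"] == 0)).astype(int)
```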


We compared self-reported data on RT receipt to RT as reported in SEER registry records. We described the frequency of RT underascertainment at each SEER site. We then calculated rates of underascertainment at each site after grouping patients by clinical and sociodemographic characteristics, as well as by treatment and hospital characteristics.
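For illustration, once each record carries an underascertainment flag, site- and subgroup-specific rates of the kind summarized in Tables 2 and 3 can be tabulated directly. This sketch continues the hypothetical data frame from the previous example and, unlike the published percentages, is not survey-weighted.

```python
# Denominator: patients who reported already receiving RT.
rt = df[df["pt_reports_rt"] == 1]

# Crude underascertainment rate by SEER site.
print(rt.groupby("site")["underascertained"].mean())

# Rates within each site by a patient characteristic, e.g., surgery type.
print(rt.groupby(["site", "surgery"])["underascertained"].mean().unstack())
```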

We performed univariate analyses using chi-square testing. We then regressed underascertainment within each SEER site on stage, age, surgery type, race/ethnicity, income, insurance, chemotherapy receipt, self-reported delay of RT initiation, and American College of Surgeons accreditation status of the diagnosing hospital as independent variables, adjusting for clustering by hospital. We evaluated all first-order interactions between significant variables, and none was significant except as reported. All results were weighted to account for the sampling design and differential nonresponse. Results are presented as unweighted values, with weighted percentages.
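The sketch below illustrates one way such an analysis could be approximated in Python with scipy and statsmodels, continuing the hypothetical variables above: a chi-square test for the univariate site comparison and, within one site, a weighted binomial (logistic) GLM with hospital-clustered standard errors. It stands in for, and does not reproduce, the authors' survey-weighted procedures.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf
from scipy.stats import chi2_contingency

# Univariate comparison of underascertainment between the two SEER sites
# (rt is the RT-reporting subset defined in the previous sketch).
chi2, p, _, _ = chi2_contingency(pd.crosstab(rt["site"], rt["underascertained"]))

# Site-specific model (shown for Los Angeles): underascertainment regressed on
# clinical and sociodemographic factors, with sampling weights approximated as
# frequency weights and standard errors clustered by diagnosing hospital.
la = rt[rt["site"] == "Los Angeles"]
model = smf.glm(
    "underascertained ~ C(stage) + C(age_group) + C(race) + C(income)"
    " + C(insurance) + C(surgery) + chemo + rt_delay + acos_accredited",
    data=la,
    family=sm.families.Binomial(),
    freq_weights=la["sample_weight"],
).fit(cov_type="cluster", cov_kwds={"groups": la["hospital_id"]})

# Odds ratios comparable in form to Table 4.
print(np.exp(model.params))
```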


Table 1 compares the RT receipt code in SEER to patient self-report. Of the 1292 patients who reported receiving radiation, 273 were coded as not receiving RT in SEER (underascertained). RT underascertainment was much more common in Los Angeles than in Detroit (32.0% vs 11.25%, P < .001).

Table 1. SEER Registry Data Compared With Patient Self-Report of RT Receipt

SEER Status                      Patient Reports RT Receipt     Patient Reports No RT Receipt
SEER Records RT Receipt          1019 (79%)                     47 (7%)
SEER Records No RT Receipt       273 (21%)                      588 (93%)
Total                            1292 (100%)                    635 (100%)

Abbreviations: RT, radiotherapy; SEER, Surveillance, Epidemiology, and End Results.

In Los Angeles, RT underascertainment was more frequent among patients with higher stage, multiple comorbidities, younger age, lower income, underinsurance, mastectomy receipt, chemotherapy receipt, delayed initiation of RT, and diagnosis at a hospital not accredited by the American College of Surgeons (Table 2). In Detroit, RT underascertainment was more frequent among patients with higher stage, mastectomy receipt, chemotherapy receipt, and diagnosis at an unaccredited hospital (Table 3).

Table 2. Characteristics of Patients Who Reported RT Receipt and Rates of Underascertainment of RT by the Los Angeles Surveillance, Epidemiology, and End Results Registry

Characteristic                                  No.     Weighted %     % Underascertained(a)     P(b)
Stage                                                                                            <.001
  0 (DCIS)                                      135     22             26
Comorbidity                                                                                      .001
Age, years                                                                                       <.001
Race                                                                                             .18
Income                                                                                           <.001
Insurance                                                                                        <.001
Surgery type                                                                                     <.001
  Breast conservation                           548     87             28
Receipt of chemotherapy                                                                          <.001
Delay in initiating RT                                                                           <.001
ACoS accreditation of diagnosing hospital                                                        <.001

Abbreviations: ACoS, American College of Surgeons; DCIS, ductal carcinoma in situ; RT, radiotherapy.
(a) Percentage underascertained calculated within the weighted sample.
(b) P values for differences in the proportion of RT receipt by the categories shown; separate category included for "unknown" when unknown values exceeded 5% (income).

Table 3. Characteristics of Patients Who Reported RT Receipt and Rates of Underascertainment of RT by the Detroit SEER Registry

Characteristic                                  No.     Weighted %     % Underascertained(a)     P(b)
Stage                                                                                            .02
  0 (DCIS)                                      142     21             8
Comorbidity                                                                                      .14
Age, years                                                                                       .06
Race                                                                                             .23
  Other or unknown                              33      6              6
Income                                                                                           .53
Insurance                                                                                        .50
Surgery type                                                                                     <.001
  Breast conservation                           539     83             8
Receipt of chemotherapy                                                                          <.001
Delay in initiating RT                                                                           .91
ACoS accreditation of diagnosing hospital                                                        <.001

Abbreviations: ACoS, American College of Surgeons; DCIS, ductal carcinoma in situ; RT, radiotherapy.
(a) Percentage underascertained calculated within the weighted sample.
(b) P values for differences in the proportion of RT receipt by the categories shown; separate category included for "unknown" when unknown values exceeded 5% (income).

On multivariate analysis, as shown in Table 4, RT underascertainment was significantly associated in both registries (P values for Los Angeles, Detroit) with stage (P = .008, P = .026), income (P < .001, P = .050), mastectomy receipt (P < .001, P < .001), chemotherapy receipt (P < .001, P = .045), and diagnosis at a hospital that was not American College of Surgeons accredited (P < .001, P < .001). In Los Angeles, additional significant variables included younger age (P < .001), nonprivate insurance (P < .001), and delayed receipt of RT (P < .001).

Table 4. Logistic Regression Models of RT Underascertainment

                                           Los Angeles                            Detroit
Characteristic                             Odds Ratio (95% CI)        P(a)        Odds Ratio (95% CI)        P(a)
Stage                                                                 .008                                   .026
  0                                        1.00                                   1.00
  I                                        0.82 (0.64-1.03)                       0.67 (0.35-1.28)
  II                                       1.32 (0.94-1.83)                       0.69 (0.30-1.56)
Age, years                                                            <.001                                  .36
  <50                                      3.01 (2.09-4.31)                       1.34 (0.54-3.30)
  50-64                                    1.91 (1.38-2.66)                       0.95 (0.42-2.16)
  65+                                      1.00                                   1.00
Race(b)                                                               .14                                    .13
  White                                    1.00                                   1.00
  Black                                    0.84 (0.63-1.12)                       0.57 (0.32-1.04)
  Other                                                                           0.59 (0.2-1.71)
Income                                                                <.001                                  .050
  <$20,000                                 1.12 (0.79-1.60)                       1.15 (0.48-2.71)
  $20,000-$69,999                          1.66 (1.31-2.11)                       1.96 (1.19-3.24)
  $70,000+                                 1.00                                   1.00
  Unknown                                  1.02 (0.76-1.37)                       1.42 (0.72-2.78)
Insurance                                                             <.001                                  .87
  None                                     1.94 (1.23-3.05)                       <0.001 (<0.001 to >999)
  Medicaid                                 1.26 (0.86-1.87)                       0.82 (0.31-2.17)
  Medicare                                 2.19 (1.60-3.00)                       0.70 (0.30-1.67)
  Other                                    1.00                                   1.00
Surgery type                                                          <.001                                  <.001
  Breast conservation                      1.00                                   1.00
  Mastectomy                               2.07 (1.50-2.85)                       4.93 (2.84-8.57)
Chemotherapy receipt                                                  <.001                                  .045
  Yes                                      1.84 (1.43-2.37)                       1.86 (1.02-3.41)
  No                                       1.00                                   1.00
Delay initiating RT                                                   <.001                                  .15
  Yes                                      1.93 (1.52-2.44)                       0.66 (0.38-1.16)
  No                                       1.00                                   1.00
Diagnosed at ACoS accredited hospital                                 <.001                                  <.001
  Yes                                      1.00                                   1.00
  No                                       1.30 (1.10-1.55)                       2.05 (1.40-3.00)

Abbreviations: ACoS, American College of Surgeons; CI, confidence interval; RT, radiotherapy.
(a) P values for group variables are reported from Wald tests; standard errors were adjusted for hospital clustering.
(b) There were too few Hispanic patients in Detroit to support a separate category. Thus, Hispanics in Detroit were included in the "other" race category.


The 2 SEER registries included in our study differed substantially in both rates and correlates of RT underascertainment. RT underascertainment in Los Angeles was nearly 3 times as high as in Detroit and was associated with age, insurance coverage, and delayed initiation of RT in addition to variables that were significant at both locations (stage, income, mastectomy receipt, chemotherapy receipt, and hospital accreditation status). These results suggest that SEER registry data, collected by routine methods, may not be an appropriate source for documenting rates of RT receipt by breast cancer patients or for investigating geographic variation in RT receipt.

SEER data alone have long been used to evaluate the appropriateness of breast cancer treatment, including RT receipt. Nearly 2 decades ago, a seminal analysis of SEER data from 1983 to 1986 documented rates of RT receipt after BCS that varied significantly by geography, race, and age.3 Another landmark study4 of SEER data from 1983 to 1995 found a decrease in the use of appropriate primary therapy for early stage breast cancer over time (with only 78% of women receiving appropriate primary therapy in 1995), driven by an apparent increase in use of BCS that was not followed by RT or axillary surgery.

However, several studies have since raised questions about the completeness of registry data, especially for treatments such as RT, which are often delivered in the outpatient setting. Bickell and Chassin took data collected as part of a quality improvement project on 365 cases of stage I-II breast cancer diagnosed between 1994 and 1996 at 3 New York hospitals and compared them with data in the hospitals' tumor registries, finding that only 58% of RT was captured by the registries.6 Malin et al compared California Cancer Registry data with data abstracted from medical records of 304 patients in the PacifiCare of California health plan who were diagnosed with breast cancer from 1993 to 1995 in Los Angeles; they found that only 72.2% of RT was captured by the registry.7 Given the sample size and the finding that studied patients were older and less diverse than the Los Angeles population, the study's ability to detect sociodemographic differences was limited. Systematic differences in ascertainment by disease stage were observed, however, and the authors noted that patients with more advanced disease more often received treatment in the ambulatory setting that was less likely to be reported to the registry.

Despite these studies, researchers have continued to use SEER data for evaluation of RT receipt. For example, a study published this year used SEER data to study rates of RT receipt among patients with locally advanced breast cancer, by race and surgery type.8 The authors concluded that "rates of RT were low for all populations"; although they considered several possible explanations for this finding, they did not mention the possibility of RT underascertainment. They did state, "We considered RT use as a single surrogate marker of quality cancer care, but there are certainly others. Rates of breast reconstruction and adherence to hormonal or systemic therapy guidelines are all potential surrogate markers of quality cancer care, but these data fields are either limited or unavailable in the SEER database." However, they did not consider the possibility that the RT field in the SEER database might also have limitations. Ironically, in their introduction, the authors cited Malin's study, but only in support of the statement, "Adherence to RT guidelines improves overall and disease-specific survival and has been used as a surrogate marker of quality BCa care."

Other recent studies have acknowledged concerns about limitations of SEER registry data but have dismissed these largely based upon comparisons to merged SEER-Medicare data. For example, when Du and Gor published an analysis of RT receipt based on SEER data from 1992 to 2002,10 they referenced a study using merged SEER-Medicare data on women aged 65 to 74 years diagnosed with breast cancer in 1992, finding that among 2784 women whom SEER recorded as not receiving RT, Medicare identified only 377 (13.5%) as receiving RT.17 They also referenced a study18 of SEER-Medicare data from 1991 to 1996 in patients 65 and older that found 94% agreement between SEER and SEER-Medicare for RT receipt in breast cancer patients, with only minimal variation between individual SEER registries. Similarly, when Freedman and colleagues conducted a study using SEER data from 1988 to 2004, they included an appendix assessing RT underascertainment using the SEER-Medicare dataset from 1992 to 2002.9 They found 91% agreement between the 2 data sources and concluded that “it is unlikely that our findings would be explained by problems ascertaining radiation therapy by the SEER registries.” Sufficiently reassured to consider SEER data alone, they found a decrease in RT after BCS from 79% in 1988 to 66% in 2004, with differences by race, SEER site, and age. These rates are markedly lower than those reported by our patients in Los Angeles and Detroit in 2006.19

More recently, Dragun and colleagues published an interesting analysis that documented a 66% overall rate of radiation receipt, and significantly lower receipt in rural/Appalachian populations, in the Kentucky Cancer Registry (KCR).20 These researchers discussed potential limitations in registry data but noted that the KCR “has been awarded the highest level of certification by the North American Association of Central Cancer Registries for an objective evaluation of completeness, accuracy, and timeliness every year since 1997. The KCR is also part of the … SEER program, which has the most accurate and complete population-based cancer registry in the world.” Unfortunately, North American Association of Central Cancer Registries accreditation does not consider accuracy of coding of treatment receipt, including radiation receipt; it only confirms that there is accuracy, completeness, and timeliness with respect to identification of incident cases of cancer and demographic characteristics.21 Thus, although the study findings may well reflect a true problem with undertreatment in settings where health care facilities are more limited, one must exercise caution in drawing firm conclusions unless treatment information in the Kentucky registry has been validated in ways not discussed in that article, as RT underascertainment may also be more likely in such settings.

Of note, the current study shows that RT underascertainment appears to be more frequent among younger patients, who are not represented in SEER-Medicare comparisons, and that it can occur even in SEER registries holding North American Association of Central Cancer Registries accreditation for high-quality incident case ascertainment. Moreover, more breast cancer patients receive chemotherapy before RT today than in the time periods of the studies comparing SEER with SEER-Medicare data. Although the first course of treatment is no longer defined by SEER as a 4-month period from diagnosis, the increased time between diagnosis and RT when chemotherapy is administered first makes it more difficult for registrars to ascertain radiation receipt. In light of the findings of our current study, which show substantial RT underascertainment in 1 of the largest SEER registries, we believe that future studies should not use the SEER dataset alone to determine rates or correlates of RT receipt until the quality of the data in the other SEER registries is investigated more closely.

SEER itself recognizes the limitations of registry data collected by routine methods and therefore regularly conducts Patterns of Care studies focused on different cancer sites.22, 23 These studies involve reabstraction of treatment information from hospital records and requests to physician offices to capture therapy administered in the outpatient setting. Additional analysis using these methods would be valuable to assess rates of RT underascertainment in other registry sites before further research relies upon SEER data alone to assess RT receipt. Certainly, the findings of the current study are sobering.

The substantial differences in rates of ascertainment by the 2 registries likely reflect differences in the methods of surveillance. In the Detroit registry, surveillance is active, and radiation facilities are surveyed as part of the process of incident case identification, which also allows for updating of the radiation receipt variable. In the Los Angeles registry, where surveillance depends upon reporting by registrars, it is not surprising that rates of RT underascertainment are higher. Reporting of treatment received in the outpatient setting is particularly difficult for registrars to capture. Moreover, in California, the state law that established the registry system does not require capture of treatment given outside the reporting facility. Thus, it is not surprising that RT underascertainment is strongly associated with hospital accreditation status, as American College of Surgeons-accredited cancer programs are required to capture all first-course treatments regardless of location, in contrast to the more basic requirements of state law that govern other institutions. The independent association of numerous clinical and sociodemographic variables with ascertainment likely reflects the way in which differences in these factors affect the timing of care and the type and/or number of facilities within which these patients receive medical care.

This study has several strengths, including its large sample size and diverse population. It also has several potential limitations. First, it relied upon self-report as the gold standard to which SEER data were compared. Although we recognize that there is no true gold standard, previous studies have supported the validity of self-report regarding RT and have documented very high correlations between self-report and medical record review.24-26 Criterion validity is supported by the finding that the overwhelming majority of patients who reported receiving radiation went on to respond that they had received information regarding management of RT side effects and reported receiving ≥5 weeks of treatment, as well as the finding that self-reported receipt of radiation was highly correlated with clinical factors that direct treatment recommendations.

Second, although this is the first study to our knowledge of RT underascertainment, other than the SEER-Medicare studies, to include >1 geographic location, the study was limited to 2 metropolitan SEER registries. It is therefore not possible to comment definitively upon the rates of RT underascertainment in other registries, particularly more rural registries. Nevertheless, these data are sufficient to conclude that RT underascertainment is not uniform across SEER sites and can be quite substantial.

The call for comparative effectiveness research has led to heightened interest in the analysis of population-based registry data. As increasing numbers of researchers begin to use registry data, it is critical to evaluate the quality of those data. The SEER regional registry network represents a golden opportunity to continue to build population-based translational cancer research with real value to patients and their clinicians. Indeed, recent changes and additions to the content of SEER data reflect the increasing complexity of clinical information and treatment modalities for cancer and the interest of stakeholders in enhancing the use of these data for the purpose of assessing quality of care and outcomes.27 However, increased demand for breadth and depth of data, set against decreasing budgets for their collection, may be counterproductive. This study provides only 1 example of the ways in which misuse of poor-quality data may lead to spurious policy conclusions. One increasingly common strategy to improve data quality is for registries to partner with investigators to use supplemental research funding for special studies in cancer outcomes and effectiveness research. Another strategy might be to allow and encourage regional SEER registries to subspecialize in the collection of the more challenging and resource-intensive data elements related to the first course of therapy. This would create regional registries of excellence with particular strengths in certain cancers or treatment modalities. The increasing complexity of cancer care and the increasing demand to evaluate it motivate researchers to find creative solutions to ensure the highest validity and quality of data collected by regional cancer registries.


This work was funded by grants R01 CA109696 and R01 CA088370 from the National Cancer Institute (NCI) to the University of Michigan. R.J. was supported by a Mentored Research Scholar Grant from the American Cancer Society (MRSG-09-145-01). S.J.K. was supported by an Established Investigator Award in Cancer Prevention, Control, Behavioral, and Population Sciences Research from the NCI (K05CA111340).

The collection of Los Angeles County cancer incidence data used in this study was supported by the California Department of Public Health as part of the statewide cancer reporting program mandated by California Health and Safety Code Section 103885; the NCI's Surveillance, Epidemiology, and End Results (SEER) program under contract N01-PC-35139 awarded to the University of Southern California; contract N01-PC-54404 awarded to the Public Health Institute; and the Centers for Disease Control and Prevention's National Program of Cancer Registries, under agreement 1U58DP00807-01 awarded to the Public Health Institute. The collection of metropolitan Detroit cancer incidence data was supported by the NCI SEER Program contract N01-PC-35145.

The ideas and opinions expressed herein are those of the author(s), and endorsement by the State of California, Department of Public Health, National Cancer Institute, and Centers for Disease Control and Prevention or their contractors and subcontractors is not intended nor should be inferred.


The authors made no disclosures.