The US National Cancer Institute's SEER database contains 1715 patients aged <40 years diagnosed between 1973 and 2005. Our analysis included 17 SEER registries representing data from multiple institutions in geographically diverse parts of the United States.14 The SEER program currently collects and publishes cancer incidence and survival data from 17 population-based cancer registries that cover approximately one-quarter (26%) of the US population. In the SEER program registries, data regarding patient demographics, primary tumor site, tumor morphology and stage at diagnosis, first course of treatment, and follow-up for vital status are routinely assembled.
On January 1, 1973, SEER began collecting data on cancer cases. Currently, the following population-based cancer registries are part of the SEER program: Alaska Native Tumor Registry, Arizona Indians, Los Angeles, San Francisco-Oakland, San Jose-Monterey, Greater California, Connecticut, Detroit, Atlanta, Rural Georgia, Hawaii, Iowa, Kentucky, Louisiana, New Jersey, New Mexico, Seattle-Puget Sound, and Utah. Data for 2005 include the standard set based on July 1 populations and a set that has been adjusted for the population shifts due to Hurricanes Katrina and Rita. Geographic areas are selected for the SEER program based on their ability to operate and maintain a high-quality, population-based cancer reporting system and on their epidemiologically significant population subgroups. The population covered by SEER is comparable to the general US population, although it tends to be somewhat more urban and to have a higher proportion of foreign-born persons.15
All patients with histologically confirmed ES; peripheral primitive neuroectodermal tumors; and Askin tumors of the bone, soft tissue, and organs were eligible for the study. Patients were identified using the corresponding International Classification of Diseases for Oncology, second and third edition (ICD-O-2 and ICD-O-3), codes for these diagnoses. Patients aged ≥40 years at the time of diagnosis (n = 239) were excluded from the analysis.
SEER classifies race into 28 mutually exclusive groups using information from the medical record. Race was classified as white, black, or Asian if there was concordant evidence in the SEER registries coding “Race/ethnicity” and “Race recode (W, B, AI, API)”. Hispanic ethnicity was determined using stated ethnicity in the medical record, national origin on the death certificate, life history and/or spoken language, place of birth, and surname. Classification of Hispanic patients was based on concordant evidence in the SEER variables “NHIA derived Hispanic origin” and “Spanish surname or origin”. In case of discordance between these variables (n = 2), missing values (n = 15), or no more evidence of Hispanic ethnicity than Spanish surname (n = 19), data were declared as unknown and excluded from the analysis.
For this analysis, patients were classified into 4 groups based on their race and ethnicity: white non-Hispanic, white Hispanic, Asian non-Hispanic (Asian), and black non-Hispanic (black). Native American patients (n = 14), Asian Hispanic patients (n = 1), and black Hispanic patients (n = 1) were excluded because of small patient numbers.
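The classification rules described above can be illustrated with a minimal Python sketch. It is not the code used for the study: the function names, argument names, and category labels (`nhia_hispanic`, `origin_code`, `"surname_only"`) are hypothetical simplifications of the SEER variables, and race is assumed to have already been resolved from concordant SEER race fields.

```python
from typing import Optional


def hispanic_status(nhia_hispanic: Optional[bool],
                    origin_code: Optional[str]) -> Optional[bool]:
    """Return True/False when both ethnicity variables agree, else None.

    origin_code is a hypothetical simplified coding of the
    "Spanish surname or origin" variable:
    "hispanic", "non_hispanic", or "surname_only".
    """
    if nhia_hispanic is None or origin_code is None:
        return None  # missing value -> unknown
    if origin_code == "surname_only":
        return None  # surname alone is not accepted as evidence
    origin_hispanic = origin_code == "hispanic"
    if nhia_hispanic != origin_hispanic:
        return None  # discordant variables -> unknown
    return nhia_hispanic


def analysis_group(race: Optional[str],
                   nhia_hispanic: Optional[bool],
                   origin_code: Optional[str]) -> Optional[str]:
    """Map a patient to one of the 4 analysis groups, or None if excluded."""
    hispanic = hispanic_status(nhia_hispanic, origin_code)
    if race is None or hispanic is None:
        return None
    if race == "white":
        return "white Hispanic" if hispanic else "white non-Hispanic"
    if race == "asian" and not hispanic:
        return "Asian"
    if race == "black" and not hispanic:
        return "black"
    # Native American, Asian Hispanic, and black Hispanic patients
    # were excluded because of small numbers.
    return None
```

Returning `None` for every excluded or unknown case keeps the exclusion logic in one place, so a caller can filter the cohort with a single check.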
We examined the variables of age (<20 years at diagnosis vs ≥20 years at diagnosis), sex, tumor size (≤5 cm vs >5 cm), tumor site (soft tissue vs bone), pelvic site, stage of disease (metastatic vs localized), and year of diagnosis (in sequential 5-year blocks) to evaluate racial and ethnic differences in clinical presentation. On the basis of the SEER historical staging system, disease stage was categorized as localized/regional or distant. In sensitivity analyses, age was also evaluated as a continuous variable and year of diagnosis was evaluated according to specific calendar year intervals that corresponded to the sequential national phase 3 clinical trials open to US patients with ES. Analyses with these variables coded in this manner yielded results that were similar to the presented analyses.
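As an illustration of how these covariates might be coded, the following Python sketch applies the cut points stated above. The function name, field names, and stage labels are hypothetical, and the choice of 1973 as the start of the first 5-year block is an assumption made for the example.

```python
from typing import Optional


def code_covariates(age_years: float,
                    tumor_size_cm: Optional[float],
                    stage: Optional[str],
                    year_dx: int,
                    first_year: int = 1973,  # assumed start of the first block
                    block_years: int = 5) -> dict:
    """Dichotomize covariates using the cut points given in the text."""
    return {
        # <20 years vs >=20 years at diagnosis
        "age_ge_20": age_years >= 20,
        # <=5 cm vs >5 cm; unknown size stays unknown
        "size_gt_5cm": None if tumor_size_cm is None else tumor_size_cm > 5,
        # SEER historic stage: localized/regional vs distant (metastatic)
        "metastatic": None if stage is None else stage == "distant",
        # index of the sequential 5-year block containing the diagnosis year
        "year_block": (year_dx - first_year) // block_years,
    }
```

Missing tumor size or stage is carried through as `None` rather than forced into a category, so unknown values are not silently counted as small or localized tumors.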
Data regarding treatment received were also collected, with radiotherapy dichotomized as not given or given if performed at any time point during treatment (including radioactive implants and radioisotopes). Similarly, surgery was dichotomized as not used (except for diagnostic biopsy) or used as a component of local control. Data regarding the use of limb-sparing surgery were not reliably available. Adequate data to control for socioeconomic status, environmental factors, and access to health care were not available. Variables from the SEER database such as county or SEER registry were deemed inadequate to control for these factors.
Selected patient and tumor characteristics that appeared to differ between groups were evaluated statistically using chi-square tests with the white non-Hispanic group, the group with the largest sample size, as the reference. Overall survival (OS) was estimated using Kaplan-Meier survival curves, and group differences were compared using the log-rank test, again using the white non-Hispanic group as the reference. OS was expressed as a Kaplan-Meier estimate with a 95% confidence interval (95% CI). OS time was calculated as the number of completed months between the date of diagnosis and whichever occurred first: the date of death, the date last known to be alive, or December 31, 2005. The median follow-up time for the analyzed cohort was 92 months.
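The survival-time definition and the Kaplan-Meier product-limit estimate can be sketched in plain Python as follows. This is an illustration only, not the SAS or STATA code used for the analysis: the function names are hypothetical, and day-level dates are assumed for simplicity.

```python
from datetime import date
from typing import List, Optional, Tuple

STUDY_CUTOFF = date(2005, 12, 31)  # administrative censoring date from the text


def completed_months(start: date, end: date) -> int:
    """Number of whole months elapsed between two dates."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1  # the final month is not yet complete
    return max(months, 0)


def os_months(diagnosis: date,
              death: Optional[date] = None,
              last_alive: Optional[date] = None) -> Tuple[int, bool]:
    """Return (OS time in months, death observed?) using the earliest end point."""
    candidates = []
    if death is not None:
        candidates.append((death, True))  # listed first so ties count as events
    if last_alive is not None:
        candidates.append((last_alive, False))
    candidates.append((STUDY_CUTOFF, False))
    end, event = min(candidates, key=lambda c: c[0])
    return completed_months(diagnosis, end), event


def kaplan_meier(times: List[int], events: List[bool]) -> List[Tuple[int, float]]:
    """Product-limit estimate: (time, S(time)) at each distinct death time."""
    at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(set(times)):
        deaths = sum(1 for ti, ei in zip(times, events) if ti == t and ei)
        leaving = sum(1 for ti in times if ti == t)
        if deaths:
            surv *= 1 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving  # deaths and censored patients leave the risk set
    return curve
```

For example, `kaplan_meier([1, 2, 2, 3, 5], [True, True, False, True, False])` steps the survival estimate down at months 1, 2, and 3; the censored observations at months 2 and 5 reduce the number at risk without producing a step.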
Cox proportional hazards models were used to assess the effect of race and ethnicity on OS while controlling for known confounders. The proportional hazards assumption was tested using time-varying covariates and confirmed with log-log survivor function plots. For combined race and ethnicity models, the proportional hazards assumption could not be confirmed when metastatic status was included as a variable. Subsequent models were stratified by metastatic status, and these models met the proportional hazards assumption. For sensitivity models that evaluated ethnicity separately from race, the proportional hazards assumption was only confirmed after stratifying for metastatic status and year of diagnosis.
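For reference, a stratified Cox model of the kind described here has the standard textbook form below. The notation is introduced for illustration and is not taken from the study: $s$ indexes the stratum (for example, localized/regional vs distant disease), $x$ is the covariate vector including the race and ethnicity indicators, $D_s$ is the set of patients in stratum $s$ with an observed death, and $R_s(t_i)$ is the set of patients in stratum $s$ still at risk at time $t_i$. Ties are ignored for simplicity.

```latex
% Each stratum s has its own baseline hazard h_{0s}(t),
% while the coefficients \beta are shared across strata.
h_s(t \mid x) = h_{0s}(t)\,\exp\!\left(\beta^{\top} x\right)

% Partial likelihood: product over strata of the usual Cox terms,
% with risk sets restricted to patients in the same stratum.
L(\beta) = \prod_{s} \prod_{i \in D_s}
  \frac{\exp\!\left(\beta^{\top} x_i\right)}
       {\sum_{j \in R_s(t_i)} \exp\!\left(\beta^{\top} x_j\right)}
```

Because each stratum has its own baseline hazard, the stratifying variable is not required to satisfy the proportional hazards assumption, which is why stratifying by metastatic status can resolve the violation. The trade-off is that no hazard ratio is estimated for the stratifying variable itself.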
The SEER database was accessed using SEER*Stat (version 6.4.4). All statistical analyses were performed using SAS (version 9; SAS Institute, Inc, Cary, NC) and STATA (version 10; StataCorp, College Station, Tex) statistical software.