Observational data are used increasingly to assess the effectiveness of therapies. However, selection biases are likely to have an impact on results and threaten the validity of these studies.
The primary objective of the current study was to explore the effect of selection biases in observational studies of treatment effectiveness in cancer care. Patients were identified from the Surveillance, Epidemiology, and End Results-Medicare linked database. The following groups of patients were included: 5245 men treated with and without androgen deprivation for locally advanced prostate cancer, 43,847 men with active treatment versus observation for low- and intermediate-risk prostate cancer, and 4860 patients with lymph node-positive colon cancer who were treated with and without fluorouracil chemotherapy. Patients were compared by therapy for the outcomes of cancer-specific mortality, other-cause mortality, and overall mortality.
In all comparisons, the observational data produced improbable results. For example, when evaluating outcomes of men who were treated with and without androgen deprivation for locally advanced prostate cancer, men who underwent androgen deprivation had higher prostate cancer mortality (hazard ratio, 1.5; 95% confidence interval, 1.29–1.92) despite clinical trial evidence that this treatment improves cancer mortality. Controlling for comorbidity, extent of disease, and other characteristics by multivariate analyses or by propensity analyses had remarkably small impact on these improbable results.
The current results suggested that the results from observational studies of treatment outcomes should be viewed with caution. Cancer 2008. © 2008 American Cancer Society.
There has been a growing interest in using observational data to study cancer outcomes. This interest is driven in part by the availability of population-based data—in particular, data from the Surveillance, Epidemiology, and End Results (SEER) Tumor Registry that have been merged with Medicare charge data.1 These databases have the advantages of excellent external validity, and they allow for the study of populations that often are not included in clinical trials, such as the elderly, minorities, and patients with higher burdens of comorbidities. In addition, large administrative databases can provide information on patterns of care and treatment compliance2–4; can detect rare toxicities and assess treatment toxicities in representative, population-based cohorts5–9; and can permit the comparison of toxicities across different patient populations.5, 8–10 However, more recently, administrative datasets are being used to compare the effects of different treatments on overall survival. This approach has been used across many tumor types, including breast, lung, colon, rectal, prostate, and ovarian cancers.11–19
Selection biases, particularly confounding by indication, are the primary threat to the validity of using observational data to estimate benefits of therapies.20, 21 These biases can operate in several ways. For example, in a comparison between therapies where 1 therapy is considered potentially more efficacious (eg, adjuvant chemotherapy vs no chemotherapy), a bias may be expected whereby patients with poorer prognosis cancers would be more likely to receive that therapy. Alternatively, in a comparison involving potentially more toxic treatments versus less toxic treatments (eg, invasive surgery vs radiation treatment or chemotherapy vs no chemotherapy), a selection bias may be expected whereby patients with better underlying health—those considered more likely to tolerate the treatment—would be more likely to receive the more toxic therapy. Investigators clearly are aware of these potential biases and use statistical techniques to address them. Multivariate analyses, stratification, matching, restricting, and propensity analyses often are used adjusting for information available in the datasets, such as age; ethnicity; neighborhood socioeconomic level; and prior diagnoses, procedures, and hospitalizations.11–13, 22, 23 Nevertheless, unmeasured confounders are likely to persist.
In this article, we explore the strong effects of selection biases in observational studies. We hypothesized that we would obtain results from observational analyses that were implausible when considered in the light of published data from clinical trials. We also hypothesized that the usual means of dealing with selection biases, such as controlling for patient and tumor characteristics using multivariate and propensity analyses, would not eliminate the improbable results. We present several examples, including reanalyses of previously published data, to illustrate the effects of common selection biases. For the first example, we selected a situation in which we believed that selection biases might produce implausible results compared with results from a randomized controlled trial. For the second and third examples, we reanalyzed previously published data. In all cases, we examined cancer mortality, noncancer mortality, and overall mortality. We reasoned that any real benefit of cancer therapy could be manifested only through differences in cancer-specific mortality but that selection biases might result in differences in noncancer mortality that would be as great or greater than the differences in cancer mortality, calling into question the reliability of using mortality endpoints to assess treatment efficacy in nonrandomized data.
Our approach was similar in all cases and was analogous to approaches that have been used by previous investigators,11–17, 24–27 in that we compared the effectiveness of different cancer therapies on outcomes. We used the merged SEER-Medicare database as our data source, which also has been used in prior outcome studies.11–17, 24–27 The SEER Program is a national population-based tumor registry run by the National Cancer Institute that collects information on incident cancer cases. Patients in the SEER database who are eligible for Medicare have been linked to their Medicare records.1
This analysis included 5245 men with locally advanced prostate cancer (either tumor [T] classified as T2/T3 with a Gleason score of 8–10 or tumor classified as T4) in the SEER-Medicare database who were recipients of primary radiation therapy, aged ≥66 years, and diagnosed between 1992 and 1999. Patients were excluded if they had primary surgical therapy, if they had health maintenance organization (HMO) coverage, or if they were not enrolled in Medicare Parts A and B from the 12 months before to the 6 months after their cancer diagnosis. Prostatectomy, radiation therapy, and androgen deprivation were defined as described previously.10 Briefly, radical prostatectomy was defined from SEER coding on site-specific surgery or any of the following codes from Medicare claims: Current Procedural Terminology (CPT) codes 55810, 55812, 55815, 55801, 55821, 55831, 55842, 55845; or International Classification of Diseases, ninth revision (ICD-9) procedure code 60.5. Radiation therapy was identified from SEER coding on site-specific surgery or any of the following codes from Medicare claims: CPT codes 77401 through 77499 and codes 77750 through 77799; and ICD-9 codes 92.21 through 92.29, V58.0, V66.1, and V67.1. Androgen deprivation was defined as either orchiectomy or at least 1 claim for a gonadotropin-releasing hormone (GNRH) agonist within 6 months of diagnosis. GNRH codes are any of the following Healthcare Common Procedure Coding System (HCPCS) codes: J9202, J1950, J9217, J9218, and J9219. Comorbidity scores were calculated by using Klabunde's adaptation of the Charlson comorbidity index.28, 29
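The claims-based definition of androgen deprivation given above can be illustrated with a minimal Python sketch. This is not the study's code. The function name, the `(claim_date, code)` layout of the claims, the precomputed `had_orchiectomy` flag, and the reading of "within 6 months of diagnosis" as the 183 days after the diagnosis date are all assumptions made for illustration; only the HCPCS codes come from the text.

```python
from datetime import date, timedelta

# HCPCS codes for GNRH agonists, as listed in the text.
GNRH_HCPCS = {"J9202", "J1950", "J9217", "J9218", "J9219"}


def received_androgen_deprivation(diagnosis_date, claims,
                                  had_orchiectomy=False, window_days=183):
    """Return True for orchiectomy or >=1 GNRH agonist claim in the window.

    claims is an iterable of (claim_date, hcpcs_code) pairs (hypothetical layout).
    window_days=183 is an assumed operationalization of "within 6 months".
    """
    if had_orchiectomy:
        return True
    window_end = diagnosis_date + timedelta(days=window_days)
    return any(
        code in GNRH_HCPCS and diagnosis_date <= claim_date <= window_end
        for claim_date, code in claims
    )
```

For example, a patient diagnosed on January 15, 1995, with a J9217 claim on March 1, 1995, would be classified as treated, whereas a first GNRH claim in January 1996 would fall outside the assumed window.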
A series of Cox proportional-hazards models were developed that incorporated increasing numbers of covariates. Men who underwent androgen deprivation were compared with men who did not receive androgen deprivation for the outcomes of mortality from prostate cancer, other-cause mortality, and all cause mortality. We adjusted for the following variables: T classification, histologic grade (low, moderate, poorly differentiated; unknown), year of diagnosis, age (continuous), comorbidity (0, 1–2, ≥3), ethnicity (non-Hispanic white, non-Hispanic black, Hispanic, other), SEER region, census tract education (percent of individuals living in a given census tract with <12 years of education, divided into quartiles), census tract poverty (percent of individuals living in a given census tract living below the poverty level, divided into quartiles), number of claims for prostate-specific antigen measurements in the 12 months before diagnosis (continuous), and number of provider visits in the 12 months before diagnosis (continuous). Missing values were coded as unknown and were included in the analyses. Follow-up was through December 31, 2000.
The propensity that a patient would receive adjuvant androgen deprivation was generated from the logistic regression model that incorporated the potential confounding factors listed in Table 1.13, 30–32 Then, we grouped the patients into 5 strata representing quintiles of the propensity score. The Cochran Mantel-Haenszel chi-square test was used to determine whether the covariates were balanced after adjusting for propensity quintiles. The covariates that retained a significant difference between the treated and untreated groups were adjusted together with propensity scores in the Cox proportional-hazards model. We also noted the association between treatment and mortality within each stratum of propensity quintiles.
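The quintile stratification step can be sketched as follows, assuming the propensity scores have already been estimated from a logistic regression model. The function names are hypothetical, and the rank-based rule for forming strata is one reasonable choice rather than the rule used in the study; tied scores are split in input order.

```python
def propensity_quintiles(scores):
    """Assign each score to a stratum 1-5 by rank (near-equal stratum sizes)."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])  # indices, lowest score first
    strata = [0] * n
    for rank, idx in enumerate(order):
        strata[idx] = rank * 5 // n + 1
    return strata


def treatment_counts_by_stratum(strata, treated):
    """Count [untreated, treated] patients within each propensity stratum."""
    counts = {s: [0, 0] for s in range(1, 6)}
    for s, t in zip(strata, treated):
        counts[s][1 if t else 0] += 1
    return counts
```

Tabulating treated and untreated patients within each stratum is a quick way to see whether both groups are represented across the range of propensity scores before covariate balance is tested.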
|Characteristic||Adjuvant androgen deprivation for locally advanced prostate cancer: yes (N = 1863), %||Adjuvant androgen deprivation for locally advanced prostate cancer: no (N = 3382), %||P before adjustment*||P after adjusting for propensity score†|
|Incidental, clinically/radiographically apparent tumor||35.1||33.1||.1521||.7055|
|Extension beyond prostate||35.9||38.5|
|Year of diagnosis|
|Census tract education, % of adults with <12 y of education|
|Census tract poverty, % of adults living below poverty line|
|No. of provider visits in the 12 mo before diagnosis|
|Mean no. of PSA tests in first 6 mo after diagnosis|
For this example, we reanalyzed previously published data.19 The original study compared survival between men who were treated actively for prostate cancer (surgery or radiation) and men who were observed. We replicated the methods in that study as described below. In brief, the study population included men between ages 65 years and 80 years with an incident prostate cancer diagnosed between 1991 and 1999 in the SEER-Medicare database. Men who had moderately to well differentiated, nonmetastatic T1 or T2 tumors were included. Men were excluded if they were diagnosed at autopsy or death, if they had Medicare entitlement based on end-stage renal disease, or if they died within 1 year of diagnosis. Patients were excluded if they had HMO coverage or if they were not enrolled in Medicare Parts A and B from the 3 months before to the 6 months after their cancer diagnosis. Patients were considered to have received active treatment if they received external-beam radiation therapy, had radiation implants, or underwent radical prostatectomy. Cox models were developed to compare outcomes of men with active treatment versus observation. The final models adjusted for year of diagnosis, age, race, urban residence, marital status, income, education, SEER region, tumor size, tumor grade, and patient comorbidity, as described in detail in the original study.
We replicated the previously published analyses and added the following analyses: In addition to the endpoint of overall survival, we developed Cox models for the endpoints of prostate cancer survival and other-cause mortality. We plotted survival curves from stratified Cox proportional-hazards models, adjusting for age, comorbidity, SEER region, and year of diagnosis. These survival curves are presented for type of therapy (radical prostatectomy, radiation, or observation) and for a noncancer control population. The noncancer control population was selected as follows: From the 5% sample of Medicare beneficiaries who did not have any cancer in the SEER-Medicare data, we selected men ages 65 years to 80 years who were resident in a SEER area from 1991 through 1999. Noncancer controls were assigned randomly to match the distribution of year of diagnosis for the cancer cohort. If men were not enrolled in Medicare Parts A and B, if they were enrolled in an HMO from 3 months before study entry to 6 months after study entry, or if they died within 1 year of study entry, then they were excluded. From these noncancer controls, a cohort of 43,847 men was built to match the age distribution of the patients with prostate cancer. Among these, 12,234 men died during follow-up. We also developed Cox models with the endpoints of mortality from heart disease, other cancers, cardiovascular disease, chronic obstructive pulmonary disease, pneumonia, diabetes, accident, other infections, and dementia to illustrate the effects of confounding.
In our third example, we also reanalyzed previously published data.13 The original study evaluated survival associated with 5-fluorouracil (5-FU)-based adjuvant chemotherapy among elderly patients with lymph node-positive colon cancer. We replicated the methods as described in the original report.13 Patients were included who met the following criteria: first diagnosis of primary colon cancer between 1992 and 1996, aged ≥65 years, and stage III disease. Patients were excluded if they were enrolled in an HMO or if they were not covered by Medicare Parts A and B from 12 months before diagnosis until 16 months after diagnosis. Adjuvant chemotherapy with 5-FU was identified by claims with an HCPCS J-code of J9190 within 120 days of diagnosis. We constructed Cox models to estimate both overall survival, as reported previously, and other-cause and colon cancer-specific survival for patients who did and did not receive adjuvant chemotherapy. Cox models were adjusted for year of diagnosis, age, sex, urban residence, SEER region, lymph nodes, tumor grade, extent of disease, comorbidity, and propensity score.
In our first analysis, we evaluated the outcomes of men who did or did not receive androgen deprivation after primary radiation therapy for locally advanced (stage III) prostate cancer. Randomized clinical trial data have demonstrated a survival benefit for androgen deprivation in this population.33–35 Patients who underwent androgen deprivation had higher grade tumors, were more educated, had more frequent physician visits, and were diagnosed more recently than patients who did not receive androgen deprivation (P < .0001 for each) (Table 1). After adjusting for propensity score, imbalances remained in the year of diagnosis and in the number of provider visits before diagnosis. Then, we performed a series of Cox models that incorporated increasing numbers of covariates. The results of those survival analyses are shown in Table 2. In the unadjusted analysis, men who underwent androgen deprivation had a higher risk of death from prostate cancer (hazard ratio [HR], 1.35; 95% confidence interval [95% CI], 1.11–1.64). After adjusting for all measurable confounders, a persistent effect of higher prostate cancer mortality was observed among men who underwent androgen deprivation (HR, 1.63; 95% CI, 1.32–2.01). There was no significant difference observed in other-cause mortality between men who did and men who did not undergo androgen deprivation.
|Adjuvant androgen deprivation versus none (referent category)||Mortality from prostate cancer, HR||Mortality from prostate cancer, 95% CI||Mortality from other causes, HR||Mortality from other causes, 95% CI||All-cause mortality, HR||All-cause mortality, 95% CI|
|Adjusted for stage, grade, and y of diagnosis||1.49||1.21–1.84||0.97||0.83–1.13||1.12||0.99–1.27|
|Adjusted as above plus age and comorbidity score||1.49||1.21–1.83||0.98||0.84–1.14||1.13||1.00–1.28|
|Adjusted as above plus ethnicity, region, census tract education, and poverty||1.57||1.27–1.94||0.99||0.85–1.16||1.16||1.02–1.32|
|Adjusted as above plus no. of PSA tests and no. of provider visits||1.63||1.32–2.01||1.00||0.86–1.18||1.18||1.04–1.34|
|Adjusted for y of diagnosis, no. of provider visit, and propensity score||1.65||1.33–2.03||1.00||0.85–1.16||1.18||1.04–1.33|
|Cox regression adjusted for age, y of diagnosis, and no. of provider visits stratified by propensity score|
Next, we conducted a propensity analysis to adjust for unmeasured confounders.30–32 Propensity scores are an individual patient's likelihood of receiving a treatment calculated from a logistic regression model that is based on their covariate information. We show the results first with the propensity score as a covariate in our model and then with the results stratified into quintiles based on propensity score. In each analysis, prostate cancer mortality consistently was higher among men who underwent androgen deprivation. In the model that incorporated propensity score along with the imbalanced covariates, the HR for prostate cancer mortality was 1.65 (95% CI, 1.33–2.03).
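As a reading aid for the HRs and CIs reported here, the sketch below shows the standard relation between a Cox model coefficient and its reported HR: the HR is the exponentiated coefficient, and a Wald 95% CI is symmetric on the log scale. The function names are hypothetical, and the assumption that the reported intervals are Wald intervals is ours, not stated in the text.

```python
import math

Z_95 = 1.959964  # standard normal quantile for a two-sided 95% interval


def hazard_ratio_ci(beta, se, z=Z_95):
    """Convert a log-hazard coefficient and its standard error to (HR, lower, upper)."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def log_scale_summary(hr, lower, upper, z=Z_95):
    """Approximate the coefficient and standard error behind a reported HR and CI."""
    beta = math.log(hr)
    se = (math.log(upper) - math.log(lower)) / (2 * z)
    return beta, se


# Round trip for the propensity-adjusted estimate quoted above (HR 1.65; 1.33-2.03).
beta, se = log_scale_summary(1.65, 1.33, 2.03)
hr, lo, hi = hazard_ratio_ci(beta, se)
```

Recovering the interval to within rounding on the round trip suggests that a reported CI is consistent with a symmetric interval on the log-hazard scale.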
In the second example, we compared the outcomes of men who received active therapy versus men who were observed for localized prostate cancer. We replicated the methods in the original study and were able to reproduce the cohort of patients and point estimates of survival.19 Then, we expanded on the previously published results by performing Cox analyses that demonstrated mortality from prostate cancer and mortality from other causes in addition to all-cause mortality. Table 3 presents the results of the survival analyses. In the unadjusted and adjusted analyses, patients who received active therapy had significantly lower all-cause mortality (adjusted HR, 0.68; 95% CI, 0.65–0.70) compared with patients who were observed for prostate cancer, as reported in the original study. However, patients who received active therapy also had lower mortality from all other causes (unadjusted HR, 0.52; 95% CI, 0.50–0.54; adjusted HR, 0.68; 95% CI, 0.65–0.71). The confounding between overall health status and active therapy for prostate cancer also is demonstrated in Figure 1, which shows that patients who underwent radical prostatectomy for prostate cancer actually had better survival than a control population without cancer. To explore further the association between active therapy and cause of death, we performed Cox models for other individual causes of death. The HRs with 95% CIs are plotted in Figure 2. For each individual cause of death, such as diabetes or pneumonia, active treatment for prostate cancer was associated with a significant mortality benefit, similar to the benefit observed for overall mortality or mortality from prostate cancer.
|Active treatment versus observation (referent category)||Mortality from prostate cancer, HR||Mortality from prostate cancer, 95% CI||Mortality from other causes, HR||Mortality from other causes, 95% CI||All-cause mortality, HR||All-cause mortality, 95% CI|
|Cox regression adjusted for age, tumor size, grade, comorbidity, income, and propensity score||0.64||0.57–0.73||0.68||0.65–0.71||0.68||0.65–0.70|
|Cox regression stratified by quintile of propensity score|
In our last example, we also reanalyzed data that were published previously in an analysis of outcomes for patients with lymph node-positive colon cancer.13 The original study reported that patients who received fluorouracil-based chemotherapy had a significantly lower hazard of death (HR, 0.66; 95% CI, 0.60–0.73) than patients who did not receive chemotherapy. We extended the previous work and performed Cox regression models to estimate colon cancer-specific mortality and other-cause mortality in addition to the previously published overall mortality. We reasoned that, if the lower hazard of death was because of treatment alone, then deaths from other causes would not be related to chemotherapy use. The results are shown in Table 4. We observed a strong association between fluorouracil-based chemotherapy and other-cause mortality (HR, 0.48; 95% CI, 0.41–0.56). For colon cancer mortality, fluorouracil chemotherapy also was associated with a survival benefit (HR, 0.80; 95% CI, 0.72–0.89), although the effect was not as strong as that for overall survival.
|Fluorouracil versus none (referent category)||Mortality from colon cancer, HR||Mortality from colon cancer, 95% CI||Mortality from other causes, HR||Mortality from other causes, 95% CI||All-cause mortality, HR||All-cause mortality, 95% CI|
|Cox regression adjusted for age and propensity score||0.80||0.72–0.89||0.48||0.41–0.56||0.67||0.62–0.74|
|Cox regression adjusted for age, stratified by propensity score|
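The reasoning behind the other-cause comparison can be expressed as a simple check: if a cancer therapy acts only on the cancer, the CI for other-cause mortality should include 1. The sketch below applies that rule to intervals reported in Tables 2 and 4. The function names are hypothetical, and the rule is a screening heuristic for residual selection bias, not a formal test.

```python
def ci_excludes_null(lower, upper, null=1.0):
    """Return True when the confidence interval lies entirely on one side of the null."""
    return upper < null or lower > null


def flags_possible_selection_bias(other_cause_ci):
    """Flag an analysis whose treatment 'effect' on other-cause mortality excludes HR = 1."""
    lower, upper = other_cause_ci
    return ci_excludes_null(lower, upper)


# Other-cause mortality intervals taken from the tables in this article.
fluorouracil_other_cause = (0.41, 0.56)  # Table 4
androgen_other_cause = (0.86, 1.18)      # Table 2, fully adjusted model
```

With these inputs, the fluorouracil comparison is flagged and the androgen deprivation comparison is not, which matches the pattern described in the text. An unflagged other-cause result does not rule out bias in cancer-specific mortality, as the androgen deprivation example shows.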
In the current study, we selected several examples in which we had a priori reasons to suspect that selection biases would influence outcomes. We believed that there may have been strong selection biases both on extent and aggressiveness of the tumor and on the underlying health of the patient. We reasoned that the bias for patients with more aggressive cancer to receive more therapy would result in an implausibly worse survival among more extensively treated patients. The selection bias favoring the treatment of healthier patients would result in improved survival among treated patients. These biases could work in isolation or could be present simultaneously. Because these biases could have opposite effects on survival and, thus, tend to cancel each other out, we segregated survival by measuring 3 types of mortality: all-cause mortality, mortality from cancer, and mortality from all causes other than the cancer. This allowed us to estimate the impact of the 2 proposed selection biases. Selection biases for poorer prognosis tumors would be reflected best in cancer-specific mortality, whereas biases involving the selection of healthier patients would be reflected in mortality from all other causes.
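The segregation of mortality into 3 endpoints can be sketched as event coding for survival models. The sketch assumes the common cause-specific approach, in which a death from a competing cause is treated as censored for the endpoint of interest; the text does not state how competing deaths were handled, and the function and field names are hypothetical.

```python
def cause_specific_events(died, cause_of_death, cancer_cause="prostate cancer"):
    """Return event indicators (1 = event, 0 = censored) for the three endpoints.

    Assumption: deaths from a competing cause count as censored observations
    for the endpoint of interest (cause-specific hazards approach).
    """
    cancer_death = int(died and cause_of_death == cancer_cause)
    other_death = int(died and cause_of_death != cancer_cause)
    return {
        "cancer": cancer_death,
        "other_cause": other_death,
        "all_cause": int(died),
    }
```

Each patient contributes the same follow-up time to all 3 models; only the event indicator changes. A death from pneumonia, for instance, is an event for other-cause and all-cause mortality but a censored observation for cancer mortality.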
In the comparison of men with prostate cancer who received either active therapy or observation, we hypothesized that healthier men who were diagnosed with prostate cancer would be more likely to receive active treatment, which would result in improvements in both all-cause survival and in deaths from causes other than prostate cancer. We did demonstrate a large effect of active therapy on deaths from all causes. In fact, active therapy for prostate cancer had at least as much effect on deaths from diseases like pneumonia and cardiovascular disease as it did on deaths from prostate cancer. It is important to consider how odd these results actually are. It is not plausible that prostate cancer therapy improves survival from causes other than prostate cancer. The most likely explanation is that selection biases are responsible for the effects observed. More noteworthy, these biases persist after statistical adjustment for all measured confounders. It is also possible that active therapy is a marker for overall quality of care.
Two potential explanations for why controlling for reported comorbidity does not adequately control for selection biases are the lack of information on functional status and the lack of information on self-reported health. Measures of functional status, such as the Activities of Daily Living score, the Karnofsky performance status scale, or the Barthel index, independently can predict future physical function, morbidity, and mortality.36–38 Self-rated health, which typically is assessed by using a 4- or 5-point scale from excellent to poor, also has been demonstrated as a strong predictor of survival in several observational studies.39–42 Most relevant to our current studies, self-rated health remains a strong predictor of survival even after controlling for comorbidity and all other measurable factors that may affect survival. The best example is the Cardiovascular Health Study, which included a rich variety of clinical information from physical examination, laboratory assessments, and noninvasive testing, such as cardiac ejection fraction.43 Self-rated health still was a strong, independent predictor of survival. This means that there is information available to individual patients about their health that is not captured even with extensive medical assessment and yet is reflected in a simple, subjective health assessment. The information reflected by patients' self-rated health also presumably is accessible to the clinicians advising them if the physicians inquire. That information could guide treatment decisions, and patients who have more robust underlying health may be more likely to choose more invasive and more extensive treatments. Given the effect of competing risks on outcomes of treatment, such decision making may be entirely appropriate.44
A similar line of reasoning can be used to explain the inability to completely control for the selection biases whereby those with more aggressive tumors tend to receive more extensive treatments (ie, confounding by indication). The information reported in SEER on tumor characteristics is extensive and includes tumor stage, size, histologic type, histologic grade, and the number of positive lymph nodes. However, it would be naive to assume that experienced clinicians would not make more subtle distinctions in tumor prognosis than could be made based only on the information found in SEER. The example of androgen deprivation for the treatment of prostate cancer illustrates how difficult it can be to control for tumor aggressiveness. We observed that, whereas clinical trials have demonstrated a survival benefit of androgen deprivation, our observational data indicate that androgen deprivation is associated with worse tumor-specific survival.17–19 Presumably, the finding that men with more aggressive tumors were more likely to receive androgen deprivation could not be captured entirely by the extent-of-disease characteristics available in the SEER data.
Just as adjusting for comorbidity and tumor characteristics did not completely remove selection biases, statistical adjustment using propensity scores did not substantially alter the findings. Some reports have suggested that propensity scores can eliminate up to 90% of the bias resulting from confounding covariates.45–48 In our examples, adjustment for propensity score had little effect on the HRs. A recent study evaluated the effects of variable choice in propensity analyses.49 The authors suggested that variables unrelated to the exposure but related to the outcome always should be included in propensity score analyses, because they will decrease the variance without increasing bias. It is not clear that the datasets currently used for observational studies of cancer treatment, such as the SEER-Medicare linked data, can furnish such a variable. None of the cancer treatment outcome studies that we reviewed included such a variable.11–19 Other techniques, like instrumental variable analyses, take an analogous approach to minimize bias in observational studies. However, statistical techniques cannot eliminate all bias and confounding.
Our final example of fluorouracil-based chemotherapy for colon cancer is a less extreme and perhaps more representative situation. We note that our analyses lead to the same conclusion as the original study: that chemotherapy for lymph node-positive colon cancer is associated with improved survival. These observational analyses in older patients are consistent with data from randomized clinical trials in younger patients, which have demonstrated conclusively that 5-FU chemotherapy for colon cancer is associated with a 33% lower mortality rate.50 Other well designed observational studies also have produced results that closely approximate data from randomized clinical trials.51, 52 However, in the colon cancer example, the association between chemotherapy and survival is strongest for noncancer deaths, which presumably are not being prevented with chemotherapy. Thus, our findings call into question the reliability of using overall survival as the primary endpoint. Our results with the other examples raise the possibility that this finding may have resulted from a chance alignment of confounders.
We have drawn several major conclusions. First, the results of observational studies that compare outcomes of different therapies should be viewed with some skepticism. Such publications may result from an interaction of selection biases with publication bias. Analyses that make sense are followed up and ultimately published, whereas analyses that produce implausible results, such as some of the examples presented here, are more likely to be discarded or rejected for publication if they are pursued.
Second, any analyses of observational data for treatment outcome, at a minimum, should attempt to segregate the outcome measurements into those that possibly may be caused by the treatments versus those that could not be caused by the treatments. Most prior publications on cancer treatment outcomes only assessed all-cause mortality.11–17 Examination of treatment effects on mortality from cancer versus other causes may produce clues for the presence of unmeasured selection biases. This would hold not just for cancer therapies but also for observational studies of outcomes from therapies for any condition. There are many clinical situations, particularly in the treatment of the elderly, in which data from clinical trials are nonexistent, and observational studies may be the only potential method to assess benefits of treatment. We suggest that disease-specific survival, other-cause survival, and overall survival all should be provided in any studies of treatment outcomes. Finally, the strong yet implausible treatment effects observed in our analyses should reinforce the caution and modesty of all investigators assessing outcomes from observational data.
We are indebted to the Applied Research Program, National Cancer Institute; to the Office of Research, Development, and Information, Centers for Medicare and Medicaid Services; to Information Management Services; and to the Surveillance, Epidemiology, and End Results Program for the creation of the Surveillance, Epidemiology, and End Results-Medicare database. The interpretation and reporting of the data are the sole responsibility of the authors.