Using claims-based measures to predict performance status score in patients with lung cancer
Performance status (PS) is a good prognostic factor in lung cancer and is used to assess chemotherapy appropriateness. Researchers studying chemotherapy use are often hindered by the unavailability of PS in automated data sources. To the authors' knowledge, no attempts have been made to estimate PS using claims-based measures. The current study explored the ability to estimate PS using routinely available measures.
A cohort of insured patients aged ≥50 years who were diagnosed with American Joint Committee on Cancer stage II through IV lung cancer between 2000 and 2007 was identified via a tumor registry (n = 552). PS was abstracted from medical records. Automated medical and pharmaceutical claims from the year preceding diagnosis were linked to tumor registry data. A logistic regression model was fit to estimate good versus poor PS in a random half of the sample. C statistics, sensitivity, specificity, and R2 were used to compare the predictive ability of models that included demographic factors, comorbidity measures, and claims-based utilization variables. Model fit was evaluated in the other half of the sample.
PS was available in 80% of medical records. The multivariable regression model predicted good PS with high sensitivity (0.88 or 0.94 depending on how good PS was defined), but moderate specificity (0.45 or 0.32) with a 0.50 prediction cutoff, and good sensitivity (0.64 or 0.83) and specificity (0.69 or 0.55) when the cutoff was 0.70. The goodness-of-fit c statistic was 0.76 or 0.78.
PS can be estimated, with some accuracy, using claims-based measures. Emphasis should be placed on documenting PS in medical records and tumor registries. Cancer 2011. © 2010 American Cancer Society
Since 1997, evidence-based guidelines have recommended the use of chemotherapy for medically fit patients with lung cancer to improve survival, symptoms, and quality of life.1-4 Despite these recommendations, numerous studies5-8 have illustrated variability in the receipt of chemotherapy among patients with lung cancer. Nevertheless, the ability to determine the appropriateness of observed treatment variability has been greatly hindered by voids in the clinical information necessary to judge appropriateness.
One key factor in evaluating the appropriateness of chemotherapy is the patient's performance status (PS).1-4 PS is a subjective composite measure used by clinicians to assess current functional capacity and the likelihood of adverse events, quality of life, and survival after treatment. Measures of PS are currently not available through automated medical claims, tumor registries, or other observational data commonly used to study cancer treatment and its associated outcomes. Thus, the use of such data to address questions regarding chemotherapy has been relatively limited and when undertaken, the inability to consider PS is a noted limitation.8-10 The systematic lack of information regarding PS similarly impedes the ability of researchers to use existing automated, observational data for comparative effectiveness research.
This article asked 2 questions. First, how often are measures of a patient's PS documented in his or her detailed medical record? Second, is it possible to accurately estimate a patient's PS using routinely available tumor registry and claims-based measures on that patient's demographics, comorbidities, and prior healthcare utilization? By using a cohort of lung cancer patients diagnosed between 2000 and 2007, the feasibility of using medical record documentation to obtain PS measures was described overall and by patient characteristics. We then combined medical record-documented PS information with information routinely available in an automated tumor registry as well as medical and pharmaceutical claims data to evaluate the feasibility of estimating PS among lung cancer patients using information routinely available in observational data sources. To our knowledge, this has not previously been attempted among patients with lung or other cancers.
MATERIALS AND METHODS
Study Population and Setting
Study patients were those receiving care from a 900-physician member, multispecialty, salaried medical group practice in southeast Michigan. Data available from the medical group's tumor registry were used to identify all patients aged ≥50 years who were diagnosed with lung cancer between January 1, 2000 and December 31, 2007. The medical group, which provides care under both fee-for-service and capitated arrangements, staffs 27 primary care clinics throughout Detroit and the surrounding metropolitan area. Patients eligible for study inclusion were those continuously enrolled in an affiliated health plan (ie, health maintenance organization) for the 1-year period preceding their date of lung cancer diagnosis. Patients for whom no stage of disease was available at the time of diagnosis or for whom the stage at diagnosis was 0 to I were excluded because chemotherapy was not indicated for patients with stage 0 or I disease during this time period.11 The medical group's Institutional Review Board approved all aspects of the study protocol.
The 2 most commonly used PS systems are the Eastern Cooperative Oncology Group (ECOG) scale and the Karnofsky performance scale (KPS).12 Although the 2 scales are not identical, they are generally believed to capture the same conceptual domain and conversions are possible between them (Table 1).13
Table 1. The ECOG PS Score and Its Karnofsky PS Equivalent
|ECOG PS score||Description||Karnofsky PS equivalent|
|0||Fully active, able to carry on all pre-disease performance without restriction||100|
|1||Restricted in physically strenuous activity but ambulatory and able to perform work of a light or sedentary nature (eg, light housework, office work)||80-90|
|2||Ambulatory and capable of all self-care but unable to perform any work activities. Up and about >50% of waking hours||60-70|
|3||Capable of only limited self-care, confined to bed or chair >50% of waking hours||40-50|
|4||Completely disabled. Cannot perform any self-care; confined to bed or chair||20-30|
Two trained chart abstractors reviewed inpatient and outpatient nursing and physician notes available within the patient's electronic medical record from 2 months before diagnosis until the first notation of death, disenrollment, initiation of chemotherapy, or 6 months after diagnosis. If available, abstractors documented specific numeric PS and scale (ie, ECOG or KPS). Patients were assigned a “good” PS if they had an ECOG score of 0 or 1 or a KPS score of 80 to 100. A “poor” PS was assigned to patients with an ECOG score of 2 to 5 or a KPS score of 0 to 70. This was done to be consistent with standards in practice regarding recommendations for chemotherapy use among lung cancer patients during the study period,1-4 as well as with existing research applications.14 With the issuance of the 2009 American Society of Clinical Oncology (ASCO) guidelines, the standard for chemotherapy use changed to include the consideration of use in those patients with an ECOG score of 2 or a KPS score of 60 to 70. Thus, we also presented alternative results for which those patients with these scores were realigned to a “good” PS.
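The dichotomization rule described above can be written out as a small function. This is a minimal sketch rather than the abstraction procedure itself: the function name `classify_ps` and the `post_2009` flag are illustrative assumptions, and the cutoffs are read directly from the score ranges given in the text.

```python
def classify_ps(scale: str, score: int, post_2009: bool = False) -> str:
    """Map a numeric ECOG or KPS score to "good" or "poor" PS.

    post_2009=False: good = ECOG 0-1 or KPS 80-100.
    post_2009=True:  ECOG 2 and KPS 60-70 are realigned to "good".
    """
    scale = scale.upper()
    if scale == "ECOG":
        if not 0 <= score <= 5:
            raise ValueError("ECOG score must be between 0 and 5")
        # Lower ECOG scores indicate better function.
        cutoff = 2 if post_2009 else 1
        return "good" if score <= cutoff else "poor"
    if scale == "KPS":
        if not 0 <= score <= 100:
            raise ValueError("KPS score must be between 0 and 100")
        # Higher KPS scores indicate better function.
        cutoff = 60 if post_2009 else 80
        return "good" if score >= cutoff else "poor"
    raise ValueError("scale must be 'ECOG' or 'KPS'")
```

Under this sketch, a patient with an ECOG score of 2 is classified as "poor" by default and as "good" when `post_2009=True`, which mirrors the two definitions compared throughout the results.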
If no numeric score was documented, abstractors collected medical record documentation of “good” or “poor” PS. If no reference to PS was documented in the medical record, notes regarding the patient's functionality (eg, references to shortness of breath, use of a wheelchair or other personal mobility devices, labor force participation, exercising habits, activities of daily living, or other references to mobility) were recorded and used to estimate PS. Inter-rater reliability between the 2 abstractors was assessed on a random subset of 40 observations. The resulting Cohen κ was 0.88. In each of the 3 instances within this subset in which the abstracted PS did not match between the 2 abstractors, 1 abstractor indicated “good” or “poor” whereas the other selected “unknown” PS. For the final analytical database, these differences were reconciled by choosing “good”/“poor” over “unknown.”
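For readers unfamiliar with the agreement statistic reported above, Cohen's κ compares the observed agreement between 2 raters with the agreement expected by chance from each rater's marginal label frequencies. The following is a minimal sketch of that calculation; the function name `cohen_kappa` is an assumption, and the example labels in the usage note are hypothetical and are not the study's abstraction data.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must label the same non-empty set")
    n = len(rater_a)
    # Proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from the product of marginal frequencies
    # (a Counter returns 0 for labels one rater never used).
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1.0 - expected)
```

As a hypothetical illustration, `cohen_kappa(["good", "good", "poor", "poor"], ["good", "poor", "poor", "poor"])` has an observed agreement of 0.75 and a chance agreement of 0.50, giving κ = 0.5.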
Automated Tumor Registry and Claims Data
Automated tumor registry and claims data were used to obtain patient demographic characteristics, cancer stage, and diagnoses for each patient. Demographic measures included age, gender, and race. The age of the patient (in years) was recorded as of the date of lung cancer diagnosis. Clinical variables examined included stage of disease at the time of diagnosis and the Charlson comorbidity index.15 Cancer stage was reported using the American Joint Committee on Cancer (AJCC) stages II through IV. A dichotomous variable was created to control for AJCC stage IV patients in the regression analysis. The Deyo adaptation of the Charlson comorbidity index and each of its component diagnostic subgroups were constructed using inpatient and outpatient diagnostic information available in the 12-month period preceding diagnosis.16 In addition, claims data provided information regarding prescription drugs dispensed and medical care use within the 12-month period preceding lung cancer diagnosis.
Medical care use measures included those reflective of inpatient stays in a short-stay hospital or skilled nursing facility (SNF); ambulatory care visits; emergency department visits; and use of home health services, same-day surgery, and durable medical equipment (DME). For each person, inpatient use measures included the total number of distinct inpatient stays, the total number of inpatient days, and the average length of an inpatient stay for those with a non-0 number of stays. The number of outpatient visits was recorded, and in the regression analysis a dichotomous variable was created to control for patients with non-0 outpatient visits. Similar dichotomous variables were constructed to reflect any drug dispensing and any DME use. The emergency department, home health, and same-day surgery use variables measured the counts of visits incurred. We also evaluated the use of a count of the distinct number of medications dispensed during the baseline year, as recommended by Schneeweiss et al.17 For this measure, medications whose first 8 digits of the American Hospital Formulary Services code were equal were considered to be the same drug.18
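The distinct-medication count described above can be sketched as follows. This is an illustration under stated assumptions: the function name `count_distinct_drugs` is hypothetical, the codes are assumed to arrive as strings that may contain punctuation, and stripping non-digit characters before comparing the first 8 digits is our assumption about how such codes might be normalized.

```python
def count_distinct_drugs(ahfs_codes, prefix_len=8):
    """Count distinct drugs, treating codes that share their first
    `prefix_len` digits as the same drug."""
    def digits(code):
        # Keep digits only, so "24:04.04.20" and "24040420" compare equal.
        return "".join(ch for ch in str(code) if ch.isdigit())

    prefixes = {digits(code)[:prefix_len] for code in ahfs_codes if digits(code)}
    return len(prefixes)
```

For example, with the hypothetical dispensing codes `["24:04.04.20", "2404042099", "28:08.04.00"]`, the first 2 codes share the same first 8 digits, so the count is 2.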
Among the cohort of lung cancer patients, we reported the frequency of documented PS in medical records and described the different ways PS was recorded. Systematic differences between patients for whom PS was recorded and patients for whom it was not recorded were examined using 2-sample Student t tests (or Wilcoxon rank sum tests) and chi-square tests, depending on the nature of the characteristic. Similar analyses were conducted to compare unadjusted differences in patient characteristics by “good” PS versus “poor” PS. Multivariable logistic regression models were fit to evaluate the feasibility of using routinely available observational data to predict “good” versus “poor” PS. Three separate models were estimated, reflective of 3 different levels of the comprehensiveness of observational data routinely available. The first regression model included only those variables typically available via tumor registries (demographics and stage of disease). The second model included those same variables plus measures of medical care use and diagnoses available in medical claims data. The third model added measures of prescription drug use routinely available via pharmaceutical claims.
For each model, a split-sample cross-validation was used to check for model overfitting. C statistics, sensitivity, specificity, and R2 were used to assess and compare the predictive ability of the different models. Initially, all variables were considered for inclusion. However, the final model in each of the 3 categories was fit using the stepwise elimination method. Pairwise interactions were tested but were not found to enhance model prediction. Likewise, we evaluated the need to account for the nonindependence of patients seen by the same physician. Because the intraclass correlation coefficient (ICC) was negligible (ICC = 0.01), we elected not to adjust for it, which also gave us access to additional assessments of model fit. The final models were estimated on the full sample and bootstrapping was used to replicate each final model 1000 times to create 95% confidence intervals around the c and R2 statistics.19
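The c statistic and a bootstrap percentile interval can be sketched as below. This is a simplified illustration, not the study's SAS procedure: the function names are assumptions, and the sketch resamples patients with fixed predicted probabilities, whereas the analysis described above refit each final model within every bootstrap replicate.

```python
import random

def c_statistic(y_true, y_prob):
    """Probability that a randomly chosen patient with the outcome (y = 1)
    has a higher predicted probability than one without (ties count 0.5)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    score = sum(1.0 if p > q else 0.5 if p == q else 0.0
                for p in pos for q in neg)
    return score / (len(pos) * len(neg))

def bootstrap_c_interval(y_true, y_prob, n_boot=1000, alpha=0.05, seed=0):
    """Point estimate and percentile bootstrap interval for the c statistic."""
    point = c_statistic(y_true, y_prob)  # also validates the input
    rng = random.Random(seed)
    n = len(y_true)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        ys = [y_true[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain a single class
            stats.append(c_statistic(ys, [y_prob[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return point, lower, upper
```

Here `y_true` would hold 1 for “good” PS and 0 for “poor” PS, and `y_prob` the model-predicted probability of “good” PS.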
To examine model discrimination, patients were ranked by their predicted probability of “good” PS based on each model. Patients were then divided into deciles based on increasing predicted probability of “good” PS and actual “good” PS rates were reported among patients in all deciles to suggest how well models separated patients with “good” PS from those with “poor” PS.20
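The decile comparison described above can be sketched as follows. The function name `decile_table` is an assumption, and splitting the ranked patients into near-equal groups by position is one simple way to form deciles; the study's exact handling of ties and group sizes is not stated.

```python
def decile_table(y_true, y_prob, n_groups=10):
    """Rank patients by predicted probability and, for each rank group,
    return (group number, group size, observed events, predicted events)."""
    order = sorted(range(len(y_prob)), key=lambda i: y_prob[i])
    n = len(order)
    rows = []
    for g in range(n_groups):
        members = order[g * n // n_groups:(g + 1) * n // n_groups]
        observed = sum(y_true[i] for i in members)   # actual "good" PS count
        predicted = sum(y_prob[i] for i in members)  # expected "good" PS count
        rows.append((g + 1, len(members), observed, predicted))
    return rows
```

Close agreement between the observed and predicted counts within each group indicates good calibration, and a steady rise in observed counts from the first to the last group indicates good discrimination.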
We used SAS statistical software (version 9.1.3; SAS Institute Inc, Cary, NC) for all analyses, and we considered P < .05 to be statistically significant.
RESULTS
A total of 552 patients met the criteria for study eligibility. The mean age of the patients at diagnosis was 67.4 years (standard deviation [SD], 9.1 years). Of the patients eligible for the study, 42% were female, whereas the racial distribution was 69% white and 31% black. The AJCC staging distribution was as follows: 9% of patients were diagnosed with stage II disease, 20% with stage IIIA disease, 19% with stage IIIB disease, and 52% with stage IV disease. The average Charlson comorbidity index across the eligible sample was 2.8 (SD, 3.4), whereas the average number of distinct prescription drugs used in the year before diagnosis was 9.3 (SD, 7.1).
The average number of inpatient days in the year before diagnosis for the cohort (including those with no inpatient stays) was 2.9 days (SD, 7.5 days), whereas the average number of inpatient stays was 0.5 (SD, 0.8), resulting in an average inpatient length of stay of 5.0 days (SD, 5.2 days). The average number of outpatient visits was 5.7 (SD, 8.5) and the average number of emergency department visits was 0.6 (SD, 1.1) for the same time period. Across the study-eligible sample, 28% recorded any home health use, 3% had same-day surgery, 12% incurred a DME dispensing, and 4% incurred a stay in a rehabilitation facility or SNF. None incurred a hospice stay in the period before the lung cancer diagnosis.
Medical Record Documentation of PS
Of the 552 study eligible patients, PS was recorded in the medical record for 261 cases (47%). Among these, a numeric score was documented in 248 cases (95%), with the ECOG scale most often used (74%). For the remaining 13 patients, although a numeric score was not documented, explicit documentation was found of either “good” or “poor” PS.
Among the 291 (53%) patients for whom PS was not recorded, there were 181 for whom there was a sufficient verbal description of the patient's functioning in either the physician's notes, nurse's notes, or a combination of both to enable a determination of either a “good” or “poor” PS score. Thus, overall there were 442 patients (80%) for whom PS was determinable in their medical record.
Differences in patient characteristics by PS documentation level are reported in Table 2. The first 2 columns compare those patients for whom medical record documentation could be used to determine PS (known PS) with those for whom medical record documentation was insufficient to determine PS (“unknown” PS). As shown, patients with “unknown” PS (n = 110) did not differ significantly from those with a known PS (n = 442) with regard to demographic or clinical characteristics or measures of medical care use.
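Table 2. Sample Characteristics and Prescription Drug and Medical Care Use in the Year Prior to Diagnosis of Lung Cancer by PS Documentation Level (n = 552)
|Characteristic||Unknown PS (n = 110)||Known PS (n = 442)||Documented PS (n = 261)||Extrapolated PS (n = 181)|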
|Average diagnosis age (SD)||66.9 (9.9)||67.5 (8.8)||68.0 (8.6)||66.9 (9.2)|
|AJCC stage, %|
|Average Charlson comorbidity index (SD)||2.7 (3.6)||2.8 (3.3)||2.9 (3.5)||2.7 (3.1)|
| Atherosclerotic cardiovascular disease, %||15.4||21.3||24.5a||16.6a|
| Congestive heart failure, %||16.4||17.4||18.4||16.0|
| Ischemic heart disease, %||8.2||9.3||10.3||7.7|
| Peripheral vascular disease, %||11.8||13.4||14.2||12.2|
| Dementia, %||0.9||1.1||1.5||0.6|
| Peptic ulcer disease, %||2.7||3.6||3.8||3.3|
| Rheumatologic disease, %||5.4||6.3||6.1||6.6|
| Chronic pulmonary disease, %||32.7||41.4||41.4||41.4|
| Liver disease, %||2.7||1.4||0.8||2.2|
| Diabetes, %||19.1||28.3||29.9||26.0|
| Diabetes with complications, %||4.6||4.5||4.2||5.0|
| Paralysis, %||1.8||0.9||1.1||0.6|
| AIDS, %||1.8||1.1||1.1||1.1|
| Cancer, %||24.6||29.9||28.0||32.6|
| Cancer with metastasis, %||5.4||7.5||8.0||6.6|
| Renal disease, %||6.4||5.7||6.1||5.0|
| Aneurysm, %||6.4||6.3||7.3||5.0|
| Gangrene, %||0.9||0.7||0.8||0.6|
|Prescription Drug Use|
|Average no. of dispensings (SD)||8.7 (7.0)||9.4 (7.1)||9.5 (7.1)||9.3 (7.2)|
|Percentage with ≥1 dispensing||85||89||91||86|
|Medical Care Use|
|Average no. of IP days (SD)b||2.7 (6.8)||3.0 (7.7)||2.4 (5.1)a||3.9 (10.4)a|
|Average no. of IP stays (SD)||0.4 (0.7)||0.5 (0.8)||0.4 (0.7)a||0.6 (0.8)a|
|Average length of IP stays (SD)c||5.8 (5.2)||6.1 (5.2)||6.0 (5.3)||6.1 (5.0)|
|Average no. of OP visits (SD)||5.9 (8.3)||5.6 (8.6)||5.7 (9.1)||5.5 (7.8)|
|Percentage with ≥1 OP visit||75||76||77||76|
|Average no. of ED visits (SD)||0.7 (1.0)||0.6 (1.1)||0.7 (1.2)||0.5 (1.0)|
|Percentage with ≥1 home health claim||25.4||29.0||31.4||25.4|
|Percentage with ≥1 ambulatory surgery claim||2.7||3.2||3.4||2.8|
|Percentage with ≥1 rehabilitation/SNF claim||4.6||3.8||5.0||2.2|
|Percentage with ≥1 DME claimd||10.9||12.0||13.4||9.9|
Among patients with a known PS, the third and fourth columns of Table 2 compare patient characteristics between those who had a documented PS (either numeric or verbal) with those for whom a PS was extrapolated based on notes in the medical record. No significant differences were observed for most measures. However, there were significant differences by gender, diagnosis of atherosclerotic cardiovascular disease, and the average number of inpatient days.
Patient Factors Associated With PS
Among the 442 patients for whom PS was known, 290 patients (66%) had a “good” PS using the pre-2009 definition of “good” and 152 (34%) had a “poor” PS. This changed to 76% with a “good” PS and 24% with a “poor” PS when those with a documented numeric PS of 2 were considered to have “good” PS, as would be consistent with that in the 2009 ASCO guidelines for the use of chemotherapy. The unadjusted differences in patient characteristics by PS are illustrated in Table 3. Compared with patients with “good” PS, patients with “poor” PS were significantly older (69.7 years vs 66.4 years) and more likely to be male (66% vs 54%), have stage IV disease (64% vs 44%), and have a significantly higher Charlson comorbidity index (3.6 vs 2.4). Consistent with the latter finding, patients with “poor” PS were significantly more likely to have been diagnosed with several of the individual components of the Charlson comorbidity index when compared with those with “good” PS. Patients with “poor” PS also incurred significantly more inpatient days (5.5 days vs 1.7 days) as well as longer lengths of stay (6.8 days vs 5.4 days) in the year before diagnosis, and were more likely to have incurred any outpatient visit, home health use, or DME use in the year before diagnosis. Also of note is that patients with “poor” PS were significantly less likely to have undergone chemotherapy in the year after diagnosis (42% vs 82%) (data not shown). Similar differences between the groups were found when those with a PS of 2 were realigned with the “good” PS group, with 2 exceptions: statistically significant differences in gender and the prevalence of peripheral vascular disease no longer existed.
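Table 3. Among Patients With Known PS, Sample Characteristics and Prescription Drug and Medical Care Use in the Year Prior to Diagnosis of Lung Cancer by PS (n = 442)
|Characteristic||Good PS (pre-2009)||Poor PS (pre-2009)||Good PS (post-2009)||Poor PS (post-2009)|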
|Average diagnosis age (SD)||66.4 (9.1)c||69.7 (7.9)c||66.8 (8.9)c||70.0 (8.1)c|
|AJCC stage, %|
|Average Charlson comorbidity index (SD)||2.4 (3.0)c||3.6 (3.8)c||2.5 (3.1)c||3.9 (3.9)c|
| Atherosclerotic cardiovascular disease, %||19.3||25.0||19.9||25.5|
| Congestive heart failure, %||12.1c||27.6c||14.3c||27.4c|
| Ischemic heart disease, %||6.6c||14.5c||7.7c||14.2c|
| Peripheral vascular disease, %||10.3c||19.1c||12.2||17.0|
| Dementia, %||0.0c||3.3c||0.0c||4.7c|
| Peptic ulcer disease, %||2.1c||6.6c||2.4c||7.6c|
| Rheumatologic disease, %||5.9||7.2||6.3||6.6|
| Chronic pulmonary disease, %||34.5c||54.6c||35.4c||60.4c|
| Liver disease, %||1.0||2.0||1.0||3.0|
| Cancer, %||28.3||32.9||27.7||36.8|
| Cancer with metastasis, %||6.6||9.2||6.8||9.4|
| Diabetes, %||26.2||32.2||28.0||29.2|
| Diabetes with complications, %||3.8||5.9||3.9||6.6|
| Paralysis, %||0.3||2.0||0.3||2.8|
| AIDS, %||1.0||1.3||0.9||1.9|
| Renal disease, %||3.8c||9.2c||3.9c||11.3c|
| Aneurysm, %||5.2||8.6||6.2||6.6|
| Gangrene, %||0.3||1.3||0.3||1.9|
|Prescription Drug Use|
|Average no. of dispensings (SD)||9.0 (7.1)||10.3 (7.2)||9.1 (7.1)||10.5 (7.3)|
|Percentage with ≥1 dispensing||87||93||88||92|
|Medical Care Use|
|Average no. of IP days (SD)d||1.7 (3.9)c||5.5 (11.7)c||1.8 (4.1)c||6.8 (13.3)c|
|Average no. of IP stays (SD)||0.3 (0.6)||0.8 (1.0)||0.3 (0.6)||0.9 (1.1)|
|Average length of IP stays (SD)e||5.4 (4.8)||6.8 (5.5)||5.7 (5.1)||6.7 (5.2)|
|Average no. of OP visits (SD)||6.0 (8.5)||4.9 (8.7)||6.0 (8.6)||4.5 (8.6)|
|Percentage with ≥1 OP visit||82c||66c||81c||63c|
|Average no. of ED visits (SD)||0.5 (0.9)c||0.8 (1.3)c||0.5 (1.0)c||0.9 (1.3)c|
|Percentage with ≥1 home health claim||23.1c||40.1c||24.7c||42.4c|
|Percentage with ≥1 ambulatory surgery claim||3.8||2.0||3.6||1.9|
|Percentage with ≥1 rehabilitation/SNF claim||3.1||5.3||3.3||5.7|
|Percentage with ≥1 DME claimf||6.9c||21.7c||8.0c||24.5c|
Results from the logistic regression models predicting “good” versus “poor” PS, defined in the 2 ways, are presented in Table 4. Results are presented for models fit on the full sample and include only significant (P < .05) variables per the stepwise regression. In the model that included only tumor registry variables, only age at diagnosis and AJCC stage were selected (Model 1). Diagnosis of chronic pulmonary disease, the number of inpatient stays, any outpatient visits, and the number of emergency department admissions were all added when information from medical claims data was considered (Model 2). One more variable, the number of distinct prescription drugs, was added when information from pharmaceutical claims data was considered (Model 3).
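Because the parameters in Table 4 are logistic regression coefficients, each β is a change in the log odds of “good” PS, and exponentiating it gives an odds ratio. The sketch below shows that conversion with a Wald-type 95% interval. Odds ratios are not reported in the source, so the function name `odds_ratio` and the normal-approximation interval are our assumptions for illustration.

```python
import math

def odds_ratio(beta, se, z=1.96):
    """Convert a logistic regression coefficient and its standard error
    into an odds ratio with a Wald-type confidence interval."""
    return (math.exp(beta),
            math.exp(beta - z * se),
            math.exp(beta + z * se))

# Stage IV coefficient reported for Model 3 in Table 4: beta = -0.84, SE = 0.23.
stage_iv_or, stage_iv_low, stage_iv_high = odds_ratio(-0.84, 0.23)
```

Under these assumptions, the stage IV coefficient corresponds to an odds ratio of approximately 0.43 (approximately 0.28 to 0.68), that is, the odds of “good” PS for patients with stage IV disease are less than half those for patients with earlier-stage disease, holding the other model variables fixed.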
Table 4. Estimated Logistic Regression Parameters (β) and SEs and Measures of Model Performance for Alternative Models of PS Predictors
|PS (Pre-2009): Good/Poor||Model 1||Model 2||Model 3|
|Age at diagnosis, y||-0.04||0.01f||-0.04||0.01f||-0.04||0.01f|
|AJCC stage IV disease||-0.84||0.21e||-0.81||0.23f||-0.84||0.23f|
|Chronic pulmonary disease|| || ||-0.67||0.24f||-0.63||0.24f|
|No. of IP stays|| || ||-0.58||0.16f||-0.60||0.16f|
|Any no. of OP visits|| || ||1.05||0.26e||1.19||0.27e|
|No. of ED visits|| || ||-0.22||0.10g||-0.22||0.10g|
|Any no. of DME claims|| || ||-0.80||0.35g||-0.81||0.36g|
|Any prescription drug dispensing|| || || || ||-1.08||0.43g|
|Model Performance|| || || || |
|C statistic (95% CI)||0.66 (0.61-0.71)||0.75 (0.71-0.81)||0.76 (0.72-0.81)|
|R2 (95% CI)||0.07 (0.03-0.11)||0.19 (0.13-0.26)||0.20 (0.15-0.28)|
|Predictive threshold = 0.50|
| Sensitivity (95% CI)||0.88 (0.78-1.00)||0.90 (0.84-0.94)||0.88 (0.84-0.93)|
| Specificity (95% CI)||0.20 (0.00-0.40)||0.43 (0.31-0.55)||0.45 (0.34-0.56)|
| False positive (95% CI)||0.32 (0.28-0.37)||0.25 (0.21-0.29)||0.25 (0.20-0.29)|
| False negative (95% CI)||0.54 (0.29-0.87)||0.31 (0.24-0.40)||0.34 (0.25-0.40)|
|Predictive threshold = 0.60|
| Sensitivity (95% CI)||0.72 (0.61-0.83)||0.79 (0.72-0.87)||0.79 (0.72-0.87)|
| Specificity (95% CI)||0.43 (0.27-0.65)||0.53 (0.45-0.66)||0.55 (0.47-0.67)|
| False positive (95% CI)||0.29 (0.23-0.33)||0.24 (0.18-0.27)||0.23 (0.18-0.26)|
| False negative (95% CI)||0.55 (0.46-0.62)||0.43 (0.34-0.49)||0.42 (0.34-0.48)|
|Predictive threshold = 0.70|
| Sensitivity (95% CI)||0.49 (0.33-0.65)||0.61 (0.52-0.74)||0.64 (0.53-0.75)|
| Specificity (95% CI)||0.75 (0.60-0.88)||0.69 (0.60-0.79)||0.69 (0.61-0.79)|
| False positive (95% CI)||0.21 (0.16-0.26)||0.21 (0.16-0.25)||0.20 (0.15-0.25)|
| False negative (95% CI)||0.56 (0.51-0.62)||0.52 (0.44-0.57)||0.50 (0.43-0.56)|
|PS (Post-2009): Good/Poorh||Model 1b||Model 2c||Model 3d|
|Age at diagnosis, y||-0.04||0.01e||-0.04||0.02f||-0.04||0.02f|
|AJCC stage IV disease||-0.64||0.23e||-0.55||0.26f||-0.57||0.26f|
|Chronic pulmonary disease|| || ||-0.88||0.27e||-0.83||0.27e|
|No. of IP stays|| || ||-0.80||0.17e||-0.81||0.17e|
|Any no. of OP visits|| || ||1.08||0.29e||1.20||0.30e|
|No. of ED visits|| || ||-0.25||0.11f||-0.25||0.11f|
|Any no. of DME claims|| || ||-0.60||0.36g||-0.61||0.36g|
|Any prescription drug dispensing|| || || || ||-0.98||0.50g|
|C statistic (95% CI)||0.64 (0.58-0.70)||0.78 (0.74-0.84)||0.78 (0.74-0.85)|
|R2 (95% CI)||0.04 (0.01-0.08)||0.19 (0.13-0.28)||0.20 (0.14-0.28)|
|Predictive threshold = 0.50|
| Sensitivity (95% CI)||1.00 (0.98-1.00)||0.94 (0.91-0.97)||0.94 (0.91-0.97)|
| Specificity (95% CI)||0.00 (0.00-0.01)||0.28 (0.19-0.47)||0.32 (0.21-0.50)|
| False positive (95% CI)||0.24 (0.20-0.28)||0.19 (0.15-0.22)||0.19 (0.14-0.22)|
| False negative (95% CI)||1.00 (0.98-1.00)||0.40 (0.23-0.48)||0.38 (0.21-0.46)|
|Predictive threshold = 0.60|
| Sensitivity (95% CI)||0.96 (0.87-1.00)||0.90 (0.86-0.94)||0.90 (0.86-0.94)|
| Specificity (95% CI)||0.06 (0.00-0.27)||0.42 (0.29-0.59)||0.45 (0.32-0.60)|
| False positive (95% CI)||0.24 (0.20-0.27)||0.17 (0.13-0.20)||0.16 (0.12-0.20)|
| False negative (95% CI)||0.71 (0.37-1.00)||0.42 (0.32-0.52)||0.42 (0.32-0.51)|
|Predictive threshold = 0.70|
| Sensitivity (95% CI)||0.76 (0.62-0.91)||0.83 (0.78-0.99)||0.83 (0.76-0.88)|
| Specificity (95% CI)||0.37 (0.13-0.63)||0.56 (0.22-0.72)||0.55 (0.46-0.70)|
| False positive (95% CI)||0.21 (0.16-0.24)||0.14 (0.11-0.21)||0.15 (0.10-0.17)|
| False negative (95% CI)||0.67 (0.55-0.78)||0.49 (0.28-0.58)||0.50 (0.41-0.57)|
Statistical performance improved with the inclusion of additional explanatory variables (Table 4). Cross-validated c and R2 values were never more than 0.01 smaller than fitted values. By using a predictive threshold of 0.50, a high sensitivity (0.88 or 0.94, depending on how “good” PS was defined) was obtained with the best model (Model 3), but with moderate specificity (0.45 or 0.32). Increasing the predictive threshold to 0.70 continued to yield relatively high sensitivity (0.64 or 0.83) and more moderate specificity (0.69 or 0.55), regardless of how “good” PS is defined.
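The “False positive” and “False negative” rows of Table 4 are not defined in the text. The arithmetic sketched below suggests that they are consistent with the share of predicted-“good” patients who actually had “poor” PS and the share of predicted-“poor” patients who actually had “good” PS, respectively, rather than with 1 minus specificity or 1 minus sensitivity. This reading is our inference from the reported figures, and the function name is hypothetical; the group sizes of 290 “good” and 152 “poor” are the pre-2009 counts reported earlier.

```python
def predicted_group_error_shares(sensitivity, specificity, n_good, n_poor):
    """From sensitivity/specificity for detecting "good" PS and the group
    sizes, return (share of predicted-good who are poor,
                   share of predicted-poor who are good)."""
    true_pos = sensitivity * n_good    # good PS, predicted good
    false_neg = n_good - true_pos      # good PS, predicted poor
    true_neg = specificity * n_poor    # poor PS, predicted poor
    false_pos = n_poor - true_neg      # poor PS, predicted good
    return (false_pos / (true_pos + false_pos),
            false_neg / (true_neg + false_neg))

# Model 3, pre-2009 definition, predictive threshold of 0.50.
fp_share, fn_share = predicted_group_error_shares(0.88, 0.45, 290, 152)
```

With a sensitivity of 0.88 and a specificity of 0.45, the 2 shares round to 0.25 and 0.34, matching the Model 3 entries at the 0.50 threshold; with 0.64 and 0.69 (threshold of 0.70), they round to 0.20 and 0.50, again matching the table.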
Table 5 shows the actual and predicted “good” PS rates for patients within each of the 10 deciles. As measured by the Hosmer-Lemeshow chi-square statistic,20 all models had good calibration, in which actual and predicted rates within each of the 10 deciles were not significantly different (P = .69, P = .32, and P = .13 for Models 1-3, respectively) when a PS of 2 was defined as “poor” and likewise not significantly different (P = .92, P = .63, and P = .98 for Models 1-3, respectively) when a PS of 2 was defined as “good.” Model discrimination was also improved with the inclusion of more explanatory variables.
Table 5. Number of Patients With Observed (Predicted) Good PS by Model and Model-Determined Decilea
|Rank Deciles||No. With Observed (Predicted) Good PS (Pre-2009)|
| ||Model 1||Model 2||Model 3|
|1||23 (21)||10 (9)||8 (8)|
|2||23 (22)||13 (18)||14 (18)|
|3||25 (24)||29 (24)||28 (23)|
|4||26 (27)||29 (27)||27 (27)|
|5||23 (27)||31 (30)||32 (30)|
|6||28 (31)||33 (32)||37 (33)|
|7||26 (29)||34 (34)||31 (36)|
|8||37 (34)||33 (36)||34 (37)|
|9||39 (37)||37 (40)||38 (39)|
|10||40 (38)||41 (40)||41 (39)|
|Rank Deciles||No. With Observed (Predicted) Good PS (Post-2009)f|
| ||Model 1c||Model 2d||Model 3e|
|1||28 (28)||12 (12)||10 (12)|
|2||29 (30)||22 (24)||24 (24)|
|3||34 (33)||32 (30)||32 (30)|
|4||37 (36)||33 (33)||34 (33)|
|5||32 (34)||38 (36)||36 (36)|
|6||34 (35)||40 (38)||37 (37)|
|7||39 (37)||35 (38)||39 (39)|
|8||42 (42)||42 (40)||39 (40)|
|9||36 (37)||41 (42)||41 (42)|
|10||25 (23)||41 (41)||44 (43)|
DISCUSSION
Among a contemporary cohort of patients with stage II through IV lung cancer, we found explicit medical record documentation of PS less than half the time (47%). Review of nursing and physician notes led PS to be determinable via medical records approximately 80% of the time. Given the central role that PS plays in clinical decision-making among patients with lung cancer, the lack of consistent medical record documentation is troubling. When documented, we found the distribution of PS among the cohort (34% with “poor” PS [when a PS of 2 was considered as having “poor” PS]) to be identical to the 34% with “poor” PS reported by Lilenbaum et al in contemporary clinical studies.14
We found that “poor” PS among lung cancer patients with stage II through IV disease can be predicted reasonably well regardless of whether a PS of 2 is considered “good” or “poor.” Furthermore, this was true regardless of the level of comprehensiveness of the data used, but particularly for models that used information routinely available in medical claims data or medical and pharmaceutical claims data combined, in which the c statistics were all >0.70. Although the inclusion of information routinely available in medical claims data marginally improved model fit and predictive accuracy when compared with a model fit using only data available in tumor registries, the inclusion of information from pharmaceutical claims data did not appear to substantively alter model fit, regardless of how “good” PS is defined.
To our knowledge, this is the first study to use observational data to estimate PS for lung, or any other, cancer patients. As such, these findings represent a significant contribution to the field. These findings are important for our ability to monitor quality of care and the appropriateness of chemotherapy, and our ability to prospectively identify patients who may be appropriate (but not targeted) for clinical trial or palliative care/hospice enrollment without relying on expensive and time-consuming primary data collection methods. Predictive models such as those presented herein that rely on data routinely available within large, observational databases can also be used to augment comparative effectiveness research, including comparisons of different chemotherapy regimens as well as the receipt of chemotherapy versus non-chemotherapy treatment and thereby greatly enhance the capabilities of existing electronic databases such as that available via Surveillance, Epidemiology, and End Results (SEER)-Medicare data.
Although our findings of significant differences in chemotherapy receipt by “good” versus “poor” PS add face validity to the accuracy of the PS score abstracted from the medical record, the finding that approximately 42% of patients with medical record-documented “poor” PS received chemotherapy in the year after diagnosis highlights the importance of attempts such as ours to make documented PS or PS proxies more readily available to those who monitor and study cancer care quality and outcomes. At the time of this study, national clinical practice guidelines for patients with non-small cell lung cancer unequivocally recommended chemotherapy for patients with a PS of 0 or 1.1, 3 These guidelines suggested that chemotherapy might “possibly” be of benefit in patients with a PS of 2, noting that those patients had been excluded from clinical trials. This was in keeping with expert opinions of the time.21 More recent data have shown survival and quality of life benefits for patients with a PS of 2, although less than with “good” PS, and the most recent ASCO guidelines are more supportive of chemotherapy for patients with a PS of 2.22 Routine chemotherapy among lung cancer patients with a PS ≥3 continues to not be recommended by any national professional organization. Chemotherapy use in patients with little chance of benefit and more chance of toxicity may delay discussion about prognosis and dying,23 which may lead to further poor quality of care, such as the inappropriate use of mechanical ventilation or delays in referral to hospice, worse surviving caregiver quality of life, and high end-of-life care costs.24 Without PS proxies, little can be done to use automated data sources to monitor and measure either the underuse or overuse of chemotherapy and its implications on patient and economic outcomes.
The results of the current study should be interpreted in light of the following limitations. First, subjectivity is present in the assignment of PS. Even when assessed by a healthcare professional, PS scales are subjective in nature25 and when estimated by physicians are known to be prone to error,26 usually being overestimated.14 Thus, even if our model were 100% accurate, caution would have to be used in interpreting results dependent on an accurate classification of PS. Nonetheless, the ability to develop a useful proxy measure of PS from existing observational data will help in the use of existing national data resources such as that available with SEER-Medicare data for comparative effectiveness research. Second, our models were developed on a relatively small sample and one that is specific to 1 delivery system. Therefore, not only should care be taken when generalizing findings, but our parsimonious models may exclude important predictors of PS available in observational data. Finally, identifying patients with “poor” PS by their diagnoses and use of care via claims data poses its own limitations. For example, DME use varies significantly based on differing personal preferences and practices in addition to restrictions on reimbursement by public and private insurers. Although claims for DME offer useful information, they identify only selected people with potentially disabling conditions.27 The same is true of medical diagnoses, many of which are known to be undercaptured in medical claims data, and prescription drug dispensing, which reflects only those medications prescribed by physicians that the patient elected to fill. Nevertheless, the ability to proxy PS is critical to the ability to use observational data to accurately draw conclusions regarding comparative effectiveness and cancer care quality at a population level if not at the bedside.
Despite these limitations, results from the current study shed new light on the capacity of information routinely available in observational data to identify lung cancer patients with “good” versus “poor” PS. This is especially useful for researchers interested in leveraging existing observational databases for comparative effectiveness research. Recent studies have highlighted a likely overuse of chemotherapy in the treatment of patients with lung cancer as well as aggressive treatment near the end of life.28-30 Using a predictive model such as the one developed herein with a threshold of 0.70 to proxy a patient as having “poor” PS would ensure reasonably high specificity (0.69 if a PS of 2 is considered “poor”) and thereby enable the identification of a population for whom the receipt of chemotherapy appears inadvisable, or requires a more tailored discussion of less benefit and more risk per current guideline recommendations, and for whom early hospice intervention may be warranted. Conversely, using a lower predictive threshold (0.50), and thereby increasing the sensitivity of the predictive model, may be useful to health disparities researchers, whose interest might be in testing a hypothesis centered on undertreatment among minority populations. Similarly, choosing a predictive threshold with a high sensitivity could facilitate population identification for observational comparative effectiveness research. The best selection of both a predictive threshold and the allocation of patients with a PS of 2 will ultimately depend on the user's objectives.
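The trade-off between sensitivity and specificity as the prediction cutoff moves from 0.50 to 0.70 can be illustrated with a minimal Python sketch. It is not the study's model or data: the function name `sensitivity_specificity`, the predicted probabilities, and the labels are all hypothetical and were made up for illustration, and a patient is assumed to be classified as “good” PS when the predicted probability of good PS meets or exceeds the cutoff.

```python
def sensitivity_specificity(probs, labels, cutoff):
    """Return (sensitivity, specificity) for predicting "good" PS.

    probs  -- predicted probabilities of good PS (hypothetical values)
    labels -- 1 for documented good PS, 0 for documented poor PS
    cutoff -- a patient is called "good" PS when prob >= cutoff
    """
    tp = fn = tn = fp = 0
    for p, y in zip(probs, labels):
        predicted_good = p >= cutoff
        if y == 1:
            if predicted_good:
                tp += 1  # good PS correctly predicted as good
            else:
                fn += 1  # good PS missed
        else:
            if predicted_good:
                fp += 1  # poor PS wrongly predicted as good
            else:
                tn += 1  # poor PS correctly predicted as poor
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Made-up illustrative values, NOT data from the study.
probs = [0.95, 0.85, 0.75, 0.65, 0.55, 0.80, 0.60, 0.45, 0.30, 0.20]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

for cutoff in (0.50, 0.70):
    sens, spec = sensitivity_specificity(probs, labels, cutoff)
    print(f"cutoff={cutoff:.2f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

In this toy example, raising the cutoff lowers sensitivity and raises specificity, which is the same direction of change reported for the study models. A user would choose the cutoff according to whether missed “good” PS patients or misclassified “poor” PS patients are the greater concern for the question at hand.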
PS has long been considered one of the strongest prognostic factors31 and is used today by clinicians to assess the appropriateness of chemotherapy and regimen choice for patients with lung cancer.22 With the aging population, the number of Americans with functional limitations will increase dramatically, and therefore the urgency to capture and classify information regarding functional status will grow.32 Furthermore, given the current challenges faced by the US healthcare system to deliver better and more cost-effective outcomes, the importance of comparative effectiveness studies is likely to only grow. To the best of our knowledge, the results of the current study are the first to provide health services researchers and others with a viable tool with which to predict PS among lung cancer patients using information routinely available in observational data. As such, the value of observational data for comparative effectiveness research and for use by those interested in understanding cancer care quality or targeting specific lung cancer patients for possible inclusion in clinical trials, hospice care, or other interventions is greatly enhanced.
We thank Elizabeth Dobie and Nonna Akkerman for their assistance with data acquisition.
CONFLICT OF INTEREST DISCLOSURES
Supported by the Fund for Henry Ford and the National Cancer Institute, National Institutes of Health under grant (NIH R01 CA114204-03).