Populations with low income, economic barriers, and cultural and/or linguistic access barriers to medical care are at risk for worse cancer-related outcomes. Medically underserved patients with hepatocellular carcinoma (HCC) have decreased survival compared with those in the Surveillance, Epidemiology, and End Results database. Given this suboptimal outcome, the high cost of HCC treatment, and unknown risk-to-benefit ratios of invasive therapies, the authors sought to identify a predictive model of extremely poor overall survival (OS).
A retrospective review of an institutional HCC database was conducted. Payor status, race, treatment, clinicopathologic, and outcome parameters were recorded. The primary outcome was OS <1 month. A logistic regression model predictive of OS <1 month was developed using backward, stepwise elimination and bootstrapping techniques.
In total, 337 patients with HCC (272 men and 65 women) were identified. Only 4% of patients had Medicare coverage, whereas 96% relied on publicly funded, safety-net health programs. OS <1 month was noted in 90 patients (26.7%). There were no differences in race or sex between patients who had an OS <1 month and those with an OS >1 month. A higher percentage of patients who had an OS <1 month had advanced stage disease and did not receive therapy for HCC. Advanced liver disease, as measured by laboratory parameters and composite scores (Child-Pugh and Model for End-Stage Liver Disease [MELD]), alpha fetoprotein level, creatinine level, disease stage, and lack of treatment were predictors of OS <1 month.
Hepatocellular carcinoma (HCC) is the fifth most common cancer in the world and is the leading cause of cancer-related mortality in many areas.1-3 With the prevalence of HCC expected to increase in the United States over the next 20 years, approximately 2% of the urban population may be at risk for HCC.4, 5 These epidemiologic statistics are ominous, because recent data from the Surveillance, Epidemiology, and End Results (SEER) registry demonstrated that the median survival for patients with HCC in the United States is only 7 months.6 For those who lack access to routine medical care, survival is worse. According to the US Department of Health and Human Services definition, a population is designated medically underserved if it has economic barriers (low-income or Medicaid-eligible) or cultural and/or linguistic access barriers to primary medical care services.7 Our institution recently demonstrated that the median survival for medically underserved patients with HCC was only 3.4 months.8 Combined with the finding that few effective treatment options currently exist, this malignancy is likely to remain a significant source of cancer-related mortality, especially for those without regular medical care.
Prognosis is central to the practice of medicine and often is used to direct diagnostic pathways and to inform patient treatments.9 In oncology, prognostic markers are clinical measures used in predictive models to help elicit an individual patient's risk of a future outcome, such as disease recurrence after treatment or death.10 Prognostic models for survival or time-to-event data are used increasingly in medical care and clinical research. These models may facilitate individual treatment choice and aid in patient counseling.11, 12 However, prognostic models based specifically on time-to-event data are limited in that they cannot provide an estimated duration of survival or time to the event.13 This is because the dependent variable in these models is a hazard ratio. Other regression models, such as a simple linear or logistic regression, do not suffer from these inherent methodological limitations. Despite active investigation, only a few prognostic models have been incorporated into widespread clinical use, such as the Acute Physiology and Chronic Health Evaluation II (APACHE II) score and the Model for End-Stage Liver Disease (MELD) score.14, 15 Considering the relatively poor survival observed in medically underserved patients with HCC, we hypothesized that we could develop an exploratory prognostic model that would predict the probability of 30-day or 1-month mortality in a high-risk patient population.
MATERIALS AND METHODS
From an Institutional Review Board-approved, prospective, gastrointestinal cancer database all patients with a diagnosis of HCC who received treatment in the public hospital system in Houston, Texas from 1998 to 2010 were reviewed. Diagnosis was made by 1 of 3 methods: pathologic confirmation, radiographic evidence of a liver mass and alpha fetoprotein (AFP) level ≥400 ng/mL, or a progressively enlarging liver mass in the setting of cirrhosis in which the clinical picture is consistent with HCC. Standard patient demographics (age, sex, race) were recorded. Laboratory values at diagnosis were recorded (AFP, albumin, international normalized ratio [INR], total bilirubin, platelet count, model for end-stage liver disease [MELD] score, hepatitis B virus status, and hepatitis C virus status). Clinicopathologic data (stage at diagnosis, treatment course, and survival) were recorded. Patients with American Joint Committee on Cancer (AJCC) stage I and II disease were included in the same group as those with localized disease. Patients who had AJCC stage III and IV disease were categorized with regional disease and metastatic disease, respectively. The MELD score was calculated using the following formula: MELD = 3.78(Ln serum bilirubin[mg/dL]) + 11.2(Ln INR) + 9.57(Ln serum creatinine[mg/dL]) + 6.43. The MELD score is a reliable measure of mortality risk in patients with end-stage liver disease and is suitable for use as a disease severity index to determine organ allocation priorities.16
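The MELD formula stated above can be written as a small function. This is a minimal sketch in Python that implements the formula exactly as given in the text; the function name `meld_score` and its parameter names are illustrative assumptions, and any floors or caps on laboratory values that some implementations apply are not described in the text and are therefore omitted.

```python
import math


def meld_score(bilirubin_mg_dl: float, inr: float, creatinine_mg_dl: float) -> float:
    """Compute MELD from the formula in the text (natural logarithms throughout)."""
    for name, value in (
        ("bilirubin", bilirubin_mg_dl),
        ("INR", inr),
        ("creatinine", creatinine_mg_dl),
    ):
        if value <= 0:
            raise ValueError(f"{name} must be positive to take its logarithm")
    return (
        3.78 * math.log(bilirubin_mg_dl)
        + 11.2 * math.log(inr)
        + 9.57 * math.log(creatinine_mg_dl)
        + 6.43
    )
```

When all three laboratory values equal 1, every logarithm is zero and the score reduces to the constant term, 6.43.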
Treatment was determined by a combination of data from the institutional tumor registry and the electronic medical record. Liver transplantation is not offered in the public hospital system, and ablative procedures (radiofrequency or microwave) were not performed at the time of this study. The primary modalities of treatment were investigational systemic chemotherapy, transarterial chemoembolization, and surgical resection. Survival was determined as the difference between the date of diagnosis and the date of death confirmed by the institutional tumor registry. Overall survival (OS) from the date of diagnosis to the date of death that was <30 days was coded as a dichotomous variable (0 or 1) termed OS <1 month. This time interval was chosen for several reasons. First, it is a clinically relevant time point, because 1 month should be easily understandable by providers, patients, and family members. Second, 30-day mortality is a common time frame used as a hospital quality measure for other diagnoses, such as acute myocardial infarction, pneumonia, and congestive heart failure.17, 18
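The dichotomous outcome described above can be sketched as follows. The function name `os_under_1_month` is hypothetical, and the handling of patients without a recorded death date (coded as 0 here) is an assumption made for illustration; the text does not describe how such patients were treated.

```python
from datetime import date
from typing import Optional


def os_under_1_month(diagnosis: date, death: Optional[date], window_days: int = 30) -> int:
    """Return 1 if death occurred fewer than `window_days` after diagnosis, else 0."""
    if death is None:
        # Assumption: no recorded death is coded as not meeting the outcome.
        return 0
    survival_days = (death - diagnosis).days
    if survival_days < 0:
        raise ValueError("date of death precedes date of diagnosis")
    return 1 if survival_days < window_days else 0
```

A patient who dies exactly 30 days after diagnosis is coded 0 under this sketch, because the text defines the outcome as survival of fewer than 30 days.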
Univariate analyses were completed with Student t tests for continuous variables and chi-square analyses for categorical variables. Survival analysis was performed using the Kaplan-Meier method with log-rank tests, and significance was defined as P < .05. Logistic regression models (univariate and multivariate) were used to determine the association between potential predictor variables and the primary outcome (OS <1 month). The final multivariate model was created using the backward, stepwise method of covariate elimination. The backward, stepwise procedure was used to include only significant covariates (P < .05) in the final multivariate model.19 In a relatively large patient population, this liberal P value excludes only those characteristics with low predictive value.20 The backward elimination scheme is preferred to forward selection because it starts with a full model and considers a wider range of possible best models.21 In addition, backward elimination is a better method when variables are possibly correlated. The significance of the exploratory model was tested using the likelihood-ratio test to determine whether the model was significantly better with additional predictor variables. Standard tests for the regression assumptions were performed. The collinearity of predictor variables was assessed by using STATA 12 software (Stata Corporation, College Station, Tex), in which the default automatically excluded 1 variable from a pair that displayed perfect collinearity. If this was not detected by STATA, then collinearity diagnostic tests (variance inflation factors) were performed on the predictor variables. The final model was based on the following equation: $\text{logit} = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k$. The subsequent predicted probabilities were determined using the equation: $\text{probability} = 1/(1 + e^{-\text{logit}})$.
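The two equations at the end of the preceding paragraph, the linear predictor (logit) and its conversion to a probability, can be illustrated with a short Python sketch. The function names are assumptions introduced for illustration, and no fitted coefficients from the study are used.

```python
import math
from typing import Sequence


def linear_predictor(intercept: float, coefficients: Sequence[float],
                     covariates: Sequence[float]) -> float:
    """logit = b0 + b1*x1 + ... + bk*xk."""
    if len(coefficients) != len(covariates):
        raise ValueError("one coefficient is required per covariate")
    return intercept + sum(b * x for b, x in zip(coefficients, covariates))


def predicted_probability(logit_value: float) -> float:
    """probability = 1 / (1 + e^(-logit)), evaluated without overflow."""
    if logit_value >= 0:
        return 1.0 / (1.0 + math.exp(-logit_value))
    z = math.exp(logit_value)  # equivalent form, safe for large negative logits
    return z / (1.0 + z)
```

A logit of 0 corresponds to a probability of 0.5; positive logits give probabilities above 0.5 and negative logits give probabilities below it.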
Validations of Predicted Probabilities
Because the logit is the natural log of the odds (or probability/[1 − probability]), it can be transformed back to the probability scale. Then, the resultant predicted probabilities can be revalidated with the actual outcome to determine whether high probabilities indeed are associated with events and low probabilities are associated with nonevents. The degree to which predicted probabilities agree with actual outcomes is expressed as a measure of association. To test the discriminative ability of the prognostic model, we performed bootstrap resampling with calculation of the Somers D statistic. The bootstrap functions to use the data from our study cohort as a “surrogate population” to approximate the sampling distribution of Somers D; ie, to resample (with replacement) from the sample data at hand and create a large number of “test” or bootstrap samples. The sample summary was then computed on each of the bootstrap samples. The Somers D statistic measures the strength and direction of relation between pairs of variables. Its values range from −1.0 (all pairs disagree) to 1.0 (all pairs agree). It is defined as $(n_c - n_d)/t$, where $n_c$ is the number of pairs that are concordant, $n_d$ is the number of pairs that are discordant, and $t$ is the total number of pairs with different responses.22-24
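The Somers D definition and the bootstrap idea described above can be sketched in Python as follows. This is an illustrative simplification, not the procedure run in STATA: it resamples pairs of predicted probability and observed outcome rather than refitting the regression in each bootstrap sample, and the function names are assumptions.

```python
import random
import statistics
from typing import List, Sequence, Tuple


def somers_d(probabilities: Sequence[float], outcomes: Sequence[int]) -> float:
    """(concordant - discordant) / total pairs with different responses."""
    events = [p for p, y in zip(probabilities, outcomes) if y == 1]
    nonevents = [p for p, y in zip(probabilities, outcomes) if y == 0]
    total_pairs = len(events) * len(nonevents)
    if total_pairs == 0:
        raise ValueError("need at least one event and one nonevent")
    concordant = sum(1 for e in events for n in nonevents if e > n)
    discordant = sum(1 for e in events for n in nonevents if e < n)
    return (concordant - discordant) / total_pairs


def bootstrap_somers_d(probabilities: Sequence[float], outcomes: Sequence[int],
                       repetitions: int = 1000, seed: int = 0) -> Tuple[float, float]:
    """Return (mean, standard deviation) of Somers D over bootstrap resamples."""
    rng = random.Random(seed)
    n = len(probabilities)
    estimates: List[float] = []
    for _ in range(repetitions):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        ys = [outcomes[i] for i in idx]
        if 0 < sum(ys) < len(ys):  # skip resamples lacking events or nonevents
            estimates.append(somers_d([probabilities[i] for i in idx], ys))
    if len(estimates) < 2:
        raise ValueError("too few usable bootstrap samples")
    return statistics.mean(estimates), statistics.stdev(estimates)
```

The standard deviation of the bootstrap estimates serves as the bootstrap standard error of the statistic.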
A second method of model discrimination was performed using receiver operating characteristic (ROC) curves. ROC analysis quantifies the accuracy of diagnostic tests or other evaluation modalities used to discriminate between 2 states or conditions.25 The discriminatory accuracy of a diagnostic test is measured by its ability to correctly classify known normal and abnormal subjects. ROC curve analysis generates a graph of the sensitivity versus 1 − specificity of the diagnostic test. The sensitivity is the fraction of positive cases that are correctly classified by the diagnostic test, whereas the specificity is the fraction of negative cases that are correctly classified. Thus, the sensitivity is the true-positive rate, and the specificity is the true-negative rate. The global performance of a diagnostic test is commonly summarized as the area under the ROC curve (AUC). This area can be interpreted as the probability that the result of a diagnostic test of a randomly selected, abnormal individual will be greater than the result of the same diagnostic test from a randomly selected, normal individual. An AUC equal to 0.5 denotes there is no predictive ability of the test, whereas an AUC equal to 1.0 implies a perfect predicted probability of the test under consideration.
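The probabilistic interpretation of the AUC given above translates directly into a pairwise computation. The following Python sketch is illustrative only: the function names are assumptions, tied pairs are counted as one half (a common convention that the text does not specify), and a case is classified as positive when its predicted probability is at or above the chosen threshold.

```python
from typing import Sequence, Tuple


def roc_auc(probabilities: Sequence[float], outcomes: Sequence[int]) -> float:
    """Probability that a random event scores higher than a random nonevent."""
    events = [p for p, y in zip(probabilities, outcomes) if y == 1]
    nonevents = [p for p, y in zip(probabilities, outcomes) if y == 0]
    if not events or not nonevents:
        raise ValueError("need at least one event and one nonevent")
    wins = sum(
        1.0 if e > n else 0.5 if e == n else 0.0  # ties count as one half
        for e in events
        for n in nonevents
    )
    return wins / (len(events) * len(nonevents))


def sensitivity_specificity(probabilities: Sequence[float], outcomes: Sequence[int],
                            threshold: float) -> Tuple[float, float]:
    """True-positive rate and true-negative rate at a single cutoff."""
    tp = sum(1 for p, y in zip(probabilities, outcomes) if y == 1 and p >= threshold)
    fn = sum(1 for p, y in zip(probabilities, outcomes) if y == 1 and p < threshold)
    tn = sum(1 for p, y in zip(probabilities, outcomes) if y == 0 and p < threshold)
    fp = sum(1 for p, y in zip(probabilities, outcomes) if y == 0 and p >= threshold)
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one event and one nonevent")
    return tp / (tp + fn), tn / (tn + fp)
```

Sweeping the threshold from 1 down to 0 and plotting sensitivity against 1 − specificity traces the ROC curve whose area `roc_auc` summarizes.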
To identify the predicted probability and further define any statistical interaction associated with important covariates in the logistic regression model, margins of responses were obtained. The margin is a postestimation statistic based on a fitted model calculated over a data set in which some or all of the covariates are fixed at different values.26 The predictive margins command answers the question, “What does my model have to say about X group of patients, where Y factor or variable may be fixed at a certain Z level?” All statistical analyses were performed using the STATA 12 software package (Stata Corporation).
In total, 337 patients (272 men and 65 women) who were diagnosed with HCC from 1998 to 2010 were identified for this study, including 90 patients (26.7%) who survived for <1 month (Table 1). The overwhelming majority of patients (96%) had no insurance and relied completely on local public funding, whereas only 4% of patients had Medicare. The majority of patients were either African American or Hispanic, and there were no racial or sex differences between the 2 groups. At the time of clinical presentation and diagnosis of HCC, there were differences in the laboratory values indicative of cirrhosis and end-stage liver disease (ESLD) between groups. To evaluate baseline cirrhosis between study groups, 2 commonly used scoring systems for assessing the prognosis of patients with ESLD were calculated. Child-Pugh and MELD scores were significantly higher in the OS <1 month group. On univariate analysis, as the MELD score increased, the probability of OS <1 month also increased (Fig. 1). Patients in the OS <1 month group were more likely to have regional and metastatic disease (P = .005; chi-square test) and were less likely to have received any type of HCC therapy (6% vs 27%; P < .001) (Table 2).
P values are for the chi-square test statistic for differences in stage at presentation, treatment, and no treatment, respectively.
A multivariate logistic regression was performed to determine predictors of OS <1 month. Initial univariate results are provided in Table 3. Both the MELD score and the Child-Pugh score were predictive of OS <1 month; however, the likelihood ratio test indicated that the MELD score was a better fit to the regression model than the Child-Pugh score. The initial model included age, AFP, albumin, MELD, AJCC stage, and treatment. Analysis of the model yielded a specification error indicating that a significant variable probably was omitted. To further evaluate the model, we tested several of the standard model-building assumptions. A standard assumption is that all significant predictor variables are independent of each other. We tested the covariates for all plausible interactions and observed that an interaction term between MELD score and albumin level was significant. Including this interaction term in the model eliminated the specification error and improved the overall fit. Collinearity of these predictor variables was excluded by assessing the variance inflation factor.
Table 3. Univariate and Multivariate Analysis of Variables Predictive of Overall Survival <1 Month
Abbreviations: AFP, alpha fetoprotein; INR, international normalized ratio; CI, confidence interval; MELD, Model for End-Stage Liver Disease; OR, odds ratio.
Child-Pugh A is the referent value.
Localized stage is the referent value, and albumin*MELD is the interaction term.
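The likelihood-ratio comparison used when a single term, such as the MELD-by-albumin interaction, is added to a nested logistic model can be sketched as below. This is a minimal illustration limited to 1 degree of freedom, for which the chi-square tail probability has a closed form; the function name is an assumption, and the log-likelihoods would come from the fitted models, which are not reproduced here.

```python
import math
from typing import Tuple


def likelihood_ratio_test_1df(loglik_reduced: float, loglik_full: float) -> Tuple[float, float]:
    """Return (LR statistic, P value) for adding one parameter to a nested model."""
    statistic = 2.0 * (loglik_full - loglik_reduced)
    if statistic < -1e-9:
        raise ValueError("the full model cannot fit worse than the nested reduced model")
    statistic = max(statistic, 0.0)  # absorb tiny negative rounding error
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value
```

A small P value indicates that the added term improves the fit by more than would be expected by chance.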
The interaction between MELD score and albumin level at diagnosis implies a complex relation. More specifically, the relation between 1 covariate and the dependent variable varies across the values of another covariate. To further investigate this statistical interaction, we used our logistic regression model to calculate the predictive margin of the dependent variable (OS <1 month) at fixed values of MELD score and albumin level, while all other covariates were fixed at their mean values (Table 4). The MELD score was fixed at 10, 20, 30, and 40, and serum albumin was fixed at 2.0 g/dL, 2.5 g/dL, and 3.0 g/dL. This model demonstrated that a patient had a 16% probability of OS <1 month if their MELD score was 10 and their albumin was 3.0 g/dL, whereas a MELD score of 40 and a serum albumin level of 2.0 g/dL predicted a 93% probability of OS <1 month.
Table 4. Predicted Margins of the Probability of Survival for <1 Month Based on Combinations of Covariates
Columns: Percent Probability of OS <1 Month (95% CI) at Serum Albumin 2.0 g/dL, 2.5 g/dL, and 3.0 g/dL. Surviving entries (row and column not recoverable): 0.308 (−0.32 to 0.649); 0.395 (−0.17 to 0.960).
Abbreviations: CI, confidence interval; MELD, Model for End-Stage Liver Disease; OS, overall survival.
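The way a predictive margin is obtained from a logistic model with a MELD-by-albumin interaction can be sketched as follows. The coefficients below are hypothetical values chosen only so that the example runs; they are not the fitted coefficients from this study, and the function names and the `other_terms_at_means` argument (the summed contribution of all remaining covariates held at their means) are assumptions.

```python
import math
from typing import Dict, Sequence, Tuple

# Hypothetical coefficients for illustration only (NOT the study's fitted values).
EXAMPLE_COEFFICIENTS: Dict[str, float] = {
    "intercept": -2.0,
    "meld": 0.10,
    "albumin": -0.50,
    "meld_x_albumin": -0.01,
}


def predicted_margin(meld: float, albumin: float, coefficients: Dict[str, float],
                     other_terms_at_means: float = 0.0) -> float:
    """Predicted probability at fixed MELD and albumin, other covariates at their means."""
    logit = (
        coefficients["intercept"]
        + coefficients["meld"] * meld
        + coefficients["albumin"] * albumin
        + coefficients["meld_x_albumin"] * meld * albumin  # interaction term
        + other_terms_at_means
    )
    if logit >= 0:
        return 1.0 / (1.0 + math.exp(-logit))
    z = math.exp(logit)
    return z / (1.0 + z)


def margin_grid(meld_values: Sequence[float], albumin_values: Sequence[float],
                coefficients: Dict[str, float]) -> Dict[Tuple[float, float], float]:
    """Margins for every combination of the fixed MELD and albumin values."""
    return {
        (m, a): predicted_margin(m, a, coefficients)
        for m in meld_values
        for a in albumin_values
    }


grid = margin_grid([10, 20, 30, 40], [2.0, 2.5, 3.0], EXAMPLE_COEFFICIENTS)
```

Because of the interaction term, the change in the logit per unit of MELD score is `meld + meld_x_albumin * albumin`, so the effect of MELD depends on the albumin level at which it is evaluated.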
To examine the discriminatory performance of the logistic regression model, we performed 2 methods of internal validation. First, we calculated the Somers D statistic from our model, and it was 0.669 (standard error, 0.0620; 95% confidence interval, 0.547-0.791). Then, we performed bootstrap sampling with random covariate replacement (1000 repetitions) and calculated the bootstrapped Somers D. Bootstrapping the data set yielded an estimated Somers D value of 0.693 with a standard error of 0.0610 (95% confidence interval, 0.573-0.812). This Somers D was significantly (P < .001) greater than zero (ie, rejection of the null hypothesis that there is no association between predicted outcome and actual outcome), which internally validated the model.
In addition, we generated ROC curves, and the AUC was calculated to demonstrate the discriminative performance of the model to predict (or test) OS <1 month. The AUC was lowest and least predictive for clinical stage (0.62). In a comparison of ESLD scoring systems, the AUC for MELD score (0.73) was higher than for Child-Pugh score (0.68), indicating that the MELD score is the more predictive ESLD scoring system for OS <1 month (Fig. 2a). In addition, as more covariates were added to the model, the AUC increased, and the final multivariate logistic regression model yielded the highest AUC (0.85) (Fig. 2b).
The median survival of medically underserved patients with HCC who attended this urban, safety-net hospital system remained significantly lower than the national average. In this population, a considerable proportion of patients failed to receive any sort of therapy for HCC. This finding may be explained by the high prevalence of ESLD, which precluded aggressive therapy. Compounding the problem is the lack of evidence-based guidelines for the optimal treatment of patients with ESLD and/or with advanced-stage HCC, the unavailability of treatment options like transplantation in a public hospital system, and a lack of predictive models to guide clinical decision-making in high-risk patient populations (ie, low socioeconomic groups). In this study, we examined the factors associated with poor survival in a disadvantaged patient population and demonstrated 2 interesting findings: 1) the interaction between MELD score and albumin level, and 2) the model predictive of OS <1 month.
The data from this study suggest that the primary risk factor for death in this cohort of patients with HCC is the severity of underlying liver disease, particularly among those with localized disease (AJCC stages I-II). To account for ESLD in our cohort, we observed that the MELD score was better than the Child-Pugh score for predicting the probability of OS <1 month. The MELD score initially was developed using a logistic regression model and has been used to predict the probability of death (within 90 days) for patients with ESLD awaiting liver transplantation.27 Serum albumin, a variable that is included in the Child-Pugh score, is not a component of the MELD score. We included the interaction term MELD*albumin in our model and observed that it improved the overall fit. We also calculated the predictive margins of MELD and albumin to illustrate their effect on the probability of OS <1 month. That is, together, a higher MELD score and a lower albumin level significantly increase the likelihood of OS <1 month. This interaction is plausible because patients with ESLD who had high MELD scores and concomitant low serum albumin levels could be expected to have a poor prognosis. Because both measurements are related to impaired liver function, an interaction between the 2 appears to be likely. Additional statistical analysis was performed to determine whether this correlation resulted from a true interaction and was not the result of collinearity. Including a collinear pair of variables will lead to an unstable model with a markedly elevated odds ratio and standard error. Neither was observed in our analysis.
These results are similar to those reported by other investigators, who noted a correlation between the MELD score and albumin levels for various outcomes in patients with HCC and/or ESLD.28-30 Herein, we developed a pilot model to predict the probability of the outcome of interest, OS <1 month, for patients with HCC. This differs from the regression model-building approach typically used in oncology. Most cancer-related regression models involve survival duration (time-to-event) and use the semiparametric Cox proportional hazards model, which measures the relation between predictor variables and survival. The Cox model does not make assumptions about the distribution of times to the event of interest, and the output of the Cox model is a hazard ratio, not a predicted or estimated time to the event. The equation for the Cox regression, as statistically defined, does not include an intercept term. The implication of the lack of an intercept term is that, from the regression output of a proportional hazards model, we cannot reconstruct group-specific hazard rates.31 Thus, only ratios can be estimated, and the output prevents using the regression equation to predict survival time. Exploratory prognostic survival models typically include predicted survival probabilities at specified time points in the period of observation.13 This is the same methodology that we used by specifically examining survival at 1 month.
Although the Cox proportional hazards model is widely used and is indispensable for clinicians and investigators, it does not provide the information that concerns most patients: an accurate estimate of survival time. Much as the APACHE II score predicts the probability of 30-day mortality after intensive care unit admission or the MELD score predicts the probability of death within 90 days for patients awaiting liver transplantation, we developed a logistic regression model to predict the probability of 30-day survival after the diagnosis of HCC in a high-risk population. We believe this effort has clinical utility in our patient population, which is at high risk for poor outcomes and for adverse events after aggressive therapies. Without an accurate means to estimate survival (ie, provide a prognosis), health professionals tend to err on the side of over-optimism with survival estimates.32 Unfortunately, this behavior likely leads to late referral to other specialties, such as palliative care/symptom control, and current evidence suggests that patients and families often regret being too optimistic about the patient's actual survival.33, 34 Moreover, using this information to develop prognostic or predictive models may improve clinicians' ability to determine which patients may receive more harm from high-risk, invasive procedures (hepatectomy or transarterial chemoembolization).
In this study of medically underserved patients with HCC, advanced cirrhosis appears to contribute significantly to poor median survival. For most cancers (breast, colorectal, prostate, etc), tumor stage usually dictates care, and most patients are eligible for some form of treatment. This often is not the case for HCC, because portal hypertension, cirrhosis, and thrombocytopenia limit treatment options.35 Because ESLD is a prominent feature in most patients with HCC, we believe that the approach outlined in this report is an important step toward giving providers and patients an accurate means to estimate risk and survival. In fact, our data set includes 2 perioperative mortalities, because these were viewed as the ultimate adverse events. One of our goals was to develop a model to predict risk that is likely to be a component of ESLD, disease progression, and poor tolerance to therapy. This measure of survival estimation also is important because the available guidelines for the treatment of patients who have HCC with ESLD, at best, are a suggestion regarding which patients should receive which treatments. These analyses are an attempt to put data into a meaningful clinical context.
There are several important limitations of this study that are related primarily to the retrospective nature of the data. In this study, an analysis of extremely poor survival and its association with important epidemiologic exposures and risk factors was limited by the nature of data collection. Other medical conditions (eg, hypertension, obesity, and diabetes mellitus) and comorbidities (eg, cigarette smoking measured in pack-years, quantity of daily alcohol or illicit drug consumption) also may contribute to poor survival. Our retrospective database lacked these fine details. Additional prospective studies will be necessary to determine the reasons for lack of treatment and survival disparities in this patient population. Prospective studies also may identify reasons for the lack of early detection, nonreferral to surgeons by primary care physicians, lack of availability of advanced treatments like transplantation, associations of major comorbidities, or a combination of these factors.
For this pilot predictive model to be incorporated into clinical practice, validation using an external data set is paramount. This critical step will determine whether the findings are generalizable to other patient populations and possibly may identify more important covariates. In doing so, further development of an accurate prediction of those unlikely to survive beyond 1 month may prevent futile use of limited resources and provide an opportunity to improve symptom control, palliative care, and health-related quality-of-life measures for those patients in whom long-term survival is not possible.
We thank Dr. Robert E. Lasky at the University of Texas Health Science Center for Clinical and Evidence-based Research for his review of our biostatistical methods.