Assessment of tumor volume and density as a measure of the response of advanced hepatocellular carcinoma to sorafenib: Application of automated measurements on computed tomography scans

Background and Aim. To better predict patient survival, we used automated tumor volume and density measurements to make an objective radiological assessment of the response of advanced hepatocellular carcinoma (HCC) to treatment with sorafenib. Methods. Patients treated with sorafenib were identified retrospectively. Those who were diagnosed with Child‐Pugh class A liver function, Barcelona‐Clinic Liver Cancer stage C, and Eastern Cooperative Oncology Group performance status grade 0/1 were enrolled (n = 22). Reviews of contrast‐enhanced computed tomography images were supported by the automated measurement of lesions using computer software. Treatment responses were assessed using volume and density criteria. Kaplan–Meier methods and multivariate Cox regression analysis were used to evaluate treatment responses and identify the most significant prognostic factors for overall survival (OS). Results. After patients were dichotomized according to volume and density criteria, the median OS for those with an objective response (OR) (complete response + partial response) was 20.4 months and that for those with a non‐OR (stable disease + progressive disease) was 9.3 months (P = 0.009). The best multivariate regression model for survival identified volume and density criteria (OR or non‐OR) as a significant variable, along with baseline alpha‐fetoprotein levels (log‐rank test, P = 0.01). No other conventional criteria were identified as significant. Conclusions. Tumor volume and density assessment using automated lesion measurements may be an objective method of evaluating responses of advanced HCC to treatment with sorafenib.


Introduction
Primary liver cancer is the sixth most common cancer in the world and the second largest contributor to cancer-related mortality. 1 The worldwide incidence of the most common type of primary liver cancer, hepatocellular carcinoma (HCC), is growing, and it is estimated that, by 2020, the number of new cases in Europe, the United States, and Japan will reach 70 290, 35 574, and 42 104, respectively. 2 The oral multityrosine kinase inhibitor sorafenib [Nexavar; Bayer HealthCare Pharmaceuticals (Seattle, WA, USA)-Onyx Pharmaceuticals (South San Francisco, CA, USA)] was the only approved drug that demonstrated survival benefits for patients with advanced unresectable HCC for nearly a decade until the recent approval of regorafenib (Stivarga; Bayer HealthCare Pharmaceuticals; Seattle, WA, USA), used as a second-line treatment, and lenvatinib (Lenvima, Eisai Co., Ltd, Tokyo, Japan), used as a first-line treatment. [3][4][5] Although sorafenib provides patients with HCC with a survival advantage, no study has accomplished a timely and accurate evaluation of its treatment effects. The Response Evaluation Criteria in Solid Tumors (RECIST) guidelines (version 1.1) 6 may underestimate the efficacy of sorafenib because the drug has only a modest ability to shrink tumors. RECIST 1.1 uses unidimensional morphological criteria only. Antiangiogenic agents such as sorafenib induce heterogeneous changes in tumor appearance, such as areas of necrosis and irregular changes in the shape of the lesion, by reducing vascularization. 7,8 Therefore, assessment using RECIST 1.1, which is based on tumor size, raises concerns about its appropriateness as a surrogate end-point for the survival of patients with HCC. 9 Thus, alternative response criteria, such as modified RECIST (mRECIST), the European Association for the Study of the Liver (EASL), and the Choi criteria, take into account changes in lesion vascularity or viability, which is measured by contrast-enhanced computed tomography (CT). 9 The mRECIST for HCC was adopted by the international guidelines on the management of HCC, 10 but the use of RECIST 1.1 and mRECIST is only suggested for the assessment of response of HCC treated with systemic therapy such as sorafenib because there is no clear evidence of their accuracy. 11 This might be because these criteria are still dependent on manual radiological assessments based on the simple measurement of the longest diameter (LD) of the lesion. 9,12 Consequently, three-dimensional volumetry of tumor masses is proposed as a more reproducible and sensitive method. [13][14][15][16][17] Indeed, several computer software packages have been developed to assist in taking objective measurements from CT scans. Previously, we used such software for patients with lung cancer or multiple myeloma 18,19 ; these studies demonstrated the efficacy and utility of this software for evaluating responses to cancer treatment in a standardized way. Thus, a large-scale evaluation of the data collected, and sharing these data in a multicenter clinical trial, was suggested.
Here, we further examined the applicability of a three-dimensional automated radiological evaluation method based on computer software that can simultaneously measure both the volume and density (attenuation coefficient on CT scan) of target tumors on CT scans. We used this objective approach to evaluate treatment responses of advanced HCC to sorafenib therapy by investigating correlations between survival outcomes and measured lesion parameters.

Methods
Patients. All consecutive patients with advanced HCC who were treated with sorafenib at Saga University Hospital and Saga-ken Medical Centre Koseikan between July 2008 and March 2012 were identified. Those who were diagnosed with Child-Pugh class A liver function, Barcelona-Clinic Liver Cancer (BCLC) stage C, and Eastern Cooperative Oncology Group (ECOG) performance status (PS) grade 0 or 1 were enrolled in this retrospective study. Patients were managed according to routine practice and so might have received other locoregional treatments before sorafenib treatment. The study protocol was approved by the Clinical Research Ethics Committee at each hospital and complied with the Declaration of Helsinki and its related guidelines.
Treatment. Enrolled patients received sorafenib for at least 30 days. Blood samples were collected at baseline, and pretreatment serum marker levels of alpha-fetoprotein (AFP), the Lens culinaris agglutinin-reactive fraction of AFP (AFP-L3%), and Des-gamma-carboxy prothrombin (DCP) were measured. Overall survival (OS) was measured from the beginning of sorafenib treatment to the date of death or last follow-up (right censored).

Radiological evaluation of treatment responses.
Contrast-enhanced spiral CT scans (slice thickness, 5 mm) were performed at baseline (before initiation of treatment) and at every 2-3 months afterward. Computer software was used to help evaluate the best clinical response according to RECIST 1.1 criteria and to allow the automated measurement of lesion volume and density as imaging parameters. In addition, mRECIST assessment was conducted independently from the above measurements as current standard imaging criteria. This study examined the combination of volume and density parameters indicative of better response criteria using automated measurement, but routine evaluations are still widely based on RECIST 1.1 because of its simplicity. Therefore, this study included both RECIST 1.1 and volumetric criteria as a reference to unidimensional measurement and its expansion to three-dimensional measurement, respectively. Hepatic lesions were measured on contrast-enhanced images during the arterial phase. Three physicians (Yasunori Kawaguchi, Taiga Otsuka, and Shunya Nakashita) reviewed all images jointly and made a consensus decision about whether a manual correction to a lesion contour was necessary. If a patient was evaluated as having a complete response (CR) or partial response (PR), or as having achieved stable disease (SD), then they were classified as "under disease control (DC)." If a patient showed a CR or PR, they were considered to have an objective response (OR). These evaluations were made at the time of the best clinical response during the treatment course. Tumor density was standardized relative to background liver density. Thus, the density measurements were rendered comparable to better reflect the vascularity of each lesion. For this exploratory study, cut-off values were determined by taking into account the patients' survival outcome (summarized in Table 1). 
For progressive disease (PD), the cut-off was set as a ≥50% increase in tumor volume, while for PR, it was set as a ≥50% decrease in tumor volume, or a ≥15% decrease in tumor density together with a change in tumor volume of less than ±50%. Lesion Management Solutions software (MEDIAN Technologies, Valbonne, France) was used for radiological evaluation. This software supports three-dimensional quantification and allows the comparison of successive CT scans from the same patient, with synchronous navigation between two scans and automated pairing of lesions. 18 The software performed automated delineation of the lesion contour and then calculated the volume and density of each lesion.
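The volume and density cut-offs described above can be written as a small decision rule. The following Python sketch is illustrative only and is not the Lesion Management Solutions implementation. The function names are hypothetical, and two points are assumptions that the text does not specify: tumor density is standardized as a ratio of lesion density to background liver density, and CR is taken to mean complete disappearance of the lesion. Table 1 remains the authoritative definition of the criteria.

```python
def standardized_density(lesion_density: float, liver_density: float) -> float:
    """Lesion density relative to background liver (ASSUMPTION: a simple ratio)."""
    if liver_density <= 0:
        raise ValueError("background liver density must be positive")
    return lesion_density / liver_density


def percent_change(baseline: float, follow_up: float) -> float:
    """Signed percentage change from baseline to follow-up."""
    if baseline <= 0:
        raise ValueError("baseline value must be positive")
    return 100.0 * (follow_up - baseline) / baseline


def classify_response(vol_base: float, vol_follow: float,
                      dens_base: float, dens_follow: float) -> str:
    """Classify a lesion as CR, PR, SD, or PD by volume and density change."""
    if vol_follow == 0:
        return "CR"  # ASSUMPTION: CR = lesion no longer measurable
    d_vol = percent_change(vol_base, vol_follow)
    d_dens = percent_change(dens_base, dens_follow)
    if d_vol >= 50:
        return "PD"  # >= 50% increase in tumor volume
    if d_vol <= -50:
        return "PR"  # >= 50% decrease in tumor volume
    if d_dens <= -15:
        return "PR"  # >= 15% density decrease, volume change within +/-50%
    return "SD"


def is_objective_response(category: str) -> bool:
    """OR = CR or PR."""
    return category in {"CR", "PR"}


def is_disease_control(category: str) -> bool:
    """DC = CR, PR, or SD."""
    return category in {"CR", "PR", "SD"}
```

For example, a lesion whose volume shrinks by only 10% but whose standardized density falls by 20% is classified as PR under this rule, whereas a purely volumetric rule would leave it as SD.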
Statistical analyses. Continuous variables are expressed as median or mean values with their ranges, and categorical variables are expressed as numbers and percentages. Median OS time (in months) was estimated using the Kaplan-Meier method. Differences in survival curves between response groups were evaluated using the log-rank test. Univariate and multivariate Cox regression analyses were performed to identify prognostic factors of OS. Variables with P < 0.1 in univariate log-rank tests were included in multivariate analysis. Selection of the final model was based on Akaike's Information Criterion (AIC). Before conducting Cox regression analyses, the importance of each pretreatment and peritreatment variable was measured using the random forest approach to aid variable selection for entry into Cox regression. Important variables could be critical predictors of survival following sorafenib treatment. AFP and DCP were transformed to a logarithmic scale to reduce the skewness of their distributions. 20 All statistical analyses were conducted using R version 3.4.0 (The R Foundation for Statistical Computing Platform; Vienna, Austria). A two-sided significance level of P < 0.05 was used for all statistical analyses.
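To make the survival comparison concrete, the sketch below implements the Kaplan-Meier median and a two-group log-rank test in plain Python. It is a minimal illustration, not the R code used in this study: the function names are hypothetical, the example data are invented, and the P value relies on the fact that a chi-squared statistic with one degree of freedom has survival function erfc(sqrt(x/2)). An event indicator of 1 denotes death and 0 denotes right censoring.

```python
import math


def km_median(times, events):
    """Kaplan-Meier median survival; returns None if the median is not reached."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    i = 0
    while i < len(data):
        t = data[i][0]
        n_here = sum(1 for tt, _ in data if tt == t)
        deaths = sum(1 for tt, e in data if tt == t and e)
        if deaths:
            surv *= 1.0 - deaths / n_at_risk
            if surv <= 0.5 + 1e-12:  # small tolerance for rounding error
                return t
        n_at_risk -= n_here
        i += n_here
    return None


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-squared statistic, P value)."""
    all_times = list(times_a) + list(times_b)
    all_events = list(events_a) + list(events_b)
    event_times = sorted({t for t, e in zip(all_times, all_events) if e})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)
        n_b = sum(1 for x in times_b if x >= t)
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("log-rank variance is zero; groups cannot be compared")
    chi2 = obs_minus_exp ** 2 / variance
    return chi2, math.erfc(math.sqrt(chi2 / 2.0))


# Invented follow-up times in months (1 = death, 0 = censored).
responders_t, responders_e = [6, 12, 20, 24, 30], [1, 1, 1, 0, 0]
others_t, others_e = [2, 4, 5, 8, 9], [1, 1, 1, 1, 1]
print(km_median(responders_t, responders_e), km_median(others_t, others_e))
print(logrank_test(responders_t, responders_e, others_t, others_e))
```

A return value of None from km_median corresponds to the situation reported as "median OS was not reached."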

Results
Patient characteristics and treatment. Initially, 81 consecutive patients treated with sorafenib were identified. Of these, 22 met the inclusion criteria. Their demographic and clinical characteristics are summarized in Table 2. The median age was 76 years (range, 50-86 years). Most patients were male (91%). A majority had a Child-Pugh score of 5 (64%) and an ECOG PS grade of 0 (86%). The median duration of sorafenib treatment was 2.6 months (range, 1.1-19.5 months). Only one patient was still receiving sorafenib treatment at the end of the follow-up period. For the other patients, sorafenib treatment was terminated due to tumor progression (68%) or adverse events (27%). The median baseline serum levels of AFP, AFP-L3%, and DCP were 5215 ng/mL (range, 2.8-48 000 ng/mL), 36.4% (range, 0-91.3%), and 1318 mAU/mL (range, 12-9399 mAU/mL), respectively.
Survival analyses. The median OS was 12.6 months for all patients (95% confidence interval [CI], 9.0-21.2; Fig. 1a). A total of 44 follow-up time points were reviewed for radiological assessment of treatment responses. When patients were dichotomized into OR and non-OR groups according to volume and density criteria, the median OS was 20.4 months for the OR group and 9.3 months for the non-OR group (Fig. 1b, P = 0.009). When the patients were dichotomized into DC and PD groups according to volume and density criteria, the median OS was 20.4 months for the DC group and 9.3 months for the PD group (Fig. 1c, P = 0.02). When patients were dichotomized into OR and non-OR groups according to RECIST 1.1, the median OS for the non-OR group was 11.4 months (Fig. 1d, P = 0.051; the median OS was not reached by the OR group). When patients were dichotomized into DC and PD groups according to RECIST 1.1, the median OS was 17.2 months for the DC group and 9.3 months for the PD group (Fig. 1e, P = 0.07). When patients were dichotomized into OR and non-OR groups according to volumetric criteria, the median OS for the non-OR group was 11.4 months (Fig. 1f, P = 0.051; the median OS was not reached by the OR group). When the patients were dichotomized into DC and PD groups according to volumetric criteria, the median OS was 20.4 months for the DC group and 9.3 months for the PD group (Fig. 1g, P = 0.02). The Kaplan-Meier analyses by mRECIST are available in Figure S1, Supporting information. Median OS was 10.4 months for the OR group and 12.6 months for the non-OR group according to mRECIST (P = 0.58). The median OS was 18.8 months for the DC group and 9.0 months for the PD group according to mRECIST (P = 0.09). Classification of tumor responses according to the four criteria is summarized in Table 3.
DC rates were lower for patients dichotomized according to mRECIST than they were for those assessed using other criteria (40.9% for mRECIST; 63.6% for RECIST 1.1; 59.1% for both volumetric criteria and volume and density criteria, chi-squared test, P = 0.44). In contrast, the OR rate was higher for volume and density criteria than for other criteria (31.8% for volume and density criteria vs 9.1% for other criteria, chi-squared test, P = 0.08). All reclassifications based on volume and density criteria, as opposed to RECIST 1.1 criteria, were observed as changes from SD to PD or PR groups. Most of these (five of six patients) were classified as better responders according to volume and density criteria. Similarly, volume and density criteria reclassified many cases (10 of 12 patients) as better responders compared to mRECIST.
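The rate comparisons above can be checked with a chi-squared test of homogeneity. The Python sketch below is illustrative only. The patient counts are not given in the text: they are reconstructed here from the reported percentages of 22 patients (for example, 40.9% of 22 is 9), and the sketch assumes a single test across all four criteria treated as separate groups. The function names are hypothetical, and the P value uses the closed-form survival function of the chi-squared distribution with three degrees of freedom.

```python
import math


def chi2_homogeneity(successes, n_per_group):
    """Pearson chi-squared statistic for k equally sized groups (k x 2 table)."""
    groups = len(successes)
    total = groups * n_per_group
    total_success = sum(successes)
    chi2 = 0.0
    for s in successes:
        cells = ((s, total_success), (n_per_group - s, total - total_success))
        for observed, column_total in cells:
            expected = column_total / groups  # equal group sizes
            chi2 += (observed - expected) ** 2 / expected
    return chi2


def chi2_sf_df3(x):
    """Upper-tail probability of a chi-squared variable with 3 degrees of freedom."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


N = 22
# ASSUMPTION: counts reconstructed from the reported rates.
# Order: mRECIST, RECIST 1.1, volumetric, volume and density.
dc_counts = [9, 14, 13, 13]  # 40.9%, 63.6%, 59.1%, 59.1%
or_counts = [2, 2, 2, 7]     # 9.1%, 9.1%, 9.1%, 31.8%

p_dc = chi2_sf_df3(chi2_homogeneity(dc_counts, N))
p_or = chi2_sf_df3(chi2_homogeneity(or_counts, N))
print(round(p_dc, 2), round(p_or, 2))  # 0.44 0.08
```

Under these assumptions, the computed values agree with the reported P = 0.44 for DC rates and P = 0.08 for OR rates.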
Importance of different variables and selection of the best variables for predicting survival. First, the importance of both pretreatment and peritreatment variables was measured using the random forest approach for survival outcome. This random forest-based analysis identified the following variables as having relatively high importance (these were then entered into the Cox regression analysis): RECIST 1.1 OR, RECIST 1.1 DC, volumetric criteria OR, volumetric criteria DC, volume and density criteria OR, volume and density criteria DC, mRECIST DC, extrahepatic spread, AFP level, and smoking history (Table 4).

Discussion
RECIST 1.1-based radiological assessment is used widely for treatment response classification and as a surrogate end-point both in clinical trials and in routine practice. However, the development of molecular targeted therapies to lessen tumor vascularity has shown that the RECIST 1.1 assessment has limitations. 7,9,17,21 Some modifications to RECIST 1.1 for HCC treatment have been proposed, including mRECIST, EASL, and the Choi criteria, which are considered to better reflect treatment effects. 9 However, these modifications are still based on a unidimensional manual assessment and are strongly affected by heterogeneities in tumor appearance, that is, CT enhancement pattern, induced by sorafenib and other locoregional therapies such as radiofrequency ablation (RFA) and transcatheter arterial chemoembolization (TACE). In daily clinical settings, we often encounter a variety of changes in tumor shape and vascularity. Therefore, a reproducible and objective method, such as the use of automated computer-assisted volume and density measurement that captures therapeutic changes in the whole lesion, is desirable. For example, two of the cases in this study showed much longer survival times (16.2 and 20.4 months) than expected according to RECIST 1.1 criteria (Fig. S2). When we applied the new volume and density criteria to these patients, their evaluation changed to PR rather than the PD or SD evaluation obtained using RECIST 1.1. This evaluation was more acceptable as a radiological assessment of treatment response because it correlated well with OS and was reproducible. Thus, to overcome the many limitations of conventional criteria in routine practice, we examined new radiological assessment methods based on the automated measurement of volume and density changes in hepatic lesions on CT scans.
The aim was to achieve a simpler and more accurate classification of prognostic responses than those obtained using unidimensional or three-dimensional volumetric measurements alone. The main point is that this new method considers both morphological (volume) and functional (density) aspects of the lesions simultaneously and automatically. As a consequence, it enables more appropriate discrimination of good responders from SD patients. In comparison, mRECIST identified equal or fewer responders compared to other response evaluation criteria. The DC rate was 40.9% (vs volume and density criteria, P = 0.23), and the OR rate was 9.1% (vs volume and density criteria, P = 0.07). The routine use of mRECIST has limitations because irregular morphological changes in tumor enhancement make it difficult for assessors to measure the viable part of a lesion accurately and objectively in one dimension. In contrast, the new volume and density criteria were further confirmed as good classifiers using several statistical models, including the random forest approach and Cox regression analysis. Even though the results are subject to the limitations discussed below, the automated volume and density criteria approach appears to be a superior objective method of radiological assessment of the effects of sorafenib treatment. The new method may also offer better prediction of OS because it can globally reflect both the shape and vascularity (the major parameters affected by sorafenib) of the lesion during/after treatment with sorafenib. In addition to the radiological/imaging factors, the random forest approach and Cox regression analysis identified the prognostic potential of baseline AFP levels, although the results of Kaplan-Meier analysis did not show a statistically significant classification of the patients according to AFP level; this may be due to the small sample size (log-rank test between ≥400 ng/mL [n = 10] and <400 ng/mL [n = 12], P = 0.14).
AFP is an established tumor marker for HCC and may be associated with the prognosis of HCC patients. Several studies have proposed AFP as a marker that can be used to assess HCC responses to targeted chemotherapy because of its ability to discriminate patients with longer OS. [22][23][24] Consequently, further studies should investigate the ability of combined radiological assessment plus AFP levels to predict OS.
An automated and objective assessment of treatment response, such as that demonstrated here, could help to establish a cloud computer system for data collection or a clinical data repository for HCC therapy. This would facilitate "big data" applications and allow global multisite clinical trials to be conducted efficiently.
This study has several limitations. First, the study was retrospective in design, with a small number of patients from a limited area of Japan. Real-world cases often present difficulties with respect to precise radiological assessment due to major morphological modifications caused by other prior locoregional treatments. This may lead to structural uncertainty when determining cut-off values for the classification of treatment responses. Based on our results from a limited patient cohort, further studies should be conducted to establish robust cut-off values for volume and density changes. Second, the impact of reproducibility on assessments made by automated software measurements was not examined in detail. For the objective measurement of lesion volume and density, the assistance of computer software is both critical and inevitable. Therefore, a software validation study is also required for this scheme. Third, the acquisition protocol for CT scans may affect the radiological assessment through changes in image quality, particularly with respect to density measurements on the arterial phase acquisition. The CT scan images in this study were obtained from two neighboring hospitals, but the acquisition protocol may vary across other institutions; thus, operationally acceptable guidelines for imaging quality control are necessary for the global expansion of our method.
This exploratory study suggests that the automated measurement of tumor volume and density on CT scans using computer software could be a better method of assessing responses of patients with advanced HCC to treatment with sorafenib. This radiological assessment method was good at reflecting survival outcomes. Concerns raised about the routine use of RECIST 1.1 to assess tumor responses to sorafenib therapy could be addressed using this software-based standardized approach to three-dimensional radiological assessment.