Clinical variables and magnetic resonance imaging‐based radiomics predict human papillomavirus status of oropharyngeal cancer

Abstract Background Human papillomavirus (HPV)‐positive oropharyngeal squamous cell carcinoma (OPSCC) have better prognosis and treatment response compared to HPV‐negative OPSCC. This study aims to noninvasively predict HPV status of OPSCC using clinical and/or radiological variables. Methods Seventy‐seven magnetic resonance radiomic features were extracted from T1‐weighted postcontrast images of the primary tumor of 153 patients. Logistic regression models were created to predict HPV status, determined with immunohistochemistry, based on clinical variables, radiomic features, and its combination. Model performance was evaluated using area under the curve (AUC). Results Model performance showed AUCs of 0.794, 0.764, and 0.871 for the clinical, radiomic, and combined models, respectively. Smoking, higher T‐classification (T3 and T4), larger, less round, and heterogeneous tumors were associated with HPV‐negative tumors. Conclusion Models based on clinical variables and/or radiomic tumor features can predict HPV status in OPSCC patients with good performance and can be considered when HPV testing is not available.


| INTRODUCTION
Human papillomavirus (HPV) infection is an important factor in the development and disease course of oropharyngeal squamous cell carcinoma (OPSCC). 1,2 HPV-related OPSCC has a better progression-free survival and overall survival after (chemo)radiation treatment than HPVnegative OPSCC. [3][4][5] Despite these differences in prognosis and treatment response, HPV-positive and HPV-negative OPSCC are currently not treated differently. Only recently, it was shown that cetuximab cannot replace cisplatin in HPV-positive OPSCC. 6 Ongoing de-escalation trials will further elucidate whether HPV-positive tumors can be treated with less aggressive treatment regimens in the future to reduce treatment-related toxicity (trial number NCT03952585). This is especially relevant as HPV-positive OPSCC patients tend to be younger with an associated higher life expectancy than HPV-negative OPSCC patients. 5,7,8 Adding to the importance of HPV status of OPSCC is the increasing relative incidence of HPV-positive OPSCC compared to HPV-negative OPSCC over the past years despite declining overall age adjusted incidence of head and neck cancer in developed countries. These changes are probably due to a decline in alcohol and especially nicotine abuse combined with an increase in sexual promiscuity with a high risk of HPV transmission. 9 For these reasons, HPV tumor status is increasingly important and has therefore been included in the most recent eighth edition of the TMN classification. 10 HPV infection is detected using p16/p53 immunohistochemistry and/or HPV DNA polymerase chain reaction (PCR) on biopsy material. 11,12 Determination of tumor HPV status from just clinical and/or tumor features extracted from imaging would be ideal, and could possibly reduce the need for time consuming and expensive immunochemistry and PCR techniques. Recent literature showed that tumor biology can be assessed noninvasively in other tumor types using advanced imaging analysis or radiomics. 13,14 The same approach may be used to determine predictive features for the HPV status in OPSCC. Multiple studies reported that the CT-based radiomic features, such as shape and homogeneity, are associated with HPV positivity in OPSCC tumors. [15][16][17] To our knowledge, MRI-based radiomics to predict HPV status has not been performed previously. Clinical variables associated with HPV-positive tumors are well known and include male gender, younger age, and less exposure to tobacco and alcohol. 9 These variables have been used to predict HPV status of head and neck cancer, including OPSCC. [18][19][20][21] This study aims to assess and compare the ability of clinical variables, MR-based radiomic features, or a combination of these variables to predict HPV status of OPSCC.

| MATERIALS AND METHODS
This study is approved by the local institutional review board (IRBd18047). Due to the retrospective design, informed consent was waived.

| Clinical data
A total of 240 consecutive patients with histologically proven primary OPSCC, treated with CRT (70 Gy radiation with three planned cycles cisplatin-based chemotherapy [100 g/m 2 ]) at our Institute between January 2010 and December 2015, were considered for this study. Patients were excluded when pretreatment MRI of the primary tumor was not available (n = 38), image quality was poor (n = 7), tumors were undetectable on MRI (n = 17), a second head and neck tumor was present (n = 1), or when HPV status of the tumor was missing (n = 24). This resulted in a total of 153 patients eligible for this study.
Age, gender, smoking status, tumor subsite, and TNM classification (TNM seventh edition), were collected for each patient. T-classification and N-classification was determined in multidisciplinary consensus based on clinical and radiological information, including MRI, ultrasound staging with fine needle aspiration cytology, and, when available, PET images. Smoking status was classified into the categories nonsmoker, current smoker, and former smoker (quit more than 2 years prior to diagnosis) at the initial visit to the outpatient clinic. T-classification was dichotomized in low (T1 + T2) or high T-classification (T3 + T4). N-classification was dichotomized in node-positive (N > 0) or node-negative disease (N = 0). Differences in clinical variables between HPV-positive and HPV-negative tumors were assessed by applying the Fisher exact test and independent t-test for age. P values of <.05 were considered statistically significant.

| Determination of HPV tumor status
A combination of p16 and p53 immunohistochemistry on tumor biopsy material was performed to determine HPV positivity or negativity of the tumor for each patient. p53 positivity was concluded when at least 80% of the tissue sample showed strong nuclear staining or completely negative. No p53 staining of tumor tissue with positive staining of surrounding normal tissue was regarded as tumor mutation for which p53 positivity was concluded. p16 positivity was concluded when at least 70% of tumor tissue stained positive for p16. A known HPV-positive tonsil sample, surrounding tissue of the tested biopsy sample and appendix, was used as positive internal and external control. HPV positivity was concluded when tumor biopsy material tested positive for p16 and negative for p53 staining. HPV negativity was concluded when tumor biopsy material tested negative for p16, regardless of p53; see Henneman et al 22 for further details on the HPV testing scheme.

| MRI data
All patients underwent an MRI examination of the primary tumor for pretreatment staging purposes as part of the routine clinical workup. Imaging was performed at 1.5 T or 3.0 T (Achieva, Philips Medical System, Best, The Netherlands) using a standard head and neck coil (SENSE-NV-16). The imaging protocol included T1-weighted (T1W), T2-weighted (T2W), postcontrast 3D T1W, perfusion, and diffusion-weighted sequences. Imaging details are summarized in Table 1 and Supplementary Table S1.1. The axial slices of 3D T1W high-resolution isotropic volume excitation (THRIVE) after gadolinium injection (postcontrast 3DT1W) were used to manually delineate primary tumor volumes. One nonexpert observer (PB, 1 year experience in head and neck diagnosis) manually delineated the tumor volumes (ie, nonexpert delineations), which were verified and corrected by an experienced head and neck radiologist (BJ, 7 years of experience in head and neck diagnosis) (ie, expert-corrected delineations). The observers were allowed to review other available pretreatment MR imaging sequences and available PET scans as reference to improve delineations.

| Radiomic feature extraction
Signal intensities for each individual MRI scan were normalized (with zero mean and unit SD) prior to further analysis to reduce intensity variations between MRI scans obtained from different patients. Image resampling to isotropic voxels of 1.0 mm was performed using B-spline interpolation. Image discretization was applied to allow quantification of texture images in fixed bin width of five. In total, 1184 radiomic features per patient were calculated from the postcontrast 3DT1W MRI within the primary tumor volumes using the open-source package PyRadiomics 2.2.0, 23 which were categorized into the five groups: shape, intensity, texture, wavelet transform, and Laplacian of Gaussian filter. Wavelet features were calculated in seven decompositions and texture coarseness is determined by four levels modifying the Gaussian radius parameter from 0.5 to 2.0 mm, in steps of 0.5 mm. Detailed definitions of the radiomic features can be found elsewhere. 28 After quality control, features with zero variance were excluded. Stable features were selected using the interclass correlation coefficient with regard to the nonexpert and expert-corrected tumor delineations and the Mann-Whitney U test in features with regard to the different MRI field strengths. Features with an interclass correlation coefficient greater than 0.75 and a significance level equal to or above .05 in the Mann-Whitney U test were considered stable. From the selected stable features, collinear features (Pearson correlation coefficient > 0.9) were removed, where for each pair the feature that has the largest mean absolute correlation is deleted. The remaining 77 features (see consort diagram in Supplementary Figure S1.1) eligible for radiomic analysis were normalized with zero mean and unit variance for analysis.

| Machine learning analysis
From the total of 153 patients, 60% (n = 91) were randomly allocated to a training/validation subset and 40% (n = 62) to a test subset, stratifying for HPV status and MRI magnet strength (1.5 or 3.0 T).
Then, separate logistic regression models 24 were build based on solely clinical variables (ie, age, gender, smoking status, T-classification, N-classification, and subsite of cancer) (clinical model), only radiomic features (radiomic model) and a model where both clinical and radiomic features were combined (combined model). As data from other cancer registries may be missing smoking status and/or TN-classification, we constructed a combined model without smoking status and/or TN-classification (see Supplementary Material II).
Feature dimensionality is reduced by applying a sequential backward wrapper feature selection approach (recursive feature elimination). This method obtains the optimal feature set for the given classifier (in this case logistic regression) by iteratively removing the weakest feature assessed by its feature importance score. The optimal set of features is used to train the model. 25,26 In the training phase, Bayesian optimization was used to obtain optimal hyperparameters employing 1000 iterations of 4-fold cross-validation on a 75% (n = 68) training and 25% (n = 23) validation set. During this process, the regularization parameter (λ, 0.005-200), a parameter for the complexity of the model, and the number of features (k, 1-77 [radiomic model] or 1-86 [combined model]) were tuned based on the four training performances obtained during cross-validation. Area under the curve (AUC) was calculated as measure of model performance, where the loss function is minimized. The loss function was defined as 1 − mean(AUC) + SD(AUC), where mean(AUC) aims to maximize model performance and SD(AUC) aims to minimize model generalization. [27][28][29] The optimized hyperparameters obtained in the training phase were then used to verify the predictive model in the test phase, applying bootstrapping on the test subset. Bootstrapping calculated model performance (AUC) of 500 randomly selected samples (with replacement) of the test subset. Median AUC and the 95% confidence interval (95% CI) of these 500 iterations were then calculated to reflect the model performance that can be attained of HPV prediction. All analyses were implemented in python 3.5 and SPSS version 25.0 (SPSS Inc. Chicago). The complete machine learning pipeline is shown in Figure 1.
A clinically applicable nomogram was constructed from the clinical logistic regression model using F I G U R E 1 Analysis pipeline. Three models were created to predict human papillomavirus (HPV) status of oropharyngeal squamous cell carcinoma (OPSCC). A clinical model (based on the clinical variables, age, gender, smoking status, T-classification, N-classification, and tumor subsite), a radiomic model based on radiomic features, and a combined model based on both clinical variables and radiomic features. Morphological, texture, intensity, and filter-based radiomic features were computed from within the tumor delineations on the postcontrast 3DT1 MRI images. Feature reduction was performed using the wrapper feature selection approach by recursive feature elimination, resulting in an optimal subset of features as input for the logistic regression models. The three separate models were created using logistic regression analysis on the training subset. Resulting models were tested using bootstrapping with 500 iterations. Model performance on the test set was evaluated using median area under the curve, sensitivity, specificity, accuracy, and its 95% confidence intervals [Color figure can be viewed at wileyonlinelibrary.com] R software package RMS (version 3.6.3). 30 Points were assigned to each prognostic variable from the clinical model based on the distribution of the regression coefficients, maximizing sensitivity and specificity for discrimination between HPV-positive and HPV-negative tumors. The probability of HPV positivity can be deducted from the sum of these points. Table 2 summarizes patient characteristics for the total patient cohort and subgroups stratified by HPV status. The clinical characteristics of the whole patient group have an equal distribution of HPV (n = 77 HPV negative and n = 76 HPV-positive tumors) and T status (51% patients have T1 + T2 tumors, 49% T3 + T4 tumors). Tumors were mostly located in the tonsils. Patients were categorized as either smoking or nonsmoking, no patients were categorized as former smokers.

| RESULTS
OPSCC patients with HPV-positive tumors were younger (median age: 63 vs 59 year, P = .007), less likely to smoke (P < .001), and had a lower T-classification (T1-T2 vs T3-T4; P < .001) compared to patients with HPV-negative tumors. For node-positive disease (P = .051) and male gender (P = .067), these differences were borderline significant at the 5% level. Tumors of the soft palate (P = .017) were significantly more frequent in HPV-negative tumors.

| Performance of logistic regression models
Performance of the three logistic regression models is summarized in Table 3. All models showed good   Figure 3, where a cutoff value of 134 points has the maximum sensitivity (76%) and specificity (73%). A sum of points below 134 is indicative of HPV negativity. Out of the 77 initial radiomic features, three prognostic features were selected in the radiomic model after model construction. Fourteen radiomic features were selected in the combined model, along with six clinical variables that were included in the clinical model. Radiomic features indicated smaller, rounder, more homogeneous, and more regular texture in HPV-positive tumors. Figure 4 illustrates textural differences between a patient with HPV-negative and HPV-positive tumor. The interpretation of all selected radiomic features is summarized in Supplementary Table S1.2.

| DISCUSSION
This retrospective study shows that logistic prediction models based on clinical and/or MR-based radiomic features are able to predict HPV status in OPSCC with good performance. The model combining radiomic features and clinical variables performed better than separate models based on clinical and radiological features.
The variables included in the clinical model were variables that can be expected to differentiate HPV-negative and HPV-positive tumors (ie, smoking status, age, gender, T-classification, N-classification, and tumor location). This underscores that the clinical model, besides the good overall performance, is biologically plausible.
The discriminatory MRI features in the radiomicbased models probably reflect differences in tumor biology between HPV-positive and HPV-negative tumors. HPV-positive tumors are characterized by less-invasive exophytic growth, nonkeratinizing histopathology, genetic stability, and well-defined surroundings. 31 These histopathological differences are likely to be reflected in the selected radiomic features indicating rounder tumors, lower maximum intensity values, and texture homogeneity. Conversely, HPV-negative tumors are genetically more unstable, 32 which can lead to focal hypoxia or varying grades of dedifferentiation within a tumor, likely to T A B L E 4 Selected features in the radiomic and combined models with regression coefficients ranked from high to low, SEs, and OR Features in the combined model that are also included in the clinical or radiomic model. be reflected in the selected MR features of heterogeneity in the radiomic models. Although no direct comparison was made, our MR-based predictive radiomic model suggests similar performance (AUC = 0.76) compared to CT. 16,17 This suggests that postcontrast 3DT1W MRI and CT reveal, at least partly, similar textural properties relevant for the discrimination of HPV-positive and HPV-negative tumors in radiomic analysis. Intuitively, features from MRI and CT should at least be able to characterize tumor size and morphology in a similar way, explaining similar performance. Whether structural MRI or CT is better for determination of HPV status of OPSCC by radiomic analysis is not entirely clear at this point. In our opinion, MRI is preferable over CT for staging and radiomic analysis for OPSCC due to the better soft tissue contrast of MRI in this anatomically challenging area. But in the end, the choice for CT or MRI will largely depend on the preference and experience of the radiologists within the center. The radiomic model presented in this article seems to have better predictive performance compared to fluorodeoxyglucose-positron emission tomography (FDG-PET) (AUC: 0.64). 33 This can be expected as FDG-PET images are less able to provide textural detail of tumor tissue.
The models in this article are less sensitive (88%) and specific (71%) compared to pathological methods (p16 immunohistochemistry: sensitivity 56-100%, specificity 79-93%; DNA PCR: sensitivity: 100% specificity 89% or the combination of latter techniques: Sensitivity and F I G U R E 3 Nomogram for the clinical model to predict human papillomavirus (HPV) positivity. A, Points are given to each clinical by drawing a line between the clinical variable with the "Points" line (top row) ranging from 0 to 100. The sum of all points for the individual clinical variables result in a total score (total points). A total score of ≥134 points is indicative of HPV positivity of the tumor. B, worked example. A nonsmoking female with a T1 tumor of the tonsil region, including node-positive disease had a total score of 284 points, corresponding to HPV positivity of the tumor [Color figure can be viewed at wileyonlinelibrary.com] specificity 100% 34 ) to determine HPV status of the tumor. However, these pathological methods are expensive and time consuming and are not always available (for instance, in retrospective studies when no biopsy is performed or biopsy/tissue samples are not available), making predictive models based on clinical and/or radiomic features a useful alternative.
This study is, to our knowledge, the largest radiomic study on MRI in head and neck squamous cell carcinomas. 35 However, our sample size is still quite limited compared to previous studies evaluating CT-based radiomics. [15][16][17] Clearly, larger populations, preferably in a multicenter setting, are needed to confirm our findings and create radiomic models that are more generalizable across scanners and populations.
The present study included patients from a single center, without an external cohort to validate our results, which is obviously a recommendation for further work. Another, minor, limitation might be the accuracy of the self-reporting variables, especially smoking status. This is partly overcome by categorizing smoking status into three robust categories (current-, former-, and nonsmoker), where former smokers stopped for at least 2 years prior to diagnosis. Only postcontrast 3DT1W MRIs were used in this study to limit the number of features with our available cases. Other MR sequences might give additional radiomic features for prediction of HPV status and is a topic for further study. In a preliminary study, we included all available MRI sequences, revealing mainly radiomic features from the postcontrast 3DT1W sequence, suggesting that other sequences would not contribute to the eventual predictive models. Finally, time-consuming manual tumor delineations were used for feature extraction, which introduces interobserver variability. Stable features with regard to delineations were selected to minimize the effect of interobserver variability in the eventual models. Ideally, this interobserver variability should be eliminated. Automated tumor delineation algorithms by, for instance, convolutional neural networks may overcome interobserver delineation variability. 36 In addition, automated tumor delineation would greatly reduce the workload of manual tumor delineation, making clinical implementation of radiomic analysis more feasible. Another approach would be to use deep-learning models or other unsupervised machine learning techniques to predict HPV status of head and neck tumors. However, adequate training of these models is challenging due to the relatively small tumors in a large and challenging anatomical area. Radiomic analysis therefore seems to be the most straight forward approach at this point in time.

| CONCLUSION
This study shows that logistic regression models based on clinical variables, MR-based radiomic features, or a combination of clinical and radiomic features can accurately predict HPV status in OPSCC patients. Although a model based on clinical and radiomic features performs best, the clinical model would be the method of choice due to its ease of implementation. These models have a place in determination of HPV tumor status in settings where tumor biopsy material, tumor samples, immunohistochemistry, and/or DNA polymerase chain reaction techniques are not available. HPV testing is becoming more a routine in hospitals, but not everywhere, especially not in the past when the importance of HPV status of the tumor was not known. Medical images, on the other hand, are widely available due to the advantage of storage capability of medical images for a long time, making it a good alternative to assess HPV tumor status.

SUPPORTING INFORMATION
Additional supporting information may be found online in the Supporting Information section at the end of this article.
How to cite this article: Bos P, van