Deep learning‐based pathology image analysis predicts cancer progression risk in patients with oral leukoplakia

Abstract
Background: Oral leukoplakia (OL) is associated with an increased risk for oral cancer (OC) development. Prediction of OL cancer progression may contribute to decreased OC morbidity and mortality by favoring early intervention. Current OL progression risk assessment approaches face large interobserver variability and are weakly prognostic. We hypothesized that convolutional neural network (CNN)-based histology image analyses could accelerate the discovery of better OC progression risk models.
Methods: Our CNN-based oral mucosa risk stratification model (OMRS) was trained to classify a set of nondysplastic oral mucosa (OM) and a set of OC H&E slides. As a result, the OMRS model could identify abnormal morphological features of the oral epithelium. By applying this model to OL slides, we hypothesized that the extent of OC-like features identified in the OL epithelium would correlate with its progression risk. The OMRS model scored and categorized the OL cohort (n = 62) into high- and low-risk groups.
Results: OL patients classified as high-risk (n = 31) were 3.98 (95% CI 1.36-11.7) times more likely to develop OC than low-risk ones (n = 31). Time-to-progression significantly differed between high- and low-risk groups (p = 0.003). The 5-year OC development probability was 21.3% for low-risk and 52.5% for high-risk patients. The predictive power of the OMRS model was sustained even after adjustment for age, OL site, and OL dysplasia grading (HR = 4.52, 1.5-13.7).
Conclusion: The OMRS model successfully identified OL patients with a high risk of OC development and can potentially benefit OC early diagnosis and prevention policies.


| INTRODUCTION
The concept of premalignancy was introduced more than two centuries ago by a European panel of physicians suggesting that some histologic changes may take place before the onset of cancer. 1 In the oral cavity, precursor lesions are named oral potentially malignant disorders (OPMD), characterized by visual changes to the oral mucosa, usually white or red patches, which are associated with increased risk for oral cancer (OC) development. Oral leukoplakia (OL) is the most common type of OPMD, for which the malignant transformation rate ranges from 1% to 3% per year. 2,3 An estimated half million new cases of OC are diagnosed globally each year, with more than 300,000 deaths. 4 Despite oncological treatment advances, the prognosis for OC patients remains poor, especially for patients diagnosed at advanced disease stages. Early OC detection is critical for therapeutic success and satisfactory quality of life. 5 Yet, most OC patients are diagnosed at an advanced stage, at least partially due to our limited ability to determine which OPMD are at higher risk for malignant progression, and when. 6,7 Currently, OL malignant transformation risk is estimated by histopathological evaluation of epithelial dysplasia, which involves the detection of architectural alterations and cytological atypia and their extension across the oral epithelial tissue. 8,9 The World Health Organization proposes a three-tier OL grading scheme (mild, moderate, and severe) for dysplastic lesions, but due to low accuracy and low reproducibility, simplified binary systems have also been proposed. 10-12 Importantly, the evaluation of epithelial dysplasia is inevitably subjective, resulting in great inter- and intraexaminer variability in the interpretation of the presence, degree, and significance of the criteria. 8,13 Most OL diagnosed with dysplasia never progress to OC in the life of the patient and the time-to-progression (TTP) in those that do is not predictive of subsequent invasive disease.
Furthermore, various reactive and regenerative changes in the oral epithelium, secondary to trauma and chronic inflammatory ulcerations, closely mimic mild to moderate dysplasia. Hence, a more objective and efficient grading system is needed such that risk stratification of OL patients is suitable for guiding disease management decisions.
Since the introduction of whole slide scanners in 1990, technology has evolved, allowing the creation of digital histology images that retain high levels of detail. 14 In parallel, the development of deep learning, a newer branch of artificial intelligence (AI), has provided opportunities to design automated learning algorithms to examine these high-definition histologic images and potentially benefit the practice of surgical pathology. 15-19 Some deep learning algorithms have achieved performance comparable to pathologists in tasks such as the detection and segmentation of tumor regions 20,21 and the identification of metastatic foci in lymph nodes. 22
KEYWORDS
carcinogenesis, convolutional neural network, disease progression, oral leukoplakia, patient prognosis, precancer, whole slide imaging

[...] image segmentation and classifications. 16,25 Through their deep network structure, CNNs can learn from the data sets to extract highly predictive image features and make accurate predictions independent of clinical intervention. 26,27 In this way, we believe that CNNs can be an effective tool for the identification of morphological features associated with malignant progression risk in OL. We hypothesized that OL with a higher risk of cancer progression might exhibit morphological features that resemble OC tissue and that a CNN-based model would be able to identify such features by analyzing and comparing OC and nondysplastic oral mucosa images. To test this hypothesis, we developed OMRS (oral mucosa risk stratification), a CNN-based deep learning model that uses images of hematoxylin and eosin (H&E)-stained tissue as input. This model was initially trained with images of nondysplastic oral mucosa and oral squamous cell carcinoma to create a cancer progression risk score based on the identification of morphological differences between epithelial cells in these two types of tissues. The model was then applied to OL histopathological slides, assuming that OL epithelium with morphological similarity to OC has a higher risk of cancer progression. We demonstrated that OMRS can predict time-to-progression more effectively than the three-tier World Health Organization (WHO) OL classification system, and it performs on par with newer binary systems in terms of disease progression risk assessment. We expect that future improvements to OMRS, through the addition of new layers of morphological complexity to the model, could improve the accuracy of OL malignant progression risk assessment and potentially be used to tailor surveillance intervals and treatment decisions for leukoplakia patients.

| Training datasets
The OMRS model was developed on hematoxylin and eosin (H&E)-stained sections of 38 oral mucosal biopsies. Of these, 13 tissue sections consisted of nondysplastic oral epithelium and were retrieved from the archives of the Oral Pathology Service of the School of Dentistry at Universidade Federal de Minas Gerais, Brazil (Table S1). This cohort comprised biopsies of oral mucosal diseases in which pathologic changes were limited to the lamina propria, so that the oral epithelium was clinically normal and preserved its normal architecture and cytology. The remaining 25 samples were H&E-stained images of oral squamous cell carcinomas (OSCC) randomly selected from The Cancer Genome Atlas (TCGA) Head and Neck Squamous Cell Carcinoma (HNSCC) tissue imaging data set 28 (https://wiki.cancerimagingarchive.net/display/Public/TCGA-HNSC, accessed early 2019; Table S2), with inclusion criteria of oral cavity location and human papillomavirus (HPV)-negative status. All H&E-stained slides for model development were scanned or available at 40× magnification.
Each slide/image was evaluated by an expert oral pathologist (F.O.G-N) in order to annotate areas of technical artifacts that should be excluded and to annotate the regions of the epithelium (tumor or nondysplastic), connective tissue, and background.

| OMRS model development
To develop the model using nondysplastic oral epithelium and OSCC tissue slides, image patches measuring 300 × 300 pixels (40×) of four classes (nondysplastic epithelium, cancerous epithelium, connective tissue, and background) were extracted from the annotated regions and randomly divided into training, validation, and testing sets. Image patches from the same slide were always assigned to the same set. The model development dataset size is shown in Table 1.
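The slide-level constraint on the split (all patches from one slide go to the same set) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `split_by_slide`, the 70/15/15 fractions, and the seed are assumptions, since the actual set sizes are reported only in Table 1.

```python
import random
from collections import defaultdict


def split_by_slide(patches, fractions=(0.7, 0.15, 0.15), seed=0):
    """Split (slide_id, patch_id) pairs into train/validation/test sets.

    The split is made over slides, not patches, so patches from one slide
    never end up in two different sets. The fractions are hypothetical.
    """
    by_slide = defaultdict(list)
    for slide_id, patch_id in patches:
        by_slide[slide_id].append((slide_id, patch_id))

    slide_ids = sorted(by_slide)
    random.Random(seed).shuffle(slide_ids)  # reproducible random order

    n_train = round(fractions[0] * len(slide_ids))
    n_val = round(fractions[1] * len(slide_ids))
    groups = {
        "train": slide_ids[:n_train],
        "validation": slide_ids[n_train:n_train + n_val],
        "test": slide_ids[n_train + n_val:],
    }
    return {name: [p for s in ids for p in by_slide[s]]
            for name, ids in groups.items()}
```

Splitting over slides rather than patches avoids leakage: neighbouring patches from one slide share staining and tissue context, so letting them straddle the training and testing sets would inflate the measured accuracy.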
The approach used for OMRS model development is summarized in Figure 1A. The OMRS model used a modified Inception (V3) architecture, 29 a type of CNN-based deep learning model. The model was fine-tuned using our training data set including nondysplastic oral epithelium and OSCC tissue, as described above. The model took an image patch as input and output patch-level probabilities for the four classes. Detailed methodology is described in Appendix S1.
To test the model at the image patch level, we used the test set (sizes given in Table 1). Probabilities of being in each of the four classes were predicted by the OMRS model and recorded. Slide-level prediction heatmaps were also generated for each nondysplastic and OSCC tissue pathology slide, where each pixel represented the class of the image patch with the highest probability at that location.
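The heatmap construction described above amounts to taking, for each patch position, the class with the highest predicted probability. A minimal sketch follows, assuming the patch probabilities are available as a mapping from grid position to a four-element list; the names `CLASSES`, `patch_class`, and `build_heatmap` and this data layout are illustrative assumptions, not part of the published pipeline.

```python
# Class order is an assumption made for this illustration.
CLASSES = ("nondysplastic epithelium", "cancerous epithelium",
           "connective tissue", "background")


def patch_class(probs):
    """Index of the class with the highest predicted probability."""
    return max(range(len(probs)), key=probs.__getitem__)


def build_heatmap(patch_probs, n_rows, n_cols):
    """Build a grid of class indices, one cell per image patch.

    patch_probs maps (row, col) grid positions to per-class probabilities.
    Positions without a prediction stay None.
    """
    heatmap = [[None] * n_cols for _ in range(n_rows)]
    for (row, col), probs in patch_probs.items():
        heatmap[row][col] = patch_class(probs)
    return heatmap
```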

| Application of the OMRS model on leukoplakia slides
To evaluate the prognostic performance of the OMRS model in OL patients, we retrieved H&E slides of OL cases biopsied and followed up at the Department of Head and Neck Surgery at The University of Texas MD Anderson Cancer Center. This cohort included 62 patients with clinical OL diagnoses without evidence of concurrent cancer. H&E-stained slides were reviewed by pathologists and scanned at 40× magnification. To assess whether the model identifies patients with a high risk of progression to OC, we retrieved demographic and clinical data, including time-to-progression (TTP) to oral cancer, from patient clinical charts. OL histopathological grading was performed by an oral pathologist (N.V.) according to the WHO Classification of Tumors. 30 A 300 × 300 pixel window was slid over the entire scanned slide, without overlap between adjacent windows, to extract image patches for the OMRS model. For each image patch, probabilities were predicted and recorded for two classes: epithelium (nondysplastic and cancerous epithelium treated as a single class) and a second class comprising stroma and white background. Based on the prediction, a heatmap was generated for each pathology slide, where each pixel represented the class of the image patch with the highest probability at that location (Figure 1B). From the heatmap, areas predicted as epithelium were used in further analysis. For each OL epithelium patch, the OMRS-trained model then predicted the probability of the patch being either "nondysplastic-like" or "tumor-like", which quantified the resemblance of the OL epithelium to tumor or nondysplastic oral mucosa epithelium. We used this predicted probability as the tumor progression risk score for each tissue patch.
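The tiling and patch-scoring steps can be sketched as below. Two points are assumptions made for illustration only, because the exact procedure is deferred to Appendix S1: partial windows at the slide edges are dropped, and the tumor-likeness of an epithelium patch is taken as the cancerous share of the combined epithelial probability. The names `tile_origins` and `patch_risk` are hypothetical.

```python
PATCH = 300  # patch edge in pixels at 40x, as stated in the text


def tile_origins(width, height, patch=PATCH):
    """Top-left corners of non-overlapping patch x patch windows.

    Assumption: partial windows at the right and bottom edges are dropped.
    """
    return [(x, y)
            for y in range(0, height - patch + 1, patch)
            for x in range(0, width - patch + 1, patch)]


def patch_risk(probs):
    """Tumor-likeness score for one patch, or None if it is not epithelium.

    probs is a dict with keys 'nondysplastic', 'cancerous', 'stroma' and
    'background'. Assumption: a patch counts as epithelium when the summed
    epithelial probability exceeds the summed stroma/background probability,
    and the score is the cancerous fraction of the epithelial probability.
    """
    p_epithelium = probs["nondysplastic"] + probs["cancerous"]
    p_other = probs["stroma"] + probs["background"]
    if p_epithelium <= p_other:
        return None
    return probs["cancerous"] / p_epithelium
```

For example, a 900 × 600 pixel region yields six windows under this scheme, and a patch with probabilities 0.2 (nondysplastic), 0.6 (cancerous), 0.1 (stroma), and 0.1 (background) would receive a score of 0.75.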

| Statistical analysis
To determine the progression risk for each slide, we used the median model-predicted probability value of all the epithelium patches on the slide as the slide-level risk score. Patients were categorized into two equal-sized groups (low- and high-risk) according to their individual slide-level risk score in relation to the median risk score of the cohort.
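The two-step aggregation (median over patches, then a median split over patients) can be written compactly. The sketch below uses only the Python standard library; the function names and the tie rule (a score equal to the cohort median is assigned to the low-risk group) are assumptions, as the text does not specify how ties were handled.

```python
from statistics import median


def slide_risk_score(patch_scores):
    """Slide-level risk score: median of the epithelium patch scores."""
    return median(patch_scores)


def stratify(slide_scores):
    """Median split of a cohort into 'high' and 'low' risk groups.

    slide_scores maps a patient identifier to the slide-level risk score.
    Assumption: scores equal to the cohort median are labeled 'low'.
    """
    cutoff = median(slide_scores.values())
    return {patient: ("high" if score > cutoff else "low")
            for patient, score in slide_scores.items()}
```

With an even number of patients and no tied scores, as in a cohort of 62, the median split produces two groups of equal size.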
Kaplan-Meier plots were used to summarize the progression-free survival curves of the patients in the predicted low- or high-risk groups. Univariate Cox regression models were used to evaluate the association between clinicopathological variables and TTP. A multivariate Cox proportional hazard (CoxPH) model was used to evaluate the association between the predicted risk score and patient TTP, adjusted for relevant variables. The results were considered significant if the resulting two-tailed p value was less than 0.05.
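For readers unfamiliar with the Kaplan-Meier estimator, the following sketch shows how a progression-free survival curve, and from it a cumulative progression probability at a given time, is obtained from follow-up times and event indicators. It is a plain standard-library illustration and does not reproduce the statistical software used in the study; the function names are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate as a list of (event_time, survival) pairs.

    times: follow-up time per patient.
    events: 1 if progression was observed at that time, 0 if censored.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0   # progressions observed at time t
        n_leaving = 0  # patients leaving the risk set at time t
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_leaving += 1
            i += 1
        if n_events > 0:
            survival *= 1 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving
    return curve


def progression_probability(curve, t):
    """Cumulative progression probability at time t, i.e. 1 - S(t)."""
    survival = 1.0
    for event_time, s in curve:
        if event_time > t:
            break
        survival = s
    return 1.0 - survival
```

As a toy example with made-up data, four patients followed for 1, 2, 3, and 4 years, with the second one censored, give survival estimates of 0.75, 0.375, and 0 at years 1, 3, and 4.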

| Ethics
Written informed consent was obtained from patients included in the study. This study was approved by the Institutional Review Board (IRB).

| CNN model distinguishes tissue type
After the training process (Figure 2E-G), the OMRS model showed an overall prediction accuracy of 95.4% in the testing set; the accuracy was 94.7% for tumor epithelium patches and 98.0% for nondysplastic epithelium patches. The area under the receiver-operating characteristic (ROC) curve (AUC) was 0.992 for tumor and 0.999 for nondysplastic epithelium. Examples of slide-level region detection are shown in Figure 2A-D. These results showed that the trained model distinguishes nondysplastic epithelium from tumor epithelium with high accuracy.

| Risk score predicts cancer-free survival
The clinical characteristics of OL patients included in the study are described in Table 2. Patients were followed up for a mean time of 5 ± 4.14 years (minimum of 0.12 and maximum of 14.0 years) after the diagnostic biopsy used in the study. Twenty-six cases of OL (41.9%) progressed to OC, within 4.4 ± 3.5 years of the initial biopsy (minimum of 0.12 and maximum of 13.0 years). Associations between OC progression and clinicopathological variables are described in Table 2. Malignant progression was significantly associated with younger age (progressors age 51.5 ± 14.0 vs. nonprogressors age 59.1 ± 9.36, Wilcoxon p value = 0.038), and with OL located on the tongue (23 out of 43 developed OC, 53.5% progression) compared with the other combined sites (3 out of 19 developed OC, 15.8% progression, Fisher's exact test p = 0.006). OL dysplasia grading was not associated with OC progression. The OMRS risk classification was significantly associated with OC progression. Among OL patients categorized as high risk, 58.1% (18 out of 31) developed OC, whereas only 25.8% (8 out of 31) of the low-risk patients developed OC (Fisher's exact test, p = 0.019). OL patients classified as high-risk were 3.98 (95% CI 1.36-11.7) times more likely to develop OC than low-risk ones.
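The reported counts (18 of 31 high-risk and 8 of 31 low-risk patients progressing) can be checked with a short calculation. The sketch below computes the odds ratio with a Wald 95% confidence interval and a two-sided Fisher's exact test from the 2 × 2 table. Treating the reported 3.98 as an odds ratio with a Wald interval is an assumption on our part, although it reproduces the published figures; the function names are hypothetical.

```python
import math


def odds_ratio_wald(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for the table [[a, b], [c, d]]."""
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of log odds ratio
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p value for the table [[a, b], [c, d]].

    Sums the hypergeometric probabilities of all tables that are no more
    likely than the observed one, with the margins held fixed.
    """
    row1, col1, n = a + b, a + c, a + b + c + d

    def pmf(k):
        return (math.comb(col1, k) * math.comb(n - col1, row1 - k)
                / math.comb(n, row1))

    p_observed = pmf(a)
    k_min = max(0, row1 - (n - col1))
    k_max = min(row1, col1)
    return sum(pmf(k) for k in range(k_min, k_max + 1)
               if pmf(k) <= p_observed * (1 + 1e-9))


# Rows: high risk, low risk. Columns: progressed, did not progress.
estimate, lower, upper = odds_ratio_wald(18, 13, 8, 23)
# estimate is about 3.98, with an interval of about 1.36 to 11.7
p_value = fisher_exact_two_sided(18, 13, 8, 23)
```

Because the figure matches an odds ratio, "times more likely" is best read as a ratio of odds rather than of probabilities; the corresponding ratio of progression proportions is 58.1% / 25.8%, or about 2.25.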
TTP was significantly different between high- and low-risk groups (p = 0.003). Low-risk OL patients had a significantly longer TTP than high-risk patients (Figure 3). The 5- and 10-year OC progression probabilities for high-risk patients were 52.5% (36.1-68.3) and 71.8% (52.4-85.4), respectively, whereas for the low-risk group the probabilities were 21.2% (10.6-38.4) and 34.9% (18.4-55.9). Interestingly, TTP did not differ among samples grouped according to their dysplasia grade (no and mild dysplasia versus moderate and severe dysplasia) (Figure 3B).
Univariate analyses showed that age (HR = 0.97, 95% CI 0.94-1.0), OL site (HR = 4.1, 95% CI 1.22-13.8), and OMRS risk classification (HR = 3.48, 95% CI 1.44-8.40) were the only factors significantly associated with TTP. OL dysplasia grading was not associated with TTP (HR = 1.81, 95% CI 0.7-4.65), but it was also included in the multivariate model since it is considered an important variable for OC progression risk (Table 3). The multivariate analysis showed that the OMRS risk score was the only factor significantly associated with OC progression in OL after adjustment for other variables (HR = 4.52, 95% CI 1.49-13.7) (Table 3).

| DISCUSSION
In this study, we presented the OMRS model, designed to predict OC development risk using H&E-stained OL slides. The model showed an encouraging ability to discriminate OL with higher potential for malignant progression from cases with lower cancer progression risk, serving as an independent and objective prognostic tool distinct from current practices. To the best of our knowledge, this is the first deep learning study of OL pathology H&E image analysis.
Implementation of traditional epithelial dysplasia grading systems used to estimate malignant progression risk of OL requires a dedicated, specially trained expert pathologist, and still suffers from high inter- and intraobserver variability with poor reproducibility. Modification of the traditional three-tier OL grading system to a binary classification has improved the grading system's accuracy. Within the three-tier grading system, the malignant transformation rate (MTR) for OL classified as mild and moderate has been reported as ranging from 5% to 12%, whereas among severe cases, the MTR is around 25%. 31,32 On the other hand, within binary systems, OL classified as high risk and low risk have an MTR of 58% and 13%, respectively. 10 In our OMRS model, high-risk patients have an MTR of 58.1%, indicating that it outperforms the three-tier system prediction and parallels the most recent binary grading approaches. Furthermore, the OMRS model has the advantage of being a standardized procedure that performs with virtually no variability.
A growing body of evidence has demonstrated the success of AI, especially deep learning image analysis of tissue slides, in contributing to the diagnosis and prognosis of a range of diseases. 23,33,34 However, despite advancements in AI technology during the last decade, few studies have adopted deep learning techniques in the field of oral potentially malignant disorders. According to a recently published systematic review, 35 only one study has evaluated the use of machine learning to predict cancer development risk in OL; Baik et al. 36 developed their semiautomated algorithm using Random Forests, a more traditional machine learning approach, but applied an algorithm training strategy similar to ours. Their algorithm learned from a set of normal oral mucosa and OC specimens how to differentiate normal from abnormal cell nuclei. Although their approach was highly effective, it relied on a special nuclear stain (Feulgen-Thionin), which is not part of routine H&E surgical pathology staining protocols, and it depended on a skilled technician for identification and delineation of regions of interest in the tissue. Conversely, our algorithm was developed using an advanced deep learning technique, which autonomously identifies regions of interest from digital whole slide images and was trained to work with H&E-stained slides prepared with the standard staining protocol of surgical pathology services.
The OMRS model provides supporting evidence for the usefulness of deep learning algorithms using image feature extraction to contribute to the development of new diagnostic and prognostic tools that could potentially benefit patients with oral diseases. That said, we believe that by incorporating other data points into our algorithm, including clinical and genomic data, we may be able to build an even stronger OC risk prediction model, though a much larger dataset will be needed.
This study has limitations that future work should address. The current training data set of nondysplastic oral epithelium and OSCC tissue images used for OMRS model development is small, which could lower the robustness of the currently trained OMRS model. More tissue sections of nondysplastic and dysplastic oral epithelium and OSCC, preferably from multiple centers, should be incorporated into future models to improve performance. In addition, the prognostic study was only done in OL patients without concurrent oral cancer. It would be prudent to expand the study to other subsets of OPMD, and to explore patients with or without concurrent oral cancer when a larger data set is available.
In summary, our study presents a new predictive tool that performs at least as well as the available OL histologic dysplasia grading approach, but with the additional advantage of being automated and free of variability. We believe that with the improvement of this model, it could potentially be an important tool for the early diagnosis of OC and safeguard those patients with a lower risk of malignant progression from unnecessary mental stress and recurring surgical intervention.