Establishment of a Machine Learning Model for Early and Differential Diagnosis of Pancreatic Ductal Adenocarcinoma Using Laboratory Routine Data

Early diagnosis and clear differentiation of pancreatic ductal adenocarcinoma (PDAC) from chronic pancreatitis (CP) are clinically challenging. A machine learning model is developed for the diagnosis of PDAC. The model is trained using a dataset of 13 987 participants, of which 12 402 are used for training the model and the remaining 1585 for testing purposes. One thousand sixty-six laboratory variables are reduced to 18 measures using standard filtering and feature importance methods. Then, five machine learning classifiers are evaluated for the study. Hyperparameter optimization for each classifier is carried out, and the optimal algorithm is established using a tenfold cross validation on the training data. Finally, a gradient boosting decision tree-based ternary classifier composed of 18 routine laboratory variables (GBDT-TC18) is established. In the test cohort, GBDT-TC18 differentiates PDAC from CP and healthy control (HC) with an accuracy better than carbohydrate antigen 19-9 (CA19-9)-based diagnosis. It also maintains a high diagnostic accuracy for stages I, IIA, and IIB PDAC, small-sized PDAC, body and tail adenocarcinoma, CA19-9-negative PDAC, and nonjaundice PDAC. Moreover, GBDT-TC18 shows a higher accuracy than CA19-9 in distinguishing PDAC from CP. GBDT-TC18 can be used to augment the capability of doctors for early and differential diagnosis of PDAC.

It is well known that PDAC initiation and development is a complex process that involves multiple mutations, a hypoxic microenvironment, reprogramming of cellular metabolism, evasion of tumor immunity, etc. [5,10] These changes occur not only in the tumor mass, but also in the interactions between the growing tumor and the larger system of which it is a part, namely, the host organism. [11] Thus, one or a few tumor markers cannot reflect these complicated biological changes. Some previously neglected laboratory parameters may also provide potential diagnostic value by detecting the cumulative metabolic changes in body fluids. [12] Routine laboratory parameters have sufficient validity and stability for large-scale evaluations and have been widely used to confirm, exclude, classify, or monitor various diseases. [13] However, the menu of available tests in most hospital laboratories is quite large (always over 1600 tests), [14] so it is impractical to analyze the interrelationships of large numbers of laboratory indicators manually. Unfortunately, the true power of laboratory data is frequently underestimated in clinical practice, perhaps because practitioners mainly concentrate on important abnormal parameters (e.g., tumor markers) and overlook the interrelationships of all the parameters.
The advent of machine learning has enabled us to pursue solutions to laboratory big data previously thought impossible, [15] in which vast numbers of variables can be handled and searched for combinations that reliably predict outcomes. While machine learning is similar to the traditional regression model (both being based on outcomes, covariates, and statistical functions), it is superior in handling enormous numbers of predictors and combining them in nonlinear and highly interactive ways. [16] Therefore, we believe machine learning tools based on laboratory big data will prove transformative.
In this study, we carefully labeled and curated clinical data, and comprehensively analyzed global patterns of laboratory markers in patients with PDAC and CP as well as healthy controls (HC) to accomplish several goals. First, machine learning was used to identify important laboratory variables that could differentiate PDAC from HC and CP. Next, we aimed to develop an optimal machine learning model to outperform CA19-9 in terms of detection of pancreatic cancer from HC and CP, especially for early-stage, small-sized, body and tail, CA19-9-negative, or nonjaundice PDAC. Finally, as CP represents a risk factor for PDAC and a potential differential diagnosis, [4,5] we sought to assess the performance of the model to distinguish patients with PDAC from patients with CP.

Characteristics of the Participants
A total of 13 987 participants were eventually included in the study, with 12 402 and 1585 participants in the training and test cohorts, respectively. Flow charts for participant inclusion/exclusion are shown in Figure 1. The clinical characteristics of the study cohorts are shown in Table 1. The mean age at PDAC diagnosis (61.63 ± 10.24 years) was higher than that at CP diagnosis (45.23 ± 15.11 years), suggesting that age is a risk factor for PDAC. In addition, 62.6% of PDAC and 70.6% of CP patients were male, suggesting that men are more susceptible to pancreatic diseases than women (Table 1).

A GBDT Classifier Based on 18 Predictor Variables Shows Highest Diagnostic Accuracy for Differential Diagnosis of PDAC, CP, and HC
After data preprocessing and feature selection, 18 predictor variables out of 1066 laboratory indicators in the training cohort were selected to develop machine learning models (Figure 2). In addition to the five already-known predictors of PDAC (CA19-9, CEA, CA125, amylase, and age), three other variables, namely red blood cell (RBC) count, lymphocyte percentage (LYMP%), and blood glucose level (GLU), remained in the model in 100% of iterations after stepwise selection. These eight variables were followed by ten variables that remained in the model in 54-99% of iterations after stepwise selection. As shown by the curves in Figure 2a, the accuracy of the model improved as each variable was added.
Based on the 18 predictor variables, we subsequently developed five optimal machine learning classifiers in the training cohort with tenfold cross validation: 1) gradient boosting decision tree (GBDT) classifier, 2) artificial neural network (ANN) classifier, 3) support vector machine (SVM) classifier, 4) logistic regression (LR) classifier, and 5) random forest (RF) classifier (Table 2). We found that the GBDT classifier showed the highest diagnostic accuracy (0.872) for differential diagnosis of PDAC, CP, and HC, and thus selected the GBDT algorithm to construct the final model. We termed this classifier the GBDT-based ternary classifier composed of 18 routine laboratory variables (GBDT-TC18).
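The model-selection step described above can be sketched as follows. This is a minimal, standard-library-only illustration of tenfold cross validation used to pick the learner with the highest mean accuracy. The two toy learners (`fit_majority`, `fit_nearest_centroid`), the single synthetic feature, and all function names are assumptions made for illustration; they merely stand in for the GBDT, ANN, SVM, LR, and RF classifiers, whose implementations are described only in the Supporting Information.

```python
import random
from collections import Counter
from statistics import mean


def kfold_indices(n, k=10, seed=0):
    """Shuffle indices 0..n-1 and deal them into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cross_val_accuracy(fit, X, y, k=10, seed=0):
    """Mean held-out accuracy of the learner `fit` over k folds."""
    scores = []
    for held_out in kfold_indices(len(X), k, seed):
        held = set(held_out)
        train = [i for i in range(len(X)) if i not in held]
        predict = fit([X[i] for i in train], [y[i] for i in train])
        hits = sum(predict(X[i]) == y[i] for i in held_out)
        scores.append(hits / len(held_out))
    return mean(scores)


def fit_majority(X, y):
    """Toy baseline (assumption): always predict the most frequent label."""
    label = Counter(y).most_common(1)[0][0]
    return lambda x: label


def fit_nearest_centroid(X, y):
    """Toy learner (assumption): predict the class with the closest mean."""
    centroids = {c: mean(x for x, t in zip(X, y) if t == c) for c in set(y)}
    return lambda x: min(centroids, key=lambda c: abs(x - centroids[c]))


# Synthetic one-feature data for three classes (not study data).
X = ([5 + i % 10 for i in range(30)]
     + [45 + i % 10 for i in range(30)]
     + [195 + i % 10 for i in range(30)])
y = ["HC"] * 30 + ["CP"] * 30 + ["PDAC"] * 30

learners = {"majority": fit_majority, "nearest_centroid": fit_nearest_centroid}
cv_scores = {name: cross_val_accuracy(fit, X, y) for name, fit in learners.items()}
best = max(cv_scores, key=cv_scores.get)  # learner with highest mean accuracy
```

In a real pipeline each entry of `learners` would wrap one tuned classifier, and the same fold assignment would be reused for all of them so that the accuracies are comparable.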
The relative importance of the 18 predictor variables in GBDT-TC18 is shown in Figure 2b. In addition to the five locked-in variables (CA19-9, CEA, CA125, amylase, and age), the most important factor for diagnosis prediction in GBDT-TC18 was the platelet-large cell ratio (P-LCR), a simple marker of platelet activity. The second most important factor was white blood cells (WBCs), followed by gender, glucose, RBCs, direct bilirubin (DBiL), percentage of neutrophilic granulocytes (NEU%), percentage of lymphocytes (LYMP%), etc.

GBDT-TC18 Shows Higher Diagnostic Performance than CA19-9 for Diagnosis of PDAC and Its Subgroups
In the training cohort, GBDT-TC18 showed higher diagnostic performance than CA19-9 in discriminating PDAC from CP and HC: AUC 0.987 versus 0.828, true positive rate (TPR) 90.0 versus 72.1% (Figure 3a,b). Similarly, in the test cohort, the performance of GBDT-TC18 for PDAC diagnosis was also higher than that of CA19-9 (AUC 0.960 vs 0.827, TPR 89.3 vs 70.8%, Figure 3a,b). In addition, GBDT-TC18 discriminated PDAC from CP and HC significantly more effectively than CA19-9 in terms of sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and F1 score in both the training and test cohorts (Table 3).
According to TNM staging (AJCC 8th edition), [17] Stage I/II and III/IV tumors were defined as early-stage and late-stage PDAC, respectively. We next validated the performance of GBDT-TC18 in diagnosing early-stage PDAC (stages I, IIA, and IIB) versus CP and HC. We found that GBDT-TC18 had a higher diagnostic power than CA19-9 to detect all early PDAC stages, as follows: stage I, AUC 0.989 versus 0.822, TPR 92.1 versus 70.8% in the training cohort and AUC 0.968 versus 0.767, [...] (Figure 3c-h, Figure S1a-d, Supporting Information, and Table 3). These data suggest that GBDT-TC18 has a stable diagnostic performance independent of pancreatic tumor stage.
Figure 1. Study design and enrollment of participants. Abbreviations: HC, healthy control; CP, chronic pancreatitis; PDAC, pancreatic ductal adenocarcinoma. *Only the major exclusion criterion was adopted for a patient meeting multiple exclusion criteria. **A patient would be excluded when missing laboratory examination data within 15 days before definite diagnosis.
Based on TNM staging (AJCC 8th edition), [17] we further explored the diagnostic performance of GBDT-TC18 for PDAC with different tumor sizes, including small (≤2 cm), medium (2-4 cm), and large (>4 cm) tumors. We found that GBDT-TC18 had a higher diagnostic accuracy than CA19-9 in differentiating small-sized PDAC from CP and HC, with an AUC of 0.990 versus 0.810 and a TPR of 90.5 versus 68.6% in the training cohort, and an AUC of 0.989 versus 0.944 and a TPR of 100 versus 94.1% in the test cohort (Figure 3i,j). We also observed that GBDT-TC18 achieved similar diagnostic performance for small-sized, medium-sized, and large-sized tumors in PDAC participants in both the training and test cohorts (Figure 3i,j, Figure S1e-h, Supporting Information, and Table 4). These data suggest that GBDT-TC18 can effectively detect PDAC with different tumor sizes and that its performance is not affected by tumor size.
Compared with pancreatic head and neck tumors, tumors located in the pancreatic body and tail are more difficult to diagnose because of their insidious clinical symptoms. GBDT-TC18 showed enhanced detection of pancreatic body and tail adenocarcinoma among CP and HC participants compared with the CA19-9-based diagnosis, both in the training cohort (AUC 0.987 vs 0.821, TPR 89.6 vs 70.8%) and in the test cohort (AUC 0.973 vs 0.810, TPR 93.9 vs 67.3%, Figure 3k,l). We also observed that GBDT-TC18 achieved similar diagnostic performance for pancreatic body and tail adenocarcinoma and for head and neck adenocarcinoma in both the training and test cohorts (Figure 3k,l, Figure S1i,j, Supporting Information, and Table 4). These data suggest that GBDT-TC18 can reliably diagnose PDAC regardless of the pancreatic tumor location.

GBDT-TC18 Shows High Diagnostic Performance for CA19-9-Negative PDAC Diagnosis
CA19-9-based diagnosis has poor sensitivity in early or small-diameter PDAC patients, [7,18] and the expression of CA19-9 is undetectable in Lewis antigen-negative individuals. [8] In this study, participants with CA19-9 values of <37 U mL⁻¹ were considered CA19-9-negative participants. [18] We included 561 (19.6%) and 40 (17.2%) CA19-9-negative patients with PDAC in the training cohort and test cohort, respectively (Table 5). Remarkably, GBDT-TC18 produced a high AUC for detecting CA19-9-negative PDAC from CA19-9-negative CP and HC, with an AUC of 0.969 and 0.956 in the training cohort and test cohort, respectively (Figure 4a). In addition, GBDT-TC18 maintained a high TPR for detecting patients with CA19-9-negative PDAC, with a TPR of 68.3 and 67.5% in the training and test cohorts, respectively. By definition, these patients were not detected by CA19-9 analysis alone (Figure 4b, Table 5). These data demonstrate the capability of GBDT-TC18 to detect individuals with CA19-9-negative PDAC.
Obstructive jaundice is one of the most common symptoms of late-stage PDAC [10] and is caused by excessive bilirubin. [19] However, most patients with obstructive jaundice are not candidates for curative resection, [10] so it is important to make a timely and accurate PDAC diagnosis in patients without jaundice. Herein, participants with total bilirubin (TBiL) values of ≤17.1 μmol L⁻¹ were considered TBiL-negative participants. Our study included 1299 (45.4%) and 114 (48.9%) individuals with PDAC who were TBiL-negative in the training and test cohorts, respectively (Table 5). Our results showed that GBDT-TC18 presented a larger AUC than CA19-9 in discriminating TBiL-negative PDAC from TBiL-negative CP and HC (AUC 0.982 vs 0.829 in the training cohort; 0.939 vs 0.811 in the test cohort, Figure 4c).
In addition, more patients with TBiL-negative PDAC were detected using GBDT-TC18 than with CA19-9, as indicated by a TPR of 84.1 versus 72.1% in the training cohort and 82.5 versus 67.5% in the test cohort (Figure 4d). These data suggest that GBDT-TC18 is a promising, noninvasive method for identifying nonjaundice PDAC.
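The two subgroup definitions used above (CA19-9 <37 U mL⁻¹ and TBiL ≤17.1 μmol L⁻¹) and the subgroup TPR can be sketched as follows. The record layout, the field names, and the four example patients are hypothetical and serve only to make the definitions concrete; they are not study data.

```python
CA199_CUTOFF = 37.0   # U/mL; values below this are CA19-9-negative
TBIL_CUTOFF = 17.1    # umol/L; values at or below this are TBiL-negative


def is_ca199_negative(patient):
    return patient["ca199"] < CA199_CUTOFF


def is_tbil_negative(patient):
    return patient["tbil"] <= TBIL_CUTOFF


def subgroup_tpr(patients, in_subgroup, predicted_key):
    """TPR = detected PDAC / all PDAC, restricted to one subgroup."""
    pdac = [p for p in patients if p["diagnosis"] == "PDAC" and in_subgroup(p)]
    if not pdac:
        return None  # TPR is undefined without PDAC cases in the subgroup
    return sum(p[predicted_key] == "PDAC" for p in pdac) / len(pdac)


# Hypothetical records; "model" holds an assumed classifier output.
patients = [
    {"diagnosis": "PDAC", "ca199": 12.0, "tbil": 9.0, "model": "PDAC"},
    {"diagnosis": "PDAC", "ca199": 30.0, "tbil": 15.0, "model": "CP"},
    {"diagnosis": "PDAC", "ca199": 820.0, "tbil": 60.0, "model": "PDAC"},
    {"diagnosis": "CP", "ca199": 20.0, "tbil": 10.0, "model": "CP"},
]
# CA19-9 used alone as a diagnostic rule at the 37 U/mL cutoff.
for p in patients:
    p["ca199_rule"] = "PDAC" if p["ca199"] >= CA199_CUTOFF else "non-PDAC"

model_tpr = subgroup_tpr(patients, is_ca199_negative, "model")
rule_tpr = subgroup_tpr(patients, is_ca199_negative, "ca199_rule")
tbil_neg_tpr = subgroup_tpr(patients, is_tbil_negative, "model")
```

Within the CA19-9-negative subgroup the cutoff rule can never flag a PDAC case, which is why its TPR there is zero by construction.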

GBDT-TC18 Shows Higher Diagnostic Performance than CA19-9 for Differential Diagnosis of PDAC and CP
Patients with PDAC sometimes present with malabsorption and can therefore be misdiagnosed as having CP, with a consequent delay in cancer treatment. [20] Our results showed that GBDT-TC18 performed better than CA19-9 in distinguishing PDAC from CP (AUC 0.976 vs 0.796 in the training cohort; AUC 0.965 vs 0.778 in the test cohort, Figure 5a).
www.advancedsciencenews.com www.advintellsyst.com
To explore the feasibility of clinical translation, we verified our results on an independent CP/PDAC cohort in a clinical setting. Using a proprietary application programming interface (API), our GBDT-TC18 model was linked to existing electronic health records (EHR) platforms, so that individual clinical laboratory data could be automatically extracted, analyzed, and matched to the patients diagnosed with CP/PDAC. We found 685 patients with PDAC and 148 patients with CP who were initially diagnosed as "pancreatic mass lesions (PML)." That is to say, even doctors in the tertiary hospitals were unable to give a definite diagnosis at the first visit. Our results showed that GBDT-TC18 had a higher diagnostic accuracy than CA19-9 in distinguishing PDAC from CP (AUC 0.930 vs 0.734, TPR 87.3 vs 69.6%, Figure 5b,c). In addition, we found 42 patients with PDAC who were initially misdiagnosed with CP in the EHR system. Our model also identified more of these patients than CA19-9 (61.9 vs 59.5%, Figure 5d). Taken together, our results demonstrate that GBDT-TC18 could augment the capability of human doctors for differential diagnosis of PDAC and CP.
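The percentages reported for the EHR cohort can be cross-checked against patient counts with a few lines of arithmetic. The cohort sizes (685, 148, 42) and percentages come from the text above, and the counts 598, 100, and 26 are those stated in the Discussion; the CA19-9 counts are not reported and are only implied by the percentages, so they should be read as an inference. The helper names are illustrative.

```python
def rate(count, n):
    """Percentage of n, rounded to one decimal as in the text."""
    return round(100 * count / n, 1)


def implied_count(n, percent):
    """Whole number of patients implied by a reported percentage of n."""
    return round(n * percent / 100)


# Counts stated in the Discussion reproduce the reported percentages.
pml_pdac_model = rate(598, 685)      # PDAC identified among 685 PML cases
pml_cp_model = rate(100, 148)        # CP identified among 148 PML cases
misdiagnosed_model = rate(26, 42)    # PDAC initially misdiagnosed as CP

# Inferred, not reported: counts implied by the CA19-9 percentages.
pml_pdac_ca199 = implied_count(685, 69.6)
misdiagnosed_ca199 = implied_count(42, 59.5)
```

On this reading, the difference between the model and CA19-9 among the 42 initially misdiagnosed patients corresponds to a single patient, which is worth keeping in mind when interpreting Figure 5d.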

Discussion
Our pragmatic study is representative of real-world pancreatic cancer diagnosis. As noted in a recent statement of the American Society of Clinical Oncology, [21] although real-world studies may lack rigorous prespecified data collection, the real-world data they produce may be more representative of patients and exposures (e.g., diagnostics) in routine practice. In this study, we conducted extensive data preprocessing to produce high-quality real-world evidence. For example, we reclassified the T and N staging in the pathology reports according to the recent TNM staging (AJCC 8th edition) [17] to ensure TNM staging compatibility. Moreover, to properly handle missing data, we first selected variables with a missing value rate of <30%, then adopted and compared five different imputation methods to train our model. We ultimately chose "the imputation with an extra category," which showed the highest accuracy (0.872, Table S2, Supporting Information). This strategy also most closely represents real clinical practice, as it allows us to make predictions even if a few variables are missing. To avoid laboratory data heterogeneity, we also normalized all the qualitative and quantitative data based on the corresponding sex, age, units, and apparatus type. The candidate variables then started from a small set of variables (CA19-9, CEA, CA125, amylase, and age) previously associated with pancreatic disease, which enhances the validity of the variables. [22-25] Subsequently, we used standard filtering and feature importance methods to obtain another 13 top variables. Interestingly, many of these variables have been implicated to some degree in PDAC initiation and progression.
For example, variables associated with a hypercoagulable state (such as elevated P-LCR and fibrinogen), an inflammatory state [26] (such as elevated WBC and neutrophils), immune suppression [26,27] (such as decreased lymphocytes and eosinophils), and nutrient deficiency (such as decreased RBC, hemoglobin, and hematocrit) [28] were associated with a higher PDAC risk. Moreover, consistent with previous studies, [29-31] we show that determining glucose levels improves the early diagnosis of PDAC. Furthermore, the multivariable model also confirmed that determining the level of both DBiL (associated with obstructive jaundice) [10,19] and TBiL (associated with various degrees of jaundice) [32,33] improves the differential diagnosis of PDAC, CP, and HC. Therefore, we believe laboratory parameters as real-world evidence might assist doctors in making clinical decisions.
The clinical value of GBDT-TC18 includes the following aspects. First, the model could improve CA19-9-based PDAC diagnosis by providing higher sensitivity for PDAC diagnosis (Table 3). Specifically, GBDT-TC18 was able to identify 2578 cases (90% of PDAC) in the training cohort and 208 cases (89.3% of PDAC) in the test cohort. It even detected 513 cases (17.9% of PDAC) in the training cohort and 43 cases (18.5% of PDAC) in the test cohort that were missed by CA19-9-based PDAC diagnosis. In addition, our model also maintained higher sensitivity than CA19-9 in diagnosing nonjaundice, early-stage, small-sized, or body and tail tumors (Tables 3-5). Second, our model achieves the detection of CA19-9-negative PDAC with high sensitivity. It is well known that CA19-9 is not expressed in Lewis-negative individuals [8] and suffers from low sensitivity in early or small-sized PDAC patients. [7] Our real-world dataset showed that 561 (19.6%) and 40 (17.2%) of individuals with PDAC in the training and test cohorts, respectively, were CA19-9-negative; the current CA19-9-based diagnosis would thus fail to diagnose these patients. [34] However, our GBDT-TC18 model could differentiate most patients with CA19-9-negative PDAC (68.3 and 67.5% in the training and test cohorts, respectively; Figure 4) from non-PDAC controls. This performance indicates that GBDT-TC18 can recover a substantial share of the cases that CA19-9 alone would miss. Third, our GBDT-TC18 model could improve the differential diagnosis of PDAC from CP. In clinical practice, patients with PDAC may present symptoms suggestive of CP and can therefore be misdiagnosed as having CP. Our model (Figure 5), however, provides a new and powerful method for distinguishing the two disorders.
Based on these findings, we hope our model will have the following clinical applications. First, this model could be used to assist clinicians, especially primary health-care (PHC) doctors, in identifying high-risk patients with PDAC. In PHC systems in China, imaging or pathological tests are less available, while laboratory tests are the most commonly used diagnostic method. [35] Consequently, we hope our laboratory-based model could be used in PHC systems to examine patients for whom the doctors are uncertain between PDAC and CP or HC, and to refer high-risk patients to tertiary hospitals for further diagnosis and treatment as early as possible. This may help to detect more patients with early and resectable tumors, resulting in a greater survival probability as well as less expensive treatments. In addition, our model could be used to assist doctors in the differential diagnosis of PDAC and CP, especially for complicated cases such as mass-type CP and CA19-9-negative PDAC. According to a retrospective study including 471 992 veterans, [20] up to 5% of all patients with PDAC were initially diagnosed with CP. Cancer diagnosis was postponed by 2-24 months in 68% of all PDAC patients misdiagnosed with CP, further delaying the timely treatment of PDAC. Therefore, it is highly desirable to develop tools for differential diagnosis of PDAC and CP. Using a proprietary API, our model was linked to existing EHR platforms, so that individual clinical laboratory data could be automatically extracted, analyzed, and matched to the patients diagnosed with CP/PDAC. We found 42 PDAC patients initially misdiagnosed with CP, 26 (61.9%) of whom were accurately identified by our model (Figure 5d). In addition, we found 685 PDAC patients and 148 CP patients who were initially diagnosed as "pancreatic mass lesions." That is to say, even doctors in the tertiary hospitals were unable to give a definite diagnosis at the first visit.
Our model, however, accurately identified 598 (87.3%) patients with PDAC and 100 (67.6%) patients with CP, supporting our model as a reliable, noninvasive, and cost-effective evaluation method for the differential diagnosis of PDAC from CP.
In summary, we have developed and validated an easily automatable and highly accurate laboratory-based machine learning model, GBDT-TC18, for early and differential diagnosis of PDAC. It is an innovative method for diagnosing PDAC with significant clinical advantages over current methods such as CA19-9 analysis. Furthermore, the study design and concept of our method could potentially serve as a blueprint for the development of similar analytical diagnostics for other complicated diseases.

Experimental Section
Study Design and Population: The study population consisted of participants from a consecutive hospital-based case-control study conducted in Shanghai Changhai Hospital, a tertiary hospital and a large referral center for difficult and complicated pancreatic diseases. The study protocol was approved by the regional ethics committee. Cases were enrolled from patients with newly diagnosed and pathologically confirmed PDAC. Controls comprised HC, who were clinically healthy, and benign pancreatic controls, who were diagnosed with CP. Eligibility criteria of participants are provided in the Supporting Information. Eligible participants diagnosed between January 1, 2010 and December 31, 2017 were included in the training cohort, and those diagnosed between January 1, 2018 and July 26, 2018 were included in the held-out test cohort. To prevent over-sampling, we used propensity score matching (PSM) to select HC in the training cohort (with details presented in the Supporting Information). Then, five machine-learned classifiers were evaluated for the study. Hyperparameter optimization for each classifier was carried out using a 10-fold cross validation on the training data. The hyperparameters that obtained the best predictive performance (model accuracy) in the 10-fold cross validation were chosen to establish the final machine learning model. Finally, the model was assessed on the held-out test cohort. A summary of the workflow and an overview of the cohorts are shown in Figure 1. The study was conducted according to the statement for transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD) (http://www.equator-network.org/reportingguidelines/tripod-statement/).
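The temporal split described above, in which the diagnosis date alone decides cohort membership, can be sketched as follows. The date boundaries are those given in the text; the function name `assign_cohort` and the treatment of both boundaries as inclusive are assumptions made for illustration.

```python
from datetime import date

# Cohort windows from the text; both ends are treated as inclusive (assumption).
TRAIN_WINDOW = (date(2010, 1, 1), date(2017, 12, 31))
TEST_WINDOW = (date(2018, 1, 1), date(2018, 7, 26))


def assign_cohort(diagnosis_date):
    """Return 'training', 'test', or None for dates outside both windows."""
    if TRAIN_WINDOW[0] <= diagnosis_date <= TRAIN_WINDOW[1]:
        return "training"
    if TEST_WINDOW[0] <= diagnosis_date <= TEST_WINDOW[1]:
        return "test"
    return None


# Example: every test-cohort diagnosis postdates every training diagnosis,
# so no information from the test period can leak into model fitting.
cohorts = [assign_cohort(d) for d in (date(2015, 6, 1), date(2018, 3, 15),
                                      date(2018, 8, 1))]
```

Splitting by time rather than at random means the held-out cohort behaves like prospectively collected data relative to the training period.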
Data Collection and Quality Control: To improve clinical scientific research, Shanghai Changhai Hospital has developed an integrated data acquisition system and built a large-scale integrated data platform including a clinical data repository (CDR) and a research data repository (RDR). [36] In this study, anonymized demographics, medical records, laboratory test results, operation notes, diagnosis information, pathology reports, and radiology reports of eligible participants were automatically extracted from this platform. Specifically, a total of 1066 laboratory indicators pertaining to eligible participants in the training cohort were extracted and transformed into categorical variables based on the corresponding sex, age, laboratory units, and apparatus type to avoid data heterogeneity. The qualitative data were processed into two categories, namely, positive or negative. The quantitative data were processed into three categories (high, low, or normal) according to the reference interval. To better discriminate PDAC and CP, [19,37] CA19-9 and TBiL levels were divided into five levels as follows: CA19-9 (<37, 37-160, 160-500, 500-1000, and ≥1000 IU mL⁻¹), TBiL (≤17.1, 17.1-34.2, 34.2-171, 171-342, and >342 μmol L⁻¹; ≤10, 10-20, 20-100, 100-200, and >200 mg L⁻¹). [32,33] Then, all the laboratory variables were subject to the following quality control steps for further feature selection: 1) having been tested within 15 days before the diagnosis time (if repeated items existed, the one closest to the diagnosis time was used), leading to the generation of 490 variables; and 2) having a missing value rate <30% for all types of diagnosis, resulting in the generation of 64 variables (Table S1, Supporting Information). We also included age and sex during feature selection (Table 1), due to their reported association with CP and PDAC. [22] Consequently, a total of 66 candidate variables were selected for the feature selection steps.
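The categorization rules above can be sketched in a few lines. The bin edges are those listed in the text; how a value lying exactly on an interior edge (for example 160 IU mL⁻¹) is assigned is not stated, so the choices below (lower-inclusive for CA19-9, upper-inclusive for TBiL, following the outermost bins "<37 / ≥1000" and "≤17.1 / >342") are assumptions, as are the function names and the use of `None` for a missing result.

```python
from bisect import bisect_left, bisect_right

CA199_EDGES = [37, 160, 500, 1000]        # IU/mL: <37 ... >=1000
TBIL_EDGES_UMOL = [17.1, 34.2, 171, 342]  # umol/L: <=17.1 ... >342


def ca199_level(value):
    """Level 0-4; a value equal to an edge goes to the higher bin."""
    return None if value is None else bisect_right(CA199_EDGES, value)


def tbil_level(value):
    """Level 0-4; a value equal to an edge goes to the lower bin."""
    return None if value is None else bisect_left(TBIL_EDGES_UMOL, value)


def three_level(value, low, high):
    """Quantitative result vs. its reference interval [low, high]."""
    if value is None:
        return None
    if value < low:
        return "low"
    if value > high:
        return "high"
    return "normal"


# Hypothetical results; the glucose reference interval is illustrative only.
levels = (ca199_level(36.9), ca199_level(37), ca199_level(1000),
          tbil_level(17.1), tbil_level(342.1),
          three_level(5.0, 3.9, 6.1))
```

In practice `low` and `high` would be looked up per sex, age group, unit, and analyzer, which is what makes the resulting categories comparable across instruments.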
Then, a forward stepwise method was used to eliminate uninformative variables based on the model accuracy with internal 10-fold cross validations. Of note, according to previous studies [22,23] and guidelines for PDAC, [25] five variables closely related to pancreatic disease (CA19-9, CEA, CA125, amylase and age) were locked-in at the beginning of stepwise selection to avoid local optimization error. Finally, variable selection was carried out 100 times and the 18 top-ranked predictors (Figure 2 and Figure S2, Supporting Information) out of 100 iterations were selected to develop the machine learning model.
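The selection procedure above (locked-in variables, forward stepwise addition, 100 repetitions, ranking by how often each variable is retained) can be sketched as follows. In the study the score is the cross-validated model accuracy; here `make_toy_score` is a purely hypothetical stand-in that assigns each variable a fixed contribution plus seed-dependent noise, and the two `noise_*` variables and all weights are invented for illustration.

```python
import random
from collections import Counter

LOCKED_IN = ["CA19-9", "CEA", "CA125", "amylase", "age"]


def forward_stepwise(candidates, locked_in, score, min_gain=1e-9):
    """Greedily add the variable that most improves `score`; stop when
    no remaining variable improves it by more than `min_gain`."""
    selected = list(locked_in)
    best = score(selected)
    remaining = [v for v in candidates if v not in selected]
    while remaining:
        top_score, top_var = max((score(selected + [v]), v) for v in remaining)
        if top_score - best <= min_gain:
            break
        selected.append(top_var)
        remaining.remove(top_var)
        best = top_score
    return selected


def selection_frequency(candidates, locked_in, make_score, n_iter=100):
    """Fraction of iterations in which each candidate is retained."""
    counts = Counter()
    for seed in range(n_iter):
        counts.update(forward_stepwise(candidates, locked_in, make_score(seed)))
    return {v: counts[v] / n_iter for v in candidates}


def top_ranked(freq, k):
    """The k variables retained most often (ties broken by name)."""
    return sorted(freq, key=lambda v: (-freq[v], v))[:k]


# Hypothetical contributions; NOT estimates from the study.
WEIGHTS = {**{v: 0.10 for v in LOCKED_IN},
           "RBC": 0.05, "LYMP%": 0.04, "GLU": 0.03,
           "noise_a": 0.0, "noise_b": 0.0}


def make_toy_score(seed):
    rng = random.Random(seed)
    jitter = {v: rng.uniform(-0.01, 0.01) for v in WEIGHTS}
    return lambda variables: sum(WEIGHTS[v] + jitter[v] for v in variables)


freq = selection_frequency(list(WEIGHTS), LOCKED_IN, make_toy_score)
top8 = top_ranked(freq, 8)
```

Locking in the established markers first means the stepwise search only has to decide which additional variables earn their place, and repeating the search exposes which of those choices are stable.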
Machine Learning Algorithms and Model Development: In the training cohort, five machine learning algorithms (with details presented in the Supporting Information) were used to construct respective ternary classifiers using the 18 selected predictors: 1) GBDT classifier, 2) ANN classifier, 3) SVM classifier, 4) LR classifier, and 5) RF classifier.
As mentioned earlier, we obtained optimal hyperparameters and developed five optimal machine learning classifiers using a 10-fold cross validation on the training data. Then, the classifier with the highest accuracy was chosen as the final machine learning model. After that, the model was fixed and trained across the training cohort, then validated on an independent held-out test cohort. Subgroup analysis was also carried out to test the model's ability to detect early-stage PDAC, small-sized PDAC, pancreatic body and tail adenocarcinoma, CA19-9-negative PDAC, and nonjaundice PDAC in the training and test cohorts, respectively.
Figure 4. [...] a) ROC for GBDT-TC18 for patients with CA19-9-negative PDAC versus CA19-9-negative CP and HC in the training and test cohorts. b) True positive rates for GBDT-TC18 and CA19-9 for patients with CA19-9-negative PDAC in the training and test cohorts. c) ROC for GBDT-TC18 and CA19-9 for patients with TBiL-negative PDAC versus TBiL-negative CP and HC in the training and test cohorts. d) True positive rates for GBDT-TC18 and CA19-9 for patients with TBiL-negative PDAC in the training and test cohorts. Abbreviations: ROC, receiver operating characteristic curve; AUC, area under the curve; GBDT-TC18, gradient boosting decision tree-based ternary classifier composed of 18 routine laboratory indicators; CA19-9, 37 U mL⁻¹ of carbohydrate antigen 19-9 as a cutoff; TBiL, total bilirubin; PDAC, pancreatic ductal adenocarcinoma; CP, chronic pancreatitis; HC, healthy control.
Statistical Analysis: We used the receiver operating characteristic curve (ROC) to evaluate model discrimination for each class, and the area under the curve (AUC) was used to reflect ROC performance (with details presented in the Supporting Information). Sensitivity, specificity, PPV, NPV, and F1 score for each class were calculated. The F1 score was defined as 2 × (sensitivity × PPV)/(sensitivity + PPV). We also compared the diagnostic performance of our model with that of CA19-9. The value of 37 U mL⁻¹ for CA19-9 was used as a cutoff for distinguishing PDAC from CP and HC. [18]
We carried out learning curve analysis to test the sufficiency of the sample size for model development. [38] Participant data from the training cohort were sampled at rates of 2%, 4%, 6%, 8%, and N × 10% (N ranging from 1 to 10), and sampling was repeated 100 times for each rate. For each data set, a 10-fold cross validation was performed to generate the accuracy and AUC of the newly established model. As shown by both parameters, the results reached a stable level at ≈30% of the sample size of the training cohort (Figure S3, Supporting Information). Therefore, the current sample size was more than sufficient to yield stable statistics.
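The per-class metrics named under Statistical Analysis can be computed from true and predicted labels by treating one class as positive and the other two as negative. The sketch below follows the F1 definition given in the text; the function name, the dictionary keys, and the ten example labels are illustrative assumptions, not study data.

```python
def one_vs_rest_metrics(y_true, y_pred, positive):
    """Sensitivity, specificity, PPV, NPV and F1 for one class of a
    multi-class problem, with all other classes pooled as negative."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    sens = ratio(tp, tp + fn)
    ppv = ratio(tp, tp + fp)
    return {
        "sensitivity": sens,
        "specificity": ratio(tn, tn + fp),
        "ppv": ppv,
        "npv": ratio(tn, tn + fn),
        # F1 = 2 x (sensitivity x PPV) / (sensitivity + PPV)
        "f1": ratio(2 * sens * ppv, sens + ppv),
    }


# Hypothetical labels for ten participants.
y_true = ["PDAC"] * 4 + ["CP"] * 3 + ["HC"] * 3
y_pred = ["PDAC", "PDAC", "PDAC", "CP",
          "CP", "CP", "PDAC",
          "HC", "HC", "HC"]
pdac_metrics = one_vs_rest_metrics(y_true, y_pred, "PDAC")
```

Calling the function once per class ("PDAC", "CP", "HC") gives the per-class rows of a table such as Table 3; the sensitivity for the PDAC class is the TPR quoted throughout the Results.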
To evaluate the effect of missing values, we trained the final model with five different imputation methods: mode imputation, adjacent imputation, multiple imputation, imputation using recursive partitioning and regression trees (RPART), and imputation with an extra category (with details presented in the Supporting Information). In the training cohort, the imputation with an extra category showed a subtle advantage over the other methods (Table S2, Supporting Information); we thus chose the imputation with an extra category to handle missing data.
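Because every laboratory variable is already categorical, imputation with an extra category amounts to giving a missing result its own level rather than guessing a value. A minimal sketch follows; the record layout, the `"missing"` label, and the example variables are assumptions made for illustration, and the authors' exact encoding is described only in the Supporting Information.

```python
MISSING = "missing"  # label of the extra category (assumed name)


def impute_extra_category(rows, variables):
    """Return copies of `rows` in which an absent or None value of any
    listed variable is replaced by the extra category."""
    imputed = []
    for row in rows:
        filled = dict(row)
        for var in variables:
            if filled.get(var) is None:
                filled[var] = MISSING
        imputed.append(filled)
    return imputed


# Hypothetical categorized records.
rows = [
    {"GLU": "high", "RBC": "low"},
    {"GLU": None, "RBC": "normal"},
    {"RBC": "normal"},  # GLU was never measured
]
imputed = impute_extra_category(rows, ["GLU", "RBC"])
```

With this encoding the classifier can learn from the absence of a test itself, and a prediction can still be made for a patient for whom a few of the 18 variables were not ordered.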
Continuous variables with a normal distribution were expressed as the mean ± standard deviation, whereas non-normally distributed variables were presented as the median and interquartile range. These variables were compared by Student's t-test or the Mann-Whitney U-test. Categorical data were compared by Pearson's χ² test or Fisher's exact test, as appropriate. A P value <0.05 was considered statistically significant, unless [...]
Figure 5. Performance of GBDT-TC18 in differential diagnosis of PDAC and CP. a) ROC for GBDT-TC18 and CA19-9 for patients with PDAC versus CP in the training and test cohorts. b) ROC for GBDT-TC18 and CA19-9 for patients with PDAC or CP who were initially diagnosed as PML. c) True positive rates for GBDT-TC18 and CA19-9 for patients with PDAC who were initially diagnosed as PML. d) True positive rates for GBDT-TC18 and CA19-9 for patients with PDAC initially diagnosed as CP. Abbreviations: ROC, receiver operating characteristic curve; AUC, area under the curve; GBDT-TC18, gradient boosting decision tree-based ternary classifier composed of 18 routine laboratory indicators; CA19-9, 37 U mL⁻¹ of carbohydrate antigen 19-9 as a cutoff; PDAC, pancreatic ductal adenocarcinoma; CP, chronic pancreatitis; PML, pancreatic mass lesions.

Supporting Information
Supporting Information is available from the Wiley Online Library or from the author.