Predictive value of a novel Asian lung cancer screening nomogram based on artificial intelligence and epidemiological characteristics

Abstract Background To develop and validate a risk prediction nomogram based on a deep learning convolutional neural network (CNN) model and epidemiological characteristics for lung cancer screening in patients with small pulmonary nodules (SPN). Methods This study included three data sets. First, a CNN model was developed and tested on data set 1. Then, a hybrid prediction model was developed on data set 2 by multivariable binary logistic regression analysis. We combined the CNN model score and the selected epidemiological risk factors, and a risk prediction nomogram was presented. An independent multicenter cohort was used for external validation of the model. The performance of the nomogram was assessed with respect to its calibration and discrimination. Results The final hybrid model included the CNN model score and the screened risk factors: age, gender, smoking status and family history of cancer. The nomogram showed good discrimination and calibration, with an area under the curve (AUC) of 91.6% (95% CI: 89.4%–93.5%); compared with the CNN model alone, the improvement was significant. The nomogram also showed good discrimination and calibration in the multicenter validation cohort, with an AUC of 88.3% (95% CI: 83.1%–92.3%). Conclusions Our study showed that epidemiological characteristics should be considered in lung cancer screening, as they can significantly improve on the performance of the artificial intelligence (AI) model alone. We combined the CNN model score with Asian lung cancer epidemiological characteristics to develop a new nomogram that facilitates accurate, individualized lung cancer screening, especially for Asians.


INTRODUCTION
In 2018, the Global Cancer Statistics Report suggested that there were 18.1 million new cases of cancer and 9.6 million deaths due to cancer globally. Lung cancer has among the highest morbidity (11.6%) and mortality (18.4%) rates, accounting for 1.6 million deaths annually. 1 Eliminating lung cancer remains a serious challenge. For early-stage lung cancer, surgery is an effective treatment: a 5-year survival rate of 75%–100% can be achieved after surgery in patients with stage IA non-small cell lung cancer (NSCLC), compared with only 4%–17% in patients with advanced disease. 2 Therefore, it is crucial to detect and cure the disease in its early stages.
In recent years, evidence from a wide range of sources has indicated that low-dose computed tomography (LDCT) screening can reduce lung cancer mortality. 3 The National Lung Screening Trial (NLST) revealed a significant 20% reduction in lung cancer mortality with LDCT screening in the USA. 4 However, traditional LDCT screening often produces false-positive results: 24% of LDCT screening examinations were positive, and the false-positive rate for baseline screening ranged from 7.9% to 49.3%. 5,6 Moreover, these assessments require labor-intensive work from radiologists. Recently, deep learning-based convolutional neural networks (CNN) have achieved satisfactory results in image recognition, and several CNN models for chest CT image analysis have been proposed for lung nodule detection and classification. 7,8 Nonetheless, unlike conventional diagnostic methods, most artificial intelligence (AI) prediction models consider only image features and ignore epidemiological and clinical characteristics. In the conventional process of lung cancer diagnosis, the epidemiological characteristics of patients are a very important diagnostic basis and must be taken into account. Moreover, the epidemiological characteristics of lung cancer differ significantly between Asians and Europeans or Americans.
In this study, we present a prediction model that is derived from a deep learning CNN algorithm on LDCT findings combined with epidemiological characteristics for lung cancer screening, especially for Asians. Finally, a risk prediction nomogram was developed and validated for lung cancer screening in Asian patients with SPN.

Data sets
We collected one independent data set for CNN model training and testing (data set 1), and a second independent data set (data set 2) together with a multicenter data set (data set 3) for hybrid model training and validation. This study was approved by the institutional review board of The Affiliated Hospital of Qingdao University.
First, we retrospectively collected LDCT image data of lung cancer patients from our institution between January 2014 and August 2018 for data set 1. The inclusion criteria were: (1) the patient underwent a general health examination including LDCT, and the pulmonary nodules were less than 30 mm in diameter on LDCT images; (2) histopathological results were confirmed after thoracic surgical resection, with the postoperative histopathological result as the reference standard; and (3) preoperative LDCT images were available and the thickness of the LDCT images was ≥5 mm. Ultimately, a total of 231 312 LDCT images from 3644 patients were collected for training, tuning, and testing the CNN model. Second, preoperative entire-volume thoracic LDCT images and clinical data of 790 consecutive patients were collected from our institution between September 2018 and December 2019, using the same inclusion criteria, as data set 2 to train and validate the hybrid model.
In addition, each participant needed to complete an epidemiological questionnaire in a follow-up.
Third, data from 210 patients were collected by Shanghai Chest Hospital, Xuanwu Hospital Capital Medical University, Qingdao Municipal Hospital, The Affiliated Hospital of Qingdao University and Qingdao Chengyang District People's Hospital between January 2020 and May 2020, using the same inclusion criteria as the training cohort, and were used for the final assessment of the risk prediction nomogram. All patients were screened by a health examination and were later diagnosed as requiring surgery by an experienced surgeon in accordance with NCCN guidelines (Version 1, 2020), 9 and the pathological results were confirmed after surgery. The details of the three data sets are listed in Tables 1-3.

CNN model development
Five experienced radiologists annotated the LDCT images with LabelImg 1.1 software. The labeled lung images were used to train, tune and test the CNN model.
We constructed a 16-layer feature extraction network and a three-scale (26 × 26, 52 × 52, 104 × 104) detection network based on the framework of the YOLO detection algorithm. 10 Provided that enough feature information is extracted, a shallower feature extraction network improves the detection speed of the algorithm. The three-scale detection network can greatly increase the generalization ability of the detection algorithm across target sizes and improve the recall rate.
The size of the feature map was 26 × 26, and it was output by the feature extraction network with 256 channels. By means of convolution operations and upsampling, the detection network predicted the coordinates (X, Y), the width and height of the target, and the classification reliability. Non-maximum suppression and threshold filtering were used to retain the predicted objects with high scores as the detection results (Figure 1). In the process of forward prediction, each layer of the feature map was convolved with a convolution kernel pixel by pixel to extract pixel information. Batch normalization was then used to make the data conform to a normal distribution. The leaky ReLU activation function was used to activate specific nodes, and the distribution of feature values was fitted by this nonlinear function. After the forward prediction, the error between the predicted value and the real value was calculated with the cross-entropy loss function. To classify the objects better, gradient descent was used to update the parameters of the convolution kernels in each layer of the neural network by back propagation, and the object position information was regressed. The parameters of the trained network were saved and the weight file was generated.
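The post-processing step described above (threshold filtering followed by non-maximum suppression) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the corner-format boxes `(x1, y1, x2, y2)`, the function names, the IoU threshold of 0.5 and the example boxes are all assumptions for illustration; only the score threshold of 0.24 is taken from the Results.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def filter_detections(boxes, scores, score_threshold=0.24, iou_threshold=0.5):
    """Threshold filtering followed by greedy non-maximum suppression.

    Returns the indices of the kept boxes, highest score first.
    The IoU threshold is an assumed value, not reported in the paper.
    """
    # Step 1: drop low-confidence predictions.
    candidates = [i for i, s in enumerate(scores) if s >= score_threshold]
    # Step 2: visit the remaining boxes from highest to lowest score.
    candidates.sort(key=lambda i: scores[i], reverse=True)
    kept = []
    for i in candidates:
        # Keep a box only if it does not overlap too much with a kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in kept):
            kept.append(i)
    return kept


# Hypothetical detections: two overlapping boxes, one separate, one low score.
boxes = [(10, 10, 30, 30), (12, 12, 32, 32), (60, 60, 80, 80), (100, 100, 110, 110)]
scores = [0.90, 0.80, 0.70, 0.10]
kept = filter_detections(boxes, scores)
print(kept)
```

In this toy input the second box overlaps the first too strongly and is suppressed, and the last box falls below the score threshold, so only the first and third boxes remain.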

Risk factor screening
In total, epidemiological characteristics were collected from 790 patients. The patients were divided into a malignancy group and benign group according to the postoperative pathology. Currently, postoperative pathology is the reference standard in clinical diagnosis. Epidemiological questionnaires were collected by an experienced surgeon during the follow-up, including age, gender, race, marital status, smoking status, alcohol consumption, dietary habits, occupational exposure, family history of cancer, nonpulmonary chronic diseases, pre-existing lung disease, dwelling environment exposure and so on.
Passive smoking was defined as inhaling the smoke exhaled by smokers for at least 15 min on more than 1 day per week. The degree of smoking in current and former smokers was measured by the heaviness of smoking index (HSI), 11 with <400 defined as mild smoking and ≥400 as heavy smoking. Alcohol consumption was assessed using the consumption subscale of the Alcohol Use Disorders Identification Test (AUDIT). 12 This three-item subscale assessed participants' frequency and quantity of alcohol use. Each item was scored using a four-point Likert scale with varying endpoints, and the items were summed to create final scores ranging from 0 to 12; a score higher than eight points was defined as positive for alcohol consumption. Dietary habits were divided into healthy and unhealthy diets. A moderate diet rich in fruits and vegetables was defined as a healthy diet. Positive occupational exposure was defined as work associated with mining and quarrying; metal production industries, including smelting and refining; asbestos production; shipbuilding; and construction. Family history of cancer was divided into three categories: none, family history of other cancers, and family history of lung cancer. Pre-existing lung diseases included pneumonia, emphysema, asthma, chronic bronchitis, pulmonary fibrosis, tuberculosis and chronic obstructive pulmonary disease. Positive dwelling environment exposure was defined as living in residential areas located around heavy industrial factories, mining areas, docks or traffic-intensive areas.
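The smoking and alcohol definitions above are simple cutoff rules, which the following Python sketch encodes. The function names are hypothetical, and the per-item range of 0 to 4 is an assumption inferred from the stated total range of 0 to 12 over three items.

```python
def smoking_degree(hsi):
    """Classify smoking degree from the smoking index: <400 mild, >=400 heavy."""
    return "mild" if hsi < 400 else "heavy"


def alcohol_positive(item_scores):
    """Return True when the three-item AUDIT consumption score exceeds 8.

    Assumes each item is scored 0-4, which yields the 0-12 total in the text.
    """
    if len(item_scores) != 3 or any(not 0 <= s <= 4 for s in item_scores):
        raise ValueError("expected three item scores, each between 0 and 4")
    return sum(item_scores) > 8


print(smoking_degree(399), smoking_degree(400))
print(alcohol_positive([3, 3, 3]), alcohol_positive([4, 4, 0]))
```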

Statistical analysis
Statistical analysis was performed with R software (version 3.6.2; http://www.Rproject.org), MedCalc software and SPSS version 25.0 statistical software (IBM SPSS Inc., Armonk, NY, USA). Continuous data were described as means and SD, and categorical variables were described as frequencies and percentages. Continuous variables were compared using the t-test, and categorical variables were compared between the two groups using Pearson's χ² test. All tests were two-tailed, and statistical significance was set at p < 0.05.
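As an illustration of the categorical comparison, the sketch below computes Pearson's χ² statistic (without continuity correction) for a 2 × 2 table in plain Python. The study itself used R, MedCalc and SPSS; the function name and the counts are hypothetical. The p-value uses the fact that a χ² variable with 1 degree of freedom is the square of a standard normal variable.

```python
import math


def pearson_chi2_2x2(table):
    """Pearson's chi-square test for a 2x2 table of observed counts.

    Returns (chi2, p_value). Assumes every row and column total is nonzero.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row = (a + b, c + d)
    col = (a + c, b + d)
    chi2 = 0.0
    for i, observed_row in enumerate(table):
        for j, observed in enumerate(observed_row):
            expected = row[i] * col[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # For 1 degree of freedom: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts: rows = exposure (yes/no), columns = malignant/benign.
table = [[30, 10], [20, 40]]
chi2, p_value = pearson_chi2_2x2(table)
print(round(chi2, 3), p_value < 0.05)
```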

Development of a lung cancer risk prediction nomogram
Predictors with p < 0.05 in univariate logistic regression analysis were combined with the CNN model score and included in a multivariate logistic regression model. The odds ratios (ORs), probability and 95% confidence intervals (CIs) were estimated for each selected risk factor. The probability score was used to draw the receiver operating characteristic (ROC) curve to assess the sensitivity and specificity of the risk prediction model. The definition of test positivity cutoffs was exploratory. To make the model more convenient for clinical application, we further built a risk prediction nomogram based on the multivariable logistic analysis.
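How a fitted logistic regression model turns predictors into a probability, and a coefficient into an odds ratio, can be sketched as follows. The intercept and coefficients below are hypothetical placeholders chosen for illustration; they are not the fitted values of the hybrid model.

```python
import math


def logistic_probability(intercept, coefficients, values):
    """Predicted probability from a logistic regression linear predictor."""
    z = intercept + sum(coefficients[name] * values[name] for name in coefficients)
    return 1.0 / (1.0 + math.exp(-z))


def odds_ratio(beta):
    """Odds ratio for a one-unit increase in a predictor with coefficient beta."""
    return math.exp(beta)


# Hypothetical model: log-odds = -2.0 + 0.04 * age + 0.05 * cnn_score.
coefficients = {"age": 0.04, "cnn_score": 0.05}
patient = {"age": 60, "cnn_score": 50}
probability = logistic_probability(-2.0, coefficients, patient)
print(round(probability, 4), round(odds_ratio(coefficients["age"]), 3))
```

A nomogram is a graphical rescaling of the same linear predictor: each predictor's contribution is mapped to points, and the total points are mapped back to a probability.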

Performance of the nomogram in the training cohort
The accuracy of the prediction model was quantified with the area under the ROC curve (AUC). The statistical significance of the improvement in AUC after adding the risk factors was assessed with DeLong's test. 13 Calibration curves were plotted to assess the calibration of the risk prediction nomogram. Bootstrapping with 1000 resamples was applied in these analyses.
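The AUC and a bootstrap interval around it can be sketched in plain Python as below. This is an illustrative percentile bootstrap under assumed settings (function names, seed, toy data), not necessarily the procedure used to obtain the confidence intervals reported in this study. The AUC is computed as the proportion of malignant/benign pairs in which the malignant case receives the higher score, with ties counted as one half.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a positive case outscores a negative case."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC."""
    auc(labels, scores)  # fail early if one class is missing
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_resamples:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples containing a single class
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Toy data: 1 = malignant, 0 = benign; scores are predicted probabilities.
labels = [0, 0, 1, 0, 1, 1, 0, 1]
scores = [0.1, 0.3, 0.35, 0.4, 0.6, 0.7, 0.2, 0.9]
point_estimate = auc(labels, scores)
lower, upper = bootstrap_auc_ci(labels, scores)
print(point_estimate)
```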

Validation of the prediction nomogram
Internal validation. We performed the internal validation using the training data set. Independent validation. The performance of the internally validated nomogram was examined in the multicenter validation cohort. The logistic regression model trained in the training cohort was applied to all patients in the multicenter validation cohort, and the total score of each patient was calculated. Finally, the ROC curve and calibration curve were derived on the basis of the regression analysis.
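A calibration curve compares predicted risk with the observed proportion of malignancy. The sketch below uses simple equal-width binning as an assumption for illustration; the calibration curves in this study were produced with statistical software and may have used a different (for example, bootstrap-corrected) method. The function name and data are hypothetical.

```python
def calibration_points(labels, probabilities, n_bins=5):
    """Return (mean predicted risk, observed event rate) for each non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(labels, probabilities):
        k = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes to the last bin
        bins[k].append((p, y))
    points = []
    for members in bins:
        if members:
            mean_predicted = sum(p for p, _ in members) / len(members)
            observed_rate = sum(y for _, y in members) / len(members)
            points.append((mean_predicted, observed_rate))
    return points


# Hypothetical validation cohort: predicted probabilities and pathology labels.
probabilities = [0.10, 0.15, 0.50, 0.55, 0.90, 0.95]
labels = [0, 0, 1, 0, 1, 1]
points = calibration_points(labels, probabilities)
for mean_predicted, observed_rate in points:
    print(round(mean_predicted, 3), observed_rate)
```

Points lying close to the diagonal (predicted equals observed) indicate good calibration.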

Patient characteristics
Details of the CNN model training data and the hybrid model training data are summarized in Table 1. In the CNN model development data set, 58.5% of patients were females and the average age was 61.042 ± 9.276 years. Tumor pathological results revealed that 83.2% of the cases were adenocarcinoma, 8.6% were squamous cell carcinoma, and 8.2% were other types of pulmonary tumors.

CNN model training and test
In total, 231 312 entire-volume LDCT images were retrospectively collected to train, tune and test the CNN model. Seventy percent of the images were randomly assigned to the training set, and 30% were assigned to the tuning and testing set. When the threshold was set to 0.24, the precision of the tuned CNN model reached 95% and the recall reached 92%. The trained CNN model achieved a mean average precision (mAP) of 89.95%.
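For reference, precision and recall at a given detection threshold follow directly from the counts of true positives, false positives and false negatives. The counts in this sketch are hypothetical and are not the study's data.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    return precision, recall


# Hypothetical counts at one score threshold.
precision, recall = precision_recall(true_positives=90, false_positives=10, false_negatives=30)
print(precision, recall)
```

Raising the score threshold usually trades recall for precision, which is why both are reported at a stated threshold.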

Patient characteristics
Patient characteristics in the training cohort are given in Table 2. A total of 756 patients were retrospectively enrolled; 34 patients were excluded because of missing LDCT images or follow-up data. Overall, 63.3% of patients had malignancies, and the average age was 60.66 ± 9.78 years. There were significant differences in age, gender, family history of cancer and smoking status between the malignancy and benign groups (p < 0.001). Details of the multicenter validation cohort data are summarized in Table 3. The preoperative LDCT images and epidemiological data of 210 patients were collected from five medical centers. There were 158 malignant nodules and 52 benign nodules. There was no significant difference in the number of pathological classifications (p = 0.623). There were significant differences in the CNN model score, smoking status, and family history of cancer between the two groups (p < 0.05).

Risk factor screening
Epidemiological characteristics showed an association with postoperative pathological status. In univariate logistic regression analysis, compared with patients with benign pulmonary nodules, patients with malignant pulmonary nodules were more likely to be older, to be female, to have tobacco exposure and to have a positive family history of cancer. Furthermore, multivariate logistic analysis identified age (p < 0.001), gender (p < 0.001), family history of cancer (p = 0.001), smoking status (p < 0.001) and CNN model score (p < 0.001) as independent risk factors for lung cancer (Table 4).

Development of a prediction model
The hybrid model that incorporated the independent epidemiological predictors and the CNN model score was established and presented as a nomogram (Figure 2).

Comparison of the CNN and hybrid models
As shown in Figure 3, the CNN model alone had an AUC of 90.7%; with the addition of the epidemiological predictors, the AUC was significantly improved to 91.6% (p = 0.00963, DeLong's test). This suggests an important role of epidemiological risk factors in the prediction of lung cancer.

DISCUSSION
We developed and validated a nomogram for lung cancer screening based on the CNN deep learning algorithm and epidemiological characteristics. The nomogram incorporates five items: CNN model score, age, gender, smoking status and family history of cancer. The CNN model successfully classified patients according to their LDCT image features. Incorporating the CNN model score and epidemiological risk factors into a user-friendly nomogram assists lung cancer screening. Previous studies showed that CNN deep learning algorithms could be applied in several diagnostic areas, such as gastric cancer, liver tumor, and skin cancer, and achieved remarkable success. [14][15][16] In lung cancer, a few studies have previously examined pulmonary nodule detection and classification by CNN. The first report on the application of a deep learning model to nodule classification came from Hua et al. 17 Encouraging results were revealed for both the deep belief network and CNN models for pulmonary nodule classification. Ardila et al. 18 proposed a three-dimensional deep learning CNN model that uses a patient's current and prior CT images to predict the risk of lung cancer. Their model allowed end-to-end lung cancer screening and achieved a satisfactory result: an AUC of 94.4%. Nibali et al. 19 used a deep residual CNN model to detect pulmonary nodules and achieved satisfactory performance (sensitivity: 91.7%, specificity: 88.6%). Most of the above models were mainly applied to the classification of high-resolution CT images with a thickness of 0.625-1 mm. However, low-dose spiral CT with a layer thickness of 5 mm is commonly used in lung cancer screening. In this study, we specifically established a CNN detection model for LDCT images with a thickness of 5 mm.
FIGURE 2 Developed lung cancer prediction nomogram. Smoking, smoking status; FHC, family history of cancer; CNNS, CNN model score. The prediction nomogram was developed in the training cohort, with age, gender, smoking status, family history of cancer and the CNN model score incorporated.
In contrast to previous deep learning models used for pulmonary nodule detection, our CNN model was trained and verified with the YOLO detection network. YOLO addresses target detection as a regression problem. The YOLO network borrows from the classification network structure of GoogLeNet. The difference is that YOLO does not use the inception module but instead uses a simple replacement: a 1 × 1 convolutional layer followed by a 3 × 3 convolutional layer. YOLO sees the information of the whole image during training and inference, so its background false detection rate is low. Tests showed that YOLO's false detection rate for background images was less than half that of Fast RCNN. The source code of YOLO is based on the Darknet framework, which depends on few third-party libraries and is easily ported to other platforms such as Windows or embedded devices. Based on these advantages, our CNN model is efficient, low-cost, suitable for population screening in various regions, and easy to promote.
In our research, the application of the deep learning CNN model to lung cancer screening was improved by using a large number of LDCT images with matched, pathologically confirmed annotations. We achieved an AUC of 90.7% with the CNN model. However, the lack of epidemiological and clinical data has hindered the development of the CNN model, rendering it incapable of comprehensive consideration. On the basis of the CNN model, we further constructed a hybrid model that incorporated other risk factors.
The occurrence of lung cancer is multifactorial, [20][21][22] and screening for lung cancer requires comprehensive consideration. In particular, the epidemiological characteristics of Asians differ from those of Europeans and Americans. 23,24 In our study, an epidemiological questionnaire was designed for lung cancer risk factor screening according to the epidemiological characteristics of Asians, including genetic factors, behavioral factors, environmental factors and so on. Finally, four independent risk factors were identified by univariate and multivariate analysis.
Figure caption (calibration curve): The diagonal blue dotted line represents the consistency between the actual risk and the predicted risk for lung cancer. The amaranth solid line reveals the accuracy of prediction of our nomogram; a closer fit to the diagonal dotted line indicates that the prediction is more accurate.
Smoking is one of the main risk factors for lung cancer, and there is a direct correlation between the amount of smoking and the risk of lung cancer. 25,26 We used the smoking index and secondhand smoke exposure as the subcriteria for smoking-related risk factors. Smoking was also the most important epidemiological predictor screened in our risk model, with ORs of 4.031, 5.086 and 6.799 for the three smoking exposure statuses, respectively. Our study shows that age is an independent factor in the development of lung cancer, with an OR of 1.039. In Cao's study, 27 lung cancer was deemed a senescent disease in some ways, with an increased risk of DNA damage due to the constant shortening of telomeres during repeated cell replication cycles. In addition, air pollution is a significant risk factor for lung cancer. 28 Exposure to industrial exhaust, car exhaust, kitchen smoke, or formaldehyde from decoration materials increases the risk of lung cancer with age. In our research, females were more likely to have lung cancer. This result may be due to the numerical preponderance of Asian female adenocarcinoma patients in our data, and a similar pattern has been reported in studies from both the Americas and Asia. 29,30 In our study, we also found that a family history of cancer was associated with an increased incidence of lung cancer. Various studies have indicated that patients who have a positive family history of cancer have a significantly higher risk of developing the disease. Matakidou et al. 31 showed that smokers with a family history of cancer had a two-fold increased risk of lung cancer. A positive family history of lung cancer was associated with a 1.5-fold increase in lung cancer risk among nonsmoking families. 32 We found that the ORs for a family history of other cancers and of lung cancer were 8.703 and 11.378, respectively. This suggests that genetic factors play an important role in the development of lung cancer.
In addition to these factors, dietary habits, 33 occupational exposure, 34 pre-existing lung disease 35 and so on have also been reported to be associated with the occurrence of lung cancer, but no significant correlation was found in our multivariate analysis. The reasons for these differences remain to be explained and may provide new insights into the causes of lung cancer. Based on the above findings, we attempted to build a prediction model that incorporated the CNN model score and epidemiological characteristics.
Nomograms are graphical tools that use algorithms or mathematical formulae to estimate the probability of an outcome and optimize the prediction accuracy for each patient. 36 To make our research easier to apply in clinical practice, we further constructed a nomogram that incorporated the CNN model score and the four risk factors of age, gender, smoking status and family history of cancer. It showed good accuracy and discrimination in predicting the risk of lung cancer, with an AUC of 91.6%. With this model, clinicians could more precisely assess the risk of lung cancer in the screening population and formulate more precise management measures. For example, consider a 40-year-old male heavy smoker undergoing LDCT screening, with a positive family history of other cancer and a CNN model score of 80 points. Our nomogram calculations are as follows: age = 40, which corresponds to 11 points; smoking status = high risk, which corresponds to 29 points; family history of cancer = positive family history of other cancers, which corresponds to 27 points; CNN model score = 80%, which corresponds to 80 points. This equals 147 total points, corresponding to a lung cancer probability of 96%. The Youden index of the model was 71.91, and this patient was classified as positive. To our knowledge, this is the first nomogram to combine AI LDCT detection and epidemiological risk factors for lung cancer screening. Our nomogram can conveniently and accurately screen for lung cancer with improved efficiency, low cost, a simple procedure, and high scalability.
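The worked example above is a simple sum of nomogram points, reproduced in the Python sketch below. The point values are those stated in the text. The text lists no points for male gender, so a contribution of 0 is assumed here, which is consistent with the stated total. The final step from total points to probability is read off the nomogram axis and is not computed by this sketch.

```python
# Points read from the nomogram for the example patient (values from the text).
points = {
    "age = 40": 11,
    "gender = male": 0,  # assumption: not listed in the text, implied by the total
    "smoking status = high risk": 29,
    "family history of other cancers": 27,
    "CNN model score = 80%": 80,
}

total_points = sum(points.values())
for item, value in points.items():
    print(f"{item}: {value} points")
print("total points:", total_points)
```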
Nevertheless, our findings have several limitations. First, several potential biases were inevitable due to the retrospective design of the study. Second, in the screening of epidemiological characteristics, we cannot be certain that the model contains all the relevant epidemiological characteristics. Some potential risk factors for lung cancer might not be included in our study; for example, other environmental pollutants 37,38 and the level of education 39 could not be examined for confounding effects. Finally, as far as epidemiological studies are concerned, the sample size of this study is relatively small. Therefore, we should further increase the sample size and include more lung cancer risk factors to improve the performance of the model.
In conclusion, our study showed that epidemiological characteristics must be considered in lung cancer screening for Asians, as they can significantly improve on the performance of AI image recognition alone. We combined the CNN model score with epidemiological characteristics to construct a new nomogram that facilitates accurate, individualized lung cancer screening, especially for Asians.