A strategy to reduce the false‐positive rate after low‐dose computed tomography in lung cancer screening: A multicenter prospective cohort study

Abstract Background The ability of lung cancer screening to manage pulmonary nodules is limited by the high false‐positive rate of the current mainstream screening method, low‐dose computed tomography (LDCT). We aimed to reduce overdiagnosis in the Chinese population. Methods Lung cancer risk prediction models were constructed using data from a population‐based cohort in China. Independent clinical data from two programs conducted in Beijing and Shandong were used as the external validation set. Multivariable logistic regression models were used to estimate the probability of lung cancer in the whole population and separately in smokers and nonsmokers. Results In our cohort, 1,016,740 participants were enrolled between 2013 and 2018. Of the 79,581 who received LDCT screening, 5165 participants with suspected pulmonary nodules were allocated to the training set, of whom 149 were diagnosed with lung cancer. In the validation set, 1815 patients were included, and 800 developed lung cancer. Patient age and radiologic features of the nodules (calcification, density, mean diameter, edge, and pleural involvement) were included in our model. The area under the curve (AUC) values of the model were 0.868 (95% CI: 0.839–0.894) in the training set and 0.751 (95% CI: 0.727–0.774) in the validation set. The sensitivity and specificity were 70.5% and 70.9%, respectively, which could reduce the false‐positive rate by 68.8% in simulated LDCT screening. There was no substantial difference between the prediction models for smokers and nonsmokers. Conclusion Our models could facilitate the diagnosis of suspected pulmonary nodules, effectively reducing the false‐positive rate of LDCT for lung cancer screening.

KEYWORDS
early diagnosis and early treatment, lung cancer, prediction model, pulmonary nodules, screening

| BACKGROUND
Lung cancer is one of the most common and deadliest malignant tumors worldwide and in China. 1,2 In 2020, there were 815,563 new lung cancer cases and 714,699 lung cancer deaths in China. 2 The survival rate is close to zero for stage IV lung cancer but up to 80% for stage I lung cancer. 3 Therefore, identifying lung cancer at an early stage is essential to improve prognosis.
Screening with low-dose computed tomography (LDCT) is recognized as an effective way to reduce the lung cancer disease burden and has been shown to decrease lung cancer mortality by 20%-24%. 4,5 Our team also assessed the effect of lung cancer screening in the Chinese population and showed that access to a one-off LDCT scan could reduce lung cancer mortality in the target population by 31%. 6 However, the false-positive rate (FPR) of LDCT in lung cancer screening can be as high as 96.4%, 4 and patients with benign nodules who undergo unnecessary diagnostic procedures face radiation exposure, medical expenses, and physical and psychological burdens. [7][8][9] Therefore, decreasing the FPR of LDCT screening is crucial.
Lung cancer often first manifests as pulmonary nodules, and the imaging characteristics of these nodules often directly reflect the underlying lesions. By combining patient information, such as epidemiological factors, with the imaging characteristics of pulmonary nodules, prediction models can estimate the lung cancer risk of patients with positive LDCT results. International guidelines recommend prediction models to reduce the FPR of screening. 10,11 However, our previous systematic review suggested that no representative model has been constructed using data from large-sample, multisource, prospective cohorts in China. 12 At present, models for the Chinese population are mostly based on small-sample, single-center retrospective studies, which limits their generalizability; [13][14][15][16][17] widely used existing models are generally based on European and American populations [18][19][20][21][22][23] and may not be appropriate for screening Asian and Chinese populations, which have unique demographic characteristics. 12 To fill this gap, we established a progressive risk prediction model based on the imaging characteristics of pulmonary nodules. The model was constructed using data from the National Lung Cancer Screening (NLCS), a large multicenter prospective cancer screening program. An independent external validation set was created using data from the HMCC (Hospital-based Medical Checkup Cohort), a cancer screening program composed of two hospital-based data sources.
In addition, most existing models are based on the whole population or only on heavy smokers, and few are based on nonsmoking populations. 12 Lung cancer in ever-smokers and nonsmokers shows genetically and statistically different characteristics, 24,25 but models built separately for these two subgroups remain insufficient, 12 so relevant discussion and evidence are lacking. Therefore, we also developed separate models for ever-smokers and nonsmokers.

| Study population
The training set was built on data from the NLCS, a cancer screening program initiated in October 2012 with the aim of preventing cancer in urban areas. Our study used data from NLCS participants enrolled from 2013 to 2018. Data from January 2013 to June 2021 were collected from participants in 12 cities across eight provinces (Beijing, Liaoning, Henan, Hunan, Zhejiang, Anhui, Jiangsu, and Guangxi) that (1) had cancer registration data; (2) had complete vital statistics data (age, sex, etc.); and (3) had low migration rates and relatively stable populations. Eligible participants who provided written informed consent completed a baseline questionnaire on their exposure to risk factors, and their lung cancer risk was evaluated using the NLCS scoring system introduced in our previous research. 6 Participants assessed as high-risk were invited to undergo LDCT screening.
Participants were excluded if they met one or more of the following criteria, as confirmed by the registered residence system of the local community: (1) lung cancer symptoms or a cancer diagnosis prior to cohort entry; (2) age outside the range of 40-74 years; (3) death prior to cohort entry; or (4) invalid data. In the high-risk group, invalid data referred to participants who did not meet the high-risk criteria but were classified as high-risk. In the low-risk group, invalid data referred to participants who (i) were high-risk but misclassified as low-risk or (ii) were classified as low-risk but underwent the free LDCT scan. More details were published in our previous research. 6 Data from individuals with positive nodules detected by LDCT were included and used for model construction. According to the NLCS protocol, positive nodules were defined as (1) solid or part-solid nodules larger than 5 mm in diameter; (2) nonsolid nodules larger than 8 mm in diameter; or (3) nodules suspected to be positive on the basis of imaging findings. For individuals with multiple positive nodules, only the most representative nodule was selected for analysis, using the following criteria: (1) noncalcified nodules were preferred; (2) if two nodules had the same calcification status, the one with the larger mean diameter was chosen.
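The positivity rule and the nodule-selection criteria above can be sketched in a few lines. This is a minimal illustration, not the NLCS implementation: the `Nodule` fields, the function names, and the `suspicious_on_imaging` flag (standing in for criterion 3) are hypothetical, and the sketch assumes that the size thresholds apply to the mean diameter and that "larger than" means strictly greater.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Nodule:
    density: str                 # "solid", "part-solid", or "nonsolid"
    mean_diameter_mm: float      # assumption: size thresholds use the mean diameter
    calcified: bool
    suspicious_on_imaging: bool = False  # hypothetical flag for criterion (3)


def is_positive(n: Nodule) -> bool:
    """Positive-nodule rule as described in the text (strict '>' assumed)."""
    if n.density in ("solid", "part-solid") and n.mean_diameter_mm > 5:
        return True
    if n.density == "nonsolid" and n.mean_diameter_mm > 8:
        return True
    return n.suspicious_on_imaging


def most_representative(nodules: List[Nodule]) -> Optional[Nodule]:
    """Pick one positive nodule per person: noncalcified first, then largest mean diameter."""
    positives = [n for n in nodules if is_positive(n)]
    if not positives:
        return None
    # False sorts before True, so noncalcified nodules win; ties go to the larger diameter.
    return min(positives, key=lambda n: (n.calcified, -n.mean_diameter_mm))


if __name__ == "__main__":
    scan = [
        Nodule("solid", 7.0, calcified=True),
        Nodule("nonsolid", 9.5, calcified=False),
        Nodule("solid", 4.0, calcified=False),  # below the 5 mm threshold
    ]
    print(most_representative(scan))
```

Here the 9.5 mm nonsolid nodule is chosen: both it and the 7 mm solid nodule are positive, but only the former is noncalcified.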
The validation set was created using data from the HMCC, a hospital-based physical examination program. Patients who underwent LDCT screening at the Cancer Hospital, Chinese Academy of Medical Sciences, Beijing, China, from January 2017 to December 2017 and at the Second People's Hospital of Liaocheng, Shandong, China, from February 2007 to January 2022 were identified. Participants of any age with at least one suspicious pulmonary nodule were included in the study.
Our study was approved by the ethics committee of the China National Cancer Center/Cancer Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College. All participants provided written informed consent. Ethical approval was obtained for all data collection.

| Outcome ascertainment
The outcome of this study was the incidence of lung cancer within half a year (6 months) after LDCT screening. International Classification of Diseases, 10th Revision (ICD-10) codes were used for data management, and lung cancer was coded as C34.
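The outcome definition can be expressed as a small labeling function. This is a sketch under stated assumptions: "within half a year" is operationalized here as at most 183 days after the scan, diagnoses dated before the scan are not counted, and the function and argument names are hypothetical rather than taken from the registry system.

```python
from datetime import date, timedelta
from typing import Optional

# Assumption: "within half a year" is operationalized as at most 183 days after the scan.
FOLLOW_UP_WINDOW = timedelta(days=183)


def lung_cancer_outcome(scan_date: date,
                        icd10_code: Optional[str],
                        diagnosis_date: Optional[date]) -> int:
    """Return 1 if an ICD-10 C34 diagnosis falls inside the follow-up window, else 0."""
    if icd10_code is None or diagnosis_date is None:
        return 0  # no registered diagnosis
    is_lung_cancer = icd10_code.strip().upper().startswith("C34")
    in_window = scan_date <= diagnosis_date <= scan_date + FOLLOW_UP_WINDOW
    return int(is_lung_cancer and in_window)


if __name__ == "__main__":
    scan = date(2015, 3, 1)
    print(lung_cancer_outcome(scan, "C34.1", date(2015, 6, 15)))  # inside the window
    print(lung_cancer_outcome(scan, "C34.1", date(2016, 1, 10)))  # too late
    print(lung_cancer_outcome(scan, "C18.7", date(2015, 4, 2)))   # not lung cancer
```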
For the training and validation sets, the following information was recorded by the tumor registration system: whether lung cancer was diagnosed, the time of diagnosis, the pathological type, and the clinical stage. All lung cancer diagnoses were based on pathological results and clinical manifestations and were issued by professional clinicians; they were cross-referenced among the cancer registry system, local medical insurance databases, and hospital information systems and reviewed by professional clinicians from the Cancer Hospital, Chinese Academy of Medical Sciences, and provincial hospitals. To ensure the consistency of outcomes, about 1% of all original images were reviewed by professional clinicians. When diagnostic discrepancies were observed, the records were manually reviewed by at least one thoracic surgeon, one radiologist, and one pathologist from the clinical expert committee. 6

| Data collection
Demographic factors and clinical characteristics were collected in our screening program using paper-based and computer-based forms (epidemiological questionnaire, LDCT report, follow-up information, pathology report, etc.). Each participant had an identification code for management and traceability. Nodules were imaged with multirow spiral CT (64-row scanners, with a minimum of 16 rows); the long and short diameters were measured at the largest section of the nodule with an electronic measuring tool (built into the workstation or the Picture Archiving and Communication System [PACS]). Reports were issued by senior radiologists with at least 3 years of experience. All data were stored and analyzed in the National Cancer Prevention and Control Network (NCPCN) at the National Cancer Center (NCC) of China.
The following information was recorded: age, gender, body mass index (BMI), education level (low, primary school or below; medium, junior high school to high school; high, college or above), occupational exposure (harmful working conditions such as asbestos or dust exposure; yes or no), smoking pack-years (one pack-year was defined as 20 cigarettes smoked per day for 1 year), emphysema (yes or no), history of chronic respiratory disease (yes or no), and family history of lung cancer (yes or no). In addition, the following nodule imaging information was collected: maximum diameter (the longest diameter of the largest section of the nodule), minimum diameter (the longest diameter perpendicular to the maximum diameter), mean diameter (the mean of the maximum and minimum diameters), density (solid, part-solid, or nonsolid), edge (smooth or spiculated), location in the upper lobe of the lung (yes or no), shape (elliptical if the ratio of maximum to minimum diameter was ≥1.8, round if it was <1.8), pleural involvement (yes or no), and calcification (yes or no). For the validation set, only the variables eventually incorporated into the models were collected.
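The derived variables defined above (pack-years, mean diameter, and shape) follow directly from their definitions. The helper names below are hypothetical, and rejecting a minimum diameter larger than the maximum is an added sanity check, not part of the protocol.

```python
from typing import Tuple


def pack_years(cigarettes_per_day: float, years_smoked: float) -> float:
    """One pack-year = 20 cigarettes per day for 1 year."""
    return cigarettes_per_day / 20.0 * years_smoked


def nodule_geometry(max_diameter_mm: float, min_diameter_mm: float) -> Tuple[float, str]:
    """Return (mean diameter, shape) using the definitions in the text."""
    if min_diameter_mm <= 0 or min_diameter_mm > max_diameter_mm:
        raise ValueError("expected 0 < minimum diameter <= maximum diameter")
    mean_diameter = (max_diameter_mm + min_diameter_mm) / 2.0
    ratio = max_diameter_mm / min_diameter_mm
    shape = "elliptical" if ratio >= 1.8 else "round"
    return mean_diameter, shape


if __name__ == "__main__":
    print(pack_years(30, 20))          # 30.0
    print(nodule_geometry(12.0, 6.0))  # (9.0, 'elliptical')
    print(nodule_geometry(8.0, 6.0))   # (7.0, 'round')
```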

| Statistical analysis
The statistical analysis was performed with R software, version 4.0.3. Student's t-test was used to compare quantitative variables, and analysis of variance (ANOVA) or chi-squared (χ²) tests were used to assess differences between participants who developed incident lung cancer and those who remained cancer-free. Standardized mean differences (SMDs) were also used to evaluate differences between groups. Multivariable logistic regression models were developed, and stepwise regression was used to select the best-fitting model according to the Akaike information criterion (AIC). Nonlinear effects were estimated using restricted cubic splines. The beta coefficients, odds ratios (ORs), and 95% confidence intervals (95% CIs) from the fitted models were used to quantify the associations between covariates and lung cancer risk. The area under the receiver operating characteristic curve (AUC) was used to assess model discrimination. Calibration was assessed by plotting the predicted probability of malignancy against the observed outcomes across deciles of predicted risk. Bootstrap resampling with 1000 replicates was used for internal validation. Existing models based on similar factors were compared with our whole-population model in the validation set. A simulated validation dataset was constructed by bootstrap resampling (1000 times) of the original validation set; the resampled data had an incidence rate consistent with that of the training set and were used to simulate discrimination and calibration in real screening programs.
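The bootstrap steps described above can be illustrated with a small, dependency-free sketch. The text does not specify every detail, so this is an illustration under assumptions rather than the authors' R code: the AUC is the rank-based (Mann-Whitney) statistic, the CI is a simple percentile bootstrap (the authors may have used a different bootstrap scheme, such as optimism correction), and the simulated validation set is drawn by sampling cases and controls separately so that the case fraction equals a target incidence such as the training set's 2.9%. All function names are hypothetical.

```python
import random
from typing import List, Sequence, Tuple


def auc(y: Sequence[int], p: Sequence[float]) -> float:
    """Rank-based AUC: Mann-Whitney U / (n1 * n0), with tied scores given average ranks."""
    n1 = sum(y)
    n0 = len(y) - n1
    if n1 == 0 or n0 == 0:
        raise ValueError("need at least one case and one control")
    order = sorted(range(len(p)), key=lambda i: p[i])
    ranks = [0.0] * len(p)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and p[order[j + 1]] == p[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # 1-based average rank of the tie block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    rank_sum_cases = sum(r for r, yi in zip(ranks, y) if yi == 1)
    return (rank_sum_cases - n1 * (n1 + 1) / 2.0) / (n1 * n0)


def bootstrap_auc_ci(y: Sequence[int], p: Sequence[float], n_boot: int = 1000,
                     alpha: float = 0.05, seed: int = 0) -> Tuple[float, float]:
    """Percentile bootstrap CI for the AUC; resamples lacking a case or a control are redrawn."""
    rng = random.Random(seed)
    n = len(y)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        yb = [y[i] for i in idx]
        if 0 < sum(yb) < n:
            stats.append(auc(yb, [p[i] for i in idx]))
    stats.sort()
    lo = stats[int(alpha / 2 * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


def resample_to_incidence(y: Sequence[int], target_incidence: float,
                          n_total: int, seed: int = 0) -> List[int]:
    """Indices of a bootstrap sample whose case fraction equals target_incidence (rounded)."""
    rng = random.Random(seed)
    cases = [i for i, yi in enumerate(y) if yi == 1]
    controls = [i for i, yi in enumerate(y) if yi == 0]
    n_cases = round(target_incidence * n_total)
    return ([rng.choice(cases) for _ in range(n_cases)]
            + [rng.choice(controls) for _ in range(n_total - n_cases)])
```

Sampling cases and controls separately is one simple way to hit a target incidence exactly; it keeps each group's score distribution while changing only the case mix, which is what drives the difference in FPR between the validation set (44.1% incidence) and a screening population.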

| Baseline characteristics of the study population
In the NLCS cohort, 1,016,740 participants received risk assessment, of whom 223,302 participants were assessed as high-risk; 79,581 high-risk participants underwent LDCT screening, of whom 5165 with at least one suspicious pulmonary nodule were allocated to the training set. The validation set comprised 1815 patients from the Cancer Hospital, Chinese Academy of Medical Sciences, Beijing, and the Second People's Hospital in Liaocheng, Shandong. Figure 1 provides the details of the study profile.
In the training set, 149 (2.9%) participants were diagnosed with lung cancer within half a year, and the FPR was 97.1%. At baseline, the mean age of the training set participants was 58.26 ± 7.66 years. In the validation set, 800 (44.1%) participants were diagnosed with lung cancer within half a year, and the FPR was 55.9%; on the date of LDCT screening, the mean age of the participants was 52.21 ± 11.09 years. Nodules in older and underweight participants had a higher proportion of malignancy. Noncalcified nodules, larger diameters, a higher proportion of nonsolid density, spiculated edges, upper-lobe location, and pleural involvement were also associated with a higher risk of lung cancer in the training set (p < 0.05, Table 1).
The baseline characteristics of ever-smokers and never-smokers are displayed in Tables S1 and S2. A total of 3385 (65.5%) ever-smokers and 1780 (34.5%) never-smokers were used for model construction in the subgroups. The incidence of lung cancer was 2.9% in both ever-smokers (97/3385) and never-smokers (52/1780), corresponding to an FPR of 97.1%. Data from 521 (28.9%) ever-smokers and 1282 (71.1%) never-smokers were used for validation.
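The FPR reported throughout is the share of screen-positive participants (those with suspicious nodules) who did not develop lung cancer, i.e., 1 minus the positive predictive value (a quantity sometimes called the false discovery rate). The short check below, using a hypothetical helper name, reproduces the incidence and FPR figures above from the raw counts.

```python
def screen_positive_fpr(n_screen_positive: int, n_cancer: int) -> float:
    """Share of screen-positive participants who did not develop lung cancer (1 - PPV)."""
    return (n_screen_positive - n_cancer) / n_screen_positive


if __name__ == "__main__":
    groups = {
        "training": (5165, 149),
        "validation": (1815, 800),
        "ever-smokers (training)": (3385, 97),
        "never-smokers (training)": (1780, 52),
    }
    for name, (n_pos, n_cases) in groups.items():
        incidence = n_cases / n_pos
        fpr = screen_positive_fpr(n_pos, n_cases)
        print(f"{name}: incidence {incidence:.1%}, FPR {fpr:.1%}")
```

Running it gives 2.9% incidence and 97.1% FPR for the training set and both smoking subgroups, and 44.1% and 55.9% for the validation set.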

| Model development and evaluation
Six covariates (participant age and the mean diameter, calcification, pleural involvement, edge, and density of the nodules) and their interaction effects associated with lung cancer were entered into the prediction model through multivariable logistic regression. Interaction effects were observed between age and density (part-solid × age vs. solid × age: OR = 0.902, 95% CI: 0.821-0.990; nonsolid × age vs. solid × age: OR = 0.915, 95% CI: 0.820-1.020), age and mean diameter (OR = 1.003, 95% CI: 1.000-1.006), and edge and mean diameter (OR = 1.081, 95% CI: 1.043-1.121); these were entered into the model (Figure 2). Restricted cubic spline analysis showed that the association between age and incidence was nonlinear; in patients over 60 years old, lung cancer risk changed little with increasing age (Figure S1). Thus, in our models, the age-related risk of participants over 60 years old was held constant.

FIGURE 1 Study profile. *Invalid data included participants (i) who did not meet the high-risk criteria but were classified as high-risk individuals; (ii) who were actually high-risk but were misclassified as low-risk; or (iii) who were low-risk but underwent the free LDCT scan.
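To make the covariate structure of the model development concrete, here is a sketch of how one participant's predictor row could be assembled: age centered at 60 and capped so that everyone older than 60 contributes the same age term, plus the three interaction families listed above. The fitted coefficients are not listed in this section, so `coefs` and `intercept` must be supplied by the user; the term names, the reference levels (solid density, smooth edge), and the omission of any spline terms are assumptions for illustration. Each reported OR corresponds to exp(beta) for its term.

```python
import math
from typing import Dict


def age_term(age_years: float) -> float:
    """Centered age, (Age - 60), with everyone older than 60 treated as 60."""
    return min(age_years, 60.0) - 60.0


def design_row(age_years: float, mean_diameter_mm: float, density: str,
               spiculated: bool, calcified: bool, pleural_involvement: bool) -> Dict[str, float]:
    """Main effects plus the interaction terms named in the text (hypothetical term names)."""
    if density not in ("solid", "part-solid", "nonsolid"):
        raise ValueError(f"unknown density: {density!r}")
    a = age_term(age_years)
    part_solid = 1.0 if density == "part-solid" else 0.0  # reference level: solid
    nonsolid = 1.0 if density == "nonsolid" else 0.0
    spic = 1.0 if spiculated else 0.0                     # reference level: smooth edge
    return {
        "age": a,
        "mean_diameter": mean_diameter_mm,
        "part_solid": part_solid,
        "nonsolid": nonsolid,
        "spiculated": spic,
        "calcified": 1.0 if calcified else 0.0,
        "pleural": 1.0 if pleural_involvement else 0.0,
        "age:part_solid": a * part_solid,
        "age:nonsolid": a * nonsolid,
        "age:mean_diameter": a * mean_diameter_mm,
        "spiculated:mean_diameter": spic * mean_diameter_mm,
    }


def predicted_risk(row: Dict[str, float], coefs: Dict[str, float], intercept: float) -> float:
    """Logistic model: p = 1 / (1 + exp(-(intercept + sum of beta * x)))."""
    linear_predictor = intercept + sum(coefs.get(k, 0.0) * v for k, v in row.items())
    return 1.0 / (1.0 + math.exp(-linear_predictor))
```

Because the age term is zero for anyone aged 60 or older, all age-related terms, including the interactions, drop out for that group, which is how the model keeps their age-related risk constant.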

TABLE 1 Baseline characteristics of the participants.
The whole-population model showed good discrimination (AUC = 0.868, 95% CI: 0.839-0.894, Figure 3A) and calibration (Figure 4A) in internal validation by bootstrap resampling with 1000 replicates. At a 3.9% risk threshold, the sensitivity and specificity were 85.2% and 75.2%, respectively, indicating that our model could reduce the FPR in the screening cohort by 72.9%. An online calculator for lung cancer probability was provided at http://cancerrc.ncsis.org.cn/#/lungCancer.
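Threshold-based sensitivity and specificity, and the decile-based calibration check referred to above, can be sketched as follows. Treating a predicted risk equal to the threshold as test-positive is an assumed convention, and the function names are hypothetical; the paper reports only the thresholds (3.9% and 3.1%) and the resulting values.

```python
from typing import List, Sequence, Tuple


def sensitivity_specificity(y: Sequence[int], p: Sequence[float],
                            threshold: float) -> Tuple[float, float]:
    """Classify p >= threshold as test-positive (assumed convention)."""
    tp = sum(1 for yi, pi in zip(y, p) if yi == 1 and pi >= threshold)
    fn = sum(1 for yi, pi in zip(y, p) if yi == 1 and pi < threshold)
    tn = sum(1 for yi, pi in zip(y, p) if yi == 0 and pi < threshold)
    fp = sum(1 for yi, pi in zip(y, p) if yi == 0 and pi >= threshold)
    return tp / (tp + fn), tn / (tn + fp)


def decile_calibration(y: Sequence[int], p: Sequence[float],
                       n_groups: int = 10) -> List[Tuple[float, float]]:
    """(mean predicted risk, observed rate) for groups of ascending predicted risk."""
    order = sorted(range(len(p)), key=lambda i: p[i])
    n = len(order)
    table = []
    for g in range(n_groups):
        idx = order[g * n // n_groups:(g + 1) * n // n_groups]
        if not idx:
            continue
        mean_pred = sum(p[i] for i in idx) / len(idx)
        observed = sum(y[i] for i in idx) / len(idx)
        table.append((mean_pred, observed))
    return table
```

Plotting the second element of each pair against the first gives a calibration curve; points near the diagonal indicate that predicted and observed risks agree.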
Compared with existing models based on similar factors (Zhang et al. 17 ; Gould et al. 20 ; PanCan 1A, McWilliams et al. 21 ; and Tammemagi et al. 22 ), our model obtained a higher AUC in the Chinese population (Figure 5); accounting for the nonlinear effect of age and the interaction effects also improved discrimination (Figure S2).
In the validation set, the whole-population model also showed good discrimination (AUC = 0.751, 95% CI: 0.727-0.774, Figure 3B). At a 3.1% threshold, the sensitivity and specificity were 70.5% and 71.0%, respectively. Compared with other existing models, our model agreed better with the observed lung cancer incidence, as shown by lower absolute errors (absolute differences between observed and predicted probabilities) in the resampled validation set (Table S3). In the simulated validation set, the AUC was 0.752 (95% CI: 0.709-0.788), with 70.5% sensitivity and 70.9% specificity at a 3.1% risk threshold, which may reduce the FPR of LDCT screening by 68.8%. The predicted and observed outcomes agreed well in most cases in the validation set (Figure 4B).
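The FPR-reduction figures in these results are not defined explicitly. One reading that reproduces them to within rounding is that they give the fraction of all screen-positive results that the model would correctly reclassify as negative, i.e., baseline FPR × specificity. The sketch below checks that assumption; `fpr_reduction` is a hypothetical helper, not part of the authors' software.

```python
def fpr_reduction(baseline_fpr: float, specificity: float) -> float:
    """Fraction of all screen-positives the model would correctly rule out.

    Assumed interpretation: reduction = baseline FPR x specificity.
    """
    return baseline_fpr * specificity


if __name__ == "__main__":
    # Training cohort: FPR 5016/5165, specificity 75.2% at the 3.9% threshold.
    print(f"training:  {fpr_reduction(5016 / 5165, 0.752):.1%}")  # reported: 72.9%
    # Simulated validation set: FPR 97.1%, specificity 70.9% at the 3.1% threshold.
    print(f"simulated: {fpr_reduction(0.971, 0.709):.1%}")        # reported: 68.8%
```

This gives 73.0% and 68.8%; the 0.1-point gap for the training cohort is consistent with rounding of the reported specificity.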
The models for smokers (AUC = 0.732, 95% CI: 0.686-0.778) and nonsmokers (AUC = 0.740, 95% CI: 0.712-0.767) also showed good discrimination in the validation set (Figure S3). Good calibration was still observed in most cases in the resampled validation sets (Figure S4). The details of the models are displayed in Figure S5; the variables included in the two models did not differ substantially.

| DISCUSSION
Our study constructed models based on imaging and epidemiological factors to predict the risk of lung cancer. High-quality evidence from a prospective, multicenter cohort study in China was used for model construction. We applied a selection strategy that focused on the most suspicious cases and optimized the existing criteria for positive screening results. Validation was performed in an independent dataset comprising data collected from different cities. The results of internal and external validation by bootstrap resampling showed that our models can reduce the FPR and overdiagnosis in lung cancer screening programs. Compared with other models based on similar factors, our whole-population model performed better in the validation set. Supported by independent external validation, our study is of pioneering significance for lung cancer screening in China.
Lung cancer in China and Asia has unique epidemiological characteristics; for example, 30% of lung cancer cases in China are not attributable to smoking, a proportion much higher than that in the American population (20%). 22,26 Existing models, which are generally based on American and European populations, may perform poorly in Asian populations because of differences in population characteristics; the validation of these models in our dataset supports this. Models constructed in Chinese and Asian populations may be biased because of small sample sizes, retrospective study designs, and single data sources. 12 Models based on deep learning algorithms have shown good discrimination, [27][28][29] but their demand for large numbers of covariates may limit their clinical application. Our research provides a viable solution to these problems.
Our study considered widely recognized lung cancer risk factors, 12 including traditional epidemiological factors and imaging features of nodules, so our models reflect the epidemiological and imaging risk factors in the Chinese population as comprehensively as possible. Our final models were constructed mainly on imaging factors, and most of the macrolevel epidemiological factors were not statistically significant. Previous research revealed that patients' self-reported information may be substantially biased; 30 because our models use only age as an epidemiological predictor, most of the possible random measurement bias from self-reporting is avoided, and higher applicability in actual screening can be expected. Because they depend primarily on imaging variables, our models are also easier to apply in practical LDCT screening protocols.
In addition, interaction effects were rarely considered in previous Chinese studies. 12 To our knowledge, our research is the first to consider interaction effects in the Chinese population, and it achieved superior performance compared with previous models. Our findings suggest that the effects of different factors are not independent and that their interactions should not be ignored.
However, smoking was not a significant predictor, which may seem counterintuitive but could be explained by the unique population characteristics of Asia and China. In studies by Bin Zheng et al., 13 Sungmin Zo et al., 31 and Xiaobo Chen et al., 32 smoking status was not an independent risk factor for lung cancer in prediction models for Asian populations. Furthermore, the differences between the models for ever-smokers and nonsmokers were not obvious; similar risk factors and effects were observed in both models. Therefore, we believe that the degree of malignancy depends more on the imaging characteristics of the nodules than on smoking status in the Asian population.
Our previous research confirmed that simultaneous screening of smokers and nonsmokers based on the same lung cancer risk is currently the most effective screening strategy in the Chinese population. 24 That study, however, did not take into account differences in FPR and the ensuing challenges in managing positive screening results. Screening would be relatively less effective if suspicious nodules in nonsmokers had a higher FPR or if their lung cancer risk were more difficult to assess accurately, because this population would then undergo more unnecessary diagnostic procedures and treatments. This study indicates that the FPR is similar in smokers and nonsmokers and that the models' discrimination and calibration are also comparable. This suggests that the cost of unnecessary diagnosis and treatment is similar for both groups. These results fill an evidence gap and provide strong evidence to support lung cancer screening in nonsmokers in China.

FIGURE 2 Effects of covariates in the final whole-population model. *Age in our models was entered as (Age − 60); participants aged 61 years and older were assigned an age of 60 years because of the nonlinear association between age and lung cancer risk.
Our study has several limitations. Our research used data from one-off LDCT screening; 6 therefore, variables that changed over time, some of which may be associated with lung cancer risk, 33 could not be assessed. Biomarkers and environmental factors were not considered for practical and economic reasons. In addition, our models were constructed for large-scale lung cancer screening, whereas our validation set used retrospective clinical data. Although the validation set was also derived from individuals with suspicious nodules detected by LDCT, it was not established by random sampling or with a strict definition of high-risk groups, so bias arising from the retrospective design was difficult to avoid compared with real-world screening. As a result, the population characteristics of the validation data differ from those of the training data, as reflected in differing risk factors and a higher lung cancer incidence (44.1% vs. 2.9%). These differences may have lowered the models' discrimination and calibration in external validation. To evaluate calibration under actual screening conditions, we constructed simulated datasets with an incidence rate consistent with that of the training set by bootstrap resampling; however, further evaluation in future screening programs is required.