Comparison of the sensitivity of different criteria to select lung cancer patients for screening in a cohort of German patients

Abstract Introduction Trials of CT-based screening for lung cancer have shown a mortality advantage for screening in North America and Europe. Before introducing a nationwide lung cancer screening program in Germany, it is important to assess the criteria used in international trials in the German population. Methods We used data from 3623 lung cancer patients from the data warehouse of the German Center for Lung Research (DZL). We compared the sensitivity of the following lung cancer screening criteria overall and stratified by age and histology: the National Lung Screening Trial (NLST), the Danish Lung Cancer Screening Trial (DLCST), the 2013 and 2021 US Preventive Services Task Force (USPSTF), and an adapted version of the Prostate, Lung, Colorectal, and Ovarian no race model (adapted PLCOm2012) with 6-year risk thresholds of 1.0%/6 year and 1.7%/6 year. Results Overall, the adapted PLCOm2012 model (1.0%/6 year) selected the highest proportion of lung cancer patients for screening (72.4%), followed by the 2021 USPSTF (70.0%), the adapted PLCOm2012 (1.7%/6 year) (57.4%), the 2013 USPSTF (57.0%), the DLCST criteria (48.7%), and the NLST (48.5%). The adapted PLCOm2012 risk model (1.0%/6 year) had the highest sensitivity for all histological types except for small-cell and large-cell carcinomas, for which the 2021 USPSTF selected a higher proportion of patients (non-significant). The sensitivity levels were higher in males than in females. Conclusion Using a risk-based selection score resulted in higher sensitivities compared to criteria using dichotomized age and smoking history. However, gender disparities were apparent in all studied eligibility criteria. In light of increasing lung cancer incidences in women, all selection criteria should be reviewed for ways to close this gender gap, especially when implementing a large-scale lung cancer screening program.


KEYWORDS
health policy, lung cancer screening, NSCLC, thoracic malignancy

| INTRODUCTION
Trials of computed tomography (CT) lung cancer screening have shown a mortality advantage for patients in both North America and Europe. 1-3 In contrast to some regions of America, Asia, and Europe have not yet widely implemented CT screening. The German S3 Lung Cancer Guidelines were updated in 2018 and now include a "can" recommendation for lung cancer screening using CT Thorax, 4 meaning that doctors can offer yearly CT screening to patients with a defined risk for lung cancer. Currently, the implementation of a nationwide lung cancer screening program is in planning.
Prior to implementing this screening program, it is important to assess the criteria used in international trials in the German population before using those criteria as the foundation for broad public health measures in this region. Due to regional variability in genetic susceptibility, smoking patterns, and both indoor and outdoor air quality, there may be clinically relevant regional differences in the performance of screening algorithms and risk scores. For this reason, it is important to test screening criteria developed elsewhere in the German population.
Lung cancer screening inclusion criteria can be divided into two categories. First, some eligibility criteria identify high-risk patients according to age and smoking history, which is easily applicable. Second, lung cancer risk models such as the Bach model, the Liverpool Lung Project model (LLP), or the Prostate, Lung, Colorectal, and Ovarian risk prediction model (PLCOm2012) additionally take into account factors like personal history of cancer, family history of lung cancer, body mass index (BMI), respiratory comorbidities, and other factors. Though more complex, risk prediction models have been shown to be superior to dichotomized criteria such as the 2013 and 2021 US Preventive Services Task Force (USPSTF) criteria at identifying high-risk patients for screening programs. [5][6][7][8]
Thus far, two studies have analyzed eligibility criteria in the German population. The first study investigated German ever-smokers in the European Prospective Investigation of Cancer and Nutrition (EPIC) study, who were followed up for 5 years regarding cancer development. 9 The authors found that all of the established lung cancer risk prediction models performed better at detecting high-risk patients compared with the simpler eligibility criteria. 10 Of the risk prediction models, the PLCOm2012 performed slightly better than the LLP and Bach models. 10 The second study used data from the German Health Update study (GEDA; "Gesundheit in Deutschland aktuell"), a series of cross-sectional surveys covering health and disease in the German population. 11 This analysis showed that, compared to other criteria, the PLCOm2012 had the best concordance between the numbers of lung cancer cases predicted and those reported in registries. 12 The most recent study comparing the NELSON and the PLCOm2012 selection criteria, the HANSE study (ClinicalTrials.gov Identifier: NCT04913155), started enrolling German at-risk patients in June 2021. At the time of this study, no results had been published.
Although these two studies provide a good basis concerning the performance of selection criteria in a population of healthy patients, further work is required to understand how these criteria perform in a cohort of already diseased patients. Additionally, we need a better understanding of the characteristics (e.g. histology, gender, and molecular pathology) of patients selected and not selected by each of the criteria.
Therefore, in our analysis, we aimed to compare the sensitivity of different lung cancer screening inclusion criteria to select lung cancer patients for screening in a population of German lung cancer patients. We assessed the screening inclusion criteria used in recent large trials including the National Lung Screening Trial NLST/USPSTF, 1 the Danish Lung Cancer Screening Trial DLCST, 13 and an adapted version of the PLCOm2012. 5,7 Additionally, we aimed to test the sensitivity of the selection criteria across different lung cancer histological types and between males and females, as well as to compare other characteristics between the patients who were selected and not selected for screening.

| Study design, patient cohort, and data collection
In this retrospective analysis, we used data provided by the data warehouse of the German Center for Lung Research (DZL), covering five major German lung cancer centers consisting of several (university) hospitals and other scientific facilities. The DZL data warehouse provides broad coordinated access to patient-related lung research data for scientific purposes. Included patients consented to the pseudonymized, pooled use of their clinical data for research purposes. Within the DZL, interdisciplinary teams representing each area of research define basic clinical parameters for the dataset and encourage all contributing sites to include all consenting patients in a prospective manner. The dataset provided by the DZL data warehouse contained 9481 patients with a diagnosis of lung cancer. Variables in the dataset included date of birth, date of diagnosis, gender, histology, smoking status, pack years, height, weight, BMI, documentation of comorbidities such as chronic obstructive pulmonary disease (COPD), and TNM (tumor, nodes, metastases) stage at diagnosis. The depth of documentation varied between datasets. As age and smoking history are the major factors typically used in screening eligibility criteria, we excluded all patients who had no information on age or on smoking status/pack years.

| Ethics statement
Approval for this retrospective non-interventional study was obtained from the Ethics Committee of the Ludwig-Maximilians University (reference number . This study was conducted in accordance with the Declaration of Helsinki, Good Clinical Practice guidelines, and local ethical and legal requirements.

| Adaptation of variables and handling of missing data
If available, we used variables for the exact date of diagnosis. For variables that may fluctuate over time (e.g. BMI, weight), we used the values closest to the date of diagnosis. We set pack years to zero in patients who indicated they were never smokers or passive smokers. When pack years were provided as categories in the dataset, we used the midpoint of the category. We categorized histological types according to the WHO Classification of Thoracic Tumors into adenocarcinoma (ACC), squamous-cell carcinoma (SCC), large-cell carcinoma (LCC), small-cell carcinoma (SCLC), neuroendocrine tumors (including carcinoids and large-cell neuroendocrine carcinomas, excluding SCLC) (NET), and other histology. The category other included patients with rare histological types such as adenosquamous carcinoma, sarcomatoid carcinomas, carcinosarcoma, and salivary gland-type tumors, as well as patients with unknown histological type. If a patient had no diagnosis of chronic obstructive pulmonary disease (COPD) documented in the dataset, we assumed they did not have COPD. This might lead to an underestimation of patients selected for screening by the adapted PLCOm2012. We categorized patients' UICC stage using clinical and pathological TNM from the dataset. The edition of the UICC classification was provided with the TNM information and was used accordingly. When clinical and pathological TNM were both available, we used the pathological information rather than the clinical data.
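The variable-handling rules above can be sketched in a few lines of Python. This is only an illustration of the stated rules, not the code used in the study: the function names, the status labels ("never", "passive"), and the representation of a pack-year category as a (lower, upper) pair are all assumptions made for this example.

```python
from typing import Optional, Tuple


def pack_years_value(
    smoking_status: str,
    exact: Optional[float] = None,
    category: Optional[Tuple[float, float]] = None,
) -> Optional[float]:
    """Derive a single pack-year value following the rules in the text.

    The status labels and the (lower, upper) category format are
    hypothetical; the real dataset coding is not described in the paper.
    """
    # Never smokers and passive smokers are set to zero pack years.
    if smoking_status in ("never", "passive"):
        return 0.0
    # An exact value is used directly when it is available.
    if exact is not None:
        return exact
    # Categorical pack years are replaced by the category midpoint.
    if category is not None:
        lower, upper = category
        return (lower + upper) / 2.0
    # Otherwise the value is missing (such patients were excluded).
    return None


def tnm_for_staging(
    clinical: Optional[str], pathological: Optional[str]
) -> Optional[str]:
    """Prefer pathological TNM over clinical TNM when both are available."""
    return pathological if pathological is not None else clinical
```

For instance, a former smoker documented only in a hypothetical "20 to 30 pack years" category would be assigned 25 pack years under this rule.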
The NLST, USPSTF, and DLCST selection criteria use quit times of <15 and <10 years for inclusion for screening, respectively. As quit time was not available in the dataset, we disregarded quit time when applying these screening criteria. This might lead to an overestimation of the sensitivity of these selection approaches, as some former smokers most probably had quit times greater than the thresholds set by the criteria. As some variables used in the calculation of the PLCOm2012noRace were not available in the dataset, we used an adapted version provided by the creator of the original PLCOm2012 model, Martin Tammemägi. The original PLCOm2012 model uses the number of years smoked and cigarettes smoked per day to measure smoking intensity and includes a personal history of cancer and a family history of lung cancer, all of which were missing in our dataset. The adapted version of the PLCOm2012 model included age, COPD, BMI, smoking status, and pack years. Using the adapted version of the PLCOm2012 made it possible to calculate the 6-year risk for a larger proportion of the patients in the dataset, improving the power of this analysis. Because BMI, unlike the other variables with missing values, was missing in only a small proportion of patients (12.5%), we used multiple imputation to fill in the missing values.
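To make the application of the dichotomized criteria concrete, the following Python sketch checks eligibility from age and pack years alone, disregarding quit time as described above. The age and pack-year thresholds shown for the 2013 and 2021 USPSTF criteria are the commonly cited values and are an assumption here, since this section does not state them; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """A dichotomized eligibility rule based on age and pack years."""
    name: str
    min_age: int
    max_age: int
    min_pack_years: float


# Assumed thresholds (commonly cited values, not taken from this text).
USPSTF_2013 = Criterion("USPSTF 2013", min_age=55, max_age=80, min_pack_years=30.0)
USPSTF_2021 = Criterion("USPSTF 2021", min_age=50, max_age=80, min_pack_years=20.0)


def is_eligible(criterion: Criterion, age: float, pack_years: float,
                ever_smoker: bool) -> bool:
    """Return True if the patient meets the criterion.

    Quit time is deliberately ignored, mirroring the analysis in the text,
    so sensitivity for former smokers may be overestimated.
    """
    return (
        ever_smoker
        and criterion.min_age <= age <= criterion.max_age
        and pack_years >= criterion.min_pack_years
    )
```

Under these assumed thresholds, a 52-year-old ever-smoker with 25 pack years would be selected by the 2021 rule but not by the 2013 rule, which illustrates how small changes in the cut-offs shift the selected population.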

| Comparison of characteristics
We compared the characteristics of patients selected by different selection criteria as well as between patients selected and not selected for screening. The reason for these comparisons was to determine differences between the criteria other than sensitivity as well as to detect areas for improvement of selection criteria in general.

| Statistical analysis
Patient characteristics are presented as mean values with standard deviation (SD) for metric variables and as absolute and relative frequencies for categorical variables. They were compared between included and excluded patients, and between selected and not selected patients, using Student's t-test for metric variables and the chi-squared test, or Fisher's exact test when cell counts were <6, for categorical variables. Statistical significance for these comparisons was determined using two-sided p-values with an alpha level of 0.05. Multiple imputation of BMI was performed using the R package mice, which uses conditional multiple imputation. Variables used in the imputation process were age, gender, and the comorbidities COPD, asthma, cardiovascular disease (CVD), renal insufficiency, and diabetes mellitus. We calculated the sensitivity of the screening criteria as the proportion of patients selected for screening among the patients included in the analyses according to the exclusion criteria. We compared the sensitivity of the different criteria using the McNemar test for the comparison of proportions in dependent samples. To control the type I error rate, we reduced the number of tests performed by limiting the comparisons to the criterion with the best performance versus each of the other criteria. Additionally, statistical significance was determined using two-sided p-values with a Bonferroni-adjusted alpha level of 0.00143 (0.05 divided by 35 tests). The precision of estimates was based on 99.857% confidence intervals (CI).
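The sensitivity comparison described above can be illustrated with a short, standard-library Python sketch. It is not the study's R code: it assumes an exact (binomial) form of the McNemar test and a Wilson score interval, neither of which is specified in the text, and all function names are hypothetical. Each input is a list of booleans, one per lung cancer patient, indicating whether a criterion selected that patient.

```python
from math import comb
from statistics import NormalDist
from typing import Sequence, Tuple

N_TESTS = 35
ALPHA = 0.05 / N_TESTS  # Bonferroni-adjusted alpha, about 0.00143


def sensitivity(selected: Sequence[bool]) -> float:
    """Proportion of lung cancer patients selected by a criterion."""
    return sum(selected) / len(selected)


def wilson_interval(k: int, n: int, alpha: float = ALPHA) -> Tuple[float, float]:
    """Wilson score interval at level 1 - alpha (assumed CI method)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    p = k / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * ((p * (1 - p) / n + z * z / (4 * n * n)) ** 0.5) / denom
    return centre - half, centre + half


def mcnemar_exact(sel_a: Sequence[bool], sel_b: Sequence[bool]) -> float:
    """Two-sided exact McNemar p-value for paired selections.

    Only discordant pairs matter: patients selected by one criterion
    but not by the other.
    """
    b = sum(1 for x, y in zip(sel_a, sel_b) if x and not y)
    c = sum(1 for x, y in zip(sel_a, sel_b) if y and not x)
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, no evidence of a difference
    k = min(b, c)
    # Binomial(n, 0.5) tail probability, doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

A comparison would then be called significant only if `mcnemar_exact(best, other) < ALPHA`, with `wilson_interval(sum(best), len(best))` giving the matching 99.857% interval for the sensitivity estimate.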
Data analysis was performed using R Version 4.0.0 and RStudio Version 1.4. Tables and figures were created in RStudio and Microsoft Excel.

| Patient population and demographics
In total, 9481 patients with a thoracic malignancy were identified in the DZL data warehouse. Of these, 3588 had complete information on pack years and age and were included in the analysis. The mean age of the included patients was 66.5 years with an SD of 9.9 years and was not significantly different from that of the excluded patients (66.2, SD = 10.0, p-value = 0.19). Of all included patients, 58.8% (n = 2106) were male compared to 58.6% (n = 3257, p-value = 0.90) of all excluded patients. BMI was available in 87.5% (n = 3141), and mean BMI was 26.1 with an SD of 4.9. After imputation, mean BMI was 26.1 with an SD of 4.7. In excluded patients, BMI was available in 18.9% and mean BMI was 26.0 with an SD of 5.7 (p-value = 0.44). Stage at diagnosis was available in 99.2% (n = 3560) of patients and distributed as follows: in situ 0.2% (n = 8), stage I 21.3% (n = 760), stage II 12.9% (n = 460), stage III 31.4% (n = 1117), and stage IV 34.5% (n = 1229). The stage distribution was significantly different in excluded patients (stage I = 35.5%, stage II = 23.8%, stage III = 30.2%, stage IV = 10.4%, p-value <0.0001); however, stage information was missing in 49% of excluded patients. Histology was known for 99.7% of patients; 53.2% (n = 1904) of patients were diagnosed with adenocarcinoma, 25.5% (n = 913) had SCC, 9.7% (n = 346) SCLC, 5.1% (n = 181) NET, 0.6% (n = 22) LCC, and 5.8% (n = 212) had a histology other than the aforementioned. The proportion of patients with adenocarcinoma, the major histologic subtype, was not significantly different in excluded patients (53.2%, p-value = 0.95). Smoking status was available in 99.8% of patients; 35.1% (n = 1259) indicated active smoking status, 51.8% (n = 1858) were former smokers, and 13.1% (n = 471) were never smokers. Mean pack years were 47.3 with an SD of 21.3 for active smokers and 37.7 with an SD of 23.9 for former smokers.
The proportions of patients with fewer than 30, 20, or 15 pack years were 48.3%, 35.0%, and 31.4% in females, which were higher than the corresponding proportions in males (26.2%, 17.1%, and 14.2%). Table 1 shows all patient characteristics, overall and stratified by gender.
Regarding histological types, the adapted PLCOm2012 with a threshold of 1.0%/6 years selected a higher proportion of patients for screening compared to all other selection criteria among all histological types, apart from SCLC and LCC where the USPSTF selected the highest proportion of patients (no statistical significance). Exact sensitivities with CI of all selection criteria among all histological types and stratified by sex, as well as p-values can be found in Table 2.

| Comparison of patients selected by adapted PLCOm2012 and USPSTF
We compared the proportions of patients with specific characteristics (comorbidities, stage, and smoking status) selected by the adapted PLCOm2012 and the USPSTF. The proportions of patients with comorbidities that were selected by the adapted PLCOm2012 were significantly higher than those selected by the USPSTF. The adapted PLCOm2012 selected 6.9 percentage points more of the patients with COPD (92.5% vs. 85.6%, p-value <0.0001), 5.3 percentage points more of the patients with CVD (76.7% vs. 71.4%, p-value <0.0001), and 9.0 percentage points more of the patients with renal insufficiency (79.8% vs. 70.7%, p-value = 0.01).
The adapted PLCOm2012 also performed significantly better in patients with stage I disease, selecting 6.7 percentage points more of these patients than the USPSTF (70.8% vs. 64.1%, p-value <0.0001).
Additionally, the adapted PLCOm2012 selected 5.4 percentage points more of the current smokers (96.1% vs. 90.7%, p-value <0.0001). It was also the only selection criterion to select never smokers for screening. Table 3 displays the proportions of all characteristics, additionally stratified by gender.

| Characteristics of selected and unselected patients
Compared to patients not selected for screening, patients selected by the adapted PLCOm2012 were significantly older (68.7 ± 8.2 vs. 60.6 ± 11.2 years, p-value <0.0001). The prevalence of comorbidities such as COPD (39.9% vs. 8.5%, p-value <0.0001), CVD (68.7% vs. 54.2%, p-value <0.0001), diabetes mellitus (18.2% vs. 13.8%, p-value = 0.002), and renal insufficiency (5.8% vs. 3.8%, p-value = 0.02) was significantly higher than in patients not selected for screening. Other characteristics and results stratified by gender can be found in Table 4.
[...] in age does not signify a clinically relevant difference and is probably due to the large sample size. As with the adapted PLCOm2012, the prevalence of the comorbidities COPD (38.2% vs. 14.9%, p-value <0.0001), CVD (66.2% vs. 51.7%, p-value = 0.01), and diabetes mellitus (18.0% vs. 14.7%, p-value = 0.02) was significantly higher in patients selected by the USPSTF. There was no significant difference in renal insufficiency. The results of these comparisons, also stratified by gender, are presented in Table 5.

| DISCUSSION
Given that the introduction of a national lung cancer screening program is planned in Germany, this study aimed to compare the performance of different screening algorithms to select lung cancer patients for screening. Using a cohort of lung cancer patients documented as part of a research consortium, we were able to compare the sensitivity of the selection criteria. We found that out of the screening algorithms compared in this study the adapted [...] 19 which is even lower than the 45.2% we found in this study. An interesting additional finding of Vu et al. was that 48.1% of females not selected for screening by the USPSTF had a family history of cancer, which is a criterion included in the original PLCO risk model. A study across several countries in Europe comparing the sensitivity and specificity of smoking-based screening criteria found similar trends with regard to gender. The sensitivity was higher in men than in women in general, as well as in the German cohort. However, the specificity was higher in women. 20 Therefore, the loss in specificity when expanding selection criteria for females to increase sensitivity might still be reasonable.
All selection criteria performed better in histological types associated with a higher attributable risk of smoking, such as SCC and SCLC. This finding is not surprising, as smoking history is one of the two main factors incorporated in the selection criteria. However, as adenocarcinomas represent the highest proportion of lung cancers and have a lower attributable risk of smoking, this aspect underlines the need to incorporate factors other than smoking and age into screening algorithms. In addition, the proportion of adenocarcinomas is higher in women, which may be one of the reasons why the sensitivity of the selection criteria was lower in females than in males.
Given the rising lung cancer incidence in females and the fact that the incidence in males is plateauing, a more gender-specific approach to screening is expected to benefit screening programs. One option, which is available to model-based approaches, is to add a specific predictor term for gender that increases the estimated risk in women. Two existing models, the Bach and LCRAT/LCDRAT models, include predictor terms for gender but are counterproductive regarding gender disparity because they reduce the risk for women and lower the probability that women are selected for screening.
Disregarding model-based selection criteria, an important finding of this study was that the sensitivity of the 2013 USPSTF criteria was higher than that of the NLST for the whole sample and also in all subgroups. The only difference between these criteria is the upper age of 75 versus 80 years, meaning the difference in sensitivities is attributable to the upper age limit. This shows how the choice of selection criteria and small differences in eligibility criteria can have an impact on public health practice and possibly cost-effectiveness.
When comparing other characteristics of patients selected by the USPSTF and the adapted PLCOm2012, we found that the adapted PLCOm2012 selected a significantly higher proportion of the patients with stage I disease at diagnosis. This is important in terms of life years gained, a factor that is considered when analyzing the cost-effectiveness of screening programs. The two largest trials of lung cancer screening, the NLST 1 and the NELSON 21 trial, reported a stage shift when implementing lung cancer screening programs, and numerous follow-up studies and systematic reviews support these findings. 22,23
The adapted PLCOm2012 resulted in the selection of a patient cohort with a significantly higher comorbidity burden compared to patients not selected for screening. For example, in total, 5.2% of patients in the dataset had renal insufficiency. The proportion of those patients selected by the adapted PLCOm2012 was higher (+9.0%) compared to the USPSTF and represented almost 80% of all patients in this subgroup. It will be important to consider comorbidities when planning the follow-up of detected lesions and the treatment of detected cancers.
A limitation of our analysis derives from the nature of datasets within the DZL data warehouse. The DZL data warehouse is a multicenter data pool based on data from five academic lung cancer centers in Germany, and, as such, might not perfectly represent the general population of lung cancer patients in Germany. For instance, there may be socioeconomic differences between patients treated at academic centers and the broader population. In addition, within the DZL various contributing departments may provide different types of clinical data, leading to the missing stage and smoking data in some datasets. Due to the strong research focus of thoracic surgery departments within the DZL, early-stage lung cancer may be overrepresented in our cohort. However, this may provide more insight into patients with lower stage who in fact are the focus of lung cancer screening programs. Due to incomplete documentation of some basic clinical parameters in datasets from some contributing researchers, we had to exclude a large number of patients due to missing age at diagnosis or smoking status. When comparing included and excluded patients from the data warehouse we did not find significant differences concerning age, gender, and BMI. Stage at diagnosis was significantly different in the included dataset; however, the cohort of patients included in the analysis was a better reflection of the expected distribution of disease stage compared to the excluded patients. We suspect that patients excluded due to incomplete data may have had missing information on distant metastases, and may have had stage IV disease. Unfortunately, due to the high number of data sets with missing information on smoking history, we had to exclude a large number of patients with stage I and II disease who would have added substantial power to our analysis. 
We encourage public and academic data repositories to emphasize complete documentation of a basic clinical dataset, including lung cancer risk factors such as smoking.
Within the cohort used for the analysis, some variables used in the calculation of the PLCOm2012 were not available in the dataset or were available but in a different format. Therefore, we used an adapted version of the PLCOm2012. Although this model included fewer variables, its AUC was still very high (0.8375). Overall, using this adapted model might lead to an underestimation of the sensitivity of the PLCOm2012, so conclusions regarding the PLCOm2012 are conservative. Another limitation of the analysis is that, due to the nature of the dataset and the lack of at-risk individuals without lung cancer, we were not able to compare the specificity of the screening criteria. However, for the comparisons between the USPSTF and the PLCOm2012, we selected risk thresholds that have been shown in other studies to make similar numbers of individuals eligible under both criteria, thus allowing for fair comparisons.
One of the strengths of this study was the large size of the dataset, which allowed subset analyses. Many other analyses comparing screening criteria have been limited to lung cancer screening studies, in which few lung cancer cases were available and subset analyses by histological subtype were constrained. Furthermore, our analysis included patients from all stages with differing treatment indications. Even though one goal of lung cancer screening is to detect patients in early stages, we believe that also including patients with higher stages can help to identify areas where selection criteria can be improved, so that their sensitivity also increases in these stages. Additionally, by using this retrospective approach, our results contribute real-world evidence.

| CONCLUSION
Using a risk-based selection approach resulted in higher sensitivities compared to criteria using dichotomized categorical age and smoking history in a German population of lung cancer patients. However, gender disparities were apparent in all studied eligibility criteria. In light of increasing lung cancer incidences in women, all selection criteria should be reviewed regarding ways to close this gender gap, especially when implementing a large-scale lung cancer screening program.