Development of machine learning model to predict pulmonary function with low‐dose CT‐derived parameter response mapping in a community‐based chest screening cohort

Abstract
Purpose: To construct and evaluate the performance of a machine learning-based model that uses low-dose computed tomography (LDCT)-derived parametric response mapping (PRM) to predict pulmonary function test (PFT) results.
Materials and methods: A total of 615 subjects (40–74 years old) from a community-based screening population were enrolled retrospectively. All had PFT parameters, including the ratio of forced expiratory volume in the first second to forced vital capacity (FEV1/FVC) and the percentage of predicted forced expiratory volume in one second (FEV1%), and registered inspiratory-to-expiratory chest CT scans. Subjects were classified into normal, high-risk, and chronic obstructive pulmonary disease (COPD) groups based on PFT. Seventy-two PRM-derived quantitative parameters were collected, including the volume and volume percentage of emphysema, functional small airways disease, and normal lung tissue. A random forest regression model and a multilayer perceptron (MLP) model were constructed and tested on PFT prediction, followed by evaluation of classification performance based on the PFT predictions.
Results: The random forest model based on PRM parameters predicted PFT better than the MLP model, with a coefficient of determination (R2) of 0.749 for FEV1/FVC and 0.792 for FEV1%. For the random forest model, the mean squared error (MSE) was 0.0030 for FEV1/FVC and 0.0097 for FEV1%, and the root mean squared error (RMSE) was 0.055 and 0.098, respectively. For differentiating between the normal group and the high-risk group, the sensitivity, specificity, and accuracy were 34/40 (85%), 65/72 (90%), and 99/112 (88%), respectively. For differentiating between the non-COPD group and the COPD group, the sensitivity, specificity, and accuracy were 8/9 (89%), 112/112 (100%), and 120/121 (99%), respectively.
Conclusions: The machine learning-based random forest model predicts PFT results from PRM in a community screening population, and it distinguishes high-risk COPD subjects from the normal population with high sensitivity.


INTRODUCTION
In the current aging society, chronic non-communicable diseases have become a major burden on healthcare. Low-dose chest CT (LDCT) screening has been widely promoted in China. Chest CT images not only provide information on pulmonary nodules but also allow further evaluation of emphysema, coronary artery calcification, and other findings in the community population. Incidental findings in LDCT lung cancer screening are not uncommon; most of these abnormal findings are not clinically significant and do not require further examination or treatment. [1] However, some abnormal manifestations still indicate potential health hazards, especially noncommunicable chronic diseases (NCDs) that are highly prevalent in the aging society, which may be the main reason for the decrease in all-cause mortality (ACM) beyond lung cancer. LDCT depicts the lungs better than other organs and can be used to observe common lung diseases such as chronic obstructive pulmonary disease (COPD). Pulmonary function tests (PFTs) are the methods used to diagnose COPD. However, the diagnostic information provided by PFTs is limited and cannot accurately screen high-risk COPD patients. In fact, many articles published in the American Journal of Respiratory and Critical Care Medicine, Lancet, and Nature Outlook [2][3][4] have appealed to physicians to pay more attention to the role of imaging in the early diagnosis of COPD.
Related studies have found that small airway remodeling or vascular remodeling occurs before destruction of the pulmonary parenchyma. [5,6] A certain number of asymptomatic subjects in the chest disease screening population will have small airway disease. Therefore, early diagnosis of small airway abnormalities is very important, particularly since functional small airway disease (fSAD) is reversible. At this stage, however, the ability of lung function tests to detect early abnormalities is limited. Air trapping is the index used most commonly to evaluate small airway disease. However, it is difficult to determine whether air trapping is caused by emphysema or by non-emphysematous fSAD. fSAD is a reversible transitional stage between normal lung tissue and emphysema, and it occurs earlier than emphysema. Parametric response mapping (PRM), a recently developed quantitative CT technique, is based on changes in voxel density between paired inspiratory and expiratory CT images. [7][10][11] Moreover, PRM has demonstrated good sensitivity in the evaluation of disease progression. [12] Artificial intelligence (AI) has accelerated the progress of COPD research, including emphysema detection and subtype classification, early screening, and diagnosis. [13,14] PRM has been shown to correlate positively with PFT parameters. [10] In recent years, many studies have used PRM to construct predictive COPD/non-COPD models, [9,15] which have achieved good results. However, to our knowledge, no study has examined whether PFT parameters can be predicted from PRM derived from dual-phase LDCT in the community population. An AI algorithm that predicts PFT parameters from LDCT-derived PRM would greatly increase the value of one-stop CT scanning by extracting more vital information about pulmonary function status.

Related work
Related work can be divided into two categories according to study purpose and approach: one focuses on finding the correlation between PRM parameters and PFT parameters from a clinical perspective using traditional statistical tools, [16][17][18][19] such as the multivariate linear regression model; the other focuses on computer-aided diagnosis (CAD) of COPD, including patient classification and scoring, based on AI techniques. Among the clinical studies on the PRM-PFT correlation, Bhatt et al. [10] constructed multivariate linear regression models and found an association between the PRM parameters and annual FEV1 decline, claiming that the association was of greater importance for patients with mild COPD. Pompe et al. [20] also used a multivariate linear regression model and found that PRM fSAD was associated with total lung capacity (TLC), alveolar volume (VA), and residual volume (RV). The regression model in that study had an R2 of 0.69 for PRM fSAD prediction. In a similar study, Capaldi et al. [21] performed multivariate linear regression in the reverse direction, regressing PRM parameters on PFT parameters, and found that PRM gas trapping was predicted by FEV1/FVC and that PRM emphysema was predicted by carbon monoxide diffusion capacity and ventilation defect percentage (VDP).
Among the studies on AI-based CAD for COPD, researchers used AI models for emphysema detection, differential diagnosis, and COPD assessment. Ho et al. [22] constructed a 3D-CNN deep learning network to distinguish COPD patients from non-COPD subjects based on 2D or 3D PRM image input. The accuracy and sensitivity of their classification model were 89.3% and 88.3%, respectively. Beyond the attempts to use AI models to distinguish COPD from non-COPD cases, Humphries et al. [23] constructed a CT image-based deep-learning model for emphysema scoring, which corresponded well with visual scoring (κ = 0.60). In addition, several studies attempted to construct regression models to predict PFT parameters directly from CT images. Li et al. [24] constructed a 3D deep learning model to predict PFT parameters, with an R2 of 0.57 for FEV1 prediction and 0.66 for FEV1% prediction. Schabdach et al. [25] reported a non-parametric FEV1 regression method with an R2 of 0.55. Singla et al. [26] proposed a novel method and reported the best regression performance to date, with an R2 of 0.68 for FEV1 and 0.71 for FEV1/FVC.
Inspired by these studies, we aimed to construct PRM-based artificial intelligence algorithms, including a random forest model and a multi-layer perceptron model, to predict key PFT parameters and to evaluate their performance in a community-based LDCT screening population undergoing one-stop screening for Big Three chest diseases (lung cancer, COPD, and cardiovascular disease).

Patient population
From August 2018 to October 2018, a total of 861 consecutive community-based participants were screened for the Big Three chest diseases in our hospital, and the PRM data of 615 participants were collected retrospectively. The patient selection process is shown in Figure 1. … [27] Based on the GOLD criteria, the severity of COPD was classified into GOLD I (FEV1/FVC < 0.7 and FEV1% predicted ≥ 80%), GOLD II (FEV1/FVC < 0.7 and 50% ≤ FEV1% predicted < 80%), GOLD III (FEV1/FVC < 0.7 and 30% ≤ FEV1% predicted < 50%), and GOLD IV (FEV1/FVC < 0.7 and FEV1% predicted < 30%). All participants filled out the questionnaire before PFTs and then underwent PFTs and chest CT scanning on the same day.
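The GOLD grading thresholds listed above can be written as a small lookup. The following is a minimal sketch, not the authors' code: the function name `gold_stage` is hypothetical, and it assumes FEV1/FVC is given as a fraction and FEV1% predicted as a percentage (0 to 100).

```python
from typing import Optional


def gold_stage(fev1_fvc: float, fev1_pct_pred: float) -> Optional[str]:
    """Return the GOLD grade for one subject.

    fev1_fvc      : FEV1/FVC as a fraction (assumed unit, e.g. 0.65)
    fev1_pct_pred : FEV1 as a percentage of predicted (assumed unit, e.g. 85)
    Returns None when FEV1/FVC >= 0.7, i.e. no GOLD grade applies.
    """
    if fev1_fvc >= 0.7:
        return None
    if fev1_pct_pred >= 80:
        return "GOLD I"
    if fev1_pct_pred >= 50:
        return "GOLD II"
    if fev1_pct_pred >= 30:
        return "GOLD III"
    return "GOLD IV"


if __name__ == "__main__":
    # Illustrative values only; they are not taken from the study data.
    for ratio, pct in [(0.65, 85), (0.62, 55), (0.55, 40), (0.45, 25), (0.78, 95)]:
        print(ratio, pct, gold_stage(ratio, pct))
```

The same function can be applied to measured or to model-predicted PFT values, which is how a regression output can be turned into a GOLD stratification.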

Pulmonary function tests
PFTs were performed for all patients using the Multifunction Spirometer (HI-801, CHESTGRAPH, CHEST.MI.Omnia Inc., Japan). PFTs have 15 separate parameters, including FVC, FEV1/FVC, FEV1%, and other parameters. FEV1/FVC and FEV1% were the key parameters for analysis in the present study.

CT scanning
All patients underwent breath-hold training before scanning and were scanned in the supine position with arms above the head. Non-contrast-enhanced volumetric chest CT scanning was performed from the thoracic inlet to the diaphragm at the end of inspiration and at the end of expiration using a 256-slice CT scanner (Brilliance-iCT, Philips Healthcare, Cambridge, MA, USA). The following CT scanning parameters were used: collimation 128 × 0.625 mm, tube voltage 120 kV, Z-axial and 3D automatic tube current modulation, DoseRight on and reduced dose level 3 (inspiratory/expiratory scanning), pitch 0.70, slice thickness 1 mm, slice increment 1 mm, FOV 350 mm × 350 mm, matrix 512 × 512, and high and standard resolution algorithms.

PRM analysis
The raw DICOM data of the CT images were transferred to the workstation (A-VIEW, Suhai Information Technology Ltd., Suzhou, China) for PRM analysis. First, a thoracic radiologist with 20 years of experience, who was blinded to participants' clinical information and PFT results, checked and redefined the lobe segmentation slice by slice during the PRM analysis. Then, the expiratory CT images were registered to the inspiratory CT images at the pixel level. As described previously, [11] the voxels were divided into four PRM categories (emphysema, fSAD, normal lung, and uncategorized tissue) according to their CT values on the paired inspiratory and expiratory images.

AI model construction and performance evaluation
Two different types of AI regression models were constructed and trained for the regression of the key PFT parameters: (1) random forest, a machine learning regression model; and (2) multilayer perceptron (MLP), a type of artificial neural network (ANN). A total of 76 features from each case, including 72 PRM parameters and four clinical features (age, sex, height, and weight), were used as input for the regression models, and FEV1/FVC or FEV1% was used as the ground truth for model training. For each type of AI model, one regression model was established for the FEV1/FVC prediction task and another for the FEV1% prediction task. The dataset was randomly divided into training and validation datasets at a ratio of approximately 4:1 (494 cases for training and 121 cases for validation). The specific method of constructing the models is shown in the Supplementary material. Regression performance was evaluated by calculating the mean absolute error (MAE), mean squared error (MSE), and root mean squared error (RMSE) between the predicted PFT parameters and the measured PFT ground truth. Lower MAE/MSE/RMSE indicates smaller differences between the prediction and ground truth, which means better regression performance. Meanwhile, the coefficient of determination (R2) was also calculated as a parameter indicating the proportion of variance in the dependent variable that was predictable from the independent variables. The Spearman correlation was also calculated between the predicted value of the model and the measured PFT value. The FEV1/FVC and FEV1% values predicted by the best AI model were further used for the classification tasks.
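The regression metrics named above (MAE, MSE, RMSE, and R2) follow directly from their standard definitions. The sketch below uses only the Python standard library; the function name `regression_metrics` and the example values are hypothetical and are not the study data or the authors' implementation.

```python
import math
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """Compute MAE, MSE, RMSE and R2 between measured and predicted values."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    # R2 = 1 - SS_res / SS_tot, with SS_tot taken around the mean of y_true.
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all measured values are equal")
    ss_res = sum(e * e for e in errors)
    return {"mae": mae, "mse": mse, "rmse": rmse, "r2": 1.0 - ss_res / ss_tot}


if __name__ == "__main__":
    # Hypothetical FEV1/FVC values expressed as fractions.
    measured = [0.80, 0.75, 0.68, 0.72]
    predicted = [0.78, 0.76, 0.70, 0.70]
    print(regression_metrics(measured, predicted))
```

Because RMSE is the square root of MSE, the values reported in the abstract are mutually consistent up to rounding: the square root of 0.0030 is about 0.055, and the square root of 0.0097 is about 0.098.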
In the evaluation process of the present study, classification performance on the validation dataset was examined using confusion matrices. Five metrics were calculated, including sensitivity, specificity, positive predictive value (PPV), negative predictive value (NPV), and accuracy.
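The five classification metrics can be derived from the four cells of a binary confusion matrix. The following is a minimal sketch with a hypothetical function name, `classification_metrics`. The example counts are those implied by the fractions reported in this paper for the normal versus high-risk classification (34/40, 65/72, 34/41, 65/71), treating high-risk as the positive class.

```python
from typing import Dict


def _ratio(numerator: int, denominator: int) -> float:
    """Return numerator/denominator, or NaN when the denominator is zero."""
    return numerator / denominator if denominator else float("nan")


def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Sensitivity, specificity, PPV, NPV and accuracy from a 2x2 confusion matrix."""
    return {
        "sensitivity": _ratio(tp, tp + fn),  # true positives among actual positives
        "specificity": _ratio(tn, tn + fp),  # true negatives among actual negatives
        "ppv": _ratio(tp, tp + fp),          # true positives among predicted positives
        "npv": _ratio(tn, tn + fn),          # true negatives among predicted negatives
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
    }


if __name__ == "__main__":
    # Counts implied by the reported fractions: TP = 34, FN = 6, TN = 65, FP = 7.
    for name, value in classification_metrics(tp=34, fp=7, tn=65, fn=6).items():
        print(f"{name}: {value:.3f}")
```

With these counts the 13 misclassified participants are the 7 false positives plus the 6 false negatives.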

Statistical analysis
Datasets with normal distribution are expressed as mean ± standard deviation, and datasets that do not follow normal distribution are presented as median and interquartile range (IQR). The rank sum test or chi-square test (SPSS 26.0) was used for age, sex, height, weight, and PFT parameters. Other statistical analysis was performed using the R language platform (Version 4.0.0, R Foundation for Statistical Computing, Vienna, Austria). Statistical comparisons between groups were performed using analysis of variance (ANOVA) for datasets with normal distribution and equal variance, and the non-parametric Kruskal-Wallis test for non-normally distributed variables. The Tukey HSD test or the Nemenyi test was performed for pairwise comparisons between groups for datasets with normal and non-normal distribution, respectively.

Demographic data, PFT, and PRM parameters of the three groups
Among the 615 participants included in this study, 367 were normal subjects (151 males, 216 females), 194 were high-risk subjects (102 males, 92 females), and 54 subjects had COPD (36 males, 18 females). No significant difference in age was found between the three groups (p = 0.129). However, significant differences in sex, FEV1/FVC, and FEV1% were shown between the three groups (p < 0.001) (Table 1).
At the whole lung level, all PRM parameters differed between the normal, high-risk, and COPD groups (p < 0.001) (Table 2). The mean values of PRM VEmph, PRM VEmph%, PRM VfSAD, and PRM VfSAD% increased progressively from the normal group to the high-risk group to the COPD group (Figures 2 and 3). Correlations between single PRM parameters and single PFT parameters were very weak, as shown in the Supplementary materials (Table S1).

Performance of regression models
For the FEV1/FVC regression task on the test set, the coefficient of determination (R2) of the random forest model was 0.749, the MAE was 0.038, the MSE was 0.0030, and the RMSE was 0.055. In contrast, the R …
For the classification between normal and high-risk participants based on the prediction results of the FEV1% regression model, sensitivity was 34/40 (85%), specificity was 65/72 (90%), the PPV was 34/41 (83%), the NPV was 65/71 (92%), and accuracy was 99/112 (88%). In total, 13 participants in the validation dataset had inconsistent classification results between PFT and model prediction (normal versus high-risk). Among these, six participants showed significant differences from the ground truth, as shown in Table 3 and … group). Three cases with normal PFT were predicted as high-risk, and these showed greater PRM fSAD% (Figure 5). For the GOLD stratification of the COPD patients based on the prediction result of the FEV1% regression model, accuracy was only 44% (4/9).
For all test set data, the Spearman correlation was calculated between the predicted value and the measured PFT value. The Spearman correlation ρ was 0.813 (p < 0.001) for FEV1/FVC and 0.846 (p < 0.001) for FEV1%.

DISCUSSION
In the present study, AI models, including random forest models and MLP models, were established on the basis of LDCT-derived PRM parameters to differentiate the normal population from the high-risk population and the COPD population from the non-COPD population. These machine-learning-based models enhance the clinical value of one-stop chest CT scanning by predicting PFT results from PRM parameters. Functional small airway disease may eventually develop into chronic diseases such as COPD or asthma. [28,29] PRM is a good predictor of fSAD. As stated in our literature review, some studies [16][17][18][19] have focused on the correlation between PRM parameters and PFT parameters using traditional statistical tools. In our study, the correlation between PRM parameters and PFT parameters was relatively weak, similar to the weak PRM-PFT correlation in the non-COPD group reported by Capaldi et al. [21] Considering that non-COPD patients outnumbered COPD patients (n = 561 vs. n = 54) in our experiment, the observation was in accordance with the literature. Because the correlation between PFT and PRM parameters was weak in screening subjects, random forest and MLP regression models were constructed to regress key PFT parameters. The results showed that the random forest regression model performed much better than the MLP model, which might be due to the lower demand …
Table 3 shows the inconsistent classification by PFT and model prediction between the normal and high-risk COPD groups. Subjects predicted as high-risk COPD by our random forest model showed higher PRM VfSAD% and PRM VEmph%, suggesting that the model captured functional information, such as functional small airway abnormalities reflected by PRM, that could not be detected by PFTs. The model was also reliable in distinguishing the normal group from the high-risk COPD group. A previous study [30] also demonstrated that PRM Emph% cannot capture the information of GOLD II to IV but can distinguish between normal subjects and mild COPD. In our study, PRM parameters were used to predict PFT, which further revealed the clinical potential of PRM for the early management of COPD.
AI has been used to detect emphysema, for differential diagnosis, and to assess the severity of COPD. In our study, the COPD/non-COPD classification accuracy (99%) and sensitivity (89%) were higher than those reported by Ho et al., [22] which were 89.3% and 88.3%, respectively. The better classification performance of the present study can be explained by the explicit integration of the PFT parameter-based COPD and high-risk definitions into the classification process, whereas the deep-learning-based classification model learned them implicitly during training, which places high demands on the quality and quantity of image data. We used a simple but practical machine learning method to exploit the clinical value of PRM in community screening of high-risk COPD patients. Our results indicate a promising potential of this method in clinical practice.
Compared with studies that attempted to construct regression models to predict PFT parameters directly from CT images, the R2 values in our study were 0.749 and 0.792 for the FEV1/FVC and FEV1% random forest regression models, respectively, which outperformed previous studies that used deep-learning approaches. Most deep-learning approaches used CT images as input, which contain very high-dimensional information about the spatial heterogeneity of the lung. In contrast, our feature-based AI approaches, including random forest models and MLP models, use information extracted from pre-processed CT images. High-dimensional image input may be an advantage for solving complex issues such as the classification of COPD and non-COPD cases. However, the redundant information may also interfere with model performance in other situations such as PFT parameter regression, which was a probable cause of the relatively better result in the present study.
The main limitation of the present study is that it was a single-center retrospective study without external validation, which limits the extent to which the results can be generalized to other populations and means that selection bias cannot be ruled out. Multi-center prospective research should be performed in the future to validate the generalization of the results. The definition of high-risk COPD was based on one of the published criteria. As is well known, the criteria for high-risk COPD are controversial. Therefore, the performance of the machine-learning model in differentiating normal from high-risk subjects may be affected by the choice of high-risk criteria. Only PFT results were included for classification; other common clinical history, such as smoking history, was not considered in this study, which may affect the classification of groups to some degree.
This study compared a machine learning model with an artificial neural network algorithm but did not compare them with deep learning algorithms that use images as input. In future work, we need to add other models to further validate the effectiveness of PRM derived from low-dose chest CT in predicting lung function. All the above factors must be considered as we develop a more accurate and powerful model to predict PFT with PRM from chest CT.
In conclusion, machine-learning-based regression models using LDCT-derived PRM parameters demonstrate good performance in predicting PFT results and in classifying normal/high-risk patients as well as COPD/non-COPD patients. The model can capture more functional information than pulmonary function tests, and its predictions are complementary to PFT under the current evaluation criteria of pulmonary function. The results of the model play a warning role in evaluating the screening population for COPD, which greatly improves the cost-effectiveness of LDCT.

AUTHOR CONTRIBUTIONS
Xiuxiu Zhou: Substantial contributions to the conception and design of the work; the acquisition, analysis, interpretation of data for the work. Yu Pu: Design of the work; the acquisition, analysis, interpretation of data for the work. Di Zhang: The acquisition, analysis, interpretation of data for the work. Yu Guan: Drafting the work and revising it critically for important intellectual content. Yang Lu: The acquisition, analysis of data for the work, revising it critically for important intellectual content. Weidong Zhang: The analysis of data for the work. Chi-Cheng Fu: The analysis of data for the work, revising it critically for important intellectual content. Qu Fang: Substantial contributions to the conception of the work. Hanxiao Zhang: The analysis of data for the work. Shiyuan Liu: The analysis of data for the work, final approval of the version to be published, agreement to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. Li Fan: Substantial contributions to the design of the work, revised the manuscript critically for important intellectual content, final approval of the version to be published, agreement to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.

ACKNOWLEDGMENTS
We thank the investigators and participants at the investigative sites for their support during the conduct of the study. We acknowledge Prof. Rui Wang and Prof. Qian He from the Department of Statistics of Second Affiliated Hospital of PLA Naval Medical University for their help in statistics. They agreed with the data analysis of this study. This work was supported by the National Natural Science Foundation

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available on request from the corresponding author. The data are not publicly available due to privacy or ethical restrictions.

ETHICAL STATEMENT
This study was approved by the hospital ethics committee, and written informed consent was obtained from all subjects. The Clinical Trials Registry number is ChiCTR2000035283.

REFERENCES
FIGURE 1 Study selection and baseline characteristics.

Note: PRM VEmph and PRM VEmph% = the volume of voxels less than or equal to −950 HU on the inspiratory image and less than −856 HU on the expiratory image of PRM, and the volume percentage in the whole lung; PRM VfSAD and PRM VfSAD% = the volume of voxels greater than −950 HU on the inspiratory image and less than or equal to −856 HU on the expiratory image, and the volume percentage in the whole lung; PRM VNormal and PRM VNormal% = the volume of voxels greater than −950 HU on the inspiratory image and greater than −856 HU on the expiratory image, and the volume percentage in the whole lung; PRM VUncategorized and PRM VUncategorized% = the volume of voxels less than −950 HU on the inspiratory image and greater than −856 HU on the expiratory image, and the volume percentage in the whole lung. Numbers are listed in median (IQR) or mean ± standard deviation. The letters a, b, and c indicate statistical differences between groups: repeated letters indicate no significant difference between groups, and non-repeated letters indicate a significant difference between groups. Abbreviation: LV, lung volume.
FIGURE 2 Box plot of PRM VEmph, PRM VEmph%, PRM VfSAD, and PRM VfSAD% of the whole lung.

Note: No. = Numbers of 6 cases; PRM VfSAD% = the volume percentage of voxels greater than −950 HU on the inspiratory image and less than or equal to −856 HU on the expiratory image of PRM; PRM VEmph% = the volume percentage of voxels less than or equal to −950 HU on the inspiratory image and less than −856 HU on the expiratory image of PRM. Abbreviations: COPD, chronic obstructive pulmonary disease; FEV1%, percentage of forced expiratory volume in the one second predicted; FEV1/FVC, ratio of the first second forced expiratory volume to forced vital capacity; PFT, pulmonary function test.
FIGURE 4 Model performance in the validation dataset. (a) Regression performance for FEV1/FVC prediction in the validation dataset and confusion matrix for regression prediction-based classification between the COPD and non-COPD groups. (b) Regression performance for FEV1% prediction in the validation dataset (for non-COPD patients) and confusion matrix for the regression prediction-based classification between the normal and high-risk groups. (c) Regression performance for FEV1% prediction in the validation dataset (for COPD patients) and confusion matrix for the regression prediction-based classification between different GOLD levels.
of China [grant numbers 81871321, 82171926, and 81930049]; the program of Science and Technology Commission of Shanghai Municipality [grant number 21DZ2202600]; National Key R&D Program of China [grant numbers 2022YFC2010000, 2022YFC2010002, and 2022YFC2010005]; Construction of CT standardized database for chronic obstructive pulmonary disease [grant number YXFSC2022JJSJ002]; and Clinical Innovation Project of Shanghai Changzheng Hospital [grant number 2020YLCYJ-Y24].

CONFLICT OF INTEREST STATEMENT
All authors have no conflicts of interest to declare.
The voxels are divided into four categories according to CT values on paired respiratory CT images: (1) emphysema, voxels less than or equal to −950 HU on the inspiratory image and less than −856 HU on the expiratory image; (2) fSAD, voxels greater than −950 HU on the inspiratory image and less than or equal to −856 HU on the expiratory image; (3) normal lung, voxels greater than −950 HU on the inspiratory image and greater than −856 HU on the expiratory image; and (4) uncategorized tissue, voxels less than −950 HU on the inspiratory image and greater than −856 HU on the expiratory image. The volume as well as the volume percentage of each voxel category (PRM Emph, PRM Emph%, PRM fSAD, PRM fSAD%, PRM Normal, PRM Normal%, PRM Uncategorized, and PRM Uncategorized%) were calculated at the level of the whole lung, left lung, right lung, and each lung lobe, respectively. A total of 72 PRM parameters were measured for each participant.
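The four-category rule above can be illustrated with a short sketch. This is not the A-VIEW implementation: the function names `classify_voxel` and `prm_percentages` are hypothetical, the input is assumed to be pairs of HU values for already-registered voxels, and the inequalities are applied exactly as written in the text. As written, a few boundary pairs (for example, exactly −950 HU on inspiration with more than −856 HU on expiration) match none of the four definitions, so the sketch returns `None` for them rather than guessing.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple

INSP_THRESHOLD_HU = -950  # inspiratory threshold stated in the text
EXP_THRESHOLD_HU = -856   # expiratory threshold stated in the text
CATEGORIES = ("emphysema", "fSAD", "normal", "uncategorized")


def classify_voxel(insp_hu: float, exp_hu: float) -> Optional[str]:
    """Assign one registered voxel pair to a PRM category, following the text literally."""
    if insp_hu <= INSP_THRESHOLD_HU and exp_hu < EXP_THRESHOLD_HU:
        return "emphysema"
    if insp_hu > INSP_THRESHOLD_HU and exp_hu <= EXP_THRESHOLD_HU:
        return "fSAD"
    if insp_hu > INSP_THRESHOLD_HU and exp_hu > EXP_THRESHOLD_HU:
        return "normal"
    if insp_hu < INSP_THRESHOLD_HU and exp_hu > EXP_THRESHOLD_HU:
        return "uncategorized"
    return None  # boundary pair not covered by the four definitions as written


def prm_percentages(pairs: Iterable[Tuple[float, float]]) -> Dict[str, float]:
    """Percentage of voxels in each PRM category, relative to all voxels supplied."""
    counts = Counter(classify_voxel(insp, exp) for insp, exp in pairs)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no voxels supplied")
    return {name: 100.0 * counts[name] / total for name in CATEGORIES}


if __name__ == "__main__":
    # Hypothetical (inspiratory HU, expiratory HU) pairs for five voxels.
    voxels = [(-960, -900), (-900, -870), (-800, -700), (-970, -800), (-900, -856)]
    print(prm_percentages(voxels))
```

Multiplying each voxel count by the voxel volume would give the category volumes, and restricting the input to one lung or one lobe mask would give the regional parameters.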
The performance of the random forest regression model was evaluated by calculating the Mean Absolute Error (MAE), Mean Squared Error (MSE), and Root Mean Squared Error (RMSE) between the predicted PFT parameters and the PFT measured ground truth. Lower MAE/MSE/RMSE indicates smaller differences between the prediction and ground truth.
TABLE 1 Demographic data and PFT parameters in normal, high risk, and COPD groups.
Note: Numbers are listed in median (IQR) or mean ± standard deviation. Abbreviations: COPD, chronic obstructive pulmonary disease; FEV1%, percentage of forced expiratory volume in the one second predicted; FEV1/FVC, ratio of the first second forced expiratory volume to forced vital capacity; PFT, pulmonary function test.
TABLE 2 PRM parameters in normal, high risk, and COPD groups.
TABLE 3 Lung function of cases whose model classification is inconsistent with PFT in normal/high-risk COPD groups.