Direct estimation of regional lung volume change from paired and single CT images using residual regression neural network

Background: Chest computed tomography (CT) enables characterization of pulmonary diseases by producing high-resolution and high-contrast images of the intricate lung structures. Deformable image registration is used to align chest CT scans at different lung volumes, yielding estimates of local tissue expansion and contraction. Purpose: We investigated the utility of deep generative models for directly predicting local tissue volume change from lung CT images, bypassing computationally expensive iterative image registration and providing a method that can be utilized in scenarios where either one or two CT scans are available. Methods: A residual regression convolutional neural network, called Reg3DNet+, is proposed for directly regressing high-resolution images of local tissue volume change (i.e., Jacobian) from CT images. Image registration was performed between lung volumes at total lung capacity (TLC) and functional residual capacity (FRC) using a tissue mass- and structure-preserving registration algorithm. The Jacobian image was calculated from the registration-derived displacement field and used as the ground truth for local tissue volume change. Four separate Reg3DNet+ models were trained to predict Jacobian images using a multifactorial study design to compare the effects of network input (i.e., single image vs. paired images) and output space (i.e., FRC vs. TLC). The models were trained and evaluated on image datasets from the COPDGene study. Models were evaluated against the registration-derived Jacobian images using local, regional, and global evaluation metrics. Results: Statistical analysis revealed that both factors – network input and output space – were significant determinants for change in evaluation metrics. Paired-input models performed better than single-input models, and model performance was better in the output space of FRC rather than TLC. 
Mean structural similarity index for paired-input models was 0.959 and 0.956 for FRC and TLC output spaces, respectively, and for single-input models was 0.951 and 0.937. Global evaluation metrics demonstrated correlation between registration-derived Jacobian mean and predicted Jacobian mean: coefficient of determination (r2) for paired-input models was 0.974 and 0.938 for FRC and TLC output spaces, respectively, and for single-input models was 0.598 and 0.346. After correcting for effort, registration-derived lobar volume change was strongly correlated with the predicted lobar volume change: for paired-input models r2 was 0.899 for both FRC and TLC output spaces, and for single-input models r2 was 0.803 and 0.862, respectively. Conclusions: Convolutional neural networks can be used to directly predict local tissue mechanics, eliminating the need for computationally expensive image registration. Networks that use paired CT images acquired at TLC and FRC allow for more accurate prediction of local tissue expansion compared to networks that use a single image. Networks that only require a single input image still show promising results, particularly after correcting for effort, and allow for local tissue expansion estimation in cases where multiple CT scans are not available. For single-input networks, the FRC image is more predictive of local tissue volume change compared to the TLC image.


| INTRODUCTION
A limitation of structural roadmaps of tissue destruction and remodeling is that they offer limited insight into disease etiology and do not provide a linkage to lung function. [12][13][14][15][16] Deformable image registration (DIR) of CT scans obtained at two or more lung volumes (e.g., at expiratory and inspiratory breath holds) is one approach which has moved away from simple structural metrics and offers an appropriate alternative for estimating functional information based on changes in local tissue expansion and contraction. 17,18 From image matching of inspiratory and expiratory lung images has come an awareness of the role played by functional small airways disease (fSAD) as a precursor to parenchymal destruction. 19,20 The Jacobian determinant (J) of the registration displacement field quantifies local volume change, and has been validated against a well-known, noninvasive method for estimating regional lung ventilation: xenon-enhanced CT (Xe-CT). 17,18,[22][23][24] Bodduluri et al. demonstrated that biomechanical features (extracted from Jacobians) were better predictors of COPD severity than conventional texture- and intensity-based features. 21 Local change in lung volume averaged over the lung field was shown to be in better agreement with spirometry and was able to capture lung function decline and parenchymal tissue destruction. 22 Moreover, mean Jacobian was significantly correlated with several indices of lung function and patient health, including forced expiratory volume in 1 second (FEV 1 ), extent of emphysema, and 6-minute walk distance. 23 These studies indicate the effectiveness of registration-based local tissue expansion measures for assessing COPD development and progression.
Image registration frameworks may not be feasible for some patients. For most patients in a clinical setting, expiratory CT scans are simply not acquired. In carefully controlled academic environments, there is little or no coaching to functional residual capacity (FRC) or residual volume (RV), and thus the results from image matching are difficult to interpret. When there is coaching, some patients are simply unable to follow the specific instructions. An additional impediment is the fact that for frequent longitudinal assessments, despite 10-fold reductions in radiation doses from CT, 25,26 there is a reluctance to expose patients repetitively to CT levels of radiation. There are also image processing issues with image registration methods. State-of-the-art lung image registration methods incur high computational costs for processing large 3D datasets. Registration methods relying on iterative optimization and subject-specific hyperparameter tuning are time-consuming, and are not practical for large-cohort COPD studies. However, the limitations of lung CT acquisition and registration do not undermine their significant clinical contributions. Rather, they highlight a need for alternative approaches to estimating CT-based local tissue mechanics that are robust, fast, scalable, and viable as an alternative to direct image registration or for predictions of regional mechanics (such as air trapping derived from Jacobians) even when only a single scan is available. Such a method may allow for salvaging a lung study when one of a lung volume pair is obtained at an inadequate inspiratory or expiratory effort or is degraded for some other reason.
The typical registration-based approach to estimating local tissue expansion relies on aligning structures in two images acquired at different lung volumes. However, deep generative modeling may enable prediction of local tissue distensibility directly from a single CT image, 50 instead relying on empirical structure-function relationships that link local intensity and texture to distensibility. For example, increased tissue density due to pulmonary edema or fibrosis may be associated with altered mechanical stiffness and inflation dynamics. 51,52 Based on these structure-function relationships, we hypothesized that AI predictions of local tissue expansion using only a single lung CT image will be correlated with registration-derived Jacobian estimations, albeit to a lesser degree compared with AI predictions using paired images. [54][55] While several deep learning-based registration methods have been proposed which involve predicting the lung motion in three dimensions, 48 predicting the Jacobian image directly only involves estimating the tissue distensibility without having to estimate motion, making it a less complex task.
In our previous work, we demonstrated the feasibility of directly predicting a Jacobian image from paired CT images acquired at TLC and FRC using a convolutional neural network. 56 In a separate study, we demonstrated the feasibility of directly predicting a Jacobian image from a single CT image acquired at RV using a generative adversarial network. 50 The generative model in the latter study was trained on two-dimensional slices and was not guaranteed to be spatially consistent between adjacent slices. Additionally, these studies used different lung volumes for predicting Jacobian images and thus do not provide a consistent comparison between single- and paired-input Jacobian prediction methods. In this study, we performed a multifactorial design to assess Jacobian prediction by networks trained with either paired or single CT images, in each case estimating either local expansion from FRC or local contraction from TLC. There are three novel contributions in this work: (1) We provide a direct and fair comparison between single- and paired-input approaches for Jacobian prediction. The previous works were performed on different datasets, different lung volumes, different dimensionality, and different networks. (2) This is the first work to use TLC images for single-input models and compare it directly to using FRC images for single-input models. The previous work using single-input models only used RV images as input. 50 (3) For paired-input models, we report the combined effect of registration direction and prediction direction, that is, expansion from FRC versus contraction from TLC. The previous work with a paired-input model only reported prediction of expansion from FRC, using only registrations with the moving image at TLC and the fixed image at FRC. 56

| Dataset
Chest CT image datasets from the COPDGene study were utilized in this study. 57 COPDGene is a multicenter study that aims to identify genetic factors associated with COPD and characterize chest CT phenotypes of COPD. 57 Each participant enrolled in COPDGene received two chest CT scans acquired at two lung volumes: TLC using 200 mAs, and FRC using 50 mAs. The images reconstructed from these scans will be referred to as I TLC and I FRC . Disease severity was defined by the Global Initiative for Chronic Obstructive Lung Disease (GOLD), where GOLD 1 to 4 indicate mild to severe COPD and GOLD 0 includes asymptomatic smokers. 58 For this study, 700 subjects were randomly sampled such that there was a uniform distribution of patients from each of the seven groups: the five GOLD levels, nonsmokers, and PRISm (preserved ratio impaired spirometry). 59 The dataset was split into independent training and testing sets of 630 and 70 subjects, respectively, with uniform distribution of the seven groups.

| Image registration
Deformable image registration (DIR) estimates a transformation that describes the pointwise correspondence between two images: a fixed image I F and a moving image I M . The transformation h(x F ) maps points x F from the fixed image space to their corresponding locations in the moving image space. The determinant of the Jacobian matrix of the transformation (henceforth referred to as simply the Jacobian or J) can be computed as a scalar-valued spatial map in the same spatial reference frame as the fixed image, that is, J F (x F ) or more simply J F . The Jacobian thus describes the voxelwise multiplicative factor by which the local volume changes when the fixed image deforms to align with the moving image. J F (x F ) > 1 indicates expansion at x F and J F (x F ) < 1 indicates contraction at x F .
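As a concrete illustration, the Jacobian described above can be computed from a dense displacement field by finite differences. The sketch below is a minimal numpy version (the hypothetical helper `jacobian_determinant` is for illustration only; the registration algorithm used in this work is B-spline parameterized and computes the Jacobian analytically). Writing the transformation as h(x) = x + u(x), the Jacobian is det(I + ∇u):

```python
import numpy as np

def jacobian_determinant(u, spacing=(1.0, 1.0, 1.0)):
    """Voxelwise Jacobian determinant of the transformation h(x) = x + u(x).

    u: displacement field of shape (3, Z, Y, X) in physical units.
    Returns det(I + grad u) at every voxel (shape (Z, Y, X)).
    """
    # grads[i][j] = du_i / dx_j, estimated with central differences.
    grads = [np.gradient(u[i], *spacing) for i in range(3)]
    # Assemble the 3x3 displacement gradient at each voxel: F[..., i, j] = du_i/dx_j.
    F = np.stack([np.stack(g, axis=-1) for g in grads], axis=-2)
    F = F + np.eye(3)  # Jacobian matrix of h, not of u
    return np.linalg.det(F)
```

For a zero displacement field this returns 1 everywhere (no volume change); a linear 10% stretch along one axis yields J = 1.1, matching the interpretation of J as a local volume ratio.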
Image registration can be performed in two directions between CT images acquired at FRC and TLC (Figure 1). The forward transformation h(x FRC ) describes the deformation of moving image I TLC to align with fixed image I FRC . Conversely, the inverse transformation h −1 (x TLC ) describes the deformation of moving image I FRC to align with fixed image I TLC . The registration outputs h(x F ) and J F are always defined in the fixed image space (i.e., I F ), and h(x F ) is used to warp the moving image I M to align with I F . Due to the opposing directions of the transformations, J FRC represents expansion from I FRC to I TLC , whereas J TLC represents contraction from I TLC to I FRC . For each subject in our dataset, both h(x FRC ) and h −1 (x TLC ) were estimated using a B-spline parameterized pyramid image registration algorithm. 17,18 A tissue-mass-preserving cost function was utilized to account for changes in CT intensity at different lung volumes. Additionally, a structure term was added to the cost function to penalize misalignment of fissures and vessels. Fissure and vessel segmentations were obtained using FissureNet and a CNN for vessel segmentation. 60 Prior to registration, I FRC and I TLC were masked using lung segmentations M FRC and M TLC obtained using a multi-resolution CNN. 61,62 The registration-derived J FRC and J TLC images were used as ground truth for training CNNs for direct prediction of local tissue expansion.

| Preprocessing
The following preprocessing was performed on all images for each subject: I TLC , I FRC , J TLC , J FRC . Gaussian smoothing was applied to prevent aliasing, and then the images were downsampled to 3 mm isotropic voxels. Images were cropped to the bounding box of M TLC ∪ M FRC . FRC and TLC images were then masked using the corresponding mask downsampled with nearest-neighbor interpolation. Lastly, CT intensity values in I TLC and I FRC were clamped to the interval [−1024 HU, 200 HU] and then linearly rescaled to [−1, 1].
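The intensity normalization step above can be sketched in a few lines of numpy (`normalize_ct` is a hypothetical helper name; the smoothing, downsampling, cropping, and masking steps are omitted here):

```python
import numpy as np

def normalize_ct(img_hu, lo=-1024.0, hi=200.0):
    """Clamp CT intensities to [lo, hi] HU, then rescale linearly to [-1, 1]."""
    clamped = np.clip(img_hu, lo, hi)
    return 2.0 * (clamped - lo) / (hi - lo) - 1.0
```

With these bounds, air (−1024 HU) maps to −1, the upper clamp (200 HU) maps to +1, and any intensity outside the interval is saturated before rescaling.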

| Reg3DNet+
In this work, we propose a deep learning approach for directly estimating J from single or paired CT images. The goal is to learn a model G θ with learnable parameters θ that maps a multidimensional input CT image to an output Jacobian image: G θ : I → J. In the case of paired-image input, I in this expression is considered to be a four-dimensional image formed by the concatenation of the two three-dimensional CT images. The model G θ is parameterized by a CNN, which is a specialized neural network for exploiting spatial correlations in image data. We propose Reg3DNet+, a modified version of Seg3DNet, a memory-efficient 3D fully convolutional encoder-decoder network (shown in Figure 2). 60 Compared to Seg3DNet, Reg3DNet+ is a regression network that utilizes residual connections, or skip connections (+), which improve training in deep networks by allowing gradients to flow to earlier layers, thereby avoiding vanishing gradients.
The input to Reg3DNet+ is I TLC and/or I FRC . If both images are used, they are concatenated along a channel dimension, analogous to the channel dimension in an RGB image, and the number of channels is C in = 2. The output of the network Ŷ is a single-channel image that is the same spatial size as the input image. Ground truth is defined as Y = log 2 J such that reciprocal volume ratios have equal magnitudes and opposite signs (i.e., expansion at x: Y(x) > 0; contraction at x: Y(x) < 0). The final Jacobian prediction is recovered as Ĵ = 2^Ŷ. Predicting J is a regression task where each voxel in the target image takes on a continuous value, as opposed to a classification task where each voxel takes on a discrete value. Therefore, the output image has only one channel. Reg3DNet+ transforms the input CT images to the output Ŷ through a hierarchy of convolutional layers.
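A minimal sketch of this target transform and its inverse (hypothetical helper names) makes the symmetry explicit: a volume doubling (J = 2) and a volume halving (J = 0.5) map to targets of equal magnitude and opposite sign.

```python
import numpy as np

def jacobian_to_target(J):
    """Network target: Y = log2(J), so J and 1/J map to +/- the same magnitude."""
    return np.log2(J)

def target_to_jacobian(Y_hat):
    """Recover the predicted Jacobian: J_hat = 2 ** Y_hat."""
    return 2.0 ** Y_hat
```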
Reg3DNet+ consists of levels l = 0 … L − 1 which operate at different spatial resolutions.
Each level consists of a residual identity block (Figure 2B) followed by a residual down block (Figure 2C). Residual identity blocks maintain both the spatial size and number of channels. Residual down blocks decrease the size of the feature maps by a factor of 2 along each dimension and increase the number of channels by a factor of 2. Both residual identity blocks and residual down blocks incorporate a residual connection allowing the identity function to be learned. The number of filters input to each residual identity block was C l = 8 × 2 l . The output of each residual identity block is upsampled to the original size using upconvolution with C 0 filters, and the resulting maps are concatenated to create a multiscale feature representation of C 0 L channels. Lastly, two more convolutional layers are used to integrate the multiscale features and produce Ŷ. All convolutional layers are followed by instance normalization and ReLU activation, except for the last layer. In residual blocks, the normalization is performed before the residual connection and the activation is performed after the residual connection. In this work we use L = 4.
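The channel and resolution schedule implied by C l = 8 × 2 l and the factor-of-2 downsampling can be tabulated with a small helper (hypothetical; it illustrates the per-level feature-map shapes only, not the network itself):

```python
def level_shapes(input_shape, L=4, c0=8):
    """Feature-map channels and spatial size entering each level's identity block.

    Level l sees C_l = c0 * 2**l channels at 1 / 2**l of the input resolution;
    each level's output is upsampled back to full resolution and concatenated,
    giving a multiscale representation of c0 * L channels.
    """
    shapes = []
    for l in range(L):
        channels = c0 * 2 ** l
        spatial = tuple(s // 2 ** l for s in input_shape)
        shapes.append((channels, spatial))
    return shapes, c0 * L

# e.g. level_shapes((64, 64, 64)) ->
# ([(8, (64, 64, 64)), (16, (32, 32, 32)), (32, (16, 16, 16)), (64, (8, 8, 8))], 32)
```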

| Training
The networks were trained to minimize a loss combining mean absolute error (MAE) and structural similarity (SSIM). The MAE loss is

L MAE = (1 / |M F |) Σ x∈M F |Y(x) − Ŷ(x)|

where M F is the set of voxels in the lung mask for the F output space. The SSIM of two image patches a and b is

SSIM(a, b) = (2μ a μ b + C 1 )(2σ ab + C 2 ) / [(μ a ² + μ b ² + C 1 )(σ a ² + σ b ² + C 2 )]

where μ a and μ b are the Gaussian-weighted means of patch a and patch b, respectively, σ a and σ b are the Gaussian-weighted standard deviations of patch a and patch b, respectively, and σ ab is the Gaussian-weighted covariance of patch a and patch b. C 1 = (K 1 Δ)² and C 2 = (K 2 Δ)², where Δ is the dynamic range of the image and K 1 and K 2 are constants. In this work we used an image patch of size 7³ voxels, a Gaussian weighting with σ = 1.5, Δ = 5, K 1 = 0.01, and K 2 = 0.03. The SSIM loss between the ground truth image Y and predicted image Ŷ is

L SSIM = 1 − (1 / |M F |) Σ x∈M F SSIM(w(Y(x)), w(Ŷ(x)))

where w(Y(x)) is a patch extracted from image Y centered at location x. The total loss function is

L = α L MAE + (1 − α) L SSIM

In this work we use α = 0.5. Adam optimization was used with a learning rate of 1 × 10 −4 ; 63 the AMSGrad variant of the algorithm was used to improve convergence. 64 Parameters were initialized using He normal initialization.
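A numpy sketch of the Gaussian-weighted SSIM of one pair of 7³ patches, using the constants stated above, may clarify the per-patch computation (`patch_ssim` and `gaussian_weights` are hypothetical helpers, not the training implementation):

```python
import numpy as np

def gaussian_weights(size=7, sigma=1.5):
    """Normalized 3D Gaussian weighting window for one patch."""
    ax = np.arange(size) - size // 2
    zz, yy, xx = np.meshgrid(ax, ax, ax, indexing="ij")
    w = np.exp(-(zz**2 + yy**2 + xx**2) / (2 * sigma**2))
    return w / w.sum()

def patch_ssim(a, b, delta=5.0, k1=0.01, k2=0.03, w=None):
    """SSIM of two same-size patches with Gaussian-weighted statistics."""
    if w is None:
        w = gaussian_weights(a.shape[0])
    c1, c2 = (k1 * delta) ** 2, (k2 * delta) ** 2
    mu_a, mu_b = (w * a).sum(), (w * b).sum()
    var_a = (w * (a - mu_a) ** 2).sum()
    var_b = (w * (b - mu_b) ** 2).sum()
    cov = (w * (a - mu_a) * (b - mu_b)).sum()
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / (
        (mu_a**2 + mu_b**2 + c1) * (var_a + var_b + c2)
    )
```

Identical patches score exactly 1; any mean shift or structural difference pushes the score below 1, which is what makes 1 − SSIM usable as a loss term.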

| Multifactorial design
A multifactorial study was used to investigate two factors: network input and output space (Figure 3). Two levels were defined for each factor: single versus paired network input and FRC versus TLC output space. Note, the single-input networks only used the input image that matched the target output space, that is, I FRC for the FRC output space (Figure 3, upper left quadrant) and I TLC for the TLC output space (Figure 3, lower left quadrant).

| Evaluation
Each model was evaluated by quantitatively comparing the similarity of Ĵ and J images at local, regional, and global scales. At the local scale, voxelwise Spearman correlation, peak signal-to-noise ratio (PSNR), and MAE were computed. At the regional scale, SSIM and Dice coefficient were computed. The Dice coefficient was computed for high- and low-Jacobian lung regions, which were defined for each patient by the upper and lower quartiles of Jacobian values, respectively. An aligned rank transform (ART) two-way repeated measures analysis of variance (ANOVA) was performed to analyze the main effects and interactions of the two factors in the multifactorial ablation study: network input and output space. This test was performed for all local and regional evaluation metrics.
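The quartile-based Dice evaluation can be sketched as follows (hypothetical helpers; we assume here that each image's quartile thresholds are computed from that image itself, which is one plausible reading of the per-patient definition above):

```python
import numpy as np

def dice(mask_a, mask_b):
    """Dice coefficient of two binary masks."""
    inter = np.logical_and(mask_a, mask_b).sum()
    return 2.0 * inter / (mask_a.sum() + mask_b.sum())

def quartile_dice(j_true, j_pred):
    """Dice of high- and low-Jacobian regions (upper/lower quartiles of each image)."""
    hi = dice(j_true >= np.quantile(j_true, 0.75),
              j_pred >= np.quantile(j_pred, 0.75))
    lo = dice(j_true <= np.quantile(j_true, 0.25),
              j_pred <= np.quantile(j_pred, 0.25))
    return hi, lo
```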
At the global scale, the Jacobian mean (J μ ) and coefficient of variation (J CV ) were computed for the entire lung mask. Furthermore, we compared the distribution of Jacobian across different regions of interest (ROIs) by computing the volume change in each ROI (ΔV ROI ) expressed as a percent of total lung volume change (ΔV L ):

ΔV ROI % = 100 × ΔV ROI / ΔV L

where

ΔV ROI = (J μ,ROI − 1) V ROI and ΔV L = Σ ROI ΔV ROI

and where J μ,ROI is the mean Jacobian in the ROI and V ROI is the volume of the ROI.
Two approaches were considered for partitioning the lung into non-overlapping ROIs. In the first approach, each lobe was considered a ROI, with lobar segmentations generated by a multiresolution CNN. 67 In the second approach, ROIs were defined according to voxelwise classification as either normal, functional small airway disease (fSAD), or emphysema via parametric response mapping (PRM). 68 Thresholds for PRM were −856 HU at FRC and −950 HU at TLC. For both the lobe- and PRM-defined ROIs, predicted ΔV ROI % was computed for each of the four models and compared to DIR ΔV ROI %.
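A minimal sketch of the ΔV ROI % computation from a Jacobian image and an ROI label map (hypothetical helper; assumes uniform voxel volume and treats label 0 as background):

```python
import numpy as np

def roi_volume_change_percent(J, labels, voxel_volume=1.0):
    """Percent of total lung volume change attributed to each labeled ROI.

    dV_ROI = (mean Jacobian in ROI - 1) * ROI volume; dV_L = sum over all ROIs.
    """
    rois = [r for r in np.unique(labels) if r != 0]
    dv = {}
    for r in rois:
        m = labels == r
        dv[r] = (J[m].mean() - 1.0) * m.sum() * voxel_volume
    dv_total = sum(dv.values())
    return {r: 100.0 * v / dv_total for r, v in dv.items()}
```

Because the result is normalized by the total volume change, this quantity is insensitive to overall inspiratory effort, which is why it is used for the effort-corrected comparisons reported below. The normalization also motivates excluding subjects with near-zero total volume change, for whom the denominator diverges.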

| RESULTS
Qualitative results for three subjects with different GOLD levels are displayed in Figure 4. Jacobians are transformed as |log 2 (J)| so that expansion from FRC and the corresponding contraction from TLC can be visualized on the same color scale. Qualitatively, all AI models showed promising performance and produced images with spatial patterns of expansion or contraction similar to the registration-based results. The least severe case (GOLD 0) shows a normal pattern of mechanics, with more tissue expansion in the lower lobes than in the upper lobes. All models performed similarly and were able to predict this pattern. The GOLD 2 case shows less overall expansion, with a large region of very low expansion in the lower right lobes. All models except the single-input model in the TLC space were able to replicate this pattern. The most severe case shows the lowest overall expansion, with large asymmetry between the left and right lungs. All models except the single-input model in the TLC space were able to predict this asymmetry. These qualitative results demonstrate that for abnormal cases I FRC provides more predictive features.
Box-plots for the local and regional quantitative evaluation metrics for each of the four models are displayed in Figure 5. The local and regional metrics were computed using |log 2 (J)| so that contraction and expansion ratios can be compared on the same scale. For all evaluation metrics, the ART ANOVA found both main effects, network input and output space, to be statistically significant factors (p < 1 × 10 −5 ). For SSIM, Spearman, and Dice low-Jacobian, the interaction term was also statistically significant (p < 0.01).
Box-plots for global evaluation metrics are displayed in Figure 6. The global J μ and J CV were calculated within the lung field and the results were stratified by GOLD level. The plots show a decrease in J μ with increasing GOLD level and a decrease in J CV with increasing GOLD level. Note, a Jacobian of 1 indicates zero expansion or contraction; a Jacobian greater than 1 indicates expansion, whereas a Jacobian less than 1 indicates contraction.
Figure 7 shows scatter plots of network-predicted versus registration-derived J μ for each subject. A linear regression was computed for each combination of network input (single vs. paired) and output space (FRC vs. TLC), and the coefficient of determination r 2 was computed. For paired-input models r 2 = 0.974 and 0.938 for FRC and TLC, respectively, and for single-input models r 2 = 0.598 and 0.346 for FRC and TLC, respectively. For paired-input models, note that although the FRC output space was associated with a higher r 2 , the TLC output space was associated with a regression slope closer to identity.
Figure 8 shows scatter plots of network-predicted versus registration-derived ΔV ROI % using lobar ROIs. Linear regressions and coefficients of determination r 2 were computed for each model. For paired-input models r 2 = 0.899 for both FRC and TLC, and for single-input models r 2 = 0.803 and 0.862 for FRC and TLC, respectively. Figure 9 shows Bland-Altman plots for ΔV ROI % using PRM-derived ROIs. The different PRM categories are denoted by color, and the mean and 95% limits of agreement were computed for each category (i.e., normal, fSAD, and emphysema). To avoid divergence of the ΔV ROI % calculation (i.e., division by near-zero values), subjects with almost no total lung volume change (|J μ − 1| < 0.02) were excluded from this analysis. This resulted in one subject being excluded.

| DISCUSSION
In this work, a deep learning approach for local volume change prediction in CT images was proposed. Deformable image registration is the standard approach for estimating local volume change, but it is an iterative process that can take several hours, especially for large volumetric medical images. The proposed AI framework enables Jacobian images to be directly predicted from single or paired CT images in less than a minute. This enables high-throughput processing and real-time analysis in clinical and research settings. Furthermore, the single-input models enable biomechanical assessment even when only a single CT image is available – an impossible task for image registration. In this study, deformable image registration was used to generate ground truth images in FRC and TLC output spaces – J FRC and J TLC – with network input being either a pair of CT images (I FRC and I TLC ) or only a single CT image (I FRC or I TLC ). The Reg3DNet+ architecture was proposed for directly predicting Jacobian images from CT images. A multifactorial study was designed to test the effects of network input and output space.
A dataset of CT scans in COPD patients was used for training and evaluation. 57 Overall, there was a decrease in global J μ for patients with increasing GOLD level, and a decrease in global J CV . These findings are consistent with previous literature indicating that Jacobians are reliable biomarkers of COPD severity. 23,56,69 The ART ANOVA found network input to be a significant factor for all local and regional evaluation metrics. As expected, models with paired-image input (I TLC and I FRC ) outperformed models with single-image input (I TLC or I FRC ), by 15% for PSNR and by 1.5% for SSIM. The benefit of using paired-image input is likely attributed to the availability of information about both inflation states simultaneously. We speculate that the paired-image network is able to learn convolutional filters that correlate distinct structural features and their corresponding deformation between the two input images. By contrast, predicting expansion and contraction based on only a single image is a difficult task, because it requires inference of mechanical properties and distensibility based on apparent CT intensity and texture alone. Nonetheless, the single-input models exhibited high performance. The small 1.5% difference in SSIM (Figure 5) indicates that single-input models are able to predict spatial patterns with roughly the same degree of accuracy as the paired-image models. However, the larger 15% difference in PSNR (Figure 5) and 63% to 171% difference in global mean Jacobian r 2 (Figure 7) indicate that the exact magnitude of volume change is less predictable in single-input models, which is expected since magnitude is, in part, effort dependent.
The ART ANOVA also found output space to be a significant factor affecting Jacobian prediction. Specifically, models predicting Jacobian in the FRC output space performed better than models predicting Jacobian in the TLC output space. For several of the evaluation metrics (i.e., Spearman, SSIM, Dice low-Jacobian), the interaction term between network input and output space was significant. This indicates that the effect of output space is dependent on the network input, that is, there is a greater difference between metrics for TLC and FRC output space in single-input models than in paired-input models. Specifically, I FRC provides more predictive information than I TLC , especially for single-input models. The r 2 for global J μ DIR versus J μ predicted regressions was also much higher for the single-input FRC output space (0.598) than for the single-input TLC output space (0.346).
After correcting for effort-dependent variability by normalizing regional volume changes to total lung volume change, we found that network predictions for both lobar- and PRM-based ROIs were consistent with DIR estimates. The r 2 for these lobe-based effort-corrected regressions all indicate a strong correlation (i.e., r 2 ≥ 0.803), which, when contrasted against the low r 2 for global mean Jacobian for single-input models, indicates that regional heterogeneity of Jacobian prediction is accurate, despite weak ability to predict the effort-dependent global mean Jacobian. We also repeated this analysis using PRM-defined ROIs, which also supports this notion and furthermore indicates that model predictions are sensitive to structural indications of pathological alterations, for example, emphysema and fSAD. We do note, however, that these structural indications (i.e., gas trapping and emphysema) are less obvious in TLC scans, 53,55 which results in larger differences between DIR and predictions for the single-input TLC model. This is illustrated in Figure 9, where, compared to the other models, the single-input TLC model overestimated volume change in fSAD regions and underestimated volume change in normal regions. The addition of spirometry to inform the total volume change from FRC to TLC could augment the single-input models for a more accurate prediction of absolute volume changes rather than relative volume changes.
In this study, the performance of single-input models provides confidence that appearance in CT images (e.g., intensity, texture, location) is linked to tissue distensibility, such that inherent structure-function relationships may permit clinical assessment of inflation heterogeneity, spatial gradients, and regional defects using only a single CT image. Furthermore, it may be possible to combine single-image assessment with global measurements of respiratory mechanics (e.g., spirometry, compliance) to accurately predict both the spatial patterns and the local magnitudes of volume change, without a second CT scan. Although it may be nonintuitive to use a single image for a task typically requiring two images, deep learning approaches benefit from their ability to discover abstract yet predictive features within single images directly from training data. Another application involving monocular depth perception (i.e., from single images as opposed to paired stereoscopic images) was also successfully performed by AI models, with attempts to decipher which image features convey predictive information by testing modified inputs. 70 Relying on single-image models with CT scans at FRC – and forgoing a second CT scan at TLC – has benefits including reduced radiation exposure and reduced Jacobian variability associated with effort-dependent inspiration to TLC. The current study focused on a multicenter study of smoking-associated COPD. For deep learning models to predict regional lung function from single or even pairs of lung images across a variety of clinical presentations, it is likely that a broader training set will be required.
Future studies will investigate whether images acquired at RV improve performance compared to FRC images for single-input models. RV is the lung volume used in pulmonary function laboratories to assess air trapping. FRC was the expiratory volume used in COPDGene for historical reasons. When the study began, it was difficult for patients to hold their breath for the 10-20 s required for image acquisition. Current scan times can be well under 5 s. In addition, it may be worth investigating whether prediction performance is enhanced when the network input includes a combination of more than two images (e.g., RV, FRC, and TLC), compared to only a lung volume pair. It is unclear why there is a difference in performance associated with TLC versus FRC output spaces for paired-input models. It is possible that deformable image registration is less reliable when registering small lung volumes to large lung volumes (i.e., using the TLC output space or I TLC fixed image), 71,72 and thus the registration-derived Jacobian used as ground truth is less predictable. It is also unclear whether image quality and signal-to-noise ratio influence model training or prediction, given that CT scans were performed with lower dose (i.e., higher noise) at FRC (50 mAs) compared to TLC (200 mAs). Whatever the reason may be for the observed differences in performance, the results presented herein suggest that AI-based assessment of regional lung volume change with only a single CT scan is more performant if the scan is acquired at FRC rather than TLC. In this work, the single- and paired-input models were trained independently. An alternate approach would be to train one network that could accept either single or paired inputs using missing data approaches, which may improve the predictive power of the network.
A limitation of this work is that the registration-derived estimates used as ground truth for training AI models are not necessarily error-free. The ground truth J images were derived from a deformable image registration algorithm that has demonstrated low registration error and high correlation with ventilation imaging modalities.18,73,74 However, image registration is an iterative process that is subject to errors due to convergence failures or convergence to local minima.71 Registration is especially difficult for the complex, nonaffine, and large deformations that occur in the lung between FRC and TLC. Furthermore, B-spline parameterized registration can only describe smooth deformations, and thus cannot account for the sliding that occurs between lobes and at the chest wall.75 Biases that exist in the registration algorithm may inadvertently be learned by the network.76 Another limitation is that the models were trained on downsampled images in order to fit full 3D volumes within GPU memory constraints. While low-resolution images allow for more global and 3D context, they lose the small-scale structures present in high-resolution images that are likely predictive of tissue distensibility.

| CONCLUSION
In this work, we explored a deep learning approach to directly predicting regional lung volume change in CT images, bypassing the time-consuming process of image registration.
Moreover, we investigated the potential for predicting regional lung volume change from a single CT image, bypassing the need for multiple CT scans and additional radiation exposure. Paired-input models predicting Jacobian images in the FRC output space demonstrated the best performance in a cohort of COPD patients. Nonetheless, single-input models showed promising performance, especially when used with images acquired at FRC rather than TLC. The regional performance of the single-input models was comparable to that of the paired-input models after correction for effort, indicating that the regional heterogeneity of single-input Jacobian prediction is accurate, despite a weak ability to predict the effort-dependent total Jacobian mean. The multifactorial study design used in this work yielded novel insights into the presence of structure-function relationships between lung tissue appearance in CT images and regional distensibility. Such intrinsic relationships underlie the success of AI models in predicting volume change based on only a single image, which is a nonintuitive task for humans as well as an impossible task for image registration.

Qualitative results showing CT images and Jacobians on coronal slices. Three subjects are displayed, with two rows for each subject corresponding to images in the FRC space and images in the TLC space. Note that the FRC and TLC rows for each subject show the images in the original space; therefore, there is no pointwise correspondence and it is not possible to show the exact same lung region. However, for consistency we chose the coronal slice containing the carina for both FRC and TLC images. From left to right the columns show the CT image, the DIR result, the Jacobian prediction of the single-input network, and the Jacobian prediction of the paired-input network. From top to bottom, subjects are shown with increasing GOLD level.
To learn the model, the parameters θ are optimized by minimizing a loss function which defines the dissimilarity between the ground truth Y and the network prediction Ŷ. A linear combination of the mean absolute error (MAE) and the structural similarity index (SSIM) was used as the loss function for training our model. The MAE loss between ground truth Y and network prediction Ŷ is defined as:

MAE(Y, Ŷ) = (1/N) Σ_i |Y_i − Ŷ_i|,

where N is the number of voxels and the sum runs over all voxels i.

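As an illustration, the combined loss described above can be sketched in numpy. This is a minimal sketch, not the paper's implementation (which used TensorFlow): the weights `alpha` and `beta` are assumed placeholders, and SSIM is computed here from global image statistics rather than the windowed, locally averaged SSIM typically used in practice.

```python
import numpy as np

def mae_loss(y_true, y_pred):
    """Mean absolute error averaged over all voxels."""
    return np.mean(np.abs(y_true - y_pred))

def ssim_global(y_true, y_pred, c1=1e-4, c2=9e-4):
    """Simplified SSIM from global image statistics (assumption:
    real implementations use a sliding window and average locally)."""
    mu_x, mu_y = y_true.mean(), y_pred.mean()
    var_x, var_y = y_true.var(), y_pred.var()
    cov = ((y_true - mu_x) * (y_pred - mu_y)).mean()
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x**2 + mu_y**2 + c1) * (var_x + var_y + c2))

def combined_loss(y_true, y_pred, alpha=1.0, beta=1.0):
    """Linear combination of MAE and the SSIM dissimilarity (1 - SSIM);
    alpha and beta are hypothetical weights, not values from the paper."""
    return alpha * mae_loss(y_true, y_pred) + beta * (1.0 - ssim_global(y_true, y_pred))
```

For identical images the loss is zero (MAE is 0 and SSIM is 1), and it grows as the prediction departs from the ground truth in either intensity or structure.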
2.3.4 | Implementation

TensorFlow was used to implement the deep learning models. The BioData Catalyst ecosystem66 was used for training and analysis. Specifically, NVIDIA V100 graphics cards with 32 GB of memory were used. To improve training efficiency and increase the batch size, a multi-GPU training strategy with 8 GPUs was employed. A batch size of 8 was used on each GPU, yielding an effective batch size of 64. All models were trained for 1000 epochs, with each epoch taking approximately 8 s. Compared to training with a single GPU, the multi-GPU setting on BioData Catalyst decreased training time five-fold.

FIGURE 1 .
FIGURE 1. Image registration between two CT images acquired at different lung volumes, I FRC and I TLC, is used to estimate a displacement vector field representing the anatomical correspondence between the images (left column). The displacement vector field and Jacobian image are defined in the space of I F. Two different displacement vector fields (middle column) and Jacobian images (right column) can be estimated by performing the registration in both directions, that is, by alternating which image is defined as I F. If I F = I FRC, h(x FRC) represents the mapping from points in I FRC to points in I TLC, and J FRC represents the expansion from I FRC to I TLC (top middle and top right, respectively). By contrast, if I F = I TLC, then h −1 (x TLC) represents the mapping from points in I TLC to points in I FRC, and J TLC represents the contraction from I TLC to I FRC (bottom middle and bottom right, respectively).
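The Jacobian image described above can be computed from a registration-derived displacement field by finite differences: for the transform h(x) = x + u(x), the voxelwise determinant is J(x) = det(I + ∇u(x)). A minimal numpy sketch follows; the array layout (3, Z, Y, X) and unit voxel spacing are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def jacobian_determinant(disp, spacing=(1.0, 1.0, 1.0)):
    """Voxelwise Jacobian determinant J = det(I + grad(u)) of the
    transform h(x) = x + u(x), from a displacement field `disp` of
    shape (3, Z, Y, X), using central finite differences.
    J > 1 indicates local expansion, J < 1 local contraction."""
    # For each displacement component u_i, compute du_i/dz, du_i/dy, du_i/dx.
    grads = [np.gradient(disp[i], *spacing, axis=(0, 1, 2)) for i in range(3)]
    # Assemble the 3x3 Jacobian of h at every voxel: dh_i/dx_j = delta_ij + du_i/dx_j.
    J = np.empty(disp.shape[1:] + (3, 3))
    for i in range(3):
        for j in range(3):
            J[..., i, j] = grads[i][j] + (1.0 if i == j else 0.0)
    return np.linalg.det(J)
```

As a sanity check, a zero displacement field gives J = 1 everywhere, and a uniform linear expansion u(x) = a·x gives J = (1 + a)³ at every voxel.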

FIGURE 2 .
FIGURE 2. The overall Reg3DNet+ architecture proposed in this work is illustrated in panel a. The network consists of levels l = 0, …, L − 1, which operate at different spatial resolutions. Each level consists of a residual identity block (gray arrow) followed by a residual down block (dashed arrow). The residual identity and residual down blocks are expanded in panels b and c, respectively. The output of the residual identity block from each level is upsampled to the original resolution through an upconvolution operation and the results are concatenated to create a multiscale representation. Lastly, two more convolutions are performed to generate the output image. Each cube represents a 3D image with the number of feature channels denoted in the lower left corner. An "N" and/or "A" above an operation denotes that instance normalization and/or ReLU activation is performed following the operation.

FIGURE 3 .
FIGURE 3. Multifactorial study design. Four Reg3DNet+ models were trained to study two factors: network input (columns) and output space (rows). The network input can be a single CT image (left column) or paired CT images (right column). The output space can be FRC (top row) or TLC (bottom row).

FIGURE 5 .
FIGURE 5. Local and regional quantitative evaluation metrics evaluated on the testing dataset. Results are stratified by model. Each panel shows summary results for a different metric: (a) MAE, (b) Spearman correlation, (c) SSIM, (d) PSNR, (e) Dice low-Jacobian, and (f) Dice high-Jacobian. Note that a lower MAE is indicative of better performance, whereas for the remaining metrics a higher value is indicative of better performance. Black triangles indicate mean values.

FIGURE 6 .
FIGURE 6. Global statistics of Jacobian images stratified by GOLD level. The top row shows trends in J μ and the bottom row shows trends in J CV. The columns from left to right show results for deformable image registration (DIR), the single-input model, and the paired-input model. Each panel shows results for prediction in the FRC output space (white) and the TLC output space (gray).

FIGURE 7 .
FIGURE 7. Scatter plots of J μ for each subject, showing the value estimated by deformable image registration (DIR) compared to the value predicted by the four trained networks. Each panel corresponds to a different network, with rows corresponding to the output space (top: FRC, bottom: TLC) and columns corresponding to the network input (left: single image, right: paired images). The solid black line indicates the fitted linear regression model. The dashed black line indicates the line of identity for reference.

FIGURE 8 .
FIGURE 8. Scatter plots of the volume change by lobe region, expressed as a percent of total lung volume change (ΔV %), for each subject, showing the value estimated by deformable image registration (DIR) compared to the value predicted by the four trained networks. Each panel corresponds to a different network, with rows corresponding to the output space (top: FRC, bottom: TLC) and columns corresponding to the network input (left: single image, right: paired images). The solid black line indicates the fitted linear regression model. The different lobes are denoted with different symbols. The dashed black line indicates the line of identity for reference.

FIGURE 9 .
FIGURE 9. Bland-Altman plots for the volume change by PRM region, expressed as a percent of total lung volume change (ΔV %), predicted by each of the four trained networks compared to the ground truth estimated by deformable image registration (DIR). Each panel corresponds to a different network, with rows corresponding to the output space (top: FRC, bottom: TLC) and columns corresponding to the network input (left: single image, right: paired images). The mean and 95% limits of agreement (mean ± 1.96 standard deviations of the difference) for each PRM region are displayed in the legend. The dashed line is fixed at zero for reference.