Intra‐ and Inter‐visit Repeatability of 129Xenon Multiple‐Breath Washout MRI in Children With Stable Cystic Fibrosis Lung Disease

Multiple‐breath washout (MBW) 129Xe MRI (MBW Xe‐MRI) is a promising technique for following pediatric cystic fibrosis (CF) lung disease progression. However, its repeatability in stable CF needs to be established to use it as an outcome measure for novel therapies.

Early cystic fibrosis (CF) lung disease presents with normal values in conventional pulmonary function tests (PFTs), such as spirometry yielding the forced expiratory volume in 1 second (FEV1), and these tests therefore have diminished value for monitoring disease progression and treatment response. 1 This is anticipated to be increasingly problematic with the advent of highly effective CF transmembrane conductance regulator (CFTR) modulator therapies. 1 The lung clearance index (LCI) derived from the nitrogen (or SF6) multiple breath washout (MBW) test has been shown to be more sensitive than spirometry to disease progression in pediatric CF. [2][3][4] However, LCI is limited to whole-lung measurements of gas washout obtained at the mouth and therefore cannot directly provide information about the spatial distribution of ventilation in the lung.
Regional lung ventilation can be visualized using MRI following inhalation of a gaseous contrast agent such as hyperpolarized (HP) 129Xe. MRI of HP 129Xe gas (Xe-MRI) is typically performed during a breath-hold following the inhalation of a single fixed volume of HP 129Xe gas. It reflects the static spatial distribution of the inhaled gas mixture inside the lungs during the breath-hold, including any obstructions, which appear as signal voids (termed ventilation defects). The ventilation defect percent (VDP), which describes the percentage of poorly ventilated or unventilated thoracic cavity volume, can then be derived from these images. 5,6 Xe-MRI has been shown to be safe, tolerable, feasible, and reproducible in both adult and pediatric CF patients. 5,7,8 VDP detects functional changes in stable CF lung disease that are not reflected in spirometry over the same time period and significantly correlates with LCI. 5,6,8 VDP has also been found to reflect response to antibiotic treatment in CF subjects undergoing a pulmonary exacerbation. 9 However, VDP only provides a static snapshot of the distribution of HP gas in the lung during a breath-hold interval immediately following a single inhalation and does not readily provide information on the dynamics of the ventilatory process as described by LCI. 10
The combination of Xe-MRI with MBW can provide an approach for regional measurement of gas washout reflecting the dynamics of regional gas replacement and gas trapping, which may be more physiologically meaningful compared to a static breath-hold. 11,12 This technique was initially proposed by Deninger et al, who used a single inhalation of HP 3He gas in mechanically ventilated guinea pigs and measured signal loss in the lungs due to gas washout with multiple subsequent inhalations of room air. 13 This loss is quantified by the fractional ventilation (FV), defined as the ratio of fresh gas entering a volume (i.e. voxel) of the lung (V_new) to the total end-inspiratory volume (both fresh gas and old gas, i.e. V_total = V_new + V_old) of the same voxel per breath. [14][15][16] By fitting the decreasing signal intensity against the number of washout breaths with an appropriate model, FV can be extracted from the washout curve for every voxel in the image by assuming a mono-exponential signal decay, resulting in a regional FV map. 13,[16][17][18] This technique has been demonstrated to be feasible in adult and pediatric populations. The effect of T1 on ventilation mapping in healthy rat lungs has also been investigated. [16][17][18] Since FV is extracted from the washout curve, provided that each image has sufficient SNR (>10), FV is less dependent on absolute signal levels (e.g. polarization) and could therefore be more repeatable than VDP.
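As an illustration of how FV follows from a washout curve, the sketch below fits a mono-exponential decay to the signal of a single voxel. It assumes the simplified model S(n) = S(0)(1 − FV)^n, in which gas replacement is the only source of signal loss; T1 relaxation and RF depolarization are deliberately ignored, so this is a simplification of the published models rather than the method used in this work. The function name and the synthetic signal values are hypothetical.

```python
import math

def fit_fractional_ventilation(signals):
    """Estimate FV for one voxel from its signal over successive washout breaths.

    Assumes S(n) = S(0) * (1 - FV)**n, so ln S(n) is linear in n with
    slope ln(1 - FV). Signals must be strictly positive.
    """
    n = len(signals)
    breaths = range(n)                      # breath index 0, 1, 2, ...
    logs = [math.log(s) for s in signals]
    mean_x = sum(breaths) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in breaths)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(breaths, logs))
    slope = sxy / sxx                       # least-squares slope = ln(1 - FV)
    return 1.0 - math.exp(slope)

# Synthetic, noise-free voxel with FV = 0.3 over six washout images
true_fv = 0.3
signals = [100.0 * (1 - true_fv) ** k for k in range(6)]
fv_estimate = fit_fractional_ventilation(signals)
```

On noise-free synthetic data the fit recovers the simulated FV. With real images, low-SNR points near the end of washout would bias a log-linear fit, which is one reason an SNR criterion matters.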
Additionally, a marker of spatial ventilation heterogeneity can be derived from the FV maps by comparing the value in one voxel to those of its nearest neighbors. A map of the coefficient of variation of FV (CoV FV) can then be produced, which describes the inherent spatial heterogeneity of FV. Similar metrics of ventilation heterogeneity have previously been shown to be useful for VDP mapping in pediatric severe asthma and to correlate with LCI and FEV1 in pediatric CF. 19,20 Additionally, ventilation heterogeneity images derived from single-breath signal intensity Xe-MRI scans were able to detect treatment response in severe asthma, 19 suggesting that CoV FV may be promising for interventional studies. However, the reproducibility of FV maps and CoV FV obtained with MBW Xe-MRI has not been demonstrated in pediatric patients with CF. This is an important step toward the eventual use of the technique for monitoring the effects of novel CF treatments.
The purpose of this work was to evaluate MBW Xe-MRI repeatability in children with CF as well as in an age-matched healthy cohort and to investigate the relationships between FV and CoV FV (i.e. repeatability, correlations) and PFTs and VDP.

Study Design
This work was approved by the SickKids Research Ethics Board (REB) and Health Canada (REB #10000063021, ClinicalTrials.gov NCT02740868) as an extension of a longitudinal study evaluating the ability of Xe-MRI to detect treatment response in pediatric subjects following CFTR modulator therapy at 1-month (±1 week) follow-up (HyPOINT: REB# 1000065980, ClinicalTrials.gov NCT04259970). Written informed consent was obtained from all parents and guardians/participants.
A total of 23 subjects were recruited (7 healthy and 16 stable CF subjects, median age 14.9 ± 2 years [range = 11-18]), with 2 visits each (baseline, 1 month ± 1 week following CFTR modulator therapy). Healthy participants were recruited based on the following inclusion criteria: age 6-18, informed consent/assent, able to perform reproducible spirometry and achieve breath-hold duration sufficient for MRI acquisition (~10 seconds). Participants were excluded based on the following criteria: medical instability, FEV1% < 40%, severe claustrophobia, not meeting MRI screening criteria, cough within 3 days and/or usage of antibiotics within 3 weeks prior to study visit, and any known pulmonary disease. For stable CF individuals, the inclusion criteria were as above, with the addition that diagnostic criteria for CF were met (i.e. documented sweat chloride ≥60 mEq/L by quantitative pilocarpine iontophoresis [QPIT], or documented genotype with two disease-causing mutations in the CFTR gene). In addition to the above exclusion criteria, participants who did not have a stable CF condition were excluded. A stable condition was defined as no worsening in cough or sputum production within 3 days, no new oral or inhaled antibiotics within 3 weeks, and no intravenous antibiotics within 2 weeks prior to the study visit. For 18 subjects (7 healthy, 11 stable CF), intervisit repeatability was assessed between visits at baseline (visit 1) and 1 month later (visit 2). For 12 of these subjects (5 healthy, 7 stable CF), intravisit repeatability was assessed with repeated scans on either visit ~1 hour apart. Each visit was 3-4 hours in duration and included vital signs, pulmonary function tests (PFTs), N2 MBW and MRI. In all subjects, spirometry (Vmax, VIASYS CareFusion, San Diego, CA, USA) was performed according to American Thoracic Society standards.
N2 MBW (Exhalyzer D, EcoMedics AG, Bern, Switzerland) was performed in accordance with the ERS consensus statement for measurement of inert gas washout (i.e. LCI). [21][22][23][24]

MRI Acquisition
129Xe gas was polarized (~30%) using a commercial polarizer (Polarean 9820; Durham, NC). MBW Xe-MRI was performed using a 3 T MRI system (Prisma, Siemens, Erlangen, Germany). A flexible single-channel Tx/Rx vest RF coil (Clinical MR Solutions, Brookfield, WI) was used, as described by Couch et al with the exception that the dose bag contained a volume of HP 129Xe equal to 1/10 total lung capacity (TLC) topped off with N2 gas to a total dose volume of 1/6th TLC. 17 TLC was calculated according to height and sex. 25 All participants were instructed to perform the same MBW Xe-MRI maneuver to ensure repeatability, as follows. First, participants took two deep breaths followed by a full inhalation of the dose bag. Then, three successive MR images were acquired during a breath-hold for flip-angle and T1 calibration purposes, followed by tidal breathing of room air (i.e. washout breaths) with approximately 5 seconds between each image acquisition. [16][17][18] Finally, single-slice MR images were acquired during a breath-hold (<2 seconds) between each washout breath, performed at peak tidal inspiration as determined manually by the coach, and continued until the signal was depleted (typically 6-8 breaths/images).
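The dose composition described above (HP 129Xe at 1/10 TLC, topped off with N2 to 1/6 TLC) reduces to simple arithmetic, sketched below. The function name is hypothetical, and TLC is taken as an input in litres because the height- and sex-based reference equation is not reproduced here.

```python
def dose_bag_volumes(tlc_litres):
    """Return (xenon, nitrogen, total) dose volumes in litres for a given TLC."""
    xenon = tlc_litres / 10.0        # HP 129Xe: 1/10 of TLC
    total = tlc_litres / 6.0         # full dose bag: 1/6 of TLC
    nitrogen = total - xenon         # N2 makes up the remainder (1/15 of TLC)
    return xenon, nitrogen, total

# Example for a hypothetical participant with a predicted TLC of 3.0 L
xenon, nitrogen, total = dose_bag_volumes(3.0)
```

For a predicted TLC of 3.0 L this gives roughly 0.3 L of xenon and 0.2 L of nitrogen in a 0.5 L bag.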
For Xe-MRI yielding VDP, participants were coached to inhale the dose bag from functional residual capacity (FRC) and hold their breath for 6-8 seconds, during which time 10-14 coronal slices were acquired as described by Couch et al with the exception that the following parameters were used: TR = 7.8 msec, TE = 2.78 msec, flip angle = 10°, field of view (FOV) = 360 × 384 mm, matrix = 90 × 96, slice thickness = 15 mm, and bandwidth = 170 Hz/pixel. 26

Image Processing and Analysis
Briefly, the raw data were reconstructed using MATLAB (R2018a; MathWorks, Natick, MA, USA). 16,17 K-space was zero-filled from 40 × 64 to 64 × 64 due to a 62.5% partial echo in the readout direction, reordered due to centric acquisition, then filtered using a Hamming window. An inverse Fourier transform was applied, and the data were normalized by dividing the transformed images by the maximum signal intensity. The reconstructed washout images were registered using an affine transformation adjusted for the best qualitative fit to account for subject movement and differences in lung volumes between breaths. After registration, images were corrected for background signal using the magnitude of each voxel signal, S, and the mean background signal, B, in a region of interest away from the lungs. 27 For voxels with S² smaller than σ², the magnitude of the resulting complex value was taken. The corrected images were then masked using a semi-automated segmentation method based on signal intensity thresholding 17 to identify ventilated regions, masking major airways and the diaphragm. [16][17][18] FV was derived for every voxel from the signal decay of the washout images using the variable T1 model signal equation described by Morgado et al. 16 Uptake of HP 129Xe into the bloodstream and tissues was ignored as a source of signal loss since the dissolved signal represents only approximately 2% of the total signal in the lungs at a given instant in time. 28,29 From the FV map, the CoV FV map was calculated by taking the coefficient of variation (i.e. SD over the mean) of the nearest neighbors (3 × 3 kernel) for every voxel in the FV map. VDP was calculated as described by Couch et al with the exception that the defect region was determined as intensities below a threshold of 60% of the mean signal. 25,26 Briefly, breath-hold 1H images were segmented to create a thoracic cavity mask and registered to static Xe-MRI images. The defect region was calculated using signal intensities within the thoracic cavity mask. VDP was calculated by dividing the volume of the defect region by the total lung volume.
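The 3 × 3 coefficient-of-variation kernel can be sketched as follows. This is an illustrative implementation only: it assumes that the centre voxel is included in the kernel, that only neighbours inside the lung mask contribute, and that the population SD is used. None of these details are specified above, and the function name is hypothetical.

```python
import math

def cov_map(fv, mask):
    """Local coefficient of variation (SD / mean) of FV in a 3 x 3 neighbourhood.

    fv   : 2D list of FV values
    mask : 2D list of booleans, True inside the ventilated lung
    Voxels outside the mask (or with a zero local mean) are returned as None.
    """
    rows, cols = len(fv), len(fv[0])
    out = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c]:
                continue
            # Collect the masked voxels in the 3 x 3 window, clipped at the image edge
            vals = [fv[i][j]
                    for i in range(max(r - 1, 0), min(r + 2, rows))
                    for j in range(max(c - 1, 0), min(c + 2, cols))
                    if mask[i][j]]
            mean = sum(vals) / len(vals)
            if mean == 0:
                continue
            sd = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
            out[r][c] = sd / mean
    return out

# A perfectly uniform FV patch has no local heterogeneity
uniform = [[0.3] * 3 for _ in range(3)]
full_mask = [[True] * 3 for _ in range(3)]
cov_uniform = cov_map(uniform, full_mask)

# Two neighbouring voxels with FV 0.2 and 0.4: SD 0.1 over mean 0.3
cov_pair = cov_map([[0.2, 0.4]], [[True, True]])
```

Because the local mean sits in the denominator, voxels with FV near zero can yield very large values. Lung edges affected by misregistration can inflate the map in the same way.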

Statistical Analysis
Statistical analysis was performed on the mean values of the respective FV and CoV FV whole lung maps. All statistical analyses were calculated using SPSS version 26 (SPSS Inc, Chicago, IL). To determine the extent of correlation and agreement between repeated scans, intraclass correlation coefficient (ICC) estimates were based on an absolute-agreement, two-way mixed-effects model, with values <0.5, 0.5-0.75, 0.75-0.9, and >0.90 indicative of poor, moderate, good, and excellent reliability, respectively. 30 The coefficient of reproducibility (CR), within-subject coefficient of variation (CV%), and minimum detectable difference (MDD) were calculated to assess the smallest meaningful change. CR was calculated by $\mathrm{CR} = 2.77 \times \mathrm{SD} \times \sqrt{1 - r}$, where $r$ is the reliability coefficient, taken here as the estimated ICC, and SD is the standard deviation. CV% was calculated using the root-mean-square approach from $cv_i$, the squared coefficient of variation of the repeated measurements for a given participant, and $n$, the number of repeats (i.e. 2). MDD was calculated with $z_{1-\beta} = 1.645$ and $z_{1-\alpha} = 0.842$. Repeated scans were tested for agreement by applying a linear regression to replicate scan measurements. Bland-Altman analysis was also used to assess repeatability and agreement. Results from healthy and stable CF groups at baseline were separated and compared via the Wilcoxon rank-sum test. The correlation of FV and CoV FV with PFTs (LCI and FEV1%) and VDP was tested using Spearman's correlation, given the potential for nonlinear relationships between metrics. The Shapiro-Wilk test was used to determine normality of FV, CoV FV, LCI, FEV1% and VDP in the combined healthy and stable CF population. P < 0.05 was considered statistically significant.
Table 1 presents subject demographics at baseline. The HP 129Xe gas dose was well tolerated by all participants.
One stable CF subject was unable to return for visit 2 and was excluded from this study. Three stable CF patients did not perform MBW Xe-MRI during the baseline visit due to time constraints, and one stable CF subject was excluded at visit 2 due to poor image SNR (SNR < 8). 6 Of the 16 individuals with stable CF, 9 had an LCI < 11, indicative of mild CF lung disease. No significant differences were observed in age (P = 0.22) or body mass index (P = 0.79) between healthy and stable CF patients. CoV FV, LCI, FEV1%, and VDP distributions were determined to be not normal in the combined healthy and stable CF populations, while FV was found to be normally distributed. FEV1% and FEV1 over forced vital capacity (FEV1/FVC%) were significantly lower in CF compared to healthy participants, while LCI and VDP were significantly higher in the CF cohort. Neither FV nor CoV FV was found to significantly distinguish health from disease at visit 1 (FV: P = 0.75, CoV FV: P = 0.58) or visit 2 (FV: P = 0.74, CoV FV: P = 0.08). Box plots of FV and CoV FV in healthy and stable CF participants for visit 1 and visit 2 are shown separately in Fig. 1. FV histograms were determined to be generally Gaussian, with skewness between −0.5 and 0.5 (median [IQR] −0.021 [−0.27, 0.40]) at baseline (Fig. 2). In contrast, the CoV FV histograms were generally right skewed, with skewness 2.1 [1.79, 2.33] at baseline (Fig. 2). Additionally, the skewness of FV significantly distinguished between cohorts at baseline but not 1 month later (P = 1), whereas the skewness of CoV FV did not distinguish between cohorts at baseline (P = 0.79) or at 1 month (P = 0.23) (Fig. 3).
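For readers less familiar with histogram skewness, the sketch below computes a moment-based skewness for a list of voxel values. The estimator (g1 = m3 / m2^1.5, without small-sample correction) is an assumption, since the exact estimator used for the reported values is not stated. The function name and the example values are hypothetical.

```python
def sample_skewness(values):
    """Moment-based skewness g1 = m3 / m2**1.5 (no small-sample correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n   # second central moment
    m3 = sum((v - mean) ** 3 for v in values) / n   # third central moment
    return m3 / m2 ** 1.5

# Symmetric values give skewness near 0
symmetric = sample_skewness([0.1, 0.2, 0.3, 0.4, 0.5])

# A long tail toward high values gives positive (right) skewness
right_skewed = sample_skewness([0.05, 0.05, 0.06, 0.07, 0.40])
```

A value near zero corresponds to a roughly symmetric histogram, as reported for FV. A clearly positive value corresponds to the right-skewed CoV FV histograms.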

Subject Demographics
In the Wilcoxon rank sum test, P < 0.05 was considered to be significant. *One data point was excluded in the stable CF cohort for VDP as Xe-MRI data was not available. **Three data points were excluded in the stable CF cohort as MBW Xe-MRI data was not available.

Evaluation of Repeatability
Figure 4 shows FV and CoV FV maps from a representative CF patient and healthy participant. While intensity variations between scans could be noted at the borders of the lung and near the thoracic cavity where registration artifacts were apparent, the maps were generally consistent between repeated scans with similar regions of low FV and high CoV FV (e.g. white arrows in Fig. 4). For some subjects, artifactual regions were present due to invalid calculations of T1 as a result of noise (e.g. blue arrows in Fig. 4). In most subjects with artifacts, they comprised <5% of the possible FV values but ranged up to 20%. Heart motion artifacts were also present on the signal intensity images and subsequently in FV and CoV FV maps, which were not completely removed during registration due to nonselective excitation and affine registration constraints. A summary of the quantitative analysis of intravisit and intervisit repeatability for all subjects is shown in Table 2. [...] patients. CV% was highest for CoV FV in stable CF (CV% = 17.43%) and below 10% for CoV FV in healthy participants and FV in both health and stable CF. CR and MDD were lower in healthy participants compared to stable CF with the exception of the MDD for FV, which was lower in the stable CF cohort. The linear correlation between repeated scans was strong (i.e. R > 0.70) for all metrics, but not significant for FV in stable CF (P = 0.083). VDP had higher within-visit ICC (0.93 [0.71, 0.98]) relative to MBW Xe-MRI metrics in stable CF, but FV in stable CF had a lower CV% (7.79%) compared to VDP (11.82%).
For intervisit scans, FV in healthy participants had the strongest overall repeatability with the highest ICC . ICC values were again stronger in FV compared to CoV FV and higher in healthy participants compared to the stable CF cohort. CR and MDD were lower in healthy participants compared to stable CF, with the exception of the MDD for FV, which was lower in the stable CF cohort, similar to the intravisit repeatability. The correlation between repeated scans was strong (i.e. R > 0.73) for all metrics, though not significant for CoV FV in healthy participants (P = 0.061). Intervisit ICC for VDP in stable CF was low (ICC = 0.68 [0.30, 0.88]) compared to MBW Xe-MRI metrics. CV% for VDP in stable CF (18.58%) was higher than for FV (11.05%) but not CoV FV (24.46%).
Together, the ICC, CV%, CR, MDD and R demonstrated good intravisit and intervisit repeatability for FV and CoV FV . Both CV% and CR were lower within intravisit scans compared to intervisit, while ICC values were comparable for all metrics except CoV FV in stable CF, where it dropped most markedly intervisit compared to intravisit. MDD for FV and CoV FV was also lower for intravisit scans compared to intervisit scans, except for CoV FV in CF, where it was higher.
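Two of the statistics summarized above can be sketched in a few lines. The CR function follows the stated definition, CR = 2.77 × SD × sqrt(1 − r). The ICC function uses the single-measures, absolute-agreement form computed from two-way ANOVA mean squares; whether single or average measures were reported is not stated, so that choice is an assumption. The function names and example data are hypothetical.

```python
import math

def coefficient_of_reproducibility(sd, r):
    """CR = 2.77 * SD * sqrt(1 - r), with r the reliability coefficient (ICC)."""
    return 2.77 * sd * math.sqrt(1.0 - r)

def icc_absolute_agreement(data):
    """Single-measures, absolute-agreement ICC from a subjects x repeats table.

    data: list of rows, one per subject, each holding k repeated measurements.
    """
    n, k = len(data), len(data[0])
    grand = sum(sum(row) for row in data) / (n * k)
    row_means = [sum(row) / k for row in data]
    col_means = [sum(row[j] for row in data) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)    # between subjects
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)    # between repeats
    ss_total = sum((x - grand) ** 2 for row in data for x in row)
    ss_err = ss_total - ss_rows - ss_cols                     # residual
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (
        ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n
    )

# Identical repeats: perfect agreement
icc_perfect = icc_absolute_agreement([[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])

# A constant offset between repeats lowers absolute agreement
icc_offset = icc_absolute_agreement([[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]])

cr_example = coefficient_of_reproducibility(sd=0.05, r=0.75)
```

The offset example shows why an absolute-agreement ICC is the stricter choice for test-retest work: a systematic shift between scans reduces it even when subjects keep their rank order.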
Absolute and percent changes for FV and CoV FV are reported in Table 3. For intravisit scans, CoV FV in healthy subjects had the lowest percent change (1.8%) while CoV FV in disease had the highest percent change (−9.7%). For intervisit, FV in healthy subjects had the lowest percent change (−0.77%), while CoV FV in disease had the highest percent change (17%). In both cases, the magnitude of absolute and percent changes was higher in disease compared to health. VDP had the lowest percent change for intravisit (1.3%), but FV had the lowest percent change for intervisit (−7.2%) compared to VDP (−12%), while CoV FV had the highest percent change (17%). FV and CoV FV values in both cohorts were not significantly different for intravisit and intervisit (Table 4). Figure 5 shows the intravisit and intervisit Bland-Altman plots for FV and CoV FV, respectively. FV and CoV FV differences were determined to be normally distributed via the Shapiro-Wilk test, meaning the limits of agreement could be accurately estimated using the 95% confidence interval of the SD of the differences. In all four plots in Fig. 5, almost all data points lie within the limits of agreement. Linear regression was not statistically significant for FV (intervisit: P = 0.59, intravisit: P = 0.076) and yielded an intercept close to 0, indicating no proportional bias; however, a positive proportional bias (R = 0.56) was observed for intervisit CoV FV, and a negative proportional bias (R = −0.54) was observed for intravisit CoV FV. The bias line for all plots was close to 0 and smaller than both the CR and MDD, and was therefore not considered meaningful. Additionally, data points were scattered above and below the bias line in all plots, indicating no consistent bias for one approach vs. the other. Measurement variability was observed to be independent of magnitude in all plots. For both FV and CoV FV, intravisit limits of agreement were narrower than intervisit limits of agreement.
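The Bland-Altman quantities discussed above (bias, 95% limits of agreement, and proportional bias) can be sketched as below. The limits are taken as bias ± 1.96 × SD of the paired differences, and proportional bias is summarized by the Pearson correlation between the differences and the pair means. The function name and the example FV values are hypothetical and are not data from this study.

```python
import math
import statistics

def bland_altman(scan1, scan2):
    """Return (bias, lower limit, upper limit, proportional-bias r) for paired scans."""
    diffs = [b - a for a, b in zip(scan1, scan2)]        # scan 2 minus scan 1
    means = [(a + b) / 2 for a, b in zip(scan1, scan2)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)                         # sample SD of differences
    lower, upper = bias - 1.96 * sd, bias + 1.96 * sd    # 95% limits of agreement
    # Pearson correlation between differences and means (proportional bias)
    mean_d, mean_m = statistics.mean(diffs), statistics.mean(means)
    num = sum((d - mean_d) * (m - mean_m) for d, m in zip(diffs, means))
    den = math.sqrt(sum((d - mean_d) ** 2 for d in diffs)
                    * sum((m - mean_m) ** 2 for m in means))
    r = num / den if den else 0.0
    return bias, lower, upper, r

# Hypothetical whole-lung mean FV for four participants, two scans each
scan1 = [0.30, 0.32, 0.28, 0.35]
scan2 = [0.31, 0.31, 0.29, 0.36]
bias, lower, upper, r = bland_altman(scan1, scan2)
```

A correlation near zero suggests that the size of the difference does not depend on the magnitude of the measurement. With small samples, a few outlying pairs can dominate this correlation.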

Discussion
This study assessed the intravisit and intervisit repeatability of regional ventilation measures (FV and CoV FV) derived from MBW Xe-MRI in pediatric stable CF subjects and age-matched healthy controls. We observed: 1) FV and CoV FV had high intravisit and good intervisit repeatability in both healthy participants and stable CF patients, 2) intravisit scans had higher repeatability than intervisit scans, 3) FV had higher intravisit and intervisit repeatability than CoV FV, and 4) repeatability was higher in healthy participants compared to stable CF patients. Horn et al previously demonstrated that ventilation features in healthy adult participants had good qualitative agreement between scans using HP 129Xe and HP 3He in the same individual on the same day. 31 Our study extends this work by quantifying the interscan repeatability in a stable pediatric cohort, indicating that these ventilation features are also consistent 1 month after MRI, as expected.
In healthy and stable CF participants, intravisit scans were found to be more repeatable than intervisit scans with lower CV%, lower CR, lower absolute and percent changes between scans, and narrower limits of agreement for both FV and CoV FV. In stable CF, the higher intervisit variability may be attributed to changes in, or progression of, disease not reflected by clinical symptoms. This is consistent with an intervisit repeatability study of LCI in pediatric CF, where the variability in stable CF over 1-3 months was higher (±25%) compared to healthy participants (±15%). 32 In this study, for healthy participants, FV had similar repeatability between intravisit and intervisit, as expected. The slight increase in intervisit variability may be attributed to retraining of the MBW Xe-MRI breathing maneuver at 1 month. For CoV FV in healthy participants, intervisit repeatability markedly decreased compared to intravisit. For healthy participants, CoV FV was restricted to small (i.e. near 0) values. Therefore, small changes in CoV FV due to differences in effort attributed to retraining of the MBW Xe-MRI maneuver can represent large percentage changes and therefore produce a larger variability.
It is interesting that the proportional bias present in both intravisit and intervisit CoV FV Bland-Altman plots were of similar magnitude but in a different direction. The positive bias for intervisit scans indicates that participants with higher average CoV FV have larger variability (i.e. their second visit had a higher CoV FV ) between measurements taken 1 month apart. This is expected, as these individuals may have defect regions that are more variable over time, and this may also be indicative of disease progression. On the other hand, a negative intravisit proportionality indicates that individuals with higher CoV FV tend to have a lower CoV FV at the second scan 1 hour apart. This may be due to the breathing effort enabling a clearing of the lungs before the second scan. It is important to note that both of these proportionality biases are weighted heavily by high  outlier CoV FV data points and may be affected with improvements in CoV FV analysis. Between the two MBW Xe-MRI-derived parameters, FV was found to have better overall repeatability than CoV FV with higher ICC, lower CV% and lower percentage changes. FV was also found to have better intervisit repeatability than VDP. VDP may reflect mobile mucus plugs and other blockages, which are more variable over time, while FV is more reflective of ventilation dynamics that are more stable over time. However, differential mucus plugging may allow for some variation in which areas of the lung are probed and accessible by 129 Xe gas, thus accounting for variations in FV within and across visits. These variations may be more stable than in VDP as FV is reported as the average of all voxels in the map.
As CoV FV is derived from FV maps, it was unexpected to find that FV and CoV FV should differ in their repeatability. This may result from the differing distributions of FV and CoV FV maps. Since the distribution of FV values across the lung for all subjects were observed to be Gaussian, the mean is a good representation of the histogram of FV values and is less sensitive to outliers, thus more repeatable. In contrast, CoV FV maps were right skewed, thus the mean is highly affected by outliers which can occur due to registration difficulties stemming from large tidal volume changes and result in high apparent CoV FV at the edges of the lung.
Both FV and CoV FV were unable to distinguish between healthy and stable CF cohorts at either visit. The inability of FV to distinguish between cohorts is consistent with previous literature. 17,20 While the change was not significant, mean FV was observed to be higher in stable CF patients compared to the healthy cohort in the baseline visit, which is consistent with a feasibility study of MBW Xe-MRI in pediatric CF. 17 This was unexpected, as individuals in the CF cohort had lower FEV 1 % and higher LCI than the healthy cohort making a decrease in FV more expected. One hypothesis is that regions with a MBW Xe-MRI signal compensate for regions which do not. In other words, FV in the ventilated regions increases overall in order to compensate for unventilated regions. This was observed when the skewness of FV in the stable CF cohort at baseline was significantly lower (i.e. left skewed) than the healthy cohort (right skewed). Left-skewed indicates that the bulk of the FV values in the map are shifted to higher values, while the tail of the distribution is toward lower FV values. These lower FV values may represent defects in the ventilation map, with the bulk of the distribution shifted higher to compensate for the defect regions. However, this result was not repeatable in visit 2, likely because of the small sample size.
As MBW requires cooperation, FV may be affected by failure to comply to the coached maneuver. Some children, particularly with CF, were observed to perform rapid cycles of shallow breathing after the initial breath-hold, resulting in a greater apparent signal decay at the start of washout and therefore a higher FV. Interestingly, the opposite and expected trend in FV was observed in visit 2. Given the conflicting results between visits, it is difficult to make a conclusive statement about the nature of FV at this stage. The broad range of values in healthy participants is consistent with previous work in healthy adults using MBW 3 He MRI 18 and may indicate that the absolute value of FV is sensitive to participant effort and thus non-discriminative between cohorts. However, the importance of changes in FV in CF patients over time is still unknown.
CoV FV was also unable to significantly distinguish between cohorts, consistent with Couch et al but not Horn et al, the latter reporting that the SD of FV distinguished between children (age 6-11) with CF and age-matched healthy controls. 17,20 While unable to distinguish between cohorts, CoV FV was observed to trend higher in CF subjects compared to healthy participants, suggesting that higher CoV FV (i.e. more heterogeneity) may be reflective of disease.
CoV FV was also found to significantly correlate with LCI and VDP, while almost significantly correlating with FEV 1 %. Correlations with LCI and FEV 1 % are consistent with previous literature. 20 LCI, FEV 1 %, and VDP were able to discriminate between healthy and stable CF cohorts, suggesting that CoV FV may be able to do so given improvements in the imaging method (i.e. thinner slices, better registration) and larger sample size. VDP correlated more strongly with LCI than CoV FV . This high correlation may be because VDP and LCI are reflecting similar pathophysiologic characteristics, namely ventilation inhomogeneity. 6 However, CoV FV is arguably a more direct measure of ventilation inhomogeneity than VDP. These findings may suggest that VDP reflects airway dead space instead of ventilation heterogeneity and/or alveolar dead space as LCI is expected to be affected by both. 33,34 This is a potential advantage of MBW Xe-MRI over VDP, as VDP may miss an important dynamic ventilatory contribution necessary to completely describe CF lung disease pathophysiology. VDP can potentially be supplemented with MBW Xe-MRI information (specifically CoV FV ) in future treatment investigations. FV did not correlate significantly with PFTs or VDP, suggesting that FV is measuring something not captured by PFTs.
Overall, repeatability was good for all metrics, as reflected by high ICC values, good agreement, and no significant differences between repeated scans. Since FV and CoV FV are repeatable in clinically stable CF, these findings suggest that MBW Xe-MRI may be useful in the future for detecting changes in CF patients due to treatment response from interventions, especially if limitations that hamper sensitivity of the techniques are resolved. FV and CoV FV can provide insight into ventilation and ventilation heterogeneity, respectively, that is reflected by LCI while also revealing the spatial distribution of these heterogeneities, similar to VDP. As a result, MBW Xe-MRI is potentially able to bridge the gap between lung function information provided by PFTs and MRI and provide insight into how these biomarkers complement, contrast and correlate with one another. This gives it a unique place in the field of MRI. 35 This combined approach may also distinguish regions of obstruction (i.e. completely unventilated regions) from regions that are slower to ventilate, making it potentially more reflective of subtle disease changes compared to VDP alone. Future studies will assess the responsiveness of MBW Xe-MRI compared to PFTs and VDP for monitoring treatment effects longitudinally in a pediatric CF cohort receiving CFTR-modulator therapy, similar to previous studies. 36

Limitations
This work is based on the washout model by Deninger et al. 13 A limitation of this model is that rebreathing of gas is not explicitly considered. Thus, FV as calculated in this work reflects voxel-wise 129Xe gas changes that may include both washout and rebreathing contributions. At the alveolar level, it can be argued that the contribution of rebreathing is minimal, given the dichotomous branching structure of the lung. The same cannot be said for the larger airways. However, in this work, larger airways were avoided through masking, so they did not contribute to the average FV reported.
This study was a prospective longitudinal cohort analysis with a projected recruitment of 25 healthy and 25 stable CF individuals; however, this target was not met, resulting in a small sample size. This may explain the unexpected variations in MDD and lack of significance in Pearson's R and correlations. This also limits the usage of the ICC as a measure of agreement and test-retest reliability in this study as the confidence intervals are very wide. Additionally, the indication of mild CF lung disease may have resulted in similar values between groups (i.e. between participants), thus widening the confidence intervals for ICC. However, as ICC reflects both correlation and agreement, a high estimated ICC, good correlation, and good agreement support the usage of ICC in this work despite the wide confidence intervals.
In this study, repeatability was analyzed using statistics based on the image-wide FV/CoV FV . Spatial similarities were not robustly compared. In future, dedicated analysis of specific spatial patterns and their changes over time by coregistering and analyzing image similarities with DICE coefficients or other similarity measures may be of interest to support MBW Xe-MRI repeatability.
The inability of FV and CoV FV to distinguish between cohorts is likely influenced by the 2D nature of MBW Xe-MRI as performed in this work, which consisted of a single thick coronal slice (200 mm) centered on the chest cavity. Smaller regions of abnormal lung function in the anterior-posterior direction were averaged together, resulting in no anterior-posterior spatial resolution and in partial volume effects. In the future, MBW Xe-MRI will benefit from faster sequences that allow for volumetric coverage and a general improvement in spatial resolution. 18 Alternatively, the relatively rapid, irrecoverable signal decay due to T 1 relaxation may limit the number of breaths acquired to fewer than needed for FV and CoV FV to meaningfully discriminate between cohorts and correlate with PFTs or VDP, which, according to LCI washout curves, occurs after ~10 breaths (i.e. a few minutes). 37 MBW Xe-MRI is typically limited by SNR to <8 washout breaths (Fig. S1 in the Supplemental Material), after which the signal is irrecoverably depleted due to a combination of RF depolarization and T 1 relaxation, in addition to gas washout. Acquiring washout images at higher breath numbers may enable better discrimination of FV and CoV FV between cohorts and improve correlations with PFTs and VDP. Furthermore, because administration of gas was limited to a single anoxic bolus prior to washout, the results may be confounded by incomplete washin of gas. Additionally, this constraint makes it difficult to perform extended washout (or washin) experiments. This may be mitigated by using a non-HP and normoxic gas mixture such as 21% O 2 /79% perfluoropropane (PFP), allowing for extended free-breathing washin and washout experiments that more thoroughly probe the gas kinetics of the lungs. 38,39 For instance, MBW PFP-MRI was found to distinguish health from mild and moderate CF in adults, supporting the use of PFP as a contrast gas for future MBW MRI studies. 38 However, MBW Xe-MRI may be preferred over PFP-MRI due to the higher SNR afforded by hyperpolarization and the use of standard pulse sequences such as 2D GRE.
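To illustrate why the number of usable washout breaths is limited, the sketch below multiplies the three signal-loss mechanisms named above (gas washout, T 1 relaxation, and RF depolarization) for each breath. It is a simplified model written for this example, not the fitting procedure used in this work. The function names and all parameter values (FV, T 1, breath interval, flip angle, number of RF pulses per image, and signal floor) are hypothetical.

```python
import math


def relative_signal(n_breaths, fv, t1_s, breath_interval_s, flip_angle_deg, n_rf_pulses):
    """Signal remaining after n washout breaths, relative to the first image.

    Assumes one image per breath and independent, multiplicative loss terms.
    """
    washout = (1.0 - fv) ** n_breaths
    t1_decay = math.exp(-n_breaths * breath_interval_s / t1_s)
    rf_loss = math.cos(math.radians(flip_angle_deg)) ** (n_rf_pulses * n_breaths)
    return washout * t1_decay * rf_loss


def breaths_above_floor(floor, max_breaths=100, **params):
    """Count washout breaths for which the relative signal stays at or above `floor`."""
    n = 0
    while n < max_breaths and relative_signal(n + 1, **params) >= floor:
        n += 1
    return n


# Hypothetical values chosen only to make the example concrete.
params = dict(fv=0.3, t1_s=20.0, breath_interval_s=5.0,
              flip_angle_deg=5.0, n_rf_pulses=64)
usable = breaths_above_floor(0.01, **params)
print("usable washout breaths:", usable)
```

Under these assumed values, only a handful of breaths remain above the chosen floor. Lengthening T 1 (e.g. with less oxygen exposure) or reducing RF depletion raises the count in this simplified model.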
A further limitation is that, for some FV and CoV FV maps, regions of artifacts are apparent due to invalid calculations of T 1 (i.e. negative T 1 values) as a result of noise. This may conceal regions that would otherwise contribute to the maps. Alternative methods of variable T 1 analysis that can approximate the T 1 value for these regions (e.g. filling the missing T 1 values with the average of the T 1 map) may be necessary to avoid losing ventilation information in the FV map, which may improve discrimination between cohorts as well as correlations. However, as most subjects have artifacts comprising <5% of the possible FV values, this is not expected to have a large effect, although it may be relevant for subjects with artifact percentages of up to 20%.
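The mean-filling approach mentioned above could be sketched as follows. This is an illustrative example only and was not applied in this work. The function names `fill_invalid_t1` and `artifact_fraction` and the toy T 1 map are assumptions, and non-positive or NaN values are treated as invalid.

```python
import math


def is_valid_t1(value):
    """A T1 value is treated as valid if it is a positive, non-NaN number."""
    return not math.isnan(value) and value > 0


def artifact_fraction(t1_map):
    """Fraction of voxels in the map whose T1 value is invalid."""
    values = [v for row in t1_map for v in row]
    return sum(not is_valid_t1(v) for v in values) / len(values)


def fill_invalid_t1(t1_map):
    """Replace invalid T1 values with the mean of the valid voxels in the map."""
    valid = [v for row in t1_map for v in row if is_valid_t1(v)]
    if not valid:
        raise ValueError("no valid T1 values available to compute a fill value")
    fill = sum(valid) / len(valid)
    return [[v if is_valid_t1(v) else fill for v in row] for row in t1_map]


# Hypothetical 2 x 2 T1 map in seconds, with one noise-induced negative value.
t1_map = [[20.0, -5.0],
          [22.0, 18.0]]
filled = fill_invalid_t1(t1_map)
print(filled, artifact_fraction(t1_map))
```

A map-wide mean ignores regional T 1 variation, so a local (neighborhood) average may be a reasonable alternative where artifacts are clustered.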
For CoV FV specifically, as the distribution of the CoV FV map is right-skewed, the mean does not accurately represent the whole map. Analysis using the median may be a better representation of the CoV FV distribution and may reduce the effect of outliers arising from misregistration, which is likely due to tidal volume differences. Horn et al mitigated this by using pneumotachograph recordings to exclude data when the tidal volume changed by more than ±15%. 31 An MRI-compatible pneumotach was not employed here, which is a technical limitation of this study. In the future, MRI data can potentially be used to estimate tidal volume and identify changes greater than ±15% to exclude from washout. This consideration, together with better registration and voxel-by-voxel analysis, may improve the repeatability of CoV FV in the future. Additionally, analyzing CoV FV using the median and IQR rather than the mean and SD may reduce the impact of outliers and improve the ability of CoV FV to discriminate between cohorts. Importantly, VDP was calculated from multiple coronal slices, while MBW Xe-MRI is a single-slice analysis. Multislice MBW Xe-MRI may improve the ability of FV and CoV FV to correlate with PFTs and VDP in the future. Lastly, cardiac triggering was not used here, which is a potential limitation of this work. Image quality may be improved by employing MRI-compatible ECG electrodes to remove artifacts due to heart motion. 40
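The effect of a right-skewed distribution on the mean, and the robustness of the median and IQR suggested above, can be seen in a small numerical sketch. The voxel values below are invented for illustration and are not data from this study. The function name `summarize` is likewise an assumption.

```python
import statistics


def summarize(values):
    """Return mean/SD and median/IQR summaries of a list of voxel values."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.fmean(values),
        "sd": statistics.stdev(values),
        "median": median,
        "iqr": q3 - q1,
    }


# Hypothetical CoV FV voxel values with one outlier, e.g. from misregistration.
cov_fv_voxels = [0.10, 0.12, 0.11, 0.13, 0.12, 0.90]
summary = summarize(cov_fv_voxels)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

In this toy example, the single outlier roughly doubles the mean relative to the median, while the median and IQR remain close to the values of the bulk of the voxels.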

Conclusion
This study demonstrated the high intravisit and intervisit repeatability of MBW Xe-MRI ventilation imaging in subjects with stable CF and in age-matched healthy controls. CoV FV was also found to correlate with LCI. Together with the regional functional assessment provided by the technique, this makes it a promising approach for following pediatric CF lung disease progression and treatment response.