Prospectively predicting Pseudomonas aeruginosa infection/s using routine data from the UK cystic fibrosis register

Abstract
Rationale and aims
Lung health of people with cystic fibrosis (PwCF) can be preserved by daily use of inhaled therapy. Adherence to inhaled therapy, therefore, provides an important process measure to understand the success of care and can be used as a quality indicator. Defining adherence is problematic, however, since the number of prescribed treatments varies considerably between PwCF. The problem is less pronounced among those with Pseudomonas aeruginosa (PA), for whom at least three daily doses of nebulized therapy should be prescribed and who thus constitute a more homogeneous group. The UK CF Registry provides routine data on PA status, but data are only available 12 months after collection. In this study, we aim to prospectively identify contemporary PA status from historic registry data.
Method
UK CF Registry data from 2011 to 2015 for PwCF aged ≥16 were used to determine a pragmatic prediction rule for identifying contemporary PA status using historic registry data. Accuracy of three different prediction rules was assessed using the positive predictive value (PPV). The number and proportion of adults predicted to have PA infection were determined overall and per center for the selected prediction rule. Known characteristics linked to PA status were explored to ensure the robustness of the prediction rule.
Results
Having CF Registry defined chronic PA status in the two previous years is the selected definition to predict that a patient will have PA infection within the current year (population-level PPV = 96%-97%, center-level PPV = 85%-100%). This approach provides a subset of between 1852 and 1872 patients overall and a range of 8 to 279 patients per center.
Conclusion
Historic registry data can be used to contemporaneously identify a subgroup of patients with chronic PA. Since this patient group has a narrower treatment schedule, this can facilitate better benchmarking of adherence across centers.

KEYWORDS: adherence, cystic fibrosis, outcome, process, quality improvement

| INTRODUCTION
Cystic fibrosis (CF) is an archetypal long-term condition for which there is as yet no cure, but regular use of preventative therapies can improve health outcomes by reducing the frequency of exacerbations and attenuating lung function (forced expiratory volume in 1 second, FEV1) decline. 1 However, adherence to preventative inhaled treatments in CF is generally low (30%-35%) and low adherence is associated with worse health outcomes. 2,3 Measuring adherence at an individual level is important for treatment planning and diagnosis in the event of declining lung health. Measuring system-level adherence is also important because benchmarking across centers within a community of practice can provide a basis to share strategies for improvement. Various quality improvement (QI) initiatives in CF have transformed the delivery of healthcare, for example, streamlined approaches to managing acute exacerbations and increased prescription of efficacious preventative inhaled therapies. 4 Improving adherence to inhaled therapies is, therefore, a logically important target for QI projects. QI projects focused on adherence can compare adherence between and within centers, thereby providing an understanding of the variation of care across UK centers.
System-level indicators are most useful when they are independent of local constraints on data capture since the factors that impact quality of care may also impact data capture. Data-logging nebulizers available in CF can objectively and accurately record each dose of inhaled treatments taken by patients every day. 5 However, accurate prescriptions data may be unavailable, which impacts the ability to calculate percentage adherence. Yet adherence among people with CF is typically presented as a percentage because the required number of treatment doses will vary according to disease severity. A person without chronic PA and FEV1 80% may only require dornase alfa once daily, whereas someone with chronic PA, FEV1 30% with large volume of sputum may require alternating antibiotics twice daily, dornase alfa once daily, and hypertonic saline twice daily.
Identifying a subgroup of people with CF (PwCF) where the prescription, and therefore denominator for adherence, can be implied using routine data, despite local challenges in data capture, is an important step toward providing adherence comparisons at the center level. One option is to focus on those who have chronic P. aeruginosa (PA) infection as consensus exists on the treatment these patients should receive. All major CF guidelines recommend the use of long-term inhaled antibiotics and mucolytics for people with chronic PA infection, and this provides a minimum denominator of three drugs for those PwCF with chronic PA. Using registry data to identify patients with chronic PA and designating that these patients should receive at least three daily doses of inhaled drugs (denominator) allows an adherence rate to be calculated from adherence data captured by CFHealthHub 6 (which provides numerator data) even when prescription data are missing. Although these data will be less accurate in centers without accurate prescription data, this method of calculating adherence does allow a standardized approach to be used in a well-defined subset of PwCF, allowing center benchmarking that can start an informed exploration of practice. 7 Building system-level indicators on routine data where possible allows all centers to be included within comparisons, even those that do not have the resources to provide data consistently, because the identification of the subgroup with PA uses routine data from the CF Registry. The published UK CF Registry data provide historic data on PA status 1 year in arrears. As PA status may fluctuate over time, it is important that the contemporary status is understood as accurately as possible to identify the PA subgroup. Thus, the overarching aim of this paper is to develop a rule to accurately predict which patients will have PA in the current year on the basis of previous years' data collected routinely in the UK CF Registry.
It is important to ensure that any prediction rule has an appropriate compromise between sensitivity and specificity to identify a sufficient number of people to allow center comparisons. Thus, the analysis described in this paper sought to determine the optimal rule for prospectively identifying patients who have PA in the current year (trading off accuracy and number). In addition, we explored how other patient characteristics that might be expected to be associated with PA status such as age were related to the PA identifying strategies used.

| METHODS
Retrospective data were collected from the UK CF Registry database for annual reviews between 2011 and 2015 from all UK CF centers.
National Health Service (NHS) research ethics approval was granted for the use of UK CF Registry data (Huntingdon Research Ethics Committee 07/Q0104/2), and under these terms, the UK CF Trust Steering Committee approved this study. The data received had pseudo-anonymized patient and center details.
The following data were obtained from the registry:
• Demographics: CF center identifier, gender, age
The rules aim to predict whether a patient will have any positive PA samples within the current year (ie, been classified as "chronic" or "intermittent" by the CF Registry).
The accuracy of the prediction rule was assessed using the positive predictive value (PPV) of the prediction against the outcome of the patients having positive PA samples within the current year based on past years' data. The definition producing the highest PPV (or equivalently, the smallest proportion of patients misclassified as having PA) was taken on to further analysis in which we assessed the impact of increasing the number of years of past data to include in the prediction rule. We calculated the PPV for one, two, three, and four consecutive years' PA diagnosis in order to find the best compromise between accuracy and number of patients included in the subgroup.
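The PPV calculation for a rule based on consecutive years of chronic PA status can be sketched as follows. This is a minimal illustration, not the code used in the study (which was run in R and Stata): the record layout, the function names `predicted_positive` and `ppv`, the toy data, and the choice to exclude patients with no record in the target year are all assumptions made here for the example.

```python
from typing import Dict, Optional, Tuple

# Hypothetical layout: patient id -> {year: registry PA status}
Records = Dict[str, Dict[int, str]]

# Outcome definition from the text: "chronic" or "intermittent" in the current year
POSITIVE = {"chronic", "intermittent"}


def predicted_positive(history: Dict[int, str], target_year: int, n_years: int) -> bool:
    """True if the patient was 'chronic' in each of the n_years before target_year."""
    return all(
        history.get(target_year - k) == "chronic" for k in range(1, n_years + 1)
    )


def ppv(records: Records, target_year: int, n_years: int) -> Tuple[Optional[float], int]:
    """Return (PPV, subgroup size) for the rule in one target year.

    Assumption: patients with no status recorded in the target year are excluded.
    """
    selected = [
        h for h in records.values()
        if target_year in h and predicted_positive(h, target_year, n_years)
    ]
    if not selected:
        return None, 0
    true_pos = sum(h[target_year] in POSITIVE for h in selected)
    return true_pos / len(selected), len(selected)


# Toy data, invented purely for illustration
toy = {
    "p1": {2011: "chronic", 2012: "chronic", 2013: "chronic"},
    "p2": {2011: "intermittent", 2012: "chronic", 2013: "intermittent"},
    "p3": {2011: "chronic", 2012: "chronic", 2013: "none"},
    "p4": {2011: "none", 2012: "none", 2013: "none"},
}

for n in (1, 2):
    print(n, ppv(toy, 2013, n))
```

Looping over the number of years shows the trade-off described above: requiring more consecutive years shrinks the selected subgroup, and the PPV is recomputed on that smaller group.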
Once a final prediction rule had been selected, we used logistic regression to assess the robustness of the prediction rule by calculating the PPV adjusted for other patient characteristics. The covariates included in the model were as follows: age, pancreatic status, FEV1, BMI, presence of CFRD, IMD, and IV days all as fixed effects, along with center as a random effect. The functional form of continuous covariates was assessed visually using lowess smoothing 12 and using fractional polynomials 13 ; this is particularly important for age, since the relationship has been shown to be nonlinear within younger patients. 14 As multiple years' data are available to make predictions, the model was repeated with all data possible, allowing the consistency of the covariates' effect on the prediction to be assessed. The discriminant ability of the model was quantified using the area under the receiver operating characteristic curve. Since most patients contributed to more than 1 year, the findings of separate predictive models were not combined.
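For readers less familiar with the area under the curve, it can be read as the probability that a randomly chosen patient with the outcome receives a higher model score than a randomly chosen patient without it, with ties counted as one half. The sketch below computes it directly from that definition; the function name `auc` and the example scores are hypothetical, and this is not the routine used in the study.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 = outcome present, 0 = outcome absent.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive case ranked above negative case
            elif p == n:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


# Invented scores for four patients; three of the four positive/negative
# pairs are ordered correctly.
print(auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
```

A value of 0.5 corresponds to no discrimination and 1.0 to perfect discrimination.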
Funnel plots 15 were created to show the PPV by center and identify any outliers from the prediction rule, using both unadjusted and adjusted PPV. No imputation was performed on missing data. All analyses were performed using R V3.4.1 16 and Stata V15. 17
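The idea behind a funnel plot can be illustrated with control limits around the overall PPV that widen as the center's subgroup gets smaller. The sketch below uses a normal approximation to the binomial, which is an assumption made here for simplicity; the limits used in the study may have been derived differently (for example, from exact binomial quantiles). The function names and the numbers are hypothetical.

```python
import math
from typing import Tuple


def funnel_limits(p_overall: float, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Approximate control limits for a center with n patients.

    Assumption: normal approximation to the binomial, clipped to [0, 1].
    z = 1.96 gives roughly 95% limits.
    """
    se = math.sqrt(p_overall * (1.0 - p_overall) / n)
    return max(0.0, p_overall - z * se), min(1.0, p_overall + z * se)


def is_outlier(center_ppv: float, n: int, p_overall: float, z: float = 1.96) -> bool:
    """True if the center's PPV falls outside the funnel at its sample size."""
    lower, upper = funnel_limits(p_overall, n, z)
    return center_ppv < lower or center_ppv > upper


# Invented example: the same center-level PPV of 85% judged against an
# overall PPV of 96% at two different subgroup sizes.
print(is_outlier(0.85, 100, 0.96))  # True: outside the narrow limits at n = 100
print(is_outlier(0.85, 8, 0.96))    # False: within the wide limits at n = 8
```

This shows why a low PPV in a very small center is not necessarily flagged: the limits at small n are wide enough to contain it.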

| RESULTS
The flowchart in Figure 1 Table 2).
Using the prediction rule of chronic PA status in previous years, the optimum number of consecutive years' data to use was explored and the results are shown in Table 3. As expected, the PPV increases with extra years' data (range: 94%-98%), and Figure 2 shows this against the number within the subset, which decreases as more years' data are included. Taking the two together suggests the optimal threshold for the prediction rule is to select those with two successive years of a chronic PA status to predict the current year's status, thereby allowing a sufficiently large sample to be used for future research (average N = 1862) without appreciably reducing the confidence in the accuracy of the diagnosis (average PPV = 96%). The contingency table for the PPV calculations of this chosen prediction rule for 2013 is shown in Table 4, and the contingency tables for 2014 and 2015 are in Table A1 and the resulting funnel plots in Figure A1.

| Center comparisons
The PPV was reasonably consistent across the 27 centers (range: 85%-100%). The number selected in the subgroup for each center had a median of at least 49 (from 2013 prediction year) with a range from 8 to 279 (over all prediction years). More details are in Table A2.

| Predictive modeling
The final analyses sought to identify whether the PPV was associated with the patient's demographic and clinical features. These analyses included patients who were recorded as having PA in two successive years and were aged under 60. Patients over 60 are both uncommon and much more likely to be pancreatic sufficient than other age groups, suggesting that they have a preponderance of less severe phenotypes. Analyses were undertaken separately for patients in 2013, 2014, and 2015 (Table 5).
The models for three different years showed no consistency for significant predictors, and only age appeared more than once.
In all years, the predictive ability of the model was low, with areas under the curve of 0.66, 0.68, and 0.63, respectively, suggesting a small influence of the variables on the prediction.
This suggests that the chosen rule of "chronic" PA for two
TABLE 2 PPV for three initial prediction rules
There are limitations to this work due to the data architecture of the UK CF Registry. Firstly, if PA status is reported as "none," this could be because a person did not have PA infection, but it may alternatively arise from the patient not producing any sputum at clinic visits or from missing data. Secondly, advances in CF treatments, including treatments for PA infection, [18][19][20] mean that the definition in this study should be reexplored with future registry data. Thirdly, although guidance on the definition of chronic and intermittent P. aeruginosa status is provided by the CF Registry, it is uncertain how closely this definition is followed by individual CF centers. As encounter-based data are not available for the CF Registry dataset that was used, confirmation was not possible, and so an important next step will be to validate this rule against a gold standard definition such as the Leeds criteria, 21
with higher adherence to treatments linked to better health outcomes. 26 The subgroup of people predicted to have PA in the current year based on historic registry data provides a homogeneous group in whom to compare adherence rates, with consensus about minimum treatment regimens (mucolytic plus antibiotic), allowing a minimum denominator to be imputed if prescription data are unavailable. [27][28][29]
Additionally, the use of convenience samples is a well-understood limitation of comparing adherence between cohorts or centers. 30