The importance of data issues when comparing cystic fibrosis registry outcomes between countries: Are annual review FEV1 in the UK only collected when subjects are well?

Abstract

Rationale, aims and objectives Cross‐country comparisons of cystic fibrosis (CF) outcomes can potentially identify variation in care but depend on data quality. An important assumption is that the UK annual review FEV1 is only collected during periods of clinical stability. If this assumption does not hold, FEV1 comparisons may be biased in favour of registries with encounter‐based FEV1. We aimed to test the assumption that CF annual reviews in the UK are only performed during periods of clinical stability.

Method Prospective encounter‐based data collected in Sheffield (n = 174) were used to establish whether annual review FEV1 readings were always collected during periods of clinical stability and to determine the group‐level discrepancy between annual review and best FEV1. We then quantified the group‐level discrepancy between annual review and best annual FEV1 readings within the UK registry (n = 2995) to determine whether the differences observed in Sheffield also apply to the wider UK data.

Results In Sheffield, the group‐level discrepancy between best and annual review FEV1 was −2.5% (95% CI −3.9% to −1.2%) for annual reviews performed during periods of clinical stability (n = 50) and larger, at −8.0% (95% CI −11.2% to −4.9%), for annual reviews performed during periods of clinical instability (n = 13). The magnitude of this group‐level discrepancy is therefore a surrogate for the proportion of clinically stable annual reviews: a smaller discrepancy indicates a higher proportion of clinically stable annual reviews, and vice versa. Around 20% of the clinician‐reviewed annual reviews in Sheffield were performed during periods of clinical instability. The overall group‐level discrepancy in the UK registry (−5.6%, 95% CI −5.9% to −5.4%) was similar to that in Sheffield (−6.1%, 95% CI −7.1% to −5.1%).
Conclusions Annual review FEV1 underestimates lung health of adults with CF in the UK and may bias cross‐country comparisons.

because of the assumption that the UK annual review FEV1 was always collected "when subjects are well." [7] This assumption has never been formally tested.
We investigated this issue using prospective encounter-based FEV1 data from the Sheffield Adult CF Centre to establish whether annual review FEV1 readings were always collected during periods of clinical stability.
We then repeated the analysis using data from the UK CF registry to determine whether the Sheffield findings also apply UK-wide.

| METHODS AND MATERIALS
Encounter-based FEV1 data were prospectively collected at the Sheffield Adult CF Centre between 1 January and 31 December 2016 from every adult who contributed data to the UK CF registry, excluding those who had undergone lung transplantation (n = 7) or were on ivacaftor (n = 13).
Annual reviews were performed according to usual practice. In addition, clinicians' opinion of health status and Fuchs' criteria [10] were recorded at every encounter involving clinician review, including outpatient clinics, ward reviews, and home visits. An FEV1 reading was deemed to be taken during a period of clinical stability if there was no exacerbation, no requirement for intravenous antibiotics, and ≤3 Fuchs' symptoms were present. Each annual review FEV1 was matched to the other clinically stable FEV1 reading recorded closest in time to the annual review.
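The stability rule and the nearest-in-time matching described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code (the analyses were run in SPSS): the `Encounter` record, its field names, and the tie-breaking rule are hypothetical choices made for the example.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Encounter:
    """One clinician-reviewed encounter with an FEV1 reading (hypothetical record layout)."""
    when: date
    fev1_pct: float        # FEV1 % predicted
    exacerbation: bool     # clinician-judged exacerbation at this encounter
    iv_antibiotics: bool   # intravenous antibiotics required
    fuchs_symptoms: int    # number of Fuchs' symptoms present
    annual_review: bool = False


def is_clinically_stable(e: Encounter) -> bool:
    """Stability rule from the Methods: no exacerbation, no IV antibiotics, <=3 Fuchs' symptoms."""
    return not e.exacerbation and not e.iv_antibiotics and e.fuchs_symptoms <= 3


def match_stable(review: Encounter, encounters: list[Encounter]) -> Optional[Encounter]:
    """Return the other clinically stable encounter closest in time to `review`.

    Ties are broken by list order (min() keeps the first minimum it finds);
    this tie-breaking rule is an assumption of the sketch.
    """
    candidates = [e for e in encounters if e is not review and is_clinically_stable(e)]
    if not candidates:
        return None
    return min(candidates, key=lambda e: abs((e.when - review.when).days))
```

The paired difference for one person is then `review.fev1_pct - match.fev1_pct`, which is negative when the matched stable reading is higher.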
The mean paired difference (annual review minus matched stable FEV1), its 95% CI, and the paired t test P-value were calculated.
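A minimal sketch of the paired comparison, assuming two equal-length lists of annual review and matched stable FEV1 (% predicted). The function name `paired_summary` is hypothetical. The t statistic itself is the usual paired t, but to stay within the Python standard library the CI and P-value use a normal approximation to the t distribution, which is only reasonable for large samples; for small n an exact paired t test (for example `scipy.stats.ttest_rel`) would be preferable.

```python
import math
import statistics
from statistics import NormalDist


def paired_summary(annual: list[float], stable: list[float], alpha: float = 0.05) -> dict:
    """Mean paired difference (annual - stable) with a paired t statistic.

    The CI and two-sided P-value use a normal approximation to the t distribution,
    which is an assumption of this sketch and only reasonable for large n.
    """
    if len(annual) != len(stable) or len(annual) < 2:
        raise ValueError("need at least 2 complete pairs")
    diffs = [a - s for a, s in zip(annual, stable)]
    n = len(diffs)
    mean_d = statistics.fmean(diffs)
    se = statistics.stdev(diffs) / math.sqrt(n)
    if se == 0:
        raise ValueError("all paired differences are identical")
    t = mean_d / se                              # paired t statistic, df = n - 1
    z = NormalDist().inv_cdf(1 - alpha / 2)      # about 1.96 for alpha = 0.05
    p = 2 * (1 - NormalDist().cdf(abs(t)))       # approximate two-sided P-value
    return {"n": n, "mean_diff": mean_d, "se": se, "t": t,
            "ci": (mean_d - z * se, mean_d + z * se), "p_approx": p}
```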
Non-parametric comparisons were also performed to check the robustness of the results.
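The text does not name the non-parametric test. Assuming it was a Wilcoxon signed-rank test, a common choice for paired data, the statistic can be sketched as below. The function name, the tie-corrected normal approximation, and the absence of a continuity correction are all assumptions of this sketch, so results may differ slightly from SPSS output, particularly for small samples.

```python
import math
from statistics import NormalDist


def wilcoxon_signed_rank(annual: list[float], stable: list[float]) -> tuple[float, float, float]:
    """Return (W+, z, two-sided P) for the paired differences annual - stable.

    Zero differences are dropped, tied |differences| get average ranks, and the
    variance is tie-corrected; no continuity correction is applied.
    """
    diffs = [a - s for a, s in zip(annual, stable) if a != s]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda k: abs(diffs[k]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        # Extend j over the run of equal absolute differences starting at i.
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1           # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        size = j - i + 1
        tie_term += size ** 3 - size
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean_w = n * (n + 1) / 4
    var_w = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean_w) / math.sqrt(var_w)
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return w_plus, z, p
```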
The UK registry has no "stable FEV1" data but has collected best FEV1 data since 2012 for the European registry [11]. We therefore quantified the group-level discrepancy between best FEV1 and annual review FEV1 among people aged ≥16 years in both the Sheffield 2016 dataset (where best FEV1 is the highest FEV1 reading between 1 January and 31 December 2016) and the UK registry 2014 dataset, to determine whether the differences observed in Sheffield also apply UK-wide.
The UK registry data were collected during annual reviews between 1 January and 31 December 2014. Best FEV1 in the UK registry represents the highest FEV1 reading in the 1-year period before the date of annual review (e.g. for a person whose annual review was on 1 July 2014, the highest FEV1 reading between 1 July 2013 and 1 July 2014 should be recorded as that person's "best FEV1" for 2014).
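The two best FEV1 definitions and the group-level discrepancy can be illustrated as follows. This sketch assumes each person's readings are available as (date, FEV1) pairs, which holds for the Sheffield encounter data but not for the UK registry, where best FEV1 is entered as a single value; treating the window boundaries as inclusive is also an assumption. All function names are hypothetical.

```python
from datetime import date
from typing import Optional


def one_year_before(d: date) -> date:
    """Same calendar day one year earlier; 29 February maps to 28 February."""
    try:
        return d.replace(year=d.year - 1)
    except ValueError:
        return d.replace(year=d.year - 1, day=28)


def best_fev1_uk(readings: list[tuple[date, float]], review_date: date) -> Optional[float]:
    """Highest FEV1 in the 1-year period up to the annual review (boundaries assumed inclusive)."""
    start = one_year_before(review_date)
    window = [v for d, v in readings if start <= d <= review_date]
    return max(window) if window else None


def best_fev1_calendar_year(readings: list[tuple[date, float]], year: int) -> Optional[float]:
    """Highest FEV1 within one calendar year, as for the Sheffield 2016 data."""
    in_year = [v for d, v in readings if d.year == year]
    return max(in_year) if in_year else None


def group_discrepancy(pairs: list[tuple[float, float]]) -> float:
    """Mean of (annual review FEV1 - best FEV1); negative when best exceeds annual review."""
    if not pairs:
        raise ValueError("no pairs")
    return sum(ar - best for ar, best in pairs) / len(pairs)
```

Because the annual review reading itself falls inside either window, best FEV1 cannot be lower than the annual review FEV1 for the same person whenever that reading is included, so each individual difference, and hence the group-level discrepancy, is ≤0 by construction.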
People in the UK registry who had undergone lung transplantation (n = 330) or were on ivacaftor (n = 281; ivacaftor has a transformative effect on lung health but was not commercially available in 2010 [12]) were excluded. People attending the Sheffield Adult CF Centre were also excluded to avoid analysing the same cohort twice.
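The exclusion steps, together with the age ≥16 restriction stated above, can be expressed as a simple filter. The record fields and the centre label are hypothetical, and counting each excluded person under the first applicable reason is a choice made for this sketch; the text does not say how overlapping exclusions were counted.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class RegistryRecord:
    """Hypothetical registry row; the real UK CF registry schema is not described in the text."""
    person_id: str
    age: int
    lung_transplant: bool
    on_ivacaftor: bool
    centre: str


def select_cohort(records: list[RegistryRecord], excluded_centre: str = "SHEFFIELD_ADULT"):
    """Apply the exclusions in order and count each excluded record under its first reason."""
    included: list[RegistryRecord] = []
    excluded: Counter = Counter()
    for r in records:
        if r.age < 16:
            excluded["aged <16"] += 1
        elif r.lung_transplant:
            excluded["lung transplant"] += 1
        elif r.on_ivacaftor:
            excluded["ivacaftor"] += 1
        elif r.centre == excluded_centre:
            # Avoid analysing the Sheffield cohort twice.
            excluded["Sheffield centre"] += 1
        else:
            included.append(r)
    return included, excluded
```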
All analyses were performed using SPSS v22 (IBM Corp, Armonk, NY, USA). Where statistical tests were performed, a P-value <.05 was considered statistically significant. Regulatory approval for the analysis of prospective Sheffield data was granted by the

| RESULTS
A total of 174 adults were included in the Sheffield analysis and 2995 adults in the UK CF registry analysis. Adults with and without best FEV1 data in the UK CF registry had similar clinical characteristics (see Table 1). In Sheffield, there was a significant group-level difference between annual review and matched clinically stable FEV1 (mean −2.9%, 95% CI −3.8% to −1.9%), with similar differences whether the paired readings were ≤30 days or >30 days apart. Not every episode of clinical instability was accompanied by an acute FEV1 decline, but variability in FEV1 measurements meant that best FEV1 would tend to exceed annual review FEV1 even when the annual review was performed during clinical stability. In Sheffield, the group-level discrepancy between best and annual review FEV1 was −2.5% (95% CI −3.9% to −1.2%) for annual reviews performed during periods of clinical stability (see Table 2). For annual reviews performed during periods of clinical instability, the discrepancy was larger at −8.0% (95% CI −11.2% to −4.9%). In Sheffield, where 20% of the clinician-reviewed annual reviews were performed during periods of clinical instability, the overall group-level discrepancy between best and annual review FEV1 was −6.1% (95% CI −7.1% to −5.1%).
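The "20%" figure can be checked against the counts reported in the abstract for clinician-reviewed annual reviews in Sheffield (n = 50 performed during stability, n = 13 during instability), assuming these are the relevant denominators:

```latex
p_{\text{unstable}}
  = \frac{n_{\text{unstable}}}{n_{\text{stable}} + n_{\text{unstable}}}
  = \frac{13}{50 + 13}
  = \frac{13}{63}
  \approx 0.206
```

This proportion refers to the 63 clinician-reviewed annual reviews rather than to all 174 adults in the Sheffield cohort.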
A similar overall group-level discrepancy of −5.6% (95% CI −5.9% to −5.4%) was observed in the UK registry, suggesting that the proportion of annual reviews performed during periods of clinical instability across the UK was similar to that in Sheffield. This discrepancy was larger among younger adults, mirroring the pattern of FEV1 discrepancy observed in the US-UK comparison [7]. Similar results were obtained with non-parametric comparisons (see Table 3), suggesting that our estimates are robust.

| DISCUSSION
This is the first study to empirically demonstrate that annual review FEV1 readings in the United Kingdom were not always collected during periods of clinical stability. The group-level discrepancy between best and annual review FEV1 was larger for annual reviews performed during periods of clinical instability than for those performed during periods of stability. The magnitude of this discrepancy is therefore a surrogate for the proportion of clinically stable annual reviews: a smaller discrepancy indicates a higher proportion of annual reviews performed during periods of stability, and vice versa. Our results suggest that around 20% of all annual reviews in the United Kingdom may be performed during periods of clinical instability and that, at a group level, annual review FEV1 in the UK registry underestimated the lung health of adults with CF by 2% to 4% compared with clinically stable FEV1.
This may bias the US-UK FEV1 comparison against the UK, because FEV1 when stable was the intended comparison metric in that analysis. %FEV1 predicted in our analysis was calculated using the Knudson equation, but similar results would be expected with the GLI equation because we calculated the paired difference between 2 FEV1 readings from the same person [13]. Our analysis was restricted to adults because of data availability in Sheffield. Although most of the US-UK FEV1 differences were among younger people, the lack of differences among older adults does not exclude the possibility that group-level lung health in the United Kingdom was being underestimated.
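One way to see why the choice of reference equation matters little for a paired difference: write each reading as % predicted, with $F$ the measured FEV1 and $P$ the person's predicted value under a given equation (notation ours). For two readings taken within about a year, age and height barely change, so $P_a \approx P_b \approx P$:

```latex
\Delta_{\%} = 100\,\frac{F_a}{P_a} - 100\,\frac{F_b}{P_b}
  \approx \frac{100}{P}\,(F_a - F_b),
\qquad
\Delta_{\%}^{\mathrm{GLI}} \approx \Delta_{\%}^{\mathrm{Knudson}} \cdot \frac{P^{\mathrm{Knudson}}}{P^{\mathrm{GLI}}}
```

Under this approximation the equation choice only rescales each paired difference by the ratio of the two predicted values, so the argument holds to the extent that Knudson and GLI predictions are close for a given person; this is a sketch of the reasoning, not a formal equivalence.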
Our analysis cannot conclusively prove that the US-UK FEV1 comparison was biased, because some "clinically unstable" FEV1 readings in the United States may be mislabelled as "clinically stable." However, we speculate that underestimation of lung health may be more of a problem with the UK data entry system, which does not have encounter-based FEV1 data. Data are typically entered only once a year in the UK, with a mid-January deadline to complete data entry for the preceding year, yet annual reviews are staggered throughout the year because of capacity issues. Around 40% of annual reviews are performed during the final quarter of the year [7], when exacerbation risks are higher [14].

… nutritional outcomes), which is not surprising given that Australian children were much more likely to be diagnosed after newborn screening (65.8%) than US children (7.2%) [5]. Australia also delivered more aggressive treatment for pulmonary exacerbations [5], which contributes to better lung health [16][17][18]. Despite the very strong correlation between nutritional outcomes and lung health [19][20][21], FEV1 values were actually similar between Australian and US children [5]. In fact, Australian children had significantly lower FEV1 after adjusting for the mode of diagnosis [5]. In 2003, the US registry started collecting encounter-based FEV1 data, whilst the Australian registry was collecting FEV1 data annually [5]. It may be that annual FEV1 in Australia was underestimating the lung health of Australian children, which could explain the disconnect between nutritional outcomes and lung health observed in the US-Australia comparison.
Differences in outcomes detected by registry comparisons attract significant attention; hence, a rigorous process should be adopted to interpret the results. The "pyramid of investigation" model advocates an incremental approach to understanding outcome variation, starting with data review and inferring differences in the quality of care (e.g. mucolytic prescriptions) only where the data are robust. Attention should be paid to differences in data collection systems, because systematic bias in data cannot easily be controlled with statistical methods, even for objective outcomes such as survival [22]. Best FEV1 may be more reliable than annual review FEV1 but may still underestimate lung health if these data are collected only once a year, as suggested by the US-Australia comparison. Indeed, best FEV1 data are most robust when all FEV1 readings are recorded in a single database, so that the highest reading over a given time period can be identified automatically and accurately. Harmonizing the data collection systems of CF registries around the world using encounter-based data entry would enable more accurate cross-country comparisons and would also allow potentially more sensitive metrics, such as FEV1 variability, to be used for comparison [23]. Systematic data differences should be considered when analysing data and interpreting results from cross-country registry comparisons.
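A single encounter-based store turns best FEV1 into a query over recorded readings rather than a manually entered value. The sketch below is hypothetical: the class and method names are ours, and the within-person standard deviation is used only as one possible definition of FEV1 variability; the metric in the cited work may be defined differently.

```python
import statistics
from collections import defaultdict
from datetime import date
from typing import Optional


class EncounterStore:
    """Hypothetical single store of every FEV1 reading, keyed by person."""

    def __init__(self) -> None:
        # person_id -> list of (date, FEV1 % predicted)
        self._readings: defaultdict[str, list[tuple[date, float]]] = defaultdict(list)

    def add(self, person_id: str, when: date, fev1_pct: float) -> None:
        self._readings[person_id].append((when, fev1_pct))

    def _window(self, person_id: str, start: date, end: date) -> list[float]:
        # .get() avoids creating empty entries for unknown people.
        return [v for d, v in self._readings.get(person_id, []) if start <= d <= end]

    def best(self, person_id: str, start: date, end: date) -> Optional[float]:
        """Highest reading in [start, end], identified automatically."""
        vals = self._window(person_id, start, end)
        return max(vals) if vals else None

    def variability_sd(self, person_id: str, start: date, end: date) -> Optional[float]:
        """Sample SD of readings in [start, end]; an assumed variability metric."""
        vals = self._window(person_id, start, end)
        return statistics.stdev(vals) if len(vals) >= 2 else None
```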
We have demonstrated that UK annual review FEV1 readings are not always collected during periods of clinical stability. This has potential implications for comparisons with the US registry, which collects encounter-based FEV1.