Feasibility of using Clinical Practice Research Datalink data to identify patients with chronic obstructive pulmonary disease to enrol into real‐world trials

Abstract

Purpose: To assess the feasibility of using Clinical Practice Research Datalink (CPRD) data for identifying populations of patients with chronic obstructive pulmonary disease (COPD) eligible for a hypothetical pragmatic trial.

Methods: A retrospective multidatabase cohort study using CPRD primary care and linked secondary care data to describe the characteristics of populations of patients with COPD. Patients' demographic and lifestyle factors, comorbidity profile, spirometry measurements and treatment changes were evaluated, as was the distribution of follow-up time and types of losses during follow-up. Characteristics were evaluated using descriptive statistics.

Results: A total of 322 991 patients from 1148 primary care practices in the United Kingdom across two CPRD primary care databases, CPRD GOLD and CPRD Aurum, were potentially eligible to participate in a hypothetical trial using CPRD, starting on 31 December 2017. Patients with COPD in CPRD GOLD and CPRD Aurum were comparable in terms of age (median age 70 vs. 68 years), gender (50% vs. 52% male), disease severity (e.g., 25% vs. 24% Medical Research Council [MRC] dyspnoea score grades 3-5) and history of respiratory conditions (e.g., 43% vs. 38% asthma). High proportions of patients with COPD in CPRD GOLD and CPRD Aurum were available on 31 December 2012 for follow-up at 1, 2, and 5 years (92%, 85% and 67%, respectively).

Conclusions: Patients and data from CPRD GOLD and CPRD Aurum were comparable across key aspects relevant to COPD trials. A pragmatic trial using CPRD to recruit patients with COPD is scientifically feasible.


| INTRODUCTION
Pragmatic trials test the real-world effectiveness of treatments with the aim of informing clinical practice, allowing diverse populations to be studied and providing good external generalisability of the trial results. 1 Interest in these trials has increased, partly owing to technological advances and increased use of electronic health records (EHRs). 2,3 The Salford Lung Study (SLS) is an example of a real-world pragmatic trial with broad inclusion/limited exclusion criteria, providing data with direct clinical applicability. The Salford Integrated Record was used to link primary and secondary care data in the SLS, providing comprehensive patient-level EHRs. 4 The Clinical Practice Research Datalink (CPRD) is a real-world research service supporting retrospective clinical studies. 5 CPRD collects, cleans and processes de-identified patient data using EHRs from a sample of general practitioner (GP), that is, primary care, practices in the UK that use either the Vision or EMIS software systems, contributing to the CPRD GOLD or CPRD Aurum primary care databases, respectively. 6,7 These de-identified databases have been individually linked to secondary care and other health- and area-based datasets. 5 EHR data can be linked to a trial database for interventional trials. 8,9 CPRD GOLD has been used previously for observational respiratory research, and validated algorithms are available to identify patients with chronic obstructive pulmonary disease (COPD). 10 It has also been used to determine outcomes such as acute exacerbations of COPD with high specificity and positive predictive value. 10,11 However, observational validation studies have yet to be replicated using CPRD Aurum, and to date no pragmatic trials in patients with COPD have been conducted using either CPRD database.
The overall aim of the current study was to assess the feasibility of using CPRD data to enrol patients in a hypothetical future trial comparing the real-world effectiveness and safety of newly authorised COPD maintenance therapies.

| Objectives
The study had three specific objectives: (1) to estimate the number of primary care patients with COPD in CPRD databases, for whom data were being actively collected from GP practices on

| Study design
In this retrospective study, patients with COPD were identified based on diagnosis codes recorded in primary care using an algorithm previously validated in CPRD GOLD. 10 A cross-sectional design was used to enumerate and describe the population registered in CPRD on 31 December 2017 (objectives 1 and 2). A cohort design was used to describe the distribution of follow-up time and the types of loss-to-follow-up in patients who would have been eligible to participate in the hypothetical pragmatic trial on 31 December 2012 (objective 3).
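The cohort calculation behind objective 3 can be illustrated with a minimal sketch: for each patient, compare the end of follow-up with the enrolment date plus k years, and report the share still under follow-up. The function names, the toy end-of-follow-up dates and the handling of 29 February are assumptions made for illustration, not the authors' actual code or data.

```python
from datetime import date

def add_years(d: date, years: int) -> date:
    """Return the same calendar day `years` later (29 Feb falls back to 28 Feb)."""
    try:
        return d.replace(year=d.year + years)
    except ValueError:
        return d.replace(year=d.year + years, day=28)

def proportion_followed(end_dates, enrolment: date, years: int) -> float:
    """Share of patients whose follow-up ends on or after enrolment + `years`."""
    cutoff = add_years(enrolment, years)
    retained = sum(1 for end in end_dates if end >= cutoff)
    return retained / len(end_dates)

# Hypothetical enrolment date for objective 3, as stated in the study design.
ENROLMENT = date(2012, 12, 31)

# Toy end-of-follow-up dates (e.g., death, transfer out, last data collection).
toy_end_dates = [
    date(2013, 6, 30),   # lost within the first year
    date(2014, 12, 31),  # followed for exactly 2 years
    date(2016, 3, 1),    # lost between years 3 and 4
    date(2018, 12, 31),  # followed beyond 5 years
]

for k in (1, 2, 5):
    print(f"{k} year(s): {proportion_followed(toy_end_dates, ENROLMENT, k):.0%}")
```

A real analysis would also tabulate the reason for each loss (the "types of loss-to-follow-up"), which this sketch omits.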

| Data sources
UK primary care data from CPRD GOLD (December 2018 release) and English primary care data from CPRD Aurum (January 2019 release) were analysed. CPRD GOLD included 16 million individuals (with acceptable quality medical records) from 1987 onwards, of whom 2.2 million were patients from whom data were being actively collected. CPRD Aurum included 22 million individuals, of whom 7.3 million were patients from whom data were being actively collected. Coded diagnostic data from the CPRD person-level deterministically-linked Hospital Episode Statistics (HES) Admitted Patient Care data were also analysed. National Health Service (NHS) Digital performed linkage of CPRD data to HES using an 8-stage deterministic methodology. Study investigators had full access to the databases used to create the study population.

KEY POINTS
• Real-world data from primary-care electronic health records allow for identification of a large, well-characterised cohort of patients, often used to undertake safety studies, long-term natural history studies or comparative effectiveness research.
• Data from the Clinical Practice Research Datalink (CPRD) GOLD and CPRD Aurum databases were analysed to evaluate the feasibility of identifying patients with chronic obstructive pulmonary disease (COPD) for enrolment into real-world trials.
• A total of 322 991 patients from 1148 general practices in the United Kingdom were identified from CPRD databases for potential trial enrolment using a case study approach.
• Patients and data from the CPRD GOLD and CPRD Aurum primary care databases were broadly comparable across key aspects relevant to a COPD trial.
• A pragmatic trial using the CPRD to recruit patients with COPD is scientifically feasible.

| Study populations
The study population comprised patients with COPD registered in CPRD GOLD and CPRD Aurum practices in the UK, including patients in research-active practices and in those eligible for linkage to HES data, forming six total source populations (Figure 1). Patients met the study eligibility criteria if they had a COPD clinical code 10

FIGURE 1 Flowchart of patients in the study. COPD, chronic obstructive pulmonary disease; CPRD, Clinical Practice Research Datalink; FEV1, forced expiratory volume in 1 s; FVC, forced vital capacity; GP, general practitioner; HES, hospital episode statistics; ICS, inhaled corticosteroid; LABA, long-acting beta-2 agonist; LAMA, long-acting muscarinic antagonist.
† Six source populations were: (1) practices contributing to CPRD GOLD, (2) practices contributing to CPRD Aurum, (3) research-active practices contributing to CPRD GOLD, (4) research-active practices contributing to CPRD Aurum, (5) research-active practices contributing to CPRD GOLD that are also eligible for linkage with secondary-care data (HES) and (6) research-active practices contributing to CPRD Aurum that are also eligible for linkage with secondary-care data (HES).
‡ Enrolment dates: For objectives 1 and 2 (enumeration and description of the populations), the enrolment date is 31 December 2017; for objective 3 (description of distribution of follow-up time and the types of loss-to-follow-up), it is 31 December 2012.
§ For CPRD GOLD, the requirement was ≥365 days of up-to-standard data available.
¶ Patients met the COPD case definition if they had a COPD clinical code defined by Quint et al 10 in all available history prior to or on the enrolment date and were ≥35 years old on enrolment date.
Note: The criteria for spirometry and maintenance therapy are not mutually exclusive.
Spirometry subgroup: Patients were required to have a ratio of FEV1/FVC measurements of <0.7 recorded at any time on or prior to enrolment date.
Treatment change subgroup: Patients were required to have received ≥1 prescription for long-acting COPD maintenance inhalation therapy (e.g., LABA with or without ICS) in the 12 months on or prior to enrolment with evidence of treatment change (initiation of a specific active substance or combination of active substances) in the 6 months on or prior to the hypothetical enrolment date of 31 December 2017.
Open triple therapy subgroup: Patients were required to have been treated continuously with open triple therapy (ICS, LABA and LAMA; excluding fluticasone furoate/vilanterol trifenatate/umeclidinium bromide) for a duration of at least 3 months on or prior to hypothetical enrolment with a documented history of at least one moderate or severe exacerbation (as defined in Table S1 in Data S1) in the year on or prior to 31 December 2017.
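The COPD case definition and the spirometry criterion given in the notes to Figure 1 can be expressed as a small filter. The sketch below is illustrative only: the `Patient` record, its field names and the toy patients are hypothetical, and it does not reproduce the validated code lists of Quint et al or the treatment-based subgroups.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple

@dataclass
class Patient:
    """Hypothetical, simplified patient record (not the CPRD data model)."""
    birth_date: date
    first_copd_code: Optional[date]               # earliest COPD clinical code, if any
    fev1_fvc_ratios: List[Tuple[date, float]]     # (measurement date, FEV1/FVC ratio)

def age_on(birth: date, on: date) -> int:
    """Age in completed years on a given date."""
    return on.year - birth.year - ((on.month, on.day) < (birth.month, birth.day))

def meets_case_definition(p: Patient, enrolment: date) -> bool:
    """COPD code on or before enrolment and aged >= 35 years on enrolment."""
    has_code = p.first_copd_code is not None and p.first_copd_code <= enrolment
    return has_code and age_on(p.birth_date, enrolment) >= 35

def in_spirometry_subgroup(p: Patient, enrolment: date) -> bool:
    """Case definition plus any FEV1/FVC < 0.7 recorded on or before enrolment."""
    return meets_case_definition(p, enrolment) and any(
        when <= enrolment and ratio < 0.7 for when, ratio in p.fev1_fvc_ratios
    )

ENROLMENT = date(2017, 12, 31)  # enrolment date for objectives 1 and 2

toy_patients = [
    Patient(date(1950, 5, 1), date(2010, 1, 1), [(date(2015, 3, 1), 0.62)]),
    Patient(date(1990, 1, 1), date(2016, 1, 1), []),                          # under 35
    Patient(date(1982, 12, 31), date(2010, 1, 1), [(date(2016, 1, 1), 0.75)]),
]
for p in toy_patients:
    print(meets_case_definition(p, ENROLMENT), in_spirometry_subgroup(p, ENROLMENT))
```

Keeping the case definition and the subgroup test as separate functions mirrors the note that the subgroup criteria are not mutually exclusive: each subgroup is a further filter applied to the same eligible population.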

| Variables
Practice characteristics, patient demographics, respiratory history, disease severity, healthcare utilisation and respiratory medications were measured in general practices contributing to CPRD GOLD and CPRD Aurum and in patients who met the study eligibility criteria. Read code lists to identify the covariates and outcomes were based on three previous studies of CPRD data. 10-12 Medical conditions were identified using clinical codes from CPRD GOLD and CPRD Aurum. Details on data sources and method of assessment for each variable can be found in Table S1 in Data S1.

| Ethics
GP practices provided consent for CPRD to collect their patients' de-identified data. Individual patients could opt out of contributing data to CPRD. GP practices provided consent for data to be linked to HES. No patient-identifiable information was available to the study team, or to the study sponsor, GlaxoSmithKline plc. This study was approved by the Independent Scientific Advisory Committee for MHRA database research, protocol number 17_066A.

| Analyses
This study was descriptive and no statistical hypotheses were tested. Analyses were conducted separately for CPRD GOLD and CPRD Aurum practices to describe the number of patients in each database and determine whether the demographic characteristics of patients differed between CPRD GOLD and Aurum, and whether information was

An "unknown" category was created for variables with missing data. Missing data for lung function/airflow limitation were imputed using a combination of FEV1 (litres), height, gender and age.

The data in the study are reported, and current manuscript developed, in line with the REporting of studies Conducted using Observational Routinely-collected health Data (RECORD) statement. 14

| RESULTS

| Objective 1: Patient numbers
Across both the CPRD GOLD and CPRD Aurum source populations, a total of 322 991 patients from 1148 practices were included in the analyses (Table 1). The CPRD Aurum population was considerably larger than the CPRD GOLD population, comprising 82% of the total source population and 74% of all source practices. In all practices, patients with COPD in CPRD GOLD and CPRD Aurum were numerically comparable in terms of age (70 vs. 68 years median age), gender (50% vs. 52% male), and deprivation (50% vs. 49% in most deprived two quintiles; England only). Patients were also comparable in terms of smoking status (38% vs. 40% current smokers) and body mass index (BMI) (60% vs. 59% overweight or obese), though fewer patients in CPRD GOLD had no recorded BMI (7% vs. 19% in CPRD Aurum).
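To give a sense of scale for the reported shares (82% of patients and 74% of practices in CPRD Aurum), the short sketch below converts each rounded percentage into the range of counts it is consistent with. This is back-of-envelope arithmetic under the assumption that the percentages were rounded to the nearest whole number; the exact counts are those reported in Table 1, and the helper name `implied_range` is hypothetical.

```python
TOTAL_PATIENTS = 322_991   # patients across both databases (from the text)
TOTAL_PRACTICES = 1_148    # practices across both databases (from the text)

def implied_range(total: int, pct: int) -> tuple:
    """Counts consistent with a percentage rounded to the nearest whole number."""
    low = (pct - 0.5) / 100 * total
    high = (pct + 0.5) / 100 * total
    return round(low), round(high)

aurum_patients = implied_range(TOTAL_PATIENTS, 82)
aurum_practices = implied_range(TOTAL_PRACTICES, 74)

print("CPRD Aurum patients, approximate range:", aurum_patients)
print("CPRD Aurum practices, approximate range:", aurum_practices)
```

The remainder in each case (roughly 18% of patients and 26% of practices) corresponds to CPRD GOLD.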

| CPRD GOLD versus CPRD Aurum by subgroups
In both databases, patients in the spirometry, treatment change, and open triple therapy subpopulations were comparable with the all-patients group in terms of age, gender, and deprivation (Table 3).

| DISCUSSION
This study assessed the feasibility of using CPRD data to identify patients with COPD to enrol into potential future trials. Our results indicated that a substantial number of patients with COPD identification of rarer safety signals. 17 For example, a real-world study of roflumilast demonstrated higher rates of adverse events in patients with COPD than in randomised controlled trials, leading to discontinuation in one-fifth of patients. 18 The SLS was a pragmatic trial that evaluated the safety and effectiveness of a novel treatment for COPD, compared with current treatments, in a real-world setting. 19

| Strengths and limitations
This study demonstrated several strengths of using CPRD data, Additional investigations on the differences in source populations, recording practices and analytical methods will provide further evidence on comparability of CPRD GOLD and CPRD Aurum. However, the results from this study suggest that any unexplored differences are unlikely to affect the validity of potential future clinical trials based on these data.

| Conclusion
In conclusion, data from CPRD GOLD and CPRD Aurum were shown to be comparable across key aspects relevant to a COPD trial. Using both CPRD GOLD and CPRD Aurum databases to recruit patients with COPD from a real-world setting is scientifically feasible. The large, well-characterised cohort of patients with COPD identified in this study could be used for safety studies, long-term natural history studies, or comparative effectiveness research, reducing the recruitment burden for real-world trials, increasing the recruitment pool, and providing data with direct clinical applicability.

ETHICS STATEMENT
The study used the CPRD database of pseudonymized patient electronic healthcare records; therefore, patients' informed consent was not required. The study protocol was approved by GSK's Protocol