Supplementing electronic health records through sample collection and patient diaries: A study set within a primary care research database

Abstract
Purpose: To describe a novel observational study that supplemented primary care electronic health record (EHR) data with sample collection and patient diaries.
Methods: The study was set in primary care in England. A list of 3974 potentially eligible patients was compiled using data from the Clinical Practice Research Datalink. Interested general practices opted into the study then confirmed patient suitability and sent out postal invitations. Participants completed a drug-use diary and provided saliva samples to the research team to combine with EHR data.
Results: Of 252 practices contacted to participate, 66 (26%) mailed invitations to patients. Of the 3974 potentially eligible patients, 859 (22%) were at participating practices, and 526 (13%) were sent invitations. Of those invited, 117 (22%) consented to participate of whom 86 (74%) completed the study.
Conclusions: We have confirmed the feasibility of supplementing EHR with data collected directly from patients. Although the present study successfully collected essential data from patients, it also underlined the requirement for improved engagement with both patients and general practitioners to support similar studies.


| INTRODUCTION
United Kingdom (UK) primary care electronic health records (EHR) are a valuable data source for epidemiological research as they contain a broad range of prospectively collected data for large samples of the population. However, because the purpose of data collection is routine health care delivery, certain information relevant to specific research questions may not be captured. Such questions would therefore require an alternate data source, or for primary care EHR to be supplemented with the missing information.
In this brief report, we describe a study in which new data were collected directly from patients to supplement EHR data. This builds on prior examples such as the STAGE study 1,2 that demonstrated the feasibility of supplementing primary care EHR from the Clinical Practice Research Datalink (CPRD) 3 with genetic data.
The purpose of our study was to investigate adrenal insufficiency following glucocorticoid exposure in patients with rheumatoid arthritis (RA). Adrenal insufficiency, which has non-specific symptoms, 4,5 is likely to be under-reported or misclassified in the EHR. Additionally, prescription data may differ from true drug exposure due to factors such as nonadherence. 6 We therefore collected saliva samples from participants, using cortisol levels to define adrenal insufficiency, and collected information about glucocorticoid exposure using a patient-reported diary. We describe the study methodology, present the recruitment rate and success of sample collection, and discuss the limitations.

| METHODS
This was an observational study set within English primary care.

| Study population
A search based on the following inclusion and exclusion criteria was applied to the full CPRD dataset. Inclusion criteria were: (1) diagnosis of RA (defined using a validated algorithm 7 ), (2) aged 16 or over, (3) registered at an English general practice, and (4) prescribed oral glucocorticoids within the last 2 years. Exclusion criteria were adrenal insufficiency unrelated to glucocorticoid use, other conditions or treatments with the potential to affect adrenal function, or less than 2 years of data within CPRD. The list was generated in June 2015 and updated in December 2015. General practitioners were asked to screen the list of patients to confirm eligibility and exclude patients they judged unsuitable (eg, unable to give consent based on English-language information sheets, recent bereavement). All participants gave their consent to take part in the study. We aimed to recruit 400 participants.
Based on the search performed in December 2015, there were 19 665 patients with RA who were currently active in practices contributing to CPRD and registered at an English general practice.
Of these patients, 50% had never used oral glucocorticoids (GCs), 29% had not used oral GCs within the last 2 years, and approximately 1% were excluded for having less than 2 years of data within CPRD or having a condition known to affect the adrenal glands. The remaining 3974 (20%) were the population considered potentially eligible for inclusion in the study.
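The inclusion and exclusion criteria above can be read as a simple per-patient filter. The following is a minimal sketch only: the `PatientRecord` fields and the `is_potentially_eligible` function are hypothetical names introduced for illustration, they do not reflect the CPRD data model or the validated RA algorithm, and "within the last 2 years" is approximated as 730 days.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Optional


@dataclass
class PatientRecord:
    # Hypothetical fields for illustration; not the CPRD schema.
    has_ra: bool                              # RA flag (assumed precomputed)
    age: int
    english_practice: bool
    last_oral_gc_prescription: Optional[date]  # None if never prescribed
    years_of_data: float
    unrelated_adrenal_insufficiency: bool
    other_adrenal_condition_or_treatment: bool


def is_potentially_eligible(p: PatientRecord, search_date: date) -> bool:
    """Apply the inclusion criteria, then the exclusion criteria."""
    # Inclusion criteria (1) to (3).
    if not (p.has_ra and p.age >= 16 and p.english_practice):
        return False
    # Inclusion criterion (4): oral glucocorticoids within the last 2 years
    # (assumption: 2 years taken as 730 days).
    if p.last_oral_gc_prescription is None:
        return False
    days_since = (search_date - p.last_oral_gc_prescription).days
    if days_since < 0 or days_since > 730:
        return False
    # Exclusion criteria.
    if p.unrelated_adrenal_insufficiency or p.other_adrenal_condition_or_treatment:
        return False
    return p.years_of_data >= 2


# Usage with made-up records (not study data).
example = PatientRecord(
    has_ra=True,
    age=62,
    english_practice=True,
    last_oral_gc_prescription=date(2015, 3, 1),
    years_of_data=6.5,
    unrelated_adrenal_insufficiency=False,
    other_adrenal_condition_or_treatment=False,
)
never_used_gc = replace(example, last_oral_gc_prescription=None)
print(is_potentially_eligible(example, date(2015, 12, 1)))
print(is_potentially_eligible(never_used_gc, date(2015, 12, 1)))
```

In the study itself, this automated step only produced a list of potentially eligible patients; general practitioners then screened the list by hand, which a filter of this kind cannot replace.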

| Practice recruitment
General practices were responsible for mailing invitations to patients as only general practices are able to identify their patients from the EHR. To recruit practices, an initial invitation letter and expression of interest form were sent to each of the 252 practices in England with eligible patients. Practices that did not respond were followed up with another postal invitation, an email, and a final postal reminder.
Costs to practices were minimised: patient invitation materials were provided pre-prepared to practices, and the practices were reimbursed for their time by the National Institute for Health Research (NIHR) Clinical Research Network (CRN). After all patients were followed up, the study data were anonymised. CPRD then provided the EHR data for the study participants, with CPRD identifiers replaced with the participants' study IDs. At no point was it possible for the research team or CPRD to link identifiable information to the EHR.

KEY POINTS
• We have provided further evidence supporting the feasibility of supplementing electronic health records (EHR) with patient-derived data.
• Supplementing the EHR may address possible misclassification and/or missing information within EHR.
• Challenges in practice and patient recruitment demonstrated the importance of considering ways to maximise recruitment.

| RESULTS

| Practice recruitment
All 252 practices with eligible patients in August 2015 were invited to participate. Of these, 101 (40%) practices responded after the first invitation, 47 (19%) after at least 1 reminder, and 104 (41%) never responded. In total, 77 (31%) practices expressed interest in being involved and 71 (28%) declined the invitation. Sixty-six practices (26% of 252) completed the mail-out to patients.
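The practice-level figures above can be checked with a few lines of arithmetic. This is a sketch for the reader, not part of the study's analysis; the counts are those reported in the text, and the variable names and the `pct` helper are ours.

```python
# Counts as reported in the text; names are illustrative only.
invited = 252
responses = {
    "responded after first invitation": 101,
    "responded after at least 1 reminder": 47,
    "never responded": 104,
}
outcomes = {
    "expressed interest": 77,
    "declined": 71,
    "completed mail-out": 66,
}


def pct(count: int, total: int) -> int:
    """Percentage of total, rounded to the nearest whole number."""
    return round(100 * count / total)


# The three response categories should account for every invited practice.
assert sum(responses.values()) == invited

# Every practice that responded either expressed interest or declined.
responded = invited - responses["never responded"]
assert outcomes["expressed interest"] + outcomes["declined"] == responded

for label, count in {**responses, **outcomes}.items():
    print(f"{label}: {count} ({pct(count, invited)}% of {invited})")
```

All percentages in the paragraph use the 252 invited practices as the denominator.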

| Patient recruitment
Of the 3974 patients considered potentially eligible for inclusion in the study, 859 (22%) were registered with one of the 77 practices that agreed to take part. Invitations were sent to 526 patients, and 117 patients returned valid consent forms. The median time from practices mailing invitations to participants being recruited was 25 (range 6-149) days. All recruited participants were sent diaries and sample collection kits: we had no further contact from 21 participants and 8 participants withdrew (all before returning saliva samples). The flow of patients through the study is presented in

| Data collection
In total, 86 participants returned both saliva samples and diaries, and 2 participants returned saliva samples but not diaries. The median time from mailing study materials to receiving the saliva samples was 12 (range 5-127) days. Four of the samples could not be analysed: 3 of the collection tubes were empty, and 1 sample was omitted from the batch (in error).
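The patient-level counts reported in the two preceding subsections can be reconciled in the same way. The sketch below uses only figures stated in the text; the variable names and the `pct` helper are ours, and each percentage is taken against the denominator the text uses.

```python
# Counts as reported in the text; names are illustrative only.
potentially_eligible = 3974
at_participating_practices = 859
invited = 526
consented = 117
no_further_contact = 21
withdrew = 8
returned_saliva_and_diary = 86
returned_saliva_only = 2


def pct(count: int, total: int) -> int:
    """Percentage of total, rounded to the nearest whole number."""
    return round(100 * count / total)


# Participants who consented and neither dropped out of contact nor withdrew
# should equal those who returned saliva samples (with or without a diary).
remaining = consented - no_further_contact - withdrew
assert remaining == returned_saliva_and_diary + returned_saliva_only

print(f"At participating practices: {pct(at_participating_practices, potentially_eligible)}% of potentially eligible")
print(f"Invited: {pct(invited, potentially_eligible)}% of potentially eligible")
print(f"Consented: {pct(consented, invited)}% of invited")
print(f"Completed: {pct(returned_saliva_and_diary, consented)}% of consented")
print(f"Completed: {pct(returned_saliva_and_diary, invited)}% of invited")
```

The check confirms that the 21 participants lost to contact and the 8 withdrawals account for the gap between the 117 who consented and the 88 who returned saliva samples.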

| DISCUSSION
In this study, we were able to collect saliva samples and self-reported drug-use information from 86 participants to supplement EHR data. The new data collected will allow us to define the study outcome, adrenal insufficiency, more accurately than using primary care EHR data alone, as symptoms of adrenal insufficiency are non-specific and many cases are only diagnosed if patients present as emergencies. 4,5 The self-reported drug-use data will allow us to quantify misclassification in exposure to oral glucocorticoids and adjust the analyses accordingly. However, the final study population was small, and we did not reach our recruitment target of 400.
Practice recruitment was a major limiting factor for our final participant figures: only 26% of general practices with eligible patients sent invitations to patients. The STAGE study also reported practice recruitment as a limit on patient recruitment, although its practice recruitment rate, at 53%, was higher than in our study. 1 PLEASANT, a later study conducted by CPRD which only required practices to mail a letter to patients, did recruit its target of 140 practices over a 7-month period. 8 This total included 129 of the 433 practices invited by CPRD (30%). Reaching the target number of practices required significant staff resource to follow up the practices. 8 General practices are currently experiencing high and increasing time and financial pressures. 9 Aside from frequently following up practices, researchers could make use of primary care study tools and platforms such as FARSITE (NorthWest Ehealth) and TrialBase (CPRD), which help streamline the research process, to encourage practice participation.
The proportion of patients who were recruited was also small: of the 526 invited, 117 (22%) were recruited and 86 (16%) completed the study. This recruitment rate was lower than that of the STAGE study, which used a similar methodology yet had a recruitment rate of 34% (754 of 2194). 1 Recruitment for STAGE was over a much longer period (36 months compared with 8 months). In addition, recruitment rates were higher for patients asked to provide a blood sample at their local general practice than for patients asked to provide saliva samples in their homes. 1 Recruitment of participants is a challenge common to all research studies. Suggestions for