Real‐world reproducibility study characterizing patients newly diagnosed with multiple myeloma using Clinical Practice Research Datalink, a UK‐based electronic health records database

Abstract

Purpose: We evaluated the reproducibility of a study characterizing newly diagnosed multiple myeloma (MM) patients within an electronic health records (EHR) database using different analytic tools.

Methods: We reproduced the findings of a descriptive cohort study using an iterative two-phase approach. In Phase I, a common protocol and statistical analysis plan (SAP) were implemented by independent investigators using the Aetion Evidence Platform® (AEP), a rapid-cycle analytics tool, and SAS statistical software as a gold standard for statistical analyses. Using the UK Clinical Practice Research Datalink (CPRD) dataset, the study included patients newly diagnosed with MM within the primary care setting and assessed baseline demographics, conditions, drug exposure, and laboratory procedures. Phase II incorporated analysis revisions based on our initial comparison of the Phase I findings. Reproducibility of findings was evaluated by calculating the match rate and absolute difference in prevalence between the SAS and AEP study results.

Results: Phase I yielded slightly discrepant results, prompting amendments to the SAP to add more clarity to operational decisions. After detailed specification of data and operational choices, exact concordance was achieved for the number of eligible patients (N = 2646), demographics, comorbidities (i.e., osteopenia, osteoporosis, cardiovascular disease [CVD], and hypertension), bone pain, skeletal-related events, drug exposure, and laboratory investigations in the Phase II analyses.

Conclusions: In this reproducibility study, a rapid-cycle analytics tool and traditional statistical software achieved near-exact findings after detailed specification of data and operational choices. Transparency and communication of the study design and of operational and analytical choices between independent investigators were critical to achieving this reproducibility.


| INTRODUCTION
Regulatory, payer, and clinical decision-makers are increasingly adopting real-world evidence (RWE) derived from existing real-world data (RWD), such as longitudinal electronic health records (EHR) and administrative claims data, to support healthcare decisions. 1,2 The European Medicines Agency (EMA) defines RWD as "routinely collected data relating to a patient's health status or the delivery of health care from a variety of sources other than traditional clinical trials." 1 An important quality and acceptability criterion for RWE is the ability to replicate studies accurately. The latter requires full transparency of data processing, design, and analytic choices, beyond what is typically included in publications.
Increasingly, efforts are being made to improve the reproducibility of research by promoting transparency. [3][4][5][6][7] A reproducible study, defined as "independent investigators implementing the same methods in the same data and are able to obtain the same results (direct replication)," 4 requires complete access to the study data (i.e., analytic data sets) and methods, mechanisms for sharing code and the software environment, and sufficiently detailed study documentation. [7][8][9] It has also been argued that confirming the findings from observational studies bolsters the overall confidence of scientific evidence. 7,10 One of the most commonly used UK-based RWD sources is the Clinical Practice Research Datalink (CPRD) General Practice (GP) Online Data (GOLD). 11 In a validation study, the CPRD database demonstrated a high positive predictive value for various diagnoses and incidence estimates similar to those of other UK data sources. 12 The adoption of RWD sources, including EHR, provides an opportunity to generate clinical evidence in oncology, 13,14 and the CPRD database 14,15 has been used to evaluate patients with multiple myeloma (MM), a hematological cancer of the bone marrow. 16 Multiple myeloma, estimated to cause approximately 5700 incident cases a year in the UK, is the second most common hematological malignancy in Europe, and recognizing MM, which presents with nonspecific, multi-site symptoms, is challenging. While several publications using CPRD exist for this patient population, [17][18][19] the reproducibility of such work using CPRD has not been evaluated. Thus, we sought to evaluate the reproducibility of a study characterizing patients newly diagnosed with MM in CPRD using two different analytic tools.

| METHODS
An iterative two-phase approach was used to evaluate the reproducibility of a descriptive cohort study using primary care data. In Phase I, two teams of independent investigators implemented a common study protocol and statistical analysis plan (SAP) independently and in parallel using different analytic tools: a cloud-based rapid-cycle analytic tool (Aetion Evidence Platform® [AEP], version 3.12) and traditional, line-programming statistical software (SAS Enterprise Guide, version 7.1). Phase II was an iteration of the analyses following the review and comparison of the Phase I findings, implementation decisions, and revisions to study documents (e.g., the SAP).

| Data source
The population-based cohort study used the EHRs from the CPRD GOLD database, which contains anonymized longitudinal patient records from more than 600 UK-based general practices and holds the primary care medical records of over 18 million patients. 20 CPRD contains demographic data, medical diagnoses, procedures (including laboratory investigations and results), and death information collected using a standardized form 21 ; conditions, interventions, and diagnostics are recorded using Read codes, and medication prescribing is recorded using the British National Formulary (BNF). We used CPRD data collected between January 1, 2004 and December 31, 2017 for this study.
Raw CPRD data and raw data converted to the Observational Medical Outcomes Partnership (OMOP) common data model (CDM; version 4.0) were used for the AEP and SAS analyses, respectively. The latter includes only patients with CPRD data of acceptable quality for research and contains approximately 15 million patients in total. The raw CPRD data contain approximately 18 million patients, regardless of data quality.

KEY POINTS
• Adoption of RWE derived from large databases has resulted in efforts to improve the reproducibility and transparency of research.
• Findings from our reproducibility study using common study documents but two different analytic tools highlighted that at a minimum adherence to standard protocol and reporting guidelines is necessary; however, complete transparency of the study design and operational decisions shared by independent investigators was also critical to a successful reproduction.
• Data source parameters and implementation decisions, especially the nuances of using common data models and enrollment and assessment windows, should be explicitly stated during study planning and protocol development.

| Study population
During the study period (January 1, 2004 to December 31, 2017), we identified patients newly diagnosed with MM between January 1, 2006 and December 31, 2016 to allow for a minimum 2-year baseline and 1-year follow-up period. Follow-up began the day after cohort entry and ended at disenrollment from the GP practice, the last data collection date of the practice, death, or the end of the study period (Figure 1). The study cohort included patients ≥18 years of age who were registered at their GP practice for at least 2 years prior to diagnosis and had no history of solid tumors. Refer to Data S1 for a list of variables, diagnosis definitions, and covariate definitions.
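As an illustration only, the inclusion criteria above can be expressed as a short filter. This is a minimal sketch; the patient field names (`mm_diagnosis_date`, `birth_date`, `registration_date`, `prior_solid_tumor`) are hypothetical and do not correspond to actual CPRD or OMOP fields.

```python
from datetime import date, timedelta

def eligible(patient,
             study_start=date(2006, 1, 1),
             study_end=date(2016, 12, 31)):
    """Sketch of the cohort inclusion logic; field names are illustrative."""
    dx = patient["mm_diagnosis_date"]
    age_at_dx = (dx - patient["birth_date"]).days // 365
    registered_2y = patient["registration_date"] <= dx - timedelta(days=730)
    return (study_start <= dx <= study_end        # newly diagnosed 2006-2016
            and age_at_dx >= 18                   # adults only
            and registered_2y                     # >=2 years of baseline data
            and not patient["prior_solid_tumor"])  # no history of solid tumors

patient = {"mm_diagnosis_date": date(2010, 6, 15),
           "birth_date": date(1950, 1, 1),
           "registration_date": date(2005, 1, 1),
           "prior_solid_tumor": False}
eligible(patient)  # True: meets all four criteria
```

Making each criterion an explicit, dated comparison like this is one way to remove the ambiguity about enrollment windows discussed later in the paper.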

| Statistical analysis
We used descriptive statistics to report the number and proportion of patients meeting the pre-specified criteria and for binary and categorical covariates, and reported the mean, SD, and range for continuous covariates. The presence of bone complications, such as bone pain and skeletal-related events (SREs), was assessed during both the baseline and follow-up periods.
The two analytic approaches used were AEP, a rapid-cycle analytic tool that has been previously validated, 3,22 and SAS software as the gold standard for statistical analyses. The reproducibility of study results was evaluated by calculating the match rate within broad categories (e.g., demographics, conditions, drug exposure, and procedures) and the absolute difference in prevalence between line-programming and rapid analytics for each individual characteristic included within the broad category. Match rate was defined as a percentage (rounded to the nearest tenth), calculated as the number of individual characteristics with an exact match in the estimated prevalence between the AEP and SAS results, divided by the total number of characteristics within each broad category. An absolute difference in prevalence of 0% and a match rate of 100% represented exact concordance. After the Phase I analysis, the two teams of independent investigators performed a careful review of the results to identify potential reasons for discrepancies and resolve them for the Phase II analysis. Thus, recommendations to promote transparency in study protocols and SAPs were developed.

| RESULTS

| Phase I: Initial analyses

There were small discrepancies in patient distribution across age categories and demographic characteristics (absolute difference range of 0.0% to 1.1%) (Table 1). High agreement was observed for the prevalence of comorbidities (CVD, chronic kidney disease [CKD], gout, hypertension, osteoarthritis, osteopenia, osteoporosis, and rheumatoid arthritis [RA]), with a match rate of 75% and an absolute difference in prevalence of 0.0%-0.3%. Agreement was observed for several clinical characteristics of symptomatic bone pain and SREs (match rates of 50% and 85.7%, respectively). The absolute difference in SRE prevalence ranged from 0.0% to 0.2%, with exact agreement for pathological fracture, spinal cord compression, radiation therapy to bone, and surgery to bone.
We observed a lower agreement on specific drug exposure (match rate of 28.6% and an absolute difference in prevalence of 0.0%-19.6%).
Exact concordance was also observed for laboratory investigations (hypercalcemia, renal impairment, and anemia) and valid test results.
The Phase I analysis yielded several discrepancies which were subsequently addressed and resolved in the Phase II analysis by clarifying operational decisions not fully specified in the protocol or SAP.
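As a worked illustration, the match-rate and absolute-difference metrics defined in the Methods can be sketched in a few lines. The prevalence values below are hypothetical, chosen only to show the arithmetic; they are not study results.

```python
def compare_results(aep, sas):
    """Compare prevalence estimates (%) for one broad category of
    characteristics between the two analytic tools.

    Returns the match rate (% of characteristics with an exact match,
    rounded to the nearest tenth) and per-characteristic absolute
    differences in prevalence."""
    assert aep.keys() == sas.keys()
    matches = sum(1 for k in aep if aep[k] == sas[k])
    match_rate = round(100 * matches / len(aep), 1)
    abs_diffs = {k: round(abs(aep[k] - sas[k]), 1) for k in aep}
    return match_rate, abs_diffs

# Hypothetical prevalences (%) for a "comorbidities" category
aep = {"CVD": 20.1, "CKD": 11.3, "hypertension": 45.0, "osteoporosis": 8.2}
sas = {"CVD": 20.1, "CKD": 11.4, "hypertension": 45.0, "osteoporosis": 8.2}
rate, diffs = compare_results(aep, sas)
# rate == 75.0 (3 of 4 match); diffs["CKD"] == 0.1, all others 0.0
```

A match rate of 100% with all absolute differences at 0.0% would correspond to the exact concordance reported for Phase II.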

| Phase II: Revised analyses (analysis iteration based on Phase I findings)
Following further detailed specification of the study choices that led to different interpretations during Phase I, the analysis was repeated and a 100% match in the number of patients (N = 2646), demographics, bone pain, SREs, drug exposure, laboratory investigations, and valid laboratory results was achieved. A negligible discrepancy remained in the estimated prevalence of CKD (absolute difference 0.1%), as well as osteoarthritis and gout (absolute difference 0.0%), driven by the data of an individual patient with multi-morbidity (Figure 2, Table 1).

| Comparison of two approaches in Phase I and Phase II
A mismatch in aggregated results triggered a cascade of steps to investigate and remediate discrepancies identified during Phase I. As a first step, the two sets of independent investigators reviewed the protocol and SAP definitions against their implementations and discussed individual interpretations, such as the assumptions and algorithms used, followed by a more detailed investigation into time interval specifications, which resolved a majority of the discrepancies. For any remaining discrepancies, the investigators reviewed patient-level data.
Amending study documents (e.g., the SAP) was an iterative process spanning the major steps of RWE generation: (1) processing the raw data, (2) making study design and operational decisions, and finally (3) deciding on analytical choices.
Each step provides opportunities to develop sufficiently clear study documentation (Figure 3). 4 A lack of specificity in these steps during protocol and SAP development contributed to the initial discrepancies.

| Transparency in raw data processing
Our direct replication study highlights the importance of understanding, communicating, and documenting data source parameters (e.g., data extraction date, data source range, data cleaning and transformation) in the initial stage of conducting a database study. In the current study, the independent investigators used the same data source (CPRD), the same data cut (June 22, 2018), and the same data range (November 21, 1987 to June 30, 2018); however, different data model conversions and hence different data structures were used. The rapid-cycle analytics tool used raw CPRD data processed with a so-called "adaptive rule system," whereas the SAS analyses were based on CPRD data that had been pre-mapped to the OMOP CDM. There are pros and cons to both data model approaches. 23 The OMOP CDM allows the construction of a standard vocabulary for different medical concepts; however, a preconfigured CDM could potentially result in information loss due to incomplete mapping of construct terms. 24,25 The remaining negligible discrepancies might have arisen from the small loss of information after applying the CDM.
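The information-loss point can be illustrated with a toy mapping exercise. The source codes and target concept identifiers below are invented for illustration; they are not real Read codes or OMOP concept IDs.

```python
# Hypothetical illustration of information loss during ETL to a common
# data model: any source code without a target concept is dropped.
read_to_omop = {"A100.": 1001, "B200.": 1002}   # illustrative mappings only
patient_codes = ["A100.", "B200.", "Z999."]     # "Z999." has no mapping

mapped = [read_to_omop[c] for c in patient_codes if c in read_to_omop]
unmapped = [c for c in patient_codes if c not in read_to_omop]
# mapped   -> [1001, 1002]
# unmapped -> ["Z999."]  (this record is lost in the converted data)
```

An analysis run against the converted data would never see the unmapped record, which is one mechanism by which two analyses of the "same" source can disagree at the single-patient level.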
Despite differences in the timing of data processing to exclude patients with poor quality data from the analysis, both analytic approaches eventually yielded the same number of eligible patients in the analysis set upon reconciliation of the implementation approach ( Figure 2). In the OMOP CDM data, patients and practices that were considered unacceptable for research were omitted during the extraction, transformation, and loading (ETL) process of data mapping and conversion; these analyses were based on acceptable CPRD data for research from the beginning. In contrast, the AEP analyses applied the data quality checks (e.g., "Up To Standard (UTS)" dates and GP practices considered "acceptable for research") as an exclusion criterion when identifying patients with MM.

| Transparency in study design and operational decisions
In the current study, a common study protocol, implemented by two teams of independent researchers, included an initial study design schema, operational definitions of all variables, and appended code lists and algorithms. A SAP was also co-developed, which included details of the analyses to be undertaken (e.g., handling of missing data). Despite these common study materials capturing the operational and analytical choices, several discrepancies arose in Phase I of the study. Subsequent thorough investigation revealed various reasons (Table 2), such as differences in the interpretation of time assessment periods, including the eligibility enrollment window.

Eligibility enrollment window
The enrollment window for the inclusion and exclusion criteria is the time window prior to the patient's study entry date, often called the baseline or lookback period. A lack of specificity in defining this window contributed to initial discrepancies. For example, terms such as "baseline" did not specify, for each covariate, whether the cohort entry date should or should not be included in the assessment period.

| Transparency in analytical choice decisions
Several discrepancies were due to investigator-driven interpretation of variable definitions, algorithms, and the exposure assessment window, which were resolved following alignment between both investigators (Table 2). The remaining discrepancies were data-driven due to differences in data structure and could not be fully addressed in Phase II.

TABLE 2 Improvement of operational decisions for Phase II following identification of discrepancies observed in Phase I (summary)

Data source: The rapid-cycle analytics tool used raw CPRD data, and the line-programming analyses used CPRD data pre-mapped to the OMOP CDM; consensus was reached that the remaining data-driven discrepancies were unresolvable.

Drug exposure (record of specific drug): A lack of specificity in the exposure assessment window (i.e., when the start date falls, and whether or not to include the cohort entry date in the baseline period) was resolved by agreement on the time assessment window: the covariate assessment window for drug exposures runs from 730 days prior through 1 day prior to the index date (Day 0), to assess treatment usage exclusively prior to the MM diagnosis.

Drug exposure (absence of drug): A lack of a definition for the absence of treatment led to the use of different algorithms by the independent investigators (for instance, patients without any occurrence of an analgesic vs. patients who did not start any analgesic during the baseline period). This was resolved by agreement on the definition: absence of treatment was measured as not starting any analgesics during the baseline period.

Laboratory investigation: Not applicable (exact concordance in Phase I); a new decision was made to facilitate interpretation of the data: for investigations, any tests were included (not requiring valid tests only).

Note: Exact concordance was achieved for gender in Phase I.
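The agreed drug-exposure assessment window (from 730 days prior through 1 day prior to the index date) can be sketched as a simple date test. This is a minimal illustration, not the study code; the function name and dates are hypothetical.

```python
from datetime import date, timedelta

def in_drug_assessment_window(record_date, index_date):
    """Agreed covariate assessment window for drug exposures: from 730
    days prior through 1 day prior to the index date (Day 0), so that
    treatment use is assessed exclusively before the MM diagnosis."""
    start = index_date - timedelta(days=730)
    end = index_date - timedelta(days=1)
    return start <= record_date <= end

index = date(2010, 6, 15)
in_drug_assessment_window(index - timedelta(days=1), index)    # True
in_drug_assessment_window(index, index)                        # False: Day 0 excluded
in_drug_assessment_window(index - timedelta(days=731), index)  # False: beyond lookback
```

Spelling out both endpoints, including whether Day 0 is in or out, is exactly the kind of operational detail whose absence caused the Phase I discrepancies.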

Variable definitions and algorithms
While the variable definitions in the protocol and SAP included a general description of the algorithm and the code list for diagnoses, comorbidities, and clinical conditions, different assumptions were made by the two sets of independent investigators. First, a lack of detail regarding which specific data tables within CPRD to query accounted for the initial discrepancies for all diagnosis events in our study, including MM diagnosis, comorbidities, and baseline clinical conditions. To avoid discrepancies, study documentation should specify both the relevant data fields and the data tables to query for these fields, depending on the data source used. Second, the inherent logic for certain variables with complicated algorithms was not specified, such as the Boolean logic between various components of the algorithms.

Exposure assessment window
The exposure assessment window describes the time window for identifying exposure status. A similar lack of specificity in this time window accounted for the initial discrepancies in baseline medication use and comorbidities between the two sets of results. While it was specified that baseline medication use would be assessed during the baseline period, one approach used a lookback of 2 years and the other used all available data.
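To see how the two interpretations diverge, consider a minimal sketch (the helper and dates are hypothetical) in which a single old prescription is classified differently under a 2-year lookback versus all available history:

```python
from datetime import date, timedelta

def exposed(prescription_dates, index_date, lookback_days=None):
    """Classify baseline medication use under two interpretations of the
    assessment window: a fixed lookback (lookback_days) vs. all
    available history (lookback_days=None). Illustrative sketch only."""
    if lookback_days is not None:
        start = index_date - timedelta(days=lookback_days)
    else:
        start = date.min  # no limit: use all available data
    return any(start <= d < index_date for d in prescription_dates)

index = date(2010, 6, 15)
rx = [date(2007, 1, 10)]  # a single prescription >2 years before diagnosis

exposed(rx, index, lookback_days=730)  # False under the 2-year lookback
exposed(rx, index)                     # True when all history is used
```

The same patient record thus yields opposite exposure classifications, which is sufficient to shift prevalence estimates between the two analyses.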

| DISCUSSION
In this direct replication study characterizing newly diagnosed MM within a UK-based EHR database, after a thorough investigation of the initial discrepancies, we achieved an exact match of the analytic population and a nearly 100% match rate using two different analytic approaches, namely a rapid-analytics tool and traditional statistical software. This replication exercise demonstrated that differences in study results are often due to an insufficient level of detail in the study protocol or SAP.

| Standards in transparency and reproducibility
Transparency of the study design and operational decisions shared by the independent investigators was critical to facilitating this direct replication study, highlighting the importance of not only following standard protocol and reporting guidelines (Box 1), but applying greater specificity and transparency to protocol development in the study planning phase. These guidelines on conducting [26][27][28][29] and reporting [30][31][32] non-randomized studies have been developed by professional societies, governmental agencies, and pharmacoepidemiological experts. More recently, an ISPOR-ISPE joint task force convened to develop recommendations to ensure RWE can approximate the gold-standard randomized controlled trial and provide "causal conclusions." 33 The first report from this joint task force provided recommendations for transparency in the "study hygiene" (i.e., planning and procedures) when evaluating treatment and/or comparative effectiveness studies using RWD. 34 One recommendation focuses on replication to obtain the same results using the same data and same analytic methods, explaining "Full transparency in design and operational parameters, data sharing, and open access in clinical research will not only increase confidence in the results but will also foster the reuse of clinical data." The recommendations in the second ISPOR-ISPE joint task force paper 4 will also facilitate reproducibility in epidemiologic research and improve confidence in observational study findings. The current study, however, is limited to structured data only and does not cover information captured in free text fields and/or non-automated test results.

| CONCLUSION
In conclusion, this direct replication study characterizing patients newly diagnosed with MM within a UK-based EHR database demonstrated that study reproducibility requires maximal transparency at each phase of RWE generation. Following standard protocol and reporting guidelines allowed a rapid-cycle analytics tool to achieve near-exact replication of findings obtained using traditional statistical software. Specificity of the study design and key operational decisions shared by independent teams of investigators were critical to achieve this successful reproducibility.

ETHICAL STATEMENT
The research protocol was reviewed and approved by the Independent Scientific Advisory Committee (ISAC, protocol 18_292).

ACKNOWLEDGMENTS
This project was funded by Amgen. We wish to acknowledge Amanda Patrick for her critical review of the manuscript and Andrew Weckstein for analytical support in this work.

CONFLICT OF INTEREST
The following personal or financial relationships relevant to this manuscript existed during the conduct of the study: JRW, PM, SLR are employees of and hold stock options or equity in Aetion; NPI was an employee of Aetion and held stock options. AS, JM, DN, GK are employees of and hold stock options in Amgen. During the study conduct and reporting, VB and EH were contract workers for Amgen.

Box 1 Standard considerations for ensuring transparency and reproducibility of real-world research protocols and findings
• Follow standard protocol 28,29 and reporting guidelines [30][31][32] developed and endorsed by governmental agencies, 26,27 professional societies, 4,34 and other experts to ensure necessary study parameters are fully described and reported.
• Ensure the protocol and report include operational definitions of all variables, including exposures, outcomes, and potential confounders; include code lists and algorithms. 28,29,31
• Consider reporting supplemental information that specifies access to the protocol, raw data, or programming codes. 30
• Describe statistical methods, including how missing data were handled. 28
• Understand and fully report pertinent data source parameters, such as data extraction date, data source range, and data cleaning and transformations, when reporting the study methods and findings 4 (e.g., manuscripts) as well as in study documents (e.g., protocols).