Medicine utilization studies in Australian individual‐level dispensing data: A blinded, multi‐center replicated analysis

Medicine dispensing data require extensive preparation when used for research, and decisions made during this process may lead to results that do not replicate between independent studies. We conducted an experiment to examine the impact of these decisions on the results of a study measuring discontinuation, intensification, and switching in a cohort of patients initiating metformin.

Conclusions: Differences in analytical decisions when deriving exposure from dispensing data affect replicability. Detailed analytical protocols, such as HARPER, are critical for transparency of operational definitions and interpretations of key study parameters.

KEYWORDS
Australia, dispensing data, medicine, replication, utilization

Key Points
• Pharmacoepidemiology research using administrative data may not replicate because researchers can make different decisions during data preparation and analysis.
• Four Australian universities using the same initial shared protocol for a new-user cohort study of metformin treatment conducted independent, blinded analyses of dispensing data.
• There was poor agreement between sites for estimates of the proportion who died, discontinued, intensified, and switched treatment.
• Detailed analytical protocols, such as HARPER, facilitate transparency of operational definitions and interpretations of key study parameters.

Plain Language Summary
Pharmacoepidemiological research is often based on data such as government records of subsidy payments when pharmacists dispense prescription medicines. These data require extensive preparation and analysis. Previous research has shown that the many decisions made during this process can lead to analysis results that do not replicate between independent studies. To understand and quantify how decisions made by pharmacoepidemiologists and analysts impact results, we conducted a blinded replication experiment based on a simple demonstration study.
Four Australian universities developed an initial shared protocol to follow a cohort of people after they began treatment with the widely used diabetes medicine, metformin, and to track changes to their treatment over time. Each site independently developed a detailed study protocol and documented it using the HARmonized Protocol template to Enhance Reproducibility (HARPER). The sites then performed their independent, blinded data analyses, estimating time to treatment change and the effect of age and sex on the occurrence of each event. Concordance between sites on these measures was poor and we identified multiple points where differing decisions impacted the subsequent results. Greater transparency through detailed analysis plans can improve the quality of medicines utilization research.

| INTRODUCTION
Administrative claims data require extensive data preparation prior to their use for drug utilization and pharmacoepidemiological studies, and many different operational decisions must be made regarding the analysis. Even when there are only subtle differences in methods, researchers using the same data to answer the same questions can produce substantially different results. For example, when using medication dispensing data, often only dispensing dates are available rather than the intended period of exposure. To use these data to calculate episodes of continuous exposure, researchers must infer the intended days of supply. Additionally, researchers must define cessation rules by defining allowable gaps between consecutive dispensings. [2][3] Despite the availability of best practice guidelines that aim to improve replicability and reproducibility of pharmacoepidemiological studies, [4][5][6][7][8] previous research has found that when independent researchers attempt to reproduce analyses, results can disagree. 9
To understand how different decisions made during the data preparation and analytic process impact study findings, we conducted an experiment across four Australian sites with access to the same medication dispensing dataset that is commonly used for pharmacoepidemiological research in Australia. 10,11 We started with a shared protocol for a study of people initiating metformin, measuring treatment dynamics including discontinuation, switching, and intensification (defined as the addition of another diabetes medicine). Each site independently wrote a detailed analysis plan using the HARmonized Protocol template to Enhance Reproducibility (HARPER), 12 and then independently programmed and performed the analysis. We then compared concordance of study measurements across the sites. We aimed to describe the variation in approaches to studying treatment dynamics following metformin initiation and quantify the impact of this variation on study results.

| Setting and data
Australia maintains a universal healthcare system entitling all citizens and permanent residents to subsidized medicines through the Pharmaceutical Benefits Scheme (PBS). This study used the PBS 10% sample, a standard analytic dataset provided by Services Australia for use in research. 13 The confidentialized, individual-level data for 10% of the PBS-eligible population span 2005 to present and include: a unique item code for the dispensed medicine indicating its active ingredient, formulation, strength, and package size; the date of dispensing; the quantity dispensed; as well as individuals' sex, year of birth, and year of death. The data also contain a record of the PBS beneficiary status associated with the dispensing. General beneficiaries pay the full co-payment amount for each dispensing (currently 30.00 AUD), while concessional beneficiaries (people 65 years and older or economically disadvantaged) pay a reduced fee (currently 7.30 AUD). 14 Birth and death are randomly offset by plus or minus 6 months for privacy reasons. While the quantity subsidized for a single dispensing is often a month's supply, the intended dose and duration of supply are not recorded in the data and must be inferred. The PBS 10% sample is released periodically and each approved research site used the same calendar period download. PBS data have been widely used in Australian pharmacoepidemiology research, 10,11 and a recent systematic review identified that over one-fifth of Australian studies accessed PBS dispensing data via the PBS 10% sample dataset. 10
Four sites participated in this study: the Universities of New South Wales (UNSW), South Australia (UniSA), Sydney (USyd), and Western Australia (UWA). All hold license agreements with Services Australia to access their site-specific copies of the PBS 10% sample dataset. The sites all participate in the Centre of Research Excellence in Medicines Intelligence, a consortium of experts in pharmacoepidemiology and drug utilization research. 15 The study's authors have experience with PBS data and a combined record of more than 700 publications.

| Project workflow
The study was conducted between May 2022 and May 2023, and involved four sites independently performing an analysis to describe patterns of treatment with metformin following initiation. Metformin was chosen as a case study as it is the recommended first-line treatment indicated for long-term use in diabetes, a common, chronic condition.
Metformin is widely used in Australia and subsidized through the PBS, and is therefore captured in the PBS 10% sample dataset.
The protocol was inspired by a published study. 16 As our aim was to do a blinded, prospective replication, we modified the study design so our results would not be comparable with published ones.
An initial shared study protocol was developed collaboratively between the sites outlining the study outputs, concordance criteria for comparisons, and a common output template (registered at DOI 10.17605/OSF.IO/YC4TM). The 10-page shared protocol specified the required analyses but not all operational details. Each site then independently developed their own detailed study protocol using HARPER 12 (see the Supplement for each site's HARPER protocol), keeping the protocol confidential from the other sites.
Operational definitions of the analysis cohorts and study outcomes were detailed in HARPER based on each site's previous experience using the PBS 10% dataset (Table 1). Each site then independently implemented their HARPER protocol using their chosen statistical software (R, SAS, Stata) in their PBS 10% sample dataset, remaining blinded to the analytical protocols and results from all other sites. Once all sites finalized their analyses, the results were reviewed and discussed by the group. A small working group comprising one representative from each site was tasked with compiling the finalized site-specific output and calculating the concordance measures across all four sites.

| Study population
As specified in the shared protocol, we defined a cohort of people 18 years and older initiating metformin between January 1, 2015 and December 31, 2018. Initiation was defined as the first dispensing of metformin monotherapy (World Health Organization Anatomical Therapeutic Chemical code A10BA02) following a 12-month period with no dispensing of any diabetes medicine. We excluded individuals initiating metformin at the same time as any non-metformin diabetes medicine, either as individual medicines or as a fixed-dose combination (FDC).
As only perturbed year of birth and death were available in the PBS 10% sample dataset, we estimated exact dates of each using site-specific approaches (see Table 1 and Supplement for individual protocols).
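As an illustration only, the initiation rule described above could be sketched in Python as follows. This is not any site's implementation: the function name `find_initiation`, the 365-day reading of "12 months," and the choice to keep searching for a later qualifying dispensing are all assumptions made for the example, and the age restriction is omitted because it depends on each site's date-of-birth estimate.

```python
from datetime import date, timedelta

METFORMIN = "A10BA02"  # ATC code for metformin monotherapy, as given in the text


def find_initiation(dispensings, start=date(2015, 1, 1), end=date(2018, 12, 31),
                    washout_days=365):
    """Return a person's metformin initiation date, or None if they never qualify.

    `dispensings` is a list of (dispensing_date, atc_code) tuples containing
    only diabetes medicines for one person (hypothetical input layout).
    """
    records = sorted(dispensings)
    for d, atc in records:
        # Candidate index dates: metformin monotherapy inside the study window.
        if atc != METFORMIN or not (start <= d <= end):
            continue
        lookback_start = d - timedelta(days=washout_days)
        # Assumption: any diabetes medicine in the lookback window rules out initiation.
        prior_use = any(lookback_start <= other_d < d for other_d, _ in records)
        # Exclude same-day initiation of another diabetes medicine (including FDCs,
        # which carry a different ATC code).
        same_day_other = any(other_d == d and other_atc != METFORMIN
                             for other_d, other_atc in records)
        if not prior_use and not same_day_other:
            return d
    return None
```

Even this small sketch embeds choices the shared protocol may leave open, such as whether a person with earlier diabetes treatment can re-enter as an initiator after a sufficiently long gap.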

| Patterns of metformin use
Among metformin initiators, we assessed treatment discontinuation, intensification, and switching. Broadly, treatment discontinuation was defined as the cessation of metformin monotherapy treatment, and intensification was defined as a dispensing for any additional diabetes medicine during metformin treatment or a switch from single-agent to

TABLE 1 Operational definitions for treatment exposure and events used by each site, extracted verbatim from HARPER protocols.

Discontinuation
When determining discontinuation, a grace period of one dosing interval will be applied. If the time between two dispensings exceeds three dosing intervals (supply period + allowable gap + grace period) then the treatment is classified as discontinued.
All subsequent dispensings within the EPE will be counted towards a single continuing period of treatment, until the end of the follow-up period, or no subsequent dispensing within the EPE plus grace period. The date of discontinuation will then be defined as the date of last metformin dispensing, plus the EPE (but not the grace period).
We will explore several definitions of discontinuation based of dispensing interval. Each person will be followed for 1 year from the 1st supply of metformin for the period. The defined duration used was 3 × 75th percentile.

Switching
Defined as dispensing of another (non-metformin) glucose lowering agent between the final metformin dispensing +1 day, and the last day of the last period of "recent" exposure. The date of switching is the date that the other drug was first dispensed. See Figure S1 for a depiction of the different treatment events.
If prior to discontinuation a second blood glucose lowering medicine is dispensed this is classified as a switch. There may be an overlap of up to two dosing intervals before the end of the metformin treatment episode.
Cases who switched will be defined as those who have a defined date of discontinuation […] AND who have received another blood glucose lowering medicine between the date of last metformin dispensing and the date of discontinuation plus the grace period. The date of switching will be defined as the first date of dispensing of another blood glucose lowering medicine.
A switch will be defined as dispensing of a new diabetes medication after the last dispensing of a discontinued metformin. Each person will be followed for 1 year from the first supply of metformin for the period.

The time to discontinuation, intensification, or switch was defined as the number of days between metformin initiation and the occurrence of that event. While these events were broadly defined in the initial shared study protocol, each site was responsible for specifying their own operational definitions (Table 1).
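To make concrete how such operational definitions can change a result, the following Python sketch applies one generic gap-based discontinuation rule to a single dispensing history. It is an assumption-laden illustration rather than any site's method: the function name `discontinuation_date`, the parameters `supply_days` and `grace_days`, and the 30- and 60-day values are hypothetical, and dating discontinuation at the last dispensing plus the supply period is only one of the conventions quoted in Table 1.

```python
from datetime import date, timedelta


def discontinuation_date(dates, supply_days, grace_days, follow_up_end):
    """Return the discontinuation date under a simple gap rule, or None.

    Treatment is considered continuous while each next dispensing falls within
    supply_days + grace_days of the previous one. Discontinuation is dated at
    the last dispensing before the gap plus supply_days (grace period excluded).
    """
    dates = sorted(dates)
    allowed_gap = supply_days + grace_days
    for current, nxt in zip(dates, dates[1:]):
        if (nxt - current).days > allowed_gap:
            return current + timedelta(days=supply_days)
    # No qualifying gap between dispensings: check the tail of follow-up.
    last = dates[-1]
    if (follow_up_end - last).days > allowed_gap:
        return last + timedelta(days=supply_days)
    return None  # still on treatment at the end of follow-up


# Hypothetical dispensing history with a 74-day gap after the second supply.
history = [date(2016, 1, 1), date(2016, 2, 1), date(2016, 4, 15), date(2016, 5, 15)]
end = date(2016, 6, 30)

short_grace = discontinuation_date(history, supply_days=30, grace_days=30, follow_up_end=end)
long_grace = discontinuation_date(history, supply_days=30, grace_days=60, follow_up_end=end)
```

With the shorter grace period the person is classified as discontinuing in March 2016, whereas with the longer grace period the same history counts as continuous treatment, which is the kind of divergence the site definitions in Table 1 can produce.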

| Statistical analysis
We used descriptive statistics to summarize cohort demographic characteristics and beneficiary status and calculated the crude proportion of the cohort who died during follow-up using estimated date of death (see Section 2.3.1). We calculated the proportion of the cohort who discontinued, intensified, or switched from metformin within the first year following initiation. We

| Sensitivity analyses
Prior to 2012, medicines with a dispensed price below the PBS co-payment threshold were not captured in PBS dispensing data. 14 For analyses that used dispensing data prior to 2012, researchers often restrict cohorts to concessional beneficiaries as their PBS co-payment level is below the medicine cost and, therefore, their dispensings are always captured. To investigate the impact of this practice we conducted two pre-specified sensitivity analyses. We first examined a cohort initiating metformin between January 1, 2007 and December 31, 2010 (all PBS beneficiaries), and then restricted this cohort and the primary cohort (2015-2018) to people whose first metformin dispensing was concessional.

| Site comparisons
We developed concordance rating criteria to quantify variability and to facilitate comparison across the sites, and tabulated key information from the site HARPER protocols. We calculated a consensus value for each measure, defined as the median of all four sites' results, for counts, proportions, and hazard ratios (HRs). We then compared the individual site deviations from the consensus value against a prespecified threshold. We rated relative deviations of <5% as "good," <10% "moderate," and ≥10% "poor," for counts and proportions; and deviations of the ratio of individual site HR to median HR of <5% as "good," <10% "moderate," and ≥10% "poor." For survival time quantiles (e.g., median, 10th percentile, etc.) we calculated standardized values and considered deviations of <0.1 "good," <0.2 "moderate," and ≥0.2 "poor." For HRs and survival-time quantiles we also calculated the intra-class correlation coefficient (ICC) using the components of variance derived from a random intercept model using the meta R package (v6.5.0). 17 We classified ICCs as "poor" (<0.5), "moderate" (0.5 to 0.75), "good" (>0.75 to 0.9), and "excellent" (>0.9). 18 We used a "traffic light" rating of "good" (green), "moderate" (yellow), or "poor" (red) to illustrate the concordance between the sites on each result based on the largest deviation (i.e., lowest concordance) across the individual site ratings. Details of all concordance calculations are provided at Open Science Framework [OSF] (DOI 10.17605/OSF.IO/NJUSH) and in Table S1. All concordance measures were analyzed in R v4.2.0. 19
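The rating scheme for counts, proportions, and HRs can be sketched as below. The study's own calculations were done in R and are documented on OSF; this Python version is only an illustrative reading of the thresholds stated above, and the function names and the site labels "A" to "D" are hypothetical.

```python
from statistics import median


def rate_deviation(relative_deviation):
    """Map a relative deviation from the consensus value to a traffic-light rating."""
    if relative_deviation < 0.05:
        return "good"
    if relative_deviation < 0.10:
        return "moderate"
    return "poor"


def concordance(site_values):
    """Rate agreement across sites for one measure (count, proportion, or HR).

    Returns the consensus (median of all site results), each site's relative
    deviation from it, and the overall rating based on the largest deviation.
    For HRs, |site / consensus - 1| equals the relative deviation used here.
    """
    consensus = median(site_values.values())
    deviations = {site: abs(value - consensus) / consensus
                  for site, value in site_values.items()}
    overall = rate_deviation(max(deviations.values()))
    return consensus, deviations, overall


def classify_icc(icc):
    """Classify an intra-class correlation coefficient using the stated cut-offs."""
    if icc < 0.5:
        return "poor"
    if icc <= 0.75:
        return "moderate"
    if icc <= 0.9:
        return "good"
    return "excellent"


# Hypothetical cohort sizes reported by four sites.
consensus, deviations, overall = concordance({"A": 100, "B": 104, "C": 106, "D": 112})
```

Because the overall rating follows the worst site, a single outlying site is enough to move a measure from "good" to "poor" even when the other three agree closely.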

| Variations from the initial shared study protocol
Before unblinding, one site reported that they had found an instance where the proportional hazards (PH) assumption was violated in the Cox regression even when follow-up was artificially censored to the shortest prespecified interval (1 year). All sites agreed to update the registered protocol to allow censoring to 6 or 3 months, or to fit an extended Cox model with time-varying covariates if required. The revised protocol was uploaded to OSF on May 3, 2023. Due to restrictions in the OSF platform we could not update the registered protocol. Initially, we based our qualitative concordance rating on the differences between individual sites and the median measure on an absolute scale; however, this is inappropriate for small values. We therefore agreed to use relative differences, in line with how counts were rated.
While we found good to moderate agreement for the association of age 50-74 years and treatment switching (ICC = 0.78, HR range: 0.64-0.77), agreement was poor for all other HRs (Figure 2; Table S2).
Similar to the primary analysis, cohort characteristics from the sensitivity analyses were generally in moderate to good agreement, while other results had mostly poor agreement (Tables S3-S10, Figures S3-S6). There were differences between sites in their definition of "concessional" (see the Supplement containing the full HARPER protocols), which resulted in only moderate agreement in cohort sizes (Table S7).

| DISCUSSION
Our study highlights the challenges faced when different researchers aim to replicate common pharmacoepidemiological measures, even when using the same dataset and one that is well known to all analysts. We identified multiple points where differing decisions impacted the subsequent results. These points included different approaches to analytic data construction (date of death imputation, days supply estimates) and operational definition of the treatment events.
The PBS 10% sample dataset does not record intended dosing or duration of use information, only the date on which a medicine was dispensed, and days supply must be estimated. 1][22] When days supply are not recorded, there is no "best practice" method for determining intended duration of a dispensed medicine and, depending on the context, some methods may be more suitable than others. 1,23 In our study each site used different approaches to estimate treatment exposure for each dispensing. Ultimately, exposure durations cannot be inferred with certainty from dispensing data and this uncertainty should be explored in sensitivity analyses.
Additionally, there are many ways to define drug utilization outcome events such as discontinuation, switching, and intensification based on prescribing records or dispensing claims. 24 While the implementation of these definitions will depend on the specific data source, the basic considerations are similar across Europe, Asia, and North America. 24 Differences across sites in their definitions of these events are likely to have contributed to the large variation in study measures observed in our study. Although we provided detailed descriptions of these treatment events in the initial shared protocol, each site had a different approach to operationalization of these events (Table 1). We also observed that different sites used different nomenclature for common terms (e.g., "dosing interval," "e pop," "expected period of exposure," and "dispensing interval"), suggesting that consistent terminology would also facilitate replicability.
Poor concordance in our treatment events was not solely attributable to different approaches to the calculation of exposure duration estimates. Sites also employed different approaches to imputing age and death dates, and as follow-up was censored at death, this had implications for the discontinuation rates. There were also differences in interpretation; for example, one site only counted deaths that occurred within 12 months of metformin initiation, while other sites counted deaths observed at any point following initiation, resulting in different counts for the numbers of deaths during follow-up recorded by each site.
A strength of our study is that we captured many differences in interpretation of key concepts because each site compiled their detailed site-specific analysis protocols using the HARPER template. 12 We included this step both to demonstrate HARPER's usefulness and to aid our understanding of the differences that might arise. While open access to study artifacts, such as analysis code and data dictionaries, is indispensable to reproducible research, detailed documentation further contributes towards study transparency, robustness, and replicability. We believe that our findings will generalize internationally as determining intended duration of use and defining drug utilization events are common elements in most pharmacoepidemiological studies. A limitation of our study was that our defined thresholds for our concordance ratings were consensus-based and do not necessarily reflect clinically important differences. Further, PBS data only include subsidized medicine dispensings and therefore some diabetes medicine use was overlooked in our analysis; this limitation would not have affected our concordance results. Additionally, though each participating site had ethical and data custodian approval to access the PBS 10% sample data, patient-level data could not be shared between sites and this restriction meant we were unable to investigate differences at the individual level between sites.

| CONCLUSIONS
Our study demonstrates that definitions of exposure and treatment events are critical to replicability, and detailed reporting of study decisions can help to identify potential sources of variability. Detailed analytical protocols, such as HARPER, facilitate transparency of operational definitions and interpretations of key study parameters, and agreement when developing a shared analysis plan, recognizing that fundamental definitions must often be tailored to the study. Specifically, defining exposure duration from dispensing data is an important source of measurement error in pharmacoepidemiology. Future work should focus on validating estimation methods as well as developing recommendations for conducting drug utilization and pharmacoepidemiological studies both in Australia and globally. This will improve generalizability, interpretability, and ultimately, replicability.

FDC metformin. Switching was defined as the dispensing of additional diabetes medicines after metformin treatment ceased. A minimum of 1-year potential follow-up time was required following initiation and initiators were censored at the first of treatment discontinuation, intensification, switch, death, or end of follow-up (January 31, 2019).
Incidence of treatment discontinuation, switching, and intensification.
Metformin dispensing, initiations, and characteristics of unique metformin initiators.
Time to treatment discontinuation, switching, and intensification in days. The median value was not reached for some measures and the highest quantile reached is displayed for those measures.