Calculating daily dose in the Observational Medical Outcomes Partnership Common Data Model

We aimed to develop a standardized method to calculate daily dose (i.e., the amount of drug a patient was exposed to per day) of any drug on a global scale using only drug information of typical observational data in the Observational Medical Outcomes Partnership Common Data Model (OMOP CDM) and a single reference table from Observational Health Data Sciences and Informatics (OHDSI).

• Research networks with their data in the OMOP CDM can now apply a standardized way to calculate drug daily dose.
• Only the OMOP DRUG_STRENGTH reference table and DRUG_EXPOSURE table are needed for the dose formulas.
• Four different dose formulas apply to the majority of drug records.
• Most of the calculated median daily dose matched the World Health Organization (WHO) Defined Daily Dose (DDD).
• The established dose formulas increase transparency, reliability, and reproducibility of daily dose-related research in real-world evidence.

Plain Language Summary
The analysis of previously collected healthcare data is crucial for understanding the effects of drugs. However, calculating the amount of a drug ingredient a patient was exposed to per day has been a challenge: (i) such calculations differ between drug types, for example, tablets and liquid drugs; and (ii) they require a comprehensive source of information about how much drug ingredient is in each drug product on the market. In this research, we propose a solution to this problem. We introduced a standardized method for calculating the daily dose for all possible drugs in healthcare data. Moreover, by working our way through the available anonymized patient data in a structured way, we made the dose calculation process transparent and reproducible. Furthermore, comparing our method against the global standard of the World Health Organization (WHO) average dose per day yielded overall good results. This research offers a clear roadmap for increasing reliability and reproducibility of daily dose results in research using healthcare data. Ultimately, patients will benefit from an improved process of generating evidence on drugs in relation to different quantities of exposure.

| BACKGROUND
In the realm of pharmacoepidemiology and drug safety, the analysis of healthcare data plays a pivotal role in shaping our understanding of the real-world effects of drugs. To unlock the full potential of those data, robust, standardized, and well-defined patient-level information is needed. The Observational Medical Outcomes Partnership Common Data Model (OMOP CDM) in combination with the Standardized Vocabularies 1 has emerged as a cornerstone in providing a common structure for analyzing healthcare data. 2 It provides a standardized way to represent clinical and healthcare concepts, ensuring consistency and interoperability across diverse data sources. The Observational Health Data Sciences and Informatics (OHDSI) community, which maintains the OMOP CDM, has produced reliable and transparent analytic packages, facilitating large-scale analysis of observational healthcare databases. 3 However, conducting drug utilization studies in the context of the European Union's Data Analysis and Real-World Interrogation Network (DARWIN EU) 4 initiative revealed that a standardized way to calculate drug daily dose was still missing.
In general, daily dose is not readily available from the data. This becomes even more challenging in international multicenter and multi-database studies, as drug products differ between countries and so do drug vocabularies. OMOP overcomes this challenge by providing the OHDSI Standardized Vocabularies, in which drug products are represented through RxNorm and, for products outside the USA, RxNorm Extension. In our previous paper we describe how this reference came about. 1 The DRUG_STRENGTH reference table 5 stores information concerning the strength or concentration of a drug and the respective units. The DRUG_EXPOSURE table 6 stores all drug records and related information (e.g., product, duration, quantity) that were prescribed, dispensed, or administered to patients. Thus, DRUG_STRENGTH together with prescription and administration data in DRUG_EXPOSURE should be sufficient to define the amount of exposure the patient received, that is, the dose. Although OMOP researchers to date have calculated dose in an ad-hoc fashion for each study question, their research will greatly benefit in reproducibility, reliability, and transparency from a standardized approach. Therefore, we aimed to develop a standardized methodology for daily dose calculations in OMOP CDM data sources and to implement it in OHDSI standardized analytics.

| METHODS
We built a methodology to calculate the daily dose of a drug depending on the pattern of units used in DRUG_STRENGTH. There, the strength of a drug product is stored either as an amount, mainly for non-divisible dose forms (e.g., pills, capsules, suppositories), or as a numerator plus denominator specifying a drug concentration, mainly for divisible dose forms (e.g., liquids, aerosols). Thus, the same ingredient is stored in different ways depending on the dose form of the drug product. Data S1 depicts and explains the organization of the DRUG_STRENGTH reference table in detail using a few drug product examples.
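To make the two storage forms concrete, the following minimal Python sketch checks which set of fields a strength record populates. The `DrugStrengthRow` class and the `storage_form` function are hypothetical names introduced for illustration only; the field names follow the column names used in this paper, and the sample values are assumptions rather than records taken from DRUG_STRENGTH.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DrugStrengthRow:
    """Hypothetical, simplified view of one DRUG_STRENGTH record."""
    amount_value: Optional[float] = None
    amount_unit: Optional[str] = None
    numerator_value: Optional[float] = None
    numerator_unit: Optional[str] = None
    denominator_value: Optional[float] = None
    denominator_unit: Optional[str] = None


def storage_form(row: DrugStrengthRow) -> str:
    """Return 'amount', 'concentration', or 'other' for a strength record."""
    has_amount = row.amount_unit is not None
    has_numerator = row.numerator_unit is not None
    has_denominator = row.denominator_unit is not None
    if has_amount and not has_numerator and not has_denominator:
        return "amount"  # e.g., pills, capsules, suppositories
    if not has_amount and has_numerator and has_denominator:
        return "concentration"  # e.g., liquids, aerosols
    return "other"


# Illustrative records (assumed values).
tablet = DrugStrengthRow(amount_value=500.0, amount_unit="milligram")
suspension = DrugStrengthRow(
    numerator_value=25.0,
    numerator_unit="milligram",
    denominator_unit="milliliter",
)

print(storage_form(tablet))
print(storage_form(suspension))
```

Returning a label instead of raising an error keeps records with unexpected field combinations visible, so they can be counted rather than silently dropped.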
Only clinically relevant units were used in our calculations; others, such as homeopathic dosages or bacteria units, were disregarded. Clinically relevant units of the amount and numerator were standardized to international unit (IU), milliequivalent, milligram, and milliliter. To develop dose formulas, we assessed the OMOP drug concepts associated with the previously identified 41 drug strength patterns in the DRUG_EXPOSURE table and their corresponding DRUG_STRENGTH information in three databases: CPRD GOLD, 7 IPCI, 8 and P+ 9 (database information in Table 2 and Data S3). We carried out the assessment per administration route (oral, injection, inhalation, transdermal, transmucosal, and topical) because route may influence dose, and to allow for the possibility of defining different dose formulas for different routes. This review of how drug information was stored and used for the most common drug concepts led to the formation of dose formulas. Thus, the resulting dose formulas were grounded in the OHDSI Standardized Vocabularies and have been confirmed by clinical assessments. Subsequently, we investigated the proportion of recorded OMOP drug concepts in the DRUG_EXPOSURE table for which we would be able to calculate dose with our suggested dose formulas. Finally, we selected six pharmaceutical ingredients to test our formulas' dose results against an external benchmark, the World Health Organization's (WHO) Defined Daily Dose (DDD). 10 The WHO DDD is defined as "the assumed average maintenance dose per day for a drug used for its main indication in adults." 11 The chosen ingredients (metformin, enoxaparin, furosemide, salmeterol, tiotropium, and fentanyl) are diverse in units, administration route, and the setting in which they are used. Using the dose formula for the specific route and drug strength pattern of the drug concepts of interest, we calculated the overall median daily dose and the 25th and 75th percentiles, and additional results stratified by route and drug strength pattern. The benchmarking was carried out in the three databases from which we derived the dose formulas and additionally in four European databases representing a variety of healthcare settings: MAITT, 12 IQVIA DA, 13 IQVIA LPD, and IMASIS 14 (database information in Table 2 and Data S3). All databases had been previously mapped to the OMOP CDM. The selected databases have overall good data quality with sufficient information on the needed variables to proceed with dose calculations (i.e., quantity and duration).
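The benchmarking summary described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the study code: the function names are hypothetical, the input doses are made-up values, the percentile interpolation method (`"inclusive"`) is an arbitrary choice, and the factor-of-three comparison mirrors the range this paper uses when comparing results to the WHO DDD.

```python
from statistics import median, quantiles


def summarize_daily_dose(doses):
    """Return the median and the 25th and 75th percentiles of daily doses."""
    q1, _, q3 = quantiles(doses, n=4, method="inclusive")
    return {"p25": q1, "median": median(doses), "p75": q3}


def within_factor(value, reference, factor=3.0):
    """True if value lies between reference / factor and reference * factor."""
    return reference / factor <= value <= reference * factor


# Hypothetical daily doses in mg (not study data).
doses_mg = [0.009, 0.018, 0.018, 0.018, 0.036]
summary = summarize_daily_dose(doses_mg)

# WHO DDD for tiotropium inhalable powder as quoted in this paper.
WHO_DDD_MG = 0.01

print(summary)
print(within_factor(summary["median"], WHO_DDD_MG))
```

Keeping the summary and the benchmark check as separate functions lets the same summary be reused for each stratum of route and drug strength pattern.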

| Study code
The code used to obtain the drug strength patterns, assess the applicability of the dose formulas, and benchmark daily dose is freely available in a public GitHub repository: https://github.com/oxfordpharmacoepi/DailyDoseRouteValidationStudy.

| RESULTS
The resulting dose formulas led to the categorization of the 41 identified drug strength patterns into three groups: fixed amount formulation patterns, time-based formulation patterns, and concentration formulation patterns. The dose formulas are depicted in Table 3.
Another outcome of the clinical review was that the definition of the quantity field was overburdened with different usages. We observed that the use of quantity differed depending on drug strength pattern and dose form. The reason is that in the current CDM version, quantity is used for multiple purposes, which is to be resolved in a future CDM version. Our dose formulas stem from how the quantity field was being used as observed during our clinical review (some uses consistent with current definitions, some not). Thus, we state our definition of quantity together with each group.
The first group, fixed amount formulation patterns, contains those patterns that had amount_value numeric, amount_unit present, numerator_unit missing, and denominator_unit missing. These patterns comprise drugs where the drug strength is measured in a fixed [...]
TABLE 2 Databases participating in this study and their contribution.
[...] Our clinical assessment suggested that the quantity depended on whether the denominator value was missing (=1) or not. In cases where the denominator value was missing, the quantity was mainly populated by giving the total volume/weight/others of the product prescribed or dispensed. An example is quantity 100 for the drug_concept_id: 1713520 "amoxicillin 25 mg/mL oral suspension", that is, 100 mL. In cases where the denominator value was not missing, we mainly saw single or multiple unit packages, and the quantity was populated with the number of bottles/units/sachets/others of the product prescribed or dispensed. An example is quantity 15 for the drug_concept_id: 40708507 "1000 mg Estriol 0.001 mg/mg topical cream", that is, 15 units.
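The role of quantity can be illustrated with a small worked calculation. The sketch below assumes that daily dose equals the strength per unit of quantity, multiplied by quantity, divided by the number of exposure days counted inclusively from start to end date. That formula, the function names, and the exposure dates are assumptions made for illustration; they are not a reproduction of the formulas in Table 3. Only the amoxicillin strength (25 mg/mL) and quantity (100 mL) come from the example in the text.

```python
from datetime import date


def days_exposed(start: date, end: date) -> int:
    """Inclusive number of exposure days between start and end date."""
    return (end - start).days + 1


def daily_dose(strength_per_unit: float, quantity: float,
               start: date, end: date) -> float:
    """Assumed formula: total amount dispensed divided by exposure days."""
    days = days_exposed(start, end)
    if days <= 0:
        raise ValueError("end date must not precede start date")
    return strength_per_unit * quantity / days


# Amoxicillin 25 mg/mL oral suspension, quantity 100 (that is, 100 mL).
# The 10-day exposure period is a hypothetical value.
dose_mg_per_day = daily_dose(
    strength_per_unit=25.0,   # mg per mL
    quantity=100.0,           # mL
    start=date(2024, 1, 1),
    end=date(2024, 1, 10),
)
print(dose_mg_per_day)  # 250.0 (mg per day)
```

Under the same assumption, a record where quantity counts packages instead of volume would need the strength per package as `strength_per_unit`, which is why the meaning of quantity has to be settled before any formula is applied.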
The dose formulas were implemented in the DrugUtilisation R package, 15 which is freely available under the Apache License (Version 2.0) and can be obtained from CRAN (https://cran.r-project.org/web/packages/DrugUtilisation/index.html). The package includes documentation on how to calculate and summarize dose for a drug cohort using the addDailyDose() and addDrugUse() functions, 16 respectively.
Furthermore, the package allows flexible handling of gaps and overlapping exposure periods to be decided by the user.
a The quantity is the number of units/tablets/capsules/others prescribed or dispensed.
b Our clinical assessment suggested that the quantity depended on whether the denominator value was missing (=1) or not. In cases where the denominator value was missing, the quantity was mainly populated by giving the total volume/weight/others of the product prescribed or dispensed. In cases where the denominator value was not missing, we mainly saw single or multiple unit packages, and the quantity was populated with the number of bottles/units/sachets/others of the product prescribed or dispensed.
[...] for tiotropium, and for P+ and IMASIS for fentanyl. The overall daily dose calculations for tiotropium and fentanyl may not be informative because the WHO DDD differs between dose forms and administration routes, respectively. Daily dose calculations stratified by route and additionally by drug strength pattern for tiotropium are depicted in Table 5. The fixed amount formulation patterns relate to the inhalable powder capsule with a WHO DDD of 0.01 mg, and the concentration formulation patterns relate to the inhalable solution with a WHO DDD of 0.005 mg. Most databases showed a clear distinction in calculated daily dose per drug strength pattern for the two different dose forms. Moreover, the overall calculations for IQVIA LPD (which happened to be an outlier) were masking the less frequent inhalable solution, for which the drug strength pattern level calculation yielded a value identical to the WHO DDD. Daily dose calculations stratified by route and additionally by drug strength pattern for fentanyl are depicted in Table 6. IPCI and MAITT yielded median daily doses close to the WHO DDD for all administration routes. All databases had at least one route and pattern stratum that yielded the correct WHO DDD.
All results of calculated median daily dose for all ingredients, including those for strata of route and for strata of route and drug strength pattern, are shown in Data S4. In the majority of cases, the large and therefore important strata yielded correct daily dose calculations. However, we observed considerable variation of median daily doses between administration routes (which may be expected for some ingredients) and even larger variation between different drug strength patterns of the same route (which is unexpected). Yet, the distribution of drug records across drug strength patterns was uneven, and some patterns accounted for less than 1% of records and are therefore negligible.
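The stratified view described above, including the share of records per stratum, can be sketched as follows. The record layout, the pattern labels, the dose values, and the `stratify` function are hypothetical; only the idea of treating strata with less than 1% of records as negligible is taken from the text.

```python
from collections import defaultdict
from statistics import median


def stratify(records, min_share=0.01):
    """Summarize daily dose per (route, pattern) stratum.

    records: iterable of (route, pattern, daily_dose) tuples.
    Strata holding less than min_share of all records are flagged.
    """
    groups = defaultdict(list)
    for route, pattern, dose in records:
        groups[(route, pattern)].append(dose)
    total = sum(len(doses) for doses in groups.values())
    result = {}
    for key, doses in groups.items():
        share = len(doses) / total
        result[key] = {
            "n": len(doses),
            "share": share,
            "median": median(doses),
            "negligible": share < min_share,
        }
    return result


# Hypothetical records: one dominant stratum and one very small stratum.
records = (
    [("inhalation", "pattern_a", 0.018)] * 199
    + [("inhalation", "pattern_b", 0.0025)] * 1
)
for key, stats in stratify(records).items():
    print(key, stats)
```

Reporting the share next to each median makes it easier to see when an unexpected value comes from a stratum too small to matter for the overall result.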

| DISCUSSION
We developed and implemented a standardized way to calculate drug daily dose in OMOP CDM data sources by using the DRUG_STRENGTH reference table and quantity and duration from DRUG_EXPOSURE. Our approach allows overall daily dose calculation and stratification by route, and additionally by drug strength information.
This approach makes the dose calculation process transparent, reliable, and reproducible. Furthermore, the daily dose calculation implementation is compatible with the existing OHDSI analytical pipeline.
TABLE 4 Overall daily dose calculations per ingredient per database.
Obtained median daily dose results were largely consistent with the WHO DDD. A study comparing dose calculations among US data with the WHO DDD found 86% of prescriptions within a factor of three of the WHO DDD, yet the authors did not disclose their dose formulas. 17 The cases for which the daily dose calculation did not match the expected results in our assessment may shed light on data mapping problems, inconsistencies in the source data, or prescription patterns truly deviating from the recommendations. One obvious source of error might be the mapping of the source drug code to OMOP's standard concepts, which are the basis for DRUG_STRENGTH. These mappings may, for example, be insufficiently granular (e.g., mapping to concepts that carry only ingredient but no strength information). Such a mapping problem can be detected by running the DrugExposureDiagnostics 18 R package developed by DARWIN EU, which assesses the DRUG_EXPOSURE table and with which the mapping level can be quickly checked. Other mapping problems can be overcome at the mapping stage with knowledge of the data, the country, and the healthcare system (e.g., if there is only one strength on the market). One could claim that the strength of drug products is a known fact, but since there is no general global compendium available except the DRUG_STRENGTH table, inaccuracies in the latter will lead to incorrect results. Moreover, our dose formulas rely on correct information in the start and end date of drugs (i.e., duration) and the quantity prescribed/dispensed. These values can also be checked using the DrugExposureDiagnostics R package. We strongly recommend doing so to obtain sensible daily dose estimates. Finally, some inconsistencies between our daily dose calculation and the WHO DDD can be explained by the fact that the DDD, as an average value, does not necessarily correspond to the recommended or prescribed daily dose and that our results are therefore true deviations depending on the healthcare setting and country. Yet, the WHO DDD is the best available benchmark.
Dose calculations were conducted overall and additionally stratified by route and drug strength patterns. This detailed stratification helps to locate the drug concepts that are behind the respective dose calculations. Depending on what the dosing information is used for, for example, if different doses are compared, the stratification allows for restricting the drug strength patterns or even drug concepts to those providing reliable calculations, that is, to those for which there is high certainty that the source is correct and that they have been correctly mapped. Moreover, this detailed investigation into drug daily dose is also a basis for valuable feedback for all databases to question their data quality. A fixable problem that we observed during the process is differences in how the quantity field in the DRUG_EXPOSURE table is populated; this field is present in most dose formulas.
To that end, daily dose calculations within OMOP CDM can also be seen as a quality indicator for a database because each data component needs to be at the right place upstream before data can produce correct daily dose.
The current formulas' ability to apply to over 85% of recorded drug concepts in most databases is promising and likely covers all relevant drugs. Yet, high applicability will not automatically lead to meaningful daily dose calculations.
TABLE 5 Daily dose calculations of tiotropium (WHO DDD: 0.01 mg inhalable powder / 0.005 mg inhalable solution) stratified by route, and by pattern and route.
The major strength of this study is its systematic and structured approach toward calculation of daily dose within the OMOP CDM.
The creation of structured drug strength patterns, assessment of OMOP drug concepts in DRUG_EXPOSURE, and comprehensive benchmarking against WHO DDD standards enhance the reliability and transparency of the methodology, and finally the reproducibility of the calculated results. Thus, our dose formulas and their implementation in the DrugUtilisation R package allow dose calculation within the OMOP CDM on a global scale (given that the diagnostic assessment supports dose calculations for the desired study question). Moreover, possible stratification by route and drug strength pattern (which often aligns with formulation) helps with country/market-specific and product-specific interpretation of results.
However, it is essential to acknowledge certain limitations. If drug concepts of the same ingredient have different units (for example, enoxaparin, IU and mg), then the dose calculations yield a result per unit with the current implementation. A general harmonization of units is not possible here because the conversion of IU to mg depends on the ingredient. Furthermore, while the applicability of dose formulas is high in most databases, there was one database with applicability below 50%, mainly due to missing numeric values in the amount formulation patterns. The reason is that this database has different drug types, according to the CDM vocabulary, 6 available in its system with different levels of granularity. This is something that would be detected in the diagnostics step, and actions can be taken to use only the useful drug type when conducting the study. Additionally, this study's benchmarking step may not encompass the entire spectrum of drugs in clinical practice, although we tried our best to select ingredients as varied as possible. The same is true for the participating databases: while we tried to be comprehensive in the database selection, the results may not be generalizable to all databases in the OMOP CDM. Yet, the use of the OHDSI Standardized Vocabularies should ensure that our dose formulas can be used across the entire OMOP CDM world. Furthermore, there will be circumstances when using the "signetur" (sig field in DRUG_EXPOSURE) may yield more precise dose calculations. However, this field lacks a standard representation in the CDM to date. Therefore, we did not use this information in the dose formulas but plan to amend the dose formulas for different circumstances with time. Finally, OMOP has guidelines 19 on how the data has to be mapped and what a successful mapping means, assessed through the data quality dashboard. 20 However, given different data types and sources, not all OMOP databases will have accurate information to calculate dose (especially with regard to exposure duration). Therefore, the diagnostic assessment of the data is essential before dose calculation.
To conclude, we provided a standardized methodology for calculating daily drug doses within the OMOP CDM that was benchmarked against existing universal measures of drug consumption. The suggested dose formulas determined through rigorous evaluation enhance the reliability, transparency, and reproducibility of daily doses in pharmacoepidemiologic studies.
Denominator units were standardized to hour, milligram, milliliter, actuation, and square centimeter. This selection resulted in 41 structured patterns, which we further call drug strength patterns. In DRUG_STRENGTH, a drug can only populate either the amount fields or both the numerator and denominator fields; other combinations are not possible.
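The unit standardization and the resulting pattern of units can be sketched as below. The target units are the ones named in the Methods; the source unit names, the conversion table, and the helper functions are assumptions for illustration, and the returned tuple is only a stand-in for a drug strength pattern, not one of the 41 patterns themselves.

```python
# Assumed conversion table: source unit -> (standard unit, multiplier).
AMOUNT_NUMERATOR_UNITS = {
    "international unit": ("international unit", 1.0),
    "milliequivalent": ("milliequivalent", 1.0),
    "milligram": ("milligram", 1.0),
    "gram": ("milligram", 1000.0),
    "microgram": ("milligram", 0.001),
    "milliliter": ("milliliter", 1.0),
    "liter": ("milliliter", 1000.0),
}

# Standard denominator units as listed in the text.
DENOMINATOR_UNITS = {
    "hour", "milligram", "milliliter", "actuation", "square centimeter",
}


def standardize(value, unit):
    """Convert an amount or numerator to a standard unit.

    Returns (value, unit), or None for units outside the table
    (for example, homeopathic dosages).
    """
    if unit not in AMOUNT_NUMERATOR_UNITS:
        return None
    target, factor = AMOUNT_NUMERATOR_UNITS[unit]
    return value * factor, target


def unit_pattern(amount_unit, numerator_unit, denominator_unit):
    """Return the tuple of units as a stand-in for a drug strength pattern.

    Only the amount fields, or both numerator and denominator fields,
    may be populated; any other combination returns None.
    """
    amount_only = amount_unit and not numerator_unit and not denominator_unit
    concentration = not amount_unit and numerator_unit and denominator_unit
    if amount_only:
        return (amount_unit, None, None)
    if concentration and denominator_unit in DENOMINATOR_UNITS:
        return (None, numerator_unit, denominator_unit)
    return None


print(standardize(0.5, "gram"))
print(unit_pattern(None, "milligram", "milliliter"))
```

A lookup table keeps the list of accepted units explicit, so any unit that is not deliberately included is excluded from the dose calculation instead of being converted by accident.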
AUTHOR CONTRIBUTIONS
TB and MC conceptualized this study. EB, KLG, and MC authored the DrugUtilisation software package. MC, KLG, and TB wrote the R script for this study. MC, TB, MR, MM, JG, SS, DV, MAM, JMR, AL, MO, and RK ran the script against their data and were therefore responsible for performing this study. AG, AMJ, CR, KB, LB, and TB provided epidemiological and clinical expertise. TB wrote the initial version of the manuscript. All authors interpreted the results, critically reviewed the manuscript, approved the final version for submission, and agreed to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved.
Table 1 depicts these options schematically in DRUG_STRENGTH; detailed information on the 41 identified drug strength patterns is available in Data S2.

TABLE 1 Available options from DRUG_STRENGTH for the 41 identified drug strength patterns (shown with respective pattern name and dose unit). Detailed information available in Supplementary material 2.
TABLE 3 Dose formulas for the three groups of drug strength patterns. Amount value, numerator value, and denominator value come from the DRUG_STRENGTH table. Quantity, drug exposure start date, and drug exposure end date come from the DRUG_EXPOSURE table.
Overall results of calculated median daily dose of metformin, enoxaparin, furosemide, salmeterol, tiotropium, and fentanyl are depicted in Table 4. We observed that most databases yielded median daily doses similar to the WHO DDD for all ingredients (within a range of a factor of 3). Outliers were observed for P+ for enoxaparin, for CPRD GOLD and P+ for salmeterol, for IQVIA LPD and IMASIS [...]
Note: The amount or numerator unit of the pattern defines the unit of the calculated daily dose.
Abbreviations: CPRD GOLD, Clinical Practice Research Datalink GOLD; DA, Disease Analyzer; IMASIS, Multicenter Integrated Hospital Information System; IPCI, Integrated Primary Care Information; LPD, Longitudinal Patient; MAITT, University of Tartu dataset of health data; NA, no calculations or no clinically relevant unit; P+, PharMetrics® Plus for Academics.
TABLE 6 Daily dose calculations of fentanyl (WHO DDD: 0.6 mg nasal/sublingual, 1.2 mg transdermal) stratified by route and unit, and by pattern and route. The amount or numerator unit of the pattern defines the unit of the calculated daily dose.
Note: $ Results with less than 5 records are suppressed.
[...] meaningful daily dose calculations, because from a pharmacological point of view, a sensible drug daily dose calculation requires regular and continuous drug use by the patient. Yet, we do not know if the patient collected and used the drug as prescribed. Moreover, our daily dose calculation only covered the dose intended by the prescriber, and what the patient ingests is impossible to know. Furthermore, daily [...]