Bayesian hierarchical methods in the detection of potentially teratogenic first‐trimester medications

Bayesian hierarchical models (BHMs) have been used to identify adverse drug reactions, allowing information sharing amongst adverse reactions and drugs expected to have similar properties. This study evaluated the use of BHMs in the routine signal detection analyses of potential first‐trimester teratogens, where these models have not previously been applied.


| INTRODUCTION
Pregnant women are excluded from the majority of new medication safety studies, so information about potential risks to a foetus is lacking.
Routine screening of congenital anomaly (CA) data for potentially teratogenic medications taken during the first trimester of pregnancy is performed by EUROmediCAT. 1 EUROmediCAT compares the odds of exposure to a specific medication for a specific CA with the odds of exposure to the same medication for all other medication-exposed CAs 2 using a double false discovery rate (FDR) procedure to incorporate groupings of similar medications when adjusting for multiple testing. 3 In the field of pharmacovigilance, large spontaneous reporting databases detect signals of adverse events (AEs) using disproportionality analyses to identify drug-AE combinations with more observed reports than expected under the assumption of independence. 4 Bayesian hierarchical models (BHMs) have been applied to these spontaneous reporting datasets, 5 with the aim of improving estimated associations for any drug-AE combination by incorporating information from other similar drugs (or AEs) using specified groupings. 6 BHMs can explicitly allow (without imposing) for the possibility that different medications in the same group might be related and that AE rates are more likely to be similar within than across these groups. A BHM with information sharing in two dimensions has also been proposed to incorporate groupings of both medications and AEs simultaneously. 7 Simulation studies and application to a sample of the World Health Organisation pharmacovigilance database have demonstrated that a two-dimensional model of information sharing can produce a more powerful BHM to detect true adverse drug reactions, compared with sharing information only in one dimension. 7 We aimed to ascertain whether BHMs that share information between medications and/or CAs using existing coding hierarchies have the potential to improve the effectiveness of signal detection methods using EUROmediCAT data.

| Study population
EUROCAT is a network of European population based CA registries whose data is obtained through both active case finding and voluntary reporting, 8 with multiple sources of information used to ascertain CA cases including live birth, foetal death, and termination of pregnancy for foetal anomaly. Data quality indicators are used to assess consistency of inclusion criteria, data collection, and recording across registries. 9 The International Classification of Diseases coding version 10 with British Paediatric Association extension (IDC10-BPA) is used to code CAs according to EUROCAT guidelines. EUROmediCAT comprises EUROCAT registries that collect information on first-trimester (defined as the first day of a woman's last menstrual period up to her 12th gestational week) medication use. Maternal medication exposure data is primarily obtained through prospectively recorded maternity records, with additional sources including maternal interviews and general practitioner records. 10,11 Data on 31 197 malformed foetuses with first-trimester medication exposures from 1995 to 2011 were available for this study, covering a population of around 7 million births across 13 registries in 11 countries. For some registries, there was considerable data loss where it was not possible to verify the timing of reported medication exposures, which has been discussed previously (6). However, the distribution of types of CA were similar for those pregnancies included and excluded due to unknown timing, suggesting that cases remaining in the dataset for these registries should not be prone to selection biases in this respect. Ethical and data access approvals were obtained for each database from the relevant governance infrastructures. This EUROmediCAT dataset was used previously to compare different FDR methods for multiple testing adjustment, 3 and a large proportion of this data was also previously analysed for signal detection purposes. 2

| Congenital anomalies and medication exposures
Cases with chromosomal anomalies, skeletal dysplasia, genetic syndromes, and microdeletions were excluded as they are unlikely to be caused by teratogenic medications. The aetiology of congenital dislocation of the hip is mechanical, so foetuses with this being their only recorded major CA were also excluded (n = 905). EUROCAT's hierarchical coding system categorises the 54 nonchromosomal CA subgroups into 11 main organ system groups and five "other anomalies/ syndromes" subgroups https://eu-rd-platform.jrc.ec.europa.eu/sites/ default/files/Full_Guide_1_4_version_28_DEC2018.pdf. 12 Five hundred forty-two foetuses with a congenital heart defect (CHD) but no information regarding the specific type of CHD were combined into an additional subgroup ("unspecified CHD"; Table S1. The Anatomical Therapeutic Chemical (ATC) system 13 is used to code unlimited exposures in EUROmediCAT data. The ATC coding

Key Points
• Bayesian hierarchical models have the potential to improve teratogenic signal detection by incorporating information sharing between similar medications and/or congenital anomalies.
• In our analysis, Bayesian hierarchical models demonstrated a potential to detect signals with fewer exposed cases than the current frequentist signal detection procedure.
• In our analysis of prospectively collected registry data on congenital anomalies, Bayesian hierarchical models did not outperform the currently used double false discovery rate (FDR) method of adjusting for multiple testing in signal detection; continued use of double FDR methods are therefore recommended for teratogenic signal detection. system has a hierarchical structure, grouping medications at five levels; the highest (ATC1) groups medications into 14 main anatomical groups, the next three (ATC2-4) use three to five digit codes, respectively, to represent further therapeutic, pharmacological, and/or chemical classifications, and the most detailed level (ATC5) uses seven digits to identify the chemical substance. Older ATC codes subject to alterations over time 14 were updated to the newer codes. Foetuses exposed only to vitamins, minerals, and/or folic acid were excluded.
Foetuses were excluded if exposed only to topical medications, codes with less than five digits (detail below ATC4 level; n = 1219), second/ third trimester medications (n = 1490), or with unknown timing (n = 12 073). Further description of these exclusions by registry have been described in more detail in our previous study. 3 After these exclusions, a total of 15 058 malformed foetuses were included in the analysis dataset. ATC5 codes were analysed, but where information was only available to ATC4 level, this was also included. Five hundred twenty-three medications with at least three exposed foetuses were investigated. As in the previous EUROmediCAT analysis, foetuses exposed only to medications with fewer than three exposures in total were included in the dataset as controls (since medications with fewer than three exposures are not analysed for signal detection). 2,3

| Statistical analysis
Results from BHMs were directly compared with those obtained previously for the double FDR procedure on the same dataset. 3 Briefly, the double FDR comprises two steps 15,16 : first, a representative minimum P value is calculated for each group and only those groups with a representative P value below the specified FDR threshold are included in the next step. In the second step, a Simes FDR procedure is applied across all combinations belonging to those groups passing the first step. All data management and calculations for Fisher's exact test and the double FDR procedure were performed using Stata 12. 17 For the BHMs, a Gamma Poisson Shrinker (GPS) and a BHM with a Poisson distribution were combined to model the CA and medication counts. 7 The expected count E ij for each observed count c ij was calculated using the marginal totals for medication i and CA j assuming no association between i and j. The proportional reporting ratio (PRR) 18 for medication i and CA j was the ratio of the observed to expected counts, PRR ij = c ij Eij . The data structure for two-dimensional information sharing by medications and CAs is displayed in Table 1.
Here, d represents ATC3 medication codes with D groups and i = 1, …, n d medications within each group d and a denotes groupings of CAs according to the EUROCAT organ system classes with A groups of CAs and j = 1, …, n a CAs within each group. The lighter grey shading in Table 1 represents the set of the d = 2 group of medications crossed with the a = 2 group of CAs, and the dark grey cell in Table 1 c 2122 denotes the observed count for the combination of medication i = 1 in the d = 2 medication group with CA j = 2 in the a = 2 CA group.
Each set in Table 1 has a group distribution, such that each medication-CA combination within that two-way group shares a common prior distribution. There is also a prior distribution for the set of all top-level sets, that is, an average across all CAs and/or medications.    20,21 The code used to specify these models in JAGS is presented in Table S2. The coda package 22 in R was used to assess model convergence and to summarise the posterior distribution for each parameter, including convergence statistics and visual inspection of trace, density and auto-correlation plots for the parameters in each model.
These measures were also used to determine the required number of total iterations and thinning.
Any medication-CA combination with a posterior 2.5th percentile (ie, the lower limit of a one-sided 97.5% posterior confidence interval [PCI]) greater than 1 for the PRR was considered a potential signal. As this choice of threshold is somewhat arbitrary, the effect of choosing a stricter 0.5th percentile as a cut-off was also assessed. As the purpose of signal detection is to screen for potential teratogens, combinations are only flagged as potential signals if they represent an increased in reporting (ie, PRR > 1). The number of associations with a PRR < 1 (medications relatively "less harmful" for a specific CA than the average medication in the dataset) and corresponding 97.5th (or 99.5th) percentile less than 1 were monitored to determine how often these occurred.

| Evaluation and comparison of signal detection methods
Although risk classification systems have been implemented and used in a number of countries including Australia, the United States, and Sweden, [23][24][25] a key challenge in the assessment of signal detection methods for CA data is that there is no "gold standard" for classifying risks according to specific CAs. 26 Identification rate = Number of "high-risk" medications identified as potential signals Total number of "high-risk" medications in the data The proportion of category A medications being identified as potential signals was also considered, as a measure of the likely number of "false positive" associations. The total number of unique medications in the set of potential signals identified by each method is referred to as the "effective workload." Note that we refer only to potential signals as the aim of this study is to assess signal generation methods for CA data, and we do not further evaluate the potential signals here (since this has already been done for this dataset 28 ).

| Description of signal detection dataset
Data on 15 058 malformed foetuses were available for analysis with 55 CAs in 16 organ system groups (see Table S1): an average of 4. groups, 94 (81%) contained no high-risk medications, 13 (11%) contained one high-risk medication, and nine (8%) groups contained at least two high-risk medications. Table 2 and Figure 1 present key results from the four BHMs and the double FDR procedure, according to the two thresholds used to define potential signals for BHMs (95% and 99% PCIs) and for FDR cut-offs varying from 5% to 50%. A cut-off of 5% (FDR 5%), for example, means that up to 5% of the potential signals in the double FDR analysis might be expected to be false positive associations. The number of potential high-risk medication signals identified by each method is displayed in Figure 1, which plots the identification rate against the effective workload. Table 2 shows that the number of ATC3 groups with at least one potential signal decreased as the cut-off level for each method became stricter and fewer potential signals were identified. The effective workload is also shown for each method (in bold), including a breakdown by risk categories.  Figure 1 show that a one-dimensional BHM with grouping by ATC3 and a 95% PCI cut-off resulted in the most potential signals overall and identified the most high-risk DX medications as potential signals (identification rate = 48%). However, this model also resulted in a very high effective workload (n = 160), with 30% of all medications identified as potential signals and at least one potential signal in over half the ATC3 groups. Individual

| Signal detection analysis
BHMs and those grouping only by CA gave similar results for a 95% PCI cut-off, though with fewer high-risk signals identified and higher effective workloads. Using a stricter 99% PCI cut-off to define potential signals always resulted in a lower identification rate, but a higher proportion of the potential signals being for highrisk medications. The proportion of category A medications identified as potential signals was lower than the high-risk proportion for double FDR, but higher than the high-risk proportion for the majority of BHMs. The "strictest" model was double FDR, especially at lower thresholds, for example, double FDR 5% identified only three potential signals (two high-risk). A BHM grouping both medications and CAs with a 95% PCI threshold identified four more high-risk medications (identification rate 23%) than double FDR 50% (identification rate 14%); however, this was at the expense of more than a 4-fold increase in effective workload, from 16 medications to 71, for a gain in identification rate of only 9%. When using the 99% PCI cut-off to define potential signals, the two-dimensional BHM did not perform as well as the double FDR 50%. Results from the alternative one-dimensional models (averaging over the ungrouped dimension) are presented in Table S4. These models resulted in low effective workloads and identification rates, more comparable with those using a double FDR. Double FDR 50% and its "equivalent" one-dimensional BHM with grouping by ATC3 and averaging over CAs resulted in 16 and 15 medications being identified as potential signals, respectively (Table S4). However, two more high-risk medications (six vs four) were identified by double FDR.
The number of "less harmful" associations according to each method is presented in Table S5, along with the total number of combinations identified as potential signals (as shown in Table 2) for comparison purposes. Double FDR resulted in only three "less harmful" associations; however, all of the BHMs resulted in a considerable number (up to 23 or 69 for models using a 99% or a 95% PCI threshold, respectively).

| Different potential signals according to different methods: The effect of shrinkage in BHMs
As well as the overall number of potential signals, differences in which medication-CA combinations were identified as potential signals in the different methods were apparent. One situation where this occurred was due to shrinkage to the null, where a potential signal attenuates in some BHMs due to the influence of other combinations in that group, as demonstrated in Figure S1. Shrinkage to the mean can also occur if a strong association influences other combinations in its group to create additional potential signals that are not present without this shrinkage, for example, see Figure S2.

| DISCUSSION
Many BHMs considered in this study identified more of the highrisk medications (higher identification rate) than the double FDR; however, these improvements came at the expense of a substantial increase in the effective workload, and therefore lower proportions of the potential signals being for high-risk medications.  Note that the effective workload is lower than the total number of combinations identified as potential signals, since a medication can be a potential signal in combination with more than one CA. b "High-risk" proportion: the proportion of all potential medication signals that are high-risk.
c Identification rate: the proportion of high-risk medications that are identified as potential signals, out of the 44 high-risk medications in the dataset.
d "False positive" proportion: the proportion of potential signals that were marked as category A medications (considered to be safe for use during pregnancy). e FDR cut-off levels were assessed in 5% increments from 5% to 50%, but some cut-off levels resulted in the same values for this table; within the ranges 15%-25%, 30%-40%, and 45%-50% the same number of signal were identified.
strengthened the analysis of combinations with low cell counts by using information in the surrounding cells, allowing them to be included in the set of resulting potential signals (Table S5). In contrast, the frequentist EUROmediCAT approach requires at least three exposures to identify any association as a potential signal. This detection of a potential medication signal for a newer drug with little data is one of the potential advantages of BHMs in signal detection. Further investigation is warranted to determine how likely it is that the addi-

| Evaluation and comparison of methods
The lack of existing knowledge regarding the teratogenic effect of medications used during pregnancy makes it difficult to evaluate how many teratogens are missed by each method and the possible reasons for any lack of detection. A key limitation of using the Australian classification system is that high-risk medications are not identified in association with a specific CA. In addition, categorisation of medications as B or C in the Australian database may indicate a lack of evidence rather than meaning these medications are really low risk for CAs. Furthermore, almost a third of the medications in the EUROmediCAT data were not present in the Australian risk classification database. In practice, teratogenic risk is nearly always specific to certain CAs. 29 This may have affected our "identification rate," which does not reflect the number of different CAs that a medication is associated with. In this analysis, any associations arising due to confounding by indication cannot be identified. Our use of the risk categorisation system here was not to judge the absolute strengths of a signal detection procedure, but rather to directly compare methods in terms of the volume of potential signals and assessment of resulting workload for follow up of identified associations; the limitations identified above should therefore be present across all models considered.

| Methodological considerations
An important assumption of the Poisson distribution is that events in the data occur independently. However, in EUROmediCAT, a malformed foetus often has multiple CAs and/or medication exposures and certain CAs may be more likely to co-occur within pregnancies.
Similarly, exposure to a specific medication may increase the likelihood of exposure to another medication, for example, it is common to take several different asthma medications together. It may also be the These possibilities require further investigation.
In any signal detection analysis using disproportionality measures, reporting biases for a common medication may lead to inflation in the overall rates for that medication, meaning that other associations in the database are masked. 27,31,32 This is not thought likely to occur in the EUROmediCAT as medication exposure is generally collected before the CA diagnosis. Masking may also be an issue in EUROmediCAT data through the use of malformed controls if a proportion of the control group is related to the medication of interest. Studies have demonstrated that the removal of a masking effect may help lead to new signals of public health relevance being discovered. 31,33 It is also thought, however, that significant masking is not common in large spontaneous reporting databases, and where present it mostly affects rarely reported AEs. [34][35][36] Confounding by co-reported medications can also occur if two medications are frequently prescribed together but only one causes the CA of interest. 32 We may expect a teratogen to act in a similar way regardless of where it is taken; however, certain medications may have varying usages and/or availability in different EUROmediCAT registries and countries. As many medication-CA combinations have very small numbers, the best approach to an ongoing signal detection process is considered to be investigation of any potential registry effects at a later stage in the analysis; as such, after potential signals are generated, the next step of the EUROmediCAT signal detection process includes the adjustment of estimates for confounding by registry. 28 This analysis excludes all chromosomal anomalies; these anomalies could theoretically be analysed as a negative control outcome as no medications are expected to be associated with any chromosomal anomalies. However, the risk of a chromosomal anomaly is strongly associated with maternal age, and methods to adjust for this confounder in signal detection analyses would need to be developed.

| Strengths and limitations of EUROmediCAT data
The existing EUROCAT network, upon which EUROmediCAT is based 11 , ensures that CAs are coded in a detailed and standardised manner across all registries. Good agreement between medication exposures recorded in the EUROmediCAT database and those actually used has also been demonstrated. 37 As maternal medication exposure data in EUROmediCAT registries is primarily obtained through prospectively recorded maternity records, confounding by the time of pregnancy registration of adverse outcomes is unlikely to have occurred. On the other hand, there is known under ascertainment for certain medication exposures in EUROmediCAT data, which may reduce the sensitivity of any signal detection analysis. 10,38 There is also a lack of information regarding the dosage and precise timing of medication exposures. Although the critical period of development for most major CAs occurs during the second and third gestational months, the exact timing can differ according to the type of CA and some CAs may also develop after the first trimester of pregnancy. [39][40][41] In this study, we cannot determine whether medications were taken during the particular critical period for development of each specific CA. As only malformed foetuses exposed to at least one medication are included, it is not possible to estimate the relative risk of a CA for medication exposures compared with a healthy (ie, non-exposed and non-malformed) control population. 42

| SUMMARY AND CONCLUSIONS
Despite the difficulties in assessing the performance of the signal detection methods, we recommend the double FDR method for continued use in signal detection analyses of EUROmediCAT data.

ETHICS STATEMENT
The authors state that no ethical approval was needed.

SUPPORTING INFORMATION
Additional supporting information may be found online in the Supporting Information section at the end of this article.