Making fair comparisons in pregnancy medication safety studies: An overview of advanced methods for confounding control

Abstract Understanding the safety of medication use during pregnancy relies on observational studies. However, confounding poses a threat to the validity of estimates obtained from observational data. Newer methods, such as marginal structural models and propensity score calibration, have emerged to deal with complex confounding problems, but these methods have seen limited uptake in the pregnancy medication literature. In this article, we provide an overview of newer advanced methods for confounding control and show how these methods are relevant for pregnancy medication safety studies.


| INTRODUCTION
More than half of all pregnant women in Western countries take medication during pregnancy, 1-3 making studies of medication safety a pressing public health concern. Studying medication safety in pregnancy presents particular challenges: Effects of medications on fetal development can be unpredictable, vulnerability to exposure changes during pregnancy, and outcomes may occur early in fetal development but be detected later. 4 In the general population, knowledge of medication efficacy and safety is primarily based on randomized controlled trials. However, randomized trials routinely exclude pregnant women due to uncertainties about the effects of medications on fetal development, meaning that studies of medication safety in pregnancy must rely on reproductive toxicity studies in animals and on observational data in humans. Several landmark cases, such as the thalidomide disaster, have taught us that animal models for teratogenicity do not necessarily translate to humans. Observational studies, using data from cohort studies, registries, and administrative databases, 5 are opportunities for understanding the risks of medication use during pregnancy, and it has been acknowledged that observational studies are the best method for assessing the maternal and fetal safety of using medication during pregnancy. 6 However, confounding is a major source of bias in observational studies. Recent years have seen the rapid development of advanced methods for dealing with confounding, yet uptake of these methods has been slow in the pregnancy medication literature. This is unfortunate. In this field, it is arguably especially important that researchers use the best methods for confounding control, because the consequences of getting the wrong answer are so profound: Failing to detect true effects of medication exposure can have enormous effects in the population, and falsely raising the alarm for a safe drug can result in women forgoing needed therapies and, in some cases, terminating wanted pregnancies. 6
In this paper, we advocate for a greater use of advanced methods for confounding control in the pregnancy medication safety research field and provide an overview of these methods under the following framework:
1. How does this method help us to make fair comparisons between the exposed and unexposed groups?
2. How has this method been applied in the pregnancy medication literature?
3. How is the method used in practice?
4. What are the important assumptions for this method?
5. What are the major strengths and limitations of the method?
Table 1 provides an outline of pregnancy medication studies using advanced methods to deal with confounding. This paper is intended as a reference for both students and experienced researchers who wish to gain new skills in advanced methods for confounding control.

| CONFOUNDING IN PREGNANCY MEDICATION STUDIES
Confounding control begins with a review of the literature and consultation with subject-area experts. Directed acyclic graphs (DAGs) provide a graphical means to represent the causal structure the investigator believes is present 7 and guide study design, data collection, and analysis. Figure 1 is an example DAG showing one possible causal model for prenatal antidepressant exposure and childhood neurodevelopment, with potential biasing paths, including confounders (other psychiatric illness, other psychiatric medication use, depression severity, and genetics), which should be controlled as far as possible, as well as a mediator (gestational age), and a collider (live birth). Several nonbiasing paths, including a risk factor for the outcome that is unrelated to the exposure (child gender) and a predictor of exposure that is unrelated to the outcome (prepregnancy antidepressant use), are also shown. Obtaining unbiased effect estimates requires investigators to identify and control confounding, while avoiding bias from inappropriate control for colliders and mediators and loss of precision or confusing interpretation of estimates arising from control for factors only related to the exposure or outcome but not both. 8 The Supporting Information contains a more comprehensive review of definitions of confounding, counterfactuals, and causal inference.
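As a rough illustration of how the roles described above follow from the structure of a DAG, the sketch below encodes one possible version of the Figure 1 graph and labels each variable from its direct edges to the exposure and outcome. The node names, the exact edges, and the `unmeasured_fetal_health` node used to make live birth a collider are assumptions made for this example, and the classification rule is a simplification that only inspects direct edges.

```python
from collections import defaultdict

# Hypothetical encoding of the causal structure described for Figure 1.
# Node names and edges are assumptions made for illustration only.
EXPOSURE = "prenatal_antidepressant"
OUTCOME = "child_neurodevelopment"

EDGES = [
    (EXPOSURE, OUTCOME),
    ("other_psychiatric_illness", EXPOSURE), ("other_psychiatric_illness", OUTCOME),
    ("other_psychiatric_medication", EXPOSURE), ("other_psychiatric_medication", OUTCOME),
    ("depression_severity", EXPOSURE), ("depression_severity", OUTCOME),
    ("genetics", EXPOSURE), ("genetics", OUTCOME),
    (EXPOSURE, "gestational_age"), ("gestational_age", OUTCOME),
    (EXPOSURE, "live_birth"), ("unmeasured_fetal_health", "live_birth"),
    ("unmeasured_fetal_health", OUTCOME),
    ("child_gender", OUTCOME),
    ("prepregnancy_antidepressant_use", EXPOSURE),
]

parents, children = defaultdict(set), defaultdict(set)
for cause, effect in EDGES:
    parents[effect].add(cause)
    children[cause].add(effect)


def classify(node):
    """Label a node from its direct edges to exposure and outcome (simplified rule)."""
    pa, ch = parents[node], children[node]
    if EXPOSURE in ch and OUTCOME in ch:
        return "confounder"  # common cause of exposure and outcome
    if EXPOSURE in pa and OUTCOME in ch:
        return "mediator"  # lies on the path exposure -> node -> outcome
    if EXPOSURE in pa and any(OUTCOME in children[p] for p in pa if p != EXPOSURE):
        return "collider"  # common effect of exposure and another cause of the outcome
    if OUTCOME in ch:
        return "outcome-only risk factor"
    if EXPOSURE in ch:
        return "exposure-only predictor"
    return "other"


nodes = sorted({n for edge in EDGES for n in edge} - {EXPOSURE, OUTCOME})
roles = {node: classify(node) for node in nodes}
adjustment_set = sorted(n for n, role in roles.items() if role == "confounder")

for node, role in roles.items():
    print(f"{node}: {role}")
print("adjust for:", adjustment_set)
```

In this toy graph only the four common causes end up in the adjustment set, while the mediator, the collider, and the variables related to only one of exposure or outcome are left out, in line with the guidance above.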

| Methods for measured confounders
In Box 1 (Supporting Information), we include a simplified illustration of confounding by measured factors and the methods to address confounding.
| Confounder summary scores and marginal structural models
The propensity score (PS), which is the probability of exposure given observed confounders, 9 reduces a large set of confounders to a single summary score. Propensity scores are commonly used in the medical literature; however, other summary score methods, including disease risk scores 10 (preferred in the case of rare exposures) and polygenic risk scores 11 (useful when genetic confounding is a concern), are available.
Propensity scores are typically constructed using multivariable logistic regression, where exposure is the dependent variable and confounders are the independent variables. The PS model should include variables that are confounders or predictors of the outcome; inclusion of factors that are only predictors of exposure will increase variance without decreasing bias. 12 High-dimensional PSs, which include thousands of variables identified through computational algorithms, may also be useful for adjusting for unmeasured confounders, if the measured variables are partial proxies for the unmeasured confounders. 13 The PS can be used to match, stratify, adjust, or weight the outcome model. Propensity scores, including high-dimensional PSs, have seen increased uptake in the pregnancy literature, eg, in safety studies of ondansetron, 14 lithium, 15 antidepressants, 16 and statins 17 in pregnancy, but their use is still minimal compared to multivariable

KEY POINTS
• Studies of the safety of medication use during pregnancy depend mainly on observational studies, which are subject to confounding bias.
• Novel methods for confounding control have seen limited uptake in the pregnancy medication safety literature.
• Application of novel methods is necessary to appropriately address the complex confounding scenarios found in pregnancy studies.

Strengths and limitations
PSs are especially useful when working with a common treatment and rare outcome. They also separate the design of the study (modeling confounding) from modeling the outcome. 18 However, for rare exposures, summary scores do not perform particularly well. 19
In Figure S1A). For example, when studying the safety of antidepressants, we may wish to control for depression severity.
However, antidepressant use in earlier pregnancy predicts depressive symptoms in later pregnancy, which will also predict subsequent antidepressant use. Standard adjustments for depression severity will always be biased in this scenario.
Central to the MSM is the inverse probability of treatment weight.
At each measurement time t, the investigator uses logistic regression to construct the numerator (probability of the observed exposure) and denominator (probability of the observed exposure, given baseline predictors, time-varying confounders, and history of exposure up to time t − 1). 24 The total weight is the product of the weights at each time point, and analyses are conducted in the weighted population, or pseudo-population, in which individuals whose observed exposure was likely given their covariate history are downweighted, while those whose observed exposure was unlikely are upweighted, producing balance of measured confounders across exposure groups.
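The weight construction can be sketched as follows. This minimal example assumes the numerator and denominator models have already been fitted elsewhere (for instance by logistic regression) and that their predicted probabilities of being exposed at each time point are available; the function name, arguments, and the numbers for the example woman are hypothetical.

```python
from math import prod


def stabilized_weight(exposure, p_numerator, p_denominator):
    """Stabilized inverse probability of treatment weight for one individual.

    exposure:      observed exposure (0 or 1) at each measurement time
    p_numerator:   predicted P(exposed at t) from the numerator model
    p_denominator: predicted P(exposed at t) from the denominator model
    """
    ratios = []
    for a, p_num, p_den in zip(exposure, p_numerator, p_denominator):
        # Use the probability of the exposure level that was actually observed.
        num = p_num if a == 1 else 1 - p_num
        den = p_den if a == 1 else 1 - p_den
        ratios.append(num / den)
    # The total weight is the product of the time-specific ratios.
    return prod(ratios)


# Hypothetical woman followed over three trimesters: exposed, exposed, unexposed.
w = stabilized_weight(
    exposure=[1, 1, 0],
    p_numerator=[0.3, 0.8, 0.7],
    p_denominator=[0.6, 0.9, 0.5],
)
print(f"stabilized weight: {w:.3f}")
```

Taking predicted probabilities as inputs keeps the sketch independent of any particular modeling library; in practice the weights would be computed for every individual and passed to a weighted outcome model.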
Use of MSMs for pregnancy medication safety studies remains rare, 25,26 despite examples where timing of exposure is of great importance, and exposure is conditional on time-varying confounders, such as other medication use, or changes in disease severity.

Assumptions
Under assumptions of positivity, exchangeability, and consistency, the MSM will give an unbiased estimate of the effect of the exposure on the outcome. These assumptions are not formally testable, although assessment of the positivity assumption may include evaluation of the inverse probability of treatment weight for extreme weights and progressive truncation of the weights to determine whether extreme weights are highly influential. 27 When important confounders are unmeasured or incompletely measured, MSM methods will not provide unbiased effect estimates.
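One way to carry out the progressive truncation described above is sketched here. The weights are made-up values, the percentile cut-offs are arbitrary choices, and the nearest-rank percentile rule is just one simple convention chosen for this example.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile for an integer pct (a convention chosen for this sketch)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def truncate_weights(weights, lower_pct, upper_pct):
    """Reset weights outside the given percentiles to the percentile values."""
    lo = percentile(weights, lower_pct)
    hi = percentile(weights, upper_pct)
    return [min(max(w, lo), hi) for w in weights]


# Hypothetical weights with one extreme value.
weights = [0.1, 0.8, 0.9, 1.0, 1.0, 1.1, 1.2, 1.3, 2.0, 50.0]

# Progressively tighter truncation; large shifts in the summary (or in the
# effect estimate, not computed here) suggest that extreme weights are influential.
for lower, upper in [(0, 100), (1, 99), (5, 95), (10, 90)]:
    truncated = truncate_weights(weights, lower, upper)
    mean_w = sum(truncated) / len(truncated)
    print(f"{lower}-{upper}: mean={mean_w:.2f}, max={max(truncated):.1f}")
```

With real data, the outcome model would be refitted at each truncation level and the resulting effect estimates compared.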

Strengths and limitations
The key strength of the MSM is that it allows consideration of time-varying exposure and confounding, which is highly relevant in pregnancy research due to the changes in fetal vulnerability through the course of pregnancy and the tendency of women to change their medication use during pregnancy. 28,29 However, when the treatment-covariate association is very strong, MSMs can produce very wide confidence intervals, which fail to include the true effect. 27

| Methods for incomplete confounder data
Failure to adjust for unmeasured confounders results in biased effect estimates (Figure S1B). In some situations, the confounder of interest was not measured in the original dataset, but was measured in a similar sample. In this scenario, confounder adjustment is possible, even if the outcome has not been measured in this sample, using PS calibration. 30-32 Propensity score calibration is a method based on regression calibration 33 that offers an additional advantage over other methods of calibration, 34 by allowing for adjustment for multiple confounders. For example, in a study of triptan safety, we used a cross-sectional study to jointly adjust estimates for migraine severity and type. 35 In this method, two PSs must be calculated: the error-prone PS (estimated in both the main and validation studies, including only the confounders available in the main study) and the gold-standard PS (estimated in the validation study, including all confounders). The outcome model is fitted using the difference between the error-prone and gold-standard PSs to calibrate effect estimates.
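The calibration step can be illustrated with a small sketch. It assumes one common regression-calibration formulation, in which the gold-standard PS is regressed on exposure and the error-prone PS in the validation study, and the fitted model is then used to predict a calibrated PS for each individual in the main study. All data values and function names are hypothetical, and the cited methods papers should be consulted for the estimator actually used in a given analysis.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(rows, y):
    """Least-squares coefficients for y ~ intercept + predictors (normal equations)."""
    design = [[1.0] + list(r) for r in rows]
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve(xtx, xty)


# Hypothetical validation study: (exposure, error-prone PS, gold-standard PS).
validation = [
    (1, 0.62, 0.70), (0, 0.35, 0.30), (1, 0.48, 0.58),
    (0, 0.20, 0.18), (1, 0.75, 0.80), (0, 0.55, 0.49),
]
g0, g_exposure, g_ps = fit_ols(
    [(x, ep) for x, ep, _ in validation],
    [gs for _, _, gs in validation],
)


def calibrated_ps(exposure, error_prone_ps):
    """Predicted gold-standard PS for a main-study individual."""
    return g0 + g_exposure * exposure + g_ps * error_prone_ps


# Hypothetical main study: (exposure, error-prone PS); no gold-standard PS available.
main_study = [(1, 0.66), (0, 0.41)]
calibrated = [calibrated_ps(x, ep) for x, ep in main_study]
print(calibrated)
```

The calibrated PS would then take the place of the error-prone PS in the main-study outcome model. The linear calibration model is itself part of the measurement error model assumption.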

Assumptions
In addition to the assumptions of PS models, outlined previously, PS calibration also assumes that the validation sample is a reasonable stand-in for the main sample and that the measurement error model is correctly specified. 30,31 Propensity score calibration also assumes surrogacy, meaning that the error-prone PS is an adequate surrogate for the gold-standard PS. 36 If the outcome is not measured in the validation study, the surrogacy assumption is not testable. Violations of surrogacy occur when the direction of confounding differs between the main and validation studies, 30 and bias arising from violations of surrogacy can be predicted. 36 Other methods exist for unmeasured confounding, including weighting by the inverse probability of missingness, as well as standard imputation techniques, and a comparison of these methods with PS calibration showed little material difference in bias reduction. 37

Strengths and limitations
The main strength of PS calibration is that it allows adjustment for multiple unmeasured confounders. However, calibration methods fail when unmeasured confounding is strong, and violations of the surrogacy assumption may result in increased bias.

| Methods for unmeasured confounding
Information on confounders may be too difficult to measure (eg, family environment or parenting style) or too costly (eg, deep sequencing genetic data). The methods discussed below exploit aspects of observational data to control for measured and unmeasured confounders.

| Sibling comparison designs
If the unmeasured confounders are shared between siblings (see Figure S1C for illustration), then studies examining siblings with discordant exposure allow researchers to remove bias from shared confounders. 38-40 If, for example, we believe that any differences in autism risk between children with and without prenatal exposure to antidepressants are due to inherited genetic risk, then comparing the autism diagnosis between pairs of siblings with different prenatal exposure should be less biased than comparing autism risk between unrelated exposed and unexposed groups.
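For a binary exposure and a binary outcome, the within-pair comparison can be sketched as a matched-pair odds ratio, which is the estimate a conditional logistic regression without covariates would give for sibling pairs. The data layout and the example pairs below are hypothetical; real analyses typically also adjust for unshared covariates such as birth order or maternal age.

```python
def discordant_pair_odds_ratio(pairs):
    """Matched-pair odds ratio from sibling pairs.

    Each pair is ((exposed_1, outcome_1), (exposed_2, outcome_2)) with 0/1 values.
    Only pairs discordant on both exposure and outcome are informative.
    """
    exposed_case = 0    # pairs where only the exposed sibling has the outcome
    unexposed_case = 0  # pairs where only the unexposed sibling has the outcome
    for sib_a, sib_b in pairs:
        if sib_a[0] == sib_b[0] or sib_a[1] == sib_b[1]:
            continue  # concordant on exposure or outcome: uninformative
        exposed_sib = sib_a if sib_a[0] == 1 else sib_b
        if exposed_sib[1] == 1:
            exposed_case += 1
        else:
            unexposed_case += 1
    if unexposed_case == 0:
        raise ValueError("no pairs in which only the unexposed sibling has the outcome")
    return exposed_case / unexposed_case


# Hypothetical sibling pairs: (prenatal exposure, diagnosis) for each sibling.
pairs = (
    [((1, 1), (0, 0))] * 3    # exposed sibling diagnosed, unexposed sibling not
    + [((0, 1), (1, 0))] * 2  # unexposed sibling diagnosed, exposed sibling not
    + [((1, 1), (0, 1))]      # both diagnosed: uninformative
    + [((1, 0), (1, 1))]      # both exposed: uninformative
)
print(discordant_pair_odds_ratio(pairs))
```

Because each comparison is made within a family, factors shared by the siblings cancel out of the estimate, which is the sense in which shared confounders are controlled.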
There has been substantial uptake of sibling study designs in the pregnancy medication safety literature in recent years, particularly in studies examining the safety of antidepressants, where the main concern is separating the underlying genetic and familial components of depression from exposure to antidepressant medications. 41,42

Assumptions
Use of sibling designs is most appropriate when confounders that are shared between siblings are more important than unshared, 39 and there are no carryover effects between siblings. 43

Strengths and limitations
Sibling designs control measured and unmeasured confounding that is shared between siblings. However, failing to control for unshared confounders increases bias; sibling studies are also more vulnerable to bias from measurement error than nonsibling studies. 39 Figure S1D).

Strengths and limitations
Instrumental variable analyses control measured and unmeasured confounding, and so instruments that meet all the assumptions will mimic the results from a randomized trial. However, estimates are highly sensitive to violations of untestable assumptions, and violations may produce bias amplification. 44
A reference to selected software for the methods discussed in this paper is included as part of the Supporting Information. With few exceptions, these methods have seen slow uptake in the pregnancy medication literature. This may reflect caution about methods that can seem opaque on first reading of the methods papers that describe them. Caution is necessary when applying novel methods. However, standard regression methods require assumptions similar to those of the methods discussed in this paper. If readers find that their research question fits well with one of the scenarios described in this paper, we suggest starting with the citations given for the technique. The techniques we describe in this paper have their roots in standard regression techniques and can be implemented with standard software.

| DISCUSSION
While this paper focuses on bias due to confounding, other sources of bias such as exposure and/or outcome misclassification 51 and selection bias, 52 as well as seasonal effects, 53 can also distort associations. This paper is not intended to be an exhaustive discussion of all possible methods for confounding control. New techniques are being developed all the time, and many of these, such as g-estimation 54,55 and targeted maximum likelihood estimation, 56 have not yet been implemented in the pregnancy medication literature.
Quantitative bias analysis can help researchers account for bias from systematic errors in their data. 57 Further, the methods discussed herein are not mutually exclusive and can be used in combination with each other: Combining PSs with IVs 46 or MSMs with quantitative bias analysis 25 gives more information about the probable range of effect estimates than any single method.
Observational studies are vital to our understanding of medication safety in pregnancy, but great care must be taken in the analysis and interpretation of data to minimize confounding and bias. In all

ETHICS STATEMENT
The authors state that no ethical approval was needed.