Formulating causal questions and principled statistical answers

Although review papers on causal inference methods are now available, there is a lack of introductory overviews on what they can render and on the guiding criteria for choosing one particular method. This tutorial gives an overview in situations where an exposure of interest is set at a chosen baseline (“point exposure”) and the target outcome arises at a later time point. We first phrase relevant causal questions and make a case for being specific about the possible exposure levels involved and the populations for which the question is relevant. Using the potential outcomes framework, we describe principled definitions of causal effects and of estimation approaches classified according to whether they invoke the no unmeasured confounding assumption (including outcome regression and propensity score‐based methods) or an instrumental variable with added assumptions. We mainly focus on continuous outcomes and causal average treatment effects. We discuss interpretation, challenges, and potential pitfalls and illustrate application using a “simulation learner” that mimics the effect of various breastfeeding interventions on a child's later development. This involves a typical simulation component with generated exposure, covariate, and outcome data inspired by a randomized intervention study. The simulation learner further generates various (linked) exposure types with a set of possible values per observation unit, from which observed as well as potential outcome data are generated. It thus provides true values of several causal effects. R code for data generation and analysis is available on www.ofcaus.org, where SAS and Stata code for analysis is also provided.


Introduction
In the companion paper we discuss the estimation and interpretation of various estimands using simulated data. The generation of these data was informed by a real investigation but enriched here by the generation of potential outcome data, in addition to factual data. We follow Wallace et al (2015) in simulating data inspired by the results of the Promotion of Breastfeeding Intervention Trial (PROBIT) (Kramer et al, 2001). In this trial, mother-infant pairs across 31 Belarusian maternity hospitals were cluster randomised to receive either standard care or a breastfeeding encouragement intervention to investigate the effect of breastfeeding on a child's later development. In our simulation we randomise individual mother-infant pairs and focus on weight achieved at age 3 months; the study population therefore consists of babies who survive the first three months.
The DAG in Figure 1 sketches the underlying causal relationships between the simulated variables.

[Figure 1: DAG of the simulated variables; recoverable node labels: offer of BEP ($A_1$), weight at 3 months ($Y$).]
In the next sections we describe the models we used to simulate the data, called PROBITsim.

The baseline variables
The distribution of baseline variables, L, was made to resemble that of the Belarus study. In all simulations, binary variables were generated from the binomial distribution, and categorical variables with more than two categories were generated using a multinomial distribution.
• The sample size n is set to be 17,044.
• Maternal age (Age) is assumed to be log-normal: $\log(\text{Age}) \sim N(3.17, 0.19)$. If the simulated age is ≤ 13 years, it is set to 13 years. This yields a median age of 24 years (interquartile range 21-27 years).
• The child's sex (Sex) is a binary variable. Boys are coded as 1, girls as 0. The probability of being male is 52%.
• Education (Educ) has 3 levels (1 = low, 2 = medium, 3 = high). Its distribution depends on the location, according to the Belarus study, where the probability of having low, medium or high education is set as: [table of probabilities not recovered]
-Maternal allergy (Allergy) is a binary variable. The probability of having a mother who suffers from allergy is 0.03 for low education, 0.05 for medium education and 0.07 for more highly educated women.
-Born by caesarean section (Caesarean) is a binary variable. The probability of a caesarean birth is set to 0.10 for mothers with low education, 0.12 for medium education and 0.16 for more highly educated women.
• Birth weight (Wgt0) is normally distributed and its mean (E) depends on the child's sex, maternal smoking and education. The standard deviation (SD) is set to be larger for boys than for girls. [model equation not recovered]
where we use the shorthand $(X = x)$ for $I(X = x)$, the indicator that the statement within parentheses is true. Table 2 provides a summary of these data.
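As an illustration of the generating steps listed above, the following minimal Python sketch draws maternal age, child's sex, education, allergy and caesarean section. It is not the authors' R code. It reads 0.19 as the standard deviation of log(age), an assumption that is consistent with the reported median and interquartile range. The education probabilities in `EDUC_PROBS` are hypothetical placeholders, the dependence on location is ignored, and all function and variable names are made up for this example.

```python
import math
import random
import statistics

# HYPOTHETICAL education probabilities (low, medium, high): the source's
# location-specific table is not available here, so these are placeholders.
EDUC_PROBS = [0.3, 0.5, 0.2]
P_ALLERGY = {1: 0.03, 2: 0.05, 3: 0.07}    # by education level, from the text
P_CAESAREAN = {1: 0.10, 2: 0.12, 3: 0.16}  # by education level, from the text


def simulate_baseline(n, seed=2024):
    """Draw n rows of (age, sex, educ, allergy, caesarean)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        # log(age) ~ N(3.17, 0.19), reading 0.19 as the SD (assumption);
        # simulated ages of 13 years or less are set to 13 years.
        age = max(13.0, math.exp(rng.gauss(3.17, 0.19)))
        sex = int(rng.random() < 0.52)  # 1 = boy, 0 = girl
        educ = rng.choices([1, 2, 3], weights=EDUC_PROBS)[0]
        allergy = int(rng.random() < P_ALLERGY[educ])
        caesarean = int(rng.random() < P_CAESAREAN[educ])
        rows.append({"age": age, "sex": sex, "educ": educ,
                     "allergy": allergy, "caesarean": caesarean})
    return rows


rows = simulate_baseline(17044)
q1, median, q3 = statistics.quantiles([r["age"] for r in rows], n=4)
print(f"age: median {median:.1f}, quartiles {q1:.1f} and {q3:.1f}")
```

Comparing the printed quartiles with the stated median of 24 years and interquartile range of 21-27 years gives a quick check of the age model.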

Potential and observed exposures
We consider a randomised trial in which a random half of the pregnant women received an intervention consisting of an offer of a breastfeeding encouragement programme. We assume that only women in the intervention group have access to the encouragement programme. In this study, we distinguish four different exposure types of interest:
• $A_1 = 1$: being assigned to the intervention group, in which the encouragement programme is offered; $A_1 = 0$: otherwise.
• $A_2 = 1$: the programme offer is actually taken up; $A_2 = 0$: otherwise.
• $A_3 = 1$: the mother actually starts breastfeeding; $A_3 = 0$: otherwise.
• $A_4 = 1$: the mother starts breastfeeding and continues for the full 3 months; $A_4 = 0$: otherwise.
For each woman in the study, we generated potential exposure values for $A_2$, $A_3$ and $A_4$ when setting $A_1$ (and in some instances $A_2$) to 1 or to 0. The following potential exposures were generated:
• $A_{2,a_1(1)}$ and $A_{2,a_1(0)}$, where $A_{2,a_1(1)}$ represents the potential exposure $A_2$ when $A_1$ is set to take the value 1, and similarly for $A_{2,a_1(0)}$. These potential exposures indicate whether the training programme would be taken up, had $A_1$ been set to 1 or 0.
• $A_{3,a_2(1)}$ and $A_{3,a_2(0)}$. They indicate whether breastfeeding would be initiated and continued for 3 months, had $A_2$, the training, been set to 1 or 0.
Because we assumed that the programme is only available to women in the intervention group, women in the control group have no access to it, i.e. $A_{2,a_1(0)} = 0$ for all women. This also implies that $A_{3,a_1(0)} = A_{3,a_2(0)}$. The next sections describe how these potential exposure realisations and the observed data were generated.

A 1 : randomised intervention
In our simulation women are randomly assigned to receive the offer of the breastfeeding encouragement programme (BEP), or standard care.

A 2 : the programme offer is actually taken up
When the programme is offered, a subgroup of women will take up the invitation and actually follow the programme. We assume that the more highly educated women are more inclined to follow the programme. For each woman we generated the potential variable $A_{2,a_1(1)}$, indicating whether the woman would have followed the programme had she been randomised to the intervention arm. We use a logistic regression model to relate the odds of following the programme to maternal age, education and smoking status during pregnancy as follows:
$$\Pr(A_{2,a_1(1)} = 1) = \operatorname{expit}\big(-1.9 + 0.1\,\text{Age} + 0.5\,(\text{Educ} = 2) + 1.0\,(\text{Educ} = 3) - 1.0\,\text{Smoke}\big).$$
The potential variable $A_{2,a_1(0)}$ is 0 for all women because it is only possible to follow the programme after receiving an invitation for it (i.e. when $A_1 = 1$).
Assuming that the consistency assumption holds, the observed treatment $A_2$ is then
$$A_2 = A_1\,A_{2,a_1(1)} + (1 - A_1)\,A_{2,a_1(0)} = A_1\,A_{2,a_1(1)}.$$
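The uptake step can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' implementation: the coefficients are those printed in the logistic model above, age is assumed to be in years, smoking is assumed to be coded 0/1, and the function names are hypothetical. Under consistency, the observed uptake is the potential uptake for the arm the woman was actually assigned to.

```python
import math
import random


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def uptake_probability(age, educ, smoke):
    """Pr(A2 would be 1 if A1 were set to 1), using the printed coefficients."""
    linear = (-1.9 + 0.1 * age
              + 0.5 * (educ == 2) + 1.0 * (educ == 3)
              - 1.0 * smoke)
    return expit(linear)


def simulate_uptake(age, educ, smoke, a1, rng):
    """Return (potential A2 under A1=1, potential A2 under A1=0, observed A2)."""
    a2_pot_1 = int(rng.random() < uptake_probability(age, educ, smoke))
    a2_pot_0 = 0  # the programme cannot be followed without an offer
    # Consistency: the observed value is the potential value of the assigned arm.
    a2_obs = a1 * a2_pot_1 + (1 - a1) * a2_pot_0
    return a2_pot_1, a2_pot_0, a2_obs


rng = random.Random(7)
print(simulate_uptake(age=24, educ=2, smoke=0, a1=1, rng=rng))
print(simulate_uptake(age=24, educ=2, smoke=0, a1=0, rng=rng))
```

Keeping both potential values next to the observed one mirrors the design of the simulation learner, where true causal contrasts are computed from the potential values and estimators are applied to the observed ones.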

A 3 : the mother actually starts breastfeeding
We generated an ordinal variable $X$, representing an unmeasurable individual characteristic, to distinguish three different types of women: women who would always start breastfeeding whether the BEP is offered or not ($X = 2$), women who would start breastfeeding after following the encouragement programme but would not do so if the programme were not followed ($X = 1$), and women who would never start breastfeeding ($X = 0$). This variable is used to generate the potential breastfeeding behaviour under different values of the intervention and potential programme uptake, but will be treated as an unobservable individual characteristic (and referred to as "principal strata"). We assume that there are no women who would not breastfeed after following the programme but would start breastfeeding without following it, i.e. there are no defiers and the assumption of monotonicity holds. The ordinal variable $X$ was generated using an ordinal logistic model with:
• $\Pr(X = 2) = \operatorname{expit}\big(-2.5 + 0.25\,(\text{Educ} = 2) + 0.5\,(\text{Educ} = 3) + 0.1\,\text{Age} + 0.008\,\text{Sex} - 0.5\,\text{Smoke} + 0.0006\,\text{Wgt0}\big)$.
• $\Pr(X \geq 1) = \operatorname{expit}\big(1.5 + \operatorname{logit}(\Pr(X = 2))\big)$.
This ordinal variable $X$ was used to obtain the third treatment variable $A_3$. Three potential treatment outcomes were generated: $A_{3,a_1(1)}$, indicating whether a woman would start breastfeeding if randomised to the intervention arm; $A_{3,a_1(0)}$, whether she would start breastfeeding if randomised to the control arm; and $A_{3,a_2(1)}$, whether she would start breastfeeding after following the programme. We assume that only women with $A_{2,a_1(1)} = 1$, i.e. the women who would follow the programme if offered, could also be compliers to starting breastfeeding.
• $A_{3,a_1(1)} = 1$ if $X = 2$, or if $X = 1$ and $A_{2,a_1(1)} = 1$; 0 otherwise.
• $A_{3,a_1(0)} = 1$ if $X = 2$; 0 otherwise.
• $A_{3,a_2(1)} = 1$ if $X \geq 1$; 0 otherwise.
Because we assumed that women in the control group had no access to the encouragement programme, $A_{3,a_1(0)} = A_{3,a_2(0)}$ for all women. The observed treatment $A_3$ is then
$$A_3 = A_1\,A_{3,a_1(1)} + (1 - A_1)\,A_{3,a_1(0)}.$$
Table 1 shows some of the data for the first 10 women. For example, woman 1 has the intermediate level of education ('medium'). She is randomised to intervention ($A_1 = 1$), but because $A_{2,a_1(1)}$ is equal to 0 she does not actually follow the programme. The unmeasurable individual characteristic $X$ was 0, indicating that the woman would not start breastfeeding either when randomised to the intervention or when randomised to control, and so both $A_{3,a_1(1)}$ and $A_{3,a_1(0)}$ are 0. If, counter to the facts, she had followed the programme, she still would not have started breastfeeding ($A_{3,a_2(1)} = 0$) because $X = 0$. In practice we do not observe all these potential outcomes, but for this woman we observe $A_2 = 0$ and $A_3 = 0$. Figure 2 illustrates the data generating mechanism for $A_2$ and $A_3$.
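The mapping from the principal stratum X to the potential breastfeeding indicators can be made concrete with a short Python sketch. It is a hedged illustration, not the authors' code. The rules for the control arm and for "programme followed" are read off from the verbal definition of X (always starters have X = 2; women with X ≥ 1 start after following the programme). The value 0.49 passed to `strata_probabilities` is only the reported marginal frequency of always starters, used as an example input in place of an individual-level probability. All function names are hypothetical.

```python
import math


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def logit(p):
    return math.log(p / (1.0 - p))


def strata_probabilities(p_always):
    """Probabilities of X = 0, 1, 2 given Pr(X = 2), via the ordinal model."""
    p_ge1 = expit(1.5 + logit(p_always))  # Pr(X >= 1)
    return {0: 1.0 - p_ge1, 1: p_ge1 - p_always, 2: p_always}


def potential_a3(x, a2_pot_1):
    """Potential breastfeeding start under A1=1, under A1=0, and under A2=1."""
    a3_a1_1 = int(x == 2 or (x == 1 and a2_pot_1 == 1))
    a3_a1_0 = int(x == 2)   # no programme access in the control arm
    a3_a2_1 = int(x >= 1)   # everyone except never starters
    return a3_a1_1, a3_a1_0, a3_a2_1


def observed_a3(a1, a3_a1_1, a3_a1_0):
    """Consistency: observed A3 is the potential value of the assigned arm."""
    return a1 * a3_a1_1 + (1 - a1) * a3_a1_0


print(strata_probabilities(0.49))
# Woman 1 in the text: X = 0 and she would not follow the programme.
print(potential_a3(x=0, a2_pot_1=0))
```

By construction, starting under the intervention is never less likely than starting under control for any woman, which is the monotonicity (no defiers) assumption stated in the text.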

Description of the generated confounders and early exposure data
The following properties hold in our simulated population.
• The frequency distribution of the principal breastfeeding strata is 0.32 for never starters, 0.19 for compliers (they will start breastfeeding if randomised for the programme and not if they are in the control arm), and 0.49 for always starters.
• The probability of starting breastfeeding if the programme were followed by everyone is $\Pr(A_{3,a_2(1)} = 1) = 0.79$.

A 4 : the mother starts breastfeeding and continues for the full 3 months
To define this fourth variable we have to model the duration of breastfeeding. We do so in the next section.

Potential duration of breastfeeding in the first three months
The duration of breastfeeding varies between mothers and depends on education, birth weight, allergy, age of mother, sex of child, caesarean section, and on whether the woman followed the BEP.
For each subject we started by generating the following two potential breastfeeding durations:
• $D_{a_2(0),a_3(1)}$, the potential breastfeeding duration if a woman had been set to start breastfeeding and not to follow the programme.
• $D_{a_2(1),a_3(1)}$, the potential breastfeeding duration if a woman had been set to start breastfeeding and to follow the programme.
The potential duration of breastfeeding when assigned to the control group is: [equation not recovered]
The potential duration when assigned to attending the programme (and hence also when assigned to the intervention group) is: [equation not recovered]
The frequency distribution of the different potential duration variables is given in Table 3. The actual observed duration of breastfeeding (BF), $D$, and the observed value of $A_4$, continuing breastfeeding for the full three months, are equal to:
• [expression for the observed duration $D$ not recovered]
• $A_4 = 1$ if $D \geq 3$ months and 0 otherwise.
Figure 3 illustrates the data generating mechanism for $D$ and $A_4$.

Table 3: Frequency distribution of the potential duration variables. Each row gives four duration-category frequencies in the column order of the source; the column headers and the variable label of the first row were not recovered.
(label not recovered)    Intervention offered               0.34  0.06  0.10  0.50
$D_{a_2(1)}$             Programme offered and followed     0.22  0.06  0.11  0.61
$D_{a_1(0),a_3(1)}$      Programme not offered, BF started  0.07  0.15  0.20  0.58
$D_{a_2(1),a_3(1)}$      Programme followed, BF started     0.02  0.08  0.14  0.75

Potential outcomes for weight at three months
We generated the potential weight at 3 months, $Y(D)$, under different potential durations of breastfeeding ($D$) using a linear regression model that included birth weight and all the other baseline variables, duration of breastfeeding during the first 3 months of life, and interaction terms of duration of breastfeeding with birth weight, education and maternal smoking, assuming that children with lower birth weight, children of less educated mothers and children of smoking mothers benefited more from breastfeeding. The model for the potential weight at 3 months (in grams), under different set durations of breastfeeding, was specified as follows: [model equation not recovered]
where $D$ is the duration of breastfeeding generated under different set values for $A_1$ and $A_2$, as described in the section on the potential duration of breastfeeding. Individual realizations were generated assuming a normal distribution with SD = 50 g (assumed to have a biological variation of SD = 40 g and a residual component of SD = 10 g).
Since duration of breastfeeding varies according to intervention, uptake of the programme, and uptake of breastfeeding (i.e. $A_1$, $A_2$ and $A_3$), the potential outcomes under the different scenarios that influence duration were calculated. Several potential outcomes $Y_{\cdot}$ were generated to represent the potential weight at 3 months under different interventions:
• $Y_{a_3(0)}$, the potential outcome under no breastfeeding uptake ($D = 0$).
• $Y_{a_1(0)}$, the potential outcome under no BEP intervention (no programme offer).
• $Y_{a_1(1)}$, the potential outcome under BEP intervention.
• $Y_{a_2(1)}$, the potential outcome under programme uptake.
• $Y_{a_1(0),a_3(1)}$, the potential outcome under a joint intervention where the programme is not offered and breastfeeding is set to start.
• $Y_{a_1(1),a_3(1)}$, the potential outcome under a joint intervention where the programme is offered and breastfeeding is set to start.
• $Y_{a_2(1),a_3(1)}$, the potential outcome if the programme is actually followed and breastfeeding is started.
• $Y_{a_4(1)}$, the potential outcome if breastfeeding is set to last for three months.
The mean and standard deviation of the different potential outcomes at three months in our simulated population are given in Table 4.
Table 4: True potential weight at three months (mean and standard deviation) in the study population under different scenarios, generated as in the PROBITsim study, but with N = 5,000,000.

SD: standard deviation
The observed weight at three months is generated according to the observed combination of values for $A_1$, $A_2$ and $D$. Figure 4 illustrates the corresponding data generating mechanism.

Different causal effects of interest
To explore the effect of different forms of intervention in different sub-populations, we calculated for each potential outcome the weight gain at 3 months, compared to a no-breastfeeding scenario. The mean weight gain is calculated in different sub-populations, and results are given in Table 5. The table can be used to calculate all kinds of contrasts of interest. The sub-population labels are:
• 'prog': women who followed the breastfeeding programme ($A_2^{obs} = 1$).
• 'noprog': women who received an invitation but did not follow the breastfeeding programme ($A_2^{obs} = 0$ and $A_1^{obs} = 1$).
• 'BF.interv': women who started breastfeeding in the intervention group ($A_3 = 1$ and $A_1 = 1$).
• 'no BF': women who did not start breastfeeding in the control group ($A_3 = 0$ and $A_1 = 1$).
• 'Compliers': women who will start breastfeeding if the programme is offered, and not when it is not offered ($A_{3,a_1(1)} = 1$ and $A_{3,a_1(0)} = 0$).

Causal effects in the total population
The true causal effect that would be calculated had this been a randomized clinical trial (the intention-to-treat effect) is $E[Y_{a_1(1)} - Y_{a_1(0)}] = 99$ grams. If everyone were to actually follow the programme, the difference would be $E[Y_{a_2(1)} - Y_{a_1(0)}] = 164$ grams. If the programme were offered to everyone and everyone started breastfeeding, the difference, relative to no programme, would be $E[Y_{a_1(1),a_3(1)} - Y_{a_1(0)}] = 234$ grams. If everyone were to follow the programme and all started breastfeeding, this would be $E[Y_{a_2(1),a_3(1)} - Y_{a_1(0)}] = 261$ grams, a slightly larger effect because the programme increases the mean duration of breastfeeding. Had everybody started breastfeeding without following the programme, the increase in weight would be $E[Y_{a_1(0),a_3(1)} - Y_{a_1(0)}] = 199$ grams. The difference compared to the situation where no one would start breastfeeding is $E[Y_{a_1(1),a_3(1)} - Y_{a_3(0)}] = 421$ grams. Some of the causal effects described above are not very realistic. Not every woman would be able to start breastfeeding. For example, when a mother becomes very ill at the end of pregnancy, breastfeeding her baby may not be an option because of toxicity of prescribed medication or poor health. Assuming that every woman would continue breastfeeding for 3 months is even more unlikely.
This shows that some of the causal effects which may be estimable are unrealistically large. In our example the largest causal contrast is the expected weight difference when every infant versus none is breastfed for 3 months, which is equal to $E[Y_{a_4(1)} - Y_{a_3(0)}] = 522$ grams.
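Because all the contrasts above are means over the same total population, they can be re-expressed against a common reference by linearity of expectation. The Python sketch below, using only the numbers quoted in this section, derives the mean gain of each scenario relative to no breastfeeding. The dictionary keys and variable names are made up for the illustration, and since the inputs are rounded to whole grams the derived values may be off by a gram or so.

```python
# Reported mean contrasts (grams) relative to "no programme offered", Y_{a1(0)}.
vs_no_programme = {
    "a1(1)": 99,          # programme offered (intention-to-treat)
    "a2(1)": 164,         # programme followed by everyone
    "a1(1),a3(1)": 234,   # programme offered, everyone starts breastfeeding
    "a2(1),a3(1)": 261,   # programme followed, everyone starts breastfeeding
    "a1(0),a3(1)": 199,   # no programme, everyone starts breastfeeding
}

# Reported contrasts (grams) relative to "no breastfeeding", Y_{a3(0)}.
offered_and_started_vs_none = 421   # E[Y_{a1(1),a3(1)} - Y_{a3(0)}]
three_months_vs_none = 522          # E[Y_{a4(1)} - Y_{a3(0)}]

# E[Y_{a1(0)} - Y_{a3(0)}] follows by subtracting two reported contrasts.
no_programme_vs_none = offered_and_started_vs_none - vs_no_programme["a1(1),a3(1)"]

# Re-express every scenario relative to no breastfeeding.
vs_none = {name: gain + no_programme_vs_none
           for name, gain in vs_no_programme.items()}
vs_none["a1(0)"] = no_programme_vs_none
vs_none["a4(1)"] = three_months_vs_none

for name, gain in sorted(vs_none.items(), key=lambda item: item[1]):
    print(f"E[Y_{{{name}}} - Y_{{a3(0)}}] is about {gain} g")
```

Any further contrast between two scenarios is then the difference of two entries of `vs_none`.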

Causal effects in sub-populations
In our example the "average treatment effect in the treated (ATT)" can be defined in different ways. "Treated" could mean actually following the programme, if offered. In this case the effect of attending the programme is $\text{ATT} = E[Y_{a_2(1)} - Y_{a_1(0)} \mid A_2 = 1] = 153$ grams. The corresponding average treatment effect among the non-treated is then $\text{ATNT} = E[Y_{a_2(1)} - Y_{a_1(0)} \mid A_2 = 0 \text{ and } A_1 = 1] = 180$ grams. Alternatively, "treated" could mean being breastfed, in which case the ATT is the effect of breastfeeding in those who actually start breastfeeding: [expression not recovered]
Another local effect which may be of interest is the CACE, the Complier Average Causal Effect. The CACE for the BEP intervention is $\text{CACE} = E[Y_{a_1(1)} - Y_{a_1(0)} \mid A_{3,a_1(1)} = 1 \text{ and } A_{3,a_1(0)} = 0] = 437$ grams. In our study, the CACE represents the effect of the programme in the subgroup of individuals whose decision to start breastfeeding depends on allocation to the BEP programme.
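The ATT and ATNT for programme attendance are tied to the total-population effect of attendance (164 grams, quoted earlier). The two subgroups partition the intervention arm, and randomisation of the offer makes the mean effect in that arm equal to the population mean, so 164 is approximately a weighted average of 153 and 180. The Python sketch below solves for the implied weight, the proportion of women in the intervention arm who follow the programme. This is a back-of-the-envelope consistency check on rounded values, not a figure reported in the text; the variable names are hypothetical.

```python
att = 153.0      # E[Y_{a2(1)} - Y_{a1(0)} | A2 = 1], grams
atnt = 180.0     # E[Y_{a2(1)} - Y_{a1(0)} | A2 = 0 and A1 = 1], grams
overall = 164.0  # E[Y_{a2(1)} - Y_{a1(0)}] in the total population, grams

# overall = p * att + (1 - p) * atnt, with p = Pr(A2 = 1 | A1 = 1).
p_uptake = (atnt - overall) / (atnt - att)

print(f"implied uptake proportion in the intervention arm: {p_uptake:.2f}")
```

Because the three inputs are rounded to whole grams, the implied proportion is only approximate.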
When implementing an intervention, it is of interest to identify those subgroups for which the intervention is most beneficial. Table 6 shows, for example, that infants of less educated women will profit more than those of more educated women, both when the programme is offered and when the programme is actually followed.

Variable names and descriptions (the start of the list was not recovered):
(name not recovered): The woman follows the programme if it is offered (yes/no)
A3pot.A1.0: The woman will start breastfeeding if no intervention is given (yes/no)
A3pot.A1.1: The woman will start breastfeeding if the intervention is given (yes/no)
A3pot.A1.1.A2.1: The woman will start breastfeeding after following the programme (yes/no)
durpot.A1.0: Potential duration of breastfeeding under no intervention
durpot.A1.1: Potential duration of breastfeeding under intervention
durpot.A1.1.A2.1: Potential duration of breastfeeding when the programme is followed
durpot.A1.0.A3.1: Potential duration of breastfeeding under no intervention, but breastfeeding is started
durpot.A2.1.A3.1: Potential duration of breastfeeding when the programme is followed and breastfeeding is started
durpot.A2pot.A3.1: Potential duration of breastfeeding when the intervention is given and breastfeeding is started
Wgt3pot.A1.0: Potential weight at 3 months under no intervention
Wgt3pot.A1.1: Potential weight at 3 months under intervention
Wgt3pot.A2.1: Potential weight at 3 months after following the programme
Wgt3pot.A1.0.A3.1: Potential weight at 3 months under no intervention, but breastfeeding is started
Wgt3pot.A1.1.A3.1: Potential weight at 3 months under intervention and breastfeeding is started
Wgt3pot.A2.1.A3.1: Potential weight at 3 months when the programme is followed and breastfeeding is started
Wgt3pot.dur0: Potential weight at 3 months when breastfeeding duration = 0 months
Wgt3pot.dur1: Potential weight at 3 months when breastfeeding duration = 1 month
Wgt3pot.dur2: Potential weight at 3 months when breastfeeding duration = 2 months
Wgt3pot.dur3: Potential weight at 3 months when breastfeeding duration = 3 months

Figure 2: Data generating model for $A_2$ and $A_3$ in terms of $A_1$, $L_1$ and $L_2$.
