Causal Mediation Analysis with Multiple Mediators

In diverse fields of empirical research—including many in the biological sciences—attempts are made to decompose the effect of an exposure on an outcome into its effects via a number of different pathways. For example, we may wish to separate the effect of heavy alcohol consumption on systolic blood pressure (SBP) into effects via body mass index (BMI), via gamma-glutamyl transpeptidase (GGT), and via other pathways. Much progress has been made, mainly due to contributions from the field of causal inference, in understanding the precise nature of statistical estimands that capture such intuitive effects, the assumptions under which they can be identified, and statistical methods for doing so. These contributions have focused almost entirely on settings with a single mediator, or a set of mediators considered en bloc; in many applications, however, researchers attempt a much more ambitious decomposition into numerous path-specific effects through many mediators. In this article, we give counterfactual definitions of such path-specific estimands in settings with multiple mediators, when earlier mediators may affect later ones, showing that there are many ways in which decomposition can be done. We discuss the strong assumptions under which the effects are identified, suggesting a sensitivity analysis approach when a particular subset of the assumptions cannot be justified. These ideas are illustrated using data on alcohol consumption, SBP, BMI, and GGT from the Izhevsk Family Study. We aim to bridge the gap from “single mediator theory” to “multiple mediator practice,” highlighting the ambitious nature of this endeavor and giving practical suggestions on how to proceed.


Web Appendix A: Causal mediation estimands with n causally-ordered mediators
First we note that there are 2 n paths from X to Y when there are n causally-ordered mediators, one for each subset of {M 1 , . . . , M n }. Since we are interested in the finest possible decomposition, this is therefore a decomposition into 2 n path-specific effects.
First note that this is only a path-specific effect from a finest possible decomposition if:
When n = 2, 2 (2 n +n−1) = 32, corresponding to the number of effects given in Table 1 in the main manuscript.
Furthermore, we can construct a decomposition of the total causal effect into a sum of pathspecific effects using the algorithm given below. First, we require some additional notation.
Write 0 (b 1 ,b 2 ,...,bs) a for the (1 × a) row-vector with entries b 1 , b 2 , . . . , and b s all equal to 1, and the remaining entries equal to 0; write 0 a for the (1 × a) zero row-vector and 1 a for the (1 × a) row-vector consisting entirely of ones. Here is the algorithm: (1) Choose a path, i.e. choose a subset S 1 of {M 1 , M 2 , . . . , M n }.
(3) The path-specific effect associated with S 1 is a level-0 effect and is given by: where NPSE stands for natural path-specific effect.
(6) The path-specific effect associated with S 2 is a level-1 effect and is given by: (7) Keep repeating these three steps. The p th path-specific effect is a level-(p − 1) effect through S p and is given by (8) Finally, the (2 n ) th path-specific effect is a level-(2 n − 1) effect through the only remaining subset S 2 n and is given by Note that, since (k * 1 , . . . , k * 2 n ) are all distinct (since each corresponds to a distinct subset/path), (k * 1 , . . . , k * 2 n ) contains all the integers from 1 to 2 n exactly once. Thus 0 (k * 1 ,...,k * 2 n ) 2 n is a (1 × 2 n ) row-vector consisting entirely of ones. That is, The total causal effect decomposes into the sum of these path-specific effects: The order in which the paths were chosen (S 1 , . . . , S 2 n ) was completely arbitrary, and 4 thus the algorithm above can be used to give rise to (2 n )! decompositions, one for each permutation of (S 1 , . . . , S 2 n ). Each decomposition consists of exactly one level-0, one level-1, . . . , one level-(2 n − 2), and one level-(2 n − 1) effect-and these types are allocated to the paths in this order. A non-trivial permutation of (S 1 , . . . , S 2 n ) thus leads to at least two of the path-specific effects changing their type, and thus each of the (2 n )! decompositions constructed via the algorithm above is distinct. Furthermore, there can be no decompositions other than the (2 n )! constructed in this way. This is because the necessary cancelling out of that appears as the left-hand term of one effect, must also appear as the right-hand term of another effect; and for this second effect to be a path-specific effect, the right-hand term in question can differ from its corresponding left-hand term only in one entry. Thus, the choice of the order in which the paths appear in the algorithm is the only free choice one has in constructing effects that permit decomposition.

Web Appendix B: Example: Linear models with interactions
From the linear model with interactions (assuming no confounders) given in Section 3.5 of the main manuscript, we derive the following potential mediators and outcomes: This leads to the following expressions for the 32 effects in Table 1 in the main manuscript: turn out to play a key role in identification (see Section 4.2.2 of the main manuscript, and Web Appendix J).

Web Appendix C: On the interpretation of summary path-specific effects with a single mediator
In Section 3.6.2 of the main manuscript, we introduce summary path-specific effects for the two mediator setting. These expressions turn out to be much simpler for the single mediator setting Kuha and Golthorpe (2010  In a non-randomised study, the SNDE could still be interpreted as the effect that would be found in such hypothetical randomised experiment.

Web Appendix D: Comparison of summary natural and LSEM-based path-specific effects
To show that interactions can influence the values of the summary natural path-specific effects, we consider the following very simple example. Suppose that the data generating process is X ∼ N (0, 1), M |X ∼ N (X, 1) and Y |X, M ∼ N (XM, 1). This leads to )) = 0 and E (Y (1, M (1))) = 1. This implies that PNDE=PNIE=0 and TNDE=TNIE=TCE=1. This gives summary natural direct and indirect effects of SNDE=SNIE=0.5. However, if we wrongly assume there to be no XM interaction in the model for Y our assumed (incorrect) model would be: The values of (γ 0 , γ x , γ m ) satisfy: This implies that γ 0 = 1 and γ x = γ m = 0. Thus, if we wrongly assume there to be no XM interaction in the model for Y , we would obtain direct, indirect and total effects all equal to 0.

Web Appendix E: Identification of the CDE with two mediators
In Section 4.2.1 of the main manuscript, we mention that the CDE with two mediators can be identified, under assumptions (MC.1) and (MC.2) using the g-computation formula (Robins, 1986). The identification formula is given below:

Web Appendix F: On assumption (MN.5)
Consider the setting in which the data generating process is a NPSEM: where g · (·) are deterministic functions and {C, U X , U L 1 , U M 1 , U L 2 , U M 2 , U Y } are mutually independent. Consider the following counterfactuals derived from the model above: And consider the following conditional independence statements, for x = 0, 1, x * = 0, 1, x ′ = 0, 1 and ∀c, l 1 , l 2 , m 1 , m 2 , m ′ 1 : On the other hand, if we omit intermediate confounders L 1 and L 2 , the NPSEM be- , giving the following counterfactuals: After conditioning on C, we see that See Figure 4 in Web Appendix G for an example of part of this algebraic argument displayed graphically.

Web Appendix G: Cross-world directed acyclic graphs
In this section, we provide two examples of cross-world directed acyclic graphs (Shpitser and Pearl, 2008) which complement the arguments given, particularly with respect to intermediate confounding, in Section 4 of the main manuscript and the relevant Web Appendices referenced therein. Worlds in which different interventions are applied are drawn separately, but correlations between variables across different worlds are explicitly shown using the "U " variables. d-separation (Pearl, 1995) can then be applied to read off from the graph which conditional independence relationships between counterfactual variables hold.
Similar graphs corresponding to all the other algebraic arguments are omitted in the interest of brevity, but can be obtained from the corresponding author on request.

Web Appendix J: Identification and sensitivity analysis under a particular parametric model
When there is an effect of M 1 on M 2 , we saw in Section 4.2.1 of the main manuscript that eight of the 32 effects listed in Table 1 On the other hand, if the structural equation for M 1 is of the form: with the second equality justified under assumption (MN.4).
The ( and M 1 (1) are perfectly correlated and independent (given C), respectively. We may not believe either of these extremes to be appropriate. By specifying the form of the SEM more restrictively, we can also find SEMs for which the conditional correlation of M 1 (0) and M 1 (1) given C is between 0 and 1, and for which the boxed quantity in equation (4) of the main manuscript is identified up to a sensitivity parameter. For example, consider the following 13 form for the SEM for M 1 : Note that can be estimated from the data. However, the data contain no information on κ 2 , the proportion of residual variance shared across worlds; this becomes the sensitivity parameter, to be varied from 0 to 1. An example of this sort of sensitivity analysis is given in Section 6 of the main manuscript.
If M 1 were binary, a similar approach could be taken, where the marginal probabilities and a sensitivity parameter could then be defined for respecting the fact that this probability lies within .
A similar approach was taken for discrete mediators by Albert and Nelson (2011) and in a different context by Roy et al. (2008).

Web Appendix K: Identification for the linear model with interactions
Looking back at the expressions for the path-specific effects given in Web Appendix B, we see that the identification of the effects for the linear model with interactions given in Section 3.5 of the main manuscript simplifies if some of these interactions are absent, and also in the absence of an effect of M 1 on M 2 (in accordance with the more general result in Section 4.2.1 of the main manuscript).
For the NDE effects, we see that the sensitivity analysis would not be required if γ xm 1 m 2 = 0, i.e. in the absence of a three-way interaction in the model for Y . Also, if β m 1 = β xm 1 = 0, corresponding to no effect of M 1 on M 2 , then, as we would expect, the sensitivity analysis is not needed.
For the NIE 1 and NIE 12 effects, we see that the sensitivity analysis would not be required if γ m 1 m 2 = γ xm 1 m 2 = 0, i.e. in the absence of both the two-way interaction between M 1 and M 2 and the three-way interaction in the model for Y . Again, if β m 1 = β xm 1 = 0, corresponding to no effect of M 1 on M 2 , then, as we would expect, the sensitivity analysis is not needed.
Finally, for the NIE 2 effects, the sensitivity analysis would again not be required if γ m 1 m 2 = γ xm 1 m 2 = 0. And if β xm 1 = 0, corresponding to no interaction between X and M 1 in their effect on M 2 , (even if there is a main effect of M 1 on M 2 ), then the sensitivity analysis is not needed. We consider replacing assumption (MN.5) by:

Web Appendix L: Identification when there is intermediate confounding
Assumption (MN.5b).
which gives rise to the following counterfactuals:  Table 2 in the main manuscript.

(a) identifying natural direct effects.
Each NDE (see Table 1 in the main manuscript) is of the form This can be re-written as: As before, the boxed quantity in (A.37) is not identified, and requires a stronger parametric model specification, or a sensitivity analysis, as discussed in Web Appendix J.

(b) identifying natural indirect effects through M 1 alone.
Each NIE 1 (see Table 1 in the main manuscript) is of the form for some x, x ′ , x ′′ . This can be re-written as:

Again, the boxed quantity in (A.41) is not identified, and thus the approach discussed in
Web Appendix J is needed.

(c) identifying natural indirect effects through M 2 alone.
Each NIE 2 (see Table 1 in the main manuscript) is of the form for some x, x ′ , x ′′ . This can be re-written as: Steps ( There are now two boxed quantities in (A.43) that are not identified. The approach discussed in Web Appendix J would need to be extended to incorporate the joint crossworld distribution of M 2 (x, m 1 ) for different x, as well as the joint cross-world distribution of M 1 (x) for different x. This can be done in exactly the way described in Web Appendix J but would now involve two sensitivity parameters rather than one.

Web Appendix M: More details on the data analysis
In this appendix more details of the data analysis (see section 6 of the main manuscript) are given. Web Table 3 gives some descriptive statistics. In Web

Web Appendix N: Plots for the sensitivity analysis varying the value of κ
In this section we include several plots based on the analysis of the data from the Izhevsk Family Study (see Section 6 of the main manuscript), mainly focussing on the sensitivity parameter κ.
The general conclusion is that the results here are not very sensitive to the value of κ. The final figures (16 to 19) confirm that those effects whose estimation did not require knowledge of κ are indeed insensitive to the value of κ. Of note is the fact that the summary effects are less sensitive to κ than most of the other effects.
for j = 2, . . . , n: for j = 2, . . . , n, k = 1, . . . , j − 1: and for j = 2, . . . , n:  m1, m2) and M1 (x * ) are not in general conditionally independent given C and L1 whenever L1 is a common cause of M1 and Y and is affected by X. The path shown in bold is not blocked by (C, L1). The numbers superimposed on the bars represent the code for that effect type (as defined in the caption of Table 1 in the Web Appendix). The numbers along the x-axis represent the decomposition number, also defined in Table 1 in the Web Appendix.  Figure 16 in the Web Appendix but with each box plot centered at its median, so that the variability across effects can be more easily assessed.  Figure 19. The same as Figure 18 in the Web Appendix except that the y-axis scale is the same for all columns in the same row, so that a comparison of sensitivity to κ can be made.

Table 1
All 6 possible decompositions of the total causal effect (TCE) into a direct effect (DE), an indirect effect via M1 (IE1) and an indirect effect via M2 (IE2) when M1 and M2 are not causally ordered. In each decomposition, there is one level-0 effect, one level-1 effect, and one level-2 effect. The definitions of each of these effects is given in Table 3 of the main manuscript. In columns 2-4, the effect types are labelled: 1=00, 2=10, 3=01, and 4=11.

Table 4
Akaike Information Criterion (AIC) for different associational models for M1, M2 and Y . The "basic" model for M1 includes each of the four confounders (C) and the exposure (X). In addition, the "basic" model for M2 includes M1 and the interaction between M1 and X. The "basic" model for Y additionally includes M2, the interaction between M2 and X, the interaction between M2 and M1 and the three-way interaction XM1M2. "Quad C" denotes that quadratic terms for the two continuous confounders (age and SES) are included. "Inter XC" denotes interaction terms between X and each of the 4 confounders. "Quad M1" is a quadratic term in M1 and "quad M" is a quadratic term in both M1 and M2. The chosen model, i.e. the model with the highest AIC, is shown in bold type.