Causal Mediation Analysis in Economics: objectives, assumptions, models

The aim of mediation analysis is to identify and evaluate the mechanisms through which a treatment affects an outcome. The goal is to disentangle the total treatment effect into two components: the indirect effect, which operates through one or more intermediate variables called mediators, and the direct effect, which captures all other mechanisms. This paper reviews the methodological advancements in the causal mediation literature in economics, focusing in particular on quasi-experimental designs. It defines the parameters of interest under the counterfactual approach, the assumptions and the identification strategies, presenting the Instrumental Variables (IV), Difference-in-Differences (DID) and Synthetic Control (SC) methods.


Introduction
In economics, causal analysis, or more generally program evaluation, is a fundamental tool for studying the causal effects of a variable of interest, known in the literature as the treatment.
Causal analysis answers questions such as "Do subsidies to private capital boost firm's growth?" or "Are these effects positive or negative?". It cannot, however, answer another important question: "Why are these treatments effective?". As pointed out by Gelman and Imbens (2013), not only the "effect of a cause", i.e. the treatment effect, is relevant in many problems, but also "the cause of the effect", i.e. the mechanisms through which the total effect materializes.
To use the words of Imai, Tingley and Yamamoto (2015): "A standard analysis of data [...] can only reveal that a program had such impacts on those who participated into it." This means that we can quantify the magnitude of these impacts and know how much a treatment affects an outcome, but these estimates tell us nothing about how. We learn something about causal effects, but nothing about causal mechanisms.
To overcome these limits, one can turn to causal mediation analysis, a formal statistical framework for studying causal mechanisms. Following the definition given by Imai, Keele, Tingley and Yamamoto (2013), a mechanism is a process in which a causal variable of interest, the treatment, influences an outcome through an intermediate variable, the mediator, that lies on the causal pathway between the treatment and the outcome.
Studying causal mechanisms reveals more about the implications of social and economic policy than the total effect alone, and allows policy makers to make better and more efficient decisions. The main fields in which mediation has been developed are psychology and sociology. For instance, Brader, Valentino and Suhay (2008) go beyond estimating the framing effects of ethnicity-based media cues on immigration preferences and ask: "Why the race or ethnicity of immigrants, above and beyond arguments about the consequences of immigration, drives opinion and behavior?". That is, instead of simply asking whether media cues influence opinion, they explore the mechanisms through which this effect operates. Consistent with earlier work suggesting the emotional power of group-based politics (Kinder and Sanders, 1996), the authors find that the influence of group-based media cues arises through changes in individual levels of anxiety.
Another example comes from the electoral politics literature. Gelman and King (1990) found a positive incumbency advantage in elections. A few years later, Cox and Katz (1996) led the incumbency advantage literature in a new direction by considering possible causal mechanisms that explain why incumbents have an electoral advantage. They decomposed the incumbency advantage into a scare-off/quality effect and effects due to other causal mechanisms, such as name recognition and resource advantage.
Mediation is also playing an increasingly important role in educational studies. In the words of A. Gamoran, "the next generation of policy research in education will advance if it offers more evidence on mechanisms so that the key elements of programs can be supported and the key problems in programs that fail to reach their goals can be repaired" (A. Gamoran, 2013, President of the William T. Grant Foundation). A recent special issue of the Journal of Research on Educational Effectiveness focused on mediation, and it has been noted that "such efforts in mediation analysis are fundamentally important to knowledge building, hence should be a central part of an evaluation study rather than an optional 'add-on'" (Hong, 2012). Empirical research in this field includes Bijwaard and Jones (2018), who study the impact of education on mortality via cognitive ability, and Heckman, Pinto and Savelyev  (2017), who investigate whether the employment effect of more rigorous caseworkers in the counselling process of job seekers in Switzerland is mediated by placement into labor market programs.
The common approach used to study causal mechanisms in economics is the structural equation model (SEM); see for instance the seminal work by Baron & Kenny (1986). However, as demonstrated by Imai, Keele, Tingley and Yamamoto (2011), SEM is not the appropriate method to study and identify causal mechanisms. They showed that structural models rely upon untestable assumptions and are often inappropriate even when those assumptions hold. In particular, conventional exogeneity assumptions alone are insufficient for the identification of causal mechanisms, whereas they can be sufficient for the identification of the classical average treatment effect.
In addition, the mediator could be interpreted as an intermediate outcome: in such a model we would have to control for a large set of covariates (pre- and post-treatment), with the risk that results differ depending on the covariates chosen, which increases the sensitivity of the estimates. Therefore, the use of mediation in economics can be useful and efficient, and this is the main motivation for this brief exploration of mediation in economics.
To overcome these problems by relaxing the structural restrictions, over the last decades some researchers have turned to the counterfactual approach; see Tchetgen Tchetgen and Shpitser (2012) and Vansteelandt, Bekaert and Lange (2012), among many others. As in classical treatment analysis, using the counterfactual approach rather than structural models makes it possible to formalize the concept of causality without making assumptions on the functional form of the parameters and, in turn, to obtain more flexible identification procedures.
Moreover, in this kind of model it is not necessary to know the entire set of covariates that could affect the design. Most of this literature handles identification by assuming that the treatment and the mediator are conditionally exogenous given observed characteristics, an assumption known as Sequential Ignorability. Nevertheless, this assumption is sometimes hard to satisfy, above all in economics, because of the presence of post-treatment confounders that can confound the relations between variables. To handle this problem, some researchers have recently used quasi-experimental designs within the mediation framework. These procedures are particularly attractive in this context also because the gold standard of causal analysis, i.e. randomization of the treatment, is not a sufficient condition for the identification of causal mechanisms, a fact that makes the counterfactual approach more appropriate than structural models.
Understanding causal mechanisms is important for explaining why a policy works, and going beyond the limits of the standard approach is one of the aims of current research. Mediation analysis seems to be one of the fittest frameworks to describe these relations, and many researchers have developed new methods or have adapted the classical ones to deepen the analysis. This is a promising methodology in economics because it makes it possible to study causal mechanisms, to analyze the causal steps between treatment and outcomes and, in turn, to give a causal interpretation to the changes that occur in between. In addition, the new methods that are emerging allow this kind of analysis without overly restrictive assumptions, a key issue in economic studies; mediation turns out to be a precious tool for policy makers.
Throughout, D denotes a binary treatment, M the mediator and Y the outcome; M_i(d) is the potential mediator state under treatment d and Y_i(d, m) the potential outcome under treatment d and mediator value m, where the subscript i indexes the unit of observation.
It is easy to see that for each unit i only one of the two potential outcomes or mediator states is observed. Thus, in mediation analysis too I have to face the so-called missing values problem (Holland, 1986). Because there are two driving variables, I must also take into account a potential interaction between them, which makes the analysis more challenging.
The goal of mediation analysis is to decompose the total treatment effect of D on Y into the indirect and the direct effect. The first reflects one possible explanation of why the treatment works, explicitly defining a particular mechanism behind the causal impact; the second captures all remaining mechanisms. Separating the two helps to prescribe better policy alternatives. Finally, mediation analysis is the set of techniques by which a researcher assesses the relative magnitude of these direct and indirect effects.

Definition of parameters
Using the potential outcome notation, I can define three quantities of interest commonly used in mediation analysis; see for instance VanderWeele (2015). First, I define the average indirect effect, also known as the Average Causal Mediation Effect (ACME), as:
$$\delta(d) = E[Y(d, M(1)) - Y(d, M(0))], \quad d \in \{0, 1\}.$$
It corresponds to the change in the mean potential outcome when the mediator is exogenously shifted between its potential values under the treatment and non-treatment states while the treatment is kept fixed at D = d. Note that only one component of the right-hand side is observable, whereas the other is by definition unobservable: under treatment status d we never observe the value that M would naturally take under the opposite treatment state, i.e. M(1 − d).
In the same way, I define the average direct effect (ADE) as:
$$\theta(d) = E[Y(1, M(d)) - Y(0, M(d))], \quad d \in \{0, 1\}.$$
It represents the average causal effect of the treatment on the outcome when the mediator is set to the potential value that would occur under treatment status d.
It can easily be shown that the ATE, denoted by ∆, can be rewritten as the sum of the natural direct and indirect effects defined on opposite treatment states:
$$\Delta = E[Y(1, M(1)) - Y(0, M(0))]$$
$$= E[Y(1, M(1)) - Y(0, M(1))] + E[Y(0, M(1)) - Y(0, M(0))] = \theta(1) + \delta(0)$$
$$= E[Y(1, M(0)) - Y(0, M(0))] + E[Y(1, M(1)) - Y(1, M(0))] = \theta(0) + \delta(1). \quad (1)$$
The first decomposition is obtained by adding and subtracting the counterfactual quantity E[Y(0, M(1))], the second by adding and subtracting E[Y(1, M(0))].
More generally, I can write this result as:
$$\Delta = \theta(d) + \delta(1 - d), \quad d \in \{0, 1\}.$$
Obviously, neither effect is identified without further assumptions. Only one of Y(1, M(1)) and Y(0, M(0)) is observed for any unit, because both outcomes cannot be observed at the same time, as stated in the fundamental problem of causal inference. The counterfactual quantities Y(1, M(0)) and Y(0, M(1)) are never observed for any individual, because I never observe the potential value of M defined under the opposite treatment state; I only know the factual M that follows a particular treatment state. To address this identification issue I need to define a proper set of assumptions.
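The decomposition of the total effect can be checked numerically when all potential outcomes are known, which is only possible in a simulation. The following is a minimal sketch under assumed structural equations: the function name, the coefficients and the normal error terms are hypothetical and chosen purely for illustration, not taken from any study reviewed here.

```python
import numpy as np


def simulate_mediation_effects(n=100_000, seed=0):
    """Simulate potential outcomes and return (ate, theta, delta).

    theta[d] is the natural direct effect and delta[d] the natural
    indirect effect at treatment level d, computed as sample averages.
    """
    rng = np.random.default_rng(seed)
    eps_m = rng.normal(size=n)  # unit-level mediator noise
    eps_y = rng.normal(size=n)  # unit-level outcome noise

    # Hypothetical structural equations (illustrative coefficients only).
    def mediator(d):
        return 0.5 + 1.0 * d + eps_m

    def outcome(d, m):
        # The 0.5 * d * m term is a treatment-mediator interaction.
        return 1.0 + 2.0 * d + 1.5 * m + 0.5 * d * m + eps_y

    # Total effect: E[Y(1, M(1)) - Y(0, M(0))].
    ate = np.mean(outcome(1, mediator(1)) - outcome(0, mediator(0)))
    # Direct effect: switch D from 0 to 1, hold the mediator at M(d).
    theta = {d: np.mean(outcome(1, mediator(d)) - outcome(0, mediator(d)))
             for d in (0, 1)}
    # Indirect effect: hold D = d, switch the mediator from M(0) to M(1).
    delta = {d: np.mean(outcome(d, mediator(1)) - outcome(d, mediator(0)))
             for d in (0, 1)}
    return ate, theta, delta
```

Because of the interaction term, theta[0] differs from theta[1] and delta[0] from delta[1], yet both theta[1] + delta[0] and theta[0] + delta[1] reproduce the total effect. With real data none of the cross-world terms such as Y(1, M(0)) is available, which is why identifying assumptions are needed.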

Controlled direct effect versus natural direct effect
An important advantage of the counterfactual notation is that it allows for the potential presence of heterogeneity. Such heterogeneity is important both in practice and in theory, as it is often the motivation for the endogeneity problems that concern economists (Imbens and Wooldridge, 2009). In structural models the effects are assumed to be constant, implying that the effect of various policies can be captured by a single parameter. In mediation this heterogeneity is even more important, because it implies not only that the direct effect of the treatment on the outcome can differ across units i, but also that this effect can differ for different values of the mediator. With the counterfactual notation, the presence of non-linearities and interactions is therefore not a problem, because I do not need to specify the functional form or to model the relations between variables.
If the effect of the treatment is the same for the entire population, meaning that it does not change across levels of the mediator, then there is no interaction between treatment and mediator. In this particular case the controlled direct effect equals the natural direct effect, CDE = NDE (Baron & Kenny, 1986). In this situation, the difference between the total effect and the controlled direct effect gives the indirect effect, or more formally: ∆ − θ = δ.
In such a context, I need to distinguish two situations: the identification of controlled effects and the identification of natural effects. Following VanderWeele (2015), to estimate the CDE we need two assumptions:
• A1. There must be no confounders of the treatment-outcome relationship.
• A2. There must be no confounders of the mediator-outcome relationship.
To satisfy the first assumption it is sufficient to randomize the treatment, but even with a randomized treatment the second assumption might not hold.
Referring to the previous example, to satisfy A1 I need to adjust for common causes of the exposure and the outcome, for example information about firms' size or firms' performance or any other factor (X) that can confound this relation; alternatively, I can assign subsidies randomly, implying the same distribution of X for treated and non-treated firms. At the same time, to satisfy A2 I need to adjust for common causes of the mediator-outcome relation, for example information about the administration's quality or other factors (W) that can confound this relation. In this case, I need to think carefully about all possible post-treatment confounders and include them in the analysis, because randomization of the treatment is not a sufficient condition to control for W.
To identify natural direct and indirect effects I need two more assumptions:
• A3. There must be no confounders of the treatment-mediator relationship.
• A4. There must be no confounders of the mediator-outcome relationship that are affected by the treatment.
In this case too, randomizing the treatment is sufficient to satisfy A3, but it is not enough for the fourth assumption. In particular, A4 is a strong assumption, because it requires that there is nothing on the pathway from the treatment to the mediator that also affects the outcome. This assumption is more plausible if the mediator occurs shortly after the treatment.
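To make the distinction between controlled and natural effects concrete, the following worked example assumes a simple linear model with a treatment-mediator interaction. The model and its coefficients (the alphas and betas) are hypothetical and introduced only for illustration, with mean-zero error terms; they are not part of the assumptions A1-A4.

```latex
% Hypothetical linear model with mean-zero errors (illustration only)
\begin{align*}
M(d)    &= \alpha_0 + \alpha_1 d + \varepsilon_M, \\
Y(d, m) &= \beta_0 + \beta_1 d + \beta_2 m + \beta_3 d\,m + \varepsilon_Y.
\end{align*}
% Controlled direct effect: mediator fixed at the same value m for everyone
\[
CDE(m) = E[Y(1, m) - Y(0, m)] = \beta_1 + \beta_3 m .
\]
% Natural direct and indirect effects: mediator at its potential value
\begin{align*}
\theta(d) &= E[Y(1, M(d)) - Y(0, M(d))] = \beta_1 + \beta_3(\alpha_0 + \alpha_1 d), \\
\delta(d) &= E[Y(d, M(1)) - Y(d, M(0))] = (\beta_2 + \beta_3 d)\,\alpha_1 .
\end{align*}
% Without interaction (\beta_3 = 0) the effects no longer depend on m or d:
\[
CDE(m) = \theta(d) = \beta_1, \qquad \delta(d) = \beta_2 \alpha_1, \qquad
\Delta - \theta = \delta .
\]
```

In this sketch the interaction coefficient is what separates the controlled from the natural direct effect; when it is zero, subtracting the controlled direct effect from the total effect recovers the indirect effect.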

Identification under Sequential Ignorability
The key insight is that under randomized designs the ATE is identified, but the direct and indirect effects are not. Their identification relies on the sequential ignorability assumption, which consists of three parts:
$$\{Y_i(d', m), M_i(d)\} \perp D_i \mid X_i = x \quad (4)$$
$$Y_i(d', m) \perp M_i(d) \mid D_i = d, X_i = x \quad (5)$$
$$\Pr(D_i = d \mid M_i = m, X_i = x) > 0 \quad (6)$$
for all d, d' ∈ {0, 1} and m, x in the support of M, X. (Imai, Keele, Tingley and Yamamoto (2011) write the common support assumption as 0 < Pr(D_i = d | X_i = x) and 0 < Pr(M_i = m | D_i = d, X_i = x) for d = 0, 1 and all x and m in the support of X and M.)
The first part of the sequential ignorability assumption, equation (4), is the classical conditional independence of the treatment, also known as no omitted variable bias, conditional exogeneity or unconfoundedness; see for instance Imbens (2004). By equation (4), there are no unobserved confounders jointly affecting the treatment and the mediator and/or the outcome given X, meaning that I can consistently identify the effect of D on Y and of D on M. In non-experimental designs, the validity of this assumption hinges on the richness of pre-treatment covariates, whereas in experimental designs it holds if the treatment is either randomized within strata defined by X or randomized unconditionally.
The second part of the sequential ignorability assumption, equation (5), states that there are no unobserved confounders jointly affecting the mediator and the outcome once I condition on D and X. It rules out the presence of post-treatment confounders not captured by X. This is a strong assumption, because randomizing both treatment and mediator does not suffice for it to hold; in addition, it is more plausible if treatment and mediator are measured a short time apart, as mentioned in the previous subsection. The last part of sequential ignorability, equation (6), is the common support assumption.
It states that the conditional probability of receiving or not receiving the treatment given M and X, recalling the propensity score literature, is larger than zero. By Bayes' theorem, this version of common support implies that Pr(M_i = m | D_i = d, X_i = x) > 0 if M is discrete, or that the conditional density of M given D and X is larger than 0 if M is continuous. The main implication of equation (6) is that, conditional on X, the mediator state must not be a deterministic function of the treatment.
Under assumptions (4)-(6) the counterfactual means, and hence the direct and indirect effects, are nonparametrically identified. In the derivation, assuming a continuous mediator, the first equality follows from the law of iterated expectations; equation (4) is used to establish the second, the fourth and the last equalities; equation (5) is used to establish the third and the fifth equalities.
The interesting fact is that, in the presence of heterogeneity, the exogeneity assumption still holds if treatment and mediator are randomized, but the correlation between the error terms of M and Y is different from 0, implying biased estimates of the effects that structural models are not able to capture.
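For reference, the identification result under sequential ignorability is usually stated in the following form, often called the mediation formula. This is a sketch of the standard expression in the notation used above (d and d' are generic treatment levels, and F denotes a conditional distribution function); the intermediate equalities of the derivation are not reproduced.

```latex
% Mean potential outcome identified from the observed data (Y, D, M, X)
\[
E[Y(d, M(d'))] = \int\!\!\int E[Y \mid D = d, M = m, X = x]\,
                 dF_{M \mid D = d', X = x}(m)\, dF_X(x),
\qquad d, d' \in \{0, 1\}.
\]
% The effects of interest follow as contrasts of these counterfactual means
\begin{align*}
\delta(d) &= E[Y(d, M(1))] - E[Y(d, M(0))], \\
\theta(d) &= E[Y(1, M(d))] - E[Y(0, M(d))].
\end{align*}
```

Every term on the right-hand side of the first expression involves only observable conditional means and distributions, which is what makes the direct and indirect effects estimable once the assumptions hold.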

Other interpretations of Sequential Ignorability
The main limit of this result is that nonparametric identification works only if I do not condition on post-treatment confounders, implying that the set of observable pre-treatment confounders must be sufficient to control for them, a requirement that is not always credible. This issue has been addressed by Robins (2003). In his fully randomized causally interpreted structured tree graph model (FRCISTG), he used a different version of sequential ignorability: the first part is the same as equation (4), whereas the second part differs from equation (5).
Another formalization of Sequential Ignorability is given by Pearl (2001). In particular, in his Theorem 1 and Theorem 2 for the identification of the average natural direct effect, and in Theorem 4 for the identification of the average natural indirect effect, he used a different set of assumptions but arrived at the same expressions for the ADE and ACME given by Imai, Keele and Yamamoto (2010). It is important to note that sequential ignorability implies Pearl's assumptions, whereas the converse is not always true; in practice, however, the difference is only technical. Another advantage of sequential ignorability is that it is easier to interpret than Pearl's assumptions, which involve independence between two potential quantities. This difficulty of interpretation is also pointed out by Pearl himself: "Assumptions of counterfactual independencies can be meaningfully substantiated only when cast in structural form". In contrast, in the second part of sequential ignorability, equation (5), the observed value M_i(d) is independent of the potential outcome; in other words, M_i is effectively randomly assigned given D_i = d and X_i = x, a concept that is easier to understand.
A further version of sequential ignorability is given by Petersen, Sinisi and Van der Laan (2006).
They split equation (4) into two parts, one for the potential outcomes and one for the potential mediators, whereas equation (5) is the same. This is just a mathematical difference, because in experimental designs, in which the treatment is randomized, equation (4) is equivalent to the two parts taken together. To identify the natural direct effect they also assume that the potential value of the mediator under the non-treatment state is independent of the potential outcome. This additional assumption is necessary to identify the counterfactual quantity Y(d, M(0)).
However, if the treatment is randomized this last assumption is not necessary for the nonparametric identification given by Imai, Keele and Yamamoto (2010), which once again makes their version of sequential ignorability preferable.

Quasi-experimental designs
As mentioned in the previous section, most recent research in mediation analysis considers more general identification approaches based on the potential outcome framework commonly used in treatment evaluation (Rubin, 1974), in order to overcome the limits of structural models. The gold standard of this approach is randomization of the treatment, a condition that is easily met in experiments. When the treatment or the mediator cannot be determined exogenously, the only way to estimate the parameters of interest and give them a causal interpretation is to use quasi-experimental designs, in which endogeneity can be controlled under particular assumptions.
Mediation analysis has borrowed these methods from the causal literature in order to identify and estimate causal mechanisms, but to date only a few studies use these approaches (see Angelucci and Di Maro (2010), who provide a practical guide for the identification of the treatment effect on eligibles and the indirect effect on ineligibles based on conditional independence, RD and IV assumptions).
A first contribution studies sibling gender as the treatment and sibling size as the mediator, allowing for an interaction between treatment and mediator. In particular, its authors found that having a younger brother lowers the potential sibling size of a first-born girl to such a degree that the positive indirect effect cancels out the negative direct effect on her education outcomes, resulting in a near zero total effect. These results offer new evidence about gender bias in family settings that had not been detected in the previous literature. This was possible thanks to the decomposition of the total effect and to the heterogeneity captured by the interaction between sibling size and sibling gender.
A second contribution using the potential outcome approach is given by Frölich and Huber (2017). They use a counterfactual framework and obtain nonparametric identification using two distinct instruments, one for the treatment and one for the mediator, thereby allowing both to be endogenous. In addition, both the instruments and the mediator can be discrete or continuous. The main advantage of their result is that they identify natural and controlled effects for all treatment compliers, overcoming the limitation of identifying only the controlled direct effect for subpopulations defined on compliance in either endogenous variable (see Miquel, 2002).

Difference-in-differences
The first contribution that deals with the identification of direct and indirect effects using a framework other than sequential ignorability or instrumental variables is given by E. Deuchert, M. Huber and M. Schelker (2018). They disentangle the total effect using a difference-in-differences (DID) approach within subpopulations or strata (Frangakis and Rubin, 2002) defined by the reaction of a binary mediator to the treatment, implying the presence of four subpopulations: always takers, never takers, compliers and defiers (see for instance Angrist, Imbens and Rubin, 1996). In particular, they identify the direct effect on always takers and never takers, whose mediator does not react to the treatment, i.e. the treatment does not change the mediator's state; this corresponds to the controlled direct effect. They then identify the indirect effect and the direct effect on compliers, whose mediator reacts to the treatment.
They rely on three main assumptions. The first is the classical random assignment of the treatment. The second is the monotonicity assumption from the local average treatment effect (LATE) literature (see Imbens and Angrist, 1994; Angrist, Imbens and Rubin, 1996), which rules out the presence of defiers. The third is a set of common trend assumptions, which come from the DID literature but are now defined across strata. This makes it possible to control for post-treatment confounders and allows for differences in the effects of unobservable confounders on specific potential outcomes across strata, as long as these differences are constant over time. As discussed in that paper, the identification of the effects of interest under principal strata in mediation has been criticized for not permitting a decomposition of direct and indirect effects on compliers in a DID framework and for focusing on subgroups that may be less interesting than the entire population (VanderWeele 2008).
However, thanks to the previous set of assumptions the authors identify the effects on compliers, and they present an empirical application in which the effect on subgroups is relevant for political decision making: the effect of the Vietnam draft lottery in the US (1969-1972) on political preferences and personal attitudes, where the mediator of interest is military service during the Vietnam War. A second critique concerns the confusion made in the literature between mediation and principal stratification causal effects (VanderWeele 2012). In particular, it is important to note that "A good surrogate may be often a mediator, but it need not be" (VanderWeele, 2012). Principal stratification is a good framework to capture surrogacy, whereas natural effects (Pearl 2001, among many others) are the appropriate concept to study mediation. An intuitive example is given by Lindsay Page (2012), who provides evidence that the Career Academies program (D) had a substantial effect on subsequent earnings (Y) for those for whom the program would change exposure to the world-of-work (M), but not for those for whom it would not. In her analysis, she used a Bayesian approach to principal stratification and used covariates to attempt to predict which principal stratum each individual belongs to. But even if these assumptions hold, there could still be some unmeasured confounders of the mediator-outcome relationship, like motivation (U), that make M a surrogate rather than a mediator. A possible solution is to study mediation with the principal strata approach while adding the sequential ignorability assumption to rule out the potential presence of post-treatment confounders.
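The four strata defined by the reaction of a binary mediator to the treatment can be written compactly in terms of the pair of potential mediator states. The sketch below also adds the standard result from the LATE literature for the stratum shares under random assignment and monotonicity; the symbols for the shares (the pi terms) are hypothetical notation introduced here, not taken from the papers discussed above.

```latex
% Principal strata defined by (M(1), M(0)) for a binary mediator
\begin{align*}
\text{always takers:} \quad & M(1) = 1,\; M(0) = 1, \\
\text{never takers:}  \quad & M(1) = 0,\; M(0) = 0, \\
\text{compliers:}     \quad & M(1) = 1,\; M(0) = 0, \\
\text{defiers:}       \quad & M(1) = 0,\; M(0) = 1 .
\end{align*}
% Monotonicity, M(1) >= M(0), rules out defiers. With randomly assigned D,
% the shares of the remaining strata follow from observed mediator rates:
\begin{align*}
\pi_{\text{always}}   &= \Pr(M = 1 \mid D = 0), \\
\pi_{\text{never}}    &= \Pr(M = 0 \mid D = 1), \\
\pi_{\text{complier}} &= \Pr(M = 1 \mid D = 1) - \Pr(M = 1 \mid D = 0).
\end{align*}
```

Stratum membership of an individual unit is not observed in general, since only one of M(1) and M(0) is realized; only the shares, and under further assumptions the stratum-specific effects, can be recovered.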

Synthetic control
To the best of my knowledge, the only contribution that uses the synthetic control method (SCM) to study causal mechanisms is given by G. Mellace and A. Pasquini (2018). The main advantage of this method is that it estimates total causal effects even in the presence of only one treated unit and few control units (Abadie and Gardeazabal, 2003). They develop a generalization of SCM, called Mediation Analysis Synthetic Control (MASC), that allows the total effect to be disentangled into its direct and indirect components. The procedure that they use consists in re-weighting control

Conclusion
Mediation analysis is a promising methodology in economics, because it makes it possible to study the causal mechanisms through which a policy is transmitted without making unreliable and often restrictive assumptions: it reveals not only whether a policy is working, but also why, taking the analysis to a deeper level. In the literature there are not many economic applications, which could be due to technical difficulties and to the absence of clear methodological developments. I reviewed the pillars of this methodology, presenting current results and advancements and showing that it is a valid method that can be used to investigate the changes that occur between inputs and outputs, answering open questions in economic studies. Causal mediation analysis is the statistical tool for understanding causal mechanisms, and it may improve the power of quantitative analysis of economic phenomena.
This paper provides a survey of methodological developments in causal mediation analysis in economics, with a specific focus on quasi-experimental designs. I presented several methods, often used by economists and statisticians, that are clearly useful and fruitful for economic causal analysis. In the first part, I defined direct and indirect effects, both formally and mathematically.
Next, I discussed the main assumptions needed for the identification of the counterfactual quantities of interest, with particular attention to the sequential ignorability assumption. In the fourth section I reviewed the main studies that use quasi-experimental designs, a new frontier in this field, discussing in particular the instrumental variables, difference-in-differences and synthetic control approaches.