Do jihadist terrorist attacks cause changes in institutional trust? A multi-site natural experiment

Results from previous research suggest that terrorist attacks lead to relatively short-term increases in trust in institutions. The explanation for this increase is known as the ‘rally effect’, whereby individuals respond to crises and threats with more positive support for political leaders and institutions. Even though the number of related natural experiments with survey data is increasing, these studies merely represent case studies of single incidents with limited external validity. To advance quasi-experimental research on the effects of terrorist attacks on institutional trust, we propose a new methodological approach: we assess all jihadist terrorist attacks resulting in at least one civilian death in a European country that took place during the fieldwork of the European Social Survey, and we combine the results of eight unique natural experiments in five different countries using meta-analytic and meta-regression techniques. The results of this ‘multi-site natural experiment’ indicate that support for the rally hypothesis is mixed at best. While some attacks appear to significantly increase at least some measures of institutional trust (e.g., the Netherlands 2004, France 2015, Israel 2012), others seem to have no effect at all (e.g., Germany 2015, France 2018), or even substantially decrease trust in domestic political institutions (Russia 2012). Summary effects from multilevel meta-analyses are non-significant for all institutional trust outcomes. These results hold across a large number of robustness tests and alternative specifications. Compared with previous research, it appears that much of the European evidence for the rally hypothesis was based on ‘outlier’ case studies such as the Charlie Hebdo attack in France in 2015. Accordingly, our results cast doubt on the unrestricted generalisability of rally effects after terrorist attacks to different geographic

A growing number of studies have taken advantage of the coincidental overlap between terrorist attacks and survey fieldwork to estimate causal effects. This analytical approach, known as the unexpected event during survey design (UESD), allows researchers to use the timing of the interview to assign respondents to either the treatment group (i.e., interviewed after the attack) or the control group (i.e., interviewed before the attack) (Muñoz et al., 2020). Because the event is unexpected, assignment to treatment and control is exogenous to respondent characteristics, provided that the excludability and ignorability assumptions hold.
While the number of natural experiments with survey data (i.e., the UESD identification strategy) is increasing, most of these studies merely represent case studies of single incidents with limited external validity. Publication bias, such as the 'file drawer' problem, might lead researchers to publish only case studies with significant results, making the evidence for the rally-around-the-flag hypothesis vulnerable to type 1 error. At the same time, pooled analyses of multiple events and surveys may mask heterogeneity in effects across contexts (Giani, 2021; Peri et al., 2021), increasing the likelihood of type 2 error.
To advance quasi-experimental research on the effects of terrorist attacks on institutional trust, we assess all jihadist terrorist attacks resulting in at least one civilian death in a country included in the European Social Survey (see Giani, 2021; Giani & Merlino, 2021). Instead of pooling data prior to analysis, we systematically assess each case study separately, with attention to evaluating the assumptions relevant to making causal inferences (i.e., ignorability, excludability). To gain an overview of the distribution of effects, we use meta-analytical techniques to pool and summarise effect size estimates. On the meta-analytical level, we are able to assess the overall size and heterogeneity of effects, and what drives variation in effect sizes within and across cases.
A meta-analysis of self-conducted natural experiments has several advantages over traditional meta-analyses. Traditional meta-analyses are limited by the quality of previously published studies, whereas our multi-site natural experiment approach allows us to apply the same high methodological standards to each individual case study, improving the validity and comparability of the conclusions. Additionally, traditional meta-analyses may face problems arising from differences in how outcomes are measured. In her meta-analysis on the effect of terrorist attacks on political attitudes, Godefroidt found that '…there is considerable variation in average correlations depending on the exact operationalisation of the outcome variable' (Godefroidt, 2022, p. 32). Our approach, which uses only the ESS and its standardised measurement instruments, avoids this problem. Finally, because we design and conduct all case studies ourselves, we are not restricted to published research, which alleviates concerns about publication bias. In this way, our research design can add to the reliability of findings regarding the important societal, institutional and policy implications of violent extremism.

Rally-round-the-flag effects
Research generally shows that jihadist terrorist attacks lead to relatively short-term increases in trust in institutions (Dinesen & Jaeger, 2013; Satherley et al., 2021). The explanation for this increase is known as the 'rally effect', whereby individuals respond to crises and threats to the in-group with more positive support for political leaders and institutions, reinforcing the political status quo (Van Hauwaert & Huber, 2020). Terrorist attacks, particularly those committed by 'outgroup' actors such as Islamist terror groups, generate negative emotional reactions and feelings of uncertainty among the public (Godefroidt, 2022). These cognitive and emotional responses motivate collective coping, increase in-group solidarity and strengthen attachments to national institutions and leaders (Godefroidt, 2022; Perrin & Smolek, 2009; Porat et al., 2019; Schraff, 2020). Rally effects are typically understood as emotional responses to threats rather than post hoc evaluations of institutional performance (Schraff, 2020). Importantly, the results from studies examining the effects of terrorist attacks on institutional trust are mixed. A recent meta-analysis showed that rally effects were stronger for Islamist terror attacks than for attacks motivated by other ideologies, weaker in general population samples than in convenience samples, and stronger in studies conducted in the United States than elsewhere (Godefroidt, 2022). In fact, Godefroidt found that most positive 'rally' effects were driven by studies in the United States, specifically studies of the aftermath of the September 11th terrorist attacks. Research on the effect of terrorist attacks on support for domestic institutions in Europe is less clear. While some studies find positive effects on political trust following terrorist attacks (Dinesen & Jaeger, 2013; Van Hauwaert & Huber, 2020; Wollebaek et al., 2012), others find only short-lived effects (Arvanitidis et al., 2016) or null effects (Holman et al., 2022).
One study investigating multiple attacks using a pooled analysis found overall support for rally effects, but substantial differences in the size of effects when evaluating large-scale attacks separately (Peri et al., 2021). Specifically, Peri and colleagues found significant national rally effects following the Charlie Hebdo attacks in France and the murder of filmmaker Theo van Gogh in the Netherlands, but not following the Christmas Market attack in Germany. In light of these mixed effects, the current study examines multiple deadly jihadist attacks that have taken place in Europe since 2002.
An additional issue that hinders broader conclusions about rally effects is the variation in outcome measures. Studies often aim to capture some form of institutional or political trust, with measures ranging from aggregate scales including trust in parliament/government, politicians, police, and even the United Nations (Arvanitidis et al., 2016), and trust in local, state and federal governments (Perrin & Smolek, 2009), to single-item measures of trust in a government's ability to handle health crises (Van Hauwaert & Huber, 2020), satisfaction with the current government (Satherley et al., 2021), or support for the current president or prime minister (Holman et al., 2022; Lambert et al., 2011). As a result, it is unclear to what extent rally effects apply to all relevant domestic institutions, such as the police and the legal system. One could argue that the rally effect implies that government trust is unconditional and does not discriminate between different branches of government or government institutions. Accordingly, we should expect overall increases in trust in domestic institutions. On the other hand, it is plausible to expect variations in the public image of law enforcement institutions after high-profile political events, since the police are the most visible representation of government authority (Fleming & McLaughlin, 2012). For example, Fenn and Brunton-Smith (2021) examined the effects of a range of domestic and international terrorist attacks on support for the London Metropolitan Police and found that only certain Islamist extremist attacks were associated with increases in support following the attacks. While elite messaging following attacks may work to consolidate national unity towards political leaders and institutions (Groeling & Baum, 2008), other institutions, such as the police, may come under increased scrutiny for perceived failures in preventing or handling the attacks (Thomassen et al., 2014).
Conversely, media coverage may become more positive towards the police during periods of terrorist threat (Sela-Shayovitz, 2015). Trust could also be understood as instrumental (Lee et al., 2022): citizens might turn to the executive branches of government because these have the capacity to offer protection against immediate threats. When looking at aggregated measures of trust, these opposing effects across institutions might even cancel each other out.
The current study therefore evaluates the effect of jihadist terrorist attacks on both aggregate and disaggregated measures of institutional trust. Since we are interested in the generalisability of the rally hypothesis, our aim is to study potential heterogeneous effects across domestic institutions.

Hypotheses
Based on the arguments outlined above, we test the following hypothesis regarding rally effects:
H1: Following a jihadist terrorist attack (treatment), respondents will report higher trust in institutions compared to respondents participating prior to the attack (control).
The above hypothesis concerns the main treatment effect of jihadist attacks on aggregate trust in institutions. As discussed above, however, effects may differ by institution. We are interested in whether potential general effects also extend to specific domestic institutions, although it is beyond the scope of this paper to causally explain such differences. We therefore also assess the treatment effect on each institution separately:
H2: Following a jihadist terrorist attack (treatment), respondents will report higher trust in police compared to respondents participating prior to the attack (control).
H3: Following a jihadist terrorist attack (treatment), respondents will report higher trust in the legal system compared to respondents participating prior to the attack (control).
H4: Following a jihadist terrorist attack (treatment), respondents will report higher trust in parliament compared to respondents participating prior to the attack (control).
H5: Following a jihadist terrorist attack (treatment), respondents will report higher trust in politicians compared to respondents participating prior to the attack (control).
One issue highlighted in research on terrorist attacks and public opinion is the potentially short-term impact of such events, whereby effects may last for merely days (Satherley et al., 2021). A number of studies have specifically analysed the timeframe of changes in political attitudes after a terrorist attack. Mancosu and Ferrín Pereira (2021), for example, find that negative attitudes towards immigrants after the 2017 Manchester bombing lasted for only a very limited time, while Epifanio et al. (2023) observe changes in public attitudes toward counter-terrorist policies only in the weeks following the 2005 London bombings. We therefore examine to what extent the size of the effect varies with the investigated period (i.e., the bandwidth after the event), whereby larger bandwidths (in days) reflect a longer post-attack timeframe in which to detect an effect. Specifically, we evaluate the following hypothesis:
H6: The size of the treatment effect is negatively related to the size of the bandwidth (in days), whereby effects are smaller when using larger bandwidths.
We pre-registered our study on OSF on 21 February 2022 [https://osf.io/zgq2u]. Some minor deviations were necessary because one case study had been miscoded and because of methodological improvements to the analyses. We nevertheless report all pre-registered analyses and note the deviations where they apply. A list of these minor deviations can be found in Section A5 in the Appendix in the Supporting Information.

Methods and data
The study combines observational data with natural experiments (i.e., jihadist terrorist attacks) to assign respondents to treatment (post-attack) and control (pre-attack) groups. This design resembles a quasi-experiment, making it possible to analyse shifts in trust after unexpected events that occur during the fieldwork period of survey programmes. We examine all jihadist terrorist attacks resulting in at least one civilian (non-perpetrator) death in a European country that took place during the fieldwork of the European Social Survey (ESS) with respect to their potential impact on institutional trust. The ESS is a cross-national survey that measures attitudes, beliefs and behaviour patterns in European countries every two years on the basis of probability samples gathered through face-to-face interviews. We use ESS rounds 1, 2, 4, 5, 6, 7, 8 and 9 for our analyses. More information on the ESS is available at https://www.europeansocialsurvey.org/. The data on jihadist terrorist attacks in Europe are drawn from the Global Terrorism Database (National Consortium for the Study of Terrorism and Responses to Terrorism, 2021), which provides information on, among other things, the date and location of the event, the perpetrators, the type of event (e.g., bombing, shooting) and the number of deaths associated with the attack.
Our selection of jihadist terrorist attacks depended on three conditions: first, the attack had to be clearly attributed to jihadist perpetrators; second, the attack had to result in one or more civilian (non-perpetrator) deaths; and third, the attack had to have taken place during the fieldwork of the ESS. The fieldwork of the ESS typically takes place between September and March, meaning that attacks that meet the other two criteria but occur outside this period are generally not included. These selection criteria mirror a recent paper examining the effects of jihadist attacks on perceptions of safety and prejudice using ESS case studies (Giani, 2021). We therefore cross-checked our list of case studies with Giani's to ensure that all potential case studies are included in our sample. Our selection corresponds to Giani's with three important exceptions: (1) since we do not investigate perpetrator-only deaths, we do not include the (failed) terror attack in Stockholm in 2010; (2) we do include the Strasbourg Christmas market attack in 2018, which was not included by Giani (possibly because the data had not been released when that study was conducted); and (3) we do not include an attack that took place in Israel on 7 October 2015, since it resulted in no non-perpetrator death and thus does not fit our inclusion criteria.
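The three inclusion criteria can be expressed as a simple filter. The sketch below is illustrative only: the `Attack` record, its field names and the `fieldwork` mapping are hypothetical stand-ins, not the actual Global Terrorism Database or ESS variable names.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Attack:
    """Hypothetical minimal attack record (real GTD fields differ)."""
    country: str
    day: date
    jihadist: bool        # clearly attributed to jihadist perpetrators
    civilian_deaths: int  # non-perpetrator deaths only


def select_cases(attacks, fieldwork):
    """Keep attacks that satisfy all three inclusion criteria.

    `fieldwork` maps a country code to a list of (start, end) ESS
    fieldwork periods; both the keys and the periods are assumptions.
    """
    selected = []
    for attack in attacks:
        periods = fieldwork.get(attack.country, [])
        in_fieldwork = any(start <= attack.day <= end for start, end in periods)
        if attack.jihadist and attack.civilian_deaths >= 1 and in_fieldwork:
            selected.append(attack)
    return selected
```

Under this sketch, an attack with only perpetrator deaths (such as the Stockholm case) fails the `civilian_deaths >= 1` check even if it falls inside a fieldwork period.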

Cases under study
As outlined in the pre-registration plan, we conducted initial checks to compare the pre- and post-event sample sizes and thus the feasibility of the respective single case studies (see Table A1.1 in the Appendix in the Supporting Information).

Dependent variables
We selected variables for analysis that were included in the core ESS questionnaire. This allowed us to use consistent instruments across attacks. We analyse both aggregated and disaggregated measures of trust in institutions as dependent variables. The aggregated variable, trust in institutions, consists of four items: trust in the police, the legal system, parliament and politicians. Each item is measured on a scale from 0 to 10, whereby 0 means that the respondent has no trust at all in the given institution and 10 means the respondent has complete trust. For each case study, we constructed a mean scale; all scales had acceptable reliability (Cronbach's alpha > 0.70 in all cases). We ran these analyses prior to testing any hypothesis to ensure the feasibility of our analysis plan (see Section A1.3 and Table A2 in the Appendix in the Supporting Information).
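As an illustration of the scale construction and reliability check, the sketch below computes a respondent-level mean scale and Cronbach's alpha from a respondents-by-items matrix. It assumes complete cases with valid 0-10 answers; handling of ESS missing-value codes is deliberately left out.

```python
import numpy as np


def mean_scale(items):
    """Row-wise mean of the trust items (rows = respondents, columns = items)."""
    return np.asarray(items, dtype=float).mean(axis=1)


def cronbach_alpha(items):
    """Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of sum score)."""
    x = np.asarray(items, dtype=float)
    k = x.shape[1]
    item_variances = x.var(axis=0, ddof=1).sum()
    total_variance = x.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances / total_variance)
```

With four perfectly consistent items (every respondent gives the same answer to all four), alpha equals 1; values above 0.70 are conventionally read as acceptable.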

Covariates
We included a range of socio-demographic variables that were relevant for assessing the ignorability assumption and could be included as controls if necessary. These include age (in years), gender (1 = female, 0 = male), household net income, years of education and ethnic minority status (1 = respondent reports belonging to an ethnic minority group in the country, 0 = respondent reports not belonging to an ethnic minority group in the country). We used the natural logarithm of the income variable to account for its skewed distribution. The decision to use these specific variables was based on recommendations for this particular design (Muñoz et al., 2020).

Analytical approach
As outlined in our pre-registration plan, we followed best practice in UESD research (Muñoz et al., 2020) by running ordinary least squares (OLS) regressions with two modelling strategies. (1) We began with a baseline model for each case study that regressed the mean scale of trust in institutions, as well as trust in each institution separately, on a binary treatment indicator separating people interviewed before the event from those interviewed after it. People interviewed on the day of the event are generally treated as part of the control group, but we ran robustness checks to see whether deleting these cases influences the results (see Section A.3.3, Figures A24 and A25 in the Appendix in the Supporting Information). (2) We then added the set of exogenous covariates (age, gender, household income (logged), years of education, ethnic minority indicator) to account for potential pre- and post-event imbalances. For each of the main models, with and without covariates, we first use the complete bandwidth (i.e., all respondents interviewed before and after the attack). To examine how sensitive the effects are to the exact time frame, we conducted additional analyses that restricted the after-period to 14, 21, 28 and 35 days.
Since we use four institutional trust variables (trust in the legal system, the police, politicians and the country's parliament) plus their mean scale, the analytical approach yields multiple effects for each case study and outcome. Specifically, for each case study (n = 8) and outcome (n = 5), we estimate the following models: (1) the effect of treatment without covariates at the complete bandwidth, (2) the effect of treatment with covariates at the complete bandwidth, and (3) the effect of treatment with and without covariates for four shorter post-attack windows (14, 21, 28 and 35 days). This means that for each outcome and case study, we estimate a maximum of 10 models resulting in 10 effects, that is, up to 10 × (4 + 1 outcomes) = 50 effects per case study. The number of additional bandwidth analyses depends on the length of the fieldwork following each attack.
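The estimation step described above can be sketched with plain NumPy least squares. This is a minimal illustration under stated assumptions, not the study's code: interview dates are assumed to be coded as integer day numbers, `after_window` is a hypothetical parameter that truncates only the post-event period, and standard errors are omitted.

```python
import numpy as np


def uesd_effect(interview_day, event_day, y, covariates=None, after_window=None):
    """OLS coefficient on the post-event indicator in a UESD design.

    Respondents interviewed on the event day count as controls, because
    treatment is coded as interview_day > event_day.
    """
    days = np.asarray(interview_day)
    y = np.asarray(y, dtype=float)
    treated = (days > event_day).astype(float)
    # Keep every pre-event respondent; optionally truncate the post-event window.
    if after_window is None:
        keep = np.ones(days.shape, dtype=bool)
    else:
        keep = days <= event_day + after_window
    columns = [np.ones(keep.sum()), treated[keep]]  # intercept + treatment dummy
    if covariates is not None:
        cov = np.asarray(covariates, dtype=float)
        if cov.ndim == 1:
            cov = cov[:, None]
        columns.extend(cov[keep].T)  # one column per covariate
    X = np.column_stack(columns)
    beta, *_ = np.linalg.lstsq(X, y[keep], rcond=None)
    return beta[1]  # treatment effect
```

Calling the function with `covariates=None` corresponds to the baseline model, and passing the covariate matrix corresponds to the adjusted model; repeating both for each `after_window` gives the bandwidth variants.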
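The model count can be checked by enumerating the specification grid for one case study. The outcome labels below are hypothetical shorthand; passing fewer bandwidths mimics a case whose post-attack fieldwork is too short for the longer windows.

```python
from itertools import product

OUTCOMES = ["index", "police", "legal_system", "parliament", "politicians"]
BANDWIDTHS = ["complete", 14, 21, 28, 35]  # days after the attack
COVARIATES = [False, True]                 # baseline vs. adjusted model


def specification_grid(available_bandwidths=BANDWIDTHS):
    """All (outcome, bandwidth, covariates) combinations for one case study."""
    return list(product(OUTCOMES, available_bandwidths, COVARIATES))


grid = specification_grid()
print(len(grid))  # 5 outcomes x 5 bandwidths x 2 covariate settings = 50
```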

Assumptions: Ignorability and excludability
The validity of conclusions drawn from our design hinges on two important causal inference assumptions. (1) Excludability implies that the timing of the interview affects the outcome through no channel other than the event itself; this corresponds to the exclusion restriction in instrumental variable analysis (Labrecque & Swanson, 2018; Murray, 2006; Stock & Watson, 2007). This assumption can be violated when collateral or simultaneous events occur, when unrelated time trends are present, or when the timing of the event itself is endogenous. Following the literature (Muñoz et al., 2020), we applied the following robustness checks in each case study: examination of pre-existing time trends, as well as falsification tests on other units (placebo specifications for the previous ESS round) and on other outcomes (interpersonal trust). (2) The ignorability assumption concerns whether the chance of being assigned to either the control or the treatment group is as good as random (Bor et al., 2014; Gangl, 2010; Legewie, 2013; Muñoz et al., 2020). Possible violations include imbalances on observables, reachability, attrition, non-compliance and heterogeneous effects. In each case study, we conducted balance tests, used multiple bandwidths, adjusted for the covariates listed above, and analysed non-response patterns and placebo treatments.
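One of the excludability checks, the examination of pre-existing time trends, can be illustrated as a regression of the outcome on interview day within the control group only. The sketch assumes integer day numbers and a conventional homoskedastic standard error; it is one simple way to run such a check, not necessarily the specification used in the Appendix.

```python
import numpy as np


def pre_trend_slope(interview_day, event_day, y):
    """Slope (and conventional SE) of y on interview day among controls only."""
    days = np.asarray(interview_day, dtype=float)
    y = np.asarray(y, dtype=float)
    pre = days <= event_day  # event-day interviews count as controls
    x, yy = days[pre], y[pre]
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, yy, rcond=None)
    resid = yy - X @ beta
    sigma2 = resid @ resid / (len(yy) - 2)  # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1], np.sqrt(cov[1, 1])
```

A slope clearly different from zero would suggest that trust was already drifting before the attack, in which case a naive before/after contrast could pick up the trend rather than the event.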

Meta-analysis and meta-regression
We use meta-analytical techniques to examine the distribution of effects across all cases. In these models, effect sizes are nested within case studies. We thus use a multilevel meta-analytical approach to estimate summary effects for each outcome. Like multilevel models, multilevel meta-analyses can model the sampling variance of each effect size (level 1), the variance between effect sizes within each case study (level 2) and the variance between case studies (level 3) (Assink & Wibbelink, 2016). The dependent variable in the multilevel meta-analysis is the unstandardised beta coefficient for the treatment effect, and its sampling variance is the squared standard error of that coefficient (Harrer et al., 2021). As pre-registered, we use a restricted maximum likelihood (REML) estimator and the Knapp-Hartung adjustment for estimating summary effects (Viechtbauer, 2010). REML is generally recommended as an estimator of heterogeneity that provides a balance between 'unbiasedness and efficiency' (Langan et al., 2019; Viechtbauer, 2010, p. 291). The Knapp-Hartung adjustment, which accounts for the uncertainty in the estimate of the between-study heterogeneity, is usually recommended in random-effects models since it reduces the chance of type 1 error in small samples (Harrer et al., 2021). We also estimated the I² statistic for each level, which reflects the percentage of variation that can be attributed to heterogeneity at that level (Cheung, 2019; Higgins, 2003). One limitation of this approach is that it does not capture the correlation in sampling errors (i.e., dependence within case studies) that arises from using the same sample to calculate multiple effects (Harrer et al., 2021). This is problematic since, within each case study, our estimates derive from the same data and are therefore dependent by definition.
We therefore also estimate correlated and hierarchical effects (CHE) models with robust variance estimation for each summary effect and meta-regression (not pre-registered). Specifically, we assume a constant correlation between sampling errors within each case study (rho = 0.80, since there are only small differences between models and we expect these correlations to be large), which is then used to construct the variance-covariance matrix for each case study (Harrer et al., 2021). We use the bias-reduced linearisation method ('CR2'), recommended for small samples, to calculate confidence intervals and p-values (Tipton & Pustejovsky, 2015). We report both the pre-registered and the robust meta-analytic results.
Previous studies that synthesised effects of different terrorist attacks employed standard regression techniques to draw overall conclusions (e.g., Giani, 2021; Peri et al., 2021; Turkoglu & Chadefaux, 2022). We believe meta-analyses are preferable for answering our specific research question for the following reasons. (1) Each terrorist attack should be considered a unique case study, which might affect political attitudes differently depending on the specific context. As such, meta-analyses are the appropriate tool to synthesise effects from different (case) studies. Prior research has indicated that positive rally effects could be driven largely by single and potentially exceptional cases (Godefroidt, 2022). A meta-analytical approach allows one to evaluate each case study separately using multiple specifications and then pool the results while accounting for dependence between and within case studies. Other researchers (e.g., Nussio, Böhmelt, & Bove, 2021) also include analyses by terrorist attack to avoid the problem of 'outlier' cases or of opposing effects cancelling each other out. Conceptually framing each attack as a unique natural experiment implies that pooled results should always be compared with disaggregated findings, since terror attacks are idiosyncratic by nature. (2) Similarly, if each terrorist attack is considered a unique case study, causal inference assumptions need to be tested for each respective case study. This means that the placebo tests and assumption checks recommended by Muñoz et al. (2020) have to be conducted for each individual case study, as each case study constitutes a natural experiment in its own right. (3) Finally, using a meta-analysis to synthesise results conveniently allows researchers to calculate an effect size that summarises a large number of assumption checks (e.g., different bandwidth choices, different outcomes, baseline models and models with controls).
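To make the level-specific I² statistics concrete, the sketch below shows one common decomposition for three-level models: each variance component is divided by the sum of the two heterogeneity components and a 'typical' sampling variance. The variance components `tau2_within` and `tau2_between` are assumed to come from an already fitted REML model; this code does not estimate them.

```python
import numpy as np


def i2_three_level(sampling_var, tau2_within, tau2_between):
    """Share of total variance at each level of a three-level meta-analysis.

    Level 1: typical sampling variance
             v~ = (k - 1) * sum(w) / ((sum w)^2 - sum(w^2)), with w = 1 / v
    Level 2: tau2_within  (between effect sizes within a case study)
    Level 3: tau2_between (between case studies)
    """
    w = 1.0 / np.asarray(sampling_var, dtype=float)
    k = w.size
    v_typical = (k - 1) * w.sum() / (w.sum() ** 2 - (w ** 2).sum())
    total = v_typical + tau2_within + tau2_between
    return {
        "level1": v_typical / total,
        "level2": tau2_within / total,
        "level3": tau2_between / total,
    }
```

When all effect sizes share the same sampling variance v, the typical sampling variance reduces to v itself, which makes the decomposition easy to check by hand.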
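The constant-correlation assumption translates into a block-diagonal variance-covariance matrix: within a case study, the covariance of two sampling errors is rho times the product of their standard errors, and across case studies it is zero. The sketch below builds that matrix; it only covers this imputation step, not the CHE model fit or the CR2 robust variance estimation.

```python
import numpy as np


def che_vcov(sampling_var, case_id, rho=0.80):
    """Block-diagonal sampling covariance matrix with constant within-case correlation.

    cov(e_i, e_j) = rho * sqrt(v_i * v_j) if i and j belong to the same case study,
    0 otherwise; the diagonal holds the sampling variances v_i.
    """
    v = np.asarray(sampling_var, dtype=float)
    case_id = np.asarray(case_id)
    sd = np.sqrt(v)
    same_case = case_id[:, None] == case_id[None, :]
    V = np.where(same_case, rho * np.outer(sd, sd), 0.0)
    np.fill_diagonal(V, v)
    return V
```

For example, two effects from the same case study with variances 0.04 and 0.09 receive a covariance of 0.8 × 0.2 × 0.3 = 0.048.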

A priori assumption checks
To ensure the feasibility of each individual case study, we analysed non-response patterns before testing our hypotheses. Unlike pre-post treatment imbalances in variables included in the ESS, certain non-response patterns can be a source of considerable bias that cannot be addressed by post hoc statistical methods such as covariate adjustment or matching. If, on average, a subgroup of respondents is more likely to be interviewed early (or late) in the fieldwork period, regressing the outcome(s) on the binary treatment indicator might falsely attribute this correlation to a potentially non-existent treatment effect. The lower panel of Figure 1 shows that most of our cases do not have clear non-response patterns, as the daily number of interviews and their distribution over time appear comparable before and after the respective terror attack. Because this is less true for the Chechnya attack in 2008, we ran our analysis with and without this case. The upper panel in Figure 1 shows the results of an imbalance analysis (Ferrín et al., 2020; Legewie, 2013; Mancosu & Ferrín Pereira, 2021; Rubin, 2001), which assesses whether respondents in the pre- and post-intervention groups are sufficiently comparable with regard to the included exogenous covariates. The x-axis shows the standardised difference in means and the y-axis the variance ratio of the covariates age, years of education, gender and household net income. Different shapes correspond to the respective covariate, while colours indicate the respective case study. The box in the middle of the figure marks Rubin's thresholds for acceptable imbalance: the absolute standardised difference in means should not exceed 0.25, and the variance ratio should be between 0.5 and 2. Case studies 1 (age), 3 (household net income), 4 (household net income) and 5 (ethnic minority) show problematic values.
These could be due to reachability issues, reduced sample size (e.g., Case study 1), or random processes. Controlling for these covariates thus increases confidence that the design can be considered (quasi-)experimental and that assignment to pre- or post-intervention groups is essentially exogenous. Figure 2 gives an overview of the results for each individual case study for every outcome in a baseline specification using the full possible bandwidth and no covariates. The plot shows considerable variation within and between case studies. For example, while the attack in Tel Aviv in 2003 seems to have had hardly any significant impact on any outcome, the murder of Theo van Gogh in Amsterdam in 2004 apparently increased trust in parliament but in no other state institution. Even within countries, results diverge: the Chechnya attack in Russia in 2008 shows no association with any outcome variable, whereas respondents interviewed after the bombing in Moscow in 2011 report significantly less trust in the police and the legal system. Still, these results are based on a naïve before/after comparison and might be biased by covariate imbalances as presented in Figure 1 (violation of ignorability), as well as by collateral or simultaneous events due to the prolonged bandwidth (violation of excludability).
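The two balance diagnostics plotted in Figure 1 can be computed per covariate as follows. The sketch assumes the pooled standard deviation of the two groups as the denominator of the standardised difference and the post/pre ratio for the variance ratio; these are common conventions and may differ in detail from the computation behind the figure.

```python
import numpy as np


def balance(pre, post):
    """Standardised mean difference, variance ratio and Rubin-threshold flag."""
    pre = np.asarray(pre, dtype=float)
    post = np.asarray(post, dtype=float)
    var_pre, var_post = pre.var(ddof=1), post.var(ddof=1)
    smd = (post.mean() - pre.mean()) / np.sqrt((var_pre + var_post) / 2)
    ratio = var_post / var_pre
    # Rubin's rule of thumb: |SMD| <= 0.25 and 0.5 <= variance ratio <= 2
    within_rubin = bool(abs(smd) <= 0.25 and 0.5 <= ratio <= 2)
    return smd, ratio, within_rubin
```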

Meta-analytic and meta-regression results
To arrive at a better understanding of the effect and to test H1-H5, we turn to the results of the meta-analysis. While one could select a single estimate per case study to calculate average effects across outcomes and case studies, doing so ignores potential model-level variation in effects within each case study (Cheung, 2019). Accordingly, we use meta-analytical techniques to estimate the average treatment effect across models for each outcome.
As can be seen in Figure 3, none of the estimates is significantly different from zero at our pre-defined inference criterion of p < 0.05. The estimates for trust in politicians, the parliament, as well as the additive index are positive, while the estimates for trust in the police and the legal system are null or negative. The only coefficient that comes close to supporting any of our hypotheses is the effect on trust in parliament in the robust model, with an unstandardised beta of 0.142 (p = 0.099). We thus find no clear support for H1-H5. We also computed I² statistics to quantify the degree of heterogeneity attributable to sampling variance (level 1), to variation between effect sizes within each case study (level 2) and to variation between case studies (level 3). Since these statistics could not be extracted from the robust estimation, we use the I² statistics from the basic (pre-registered) analyses. For each outcome, we generally observe the same pattern: between-study heterogeneity (level 3) makes up the largest share of the total variation (between 68 per cent and 82 per cent), whereas variation at levels 1 and 2 is basically zero.
To address H6, we examined the size of the after-period in days as a moderator in a meta-regression model, using the treatment effect size as the dependent variable to model the association between bandwidth size (14, 21, 28, 35 days) and the size of the effect. Again, due to the nested nature of the effects, we use a three-level mixed-effects model to estimate the variation in effect size by bandwidth size (Harrer et al., 2021), as well as a CHE model with robust variance estimation. Since the 'complete' bandwidth varies with fieldwork dates, we focus here on the four fixed bandwidths (14, 21, 28, 35 days) where available.
Since effects can be positive or negative, we used the absolute value of the effect size as the dependent variable. We treat bandwidth as a continuous variable to estimate whether the absolute size of the effect is a function of time since the incident. Figure 4 summarises the results from these meta-regressions, which suggest that effect size is unrelated to the chosen bandwidth. Thus, we do not find support for H6 either.
Table A4 in the Appendix in the Supporting Information gives an overview of the results of all assumption tests for each individual case study. A value of '1' indicates that the respective case study passed a specific test, while '0' indicates a failed check. Details on the specific tests and their results are included in Section A2 in the Appendix in the Supporting Information. While no case study passes every single assumption check, most case studies perform well, as can be seen from the additive score in the left column of the table. To assess whether the case studies that are less robust with regard to the causal inference assumptions might bias the results, we excluded the two case studies that score ≤7. This did not change any summary effects (see Section A2.8, Figures A19 and A20 in the Appendix in the Supporting Information).
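The logic of the H6 meta-regression, regressing the absolute effect size on bandwidth treated as a continuous moderator, can be sketched as an inverse-variance weighted least squares fit. This is a deliberate simplification: it ignores the nesting of effects within case studies, which the three-level and CHE models described in the text account for, so it illustrates the idea only.

```python
import numpy as np


def wls_meta_regression(effects, sampling_var, bandwidth_days):
    """Inverse-variance weighted regression of |effect| on bandwidth (days).

    Returns [intercept, slope per additional day]; ignores nesting of effects.
    """
    y = np.abs(np.asarray(effects, dtype=float))
    w = 1.0 / np.asarray(sampling_var, dtype=float)
    X = np.column_stack([np.ones_like(y), np.asarray(bandwidth_days, dtype=float)])
    sw = np.sqrt(w)  # scaling rows by sqrt(w) turns WLS into ordinary least squares
    beta, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return beta
```

Under H6, the slope would be negative; a slope near zero, as reported for Figure 4, is consistent with effect size being unrelated to bandwidth.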

Robustness to design choices 4
First, as mentioned before, Case Study 3 (Russia 2008) shows very clear differences in response patterns before and after the incident. This threatens the ignorability assumption, since it might indicate that fieldwork attrition compromises the validity of the design. We excluded this case, which did not change the results in any substantive way (see Section A3.1, Figures A21 and A22 in the Appendix in the Supporting Information). Second, we included another terrorist strike that occurred in Israel on 7 October 2015 and overlapped with the ESS fieldwork period. We had listed this attack in our pre-registration but realised that it did not fit our inclusion criteria, since no non-perpetrator was killed (more information in Section A1.2 in the Appendix in the Supporting Information, Case Study X, Israel 7 October 2015: Hamas-member wounds a civilian and a soldier). Because excluding it deviates from our pre-registration, we reran the analyses including this case study. Doing so does not alter our results, as can be seen in Section A3.2, Figure A23 in the Appendix in the Supporting Information. 5 Finally, we removed from the analyses all respondents who were interviewed on the day of the event, since we cannot be sure whether those respondents had already been exposed to the event at the time of their interview. This does not change our results (see Section A3.3, Figures A25 and A26 in the Appendix in the Supporting Information).
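The before/after assignment that underlies each case study, including the bandwidth restriction on the after-period and the exclusion of respondents interviewed on the day of the event, can be sketched as follows. This is an illustrative simplification: the function names and dates are hypothetical, and it assumes that all pre-event respondents form the control group while the bandwidth only limits the after-period.

```python
from datetime import date


def split_by_event(interview_dates, event_date, bandwidth_days, drop_event_day=True):
    """Split respondents into control (before) and treated (after) indices.

    control: interviewed before the event date
    treated: interviewed on the event day (unless dropped) up to
             bandwidth_days days after the event
    Respondents interviewed beyond the bandwidth are not used.
    """
    control, treated = [], []
    for i, interviewed in enumerate(interview_dates):
        days_after = (interviewed - event_date).days
        if days_after == 0 and drop_event_day:
            continue  # exposure status on the event day is uncertain
        if days_after < 0:
            control.append(i)
        elif days_after <= bandwidth_days:
            treated.append(i)
    return control, treated


def mean_difference(outcomes, control, treated):
    """Naive after-minus-before difference in mean outcomes."""
    mean_treated = sum(outcomes[i] for i in treated) / len(treated)
    mean_control = sum(outcomes[i] for i in control) / len(control)
    return mean_treated - mean_control


# Hypothetical fieldwork dates, event date and 0-10 trust scores.
dates = [date(2016, 3, 1), date(2016, 3, 5), date(2016, 3, 10),
         date(2016, 3, 15), date(2016, 3, 23), date(2016, 4, 20)]
trust = [5, 6, 9, 7, 6, 4]
for bandwidth in (14, 21, 28, 35):
    ctrl, trt = split_by_event(dates, date(2016, 3, 10), bandwidth)
    print(bandwidth, len(ctrl), len(trt), mean_difference(trust, ctrl, trt))
```

In the actual design the group comparison is of course estimated with regression models and checked against the assumption tests described above; the sketch only shows how the event date, the bandwidth and the day-of-event exclusion define the comparison groups.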

Non-pre-registered exploratory analyses
We also conducted some exploratory analyses that we did not pre-register. In these analyses, we primarily investigate whether the results vary by geographical and socio-political context. Following Turkoglu and Chadefaux (2022), we assume that countries in civil conflict, as defined by Gleditsch et al. (2002), might generally experience terrorist attacks more frequently. Accordingly, an attack in Russia or Israel could be perceived differently from an attack in an EU country. This threatens the Stable Unit Treatment Value Assumption (SUTVA), because respondents might not have been exposed to the treatment in the same way; in other words, the 'treatment' might not mean the same for respondents in countries with and without civil conflict. We explored three questions to test whether the presence of civil conflict influences the conclusions drawn from the results: (1) Do the results change when we exclude the Russian case studies? (2) Do the results change when we exclude the Israeli case studies? (3) Do the results change if we exclude both the Russian and the Israeli case studies? As can be seen in Section A4 in the Appendix in the Supporting Information, the answer to all three questions is 'No'. At most, excluding the Russian case studies yields an unstandardised beta coefficient of 0.362 for trust in parliament (p = 0.069), which, however, does not meet our pre-defined inference criterion.
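The three exclusion checks amount to re-estimating the pooled effect on subsets of case studies. As a simplified stand-in for the three-level and CHE models, the sketch below uses a two-level DerSimonian-Laird random-effects estimator with one effect size per case study; the record layout (keyed by a hypothetical 'country' field) and all values are hypothetical.

```python
import math


def dersimonian_laird(effects, variances):
    """Two-level random-effects pooling with the DerSimonian-Laird tau^2.

    Returns (pooled effect, standard error, tau^2).
    """
    w = [1.0 / v for v in variances]
    sum_w = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, effects))
    k = len(effects)
    c = sum_w - sum(wi * wi for wi in w) / sum_w
    tau2 = max(0.0, (q - (k - 1)) / c) if k > 1 else 0.0
    w_re = [1.0 / (v + tau2) for v in variances]
    pooled = sum(wi * yi for wi, yi in zip(w_re, effects)) / sum(w_re)
    return pooled, math.sqrt(1.0 / sum(w_re)), tau2


def pooled_without(cases, excluded_countries):
    """Pool only the case studies whose country is not excluded."""
    kept = [c for c in cases if c["country"] not in excluded_countries]
    return dersimonian_laird([c["effect"] for c in kept],
                             [c["variance"] for c in kept])


# Hypothetical case-study estimates for a single outcome.
cases = [
    {"country": "NL", "effect": 0.20, "variance": 0.010},
    {"country": "FR", "effect": 0.35, "variance": 0.012},
    {"country": "DE", "effect": 0.02, "variance": 0.015},
    {"country": "IL", "effect": 0.15, "variance": 0.020},
    {"country": "RU", "effect": -0.40, "variance": 0.018},
]
for excluded in ({"RU"}, {"IL"}, {"RU", "IL"}):
    est, se, tau2 = pooled_without(cases, excluded)
    print(sorted(excluded), round(est, 3), round(se, 3), round(tau2, 4))
```

Dropping a case study with a strongly negative estimate raises the pooled mean but, with few remaining studies, also leaves a wide standard error, which illustrates why such exclusions can move a coefficient without crossing the inference threshold.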
Previous research indicates that trust depends on political attitudes and that citizens who voted for governing parties are more positive towards the political system in general than supporters of other parties. Accordingly, it is sensible to account for this effect in the regression models. We created a dummy variable based on the country-specific ESS variable 'Party voted for in last national election'. The resulting variable takes the value 1 if a respondent voted for a party that was in government at the time of the interview and 0 otherwise. For the 29th Israeli government (2001-2003), we were not able to match all government parties to the ESS list, but we included all larger ones (especially Likud and Shas). For France, we used the presidential elections. Results are shown in Section A4.4, Figures A31 and A32 in the Appendix in the Supporting Information. This specification change does not alter our conclusions.
As a final robustness check, we employed matching methods to reduce sample imbalances, applying both propensity score matching and coarsened exact matching. Details can be found in Section A4.5, Figures A33 and A34 in the Appendix in the Supporting Information. Our results remain largely unchanged.
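Coarsened exact matching, one of the two matching approaches mentioned above, can be sketched in a few lines: covariates are coarsened into bins, respondents are grouped into strata by their combination of bins, strata that lack either treated or control respondents are dropped, and matched controls receive the usual CEM weights. The covariate names, cutpoints and data below are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict


def cem_weights(units, cutpoints):
    """Coarsened exact matching weights.

    units: list of dicts with a 0/1 'treated' flag and covariate values
    cutpoints: dict mapping covariate name -> sorted list of bin edges
    Returns one weight per unit (0.0 for unmatched units).
    """
    covariates = sorted(cutpoints)
    strata = defaultdict(lambda: ([], []))  # stratum -> (control ids, treated ids)
    for i, unit in enumerate(units):
        key = tuple(bisect_right(cutpoints[c], unit[c]) for c in covariates)
        strata[key][unit["treated"]].append(i)

    # Keep only strata that contain both treated and control units.
    matched = [(ctrl, trt) for ctrl, trt in strata.values() if ctrl and trt]
    m_control = sum(len(ctrl) for ctrl, _ in matched)
    m_treated = sum(len(trt) for _, trt in matched)

    weights = [0.0] * len(units)
    for ctrl, trt in matched:
        for i in trt:
            weights[i] = 1.0
        for i in ctrl:
            # CEM control weight: (m_C / m_T) * (m_T in stratum / m_C in stratum)
            weights[i] = (m_control / m_treated) * (len(trt) / len(ctrl))
    return weights


# Hypothetical respondents: treated = interviewed after the attack.
respondents = [
    {"treated": 1, "age": 25},
    {"treated": 0, "age": 27},
    {"treated": 0, "age": 29},
    {"treated": 1, "age": 40},
    {"treated": 0, "age": 45},
    {"treated": 1, "age": 60},
]
print(cem_weights(respondents, {"age": [30, 50]}))
# -> [1.0, 0.75, 0.75, 1.0, 1.5, 0.0]
```

The treated respondent aged 60 has no control counterpart in the top age bin and is therefore dropped, while the weights of the matched controls sum to the number of matched controls.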

Discussion and conclusion
In this study, we applied multilevel meta-analytic techniques to a set of eight unique, self-conducted natural experiments to answer the question of whether jihadist terrorist attacks lead to increases in political trust. A descriptive comparison of these natural experiments, or 'case studies', reveals that only the Charlie Hebdo attack in January 2015 in Paris demonstrated a clear positive effect on most of the chosen outcomes, the only exception being trust in legal institutions. Conversely, only the Moscow airport bombing in January 2011 seems to have had a clear negative effect on trust in institutions, although the effects on trust in politicians and the parliament are less clear here. While the Charlie Hebdo attack has been the focal event of several studies (Castanho Silva, 2018; Muñoz et al., 2020; Savelkoul et al., 2022), to the best of our knowledge, the 2011 attack in Russia has been largely ignored by previous UESD research. This is problematic, since this particular case illustrates that effects can run in the opposite direction from theoretical expectations depending on the context (for another example, see Falcó-Gimeno et al., 2022).
All other individual case studies provide rather mixed results across outcomes. For example, the murder of Theo van Gogh in 2004 has been the subject of a considerable number of previous quasi-experimental studies (Boomgaarden & de Vreese, 2007; Das et al., 2009; Finseraas et al., 2011; Gautier et al., 2009), including studies investigating potential 'rally effects' (Peri et al., 2021). Surprisingly, besides other outcomes such as attitudes toward immigrants, Peri et al. focus on 'trust in the parliament' while not considering most of the other institutional trust outcomes that could, theoretically, also be affected. Our results in Figure 2 show that 'trust in the parliament' is the only institutional trust outcome that consistently shows positive effects within this case study. Focusing on one particular outcome can mask heterogeneity in effects on theoretically related outcomes and lead to type 1 errors. This also raises the question of what exactly people are supposed to rally around. Outcomes used in previous studies vary widely, and it is not clear for which institutions we can expect consistent rally effects, which might be subjected to more scrutiny, or which may not be affected at all by terrorist attacks. As our results show, the differences in effects between outcomes illustrate that the empirical picture is much less clear than previous research made it appear. 6 In short, there is very weak evidence for a generalisable 'rally effect' of jihadist attacks. On average, effects on trust in the police and (partially) the legal system are negative, while all other effects are positive. Still, none of these estimates is significantly different from zero.
To test H6, we examined whether the effects were time-dependent by assessing to what extent the absolute size of the effect varied with the length of the bandwidth (i.e., 14, 21, 28 and 35 days). Results from the meta-regressions did not provide evidence for this hypothesis either. The small and non-significant effects we find appear to be stable over time. Still, this finding could also be due to data limitations, as we are not able to compare effects over a longer period: the ESS fieldwork rarely lasts longer than a few months. An alternative way to study long-term effects would be to analyse repeated cross-sections of the ESS in a difference-in-differences fashion, comparing the before/after difference of a country in which an attack occurred with that of countries where no attacks happened. However, causal inferences from such a design are problematic, since the exclusion restriction is unlikely to hold: unrelated or collateral events could easily bias the findings.
On the one hand, our results conflict with a large number of previous studies focusing on single incidents (Geys & Qari, 2017; Strebel & Steenbergen, 2017; Van Hauwaert & Huber, 2020) or considering a number of terrorist strikes (Chen, 2020; Peri et al., 2021), which all found evidence for rally effects. Our results suggest that while surges in political trust are evident in some instances, these are the exception rather than the rule. In this respect, our conclusions are more in line with a recent meta-analysis that found that rally effects appear to be largely driven by single outlier studies (Godefroidt, 2022). While the increased support for President George W. Bush in the wake of the 9/11 attacks drove a large share of the positive effects in the studies that Godefroidt included in her meta-analysis, we have a similarly outstanding finding for Europe regarding the Charlie Hebdo attack in 2015. Researchers should consider this finding a cautionary tale, as socio-political mechanisms like the rally-around-the-flag hypothesis may not be generalisable across social, political, institutional and cultural contexts. This might partially be because the treatments, in this case 'jihadist terror attacks', are not comparable across these contexts either. This is why we used strict criteria for what constitutes a 'deadly' jihadist terror attack, to achieve a comparable operationalisation. Even when we broaden or restrict this operationalisation, as we have done in the robustness checks, we still find no evidence for generalisable dynamics across any of the measured dimensions of political or institutional trust.
Relatedly, our findings are relevant for a policy-oriented audience. According to the diversionary theory of war, unpopular leaders can use crises such as the threat of terrorists from abroad to divert the public's attention from dissatisfaction with their rule, strengthen their political position through the rally effect (Tir, 2010), and enforce policies that would be undesirable under other circumstances (Eichenberg & Stoll, 2003). Given that our study reveals very limited evidence that people 'rally' around political leaders and administrations after the jihadist terrorist attacks in our sample, enforcing questionable policies such as increased security budgets might meet with public backlash in these circumstances after all. Like researchers, policy makers should not take rally effects for granted. This applies similarly to law enforcement institutions, which might even come under closer societal scrutiny, as the negative (though non-significant) pooled estimates in the meta-analyses and the Moscow case study suggest. The public, on the other hand, should be wary that security crises can be seized upon and exploited politically.
Of course, absence of evidence is not evidence of absence, and our null effects might be due to idiosyncrasies and limitations of our design. Additionally, comparing different terrorist attacks in different countries to identify overall effects rests on strong assumptions, since terrorist attacks are unique and difficult to compare across cultural and socio-political contexts. For instance, two attacks each happened in Israel, Russia and France, and one each in the Netherlands and Germany (see Table A1 in the Appendix in the Supporting Information for an overview and Section A1.2 in the Appendix in the Supporting Information for a detailed background on each attack). There are important differences between these cases. First, the number of fatalities ranges from one to 37. Second, while the attack on the editorial team of Charlie Hebdo led to the global Je suis Charlie movement, there are almost no media reports on the attack in Chechnya in 2008 (see pages 5 f. in the Supporting Information). Third, national exposure to terrorism differs between the investigated countries. While Israel has suffered countless terror attacks, especially since the Second Intifada began in late 2000, the European countries witnessed major Islamist terrorist attacks mostly from 2015 onwards. Accordingly, prior studies report that terrorism has stronger effects on political trust in Israel (Godefroidt, 2022). Fourth, the political climate differs across cases. While political elites in France tend to use more anti-immigration rhetoric, elites in Germany and the Netherlands tend to be more balanced (Czymara, 2020). Such political discourses matter because they can contribute to ethnic boundary making (Wimmer, 2008) and thereby emphasise the distinction between national in-groups and out-groups. Fifth, the motivations differed between cases. While all attacks were conducted by religious fundamentalists, the attacks in Israel and Russia also had a geopolitical background, whereas the Charlie Hebdo attack and the Van Gogh murder specifically targeted vocal critics of Islam. Finally, the attack in Chechnya differs from the others in that the victims were not from the Russian ethnic majority population but members of the Chechen community.
However, the strength of our design lies particularly in considering each case study independently while summarising the overall effect by accounting for correlated errors. This design helps to avoid both type 1 and type 2 errors. Using only the ESS and conducting each case study ourselves also increases comparability and quality control, something that is often not possible in regular meta-analyses, in which authors synthesise results from other research teams. While none of our summary estimates supports our hypotheses, the case studies allow us to consider each case separately. As explained above, our analyses reveal considerable heterogeneity in overall effects between case studies and across outcomes within case studies. Additional limitations arise because our exploitation of these natural experiments depends on terrorist attacks overlapping with the ESS fieldwork period. Accordingly, we were unable to use all relevant domestic jihadist terror attacks, simply because they did not overlap with the ESS. Examples include the November 2015 Paris attacks, the 2016 Nice truck attack and the Manchester Arena bombing in 2017. Future studies might be able to verify or reject our conclusions by drawing on data sources beyond the ESS. Suitable sources include the World Values Survey, the European Values Study and the various Barometer studies (Eurobarometer, Arab Barometer, Latinobarometer, Afrobarometer). While this would increase the statistical power to detect potential effects by increasing the number of case studies, it comes at the cost of comparability, because different survey programmes are less comparable to one another.
We urge researchers not to place too much confidence in individual natural experiments but rather to include their findings in meta-analyses. While natural experiments are praised for 'maximising' external validity, it should be clear that a single natural experiment is still a case study of one particular moment in time.
Finally, we hope that our approach to answering whether jihadist terror attacks shape public opinion, as the rally-effect hypothesis posits, will help researchers address other salient sociological and political science questions by building on our proposed design of synthesising the results of self-conducted natural experiments. The clear advantage of this approach, apart from minimising type 1 and type 2 errors, lies in the control over and comparability of the included case studies. This meta-analytic design of a multi-site natural experiment can be extended to other outcomes, such as conservatism, outgroup attitudes and feelings of safety, and to other contexts, such as police misconduct and right-wing extremism.

5. We also pre-registered to include another possible case study involving a 'terrorist' attack in Kongsberg, Norway, in October 2021. We have not done so because (1) the data had not been published before submission, (2) the case is not included in the GTD and (3) there has been doubt as to whether the attack was actually a jihadist terror strike, since the perpetrator has a history of mental illness and no clear connection to any terrorist organisation (https://tinyurl.com/4kbf7td8).
6. One way to disentangle the divergent effects on different outcomes could be to operationalise the announcements of investigation committees or comparable political measures concerning specific institutions. For example, the Moscow airport bombing was linked to failures of the Russian security services, which apparently suffered from considerable corruption problems that might have facilitated the attack. While this might be one explanation for the negative effects in this case study, investigating such questions from a comparative perspective is difficult, since information on such measures is not available in a standardised form, unlike other specifics of terrorist attacks in the GTD.