Causal assessment in evidence synthesis: A methodological review of reviews

Abstract
In fields (such as population health) where randomised trials are often lacking, systematic reviews (SRs) can harness diversity in study design, settings and populations to assess the evidence for a putative causal relationship. SRs may incorporate causal assessment approaches (CAAs), sometimes called ‘causal reviews’, but there is currently no consensus on how these should be conducted. We conducted a methodological review of self‐identifying ‘causal reviews’ within the field of population health to establish: (1) which CAAs are used; (2) differences in how CAAs are implemented; (3) how methods were modified to incorporate causal assessment in SRs. Three databases were searched and two independent reviewers selected reviews for inclusion. Data were extracted using a standardised form and summarised using tabulation and narratively. Fifty‐three reviews incorporated CAAs: 46/53 applied Bradford Hill (BH) viewpoints/criteria, with the remainder taking alternative approaches: Medical Research Council guidance on natural experiments (2/53, 3.8%); realist reviews (2/53, 3.8%); horizontal SRs (1/53, 1.9%); ‘sign test’ of causal mechanisms (1/53, 1.9%); and a causal cascade model (1/53, 1.9%). Though most SRs incorporated BH, there was variation in application and transparency. There was considerable overlap across the CAAs, with a trade‐off between breadth (BH viewpoints considered a greater range of causal characteristics) and depth (many alternative CAAs focused on one viewpoint). Improved transparency in the implementation of CAAs in SRs is needed to ensure their validity and allow robust assessments of causality within evidence synthesis.


What is new
To our knowledge this is the first methodological review to establish current practice of CAAs in SRs. Bradford Hill viewpoints (sometimes called criteria) were the most commonly used, but how they were implemented, and how transparently that implementation was reported, varied greatly. There was overlap across the approaches, with some focusing on one or two viewpoints while others considered several elements of causal assessment.
Potential impact for RSM readers outside the authors' field
For CAAs to be incorporated into SRs/RoRs across all fields, investigators must ensure transparency in choice of viewpoints and clarity around implementation, including justification or guidance used to inform operationalisation. This methodological review offers examples of how CAAs can be implemented to maintain the transparency, robustness, and rigorous approach of SRs.

| INTRODUCTION
Causal assessment involves researchers and policy makers interrogating the evidence to understand if a cause-and-effect relationship exists between an exposure and an outcome. 1,2 By bringing together evidence surrounding a research question, evidence synthesis is arguably preferable to relying on an individual study for causal assessment. 3 This is particularly true in population health where evidence is mixed and potential causes are complex. 4 The utility of evidence synthesis, including systematic reviews (SRs), in causal inference depends both on review conduct (which should be done as rigorously and transparently as possible 5 ) as well as what evidence is synthesised. The types of studies included in SRs may affect the certainty of a causal relationship. This may be especially important where the available evidence is predominantly from non-randomised studies (NRSs) 4,6 where there is a high risk of bias due to confounding when compared to randomised controlled trials (RCTs), 7,8 as is common in SRs addressing population health questions. 4 Results from NRSs, even those with large sample sizes, 9 may be misleading if not interpreted in light of potential sources of bias 10 and may threaten the potential for SRs to evaluate causality.
The approach to evidence synthesis to evaluate a putative causal link between an exposure and outcome may differ from evaluating an association between an exposure and outcome. 5,8 To improve the assessment of causality, methods used in SRs may need to be adapted. 11 While there are not clearly defined and agreed means of adjudicating causality, including within SRs, 11 there are various guidelines and approaches that can be used to assess one or more aspects of causality. 4 Going forward, the guidelines and approaches used to assess causality will be referred to as causal assessment approaches (CAAs), with the Bradford Hill (BH) viewpoints or criteria particularly influential. They may be incorporated into the evidence synthesis (sometimes referred to as 'causal reviews') to help establish if a causal relationship exists. 11 Some CAAs, such as the BH viewpoints, qualitatively evaluate different characteristics of causal relationships. 3 BH viewpoints address several key characteristics of causal relationships: strength of association, temporality, dose response, consistency, specificity, plausibility, experiment, coherence, and analogy. Similarly, the Grading of Recommendations, Assessment, Development and Evaluation (GRADE) methodology provides a systematic approach to assessing certainty within reviews which indicates confidence that the effect estimated in evidence synthesis is close to the true effect (i.e., the causal effect). 12 While GRADE is not always thought of as a CAA, it has been argued that it incorporates many aspects of the BH viewpoints 13 such as incorporating risk of bias, indirectness and confounding. 14 Other CAAs may be explicitly based on the counterfactual definition of causality.
The 'fundamental issue in causal inference' of missing, unobserved data means that investigators cannot determine the difference between the observed effect when the individual has been exposed to the potential cause under investigation and the unobserved counterfactual outcome had the individual not been exposed, all other things being equal. 15 Thus, application of the counterfactual definition asks investigators to consider if the unexposed group would have the same risk of the outcome as the exposed group had they also been exposed. 6 Directed acyclic graphs (DAGs) 16 and sufficient component cause (SCC) models (also known as causal pies) incorporate counterfactual principles in their systematic evaluation of, among other things, confounding and multifactorial causes. 17 Epidemiologists have argued that triangulating across different CAAs may help improve evaluation of putative causal relationships. 4,18 This might be particularly valuable in population health, where randomised trials are typically not possible.
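The counterfactual definition described above can be written compactly in potential-outcomes notation. The symbols below are assumptions introduced here for illustration and do not appear in the reviews discussed: $A$ denotes exposure (1 = exposed, 0 = unexposed), and $Y_i(1)$ and $Y_i(0)$ denote the outcome individual $i$ would have under exposure and under no exposure.

```latex
% Individual causal effect: only one of the two potential outcomes is ever observed.
\tau_i = Y_i(1) - Y_i(0)

% Average causal effect in the population.
\tau = \mathbb{E}[Y(1)] - \mathbb{E}[Y(0)]

% Exchangeability: the unexposed group would have had the same risk as the
% exposed group had they also been exposed.
\mathbb{E}[Y(1) \mid A = 0] = \mathbb{E}[Y(1) \mid A = 1]
```

When the exchangeability condition (and its analogue for $Y(0)$) is plausible, the observed difference between exposed and unexposed groups can be read as an estimate of $\tau$; confounding is one reason the condition may fail.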
The aim of this methodological review is to understand how CAAs are incorporated into population health SRs and reviews of systematic reviews (RoRs). We will identify SRs/RoRs that explicitly incorporate CAAs and consider how they have implemented CAAs. We will seek to elucidate any differences in the conduct of SRs/RoRs for causal assessment and consider the implications for investigators interested in using SRs/RoRs to assess causality.

| Review aims and scope
In this paper we use the term 'causal SR/RoR' to refer to SRs/RoRs which have self-identified as assessing causality and have explicitly incorporated a CAA. Our focus on self-identifying SRs/RoRs that have explicitly incorporated a CAA is largely due to resource and time constraints. SRs/RoRs were included if they referred to causal assessment in the title or abstract and explicitly applied CAA in the main text. Therefore, we will likely not identify SRs/RoRs that use elements of CAA but do not explicitly refer to it in the title or abstract. However, as the overall aim is to gain a broad understanding of how CAAs are incorporated into population health SRs/RoRs and offer insight into the variation for researchers wanting to conduct a causal SR/RoR, we believe the SRs/RoRs identified within this aim will provide that.
For the purposes of this review, CAA refers to the plans and procedures applied by investigators and may include any guideline, framework, tool or method used by investigators to assess causality. 19 Some CAA examples include BH viewpoints, DAGs, GRADE or causal pies. CAAs may be informed implicitly and explicitly, and to varying degrees, by investigators' philosophical worldviews, study designs and research methodology. 19 The assumptions about a causal relationship may be viewed through a variety of frameworks including, but not limited to: deterministic (an exposure is expected to always produce the outcome and the outcome does not occur without the exposure); probabilistic (an exposure increases the likelihood of an outcome); or multifactorial (an exposure may be a component of a complex cause that is sufficient, but not necessary, to produce the outcome). 20,21 For the purposes of this methodological review, we are agnostic under which frameworks authors were operating.
A methodological review analyses study methods. 22 The aim of this methodological review is to identify and describe the various approaches to assessing causality in public/population health SRs/RoRs. We focus on population health, both because of its importance and the challenges in elucidating causal relationships due to the complex relationship structures and reliance on NRSs. 4 Our aim to consider the ways in which CAAs are incorporated into population health SRs and RoRs was addressed using three objectives: (1) to identify which CAAs are used; (2) to describe differences in how CAAs are implemented; and (3) to examine how SR methods were modified to incorporate causal assessment.
The eligibility criteria for this methodological review were developed according to a protocol for mixed methods: sample, phenomenon of interest, design, evaluation and research type (SPIDER). 23 We excluded 'research type' due to limited relevance to our research aims. Because of the variety in CAAs and because we are not limiting our search to specific interventions or outcomes, SPIDER was deemed more appropriate than a protocol based on population, intervention, comparison and outcome (PICO). 24 Explanations and justifications for each protocol category, and the corresponding inclusion and exclusion criteria, are summarised in Table 1.
Full list of exclusion criteria
1. Reviews of clinical intervention or evaluation studies or other studies not related to population or public health.
2. Reviews that do not self-identify as having conducted or considered causal assessment.
3. Reviews that do not self-identify as a SR or RoR.
We limited SRs/RoRs to those that explicitly incorporated one or more causal assessment approaches (CAAs) because of resource and time constraints. Exclusion: no explicit mention that an approach has been incorporated to support causal assessment.
We excluded reviews that hypothesised, but did not evaluate, possible causal mechanisms, links or pathways, 25-29 or reviews that included studies that aimed to assess, or stated that they had assessed, causality but did not implement any causal assessment (see Table 1).

| Search strategy
The goal of the search was to identify SRs and RoRs in population health that assess causality. We identified reviews in a systematic search of three electronic bibliographic databases conducted in February 2020: EMBASE, Medline, and CINAHL. Our search included keywords related to 'systematic review' and 'causality' in the title and abstract and, where possible, as subject headings. To limit the search to SRs, one of our key terms was the subject heading, 'systematic review'. We also included terms such as 'causal' or 'causation' or 'causal assessment' or 'causal evaluation' in the title or abstract. As we focused on recent practice in SRs and RoRs used for causal assessment, our search was limited to January 2000-February 4, 2020. The reviews were further limited to English language reviews and the population in our search was limited to human subjects. The research team finalised the search strategy in consultation with an information specialist (see Appendix A for full search strategy).

| Stage 3: Study selection
Following de-duplication using Covidence, titles and abstracts were exported to EndNote X9 © and screened in two stages: (1) title and abstract and (2) full-text. At both stages, reviews were independently reviewed by two investigators (MS and one of HT, SVK, or AP). A third reviewer was consulted about disagreements at either stage.

| Data extraction
Data extraction was completed by MS. A second reviewer (HT, SVK or AP) checked a 10% sample of purposively selected reviews that spanned a range of different CAAs and provided good coverage of all the potential issues that might arise. As most of the outcomes were qualitative descriptions of methods rather than statistical estimates, we did not calculate specific interrater reliability measures. Rather, we aimed to explore interpretation of phenomena through discussion as is common in qualitative research, particularly focusing on non-BH CAA methods. 30 The data extraction form (see Appendix B) included both structured and free-text domains and was piloted before finalising. We extracted data on key study information such as type of review, study designs included in the review, and PICO features as well as which CAA was used (e.g., BH viewpoints), key features of causal approaches (e.g., identifying confounders, temporality, etc.) and criteria used to meet each CAA (e.g., specific study design).

| Data summary and synthesis
The data were tabulated to facilitate comparison across SRs/RoRs that used a particular CAA as well as comparison across reviews that incorporated different CAAs. The data for each CAA were then summarised narratively to describe the variations in how CAAs were implemented. We tabulated the following information which was considered to be quantifiable: the number of reviews that used each CAA; which BH viewpoints were used; if the viewpoints were defined; how authors determined if viewpoints were met (or in other words, did they identify and apply indicators); how overall support for viewpoints was determined; and how the viewpoints were applied (Table 3). We thematically collated free-text responses, such as the impact of causal approach on SR/RoR stages (Table 4), where possible. Both this 'quantifiable' information and other qualitative information were synthesised descriptively.

| RESULTS
| Included reviews
Figure 1 shows the flowchart of the searches. 31 The search resulted in 1345 references. Out of 1339 de-duplicated screened references, 140 full texts were assessed and 53 reviews were included (five were RoRs, 32-36 all of which used BH viewpoints).

| CAAs
The review characteristics, including the exposure topic area, CAA(s) used, and critical appraisal tool(s) applied by the review are provided in Appendix C. Forty-six reviews ( 84 The complete list of CAAs identified (objective 1) and descriptions of how CAAs were implemented (objective 2) can be found in Table 2. We provide additional detail comparing implementation of BH viewpoints in Table 3. Because most other CAAs were only used by one or two reviews, we were only able to compare implementation for BH viewpoints. A comparison of how realist reviews and MRC guidance on natural experiments were implemented was described narratively in Sections 3.2.2 and 3.2.3.

| BH viewpoints
While the majority of reviews applied BH viewpoints to assess causality, there was considerable variation in how they were implemented. As described in Section 2.3.1, we extracted information to evaluate how implementation of BH viewpoints varied, which we categorise into five key domains: (1) viewpoints used; (2) viewpoint definition; (3) viewpoint indicators (i.e., how was the viewpoint assessed as being 'met'); (4) assessment of overall support for viewpoints; and (5) if viewpoints were considered across the body of evidence or in another way (e.g., across a single study or relationship). An overview of each domain can be found in Table 3.

FIGURE 1 Flowchart of the searches
Records identified from databases: n = 1345
Duplicate records removed before screening: n = 6
Records screened (title/abstract screening): n = 1339
Records excluded from title/abstract screening: n = 1199
Full-text articles assessed for eligibility: n = 140
Reports excluded:
37 not causal assessment
30 not systematic review/review of reviews
8 conference abstract
6 pharmaceutical
2 full text not found
2 not English language
2 pre-2000
Full texts included in review: n = 53 (48 systematic reviews; 5 reviews of reviews)

TABLE 2 Overview of causal assessment approaches (CAAs) and how they were incorporated into systematic reviews (SRs) and reviews of reviews (RoRs)

CAA: Bradford Hill (BH) viewpoints
Number of reviews: 46
Description of CAA: BH viewpoints, also known as criteria, are a set of nine characteristics to consider when assessing a causal relationship. 2 The nine viewpoints are: strength of association, consistency, specificity, temporality, dose-response, plausibility, coherence, experiment, and analogy.
How CAA was incorporated into SRs: BH viewpoints were the most commonly used CAA, but there was considerable variation in which BH viewpoints were used and how they were operationalised. There was also variation in transparency and clarity about how the viewpoints were incorporated and used in causal assessment.

CAA: Medical Research Council (MRC) guidance on natural experiments
Number of reviews: 2 78,79
Description of CAA: The MRC guidance on natural experiments posits that certain study designs and analytic methods are more suitable to assess causality than others, and suggests that results from different studies be compared. 85 The MRC guidance on natural experiments highlights study design, including carefully defining control groups to establish exchangeability with exposed individuals and testing underpinning methodological assumptions, as important for establishing causality. It also draws attention to some methods (such as difference-in-differences, regression discontinuity designs, and instrumental variable analysis) which can address unmeasured, as well as measured, confounders.
How CAA was incorporated into SRs: Two reviews 78

Authors hypothesised reasons for heterogeneity found in RCTs evaluating breast cancer screening on mortality, including attendance rates, the accuracy of screening tests, and social class. The logic model in Figure 1 of the review illustrates the framework of causal relationship and includes the key cascade components (attendance rates and sensitivity) that may account for differences in two outcomes (advanced breast cancer and breast cancer mortality). The authors then considered the trial evidence across these different inter-related factors to consider whether heterogeneity in the evidence base could be explained by these factors.
Based on the assumptions in the logic model and the included studies, the review estimated the relative risk of advanced-stage breast cancer and breast cancer mortality by three different attendance rates and sensitivity in trials (a total of nine scenarios). Overall, they found that attendance rate and sensitivity may explain statistical heterogeneity across trials.

Of the 14 reviews that designed criteria to reflect the incorporation of CAAs, 10 reviews (all of which utilised BH viewpoints except for one realist review) included studies that considered potential causal pathways 60-62,64,68,73-75,80 or excluded studies that did not. 45 Another four reviews 43 32

None of the reviews that provided indicators for strength of association considered confounding adjustment, including residual or unmeasured confounding, when assessing whether strength of association was met. This is important as bias may fully explain a large association (and small associations may not be entirely explained by bias). However, some of the reviews (9/46, 19.6%) 44,47,49,50,56,58,59,66,69 broadly considered the findings from individual studies, or the synthesised findings, in the context of confounding and bias, which in some reviews was also referred to as 'alternative explanations'. While strength of association relied on quantitative indicators, some indicators for other viewpoints were less definitive. Five reviews 38,42,46,52,55 provided indicators for the plausibility viewpoint (out of 38 reviews that included plausibility in their assessment). Two reviews 52,55 determined that plausibility was met if at least one credible, hypothetical mechanism explained the association (e.g., empirical studies demonstrating a relationship), though neither clarified what was meant by 'credible'. Similarly, Hughes and colleagues determined that the relationship under study was plausible if there were positive animal or mechanistic data. 46 On the other hand, rather than focus on hypothetical explanations for an association, two other reviews 38,42 noted that an association between the exposure and outcome under study in human studies was sufficient evidence for plausibility. None of the SRs/RoRs explained why certain indicators were used, making it challenging to discern the underlying reasons for the variation of indicators used for a given viewpoint (e.g., range of indicators for strength of association).

| MRC guidance on natural experiments
Two SRs 78,79 used the MRC guidance on natural experiments 85 to conduct causal assessment. This CAA involves identifying observational studies that appropriately and comprehensively address bias, deeming them most suitable to assess causality. The guidance focusses predominantly on natural experiment study designs and other analytical approaches that compare outcomes pre-and post-intervention, partly to discern if the exposure preceded the outcome. The guidance favours analytical methods that address observable and measurable sources of bias from confounding (e.g., matching, regression adjustment, and propensity scores) and unmeasured or residual confounding (e.g., differences in differences, instrumental variables, and regression discontinuity).
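As an illustration of one of the analytical methods named above, the following is a minimal sketch of a two-group, two-period difference-in-differences estimate. It is not taken from the MRC guidance or from either review: the function name and the data are hypothetical, and the estimate is only interpretable causally under the usual assumption that the two groups would have followed parallel trends without the intervention.

```python
from statistics import mean


def difference_in_differences(treated_pre, treated_post, control_pre, control_post):
    """Change in the treated group minus change in the control group."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


# Hypothetical outcome values (e.g., weekly cycling trips per person),
# invented purely for illustration.
treated_pre = [2.0, 3.0, 4.0]
treated_post = [4.0, 5.0, 6.0]
control_pre = [2.0, 2.0, 5.0]
control_post = [3.0, 3.0, 6.0]

did_estimate = difference_in_differences(
    treated_pre, treated_post, control_pre, control_post
)
print(f"Difference-in-differences estimate: {did_estimate:.2f}")
```

Subtracting the control group's change removes time trends shared by both groups, which is why the method can address some confounders that were never measured.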
Martin and colleagues 78 identified studies that consider the relationship between the built characteristics of an environment and obesity that applied any of the analytical methods described by the MRC guidance on natural experiments to address observable or unobservable confounders. 78 They found that the observed associations in studies using methods to address particular sources of bias (e.g., longitudinal studies which are more suitable to consider the temporal ordering of variables) were comparable with those that did not (such as cross-sectional studies, which cannot always establish temporality). The comparable results appear to increase the validity of observational studies in determining strength of association.
Molenberg and colleagues, 79 on the other hand, did not limit their search to studies that incorporated these analytical methods. Instead, they extracted evidence from the included studies that, based on the MRC guidance on natural experiments, used methods that may help elucidate the possible causal relationship between infrastructural intervention to promote cycling and cycling outcomes. Specifically, they noted which studies considered multiple comparison groups to test robustness of findings (e.g., infrastructural intervention on cycling for cyclists vs. non-cyclists) and the use of complementary research methodologies (e.g., trends from surveys). They aimed to also consider the effect of changes in the infrastructural intervention on a neutral outcome that is expected to be independent from the intervention (i.e., a falsification outcome), though did not identify any studies that used falsification outcomes. Thus, this application of the MRC guidance on natural experiments appears to reflect the principles of three BH viewpoints: temporality (focusing on study designs that ensure the exposure preceded the outcome), experiment (study designs focus on comparing pre-and post-intervention), and specificity (falsification outcomes).

| Realist reviews
A realist review is an evidence synthesis strategy used to investigate the context and mechanisms through which an exposure-outcome relationship operates. 86 Realist reviews aim to provide a more iterative approach to examining complex interventions than traditional SRs, which have been criticised for being too inflexible. In doing so, the included realist reviews appear to focus on the BH viewpoint of plausibility. Two SRs utilised the realist review approach to assess causality. 80,81 DeBono and colleagues evaluated the relationship between participation in the US food stamp programme and obesity 81 while Blair and colleagues 80 applied a realist review to understand the causal mechanism through which neighbourhoods impact depression. Both SRs underscored the goal of realist reviews to explore and explain the causal mechanism of the relationship under study, which both SRs did in part by extracting the posited causal pathways from the included studies and then narratively assessing the evidence for different pathways. Neither SR found strong evidence for any of the proposed mechanisms.

| Horizontal SR
Kuper and colleagues implemented a novel approach to causal assessment across a body of evidence for a range of risk factors which they called a 'horizontal SR'. They examined the relationship between four risk factors (depression, exercise, C reactive protein, and diabetes) and coronary heart disease, using various study designs. 82 Within and across the risk factors, they compared findings across study designs which addressed confounding and reverse causality to different degrees and in different ways: observational studies with multivariable adjustments; studies using genetic variants as instrumental variables (Mendelian randomisation); and RCTs. 82 For each risk factor, they conducted a meta-analysis by each study design type and subsequently compared the meta-analysis results of the three risk factors with an unknown causal role (depression, exercise, and C reactive protein) against the meta-analysis results of the risk factor they designated an established cause (diabetes). The comparison of observational studies suggested that diabetes and C reactive protein had a causal role in coronary heart disease, while, according to the authors, observational evidence for exercise and depression was more susceptible to bias and thus their causal effect on coronary heart disease was inconclusive. There was only evidence from Mendelian randomisation studies and RCTs for C reactive protein, where it appears that C reactive protein did not have a causal role, making it difficult to compare results and thus make any causal inferences.
In identifying studies that address bias and study designs comparable to experimental evidence, this CAA utilised the principles of two BH viewpoints: strength of association (across different study designs) and experiment. Unlike the reviews applying BH viewpoints, in a horizontal review the size of association identified from the observational studies is considered in the context of appropriately accounting for confounding and other forms of bias. Kuper and colleagues also considered other forms of bias including measurement bias and publication bias. The authors also appear to consider specificity: by looking at different risk factors they may be implicitly, and unintentionally, suggesting there is no evidence for specificity. They also appear to account for consistency within the horizontal SR as they not only evaluate effect estimates across different study designs and risk factors, but also explicitly review explanations for statistical heterogeneity and evaluate temporality while considering reverse causality.
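The core step of a horizontal SR, pooling effect estimates separately for each study design so that the pooled results can be compared, can be sketched as follows. This is an illustrative sketch only: the review does not state here which meta-analytic model Kuper and colleagues used, so the fixed-effect inverse-variance pooling below is an assumption, and the function names and study data are hypothetical.

```python
import math
from collections import defaultdict


def pool_fixed_effect(estimates):
    """Inverse-variance fixed-effect pooling of (log effect, standard error) pairs."""
    weights = [1.0 / se ** 2 for _, se in estimates]
    total_weight = sum(weights)
    pooled = sum(w * effect for w, (effect, _) in zip(weights, estimates)) / total_weight
    return pooled, math.sqrt(1.0 / total_weight)


def pool_by_design(studies):
    """Group (design, log effect, standard error) records by design and pool each group."""
    grouped = defaultdict(list)
    for design, effect, se in studies:
        grouped[design].append((effect, se))
    return {design: pool_fixed_effect(group) for design, group in grouped.items()}


# Hypothetical log relative risks for one risk factor, invented for illustration.
studies = [
    ("observational", 0.2, 0.1),
    ("observational", 0.4, 0.1),
    ("rct", 0.0, 0.2),
]

pooled = pool_by_design(studies)
for design, (effect, se) in pooled.items():
    print(f"{design}: pooled log RR = {effect:.3f} (SE {se:.3f})")
```

A marked gap between the pooled observational estimate and the pooled estimate from designs less prone to confounding would be one signal that the observational association may not be causal.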

| Sign test hypotheses
A SR by Kroger and colleagues explored the relationship between socioeconomic status and health by comparing two competing hypotheses that could explain the putative causal relationship. 83 The health selection hypothesis suggests that differences in health status cause differences in socioeconomic status, while the social causation hypothesis suggests that people with higher socioeconomic status have better health because of the resources available to them (i.e., the two hypotheses posit opposite directions of causality). To determine which mechanism is more likely to be causal, Kroger and colleagues conducted a sign-test to compare the probabilities of health selection versus social causation based on the conclusions of included studies. This CAA reflects an approach to testing for temporality by testing the reverse direction of the pathways between the exposure and outcome.
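The logic of a sign test on study conclusions can be sketched as an exact two-sided binomial test of whether studies favour one hypothesis more often than chance would predict. This is a generic sketch, not a reconstruction of Kroger and colleagues' analysis: the function name and counts are hypothetical, and excluding studies that support both hypotheses equally is an assumption made here.

```python
from math import comb


def sign_test_p_value(n_first, n_second):
    """Exact two-sided sign test p-value under a null of equal support (p = 0.5).

    n_first and n_second are counts of studies favouring each hypothesis;
    studies supporting both equally are assumed to have been excluded.
    """
    n = n_first + n_second
    k = min(n_first, n_second)
    lower_tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * lower_tail)


# Hypothetical counts, invented for illustration: 8 studies favour
# social causation and 2 favour health selection.
p_value = sign_test_p_value(8, 2)
print(f"Two-sided sign test p-value: {p_value:.4f}")
```

A small p-value would suggest that the included studies lean towards one direction of the relationship more than expected by chance, although it says nothing about the quality of the individual studies.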
The authors ran three meta-regressions: one for all studies providing support to the health selection hypothesis; one for all studies in support of the social causation hypothesis; and one for all studies that found equal support for both hypotheses (i.e., the null hypothesis). They regressed the preference for the three theories against study characteristics including age, education and income of the included studies' samples, which were found to be somewhat predictive of support for a given theory. Overall, they did not find a consensus in support for either theory. Thus, it appears that strength of association is implemented in the context of understanding temporality. This CAA uses temporality, also used in BH viewpoints, to assess reverse causation.

| Causal cascade model
One SR implemented the principles of DAGs ('conditional independence for the parameters and variables implicated' p5 84 ) and developed a Bayesian causal model illustrating the 'framework of causal relationships', p3. 84 Chen and colleagues aimed to understand the heterogeneity of advanced breast cancer risk and breast cancer mortality across breast cancer screening trials. They focused on two hypothesised reasons for variation in trials examining breast cancer mortality: attendance rate in screening trials and test sensitivity of breast cancer mammography (i.e., incidence rate of interval cancer/expected incidence rate). In other words, their aim was to elucidate the statistical heterogeneity of advanced breast cancer risk within breast cancer screening trials given these two possible explanations. They considered the impact of different combinations of attendance rates (90%, 60%, and 30%) and sensitivity rates (95%, 75%, and 55%) on breast cancer risk and mortality rates. They found that both attendance rates and sensitivity explained the heterogeneity of trials. This CAA overlaps with the BH viewpoint of consistency, which is concerned with heterogeneity across the evidence.
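The nine scenarios considered by Chen and colleagues arise from crossing the three attendance rates with the three sensitivity rates reported above. The sketch below only enumerates that scenario grid; it does not reproduce the Bayesian causal model, and the variable names are hypothetical.

```python
from itertools import product

# Rates as reported in the text above.
attendance_rates = [0.90, 0.60, 0.30]
sensitivity_rates = [0.95, 0.75, 0.55]

# Cross every attendance rate with every sensitivity rate.
scenarios = [
    {"attendance": attendance, "sensitivity": sensitivity}
    for attendance, sensitivity in product(attendance_rates, sensitivity_rates)
]

for number, scenario in enumerate(scenarios, start=1):
    print(
        f"Scenario {number}: attendance {scenario['attendance']:.0%}, "
        f"sensitivity {scenario['sensitivity']:.0%}"
    )
```

In the review, each scenario would then be passed to the causal model to estimate the relative risk of advanced breast cancer and of breast cancer mortality.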

| CAAs impact on conduct of SR stages
In this section we considered whether and how CAAs impacted different stages of SR conduct (objective 3): objective of the review; description of the study design; inclusion and exclusion criteria; search strategy; data extraction; and evidence synthesis and conclusion. 87 The key findings and adaptations made in each stage are summarised below in Table 4.
There were seven SRs/RoRs 33,39,47,49,58,71,77 (all utilising BH viewpoints) where causal assessment does not appear to have been incorporated into conduct of research objectives, review design, search strategies, inclusion criteria or data extraction. One SR 74

Though there was some variation in how it was implemented, the most common CAA used by SRs/RoRs was BH viewpoints, which are considered among the most influential and comprehensive approaches to causal assessment. 88,89 Other CAAs included realist reviews and MRC guidance on natural experiments, which both have existing implementation guidance. 85,86 The remaining CAAs (horizontal SR, sign-test hypothesis, and causal cascade model) were developed by SR authors, though the causal cascade model incorporated principles of DAGs. A common theme across the alternative CAAs was that most focused on one or two key aspects of causal assessment (e.g., one of the BH viewpoints). The overlap across CAAs also suggests that insight into implementing viewpoints should include reviews utilising BH viewpoints as well as reviews utilising alternative CAAs as both may offer useful insights for a given viewpoint. The comparison across CAAs suggests that while it may be preferable for some SRs/RoRs to take an in-depth look at one characteristic of causal assessment, in another SR/RoR it would be preferable to consider many, depending on the focus and priorities for the review. Reviews that focus on one or two BH viewpoints (as opposed to several or all viewpoints) may find it easier to provide greater transparency about how the given viewpoint was implemented.
We found considerable variation in how BH viewpoints were used, including in the transparency of their reporting, which formed part of our broader examination of how CAAs were implemented (objective 2). Transparent reporting of methods is a key component of SRs, and a lack of transparency in how CAAs were implemented in SRs/RoRs might result in assessments of causality not being reproducible, which undermines the strength of an SR/RoR. Based on our assessment of SRs/RoRs using BH viewpoints, transparency of how viewpoints were implemented can be improved by (1) providing reasons for why certain viewpoints were used or omitted, (2) offering clear viewpoint definitions and indicators, and (3) utilising a variety of approaches for assessing support for viewpoints and applying viewpoints.
Firstly, as only three reviews explained why certain viewpoints were excluded, 42,54,57 we are unsure whether variation in which viewpoints were used reflected differences in the viewpoints' perceived relevance for causal assessment or in which viewpoints were more easily understood and applied. Moreover, only one-third of reviews defined their included viewpoints, while just 40% indicated how viewpoints were met. Limited clarity about why certain indicators were used makes it difficult to understand why there was, for example, a broad range for what was considered a 'large' effect estimate (strength of association) or what would be considered a 'credible' mechanism (plausibility). Finally, different approaches for assessing support for viewpoints and applying viewpoints improved overall transparency. Reviews that, for example, used both narrative and quantitative support for viewpoints provided a more comprehensive assessment of the extent to which viewpoints were met than those that provided only quantitative or only narrative assessments of support. Relatedly, reviews that implemented viewpoints across different study groupings (e.g., across all synthesised studies, across studies synthesised by exposure, and across individual studies) appear to have considered causality more comprehensively than those that did not. Despite the importance of transparency, only a few reviews stood out as examples of a rigorous and transparent application of BH viewpoints. Four reviews 38,46,54,70 defined the viewpoints, provided indicators and used both narrative and quantitative rankings to describe the certainty or likelihood of viewpoints having been met, with one of them explaining why certain viewpoints were not included. 54

| Impact of incorporating CAAs on conducting SRs (objective 3)
Explicitly incorporating causal assessment into review objectives and CAAs into review study design, as most reviews did (see Table 4), are examples of how researchers can conduct causal SRs with clear research goals and explicit use of causal inference. 1 To a lesser degree, CAAs also impacted the search strategy, inclusion criteria and data extraction. It may be that so few reviews (3/53) 43,74,78 designed their search to specifically identify terms such as 'causal mechanism' or 'causality' because doing so creates a low-sensitivity search. An alternative approach appears to be a highly sensitive search combined with a set of inclusion and exclusion criteria designed to identify the studies most relevant to causal assessment, which about one quarter of reviews did (14/53, 26.4%). For example, the horizontal review and the MRC guidance on natural experiments review by Martin and colleagues designed their inclusion criteria to ensure their review included studies that assess bias and experimental evidence. Similarly, one quarter of reviews extracted information from included studies that supported causal assessment. Most CAAs were incorporated into the synthesis process. This included using evidence to understand whether BH viewpoints were met, synthesising evidence to understand causal mechanisms in realist reviews, applying a sign test to the evidence for reverse causality, and testing the evidence for statistical heterogeneity (causal cascade model). Finally, all reviews drew conclusions regarding causal relationships, suggesting that this is a key component of a causal SR.

| Strengths and weaknesses of methodological review
This methodological review is the first we have identified that summarises the use of causal approaches in SRs in current practice. It builds on literature exploring the use of SRs in causal assessment 11,90 and aiming to improve transparency and robustness around causal assessment. 1,91 Our findings are consistent with criticisms of causal SRs that there is no consensus on how to conduct a causal SR, though we found this variety may in fact strengthen causal assessment in SRs. The range of CAAs and the variety in how a given CAA was implemented (both within BH viewpoints and across CAAs that utilised one or two BH viewpoints) provide many examples that may be of use to causal SRs with different areas of focus. In other words, it may be more relevant (given the exposure/outcome relationship under study, the type of evidence available and the main point of disagreement in the literature) for some reviews to focus on the BH viewpoint of experiment or temporality and for others to focus on several viewpoints in less detail.
This review has several limitations. The primary limitation was that the search was not sufficiently sensitive to identify reviews that did not use causal language in their title or abstract. Thus, we missed reviews that either implicitly applied causal approaches (such as sensitivity analyses for unmeasured confounding) or explicitly applied causal approaches but did not reference them in the title or abstract. 7 We did not limit the search to specific population health topics, such as sexual health or men's health, as we aimed to include a broad range of population health SRs. That is, we designed the review to help us explore the range of possible CAAs across a broad area rather than exploring in greater detail issues of causal assessment specific to a particular topic. It is possible that we have overlooked useful insights from SRs of NRSs in subject areas outside population health.
In addition, we focused our search on SRs/RoRs as they are considered the gold standard of evidence synthesis, so we may have missed additional CAAs used by non-SRs/RoRs. Relatedly, due to the limited number of reviews utilising alternative CAAs, we were only able to describe differences in how BH viewpoints were implemented. Moreover, we did not critically appraise the reviews and thus did not account for the quality of SRs in our consideration of how causal approaches were applied.

| IMPLICATIONS FOR FUTURE CONDUCT OF CAUSAL SRS
The range of CAAs, including variation across reviews that applied BH viewpoints, offers examples of how the same characteristic of causality could be implemented. Alternative CAAs that focus on one or two viewpoints appear to go into greater detail on how those viewpoints are implemented, and to be more transparent about that implementation, than reviews incorporating BH viewpoints. However, reviews incorporating BH viewpoints (even though most did not use all nine viewpoints) appear to consider a broader range of characteristics of causality than the alternative CAAs we identified. Investigators aiming to conduct causal SRs may need to consider which balance of depth and breadth is most appropriate for their consideration of a putative causal relationship.
Investigators should consider a range of CAAs and choose the approach that provides the greatest insight into whether a causal relationship exists, and this is especially true of BH viewpoints. This finding is consistent with an earlier theoretical comparison of BH viewpoints with other CAAs to elucidate viewpoints' theoretical underpinnings; our findings suggest that alternative CAAs offer practical examples for improving the way individual viewpoints are implemented. For instance, the causal cascade model approach for evaluating heterogeneity and the horizontal review approach for evaluating the impact of different study designs and the potential biases associated with each are potentially valuable for implementing the BH viewpoints of consistency and strength of association. Formal testing (horizontal review, sign-testing mechanisms) and comprehensive evaluation (realist review) of putative mechanisms are needed to increase transparency around assumptions of plausibility. The MRC guidance on natural experiments lays out the analytical methods and study designs useful for implementing the experiment viewpoint. Falsification outcomes, as used by the MRC guidance on natural experiments, or comparing the associations of different exposure/outcome groupings, as the horizontal review and sign-testing mechanism CAAs did, may be useful approaches to evaluating specificity. Moreover, coherence and analogy were two of the three most infrequently used viewpoints. We are unsure why they were excluded; however, because they were used by fewer than half of reviews and were the only viewpoints that did not overlap with the alternative CAAs, their utility in causal assessment is not clearly supported.
SRs/RoRs applying BH viewpoints varied in how the viewpoints were implemented and in how transparently that implementation was reported. We found that transparent reporting of why viewpoints were implemented in a certain way (or not considered at all) is potentially as important as how viewpoints were implemented. That is, it may be more useful to understand why viewpoints have been excluded than to apply all nine viewpoints. Clarity, such as in defining viewpoints and providing criteria for how they may be met, also increases transparency in reporting BH viewpoints. Where possible, we believe a more comprehensive approach to implementing each viewpoint is preferable. For example, reviews that describe the support for each viewpoint both narratively and quantitatively (e.g., strong/moderate/weak) offer greater transparency about how support for each viewpoint was considered. Transparent reporting of how viewpoints were implemented may clarify inconsistencies in how BH viewpoints were used.
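As an illustration only, the reporting areas this review identifies for BH viewpoints (reasons for excluding viewpoints, viewpoint definition, viewpoint indicators, support for viewpoints, and application of viewpoints) could be recorded in a simple structure during data extraction. The following Python sketch is not a tool used in this review; the names `ViewpointReport`, `REPORTING_AREAS` and `transparency_gaps`, and the example entries, are hypothetical and assume that an excluded viewpoint needs only a stated reason for its exclusion.

```python
from dataclasses import dataclass, field

# Reporting areas named in this review; the identifiers are illustrative.
REPORTING_AREAS = (
    "reasons_for_exclusion",
    "definition",
    "indicators",
    "support",
    "application",
)


@dataclass
class ViewpointReport:
    """What one review reports about one BH viewpoint (hypothetical record)."""
    viewpoint: str
    included: bool
    reported: set = field(default_factory=set)

    def missing_areas(self):
        if not self.included:
            # Assumption: an omitted viewpoint only needs a stated reason.
            required = ("reasons_for_exclusion",)
        else:
            required = tuple(
                a for a in REPORTING_AREAS if a != "reasons_for_exclusion"
            )
        return [a for a in required if a not in self.reported]


def transparency_gaps(reports):
    """Map each viewpoint to the reporting areas it leaves unaddressed."""
    return {
        r.viewpoint: r.missing_areas() for r in reports if r.missing_areas()
    }


# Made-up entries for a single hypothetical review.
example = [
    ViewpointReport(
        "temporality", True,
        {"definition", "indicators", "support", "application"},
    ),
    ViewpointReport("plausibility", True, {"support"}),
    ViewpointReport("analogy", False),
]
gaps = transparency_gaps(example)
```

Here `gaps` lists, for each viewpoint, the areas a reader could not verify from the report; a fully reported viewpoint such as the hypothetical `temporality` entry does not appear in it.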

| CONCLUSION
This methodological review has evaluated how SRs/RoRs that assess causality ('causal reviews') in population health research are conducted and reported. It contributes to the literature aimed at improving causal assessment in SRs, for which there are currently no established guidelines. While our goal was not to propose guidelines, our findings suggest overlap between the alternative CAAs and BH viewpoints, such that alternative CAAs appear to emphasise one or two viewpoints. This indicates that alternative CAAs should be used to inform, and improve, how BH viewpoints are implemented. Moreover, as there are also no guidelines for incorporating BH viewpoints, the most commonly applied CAA, we identified five key areas where reviews can be transparent: reasons for excluding viewpoints; viewpoint definition; viewpoint indicators; support for viewpoints; and application of viewpoints. The more transparent and clear reviews are about how CAAs are implemented, the greater clarity there is likely to be on how CAAs impact different SR stages, which was not always clear. Overall, we found that clarity, transparency and engagement with other CAAs are the key approaches to conducting a causal SR.

AUTHOR CONTRIBUTIONS
MS led (and AP, HT, and SVK supervised) conceptualization, methodology, investigation, analysis, and writing - original draft. AP, HT, and SVK also validated findings and contributed to writing - review & editing.

FUNDING INFORMATION
Michal Shimonovich received funding from the Medical Research Council and the Medical, Veterinary and Life Sciences School at the University of Glasgow for her PhD. The author received no additional financial support for the research, authorship, and/or publication of this article.