Machine learning with the hierarchy‐of‐hypotheses (HoH) approach discovers novel pattern in studies on biological invasions

Research synthesis on simple yet general hypotheses and ideas is challenging in scientific disciplines studying highly context‐dependent systems such as medical, social, and biological sciences. This study shows that machine learning, an equation‐free statistical modeling approach from artificial intelligence, is a promising synthesis tool for discovering novel patterns and the source of controversy in a general hypothesis. We apply a decision tree algorithm, assuming that evidence from various contexts can be adequately integrated in a hierarchically nested structure. As a case study, we analyzed 163 articles that studied a prominent hypothesis in invasion biology, the enemy release hypothesis. Treating the task as a classification problem, we explored whether any of the nine attributes that classify each study can differentiate the studies' conclusions. Results corroborated that machine learning can be useful for research synthesis, as the algorithm could detect patterns that had already been highlighted in previous narrative reviews. Compared with the previous synthesis study that assessed the same evidence collection based on experts' judgement, the algorithm newly suggested that the studies focusing on Asian regions mostly supported the hypothesis, suggesting that more detailed investigations in these regions can enhance our understanding of the hypothesis. We suggest that machine learning algorithms can be a promising synthesis tool especially where studies (a) reformulate a general hypothesis from different perspectives, (b) use different methods or variables, or (c) report insufficient information for conducting meta‐analyses.


| INTRODUCTION
Research synthesis is an essential scientific endeavor that integrates and assesses disparate data, concepts, and/or theories to yield novel insights or explanations 1 by consolidating collected evidence. 2,3 It is expected to contribute to fostering evidence-based research, policy, and practice. 4,5 Hypotheses and ideas that express a simple, yet general, statement are attractive in scientific disciplines studying highly context-dependent systems (e.g., medical, social, political, educational, and biological sciences). In these disciplines, such general hypotheses must be tested repeatedly from various perspectives, under different contexts, and using different methods. Consequently, it is often the case that some studies support the hypothesis, while others reject it. An example is the enemy release hypothesis, one of the most prominent hypotheses to explain biological invasions. [6][7][8][9][10][11][12] The hypothesis posits that the absence of enemies in the exotic range of non-native species determines their invasion success. 13,14 Several synthesis studies suggested that the validity of this hypothesis is dependent on the context, determined by species' identity, ecosystem type, ecological level, and test method. 8,[14][15][16][17] Synthesis is methodologically quite challenging under such a strong context-dependency. Meta-analysis (quantitative synthesis) is sometimes not applicable, particularly when studies addressing the same question use different methods or measure different response variables. 18 Note that we refer to meta-analysis as a statistical method combining the magnitude of the effect sizes from different data sets (following 19,20), although broader definitions also exist. 21
On the other hand, various qualitative synthesis approaches that can deal with context-dependency have recently emerged (eg, thematic synthesis, textual narrative synthesis, and framework synthesis; reviewed in Dixon-Woods et al and Barnett-Page 4,22), but these can retain considerable subjectivity and bias, such as "cherry-picking" of preferable evidence. 19,20,23 While these limitations are recognized and cannot be denied, policy-makers and practitioners need evidence synthesis by any means to comprehend and solve urgent real-world problems that are highly complex and usually lack mechanistic explanations. 4,22 Recognizing the strong necessity to adequately deal with complexity and context-dependency, we consider the hierarchy-of-hypotheses (HoH) approach as an attractive synthesis approach (Figure 1). [13][14][15]24 The HoH approach allows evidence addressing a given general hypothesis in various contexts to be integrated by conceptually mapping the general hypothesis and the associated subhypotheses (ie, less general versions of the overarching hypothesis) and connecting them in a hierarchically nested fashion. [13][14][15] The HoH approach is regarded as a hybrid of quantitative and qualitative synthesis, as it can combine the results of statistical tests using meta-analyses or semi-quantitative methods, while experts decide how to organize the hierarchical structure from the collected evidence. Thereby, for instance, one can readily detect specific contexts where the general hypothesis is more likely to be supported. Introducing the idea of hierarchical structuring enhances research synthesis, as it allows encompassing evidence from various contexts to gain an overview of the empirical base of the general hypothesis.
The HoH approach based on experts' judgement, however, is limited in reproducibility (ie, structure of the hierarchy can depend on expert perspective) and dimensionality (ie, only a few contexts can be considered at the same time because of limited human capacity). To overcome these limitations, we consider the use of machine learning, which is envisaged to play significant roles in research synthesis. 18,25 It is a set of statistical analysis tools in the field of artificial intelligence that find associations among variables automatically in a non-parametric equation-free manner. Machine learning with text mining can screen out articles relevant to the topic of interest from a large literature database automatically and instantaneously (eg, Cheng et al, Marshall et al, Zhaohan et al, and Przybyla et al [25][26][27][28]). Moreover, the equation-free modeling approach can discover unexpected patterns from collected data. [29][30][31] For instance, in material science, machine learning discovered novel hypotheses on the crystallization of vanadium selenites from approximately 4000 single independent tests. 32 Yet, the potential of machine learning in research synthesis remains largely unexplored.
In this study, we show that machine learning is a promising tool in research synthesis for integrating collected evidence to discover novel patterns and to find the source of controversy in a general hypothesis. We apply a decision tree algorithm 33,34 for synthesizing evidence on the enemy release hypothesis as a case study. This method solves classification problems by splitting data into groups hierarchically and hence applies the same principle as the HoH approach. The algorithm selects the explanatory variables that explain the context-dependency of the general hypothesis. We then evaluate whether these variables were expected or surprising from an expert perspective by comparing the model structure with the original HoH built based on expert judgment.

| MATERIALS AND METHODS
Two steps are taken for building an HoH. The first step is a literature review that includes the decisions on which studies to collect and how to categorize each study. The second step is building the hierarchy along with the decision on which of the information to use. For the second step, we will show two approaches: a first based on expert judgement and a second based on machine learning. Note that the first step and the expert judgement approach of the second step were conducted in the previous studies by some of the authors, Heger and Jeschke. 14,15 The data shown here are the ones assessed by them and presented in the book chapter. 15

Step 1: Literature review and data collection

We summarize here how the database for analyzing evidence for the enemy release hypothesis was prepared (see Heger & Jeschke 14,15 for a more elaborate description). Heger and Jeschke 14,15 searched the Web of Science to identify studies addressing the enemy release hypothesis in its broad sense, ie, "the absence of enemies is a cause of invasion success." Manual screening and assessing the eligibility resulted in 163 articles, which accounted for 248 subhypotheses (n = 248).
The authors categorized each subhypothesis based on the following aspects: hypothesis-formulation (ie, how to make the overarching hypothesis less general and better testable by assigning a particular definition on each term in the general hypothesis); context-dependence (ie, additional factors that may influence the conclusion); and test-design (ie, detailed information on the designs of hypothesis testing).

Hypothesis-formulation:
1. indicator to assess enemy release (damage the alien species received by enemies/infestation of the alien species/performance of the alien species);
2. type of comparison (alien and native species/alien species in their native range and in their invaded range/invasive and non-invasive alien species/alien species' performances with and without enemies);
3. type of enemies (specialist/generalist);
Context-dependence:
4. geographic continent (eg, North America and Asia);
5. studied habitat (terrestrial/freshwater/marine);
Test-design:
6. the number of focal species;
7. the number of replicates;
8. study method (observation [correlation]/experiment); and
9. degree of realism (lab/enclosure/field).
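As an illustration of this coding scheme, one subhypothesis can be represented as a record with nine predictor fields plus its conclusion. The sketch below is hypothetical: the field names, the category spellings, and the example values are our own choices for illustration and are not taken from the authors' data set, which may encode these attributes differently (for example, with missing values or several continents per study).

```python
from dataclasses import dataclass, fields

# The three conclusion classes assigned to each subhypothesis
CONCLUSIONS = ("supported", "questioned", "inconsistent")


@dataclass(frozen=True)
class Subhypothesis:
    """One coded subhypothesis (field names are illustrative assumptions)."""
    # Hypothesis-formulation, categories (1) to (3)
    indicator: str      # "damage", "infestation", or "performance"
    comparison: str     # e.g. "alien_vs_native", "native_vs_invaded_range"
    enemy_type: str     # "specialist" or "generalist"
    # Context-dependence, categories (4) and (5)
    continent: str      # e.g. "North America", "Asia"
    habitat: str        # "terrestrial", "freshwater", or "marine"
    # Test-design, categories (6) to (9)
    n_species: int      # number of focal species
    n_replicates: int   # number of replicates
    method: str         # "observation" or "experiment"
    realism: str        # "lab", "enclosure", or "field"
    # Outcome to be explained by the nine predictors
    conclusion: str


# A made-up record, only to show the shape of the data
example = Subhypothesis(
    indicator="damage", comparison="native_vs_invaded_range",
    enemy_type="generalist", continent="Asia", habitat="terrestrial",
    n_species=1, n_replicates=12, method="observation", realism="field",
    conclusion="supported",
)

# Every field except the conclusion is a candidate predictor
PREDICTORS = [f.name for f in fields(Subhypothesis) if f.name != "conclusion"]
print(PREDICTORS)
```

Keeping the conclusion in the same record as the predictors makes it straightforward to pass the whole collection to either an expert-defined grouping or a classification algorithm.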
These choices were made based on knowledge gained from a first screening of the articles concerning which research approaches have been used (1 to 3) and based on general knowledge about typical biases in ecology (4 to 9).
Then, the authors classified the conclusion of each subhypothesis based on the results and discussion into three categories: supported, questioned, or inconsistent. One may imagine the conclusion for each subhypothesis must be either supported or not (ie, questioned), based on statistical significance with frequentist statistics. The class "inconsistent" was assigned to a study that had tested a hypothesis using multiple approaches and obtained inconsistent indications: eg, through multiple experiments, statistical models, response variables, and geographic locations. For example, the class inconsistent is assigned to a study where a hypothesis was supported based on a species richness measure but questioned based on an abundance measure, or a hypothesis was supported for two out of four geographic regions. Note that in the previous studies, 14,15 the class inconsistent was termed "undecided."

FIGURE 1 The hierarchy-of-hypotheses (HoH) approach as an evidence synthesis method. This approach is useful where a general hypothesis was tested repeatedly, and a single conclusion cannot be (or should not be) drawn from the collection of evidence because conclusions may be context-dependent. First, some common attributes that differentiate the collection of evidence into some groups need to be found (step 1). Second, the collection is split into some groups in a hierarchically structured manner (step 2). Step 2 becomes difficult when many attributes are considered (ie, multidimensionality), and what the structure should look like may depend on the researcher's perspective (cf. reproducibility)

Step 2a: Structuring the HoH with expert judgement (expert-led HoH)

In Heger and Jeschke, 14,15 the authors structured an HoH using the collected evidence based on expert judgement (see there for a more elaborate description). Taking into account all categories at once was nearly impossible with human cognition, because how to structure an HoH becomes too flexible (eg, which variable should come first). Moreover, the collection of evidence would be split into too many subgroups if all categories were included. Therefore, the authors needed to decide which of the criteria should be used to build the hierarchy. After some trials, the authors agreed that up to three levels of subdivision can be understood readily and offer enough detailed contexts.
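To give a rough sense of this flexibility, the short sketch below counts how many ways one attribute can be assigned to each level of a hierarchy when nine attributes are available and at most three levels are used. It rests on a simplifying assumption of ours, namely that every branch at a given level is split by the same attribute; hierarchies that use different attributes in different branches would be even more numerous.

```python
import math

N_ATTRIBUTES = 9  # categories (1) to (9) recorded in Step 1
MAX_LEVELS = 3    # depth the authors found readily understandable

# Ordered selections of attributes, one per level (assumption: the same
# attribute is used for all branches within a level)
orderings = {
    depth: math.perm(N_ATTRIBUTES, depth)
    for depth in range(1, MAX_LEVELS + 1)
}
total = sum(orderings.values())

for depth, count in orderings.items():
    print(f"{depth} level(s): {count} possible attribute orderings")
print(f"up to {MAX_LEVELS} levels: {total}")
```

Even under this restrictive assumption there are several hundred candidate structures, which is consistent with the need for the experts to narrow down the criteria before building the hierarchy.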
The authors also agreed that it is of major interest whether biological attributes may explain the context dependency. Therefore, the categories of hypothesis-formulation, (1) to (3), were preferred over the others for building the hierarchical structure. Whether the level of support differs according to context and test design was less attractive to investigate. For each node, the numbers of studies that supported, questioned, or were inconsistent about the hypothesis were shown (see Figure 2). Note that decision criteria can depend on a researcher's perspective, as with any synthesis approach.
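Once the criteria and their order are fixed, the counts shown at each node follow mechanically from the coded records. The sketch below is a minimal illustration of that tallying step, assuming each record is a dictionary with a "conclusion" key; the function name, the key names, and the four toy records are hypothetical and do not reproduce the authors' data or script.

```python
from collections import Counter, defaultdict

# Order used for reporting counts: supported/inconsistent/questioned
CLASSES = ("supported", "inconsistent", "questioned")


def tally_by_path(records, criteria):
    """Count conclusions at every node obtained by nesting `criteria` in order."""
    nodes = defaultdict(Counter)
    for rec in records:
        path = ()
        nodes[path][rec["conclusion"]] += 1  # root node: all evidence
        for key in criteria:
            path = path + ((key, rec[key]),)  # descend one level
            nodes[path][rec["conclusion"]] += 1
    return nodes


# Toy records, invented for illustration only
records = [
    {"indicator": "damage", "comparison": "alien_vs_native", "conclusion": "supported"},
    {"indicator": "damage", "comparison": "alien_vs_native", "conclusion": "questioned"},
    {"indicator": "damage", "comparison": "native_vs_invaded_range", "conclusion": "supported"},
    {"indicator": "performance", "comparison": "alien_vs_native", "conclusion": "inconsistent"},
]

nodes = tally_by_path(records, ["indicator", "comparison"])
for path, counts in nodes.items():
    label = " > ".join(value for _, value in path) or "all evidence"
    print(label, tuple(counts[c] for c in CLASSES))
```

Changing the order of the list passed as `criteria` yields a different hierarchy from the same records, which is the point at which expert judgement enters.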
Step 2b: Structuring the HoH with machine learning (machine-led HoH)

For this study, we structured another HoH with a machine learning algorithm, the conditional inference tree. 34,35 We used the entire set of Heger and Jeschke's predictors, 14,15 categories (1) to (9). The conditional inference tree is a decision tree algorithm suitable for detecting nonlinear, nonadditive patterns in a hierarchically structured manner. 33 The algorithm partitions the samples into two subsamples by searching for a predictor variable and its threshold which differentiate the variability in the two subsamples the most. Once the sample is split, the algorithm attempts to further split each of the resulting subsamples. This trial is recursively done for each subsample until the algorithm cannot find a statistically significant split with any predictor and its threshold. [33][34][35] When a predictor contains more than two categories, the split is done by finding the two groups of the categories that maximize among-group variability.
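The recursive logic described above can be sketched in a few dozen lines. The code below is a simplified stand-in, not the conditional inference tree implementation used in this study: it handles categorical predictors only, replaces the package's test statistics with a Monte Carlo permutation test on a Pearson chi-square statistic, and applies no multiplicity adjustment across predictors. All function names and the toy data are hypothetical.

```python
import random
from collections import Counter
from itertools import combinations


def chi2(groups, labels):
    """Pearson chi-square statistic for the groups x labels contingency table."""
    n = len(labels)
    g_tot, l_tot = Counter(groups), Counter(labels)
    cell = Counter(zip(groups, labels))
    return sum(
        (cell[g, l] - g_tot[g] * l_tot[l] / n) ** 2 / (g_tot[g] * l_tot[l] / n)
        for g in g_tot for l in l_tot
    )


def perm_p(groups, labels, rng, n_perm=499):
    """Permutation p-value for the association between groups and labels."""
    observed = chi2(groups, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        hits += chi2(groups, shuffled) >= observed
    return (hits + 1) / (n_perm + 1)


def best_binary_split(values, labels):
    """Two-group partition of the categories that maximizes the chi-square."""
    cats = sorted(set(values))
    best = None
    for size in range(1, len(cats) // 2 + 1):
        for left in combinations(cats, size):
            stat = chi2([v in left for v in values], labels)
            if best is None or stat > best[0]:
                best = (stat, set(left))
    return best[1]


def grow(rows, labels, predictors, rng, alpha=0.05):
    """Split recursively until no predictor is significantly associated."""
    node = {"n": len(labels), "counts": dict(Counter(labels))}
    if len(set(labels)) < 2:
        return node  # pure node: nothing left to explain
    tests = []
    for name in predictors:
        values = [row[name] for row in rows]
        if len(set(values)) > 1:
            tests.append((perm_p(values, labels, rng), name))
    if not tests:
        return node
    p_value, chosen = min(tests)  # strongest association first
    if p_value >= alpha:
        return node  # stopping rule: no significant split
    left = best_binary_split([row[chosen] for row in rows], labels)
    idx_left = [i for i, row in enumerate(rows) if row[chosen] in left]
    idx_right = [i for i, row in enumerate(rows) if row[chosen] not in left]

    def subtree(idx):
        return grow([rows[i] for i in idx], [labels[i] for i in idx],
                    predictors, rng, alpha)

    node.update(split=chosen, left_categories=sorted(left),
                left=subtree(idx_left), right=subtree(idx_right))
    return node


# Toy data (invented): conclusions depend on "comparison" but not on "habitat"
rng = random.Random(0)
rows = [{"comparison": "range" if i < 20 else "community",
         "habitat": "terrestrial" if i % 2 else "freshwater"} for i in range(40)]
labels = ["supported" if i < 20 else "questioned" for i in range(40)]
tree = grow(rows, labels, ["comparison", "habitat"], rng)
print(tree["split"], tree["left_categories"])
```

In this sketch variable selection (which predictor is most strongly associated with the conclusions) is kept separate from the search for the best grouping of its categories, and the tree stops growing when no association is significant at the chosen level.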
The significance level was set to 0.05. In this study, the algorithm applied Kruskal-Wallis tests for numeric predictors and Cochran-Mantel-Haenszel tests for categorical predictors. 35 We used the "Partykit" package (version 1.

| RESULTS
Of 248 subhypotheses, there were about as many studies "supporting" as "questioning" the enemy release hypothesis (n = 105 and 101, respectively), with the remaining studies being inconsistent (n = 42). Both expert- and machine-led HoHs showed different levels of support for each cluster of subhypotheses with a particular importance of hypothesis-formulation, or more precisely, the type of comparison (2nd level split in Figure 2 and splits of nodes 1 and 4 in Figure 3). Among the types of comparison, only the studies that compared aliens in their native range and in invaded ranges tended to support the hypothesis (node 7 in Figure 3), but the others were more likely to question it (ie, invasive vs non-invasive aliens [node 2], aliens with vs without enemies [node 2], and aliens vs natives [node 6]).

FIGURE 2 Hierarchy of hypotheses for the enemy release hypothesis, built with the method described in Heger and Jeschke 14 and updated with data from Heger & Jeschke. 15 The hierarchical structure classifies the collected evidence based on three chosen criteria. The relative lengths of color bars and the numbers bracketed (supported/inconsistent/questioned, respectively) in each box indicate the relative proportion of conclusions [%] within the given context

The two other criteria thought to be important in the expert-led HoH (ie, indicator for enemy release and type of enemies) were not selected in the machine-led HoH. Instead, geographic region (whether including Asia or not, node 2) and type of study design (either experiment or observation, node 3) appeared as significant factors. More than 80% of the studies related to Asia supported the hypothesis (node 9). Observational studies tended to support the hypothesis (node 8), whereas experimental studies tended to question it (node 4).

| DISCUSSION
We demonstrated the ability of a machine learning algorithm with a hierarchical concept to perform synthesis of context-rich evidence. In this approach, the first step is to conduct a literature review that categorizes each study in a structured manner, and the second step is to apply a decision tree algorithm for its quantitative analysis. This synthesis approach can be useful where (a) studies reformulate a general hypothesis from different perspectives, (b) studies use diverse methods or variables, 18 or (c) many of them report insufficient information for conducting meta-analysis. 38,39 A machine-led HoH can be built instantaneously, in a reproducible manner, and it can detect unexpected patterns. Yet, this has some caveats: it sacrifices the flexible imagination of experts in structuring an HoH. Given the strict quantitative rule, it may just find patterns that are out of scope. In addition, it does not consider the relative importance of each study by default (but weighting would be possible). A single study that conducted a test very rigorously can be clustered with less rigorous ones. Therefore, the expert-led HoH approach (or another qualitative synthesis tool) should be applied to complement the machine-led HoH approach. We believe that experts' intuition and creativity are important for directing the structure of the theoretical body, and for deciding what should be further investigated.
The most unexpected discovery was that more than 80% of the studies focused on Asia supported the enemy release hypothesis (node 9 in Figure 3). A study focus on Asia means that these empirical tests examined either (a) species native to Asian regions introduced to non-Asian regions or (b) species non-native to Asian regions introduced to Asian regions. We investigated whether either of the two features can explain the reason for the high support, but both features were evenly studied (13 case studies for each) and equally supportive. Therefore, the direction of invasion pathway, 40 either species arriving from or to Asian regions, could not explain this pattern. We note that this finding does not necessarily imply any causal relationships and we cannot discard the possibility of a Type I error on this finding (ie, there is no true relationship, but it suggests statistical significance solely by chance). However, Asian regions have been largely understudied in invasion biology, 41 and therefore, "Why Asia?" remains an important open question in invasion biology. Such novel pattern discovery using machine learning for generating a new question is useful in fields where theory and mechanistic understanding are still immature and context-rich systems are studied. [29][30][31]42,43

FIGURE 3 Hierarchy of hypotheses for the enemy release hypothesis, built with the conditional inference tree machine learning algorithm. This analysis included the categories of hypothesis-formulation, context-dependency, and test-design (nine predictors). Categories for split were automatically selected based on importance, and all splits are statistically significant (α = .05)

The structure of the machine-led HoH differed from that of the expert-led HoH, but this does not mean that machine learning algorithms suggest results totally unexpected by experts. Indeed, the results of the machine-led HoH are in harmony with previous narrative syntheses about the enemy release hypothesis: (a) whether an alien species is invasive or not does not generally depend on whether it is released from its enemies or not (node 5 in Figure 3) 16 ; (b) experimental evidence is more equivocal than observational (correlational) evidence (nodes 4 and 8) 8 ; and (c) community studies (node 6) are less supportive than biogeographical studies (nodes 7 and 9). 16 Our analysis supports these findings quantitatively and statistically, and these agreements corroborate the usefulness of machine learning for supporting expert judgement in research synthesis.
The particular benefits of employing machine learning over relying solely on expert judgment are the instantaneous speed for building an HoH, reproducibility ensured by the quantitative decision rules, and the ability to analyze far more factors than a human expert. On the other hand, the expert-led HoH is necessary if the literature analysis aims to answer specific questions. For example, the expert-led HoH in our case revealed that species in many cases seem to be released from their enemies (less damage and less infestation), but that this release is not necessarily connected to a better performance of the species (low support for the respective hypothesis, see Figure 2). Moreover, the expert-led HoH helps to discover gaps of knowledge. The expert-led HoH (Figure 2), for example, can emphasize the necessity of more studies involving (a) experimental tests comparing aliens' statuses with vs without enemies and (b) specialist enemies. The first one is necessary to strictly test the enemy release hypothesis positing the absence of enemies as a cause of invasion success. The second one is also important, because given basic ecological knowledge, specialist enemies should play a key role in enemy release. 8 Thus, both machine- and expert-led HoHs can make valuable contributions to research synthesis: the former by identifying groups of cases in which a major hypothesis tends to be supported, and the latter by answering questions derived from theoretical considerations and by identifying gaps of knowledge.

ACKNOWLEDGMENTS
The basic idea for this study was developed during the workshop "The hierarchy-of-hypotheses approach: Prospects and implementation in ecology and beyond." We thank the participants of this workshop, especially Thierry Hanser, for these initial discussions. The workshop was funded by the Volkswagen Foundation (Az 92807) and additional support was received from the German Federal Ministry of Education and Research (Bundesministerium für Bildung und Forschung; BMBF) within the Collaborative Project "Bridging in Biodiversity Science (BIBS)" (01LC1501A), the Deutsche Forschungsgemeinschaft (grants JE 288/9-1, JE 288/9-2), and the Grant-in-Aid for Japan Society for the Promotion of Science (JSPS) Overseas Research Fellowships. We also thank A. Lehmann and anonymous reviewers for offering constructive insights on the manuscript.

CONFLICT OF INTEREST
The authors reported no conflict of interest.

DATA AVAILABILITY STATEMENT
The R script with detailed descriptions and the data are available on GitHub (https://github.com/masahiroryo/Machine-Learning-and-Hierarchy-of-Hypotheses).