CINeMA: Software for semiautomated assessment of the confidence in the results of network meta‐analysis

Abstract Network meta-analysis (NMA) compares several interventions that are linked in a network of comparative studies and estimates the relative treatment effects between all treatments, using both direct and indirect evidence. NMA is increasingly used for decision making in health care; however, a user-friendly system to evaluate the confidence that can be placed in the results of NMA is currently lacking. This paper is a tutorial describing the Confidence In Network Meta-Analysis (CINeMA) web application, which is based on the framework developed by Salanti et al. (2014, PLOS One, 9, e99682) and refined by Nikolakopoulou et al. (2019, bioRxiv). Six domains that affect the level of confidence in the NMA results are considered: (a) within-study bias, (b) reporting bias, (c) indirectness, (d) imprecision, (e) heterogeneity, and (f) incoherence. CINeMA is freely available and open source, and no login is required. In the configuration step, users upload their data, produce network plots, and define the analysis and effect measure. The dataset should include assessments of study-level risk of bias and judgments on indirectness. CINeMA calls the netmeta routine in R to estimate relative effects and heterogeneity. Users are then guided through a systematic evaluation of the six domains. In this way, reviewers assess the level of concern for each relative treatment effect from NMA as giving rise to "no concerns," "some concerns," or "major concerns" in each of the six domains; these judgments are graphically summarized on the report page for all effect estimates. Finally, judgments across the domains are summarized into a single confidence rating ("high," "moderate," "low," or "very low"). In conclusion, the user-friendly web-based CINeMA platform provides a transparent framework to evaluate evidence from systematic reviews with multiple interventions.

CINeMA considers six domains that affect the level of confidence in the NMA results: (a) within-study bias, (b) reporting bias, (c) indirectness, (d) imprecision, (e) heterogeneity, and (f) incoherence. Reviewers assess the level of concern for each relative treatment effect from NMA as giving rise to "no concerns," "some concerns," or "major concerns" in each of the six domains. Judgments across the domains are then summarized into a single confidence rating ("high," "moderate," "low," or "very low").
The six domains cover considerations pertaining to all stages of the systematic review, including literature search, data extraction, and statistical analysis. The within-study bias domain refers to limitations in the individual studies that may lead to biased estimates of relative treatment effects. Reporting bias results from the inclusion in the systematic review of a nonrepresentative set of the eligible studies, which may occur, for example, because of an incomplete literature search.
Indirectness refers to the relevance of the included studies to the research question, which includes the definition of the population, interventions, and outcomes of interest. A core assumption in NMA is that of transitivity: that there is an underlying true relative treatment effect which applies to all studies regardless of the treatments being compared. Assessing transitivity is challenging and is usually done by exploring the distribution of effect modifiers across comparisons. CINeMA's approach to indirectness also addresses the assumption of transitivity by indicating which comparisons may suffer from different definitions of the setting of interest. If transitivity holds, then consistency, which refers to the agreement between direct and indirect estimates of the treatment effects, also holds; this can be assessed under the incoherence domain in CINeMA. Finally, the imprecision and heterogeneity domains refer to the certainty with which each effect is estimated and to the variability in the results of the studies contributing to each comparison, respectively.
The CINeMA framework has been implemented in a user-friendly web application (see https://cinema.ispm.unibe.ch/; CINeMA, 2017). From a technical point of view, CINeMA is a single-page application which communicates with an R back-end server; in particular, the packages meta and netmeta are used (Rücker, Schwarzer, Krahn, & König, 2016; Schwarzer, 2019). It is developed as a custom functional reactive framework written in JavaScript and PureScript. CINeMA does not permanently store the data or any other information related to the uploaded projects; temporary storage takes place only for the sake of the calculations or network efficiency. The source code of CINeMA is openly available (Papakonstantinou).
The methodology described by Nikolakopoulou et al. (2019) has been implemented in CINeMA using "rules" that can automate the derivation of domain-specific judgments. Three rules can be used to summarize the risk of within-study bias and of indirectness for each relative effect estimate and produce automated judgments. Two levels of judgment for reporting bias are suggested, based on completeness of the literature search, empirical studies, and statistical analyses. The rules for judging imprecision and heterogeneity are based on whether the confidence interval or prediction interval includes the line of no effect and prespecified clinically important treatment effects. The use of rules is optional, and their outputs can be partially or fully overridden. However, the semiautomated process helps researchers to form judgments. Early applications of CINeMA have appeared in the literature (Cipriani et al., 2018; Schwingshackl et al., 2018).
Here we provide a tutorial describing the functionality of CINeMA. We explain how the software works, the data formats and requirements, the default options implemented in the rules, and their rationale. We illustrate the software's use with the example of an NMA that compared the incidence of diabetes in patients taking antihypertensive drugs or placebo. The network included 22 randomized trials which evaluated the differences between angiotensin-converting-enzyme inhibitors (ACE), angiotensin-receptor blockers (ARB), calcium-channel blockers (CCB), beta blockers, diuretics, and placebo. The NMA found that the risk of diabetes was lower with ARB and higher with diuretics than with placebo (Elliott and Meyer, 2007). In this example, data on study-level indirectness only serve to illustrate how the indirectness domain is assessed and do not reflect the relevance of each study to the research question.

| UPLOADING DATA: MY PROJECTS
Under "My Projects," users upload a .csv file with the study outcome data for their project. The dataset should also include data on study-level risk of bias (RoB) and judgments on indirectness. Study-level RoB would normally summarize considerations on selection, performance, attrition, detection, and reporting bias. Outcome data can be provided at the arm level, in either long or wide format (Table 1). It might be that summary data are not available for each intervention group in each study. In this situation data can be imported in "inverse variance" format, where a comparison-specific estimate of the relative treatment effect (assumed to follow a normal distribution) and its standard error are reported (e.g., log odds ratios, standardized mean differences, etc.; see Table 2). When the "inverse variance" format is used, CINeMA will prompt the user to define whether the outcome is binary or continuous. Table 2 can be used as a guide on how outcome data in "inverse variance" format can be used as input to CINeMA.
Users should choose one of the five data formats in Tables 1 and 2. Variable names can be as in Table 1 or 2 (in which case CINeMA will automatically recognize which column refers to which variable), but custom field names are also allowed (e.g., "events" instead of "r" for the number of events in Table 1a). If custom field names are used, CINeMA will prompt the user to specify which column represents which field after the dataset is uploaded. Once this step is complete (or directly after uploading the data, if variable names are exactly as in Table 1 or 2), information on the file format (long, wide), outcome type (binary, continuous), number of studies, number of interventions, and number of comparisons with direct evidence appears. The project can also be renamed under "Rename." Users can then click on "Proceed" to go to "Configuration."
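As a concrete illustration, a minimal long-format binary-outcome dataset can be assembled and inspected as follows. This is an illustrative sketch, not CINeMA code: the study names and numbers are invented, and only the column names follow the defaults in Table 1.

```python
import csv
import io

# One row per study arm, using the default column names from Table 1.
# Studies, treatments, and counts here are made up purely for illustration.
rows = [
    {"id": "Study1", "t": "Placebo",  "r": 10, "n": 100, "rob": "L", "indirectness": "L"},
    {"id": "Study1", "t": "ARB",      "r": 6,  "n": 100, "rob": "L", "indirectness": "M"},
    {"id": "Study2", "t": "Placebo",  "r": 12, "n": 150, "rob": "M", "indirectness": "L"},
    {"id": "Study2", "t": "Diuretic", "r": 20, "n": 150, "rob": "M", "indirectness": "L"},
]

buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["id", "t", "r", "n", "rob", "indirectness"])
writer.writeheader()
writer.writerows(rows)
csv_text = buf.getvalue()  # this text would be saved as a .csv file and uploaded

# Reading it back, we can derive the kind of summary CINeMA reports after upload:
records = list(csv.DictReader(io.StringIO(csv_text)))
n_studies = len({row["id"] for row in records})
n_treatments = len({row["t"] for row in records})
print(n_studies, n_treatments)  # 2 studies, 3 interventions
```

Because the header uses the default names, CINeMA would recognize the columns automatically; with custom names it would instead prompt for a column mapping.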

| Worked example
Uploading the network of antihypertensive drugs, CINeMA recognizes the file format (long) and outcome type (binary) and provides a summary of the dataset: it includes 22 studies, 6 interventions, and 14 comparisons with direct data.

| SETTING-UP THE NMA: CONFIGURATION
The "Configuration" tab is activated once the dataset has been uploaded and variable names have been successfully defined. In this tab the user needs to define the NMA analysis and is presented with a network plot. This page also allows users to evaluate only a subset of all possible intervention comparisons.

| Network plot
The network plot corresponding to the uploaded dataset is automatically drawn with equally sized nodes and edges. Users can choose to weight nodes and/or edges according to the sample size or the number of studies (under "Node size by" and "Edge width by").
Note (Table 1): Data should be imported as a .csv file. The displayed column names are the default expected names; if other names are provided CINeMA will return a query. id specifies the study; t specifies the treatment (numeric or string); r is the number of events; n is the sample size; t1 and t2 specify the treatment codes (numeric or string); r1 and r2 are the numbers of events in treatments t1 and t2; n1 and n2 are the sample sizes in treatments t1 and t2, respectively; y is the mean; sd is the standard deviation; y1 and y2 are the means in treatments t1 and t2; sd1 and sd2 are the standard deviations in treatments t1 and t2; rob specifies risk of bias and indirectness specifies the level of indirectness; rob and indirectness can take values 1, 2, and 3 or L, M, and H for low, moderate, and high risk of bias or level of indirectness.
Note (Table 2): Data should be imported as a .csv file. id specifies the study; t1 and t2 specify the treatment codes (numeric or string); effect is the effect estimate of t1 versus t2, which can be a log odds ratio, log risk ratio, log hazard ratio, mean difference, or standardized mean difference; se is the standard error of the effect estimate; rob specifies risk of bias; and indirectness specifies the level of indirectness; rob and indirectness can take values 1, 2, and 3 or L, M, and H for low, moderate, and high risk of bias or level of indirectness.
Nodes can either be all blue or colored according to the proportion of studies with low (green), moderate (yellow), and high (red) RoB or indirectness (under "Node color by"). The "Edge color by" dropdown menu allows coloring edges according to the most prevalent bias level within each comparison ("Majority RoB"), the average RoB of the included studies ("Average RoB"), or the maximum bias level within each comparison ("Highest RoB"); the respective categories for indirectness are also available ("Majority Indirectness," "Average Indirectness," "Highest Indirectness"). Different representations may be chosen according to users' interests: for example, "Highest RoB" or "Highest Indirectness" could be chosen when users are interested in viewing the worst pieces of evidence feeding into each comparison. The plot can be downloaded via the "Save Plot" button. The outcome data appear next to the network plot. By clicking on a specific edge or node, the respective outcome data corresponding to that edge or node appear in the data table.

| Define your analysis
Here users are asked to choose whether to perform a fixed effect or a random effects NMA (under "Analysis model") and to define effect measure type (under "Effect measure"). For binary outcomes, the options "Odds Ratio," "Risk Ratio," and "Risk Difference" will appear, and for continuous outcomes the options "Mean Difference" and "Standardized Mean Difference."

| Select intervention comparisons for evaluation
An NMA that compares several interventions produces estimates for all possible relative effects. However, it can be the case that not all of them are of interest (e.g., comparisons between placebo and older drugs that are no longer used). CINeMA offers the option to select which intervention comparisons are to be evaluated. Users should first select the interventions of interest and then specify whether they want to evaluate all the comparisons that contain these interventions ("Containing any of the above interventions") or only the comparisons that are formed between the selected interventions ("Between the above interventions"). For example, in a network with four interventions A, B, C, and D selecting A and B with the "Containing any of the above interventions" option will result in evaluation of comparisons AB, AC, AD, BC, and BD (all possible comparisons except CD). Selecting A and B with "Between the above interventions" option will result in evaluating a single comparison (AB). A list of the comparisons to be evaluated then appears. Note that the analysis is performed using all studies irrespective of whether a subset or all comparisons are evaluated.
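The two selection modes described above can be expressed compactly. The sketch below is an illustrative re-implementation of the selection logic as described in the text, not CINeMA's own code:

```python
from itertools import combinations

def comparisons_to_evaluate(all_treatments, selected, mode):
    """Return the comparisons CINeMA would list for evaluation.

    mode = "containing": all comparisons that include at least one
           selected intervention ("Containing any of the above interventions");
    mode = "between": only comparisons formed between selected interventions
           ("Between the above interventions").
    """
    pairs = [frozenset(p) for p in combinations(sorted(all_treatments), 2)]
    if mode == "containing":
        keep = [p for p in pairs if p & set(selected)]
    elif mode == "between":
        keep = [p for p in pairs if p <= set(selected)]
    else:
        raise ValueError(mode)
    return sorted("".join(sorted(p)) for p in keep)

trts = ["A", "B", "C", "D"]
print(comparisons_to_evaluate(trts, ["A", "B"], "containing"))
# ['AB', 'AC', 'AD', 'BC', 'BD'] — every pair except CD
print(comparisons_to_evaluate(trts, ["A", "B"], "between"))
# ['AB']
```

Note that when all interventions are selected, the two modes coincide, which is why the worked example below makes no distinction between them.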
After defining the comparisons for evaluation, the "Set up your evaluation" button appears. Clicking on this performs two actions.
First, it calls netmeta in R to estimate all relative effects from the network and a common heterogeneity parameter. The relative effects are shown in the league table, which can be downloaded and saved as a .csv file ("Download league table"). Second, it calls an R function that calculates the contribution matrix (Papakonstantinou).
The contribution matrix shows the percentage contribution of information from each study and each direct comparison (shown in columns) to the estimation of each relative effect (shown in rows). It is calculated using the flow decomposition method described by Papakonstantinou et al. (2018) and is used later in the evaluation of within-study bias and indirectness. Users can download the output in .csv format using "Download per study contribution matrix" or "Download per comparison contribution matrix." During evaluation, the user can abort computations by pressing the "Cancel" button. Once calculations are done, the "Reset your evaluation" button deletes all previous choices and computations.
"Proceed" saves the analysis (CINeMA will remember the choices made so far if the page is refreshed or closed and revisited) and takes users to the "Within-study bias" domain.

| Worked example
The results of selecting different options for weighting the network plot are shown in Figure 1. In the "Define your analysis" section, we select a "Random effects" model and "Odds Ratio" as the effect measure. We select all interventions to be evaluated; note that in this case there is no difference between choosing "Containing any of the above interventions" and choosing "Between the above interventions." The estimated relative effects are shown in Table 3.

| Within-study bias

At the top of the page, a summary shows how many studies have been characterized as of low, moderate, and high RoB. Subsequently, a bar graph shows the percentage contribution of studies at each RoB level to each NMA estimate. Users can select among three rules to summarize RoB for each relative effect estimate: "Majority RoB," "Average RoB," and "Highest RoB." Choosing "Majority RoB" will lead to a level of concern according to the RoB with the greatest total percentage contribution (the largest block among green, yellow, and red in each bar). "Highest RoB" will assign a level of concern determined by the highest RoB in each bar. Summarizing RoB assessments using "Average RoB" uses a weighted average score for each relative effect estimate according to the percentage contribution of studies at each bias level. For example, if the contributions from low (arbitrarily assigned a score of 1), moderate (score 2), and high (score 3) RoB studies are 40%, 25%, and 35% respectively, the total RoB score will be 0.40 × 1 + 0.25 × 2 + 0.35 × 3 = 1.95, which rounds to 2 and leads to "Some concerns." In this example the judgment for "Majority RoB" would be "No concerns" and for "Highest RoB" would be "Major concerns." After selecting a rule, the boxes under the dropdown menu, which correspond to each estimate of relative effects, are colored according to the level of concern, and judgments under each of the three rules are also given in the boxes. Judgments can be changed manually, independently of the applied selection rule; if a judgment is manually changed, the corresponding box is colored gray. "Reset" (the chosen rule) and "Proceed" (to "Reporting bias") buttons also appear.

| Worked example

Figure 2 shows the bar chart for the worked example. Studies at low RoB contribute 53% to the estimation of ACE versus beta blockers, 43% of the contribution comes from studies at moderate RoB, and studies at high RoB contribute the remaining 4%. These RoB contributions resolve into "No concerns," "Some concerns," and "Major concerns" using the "Majority RoB," "Average RoB," and "Highest RoB" rules, respectively. Figure 3 shows the boxes that appear in the software showing the judgments for all relative effects.
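The "Average RoB" arithmetic described above can be sketched in a few lines. This is an illustrative re-implementation of the rule as stated in the text, not CINeMA's own code:

```python
def average_rob_judgment(contributions):
    """Weighted-average RoB rule.

    contributions: dict mapping RoB level (1 = low, 2 = moderate, 3 = high)
    to the percentage contribution of studies at that level.
    Returns the weighted score and the corresponding level of concern.
    """
    score = sum(level * pct / 100 for level, pct in contributions.items())
    concern = {1: "No concerns", 2: "Some concerns", 3: "Major concerns"}[round(score)]
    return score, concern

# The example from the text: 40% low, 25% moderate, 35% high RoB.
score, concern = average_rob_judgment({1: 40, 2: 25, 3: 35})
print(score, concern)  # score ≈ 1.95, which rounds to 2 → "Some concerns"

# The worked example (ACE vs beta blockers): 53% low, 43% moderate, 4% high.
print(average_rob_judgment({1: 53, 2: 43, 3: 4})[1])  # "Some concerns"
```

The second call reproduces the "Average RoB" judgment reported for ACE versus beta blockers in the worked example.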

| Reporting bias
The "Reporting bias" domain refers to biases that can occur due to publication bias, time-lag bias, selective nonreporting bias, or any other bias that renders the included studies a nonrepresentative sample of the studies undertaken (Dickersin and Chalmers, 2011; Stern and Simes, 1997). Two levels of judgment for reporting bias are suggested: "suspected" and "undetected." Completeness of the search, considerations related to the particular field (based on existing evidence and empirical studies), and statistical methods undertaken should inform the assessment of reporting bias for each relative treatment effect (Chaimani and Salanti, 2012; Mavridis, Sutton, Cipriani, & Salanti, 2013; Mavridis, Welton, Sutton, & Salanti, 2014).
To facilitate assessment of each effect separately, users can initially "Set all undetected" or "Set all suspected." They can then change the judgment manually for those estimates that do not fall into the category initially assigned. Note that a manual change from "Suspected" to "Undetected" and vice versa is not considered a deviation from the rule (and relevant boxes are not colored gray) as no rule for reporting bias is implemented. "Reset" and "Proceed" (to the "Indirectness" domain) buttons appear after initial population of judgments. We plan to develop the "Reporting bias" domain of CINeMA further in the months and years to come.

| Indirectness
For the indirectness domain, similar to "Within-study bias," the summary shows how many studies have been characterized as of low, moderate, and high indirectness at the top of the page. Subsequently, a bar graph shows the contribution of studies at each indirectness level to each NMA estimate. As for the "Within-study bias" domain, users can select between "Majority," "Average," and "Highest" rules to summarize indirectness for each relative effect estimate. Areas are colored accordingly, while judgments under each rule are shown in the boxes. Manual changes can be made, and "Reset" and "Proceed" (to the "Imprecision" domain) buttons appear.

| Imprecision
In the CINeMA framework, imprecision is assessed by examining whether the 95% confidence interval of each relative treatment effect crosses the line of no effect and the prespecified range of clinically important effects (Figure 4a). Users specify the size of a clinically important effect; CINeMA then derives the corresponding symmetric range around the line of no effect and compares each confidence interval against it.

| Worked example
For illustration, we choose an odds ratio of 1.2 as clinically important, and CINeMA informs us that "relative effect estimates below 0.83 and above 1.2 are considered clinically important." The confidence interval for the comparison between diuretics and placebo ranges from 1.12 to 1.57, which corresponds to case 3 in Figure 4a. The automatically generated judgment is "No concerns" and the explanation reads "Confidence interval does not cross clinically important effect."
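The symmetric bound quoted by CINeMA follows from taking the reciprocal of the chosen odds ratio on the other side of the line of no effect:

```python
# For a ratio measure (here an odds ratio), a clinically important
# threshold T above the null implies the symmetric threshold 1/T below it.
T = 1.2
lower = 1 / T
print(round(lower, 2))  # 0.83 — matching "below 0.83 and above 1.2" in CINeMA's message
```

The same pair of bounds (0.83 and 1.2) is reused by the "Heterogeneity" and "Incoherence" domains below.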

| Heterogeneity
The importance of heterogeneity depends on the variability of effects (beyond chance) in relation to the clinically important size of effect.
The clinically important size of effect is the same as in "Imprecision"; if already specified, it will automatically appear at the top of the "Heterogeneity" page. Otherwise, users need to specify it here; in that case, it will also be copied to the "Imprecision" domain. Users can press "Reset" to reset the clinically important effect size; note that this will affect the "Imprecision" domain too.
CINeMA considers the agreement between confidence and prediction intervals to assign a judgment for "Heterogeneity." The estimated value of the common between-study variance τ² is also displayed above the boxes but does not affect automated judgments. It is possible to view "Between-study variance estimates for each direct comparison along with reference intervals." To view these, users need to select the types of intervention and outcome and press "View." Boxes for each relative treatment effect are then updated to include between-study heterogeneity measures based on direct comparisons (I² and τ²) and reference values for τ² (first quartile, median, and third quartile). The reference quantiles are taken from empirical studies and are specific to the type of outcome and comparison (Rhodes, Turner, & Higgins, 2015; Turner, Davey, Clarke, Thompson, & Higgins, 2012).
Reference quantiles that are lower than the estimated direct τ² appear in black digits, and reference values greater than the estimated τ² appear in gray digits. The comparison with the reference values does not affect judgments. However, their critical appraisal may lead to changing the automatically generated judgments manually.

| Worked example
An odds ratio of 1.2 (and 0.83) has already been specified as clinically important. CINeMA reports that "The estimated value of between-study variance for the network meta-analysis is 0.016." The comparison of beta blockers with placebo is judged as case 3 in Figure 4a. In particular, "Some concerns" for heterogeneity are assigned, as the confidence interval (1.05-1.46) lies above the interval (0.83-1) while the prediction interval (0.90-1.70) crosses 1. "Prediction interval extends into clinically important or unimportant effects" is given as the explanation in the respective box. After selecting the intervention type (pharmacological, for all interventions apart from placebo) and outcome type (semiobjective), the boxes are updated to include extra information on heterogeneity (Turner et al., 2012). Figure 5 shows the box that appears in the software for the comparison of beta blockers with placebo.
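To see how τ² widens the confidence interval into a prediction interval, the standard formula can be applied to the numbers above. This sketch recovers the point estimate and standard error from the reported confidence interval and uses a normal quantile throughout; netmeta's prediction intervals may be based on a t-quantile, so the result differs slightly from the 0.90-1.70 quoted above.

```python
import math

def prediction_interval(or_ci_low, or_ci_high, tau2, z=1.96):
    """Approximate 95% prediction interval for an odds ratio.

    Recovers the log-scale point estimate and standard error from the
    reported 95% confidence interval, then widens the interval by the
    between-study variance tau2: PI = exp(est ± z * sqrt(se² + tau²)).
    """
    lo, hi = math.log(or_ci_low), math.log(or_ci_high)
    est = (lo + hi) / 2
    se = (hi - lo) / (2 * z)
    half = z * math.sqrt(se**2 + tau2)
    return math.exp(est - half), math.exp(est + half)

# Beta blockers vs placebo: CI 1.05-1.46, tau^2 = 0.016
pi_low, pi_high = prediction_interval(1.05, 1.46, 0.016)
print(round(pi_low, 2), round(pi_high, 2))  # roughly 0.92 and 1.67
```

The approximation reproduces the key feature driving the judgment: the prediction interval, unlike the confidence interval, crosses the line of no effect.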

| Incoherence
The range of clinically important effects is also considered in the "Incoherence" domain; resetting it using the "Reset" button will affect the "Imprecision" and "Heterogeneity" domains as well. The rules used to produce automatic judgments are as follows: (1) Effect estimates based on both direct and indirect evidence and with a p value from the separating indirect from direct evidence (SIDE) test greater than 0.10 are assigned "No concerns." (2) To assign judgments for effect estimates with both direct and indirect evidence and with a p value from SIDE below 0.10, areas a, b, and c are defined as illustrated in Figure 4a (below, within, and above the clinically important effects). The confidence intervals for the direct and indirect evidence are then compared with these areas, and incoherence is judged according to Table 5. As with other domains, judgments can be updated manually, and "Reset" and "Proceed" (to "Report") buttons appear once the clinically important size of effect is set.

| Worked example
As in case 5 of Figure 4c, "Some concerns" apply to the ACE inhibitors versus beta blockers comparison with respect to "Incoherence." The confidence intervals of both the direct (0.68-1.03) and indirect (0.49-0.75) treatment effects extend below the zone of clinically important effects, only the direct effect's confidence interval extends into the (0.83-1.2) interval, and neither extends above 1.2; thus, the direct and indirect treatment effects show minor, but not substantial, disagreement (Table 5). Figure 6 shows the boxes that appear in the software showing the incoherence judgments for all relative effects.
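The area comparison in this worked example can be sketched as follows. Table 5 itself is not reproduced in the text, so the mapping from shared areas to judgments below is an assumption consistent with the worked example, not CINeMA's documented rule:

```python
def areas_touched(ci, lower=0.83, upper=1.2):
    """Which of the areas a (below), b (within), and c (above) the range
    of clinically important effects a confidence interval extends into."""
    lo, hi = ci
    touched = set()
    if lo < lower:
        touched.add("a")
    if lo < upper and hi > lower:
        touched.add("b")
    if hi > upper:
        touched.add("c")
    return touched

def incoherence_judgment(direct_ci, indirect_ci):
    """Assumed mapping (our reading, not Table 5 verbatim):
    identical areas -> no concerns; some shared area -> some concerns;
    no common area -> major concerns."""
    d, i = areas_touched(direct_ci), areas_touched(indirect_ci)
    if d == i:
        return "No concerns"
    if d & i:
        return "Some concerns"
    return "Major concerns"

# ACE inhibitors vs beta blockers: direct 0.68-1.03, indirect 0.49-0.75
print(incoherence_judgment((0.68, 1.03), (0.49, 0.75)))  # "Some concerns"
```

Here the direct interval touches areas a and b, the indirect interval only area a; the single shared area yields "Some concerns," matching the software's judgment.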

| DISPLAYING JUDGMENTS FOR ALL SIX DOMAINS: REPORT
The "Report" page brings together all the judgments for the six domains across all evaluated treatment effects. Relative effects informed by only direct or by both direct and indirect evidence are shown first, followed by relative effects informed only by indirect evidence.
A thick gray left border appears for judgments whose automatically generated values have been manually modified. Users can visit the "Report" page as soon as at least one domain has been assessed. If users wish to summarize judgments across domains, the "Confidence rating" dropdown menu can be used to manually assign an overall level of confidence to each relative effect. The default judgment is "High" confidence; downgrading by one, two, or three levels leads to a confidence rating of "Moderate," "Low," or "Very low," respectively. We recommend considering judgments on different domains jointly rather than in isolation (Nikolakopoulou et al., 2019; Salanti et al., 2014). For example, the "Indirectness" and "Incoherence" domains are closely related, as they both refer to considerations of similarity across included studies, which may or may not manifest statistically in the data. "Imprecision" and "Heterogeneity" are also related, as large heterogeneity will also affect the precision of relative treatment effects. Clicking "Reset" sets all judgments to "High." Users can also download the final report as a .csv file by clicking on "Download report."
Figure 5. Boxes showing the information for judging heterogeneity for the relative effect of beta blockers versus placebo in the network meta-analysis of antihypertensive drugs and diabetes incidence.
Table 5. Summary of implemented rules for incoherence based on the agreement of direct and indirect estimates with their 95% confidence intervals in the areas defined in Figure 4c.
Figure 6. Boxes showing the judgments for incoherence for all relative effects in the network meta-analysis of antihypertensive drugs and diabetes incidence. ACE, angiotensin-converting-enzyme inhibitors; ARB, angiotensin-receptor blockers; CCB, calcium-channel blocker.
Figure 7. Final output of CINeMA for the network of antihypertensive drugs and incidence of diabetes. The table shows the level of concern for each of the six domains for each comparison and can be downloaded as a .csv file by clicking on "Download Report" in "Report." ACE, angiotensin-converting-enzyme inhibitors; ARB, angiotensin-receptor blockers; CCB, calcium-channel blocker.
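The overall confidence rating on the "Report" page follows a simple downgrading scheme, which can be sketched as follows (an illustrative mapping, not CINeMA code):

```python
def confidence_rating(downgrades):
    """Map the number of downgrading levels to the overall confidence
    rating; the default (zero downgrades) is "High"."""
    ratings = ["High", "Moderate", "Low", "Very low"]
    return ratings[min(downgrades, 3)]

print([confidence_rating(k) for k in range(4)])
# ['High', 'Moderate', 'Low', 'Very low']
```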

| Worked example
The report of the judgments for the Elliott et al. network is shown in Figure 7.

| DISCUSSION

CINeMA, with semiautomation of methods via a guided online process, greatly simplifies the evaluation of confidence in NMA results, particularly for large networks. CINeMA is freely available and open source, and no login is required. It is largely based on the methodological framework described previously (Nikolakopoulou et al., 2019; Salanti et al., 2014).
While the main guiding principles of the CINeMA framework have been established (Nikolakopoulou et al., 2019;Salanti et al., 2014), specific methods, recommendations, and implementation of automated rules in CINeMA software are evolving.
Important platform updates will follow. These include allowing users to upload multiple projects to work on concurrently. With this feature, users will also be able to have multiple outcomes per project, which is currently not supported by CINeMA. Note, however, that dependency between outcomes will not be assessed. Users will also be able to download a league table based only on studies at low, or at low and moderate, RoB (or indirectness). The "Report" page will be updated so that users can click on each comparison-by-domain judgment and decide whether to downgrade their confidence and, if so, by one or two levels.
Subjectivity is inevitable in any process or system evaluating evidence, and CINeMA is no exception. Several aspects of the evaluation rely on reviewers' judgment.
Evidence synthesis is used by organizations to decide whether to reimburse a medicinal product, by clinical guideline panels to recommend one drug over another, and by clinicians to prescribe an intervention or recommend a diagnostic procedure.
CINeMA is a transparent framework to evaluate evidence from systematic reviews with multiple interventions, and we hope that the software presented here will facilitate its uptake.

ACKNOWLEDGMENTS
The development of the software was supported by the Campbell Collaboration.
Note (contribution matrix table): Columns refer to studies and rows refer to NMA relative treatment effects. Entries show how much each study contributes to the estimation of NMA relative treatment effects. The table can be downloaded as a .csv file by clicking on "Download per study contribution matrix" in "Configuration." Abbreviations: ACE, angiotensin-converting-enzyme inhibitors; ARB, angiotensin-receptor blockers; BBlocker, beta blocker; CCB, calcium-channel blocker.