metaCOVID: A web-application for living meta-analyses of COVID-19 trials

Outputs from living evidence synthesis projects have been used widely during the pandemic by guideline developers to form evidence-based recommendations. However, the needs of different stakeholders cannot be accommodated solely by providing pre-defined, non-amendable numerical summaries. Stakeholders also need to understand the data and perform their own exploratory analyses. This requires resources, time, statistical expertise, software knowledge, and relevant clinical expertise to avoid spurious conclusions. To assist them, we created the metaCOVID application which, based on automation processes, facilitates the fast exploration of the data and the conduct of sub-analyses tailored to end-users' needs. metaCOVID has been created in R and is freely available as an R-Shiny application. Based on the COVID-NMA platform (https://covid-nma.com/), the application conducts living meta-analyses of randomized controlled trials related to COVID-19 treatments and vaccines for several outcomes. Several options are available for subgroup and sensitivity analyses. The results are presented in downloadable forest plots.

What this adds
• metaCOVID is an online tool that facilitates the exploration and analysis of complex data structures, such as the COVID-NMA database, without requiring technical or programming skills.
• This application provides free access to the most up-to-date database of COVID-19 trials for treatment and vaccine comparisons and can assist guideline developers in forming timely recommendations.
• Using this open-access application, different stakeholders can rapidly investigate the impact of different characteristics on the results and easily produce comprehensive graphical displays.
Potential impact for Research Synthesis Methods readers outside the authors' field
• This pilot application forms the basis for creating fast and efficient automation tools that accelerate the analyses and the dissemination of findings for future living evidence synthesis projects.
• Freely available software tools, like metaCOVID, are indispensable in the era of open science but should be accompanied by careful and thoughtful interpretation of the outputs.

| INTRODUCTION
The emergent situation of the COVID-19 pandemic motivated researchers worldwide to rapidly develop and evaluate preventive and therapeutic interventions. In light of an unprecedented explosion of clinical research findings and the uncertainty that inevitably accompanies them, several initiatives [1][2][3][4][5] were set up to provide "living" evidence on the effects of the different interventions; that is, to continuously collect and synthesize all available evidence on COVID-19 treatments, preventive interventions, and vaccines. The COVID-NMA platform 1-3 (https://covid-nma.com/) provides public access to the most up-to-date information on the effects of the different COVID-19 interventions and supports timely clinical decisions and policy-making. However, guideline developers and other stakeholders, apart from real-time access to high-quality data, also need to be able to investigate the data, examine the impact of different characteristics on the results, and produce their preferred evidence summaries. For example, countries differ in their policies concerning the inclusion of preprints 6 when forming COVID-19 guidelines, while some prefer to exclude, by default, small trials (e.g., with fewer than 100 participants). Accommodating such preferences of different end-users of the platform is not feasible in the form of pre-conducted, non-amendable meta-analyses, given the high number of comparisons included in the living review. In particular, producing and publishing online all possible sub-analyses for each comparison (a) is extremely time-consuming and (b) would make the platform cumbersome for users.
On the other hand, it is of great importance to ensure that all sub-analyses undertaken are based on clinical rationale. It is well known that the larger the number of subgroup analyses conducted, the higher the risk of obtaining false-positive results. 7 A database of more than 400 variables, like the COVID-NMA database, is always subject to this phenomenon and should be used very carefully by investigators with understanding and knowledge of the data. Therefore, within COVID-NMA, only a small number of such secondary analyses are considered; these are pre-defined exploratory analyses included in the COVID-NMA protocol, chosen on the basis of the clinical expertise of the steering committee. 1,2 To address all these important points, we developed and made freely available the "metaCOVID" application (https://covid-nma.com/metacovid/). This web-application allows the end-users of the COVID-NMA platform and other external researchers to directly use the most up-to-date database and perform meta-analyses tailored to their needs in a user-friendly environment. The results of all analyses are summarized in the form of downloadable forest plots. A key feature of the application is that the numerical results are presented alongside study characteristics and risk of bias assessments. In this article, we describe the functionality of metaCOVID and the different available options and sub-analyses it offers. We also discuss how it can inform guideline developers about the impact of different characteristics on the results using examples from the COVID-NMA data.

| Data embedded in metaCOVID
The application is directly connected to the COVID-NMA database, which is updated weekly or bi-weekly with data from all new randomized controlled trials (RCTs) evaluating COVID-19 treatments and vaccines. The search is conducted on a daily basis using the L.OVE 8 and the Cochrane COVID-19 Study Register platforms. As of March 1, 2022, the COVID-NMA revised its protocol for treatments to update only comparisons evaluating immunomodulators and antiviral therapies. Every RCT in the database is also accompanied by domain-specific risk of bias (RoB) assessments (for randomization, missing data, outcome measurement, etc.) as well as by an overall RoB assessment by outcome, based on the Cochrane Risk of Bias 2 (RoB 2) tool. 9 In addition, a variety of RCT and population characteristics are extracted from each trial report, including baseline severity, location, type of funding, presence of conflict of interest, and many others.
The COVID-NMA database is separated into two distinct databases, which are imported into and continuously updated in metaCOVID: one containing data on all possible treatments against COVID-19 and one on COVID-19 vaccines. In this way, the application always uses the most up-to-date available databases for treatments and vaccines. As of November 2022, the treatment database included 493 RCTs and 522 pharmacologic and non-pharmacologic interventions, forming in total 322 pairwise comparisons. Trial populations are separated into hospitalized patients and outpatients. We focus on binary and time-to-event outcomes as well as on both efficacy and safety (Table 1). With respect to vaccines, metaCOVID contains all available comparisons between a vaccine and placebo or no vaccination; up to November 2022, data from 117 RCTs on 55 different vaccines were available. The available pairwise comparisons are organized according to the platform type (e.g., RNA-based vaccines, non-replicating viral vector vaccines, etc.). Given that the aim of metaCOVID is to perform data synthesis, it only uses data from RCTs, to ensure that the settings of the studies are similar to some degree. Data from observational studies for COVID-19 vaccines are presented in the COVID-NMA platform.

| Data synthesis with metaCOVID
The application performs a meta-analysis of all the available RCTs based on the user's selection of the comparison and the outcome of interest. For vaccines, where few RCTs and only comparisons between vaccines and controls are available, it is sufficient to specify the type of vaccine under consideration to obtain the results for all vaccines in this category. Intervention effects are estimated using four different effect measures, depending on the research question and the outcome of interest (Table 1). Specifically, in our primary analysis for treatments, risk ratios (RR) are used for binary outcomes and hazard ratios (HR) for time-to-event outcomes; for vaccines, vaccine efficacy (VE) is used for efficacy outcomes and RR for safety outcomes. In the latter case, RR may be the risk ratio or the rate ratio.

[TABLE 1: List of all the outcomes and corresponding effect sizes incorporated in metaCOVID. Columns: Outcome; Effect size; Research question.]

All background computations are based on the R packages metafor and meta. 10,11 The source code of our application is openly accessible through GitHub at https://github.com/TEvrenoglou/metaCovid. By default, metaCOVID uses the inverse-variance approach and a random-effects model where the between-study variance τ2 is estimated with the restricted maximum likelihood (REML) method. [12][13][14] This is because COVID-19 trials are generally heterogeneous in several characteristics, and this heterogeneity needs to be taken into account in the synthesis model. Although RCTs with zero events in both intervention arms are excluded from the analysis, they are included in the forest plots for completeness and transparency. For RCTs with zero events in one arm only, we use the 0.5 continuity correction. 12 Of course, the user can optionally change the default settings and select other estimation methods.
For example, when the number of available RCTs is small, the REML estimator may underestimate the true between-study variance. 13 The REML method may also, rarely, fail to provide an estimate of the heterogeneity owing to non-convergence of the iterative process on which it is based. Therefore, within metaCOVID we provide alternative estimators as options: the DerSimonian-Laird estimator, 15 the Paule-Mandel estimator, 16 the Sidik-Jonkman estimator, 17 the empirical Bayes estimator, 18 and finally the maximum likelihood estimator. 19 The different estimators may result in similar or different τ2 estimates and, thus, it is useful to explore whether such differences impact the summary effect size estimated from the meta-analysis. The option of using a common-effect model, assuming no heterogeneity across the RCTs of the meta-analysis, is also available and may be used to monitor any differences in the summary results between the two models. Overall, users should be careful when interpreting meta-analyses with only two or three RCTs, since estimation of heterogeneity is always challenging in such small datasets. 13,20
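The default pipeline described above — log risk ratios with a 0.5 continuity correction for single-zero trials, pooled with an inverse-variance random-effects model — can be sketched outside R. The following is an illustrative Python sketch, not the application's actual code (metaCOVID relies on metafor and meta in R); it uses the closed-form DerSimonian-Laird estimator rather than the REML default, since DL needs no iteration:

```python
import math

def log_rr(e1, n1, e0, n0):
    """Log risk ratio and its variance from 2x2 counts, applying a 0.5
    continuity correction when one arm has zero events."""
    if e1 == 0 or e0 == 0:
        e1, n1, e0, n0 = e1 + 0.5, n1 + 1, e0 + 0.5, n0 + 1
    y = math.log((e1 / n1) / (e0 / n0))
    v = 1 / e1 - 1 / n1 + 1 / e0 - 1 / n0
    return y, v

def dersimonian_laird(ys, vs):
    """DerSimonian-Laird estimate of the between-study variance tau^2."""
    w = [1 / v for v in vs]
    ybar = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
    q = sum(wi * (yi - ybar) ** 2 for wi, yi in zip(w, ys))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    return max(0.0, (q - (len(ys) - 1)) / c)

def random_effects(ys, vs, tau2):
    """Inverse-variance random-effects pooled estimate with a 95% CI
    (normal approximation) on the log scale."""
    w = [1 / (v + tau2) for v in vs]
    mu = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
    se = math.sqrt(1 / sum(w))
    return mu, mu - 1.96 * se, mu + 1.96 * se
```

Setting `tau2 = 0` in `random_effects` gives the common-effect model mentioned above; substituting a different τ2 estimator shows how the pooled estimate and CI shift with the estimator choice.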

| Investigating heterogeneity within metaCOVID
Every forest plot produced by metaCOVID contains information on the τ2 estimate, the (two-sided) p value of the Q test for homogeneity, the I2 index 21 (here obtained from the estimated between-study variance), which gives the percentage of the total variation that is due to heterogeneity, 22 and the prediction interval. 22 The latter represents the 95% interval within which the effect size of a future study is expected to lie. These aid the user in inferring the presence or absence of important heterogeneity across the RCTs.
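For a given set of study estimates and variances, the heterogeneity statistics shown in the forest plots can be sketched as follows. This is illustrative Python, not the application's R code; in particular, the prediction interval here uses a normal approximation, whereas the meta package uses a t-distribution with k − 2 degrees of freedom:

```python
import math

def heterogeneity_stats(ys, vs, tau2):
    """Cochran's Q, the I^2 index (as a percentage), and an approximate
    95% prediction interval around the random-effects summary."""
    # Q and I^2 use common-effect (inverse-variance) weights
    w = [1 / v for v in vs]
    ybar = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
    q = sum(wi * (yi - ybar) ** 2 for wi, yi in zip(w, ys))
    df = len(ys) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    # Prediction interval around the random-effects mean
    wre = [1 / (v + tau2) for v in vs]
    mu = sum(wi * yi for wi, yi in zip(wre, ys)) / sum(wre)
    se_pred = math.sqrt(1 / sum(wre) + tau2)
    return q, i2, (mu - 1.96 * se_pred, mu + 1.96 * se_pred)
```

The prediction interval widens with τ2, which is why it conveys the spread of true effects rather than only the uncertainty of the mean.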
When important heterogeneity is suspected, potential explanations need to be investigated. Typically, pre-specified subgroup analyses or meta-regressions are used to explore the impact of certain characteristics on the results. In metaCOVID, we only provide subgroup analyses, since for most comparisons the number of available RCTs is not sufficient to perform meta-regression. For treatments, a pre-selected subgroup analysis based on the baseline severity of the RCT participants is conducted by default for every comparison. This reflects the assumption that certain treatments might be effective for participants with severe and/or critical disease but not for those with milder disease. We have already seen that this is a reasonable assumption at the trial level with the example of corticosteroids in RECOVERY. 23 Additional optional subgroup analyses provided within the application are the type of the control intervention (standard care or placebo), the location of the study, the type of funding (public, mixed-private), and the presence or not of conflict of interest. Presentation of results without any subgroup is also possible. The latter might be particularly useful, for instance, when every severity subgroup has only one RCT and the sub-diamonds coincide with the RCT results. Every subgroup analysis is accompanied by the test for subgroup differences. This χ2 test is similar to the Q test but tests the assumption of homogeneity across the subgroups. 24

| Assessing the robustness of results from metaCOVID

Sensitivity analyses show how robust the results remain when excluding certain RCTs that are suspected of being substantially different (methodologically or clinically) from the other RCTs. A typical example is the exclusion of RCTs rated at high risk of bias (RoB). In metaCOVID, we go one step further and, on top of the exclusion of high-RoB RCTs, also allow the exclusion of RCTs with "some concerns" for RoB.
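The test for subgroup differences mentioned above partitions Cochran's Q into between- and within-subgroup components. A minimal illustrative sketch (Python with common-effect weights, not the application's R code) follows; the p value would come from a χ2 distribution with (number of subgroups − 1) degrees of freedom, e.g., via scipy.stats.chi2.sf:

```python
def q_between(subgroups):
    """Chi-square statistic for subgroup differences:
    Q_between = Q_total - sum of within-subgroup Q statistics.
    `subgroups` maps a subgroup label to a list of
    (estimate, variance) pairs, one pair per study."""
    def pooled_q(pairs):
        # Cochran's Q for one set of studies, inverse-variance weights
        w = [1 / v for _, v in pairs]
        ybar = sum(wi * y for wi, (y, _) in zip(w, pairs)) / sum(w)
        return sum(wi * (y - ybar) ** 2 for wi, (y, _) in zip(w, pairs))
    all_pairs = [p for pairs in subgroups.values() for p in pairs]
    qb = pooled_q(all_pairs) - sum(pooled_q(p) for p in subgroups.values())
    df = len(subgroups) - 1
    return qb, df
```

Two internally homogeneous subgroups with different means yield a large Q_between, flagging a potential subgroup effect.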
Overall, RoB is considered a key component within the application: in such an urgent situation, it is very likely that the quality of the studies is compromised in order to give priority to rapidity. Thus, the RoB assessments for all domains are presented along with the numerical results in our forest plots. However, it should be noted that in metaCOVID the exclusion of trials is done only according to their overall RoB status. In addition, we allow users to exclude preprints and obtain the results only from the RCTs published in medical journals. This sensitivity analysis aims to reflect and explore the concerns of some researchers and guideline developers with respect to the reliability of preprints.
For treatments, we further allow for a sensitivity analysis related to the assumption about missing outcome data. In our primary analysis, we follow the conservative approach of the intention-to-treat principle and use for every intervention arm the number of participants randomized. Hence, participants for whom the outcome is missing are included in the analysis as non-events. It is important to highlight that, in the presence of missing outcome data, any approach relies on untestable assumptions and is prone to bias if these assumptions are not valid. 25,26 Consequently, the option of an available-case analysis, where participants without the outcome are ignored, is also provided. The application further allows the calculation of the confidence interval (CI) through the Hartung-Knapp (HK) method, 27 which has previously been found to yield better type I error rates compared to the DL method. 28 However, this method may yield very wide and non-informative CIs in the presence of only 2-3 studies. 29 Finally, given that for some outcomes, such as mortality, events are generally rare, the use of the inverse-variance model might not be optimal, as results can be particularly imprecise or even biased. 30 Other, more appropriate models for rare events are available in the literature and should be used to evaluate the sensitivity of the results to the model choice. Within metaCOVID we provide three alternative approaches: the Mantel-Haenszel (MH) method, the Peto method, and the recently introduced penalized-likelihood meta-analysis (PL-MA). [31][32][33] The key advantage of these approaches is that they avoid the normal approximations that introduce bias in the presence of rare events, and they model directly the arm-level data of the RCTs. Apart from being less biased, these approaches usually also provide more precise results.
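The HK adjustment above replaces the usual Wald-type CI with a rescaled variance and a t-distribution with k − 1 degrees of freedom. An illustrative sketch (Python, not the application's R code; the t critical value is passed in by the caller, e.g., from scipy.stats.t.ppf(0.975, k - 1)):

```python
import math

def hartung_knapp_ci(ys, vs, tau2, t_crit):
    """Hartung-Knapp confidence interval for the random-effects summary.
    t_crit is the two-sided critical value of a t-distribution with
    k-1 degrees of freedom."""
    k = len(ys)
    w = [1 / (v + tau2) for v in vs]
    mu = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
    # HK variance: weighted squared deviations scaled by (k-1) * sum(w)
    q = sum(wi * (yi - mu) ** 2 for wi, yi in zip(w, ys))
    se_hk = math.sqrt(q / ((k - 1) * sum(w)))
    return mu - t_crit * se_hk, mu + t_crit * se_hk
```

Because the HK standard error is driven by the observed dispersion of the study estimates, it can become very large (or occasionally very small) with only 2-3 studies, which is the behavior cautioned against above.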
In contrast to all other approaches, the PL-MA has the additional advantage that it allows incorporating double-zero RCTs in the analysis without any continuity correction. This is because it uses prior information for the probability of an event through the Jeffreys invariant prior, originally proposed by Firth. 33,34 However, users should bear in mind that the Peto method and the PL-MA only provide odds ratio (OR) estimates; hence, the comparison of the results between this sensitivity analysis and the primary analysis might not be straightforward.
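Among the rare-event methods above, the Peto approach is the simplest to sketch: it pools observed-minus-expected event counts with hypergeometric variances, handling single-zero trials without any continuity correction (illustrative Python, not the application's R code):

```python
import math

def peto_log_or(studies):
    """Peto pooled log odds ratio from per-study 2x2 counts given as
    (events_trt, n_trt, events_ctl, n_ctl) tuples."""
    num, den = 0.0, 0.0
    for e1, n1, e0, n0 in studies:
        n, e = n1 + n0, e1 + e0
        exp_e1 = e * n1 / n  # expected events in the treatment arm
        # Hypergeometric variance of the treatment-arm event count
        v = e * (n - e) * n1 * n0 / (n ** 2 * (n - 1))
        num += e1 - exp_e1   # observed minus expected (O - E)
        den += v
    return num / den, math.sqrt(1 / den)  # log OR and its standard error
```

Note that double-zero trials contribute zero to both sums, so (consistent with the text) only the PL-MA genuinely uses their information.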

| RESULTS
We illustrate through different examples from the COVID-NMA database how metaCOVID can assist clinicians and guideline developers in exploring the impact of different characteristics and modeling approaches on the results of a meta-analysis. We chose examples involving well-known treatments and vaccines as well as several studies. For treatments, we use throughout RCTs on hospitalized adult patients for the outcome of all-cause mortality at 28 days, as it is probably the most objective outcome. For vaccines, we use an example for the outcome of any adverse events. A tutorial on the use of the application can be found in our Data S1. Figure 1 shows the results for the comparison of convalescent plasma with standard care or placebo for all-cause mortality. There are 29 RCTs available with hospitalized patients, and one of them includes only patients with mild disease. The summary RR suggests that there is probably no difference in efficacy between the experimental and the control intervention for this outcome. We can quickly see that this applies to trial populations that are mild or mixed in terms of baseline severity, and that there is only one very small RCT at high RoB, which has minimal impact (0.61% weight) on the results. Hence, there are no concerns about the reliability of the results regarding RoB. However, the graph suggests some heterogeneity in the results of the studies that is not reflected in the estimated τ̂2 = 0. To explore this unexpected finding, we also used the other estimators available in metaCOVID. According to Table 2, the choice of the heterogeneity estimator has an impact on the results in this case. Specifically, all other estimators except maximum likelihood suggest that there is some amount of heterogeneity across these RCTs, with Sidik-Jonkman suggesting that there might even be important heterogeneity (τ̂2 = 0.14, I2 = 82.1%, 95% prediction interval [−0.39, 1.84]).
Table 2 also shows how the different heterogeneity estimators affect the point estimate of the summary RR and the lower limit of the CI. This sensitivity of the results to the choice of the heterogeneity estimator should be taken into account when interpreting the results. Figure 2 focuses on the comparison between hydroxychloroquine and control for all-cause mortality and presents a subgroup analysis where the RCTs have been separated based on the control intervention (standard care or placebo). Interestingly, hydroxychloroquine seems to be clearly worse than standard care, but results are more uncertain when it is compared to placebo. Although the test for subgroup differences suggests that there is no statistically significant difference (p = 0.24), providing potential explanations for this difference is still important. Obvious candidate characteristics seem to be the use of preprints or RCTs at high risk of bias; nevertheless, excluding such RCTs and/or preprints does not impact the results. We can see, though, that there are two large RCTs in the standard care group that strongly affect the results, while in the placebo group RCTs with such large sample sizes are missing. In such a case, a funnel plot could further be drawn to support the conclusions.
Mortality is a generally rare outcome in the context of COVID-19. We conducted a sensitivity analysis for the comparison between favipiravir and standard care or placebo; seven RCTs are available, but only three have some events observed. This is a situation where the inverse-variance approach might be biased and other methods should be preferred. Table 3 shows that the three approaches (MH, Peto, and PL-MA) give very similar OR estimates and 95% CIs, whereas the inverse-variance approach yields a substantially smaller OR estimate and a different 95% CI, yet in the same direction. Figure 3 shows the comparisons between non-replicating viral vector vaccines and control for the incidence of any adverse events. As expected, the vaccines cause more adverse events than the control interventions.
All RCTs here but two are at low RoB, and the two RCTs with some concerns tend to give more favorable results for the two corresponding vaccines (AstraZeneca and Janssen) than other RCTs. Excluding these two RCTs has a small impact on the summary RR estimate for AstraZeneca but a large impact for Janssen, which moves from 1.50 (0.92, 2.47) to 2.31 (1.80, 2.97). This change, though, should be interpreted with caution: according to Figure 3, the study that was excluded for the Janssen vaccine is a very large RCT, which might estimate the adverse events of the vaccine more accurately than a small RCT. Interestingly, the only study suggesting fewer adverse events for the vaccine (AstraZeneca) than the control involves two control interventions (i.e., another vaccine or placebo); as expected, it might not be appropriate for this outcome to combine RCTs using active controls with RCTs using placebo as control.

| DISCUSSION

In this paper, we present a new online application, called metaCOVID, that offers a user-friendly environment for performing living meta-analyses of RCTs for COVID-19 treatments and vaccines. This application is the core tool for regularly updating the analyses of the COVID-NMA platform, 35,36 but it is also used by external organizations, such as Cochrane South Africa and the UK National Institute for Health and Care Excellence (NICE), to supplement their process for forming recommendations. This application offers open access to the most up-to-date database of COVID-19 RCTs for researchers, clinicians, or guideline developers interested in performing amendable meta-analyses and exploring the impact of certain characteristics on the results. Users can download and freely use the outputs from the application, provided that they acknowledge metaCOVID and the COVID-NMA platform.
A key characteristic of metaCOVID is that it gives users the opportunity to undertake multiple analyses quickly and present complex data structures without requiring any technical or software knowledge. At the same time, it guards against arbitrary selection of variables for subgroup and sensitivity analyses by following a pre-defined protocol. 1,2 The living approach implemented in metaCOVID does not apply only to the continuous incorporation of new RCTs but extends to all aspects of the systematic review process; namely, all considerations regarding data extraction, risk of bias assessments, data synthesis, and appraisal of results are re-evaluated regularly based on the evolution of the data and the knowledge of the disease. Any proposed changes need to be accepted by the steering committee of the COVID-NMA. For instance, one important change, decided when concerns were raised by methodologists of the steering committee and after several discussions with clinical experts, was to abandon the plan of a network meta-analysis of all COVID-19 treatments and focus on smaller network meta-analyses of treatments with similar characteristics. This was decided to avoid obtaining misleading and biased results from synthesizing highly heterogeneous interventions and populations and violating the underlying assumptions. Therefore, metaCOVID is currently restricted to pairwise meta-analysis only.

The present version of metaCOVID is restricted to the use of the COVID-NMA database. This certainly limits the applicability of the tool to different settings, but future versions will allow users to upload their own data. Also, so far, the databases imported in metaCOVID cannot be downloaded; this will be possible, though, in future updates of the application. To avoid large amounts of heterogeneity and misleading conclusions, we have further restricted the application to meta-analyses of RCTs.
In this way, possibly important information from observational studies is disregarded. An additional limitation of our application is related to the use of only the pre-defined subgroup and sensitivity analyses in the COVID-NMA protocol. However, this ensures that all variables have been selected based on scientific rationale and relevant clinical expertise. Finally, as with all similar tools, the ease with which metaCOVID allows complex analyses to be performed can also be a drawback if the application is used without caution. Users should be aware that interpretation of meta-analysis results requires understanding of the data and the clinical setting, consideration of the credibility of the evidence, and investigation of all possible sources of within- and across-study biases. 7,37 Hence, although the application provides a user-friendly environment, it cannot guarantee the proper interpretation of the findings by its users.
Our application serves as a pilot version of a generalized application that could be tailored to living meta-analyses of any condition, where users will be able to upload their own data. Although the application is linked here to the COVID-NMA platform, it can also be seen as a standalone tool, since its use does not require any knowledge of the material available in the platform. All living evidence synthesis projects are highly resource- and time-demanding, as they involve large amounts of data that need to be continuously updated and analyzed. Rapid and efficient automation tools are required for the sustainability of any living systematic review that aims to have a very short delay (i.e., a few weeks) between updates, as well as to allow for immediate dissemination of and access to the findings. There is a general tendency in evidence synthesis toward long-term projects and broader reviews covering several research questions; such projects are impossible to move forward without tailored software facilitating and accelerating the data manipulation and analysis. In addition, online platforms and software applications promote open-access research, provided that they are used carefully and results are interpreted with a global view of all the available information.
FIGURE 3 Forest plot produced from metaCOVID for the comparisons between non-replicating viral vector vaccines and control for the incidence of any adverse events. [Colour figure can be viewed at wileyonlinelibrary.com]

AUTHOR CONTRIBUTIONS
Theodoros Evrenoglou drafted and revised the manuscript, programmed the application, and maintains metaCOVID. Isabelle Boutron revised the manuscript and provided clinical expertise regarding the statistical analysis and the data extraction for metaCOVID. Georgios Seitidis revised the manuscript and wrote part of the code. Lina Ghosn revised the manuscript and performed data extraction for metaCOVID. Anna Chaimani drafted and revised the manuscript, wrote part of the code, and supervised the construction of the application.