Random-effects meta-analysis methods include an estimate of between-study heterogeneity variance. We present a systematic review of simulation studies comparing the performance of different estimation methods for this parameter. We summarise the performance of methods in relation to estimation of heterogeneity and of the overall effect estimate, and of confidence intervals for the latter. Among the twelve included simulation studies, the DerSimonian and Laird method was most commonly evaluated. This estimator is negatively biased when heterogeneity is moderate to high, and therefore most studies recommended alternatives. The Paule–Mandel method was recommended by three studies: it is simple to implement, is less biased than DerSimonian and Laird and performs well in meta-analyses with dichotomous and continuous outcomes. In many of the included simulation studies, results were based on data that do not represent meta-analyses observed in practice, and only small selections of methods were compared. Furthermore, potential conflicts of interest were present when authors of novel methods interpreted their results. On the basis of current evidence, we provisionally recommend the Paule–Mandel method for estimating the heterogeneity variance, and using this estimate to calculate the mean effect and its 95% confidence interval. However, further simulation studies are required to draw firm conclusions. Copyright © 2016 John Wiley & Sons, Ltd.
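As a concrete reference point, the DerSimonian and Laird estimator discussed above is a simple method-of-moments calculation; the following Python sketch (function name ours, purely illustrative) shows the standard formula with truncation at zero:

```python
def dersimonian_laird(y, v):
    """DerSimonian-Laird method-of-moments estimate of the
    between-study variance tau^2 (truncated at zero).

    y: study effect estimates; v: their within-study variances.
    """
    w = [1.0 / vi for vi in v]                            # inverse-variance weights
    mu = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)    # fixed-effect pooled mean
    q = sum(wi * (yi - mu) ** 2 for wi, yi in zip(w, y))  # Cochran's Q statistic
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    return max(0.0, (q - (len(y) - 1)) / c)
```

With homogeneous data, Q falls below its expectation k − 1 and the estimate is truncated to zero, which is one source of the estimator's poor behaviour in small meta-analyses.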

In a network meta-analysis, comparators of interest are ideally connected either directly or *via* one or more common comparators. However, in some therapeutic areas, the evidence base can produce networks that are disconnected, in which there is neither direct evidence nor an indirect route for comparing certain treatments within the network. Disconnected networks may occur when there is no accepted standard of care, when there has been a major paradigm shift in treatment, when use of a standard of care or placebo is debated, when a product receives orphan drug designation, or when there is a large number of available treatments and many accepted standards of care. These networks pose a challenge to decision makers and clinicians who want to estimate the relative efficacy and safety of newly available agents against alternatives. A currently recommended approach is to insert a distribution for the unknown treatment effect(s) into a network meta-analysis model of treatment effect. In this paper, we describe this approach along with two alternative Bayesian models that can accommodate disconnected networks. Additionally, we present a theoretical framework to guide the choice between modeling approaches. This paper presents researchers with the tools and framework for selecting appropriate models for indirect comparison of treatment efficacies when challenged with a disconnected network. Copyright © 2016 John Wiley & Sons, Ltd.

In prognostic studies, a summary statistic such as a hazard ratio is often reported between low-expression and high-expression groups of a biomarker with a study-specific cutoff value. Recently, several meta-analyses of prognostic studies have been reported, but these studies simply combined hazard ratios provided by the individual studies, overlooking the fact that the cutoff values are study-specific. We propose a method to summarize hazard ratios with study-specific cutoff values by estimating the hazard ratio for a 1-unit change of the biomarker in the underlying individual-level model. To this end, we introduce a model, approximating the individual-level model, that relates a reported log-hazard ratio to the expected difference in mean biomarker value between the low-expression and high-expression groups, and propose to fit this model using methods for trend estimation based on grouped exposure data. Our combined estimator provides a valid interpretation if the biomarker distribution is correctly specified. We applied our proposed method to a dataset that examined the association between the biomarker Ki-67 and disease-free survival in breast cancer patients. We conducted simulation studies to examine the performance of our method. Copyright © 2016 John Wiley & Sons, Ltd.

Pairwise meta-analysis is an established statistical tool for synthesizing evidence from multiple trials, but it is informative only about the relative efficacy of two specific interventions. The usefulness of pairwise meta-analysis is thus limited in real-life medical practice, where many competing interventions may be available for a certain condition and studies informing some of the pairwise comparisons may be lacking. This commonly encountered scenario has led to the development of network meta-analysis (NMA). In the last decade, several applications, methodological developments, and empirical studies in NMA have been published, and the area is thriving as its relevance to public health is increasingly recognized. This article presents a review of the relevant literature on NMA methodology aiming to pinpoint the developments that have appeared in the field. Copyright © 2016 John Wiley & Sons, Ltd.

When considering data from many trials, it is likely that some of them present a markedly different intervention effect or exert an undue influence on the summary results. We develop a forward search algorithm for identifying outlying and influential studies in meta-analysis models. The forward search algorithm starts by fitting the hypothesized model to a small subset of likely outlier-free studies and proceeds by adding studies to the set one by one, each time choosing the study closest to the model fitted to the current set. As each study is added to the set, plots of estimated parameters and measures of fit are monitored to identify outliers by sharp changes in the forward plots. We apply the proposed outlier detection method to two real data sets; a meta-analysis of 26 studies that examines the effect of writing-to-learn interventions on academic achievement adjusting for three possible effect modifiers, and a meta-analysis of 70 studies that compares a fluoride toothpaste treatment to placebo for preventing dental caries in children. A simple simulated example is used to illustrate the steps of the proposed methodology, and a small-scale simulation study is conducted to evaluate the performance of the proposed method. Copyright © 2016 John Wiley & Sons, Ltd.
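The forward search idea can be illustrated with a deliberately simplified common-mean version (hypothetical code, not the authors' implementation; the paper monitors full meta-analysis model fits rather than a running mean):

```python
from statistics import median

def forward_search(y, m0=3):
    """Toy forward search for a common-mean model: start from the m0
    values closest to the median (a likely outlier-free subset), then
    add remaining studies one by one, always taking the one closest to
    the current fitted mean. A sharp late jump in `path` flags outliers.
    """
    med = median(y)
    order = sorted(range(len(y)), key=lambda i: abs(y[i] - med))[:m0]
    rest = [i for i in range(len(y)) if i not in order]
    path = [sum(y[i] for i in order) / len(order)]  # fitted mean at each step
    while rest:
        mu = path[-1]
        nxt = min(rest, key=lambda i: abs(y[i] - mu))
        order.append(nxt)
        rest.remove(nxt)
        path.append(sum(y[i] for i in order) / len(order))
    return order, path
```

In this sketch an outlying study enters last, and the final jump in the monitored mean reveals its influence.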

When meta-analysing intervention effects calculated from continuous outcomes, meta-analysts often encounter few trials, with potentially a small number of participants, and a variety of trial analytical methods. It is important to know how these factors affect the performance of inverse-variance fixed-effect and DerSimonian and Laird random-effects meta-analytical methods. We examined this performance using a simulation study.

Meta-analysing estimates of intervention effect from final values, change scores, ANCOVA or a random mix of the three yielded unbiased estimates of pooled intervention effect. The impact of trial analytical method on the meta-analytic performance measures was important when there was no or little heterogeneity, but was of little relevance as heterogeneity increased. On the basis of larger-than-nominal type I error rates and poor coverage, the inverse-variance fixed-effect method should not be used when there are few small trials.

When there are few small trials, random effects meta-analysis is preferable to fixed effect meta-analysis. Meta-analytic estimates need to be cautiously interpreted; type I error rates will be larger than nominal, and confidence intervals will be too narrow. Use of trial analytical methods that are more efficient in these circumstances may have the unintended consequence of further exacerbating these issues. © 2015 The Authors. Research Synthesis Methods published by John Wiley & Sons, Ltd.
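The fixed-effect and random-effects pooled estimates compared in this study differ only in the weights used; a minimal illustrative sketch (function name ours, assuming a known heterogeneity variance tau^2):

```python
import math

def pooled_estimate(y, v, tau2=0.0, z=1.96):
    """Inverse-variance pooled effect and 95% CI. tau2=0 gives the
    fixed-effect analysis; tau2>0 (e.g. a DerSimonian-Laird estimate)
    gives the random-effects analysis, with a wider interval."""
    w = [1.0 / (vi + tau2) for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    se = math.sqrt(1.0 / sum(w))
    return mu, (mu - z * se, mu + z * se)
```

With few small trials, the interval from either analysis can still be too narrow because the estimated tau^2 is itself imprecise, which is the caution the abstract raises.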

Goodness of fit evaluation should be a natural step in assessing and reporting dose–response meta-analyses from aggregated data of binary outcomes. However, little attention has been given to this topic in the epidemiological literature, and goodness of fit is rarely, if ever, assessed in practice. We briefly review the two-stage and one-stage methods used to carry out dose–response meta-analyses. We then illustrate and discuss three tools specifically aimed at testing, quantifying, and graphically evaluating the goodness of fit of dose–response meta-analyses. These tools are the deviance, the coefficient of determination, and the decorrelated residuals-versus-exposure plot. Data from two published meta-analyses are used to show how these three tools can improve the practice of quantitative synthesis of aggregated dose–response data. In fact, evaluating the degree of agreement between model predictions and empirical data can help the identification of dose–response patterns, the investigation of sources of heterogeneity, and the assessment of whether the pooled dose–response relation adequately summarizes the published results. © 2015 The Authors. *Research Synthesis Methods* published by John Wiley & Sons, Ltd.

This paper investigates how inconsistency (as measured by the *I*² statistic) among studies in a meta-analysis may differ, according to the type of outcome data and effect measure. We used hierarchical models to analyse data from 3873 binary, 5132 continuous and 880 mixed outcome meta-analyses within the Cochrane Database of Systematic Reviews. Predictive distributions for inconsistency expected in future meta-analyses were obtained, which can inform priors for between-study variance. Inconsistency estimates were highest on average for binary outcome meta-analyses of risk differences and continuous outcome meta-analyses. For a planned binary outcome meta-analysis in a general research setting, the predictive distribution for inconsistency among log odds ratios had median 22% and 95% CI: 12% to 39%. For a continuous outcome meta-analysis, the predictive distribution for inconsistency among standardized mean differences had median 40% and 95% CI: 15% to 73%. Levels of inconsistency were similar for binary data measured by log odds ratios and log relative risks. Fitted distributions for inconsistency expected in continuous outcome meta-analyses using mean differences were almost identical to those using standardized mean differences. The empirical evidence on inconsistency gives guidance on which outcome measures are most likely to be consistent in particular circumstances and facilitates Bayesian meta-analysis with an informative prior for heterogeneity. © 2015 The Authors. Research Synthesis Methods published by John Wiley & Sons, Ltd.
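The *I*² statistic underlying these comparisons can be computed from Cochran's Q; an illustrative sketch (not the paper's hierarchical model):

```python
def i_squared(y, v):
    """Higgins' I^2: percentage of total variability across study
    estimates attributable to heterogeneity rather than chance."""
    w = [1.0 / vi for vi in v]
    mu = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
    q = sum(wi * (yi - mu) ** 2 for wi, yi in zip(w, y))  # Cochran's Q
    df = len(y) - 1
    return 0.0 if q <= df else 100.0 * (q - df) / q       # truncated at 0%
```

Because Q depends on the chosen effect measure, the same trials can yield different *I*² values on, say, the odds ratio versus risk difference scale, which is the phenomenon the paper quantifies empirically.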

When conducting research synthesis, the studies to be combined often do not measure the same set of variables, which creates missing data. When the studies to combine are longitudinal, missing data can occur at the observation level (time-varying) or the subject level (non-time-varying). Traditionally, the focus of missing data methods for longitudinal data has been on missing observation-level variables. In this paper, we focus on missing subject-level variables and compare two multiple imputation approaches: a joint modeling approach and a sequential conditional modeling approach. We find the joint modeling approach to be preferable to the sequential conditional approach, except when the covariance structure of the repeated outcome for each individual has homogenous variance and exchangeable correlation. Specifically, the regression coefficient estimates from an analysis incorporating imputed values based on the sequential conditional method are attenuated and less efficient than those from the joint method. Remarkably, the estimates from the sequential conditional method are often less efficient than a complete case analysis, which, in the context of research synthesis, implies that we lose efficiency by combining studies. Copyright © 2015 John Wiley & Sons, Ltd.

Whilst it is common in clinical trials to use the results of tests at one phase to decide whether to continue to the next phase and to subsequently design the next phase, we show that this can lead to biased results in evidence synthesis. Two new kinds of bias associated with accumulating evidence, termed ‘sequential decision bias’ and ‘sequential design bias’, are identified. Both kinds of bias are the result of making decisions on the usefulness of a new study, or its design, based on the previous studies. Sequential decision bias is determined by the correlation between the value of the current estimated effect and the probability of conducting an additional study. Sequential design bias arises from using the estimated value instead of the clinically relevant value of an effect in sample size calculations. We considered both the fixed-effect and the random-effects models of meta-analysis and demonstrated analytically and by simulations that in both settings the problems due to sequential biases are apparent. According to our simulations, the sequential biases increase with increased heterogeneity. Minimisation of sequential biases arises as a new and important research area necessary for successful evidence-based approaches to the development of science. © 2015 The Authors. Research Synthesis Methods Published by John Wiley & Sons Ltd.

An unobserved random effect is often used to describe the between-study variation that is apparent in meta-analysis datasets. A normally distributed random effect is conventionally used for this purpose. When outliers or other unusual estimates are included in the analysis, the use of alternative random effect distributions has previously been proposed. Instead of adopting the usual hierarchical approach to modelling between-study variation, and so directly modelling the study-specific true underlying effects, we propose two new marginal distributions for modelling heterogeneous datasets. These two distributions are suggested because numerical integration is not needed to evaluate the likelihood. This makes the computation required when fitting our models much more robust. The properties of the new distributions are described, and the methodology is exemplified by fitting models to four datasets. © 2015 The Authors. Research Synthesis Methods published by John Wiley & Sons, Ltd.

We present an alternative to the contrast-based parameterization used in a number of publications for network meta-analysis. This alternative “arm-based” parameterization offers a number of advantages: it allows for a “long” normalized data structure that remains constant regardless of the number of comparators; it can be used to directly incorporate individual patient data into the analysis; the incorporation of multi-arm trials is straightforward and avoids the need to generate a multivariate distribution describing treatment effects; there is a direct mapping between the parameterization and the analysis script in languages such as WinBUGS; and finally, the arm-based parameterization allows simple extension to treatment-specific random treatment effect variances.

We validated the parameterization using a published smoking cessation dataset. Network meta-analysis using arm- and contrast-based parameterizations produced comparable results (with means and standard deviations being within ±0.01) for both fixed and random effects models. We recommend that analysts consider using the arm-based parameterization when carrying out network meta-analyses. © 2015 The Authors Research Synthesis Methods Published by John Wiley & Sons Ltd.

In this note, we clarify and prove the claim made by Higgins *et al*. that the design-by-treatment interaction model contains all possible loop inconsistency models. This claim provides a strong argument for using the design-by-treatment interaction model to describe loop inconsistencies in network meta-analysis. © 2015 The Authors. Research Synthesis Methods published by John Wiley & Sons, Ltd.

Meta-analysis of a survival endpoint is typically based on the pooling of hazard ratios (HRs). If competing risks occur, the HRs may no longer translate into changes in survival probability. The cumulative incidence functions (CIFs), the expected proportion of cause-specific events over time, re-connect the cause-specific hazards (CSHs) to the probability of each event type. We use CIF ratios to measure treatment effect on each event type. To retrieve information on aggregated, typically poorly reported, competing risks data, we assume constant CSHs. Next, we develop methods to pool CIF ratios across studies. The procedure computes pooled HRs alongside and checks the influence of follow-up time on the analysis. We apply the method to a medical example, showing that follow-up duration is relevant both for pooled cause-specific HRs and CIF ratios. Moreover, if all-cause hazard and follow-up time are large enough, CIF ratios may reveal additional information about the effect of treatment on the cumulative probability of each event type. Finally, to improve the usefulness of such analysis, better reporting of competing risks data is needed. Copyright © 2015 John Wiley & Sons, Ltd.
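Under the constant cause-specific hazard assumption used here, the CIF has a closed form, CIF_j(t) = (λ_j/λ)(1 − e^{−λt}) with λ the all-cause hazard; a small illustrative sketch (function names ours):

```python
import math

def cif(lam_j, lam_all, t):
    """Cumulative incidence of cause j by time t under constant
    cause-specific hazards; lam_all is the all-cause hazard."""
    return (lam_j / lam_all) * (1.0 - math.exp(-lam_all * t))

def cif_ratio(lam_j_trt, lam_all_trt, lam_j_ctl, lam_all_ctl, t):
    """Treatment effect on event type j expressed as a ratio of CIFs
    at follow-up time t (treatment arm over control arm)."""
    return cif(lam_j_trt, lam_all_trt, t) / cif(lam_j_ctl, lam_all_ctl, t)
```

When the two arms have different all-cause hazards, the CIF ratio changes with t, which is why the abstract stresses that follow-up duration matters for the pooled analysis.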

Bayesian statistical approaches to mixed treatment comparisons (MTCs) are becoming more popular because of their flexibility and interpretability. Many randomized clinical trials report multiple outcomes with possible inherent correlations. Moreover, MTC data are typically sparse (although richer than standard meta-analysis, comparing only two treatments), and researchers often choose study arms based upon which treatments emerge as superior in previous trials. In this paper, we summarize existing hierarchical Bayesian methods for MTCs with a single outcome and introduce novel Bayesian approaches for multiple outcomes simultaneously, rather than in separate MTC analyses. We do this by incorporating partially observed data and its correlation structure between outcomes through contrast-based and arm-based parameterizations that consider any unobserved treatment arms as missing data to be imputed. We also extend the model to apply to all types of generalized linear model outcomes, such as count or continuous responses. We offer a simulation study under various missingness mechanisms (e.g., missing completely at random, missing at random, and missing not at random) providing evidence that our models outperform existing models in terms of bias, mean squared error, and coverage probability, and then illustrate our methods with a real MTC dataset. We close with a discussion of our results, several contentious issues in MTC analysis, and a few avenues for future methodological development. Copyright © 2015 John Wiley & Sons, Ltd.

To examine how effectively forwards citation searching with Web of Science (WOS) or Google Scholar (GS) identified evidence to support public health guidance published by the National Institute for Health and Care Excellence.

Forwards citation searching was performed using GS on a base set of 46 publications and replicated using WOS.

WOS and GS were compared in terms of recall; precision; number needed to read (NNR); administrative time and costs; and screening time and costs. Outcomes for all publications were compared with those for a subset of highly important publications.

The searches identified 43 relevant publications. The WOS process had 86.05% recall and 1.58% precision. The GS process had 90.7% recall and 1.62% precision. The NNR to identify one relevant publication was 63.3 with WOS and 61.72 with GS. There were nine highly important publications. WOS had 100% recall, 0.38% precision and NNR of 260.22. GS had 88.89% recall, 0.33% precision and NNR of 300.88. Administering the WOS results took 4 h and cost £88–£136, compared with 75 h and £1650–£2550 with GS.
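The reported metrics relate simply: the number needed to read (NNR) is the reciprocal of precision. An illustrative check (the screened total below is back-calculated from the reported NNR and is therefore approximate, not a figure from the study):

```python
def recall(found, total_relevant):
    """Proportion of all relevant publications that the search found."""
    return found / total_relevant

def precision(found, screened):
    """Proportion of screened records that were relevant."""
    return found / screened

def nnr(found, screened):
    """Number needed to read: records screened per relevant record
    found; equal to the reciprocal of precision."""
    return screened / found

# WOS found 37 of the 43 relevant publications (86.05% recall).
# The screened count is back-calculated from NNR ~ 63.3 (approximate).
wos_found, wos_screened = 37, 2342
```

Framing the trade-off this way makes clear why GS's slightly better recall came at a similar NNR but a far higher administrative cost.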

WOS is recommended over GS, as citation searching was more effective, while the administrative and screening times and costs were lower. Copyright © 2015 John Wiley & Sons, Ltd.

Quasi-randomization might expedite recruitment into trials in emergency care settings but may also introduce selection bias.

We searched the Cochrane Library and other databases for systematic reviews of interventions in emergency medicine or urgent care settings. We assessed selection bias (baseline imbalances) in prognostic indicators between treatment groups in trials using true randomization versus trials using quasi-randomization.

Seven reviews contained 16 trials that used true randomization and 11 that used quasi-randomization. Baseline group imbalance was identified in four trials using true randomization (25%) and in two quasi-randomized trials (18%). Of the four truly randomized trials with imbalance, three concealed treatment allocation adequately. Clinical heterogeneity and poor reporting limited the assessment of trial recruitment outcomes.

We did not find strong or consistent evidence that quasi-randomization is associated with selection bias more often than true randomization. High risk of bias judgements for quasi-randomized emergency studies should therefore not be assumed in systematic reviews. Clinical heterogeneity across trials within reviews, coupled with limited availability of relevant trial accrual data, meant it was not possible to adequately explore the possibility that true randomization might result in slower trial recruitment rates, or the recruitment of less representative populations. © 2015 The Authors. *Research Synthesis Methods* published by John Wiley & Sons, Ltd.

Meta-analyses are typically used to estimate the overall mean of an outcome of interest. However, inference about between-study variability, which is typically modelled using a between-study variance parameter, is usually an additional aim. The DerSimonian and Laird method, currently widely used by default to estimate the between-study variance, has long been challenged. Our aim is to identify known methods for estimation of the between-study variance and its corresponding uncertainty, and to summarise the simulation and empirical evidence that compares them. We identified 16 estimators for the between-study variance, seven methods to calculate confidence intervals, and several comparative studies. Simulation studies suggest that for both dichotomous and continuous data the estimator proposed by Paule and Mandel and for continuous data the restricted maximum likelihood estimator are better alternatives to estimate the between-study variance. Based on the scenarios and results presented in the published studies, we recommend the Q-profile method and the alternative approach based on a ‘generalised Cochran between-study variance statistic’ to compute corresponding confidence intervals around the resulting estimates. Our recommendations are based on a qualitative evaluation of the existing literature and expert consensus. Evidence-based recommendations require an extensive simulation study where all methods would be compared under the same scenarios. © 2015 The Authors. *Research Synthesis Methods* published by John Wiley & Sons Ltd.
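The Paule and Mandel estimator recommended here solves for the tau^2 at which the generalised Q statistic equals its expectation k − 1; an illustrative iterative sketch (Newton-type update in the style of DerSimonian and Kacker; function name ours):

```python
def paule_mandel(y, v, tol=1e-10, max_iter=200):
    """Paule-Mandel estimate of the between-study variance tau^2:
    find tau^2 such that the generalised Q statistic,
    sum w_i(tau^2) (y_i - mu)^2 with w_i = 1/(v_i + tau^2),
    equals its expectation k - 1."""
    k = len(y)
    tau2 = 0.0
    for _ in range(max_iter):
        w = [1.0 / (vi + tau2) for vi in v]
        mu = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)
        q = sum(wi * (yi - mu) ** 2 for wi, yi in zip(w, y))
        if tau2 == 0.0 and q <= k - 1:
            return 0.0                     # Q already below expectation
        # Newton-type step: q is decreasing in tau2
        step = (q - (k - 1)) / sum(wi ** 2 * (yi - mu) ** 2
                                   for wi, yi in zip(w, y))
        if abs(step) < tol:
            break
        tau2 = max(0.0, tau2 + step)
    return tau2
```

Unlike the DerSimonian and Laird formula, this estimator re-weights studies using the current tau^2 at every iteration, which is part of why it performs better in the simulation evidence summarised above.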

Network meta-analysis enables the simultaneous synthesis of a network of clinical trials comparing any number of treatments. Potential inconsistencies between estimates of relative treatment effects are an important concern, and several methods to detect inconsistency have been proposed. This paper is concerned with the node-splitting approach, which is particularly attractive because of its straightforward interpretation, contrasting estimates from both direct and indirect evidence. However, node-splitting analyses are labour-intensive because each comparison of interest requires a separate model. It would be advantageous if node-splitting models could be estimated automatically for all comparisons of interest.

We present an unambiguous decision rule to choose which comparisons to split, and prove that it selects only comparisons in potentially inconsistent loops in the network, and that all potentially inconsistent loops in the network are investigated. Moreover, the decision rule circumvents problems with the parameterisation of multi-arm trials, ensuring that model generation is trivial in all cases. Thus, our methods eliminate most of the manual work involved in using the node-splitting approach, enabling the analyst to focus on interpreting the results. © 2015 The Authors Research Synthesis Methods Published by John Wiley & Sons Ltd.

In systematic reviews based on network meta-analysis, the network structure should be visualized. Network plots have often been drawn by hand using generic graphical software. A typical way of drawing networks, also implemented in statistical software for network meta-analysis, is a circular representation, often with many crossing lines. We use methods from graph theory in order to generate network plots in an automated way. We give a number of requirements for graph drawing and present an algorithm that fits prespecified ideal distances between the nodes representing the treatments. The method was implemented in the function netgraph of the R package netmeta and applied to a number of networks from the literature. We show that graph representations with a small number of crossing lines are often preferable to circular representations. Copyright © 2015 John Wiley & Sons, Ltd.
