Volume 11, Issue 1
RESEARCH ARTICLE
Open Access

Random‐effects meta‐analysis of few studies involving rare events

Burak Kürsad Günhan


Department of Medical Statistics, University Medical Center Göttingen, Göttingen, Germany

Correspondence

Burak Kürsad Günhan, Department of Medical Statistics, University Medical Center Göttingen, Göttingen, Germany.

Email: burak.gunhan@med.uni‐goettingen.de

Christian Röver

Department of Medical Statistics, University Medical Center Göttingen, Göttingen, Germany

Tim Friede

Department of Medical Statistics, University Medical Center Göttingen, Göttingen, Germany

First published: 26 July 2019

Abstract

Meta‐analyses of clinical trials targeting rare events face particular challenges when the data lack adequate numbers of events for all treatment arms. Especially when the number of studies is low, standard random‐effects meta‐analysis methods can lead to serious distortions because of such data sparsity. To overcome this, we suggest the use of weakly informative priors (WIPs) for the treatment effect parameter of a Bayesian meta‐analysis model, which may also be seen as a form of penalization. As a data model, we use a binomial‐normal hierarchical model (BNHM) that does not require continuity corrections in case of zero counts in one or both arms. We suggest a normal prior for the log‐odds ratio with mean 0 and standard deviation 2.82, which is motivated (a) as a symmetric prior centered around unity and constraining the odds ratio within a range from 1/250 to 250 with 95% probability and (b) as consistent with empirically observed effect estimates from a set of 37 773 meta‐analyses from the Cochrane Database of Systematic Reviews. In a simulation study with rare events and few studies, our BNHM with a WIP outperformed a Bayesian method without a WIP and a maximum likelihood estimator in terms of smaller bias and shorter interval estimates with similar coverage. Furthermore, the methods are illustrated by a systematic review in immunosuppression of rare safety events following pediatric transplantation. A publicly available R package, MetaStan, is developed to automate a Bayesian implementation of meta‐analysis models using WIPs.

1 INTRODUCTION

Individual clinical studies are often underpowered to detect differences in probabilities or rates of rare events, for example, safety events; thus, meta‐analysis may be the only way to obtain reliable evidence of treatment differences with regard to rare events.1 On the other hand, meta‐analysis of clinical studies for rare events faces particular challenges, since the numbers of events might be very small in some treatment arms. The problem is even more pronounced when some studies have no events in one or in both treatment arms (so‐called single‐zero or double‐zero studies).

The exclusion of the double‐zero studies from the analysis can bias the treatment effect parameter estimate away from the null (especially for the unbalanced design)2 and also causes loss of information, since double‐zero studies contain information through their sample sizes.3 Hence, we consider methods that do not remove double‐zero studies from the analysis. Two established fixed‐effect meta‐analysis methods exist for rare events, namely, Peto's method4 and the Mantel‐Haenszel (MH) method.5 On the other hand, an assumption of homogeneity, that is, a single common parameter for all studies, is typically unrealistic for studies in the biomedical sciences.6-8 Therefore, we focus on random‐effects methods in this paper.

Standard (approximate) random‐effects meta‐analysis methods, for example, the normal‐normal hierarchical model,9 require a continuity correction in case of single‐zero or double‐zero studies, that is, the addition of a fixed value (typically 0.5) to all cells of the contingency table for studies with no events or with 100% events (no nonevents). Such simple approaches have been found problematic for meta‐analyses involving rare events.10 Therefore, statistical models based on exact distributional assumptions have been suggested. These include different parametrizations of the binomial‐normal hierarchical model (BNHM),11 a mixed effects conditional logistic model,12 a Poisson‐normal hierarchical model,13 a Poisson‐Gamma hierarchical model,14 or a beta‐binomial model (BBM).3 In this paper, we focus on a parametrization of the BNHM that was suggested by Smith et al.15

Consider an extreme case of meta‐analysis of rare events, where all studies include no events for the same treatment arm. This data sparsity problem in a meta‐analysis can be seen as a separation problem in the logistic regression context,16 in which case a maximum likelihood estimate (MLE) for the treatment effect parameter does not exist. A very useful way to deal with separation problems, or, more generally, data sparsity in logistic regression, is penalization, that is, adding a penalty (adjustment) term to the original likelihood function to regularize (or stabilize) the estimates.17 In a frequentist framework, penalty terms may be specified so that these nudge the MLE into a desired direction if the maximum is not or only poorly defined; one such example is Firth penalization.17-19 From a Bayesian viewpoint, penalization may often be motivated as weakly informative priors (WIPs) that are multiplied with the likelihood function.20

Numbers of studies included in meta‐analyses are typically small, posing additional challenges.21 For Bayesian meta‐analysis of few studies, different WIPs have been suggested for the heterogeneity parameter; see Chung et al22 for a penalized MLE approach and Gelman23 and Friede and Röver24 for Bayesian inference. Here, we consider the meta‐analysis of few studies targeting rare events. To deal with the data sparsity present in such meta‐analyses, we suggest the use of WIPs for the treatment effect parameter in a fully Bayesian context, inspired by penalization ideas.17, 20 As the data model, we use a BNHM parameterized in terms of baseline risks and a treatment effect. Note that this is a contrast‐based model, meaning that relative treatment effects are assumed to be exchangeable across trials.25 Our suggested default WIP for the treatment effect parameter is motivated via consideration of the prior expected range of treatment effect values. Furthermore, it is consistent with effect estimates empirically observed in a large set of meta‐analyses with binary endpoints from the Cochrane Database of Systematic Reviews (CDSR).

The main contribution of this paper is the introduction of default WIPs as penalization for treatment effect parameters to deal with data sparsity in the meta‐analysis of few studies involving rare events. Another contribution is the introduction of an R package, MetaStan (https://CRAN.R-project.org/package=MetaStan), which automates a Bayesian implementation of meta‐analysis models using WIPs as described in the paper and which is publicly available from CRAN. In Section 2, we describe a systematic review concerning rare safety events associated with immunosuppressive therapy following pediatric transplantation. In Section 3, we describe the application of WIPs for the treatment effect parameter: we review a BNHM for meta‐analysis, discuss the derivation of a WIP, and present an empirical investigation of treatment effect parameter estimates from the CDSR. Section 4 describes the implementation in R using Stan. Long‐run properties of different methods, including the proposed one, are investigated in the simulation studies in Section 5. In Section 6, the example is revisited to illustrate the proposed method and its implementation. We close with some conclusions and a discussion.

2 AN APPLICATION IN PEDIATRIC TRANSPLANTATION

Several rare pediatric liver diseases can nowadays be successfully treated by liver transplantation with good long‐term outcomes. Crins et al26 conducted a systematic review of controlled, but not necessarily randomized, studies of the interleukin‐2 receptor antibodies (IL‐2RA) basiliximab and daclizumab in pediatric liver transplantation. Primary outcomes were acute rejections (ARs), steroid‐resistant rejections (SRRs), graft loss, and death. Their analyses were based on a random‐effects meta‐analysis using a restricted maximum likelihood (REML) approach.27 Crins et al26 used risk ratios as effect measures, whereas we use odds ratios here; with rare events, however, these should be very similar. Heterogeneity was assessed using Cochran's Q test.28 Secondary outcomes included renal dysfunction and posttransplant lymphoproliferative disease (PTLD). For illustrative purposes, we focus here on death and PTLD; these outcomes are displayed in Table 1.

Table 1. Data on patient deaths and posttransplant lymphoproliferative disease (PTLD) from the meta‐analysis in pediatric transplantation conducted by Crins et al26
                     Outcome: Death                       Outcome: PTLD
                     Control         Experimental         Control         Experimental
                     Events  Total   Events  Total        Events  Total   Events  Total
Heffron et al29        3      20       4      61            —      —        —      —
Schuller et al30       —       —       —       —            0     12        0     18
Ganschow et al31       3      54       1      54            0     54        1     54
Spada et al32          3      36       4      36            1     36        1     36
Gras et al33           3      34       2      50            —      —        —      —

The specific problems with meta‐analyses concerning rare events outlined in the introduction are prominent here. Firstly, the numbers of events are very small. For the PTLD dataset, there is one single‐zero study and one double‐zero study. Secondly, there are few studies available, only four for deaths and three for PTLD. Empirical event rates are lower in three of the four experimental groups in the data on patient deaths. For PTLD, the data appear inconclusive for Schuller et al30 and Spada et al,32 and only a single event observed in the experimental group suggests an increased risk in the study by Ganschow et al.31

3 WIPS FOR THE TREATMENT EFFECT

In this section, we present the usage of WIPs for the treatment effect parameter to conduct random‐effects meta‐analysis of rare events with few studies. As a data model, we review a BNHM and then show how to derive a WIP for a treatment effect parameter. Then, empirical evidence obtained from the CDSR supporting the choice of WIPs is illustrated.

3.1 Data model

The BNHM was introduced by Smith et al.15 In the BNHM, for each trial i∈{1,…,k} and treatment arm j∈{0,1}, the event counts rij are modeled using a binomial distribution, that is, rij ∼ Bin(πij, nij). The logit link is used to transform πij onto the log‐odds scale, where effects can be assumed to be additive:

    logit(πij) = μi + θi xij,   θi ∼ N(θ, τ²),   (1)
where xij is a treatment indicator, namely, +0.5 = experimental (j = 1) and −0.5 = control (j = 0). The μi are fixed effects denoting the baseline risks of the event in each study i, θ is the mean treatment effect, and τ is the heterogeneity in treatment effects between trials. The BNHM belongs to the family of generalized linear mixed models (GLMMs); this family also includes models for other types of data including continuous or count outcomes. It is important to note here that by treating the baseline risks μi as fixed effects, the analysis effectively stratifies the risk by study, as pooling of risks might compromise the studies' randomization. In this sense, it constitutes a contrast‐based model.25 Unlike the normal‐normal hierarchical model, the BNHM does not rely on a normal approximation, since it builds on the binomial nature of the data directly.

The BNHM can be fitted using frequentist approaches, for example, via maximum likelihood estimation (MLE).11 Alternatively, Bayesian methods are commonly used. In a fully Bayesian approach, prior distributions for the parameters θ, μi, and τ need to be specified. Note that the parameter θ is on the log‐odds ratio scale, whereas the μi are on the log‐odds scale. The baseline risks (μi) may be seen as intercept parameters in a standard logistic regression model. For the μi, we use a vague normal prior with mean 0 and standard deviation 10, following the recommendation by Gelman et al.20 The prior choice for θ is our main focus and is discussed in Section 3.2. The prior choice for the heterogeneity parameter τ, which is a standard deviation parameter, has received much attention in the literature, as discussed in the introduction. Friede et al24 have shown that for meta‐analysis of few studies, the use of WIPs for τ displays desirable long‐run properties in comparison with frequentist alternatives. Following their suggestions, we use a half‐normal prior with scale 0.5 (HN(0.5)) for τ, which has a median of 0.337 and an upper 95% quantile of 0.98. Values of τ of 0.25, 0.5, 1, and 2 represent moderate, substantial, large, and very large heterogeneity, respectively.34 Thus, an HN(0.5) prior captures heterogeneity values typically seen in meta‐analyses of log‐odds ratios and will therefore be a sensible choice in many applications.
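As a quick numerical check of these prior summaries, the quantiles of a half-normal distribution can be obtained from the standard normal quantile function; the following small R sketch reproduces the values quoted above:

    ## quantiles of a half-normal prior HN(0.5) for tau:
    ## if tau = |X| with X ~ N(0, 0.5^2), the p-quantile is 0.5 * qnorm((1 + p) / 2)
    hn_quantile <- function(p, scale = 0.5) scale * qnorm((1 + p) / 2)
    hn_quantile(0.50)   # median, approximately 0.34
    hn_quantile(0.95)   # upper 95% quantile, approximately 0.98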

3.2 Derivation of a WIP for the treatment effect

A common prior choice for the treatment effect parameter θ is a noninformative (vague) prior such as a normal distribution with a large variance, for example, a normal prior with mean 0 and standard deviation 100. One way of constructing a WIP works via consideration of the prior expected range of treatment effect values.35 Before deriving the WIP for the treatment effect parameter θ, recall that θ is on the log‐odds ratio scale. Thus, a value of θ=0 means an odds ratio of 1, that is, no effect, and a value of θ=1 means that the odds differ by a factor (ratio) of exp(1)≈2.7.

We assume a symmetric prior centered around zero, implying equal probabilities for positive or negative treatment effects. Symmetry then implies (on the log‐odds ratio scale) that

    P(θ ≤ −x) = P(θ ≥ x) for all x ≥ 0,   (2)

or, equivalently (on the odds ratio scale),

    P(exp(θ) ≤ 1/y) = P(exp(θ) ≥ y) for all y ≥ 1.   (3)

The prior's scale parameter σprior may then be set such that a priori the odds ratio is confined with 95% probability to a certain range:

    P(1/δ ≤ exp(θ) ≤ δ) = 0.95.   (4)

In case of a normal prior with standard deviation σprior, we can then simply specify

    σprior = log(δ) / Φ⁻¹(0.975) ≈ log(δ) / 1.96,   (5)

where Φ⁻¹ denotes the standard normal quantile function.

We conservatively specify δ as 250, meaning that we consider it unlikely that the odds ratio will be larger than 250 or smaller than 1/250. Plugging this number into (5), we obtain σprior = log(250)/1.96 ≈ 2.82.
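The calculation in (5) can be reproduced directly; the following R sketch computes σprior for δ = 250 and checks the implied 95% prior probability range for the odds ratio:

    delta <- 250
    sigma_prior <- log(delta) / qnorm(0.975)   # Equation (5)
    sigma_prior                                # approximately 2.82
    ## prior probability that the odds ratio lies within [1/250, 250]
    pnorm(log(delta), mean = 0, sd = sigma_prior) -
      pnorm(-log(delta), mean = 0, sd = sigma_prior)   # approximately 0.95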

Another way to motivate the prior standard deviation is via the idea of unit information priors.36, 37 When the treatment effect parameter is on the log‐odds ratio scale (as in the BNHM), its standard error based on a 2×2 table with cell counts a, b, c, and d is given by √(1/a + 1/b + 1/c + 1/d). Assuming equal allocation, a neutral effect, and equal counts of events and nonevents, each of the four cells contains N/4 patients, so that the standard error simplifies to √(16/N) = 4/√N. Therefore, if we (heuristically) reverse the argument, a prior for the log‐odds ratio with zero mean and standard deviation 2.82 corresponds to37

    2.82 = 4/√N   ⟺   N = (4/2.82)² ≈ 2.   (6)

Hence, N = 2. In other words, in terms of the prior's effective sample size, the choice of σprior = 2.82 is equivalent to adding two patients to the dataset. From this, it follows that a normal prior with mean 0 and standard deviation 2.82 is a reasonable choice as a WIP for θ.
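The implied effective sample size can be verified numerically (a small R sketch):

    sigma_prior <- 2.82
    (4 / sigma_prior)^2   # N from Equation (6), approximately 2 patients
    4 / sqrt(2)           # conversely, N = 2 corresponds to a standard error of about 2.83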

Note also the analogy between this WIP and commonly used continuity corrections: zero entries in a contingency table are commonly "fixed" by adding a correction term of 0.5 to each table cell of the single‐zero or double‐zero study, which also amounts to a total of two patients added to the data. However, the continuity correction adds two patients to each single‐zero or double‐zero study, whereas the use of the WIP is equivalent to adding two patients to the whole dataset.

3.3 Empirical evidence supporting the WIP for the treatment effect

For an empirical investigation of the WIP for the treatment effect parameter, we consider the meta‐analysis datasets archived in the CDSR. All systematic reviews in the CDSR are available on the Cochrane Library website,38 for which personal or institutional access is required. For downloading the data from the CDSR and converting them to CSV files, we used the program Cochrane_scraper (version 1.1.0).39 We were able to access all Cochrane systematic reviews available in March 2018 (CD000004 to CD012788). Meta‐analyses were excluded if they included only one study, if the analysis was labelled as a subgroup or sensitivity analysis or there was insufficient information for classification, or if all data within the meta‐analysis appeared to be erroneous. Finally, we only considered meta‐analyses with dichotomous outcomes. In total, 37 773 meta‐analysis datasets from 4712 reviews were included. Note that we did not distinguish between efficacy and safety analyses.

The distribution of the number of studies k per meta‐analysis is illustrated in Figure 1. The percentage of meta‐analyses including five or fewer studies is 66%. This figure is consistent with other re‐analyses of the CDSR (see, eg, previous works8, 21, 40). We re‐analyzed the meta‐analysis datasets from the CDSR using the BNHM via an MLE approach, implemented using the R package lme4.41 A histogram of the estimates of θ is shown in Figure 2A; the 2.5% and 97.5% quantiles of the estimates of θ are −1.94 and 2.06, respectively. Following Turner et al,40 we exclude the zero heterogeneity estimates; nonzero estimates of τ are shown in Figure 2B. The fraction of nonzero heterogeneity estimates is 63%, which is also consistent with previous findings.40 The 95% quantile of the nonzero estimates of τ is 1.51, while the 95% quantile of the τ estimates including zeroes is 1.05. The distribution and variability of the θ and τ estimates give a rough sense of how large these parameters tend to be across a broad population of meta‐analyses, here the CDSR, and hence of what constitutes a reasonable default prior distribution. Therefore, we suggest the use of the WIPs N(0, 2.82²) for θ and HN(0.5) for τ, which are consistent with the estimates of θ and τ empirically observed in the CDSR, meaning that they cover odds ratios within reasonable ranges and heterogeneity mostly below 1.0.

Figure 1. The distribution of the numbers of studies included in each meta‐analysis obtained from the Cochrane Database of Systematic Reviews (CDSR). The category labelled 19+ corresponds to meta‐analyses with 19 or more studies.
Figure 2. The distribution of the estimates of the mean treatment effect parameter θ (A) and of the (nonzero) heterogeneity standard deviation parameter τ (B), obtained from the reanalysis of meta‐analysis datasets in the Cochrane Database of Systematic Reviews (CDSR) using the binomial‐normal hierarchical model (BNHM) fitted via maximum likelihood estimation (MLE). In A, the two red lines (−1.94 and 2.06) mark the 2.5% and 97.5% quantiles of the θ estimates. In B, the solid red line (1.05) and the dashed red line (1.51) mark the 95% quantiles of the τ estimates including and excluding zero estimates, respectively. The fraction of nonzero estimates of τ is 63%.

4 IMPLEMENTATION OF THE PROPOSED PROCEDURE IN R USING STAN

The Bayesian implementation of the BNHM can be fitted with the probabilistic programming language Stan,42 which employs a modern Markov chain Monte Carlo (MCMC) algorithm, namely, Hamiltonian Monte Carlo with the No‐U‐Turn Sampler.43 It is known that the parametrization of the model can affect the performance of an MCMC algorithm. In the presence of sparse data, such as in the meta‐analysis of few studies involving rare events, Betancourt et al44 showed that the centered parametrization of a hierarchical model (such as the BNHM) can cause computational problems compared with a noncentered parametrization. Thus, we use the noncentered reparametrized version of the BNHM for our implementation. Specifically, applying both location and scale reparametrization, the linear predictor in (1) becomes μi + θ xij + ui τ xij, where ui ∼ N(0, 1) and xij = +0.5 (experimental) or xij = −0.5 (control). (Correction added on 06 January 2020, after first online publication: The preceding equation has been updated from "μi + θi xij + ui τ2" to "μi + θ xij + ui τ xij".)
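The location‐scale (noncentered) reparametrization changes only the parametrization, not the model itself; the following small R simulation illustrates that θ + τ·ui with ui ∼ N(0, 1) has the same distribution as a draw from N(θ, τ²) (the values of θ and τ below are arbitrary illustrations):

    set.seed(42)
    theta <- -0.5; tau <- 0.5
    u <- rnorm(1e6)                                # "raw" standard normal effects
    theta_i_noncentered <- theta + tau * u         # noncentered construction
    theta_i_centered    <- rnorm(1e6, theta, tau)  # centered construction
    ## both should have mean -0.5 and standard deviation 0.5
    c(mean(theta_i_noncentered), sd(theta_i_noncentered))
    c(mean(theta_i_centered), sd(theta_i_centered))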

For practical applications, learning Stan's syntax and its available features might present a hurdle. To this end, we developed a new R package, MetaStan, built on top of rstan, the R interface to Stan. MetaStan (https://CRAN.R-project.org/package=MetaStan) includes a precompiled Stan model of the BNHM, which eliminates the compilation time and the need to learn Stan's syntax. The Stan code for the BNHM is shown in Listing 1. MetaStan includes different options for WIPs for the model parameters of the BNHM. The MetaStan syntax is similar to that of the popular meta‐analysis package metafor,27 so that it should be easy for a metafor user to utilize our package. The syntax of MetaStan is displayed for the pediatric transplantation example in Section 6, and Appendix A shows how to install and use MetaStan.

5 SIMULATION STUDY

In order to assess the long‐run properties of the proposed approach and compare it with some alternatives, we conducted a simulation study.

5.1 Simulation setup

The simulation scenarios are similar to those considered by Friede et al,24 but with the important difference that we focus on rare events. The datasets are generated under the BNHM (1). Numbers of studies (k∈{2,3,5}) and true treatment effects (θ∈{−5,−4,−3,−2,−1,−0.5,0,0.5,1,2,3,4,5}) are varied, resulting in a total of 39 simulation scenarios. To reflect rare‐event settings, true baseline risks on the probability scale are drawn uniformly between 0.005 and 0.05. Following Kuss,3 a log‐normal distribution is fitted to the sample sizes obtained from the CDSR data, resulting in a log‐normal distribution with parameters μ=5 and σ=1. Hence, sample sizes are generated from LN(5, 1); the minimum sample size is restricted to two patients (values below 2 are rounded up to 2), and at least one patient in each treatment arm is assumed. The degree of heterogeneity is taken as τ=0.28 (moderate heterogeneity), which is the median of the predictive distribution for between‐study heterogeneity in a meta‐analysis in a general setting as estimated by Turner et al.40 Patients were allocated to the treatment groups with a binomial probability of 0.5, thus mimicking randomization. The simulations were carried out with 10 000 replications per scenario. The data sparsity is reflected in the average fractions of single‐zero or double‐zero studies in a simulated meta‐analysis dataset, shown in Figure 3A. Notice that the fractions of single‐zero and double‐zero studies are highest when the true treatment effect is −5 and decrease as the treatment effect increases.
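The data-generating mechanism described above can be sketched as follows; this is an illustrative R implementation (not the original simulation code), with variable names of our own choosing and the baseline risk placed on μi, the log-odds at the arm midpoint:

    simulate_meta <- function(k, theta, tau = 0.28) {
      ## total sample sizes from a log-normal(5, 1), at least 2 patients per study
      n_total <- pmax(2, round(rlnorm(k, meanlog = 5, sdlog = 1)))
      ## binomial allocation to the two arms, at least 1 patient per arm
      n_exp  <- pmin(pmax(rbinom(k, n_total, 0.5), 1), n_total - 1)
      n_ctrl <- n_total - n_exp
      ## true baseline risks (probability scale) and study-specific log-odds ratios
      mu      <- qlogis(runif(k, 0.005, 0.05))
      theta_i <- rnorm(k, theta, tau)
      ## event counts under the BNHM with treatment indicator x = +/- 0.5
      r_exp  <- rbinom(k, n_exp,  plogis(mu + 0.5 * theta_i))
      r_ctrl <- rbinom(k, n_ctrl, plogis(mu - 0.5 * theta_i))
      data.frame(r_ctrl, n_ctrl, r_exp, n_exp)
    }
    set.seed(1)
    simulate_meta(k = 3, theta = -1)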

Figure 3. The average fraction of single‐zero or double‐zero studies in a simulated meta‐analysis dataset (A) and the fraction of estimation failures for the maximum likelihood estimate (MLE) and Mantel‐Haenszel (MH) methods for the different numbers of studies k used in the simulations (B and C).
The proposed approach (the BNHM using a WIP, that is, N(0, 2.82²), for θ: WIP) and four comparators are included in the analysis, namely, the BNHM using a vague prior (N(0, 100²)) for θ (Vague), the BNHM using MLE (MLE), the Mantel‐Haenszel (MH) method,5 and a Bayesian implementation of the beta‐binomial model (BBM).3 It is important to note the differences of the MH method and the BBM from the BNHM‐based methods: MH is a fixed‐effect meta‐analysis method, and the BBM has a different underlying data‐generating process than the BNHM. For both the Vague and WIP approaches, the priors for τ and μi are taken as HN(0.5) and N(0, 10²), respectively. The MH estimator of the odds ratio is given by

    ORMH = [Σi ri1 (ni0 − ri0)/ni] / [Σi ri0 (ni1 − ri1)/ni],

where ni = ni0 + ni1, and the corresponding treatment effect estimate on the log‐odds ratio scale is θ̂MH = log(ORMH). In the BBM, the event counts rij are modeled using a binomial distribution, rij ∼ Bin(πij, nij), as in the BNHM. The probabilities of an event are assumed to be beta distributed, πij ∼ Beta(αj, βj), where both arms share the same correlation parameter ρ = 1/(αj + βj + 1), implying α0 + β0 = α1 + β1. It is common to reparametrize the model using the mean parameters Φj = αj/(αj + βj). Finally, the linear predictor can be written as logit(Φj) = μ + θ xj, where θ is the parameter for the treatment effect and xj is a treatment indicator, 1 = experimental (j = 1) and 0 = control (j = 0). Vague priors are chosen for all parameters, namely, uniform priors on the interval [0, 1] for all three parameters Φ0, Φ1, and ρ.
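To illustrate the MH estimator, it can be computed directly from the death data in Table 1 (a base R sketch):

    ## death outcome from Table 1 (Heffron, Ganschow, Spada, Gras)
    r0 <- c(3, 3, 3, 3);  n0 <- c(20, 54, 36, 34)   # control events / sample sizes
    r1 <- c(4, 1, 4, 2);  n1 <- c(61, 54, 36, 50)   # experimental events / sample sizes
    n  <- n0 + n1
    ## Mantel-Haenszel odds ratio and its logarithm
    or_mh <- sum(r1 * (n0 - r0) / n) / sum(r0 * (n1 - r1) / n)
    c(OR = or_mh, logOR = log(or_mh))   # odds ratio approximately 0.59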

Three MCMC chains were run in parallel for a total of 2000 iterations each, including 1000 iterations of burn‐in. These settings were chosen based on test runs in a number of replications in which convergence diagnostics were assessed. All chains were assumed to have reached convergence (no estimation failure). We used the package lme4 for the MLE (using the adaptive Gauss‐Hermite approximation to the maximum log‐likelihood) and metafor for the MH method (without using any continuity corrections), whereas the Vague, WIP, and BBM methods were fitted with our MetaStan package. Note that we use highest density intervals (HDI), which are the shortest credible intervals, as opposed to the commonly used equal‐tailed credible intervals. The HDI were obtained using the HDInterval package.45 All computations were performed using R.46 The code for all methods used in the simulations is provided in Appendices A to D.
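Given a vector of posterior draws for θ, the HDI is obtained as follows (a sketch using placeholder draws; in practice the draws are extracted from the fitted model):

    library("HDInterval")
    theta_draws <- rnorm(4000, -0.5, 0.6)       # placeholder posterior draws of theta
    hdi(theta_draws, credMass = 0.95)           # shortest 95% credible interval
    quantile(theta_draws, c(0.025, 0.975))      # equal-tailed interval, for comparison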

5.2 Simulation results

For the MLE and the MH method, the fractions of estimation failures are shown in Figures 3B and 3C. Estimation failure occurred for the MLE when the Gauss‐Hermite approximation did not converge to the maximum log‐likelihood, and for the MH method when the MH estimator was not defined. The MLE and MH methods show very similar patterns of estimation failure. Estimation failure is closely related to the fraction of meta‐analysis datasets including single‐zero or double‐zero studies, which can be seen by comparing Figure 3A with Figures 3B and 3C: when the data are highly sparse, estimation becomes more challenging for both MLE and MH. As a performance measure, we use the bias (the mean of θ̂ − θ) based on the MLEs, the MH estimators, and the posterior medians. The direction of the bias is also important, since depending on the nature of the outcome (safety or efficacy), a positive or a negative bias may be considered conservative. Moreover, the coverage probability and the mean length of the interval estimates for θ are reported. A coverage probability close to the nominal 95% level and short interval estimates are desirable.

The bias of the posterior medians from the Vague, the WIP, and the BBM, of the MLEs, and of the MH estimators across scenarios is displayed in the first row of Figure 4. Note that failed runs were excluded from the calculation of the performance measures, which is relevant only for the MLE and MH methods. The MLE shows unacceptably high bias for the scenarios with θ ≤ 0, corresponding to the scenarios in which the fraction of zero studies is also very high. On the other hand, the MH estimator clearly outperforms the MLE and exhibits bias very close to that of the WIP. The WIP displays somewhat positive bias, whereas the Vague shows negative bias for the scenarios with θ ≤ 0. This behavior of the WIP is expected, since the WIP shrinks the posterior towards zero. For safety analyses, a positive bias commonly means a more conservative behavior and may hence be considered less harmful than a negative bias. It is important to note that the bias behaves similarly to the fraction of zero studies and the fraction of estimation failures of the MLE, meaning that the bias is higher in scenarios with more sparse data. Since the Vague approach uses a vague prior on θ, one might expect a somewhat similar bias from the Vague and the MLE approaches. However, the fact that the Vague approach includes a WIP for τ and that estimation is based on integration rather than maximization may explain the better performance of the Vague method in comparison with the MLE. The WIP and the MH outperform the BBM in terms of bias across all scenarios. The performance in terms of bias improves for all methods as the number of studies k increases. In Figures 3 and 4, the curves are not symmetric around zero. This asymmetry is due to the fact that while the true treatment effect (log‐OR) is varied between −5 and +5, the true baseline risk (probability) is drawn uniformly between 0.005 and 0.05 in the simulations.

Figure 4. The bias for the mean treatment effect θ, coverage probabilities, and log mean length of the interval estimates for θ obtained by the five methods (beta‐binomial model [BBM], Mantel‐Haenszel [MH], maximum likelihood estimate [MLE], Vague, and weakly informative prior [WIP]).

Figure 4 also shows the coverage probabilities and log mean lengths of the 95% HDI obtained by the Vague, the WIP, and the BBM, and of the 95% Wald confidence intervals (CIs) obtained by the MLE and the MH method. The CI and HDI obtained by the MH and the BBM show unacceptably low coverage, especially for θ<−2. However, the undercoverage of the BBM and its somewhat poorer performance in terms of bias may stem from the fact that the data are generated under the BNHM. Also, the CI obtained by the MLE displays low coverage, especially for k=5. We will revisit the coverage of the MLE in the discussion. The WIP method shows coverage above the nominal level across all true treatment effects except for θ=−5. On the other hand, the HDI obtained by the WIP are shorter than the HDI obtained by the Vague and the CI obtained by the MLE approaches.

Lastly, the bias for the heterogeneity parameter τ obtained by three methods (the MLE, the Vague, and the WIP) is shown in Figure 5. For the Bayesian methods, posterior medians are used as point estimates. Recall that the prior used for τ in both the Vague and the WIP is weakly informative (HN(0.5)). The MLE underestimates the true heterogeneity, whereas the Vague and the WIP methods slightly overestimate it. The Vague and the WIP produce very similar bias. These observations are in line with the conclusions of Friede et al.24

Figure 5. The bias for the heterogeneity parameter τ obtained by three methods (maximum likelihood estimate [MLE], Vague, and weakly informative prior [WIP]). The true heterogeneity standard deviation is τ=0.28.

6 EXAMPLE REVISITED

Returning to the dataset described in Section 2, we consider the data on the death and PTLD outcomes shown in Table 1. The observed log‐odds ratios are displayed in Figures 6 and 7. To be able to visualize the observed log‐odds ratios when there is a single‐zero or double‐zero study, a continuity correction of 0.5 is added to all cells of the corresponding study's contingency table. The wide CIs for the observed log‐odds ratios reflect the rather small sample sizes in the datasets. Furthermore, the variability in the point estimates gives an impression of the degree of heterogeneity between trials.
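For example, the continuity-corrected log-odds ratios of the two PTLD studies with zero cells (Schuller et al30 and Ganschow et al31; see Table 1) can be computed as follows (base R sketch):

    ## add 0.5 to every cell of the 2x2 table before computing the log-odds ratio
    log_or_cc <- function(r_ctrl, n_ctrl, r_exp, n_exp, cc = 0.5) {
      ev_e <- r_exp + cc;  ne_e <- n_exp - r_exp + cc    # experimental events / nonevents
      ev_c <- r_ctrl + cc; ne_c <- n_ctrl - r_ctrl + cc  # control events / nonevents
      log((ev_e * ne_c) / (ne_e * ev_c))
    }
    log_or_cc(0, 12, 0, 18)   # Schuller et al (double-zero), approximately -0.39
    log_or_cc(0, 54, 1, 54)   # Ganschow et al (single-zero), approximately 1.12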

Figure 6. The motivating pediatric transplantation application with death as the outcome: the top panel displays the observed log‐odds ratios (computed using a continuity correction in case of zero counts); the bottom panel shows the treatment effect estimates of θ obtained by the beta‐binomial model (BBM), Mantel‐Haenszel (MH), maximum likelihood estimate (MLE), Vague, and weakly informative prior (WIP) methods. Heterogeneity parameter estimates τ are also given on the left.
Figure 7. The motivating pediatric transplantation application with posttransplant lymphoproliferative disease (PTLD) as the outcome: the top panel displays the observed log‐odds ratios (computed using a continuity correction in case of zero counts); the bottom panel shows the treatment effect estimates of θ obtained by the beta‐binomial model (BBM), Mantel‐Haenszel (MH), maximum likelihood estimate (MLE), Vague, and weakly informative prior (WIP) methods. Heterogeneity parameter estimates τ are also given on the left.
We analyze the datasets using the five methods investigated in the simulation studies, namely, the Vague, WIP, MLE, MH, and BBM approaches. The code to implement the MLE and the MH method is given in Appendix B. Recall that the only difference between Vague and WIP is the prior used for the treatment effect parameter θ in the model, namely, N(0, 100²) for the former and N(0, 2.82²) for the latter. The WIP analysis can be implemented in a routine data analysis using our MetaStan package as follows:
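(A sketch: the arguments delta and theta_prior are those described in the text, whereas the names of the data-passing arguments below are assumptions that may differ between MetaStan versions; see ?meta_stan.)

    library("MetaStan")
    ## death outcome from Table 1 (control and experimental arms)
    r_ctrl <- c(3, 3, 3, 3);  n_ctrl <- c(20, 54, 36, 34)
    r_trt  <- c(4, 1, 4, 2);  n_trt  <- c(61, 54, 36, 50)
    ## WIP analysis: delta = 250 implies a N(0, 2.82^2) prior for theta, see Equation (5);
    ## equivalently, theta_prior = c(0, 2.82) could be specified instead of delta.
    ## NOTE: the data-argument names below are assumptions.
    fit_wip <- meta_stan(rctrl = r_ctrl, nctrl = n_ctrl,
                         rtrt  = r_trt,  ntrt  = n_trt,
                         delta = 250)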

The argument delta corresponds to δ from Equation (5) and is thus used to calculate the WIP for θ. Alternatively, one can directly specify the prior parameters for θ; in our case, we can equivalently set theta_prior = c(0, 2.82). The Vague method is simply implemented by omitting the argument delta and specifying theta_prior = c(0, 100). The BBM is also implemented in MetaStan, and the required syntax is shown in Appendix C. To check MCMC convergence, we use the Gelman‐Rubin statistic and traceplots. For the WIP approach, the corresponding traceplots are shown in Figures A1 and A2 for the death and PTLD outcomes, respectively. No divergent transitions were reported for either dataset. For the death outcome, neither the MLE fit nor the MH estimation caused any warnings from lme4 and metafor, respectively. For the PTLD outcome, lme4 gives a warning suggesting that the estimates may not be reliable; nevertheless, it produces the MLE and CI for the treatment effect parameter, and we report them. For the PTLD outcome, when computing the MH estimator, metafor gives a warning due to the double‐zero study (double‐zero studies are removed from the analysis by default) but still returns an estimate. Note that both MLE and MH ignore the double‐zero study (Schuller et al30); hence, these analyses are based on two studies only.

The results for the death and PTLD outcomes from the five methods are shown in Figures 6 and 7, respectively. For MLE and MH, the estimates and 95% CI are given. For Vague, WIP, and BBM, posterior medians and 95% HDI are shown. For both the PTLD and death outcomes, apart from the BBM, the point estimates of θ from the four methods look quite similar. The differing behavior of the BBM was also observed in the simulations. The PTLD data resemble the scenarios with three studies and a true treatment effect in the range from 0 to 1; the corresponding negative bias of the BBM can be seen in Figure 4. The death data resemble the scenarios with five studies (since they are not highly sparse) and a true treatment effect in the range from −1 to 0; here, the corresponding positive bias of the BBM can be seen in Figure 4. Furthermore, the point estimates obtained by the WIP and the MH are very close, as in the simulations. The MLE gives shorter interval estimates than the Bayesian alternatives; this is (partly) because τ was estimated to be 0. In the original paper, Crins et al26 fitted a normal‐normal hierarchical model using REML,27 with the risk ratio as the measure of the treatment effect. They concluded that IL‐2RA treatment failed to show a statistically significant reduction in deaths. We obtained point estimates similar to those of Crins et al,26 with somewhat wider interval estimates: their risk ratio estimate was 0.61 (CI, 0.27‐1.37), while we obtained an odds ratio estimate of 0.58 (HDI, 0.20‐1.49) using the WIP method. Concerning PTLD, the risk ratio was estimated as 1.60 (CI, 0.20‐12.67) by Crins et al,26 whereas the odds ratio is estimated as 1.98 (HDI, 0.18‐25.18) using the WIP method. The wider interval estimates obtained by the WIP may stem from the fact that the uncertainty in τ is taken into account.

The estimates of the between‐trial heterogeneity τ are also included in the figures; these are only available for the Vague, WIP, and MLE. For the death outcome, the heterogeneity parameter τ is estimated as 0.29, 0.29, and 0.00 using WIP, Vague, and MLE, respectively. Similarly, for the PTLD outcome, we obtain τ estimates of 0.33, 0.33, and 0.00 using WIP, Vague, and MLE, respectively. The heterogeneity parameter ρ of the BBM is estimated as 0.34 and 0.03 for the PTLD and death outcomes, respectively. Moreover, Crins et al26 concluded that there is no evidence for heterogeneity between trials using Cochran's Q test for both the death and PTLD outcomes. Since the prior used for τ is the same for WIP and Vague, similar heterogeneity estimates are expected; similar τ estimates from WIP and Vague were also observed in the simulations (Figure 5). On the other hand, the MLE estimate (τ̂ = 0) is most probably underestimating the actual amount of heterogeneity. The underestimation of τ by the MLE and the slightly lower bias of the WIP compared with the Vague were also observed in the simulations (Figure 5).

7 CONCLUSIONS AND DISCUSSION

An assumption of homogeneity is often considered unrealistic for meta‐analyses in the biomedical sciences; hence, random‐effects meta‐analysis models are suggested.6 Furthermore, as can be seen in the CDSR, a substantial fraction of published meta‐analyses is based on only a few studies. On the other hand, fitting a random‐effects model based on only a few studies often poses problems for inference, as certain asymptotics cannot be relied upon.47 Additional issues arise for binary outcomes when only few or no events are observed in some of the studies or study arms. To deal with such data sparsity in meta‐analysis, we have proposed the use of WIPs for the treatment effect parameter θ in a BNHM. We demonstrated how a normal WIP for θ can be derived by considering an a priori interval for the treatment effect on the log‐odds ratio scale. Moreover, empirical evidence obtained from 37 773 meta‐analyses with dichotomous outcomes from the CDSR supports the proposed WIP. In simulation studies, the suggested method displays lower bias for θ and substantially shorter interval estimates for θ, with coverage somewhat above the nominal level, in comparison with alternative methods.

The use of a Bayesian approach exhibits some analogy to the use of continuity corrections. While continuity corrections might to some extent be perceived as ad hoc fixes, they have quite doubtlessly proven very useful in practice. A Bayesian approach tackles the problem from a very different angle, but it is not so surprising that the resulting procedure again exhibits some similarity to continuity corrections. The relation to current common practice may in fact be seen as somewhat comforting. The use of an (informative) prior within a Bayesian analysis, on the other hand, is not a desperate measure; it is rather an integral part of a coherent model specification that may also be subjected to checks of plausibility and operating characteristics, which is what we have tried to demonstrate in the present paper.

The simulation results displayed in Figure 4 are somewhat in contrast to the results given by Friede et al,24 who observed lower-than-nominal coverage of MLE methods in a similar setting, but not based on rare events. We also investigated a scenario closer to their setup by considering higher baseline risks between 0.05 and 0.20. The results are shown in Figure D1, and indeed, here the MLE method exhibits lower-than-nominal coverage, as reported by Chung et al22 and Friede et al.24 The high bias and overly wide interval estimates obtained by the Vague and the MLE are still present, but not as pronounced as in the simulations with lower true baseline risks.

Jackson et al11 investigated seven random‐effects meta‐analysis models, including the BNHM that we consider in this paper (model 4 in Jackson et al11) and another parametrization of the BNHM (model 2 in Jackson et al11). The only difference in the specification between the two models is that in their model 2, the treatment indicator xij is 1 for the experimental arm and 0 for the control arm. Note that commonly used network meta‐analysis models, for example,48 are generalizations of model 2 in Jackson et al.11 As reported by Jackson et al,11 we also observed underestimation of the heterogeneity parameter τ under model 2 and hence decided to consider only their model 4. On the other hand, it is important to note that the use of a WIP for θ also improves the performance of model 2, as we have seen for model 4.

This investigation has some limitations. One crucial limitation is that we only considered the BNHM as the data‐generating process in our simulation study; hence, we did not investigate the robustness of the BNHM under model misspecification. Also, the design of the simulation study constitutes a model misspecification problem for the MH method, which is a fixed‐effect model, and for the BBM, which assumes a different underlying data‐generating process. Moreover, we did not consider other parametrizations of the BNHM as described, eg, in Jackson et al.11 Lastly, one may find it too restrictive to use a normal prior for θ, as we do in our proposed model; it may be worth exploring alternatives such as Cauchy or log‐F distributions17, 20 for penalization.

The proposed approach is not restricted to the BNHM; similar approaches may analogously be defined in other models, eg, a Poisson‐normal hierarchical model. However, a crucial point is that the treatment effect parameter is explicitly parameterized in the model, so that it can directly be penalized via the prior specification. Hence, so‐called contrast‐based models25 (in which relative treatment effects are assumed to be exchangeable across trials) are suitable for this purpose, unlike arm‐based models. Note that this is also related to the inclusion of baseline risks as fixed effects with vague priors. This was deliberate, as we consider it closest to the idea of stratifying the analysis by study, a common feature of meta‐analyses regardless of whether fixed‐effect or random‐effects models are used. Furthermore, contrast‐based models such as the BNHM preserve the randomization, in contrast to arm‐based models, as explained in Dias and Ades.25

The BNHM can be extended to a network meta‐analysis model,49 which is desirable if there are multiple treatments and/or multiarm trials in the dataset. Even if the dataset in a network meta‐analysis consists of many studies overall, some of the treatment effects may still be informed by few studies only. Thus, the use of WIPs for treatment effect parameters in the context of network meta‐analysis with rare events can be very helpful. Different distributions as WIPs for θ, different parametrizations of the BNHM, or different data models can be implemented in Stan or with MCMC methods in general. Although our package MetaStan is currently restricted to the BNHM and the BBM for pairwise meta‐analysis, we plan to extend it to conduct meta‐analysis and network meta‐analysis with flexible data model and prior options in the future.

ACKNOWLEDGMENT

We thank Leonhard Held who contributed valuable comments and pointed us to several important references.

 

    HIGHLIGHTS

    What is already known: Standard random‐effects meta‐analysis methods are not suitable for meta‐analysis of few studies with rare events.

    What is new: To deal with data sparsity present in the random‐effects meta‐analysis of few studies with rare events, we suggest the use of weakly informative priors as penalization for the treatment effect parameter.

    Potential impact for RSM readers outside the authors' field: To make it more accessible to meta‐analysts, a publicly available R package, MetaStan, is developed for fitting Bayesian meta‐analysis models using weakly informative priors.

       

     

    CONFLICT OF INTEREST

    The authors reported no conflicts of interest.

    APPENDIX A: HOW TO USE THE METASTAN R PACKAGE

     

    The stable version of MetaStan is available on CRAN (https://CRAN.R-project.org/package=MetaStan) and can be installed as follows:
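    For example:

        install.packages("MetaStan")
        library("MetaStan")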

     The example described in the text (Crins dataset) is available in the package, and it can be loaded as follows:
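    For example (using the base R data() function):

        data("dat.Crins2014", package = "MetaStan")
        head(dat.Crins2014)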


    Additional information can be obtained by typing ?dat.Crins2014 (for any dataset and function in the package).

    meta_stan is the main fitting function of this package. The main computations are executed via the rstan package's sampling function. We can fit the binomial‐normal hierarchical model using a WIP for the treatment effect as follows:
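    (A sketch; as in Section 6, the data-argument names below are assumptions that may differ between package versions; see ?meta_stan.)

        ## PTLD outcome from Table 1 (Schuller, Ganschow, Spada)
        fit_wip_ptld <- meta_stan(rctrl = c(0, 0, 1), nctrl = c(12, 54, 36),
                                  rtrt  = c(0, 1, 1), ntrt  = c(18, 54, 36),
                                  delta = 250)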

     Convergence diagnostics and the results can be very conveniently obtained using the shinystan package as follows:
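    (A sketch, assuming the fitted object stores the underlying stanfit in a component named fit; this component name is an assumption.)

        library("shinystan")
        launch_shinystan(fit_wip_ptld$fit)   # interactive convergence diagnostics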


     Traceplots for the estimated parameters θ and τ including burn‐in are shown in Figures A1 and A2 for death and PTLD outcomes, respectively.

    Figure A1. Traceplots for the estimated parameters θ and τ, including burn‐in, for the death outcome.
    Figure A2. Traceplots for the estimated parameters θ and τ, including burn‐in, for the posttransplant lymphoproliferative disease (PTLD) outcome.

    Lastly, the posterior summary statistics can be obtained using the following command:
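    (A sketch; the availability of a print method and the name of the stanfit component are assumptions, see the package documentation.)

        print(fit_wip_ptld)
        ## or inspect the underlying stanfit directly, e.g.,
        ## print(fit_wip_ptld$fit, pars = c("theta", "tau"))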


    APPENDIX B: R CODE TO IMPLEMENT BNHM USING THE MLE AND THE MH METHODS

    Firstly, the BNHM using the MLE:
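    The original listing is not reproduced here; the following is a self-contained sketch that fits the BNHM by (adaptive Gauss-Hermite) maximum likelihood with lme4, using the death data of Table 1 in long format (variable names are ours):

        library("lme4")
        ## long format: one row per study arm, treat coded +0.5 (experimental) / -0.5 (control)
        dat_long <- data.frame(
          study  = factor(rep(c("Heffron", "Ganschow", "Spada", "Gras"), each = 2)),
          treat  = rep(c(-0.5, 0.5), times = 4),
          events = c(3, 4, 3, 1, 3, 4, 3, 2),
          total  = c(20, 61, 54, 54, 36, 36, 34, 50))
        ## fixed study intercepts (baseline risks), fixed mean treatment effect theta,
        ## random treatment-by-study effect with standard deviation tau
        fit_mle <- glmer(cbind(events, total - events) ~ 0 + study + treat + (0 + treat | study),
                         family = binomial(link = "logit"), data = dat_long,
                         nAGQ = 7)   # adaptive Gauss-Hermite quadrature
        summary(fit_mle)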


     Secondly, the MH method:
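    A sketch using metafor's rma.mh() on the death data of Table 1 (ai/n1i: experimental events and sample sizes, ci/n2i: control):

        library("metafor")
        ## no continuity correction (add = 0, to = "none"); double-zero studies
        ## (none in the death data) are dropped by metafor's defaults
        fit_mh <- rma.mh(ai = c(4, 1, 4, 2), n1i = c(61, 54, 36, 50),
                         ci = c(3, 3, 3, 3), n2i = c(20, 54, 36, 34),
                         measure = "OR", add = 0, to = "none")
        summary(fit_mh)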


     

    APPENDIX C: R CODE TO IMPLEMENT THE BBM METHOD
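    The original listing is not reproduced here. The following is a hypothetical sketch only: it assumes that the beta-binomial model is selected via a model argument of meta_stan, but the actual argument name and value may differ; consult ?meta_stan for the exact interface.

        library("MetaStan")
        ## PTLD outcome from Table 1; data-argument names and model = "BBM" are assumptions
        fit_bbm <- meta_stan(rctrl = c(0, 0, 1), nctrl = c(12, 54, 36),
                             rtrt  = c(0, 1, 1), ntrt  = c(18, 54, 36),
                             model = "BBM")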


    APPENDIX D: ADDITIONAL SIMULATION RESULTS

    We also conducted simulations using the same settings as described in Section 5 under the BNHM, but using higher baseline risk probabilities; specifically, baseline risks are taken uniformly between 0.05 and 0.2 on the probability scale. Results are illustrated in Figure D1 (analogous to Figure 4).

    Figure D1. Simulations with high baseline risks: the bias for the mean treatment effect θ, coverage probabilities, and log mean length of the interval estimates for θ obtained by three methods (maximum likelihood estimate [MLE], Vague, and weakly informative prior [WIP]).
