Synthesis of evidence from zero‐events studies: A comparison of one‐stage framework methods

In evidence synthesis, dealing with zero‐events studies is an important and complicated task that has generated broad discussion. Numerous methods provide valid solutions to synthesizing data from studies with zero‐events, either based on a frequentist or a Bayesian framework. Among frequentist frameworks, the one‐stage methods have their unique advantages to deal with zero‐events studies, especially for double‐arm‐zero‐events. In this article, we give a concise overview of the one‐stage frequentist methods. We conducted simulation studies to compare the statistical properties of these methods to the two‐stage frequentist method (continuity correction) for meta‐analysis with zero‐events studies when double‐zero‐events studies were included. Our simulation studies demonstrated that the generalized estimating equation with unstructured correlation and beta‐binomial method had the best performance among the one‐stage methods. The random intercepts generalized linear mixed model showed good performance in the absence of obvious between‐study variance. Our results also showed that the continuity correction with inverse‐variance heterogeneous (IVhet) analytic model based on the two‐stage framework had good performance when the between‐study variance was obvious and the group size was balanced for included studies. In summary, the one‐stage framework has unique advantages to deal with studies with zero events and is not susceptive to group size ratio. It should be considered in future meta‐analyses whenever possible.

is to remove them from the meta-analysis or, in rare situations, to use a continuity correction to produce rough estimates.
• Methods based on the one-stage framework have a unique advantage to deal with studies with zero events in both arms.
What is new • Among many one-stage methods that can be used to deal with studies with zero events in both arms, the generalized estimating equation (GEE) with unstructured correlation and beta-binomial (BB) method generally have the best performance. • The one-stage meta-analytic methods are not susceptive to group size ratio; even for extremely unbalanced studies, they generally perform well.
Potential impact for Research Synthesis Methods readers outside the authors' field • Our study provide evidence that current routine methods for dealing with studies with no events in both arms in meta-analysis may not be the optimal choice; at least, the one-stage methods should be considered. • Whenever possible, systematic review authors should consider the GEE method with unstructured correlation to deal with zero-events problems when there is small between-study variance in a meta-analysis, and the BB method when there is considerable between-study variance.

| INTRODUCTION
In evidence synthesis, dealing with zero-events studies is an important and complicated task and has generated broad discussion. [1][2][3][4][5][6][7][8] Zero-events occur when the risk of events is low and/or the sample size is small, frequently seen with safety outcomes. For studies with zero-events in a single arm, there is a unanimous agreement that such studies should be incorporated into meta-analysis, and many well-established methods (e.g., Peto's odds ratio, continuity correction) are available to achieve this. [9][10][11] However, for studies with zeroevents in both arms (known as double-arm-zero-events studies), controversy remains whether such studies should be incorporated into meta-analysis or not.
Researchers who tend to discard double-arm-zeroevents studies from a meta-analysis may argue that such studies are non-informative and they add nothing to the pooled effect sizes of relative measures, such as odds ratio (OR) and risk ratio (RR). 12 Although this assumption is supported by the theory of conditional likelihood, it is problematic and has been criticized by the following three arguments.
1. Meta-analysis is a process to synthesize all available evidence for studies on the same topic for a comprehensive inference, and double-arm-zero-events studies are an important source of evidence.
2. Double-arm-zero-events studies are not necessarily "non-informative." For example, for studies with balanced sample sizes, zero-events in both arms indicate there are comparable treatment effects; excluding them may lead to under-or over-estimation of the effects. 4 Xu et al have shown that excluding studies with no events in both arms can result in the change of the direction of ORs, confidence intervals (CIs), and significance of p-values. 13 3. From the statistical point of view, whether doublearm-zero-events studies contribute information for the inference depends on the methods (along with its assumptions) and the effect measure chosen. Based on the one-stage methods, the effect and variance of such studies could be defined, and they contribute information for the inference. On this basis, we believe that these studies should not be simply discarded.
There are now many methods that provide valid solutions to synthesizing data from double-arm-zero-events studies. They are well-suited for analysis of various effect measures, including both relative and absolute measures. [14][15][16][17][18][19][20][21] These advances make it possible to solve the problem of zero-events in meta-analysis, and they improve decision-making. Kuss 4 has summarized 11 methods for meta-analysis to include information from studies with no events in both arms without a continuity correction. These methods are either based on a two-stage frequentist framework, for example, the arcsine difference 1 and MH risk difference, 22 or a one-stage frequentist framework, for example, the generalized linear mixed model (GLMM) 23,24 and beta-binomial model (BB). 5,25,26 In addition, Bayesian methods were also proposed to deal with studies with no events. 27,28 The advantages of the Bayesian methods have been widely recognized and documented. With proper prior distributions for the event risks and sample distributions for the events, Bayesian methods are expected to produce satisfactory effect estimates. 27,28 However, the major limitation of the Bayesian methods is that the choice of prior distributions may greatly influence the results (at least for those with double-arm-zero-events studies), resulting in uncertainty of the inference. 13,29 The one-stage frequentist framework methods do not suffer from such a problem because there is no need to borrow strength from prior information or post hoc continuity corrections. Moreover, the one-stage methods are generally easier to implement by a wide range of statistical software than the Bayesian methods.
Although the one-stage methods present their unique advantages to deal with zero-events studies, a very small proportion of meta-analyses utilized this group of methods in practice. As the most important methods under the one-stage framework, the GLMM 23,24,29-33 and BB 5,25,26 for meta-analysis have been widely discussed. The generalized estimating equation (GEE) is another one-stage framework method, 34,35 but it is seldom described in the meta-analysis literature. The lack of global recognition of one-stage methods results in unfamiliarity among most systematic reviewers with these methods. Kuss 4 has investigated the statistical properties of these methods and confirmed the superiority of them for meta-analysis with zero-events studies, while there are two additional issues to be addressed: (1) whether the total event count of a meta-analysis impacts the performance of one-stage methods? (2) whether the sample size ratio impacts the performance of one-stage methods? In this article, we give an overview of the one-stage frequentist methods for meta-analysis with zero-events studies, and compare these methods with continuity correction for the two-stage method.
2 | METHODS 2.1 | One-stage versus two-stage methods for meta-analysis To begin with, we must distinguish the concepts of "onestage" methods and "two-stage" methods. The "twostage" methods refer to the standard meta-analysis methods that contain a two-stage procedure. In the first stage, the study-specific effect sizes (and standard errors) are obtained or estimated from the included studies, and then these effect sizes are combined in the second stage through a certain weighting scheme (e.g., inverse variance). 36,37 For the "one-stage" methods, there is no need to obtain the study-specific effect sizes; instead, each study is treated as a stratum or cluster, and the overall effect size is directly obtained by either a generalized linear model or population-averaged method. 38,39 This article focuses on the frequentist framework and does not consider the one-stage Bayesian model. The "one-stage" methods include the "treat-as-one-trial" methods, the GLMMs, the stratified exact (logistic or Poisson) regression model (SERM), the GEE methods, and the BB model. The first three belong to a class of generalized linear models, while the GEE is a population-averaged method. The "treat-as-one-trial" method has been criticized as it may lead to Simpson's paradox as weights are not assigned to the studies, and it is considered not an appropriate solution. 40 Of note, both "one-stage" and "two-stage" methods are feasible for aggregated and individual participant data. We focus on the application of these methods to aggregated data because such data are easily accessible and commonly seen in practice.

| The one-stage framework: Generalized linear mixed model
GLMMs are the most commonly used one-stage method in meta-analysis and are feasible to a wide range such as meta-analysis of prevalence, 41 network meta-analysis, 33 diagnostic meta-analysis, 5 dose-response meta-analysis, 42 and so on. GLMMs provide a good solution to modeling individual-specific correlation and between-study variance by setting various random effects. The standard random effects GLMMs for meta-analysis include the random slope GLMM and random intercept and slope GLMMs. We use i (1, 2,…, k) to denote the included studies and j to denote treatment status (j = 0 for control and j = 1 for treatment). Suppose the study-specific true event risk is π ij . A random intercept and slope GLMM has the following form as described as model 3 by Jackson et al. 30 logit where θ i $ N θ, τ 2 ð Þ, γ i $ γ, σ 2 ð Þ, τ 2 and σ 2 refer to between-study variance on treatment effects and study effects. The model assumes that the true study effects (i.e., baseline risks) and true study-specific treatment effects differ across studies (i.e., random-effect model). If we assume that true study effects are the same, then the above model degrades to a random slope GLMM: If we assume treatment effects are the same (i.e., fixed-effect model) but the study effects differ, that is, replacing θ i with θ, then the model degrades to a random intercept GLMM. By centering the treatment status for Equation (1), say, replacing j with j À 0:5, we can establish a modified random intercept and slope GLMM (i.e., model 5 in Jackson et al); similarly, by replacing j with j À 0:5 for Equation (2), we can establish a modified random slope GLMM, say, model 4 in Jackson et al. 30,33 The joint likelihood function could be established when no random terms are involved, and we can estimate the parameters by maximizing the log-likelihood. Whenever there are random terms, the likelihood function has no closed-form, and some optimization methods should be used for the estimation. These include the Laplace approximation, the penalized quasilikelihood, the (adaptive) Gauss-Hermite quadrature, iteratively weighted least squares, and the expectationmaximization algorithm. 43 A large-scale simulation study has found that Laplace approximation performs better than the penalized quasi-likelihood and adaptive Gauss-Hermite quadrature for GLMMs. 32 GLMMs have been shown to have good properties for meta-analysis of rare events. [30][31][32][33] The major advantage of GLMMs is their ability to incorporate information from double-arm-zero-events studies without any prior information or post hoc correction. However, GLMMs cannot deal with the situation when the total event count of the meta-analysis is zero in either of the arm or both arms. In addition, when the total event in either arm is insufficient, the GLMMs may have poor performance. 30 Ju et al 32 suggested that at least 10 total events per arm should be contained to ensure the robustness of the results when utilizing GLMMs for meta-analysis.

| The one-stage framework: stratified exact regression model
The stratified exact regression model (SERM) provides a solution to analyzing single-arm-zero-events of binary outcomes as it does not rely on an asymptotic distribution; instead, it produces estimates through the exact distribution. 4 This method is similar to the random intercept GLMM; the difference is that the latter uses the likelihood estimation by an asymptotic distribution. There are two major SERMs for meta-analysis in terms of the link functions, that is, logit or log, which are referred to as stratified exact logistic regression and stratified exact Poisson regression, respectively. A stratified logistic regression model can be written as: Here, γ i 's are study-specific study effects, and θ is the regression coefficient indicating the treatment effect, that is, exp θ ð Þ represents the OR for the logistic model or the RR for the Poisson model. All studies are assumed to share a common effect. The SERM uses the permutation distribution of the sufficient statistics of γ i and θ to obtain the exact probability of each permutation for parameter estimation. 44 The detailed algorithm of the exact regression is provided by Tritchler. 45 The advantage of SERM is that it can deal with single-arm zero events based on exact estimation and thus does not suffer from the problem of separation. 46 It should be noted that there would be no finite estimates when a conditional likelihood method is employed for strata with zero events, so the conditional likelihood method is unsuitable here. Another advantage is that it considers the variation of the study effects across studies that allows more "freedom" for the variance. 33 The major limitation is that this method is not sensible when dealing with double-arm-zero-events studies. 4 This can be easily illustrated by considering the models that include and exclude double-arm-zero-events studies with the same data set, which will give exactly the same results. Another limitation is that the exact estimation process is computationally expensive when the number of included studies is large.

| The one-stage framework: Generalized estimating equation
The GEE combined with robust variance estimators is an effective method to solve the problem of correlated data within studies. It was proposed by Liang and Zeger, 34,35 and is an extension of generalized linear model for the estimation equation of likelihood function, which establishes a relationship between response means and the linear combination of potential predictors. 47 The GEE focuses on the population mean estimates and is considered a marginal approach. 48,49 A working correlation matrix R a ð Þ is used to define the GEE and obtain an efficient estimate of the variance. 34 The GEE is better described as a semiparametric estimation method, rather than a statistical model. We can specify the GEEs for meta-analysis as: The variance of subject λ has the form of where A λ is the usual estimate of the variance based on the information matrix and is obtained as the negative expected value of the second derivative of the log-likelihood. 47 Moreover, ψ is the dispersion parameter and can be readily estimated. 34,47 The working correlation R a ð Þ indicates the correlation of the probabilities of an event in treatment and control groups within each study (cluster). Liang et al 34 introduced seven types of working correlations, three of which are suitable for meta-analysis, that is, independent structure, exchangeable correlation, and unstructured correlation. The independent one assumes that there is no correlation between treatment and control arms; the exchangeable structure assumes an identical correlation between all effects within studies; and an unstructured one imposes no structure to the correlation matrix.
Similar to the GLMMs, the GEEs have a major advantage in dealing with double-arm-zero-events studies. Again, the GEEs cannot deal with the situation when the total event counts are zero in either of the arm or both arms across included studies. When the total event in either arm is insufficient, the estimation would be greatly biased.

| The one-stage framework: Beta-binomial model
The BB model was first introduced by Pearson 50 to deal with overdispersion for binomial data, and it has been widely discussed for its application to meta-analyses for zero-events studies in recent years. 4,25,26,[51][52][53][54] It uses a mixture of binomial distribution of the number of events (r) with a beta distribution for the true event risk π: r $ Bin(n, π) and π $ Beta(a, b), where a > 0, b > 0, and they are unknown parameters. Through the reparameterization for a and b,say, we can obtain the expectation and variance: E(r) = nμ and Var(r Here, ϕ is the overdispersion parameter. 53 This reparameterization allows us to use the maximum likelihood method to obtain the unknown parameters a and b: 55 Some more sophisticated parameter estimation methods have been documented elsewhere. 4,5,51 A feasible method to establish a one-stage BB model and account for the variation of treatment effects (randomeffect model) across studies is to treat study as a timeseries variable t: The advantage of this procedure is that it is expected to have better convergence than the multilevel random effects models.
2.6 | The one-stage framework: Bivariate model Several GLMMs can be applied to establish the bivariate model. A bivariate GLMM structure is 24,30 : Here, σ 2 0 and σ 2 1 are the variances according to the event risks on the logit scale, logit Þ , in the control and treatment arms, respectively. The ρ is the correlation coefficient between Þ . The advantage of the bivariate model is that it directly models the potential correlation of the event probabilities in two arms. When there are correlations, a bivariate model is expected to produce better point and variance estimates. However, due to the additional nuisance parameters, the estimation would be more complex and computationally expensive.

| The two-stage framework: Continuity correction
Within the two-stage framework, zero-events studies are generally dealt with Peto's OR method, Mantel-Haenszel method, or the continuity correction when the relative risk is utilized. When the absolute risk is utilized, zero-events studies are dealt with by the Mantel-Haenszel risk difference or the double-arcsine difference. 1 We focus on the former situation where the relative risk is used to measure the effect. When facing studies with no events in both arms, the only frequentist method is the continuity correction. Sweeting et al have documented several correction methods and discussed the performance of them. 2 The most widely used correction method is to add 0.5 to each cell of the 2 Â 2 table, which is primarily considered in this article.
We consider the increasingly popular tool, the inverse variance heterogeneous model, 56,57 to synthesis the study-specific effects. This model discards the classic random effects assumption; instead, it treats the between-study variance as an overdispersion correction. 56,57 Simulation studies have demonstrated that the inverse variance heterogeneous (IVhet) model outperformed other two-stage methods. 56,57 This model can be implemented using the admetan module 58 in Stata. The pooled estimate and variance by this model are given by: Here, s i 2 is the study-specific variance of study i in the two-stage framework, w i is the study-specific weight, and ψ i is the study-specific overdispersion correction which can be calculated as s i 2 þτ 2 s i 2 . The advantage of this two-stage method compared with the classic random effects methods is that it is expected to have a better error estimation and keep a better coverage. 56,57

| REAL-LIFE EXAMPLE
The COVID-19 pandemic in 2020 hugely impacted our society and threatened human lives. Physical prevention (e.g., physical distance, face mask) is currently the most important way to prevent person-to-person transmission of COVID-19. Chu et al 59 recently conducted a meta-analysis on the effect of physical prevention for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) and COVID-19. In this meta-analysis, they collected 44 comparative studies, including 29 that dealt with the preventive effect of face masks, which were used as our example here (Table S2). Among the 29 studies, 6 were single-arm-zero-events studies, and 6 were double-arm-zero-events studies, while no arms' total counts were zero. Due to the involvement of double-arm-zero-events studies, the SERM method could not be used. Therefore, we employed the BB, GLMMs, GEEs, and IVhet methods for the meta-analysis. Table 1 presents the meta-analysis results. There was an obvious preventive effect of face masks on reducing the risk (ORs ranging from 0.16 to 0.53) of person-toperson transmission of COVID-19 and SARS-CoV-2, consistent with the original findings by Chu et al. 59 Except for the GEE with independent correlation and the bivariate GLMM with unstructured correlation, the effects were similar across the methods. When excluding six double-arm-zero-events studies, there were some changes in both ORs, CIs, and p-values.  Table 2 presents the simulation settings. We conducted three sets of simulations to investigate the statistical properties of the above methods, and we chose the OR as the effect estimator. As mentioned earlier, the SERM can only be used to deal with single-arm-zero-events studies but is not feasible for double-arm-zero-events studies, so we did not fit the SERM method here. We generated grouped data for each meta-analysis based on the "PCRandom" method 60 and specified the event risk in the control group as a random variable. The event risk in the control group, the sample size in the control group, the (log) true treatment effect, and the between-study variance were first defined, and these parameters were used to estimate the event risk and sample size of the treatment group. Then the simulated meta-analysis with grouped data was generated.
In the first set of simulations, we considered an event risk in the control arm ranging from 0.05 to 0.1 with a uniform distribution, that is, π 0 $ U 0:05,0:1 ð Þ ; in the second set of simulations, we considered π 0 $ U 0:01, 0:05 ð Þ ; and in the third set of simulations, π 0 $ U 0, 0:01 ð Þ. These settings were based on the empirical data (Supporting Information) from the meta-analyses with double-arm-zeroevents studies in the Cochrane Database of Systematic Reviews. Among those meta-analyses, in the control arm, the sample sizes were fitted well by a log-normal distribution with the mean log value of 3.3537, the median event risk of 0 (interquartile range: 0-0.09), and a mean event risk of 0.07. 13 The sample size in the control arm (n 0 ) was then set as logn 0 $ N 3:3537, 0:9992 ð Þ . The sample size in the treatment arm was derived by logn 0 and the empirical sample size ratio: logn 1 ¼ exp logn 0 ð ÞÂratio. In the empirical data, the ratio ranged from 0.12 (1/0.12 = 8.33) to 6. We used the 5% to 95% quartiles of the empirical sample ratio to generate ratio $ U 0:84, 2:04 ð Þ . In addition, since the continuity correction has been criticized for its large bias when the sample ratio is strongly unbalanced, 2 we considered an additional scenario with ratio $ U 2:04, 8:33 ð Þto examine if the one-stage framework might perform better in this situation.
In the first set of simulations, the majority of the meta-analyses had the total event count in each arm less than 10; the median event count in each study's control arm was exp(3.3537) Â 0.01/2. In the second set of simulations, about 50% of the meta-analyses had the total event count in each arm more than 10; the median event count in each study's control arm was exp(3.3537) Â (0.05 + 0.01)/2. In the third set of simulations, most of the meta-analyses had the total event count in each arm more than 10; the median event count in each study's control arm was exp(3.3537) Â (0.1 + 0.05)/2. For the number of studies included k, we took the first to third quartiles of the empirical data; again, we assigned a discrete uniform distribution to k in all simulations, that is, k $ U 4, 10 f g with a step width of 1. For the treatment effect θ, two values were considered: exp(θ) = 1 and 2, representing scenarios without true treatment effect (log1 ¼ 0) and with true treatment effect (log2), respectively. For each of the three sets of simulations, four equally-spaced monotonically increasing values of τ 0.2 (small), 0.4 (moderate), 0.6 (substantial), and 0.8 (large) were set to generate small to large between-study variances. 32 Therefore, each set of simulations contained 2 Â 4 = 8 scenarios with different treatment effects and between-study variances.
The following eight methods were compared: (1) the two-stage method IVhet with continuity correction (adding 0.5); (2) the BB model; (3) the GEE method with independent correlations; (4) the GEE method with an unstructured correlation matrix; (5) the random intercept GLMM; (6) the random slope GLMM; (7) the random intercept and slope GLMM; and (8) the bivariate GLMM model. We did not consider the reparameterization of GLMM models as in our previous study because we found that the standard GLMMs performed better than the former. 32 For each scenario, we first simulated 5000 meta-analyses, and then excluded the meta-analyses with zero total event counts in a single arm or in both arms. The percentage bias (PB, defined asθ À θ À Á =θ), mean squared error (MSE, defined as Varθ À Á þθ À θ À Á 2 ), coverage, and width of 95% CI were calculated; they were used to quantify the statistical properties of the candidate methods. Here,θ represents the estimated effects by these candidate methods. It should be noted that for the scenario θ ¼ 0, the PB is replaced byθ À θ. The PB and MSE reflect the performance of point estimates and the width of CI reflects that of interval estimates. Therefore, a method was considered to have better performance if it had a lower PB, lower MSE, and narrower CI while the coverage was close to the nominal level 95%. All simulations were performed through the Stata/SE 14.0 (Stata, College Station, TX) software. The program for the simulations is available in the Supporting Information.  Uniform (4,10) showed poor convergence under the tolerance of 0.000001. Figures 1-4 present the simulation results. In the first set of simulations (first row in Figures 1-4), where most of the meta-analysis (>98.07%) had less than 10 total events in each arm, substantial biases were observed (about À 24.14% to À 39.78% for OR = 1 and À 67.05% to À 93.72% for OR = 2). In the second set of simulations (second row in Figures 1-4), about 62-70% of the meta-analyses had at least 10 total events in both arms, and we observed much smaller biases (À 0.06 to 13.83% for OR = 1 and À 22.59 to 20.38% for OR = 2). In the third set of simulations (third row in Figure 1-4), about 96% of the meta-analyses had at least 10 total events in both arms, and the methods were almost unbiased in more homogeneous meta-analyses (τ = 0.2). As the number of total event counts increased, the estimation showed increased precisions (i.e., narrower CIs). This indicates that when the number of total events was less than 10 in each arm, none of these one-stage and two-stage methods showed good performance. For the one-stage methods, when the event risk was low, say 0 to 0.01, where most of the meta-analyses had less than 10 total events in the arms, the performance was poor with large bias, large MSE, large width of CI,  and extremely low coverage. As the event risk increased, the total events became sufficient ( ≥ 10) in each arm, the performance of one-stage methods was good. The BB model, GEE with independent correlation, GEE with unstructured correlation, and randomintercept GLMM had better performance than the other methods in terms of bias, MSE, width of CI, and coverage. This finding is similar to that by Kuss. 4 Among these methods, the GEE with unstructured correlation had the best performance when the between-study variance was not obvious (τ < 0.6), while the other three also performed reasonably well. The BB model had the best performance when there was considerable between-study variance (τ ≥ 0.6).

| Simulation results
The IVhet model with the continuity correction performed well in terms of MSE, width of CI, and coverage. Generally, this method outperformed the one-stage methods when the event risk ranged from 0 to 0.01 and the sample size ratio was balanced. Under this setting, we recorded that when OR = 2, the IVhet showed moderately large bias but obviously better coverage than one-stage methods. As the event risk increased, the performance of one-stage methods became better, and the advantage of IVhet against these methods decreased. This was more obvious for the bias of all one-stage methods and for the MSE, width of CI, coverage of the BB and GEE with independent correlation. The IVhet model showed larger bias than the one-stage methods when 0.00 0.50  there was no obvious between-study variance. However, when the between-study variance was considerable (τ ≥ 0.6), it showed similar or smaller bias than the one-stage methods. The forgoing was the situation when the group size of included studies was comparable. In our additional simulations, when the ratio of group size was not comparable, the continuity correction had poorer performance than the BB model, GEE with independent correlation, GEE with unstructured correlation, and random-intercept GLMM ( Figure S2). The ratio of group size showed little impact on the performance of the one-stage methods.

| DISCUSSION
In this study, we summarized five one-stage frequentist methods for meta-analysis to deal with zero-events studies. In addition, we compared the three major one-stage methods to a two-stage method (IVhet) that uses the continuity correction to deal with studies with no events in both arms. Our simulation studies demonstrated that the one-stage methods performed poorly when the total events in each arm were insufficient. When the total events in each arm were sufficient, they generally performed well, and the GEE (in the case of low  between-study variance) and BB model (in the case of obvious between-study variance) had better properties than other methods. The random intercepts GLMM also performed well when there was no obvious betweenstudy variance. However, GLMMs seem to have serious convergence problems under the environment of the Stata program. We also found that, when the group size was balanced in a meta-analysis, the continuity correction based on the IVhet model performed well in the presence of obvious between-study variance. The continuity correction was susceptible to the group size ratio, but this was not for the one-stage methods.
Although the continuity correction based on the twostage framework showed lower MSE and CI than the one-stage methods, we should not ignore the impact of the continuity correction on the error estimation. As we observed in the three sets of simulations, error estimation tended to be better when the total event count of each arm increased. For the continuity correction, 0.5 was added to each cell for zero-events studies, meaning the total event count was increased; as a result, the estimation is expected to be more precise, leading to smaller variances. This was more obvious in the scenario of no treatment effect, because studies with zero events and  those without zero events shared the same true effect and thus were more homogeneous, and precision substantially increased. In the simulations, there was a much smaller gap between the IVhet method (with the continuity correction) and the one-stage method (without the continuity correction) in terms of the MSE as the event risk increased. The increase in the event count by post hoc corrections could be traded-off by the increase in the total event count through the increase of the event risk. The predominant advantage of the one-stage methods is their ability to synthesize zero-events studies without prior information or post hoc continuity corrections, preventing "additional noise" in estimation. 61 This feature makes them attractive for solving the problem of zero-events. However, we demonstrated several limitations that might restrict their application. In our simulations, the total event count had a large impact on the performance of both one-stage and two-stage methods. For a meta-analysis with less than 10 total events in either arm, the one-stage methods could have poor performance. In addition, as mentioned earlier, when the total event count was zero in either arm, the one-stage methods were no longer feasible for meta-analysis. This could be one of the major limitations of the one-stage framework methods.
Other limitations of the one-stage framework include that measuring the between-study variance is more complex than with the standard two-stage methods. We can use the intraclass correlation coefficients (ICC) to reflect the proportion of between-study variance to the total variance as a proxy for I-squared for GLMMs, 62 but this is difficult for GEEs. Second, likelihood-based methods generally face the problem of insufficient convergence, especially when random terms are added into the model. Many factors can influence the convergence, including the effect estimates (OR or RD), the setting of the tolerance for determining convergence, iteration times, the choice of model (e.g., adding random terms), optimization methods. For example, setting a less strict tolerance (e.g., 0.001) can increase the convergence rate. As shown in Table S1, the convergence rate was generally low, especially for GLMMs, while this was not the case when using the R program that we recorded previously. 32 This is likely due to the different default settings of parameters for model estimation between software. A comprehensive summary of the different settings in these programs in the future would be useful. Third, it is difficult for the one-stage methods to test for the potential publication bias 63 ; the conventional methods for publication bias depend on study-specific effects and cannot be directly applied to the one-stage framework.
In this study, the one-stage Bayesian methods were not considered. Bayesian hierarchical models could be readily established based on the GLMMs. As mentioned earlier, the setting of prior distribution largely impacts the final results. In the case of rare events, the prior information could predominate the weights, and the sample information might have little contribution to the meta-results. In addition, a random effects Bayesian model inevitably also has the same problem of frequentist GLMMs, due to the random effects assumption. Further investigation and improvement on the Bayesian methods are worthwhile in this area.
In summary, the one-stage framework has a unique advantage to deal with studies with zero events and could be considered in future meta-analyses whenever possible. The continuity correction based on the IVhet model is a good alternative when the group size is balanced. Based on our findings, researchers are advised to choose the GEE method with unstructured correlation to deal with zero-events problems when there is small between-study variance in a meta-analysis, and choose the BB method when there is considerable between-study variance.