Nonparametric estimation of the random effects distribution for the risk or rate ratio in rare events meta‐analysis with the arm‐based and contrast‐based approaches

Rare events are events which occur with low frequencies. These often arise in clinical trials or cohort studies where the data are arranged in binary contingency tables. In this article, we investigate the estimation of effect heterogeneity for the risk‐ratio parameter in meta‐analysis of rare events studies through two likelihood‐based nonparametric mixture approaches: an arm‐based and a contrast‐based model. Maximum likelihood estimation is achieved using the EM algorithm. Special attention is given to the choice of initial values. Inspired by the classification likelihood, a strategy is implemented which repeatedly uses random allocation of the studies to the mixture components as the choice of initial values. The likelihoods under the contrast‐based and arm‐based approaches are compared and differences are highlighted. We use simulations to assess the performance of these two methods. Under the design of sampling studies with nested treatment groups, the results show that the nonparametric mixture model based on the contrast‐based approach is more appropriate in terms of model selection criteria such as AIC and BIC. Under the arm‐based design, the arm‐based model performs well, although in some cases it is also outperformed by the contrast‐based model. Comparisons of the estimators are provided in terms of bias and mean squared error. Also included in the comparison are the mixed Poisson regression model and the classical DerSimonian‐Laird model (using the Mantel‐Haenszel estimator for the common effect). In the simulations, the contrast‐based estimator of effect heterogeneity appears to behave better than the compared methods, although differences become negligible for large within‐study sample sizes. We illustrate the methodologies using several meta‐analytic data sets in medicine.


INTRODUCTION
Meta-analysis has become a powerful statistical tool to analyze and integrate the results of several independent studies on the same research question to obtain novel and more conclusive empirical evidence. This approach is widely used in many areas of science, especially in medicine and epidemiology or psychology and social sciences. In essence, a classical meta-analysis requires a point estimate of the effect of treatment, or risk factor for disease, from each individual study, along with a measure of the precision of that estimate. A summary estimate can be obtained with the so-called inverse variance-weighted average (IVW) method or two-stage meta-analysis. 1,2 The effect estimate in a meta-analysis may be the mean, mean difference, relative risk, or odds ratio, depending on the type of outcome (continuous or count data). In this article, we concentrate on meta-analysis for dichotomous outcomes. Let $X_{ij}$ be a random count variable, representing the number of events in study $i$ and group $j$, for $i = 1, 2, \ldots, k$ and $j = 0, 1$. Here, $k$ is the number of studies available in a meta-analysis, and $j = 1$ and $j = 0$ denote the treatment and comparison (control) groups, respectively. Suppose that the risk ratio is the effect size of interest and is the parameter to be estimated. If the conventional IVW meta-analysis is used, the risk ratio in each trial is first estimated as $\mathrm{rr}_i = (X_{i1} n_{i0})/(X_{i0} n_{i1})$, where $n_{ij}$ is the sample size in study $i$ and group $j$. The overall log risk ratio from the $k$ studies is then computed in the second stage as $\log \mathrm{rr}_{IVW} = \sum_{i=1}^{k} w_i \log \mathrm{rr}_i / \sum_{i=1}^{k} w_i$, where $w_i$ denotes the weight of the effect size in study $i$ and is obtained as the inverse variance of $\log \mathrm{rr}_i$. The summary risk ratio is then obtained by taking the anti-log. Unfortunately, this method breaks down if no or few events were counted in a study. For instance, if a study has zero events in an arm, $\log \mathrm{rr}_i$ is undefined and $\mathrm{rr}_{IVW}$ cannot be computed.
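The two-stage computation described above can be sketched in a few lines of Python. This is only an illustration: the function name `ivw_log_risk_ratio` is hypothetical, and the within-study variance $1/X_{i1} - 1/n_{i1} + 1/X_{i0} - 1/n_{i0}$ is the usual large-sample (delta-method) approximation, which is an assumption here since the text does not spell out the variance formula.

```python
import math

def ivw_log_risk_ratio(x1, n1, x0, n0):
    """Pooled log risk ratio by inverse-variance weighting (common-effect form).

    x1, n1: event counts and sample sizes in the treatment arms (j = 1)
    x0, n0: event counts and sample sizes in the control arms (j = 0)
    Raises ValueError when an arm has zero events, because log rr_i
    is then undefined (the limitation noted in the text).
    """
    num = 0.0
    den = 0.0
    for a, na, c, nc in zip(x1, n1, x0, n0):
        if a == 0 or c == 0:
            raise ValueError("zero events in an arm: log risk ratio undefined")
        log_rr = math.log((a * nc) / (c * na))
        # Assumed delta-method variance of log rr_i.
        var = 1.0 / a - 1.0 / na + 1.0 / c - 1.0 / nc
        w = 1.0 / var
        num += w * log_rr
        den += w
    return num / den

# Two identical studies, each with risk ratio 0.5.
pooled = ivw_log_risk_ratio([5, 5], [100, 100], [10, 10], [100, 100])
print(math.exp(pooled))
```

A study with a zero count in either arm makes the function raise an error, which is exactly the situation that motivates the one-stage Poisson approaches discussed next.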
A rare event is a situation where a small number of events, or none at all, is observed in a trial. This can happen in one or both treatment arms of the 2 × 2 contingency tables. As investigated in several papers, bias in estimation in rare events meta-analysis can occur when excluding zero-studies or including zero-studies with a continuity correction. 4,5 Furthermore, in the classical approach the distribution of effect measures is always assumed to be at least approximately normal. However, this assumption is inappropriate for the rare events situation, where small counts, for example 0, 1, or 2, often occur. To address this issue, one may use a popular method suggested by Mantel and Haenszel. 6 The Mantel-Haenszel method is not sensitive to zero counts unless all studies have no events in one arm. However, it does not allow extensions to include covariate information, nor does it deal easily with effect heterogeneity. Alternatively, a meta-analytic one-stage approach using regression analysis with the log-linear model, also called the Poisson regression model, can be applied. According to Böhning et al, 7 this method has several advantages: it can be used in meta-analysis with zero-count studies without excluding them, and there is no need to add a continuity correction to each count, as is often done in the two-stage approach. Subsequently, Böhning et al 8 and Holling et al 9 suggested nonparametric estimation of effect heterogeneity using a discrete mixture model. We emphasize that the nonparametric aspect refers to the random effects distribution whereas, conditional upon the random effect, the count outcome is still assumed to follow a Poisson regression model. This method leaves the random effects distribution of the effect measure unspecified (instead of assuming a bivariate normal distribution). Nonparametric mixture likelihood modeling is used on the basis of maximum likelihood estimation. This can be done in two ways. In the arm-based approach, which is used to estimate the intercept and slope parameters in the model, different study arms of the same study may belong to different mixture components. In contrast, in the contrast-based approach, study arms of the same study always belong to the same mixture component. In other words, the contrast-based approach respects the study membership of the respective arms and is often viewed as providing a more appropriate assessment of treatments by means of within-study comparison. To clarify the behavior of these approaches, it appears reasonable to investigate the performance of the two types of nonparametric mixture models. Hence, the main objectives of this work are:
• to formulate the arm-based and contrast-based discrete mixture models in the Poisson regression context with their associated likelihoods;
• to compare their performance by means of case studies and simulation work.
We note in passing that meta-analysis is ideally done with individual study participant data, so that the analysis can proceed on the level of study participants. This is the preferred option as, for example, participant-specific covariate information can be utilized. However, this is often not realistic, and information on outcome and covariates is available only at the study level. Here the option of contrast-based vs arm-based analysis arises. Furthermore, we note that in our setting we assume an intermediate information scenario in which a count of the number of events is available per study arm, in contrast to the more restrictive setting which provides only an effect measure, such as the log rate ratio, with a measure of uncertainty.
The article is organized as follows. Section 2 presents some background information on count data modeling. It also discusses how to use the nonparametric mixture model to estimate effect heterogeneity and describes the EM algorithm to find the maximum likelihood estimates of the model parameters. In Section 3, all approaches are illustrated using three real data sets from medicine. In Section 4, a simulation study is performed to investigate the performance of discrete mixture models using the contrast-based and arm-based approaches. We evaluate the estimators under these two methods as well as the classical DerSimonian-Laird and the mixed Poisson regression model estimators in terms of bias and mean squared error. Model selection is conducted using the log-likelihood, the Akaike information criterion (AIC), and the Bayesian information criterion (BIC). The article closes with some conclusions in Section 5.

Simple model for counting distribution
The meta-analytic settings and assumptions considered in this article are as follows. In $k$ independent studies, let $X_{ij}$ be a random count from a Poisson distribution with mean

$$E(X_{ij}) = \mu_{ij} = n_{ij}\,\theta_{ij}. \tag{1}$$

It is denoted as $X_{ij} \sim \mathrm{Po}(n_{ij}\theta_{ij})$, with $\theta_{ij} > 0$ and $n_{ij} > 0$ being the event risk parameter and sample size in study $i$ and group $j$, respectively, where $i = 1, 2, \ldots, k$ and $j = 0, 1$. The probability mass function of $X_{ij}$ is given by $\mathrm{Po}(x \mid \mu) = \exp(-\mu)\mu^{x}/x!$, for $x = 0, 1, 2, \ldots$, where $\mu$ is the mean parameter of $X_{ij}$. This probability model is a reasonable assumption for rare event situations, where low frequency counts, such as 0, 1, or 2, are often observed in one or both groups of a trial. Clearly from (1), the incidence risk of an event is $\theta_{ij} = E(X_{ij})/n_{ij}$ for study $i$ and group $j$. The risk ratio, which is the parameter of interest, becomes $\theta_{i1}/\theta_{i0}$. Note that if trials differ in observation time, $n_{ij}$ can be replaced by person-time, which combines the size of the trial and its duration. In that case, $\theta_{ij} = E(X_{ij})/n_{ij}$ is the rate parameter of interest and $\theta_{i1}/\theta_{i0}$ is the rate ratio.
In the following, we point out that the risk ratio (or rate ratio) is a measure associated with the log-linear model. 10 Here, taking logarithms on both sides of (1) and writing $\log \theta_{ij} = \alpha_i + \beta_i \times j$, we obtain the log-linear model

$$\log E(X_{ij}) = \log n_{ij} + \alpha_i + \beta_i \times j, \tag{2}$$

where $\beta_i$ is the slope of the model and represents the log-risk ratio in study $i$, $\alpha_i$ is the intercept term or log-baseline risk in study $i$, and $\log n_{ij}$ is the offset (ie, a covariate with a fixed known coefficient). If $\beta_i$, for $i = 1, 2, \ldots, k$, varies across studies, effect heterogeneity occurs. In contrast, if $\beta_i = \beta$ for all $i$, we are in the situation of homogeneity. Similarly, $\alpha_i$ can be either homogeneous or varying across studies. The question arises how baseline ($\alpha_i$) and/or effect ($\beta_i$) heterogeneity can be dealt with. To model heterogeneity of the effect measure and intercept, a standard generalized linear mixed model approach is often used. Suppose that $\alpha_i$ and $\beta_i$ are independent and each has a normal distribution, with baseline heterogeneity variance $\sigma_\alpha^2$ and effect heterogeneity variance $\tau^2$, where $\alpha_i \sim N(\alpha, \sigma_\alpha^2)$ and $\beta_i \sim N(\beta, \tau^2)$. Then, the marginal random effects likelihood becomes

$$L = \prod_{i=1}^{k} \int\!\!\int \left[\prod_{j=0}^{1} \mathrm{Po}\big(x_{ij} \mid n_{ij}\exp(\alpha_i + \beta_i \times j)\big)\right] \phi(\alpha_i \mid \alpha, \sigma_\alpha^2)\, \phi(\beta_i \mid \beta, \tau^2)\, d\alpha_i\, d\beta_i, \tag{3}$$

where $\mathrm{Po}(\cdot)$ is the Poisson density under model (2) and $\phi(\cdot)$ is the normal density of the random effects given the related parameters. We note that $\alpha_i$ and $\beta_i$ can be allowed to be correlated, in which case we would use a bivariate normal distribution in (3), allowing for an additional covariance parameter. To estimate the unknown parameters involved in (3), a generalized linear mixed model is fitted. In practice, the glmer function in the lme4 package 11 of the R programming language can be used to estimate the parameters in generalized linear mixed-effects models. We call (3) the mixed Poisson regression model, which includes a random baseline (intercept) and a random treatment effect (slope) parameter.

Nonparametric mixture model
Mixture models are a class of statistical models used to classify observed data into unobserved classes or components. They are typically used for describing heterogeneity in data that cannot be adequately captured with a single distribution, such as a normal distribution with common unknown mean and common unknown variance. 12 Hence it is more appropriate to assume a mixture, which then allows for different means and potentially different variances. When the number of components is finite, the associated mixture model is called discrete or nonparametric. Here the benefit lies in the fact that the mixing or random effects distribution is left unspecified. The nonparametric mixture model is therefore a useful statistical model and can be applied in several areas, including small count data modeling. An introduction to nonparametric mixture models and their estimation using the maximum likelihood method is given in Böhning, 13 Böhning and Seidel, 14 and Dorazio and Royle. 15 Applications of this approach are presented, for example, in papers by Doebler and Holling, 16 Holling et al, 17 and Archambeau et al. 18 In this work, nonparametric estimation is used to evaluate the risk ratio and its effect heterogeneity. We first take a closer look at the basis of the approach. According to model (2), mixing can occur in the baseline parameter $\alpha_i$ and the effect parameter $\beta_i$. Two likelihood formulations are possible. A straightforward discrete generalization of the continuous mixture model given in (3) is provided for study $i$ as

$$f(x_{i0}, x_{i1} \mid Q) = \sum_{s=1}^{S} q_s \prod_{j=0}^{1} \mathrm{Po}\big(x_{ij} \mid n_{ij}\exp(\alpha_s + \beta_s \times j)\big), \tag{4}$$

where $q_s$ is the non-negative weight of class $s$. The idea is that, given study $i$ and component $s$, the observations within the study become independent. This leads to the marginal log-likelihood

$$\log L(Q) = \sum_{i=1}^{k} \log \sum_{s=1}^{S} q_s \prod_{j=0}^{1} \mathrm{Po}\big(x_{ij} \mid n_{ij}\exp(\alpha_s + \beta_s \times j)\big), \tag{5}$$

which is the contrast-based log-likelihood. Note that the contrast-based likelihood operates on the level of studies as units of observation. Hence, studies (and not study arms) are classified into the estimated components.
We have already mentioned that (5) can be seen as a nonparametric replacement of the normal random effects distribution given in (3). It is a general result for mixture models that, whether the mixing distribution is discrete or continuous, its maximum likelihood estimate can always be taken to be a discrete mixing distribution with finitely many support points, and this estimate achieves the largest likelihood. Hence, the mixed Poisson regression model (3) will have a likelihood not larger than that of the best finite mixture model. 19 However, the discrete mixture model might have numerous classes, leading to a considerable number of parameters. Keep in mind that the discrete mixture model has $3S - 1$ parameters ($S$ intercepts, $S$ slopes, and $S - 1$ free weights), whereas the mixed Poisson regression model has only four. So, for example, with $S = 4$ classes we have 11 parameters, considerably more than the mixed Poisson regression model. Hence, likelihood considerations need to be complemented with information-based criteria such as the AIC and/or BIC. In fact, if the AIC/BIC of the discrete mixture model is similar to that of the mixed Poisson regression model, this might be taken as an indication of the suitability of the latter model. If this is not the case, the validity of the mixed Poisson regression model, in particular the part referring to the normal random effects, might be doubtful.
Alternatively, one might work with the mixture likelihood of study $i$ and arm $j$,

$$f(x_{ij} \mid Q) = \sum_{s=1}^{S} q_s\, \mathrm{Po}\big(x_{ij} \mid n_{ij}\exp(\alpha_s + \beta_s \times j)\big), \tag{6}$$

so that the associated log-likelihood becomes

$$\log L(Q) = \sum_{i=1}^{k} \sum_{j=0}^{1} \log \sum_{s=1}^{S} q_s\, \mathrm{Po}\big(x_{ij} \mid n_{ij}\exp(\alpha_s + \beta_s \times j)\big). \tag{7}$$

This is the arm-based log-likelihood, in which the unit of observation consists of the data from each arm of the respective study. Hence, it is no longer studies that are classified but rather arms of the studies. This is the form of mixture likelihood used in standard software packages. Note that for $S = 1$ the contrast-based and arm-based approaches coincide, as both likelihoods agree. The major benefit of the arm-based approach is that it easily allows the incorporation of studies with information on only one arm. The $3S - 1$ parameters of the model are typically summarized in a 3-by-$S$ matrix $Q$:

$$Q = \begin{pmatrix} \alpha_1 & \alpha_2 & \cdots & \alpha_S \\ \beta_1 & \beta_2 & \cdots & \beta_S \\ q_1 & q_2 & \cdots & q_S \end{pmatrix}.$$

We can think of $Q$ as a mixing distribution giving mass $q_s$ to the discrete points $(\alpha_s, \beta_s)$ for $s = 1, 2, \ldots, S$. Any nonparametric mixing distribution has a discrete maximum likelihood estimator, 19,20 so that (5) and (7) can be used without loss of generality. The above log-likelihoods (5) and (7) provide the respective discrete mixture log-likelihood functions with weights $q_s$, for $s = 1, 2, \ldots, S$, where $S$ is the number of components. The $q_s$ have to meet the constraints $q_s \ge 0$ and $\sum_{s=1}^{S} q_s = 1$. The value of $S$ is typically found by starting with $S = 1$ and increasing its value by one until no further significant increase in the log-likelihood is found. In practice, several criteria for selecting the number of components have been suggested. We consider the AIC and BIC as criteria to select a suitable number of components $S$. The smallest AIC or BIC value determines the model of choice.
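To make the difference between the two likelihoods concrete, the following Python sketch evaluates the contrast-based log-likelihood (5) and the arm-based log-likelihood (7) for given values of $q_s$, $\alpha_s$, and $\beta_s$. The function names and the data layout (one tuple `(x0, n0, x1, n1)` per study) are illustrative assumptions, not part of the original article, and all weights are assumed to be strictly positive so that their logarithms exist.

```python
import math

def log_poisson(x, mu):
    """Log of the Poisson probability mass function Po(x | mu), mu > 0."""
    return x * math.log(mu) - mu - math.lgamma(x + 1)

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) for a list of log-terms."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def loglik_contrast(data, q, alpha, beta):
    """Log-likelihood (5): the study is the unit, both arms share class s."""
    total = 0.0
    for x0, n0, x1, n1 in data:
        terms = [math.log(q[s])
                 + log_poisson(x0, n0 * math.exp(alpha[s]))
                 + log_poisson(x1, n1 * math.exp(alpha[s] + beta[s]))
                 for s in range(len(q))]
        total += log_sum_exp(terms)
    return total

def loglik_arm(data, q, alpha, beta):
    """Log-likelihood (7): each arm is a unit with its own class."""
    total = 0.0
    for x0, n0, x1, n1 in data:
        for x, n, j in ((x0, n0, 0), (x1, n1, 1)):
            terms = [math.log(q[s])
                     + log_poisson(x, n * math.exp(alpha[s] + beta[s] * j))
                     for s in range(len(q))]
            total += log_sum_exp(terms)
    return total

# Hypothetical toy data: (events control, n control, events treated, n treated).
toy = [(0, 50, 1, 48), (2, 60, 0, 61), (3, 40, 1, 42)]
print(loglik_contrast(toy, [1.0], [-3.0], [-0.5]))
print(loglik_arm(toy, [1.0], [-3.0], [-0.5]))
```

With a single component the two calls return the same value, in line with the remark that the approaches coincide for $S = 1$; with two or more components they generally differ.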
To obtain the maximum likelihood estimates for the parameters involved in the mixture likelihood, algorithmic approaches need to be used. The EM algorithm 21,22 is a popular iterative method for finding the maximum likelihood estimates. It consists of two main steps:
• the expectation (E) step: given the observed (incomplete) data and the current parameter values, the expected values of the unobserved (missing) data are computed;
• the maximization (M) step: the parameters are updated by maximizing the complete data log-likelihood in which the unobserved data have been replaced by their expected values from the E step.
EM algorithms tailored for the problem at hand are given in the next subsections. Further information on the EM algorithm and convergence theory for mixture likelihood problems is presented in Böhning. 13

Arm-based approach in the discrete mixture model
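This section focuses on the likelihood based on the arm-based (AB) approach, with the EM algorithm used to find the maximum likelihood estimates of the mixture model. Let $z_{ijs}$ be an indicator of class $s$ in study $i$ and group $j$ in a meta-analysis of $k$ independent studies, that is, $z_{ijs} = 1$ if arm $j$ of study $i$ belongs to class $s$ and $z_{ijs} = 0$ otherwise. We suppose that the random variable $X_{ij}$ has a Poisson distribution of the form $\mathrm{Po}(x_{ij} \mid n_{ij}\exp(\alpha_s + \beta_s \times j))$, conditional upon treatment $j$ and class membership $s$. The unobserved, complete data likelihood with the mixing distribution $Q$ is given by

$$L_c(Q) = \prod_{i=1}^{k} \prod_{j=0}^{1} \prod_{s=1}^{S} \Big[q_s\, \mathrm{Po}\big(x_{ij} \mid n_{ij}\exp(\alpha_s + \beta_s \times j)\big)\Big]^{z_{ijs}},$$

with the unobserved, complete log-likelihood

$$\log L_c(Q) = \sum_{i=1}^{k} \sum_{j=0}^{1} \sum_{s=1}^{S} z_{ijs} \Big[\log q_s + \log \mathrm{Po}\big(x_{ij} \mid n_{ij}\exp(\alpha_s + \beta_s \times j)\big)\Big]. \tag{9}$$

Since $z_{ijs}$ is unknown, it is estimated by the expected value of $z_{ijs}$ given $X_{ij} = x_{ij}$, which is obtained on the basis of Bayes' theorem. It is written as

$$E_{ijs} = E(z_{ijs} \mid x_{ij}) = \frac{q_s\, \mathrm{Po}\big(x_{ij} \mid n_{ij}\exp(\alpha_s + \beta_s \times j)\big)}{\sum_{t=1}^{S} q_t\, \mathrm{Po}\big(x_{ij} \mid n_{ij}\exp(\alpha_t + \beta_t \times j)\big)}.$$

Then, we substitute $E_{ijs}$ for $z_{ijs}$ in the log-likelihood given in (9) and maximize it with respect to the intercept and slope parameters. The process to obtain the maximum likelihood estimates is presented in Algorithm 1.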
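Algorithm 1 (EM-algorithm for the arm-based approach).
Step 0. Choose initial values for $q_s$, $\alpha_s$, and $\beta_s$, for $s = 1, 2, \ldots, S$.
Step 1 (E-step). Estimate $z_{ijs}$ from
$$E_{ijs} = \frac{q_s\, \mathrm{Po}\big(x_{ij} \mid n_{ij}\exp(\alpha_s + \beta_s \times j)\big)}{\sum_{t=1}^{S} q_t\, \mathrm{Po}\big(x_{ij} \mid n_{ij}\exp(\alpha_t + \beta_t \times j)\big)},$$
where $E_{ijs} \ge 0$ and $\sum_{s=1}^{S} E_{ijs} = 1$.
Step 2 (M-step). Maximize the expected complete data log-likelihood
$$\sum_{i=1}^{k} \sum_{j=0}^{1} \sum_{s=1}^{S} E_{ijs} \log \mathrm{Po}\big(x_{ij} \mid n_{ij}\exp(\alpha_s + \beta_s \times j)\big)$$
and find $\hat{\alpha}_s$, $\hat{\beta}_s$ as well as
$$\hat{q}_s = \frac{1}{2k} \sum_{i=1}^{k} \sum_{j=0}^{1} E_{ijs}.$$
Convergence criterion. Steps 1 and 2 are repeated until each estimate of $q_s$, $\alpha_s$, and $\beta_s$ converges to a constant within an acceptable error. Specifically, iteration stops when the difference between the estimates in the $t$th and $(t-1)$th iterations is less than 0.00001.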
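Algorithm 1 can be sketched in Python as follows. The sketch rests on one assumption not stated in the text: because the treatment indicator $j$ is binary, the weighted Poisson regression in the M-step is saturated within each class and has the closed-form solution $\exp(\hat{\alpha}_s) = \sum_i E_{i0s} x_{i0} / \sum_i E_{i0s} n_{i0}$ and $\exp(\hat{\alpha}_s + \hat{\beta}_s) = \sum_i E_{i1s} x_{i1} / \sum_i E_{i1s} n_{i1}$. The function name `em_arm_based`, the data layout, and the small numerical floors are illustrative choices.

```python
import math

def log_poisson(x, mu):
    """Log of the Poisson probability mass function Po(x | mu), mu > 0."""
    return x * math.log(mu) - mu - math.lgamma(x + 1)

def em_arm_based(data, q, alpha, beta, tol=1e-5, max_iter=1000):
    """EM for the arm-based mixture; data is a list of (x0, n0, x1, n1)."""
    S = len(q)
    q, alpha, beta = list(q), list(alpha), list(beta)
    # Every study arm is a separate unit of observation: (x, n, j).
    arms = [(x, n, j) for x0, n0, x1, n1 in data
            for x, n, j in ((x0, n0, 0), (x1, n1, 1))]
    for _ in range(max_iter):
        # E-step: posterior class probabilities E_ijs for each arm.
        E = []
        for x, n, j in arms:
            logw = [math.log(max(q[s], 1e-300))
                    + log_poisson(x, n * math.exp(alpha[s] + beta[s] * j))
                    for s in range(S)]
            m = max(logw)
            w = [math.exp(v - m) for v in logw]
            tot = sum(w)
            E.append([v / tot for v in w])
        # M-step: closed-form weighted rates per class and arm type.
        new_q, new_alpha, new_beta = [], [], []
        for s in range(S):
            new_q.append(sum(e[s] for e in E) / len(arms))
            rate = []
            for j in (0, 1):
                ev = sum(e[s] * x for e, (x, n, jj) in zip(E, arms) if jj == j)
                ex = sum(e[s] * n for e, (x, n, jj) in zip(E, arms) if jj == j)
                # Floors (an assumption) keep the logarithms finite.
                rate.append(max(ev, 1e-10) / max(ex, 1e-10))
            new_alpha.append(math.log(rate[0]))
            new_beta.append(math.log(rate[1]) - math.log(rate[0]))
        change = max(abs(a - b) for a, b in
                     zip(new_q + new_alpha + new_beta, q + alpha + beta))
        q, alpha, beta = new_q, new_alpha, new_beta
        if change < tol:  # convergence criterion of Algorithm 1
            break
    return q, alpha, beta

# Hypothetical toy data and starting values.
toy = [(1, 50, 0, 48), (2, 60, 1, 61), (3, 40, 1, 42), (8, 45, 2, 44)]
print(em_arm_based(toy, [0.5, 0.5], [-3.5, -2.5], [0.0, -1.0]))
```

For $S = 1$ the algorithm reduces to the pooled rates in each arm, which gives a quick sanity check of the implementation.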

Contrast-based approach in the discrete mixture model
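Another type of nonparametric mixture likelihood modeling, considered in this section, focuses on within-study comparison. We assume that the joint distribution of $X_{i0}$ and $X_{i1}$ is a product Poisson distribution of the form $\prod_{j=0}^{1} \mathrm{Po}(x_{ij} \mid n_{ij}\exp(\alpha_s + \beta_s \times j))$, conditional upon treatment and class membership. Note that the form of the likelihood is different from the arm-based approach. In this case, the observed, incomplete log-likelihood is given by

$$\log L(Q) = \sum_{i=1}^{k} \log \sum_{s=1}^{S} q_s \prod_{j=0}^{1} \mathrm{Po}\big(x_{ij} \mid n_{ij}\exp(\alpha_s + \beta_s \times j)\big). \tag{10}$$

This contrast-based log-likelihood is equivalent to the function given in (5). Suppose that $z_{is}$ is the unobserved indicator of component membership, where $z_{is} = 1$ if the $i$th study belongs to component $s$, and $z_{is} = 0$ otherwise. To estimate the parameters in (10), we derive the maximum likelihood estimates for $q_s$, $\alpha_s$, and $\beta_s$ using the unobserved, complete likelihood function with the mixing distribution $Q$:

$$L_c(Q) = \prod_{i=1}^{k} \prod_{s=1}^{S} \Big[q_s \prod_{j=0}^{1} \mathrm{Po}\big(x_{ij} \mid n_{ij}\exp(\alpha_s + \beta_s \times j)\big)\Big]^{z_{is}}.$$

The log-likelihood function for the complete data is therefore

$$\log L_c(Q) = \sum_{i=1}^{k} \sum_{s=1}^{S} z_{is} \Big[\log q_s + \sum_{j=0}^{1} \log \mathrm{Po}\big(x_{ij} \mid n_{ij}\exp(\alpha_s + \beta_s \times j)\big)\Big].$$

Here, $z_{is}$ is estimated by the conditional expected value of $z_{is}$ given $X_{i0} = x_{i0}$ and $X_{i1} = x_{i1}$. Under the contrast-based approach, it is derived as

$$E_{is} = E(z_{is} \mid x_{i0}, x_{i1}) = \frac{q_s \prod_{j=0}^{1} \mathrm{Po}\big(x_{ij} \mid n_{ij}\exp(\alpha_s + \beta_s \times j)\big)}{\sum_{t=1}^{S} q_t \prod_{j=0}^{1} \mathrm{Po}\big(x_{ij} \mid n_{ij}\exp(\alpha_t + \beta_t \times j)\big)}.$$

To estimate the unknown parameters $q_s$, $\alpha_s$, and $\beta_s$, for $s = 1, 2, \ldots, S$, where $q_s \ge 0$ and $\sum_s q_s = 1$, the EM algorithm iterates between the E- and M-steps. The process applied to this approach is given in Algorithm 2.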
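Algorithm 2 (EM-algorithm for the contrast-based approach).
Step 0. Choose initial values for $q_s$, $\alpha_s$, and $\beta_s$, for $s = 1, 2, \ldots, S$.
Step 1 (E-step). Estimate $z_{is}$ from
$$E_{is} = \frac{q_s \prod_{j=0}^{1} \mathrm{Po}\big(x_{ij} \mid n_{ij}\exp(\alpha_s + \beta_s \times j)\big)}{\sum_{t=1}^{S} q_t \prod_{j=0}^{1} \mathrm{Po}\big(x_{ij} \mid n_{ij}\exp(\alpha_t + \beta_t \times j)\big)},$$
where $E_{is} \ge 0$ and $\sum_{s=1}^{S} E_{is} = 1$.
Step 2 (M-step). Maximize the expected complete data log-likelihood
$$\sum_{i=1}^{k} \sum_{s=1}^{S} E_{is} \sum_{j=0}^{1} \log \mathrm{Po}\big(x_{ij} \mid n_{ij}\exp(\alpha_s + \beta_s \times j)\big)$$
and find $\hat{\alpha}_s$, $\hat{\beta}_s$, and
$$\hat{q}_s = \frac{1}{k} \sum_{i=1}^{k} E_{is}.$$
Convergence criterion. Steps 1 and 2 are repeated until the difference of each estimate of $q_s$, $\alpha_s$, and $\beta_s$, for $s = 1, 2, \ldots, S$, between the $t$th and $(t-1)$th iterations is less than 0.00001.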
Note the difference between Algorithms 1 and 2. In the latter we calculate expected values $E_{is}$ for the $i$th study to belong to class $s$, whereas in the former we calculate expected values $E_{ijs}$ for study arm $ij$ to belong to class $s$.
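For comparison, a Python sketch of Algorithm 2 is given below. The only structural change relative to an arm-based implementation is that one vector of posterior probabilities $E_{is}$ is computed per study and shared by both arms. As before, the closed-form M-step (weighted pooled rates per arm within each class) is an assumption that follows from the binary treatment indicator, and `em_contrast_based` and the data layout are hypothetical names.

```python
import math

def log_poisson(x, mu):
    """Log of the Poisson probability mass function Po(x | mu), mu > 0."""
    return x * math.log(mu) - mu - math.lgamma(x + 1)

def em_contrast_based(data, q, alpha, beta, tol=1e-5, max_iter=1000):
    """EM for the contrast-based mixture; data is a list of (x0, n0, x1, n1)."""
    S, k = len(q), len(data)
    q, alpha, beta = list(q), list(alpha), list(beta)
    for _ in range(max_iter):
        # E-step: one posterior probability vector E_is per study.
        E = []
        for x0, n0, x1, n1 in data:
            logw = [math.log(max(q[s], 1e-300))
                    + log_poisson(x0, n0 * math.exp(alpha[s]))
                    + log_poisson(x1, n1 * math.exp(alpha[s] + beta[s]))
                    for s in range(S)]
            m = max(logw)
            w = [math.exp(v - m) for v in logw]
            tot = sum(w)
            E.append([v / tot for v in w])
        # M-step: weighted pooled rates in the control and treatment arms.
        new_q, new_alpha, new_beta = [], [], []
        for s in range(S):
            new_q.append(sum(e[s] for e in E) / k)
            ev0 = sum(e[s] * d[0] for e, d in zip(E, data))
            ex0 = sum(e[s] * d[1] for e, d in zip(E, data))
            ev1 = sum(e[s] * d[2] for e, d in zip(E, data))
            ex1 = sum(e[s] * d[3] for e, d in zip(E, data))
            # Floors (an assumption) keep the logarithms finite.
            r0 = max(ev0, 1e-10) / max(ex0, 1e-10)
            r1 = max(ev1, 1e-10) / max(ex1, 1e-10)
            new_alpha.append(math.log(r0))
            new_beta.append(math.log(r1) - math.log(r0))
        change = max(abs(a - b) for a, b in
                     zip(new_q + new_alpha + new_beta, q + alpha + beta))
        q, alpha, beta = new_q, new_alpha, new_beta
        if change < tol:  # convergence criterion of Algorithm 2
            break
    return q, alpha, beta

# Hypothetical toy data and starting values.
toy = [(1, 50, 0, 48), (2, 60, 1, 61), (3, 40, 1, 42), (8, 45, 2, 44)]
print(em_contrast_based(toy, [0.5, 0.5], [-3.5, -2.5], [0.0, -1.0]))
```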

Choice of initial values and implementation of the EM algorithm
The EM algorithm for mixture models is prone to the occurrence of multiple local maxima. To avoid choosing a sub-optimal estimate of the mixing distribution, we have adopted the following strategy. Let us assume the number of components $S$ is fixed. Then each study is randomly allocated to one of the $S$ classes, so that a partition of all studies into the $S$ classes is achieved. For each of the $S$ classes, the parameter estimates are found by fitting a homogeneous Poisson regression model. This provides the component parameter estimates for each of the $S$ classes. The mixing weights are then chosen according to the proportion of each class among all studies. The EM algorithm is then executed with these initial values and the maximized likelihood is determined. The procedure is repeated 20 times (we did not find any gain going beyond 20 repetitions), and the estimate with the largest likelihood among the set of 20 is finally chosen. The rationale for this technique stems from the so-called classification likelihood, where maximum likelihood parameter estimates are found among all possible classifications of the data into the classes. 23
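The random-allocation strategy can be sketched in Python as follows. Several details are assumptions made only for illustration: allocations with an empty class are redrawn, a floor of 0.5 events is used when a class has no events in an arm, and the homogeneous Poisson regression within a class is replaced by its closed-form solution (pooled rates per arm). The arguments `fit` and `loglik` are hypothetical callables standing in for an EM routine and a log-likelihood function.

```python
import math
import random

def random_allocation_start(data, S, rng):
    """Initial values from one random partition of the studies into S classes.

    data: list of (x0, n0, x1, n1); rng: a random.Random instance.
    """
    k = len(data)
    if S > k:
        raise ValueError("need at least as many studies as classes")
    while True:
        labels = [rng.randrange(S) for _ in range(k)]
        if len(set(labels)) == S:  # assumption: redraw if a class is empty
            break
    q, alpha, beta = [], [], []
    for s in range(S):
        members = [d for d, lab in zip(data, labels) if lab == s]
        x0 = sum(d[0] for d in members)
        n0 = sum(d[1] for d in members)
        x1 = sum(d[2] for d in members)
        n1 = sum(d[3] for d in members)
        # Pooled rates are the ML fit of the homogeneous Poisson model;
        # the 0.5 floor (an assumption) avoids log(0) for zero counts.
        r0 = max(x0, 0.5) / n0
        r1 = max(x1, 0.5) / n1
        q.append(len(members) / k)
        alpha.append(math.log(r0))
        beta.append(math.log(r1 / r0))
    return q, alpha, beta

def best_of_restarts(data, S, fit, loglik, n_starts=20, seed=1):
    """Run `fit` from n_starts random allocations; keep the largest likelihood."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_starts):
        est = fit(data, *random_allocation_start(data, S, rng))
        ll = loglik(data, *est)
        if best is None or ll > best[0]:
            best = (ll, est)
    return best

# Hypothetical toy data; one draw of starting values for S = 2.
toy = [(1, 50, 0, 48), (2, 60, 1, 61), (3, 40, 1, 42), (8, 45, 2, 44)]
print(random_allocation_start(toy, 2, random.Random(0)))
```

Passing an EM routine as `fit` and the matching log-likelihood as `loglik` reproduces the 20-restart scheme described above.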

Estimation of effect heterogeneity
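The most important purpose of a meta-analysis is to obtain an estimate of the effect size and its variance. In the previous section, we showed how to obtain the maximum likelihood estimates of $q_s$, $\alpha_s$, and $\beta_s$ involved in the mixing distribution $Q$, for $s = 1, 2, \ldots, S$, using nonparametric mixture estimation. These estimates are denoted as $\hat{q}_s$, $\hat{\alpha}_s$, and $\hat{\beta}_s$, respectively. With the overall log-risk ratio estimated by $\hat{\beta} = \sum_{s=1}^{S} \hat{q}_s \hat{\beta}_s$, it follows that the between-study heterogeneity variance for the log-risk ratio is estimated by

$$\hat{\tau}^2 = \sum_{s=1}^{S} \hat{q}_s \big(\hat{\beta}_s - \hat{\beta}\big)^2,$$

where $\hat{\tau}^2 \ge 0$, necessarily.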
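As a small worked illustration, the overall log-risk ratio and the heterogeneity variance can be computed from the class estimates as a weighted mean and a weighted variance. The inputs below are the rounded two-class values quoted in Example 2 for the arm-based analysis of all 36 studies (weights 0.66 and 0.34, class log-risk ratios −0.63 and −0.37); the function name `overall_effect` is a hypothetical choice.

```python
def overall_effect(q, beta):
    """Mixture mean and variance of the class-specific log-risk ratios."""
    beta_bar = sum(qs * bs for qs, bs in zip(q, beta))
    tau2 = sum(qs * (bs - beta_bar) ** 2 for qs, bs in zip(q, beta))
    return beta_bar, tau2

beta_bar, tau2 = overall_effect([0.66, 0.34], [-0.63, -0.37])
print(round(beta_bar, 2), round(tau2, 3))  # -0.54 0.015
```

The weighted mean reproduces the reported log-risk ratio of −0.54, and the small variance agrees with the remark in Example 2 that the heterogeneity variance decreases to a small value.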
As noted in the previous section, the problem of choosing the number of latent classes can be addressed using model selection criteria. We select the number of components using AIC and BIC to search for the most appropriate model. The preferred model with a suitable value of $S$ should have the smallest AIC or BIC value compared to the other models. In this article, the relevant criteria for determining the number of components are given by $\mathrm{AIC} = -2 \log L_{obs} + 2r$ and $\mathrm{BIC} = -2 \log L_{obs} + r \log(k)$, where $\log L_{obs}$ is the incomplete, observable mixture model log-likelihood, $r = 3S - 1$ is the number of parameters to be estimated in the model, and $k$ is the number of studies in the meta-analysis. As noted in Böhning et al, 24 the BIC generally penalizes complex models more strongly than the AIC. We will nevertheless apply both criteria to the real-world data sets given in the next section.
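The two criteria are straightforward to compute once the maximized log-likelihood is available. The sketch below uses the parameter count $r = 3S - 1$ and the number of studies $k$ as defined above; the function name and the numerical inputs are hypothetical and do not correspond to any fitted model in this article.

```python
import math

def information_criteria(loglik, S, k):
    """AIC and BIC for an S-component mixture fitted to k studies."""
    r = 3 * S - 1  # S intercepts, S slopes, S - 1 free weights
    aic = -2.0 * loglik + 2.0 * r
    bic = -2.0 * loglik + r * math.log(k)
    return aic, bic

# Hypothetical maximized log-likelihoods for S = 1, 2, 3 with k = 61 studies.
for S, ll in ((1, -120.0), (2, -110.0), (3, -109.5)):
    print(S, information_criteria(ll, S, 61))
```

Since $\log(k) > 2$ whenever $k \ge 8$, the BIC penalty per parameter exceeds that of the AIC in all data sets of moderate size, which is why the two criteria can point to different values of $S$.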

Example 1: Antibiotic prophylaxis for cesarean section
The meta-analytic example used in this section is taken from Smaill and Hofmeyr. 25 It concerns the effect of prophylactic antibiotic treatment on infectious complications in women undergoing cesarean section. One of the most important risk factors of post-partum maternal infection is cesarean delivery. The effects of antibiotic prophylaxis vs no prophylaxis (placebo) are compared. The data used here include 61 studies and are given in Table A1. It can be seen that many studies have zero events in the treatment arm and that rare events occur in both groups. We estimate the overall log-risk ratio using several methods and evaluate the models under both mixture approaches. The results are provided in Table 1.
When we use the IVW method with the between-study heterogeneity variance estimated by the DerSimonian and Laird method, 26 the double-zero studies are excluded before the analysis. For this data set, the risk ratios indicate homogeneity across studies according to the Cochran Q test (p = 0.37); see also the forest plot given in Figure 1. From Table 1, we can see that the contrast-based model yields substantially smaller values for AIC and BIC than the arm-based approach. The model with $S = 3$ components is an appropriate choice for the discrete contrast-based mixture model (AIC is slightly better for $S = 4$ but the BIC is worse, hence we choose the less complex model here). It also has a better log-likelihood value than the mixed Poisson model, although the differences are not large and the mixed Poisson regression model has a smaller BIC value. The selected model gives an estimated overall risk ratio of exp(−1.11) = 0.33, which indicates that the risk of complications following cesarean section is 67% lower in patients with antibiotic prophylaxis compared to placebo. The effect estimate using the mixed Poisson model is quite similar, with a slightly larger heterogeneity variance estimate.
TABLE 1 Parameter estimates (95% percentile bootstrap confidence intervals in brackets) of the mixing distribution and meta-analysis for the data on antibiotic prophylaxis for cesarean section.
Furthermore, the estimate for $\tau^2$ from the contrast-based model with $S = 3$ is 0.14. The result confirms low effect heterogeneity of the risk ratio in this meta-analysis. To gain more insight into the class structure, we look at the nonparametric maximum likelihood estimates in the three-class mixture model using the contrast-based approach, which is given as
For comparison, we also provide the nonparametric maximum likelihood estimate using the arm-based approach as
As could be expected from the large differences in AIC and BIC, the estimated distributions appear quite different, less so in the baseline but more so in the treatment effect, although both have a similar marginal mean. Overall, the contrast-based model appears to be the better choice here, as it most likely corresponds to a contrast-based design of the meta-analysis and also has the far better model fit.

Example 2: Complications in thoracoscopy and open surgery
Congenital lung malformations can be found in live infants. In fact, the condition is seen on routine antenatal scans in 1 in 2500 live-born children. 27 The two main treatments of asymptomatic antenatally diagnosed congenital lung malformations in young children are open and thoracoscopic resection. Prior to the 1990s, the open procedure or thoracotomy was usually performed on infants to excise these lesions. More recently, a new approach called thoracoscopy or key-hole surgery has been applied. 28 Thoracoscopy involves making small incisions in the chest through which a fiber-optic camera and operating instruments can be passed. However, it is not completely clear whether thoracoscopy is better than open surgery in terms of total complications after treatment. In this application, we use the meta-analytic data from Böhning and Sangnawakij, 29 originally collected by Adams et al. 30 The data are given in Table A2. It can be seen that 12 studies include information on the number of complications and the number of children enrolled in both treatment groups. In addition, 15 studies have information on keyhole surgery only and nine reports have information on open surgery only. However, the 24 studies with single-arm information cannot be used in the contrast-based approach, since a comparative treatment is missing. For consistency, despite the fact that it would be possible to use all data, only the studies with information on both arms are used in the arm-based approach. Hence, the results shown in Table 2 are obtained using the studies with information in both arms. As shown in Figure 2, there is no significant difference across risk ratios at the 0.05 significance level (p = 0.23). From Table 2, the discrete contrast-based mixture model fits this data set well, as it provides the smallest values of AIC and BIC and also has the largest log-likelihood, compared to the model using the arm-based approach (although the difference is not large, as can be expected in this case) and the mixed Poisson regression model. Regarding the number of components, it is clear that the two-component ($S = 2$) model has the smallest values for AIC and BIC and is appropriate for this data set. The risk ratio estimate is then given by exp(−0.33) = 0.72. In summary, for this data set thoracoscopy has a 28% lower risk of complications than open surgery for treatment of asymptomatic congenital lung malformations in young children. Furthermore, coming to the class details, the nonparametric maximum likelihood estimates for the parameters $q_s$, $\alpha_s$, and $\beta_s$ for $S = 2$ components from the contrast-based approach are
with the estimated between-study variance $\hat{\tau}^2 = 0.26$. For the arm-based approach with $S = 2$, we have the class estimates
Here we can see that there is very little difference between the arm-based and contrast-based approaches. We now explore a bit further the potential benefits of the arm-based approach, as it allows the inclusion of the nine studies available with the open surgery treatment as well as the 15 studies using keyhole treatment. First, we find the most
FIGURE 2 Forest plot of the risk ratio for the asymptomatic antenatally diagnosed congenital lung malformations in young children. Weights are from random-effects model; continuity correction applied to studies with zero cells. All references in this figure can be found in Adams et al. 30

TABLE 3 Model assessment for the meta-analysis of the data on total complications using all 36 studies; all mixture models allow for different intercepts; 95% bootstrap confidence intervals in brackets.
For this model we find an estimate of the log-risk ratio of −0.54 with 95% confidence interval from −1.08 to 0.23. For comparison, if we only use the 12 studies with information in both arms (and the same mixture model with two classes), the log-risk ratio is −0.32 with 95% confidence interval from −0.76 to 0.22. Using the full information changes the risk ratio to a more beneficial value, which shows the potential value of an arm-based approach when comparative information is not available across all studies. It also offers an interpretation of the estimated heterogeneity. It consists of two classes, where the larger class with 66% provides a more beneficial log-risk ratio of −0.63 whereas the other class with 34% provides a less beneficial effect of −0.37. Note the change of the log-risk ratio for this class when compared to the contrast-based approach using the reduced set of 12 studies. We also point out that the heterogeneity variance has decreased to a small value. Note that it is possible to provide neither a DerSimonian-Laird analysis (IVW) nor the mixed Poisson regression model, as these are inherently contrast-based models.

FIGURE 3 Forest plot of the risk ratio for the data on rebleeding events after negative vs positive small bowel capsule endoscopy. Weights are from random-effects model; continuity correction applied to studies with zero cells. All references in this figure can be found in Yung et al. 31

Example 3: Small-bowel capsule endoscopy
Capsule endoscopy (CE), or video capsule endoscopy, is a procedure used to examine the small intestine section of the digestive system using a tiny wireless camera. It can be used as a diagnostic test for the evaluation of occult small bowel bleeding. We are interested in the case study given in Yung et al. 31 The meta-analytic data with 23 studies and the forest plot for the risk ratio on rebleeding events after negative vs positive small bowel capsule endoscopy are shown in Table A3 and Figure 3, respectively. The results show evidence for heterogeneity of the risk ratio across studies for these data (p-value < 0.01). Using discrete mixture estimation, the arm-based approach with $S = 3$ provides the smallest values of AIC and BIC. The risk ratio estimate is exp(−0.46) = 0.63 and the estimated between-study variance is $\hat{\tau}^2 = 0.92$. In addition, if a contrast-based design is applied, the four-component model is the best fit for these data based on the BIC. It provides an estimated risk ratio of exp(−0.43) = 0.65 and an estimated between-study variance of $\hat{\tau}^2 = 0.42$, which is also close to the between-study variance estimate of the mixed Poisson regression model. AIC and BIC for the AB-model are slightly better than those from the CB-model, although the differences are marginal. For details see Table 4.

Design of simulation
The simulation study is performed to compare the performance of nonparametric estimation of effect heterogeneity based on the arm-based (AB) and contrast-based (CB) approaches for meta-analysis of the risk ratio. They are also compared to the performance of estimators obtained from the random effects IVW method, using the between-study variance estimated by the DerSimonian and Laird method, and from the mixed Poisson regression. All simulations in this study are carried out in the R statistical software 32 and the simulation settings are as follows. The number of studies ($k$) in a meta-analysis is set to 15, 30, and 60. We set the average sample size per study ($n_{ij}$) to 30, 60, 120, and 600, reflecting small to large sizes. The number of classes is set to $S = 2$. The parameter settings for $q_s$, $\alpha_s$, and $\beta_s$, for $s = 1$ and 2, are given in Table 5. For each case, the overall parameters are calculated by $\alpha = \sum_s q_s \alpha_s$ for the intercept, $\beta = \sum_s q_s \beta_s$ for the log-risk ratio, and $\tau^2 = \sum_s q_s (\beta_s - \beta)^2$ for the heterogeneity variance, as given in the mixture model. The design of the simulation is separated into two parts: contrast-based design and arm-based design. The sample size per study is generated from a Poisson distribution, $\mathrm{Po}(n_{ij})$, for $i = 1, 2, \ldots, k$ and $j = 0$ and 1. The data ($X_{i1}$ and $X_{i0}$) for each replication are simulated under the two design parts as follows.
• Contrast-based design. The class $s$ of each study is sampled from a Bernoulli distribution with probability parameter $q_1$, to determine which class the study belongs to. If the outcome of the Bernoulli experiment is 1, we sample $X_{i1}$ for the treatment group from $\mathrm{Po}(n_{i1} \exp(\alpha_1 + \beta_1))$ and $X_{i0}$ for the control group from $\mathrm{Po}(n_{i0} \exp(\alpha_1))$. If the outcome of the Bernoulli experiment is 0, we sample $X_{i1}$ from $\mathrm{Po}(n_{i1} \exp(\alpha_2 + \beta_2))$ and $X_{i0}$ for the control group from $\mathrm{Po}(n_{i0} \exp(\alpha_2))$.
• Arm-based design. We generate a Bernoulli variable with probability parameter $q_1$. If the outcome is 1, we sample $X_{i1}$ from $\mathrm{Po}(n_{i1} \exp(\alpha_1 + \beta_1))$, else $X_{i1}$ is sampled from $\mathrm{Po}(n_{i1} \exp(\alpha_2 + \beta_2))$. Again, a Bernoulli variable is generated with event probability $q_1$. If the outcome is 1, we sample $X_{i0}$ from $\mathrm{Po}(n_{i0} \exp(\alpha_1))$, else from $\mathrm{Po}(n_{i0} \exp(\alpha_2))$. Note that in this design part there is no binding of treatment and control to the same class.
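The two data-generating designs can be illustrated with a minimal sketch. It is written in Python with NumPy rather than the R code used for the study, and it is not the authors' implementation. The function names `mixture_moments` and `simulate_meta`, the guard against zero sample sizes, and the parameter values in the usage note are assumptions made for illustration; the actual settings are those of Table 5.

```python
import numpy as np

def mixture_moments(q, alpha, beta):
    """Overall intercept, log-risk ratio and heterogeneity variance
    of a discrete mixture with weights q and support (alpha_s, beta_s)."""
    q, alpha, beta = (np.asarray(a, dtype=float) for a in (q, alpha, beta))
    alpha_bar = np.sum(q * alpha)
    beta_bar = np.sum(q * beta)
    tau2 = np.sum(q * (beta - beta_bar) ** 2)
    return alpha_bar, beta_bar, tau2

def simulate_meta(k, n_mean, q1, alpha, beta, design, rng):
    """Simulate one meta-analysis with S = 2 classes.

    design="CB": one class per study, shared by both arms.
    design="AB": the class is drawn separately for each arm.
    Returns (x1, n1, x0, n0) as integer arrays of length k.
    """
    alpha = np.asarray(alpha, dtype=float)
    beta = np.asarray(beta, dtype=float)
    # Poisson-distributed sample sizes per arm.
    # Assumption: sizes are floored at 1 to avoid empty arms.
    n1 = np.maximum(rng.poisson(n_mean, size=k), 1)
    n0 = np.maximum(rng.poisson(n_mean, size=k), 1)
    # Index 0 stands for class 1 (probability q1), index 1 for class 2.
    cls1 = np.where(rng.random(k) < q1, 0, 1)
    if design == "CB":
        cls0 = cls1                      # both arms bound to the same class
    elif design == "AB":
        cls0 = np.where(rng.random(k) < q1, 0, 1)  # independent second draw
    else:
        raise ValueError("design must be 'CB' or 'AB'")
    x1 = rng.poisson(n1 * np.exp(alpha[cls1] + beta[cls1]))
    x0 = rng.poisson(n0 * np.exp(alpha[cls0]))
    return x1, n1, x0, n0
```

For example, with the hypothetical values `q1 = 0.5`, `alpha = [-3, -3]` and `beta = [-1, 0]`, `mixture_moments([0.5, 0.5], [-3, -3], [-1, 0])` gives an overall log-risk ratio of −0.5 and a heterogeneity variance of 0.25, and `simulate_meta(15, 60, 0.5, [-3, -3], [-1, 0], "CB", np.random.default_rng(1))` returns one simulated data set with 15 studies.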
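In each design part, both the CB and AB approaches are used to estimate the parameters, so that the design-consistency of the methods can be investigated. For each scenario given above, 1000 replications are generated. Then, the estimated values of $\alpha$, $\beta$, and $\tau^2$ are computed using the formulas for $\hat{\alpha}$, $\hat{\beta}$, and $\hat{\tau}^2$ as presented in (13) and (14), respectively. To evaluate these estimators, their performance is examined in terms of bias and mean squared error (MSE). They are given by $\mathrm{Bias}(\hat{\theta}) = \frac{1}{1000} \sum_{h=1}^{1000} (\hat{\theta}_h - \theta)$ and $\mathrm{MSE}(\hat{\theta}) = \frac{1}{1000} \sum_{h=1}^{1000} (\hat{\theta}_h - \theta)^2$, where $\hat{\theta}_h$ is the generic estimate in the $h$th simulation run for parameter $\theta$.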
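As a small illustration of the two performance measures, the following Python sketch computes the empirical bias and MSE of an estimator from its values over the simulation runs. The function name `bias_and_mse` and the numbers in the usage note are hypothetical and only serve to show the arithmetic.

```python
import numpy as np

def bias_and_mse(estimates, true_value):
    """Empirical bias and mean squared error of an estimator.

    estimates  : estimates from the individual simulation runs
    true_value : parameter value used to generate the data
    """
    errors = np.asarray(estimates, dtype=float) - true_value
    bias = errors.mean()          # average deviation from the truth
    mse = np.mean(errors ** 2)    # average squared deviation
    return bias, mse
```

With the four hypothetical estimates −0.6, −0.4, −0.5, and −0.7 of a true log-risk ratio of −0.5, the errors are −0.1, 0.1, 0, and −0.2, so that the bias is −0.05 and the MSE is 0.015.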

Simulation results
To compare the performance of the models using the AB and CB approaches, the log-likelihood estimate, together with the AIC and BIC from each model, is evaluated. The main findings of the simulation studies can be summarized as follows.
• We first concentrate on the simulation results under the contrast-based design. Figure 4 and Table S1 given in the supplementary materials show the performance of $\hat{\beta}$ and $\hat{\tau}^2$ in terms of bias for simulated data under Cases A, B, and C. The estimators are obtained from the four approaches. In general, the biases of the estimators from the AB and CB methods tend to be small. We observe that the estimates of $\beta$ from the CB approach have smaller bias than the estimates from the AB approach, IVW, and mixed Poisson regression in Cases A and B. However, for Case C, where $\tau^2$ is large, all methods except mixed Poisson regression perform well and have similar biases. For the heterogeneity variance estimator, the results show that $\hat{\tau}^2$ from the CB approach has small bias, lower than that of the comparative estimators, for Cases B and C. Again, the biases of $\hat{\tau}^2$ from Poisson regression are larger than those of the other estimators.
• The MSEs of $\hat{\beta}$ and $\hat{\tau}^2$ are shown in Figure 5. We can see that they are small, as the values are close to zero. In detail, $\hat{\beta}$ and $\hat{\tau}^2$ calculated from the CB approach have MSEs smaller than those from the AB approach. In this simulation, $\hat{\tau}^2$ estimated by the DerSimonian and Laird method under the IVW approach has the smallest MSE for some settings in Case A. For large $n_{ij}$, the estimators from all methods have similarly small MSEs.
• Next, we look at the simulation results based on the arm-based design. From Figure 6 and Table S2 given in the supporting information, we conclude that $\hat{\beta}$ and $\hat{\tau}^2$ computed from the AB approach outperform the other estimators in general situations. In particular, the biases of $\hat{\beta}$ and $\hat{\tau}^2$ obtained from the AB approach are closer to zero than those of the CB approach and the two existing methods. Moreover, from Figure 7, the MSEs of $\hat{\beta}$ and $\hat{\tau}^2$ from the AB approach are always smaller than those of the other methods.
• We now look at the nonparametric estimation of the random effects distribution. The mean log-likelihood used for model selection (a way to measure the goodness of fit of a model) is shown in Figure 8. In Figure 8A the CB design is considered, and it can be seen that the log-likelihood estimates of the CB model are larger than those of the AB model. This is supported by the percentage of simulation replications for which the log-likelihood of the CB model is better than the respective value of the AB model. In all cases, this percentage is greater than 95%. Therefore, the mixture model using a CB approach is the preferred model for this simulation setting. Referring to the AB design, Figure 8B also confirms that model selection by the log-likelihood shows better performance of the AB approach than of the CB approach, although there are many cases where the likelihood of the CB-model outperforms the AB-model likelihood. Note that we have restricted considerations here to the log-likelihood since, for this comparison, AIC and BIC differ between the two models only through the log-likelihood. More details on the performance of log-likelihood, AIC, and BIC are given in the supplementary material.
As all of the above points demonstrate, nonparametric estimation of effect heterogeneity using the CB or AB approach in meta-analysis of rare events is appropriate under the corresponding design. This result is supported by the bias and MSE of the estimators. Furthermore, it is confirmed by the model selection criteria AIC, BIC, and log-likelihood that the discrete mixture models with both approaches are design consistent. These findings indicate that it is possible for a given meta-analytic data set to select the AB or CB approach on empirical grounds.
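The model comparison described above can be sketched in a few lines of Python. The sketch uses the standard definitions AIC = −2 log L + 2p and BIC = −2 log L + p log(n). The function names, the choice of the number of studies as n in the BIC, and the parameter count p = 3S − 1 for a mixture with S components (S intercepts, S log-risk ratios, and S − 1 free weights) are assumptions made here for illustration and are not taken from the text.

```python
import math

def aic(loglik, n_params):
    """Akaike information criterion (smaller is better)."""
    return -2.0 * loglik + 2.0 * n_params

def bic(loglik, n_params, n_obs):
    """Bayesian information criterion (smaller is better)."""
    return -2.0 * loglik + n_params * math.log(n_obs)

def pct_first_better(loglik_a, loglik_b):
    """Percentage of replications in which model A has the larger log-likelihood."""
    pairs = list(zip(loglik_a, loglik_b))
    wins = sum(1 for a, b in pairs if a > b)
    return 100.0 * wins / len(pairs)

# Assumed parameter count for S = 2 components: 3 * S - 1 = 5.
S = 2
p = 3 * S - 1
```

When both models have the same number of parameters, the penalty terms cancel: `aic(ll_cb, p) - aic(ll_ab, p)` equals `-2 * (ll_cb - ll_ab)`, and the same holds for the BIC. Ranking the two models by AIC or BIC is then equivalent to ranking them by log-likelihood.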

DISCUSSION AND CONCLUSION
In meta-analysis with rare events, the Mantel-Haenszel approach is a popular and accepted nonparametric method for estimation of the risk or rate ratio. However, this method is constructed under the assumption of effect homogeneity. Hence it is desirable to extend the methodology to incorporate heterogeneity, and it is suggested here to accomplish this by using nonparametric mixture models. Two forms of mixture models are possible: the AB and the CB mixture likelihood. For both situations, the associated EM algorithms have been developed to find maximum likelihood estimates of the mixing distribution. As a by-product, the overall risk and the heterogeneity variance can be found as functionals of the estimated mixing distribution (first and second moment). Note that mixtures model heterogeneity in the baseline as well as in the intervention effect. This explains why the information criteria sometimes select a mixture model with $S > 1$ although $\hat{\tau}^2$ is small.
Furthermore, this article considers the performance of estimators in the mixture model and evaluates the model performance by comparing the models based on the CB and AB approaches. Simulations show that the nonparametric mixture model with the CB approach has performance benefits compared to the model with the AB approach when the data are generated from a model in which treatment groups are nested in studies. Conversely, the AB approach has benefits when compared to the CB approach if data are generated from a model in which treatment groups are regarded as separate observations. This result is expected and shows that design and model are compatible. However, as Figure 8 shows, in some cases under the AB-design the CB-model performs better than the AB-model.
It seems widely accepted that comparisons within studies are preferable, as conditions within a study are more comparable than across studies. This speaks for the CB approach. The AB approach has the strong benefit that it can also include studies which provide only one of the two (or more) study arms. Often, in situations where only one-arm information is available, this is due to the fact that in the respective study center only one technique is available or there is a lack of expertise in delivering both treatments. Hence, which approach should be taken might be best judged on the basis of the case study. In the examples of the meta-analysis of cesarean section and small-bowel capsule endoscopy the CB approach appears appropriate, whereas in the example of the meta-analysis comparing keyhole with open surgery an AB approach seems to be reasonable, as it can also incorporate additional study information. More broadly, the differences between the CB and AB approaches appear relatively minor. This might be used as a rationale for using the AB approach in cases where the CB approach fails, as in the case of missing competitor information.
The comparison with the classical DerSimonian-Laird and mixed Poisson regression methods shows benefits of the mixed Poisson regression and the inverse-variance-weighted (DSL) method for estimating the heterogeneity variance only in the case of very low heterogeneity. If one is interested only in estimation of the overall effect, the choice of the method is less critical. In practice, determining the number of components in mixture modeling is often an issue. Here, we followed the literature and used AIC and BIC as guidance for selecting the number of components. The first example used in the application section of this study showed low heterogeneity with $\hat{\tau}^2 = 0.14$; the second example had moderate heterogeneity with $\hat{\tau}^2 = 0.26$, which disappeared when using the full information in the AB-approach. Only the third example shows clear evidence of effect heterogeneity, with at least four classes being required using either approach, AB or CB, and with $\hat{\tau}^2 = 0.42$ for the CB-approach. If one is interested only in an overall estimate for $\beta$ and $\tau^2$, the mixed Poisson regression model would suffice as it delivers very similar estimates for these parameters. Ultimately, the decision which approach to use should be driven by the available information for the case study at hand, less by likelihood and information criteria based values. However, these might still be used as supportive information if the case study does not provide any other guidance on which of the two models to use. In any case, our results show that the differences between the two approaches are rather marginal.
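To make the EM idea concrete, the following Python sketch fits a contrast-based mixture of Poisson pairs with a fixed number of components. It is a minimal illustration under stated assumptions and not the authors' implementation: the component density is taken to be the product of $\mathrm{Po}(n_{i1} \exp(\alpha_s + \beta_s))$ and $\mathrm{Po}(n_{i0} \exp(\alpha_s))$, as in the contrast-based simulation design; the function name `em_cb_mixture`, the small constant `eps` guarding against empty components, and the stopping rule are hypothetical choices. Starting values come from a single random allocation of studies to components.

```python
import numpy as np
from math import lgamma

def em_cb_mixture(x1, n1, x0, n0, S, n_iter=500, tol=1e-8, seed=0):
    """EM sketch for a contrast-based S-component Poisson mixture.

    Assumes all sample sizes n1, n0 are positive.
    Returns weights q, intercepts alpha, log-risk ratios beta,
    their mixture mean beta_bar, variance tau2, and the log-likelihood.
    """
    rng = np.random.default_rng(seed)
    x1, n1, x0, n0 = (np.asarray(a, dtype=float) for a in (x1, n1, x0, n0))
    k = len(x1)
    # log(x!) terms of the two Poisson densities (constant in the parameters)
    lg = np.array([lgamma(a + 1.0) + lgamma(b + 1.0) for a, b in zip(x1, x0)])
    # Initial values: random hard allocation of studies to components
    w = np.zeros((k, S))
    w[np.arange(k), rng.integers(S, size=k)] = 1.0
    eps = 1e-10          # assumption: numerical guard, not part of the model
    old = -np.inf
    for _ in range(n_iter):
        # M-step: weighted event rates per component and arm
        q = np.clip(w.mean(axis=0), eps, None)
        q = q / q.sum()
        rate0 = (w.T @ x0 + eps) / (w.T @ n0 + eps)   # exp(alpha_s)
        rate1 = (w.T @ x1 + eps) / (w.T @ n1 + eps)   # exp(alpha_s + beta_s)
        # E-step: log of q_s times the component density for each study
        logf = (np.log(q)
                + x1[:, None] * np.log(n1[:, None] * rate1) - n1[:, None] * rate1
                + x0[:, None] * np.log(n0[:, None] * rate0) - n0[:, None] * rate0
                - lg[:, None])
        m = logf.max(axis=1, keepdims=True)
        lse = m[:, 0] + np.log(np.exp(logf - m).sum(axis=1))  # log-sum-exp
        loglik = lse.sum()
        w = np.exp(logf - lse[:, None])               # posterior class weights
        if loglik - old < tol:
            break
        old = loglik
    alpha = np.log(rate0)
    beta = np.log(rate1) - alpha
    beta_bar = float(q @ beta)
    tau2 = float(q @ (beta - beta_bar) ** 2)
    return {"q": q, "alpha": alpha, "beta": beta,
            "beta_bar": beta_bar, "tau2": tau2, "loglik": float(loglik)}
```

Because EM can converge to a local maximum, a single random allocation is rarely sufficient. One option is to call the function with several values of `seed` and keep the fit with the largest `loglik`. The overall log-risk ratio `beta_bar` and the heterogeneity variance `tau2` are the first and second moments of the fitted mixing distribution.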

APPENDIX A. DATA FOR THE CASE STUDIES
The three examples of meta-analytic data are given in Tables A1-A3. They are used in Section 3 for the data analysis.

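The estimated mixing distribution $\hat{Q}$ gives estimated weights $\hat{q}_s$ to the combination of $\hat{\alpha}_s$ and $\hat{\beta}_s$. Hence, we can also give estimates for the expected values of $\alpha$ and $\beta$ by $\hat{\alpha} = \sum_s \hat{q}_s \hat{\alpha}_s$ and $\hat{\beta} = \sum_s \hat{q}_s \hat{\beta}_s$.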

FIGURE 4
Biases of $\hat{\beta}$ and $\hat{\tau}^2$ obtained from the four approaches, data generated under the CB design.

FIGURE 5
MSEs of $\hat{\beta}$ and $\hat{\tau}^2$ obtained from the four approaches, data generated under the CB design.

FIGURE 6
Biases of $\hat{\beta}$ and $\hat{\tau}^2$ obtained from the four approaches, data generated under the AB design.

FIGURE 7
MSEs of $\hat{\beta}$ and $\hat{\tau}^2$ obtained from the four approaches, data generated under the AB design.
TABLE 2 Parameter estimates (95% percentile bootstrap confidence intervals in brackets) of the mixing distribution and meta-analysis for the data on total complications.
TABLE 4 Parameter estimates (95% percentile bootstrap confidence intervals in brackets) of the mixing distribution and meta-analysis for the data on rebleeding after small bowel capsule endoscopy.
TABLE 5 Parameter settings for the three cases in simulations.

Column headings of the data tables: Events: $x_{i1}$; Sample size: $n_{i1}$; Events: $x_{i0}$; Sample size: $n_{i0}$
Meta-analytic data on prophylactic antibiotics in women undergoing cesarean section. All references mentioned in this table can be found in Smaill and Hofmeyr. 25
Meta-analytic data on the negative small-bowel capsule endoscopy for small-bowel bleeding. All references in this table can be found in Yung et al. 31