A new regression model for overdispersed binomial data accounting for outliers and an excess of zeros

Binary outcomes are extremely common in biomedical research. Despite its popularity, binomial regression often fails to model this kind of data accurately due to the overdispersion problem. Many alternatives can be found in the literature, the beta‐binomial (BB) regression model being one of the most popular. The additional parameter of this model enables a better fit to overdispersed data. It also exhibits an attractive interpretation in terms of the intraclass correlation coefficient. Nonetheless, in many real data applications, a single additional parameter cannot handle the entire excess of variability. In this study, we propose a new finite mixture distribution with BB components, namely, the flexible beta‐binomial (FBB), which is characterized by a richer parameterization. This allows us to enhance the variance structure to account for multiple causes of overdispersion while also preserving the intraclass correlation interpretation. The novel regression model, based on the FBB distribution, exploits the flexibility and large variety of the distribution's possible shapes (which includes bimodality and various tail behaviors). Thus, it succeeds in accounting for several (possibly concomitant) sources of overdispersion stemming from the presence of latent groups in the population, outliers, and excessive zero observations. Adopting a Bayesian approach to inference, we perform an intensive simulation study that shows the superiority of the new regression model over the existing ones. Its better performance is also confirmed by three applications to real datasets extensively studied in the biomedical literature, namely, bacteria data, atomic bomb radiation data, and control mice data.

quality of life. 7 Indeed, they are also encountered in many other similar fields. [8][9][10] The binomial model is often used for this kind of discrete data, 11 where the probability of "success" is assumed to remain constant throughout the independent Bernoulli trials leading to the value of the binomial response variable. However, it is possible for the data to be overdispersed, that is, to be characterized by a larger variance than assumed by the model. In practice, this happens almost every time that binomial regression (BinReg) is applied to count data. 12 Ignoring overdispersion can lead to a serious underestimation of standard errors and to misleading inferences. 13 The phenomenon of overdispersion has been widely addressed in the literature, and the most popular model for overdispersed data is the beta-binomial (BB) model, proposed originally by Williams 14 and later applied in many contexts. 2,3,7,[15][16][17][18][19] Several different causes of overdispersion are reviewed in the literature, 4,13 which are mainly connected with the failure of the basic i.i.d. assumption for the individual responses (ie, the binary outcomes) within a covariate pattern. The BB model accounts for overdispersion by allowing the probability of "success" to vary according to a beta distribution, thus relaxing the i.i.d. assumption. In particular, the BB model enriches (and encompasses) the binomial model with an additional precision/dispersion parameter, which admits an interesting interpretation in terms of intraclass correlation as well. 16,20 Nevertheless, there are situations where overdispersion is due to concomitant causes, and a single additional parameter is unable to account for all of them. For example, in teratology experiments, overdispersion is naturally induced by the fact that litters, rather than individual animals, are regarded as the experimental units.
Therefore, since we expect differences between litters on biological grounds, the dispersion parameter of the BB model is naturally dedicated to accounting for this source of overdispersion, disregarding any further sources. Of particular relevance are situations where, besides the omission of important explanatory variables, overdispersion is due to (or exaggerated by) the contaminating presence of outliers or excess of zero observations (ie, a sample proportion of zero observations higher than the one assumed by the model).
The purpose of this study is to generalize the BB distribution by introducing the flexible beta-binomial (FBB) distribution, and to define a new regression model for overdispersed data based on it. The new distribution can be seen as a special mixture of two BB distributions, which features two additional parameters. This allows us to enrich the variance structure so as to account for multiple causes of overdispersion, while preserving the intraclass correlation interpretation. The great variety of possible shapes of the new FBB distribution (which includes bimodality and various tail behaviors) is directly reflected in the flexibility of the corresponding regression model. Indeed, the latter succeeds in adapting to the presence of outliers as well as of excessive zero observations without requiring ad hoc extra components to account for them. This is possible because the new model dedicates one of its mixture components to a particular group of observations (eg, zero values and/or outliers) automatically and only when necessary, providing interesting information about the possible sources of overdispersion. We adopt a Bayesian approach to inference, which is more suitable for complex models such as mixtures, as it avoids the computational and analytical problems of likelihood-based inference and its small-sample limitations. The potential of the new model is illustrated by means of three datasets extensively studied in the literature, all characterized by nonstandard modeling issues. More precisely, we focus on bacteria data related to a completely randomized experiment aimed at comparing two different biotypes of egg parasitoid, and characterized by a large amount of zero counts. 21 Then, we analyze atomic bomb radiation data concerning the study of chromosomal abnormalities of cells from survivors of the atomic bombs in Hiroshima and Nagasaki, where some latent groups are present.
5 Finally, we apply our model to control mice data referring to fetal deaths in groups of mice from different litters. 22,23 The presence of outlying observations in these data is well recognized. For all the examples, the new model is confirmed to be preferable to competing models in terms of fit via several diagnostics designed to detect discrepancies between observed and predicted data. The remainder of this article is organized as follows: Section 2 briefly describes some useful distributions for binary outcomes and their related regression models and introduces a novel regression model based on the new FBB distribution. Section 3 describes the Bayesian approach to inference as well as several model comparison criteria and model checks based on posterior predictive distributions and cross-validated leave-one-out (loo) approaches. Section 4 outlines the simulation studies used to evaluate the performance of the flexible beta-binomial regression (FBBReg) model, and compares it with that of the BinReg and the beta-binomial regression (BBReg) models. Section 5 discusses the results and main findings from the application of our new regression model to real datasets. Finally, Section 6 offers some concluding remarks.

DISTRIBUTIONS AND REGRESSION MODELS
In this section, we briefly review the BB distribution and introduce the new FBB one. Then, we illustrate the regression models based on these distributions (BBReg and FBBReg, respectively).

The BB distribution
Let Y be a response variable denoting the sum of n independent Bernoulli variables with probability parameter π. Assuming that π is constant leads to the binomial distribution: Y ∼ Bin(n, π) with probability mass function (pmf)

f_Bin(y; π) = C(n, y) π^y (1 − π)^(n−y), y ∈ {0, 1, ..., n}.

If we sum dependent Bernoulli variables whose probability parameter is random and follows a beta distribution, we obtain a BB distribution. Specifically, given π, we have Y | π ∼ Bin(n, π), where π ∼ Beta(μφ, (1 − μ)φ) with probability density function (pdf)

f_Be(π; μ, φ) = π^(μφ−1) (1 − π)^((1−μ)φ−1) / B(μφ, (1 − μ)φ), 0 < π < 1, (1)

where 0 < μ < 1 is the mean, φ > 0 is the precision, B(α, β) = Γ(α)Γ(β)/Γ(α + β) is the Beta function, and Γ(⋅) is the Gamma function. The pmf f_BB(y; μ, φ) of Y can be easily obtained by marginalization:

f_BB(y; μ, φ) = C(n, y) B(y + μφ, n − y + (1 − μ)φ) / B(μφ, (1 − μ)φ), (2)

where y ∈ {0, 1, ..., n} and n ∈ N. In particular, we have

E(Y) = nμ, Var(Y) = nμ(1 − μ)[1 + (n − 1)θ], (3)

where θ = 1/(φ + 1) and Var(Y) is derived by applying the law of total variance. Note that the parameter θ = 1/(φ + 1) can be thought of as an overdispersion parameter since Var(Y) is an increasing function of θ, and Var(Y) approximates the binomial variance as θ → 0. Moreover, θ also admits an interesting interpretation in terms of the intraclass correlation coefficient (ICC), 16 that is, it represents the (common) correlation between the pairs of the Bernoulli variables that form the response count Y. In particular, let U_1, ..., U_n be the Bernoulli variables giving rise to the response count Y = ∑_{r=1}^n U_r, and suppose they are identically distributed (with expected value μ), but not independent. Then, the expression of Var(Y) given by (3), together with the well-known equality Var(Y) = ∑_{r=1}^n Var(U_r) + 2 ∑_{r<l} Cov(U_r, U_l), allows us to prove that:

Cov(U_r, U_l) = μ(1 − μ)θ, (4)

for r, l = 1, ..., n, r ≠ l. Given (4), the ICC takes the form ρ_BB = θ, and the variance (3) of the BB can be written as Var(Y) = nμ(1 − μ)(1 + (n − 1)ρ_BB).
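As a quick numerical sanity check of the (μ, φ) parameterization of the BB, the pmf and its first two moments can be sketched in a few lines of Python; the helper names below are ours, not taken from any package:

```python
import math

def log_beta(a, b):
    # log of the Beta function B(a, b) via log-gamma
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def bb_pmf(y, n, mu, phi):
    # beta-binomial pmf with mean parameter mu and precision phi
    # (shape parameters alpha = mu*phi, beta = (1 - mu)*phi)
    a, b = mu * phi, (1.0 - mu) * phi
    log_binom = math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
    return math.exp(log_binom + log_beta(y + a, n - y + b) - log_beta(a, b))

n, mu, phi = 12, 0.3, 5.0
theta = 1.0 / (phi + 1.0)                      # overdispersion parameter
probs = [bb_pmf(y, n, mu, phi) for y in range(n + 1)]
mean = sum(y * q for y, q in enumerate(probs))
var = sum((y - mean) ** 2 * q for y, q in enumerate(probs))
# mean and var agree with E(Y) = n*mu and Var(Y) = n*mu*(1-mu)*(1 + (n-1)*theta)
```

The numerical mean and variance computed from the pmf match the closed-form moments above, which is a convenient way to verify the parameterization.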

The FBB distribution
The FBB distribution is obtained by compounding the binomial distribution with the flexible beta (FB) one. The latter is the univariate case of the flexible Dirichlet (FD) distribution, which is a generalization of the Dirichlet distribution. 24,25 In particular, the FB distribution is a special mixture of two beta distributions with a common precision parameter φ and two arbitrary means λ_1 > λ_2. 26 Its pdf can be expressed as

f_FB(π; λ_1, λ_2, φ, p) = p f_Be(π; λ_1, φ) + (1 − p) f_Be(π; λ_2, φ),

where f_Be(⋅; ⋅) is given by formula (1), 0 < λ_2 < λ_1 < 1, φ > 0, and 0 < p < 1 is the mixing proportion. From a regression perspective, a convenient reparameterization of the FB distribution is given by

λ_1 = μ + (1 − p)w̃, λ_2 = μ − pw̃, w̃ = w min{μ/p, (1 − μ)/(1 − p)},

which explicitly includes the mean 0 < μ = E[π] < 1 and a normalized distance 0 < w < 1 between the two mixture components. This parameterization proves to be particularly useful since it defines a variation-independent parametric space, meaning that no constraints exist among the parameters μ, w, φ, and p. Specifically, the mean and variance of the FB distribution are given by

E[π] = μ, Var(π) = μ(1 − μ)[θ + (1 − θ)w² m(μ, p)], (5)

where θ = 1/(φ + 1) and

m(μ, p) = min{μ(1 − p)/[(1 − μ)p], (1 − μ)p/[μ(1 − p)]}. (6)
Now, let Y | π ∼ Bin(n, π) and π ∼ FB(μ, w, φ, p). Then, the compound distribution Y ∼ FBB(n, μ, w, φ, p) has pmf

f_FBB(y; μ, w, φ, p) = p f_BB(y; λ_1, φ) + (1 − p) f_BB(y; λ_2, φ), (7)

where f_BB(⋅; ⋅) is given by (2) and

λ_1 = μ + (1 − p)w̃, λ_2 = μ − pw̃, w̃ = w min{μ/p, (1 − μ)/(1 − p)}. (8)

From Equation (7), it is immediately clear that the FBB distribution can be expressed as a finite mixture of two BB components with a common precision parameter φ and different means nλ_1 > nλ_2. This allows a large extension of the possible shapes of the FBB distribution compared to those possible with the BB distribution, which is inherited from the considerable variety of shapes of the FB distribution. 26 In particular, in addition to the usual unimodal shape, J-shape, inverse J-shape, and U-shape that are possible with the BB distribution, the FBB distribution can be bimodal, asymmetric, and can also accommodate various tail behaviors, as illustrated in Figure 1. In particular, the FBB can exhibit both symmetric unimodal (solid line in A) and bimodal probability functions (dashed and dotted lines in A). Of particular interest are the tail behaviors (B), since the FBB can give rise to unimodal distributions with heavy tails, and it includes pmfs resulting in only one heavy tail, and possibly large asymmetry. Moreover, note that the FBB contains the BB distribution as an inner point. Indeed, fixing μ = p, w = 1/φ, and φ = φ̃ + 1 (φ̃ > 0), it is possible to show that FBB(n, μ, w, φ, p) =d BB(n, μ, φ̃). Equation (5) allows us to compute the mean and variance of the FBB, which take the form

E(Y) = nμ, Var(Y) = nμ(1 − μ)[1 + (n − 1)θ + (n − 1)(1 − θ)w² m(μ, p)], (9)

where m(μ, p) is given by (6). A comparison between the FBB variance (9) and the BB variance (3) indicates that the former includes the extra variation due to overdispersion (second addend enclosed in square brackets) already present in the BB variance, which becomes zero when the parameter θ = 1/(φ + 1) → 0 (ie, when the precision of the beta distribution tends to infinity).
However, the FBB variance (9) further includes a third addend, which can take on positive values when θ → 0, and depends on the normalized distance w between the mixture components and the mixing proportion p, approaching zero if one of the following limits holds: w → 0, p → 0, or p → 1. Therefore, the third addend in (9) is due to the presence of two clusters, typically attributable to an unobserved (or unobservable) qualitative explanatory variable. Very interestingly, we see in Section 4 that one of these two clusters can be adapted to capture a group of outliers or an excess of zero observations where appropriate.
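To make the mixture construction of Equation (7) concrete, here is a small Python sketch that builds the FBB pmf from two BB components via the (μ, w, φ, p) reparameterization; the helper names and parameter values are ours and purely illustrative:

```python
import math

def log_beta(a, b):
    # log of the Beta function via log-gamma
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def bb_pmf(y, n, mu, phi):
    # beta-binomial pmf with mean mu and precision phi
    a, b = mu * phi, (1.0 - mu) * phi
    log_binom = math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
    return math.exp(log_binom + log_beta(y + a, n - y + b) - log_beta(a, b))

def fbb_pmf(y, n, mu, w, phi, p):
    # FBB pmf as a two-component BB mixture with common precision phi
    wt = w * min(mu / p, (1.0 - mu) / (1.0 - p))  # unnormalized component distance
    lam1 = mu + (1.0 - p) * wt                    # component means, lam1 > lam2
    lam2 = mu - p * wt
    return p * bb_pmf(y, n, lam1, phi) + (1.0 - p) * bb_pmf(y, n, lam2, phi)

n, mu, w, phi, p = 20, 0.4, 0.6, 8.0, 0.75
probs = [fbb_pmf(y, n, mu, w, phi, p) for y in range(n + 1)]
mean = sum(y * q for y, q in enumerate(probs))
# the pmf sums to one and the mean equals n*mu by construction
```

A useful consistency check is the "inner point" property: with μ = p and w = 1/φ, the FBB collapses numerically onto a BB with precision φ − 1.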
Finally, it is noteworthy that the FBB variance admits an interesting interpretation in terms of intraclass correlation. In particular, taking advantage of the notation used in Section 2.1, the covariance between any two Bernoulli variables forming the count Y can be written as

Cov(U_r, U_l) = μ(1 − μ)[θ + (1 − θ)w² m(μ, p)], (10)

for r, l = 1, ..., n, r ≠ l. Given (10), the ICC of the FBB distribution takes the form

ρ_FBB = θ + (1 − θ)w² m(μ, p), (11)

where θ = 1/(φ + 1). Moreover, the variance (9) of the FBB can be rewritten as Var(Y) = nμ(1 − μ){1 + (n − 1)ρ_FBB}. Equation (11) shows that ρ_FBB can be interpreted as a weighted mean of the maximum possible correlation (ie, 1) and the minimum possible correlation 0 < w² m(μ, p) < 1 for given μ. Note also that it is an increasing function of w, and it approaches θ = 1/(φ + 1) (ie, the ICC of the BB distribution) when one of the following conditions holds: w → 0, p → 0, or p → 1, that is, whenever the two latent clusters collapse. Moreover, p and μ enter ρ_FBB in a symmetric fashion. Finally, it is noteworthy that, differently from the BB intraclass correlation, the FBB one does not depend on φ only but also on μ, which means that, in a regression context, it is naturally modeled as a function of covariates.
A further investigation of the behavior of the FBB's ICC can be found in Section 2.1 of the Supplementary Material (SM).
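The ICC of Equation (11) lends itself to a compact numerical check of the limiting behaviors and of the symmetry in (μ, p); this is a minimal sketch with our own helper names:

```python
def m_term(mu, p):
    # m(mu, p) is the minimum of a positive ratio and its reciprocal, hence 0 < m <= 1
    r = (mu * (1.0 - p)) / ((1.0 - mu) * p)
    return min(r, 1.0 / r)

def rho_fbb(theta, w, mu, p):
    # ICC of the FBB: weighted mean of 1 (weight theta)
    # and w^2 * m(mu, p) (weight 1 - theta)
    return theta + (1.0 - theta) * w * w * m_term(mu, p)
```

As w shrinks, or as p approaches either boundary, rho_fbb collapses onto theta, the BB intraclass correlation, mirroring the "latent clusters collapse" discussion above.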

Excess of zeros
Binomial data are often affected by an excess of zeros, that is, a larger proportion of zero values than the one allowed by the assumed model. Under the BB distribution, the probability of a zero count is

π_0^BB(μ, φ) = f_BB(0; μ, φ) = ((1 − μ)φ)^[n] / φ^[n], (12)

where x^[n] = x(x + 1) … (x + n − 1) is the rising factorial function. Indeed, the inability of the BB to accommodate excessive zero counts is mainly due to the limited range of its tail behaviors. In contrast, the extreme flexibility of the FBB distribution in terms of shapes proves beneficial for modeling zero counts. In particular, from (7) and (12), it follows that

π_0^FBB(μ, w, φ, p) = p ((1 − λ_1)φ)^[n] / φ^[n] + (1 − p) ((1 − λ_2)φ)^[n] / φ^[n], (13)

where λ_1 and λ_2 are given by (8). From an analytical point of view, one observes that π_0^FBB(μ, w, φ, p) is an increasing function of w, tending to π_0^BB(μ, φ) when w → 0. This happens because both λ_1 and λ_2 collapse to μ and, therefore, the two mixture components coincide (please note that w = 0 does not belong to the parameter space). Analogously, π_0^FBB(μ, w, φ, p) tends to π_0^BB(μ, φ) when p goes to the boundary of its parameter space, as easily follows from (13) once the following limits are taken into account: lim_{p→0} λ_2 = lim_{p→1} λ_1 = μ. In addition, π_0^FBB(μ, w, φ, p) takes its maximum value when p = μ. It is noteworthy that, for fixed μ, the probability of zero values decreases as φ increases. Conversely, for fixed φ, as μ approaches 1, the probability of zero values tends to 0. This is a reasonable result since if the overall mean approaches its upper limit, most of the probability mass should be located in a neighborhood of 1. A graphical inspection of the behavior of π_0^FBB(μ, w, φ, p) can be found in Section 2.2 of the SM.
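The rising-factorial form of the zero probabilities is easy to evaluate directly; the following sketch (our own helper names) also illustrates the two behaviors discussed above, namely that the FBB zero probability increases with w and collapses onto the BB one as w shrinks:

```python
def rising(x, n):
    # rising factorial x^[n] = x(x + 1)...(x + n - 1)
    out = 1.0
    for k in range(n):
        out *= x + k
    return out

def zero_prob_bb(n, mu, phi):
    # P(Y = 0) under BB(n, mu, phi)
    return rising((1.0 - mu) * phi, n) / rising(phi, n)

def zero_prob_fbb(n, mu, w, phi, p):
    # P(Y = 0) under FBB: mixture of the two components' zero probabilities
    wt = w * min(mu / p, (1.0 - mu) / (1.0 - p))
    lam1, lam2 = mu + (1.0 - p) * wt, mu - p * wt
    return p * zero_prob_bb(n, lam1, phi) + (1.0 - p) * zero_prob_bb(n, lam2, phi)
```

For example, with a fixed mean, widening the distance between the two components (larger w) pushes the lower component toward zero and inflates the zero probability.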
Note that a zero-inflated binomial (or BB) model 15,27,28 is a two- (respectively, three-) parameter model that accounts for the excess of zeros by expressly dedicating an ad hoc parameter to the zero-inflation. In particular, let Y be a random variable distributed according to a zero-inflated binomial (ZIBin) or to a zero-inflated BB (ZIBB) distribution with inflation parameter q. Then, its pmf can be expressed as

f_ZI(y; q, ⋅) = q 1(y = 0) + (1 − q) f(y; ⋅),

where 1(⋅) is the indicator function and f(y; ⋅) is the pmf of the proper binomial or BB distribution. Note that, differently from inflated models, the FBB succeeds in addressing this issue by dedicating a component of its mixture to zero inflation automatically, and only when necessary, as we see in Section 4.
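The zero-inflation mechanism is a simple two-part pmf; a minimal sketch with a binomial base distribution (function names are ours):

```python
import math

def binom_pmf(y, n, pi):
    # ordinary binomial pmf
    return math.comb(n, y) * pi ** y * (1.0 - pi) ** (n - y)

def zibin_pmf(y, n, pi, q):
    # zero-inflated binomial: extra mass q at zero,
    # (1 - q) times the base pmf everywhere
    return q * (1.0 if y == 0 else 0.0) + (1.0 - q) * binom_pmf(y, n, pi)

n, pi, q = 10, 0.4, 0.25
probs = [zibin_pmf(y, n, pi, q) for y in range(n + 1)]
# P(Y = 0) is q + (1 - q) * (1 - pi)^n, and E(Y) shrinks to (1 - q) * n * pi
```

Replacing binom_pmf with a BB pmf gives the ZIBB analogue in exactly the same way.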
Almost all the properties of the FBB distribution are due to its mixture expression, which is inherited from the FB distribution. Indeed, the latter is a structured (ie, nongeneric) mixture, with constraints on its components' parameters that ensure model identifiability. Although these models have only two mixture components, they are flexible enough to handle several issues that occur quite often in applications, while preserving good theoretical properties (differently from generic mixtures).

The BinReg, BBReg, and FBBReg models
Let Y_i represent the response variable observed for subject i (i = 1, ..., N), that is, the count of successes out of a sample of size n_i, and let μ_i = E[Y_i]/n_i. The regression model is defined as

g(μ_i) = β_0 + β_1 x_i1 + … + β_K x_iK, (14)

where β = (β_0, β_1, … , β_K)⊺ is a vector of regression coefficients, and g(⋅) is a twice differentiable and strictly monotone link function. Given that μ_i takes values in the unit interval (0, 1), a straightforward choice for g(⋅) is the logit function, g(μ_i) = log(μ_i/(1 − μ_i)). Although other link functions can be adopted (eg, the probit or the complementary log-log), the logit link is a popular choice since it is the canonical link function for the binomial distribution and allows a simple interpretation in terms of odds ratios. The BinReg, BBReg, and FBBReg models are then defined by assuming that Y_i follows a Bin(n_i, π_i), BB(n_i, μ_i, φ), or FBB(n_i, μ_i, w, φ, p) distribution, respectively. In the case of the BinReg, the parameter μ_i in Equation (14) must be replaced with π_i. The inflated ZIBin and ZIBB regression models (ZIBinReg and ZIBBReg, respectively) can be defined in a similar way.
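The logit link and its odds-ratio interpretation can be sketched in a few lines; the coefficient values here are hypothetical, chosen only to illustrate the mechanics:

```python
import math

def logit(mu):
    # link: maps (0, 1) to the whole real line
    return math.log(mu / (1.0 - mu))

def inv_logit(eta):
    # inverse link: maps the linear predictor back to (0, 1)
    return 1.0 / (1.0 + math.exp(-eta))

b0, b1 = -0.5, 0.8                      # hypothetical regression coefficients
mu_at = lambda x: inv_logit(b0 + b1 * x)
odds = lambda m: m / (1.0 - m)

# a one-unit increase in x multiplies the odds by exp(b1)
ratio = odds(mu_at(1.0)) / odds(mu_at(0.0))
```

This is the "simple interpretation in terms of odds ratios" mentioned above: the ratio computed at any two x values one unit apart equals exp(b1).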

ESTIMATION ISSUES
None of the three regression models described in Section 2.4 admits an explicit solution to the estimation problem. Moreover, neither the BB nor the FBB distribution belongs to the exponential dispersion family; thus, differently from the binomial case, their parameters cannot be estimated by simply applying the standard iteratively reweighted least squares method. 11 There are many proposals in the literature for addressing the issues of likelihood-based inference within BB, as well as BBReg, models. 20,[29][30][31] We decided to adopt a Bayesian approach, which is particularly convenient for dealing with complex models such as mixtures, or simply with models involving many parameters. Moreover, this approach does not depend on asymptotic calculations and makes it easy to cope with the small-sample problems that typically affect maximum likelihood inference.
Since the FBB is a finite mixture (see Equation (7)), it can always be expressed as an incomplete data model where the allocation of each observation to one of the mixture components is unknown. Therefore, a Bayesian approach based on Markov chain Monte Carlo (MCMC) techniques is particularly suitable, producing posterior (simulated) distributions for the parameter vector. In particular, we take advantage of the Hamiltonian Monte Carlo (HMC) algorithm, 32,33 which generalizes one of the best-known MCMC methods, namely the Metropolis algorithm, by combining MCMC and deterministic simulation methods to generate efficient transitions. This is achieved also by considering the derivatives of the log density of the target distribution (ie, the posterior). The popularity of HMC is increasing because it is more efficient than classical MCMC methods, and because it is easy to use through the Stan modeling language, which implements the No-U-Turn Sampler. 34 Its implementation requires the specification of the log-likelihood function and prior distributions for the parameters. Let y = (y_1, ..., y_N)⊺ be an i.i.d. sample of size N from the response Y. Then, the log-likelihood is

ℓ(ϑ; y) = ∑_{i=1}^N log f*(y_i; ϑ),

where f*(⋅; ⋅) denotes the pmf of the assumed distribution (binomial, BB, or FBB), and ϑ is its parameter vector. In the special case of the FBBReg, f*(⋅; ⋅) is given by Equation (7) and ϑ = (β⊺, w, θ, p)⊺.
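In practice the mixture likelihood is best evaluated on the log scale with the log-sum-exp device (the same idea behind Stan's log_mix function). A minimal Python sketch, with our own helper names and illustrative parameter values:

```python
import math

def log_beta(a, b):
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)

def bb_logpmf(y, n, mu, phi):
    # log of the beta-binomial pmf with mean mu and precision phi
    a, b = mu * phi, (1.0 - mu) * phi
    return (math.lgamma(n + 1) - math.lgamma(y + 1) - math.lgamma(n - y + 1)
            + log_beta(y + a, n - y + b) - log_beta(a, b))

def fbb_loglik(y, n, mu, w, phi, p):
    # pointwise FBB log-likelihood, mixing the two BB components on the log scale
    wt = w * min(mu / p, (1.0 - mu) / (1.0 - p))
    l1 = math.log(p) + bb_logpmf(y, n, mu + (1.0 - p) * wt, phi)
    l2 = math.log(1.0 - p) + bb_logpmf(y, n, mu - p * wt, phi)
    m = max(l1, l2)                      # log-sum-exp for numerical stability
    return m + math.log(math.exp(l1 - m) + math.exp(l2 - m))

# total log-likelihood over a small illustrative sample
ys, ns = [3, 0, 7], [10, 12, 9]
total = sum(fbb_loglik(y, n, 0.35, 0.5, 7.0, 0.8) for y, n in zip(ys, ns))
```

The full log-likelihood is then just the sum of the pointwise terms, which is also the form needed later for WAIC and CPO computations.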
As for the priors, the variation-independent parameter space allows us to assume prior independence, which is the usual choice when no prior information is available. Thus, we can specify a prior distribution for each parameter separately. In the rest of this article, we use a diffuse multivariate normal prior for the regression coefficients, that is, β ∼ N_{K+1}(0, Σ), where 0 is the zero vector and Σ is a diagonal covariance matrix with large values for the variances. Furthermore, we adopt a uniform distribution on (0, 1) for θ = 1/(φ + 1), w, and p. Please note that the BinReg requires only the specification of the prior for β; the BBReg also involves θ, whereas the FBBReg requires the specification of all four priors. These choices represent a non- (or weakly) informative, but still proper, option. Naturally, different priors can be considered; for example, a widespread choice is a Gamma(k ⋅ g, g) prior for φ, with small values of g > 0 to induce a large variability around the prior mean k > 0. However, we prefer the uniform prior for θ = 1/(φ + 1) since it guarantees non-informativeness without requiring the specification of hyperparameters. A sensitivity study concerning priors has highlighted robustness, understood as a limited impact on inferential conclusions (see Section 1 of the SM).
Both in the simulations and in the real data applications, we diagnose convergence of the chains to the equilibrium distribution through graphical tools (trace and density plots), through the Geweke and Heidelberger-Welch diagnostics to ascertain stationarity, and through the potential scale reduction factor and the effective sample size to ascertain mixing of the chains. 35 To diminish the dependence of the results on the starting values, we choose these values randomly, and we discard the first half of each chain, imposing a warm-up of 50%. We do not need to set thinning intervals different from 1 (ie, we keep every draw from each chain) thanks to the large effective sample sizes produced by the HMC and to the low level of autocorrelation.

Bayesian diagnostic
To compare the three considered regression models, we use a fully Bayesian goodness-of-fit index, namely, the Watanabe-Akaike information criterion (WAIC), 36,37 which is well-defined also for non-regular models such as mixtures. WAIC uses the log pointwise posterior predictive density (lppd) as a measure of fit, and the effective number of parameters p̂_WAIC as a correction for the model's complexity. Given a sample (ϑ^(1), ϑ^(2), … , ϑ^(B)) of size B simulated from the posterior distribution, we can compute these quantities as

lppd = ∑_{i=1}^N log( (1/B) ∑_{b=1}^B f(y_i | ϑ^(b)) ),

p̂_WAIC = ∑_{i=1}^N Var_{b=1,...,B}( log f(y_i | ϑ^(b)) ).

Finally, WAIC is defined on the deviance scale, that is, WAIC = −2(lppd − p̂_WAIC). Recently, Vehtari et al 37 proposed an efficient way to compute a more robust loo cross-validation criterion. In all our results, the WAIC and loo indexes are very close to each other. Thus, we report only the WAIC, which is so far the most widespread one.
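Given a B-by-N matrix of pointwise log-likelihoods (one row per posterior draw, one column per observation), the WAIC computation above is a few lines of Python; this is our own sketch, not tied to any particular package:

```python
import math

def waic(log_lik):
    # log_lik[b][i] = log f(y_i | theta^(b)); B draws by N observations
    B, N = len(log_lik), len(log_lik[0])
    lppd, p_waic = 0.0, 0.0
    for i in range(N):
        col = [log_lik[b][i] for b in range(B)]
        m = max(col)
        # log of the posterior-mean density, via log-sum-exp
        lppd += m + math.log(sum(math.exp(c - m) for c in col) / B)
        # sample variance of the pointwise log density over draws
        mean = sum(col) / B
        p_waic += sum((c - mean) ** 2 for c in col) / (B - 1)
    return -2.0 * (lppd - p_waic)
```

With draws that do not vary at all, p̂_WAIC is zero and WAIC reduces to minus twice the log-likelihood, which makes for a convenient degenerate check.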
Another popular Bayesian diagnostic tool is posterior predictive checks, which aim to assess the validity of a model's assumptions. The main idea of this technique is that "replicated" data generated under the fitted model should behave similarly to the observed data; any differences between the simulated and observed data suggest a potential lack of fit for the model. Let ϑ^(b) (b = 1, ..., B) be an element of a sample simulated from the posterior distribution, and let y^(b) be a sample generated from the posterior predictive distribution f_Y(y | ϑ^(b)). Furthermore, let T(⋅) be a function of the data and model parameters that many authors refer to as a discrepancy measure. 38,39 Then, it is possible to compare the empirical distribution of T(y^(b)) with T(y) (ie, the value of T(⋅) computed based on the observed data). Such a comparison can be conducted via plots or through posterior predictive p-values, defined as P(T(y^(b)) ≥ T(y) | y) (the closer to 0.5, the better). Posterior predictive checks are particularly useful for detecting overdispersion in a Bayesian framework, where classical tools based on deviance and/or Pearson's χ² are not suitable. Indeed, choosing the variance as a discrepancy measure, we can assess how the observed variances behave with respect to the theoretical ones. If the observed variance is far from that of the replicated datasets, we can conclude that the assumed distribution is not suitable for modeling the data.
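A posterior predictive p-value with the variance as discrepancy measure can be sketched as follows (our own helper names; the tiny datasets are illustrative only):

```python
def sample_variance(y):
    # unbiased sample variance, used here as the discrepancy measure T(.)
    m = sum(y) / len(y)
    return sum((v - m) ** 2 for v in y) / (len(y) - 1)

def ppp_value(y_obs, y_reps, discrepancy=sample_variance):
    # share of replicated datasets whose discrepancy exceeds the observed one;
    # values near 0.5 suggest the model reproduces this feature of the data
    t_obs = discrepancy(y_obs)
    return sum(1 for rep in y_reps if discrepancy(rep) >= t_obs) / len(y_reps)
```

In a real application, y_reps would hold one replicated dataset per posterior draw; any other discrepancy function (eg, the proportion of zeros) can be plugged in the same way.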
Another critical issue regarding model diagnostics is outlier detection. Our Bayesian perspective has prompted us to use a tool recently introduced in the literature, namely, the conditional predictive ordinate (CPO). [39][40][41] This is a measure used to detect unlikely observations given the current model, and it is defined as the predictive density of the ith observation once the latter has been excluded from the dataset: CPO_i = f(y_i | y_{−i}). Once a sample of size B has been generated from the posterior distribution of ϑ, and assuming that the Y_i's are conditionally independent given ϑ, it is possible to obtain an estimate of the CPO: 42

ĈPO_i = ( (1/B) ∑_{b=1}^B 1/f(y_i | ϑ^(b)) )^{−1}, (15)

where f(y_i | ϑ^(b)) is the pmf of the corresponding model with ϑ = ϑ^(b). Equation (15) estimates CPO_i as the harmonic mean of the likelihood of y_i over all the generated ϑ^(1), … , ϑ^(B). Note that this formula allows us to compute an estimate of CPO_i without fitting the model N times, which would be a very time-consuming procedure, even for small datasets. The smaller ĈPO_i is, the lower the likelihood of observing the ith response given the model, so the presence of many small ĈPO values suggests that the model is not reliable. In Sections 4.3 and 5.3, we use the CPO measure to compare the outlier detection ability of the BinReg, BBReg, and FBBReg models.
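The harmonic-mean estimator of Equation (15) is one line of code once the pointwise likelihoods over the posterior draws are available; a minimal sketch with our own naming:

```python
def cpo_hat(lik_i):
    # harmonic mean of the pointwise likelihoods f(y_i | theta^(b)), b = 1..B
    B = len(lik_i)
    return B / sum(1.0 / l for l in lik_i)
```

If the likelihood of observation i is constant across draws, the estimate simply returns that constant; more generally the harmonic mean is dominated by the smallest likelihood values, which is precisely why very small ĈPO flags poorly predicted observations.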

SIMULATION STUDIES
To understand the FBBReg model better, we conduct some simulation studies with different purposes. The first study compares the fitting abilities of the BinReg, BBReg, and the FBBReg models in different data generating processes. Two further studies compare the three regression models in situations that are often problematic for binomial data, namely, the presence of an excess of zeros and the presence of outliers. In each simulation, we estimate the models as described in Section 3, running chains of length 10 000 with a warm-up of 50%.

Model fit study
To compare the fitting abilities of the models, we consider four scenarios with data generated based on (1) a BBReg, (2) an FBBReg, (3) a mixture of two BBReg's with different means and precision parameters (which is not an FBBReg), and (4) another generic mixture of BBReg's where one precision parameter is very small. Note that in scenarios (1) and (2), the data are generated from two of the three competing models. Thus, here, we can compare the performance of one model when the other is favored, and the performance of the estimation process can be investigated as well. On the other hand, scenarios (3) and (4) inspect more challenging cases where all models are misspecified. Please note that scenario (4) is affected by two potential causes of overdispersion, namely, the presence of two latent groups (ie, the two mixture components), each one characterized by a different ICC, and an excess of zeros due to the low precision of one component. This is illustrated by Figure S6 in Section 2.3 of the SM, which shows one randomly selected replication from this scenario. For each scenario, we generate 1000 samples of size N = 150. More specifically, we generate a single covariate x from a uniform distribution on the interval (−1, 1) and n = (n_1, … , n_N)⊺ as i.i.d. observations from a Poisson distribution with mean parameter equal to 200. A logit link function is adopted to link the mean parameter to the covariate as follows: logit(μ_i) = β_0 + β_1 x_i (i = 1, ..., N), for fixed β_0 and β_1. Table 1 shows the biases, the mean square errors (MSEs), and the coverage probabilities resulting from estimating the parameters through their posterior means. The last column of Table 1 presents the mean of the WAIC criterion over the 1000 replications. The performance of the FBBReg is better than that of the BinReg in all four scenarios with overdispersion.
When overdispersion is due to a common correlation among the binary outcomes forming the binomial count (ie, the first scenario, with data generated from a BBReg), the FBBReg is competitive with the BBReg, exhibiting similar biases, MSEs, coverage probabilities, and means of the WAIC criterion. When the presence of a common intraclass correlation cannot explain all the extra variation (second through fourth scenarios), the FBBReg is clearly the best model. Indeed, in the second and third scenarios, the biases and MSEs of β_0 and β_1 under the BBReg model are higher than those of the FBBReg model, and even higher than those of the BinReg model. Moreover, the lowest WAIC values are attained by the FBBReg model. In the fourth scenario, despite the high biases characterizing both the BBReg and the FBBReg, the latter shows a substantially better fit, being the preferable model in 100% of replications.
All these observations suggest using the FBBReg model instead of the BBReg one in general, since the former can recognize a wider spectrum of overdispersion scenarios than the latter, and it does not perform significantly worse when data are generated from a BBReg model. Moreover, the FBBReg yields unbiased estimates even in some scenarios of model misspecification.

Excess of zeros study
In this section, we focus on the performance of the three models in scenarios with a higher percentage of zeros than the one assumed by the binomial data generating process. More precisely, we generate samples of size N = 100 such that Y_i ∼ Bin(n_i, logit^{−1}(1 + 2x_i)), where the number of trials n_i for each observation is generated from a Poisson distribution with mean parameter equal to 50, and the continuous covariate x is distributed according to a uniform distribution on the interval (−1.5, 2). Then, in each scenario, we randomly select a different percentage of units (5%, 10%, 20%, and 50%) and set their outcome Y_i to zero. For each scenario, we simulate 250 samples. For comparison purposes, we also estimate the zero-inflated ZIBinReg and ZIBBReg models (see Section 2.3). Figure 2 reports the mean of the WAIC as a function of the percentage of zeros for all models. Remarkably, the FBBReg performs better than the BinReg and the BBReg models in all scenarios. Moreover, it exhibits WAIC values only slightly worse than those of the two inflated models, which are expressly conceived to handle excess zero counts. This can be ascribed to the fact that one mixture component of the FBBReg model is dedicated to a particular (even small) group of zero observations. Due to the constraint λ_1 > λ_2 on the component means, the second component is the one devoted to modeling the group of zeros, meaning that its mixing weight is 1 − p. These observations are confirmed by the posterior means of the parameter p (ie, 0.9905, 0.9790, 0.7822, and 0.4684), which are close to the percentages of unchanged observations (ie, 95%, 90%, 80%, and 50%, respectively). Additional details on the posterior distribution of p can be found in Section 3.1 of the SM.
The role of the two mixture components can be better understood by observing that the relationship between the (overall) mean μ of the response variable and the component means λ_1 and λ_2 (see Equation (8)) gives rise to regression curves for λ_1 and λ_2 if μ is considered to be a function of covariates (according to Equation (14)). All the relevant regression curves are reported in Figure 3, which presents a randomly selected replication for each scenario. It is noteworthy that the λ_2 curve almost perfectly adapts to the "zero" group.

Outlier contamination study
The last simulation study compares the models in different scenarios with outliers. As in Section 4.2, for each replication, we generate N = 100 observations from a Bin(n_i, logit^{−1}(1 + 2x_i)). Then, we artificially modify the binomial count of a randomly selected subset of observations as y_r^new = n_r − y_r^old. To ensure that this approach leads to outliers, we have to draw the observations corresponding to indexes r from the tails. For this reason, we randomly select three observations (3%) with x < x_0.15 (scenario I), three observations with x > x_0.85 (scenario II), and three observations with x < x_0.15 and three with x > x_0.85 (scenario III), where x_q represents the qth empirical percentile, 0 < q < 1. Comparing the distribution of the WAIC under the three models in each scenario (Figure 4), we can see that the BinReg model is the worst at handling outliers, even though the true data generating process is the BinReg itself for the major part of the data. The best model among the considered ones is clearly the FBBReg. To better understand the reasons for this, consider Figure 5, which shows the scatter plot of one randomly chosen replication together with the estimated regression curves of the FBBReg model for each scenario. One component is entirely dedicated to modeling a group of outliers. When all the outliers are above (or below) the main cloud of data points (scenarios (I) and (II)), the other component is devoted to modeling the major part of the data. Otherwise, if one group of outliers is placed above the main cloud and one group below it (scenario (III)), only the "most extreme" group is modeled through a mixture component. This is a weakness of the model, which, however, does not undermine its superiority with respect to competing models.
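The contamination mechanism can be sketched as follows; for simplicity, this sketch uses a fixed number of trials and idealized (rounded expected) counts instead of binomial draws, so all numbers are purely illustrative:

```python
import math
import random

random.seed(42)
N = 100
x = [random.uniform(-1.5, 2.0) for _ in range(N)]
n = [50] * N    # fixed trials for simplicity (the study draws n_i from a Poisson)
inv_logit = lambda eta: 1.0 / (1.0 + math.exp(-eta))
# idealized counts at the true regression curve, standing in for binomial draws
y = [round(ni * inv_logit(1.0 + 2.0 * xi)) for ni, xi in zip(n, x)]

# scenario I: flip three responses in the left tail (x below its 15th percentile)
q15 = sorted(x)[int(0.15 * N)]
left_tail = [i for i in range(N) if x[i] < q15]
flipped = random.sample(left_tail, 3)
for i in flipped:
    y[i] = n[i] - y[i]    # y_new = n - y_old turns a low count into a high one
```

Restricting the flips to a tail of the covariate guarantees that the reflected counts land far from the regression curve, which is what makes them genuine outliers rather than mere noise.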
Panel (III) of Figure 5 helps to visualize this aspect: the FBBReg model treats the upper-left group of outliers as units of the same subpopulation as the non-outlying observations (modeled by λ1), whereas λ2 models the bottom-right outliers. All the remarks above are also confirmed by the posterior means and CS's of the parameter p, reported in Section 3.2 of the SM.
Finally, we compute the ĈPO values (see Equation (15)) for the outlying observations in the replications presented in Figure 5. We compare them in Figure 6, where models are represented by shapes. It is easy to see that the FBBReg model leads to higher ĈPO values in each scenario, confirming its ability to model outliers in a more reliable way. Interestingly, the FBBReg model yields the highest ĈPO values even for those observations in scenario (III) that are not modeled by a specific mixture component.
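A standard way to estimate CPO_i from posterior draws is the harmonic-mean estimator; assuming the paper's Equation (15) is of this common form, a binomial-likelihood sketch is:

```python
import numpy as np
from scipy.stats import binom

def cpo_hat(y_i, n_i, p_draws):
    """Harmonic-mean estimate of CPO_i for one observation.

    y_i, n_i: observed count and trial size; p_draws: S posterior draws
    of the success probability. (Standard estimator; the likelihood would
    be replaced by the BB/FBB pmf for the other models.)"""
    lik = binom.pmf(y_i, n_i, p_draws)   # likelihood at each posterior draw
    return 1.0 / np.mean(1.0 / lik)
```

With degenerate draws the estimator reduces to the plain likelihood, which gives a quick sanity check.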

REAL DATA APPLICATIONS
In this section, we analyze three well-known biomedical datasets, showing how the FBBReg can be used in real data analysis. We compare the FBBReg with the BinReg, BBReg, and with their zero-inflated versions, if appropriate. Posterior predictive checks and p-values are available in the SM.

Bacteria data (excess of zeros)
Pests can easily infest crops of any kind. To control pests without damaging ecosystems, one can introduce the pests' natural enemies into the environment. Trichogramma galloi is an egg parasitoid able to control pests in sugar cane cultivations. Demétrio et al 21 conducted a completely randomized experiment to compare two different T. galloi biotypes, namely, "AA" and "DA." During the experiment, parasitoid groups with different numbers of female parasitoids were allowed to attempt to parasitize 128 eggs of an alternative host (Anagasta kuehniella). There were 10 replicates for each combination of biotype and number of females. The dataset contains the number Y of parasitized eggs (out of n_i = 128) for the "DA" group, as well as the number of females. In this dataset, there is a large number of zero counts (namely, 37 out of 70 replicates, ie, 52.86%), with zeros occurring for each number of females. To estimate the parameters of the regression models, we run four chains of length 20 000 with a warm-up of 10 000 iterations. Moreover, besides our FBBReg model and the two competitors BinReg and BBReg, we also estimate the zero-inflated counterparts of the latter two. Table 2A reports the posterior means and 95% credible sets (CS's) of all the parameters of the five models, together with WAIC values, treating the number of females as a numeric variable. More specifically, since there is evidence of a nonlinear relationship, we define the linear predictor as a quadratic function of the standardized number of females x_i, that is, logit(μ_i) = β0 + β1 x_i + β2 x_i^2.
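The zero-inflated competitors fitted here follow the generic ZI construction: a point mass at zero mixed with a beta-binomial. A hedged sketch of the ZIBB pmf (parameter names are illustrative; a mean/precision parameterization of the BB part is assumed):

```python
import numpy as np
from scipy.stats import betabinom

def zibb_pmf(y, n, pi0, mu, theta):
    """Zero-inflated beta-binomial pmf: with probability pi0 the count is a
    structural zero, otherwise it follows a BB with mean mu and precision
    theta (shape parameters a = mu*theta, b = (1-mu)*theta)."""
    bb = betabinom.pmf(y, n, mu * theta, (1 - mu) * theta)
    return np.where(y == 0, pi0 + (1 - pi0) * bb, (1 - pi0) * bb)
```

Setting pi0 = 0 recovers the plain BB, so the ZIBB nests the BB just as the ZIBin nests the binomial.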
In particular, note that the FBBReg model exhibits a better fit (lower WAIC) and a higher precision (posterior mean of θ) than the BBReg model, which is the other model expressly developed to address overdispersion. This has direct consequences on the CS's of the regression coefficients, which are wider for the BBReg than for the FBBReg model. Moreover, the FBBReg model detects two well-separated groups, as shown by the posterior means of p (0.453) and w (0.979). The group with the lower values of the response has posterior mean weight 1 − p equal to 0.547, which is very close to the proportion of zero values in the dataset, thus suggesting that the second component of the FBBReg model is dedicated to modeling the excess of zeros, as pointed out also by Figure 7. Due to the high proportion of zero values, we also fit the ZIBinReg and ZIBBReg models. Both models detect the proportion of zeros quite accurately, though the FBBReg model exhibits a definitely better fit than the ZIBinReg model, and a comparable fit with respect to the ZIBBReg one.

TABLE 2 Bacteria data: Posterior means and 95% CS's for the parameters under the BinReg, BBReg, FBBReg, zero-inflated BinReg, and zero-inflated BBReg models, treating the covariate "female" as a quantitative variable (A) and as a factor (B).
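The two-group structure detected here reflects the FBB's mixture form. As an illustration, the FBB pmf can be sketched as a two-component beta-binomial mixture with component means λ1 > λ2, common precision θ, and weight p (an assumed mean/precision parameterization; the paper's exact reparameterization through the overall mean and the distance w is not reproduced here):

```python
import numpy as np
from scipy.stats import betabinom

def fbb_pmf(y, n, p, lam1, lam2, theta):
    """Two-component beta-binomial mixture, sketching the FBB:
    p * BB(n, lam1, theta) + (1 - p) * BB(n, lam2, theta),
    with each BB parameterized by a = lam*theta, b = (1-lam)*theta."""
    c1 = betabinom.pmf(y, n, lam1 * theta, (1 - lam1) * theta)
    c2 = betabinom.pmf(y, n, lam2 * theta, (1 - lam2) * theta)
    return p * c1 + (1 - p) * c2
```

A small λ2 with weight 1 − p is what lets the second component soak up the excess of zeros in these data.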
Because of the experimental nature of the trials, Demétrio et al 21 treat the number of females as a 7-level factor. Therefore, for comparison purposes, we also fit the models using six dummy variables, with the level "2" as the baseline (see Table 2B). Once again, the FBBReg model exhibits the best fit (lowest WAIC) among the non-inflated models, even more clearly than before. Here, the BinReg and FBBReg coefficient estimates are similar, whereas those for the BBReg are much lower, thus denoting a weaker effect of the covariate. Note that, also in this case, the higher precision estimates from the FBBReg model, compared with those of the BBReg model, result in narrower CS's for all regression coefficients.
Quite interestingly, for the BBReg model, the numeric covariate "female" is not significant (in Table 2A the CS's of β1 and β2 contain zero). Coherently, when the covariate "female" is treated as a factor, none of the six corresponding dummies is significant under the BBReg model, possibly because the precision is too low. Instead, the FBBReg model is able to distinguish between significant levels (namely, F4 with a negative impact with respect to the baseline, and F32 and F64 with a positive impact) and nonsignificant ones. Contrary to its non-inflated version, the ZIBBReg model displays a higher precision, so that it can detect some significant effects, but it performs only slightly better than the FBBReg model. Vice versa, the ZIBinReg model leads to an improvement with respect to its non-inflated counterpart, but it performs worse than all the remaining models. Note that the two FBBReg models, obtained by treating the covariate as quantitative or as a factor, respectively, are coherent from the empirical/interpretative point of view, although they convey different kinds of information. This is shown by Figure S7 of the SM, which reports the estimated points corresponding to the categorized covariate. The only point showing a different behavior refers to the first dummy, which is characterized by a large number of zero values, so that the corresponding point is inevitably "attracted" downward.
Regardless of how "female" is treated, the posterior predictive checks (reported in the SM) show that the FBBReg model exhibits the best performance with respect to the overdispersion issue. Indeed, all the tools suggest that the BBReg model accommodates the extra variation (its variance posterior predictive p-value is approximately equal to 0.5) at the expense of modeling the mean. Conversely, the FBBReg model's posterior predictive p-values are all close to 0.5, the only exception being the one associated with the deviance, which, however, is the closest to 0.5 among those obtained from the considered non-inflated models. All the above results suggest that the FBBReg can be the preferred model, as it performs as well as the ZIBBReg, even though it was not expressly developed to handle an excess of zeros, and better than all the remaining models.
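The posterior predictive p-values used throughout can be computed as the share of replicated datasets whose discrepancy (here the variance) exceeds the observed one, with values near 0.5 indicating a good fit. A minimal binomial sketch, with hypothetical posterior draws:

```python
import numpy as np

rng = np.random.default_rng(1)

def ppp_value(y_obs, n, p_draws, discrepancy=np.var):
    """Posterior predictive p-value: for each posterior draw, simulate a
    replicated dataset and compare its discrepancy with the observed one.
    (Binomial sketch; for the BB/FBB models only the sampler changes.)"""
    t_obs = discrepancy(y_obs)
    y_rep = rng.binomial(n, p_draws[:, None], size=(len(p_draws), len(y_obs)))
    t_rep = discrepancy(y_rep, axis=1)
    return np.mean(t_rep >= t_obs)
```

Strongly overdispersed data yield a p-value near 0 under the binomial model, reproducing the pattern flagged by the variance checks above.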
A further advantage of the FBBReg model is that it implies the modeling of the ICC as a function of the covariate (see Section 2.2), which allows deeper insight into the role and impact of the latter. More specifically, Figure 8 shows the simulated posterior distributions of the ICC for each level of the ordinal variable "female." It clearly emerges that these distributions do depend on the level of "female," especially for the significant levels F4, F32, and F64, and can assume large values, thus suggesting that the overdispersion may also be due to the correlated binary data forming the binomial counts.
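For the plain BB case, posterior draws of the precision translate directly into draws of the ICC via ρ = 1/(1 + θ). A minimal sketch with hypothetical posterior draws (this is the BB relationship only; in the FBBReg the ICC additionally varies with the covariate, which this sketch does not capture):

```python
import numpy as np

# Hypothetical posterior draws of the BB precision theta (not fitted values).
theta_draws = np.random.default_rng(2).gamma(shape=5.0, scale=1.0, size=4000)

# ICC draws under the BB parameterization: rho = 1 / (1 + theta).
icc_draws = 1.0 / (1.0 + theta_draws)
icc_mean = icc_draws.mean()   # posterior mean of the ICC
```

Large ICC draws correspond to small precisions, which is why a low posterior mean of θ signals strongly correlated binary outcomes within a binomial count.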

Atomic bomb radiation data (latent groups)
We now consider an application based on the data by Otake and Prentice. 5 For a large number of survivors of the atomic bombs in Hiroshima and Nagasaki, 100 cells have been analyzed, and the number Y of cells with chromosomal abnormalities has been recorded. Furthermore, for each subject, the estimated radiation exposure level ("dose"), expressed in rads, has been collected. Table 3A shows the posterior means and 95% CS's of the model parameters, as well as the WAIC values.
The effect of the dose of radiation exposure on the probability of chromosomal abnormalities is positive and significant for all three models, with the FBBReg showing the highest estimate (strongest effect). A graphical representation of the data together with the FBBReg regression curves can be found in Section 2.5 of the SM. The better performance of the FBBReg (ie, lower WAIC value) with respect to its competitors is due to its ability to detect the two latent subpopulations forming the population under study (namely, Hiroshima and Nagasaki survivors). Indeed, the FBBReg model, thanks to its mixture structure, enables the determination of a "marginal" regression curve, which is a weighted mean of the clusters' regression curves. In particular, inspecting the regression curves, it clearly emerges that the one associated with λ2 is dedicated to modeling the Nagasaki survivors, whereas the curve for λ1 models the Hiroshima survivors. Based on the posterior predictive checks, the mean of the replicated outcomes agrees with the mean of the observed data under all three models. The panel on the variance discrepancy measure highlights that the BinReg model clearly suffers from the overdispersion problem, since the distribution of the variance based on the replicated data is far away from the observed variance. In contrast, both the BBReg and the FBBReg models handle the extra variation, with the posterior predictive p-values indicating the FBBReg as the preferred model. The presence of two subpopulations is naturally better captured by a mixture model; thus, to further compare the regression models in a more impartial scenario, we include the city in which each subject survived the bomb (ie, the subpopulation) as a dummy covariate. Table 3B shows the parameter estimates and CS's, as well as the WAIC values, for all the models. Quite interestingly, the FBBReg model still provides the best fit to the data.
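The "marginal" curve described here is the mixture mean of the two cluster curves: with mixing weight p, μ(x) = p·λ1(x) + (1 − p)·λ2(x). A sketch with hypothetical coefficients (not the fitted values of Table 3):

```python
import numpy as np
from scipy.special import expit

x = np.linspace(0, 5, 50)        # eg, standardized radiation dose
p = 0.6                          # hypothetical mixing weight

# Hypothetical cluster regression curves on the logit scale.
lam1 = expit(-2.0 + 1.0 * x)     # eg, curve for the Hiroshima cluster
lam2 = expit(-3.0 + 0.6 * x)     # eg, curve for the Nagasaki cluster

# Marginal regression curve: weighted mean of the cluster curves.
mu = p * lam1 + (1 - p) * lam2
```

The marginal curve is always sandwiched between the two cluster curves, which is what lets a single mixture model describe the pooled Hiroshima and Nagasaki samples.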
Further, note that the BinReg model is still affected by overdispersion issues, suggesting that the location groups are not the only source of extra variation. Awa et al 43 suggest that important factors could be age and type of aberration, meaning that further unobserved subpopulations could still exist. Indeed, even when the city dummy is included as a covariate, the FBBReg model still detects two groups of observations (posterior means of p and w equal to 0.827 and 0.705, respectively), showing no relevant differences between the posterior predictive checks with and without the location dummy.

Control mice data (outliers)
Preclinical studies represent an early step in the new drug development process. In particular, a potential drug must be tested on animals (eg, rabbits or mice) to establish whether it can be safely administered to humans. Some preclinical studies evaluate the undesired side effects of a new molecule on the fertility of an animal by administering the drug to a male member of the considered species and mating it with one or more females. A greater number of deaths in the fetal litters suggests a mutagenic effect. A control group is essential to assess whether there exists a drug-associated adverse effect. Haseman and Soares 22 report the number of fetal deaths in several control groups of mice for different litter sizes. Morgan 23 analyzes the same data and argues that a mixture of a BB and a binomial distribution could provide a better fit than the standard binomial distribution, since the binomial component of the mixture can accommodate outlying litters with high mortality. To evaluate the performance of our new model in the presence of outliers, we estimate the parameters of the BinReg, BBReg, and FBBReg models with no covariates (ie, logit(μ_i) = β0, i = 1, ..., N) for only one control group proposed by Haseman and Soares, 22 the CF1S group, which is composed of 524 litters. All three models provide similar estimates of the percentage of dead fetuses in the litters (see Table 4). However, the FBB model shows the best performance, as emerges from the WAIC values in Table 4, but also from the posterior predictive p-values. Indeed, the binomial and BB distributions are clearly affected by the overdispersion problem, as shown by their variance posterior predictive p-values, which are far from 0.5.
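The WAIC used for all the model comparisons can be computed from the matrix of pointwise log-likelihoods evaluated at the posterior draws, as waic = −2(lppd − p_waic), lower being better. A binomial sketch (for the BB/FBB models only the pmf changes):

```python
import numpy as np
from scipy.stats import binom

def waic(y, n, p_draws):
    """WAIC from an S x N pointwise log-likelihood matrix:
    lppd   = sum_i log( mean_s p(y_i | theta_s) )
    p_waic = sum_i var_s( log p(y_i | theta_s) )
    waic   = -2 * (lppd - p_waic)."""
    loglik = binom.logpmf(y[None, :], n, p_draws[:, None])        # (S, N)
    lppd = np.sum(np.log(np.mean(np.exp(loglik), axis=0)))
    p_waic = np.sum(np.var(loglik, axis=0, ddof=1))
    return -2 * (lppd - p_waic)
```

With degenerate posterior draws the penalty vanishes and the WAIC reduces to minus twice the log-likelihood, which gives a quick sanity check.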
Since the literature reports that these data are affected by outliers, the overdispersion can plausibly be ascribed to their presence. Therefore, we further compare the three models using the CPO measure. The left panel in Figure 9 shows the ĈPO values under the binomial model. Although a large number of data points are modeled well by the binomial distribution, there is a group of litters characterized by high mortality and a low ĈPO value. We focus on the 16 litters with a binomial ĈPO ≤ 0.01 and compare their ĈPO values under the three considered models in the right panel of Figure 9. The FBB model exhibits the highest ĈPO values for all the outlying litters, as it dedicates a mixture component to them. Since the outliers are characterized by high mortality, and because of the constraint λ1 > λ2, the mixture component that models the outliers is the first one. This is also confirmed by the posterior mean of p = 0.031 ≈ 16/524. Finally, note that the BB model exhibits the worst CPO performance, thus confirming its inadequacy in the presence of outliers.

CONCLUDING REMARKS
The overdispersion issue, often affecting the BinReg model, is usually addressed by the well-known BBReg model. However, the latter does not always succeed in handling multiple concomitant sources of extra variability. In this study, we proposed a new mixture distribution for constrained counts, and a novel regression model based on it, namely, the FBBReg model. This model involves a set of parameters that have a clear interpretation in terms of (possible) latent subpopulations. Indeed, results from an extensive simulation study and from some applications to real biomedical datasets show that this model can handle the extra variation due to a missing covariate (latent groups) and, surprisingly, it can also easily adapt to some other important sources of overdispersion that practitioners commonly encounter, namely, outliers and/or an excess of zero observations. The model achieves this by automatically dedicating a mixture component to them. The estimation issues are addressed using a Bayesian approach which can be easily implemented through standard tools such as Stan. In particular, posterior predictive checks (plots and posterior predictive p-values) prove to be a powerful tool for detecting overdispersion in a Bayesian context. Moreover, they can also be easily interpreted by practitioners. Indeed, in our context, they provide the important result that not only the BinReg model, but also the BBReg model is often inadequate for handling the extra variation, whereas the FBBReg model produces a very good, sometimes outstanding, performance in many (simulated and real) applications. Due to the promising features of the FBBReg model, in future work, we plan to extend it in at least three directions. A first relevant extension is the inclusion of random effects to allow for responses with a hierarchical structure (typically measured longitudinally or clustered), so that within-subject correlation can be handled. 
Moreover, since the parameters θ, p, and w admit a clear interpretation, it seems worthwhile to explore the possibility of letting some of them depend on covariates too. This could greatly increase the flexibility of the model, enabling it to better fit and interpret more complex data patterns. Furthermore, the presence of two groups of outliers (one above and one below the main cloud, as in the simulation scenario of Section 4.3) could be handled in a parsimonious way (only one additional parameter) via inflation.
Finally, we plan to work on a multivariate version of the FBBReg, which can be obtained by compounding the multinomial distribution with the multivariate FB distribution, that is, the FD. This could increase the number of latent subpopulations and possibly lead to an extension of the Dirichlet-multinomial model, 44,45 allowing us to overcome some of its drawbacks, such as its unimodality and the rigidity of its dependence structure.