Let y = (*y*_{1}, *y*_{2}, . . . , *y*_{n})′ denote the *n* × 1 vector of a dichotomous dependent variable and x_{i} = (*x*_{i1}, . . . , *x*_{ik})′ denote the *k* × 1 vector of covariates for patient *i*. A predictive regression model addresses the problem of estimating the binary variable *y*_{i}, which indicates whether or not an individual belongs to the study group. In this case, *y*_{i} = 1 if the *i*th individual suffers an NI, and *y*_{i} = 0 otherwise. Assume that *y*_{i} = 1 with probability *p*_{i} and *y*_{i} = 0 with probability 1 − *p*_{i}. In this dichotomous model, x_{i} contains the risk factors for the *i*th individual. The regression model is given by

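$$p_i = \Pr(y_i = 1 \mid \mathbf{x}_i) = F(\mathbf{x}_i'\boldsymbol{\beta}), \qquad i = 1, \dots, n,$$

where β = (β_{1}, . . . , β_{k})′ is a *k* × 1 vector of regression coefficients, which represent the effect of each factor in the model, and *F*(·) is the link function. The likelihood function is given by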

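$$L(\boldsymbol{\beta} \mid \mathbf{y}, \mathbf{x}) = \prod_{i=1}^{n} \left[F(\mathbf{x}_i'\boldsymbol{\beta})\right]^{y_i} \left[1 - F(\mathbf{x}_i'\boldsymbol{\beta})\right]^{1-y_i}, \tag{1}$$

where x = (x_{1}, x_{2}, . . . , x_{n})′.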

**Frequentist estimation of conventional logit models.** For conventional logistic regression, the link function is the standard logistic cumulative distribution function, $F(z) = e^{z}/(1 + e^{z})$. Observe that this function is symmetric about zero, so *F*(−*z*) = 1 − *F*(*z*) for all *z*.
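The symmetry property can be checked numerically. The following is a minimal Python sketch, not part of the original analysis; the function name `logistic_cdf` and the test points are illustrative choices.

```python
import math

def logistic_cdf(z: float) -> float:
    """Standard logistic CDF, F(z) = e^z / (1 + e^z), evaluated without overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

# Check the symmetry F(-z) = 1 - F(z) at a few arbitrary points.
for z in (-3.0, -0.5, 0.0, 1.2, 4.0):
    assert abs(logistic_cdf(-z) - (1.0 - logistic_cdf(z))) < 1e-12
```

Because of this symmetry, the fitted probability approaches 0 and 1 at the same rate, which is the restriction that an asymmetric link relaxes.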

The regression coefficients, β, are usually estimated by numerical maximization of the likelihood function. The fitted model then provides the probability of infection for any individual. The usual procedure is to set a cutoff on this probability and to classify individuals whose estimated probability exceeds it as infected.
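As an illustration of these two steps, the sketch below fits a logit model by plain gradient ascent on the mean log-likelihood and then applies a cutoff. It is a simplified stand-in: statistical packages typically use Newton-type iterations, and the toy data, the step size, the iteration count, and the 0.5 cutoff are all assumptions made for the example rather than values from the study.

```python
import math

def logistic_cdf(z):
    """Standard logistic CDF, evaluated without overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def score(beta, X, y):
    """Gradient of the mean log-likelihood: (1/n) * sum_i (y_i - p_i) * x_i."""
    n, k = len(X), len(beta)
    grad = [0.0] * k
    for xi, yi in zip(X, y):
        p = logistic_cdf(sum(b * v for b, v in zip(beta, xi)))
        for j in range(k):
            grad[j] += (yi - p) * xi[j] / n
    return grad

def fit_logit(X, y, step=0.1, iters=20000):
    """Maximize the logit likelihood by fixed-step gradient ascent (illustrative only)."""
    beta = [0.0] * len(X[0])
    for _ in range(iters):
        grad = score(beta, X, y)
        beta = [b + step * g for b, g in zip(beta, grad)]
    return beta

# Hypothetical toy data: an intercept column plus one risk factor.
X = [[1.0, float(v)] for v in range(8)]
y = [0, 0, 1, 0, 1, 1, 0, 1]

beta_hat = fit_logit(X, y)
probs = [logistic_cdf(sum(b * v for b, v in zip(beta_hat, xi))) for xi in X]

cutoff = 0.5  # assumed cutoff; in practice it is chosen for the application
flags = [int(p >= cutoff) for p in probs]
```

When one outcome is rare, a cutoff of 0.5 tends to flag very few individuals, so the cutoff is usually tuned to the desired balance of sensitivity and specificity.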

**Bayesian estimation of symmetric and asymmetric logit models.** A Bayesian estimation of the logistic regression model is obtained by treating the β coefficients as random nodes of the model. To facilitate comparison with frequentist estimation methods, we assume zero-centered, noninformative normal densities as prior distributions for the coefficients.

We also propose the use of an asymmetric link function, fitting the resulting model from a Bayesian point of view. The model has been used in other contexts ([16,17,20,21], among others), but has had little application in the health field. The asymmetric model is adequate for binary response data when one response is much more frequent than the other, as occurs in the case we examine in this study.

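Following Albert and Chib [11] and Chen et al. [17], we express the model through a vector of latent variables w = (*w*_{1}, *w*_{2}, . . . , *w*_{n})′ in this form:

$$y_i = \begin{cases} 1 & \text{if } w_i \ge 0, \\ 0 & \text{if } w_i < 0, \end{cases} \qquad w_i = \mathbf{x}_i'\boldsymbol{\beta} + \delta z_i + \varepsilon_i, \qquad z_i \sim G, \quad \varepsilon_i \sim F.$$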

In this model, *G* is the cumulative distribution function of the half-standard normal distribution given by

$$G(z) = \int_{0}^{z} \frac{2}{\sqrt{2\pi}}\, e^{-t^{2}/2}\, dt, \qquad z \ge 0,$$

*F* is the standard logistic cumulative distribution function, and *z*_{i} and *ε*_{i} are assumed to be independent. The skewness in this regression model is introduced by the term *δz*_{i}, where δ ∈ (−∞, ∞) is the skewness parameter. If δ < 0, the probability that *y*_{i} = 0 increases, whereas if δ > 0, the probability that *y*_{i} = 1, i.e., the infection probability of the *i*th individual, increases. If δ = 0, the regression model reduces to the standard logit.
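The effect of the skewness parameter can be illustrated by simulating the latent variable directly. The Python sketch below assumes the latent structure *w*_{i} = x_{i}′β + *δz*_{i} + *ε*_{i} with *y*_{i} = 1 when *w*_{i} ≥ 0; the linear predictor value of −1, the δ values, the number of draws, and the seed are arbitrary choices for the example.

```python
import math
import random

def simulate_rate(eta, delta, n_draws=20000, seed=0):
    """Monte Carlo estimate of P(y = 1) when w = eta + delta * z + eps and y = 1 if w >= 0."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        z = abs(rng.gauss(0.0, 1.0))       # half-standard normal draw
        u = max(rng.random(), 1e-300)      # guard against log(0)
        eps = math.log(u / (1.0 - u))      # standard logistic draw via the inverse CDF
        hits += (eta + delta * z + eps) >= 0
    return hits / n_draws

# Same linear predictor, three hypothetical values of the skewness parameter.
rates = {delta: simulate_rate(-1.0, delta) for delta in (-2.0, 0.0, 2.0)}
```

With δ = 0 the simulated rate should be close to the standard logit value *F*(−1); a negative δ pulls it down and a positive δ pushes it up.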

The likelihood function in Eq. 1 can be rewritten as

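$$L(\boldsymbol{\beta}, \delta \mid \mathbf{y}, \mathbf{x}) = \prod_{i=1}^{n} \int_{0}^{\infty} \left[F(\mathbf{x}_i'\boldsymbol{\beta} + \delta z_i)\right]^{y_i} \left[1 - F(\mathbf{x}_i'\boldsymbol{\beta} + \delta z_i)\right]^{1-y_i} dG(z_i). \tag{2}$$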
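Each factor of this likelihood involves a one-dimensional integral over the latent *z*_{i}. For *y*_{i} = 1 it is the marginal probability $p_i = \int_0^\infty F(\mathbf{x}_i'\boldsymbol{\beta} + \delta z)\, dG(z)$, which follows from the symmetry of *F*. As a rough illustration, the Python sketch below approximates this integral with a composite Simpson rule; the truncation point of 10, the number of subintervals, and the function names are assumptions made for the example.

```python
import math

def logistic_cdf(z):
    """Standard logistic CDF, evaluated without overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def half_normal_pdf(z):
    """Density of the half-standard normal distribution on z >= 0."""
    return math.sqrt(2.0 / math.pi) * math.exp(-0.5 * z * z)

def skew_logit_prob(eta, delta, upper=10.0, n=2000):
    """Approximate the integral of F(eta + delta*z) g(z) over [0, upper]; n must be even."""
    h = upper / n
    total = 0.0
    for j in range(n + 1):
        z = j * h
        weight = 1 if j in (0, n) else (4 if j % 2 else 2)
        total += weight * logistic_cdf(eta + delta * z) * half_normal_pdf(z)
    return total * h / 3.0

# Hypothetical linear predictor of -1 under three skewness values.
p_neg = skew_logit_prob(-1.0, -2.0)
p_zero = skew_logit_prob(-1.0, 0.0)
p_pos = skew_logit_prob(-1.0, 2.0)
```

With δ = 0 the integrand no longer depends on *z*, so the result should agree with the standard logit probability up to quadrature error.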

We assume that the prior distribution of the coefficients is normal, i.e., β_{j} ∼ *N*(0, 10^{10}), ∀ *j* = 1, . . . , *k*, and δ ∼ *N*(0, 10^{10}). These noninformative prior distributions, with a very large variance, reflect the absence of prior knowledge about the parameters of interest and facilitate comparison with classical models.

Combining this prior structure with the likelihood in Eq. 2, we obtain the posterior distribution of the parameters (β, δ):

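$$\pi(\boldsymbol{\beta}, \delta \mid \mathbf{y}, \mathbf{x}) \propto L(\boldsymbol{\beta}, \delta \mid \mathbf{y}, \mathbf{x})\, \pi(\boldsymbol{\beta}, \delta), \tag{3}$$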

where π(β, δ) is the prior distribution of (β, δ).

We sample (β, δ) from this posterior distribution with the WinBUGS package (Windows Bayesian inference Using Gibbs Sampling, developed jointly by the MRC Biostatistics Unit [University of Cambridge, Cambridge, UK] and the Imperial College School of Medicine at St. Mary's, London) [22], which implements Gibbs sampling, a Markov chain Monte Carlo (MCMC) method (see Carlin and Polson [23] and Gilks et al. [24] for further details).
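For readers without access to WinBUGS, the idea of drawing (β, δ) from the posterior can be sketched in a few lines of Python. The example below is not the WinBUGS implementation and does not use Gibbs sampling: it swaps in a random-walk Metropolis sampler, integrates the latent *z*_{i} out with a coarse Simpson rule, and treats 10^{10} as the prior variance. The toy data, proposal scale, chain length, and seed are hypothetical, and with so few observations δ is only weakly identified.

```python
import math
import random

def logistic_cdf(z):
    """Standard logistic CDF, evaluated without overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def half_normal_rule(upper=6.0, n=40):
    """Simpson nodes and weights (density included) for the half-standard normal."""
    h = upper / n
    rule = []
    for j in range(n + 1):
        z = j * h
        w = 1 if j in (0, n) else (4 if j % 2 else 2)
        rule.append((z, w * h / 3.0 * math.sqrt(2.0 / math.pi) * math.exp(-0.5 * z * z)))
    return rule

RULE = half_normal_rule()

def log_posterior(theta, X, y, prior_var=1e10):
    """Log posterior of theta = (beta_1..beta_k, delta), up to an additive constant."""
    *beta, delta = theta
    logp = -sum(t * t for t in theta) / (2.0 * prior_var)  # independent N(0, prior_var) priors
    for xi, yi in zip(X, y):
        eta = sum(b * v for b, v in zip(beta, xi))
        p = sum(w * logistic_cdf(eta + delta * z) for z, w in RULE)
        p = min(max(p, 1e-12), 1.0 - 1e-12)                 # keep the logarithms finite
        logp += math.log(p) if yi else math.log(1.0 - p)
    return logp

def metropolis(X, y, n_iter=2000, scale=0.3, seed=1):
    """Random-walk Metropolis; returns the draws and the acceptance rate."""
    rng = random.Random(seed)
    theta = [0.0] * (len(X[0]) + 1)
    current = log_posterior(theta, X, y)
    draws, accepted = [], 0
    for _ in range(n_iter):
        proposal = [t + rng.gauss(0.0, scale) for t in theta]
        candidate = log_posterior(proposal, X, y)
        if math.log(1.0 - rng.random()) < candidate - current:
            theta, current = proposal, candidate
            accepted += 1
        draws.append(list(theta))
    return draws, accepted / n_iter

# Hypothetical toy data: an intercept column plus one risk factor.
X = [[1.0, float(v)] for v in range(8)]
y = [0, 0, 1, 0, 1, 1, 0, 1]
draws, acceptance_rate = metropolis(X, y)
```

In a real analysis the early draws would be discarded as burn-in and convergence would be assessed before summarizing the chain.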

One aim of our study is to use logistic regression to make predictions. In Bayesian theory, predictions of future observables are based on the predictive distribution. The predictive distribution of a future observation *y*_{p}, given a new set of covariates x_{p} = (*x*_{p1}, . . . , *x*_{pk})′, is defined as

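$$f(y_p \mid \mathbf{x}_p, \mathbf{y}, \mathbf{x}) = \int f(y_p \mid \mathbf{x}_p, \boldsymbol{\beta}, \delta)\, \pi(\boldsymbol{\beta}, \delta \mid \mathbf{y}, \mathbf{x})\, d\boldsymbol{\beta}\, d\delta. \tag{4}$$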

The predictive distribution can also be simulated using MCMC techniques with WinBUGS [22]. The WinBUGS code is provided in the Supporting Information Appendix for this article.
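Given a set of posterior draws, the predictive probability of infection for a new individual can be approximated by averaging over them. The Python sketch below is an illustration separate from the WinBUGS code in the appendix: it assumes each draw is stored as (β_{1}, . . . , β_{k}, δ), pairs it with one simulated half-normal *z*, and averages *F*(x_{p}′β + *δz*). The draws and covariate values shown are hypothetical placeholders, not results from the study.

```python
import math
import random

def logistic_cdf(z):
    """Standard logistic CDF, evaluated without overflow."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predictive_prob(x_new, draws, seed=0):
    """Monte Carlo estimate of P(y_p = 1 | x_p, data) from posterior draws of (beta, delta)."""
    rng = random.Random(seed)
    total = 0.0
    for *beta, delta in draws:
        z = abs(rng.gauss(0.0, 1.0))                    # latent half-standard normal draw
        eta = sum(b * v for b, v in zip(beta, x_new))
        total += logistic_cdf(eta + delta * z)
    return total / len(draws)

# Hypothetical posterior draws (intercept, slope, delta) and a new covariate vector.
example_draws = [[-1.0, 0.3, 0.5], [-1.2, 0.4, 0.2], [-0.8, 0.2, 0.9]]
x_p = [1.0, 2.0]
p_new = predictive_prob(x_p, example_draws)
```

The resulting probability can then be compared with a cutoff in the same way as the fitted probabilities from the frequentist model.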