Bayesian Analysis of Nosocomial Infection Risk and Length of Stay in a Department of General and Digestive Surgery


Miguel Ángel Negrín-Hernández, Department of Quantitative Methods in Economics and Management, Faculty of Economics and Business, Campus de Tafira, 35017, Las Palmas de Gran Canaria, Spain.


Objective:  Nosocomial infection is one of the main causes of morbidity and mortality in patients admitted to hospital. One aim of this study is to determine its intrinsic and extrinsic risk factors. Nosocomial infection also increases the duration of hospital stay. We quantify, in relative terms, the increased duration of the hospital stay when a patient has the infection.

Methods:  We propose the use of logistic regression models with an asymmetric link to estimate the probability of a patient suffering a nosocomial infection. We use Poisson-Gamma regression models as a multivariate technique to detect the factors that really influence the average hospital stay of infected and noninfected patients. For both models, frequentist and Bayesian estimations were carried out and compared.

Results:  The models are applied to data from 1039 patients operated on in a Spanish hospital. Length of stay, the existence of a preoperative stay, and obesity were found to be the main risk factors for a nosocomial infection. The existence of a nosocomial infection multiplies the length of stay in the hospital by a factor of 2.87.

Conclusion:  The results show that the asymmetric logit improves the predictive capacity of conventional logistic regressions.

1. Introduction

Nosocomial infections (NI) are infections that develop during hospitalization and are neither present nor incubating at the time of the patient's admission. Currently, hospital infection or NI remains a major problem, constituting one of the main causes of morbidity and mortality in patients admitted to hospital. Although the figure varies considerably among countries, some studies estimate that approximately one in ten hospitalized patients will acquire an infection after admission [1]. In Spain, the overall prevalence rate of patients with NI has decreased from 8.5% in 1990 to 7% in 2007 [2–4].

For this reason, determining the intrinsic and extrinsic risk factors to which these patients are exposed and predicting NI are important aims of research. Furthermore, NI clearly increases the duration of hospital stay, causing direct economic costs and other costs derived from specific laboratory and isolation techniques and from lengthy antibiotic treatments. Estimates of the cost of these infections, in 2002 prices, suggest that the annual economic burden is $6.7 billion in the United States [5] and £1.06 billion in the United Kingdom [6].

In view of the foregoing, the first aim of this article is to estimate the risk factors for NI in a hospital's general surgery and digestive department ([7–10], among others). One of the statistical techniques that has traditionally been used to predict NI is the logistic regression, which not only allows the effect of each risk factor to be evaluated, but also makes it possible to quantify the NI probability of a given patient. We carried out the Bayesian estimation of these regression models. Recently, there has been great interest in Bayesian regression techniques for dichotomous response variables in many fields of application [11–16]. Chen et al. [17] also apply a Bayesian approach in their proposal to use an asymmetric link for analyzing binary response data when one response is much more frequent than the other. We compare the results of applying a Bayesian estimation with those obtained by the frequentist estimation for logistic regression models.

Patients with hospital-acquired infections suffer a prolonged stay, during which time they occupy scarce bed-days and require additional diagnostic and therapeutic interventions [18]. As a second objective of this study, we set out to determine the factors that influence hospital stay, using a Poisson-Gamma regression model. A particular aim is to quantify, in relative terms, the increased duration of the hospital stay when a patient has NI. Frequentist and Bayesian estimations for this model are compared.

The article is organized as follows: section 2 describes the data, introducing the covariates used in the study and section 3 addresses the analysis of the methodology to be considered. The results of the article are shown in section 4, and section 5 is devoted to a discussion of the results and to summarizing the conclusions reached.

2. Data

Data were collected in a prospective cohort study of 1039 patients operated on between January 1, 1998 and December 31, 1998 at the General and Digestive Surgery Department of the North Area Hospital in the province of Jaén (Spain). Only patients of first admission and with at least 1 day of hospitalization were considered.

NI was defined as any infection that was active or under antibiotic treatment and that occurred 48 hours after the hospitalization [19]. Patients were followed up for 1 month after hospital discharge.

We consider both intrinsic and extrinsic risk factors for NI. The intrinsic factors are patient related and the extrinsic factors are related to medical intervention. The intrinsic factors considered were age, sex (male = 1 and female = 0), the presence or absence in each patient of coma, kidney failure, diabetes, neoplasy, chronic obstructive pulmonary disease, chronic hepatopathy, immunodeficiency, hypoproteinemia, obesity, and infection at admission, which includes NI due to a previous admission in the same hospital.

During the patients' hospital stay, the type of admission (scheduled = 1 and urgent = 0) and the presence or absence of the following extrinsic factors was recorded: peripheral tract, central tract, vesical probe, nasogastric probe, open drainage, closed drainage, artificial respiration, and immunosuppressive therapy. With regard to diagnosis-related data, the total number of diagnoses, based on important diagnosis, no symptoms, or isolated signs, was considered. Finally, the following factors related to surgery were taken into account: surgery type (scheduled = 1 or urgent = 0), length of surgery (in minutes), existence of antibiotic prophylaxis, preoperative stay, and degree of contamination, with four categories (always related to the main surgery method if several were applied): clean, clean-contaminated, contaminated, and dirty surgery. The total hospital stay (in days) was also recorded. A descriptive study of all these variables in the sample is shown in Tables 1–3.

Table 1.  Descriptive summary of quantitative variables
Length of stay | 1 | 73 | 5.27 | 1 | 2 | 5
Length of surgery | 3 | 460 | 65.76 | 30 | 50 | 80
Preoperative stay | 0 | 34 | 0.99 | 0 | 0 | 0
Number of diagnoses | 1 | 6 | 1.68 | 1 | 1 | 2
Table 2.  Descriptive summary of categorical variables (absence or presence)
Variable | Yes (1) | No (0)
NI | 64 (6.27%) | 957 (93.73%)
Prophylaxis | 803 (78.65%) | 218 (21.35%)
Peripheral tract | 1019 (99.80%) | 2 (0.20%)
Central tract | 82 (8.03%) | 939 (91.97%)
Vesical probe | 191 (18.71%) | 830 (81.29%)
Nasogastric probe | 185 (18.12%) | 836 (81.88%)
Open drainage | 397 (38.88%) | 624 (61.12%)
Closed drainage | 118 (11.56%) | 903 (88.44%)
Artificial respiration | 19 (1.86%) | 1002 (98.14%)
Immunosuppressive therapy | 19 (1.86%) | 1002 (98.14%)
Coma | 18 (1.76%) | 1003 (98.24%)
Kidney failure | 10 (0.98%) | 1011 (99.02%)
Diabetes | 104 (10.19%) | 917 (89.81%)
Neoplasy | 91 (8.91%) | 930 (91.09%)
COPD | 111 (10.87%) | 910 (89.13%)
Chronic hepatopathy | 39 (3.82%) | 982 (96.18%)
Immunodeficiency | 7 (0.69%) | 1014 (99.31%)
Hypoproteinemia | 29 (2.84%) | 992 (97.16%)
Infection in admission | 177 (17.34%) | 844 (82.66%)
Obesity | 148 (14.50%) | 873 (85.50%)
COPD, chronic obstructive pulmonary disease; NI, nosocomial infection.
Table 3.  Descriptive summary of categorical variables
607 (59.45%) | 414 (40.55%)
631 (61.80%) | 390 (38.20%)
Surgery type | Scheduled | Urgent*
 | 671 (65.72%) | 350 (34.28%)
Degree of contamination | Clean* | Clean-contaminated | Contaminated | Dirty
 | 437 (42.80%) | 164 (16.06%) | 111 (10.87%) | 309 (30.26%)
* Indicates the reference category.

3. Methodology

Firstly, we propose two alternative discrete choice models to predict the probability of NI. Symmetric and asymmetric links are considered, together with frequentist and Bayesian approaches. Secondly, Poisson-Gamma regression models are proposed to estimate the extension of the hospital stay caused by NI.

NI Predictive Models

Let y = (y1, y2, . . . , yn)′ denote an n × 1 vector of a dichotomous dependent variable and xi = (xi1, . . . , xik)′ denote the k × 1 vector of covariates for patient i. A predictive regression model deals with the problem of estimating the binary variable yi, which indicates whether or not the individual belongs to a study group. In this case, yi = 1 if the ith individual suffers an NI, and yi = 0 otherwise. Assume that yi = 1 with probability pi and yi = 0 with probability 1 − pi. In this dichotomous model, xi includes the risk factors for the ith individual. The regression model is given by

$$p_i = F(x_i'\beta),$$

where β = (β1, . . . , βk)′ is a k × 1 vector of regression coefficients, which represents the effect of each factor in the model, and F(·) is the link function. The likelihood function is given by

$$L(\beta \mid y, x) = \prod_{i=1}^{n} \left[F(x_i'\beta)\right]^{y_i} \left[1 - F(x_i'\beta)\right]^{1 - y_i}, \qquad (1)$$

where x = (x1, x2, . . . , xn)′.

Frequentist estimation of conventional logit models.  For conventional logistic regression, the link function is F(z) = exp(z)/(1 + exp(z)). Observe that this function is symmetric about zero, so that F(−z) = 1 − F(z) for all z.

The regression coefficients, β, are usually estimated by numerical maximization of the likelihood function. The fitted model then provides the probability of infection for any individual, and the usual procedure is to apply a cutoff to this probability in order to classify individuals as infected.
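The symmetric link and the cutoff rule described above can be sketched in a few lines of Python. This is only an illustration: the function names, the example coefficient values, and the 0.5 cutoff used as a default are assumptions made here, not estimates or code from the study.

```python
import math


def logistic_cdf(z):
    """Standard logistic CDF F(z) = exp(z) / (1 + exp(z)), evaluated stably."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def infection_probability(x, beta):
    """p_i = F(x_i' beta) for one patient with covariate vector x."""
    eta = sum(xj * bj for xj, bj in zip(x, beta))
    return logistic_cdf(eta)


def classify(p, cutoff=0.5):
    """Flag a patient as infected (1) when the probability reaches the cutoff."""
    return 1 if p >= cutoff else 0


# Hypothetical coefficients (intercept, one covariate); not taken from the article.
beta_example = (-1.0, 1.0)
p_example = infection_probability((1.0, 2.0), beta_example)
label_example = classify(p_example)
```

The symmetry F(−z) = 1 − F(z) can be checked numerically with `logistic_cdf(-1.3)` and `1 - logistic_cdf(1.3)`, which agree up to floating-point error.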

Bayesian estimation of symmetric and asymmetric logit models.  A Bayesian estimation of the logistic regression model is obtained by assuming that the β coefficients are random nodes of the model. To facilitate the comparison with frequentist methods of estimation, we assume centered and noninformative normal densities as prior distributions for the coefficients.

We also propose the use of an asymmetric link function, fitting the resulting model from a Bayesian point of view. The model has been used in other contexts ([16,17,20,21], among others), but has had little application in the health field. The asymmetric model is adequate for binary response data when one response is much more frequent than the other, as occurs in the case we examine in this study.

Following Albert and Chib [11] and Chen et al. [17], we assume that the model uses a vector of latent variables w = (w1, w2, . . . , wn)′ in this form:

$$y_i = \begin{cases} 1 & \text{if } w_i \geq 0, \\ 0 & \text{if } w_i < 0, \end{cases}$$

$$w_i = x_i'\beta + \delta z_i + \varepsilon_i, \qquad z_i \sim G, \quad \varepsilon_i \sim F.$$

In this model, G is the cumulative distribution function of the half-standard normal distribution given by

$$G(z) = \int_0^{z} \frac{2}{\sqrt{2\pi}} \exp\left(-\frac{t^2}{2}\right) dt, \qquad z > 0,$$

F is the standard logistic cumulative distribution function, and zi and εi are assumed to be independent. The skewness in this regression model is given by δzi, where δ ∈ (−∞, ∞) is the skewness parameter. If δ < 0, the probability of yi = 0 increases, whereas if δ > 0, the probability of yi = 1, i.e., the infection probability of the ith individual, increases. If δ = 0, the regression model reduces to a standard logit.
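To make the role of δ concrete, the infection probability implied by the latent-variable model, P(yi = 1) = E[F(xi′β + δzi)] with zi half-standard normal, can be approximated numerically. The sketch below uses composite Simpson integration on a truncated range; the function names, the truncation point, and the grid size are choices made for this illustration, and WinBUGS obtains the same quantity by simulation rather than quadrature.

```python
import math


def logistic_cdf(z):
    """Standard logistic CDF, evaluated stably."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def half_normal_pdf(z):
    """Density of the half-standard normal distribution on z > 0."""
    return 2.0 / math.sqrt(2.0 * math.pi) * math.exp(-0.5 * z * z)


def skewed_logit_probability(eta, delta, upper=10.0, n=2000):
    """Approximate P(y = 1) = integral over z > 0 of F(eta + delta*z) g(z) dz.

    Composite Simpson's rule on [0, upper]; n must be even. The mass of the
    half-normal beyond upper = 10 is negligible.
    """
    h = upper / n
    total = 0.0
    for k in range(n + 1):
        z = k * h
        if k == 0 or k == n:
            weight = 1.0
        elif k % 2 == 1:
            weight = 4.0
        else:
            weight = 2.0
        total += weight * logistic_cdf(eta + delta * z) * half_normal_pdf(z)
    return total * h / 3.0


# Illustrative linear predictor eta = x_i' beta; the values are hypothetical.
p_symmetric = skewed_logit_probability(-1.0, 0.0)
p_negative_skew = skewed_logit_probability(-1.0, -2.0)
p_positive_skew = skewed_logit_probability(-1.0, 2.0)
```

With δ = 0 the result coincides with the standard logit probability, while a negative δ lowers the infection probability and a positive δ raises it, in line with the interpretation given in the text.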

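The likelihood function in Eq. 1 can be rewritten as

$$L(\beta, \delta \mid y, x) = \prod_{i=1}^{n} \int_0^{\infty} \left[F(x_i'\beta + \delta z_i)\right]^{y_i} \left[1 - F(x_i'\beta + \delta z_i)\right]^{1 - y_i} g(z_i)\, dz_i, \qquad (2)$$

where g(z) = dG(z)/dz is the density of the half-standard normal distribution.

We assume that the prior distribution of the coefficients is normal, i.e., βj ∼ N(0, 10^10), ∀j = 1, . . . , k, and δ ∼ N(0, 10^10). These noninformative prior distributions with a very large variance reflect the absence of prior knowledge about the parameters of interest, and they facilitate comparison with classical models.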

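Combining this prior structure and the likelihood in Eq. 2, we obtain the posterior distribution of the parameters (β, δ):

$$\pi(\beta, \delta \mid y, x) \propto L(\beta, \delta \mid y, x)\, \pi(\beta, \delta), \qquad (3)$$

where π(β, δ) is the prior distribution of (β, δ).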

We can sample (β, δ) from this posterior distribution by using the WinBUGS package (Windows Bayesian inference Using Gibbs Sampling, developed jointly by the MRC Biostatistics Unit [University of Cambridge, Cambridge, UK] and the Imperial College School of Medicine at St. Mary's, London) [22], based on the Gibbs sampling applying Markov Chain Monte Carlo (MCMC) methods (see Carlin and Polson [23] and Gilks et al. [24] for further details).

One aim of our study is to use logistic regressions in order to make predictions. In Bayesian theory, predictions of future observables are based on the predictive distribution. The predictive distribution of unobserved data yp, given a new set of covariates xp = (xp1, . . . , xpk), is defined as

$$p(y_p \mid x_p, y, x) = \int\!\!\int p(y_p \mid x_p, \beta, \delta)\, \pi(\beta, \delta \mid y, x)\, d\beta\, d\delta. \qquad (4)$$

The predictive distribution can also be simulated using MCMC techniques with WinBUGS [22]. The WinBUGS code is included in the Supporting Information Appendix for this article.
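As a rough illustration of how a predictive probability is obtained from simulation output, the integral over the posterior can be replaced by an average over MCMC draws. The sketch below does this for the symmetric logit only; the draws, the covariate values, and the function names are hypothetical, and the same averaging would apply to draws of (β, δ) in the asymmetric model.

```python
import math


def logistic_cdf(z):
    """Standard logistic CDF, evaluated stably."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predictive_probability(x_new, beta_draws):
    """Monte Carlo estimate of P(y_p = 1 | x_p, data).

    Averages F(x_p' beta) over posterior draws of beta, which approximates
    the integral of the sampling probability against the posterior.
    """
    probs = []
    for beta in beta_draws:
        eta = sum(xj * bj for xj, bj in zip(x_new, beta))
        probs.append(logistic_cdf(eta))
    return sum(probs) / len(probs)


# Hypothetical posterior draws (intercept, one covariate); not from the article.
beta_draws = [(-3.0, 0.5), (-2.5, 0.6), (-3.5, 0.4)]
p_new = predictive_probability((1.0, 4.0), beta_draws)
```

In practice the list of draws would contain the retained MCMC iterations for every coefficient rather than three hand-written values.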

Regression Model for Determining the Extension of Hospital Stay due to NI

Frequentist estimation of Poisson-Gamma model.  We denote by losi (length of stay) the number of days that the ith individual remains hospitalized. We then denote by xi = (xi1, . . . , xik)′ the vector of factors for the ith individual. Finally, we denote by NIi a variable indicating the presence of infection in the ith individual; this variable takes the value one if NI is present, and zero otherwise.

We consider a Poisson-Gamma model in which losi ∼ Poisson(viµi), so

$$P(\mathrm{los}_i = l \mid v_i, \mu_i) = \frac{e^{-v_i \mu_i} (v_i \mu_i)^{l}}{l!}, \qquad l = 0, 1, 2, \ldots,$$

$$\mu_i = \exp\left(x_i'\beta + \beta_{NI}\, NI_i\right), \qquad (5)$$

and vi is a parameter of the model that represents a factor of individual heterogeneity, with an individual value for each patient. Values of vi far from 1 indicate that the ith patient presents individual characteristics that explain the length of hospital stay and that are not included in the model. The vector β and the parameter βNI are the coefficients of the covariates xi and the indicator variable of infection NIi, respectively.

NI is included among the factors because, if the only difference between two individuals i and i′ is the presence of infection in the first of these, the ratio between the average hospital stay of the two after admission is given by exp(βNI). Therefore, when the parameter βNI is known, it is possible to estimate the ratio between the average hospital stay of two individuals who are identical except that one of them has NI. Furthermore, under this expression, given the covariates, the differences between the values of losi for individuals with the same covariate values are purely random.
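The ratio can be written out explicitly. The short derivation below assumes the log-linear mean µi = exp(xi′β + βNI NIi), with NIi denoting the infection indicator, and two individuals i and i′ with identical covariates, the first infected and the second not.

```latex
\[
\frac{\mu_i}{\mu_{i'}}
  = \frac{\exp\left(x_i'\beta + \beta_{NI} \cdot 1\right)}
         {\exp\left(x_i'\beta + \beta_{NI} \cdot 0\right)}
  = \exp\left(\beta_{NI}\right).
\]
```

The covariate terms cancel, so the multiplicative effect of infection on the expected stay does not depend on the patient's other characteristics.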

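To introduce the possibility of heterogeneity not explained by factors in the model, v is considered to be a random variable with distribution Gamma(α, α), with density

$$f(v) = \frac{\alpha^{\alpha}}{\Gamma(\alpha)}\, v^{\alpha - 1} e^{-\alpha v}, \qquad v > 0.$$

By specifying a gamma distribution for v with equal shape and rate (inverse scale) parameters, the Negative Binomial (NB) model is derived [25]. As in the classical NB model, v follows a gamma distribution with E[v] = 1 and Var[v] = 1/α.

Thus, losi ∼ NB(α, µi), i.e.,

$$P(\mathrm{los}_i = l \mid \alpha, \mu_i) = \frac{\Gamma(l + \alpha)}{\Gamma(\alpha)\, l!} \left(\frac{\alpha}{\alpha + \mu_i}\right)^{\alpha} \left(\frac{\mu_i}{\alpha + \mu_i}\right)^{l}, \qquad l = 0, 1, 2, \ldots \qquad (6)$$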

The model estimation is performed by optimizing the likelihood function using numerical methods, and specifically, R software (written by Robert Gentleman and Ross Ihaka, Statistics Department, University of Auckland, New Zealand) and the MASS package (Modern Applied Statistics in S package developed by W. N. Venables [CMIS Environmetrics Project, Australia] and B. D. Ripley [Department of Statistics, University of Oxford, Oxford, UK]) [26].
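The Negative Binomial distribution obtained from the Poisson-Gamma mixture has mean µ and variance µ + µ²/α, which is how α captures overdispersion relative to the Poisson model. The following sketch checks these moments numerically from the NB probability function with size α and mean µ; the parameter values and function names are illustrative assumptions, not estimates or code from the study.

```python
import math


def nb_pmf(l, alpha, mu):
    """NB probability of l days under a Gamma(alpha, alpha) mixed Poisson with mean mu."""
    log_p = (
        math.lgamma(l + alpha) - math.lgamma(alpha) - math.lgamma(l + 1)
        + alpha * math.log(alpha / (alpha + mu))
        + l * math.log(mu / (alpha + mu))
    )
    return math.exp(log_p)


def nb_moments(alpha, mu, max_l=2000):
    """Total probability, mean, and variance, summing the pmf up to max_l."""
    probs = [nb_pmf(l, alpha, mu) for l in range(max_l + 1)]
    total = sum(probs)
    mean = sum(l * p for l, p in enumerate(probs))
    var = sum((l - mean) ** 2 * p for l, p in enumerate(probs))
    return total, mean, var


# Hypothetical values chosen for illustration only.
total, mean, var = nb_moments(alpha=2.5, mu=5.0)
theoretical_var = 5.0 + 5.0 ** 2 / 2.5  # mu + mu^2 / alpha
```

As α grows, the extra term µ²/α vanishes and the model approaches a Poisson regression.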

Bayesian estimation of Poisson-Gamma model.  As an alternative to the frequentist point of view of the Poisson-Gamma model, we also propose the Bayesian version [27]. Accordingly, we consider losi ∼ Poisson(viµi), with vi ∼ Gamma(α, α) and µi expressed as in Eq. 5.

Note that in this model vi again represents individual heterogeneity (not included in the covariates) of each patient, but now we can estimate an individual average value of the heterogeneity for each patient, while in the frequentist estimation only a general average value of the heterogeneity can be estimated.
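The reason an individual heterogeneity value can be estimated follows from the conjugacy of the Poisson and gamma distributions. As a sketch, conditional on µi and α (in the full Bayesian analysis these are themselves averaged over their posterior draws), the distribution of vi given the observed stay is again a gamma distribution:

```latex
\[
f(v_i \mid \mathrm{los}_i, \mu_i, \alpha)
  \propto e^{-v_i \mu_i} (v_i \mu_i)^{\mathrm{los}_i} \, v_i^{\alpha - 1} e^{-\alpha v_i}
  \propto v_i^{\alpha + \mathrm{los}_i - 1} e^{-(\alpha + \mu_i) v_i},
\]
\[
v_i \mid \mathrm{los}_i, \mu_i, \alpha \sim \mathrm{Gamma}(\alpha + \mathrm{los}_i,\ \alpha + \mu_i),
\qquad
E[v_i \mid \mathrm{los}_i, \mu_i, \alpha] = \frac{\alpha + \mathrm{los}_i}{\alpha + \mu_i}.
\]
```

A patient whose observed stay is much longer than the model mean µi therefore has a conditional mean of vi above 1, and one with a much shorter stay has a value below 1.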

The coefficients βNI and βj, ∀j = 1, . . . , k, follow noninformative normal prior distributions N(0, 10^10). We propose a flexible hierarchical prior structure for α, with α ∼ exp(b) and b ∼ exp(0.005), where the hyperparameter b follows an exponential distribution with a large variance (Var(b) = 40000).

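The posterior distribution is obtained by combining this prior structure and the likelihood defined in Eq. 6:

$$\pi(\beta, \beta_{NI}, \alpha, b \mid \mathrm{los}, x, NI) \propto L(\beta, \beta_{NI}, \alpha \mid \mathrm{los}, x, NI)\, \pi(\beta, \beta_{NI}, \alpha, b),$$

where los = (los1, . . . , losn)′, x = (x1, . . . , xn)′, NI = (NI1, . . . , NIn)′, and π(β, βNI, α, b) is the joint prior distribution, which can be decomposed as

$$\pi(\beta, \beta_{NI}, \alpha, b) = \pi(\beta_{NI})\, \pi(\alpha \mid b)\, \pi(b) \prod_{j=1}^{k} \pi(\beta_j).$$

Again, posterior distributions are obtained by applying MCMC methods implemented with R software, the BRugs package, and WinBUGS software [22]. Source codes are provided in the Supporting Information Appendix for this article.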

4. Results

Predictive Models for NI

The statistical methods consisted of two steps: 1) estimation of logit models and analysis of goodness of fit using information criteria (the Akaike information criterion [AIC] for frequentist estimation and the deviance information criterion [DIC] for Bayesian estimation) and 2) assessment of their predictive accuracy in a split-sample validation cohort [28]. The final sample size was 1021 patients after the elimination of 18 patients with missing values. The entire cohort (1021 patients) was randomly divided into two subcohorts of 766 (75%) and 255 patients (25%). The subcohort of 766 patients was used to develop the logit models. Subsequently, the logit models were externally validated using the remaining 255 patients, who represented the split-sample cohort. The percentage of correct classification, the c statistic, and the receiver operating characteristic (ROC) curves are used to quantify the predictive accuracy.
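The two accuracy measures used in the validation step can be computed directly from the predicted probabilities of the validation cohort. The sketch below is a minimal illustration with made-up outcomes and probabilities; the function names are assumptions, and the c statistic is computed as the proportion of infected/noninfected pairs in which the infected patient receives the higher predicted probability (ties counted as one half), which equals the area under the ROC curve.

```python
def c_statistic(y_true, scores):
    """Area under the ROC curve via pairwise comparison of cases and noncases."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes are required")
    concordant = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                concordant += 1.0
            elif sp == sn:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))


def percent_correct(y_true, scores, cutoff=0.5):
    """Percentage of patients whose thresholded prediction matches the outcome."""
    hits = sum(
        1 for y, s in zip(y_true, scores) if (1 if s >= cutoff else 0) == y
    )
    return 100.0 * hits / len(y_true)


# Made-up validation outcomes and predicted probabilities, for illustration only.
y_valid = [0, 0, 0, 1, 1]
p_valid = [0.1, 0.4, 0.6, 0.35, 0.9]
auc = c_statistic(y_valid, p_valid)
accuracy = percent_correct(y_valid, p_valid)
```

Note that the percentage of correct classification depends on the chosen cutoff, whereas the c statistic summarizes discrimination over all possible cutoffs.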

Frequentist estimation of logistic regression models.  Three alternative models with different numbers of covariates were fitted (Table 4). This table includes the parameter estimates, the standard errors and P-values. A model summary with the sample size, AIC, percentage of correct classifications, and c statistics is also provided. It should be noted that 5 of the 30 covariates considered in this study (peripheral tract, closed drainage, kidney failure, diabetes, and immunodeficiency) could not be included in the models due to the limited number of infected cases and/or problems of collinearity.

Table 4.  Frequentist estimation of logistic models
Variable | Complete: Estimate | SE | P | Stepwise: Estimate | SE | P | Reduced: Estimate | SE | P
Length of stay | 0.68 | 0.10 | 0.00 | 0.63 | 0.09 | 0.00 | 0.48 | 0.06 | 0.00
Surgery type | −0.36 | 1.19 | 0.76 | | | | | |
Length of surgery−−
Preoperative stay | −0.89 | 0.18 | 0.00 | −0.82 | 0.14 | 0.00 | −0.69 | 0.11 | 0.00
Central tract | −1.29 | 0.95 | 0.18 | −1.35 | 0.80 | 0.09 | | |
Vesical probe | 0.39 | 0.82 | 0.63 | | | | | |
Nasogastric probe | 1.29 | 0.81 | 0.11 | 1.12 | 0.66 | 0.09 | | |
Open drainage | −0.58 | 0.85 | 0.49 | | | | | |
Artificial respiration | −0.47 | 1.51 | 0.76 | | | | | |
Immunosuppressive therapy | 2.45 | 1.12 | 0.03 | 1.98 | 0.96 | 0.04 | 2.28 | 0.79 | 0.00
Chronic hepatopathy | −0.93 | 1.11 | 0.40 | | | | | |
Infection at admission | 2.61 | 0.99 | 0.01 | 2.77 | 0.90 | 0.00 | 0.89 | 0.48 | 0.06
Number of diagnoses | −0.58 | 0.35 | 0.10 | −0.70 | 0.32 | 0.03 | | |
Model summary | Complete | Stepwise | Reduced
% correct predictions | 96.08 | 96.08 | 97.25
c statistic | 0.987 | 0.986 | 0.987
Parameter estimates, SE, and P-values (P).
AIC, Akaike information criterion; COPD, chronic obstructive pulmonary disease; SE, standard errors.

The first model is the full model, including all the covariates. The AIC is 168.12. This model provides a rate of correct classification of 96.08% (10 errors, 7 false positives, and 3 false negatives). The factors length of stay, preoperative stay, immunosuppressive therapy, hypoproteinemia, and infection at admission are found to be relevant at the 5% significance level.

The second model, which we term the stepwise model, was obtained by the application of a backward variable selection method, in order to reduce the number of covariates. The resultant model includes 12 of the 25 variables contained in the full model, namely: the length of hospital stay, the duration of surgery, the three covariates corresponding to the degree of contamination, the preoperative stay, central tract, nasogastric probe, immunosuppressive therapy, hypoproteinemia, infection at admission, and the number of diagnoses. The AIC for this model is 147.48. This model also classifies 96.08% of the split-sample correctly (10 errors, 6 false positives, and 4 false negatives).

Finally, we considered a third model, including only those covariates that turned out to be significant (P-value smaller than 5%) in the full model. The AIC, in this case, is 155.79. As we can see, the stepwise model has the lowest AIC, indicating the best fit. Of the three models considered, the reduced model, with only five covariates, provides the best correct prediction rate (97.25%, seven errors, four false positives, and three false negatives).

Prediction accuracy is also measured by the area under the ROC curve, also known as c statistic. The three models estimated show a very similar value for this statistic (0.987 for the complete model, 0.986 for the stepwise model, and 0.987 for the reduced model).

Bayesian estimation of logistic regression models.  Bayesian estimation of the logit models involved two steps: firstly, we estimated the full model, for both the symmetric and the asymmetric links. Table 5 shows the results. Posterior mean, standard deviation, and 95% credibility intervals are provided. Then, after having verified the advantages of the asymmetric model, we fitted two reduced asymmetric models (Table 6): the first one included the covariates that were found to be relevant predictors of NI in the symmetric Bayesian logit (seven covariates); the second one included the relevant covariates for the asymmetric Bayesian logit (four covariates). We considered a variable to be relevant as a predictor of NI when the zero value is not included in the 95% credibility interval. The posterior distribution was simulated using WinBUGS [22]. A total of 100,000 iterations were carried out (after a burn-in period of 100,000 simulations). Three different chains were carried out and the convergence was evaluated for all parameters using several tests provided within the WinBUGS Convergence Diagnostics and Output Analysis software.
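The text does not specify which convergence tests were applied. One diagnostic commonly used when several chains are run is the Gelman-Rubin potential scale reduction factor, which compares between-chain and within-chain variability; it is shown here only as an example of such a check, under the assumption that it is representative of the kind of test used, and the function name and simulated chains are hypothetical.

```python
import math
import random


def gelman_rubin(chains):
    """Potential scale reduction factor for m chains of equal length n.

    Values close to 1 suggest that the chains are sampling from the same
    distribution; larger values indicate a lack of convergence.
    """
    m = len(chains)
    n = len(chains[0])
    means = [sum(c) / n for c in chains]
    grand_mean = sum(means) / m
    # Between-chain variance B and mean within-chain variance W.
    between = n / (m - 1) * sum((mu - grand_mean) ** 2 for mu in means)
    within = sum(
        sum((x - mu) ** 2 for x in c) / (n - 1) for c, mu in zip(chains, means)
    ) / m
    pooled = (n - 1) / n * within + between / n
    return math.sqrt(pooled / within)


# Three simulated chains drawn from the same distribution (illustration only).
rng = random.Random(0)
simulated = [[rng.gauss(0.0, 1.0) for _ in range(2000)] for _ in range(3)]
r_hat = gelman_rubin(simulated)
```

For chains that have converged to a common distribution, as in this simulated example, the factor is close to 1.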

Table 5.  Bayesian estimation of symmetric and asymmetric full logistic models
Variable | Symmetric: Mean | SD | CI (95%) | Asymmetric: Mean | SD | CI (95%)
Intercept | −7.01 | 1.41 | (−9.94, −4.41) | −27.52 | 9.98 | (−46.50, −7.86)
Delta | | | | −64.04 | 5.20 | (−69.82, −50.63)
Age | −0.00 | 0.02 | (−0.04, 0.03) | −0.27 | 0.17 | (−0.63, 0.05)
Sex | 1.15 | 0.60 | (−0.00, 2.37) | 4.10 | 5.08 | (−6.26, 13.32)
Length of stay | 0.69 | 0.09 | (0.53, 0.88) | 7.57 | 1.03 | (5.58, 9.68)
Admission | −0.76 | 1.12 | (−2.98, 1.42) | 1.02 | 7.10 | (−12.49, 13.75)
Surgery type | 0.62 | 1.17 | (−1.64, 2.92) | 3.83 | 6.99 | (−11.04, 14.38)
Length of surgery | −0.00 | 0.01 | (−0.01, 0.01) | −0.02 | 0.06 | (−0.14, 0.10)
Clean-contaminated | 0.91 | 1.06 | (−1.12, 3.05) | 0.47 | 6.49 | (−12.16, 12.65)
Contaminated | −1.51 | 1.26 | (−4.00, 0.94) | −8.06 | 5.41 | (−14.74, 5.03)
Dirty-contaminated | −0.92 | 1.26 | (−3.43, 1.55) | −4.35 | 6.39 | (−14.31, 9.44)
Prophylaxis | −1.46 | 0.99 | (−3.37, 0.52) | −6.60 | 5.67 | (−14.56, 6.42)
Preoperative stay | −0.93 | 0.16 | (−1.25, −0.63) | −9.18 | 1.50 | (−12.18, −6.33)
Central tract | −1.01 | 0.92 | (−2.83, 0.77) | −6.48 | 6.17 | (−14.66, 8.02)
Vesical probe | −0.61 | 0.81 | (−2.22, 0.95) | −4.23 | 6.28 | (−14.17, 9.24)
Nasogastric probe | 1.30 | 0.78 | (−0.22, 2.84) | 3.81 | 6.24 | (−9.50, 13.98)
Open drainage | −0.59 | 0.80 | (−2.18, 0.96) | −6.42 | 5.34 | (−14.47, 5.23)
Artificial respiration | −0.75 | 1.56 | (−3.80, 2.37) | −1.50 | 8.07 | (−14.32, 13.63)
Immunosuppressive therapy | 2.95 | 1.14 | (0.75, 5.20) | 8.70 | 5.15 | (−4.12, 14.79)
Coma | 0.65 | 1.71 | (−2.81, 3.90) | −2.18 | 8.02 | (−14.34, 13.38)
Neoplasy | −0.09 | 0.80 | (−1.68, 1.45) | −5.04 | 6.13 | (−14.39, 8.30)
COPD | −0.21 | 0.84 | (−1.91, 1.39) | 3.09 | 6.27 | (−9.94, 13.90)
Chronic hepatopathy | −0.49 | 1.13 | (−2.78, 1.64) | 0.76 | 7.55 | (−13.34, 13.86)
Immunodeficiency | −2.93 | 2.09 | (−7.27, 0.96) | −2.06 | 8.32 | (−14.42, 13.75)
Hypoproteinemia | 3.91 | 1.18 | (1.58, 6.24) | 22.01 | 6.31 | (6.42, 29.72)
Obesity | 1.23 | 0.63 | (0.01, 2.48) | 11.40 | 5.64 | (0.70, 22.84)
Infection at admission | 2.67 | 0.95 | (0.88, 4.60) | 8.67 | 5.01 | (−3.65, 14.78)
Number of diagnoses | −0.84 | 0.35 | (−1.56, −0.18) | −2.45 | 2.85 | (−7.98, 3.27)
Model summary | Symmetric | Asymmetric
% correct predictions | 96.47 | 97.25
c statistic | 0.987 | 0.990
Posterior means, SD, and 95% CI.
CI, credibility intervals; COPD, chronic obstructive pulmonary disease; DIC, deviance information criterion; SD, standard deviations.

Table 6.  Bayesian estimation of reduced asymmetric logistic models
Variable | Reduced 1: Mean | SD | CI (95%) | Reduced 2: Mean | SD | CI (95%)
Intercept | −31.60 | 8.20 | (−47.10, −15.94) | −34.77 | 7.55 | (−48.17, −19.54)
Delta | −69.70 | 8.57 | (−79.68, −48.43) | −66.46 | 10.61 | (−79.49, −40.47)
Length of stay | 6.69 | 1.19 | (4.27, 8.99) | 5.67 | 1.07 | (3.39, 7.52)
Preoperative stay | −9.48 | 1.90 | (−13.23, −5.79) | −7.78 | 1.62 | (−10.82, −4.48)
Immunosuppressive therapy | 19.00 | 10.09 | (−2.96, 34.13) | | |
Hypoproteinemia | 18.35 | 9.88 | (−2.71, 33.87) | 18.39 | 9.25 | (−0.19, 33.74)
Obesity | 9.26 | 4.77 | (−0.06, 18.29) | 9.87 | 4.41 | (1.34, 18.31)
Infection at admission | 8.29 | 5.30 | (−2.11, 18.27) | | |
Number of diagnoses | −5.94 | 2.70 | (−11.39, −0.080) | | |
Model summary | Reduced 1 | Reduced 2
% correct predictions | 100 | 96.47
c statistic | 1 | 0.990
Posterior means, SD, and 95% CI.
CI, credibility intervals; DIC, deviance information criterion; SD, standard deviations.

The frequentist and Bayesian estimations of the complete symmetric model coincide in defining some covariates as relevant predictors of NI (length of hospital stay, preoperative stay, immunosuppressive therapy, hypoproteinemia, and infection at admission), although some differences can be found in the estimates of the parameters. Furthermore, under the Bayesian approach, obesity and the number of diagnoses are also considered relevant covariates. In general, the standard errors in the Bayesian models are slightly smaller. As a goodness-of-fit measure, we make use of DIC [29]. The DIC for the complete symmetric model is 205.00, although this criterion is not comparable to AIC. The percentage of correct predictions is slightly larger for the Bayesian approach (96.47%, nine errors, six false positives, and three false negatives).

With the Bayesian estimation of the full asymmetric model, the coefficient of asymmetry δ is both relevant and negative. This coefficient increases the probability of the patient not suffering infection—the largest group. The statistical relevance of this coefficient highlights the importance of considering the asymmetry in the logit model. There are important differences in the estimates of the coefficients with respect to those obtained with the symmetric model. The asymmetric model reduces the number of relevant covariates, eliminating immunosuppressive therapy, infection at admission, and number of diagnoses. Using the DIC criterion, the asymmetric model is preferred to the symmetric one, with a DIC of 198.09 versus 205.00 for the symmetric model. The asymmetric Bayesian logit model also presents a higher percentage of correct classifications of infected patients, 97.25% (seven errors, four false positives, and three false negatives) and a higher c statistic (0.99 vs. 0.987).

In the second stage of this study, we estimated two abbreviated asymmetric models (Table 6). Both models present very important coefficients of asymmetry δ. The DIC values for both models are clearly smaller than for the full model: 69.07 for the first model with seven covariates (Reduced 1) and 53.54 for the second one with only four covariates (Reduced 2). It is important to emphasize that the model with the seven covariates found to be significant in the full symmetric model provides 100% correct classification. The second abbreviated model, with only four covariates (those significant in the full asymmetric model), still provides 96.47% correct classification (nine classification errors, five false positives, and four false negatives).

Figure 1 shows the ROC curves and the c statistics according to six models: three frequentist logistic regressions and three asymmetric Bayesian logistic regressions, the full one and the two abbreviated models. The cutoff point to predict a patient with NI is fixed at 0.5. It is important to point out that all the asymmetric models have a predictive capacity better than the best of the frequentist estimations.

Figure 1.  ROC curves and c statistics for proposed models.

Variations in Length of Hospital Stay due to NI

Frequentist estimation of Poisson-Gamma model.  As a first approach to the problem of the relationship between length of hospital stay and NI, we estimate a full Poisson-Gamma model for the los variable that includes the 30 variables described in section 2. The final sample was 1013 patients after the elimination of 26 cases with missing values. This model was fitted using the maximum likelihood method. Subsequently, an abbreviated model was examined, considering only the covariates that were found to be statistically significant at the 5% level. The results obtained for both models are shown in Table 7.

Table 7.  Frequentist estimation of full and reduced Poisson-Gamma regression models for length of stay data
Poisson model | Complete: Estimate | SE | P | Reduced: Estimate | SE | P
Surgery type | 0.55 | 0.14 | 0.00 | 0.61 | 0.13 | 0.00
Length of surgery*
Preoperative stay*
Peripheral tract | 0.09 | 0.53 | 0.86 | | |
Central tract | 0.08 | 0.13 | 0.55 | | |
Vesical probe | 0.14 | 0.10 | 0.17 | | |
Nasogastric probe | 0.40 | 0.10 | 0.00 | 0.48 | 0.09 | 0.00
Open drainage | 0.43 | 0.09 | 0.00 | 0.44 | 0.08 | 0.00
Closed drainage | 0.48 | 0.10 | 0.00 | 0.46 | 0.09 | 0.00
Artificial respiration−
Immunosuppressive therapy−
Kidney failure−
Chronic hepatopathy | 0.10 | 0.14 | 0.45 | | |
Infection at admission | 0.07 | 0.11 | 0.52 | | |
Number of diagnoses0.
Gamma model | Estimate | SE | P | Estimate | SE | P
Model summary
* These variables have been standardized.
Parameter estimates, SE, and P-values (P).
AIC, Akaike information criterion; COPD, chronic obstructive pulmonary disease; NI, nosocomial infection; SE, standard errors.

Based on the AIC criterion, the abbreviated model, with only 14 covariates, is preferred to the full model. The AIC for the full model is estimated to be 3893.02, whereas that for the abbreviated model is 3883.71.

In this section, we are interested in the effect of NI on the duration of hospital stay. For both models, the variable NI is statistically significant. In the abbreviated model, the coefficient of NI is estimated to be 1.03 units. This means that the average length of hospital stay for a patient with an infection will be multiplied by a factor of e^1.03 = 2.80 in comparison to a noninfected patient with the same characteristics.

Bayesian estimation of Poisson-Gamma model.  The analysis was complemented with the Bayesian estimation of the Poisson-Gamma models. As in the previous Bayesian estimation of logit models, MCMC techniques were used to estimate the posterior distributions of the parameters of interest. Three chains of 100,000 samples were recorded after a burn-in sample of 100,000. Several diagnostics were carried out to check the convergence of the simulations.

Table 8 shows the results of the Bayesian estimation of the full Poisson-Gamma model and that of the abbreviated model, which includes only the relevant covariates. The goodness of fit for both Bayesian models was analyzed using the DIC. The full model is preferred to the abbreviated one, with a DIC value of 3534.11 versus 3537.65 for the abbreviated model. The posterior means for the coefficients of the relevant covariates are similar in both models.

Table 8.  Bayesian estimation of full and reduced Poisson-Gamma regression models for length of stay data
Poisson model | Complete model: Mean | SD | CI (95%) | Reduced model: Mean | SD | CI (95%)
Intercept | −1.31 | 0.55 | (−2.43, −0.10) | −0.28 | 0.10 | (−0.47, −0.08)
NI | 1.05 | 0.11 | (0.84, 1.27) | 1.04 | 0.10 | (0.84, 1.25)
Age* | 0.17 | 0.04 | (0.10, 0.24) | 0.18 | 0.03 | (0.11, 0.25)
Sex | 0.12 | 0.07 | (−0.01, 0.25) | | |
Admission | 0.31 | 0.14 | (0.04, 0.59) | 0.27 | 0.13 | (0.02, 0.54)
Surgery type | 0.55 | 0.14 | (0.26, 0.83) | 0.61 | 0.14 | (0.34, 0.88)
Length of surgery* | 0.27 | 0.05 | (0.19, 0.36) | 0.28 | 0.04 | (0.20, 0.36)
Clean-contaminated | 0.15 | 0.11 | (−0.06, 0.36) | | |
Contaminated | −0.07 | 0.13 | (−0.32, 0.18) | | |
Dirty-contaminated | −0.34 | 0.13 | (−0.59, −0.09) | −0.36 | 0.08 | (−0.53, −0.20)
Prophylaxis | 0.23 | 0.10 | (0.05, 0.42) | 0.25 | 0.09 | (0.06, 0.43)
Preoperative stay* | 0.29 | 0.04 | (0.21, 0.37) | 0.29 | 0.04 | (0.22, 0.37)
Peripheral tract | 0.02 | 0.54 | (−1.12, 1.15) | | |
Central tract | 0.07 | 0.13 | (−0.18, 0.32) | | |
Vesical probe | 0.14 | 0.10 | (−0.05, 0.34) | | |
Nasogastric probe | 0.40 | 0.10 | (0.20, 0.60) | 0.47 | 0.09 | (0.29, 0.65)
Open drainage | 0.43 | 0.09 | (0.26, 0.60) | 0.44 | 0.08 | (0.28, 0.60)
Closed drainage | 0.48 | 0.10 | (0.29, 0.68) | 0.46 | 0.10 | (0.27, 0.65)
Artificial respiration | −0.15 | 0.26 | (−0.67, 0.37) | | |
Immunosuppressive therapy | −0.25 | 0.20 | (−0.65, 0.15) | | |
Coma | −0.14 | 0.27 | (−0.67, 0.38) | | |
Kidney failure | −0.04 | 0.27 | (−0.56, 0.51) | | |
Diabetes | 0.23 | 0.11 | (0.02, 0.44) | 0.27 | 0.10 | (0.07, 0.48)
Neoplasy | 0.31 | 0.11 | (0.08, 0.53) | 0.27 | 0.11 | (0.06, 0.48)
COPD | 0.03 | 0.10 | (−0.17, 0.22) | | |
Chronic hepatopathy | 0.11 | 0.15 | (−0.18, 0.40) | | |
Immunodeficiency | −0.03 | 0.35 | (−0.69, 0.67) | | |
Hypoproteinemia | −0.30 | 0.16 | (−0.62, 0.02) | | |
Obesity | 0.11 | 0.09 | (−0.05, 0.28) | | |
Infection at admission | 0.07 | 0.12 | (−0.16, 0.30) | | |
Number of diagnoses | 0.09 | 0.04 | (0.01, 0.18) | 0.09 | 0.04 | (0.01, 0.16)
Gamma model | Mean | SD | CI (95%) | Mean | SD | CI (95%)
α | 2.50 | 0.25 | (2.07, 3.03) | 2.47 | 0.24 | (2.04, 2.97)
Model summary
* These variables have been standardized.
Posterior means, SD, and 95% CI.
CI, credibility intervals; COPD, chronic obstructive pulmonary disease; DIC, deviance information criterion; NI, nosocomial infection; SD, standard deviations.

The results obtained by the Bayesian estimations in this section are similar to those obtained with the frequentist ones. In particular, the coefficients for the existence of NI are statistically significant in both the full and the abbreviated Bayesian models. The posterior mean of the β_NI coefficient is 1.05 in the full model, versus 1.04 in the abbreviated model. To interpret these coefficients, we need to calculate their exponential transformation, from which we conclude that the existence of NI would multiply the length of hospital stay by a factor of e^{1.05} = 2.87 and by e^{1.04} = 2.83 for the full and abbreviated models, respectively.
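The exponential transformation can also be applied to the limits of the credibility intervals, because the exponential function is increasing and so preserves quantiles. The sketch below does this for the NI entries of Table 8. It starts from the rounded table values, so the results may differ in the last digit from the factors quoted in the text. The exponential of a posterior mean is also not exactly the posterior mean of the factor itself.

```python
import math

# Posterior summaries of the NI coefficient, copied from Table 8
ni = {
    "full": {"mean": 1.05, "ci": (0.84, 1.27)},
    "abbreviated": {"mean": 1.04, "ci": (0.84, 1.25)},
}

def to_factor(summary):
    """Map a log-scale summary to the multiplicative scale."""
    lo, hi = summary["ci"]
    return {
        "factor": math.exp(summary["mean"]),
        "ci": (math.exp(lo), math.exp(hi)),
    }

factors = {name: to_factor(s) for name, s in ni.items()}
for name, f in factors.items():
    lo, hi = f["ci"]
    print(f"{name}: x{f['factor']:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

On this scale, both intervals lie well above 1, which is the value corresponding to no effect of NI on the length of stay.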

In addition, with the Bayesian approach it is possible to specify an individual distribution for the parameter v_i that refers to the heterogeneity of the sample. Using the results from the abbreviated model, we found that only 79 of the 1013 (7.8%) individuals of the sample showed a 95% Bayesian interval for v_i that excludes the value 1, indicating significant individual heterogeneity.
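The interval check for each individual can be sketched as follows. The sketch assumes a mean-one parametrization v_i ~ Gamma(α, α) with y_i ~ Poisson(μ_i v_i), which this section does not spell out. It also conditions on fixed values of μ_i and α instead of integrating over their posterior, and it uses synthetic fitted means. It illustrates the rule "interval excludes 1" and does not reproduce the reported count.

```python
import numpy as np

rng = np.random.default_rng(1)
n, alpha = 1013, 2.47           # sample size and alpha (abbreviated model) from the text
mu = rng.uniform(3.0, 15.0, n)  # hypothetical fitted mean stays, not the study's
v_true = rng.gamma(alpha, 1.0 / alpha, n)  # mean-one heterogeneity terms
y = rng.poisson(mu * v_true)               # synthetic lengths of stay

# Conjugacy: v_i | y_i, mu_i, alpha ~ Gamma(alpha + y_i, rate = alpha + mu_i)
draws = rng.gamma(alpha + y, 1.0 / (alpha + mu), size=(4000, n))
lower, upper = np.percentile(draws, [2.5, 97.5], axis=0)

# Flag individuals whose 95% interval for v_i does not contain 1
excludes_one = (lower > 1.0) | (upper < 1.0)
share = excludes_one.mean()
print(f"{excludes_one.sum()} of {n} intervals exclude 1 ({100 * share:.1f}%)")
```

Individuals flagged in this way are those whose observed stay is much longer or shorter than their covariates predict, which is the sense in which the text speaks of atypical cases.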

5. Discussion

We have proposed the use of the Bayesian approach of logit models with an asymmetric link to estimate the NI probability of a patient undergoing hospital surgery, comparing the reliability of these estimates with that provided by the frequentist version of logistic regression models.

It should be emphasized that the Bayesian methodology establishes clear differences, even between the symmetric logit model and its analog in the classical methodology, fitted by the maximum likelihood method. These differences are observed not only in the estimates obtained, but also in the variables found to be significant in the two models. For instance, obesity and the number of diagnoses are not relevant factors in the classical analysis, but they are in the Bayesian analysis. Nevertheless, the estimates are similar for the variables that are significant in both, although the standard errors are, in general, slightly lower in the Bayesian estimation of the logistic regression model.

Comparing the logit model (with a symmetric link) and the skewed logit model, we observe clear differences in the detection of significant variables: seven variables are significant in the first model, versus only four in the second, since immunosuppressive therapy, infection at admission and number of diagnoses are eliminated. Nevertheless, the main advantage of these skewed logit models is their high capacity for discrimination (as can be seen in Fig. 1), correctly classifying 100% of patients with NI. This discrimination capacity seems to show that the asymmetry node makes it possible to obtain a more accurate fit for data with different proportions of zeros and ones.

In addition to logistic regression, there are several other approaches to the problem of how to formally model the relationship between the probability of an event and a set of covariates, such as a probit analysis. Furthermore, an important variant of this class of problems arises when interest is not only in whether the event of interest occurs or not, but also in the time until the event occurs. The body of methods for analyzing such data is known as survival analysis [30].

Secondly, we proposed the use of Poisson-Gamma regression models as a multivariate procedure for identifying the factors that are really related to a lengthening of hospital stay. Case-control studies are usually employed to estimate differences between infected and noninfected patients. Propensity Score Matching can be used to create groups of treated and control units that have similar characteristics, so that comparisons can be made within these matched groups [31]. Nevertheless, regression models do allow us to distinguish the variables that really, and in a multivariate way, influence the lengthening of hospital stay. Likewise, they make it possible to evaluate the relative differences between the average hospital stay of infected and noninfected patients with the same values of the other variables. In fitting these Poisson-Gamma regression models, we considered both the maximum likelihood method and the Bayesian techniques. It should be noted that, unlike the logit models, there are hardly any differences between the fits. Nevertheless, the Bayesian model has the advantage of providing a random model for the heterogeneity of each individual, which allows us to analyze the characteristics of the most atypical cases in the model.

The authors thank the editor and three anonymous referees for their constructive comments and suggestions.

Source of financial support: This research has been partially supported by the grant SEJ2006-12685 (Ministerio de Educación y Ciencia (MEC), Spain).

Supporting Information

Additional Supporting Information may be found in the online version of this article:

Appendix. WinBUGS codes.
