A new test and graphical tool to assess the goodness of fit of logistic regression models
Abstract
A prognostic model is well calibrated when it accurately predicts event rates. Calibration is first assessed by testing for goodness of fit on the development dataset. All existing tests and graphical tools designed for this purpose suffer from several drawbacks, related mainly to the subgrouping of observations or to heavy dependence on arbitrary parameters. We propose a statistical test and a graphical method to assess the goodness of fit of logistic regression models, obtained through an extension of similar techniques developed for external validation. We analytically computed and numerically verified the distribution of the underlying statistic. Simulations on a set of realistic scenarios show that this test and the well‐known Hosmer–Lemeshow approach have similar type I error rates. The main advantage of the new approach is that the relationship between model predictions and outcome rates across the range of probabilities can be represented in the calibration belt plot, together with its statistical confidence. By readily revealing any deviation from perfect fit, this graphical tool is designed to identify, during model development, poorly modeled variables that call for further investigation. This is illustrated through an example based on real data. Copyright © 2015 John Wiley & Sons, Ltd.
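For orientation, the Hosmer–Lemeshow comparator mentioned in the abstract can be sketched as follows. This is a minimal illustration of the conventional grouped test, not the new test or the calibration belt proposed in the paper. The function name `hosmer_lemeshow`, the choice of ten equal-size groups, and the simulated data are assumptions made for the example. The grouping step is the kind of subgrouping the abstract identifies as a drawback.

```python
import math
import random


def hosmer_lemeshow(y, p, groups=10):
    """Return (statistic, degrees of freedom, p-value) for a grouped HL test.

    y: 0/1 outcomes; p: fitted probabilities from the model being checked.
    Hypothetical helper for illustration; assumes development data, so the
    reference distribution is chi-square with groups - 2 degrees of freedom.
    """
    if groups < 4 or groups % 2:
        raise ValueError("this sketch needs an even number of groups >= 4")
    pairs = sorted(zip(p, y))  # order observations by predicted risk
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        n_g = len(chunk)
        expected = sum(pi for pi, _ in chunk)  # expected events in the group
        observed = sum(yi for _, yi in chunk)  # observed events in the group
        mean_p = expected / n_g
        stat += (observed - expected) ** 2 / (n_g * mean_p * (1 - mean_p))
    df = groups - 2
    # Chi-square upper tail for even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
    half = stat / 2
    p_value = math.exp(-half) * sum(
        half ** i / math.factorial(i) for i in range(df // 2)
    )
    return stat, df, p_value


# Simulated, perfectly calibrated data (an assumption for the example):
# each outcome is drawn with exactly the probability the "model" predicts.
random.seed(0)
p = [random.uniform(0.05, 0.95) for _ in range(2000)]
y = [1 if random.random() < pi else 0 for pi in p]

stat, df, p_value = hosmer_lemeshow(y, p)
print(f"HL statistic = {stat:.2f}, df = {df}, p-value = {p_value:.3f}")
```

Changing `groups` can change the result on the same data, which illustrates the dependence on arbitrary parameters that the abstract describes.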