Variable selection for zero‐inflated and overdispersed data with application to health care demand in Germany
Abstract
In health services and outcomes research, count outcomes are frequently encountered and often contain a large proportion of zeros. The zero‐inflated negative binomial (ZINB) regression model has important applications for this type of data. For settings with many candidate risk factors, this paper proposes new variable selection methods for the ZINB model. We maximize the likelihood function subject to a penalty, considering the least absolute shrinkage and selection operator (LASSO), smoothly clipped absolute deviation (SCAD), and minimax concave penalty (MCP). An EM (expectation‐maximization) algorithm is proposed for estimating the model parameters and conducting variable selection simultaneously. This algorithm consists of estimating penalized weighted negative binomial models and penalized logistic models via the coordinate descent algorithm. Furthermore, statistical properties, including the standard error formulae, are provided. A simulation study shows that the new algorithm yields estimates that are more accurate than, or at least comparable to, those of traditional stepwise variable selection, and that it is also more robust. The proposed methods are applied to analyze health care demand in Germany using the open‐source R package mpath.
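The three penalties named in the abstract can be sketched for a single coefficient as follows. This is a minimal illustration of their standard textbook definitions in Python, not the mpath implementation: the function names are hypothetical, and the default values a = 3.7 (the value commonly suggested for SCAD) and gamma = 3.0 are assumptions chosen for the example.

```python
def lasso_penalty(beta, lam):
    """LASSO penalty: lam * |beta| (grows linearly without bound)."""
    return lam * abs(beta)


def scad_penalty(beta, lam, a=3.7):
    """SCAD penalty for one coefficient; requires a > 2.

    Linear like LASSO near zero, quadratic in the middle region,
    and constant once |beta| exceeds a * lam.
    """
    if a <= 2:
        raise ValueError("SCAD requires a > 2")
    t = abs(beta)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2 * a * lam * t - t * t - lam * lam) / (2 * (a - 1))
    return lam * lam * (a + 1) / 2


def mcp_penalty(beta, lam, gamma=3.0):
    """MCP penalty for one coefficient; requires gamma > 1.

    Starts with slope lam at zero, tapers off quadratically,
    and is constant once |beta| exceeds gamma * lam.
    """
    if gamma <= 1:
        raise ValueError("MCP requires gamma > 1")
    t = abs(beta)
    if t <= gamma * lam:
        return lam * t - t * t / (2 * gamma)
    return gamma * lam * lam / 2


if __name__ == "__main__":
    # Compare the three penalties on a small and a large coefficient.
    for beta in (0.5, 10.0):
        print(beta,
              lasso_penalty(beta, 1.0),
              scad_penalty(beta, 1.0),
              mcp_penalty(beta, 1.0))
```

Unlike LASSO, both SCAD and MCP level off for large coefficients, so large effects are penalized no more heavily than moderate ones; that is the usual motivation for considering them alongside LASSO.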