Volume 57, Issue 5
Research Paper

Variable selection for zero‐inflated and overdispersed data with application to health care demand in Germany

Zhu Wang

Corresponding Author

Department of Research, Connecticut Children's Medical Center, Department of Pediatrics, University of Connecticut School of Medicine, Hartford, CT, 06106 USA

Corresponding author: e‐mail: zwang@connecticutchildrens.org
Shuangge Ma

Department of Biostatistics, School of Public Health, Yale University, New Haven, CT, 06520 USA

Ching‐Yun Wang

Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, WA, 98109 USA

First published: 08 June 2015
Citations: 21

Abstract

In health services and outcome research, count outcomes are frequently encountered and often have a large proportion of zeros. The zero‐inflated negative binomial (ZINB) regression model has important applications for this type of data. With many candidate risk factors, this paper proposes new variable selection methods for the ZINB model. We consider penalized maximum likelihood estimation, with penalties including the least absolute shrinkage and selection operator (LASSO), smoothly clipped absolute deviation (SCAD), and minimax concave penalty (MCP). An EM (expectation‐maximization) algorithm is proposed for estimating the model parameters and conducting variable selection simultaneously. This algorithm consists of estimating penalized weighted negative binomial models and penalized logistic models via the coordinate descent algorithm. Furthermore, statistical properties including the standard error formulae are provided. A simulation study shows that the new algorithm not only gives more accurate, or at least comparable, estimates but is also more robust than traditional stepwise variable selection. The proposed methods are applied to analyze the health care demand in Germany using the open‐source R package mpath.
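
To make the ingredients named in the abstract concrete, the following is a minimal sketch, not the authors' implementation and not the mpath API. It assumes the standard mean/dispersion parameterization of the negative binomial (mean `mu`, dispersion `theta`), a zero-inflation probability `p`, and the textbook forms of the LASSO, SCAD, and MCP penalties. The function names and the default tuning constants `a=3.7` and `gamma=3.0` are illustrative assumptions. The function `e_step_weight` shows the kind of posterior probability an EM algorithm for a zero‐inflated model would typically compute in its E-step, which could then serve as a weight in the penalized negative binomial and logistic fits.

```python
import math


def nb_pmf(y, mu, theta):
    """Negative binomial pmf with mean mu and dispersion theta
    (variance mu + mu**2 / theta). Assumed parameterization."""
    log_p = (math.lgamma(y + theta) - math.lgamma(theta) - math.lgamma(y + 1)
             + theta * math.log(theta / (theta + mu))
             + y * math.log(mu / (theta + mu)))
    return math.exp(log_p)


def zinb_pmf(y, mu, theta, p):
    """ZINB pmf: a structural zero with probability p, otherwise an NB count."""
    nb = nb_pmf(y, mu, theta)
    return p + (1.0 - p) * nb if y == 0 else (1.0 - p) * nb


def e_step_weight(y, mu, theta, p):
    """Posterior probability that an observation is a structural zero.
    Positive counts can never come from the zero state."""
    if y > 0:
        return 0.0
    return p / zinb_pmf(0, mu, theta, p)


def lasso_penalty(b, lam):
    """LASSO penalty: lam * |b|."""
    return lam * abs(b)


def scad_penalty(b, lam, a=3.7):
    """SCAD penalty (requires a > 2); a=3.7 is a conventional default,
    assumed here for illustration."""
    t = abs(b)
    if t <= lam:
        return lam * t
    if t <= a * lam:
        return (2.0 * a * lam * t - t * t - lam * lam) / (2.0 * (a - 1.0))
    return lam * lam * (a + 1.0) / 2.0


def mcp_penalty(b, lam, gamma=3.0):
    """MCP penalty (requires gamma > 1); gamma=3.0 is an assumed default."""
    t = abs(b)
    if t <= gamma * lam:
        return lam * t - t * t / (2.0 * gamma)
    return gamma * lam * lam / 2.0


if __name__ == "__main__":
    # Hypothetical parameter values, chosen only to exercise the functions.
    mu, theta, p = 2.0, 1.5, 0.3
    print("P(Y = 0) under ZINB:", zinb_pmf(0, mu, theta, p))
    print("E-step weight for an observed zero:", e_step_weight(0, mu, theta, p))
    for b in (0.2, 1.0, 5.0):
        print(b, lasso_penalty(b, 0.5), scad_penalty(b, 0.5), mcp_penalty(b, 0.5))
```

Unlike the LASSO penalty, which grows without bound, both SCAD and MCP level off at a constant for large coefficients. This is why they tend to shrink large effects less.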
