
Parsimonious conditional-mean model selection with multiple covariates: an analysis of infant mortality in the USA


Correspondence to: A. Gregory DiRienzo, Department of Epidemiology and Biostatistics, University at Albany, SUNY, Rensselaer, NY 12144, U.S.A.



This paper proposes and evaluates an objective methodology for selecting a parsimonious conditional-mean model when faced with multiple candidate predictor variables. The methodology attempts to fine-tune a well-established covariate screening method, such as iterative sure independence screening with the smoothly clipped absolute deviation penalty, by using the following: (i) cross-validated or bootstrap estimates of prediction error; (ii) an objective model comparison strategy; and (iii) multiple hypothesis testing. The methods are shown analytically and numerically to work well in the sense that the probability that the final selected model contains one or more unimportant variables is asymptotically bounded at a preselected level for arbitrary data-generating distributions. The methodology is illustrated with a dataset consisting of birth certificate information and mortality records from 2001 from the US Department of Health and Human Services on non-Hispanic African American female and male infants. It is shown how the instantaneous daily mortality hazard can be modeled flexibly by allowing both the set of important predictors and their effects on the hazard to change arbitrarily through time. Results indicate that, after controlling for birth weight, no other variables on the birth certificate are significantly associated with mortality; furthermore, time and sex modify the birth weight/survival relationship, with the strongest association in the earliest days and low-birth-weight female infants having a better survival experience than their male counterparts. Copyright © 2013 John Wiley & Sons, Ltd.
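The three ingredients named in the abstract (cross-validated prediction error, a model comparison rule, and multiple hypothesis testing) can be illustrated with a small sketch. This is not the paper's procedure. It is a simplified stand-in that assumes ordinary least squares as the fitting method, a one-sided z-test on paired per-observation loss differences as the comparison rule, and Holm's step-down correction as the multiplicity adjustment. The function names `cv_losses` and `select_variables`, the fold count, and the simulated data are all hypothetical and chosen only for illustration. The z-test also treats cross-validated losses as independent, which is only a rough approximation.

```python
import math
import numpy as np


def cv_losses(X, y, cols, n_folds=5, seed=0):
    """Per-observation squared prediction errors from K-fold CV of OLS on `cols`.

    The same seed gives the same folds, so losses from different column
    sets are paired observation by observation.
    """
    n = len(y)
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(n), n_folds)
    losses = np.empty(n)
    for test_idx in folds:
        train = np.ones(n, dtype=bool)
        train[test_idx] = False
        # Design matrices with an intercept column.
        A_train = np.column_stack([np.ones(train.sum()), X[train][:, cols]])
        A_test = np.column_stack([np.ones(len(test_idx)), X[test_idx][:, cols]])
        beta, *_ = np.linalg.lstsq(A_train, y[train], rcond=None)
        losses[test_idx] = (y[test_idx] - A_test @ beta) ** 2
    return losses


def select_variables(X, y, candidates, alpha=0.05):
    """Keep variables whose removal significantly increases CV prediction error."""
    full = cv_losses(X, y, candidates)
    pvals = []
    for j in candidates:
        reduced = cv_losses(X, y, [c for c in candidates if c != j])
        d = reduced - full  # positive when dropping variable j hurts prediction
        z = d.mean() / (d.std(ddof=1) / math.sqrt(len(d)))
        pvals.append(0.5 * math.erfc(z / math.sqrt(2)))  # one-sided normal p-value
    # Holm step-down: compare the k-th smallest p-value with alpha / (m - k).
    m = len(candidates)
    keep = []
    for rank, idx in enumerate(np.argsort(pvals)):
        if pvals[idx] > alpha / (m - rank):
            break
        keep.append(candidates[idx])
    return sorted(keep)


# Simulated data (not the infant mortality data): only columns 0 and 1 matter.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 5))
y = 2.0 * X[:, 0] + 1.5 * X[:, 1] + rng.normal(size=500)
selected = select_variables(X, y, candidates=[0, 1, 2, 3, 4])
print(selected)
```

In this sketch the candidate list plays the role of the output of an upstream screening step; the Holm correction is what limits the chance of retaining any unimportant variable to roughly `alpha`, under the stated approximations.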
