Bayesian variable selection for survival regression in genetics



Variable selection in regression with very large numbers of variables is challenging in terms of both model specification and computation. We focus on genetic studies with survival outcomes, and we present a Bayesian-inspired penalized maximum likelihood approach appropriate for high-dimensional problems. In particular, we employ a simple, efficient algorithm that seeks maximum a posteriori (MAP) estimates of the regression coefficients. The coefficients are assigned a Laplace prior with a sharp mode at zero, and non-zero posterior mode estimates correspond to significant single nucleotide polymorphisms (SNPs). The Laplace prior reflects a prior belief that only a small proportion of the SNPs significantly influence the response. The method is fast and can handle datasets arising from imputation or resequencing. We demonstrate the localization performance, power and false-positive rates of our method in large simulation studies of dense-SNP datasets and sequence data, and we compare its performance with that of univariate Cox regression and of a recently proposed stochastic search approach. In general, we find that our approach improves localization and power slightly, while its biggest advantages are in false-positive counts and computing times. We also apply our method to a real prospective study, and we observe a potential association between candidate ABC transporter genes and epilepsy treatment outcomes. Genet. Epidemiol. 34:689–701, 2010. © 2010 Wiley-Liss, Inc.
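To illustrate how MAP estimation under a Laplace prior yields exact zeros, the following is a minimal sketch and not the algorithm used in the paper. It assumes a Cox partial likelihood without tied event times, standardized covariates standing in for SNP codes, and a generic proximal-gradient (soft-thresholding) update. The function names, the penalty weight `lam` (applied to the partial likelihood divided by the sample size), and the simulated data are all hypothetical choices made for illustration.

```python
import numpy as np


def cox_neg_log_pl_grad(beta, X, time, event):
    """Gradient of the negative Cox log partial likelihood, divided by n.

    Assumption: no tied event times (ties would need Breslow/Efron handling).
    """
    order = np.argsort(-time)                 # longest follow-up first
    Xs = X[order]
    ev = event[order].astype(float)
    eta = Xs @ beta
    w = np.exp(eta - eta.max())               # rescaling cancels in the ratio below
    risk_w = np.cumsum(w)                     # weight total over each risk set
    risk_wx = np.cumsum(w[:, None] * Xs, axis=0)
    resid = Xs - risk_wx / risk_w[:, None]    # covariate minus risk-set average
    return -(ev[:, None] * resid).sum(axis=0) / len(time)


def soft_threshold(z, t):
    """Proximal operator of t * |z|; sets small entries exactly to zero."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)


def laplace_map_cox(X, time, event, lam, n_iter=2000):
    """MAP estimate under a Laplace prior, i.e. an L1-penalized Cox fit,
    computed by proximal gradient descent (illustrative, not the paper's solver)."""
    # Conservative step size: the scaled Hessian is bounded by max ||x_i||^2.
    step = 1.0 / max((X ** 2).sum(axis=1).max(), 1e-12)
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = cox_neg_log_pl_grad(beta, X, time, event)
        beta = soft_threshold(beta - step * grad, step * lam)
    return beta


def simulate(n=300, p=10, seed=0):
    """Hypothetical toy data: two causal covariates, exponential survival times."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    beta_true = np.zeros(p)
    beta_true[0], beta_true[1] = 1.0, -1.0
    t_event = rng.exponential(1.0 / np.exp(X @ beta_true))
    t_cens = rng.exponential(5.0, size=n)     # independent censoring times
    return X, np.minimum(t_event, t_cens), t_event <= t_cens, beta_true


if __name__ == "__main__":
    X, time, event, beta_true = simulate()
    beta_hat = laplace_map_cox(X, time, event, lam=0.1)
    print("selected covariates:", np.flatnonzero(beta_hat))
```

In this sketch, covariates whose posterior mode is exactly zero are treated as unselected. A larger `lam` corresponds to a more sharply peaked Laplace prior and therefore to fewer non-zero coefficients.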