This review surveys a number of common model selection algorithms (MSAs), discusses how they relate to one another, and identifies factors that explain their relative performance. At the heart of MSA performance is the trade-off between type I and type II errors: some irrelevant variables will be retained by chance, and some relevant variables will be mistakenly excluded. A successful MSA finds the optimal trade-off between the two types of errors for a given data environment, and whether a given MSA succeeds in a given environment depends on the relative costs of these two types of errors. We use Monte Carlo experimentation to illustrate these issues. We confirm that no MSA does best in all circumstances. Even the worst MSA in terms of overall performance – the strategy of including all candidate variables – sometimes performs best (viz., when all candidate variables are relevant). We also show how (1) the ratio of relevant to total candidate variables and (2) the noisiness of the data-generating process affect relative MSA performance. Finally, we discuss a number of issues that complicate the task of producing reliable coefficient estimates via MSAs.
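The type I/type II trade-off described above can be made concrete with a minimal Monte Carlo sketch. The code below is an illustration, not the paper's actual experimental design: it assumes a simple linear data-generating process with a block of relevant and a block of irrelevant candidate variables, and compares a naive t-statistic selection rule against the "include everything" strategy. The function names (`simulate`, `t_select`, `error_rates`) and all parameter values are hypothetical choices for this sketch.

```python
import numpy as np

def simulate(n=200, k_rel=3, k_irrel=7, noise_sd=1.0, seed=0):
    """Draw one sample from a linear DGP with k_rel relevant and
    k_irrel irrelevant candidate regressors (illustrative setup)."""
    rng = np.random.default_rng(seed)
    k = k_rel + k_irrel
    X = rng.normal(size=(n, k))
    beta = np.concatenate([np.ones(k_rel), np.zeros(k_irrel)])
    y = X @ beta + rng.normal(scale=noise_sd, size=n)
    return X, y, beta != 0  # boolean mask of truly relevant variables

def t_select(X, y, crit=1.96):
    """Naive MSA: fit OLS on all candidates, keep variables with |t| > crit."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    resid = y - X @ b
    sigma2 = resid @ resid / (n - k)
    se = np.sqrt(np.diag(XtX_inv) * sigma2)
    return np.abs(b / se) > crit

def error_rates(selected, relevant):
    """Type I rate: share of irrelevant variables retained.
    Type II rate: share of relevant variables excluded."""
    type1 = np.mean(selected[~relevant])
    type2 = np.mean(~selected[relevant])
    return type1, type2
```

Varying `k_rel / (k_rel + k_irrel)` and `noise_sd` in this setup reproduces the qualitative pattern discussed in the review: the include-all strategy incurs a type I rate of one but a type II rate of zero, so it dominates precisely when every candidate variable is relevant, while a selection rule trades some type II error for a much lower type I rate.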