We compare 330 ARCH-type models in terms of their ability to describe the conditional variance. The models are compared out-of-sample using DM–$ exchange rate data and IBM return data; the IBM analysis is based on a new data set of realized variance. We find no evidence that a GARCH(1,1) is outperformed by more sophisticated models in our analysis of exchange rates, whereas the GARCH(1,1) is clearly inferior to models that can accommodate a leverage effect in our analysis of IBM returns. The models are compared using the test for superior predictive ability (SPA) and the reality check for data snooping (RC). Our empirical results show that the RC lacks power to such an extent that it is unable to distinguish between 'good' and 'bad' models in our analysis. Copyright © 2005 John Wiley & Sons, Ltd.
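For readers unfamiliar with the benchmark, the sketch below shows the GARCH(1,1) conditional-variance recursion, $h_t = \omega + \alpha \varepsilon_{t-1}^2 + \beta h_{t-1}$, together with an optional GJR-style term $\gamma \varepsilon_{t-1}^2 \mathbf{1}\{\varepsilon_{t-1} < 0\}$ as one common way to accommodate a leverage effect. It is a minimal illustration under stated assumptions: returns are taken to be demeaned, the parameter values are fixed by hand rather than estimated, and the function names `garch_variance` and `mse_loss`, the squared-error loss, and all data values are hypothetical. It is not the estimation or testing procedure used in the study.

```python
from typing import List, Sequence


def garch_variance(returns: Sequence[float], omega: float, alpha: float,
                   beta: float, gamma: float = 0.0) -> List[float]:
    """Conditional variances from a GARCH(1,1) filter, optionally with a
    GJR-type leverage term (hypothetical helper for illustration).

    h[t] is the variance for returns[t] given returns[:t]; the final entry
    h[len(returns)] is the one-step-ahead forecast beyond the sample.
    gamma = 0 gives the plain GARCH(1,1).
    """
    if omega <= 0 or min(alpha, beta, gamma) < 0:
        raise ValueError("need omega > 0 and alpha, beta, gamma >= 0")
    # alpha + beta + gamma/2 < 1 ensures a finite unconditional variance
    # when the standardized shocks are symmetric.
    persistence = alpha + beta + gamma / 2.0
    if persistence >= 1.0:
        raise ValueError("need alpha + beta + gamma/2 < 1")
    h = [omega / (1.0 - persistence)]  # start at the unconditional variance
    for eps in returns:  # returns are assumed to be demeaned
        shock = eps * eps
        leverage = gamma * shock if eps < 0 else 0.0  # extra weight on bad news
        h.append(omega + alpha * shock + leverage + beta * h[-1])
    return h


def mse_loss(forecasts: Sequence[float], proxy: Sequence[float]) -> float:
    """Mean squared error of variance forecasts against a variance proxy."""
    if len(forecasts) != len(proxy) or not proxy:
        raise ValueError("forecasts and proxy must be non-empty and aligned")
    return sum((f - p) ** 2 for f, p in zip(forecasts, proxy)) / len(proxy)


if __name__ == "__main__":
    # Hypothetical demeaned daily returns (in percent) and a hypothetical
    # realized-variance proxy for the same days; not data from the study.
    returns = [0.4, -1.5, 0.2, -0.9, 1.1, -0.3]
    realized = [0.9, 1.0, 2.1, 1.4, 1.6, 1.2]

    symmetric = garch_variance(returns, omega=0.05, alpha=0.08, beta=0.9)
    leverage = garch_variance(returns, omega=0.05, alpha=0.03, beta=0.9,
                              gamma=0.1)

    # Drop the final out-of-sample forecast so each h[t] lines up with realized[t].
    print("GARCH(1,1) MSE:", mse_loss(symmetric[:-1], realized))
    print("GJR-type   MSE:", mse_loss(leverage[:-1], realized))
```

With `gamma = 0`, a positive and a negative return of the same size raise next-period variance equally; with `gamma > 0`, the negative return raises it more, which is the asymmetry a leverage model is designed to capture. A formal comparison across many models, as in the SPA test or the RC, would additionally require a bootstrap of the loss differentials, which this sketch does not attempt.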