Statistical and Numerical Computing
Published Online: 15 SEP 2006
Copyright © 2002 John Wiley & Sons, Ltd
Encyclopedia of Environmetrics
How to Cite
Friedl, H. and Stampfer, E. 2006. Cross-Validation. Encyclopedia of Environmetrics. 2.
Cross-validation is a resampling technique often used to assess statistical models and to select among competing model alternatives. In essence, it estimates the prediction error of a statistical predictor function by evaluating the predictor on data that were not used to fit it. It is particularly useful in data problems where only minimal distributional assumptions can be made. Applications range from linear regression, partial least squares, ridge regression, and classification and discrimination to smoothing and neural networks, in univariate as well as multivariate settings. Cross-validation is motivated by the well-known phenomenon that estimating prediction error on the same data used for model building tends to give downward-biased estimates, because the parameter estimates are optimized to reflect the peculiarities of that dataset. When new data arrive, the model usually performs worse than its assessment on the training set would suggest. A reliable method for estimating prediction error is especially necessary for very flexible models, e.g. neural networks and tree-based classifiers, where overfitting is ubiquitous.
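The downward bias of the training-set error can be illustrated with a minimal sketch. The entry itself does not prescribe an implementation; the example below assumes a K-fold scheme, simulated data, and a one-nearest-neighbour predictor chosen only because it is an extremely flexible model. All function names (`one_nn_predict`, `apparent_mse`, `k_fold_cv_mse`) are hypothetical.

```python
import random


def one_nn_predict(train_x, train_y, x):
    """Predict at x by copying the response of the nearest training point."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


def apparent_mse(xs, ys):
    """Mean squared error measured on the same data used to build the model."""
    errors = [(one_nn_predict(xs, ys, x) - y) ** 2 for x, y in zip(xs, ys)]
    return sum(errors) / len(errors)


def k_fold_cv_mse(xs, ys, k, seed=0):
    """K-fold cross-validated mean squared error (illustrative sketch)."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    # Split the shuffled indices into k disjoint folds of near-equal size.
    folds = [indices[i::k] for i in range(k)]
    total_sq_err = 0.0
    for fold in folds:
        held_out = set(fold)
        # Build the model on every observation outside the current fold.
        train_idx = [i for i in indices if i not in held_out]
        train_x = [xs[i] for i in train_idx]
        train_y = [ys[i] for i in train_idx]
        # Score only the held-out observations, which the model has not seen.
        for i in fold:
            total_sq_err += (one_nn_predict(train_x, train_y, xs[i]) - ys[i]) ** 2
    return total_sq_err / len(xs)


# Simulated data (an assumption for illustration): linear trend plus noise.
rng = random.Random(42)
xs = [rng.uniform(0.0, 10.0) for _ in range(60)]
ys = [2.0 * x + rng.gauss(0.0, 1.0) for x in xs]

apparent_error = apparent_mse(xs, ys)
cv_error = k_fold_cv_mse(xs, ys, k=5)
print("apparent (training) MSE:", apparent_error)
print("5-fold cross-validated MSE:", cv_error)
```

Because each training point is its own nearest neighbour, the apparent error of this predictor is zero, whereas the cross-validated error is positive and reflects how the predictor behaves on observations it was not fitted to.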