Standard Article

Cross-Validation

Statistical and Numerical Computing

  1. Herwig Friedl,
  2. Erwin Stampfer

Published Online: 15 SEP 2006

DOI: 10.1002/9780470057339.vac062

Encyclopedia of Environmetrics

How to Cite

Friedl, H. and Stampfer, E. 2006. Cross-Validation. Encyclopedia of Environmetrics. 2.

Author Information

  1. Technical University, Graz, Austria

Abstract

Cross-validation is a resampling technique that is often used to assess statistical models and to select among competing model alternatives. In essence, it is a method for estimating the prediction error of statistical predictor functions. The technique is particularly useful in data problems where only minimal distributional assumptions can be made. It has found applications in linear regression, partial least squares, ridge regression, classification and discrimination, smoothing, and neural networks, in univariate as well as multivariate settings. Cross-validation is rooted in the well-known phenomenon that estimating prediction error on the same data used for model building tends to give downward-biased, that is, overly optimistic, estimates. The reason is that the parameter estimates are optimized to reflect the peculiarities of that particular dataset. When new data arrive, the model usually performs worse than the error measured on the training set would suggest. A reliable method for estimating the prediction error is especially necessary for very flexible models, e.g. neural networks and tree-based classifiers, where overfitting is ubiquitous.
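
The contrast between training error and cross-validated prediction error can be illustrated with a minimal sketch. It is not taken from the article: the synthetic data, the choice of polynomial regression as the predictor, the use of five folds, and the helper names k_fold_indices, cv_mse_polyfit and training_mse_polyfit are all assumptions made for illustration, and only NumPy is required.

```python
import numpy as np


def k_fold_indices(n, k, rng):
    """Split a random permutation of 0..n-1 into k nearly equal folds."""
    return np.array_split(rng.permutation(n), k)


def cv_mse_polyfit(x, y, degree, k, rng):
    """Estimate the prediction MSE of a polynomial fit by k-fold cross-validation."""
    squared_errors = np.empty(len(x))
    for test_idx in k_fold_indices(len(x), k, rng):
        train_mask = np.ones(len(x), dtype=bool)
        train_mask[test_idx] = False
        # Fit on the training part only, then predict the held-out fold.
        coefs = np.polyfit(x[train_mask], y[train_mask], degree)
        residuals = y[test_idx] - np.polyval(coefs, x[test_idx])
        squared_errors[test_idx] = residuals ** 2
    # Every observation was held out exactly once, so average over all of them.
    return squared_errors.mean()


def training_mse_polyfit(x, y, degree):
    """MSE measured on the same data that were used to fit the polynomial."""
    coefs = np.polyfit(x, y, degree)
    return np.mean((y - np.polyval(coefs, x)) ** 2)


# Hypothetical synthetic data: a smooth signal plus Gaussian noise.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=40)
y = np.sin(3.0 * x) + rng.normal(scale=0.3, size=40)

for degree in (1, 3, 8):
    train_err = training_mse_polyfit(x, y, degree)
    cv_err = cv_mse_polyfit(x, y, degree, k=5, rng=rng)
    print(f"degree={degree}: training MSE={train_err:.3f}, 5-fold CV MSE={cv_err:.3f}")
```

Under these assumptions one would expect the cross-validated error to sit above the training error, with the gap tending to widen as the polynomial degree, and hence the flexibility of the model, increases. Comparing the cross-validated errors across degrees is one simple way to choose among the competing fits.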