Comparison of statistical methods commonly used in predictive modelling
Article first published online: 24 FEB 2009
© 2004 IAVS - the International Association of Vegetation Science
Journal of Vegetation Science
Volume 15, Issue 2, pages 285–292, April 2004
How to Cite
Muñoz, J. and Felicísimo, Á. M. (2004), Comparison of statistical methods commonly used in predictive modelling. Journal of Vegetation Science, 15: 285–292. doi: 10.1111/j.1654-1103.2004.tb02263.x
- Issue published online: 24 FEB 2009
- Received 8 October 2002; Accepted 17 November 2003
Keywords
- Classification and Regression Tree;
- Logistic regression;
- Multivariate Adaptive Regression Splines;
- Regression Tree Analysis
Logistic Multiple Regression, Principal Component Regression and Classification and Regression Tree Analysis (CART), three methods commonly used in GIS-based ecological modelling, are compared with a relatively new statistical technique, Multivariate Adaptive Regression Splines (MARS), to test their accuracy, reliability, implementation within GIS and ease of use. All four methods were applied to the same two data sets, which cover a wide range of conditions common in predictive modelling: geographical range, scale, nature of the predictors and sampling method.
We ran two series of analyses to determine whether model validation with an independent data set was required or whether cross-validation on the learning data set sufficed. The results show that validation with independent data sets is needed. Model accuracy was evaluated using the area under the Receiver Operating Characteristic curve (AUC). This measure was used because it summarizes performance across all possible thresholds and is independent of the balance between classes.
MARS and Regression Tree Analysis (CART) achieved the best prediction success, although the CART model was difficult to use for cartographic purposes because of its high complexity.
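The two properties of the AUC mentioned in the abstract (it needs no threshold, and it does not depend on class balance) can be illustrated with a minimal sketch. This is not the authors' code: the function name `auc`, the labels and the scores are all hypothetical. It uses the standard rank formulation, in which the AUC equals the probability that a randomly chosen presence receives a higher score than a randomly chosen absence, with ties counted as one half.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: 1 for presence, 0 for absence (hypothetical coding).
    scores: model output, higher meaning "more likely present".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one presence and one absence")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # presence ranked above absence
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Illustrative data only, not taken from the study.
labels = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.2, 0.1]
balanced_auc = auc(labels, scores)

# Duplicate every absence: the class balance changes, the AUC does not.
labels_skewed = labels + [0] * 5
scores_skewed = scores + [0.7, 0.3, 0.2, 0.2, 0.1]
skewed_auc = auc(labels_skewed, scores_skewed)

print(balanced_auc, skewed_auc)
```

No classification threshold appears anywhere in the computation, and the value depends only on how presences rank relative to absences, which is why duplicating the absences leaves it unchanged. The pairwise loop is quadratic in the number of records, so a sort-based implementation would be preferable for large data sets.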