Multiple additive regression trees with application in epidemiology
Article first published online: 14 APR 2003
Copyright © 2003 John Wiley & Sons, Ltd.
Statistics in Medicine
Special Issue: 8th Biennial CDC and ATSDR Symposium on Statistical Methods: Issues Associated with Complicated Designs and Data Structures
Volume 22, Issue 9, pages 1365–1381, 15 May 2003
How to Cite
Friedman, J. H. and Meulman, J. J. (2003), Multiple additive regression trees with application in epidemiology. Statist. Med., 22: 1365–1381. doi: 10.1002/sim.1501
- Issue published online: 14 APR 2003
Keywords
- predictive learning
- regression trees
- data mining
- cervical cancer
Predicting future outcomes from knowledge obtained from past observational data is a common task in many areas of scientific research. In the present paper, prediction focuses on various grades of cervical preneoplasia and neoplasia. Statistical tools used for prediction should, of course, be accurate, and should preferably meet secondary requirements such as speed, ease of use, and interpretability of the resulting predictive model. A new automated procedure based on an extension (called ‘boosting’) of classification and regression tree (CART) models is described. The resulting tool is a fast ‘off-the-shelf’ procedure for classification and regression that is competitive in accuracy with more customized approaches, while being fairly automatic to use (little tuning is required) and highly robust, especially when applied to less-than-clean data. Additional tools are presented for interpreting and visualizing the results of such multiple additive regression tree (MART) models.
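The abstract does not spell out the procedure itself, so the following is only a minimal sketch of the general idea behind boosting regression trees, not the authors' MART implementation. It assumes squared-error loss, single-split trees ('stumps') on one predictor, and a fixed shrinkage factor; the function names, the `n_trees` and `learning_rate` parameters, and the toy data are all hypothetical choices made for illustration.

```python
"""Toy least-squares boosting with regression stumps (illustrative sketch only)."""


def fit_stump(xs, residuals):
    """Return (threshold, left_value, right_value) minimizing squared error."""
    best = None
    order = sorted(set(xs))
    # Candidate thresholds are midpoints between consecutive distinct x values.
    for lo, hi in zip(order, order[1:]):
        threshold = (lo + hi) / 2.0
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    if best is None:
        # All x values are identical: fall back to a constant fit.
        mean = sum(residuals) / len(residuals)
        return (xs[0], mean, mean)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_value, right_value = stump
    return left_value if x <= threshold else right_value


def fit_boosted_stumps(xs, ys, n_trees=50, learning_rate=0.1):
    """Additive model: a constant plus a shrunken sum of stumps fit to residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_trees):
        # For squared-error loss the residuals are the negative gradient.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x)
                 for p, x in zip(preds, xs)]
    return base, stumps, learning_rate


def predict(model, x):
    base, stumps, learning_rate = model
    return base + learning_rate * sum(predict_stump(s, x) for s in stumps)


def mse(model, xs, ys):
    return sum((predict(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)


# Hypothetical toy data: a step function in a single predictor.
xs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
ys = [1.0, 1.0, 1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 5.0, 5.0]
model = fit_boosted_stumps(xs, ys, n_trees=50, learning_rate=0.1)
```

Each round fits a small tree to what the current model still gets wrong and adds a shrunken copy of it, so the final predictor is a sum of many simple trees. Procedures of this kind typically use deeper trees, several predictors, and other loss functions (for example, for classification), none of which this sketch attempts.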