SEARCH

SEARCH BY CITATION

Keywords:

  • bagging;
  • data mining;
  • decision trees;
  • exploratory data analysis;
  • hypothesis generation;
  • machine learning;
  • prediction

ABSTRACT  Most ecologists use statistical methods as their main analytical tools when analyzing data to identify relationships between a response and a set of predictors; thus, they treat all analyses as hypothesis tests or exercises in parameter estimation. However, little or no prior knowledge about a system can lead to creation of a statistical model or models that do not accurately describe major sources of variation in the response variable. We suggest that under such circumstances data mining is more appropriate for analysis. In this paper we 1) present the distinctions between data-mining (usually exploratory) analyses and parametric statistical (confirmatory) analyses, 2) illustrate 3 strengths of data-mining tools for generating hypotheses from data, and 3) suggest useful ways in which data mining and statistical analyses can be integrated into a thorough analysis of data to facilitate rapid creation of accurate models and to guide further research.