Data-Mining Discovery of Pattern and Process in Ecological Systems



ABSTRACT  Most ecologists use statistical methods as their main analytical tools when analyzing data to identify relationships between a response and a set of predictors; thus, they treat all analyses as hypothesis tests or exercises in parameter estimation. However, little or no prior knowledge about a system can lead to creation of a statistical model or models that do not accurately describe major sources of variation in the response variable. We suggest that under such circumstances data mining is more appropriate for analysis. In this paper we 1) present the distinctions between data-mining (usually exploratory) analyses and parametric statistical (confirmatory) analyses, 2) illustrate 3 strengths of data-mining tools for generating hypotheses from data, and 3) suggest useful ways in which data mining and statistical analyses can be integrated into a thorough analysis of data to facilitate rapid creation of accurate models and to guide further research.