Keywords:

  • cross-validation;
  • hypothesis testing;
  • multivariate analysis;
  • Sweden;
  • variation partitioning

Summary

1 Multivariate analysis of complex data sets is plagued by problems of subjectivity and of finding statistically valid ways to test a large number of plausible hypotheses. We show how patterns in the data can be identified (‘data diving’) and then rigorously tested statistically by subdividing the data set.

2 We analysed data on weed biomass and environmental variables from more than 2000 plots in cereal and oil-seed crops in Sweden during 1970–94. Half the data set was used in an exploratory phase while the other half was used in a subsequent confirmatory phase.

3 The exploratory analyses included multivariate statistics [detrended correspondence analysis (DCA) and canonical correspondence analysis (CCA)] with various options and combinations of variables, and led to the formation of hypotheses that were then tested.

4 We tested the hypotheses in a sequential analysis with CCA and Monte Carlo permutation tests: after establishing the influence of one set of environmental variables, this set was covaried out in subsequent analyses. In this way the influence of (i) season of sowing of the crop; (ii) geographical region; (iii) soil type; (iv) crop species; and (v) temporal trends was tested. The latter four were tested separately for spring- and autumn-sown crops.

5 The sowing season of the crop had an overwhelming influence on the weed flora, and many weed species, both annual and perennial, showed strong associations with either autumn or spring. There were significant differences in weed flora composition between the geographical regions and soil types as well as between crop species. There were significant temporal trends only in the weed flora of autumn-sown crops.

6 This study provides a protocol that combines exploratory ‘data diving’ with strict hypothesis testing using direct gradient analysis methods such as CCA. Such two-phase analysis should improve the way complex data are analysed and patterns are interpreted.
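
The two-phase logic of points 2 and 4 can be illustrated with a minimal sketch. It is not the authors' CCA procedure: a univariate difference in group means stands in for the CCA test statistic, and holding a previously established variable fixed is approximated by permuting labels only within its strata. The field names `biomass`, `group` and `stratum`, the helper names and the synthetic data are all hypothetical.

```python
import random
from statistics import mean


def split_half(plots, seed=0):
    """Randomly split the plots into an exploratory and a confirmatory half."""
    rng = random.Random(seed)
    shuffled = list(plots)
    rng.shuffle(shuffled)
    mid = len(shuffled) // 2
    return shuffled[:mid], shuffled[mid:]


def mean_difference(values, labels):
    """Stand-in test statistic: absolute difference in mean between groups A and B."""
    a = [v for v, g in zip(values, labels) if g == "A"]
    b = [v for v, g in zip(values, labels) if g == "B"]
    return abs(mean(a) - mean(b))


def stratified_permutation_test(plots, n_perm=199, seed=0):
    """Monte Carlo permutation test with labels shuffled only within strata.

    Restricting the shuffle to strata keeps the already-established
    variable (the stratum) fixed while the group effect is tested.
    """
    rng = random.Random(seed)
    values = [p["biomass"] for p in plots]
    labels = [p["group"] for p in plots]
    observed = mean_difference(values, labels)

    # Collect the plot indices belonging to each stratum.
    strata = {}
    for i, p in enumerate(plots):
        strata.setdefault(p["stratum"], []).append(i)

    hits = 0
    for _ in range(n_perm):
        permuted = list(labels)
        for idx in strata.values():
            block = [labels[i] for i in idx]
            rng.shuffle(block)
            for i, g in zip(idx, block):
                permuted[i] = g
        if mean_difference(values, permuted) >= observed:
            hits += 1
    # The observed arrangement counts as one of the permutations.
    return (hits + 1) / (n_perm + 1)


def make_demo_plots(n=200, seed=1):
    """Synthetic plots (hypothetical data) with a stratum effect and a group effect."""
    rng = random.Random(seed)
    plots = []
    for i in range(n):
        stratum = "spring" if i % 2 == 0 else "autumn"
        group = "A" if (i // 2) % 2 == 0 else "B"
        biomass = rng.gauss(0.0, 1.0)
        biomass += 10.0 if group == "A" else 0.0
        biomass += 5.0 if stratum == "autumn" else 0.0
        plots.append({"biomass": biomass, "group": group, "stratum": stratum})
    return plots


plots = make_demo_plots()
exploratory, confirmatory = split_half(plots)
# Hypotheses would be formed on `exploratory`; only `confirmatory` is tested.
p_value = stratified_permutation_test(confirmatory)
print(f"confirmatory-half permutation p-value: {p_value:.3f}")
```

The point of the design is that the exploratory half can be examined freely, because the permutation test is run on plots that played no part in forming the hypothesis.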