Geochemistry, Geophysics, Geosystems

Multivariate statistical analysis and partitioning of sedimentary geochemical data sets: General principles and specific MATLAB scripts



[1] Multivariate statistical treatments of large data sets in sedimentary geochemistry and other fields are rapidly becoming more popular as analytical and computational capabilities expand. Because geochemical data sets present a unique set of conditions (e.g., the closed array), the use of generic off-the-shelf software is not straightforward and can yield misleading results. We present here annotated MATLAB scripts (and specific guidelines for their use) for Q-mode factor analysis, a constrained least squares multiple linear regression technique, and a total inversion protocol, all based on the well-known approaches taken by Dymond (1981), Leinen and Pisias (1984), Kyte et al. (1993), and their predecessors. Although investigators have used these techniques for decades, their application has been neither consistent nor transparent, because the code has remained in-house or in formats not commonly used by many of today's researchers (e.g., FORTRAN). In addition to providing the annotated scripts and instructions for use, we discuss general principles to be considered when performing multivariate statistical treatments of large geochemical data sets, provide a brief contextual history of each approach, explain their similarities and differences, and include a sample data set with which users can test their own modifications of the scripts.