Multivariate statistical analysis and partitioning of sedimentary geochemical data sets: General principles and specific MATLAB scripts
Article first published online: 2 OCT 2013
©2013. American Geophysical Union. All Rights Reserved.
Geochemistry, Geophysics, Geosystems
Volume 14, Issue 10, pages 4015–4020, October 2013
How to Cite
(2013), Multivariate statistical analysis and partitioning of sedimentary geochemical data sets: General principles and specific MATLAB scripts, Geochem. Geophys. Geosyst., 14, 4015–4020, doi:10.1002/ggge.20247.
- Issue published online: 26 NOV 2013
- Accepted manuscript online: 14 AUG 2013 12:00AM EST
- Manuscript Accepted: 5 AUG 2013
- Manuscript Revised: 26 JUL 2013
- Manuscript Received: 20 MAY 2013
- multivariate statistics
Multivariate statistical treatments of large data sets in sedimentary geochemistry and other fields are rapidly becoming more popular as analytical and computational capabilities expand. Because geochemical data sets present a unique set of conditions (e.g., the closed array, in which the components of each sample sum to a constant), the use of generic off-the-shelf applications is not straightforward and can yield misleading results. We present here annotated MATLAB scripts (and specific guidelines for their use) for Q-mode factor analysis, a constrained least squares multiple linear regression technique, and a total inversion protocol, all based on the well-known approaches taken by Dymond (1981), Leinen and Pisias (1984), Kyte et al. (1993), and their predecessors. Although investigators have used these techniques for decades, their application has been neither consistent nor transparent, because the code has remained in-house or in formats not commonly used by many of today's researchers (e.g., FORTRAN). In addition to providing the annotated scripts and instructions for use, we discuss general principles to be considered when performing multivariate statistical treatments of large geochemical data sets, provide a brief contextual history of each approach, explain their similarities and differences, and include a sample data set with which users can test their own manipulation of the scripts.
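The closed-array condition mentioned in the abstract can be illustrated with a minimal sketch. It is written in Python, is not one of the paper's MATLAB scripts, and uses entirely synthetic data: three hypothetical components are drawn independently, then each sample is rescaled to sum to 100 (as weight percentages would). The function names `pearson` and `close_rows`, the value ranges, and the sample count are assumptions chosen for illustration only.

```python
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def close_rows(rows, total=100.0):
    """Rescale each sample (row) so that its components sum to `total`."""
    return [[total * v / sum(row) for v in row] for row in rows]


# Synthetic, hypothetical data: three components drawn independently,
# so any correlation between them in the raw values is due to chance.
rng = random.Random(0)
raw = [[rng.uniform(10.0, 50.0) for _ in range(3)] for _ in range(2000)]

# Closure: express every sample as percentages summing to 100.
closed = close_rows(raw)

r_raw = pearson([row[0] for row in raw], [row[1] for row in raw])
r_closed = pearson([row[0] for row in closed], [row[1] for row in closed])

print(f"correlation of components 1 and 2, raw data:    {r_raw:.3f}")
print(f"correlation of components 1 and 2, closed data: {r_closed:.3f}")
```

In this sketch the raw correlation should be close to zero, while the closed data should show a clearly negative correlation that comes from the constant-sum constraint alone (for three interchangeable components the expected value is -0.5). This kind of induced correlation is one reason a generic correlation-based analysis of percentage data can mislead.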