Multivariate Association and Dimension Reduction: A Generalization of Canonical Correlation Analysis




Summary: In this article, we propose a new generalized index to recover relationships between two sets of random vectors by finding the vector projections that minimize an L2 distance between each projected vector and an unknown function of the other. The unknown functions are estimated using the Nadaraya–Watson smoother. Extensions to multiple sets and to groups of multiple sets are also discussed, and a bootstrap procedure is developed to detect the number of significant relationships. All the proposed methods are assessed through extensive simulations and real data analyses. In particular, for environmental data from Los Angeles County, we apply our multiple-set methodology to study relationships between mortality, weather, and pollutant vectors. Here, we detect the existence of both linear and nonlinear relationships between the dimension-reduced vectors, which are then used to build nonlinear time-series regression models for the dimension-reduced mortality vector. These findings also illustrate the potential use of our method in many other applications. A comprehensive assessment of our methodologies, along with their theoretical properties, is given in a Web Appendix.
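To make the central idea concrete, the following is a minimal Python sketch of a Nadaraya–Watson smoother and of a one-directional empirical L2 distance between one projection and a smoothed function of the other. It is an illustration under stated assumptions, not the index defined in the article: the Gaussian kernel, the fixed bandwidth, the in-sample fit, and the names `nadaraya_watson` and `projection_l2_loss` are all hypothetical choices made here.

```python
import numpy as np


def nadaraya_watson(x_train, y_train, x_eval, bandwidth):
    """Gaussian-kernel Nadaraya-Watson estimate of E[y | x] at x_eval (scalar x)."""
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    x_eval = np.atleast_1d(np.asarray(x_eval, dtype=float))
    # Rows index evaluation points, columns index training points.
    u = (x_eval[:, None] - x_train[None, :]) / bandwidth
    weights = np.exp(-0.5 * u**2)
    # Kernel-weighted average of the responses at each evaluation point.
    return weights @ y_train / weights.sum(axis=1)


def projection_l2_loss(X, Y, a, b, bandwidth):
    """Mean squared distance between X @ a and a smoothed function of Y @ b.

    Assumption: this is only one direction of a criterion of the kind the
    summary describes, evaluated in-sample for fixed directions a and b.
    """
    u = np.asarray(X, dtype=float) @ np.asarray(a, dtype=float)
    v = np.asarray(Y, dtype=float) @ np.asarray(b, dtype=float)
    fitted = nadaraya_watson(v, u, v, bandwidth)
    return float(np.mean((u - fitted) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z = rng.normal(size=200)
    X = np.column_stack([z**2, rng.normal(size=200)])  # first column is a nonlinear function of z
    Y = np.column_stack([z, rng.normal(size=200)])     # second column is pure noise
    print(projection_l2_loss(X, Y, [1.0, 0.0], [1.0, 0.0], 0.3))  # related pair
    print(projection_l2_loss(X, Y, [1.0, 0.0], [0.0, 1.0], 0.3))  # unrelated pair
```

In this toy setting, the loss for the direction pair that carries the quadratic relationship should be noticeably smaller than the loss for the pair involving pure noise. A full procedure would additionally search over the directions and choose the bandwidth from the data; both steps are omitted here.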