Partial least squares (PLS) was not originally designed as a tool for statistical discrimination. In spite of this, applied scientists routinely use PLS for classification and there is substantial empirical evidence to suggest that it performs well in that role. The interesting question is: why can a procedure that is principally designed for overdetermined regression problems locate and emphasize group structure? Using PLS in this manner has heuristic support owing to the relationship between PLS and canonical correlation analysis (CCA) and the relationship, in turn, between CCA and linear discriminant analysis (LDA). This paper replaces the heuristics with a formal statistical explanation. As a consequence, it will become clear that PLS is to be preferred over principal component analysis (PCA) when discrimination is the goal and dimension reduction is needed. Copyright © 2003 John Wiley & Sons, Ltd.