Keywords:

  • PCA;
  • SIMCA;
  • leverage distribution;
  • residual variance distribution;
  • type I error;
  • acceptance area;
  • classification;
  • influence plot;
  • outlier

Abstract

Projection methods (PCA, PLS) rely on two distance measures: the score distance (SD, a.k.a. leverage) and the orthogonal distance (OD, a.k.a. the residual variance). This paper shows that both distance measures can be modeled by the χ²-distribution. Each model includes a scaling factor that can be described by an explicit equation. Moreover, each model depends on an unknown number of degrees of freedom, which has to be estimated from a training dataset. This modeling is then applied to classification within the SIMCA framework, and various acceptance areas are built for a given significance level. A triangular area, constructed using the sum of the normalized SD and OD, is deemed to be the most practical. This theoretical result is supported by three examples: the first is based on a simulated dataset, while the other two employ real-world data. Copyright © 2008 John Wiley & Sons, Ltd.
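
The idea of a scaled χ² model and a triangular acceptance area can be illustrated with a minimal sketch. It is not taken from the paper: it assumes each distance d follows d0 · χ²(N)/N, estimates d0 and N by the method of moments (an assumed estimator), treats SD and OD as independent, and approximates the χ² quantile with the Wilson-Hilferty formula. All function names are hypothetical.

```python
from statistics import NormalDist, mean, variance


def fit_scaled_chi2(distances):
    """Method-of-moments fit of d ~ d0 * chi2(N) / N (assumed model).

    Under this model E[d] = d0 and Var[d] = 2 * d0**2 / N,
    so N = 2 * d0**2 / Var[d]. Returns (d0, N); N may be non-integer.
    """
    d0 = mean(distances)
    n_dof = 2.0 * d0 ** 2 / variance(distances)
    return d0, n_dof


def chi2_quantile(p, dof):
    """Wilson-Hilferty approximation to the chi-square quantile.

    An approximation chosen to keep the sketch dependency-free;
    it also accepts non-integer degrees of freedom.
    """
    z = NormalDist().inv_cdf(p)
    c = 2.0 / (9.0 * dof)
    return dof * (1.0 - c + z * c ** 0.5) ** 3


def in_triangular_area(h, v, h_fit, v_fit, alpha=0.05):
    """Accept a sample if N_h*h/h0 + N_v*v/v0 <= chi2 quantile at 1 - alpha.

    Assumes SD (h) and OD (v) are independent, so the sum of the
    normalized distances is modeled as chi2(N_h + N_v).
    """
    h0, n_h = h_fit
    v0, n_v = v_fit
    total = n_h * h / h0 + n_v * v / v0
    return total <= chi2_quantile(1.0 - alpha, n_h + n_v)


# Hypothetical training distances, for illustration only.
h_fit = fit_scaled_chi2([1.0, 2.0, 3.0])
v_fit = fit_scaled_chi2([0.5, 1.0, 1.5])
print(in_triangular_area(1.5, 0.8, h_fit, v_fit))
```

In the (h, v) plane the acceptance boundary is a straight line, so the accepted region together with the axes forms a triangle, which is one reading of the "triangular area" named in the abstract.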