Application of SIC (simple interval calculation) for object status classification and outlier detection—comparison with regression approach

Abstract

We introduce a novel approach, termed simple interval calculation (SIC), for classification of object status in linear multivariate calibration (MVC) and other data analytical contexts. SIC directly constructs an interval estimator for the predicted response and rests on a single assumption: all errors involved in MVC are limited (bounded). We present the theory of the SIC method and explain its realization by linear programming techniques. The primary consequence of SIC is a radically new object classification that can be interpreted using a two-dimensional object status plot (OSP), ‘SIC residual vs SIC leverage’. These two new measures of prediction quality are introduced in the traditional chemometric MVC context. Simple straight-line demarcations divide the OSP into areas that quantitatively discriminate all objects involved in modeling and prediction into four types: boundary samples, which are the significant objects (those generating the entire data structure) within the training subset; insiders, which are samples that comply with the model; outsiders, which are samples that have large prediction errors; and outliers, which are samples that cannot be predicted at all with respect to a given model. We also present detailed comparisons of the new SIC approach with traditional chemometric methods applied for MVC, classification and outlier detection. These comparisons employ four real-world data sets, selected for their particular complexities, which serve as showcases of SIC application on intricate training and test set data structures. Copyright © 2005 John Wiley & Sons, Ltd.
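The bounded-error idea can be illustrated with a minimal Python sketch. It is not the authors' implementation and rests on several assumptions that the abstract does not state: a one-parameter model y = a·x (so the set of feasible parameters is a plain interval and no linear programming solver is needed), a known error bound `beta`, and normalizations for SIC residual and SIC leverage that are assumed here, namely half-width of the prediction interval divided by `beta` for leverage and distance from the interval midpoint divided by `beta` for residual. All function names are hypothetical. The multivariate case would replace the interval intersection with a pair of linear programs per object.

```python
# Sketch of bounded-error interval prediction for the toy model y = a * x.
# Assumption: every error satisfies |y_i - a * x_i| <= beta (beta is known).

def feasible_slope_interval(xs, ys, beta):
    """Return (lo, hi): all slopes a consistent with every training pair."""
    lo, hi = float("-inf"), float("inf")
    for x, y in zip(xs, ys):
        if x == 0:
            # A zero predictor does not constrain a; it only has to be feasible.
            if abs(y) > beta:
                raise ValueError("training object violates the error bound")
            continue
        a1, a2 = (y - beta) / x, (y + beta) / x
        lo = max(lo, min(a1, a2))
        hi = min(hi, max(a1, a2))
    if lo > hi:
        raise ValueError("no slope is consistent with the error bound")
    return lo, hi


def prediction_interval(x_new, slope_interval):
    """Return (v_minus, v_plus): the range of a * x_new over feasible slopes."""
    lo, hi = slope_interval
    ends = (lo * x_new, hi * x_new)
    return min(ends), max(ends)


def object_status(x_new, y_new, slope_interval, beta):
    """Classify an object with a known response (assumed normalizations)."""
    v_minus, v_plus = prediction_interval(x_new, slope_interval)
    leverage = (v_plus - v_minus) / (2 * beta)           # assumed definition
    residual = (y_new - (v_plus + v_minus) / 2) / beta   # assumed definition
    if abs(residual) > 1 + leverage:
        return "outlier"    # [y - beta, y + beta] misses the prediction interval
    if abs(residual) > 1 - leverage:
        return "outsider"   # the two intervals overlap only partially
    return "insider"        # prediction interval lies inside [y - beta, y + beta]


slopes = feasible_slope_interval([1.0, 2.0, 4.0], [1.0, 2.2, 3.8], beta=0.5)
for y_new in (2.0, 2.6, 3.0):
    print(y_new, object_status(2.0, y_new, slopes, beta=0.5))
```

In this sketch the three straight lines |residual| = 1 - leverage and |residual| = 1 + leverage play the role of the demarcations on the object status plot. Boundary samples, which would be training objects lying exactly on the first line, are not detected separately because exact equality is fragile in floating point.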
