# 9. Classification Analysis: Allocation of Observations to Groups

Published Online: 27 MAR 2003

DOI: 10.1002/0471271357.ch9

Copyright © 2002 John Wiley & Sons, Inc.

Book Title

## Methods of Multivariate Analysis, Second Edition


#### How to Cite

Rencher, A. C. (2002) Classification Analysis: Allocation of Observations to Groups, in Methods of Multivariate Analysis, Second Edition, John Wiley & Sons, Inc., New York, NY, USA. doi: 10.1002/0471271357.ch9

#### Publication History

- Published Online: 27 MAR 2003
- Published Print: 22 FEB 2002


#### ISBN Information

Print ISBN: 9780471418894

Online ISBN: 9780471271352


### Keywords:

- classification
- allocation
- prediction
- pattern recognition
- prior probabilities
- asymptotically optimal
- heterogeneity of covariance matrices
- error rate
- resubstitution
- holdout method
- leaving-one-out method
- cross validation
- stepwise discriminant analysis
- categorical variables
- dummy variables
- kernel
- smoothing parameter

### Summary

The descriptive aspect of discriminant analysis, in which group separation is characterized by means of discriminant functions, was covered in Chapter 8. In Chapter 9, we cover allocation of observations to groups, which is the predictive aspect of discriminant analysis. We prefer to call this classification analysis to clearly distinguish it from the descriptive aspect. In engineering and computer science, classification is usually called pattern recognition.

In classification, a sampling unit (subject or object) whose group membership is unknown is assigned to a group on the basis of the vector **y** = (*y*_{1}, *y*_{2}, …, *y*_{p})′ of *p* measured values associated with the unit.

If there are two groups, an observation **y** can be classified into one of the two groups by means of a simple procedure based on Fisher's linear classification function introduced in Chapter 5. Prior probabilities of group membership, if available, can be used with Welch's optimal classification rule to improve the classification.
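As a sketch of the two-group rule, the standard textbook form can be written as follows. The notation is assumed here and does not appear in this summary: ȳ₁ and ȳ₂ are the sample mean vectors of the two groups, **S**_pl is the pooled sample covariance matrix, and *p*₁ and *p*₂ are the prior probabilities of groups *G*₁ and *G*₂.

```latex
% Coefficient vector of the linear classification function
\mathbf{a} = \mathbf{S}_{\mathrm{pl}}^{-1}\,(\bar{\mathbf{y}}_1 - \bar{\mathbf{y}}_2)

% Assign y to G_1 if the inequality holds, otherwise to G_2
\mathbf{a}'\mathbf{y} \;-\; \tfrac{1}{2}\,\mathbf{a}'(\bar{\mathbf{y}}_1 + \bar{\mathbf{y}}_2)
\;>\; \ln\!\left(\frac{p_2}{p_1}\right)
```

With equal priors the right-hand side is zero, so **y** is assigned to the group whose mean is closer on the scale of the classification function.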

To classify an observation **y** into one of several groups, we use linear classification functions when the population covariance matrices are assumed to be equal and quadratic classification functions when they are not. Prior probabilities can be used to improve the classification.
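For several groups, the usual normal-based classification functions can be sketched as below. The symbols are assumed notation rather than taken from this summary: ȳ_i and **S**_i are the sample mean vector and covariance matrix of group *i*, **S**_pl is the pooled covariance matrix, and *p*_i is the prior probability of group *i*. In both cases **y** is assigned to the group with the largest function value.

```latex
% Linear classification function (equal covariance matrices assumed)
L_i(\mathbf{y}) = \ln p_i
  + \bar{\mathbf{y}}_i'\,\mathbf{S}_{\mathrm{pl}}^{-1}\,\mathbf{y}
  - \tfrac{1}{2}\,\bar{\mathbf{y}}_i'\,\mathbf{S}_{\mathrm{pl}}^{-1}\,\bar{\mathbf{y}}_i

% Quadratic classification function (unequal covariance matrices)
Q_i(\mathbf{y}) = \ln p_i
  - \tfrac{1}{2}\ln\lvert\mathbf{S}_i\rvert
  - \tfrac{1}{2}\,(\mathbf{y} - \bar{\mathbf{y}}_i)'\,\mathbf{S}_i^{-1}\,(\mathbf{y} - \bar{\mathbf{y}}_i)
```

The quadratic form needs a separate covariance estimate per group, so it generally requires larger group samples than the linear form to be stable.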

To judge the ability of classification procedures to predict group membership, we usually use the probability of misclassification, which is known as the error rate. We could also use its complement, the correct classification rate. Several estimates of the error rate are discussed, including resubstitution, the holdout method, and the leaving-one-out method (cross validation).
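The difference between two of these estimates can be illustrated with a minimal sketch. To keep the code short, it uses a nearest-group-mean rule as a stand-in classifier (an assumption, not the chapter's classification functions), and the data values and function names are hypothetical. Resubstitution classifies the same observations that were used to build the rule, while leaving-one-out rebuilds the rule without each observation before classifying it.

```python
from typing import Callable, Dict, List, Tuple

Point = Tuple[float, ...]
Sample = List[Tuple[Point, int]]  # (observation vector, group label)


def nearest_mean_rule(train: Sample) -> Callable[[Point], int]:
    """Build a rule that assigns y to the group with the closest mean vector."""
    sums: Dict[int, List[float]] = {}
    counts: Dict[int, int] = {}
    for y, g in train:
        acc = sums.setdefault(g, [0.0] * len(y))
        for j, value in enumerate(y):
            acc[j] += value
        counts[g] = counts.get(g, 0) + 1
    means = {g: [s / counts[g] for s in sums[g]] for g in sums}

    def classify(y: Point) -> int:
        # Squared Euclidean distance from y to each group mean
        return min(means, key=lambda g: sum((a - b) ** 2 for a, b in zip(y, means[g])))

    return classify


def resubstitution_error(data: Sample) -> float:
    """Error rate when the rule is tested on its own training data."""
    rule = nearest_mean_rule(data)
    return sum(rule(y) != g for y, g in data) / len(data)


def leave_one_out_error(data: Sample) -> float:
    """Error rate when each observation is classified by a rule built without it."""
    wrong = 0
    for i, (y, g) in enumerate(data):
        rule = nearest_mean_rule(data[:i] + data[i + 1:])
        wrong += rule(y) != g
    return wrong / len(data)


# Hypothetical data with p = 1 variable, chosen so one point sits near the boundary
toy: Sample = [
    ((0.0,), 1), ((1.0,), 1), ((2.0,), 1), ((4.0,), 1),
    ((4.5,), 2), ((6.0,), 2), ((7.0,), 2), ((8.0,), 2),
]
resub = resubstitution_error(toy)
loo = leave_one_out_error(toy)
```

On this toy sample the resubstitution estimate is lower than the leaving-one-out estimate, which reflects the optimistic bias of testing a rule on the data used to construct it.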

The experimenter often has a large number of variables available and wishes to keep those that aid in predicting group membership while deleting superfluous variables that do not contribute to allocation. Many selection schemes for classification analysis are based on stepwise discriminant analysis or a similar approach (Section 8.9).

Multinomial or categorical data can be classified using certain nonparametric classification procedures or by use of dummy variables with ordinary classification functions. Nonparametric approaches that can be used with ranked or continuous data include the method of density estimators and the *k* nearest neighbor classification rule.
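The *k* nearest neighbor idea can be sketched in a few lines. This is a simplified majority-vote variant using Euclidean distance, which is an assumption; the chapter's rule may adjust for group sample sizes or prior probabilities. The function name and data are hypothetical.

```python
from collections import Counter
from math import dist
from typing import List, Tuple

Point = Tuple[float, ...]


def knn_classify(train: List[Tuple[Point, int]], y: Point, k: int = 3) -> int:
    """Assign y to the group most common among its k nearest training points."""
    # Sort training observations by Euclidean distance to y and keep the k closest
    neighbors = sorted(train, key=lambda item: dist(item[0], y))[:k]
    votes = Counter(group for _, group in neighbors)
    # Ties are resolved in favor of the group encountered first among the neighbors
    return votes.most_common(1)[0][0]


# Hypothetical training sample with two variables and two groups
train = [
    ((1.0, 1.0), 1), ((1.5, 2.0), 1), ((2.0, 1.0), 1),
    ((6.0, 5.0), 2), ((7.0, 6.0), 2), ((6.5, 7.0), 2),
]
label_a = knn_classify(train, (2.0, 2.0), k=3)
label_b = knn_classify(train, (6.0, 6.0), k=3)
```

No distributional assumption is needed for this rule, but the result depends on the choice of *k* and on the scaling of the variables, since distances are computed on the raw measurements.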

Many examples using real data illustrate the techniques in this chapter. The problem set at the end of the chapter provides derivations of certain techniques in the chapter and further illustrates most procedures with real data.