Symbolic missing data imputation in principal component analysis

Authors

  • Paola Zuccolotto

    Corresponding author
    1. Quantitative Methods Department, University of Brescia, c.da S. Chiara 50, 25122 Brescia, Italy

Abstract

The concept of symbolic data was developed to represent variables whose measurement is affected by some internal variation. It has mainly served the need to aggregate individuals, so that large datasets can be summarized into smaller matrices of manageable size while retaining as much of the original knowledge as possible. Nevertheless, it is also often applied to variables that are structured as symbolic variables from the outset, even though they are measured on single individuals. This paper deals with the latter framework and aims to show that symbolic data analysis techniques can be applied to the treatment of missing values. The algorithm for a symbolic imputation technique in principal component analysis is presented as a generalization of the basic strategy called interval imputation. An illustrative example and a real data case study show how the proposed technique works. © 2011 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 4: 171–183, 2011
