In recent years, the analysis of symbolic data, where the units are categories, classes, or concepts described by intervals, distributions, sets of categories, and the like, has become a challenging task, since many application fields generate complex and massive amounts of data that are difficult to analyze with traditional techniques. In this article, we propose a strategy for extending standard principal component analysis (PCA) to such data in the case where the variable values are ‘bar charts’ (i.e., sets of categories called bins with their relative frequencies). First, we introduce ‘metabins’, which mix together bins of the different bar charts and enhance interpretability. Standard PCA applied to the bins of such data tables can lose the bar chart constraints and assume independence between the bins. Therefore, we introduce a ‘Copular PCA’, as copulas take care of the probabilities and the underlying dependencies. Some theoretical results lead to the representation of the bar chart variables inside a hypercube covering the correlation sphere of a PCA applied to the bins. We give several ways of representing individuals and pathways of individuals × metabins or individuals × variables. Several tools for interpreting such representations, based on the ‘coherency’ of metabins (or variables) along a trajectory (i.e., oriented pathway) of individuals and the ‘diversity’ of individuals along a trajectory of metabins (or variables), are illustrated by some simple examples. © 2013 Wiley Periodicals, Inc. Statistical Analysis and Data Mining, 2013
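As a rough illustration of the bar chart constraint mentioned in the abstract, the sketch below runs a standard PCA on the flattened bins of synthetic bar charts. It is not the Copular PCA proposed in the article. The function name `bins_pca`, the array layout (individuals × variables × bins), and the Dirichlet-generated data are assumptions made for this example only. Because the relative frequencies of each bar chart sum to 1, the centered bins of each variable sum to zero, so the bin columns are linearly dependent.

```python
import numpy as np


def bins_pca(bar_charts):
    """Standard PCA on the bins of bar-chart data (illustrative only).

    bar_charts: array of shape (n_individuals, n_variables, n_bins);
    each bar chart (last axis) holds relative frequencies summing to 1.
    """
    n, p, m = bar_charts.shape
    flat = bar_charts.reshape(n, p * m)        # individuals x bins table
    centered = flat - flat.mean(axis=0)        # center every bin column
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    scores = u * s                             # coordinates of individuals
    explained = s**2 / np.sum(s**2)            # share of variance per axis
    return scores, vt, explained


# Synthetic data (an assumption): 10 individuals, 2 variables, 3 bins each.
rng = np.random.default_rng(0)
bar_charts = rng.dirichlet(np.ones(3), size=(10, 2))

scores, axes, explained = bins_pca(bar_charts)

# The sum-to-one constraint removes one dimension per bar chart variable:
# 6 bin columns, but at most 6 - 2 = 4 independent directions.
flat = bar_charts.reshape(10, -1)
centered = flat - flat.mean(axis=0)
rank = np.linalg.matrix_rank(centered)
print("rank of centered bins table:", rank)
print("explained variance ratios:", np.round(explained, 3))
```

A plain PCA treats the six bin columns as ordinary variables and has no knowledge of this within-variable constraint, which is the limitation that motivates handling the dependencies explicitly.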