FEATURE SELECTION BASED ON COMPACTNESS AND SEPARABILITY: COMPARISON WITH FILTER-BASED METHODS

Authors

  • Chien-Hsing Chen

    Corresponding author
    1. Department of Information Management, Ling Tung University, Taichung City, Taiwan
    • Address correspondence to Chien-Hsing Chen, Department of Information Management, Ling Tung University, No. 1, Lingtung Road, Nantun, Taichung City, 40852, Taiwan; e-mail: ktfive@gmail.com


Abstract

Selecting a subset of salient features for clustering has been explored extensively in many real-world applications. The filter model selects salient features by evaluating the intrinsic characteristics of each individual feature; it is not permitted to use a clustering algorithm to supply cluster information during training. In particular, the filter model must anticipate unobservable clusters and measure how well the features yield satisfactory within-cluster and between-cluster scatters, and thus good clustering quality. However, it is generally difficult to satisfy both scatter criteria in the filter model. For example, a random variable with a large variance may raise only the between-cluster scatter, whereas another variable following a uniform distribution may raise only the within-cluster scatter. In this paper, we present a new filter-based method that scores features by considering both their compactness and their separability, so that both scatters are raised. Moreover, our method adopts a new search strategy that locates the best feature salience vector without visiting the space of all possible feature subsets. Experimental results on benchmark data sets indicate that our method performs better than many benchmark filter-based methods at selecting a feature subset for clustering.
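As an illustration of the two scatters named in the abstract, the following minimal sketch computes a within-cluster and a between-cluster scatter for each feature. It is not the method proposed in the paper: it assumes cluster labels are already available (which a filter model does not have), and it assumes the textbook sum-of-squares definitions, which the abstract does not spell out. The function name per_feature_scatter is hypothetical.

```python
import numpy as np

def per_feature_scatter(X, labels):
    """Return (within, between) sum-of-squares scatter per feature column.

    Assumption: textbook definitions, with cluster labels given.
    This is an illustration, not the paper's scoring function.
    """
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    overall_mean = X.mean(axis=0)
    within = np.zeros(X.shape[1])
    between = np.zeros(X.shape[1])
    for c in np.unique(labels):
        members = X[labels == c]
        center = members.mean(axis=0)
        # Spread of the cluster's points around its own center.
        within += ((members - center) ** 2).sum(axis=0)
        # Size-weighted distance of the cluster center from the overall mean.
        between += len(members) * (center - overall_mean) ** 2
    return within, between

# Toy data: feature 0 separates the two clusters, feature 1 does not.
X = np.array([[0.0, 0.0], [0.0, 10.0], [10.0, 0.0], [10.0, 10.0]])
labels = np.array([0, 0, 1, 1])
within, between = per_feature_scatter(X, labels)
print(within, between)
```

Under these definitions the two scatters of a feature always sum to its total scatter around the overall mean, so for fixed data a feature that separates the clusters well has a large between-cluster share and a small within-cluster share.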
