Get access

SMT: Sparse multivariate tree



A multivariate decision tree attempts to improve upon the single variable split in a traditional tree. With the increase in datasets with many features and a small number of labeled instances in a variety of domains (bioinformatics, text mining, etc.), a traditional tree-based approach with a greedy variable selection at a node may omit important information. Therefore, the recursive partitioning idea of a simple decision tree combined with the intrinsic feature selection of L1 regularized logistic regression (LR) at each node is a natural choice for a multivariate tree model that is simple, but broadly applicable. This natural solution leads to the sparse multivariate tree (SMT) considered here. SMT can naturally handle non-time-series data and is extended to handle time-series classification problems with the power of extracting interpretable temporal patterns (e.g., means, slopes, and deviations). Binary L1 regularized LR models are used here for binary classification problems. However, SMT may be extended to solve multiclass problems with multinomial LR models. The accuracy and computational efficiency of SMT is compared to a large number of competitors on time series and non-time-series data. © 2013 Wiley Periodicals, Inc. Statistical Analysis and Data Mining, 2013