A permutation approach to validation*

Authors

  • Malik Magdon-Ismail

    Corresponding author
    1. Computer Science Department, Rensselaer Polytechnic Institute, 110 8th Street, Troy, NY 12180, USA
  • Konstantin Mertsalov

    1. Computer Science Department, Rensselaer Polytechnic Institute, 110 8th Street, Troy, NY 12180, USA

  • *

    A preliminary version of this paper appeared at the SIAM Data Mining Conference, 2010.

Abstract

We give a permutation approach to validation (estimation of out-of-sample error). One typical use of validation is model selection. We establish the legitimacy of the proposed permutation complexity by proving a uniform bound on the out-of-sample error, similar to a Vapnik-Chervonenkis (VC)-style bound. We demonstrate this approach extensively in experiments on synthetic data, standard data sets from the UCI repository, and a novel diffusion data set. The out-of-sample error estimates are comparable to those of cross-validation (CV), yet the method is more efficient and more robust, being less susceptible to overfitting during model selection. Copyright © 2010 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 3: 361-380, 2010
