Keywords:

  • document classification;
  • expert systems;
  • ANFIS;
  • SVM;
  • Turkish NLP;
  • ROC

Abstract

Detecting the security level of a confidential document is a vital task for organizations that need to protect their confidential information. Human experts currently apply diverse classification rules and techniques. The growing volume of confidential information in organizations makes it difficult to classify every document carefully by human effort alone. The frameworks recommended in this study classify the internal documents of TUBITAK UEKAE (National Research Institute of Electronics and Cryptology of Turkey) using three classification algorithms: naïve Bayes, support vector machines (SVMs) and adaptive neuro-fuzzy inference systems (ANFISs). A hybrid approach combining support vector classifiers and adaptive neuro-fuzzy classifiers yields the highest accuracy rates among the expert system classifiers. This study also describes the preprocessing tasks required for document classification with natural language processing. To represent term–document relations, the recommended TF-IDF metric was chosen to construct a weight matrix. The agglutinative nature of Turkish is handled with Turkish stemming algorithms. At the end of the article, experimental results are reported in terms of accuracy rates and receiver operating characteristic (ROC) curves.
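
The TF-IDF weight matrix mentioned above can be illustrated with a minimal sketch. It assumes the common textbook definition (term frequency normalized by document length, multiplied by the natural logarithm of the inverse document frequency); the abstract does not say which TF-IDF variant the study uses. The function name `tfidf_matrix` and the sample tokens are hypothetical, and the tokens are assumed to be already stemmed.

```python
import math
from collections import Counter


def tfidf_matrix(docs):
    """Return (vocab, weights); weights[i][j] is the TF-IDF of vocab[j] in docs[i].

    Assumes the textbook variant tf * ln(N / df); the study's exact
    weighting scheme is not given in the abstract.
    """
    n_docs = len(docs)
    vocab = sorted({term for doc in docs for term in doc})
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n_docs / df[term]) for term in vocab}

    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        # Empty documents get an all-zero row instead of dividing by zero.
        row = [(counts[t] / total) * idf[t] if total else 0.0 for t in vocab]
        weights.append(row)
    return vocab, weights


# Hypothetical, already-stemmed tokens for three short documents.
docs = [
    ["gizli", "rapor", "proje"],
    ["rapor", "plan"],
    ["proje", "rapor", "rapor"],
]
vocab, weights = tfidf_matrix(docs)
```

In this toy corpus the term "rapor" occurs in every document, so its inverse document frequency is zero and it contributes nothing to any row. That is the property that makes TF-IDF useful for separating documents by their distinctive terms.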