• computer security;
  • content-based type detection;
  • computer files;
  • principle component analysis (PCA);
  • support vector machine (SVM);
  • MLP classifier


Digital information is packed into files when it is going to be stored on storage media. Each computer file is associated with a type. Type detection of computer data is a building block in different applications of computer forensics and security. Traditional methods were based on file extensions and metadata. The content-based method is a newer approach with the lowest probability of being spoofed and is the only way for type detection of data packets and file fragments. In this paper, a content-based method that deploys principle component analysis and neural networks for an automatic feature extraction is proposed. The extracted features are then applied to a classifier for the type detection. Our experiments show that the proposed method works very well for type detection of computer files when considering the whole content of a file. Its accuracy and speed is also significant for the case of file fragments, where data is captured from random starting points within files, but the accuracy differs according to the lengths of file fragments. Copyright © 2012 John Wiley & Sons, Ltd.