An unknown malware detection scheme based on the features of graph


Junfeng Wang, College of Computer Science, Sichuan University, Chengdu, China.



Traditional malware detection schemes based on specific signatures perform poorly on previously unknown malware, so general features of binary files should be explored to solve this problem. Recently, classification algorithms have been employed successfully to select features for detecting unknown malicious code, and most of these works represent executables as byte or operation code (opcode) n-gram sequences. However, these n-gram representations depend heavily on the training data. In this paper, we present a graph-based method to detect unknown malware. In this method, each executable is represented by its function call graph, which consists of the executable's functions and the call relations between them. The features are defined according to both the statistical information and the topology of the function call graph. They are extracted and processed through machine learning to classify unknown Portable Executable (PE) files. Because the number of features is fixed, the graph-based method avoids the very large feature sets produced by other methods. In our experiments, three types of malware datasets were tested, and accuracy as high as 96.8% was achieved. Furthermore, the method achieves 92.1% accuracy when only 5% of the dataset is used as the training set. Copyright © 2012 John Wiley & Sons, Ltd.
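To illustrate why a graph-based representation yields a fixed number of features regardless of executable size (unlike n-gram vocabularies, which grow with the training data), here is a minimal sketch. It assumes the call graph has already been extracted from the PE file and is stored as a dictionary mapping each function name to the set of functions it calls. The eight statistical and topological features below, and the names `extract_features` and `FEATURE_NAMES`, are hypothetical illustrations and not the feature set defined in the paper.

```python
from typing import Dict, List, Set

# Assumed representation: caller name -> set of callee names.
CallGraph = Dict[str, Set[str]]

# Hypothetical feature set; the paper's actual features may differ.
FEATURE_NAMES = [
    "num_functions",
    "num_calls",
    "density",
    "avg_out_degree",
    "max_out_degree",
    "max_in_degree",
    "num_leaf_functions",   # functions that call nothing
    "num_root_functions",   # functions that nothing calls
]


def extract_features(graph: CallGraph) -> List[float]:
    """Map a function call graph to a fixed-length feature vector."""
    # Collect every node, including callees that never appear as callers.
    nodes: Set[str] = set(graph)
    for callees in graph.values():
        nodes |= callees
    n = len(nodes)

    in_degree = {v: 0 for v in nodes}
    out_degree = {v: 0 for v in nodes}
    for caller, callees in graph.items():
        out_degree[caller] = len(callees)
        for callee in callees:
            in_degree[callee] += 1

    m = sum(out_degree.values())  # number of distinct call edges
    # Density over n*n ordered pairs, so recursive self-calls are allowed.
    density = m / (n * n) if n else 0.0
    avg_out = m / n if n else 0.0
    max_out = max(out_degree.values(), default=0)
    max_in = max(in_degree.values(), default=0)
    leaves = sum(1 for v in nodes if out_degree[v] == 0)
    roots = sum(1 for v in nodes if in_degree[v] == 0)

    return [float(n), float(m), density, avg_out,
            float(max_out), float(max_in), float(leaves), float(roots)]


if __name__ == "__main__":
    g = {"main": {"init", "run"}, "run": {"helper", "run"}, "init": set()}
    print(dict(zip(FEATURE_NAMES, extract_features(g))))
```

Whatever the size of the input graph, the output vector always has `len(FEATURE_NAMES)` entries, so it can be fed directly to any standard classifier. This is the property the abstract contrasts with n-gram representations.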