A refined decompiler to generate C code with high readability

Authors

  • Gengbiao Chen,

    1. Shanghai Key Laboratory of Scalable Computing and Systems, School of Software, Department of Computer Science and Engineering, Shanghai Jiao Tong University, China
  • Zhengwei Qi,

    1. Shanghai Key Laboratory of Scalable Computing and Systems, School of Software, Department of Computer Science and Engineering, Shanghai Jiao Tong University, China
  • Shiqiu Huang,

    1. Shanghai Key Laboratory of Scalable Computing and Systems, School of Software, Department of Computer Science and Engineering, Shanghai Jiao Tong University, China
  • Kangqi Ni,

    1. Shanghai Key Laboratory of Scalable Computing and Systems, School of Software, Department of Computer Science and Engineering, Shanghai Jiao Tong University, China
  • Yudi Zheng,

    1. Shanghai Key Laboratory of Scalable Computing and Systems, School of Software, Department of Computer Science and Engineering, Shanghai Jiao Tong University, China
  • Walter Binder,

    1. Faculty of Informatics, University of Lugano, Switzerland
  • Haibing Guan

    Corresponding author
    • Shanghai Key Laboratory of Scalable Computing and Systems, School of Software, Department of Computer Science and Engineering, Shanghai Jiao Tong University, China

Correspondence to: Haibing Guan, Department of Computer Science and Engineering, Shanghai Jiao Tong University, China.

E-mail: hbguan@sjtu.edu.cn

SUMMARY

As a key part of reverse engineering, decompilation plays an important role in software security and maintenance. A number of tools, such as Boomerang and IDA Hex_rays, have been developed to translate executable programs into source code in a relatively high-level language. Unfortunately, most existing decompilation tools suffer from low accuracy in identifying variables, functions, and composite structures, resulting in poor readability. To address these limitations, we present a practical decompiler called C-Decompiler for Windows C programs that (i) uses a shadow stack to perform refined data flow analysis, (ii) adopts inter-basic-block register propagation to reduce redundant variables, and (iii) recognizes library (i.e., Standard Template Library) functions by signatures. We evaluate and compare the decompilation quality of C-Decompiler with two existing tools, Boomerang and IDA Hex_rays, considering four aspects: function analysis, variable expansion rate, total percentage reduction, and cyclomatic complexity. Our experimental results show that, on average, C-Decompiler has the highest total percentage reduction (55.91%), the lowest variable expansion rate (55.79%), and the same cyclomatic complexity as the original source code for each considered application. Furthermore, in our experiments, C-Decompiler recognizes functions with lower false positive and false negative rates than the other decompilers. A case study and our evaluation results confirm that C-Decompiler is a practical tool for producing highly readable C-style code. Copyright © 2012 John Wiley & Sons, Ltd.
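
Of the four evaluation aspects named above, cyclomatic complexity has a standard definition that is easy to illustrate. The sketch below computes McCabe's metric, M = E - N + 2P, from a control flow graph given as an edge list, where E is the number of edges, N the number of basic blocks, and P the number of connected components. It is only an illustration of the metric: the summary does not say how the authors measured it, and the names `struct edge`, `MAX_BLOCKS`, `find_root`, and `cyclomatic_complexity` are hypothetical and not taken from C-Decompiler.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical upper bound on basic blocks, chosen to avoid heap allocation. */
#define MAX_BLOCKS 256

/* A directed control flow edge between two basic blocks, identified by index. */
struct edge {
    size_t from;
    size_t to;
};

/* Union-find lookup with path halving: returns the representative of x. */
static size_t find_root(size_t *parent, size_t x)
{
    while (parent[x] != x) {
        parent[x] = parent[parent[x]];
        x = parent[x];
    }
    return x;
}

/*
 * McCabe's cyclomatic complexity: M = E - N + 2P.
 * Connected components (P) are counted while ignoring edge direction.
 */
long cyclomatic_complexity(size_t num_blocks,
                           const struct edge *edges,
                           size_t num_edges)
{
    size_t parent[MAX_BLOCKS];
    size_t components = num_blocks;

    assert(num_blocks > 0 && num_blocks <= MAX_BLOCKS);

    for (size_t i = 0; i < num_blocks; i++) {
        parent[i] = i;
    }

    for (size_t i = 0; i < num_edges; i++) {
        assert(edges[i].from < num_blocks && edges[i].to < num_blocks);
        size_t a = find_root(parent, edges[i].from);
        size_t b = find_root(parent, edges[i].to);
        if (a != b) {
            /* Each merge joins two previously separate components. */
            parent[a] = b;
            components--;
        }
    }

    return (long)num_edges - (long)num_blocks + 2 * (long)components;
}
```

For an if/else with four blocks (condition, then, else, join) and four edges, the function yields 4 - 4 + 2 = 2. A decompiler that preserves the original branching structure should produce output with the same value as the source, which is the property the summary reports for C-Decompiler.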
