Research Article
Optimal integrated code generation for VLIW architectures
Article first published online: 17 JAN 2006
DOI: 10.1002/cpe.1012
Copyright © 2006 John Wiley & Sons, Ltd.
Issue
Concurrency and Computation: Practice and Experience
Special Issue: 10th International Workshop on Compilers for Parallel Computers (CPC 2003)
Volume 18, Issue 11, pages 1353–1390, September 2006
Additional Information
How to Cite
Kessler, C. and Bednarski, A. (2006), Optimal integrated code generation for VLIW architectures. Concurrency Computat.: Pract. Exper., 18: 1353–1390. doi: 10.1002/cpe.1012
Publication History
- Issue published online: 28 JUL 2006
- Article first published online: 17 JAN 2006
- Manuscript Accepted: 8 JUN 2005
- Manuscript Revised: 7 FEB 2005
- Manuscript Received: 30 APR 2003
Funded by
- Linköpings Universitet
- Ceniit Project 01.06 OPTIMIST
- SSF RISE
Keywords:
- instruction-level parallelism;
- integrated code generation;
- dynamic programming;
- instruction scheduling;
- instruction selection;
- clustered VLIW architecture;
- data partitioning
Abstract
We present a dynamic programming method for optimal integrated code generation for basic blocks that minimizes execution time. It can be applied to single-issue pipelined processors, in-order-issue superscalar processors, VLIW architectures with a single homogeneous register set, and clustered VLIW architectures with multiple register sets. For the case of a single register set, our method simultaneously copes with instruction selection, instruction scheduling, and register allocation. For clustered VLIW architectures, we also integrate the optimal partitioning of instructions, allocation of registers for temporary variables, and scheduling of data transfer operations between clusters. Our method is implemented in the prototype of a retargetable code generation framework for digital signal processors (DSPs), called OPTIMIST. We present results for the processors ARM9E, TI C62x, and a single-cluster variant of C62x. Our results show that the method can produce optimal solutions for small and (in the case of a single register set) medium-sized problem instances with a reasonable amount of time and space. For larger problem instances, our method can be seamlessly changed into a heuristic.
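
To give a feel for what time-optimal scheduling of a basic block by dynamic programming can look like, the following is a minimal sketch and not the OPTIMIST algorithm described in the article. It assumes a hypothetical single-issue pipelined processor, covers scheduling only (no instruction selection, register allocation, or clustering), and memoizes over states made of the set of already issued instructions plus the results still in flight. The function name `min_cycles` and the dictionaries `latency` and `preds` are illustrative assumptions.

```python
from functools import lru_cache


def min_cycles(latency, preds):
    """Minimal cycle count for a dependence DAG on a hypothetical
    single-issue pipelined processor (illustrative assumption only).

    latency: dict mapping node -> result latency in cycles (>= 1)
    preds:   dict mapping node -> iterable of predecessor nodes
    """
    nodes = frozenset(latency)

    @lru_cache(maxsize=None)
    def solve(done, pending):
        # pending: sorted tuple of (node, cycles until its result is available)
        if done == nodes:
            # Everything is issued; wait for the slowest in-flight result.
            return max((c for _, c in pending), default=0)
        busy = {n for n, _ in pending}
        # In-flight results as they will look one cycle from now.
        aged = tuple((n, c - 1) for n, c in pending if c > 1)
        best = float("inf")
        for v in nodes - done:
            # v may issue now if all operands are issued and available.
            if all(p in done and p not in busy for p in preds.get(v, ())):
                nxt = aged + ((v, latency[v] - 1),) if latency[v] > 1 else aged
                best = min(best, 1 + solve(done | {v}, tuple(sorted(nxt))))
        if pending:
            # Alternatively issue a NOP and wait for an in-flight result.
            best = min(best, 1 + solve(done, aged))
        return best

    return solve(frozenset(), ())


# Two loads feeding an add, using made-up latencies.
example_latency = {"load_a": 3, "load_b": 3, "add": 1}
example_preds = {"add": ("load_a", "load_b")}
print(min_cycles(example_latency, example_preds))
```

The state space grows quickly with block size, which is consistent with the abstract's remark that optimal solutions are practical for small and medium-sized instances and that larger ones call for a heuristic.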
