This paper proposes a parallel execution mechanism whose granularity lies between the machine instruction level and the register transfer level. In the proposed mechanism, the processor is partitioned by function into a number of processing units, and an instruction stream is assigned independently to each processing unit. The processing units exchange asynchronous control signals at high speed along the arcs that represent dependencies in the control flow graph. Compared with synchronous parallel execution mechanisms such as VLIW, the proposed mechanism can extract parallelism more flexibly and at a finer grain.

First, a general parallel computation model based on the proposed mechanism is described. Next, the basic configuration of a prototype computer based on three classes of machine instructions is described. Finally, a simple evaluation is presented through execution examples of sample programs.
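
As an informal illustration of the execution mechanism summarized above, the following Python sketch models each processing unit as a thread that runs its own instruction stream and treats each dependency arc as a one-shot asynchronous signal. It is a minimal software analogy under stated assumptions, not the hardware design of the paper: the unit names (`mem`, `alu`), the operation names, the register and memory keys, and the function `run_units` are all hypothetical and chosen only for this example.

```python
import threading

def run_units(streams, arcs, state):
    """Run one thread per processing unit (hypothetical software model).

    streams: dict mapping a unit name to its ordered list of (op_name, action).
    arcs:    iterable of (producer_op, consumer_op) dependency arcs.
    state:   shared dict standing in for registers and memory.
    """
    # One event per dependency arc plays the role of an asynchronous control signal.
    signals = {arc: threading.Event() for arc in arcs}

    def unit(stream):
        for name, action in stream:
            # Wait for the signal on every arc that ends at this operation.
            for (src, dst), event in signals.items():
                if dst == name:
                    event.wait()
            action(state)
            # Raise the signal on every arc that starts at this operation.
            for (src, dst), event in signals.items():
                if src == name:
                    event.set()

    threads = [threading.Thread(target=unit, args=(s,)) for s in streams.values()]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state

# Hypothetical operations computing c = (a + b) * 2.
def ld_a(s): s["r1"] = s["a"]
def ld_b(s): s["r2"] = s["b"]
def add(s):  s["r3"] = s["r1"] + s["r2"]
def dbl(s):  s["r3"] = s["r3"] * 2
def st_c(s): s["c"] = s["r3"]

streams = {
    "mem": [("ld_a", ld_a), ("ld_b", ld_b), ("st_c", st_c)],
    "alu": [("add", add), ("dbl", dbl)],
}
# Only dependencies that cross units need an arc; order within a stream is implicit.
arcs = [("ld_a", "add"), ("ld_b", "add"), ("dbl", "st_c")]

result = run_units(streams, arcs, {"a": 3, "b": 4})
```

In this sketch the two units never share a global clock or a common instruction word: each unit proceeds through its own stream and stalls only when an incoming arc has not yet been signaled, which is the contrast with a lockstep scheme such as VLIW that the abstract draws.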