Current graphics processing units (GPUs) typically offer only a limited number of programmable pipeline stages, whose usage, data flow, and topology are mostly fixed. Although a more flexible, custom rendering pipeline can be emulated using the compute functionality of existing GPUs, this approach requires managing work queues, synchronization, and scheduling in software. In this paper, we present a hardware architecture for a novel, programmable rendering pipeline based on a circulating stream of data and control tokens that are iteratively modified via pattern matching. Our architecture provides lightweight mechanisms for dynamic thread creation, lock-free synchronization, and scheduling to support recursion, dynamic shader linkage, and custom primitive types. A hardware prototype, running complex examples, demonstrates the improved reconfigurability as well as the scalability of our graphics architecture.