Stream reassembly is a prerequisite for deep packet inspection, which is regarded as the core function of network intrusion detection systems and network forensic systems. Because out-of-order packet arrival requires packet payloads to be moved from one block of memory to another, throughput is critical in stream reassembly design. In this paper, a stream reassembly card (SRC) is designed to improve stream reassembly throughput. Built on a multicore network processing unit, the SRC restores the order of packets by managing and reassembling streams through an additional level of buffering. Specifically, three optimization techniques, namely stream table dispatching, no-locking timeout, and multichannel virtual queue, are introduced to further improve the throughput. Because memory size plays a critical role in the SRC, the relationship between system throughput and memory size is analyzed. Extensive experiments demonstrate that the proposed SRC achieves more than 3 Gbps in both reassembly and submission throughput and outperforms the traditional server-based architecture by a factor of three at a lower cost. Copyright © 2013 John Wiley & Sons, Ltd.
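
To make the reordering step concrete, the following is a minimal illustrative sketch of a per-stream reassembly buffer. It is not the SRC design described above: the class name `StreamReassembler`, its `push` method, and the use of byte-granular sequence numbers (as in TCP) are assumptions introduced for illustration, and sequence-number wraparound, timeouts, and memory limits are deliberately ignored.

```python
class StreamReassembler:
    """Hypothetical per-stream buffer that restores byte order (illustration only)."""

    def __init__(self, initial_seq=0):
        self.next_seq = initial_seq  # sequence number of the next in-order byte
        self.pending = {}            # buffered segments: start sequence -> payload

    def push(self, seq, payload):
        """Buffer one segment and return the bytes that are now contiguous."""
        # For two segments with the same start, keep the longer one.
        if len(payload) > len(self.pending.get(seq, b"")):
            self.pending[seq] = payload

        out = bytearray()
        for start in sorted(self.pending):
            if start > self.next_seq:
                break                      # gap: later segments must keep waiting
            data = self.pending.pop(start)
            skip = self.next_seq - start   # bytes already delivered (overlap)
            if skip < len(data):
                out += data[skip:]         # this copy is the payload move
                self.next_seq = start + len(data)
        return bytes(out)


r = StreamReassembler(initial_seq=0)
first = r.push(5, b"world")   # arrives early, so it is held in the buffer
second = r.push(0, b"hello")  # fills the gap and releases both segments
print(first, second)          # b'' b'helloworld'
```

Each delivered segment is copied once from the buffer into the output, which is the kind of payload movement that makes throughput sensitive to memory behavior in a reassembly design.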