Similarity Based Join Over Audio Feeds in a Multimedia Data Stream Management System



Over the last several years, the processing of high-performance data streams has become very important in various domains. A new type of data processing is needed for applications whose inputs are modeled as multimedia data streams, such as audio and video feeds. For example, in the public safety sector, monitoring and automatically identifying individuals suspected of terrorist or criminal activity requires processing complex audio and video streams, which is beyond the capabilities of a typical data stream management system (DSMS). The concept of a multimedia data stream management system (MMDSMS) has recently been introduced to process continuous queries over dynamic multimedia data streams effectively. In this paper, we address MMDSMS functionalities related to speaker recognition, with the aim of detecting individuals who may pose security threats. We focus on audio feed processing using our novel similarity-based join and on parameterization of the multimedia signal for the recognition process. We propose a set of signal parameters that clearly discriminate among individual voices by describing the signal using a homomorphic processing method. Our research focused primarily on assessing the applicability of cepstral analysis in speech recognition systems, based on a set of acquired digitized voice samples. We developed a research prototype to assess the proposed concepts and verified the effectiveness of our framework in a lab environment. © 2013 Alcatel-Lucent.
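To make the two ideas named in the abstract concrete, the following is a minimal illustrative sketch, not the authors' prototype. It computes a real cepstrum (the inverse transform of the log magnitude spectrum, a standard homomorphic processing step) for each audio frame and joins stream frames against stored reference voices whenever the feature distance falls below a threshold. All names (`real_cepstrum`, `similarity_join`, `reference_voices`), the Euclidean distance, the threshold, and the number of coefficients are assumptions chosen for illustration; a naive DFT is used only to keep the example dependency-free.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform (illustration only)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def real_cepstrum(frame, n_coeffs=8, eps=1e-10):
    """Return the first n_coeffs real-cepstrum coefficients of one frame.

    Homomorphic step: take the log of the magnitude spectrum, then
    apply the inverse DFT. eps avoids log(0) on empty spectral bins.
    """
    log_mag = [math.log(abs(c) + eps) for c in dft(frame)]
    n = len(log_mag)
    return [sum(log_mag[k] * cmath.exp(2j * cmath.pi * k * q / n)
                for k in range(n)).real / n
            for q in range(n_coeffs)]


def distance(a, b):
    """Euclidean distance between two feature vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def similarity_join(stream_frames, reference_voices, threshold):
    """Yield (frame_index, speaker_id, distance) for every pair of a
    stream frame and a reference voice whose features are within threshold."""
    for i, frame in enumerate(stream_frames):
        feats = real_cepstrum(frame)
        for speaker_id, ref_feats in reference_voices.items():
            d = distance(feats, ref_feats)
            if d <= threshold:
                yield i, speaker_id, d


# Usage: one hypothetical reference voice and a two-frame stream.
tone = [math.sin(2 * math.pi * 4 * t / 32) for t in range(32)]
impulse = [1.0] + [0.0] * 31
references = {"speaker_a": real_cepstrum(tone)}
matches = list(similarity_join([tone, impulse], references, threshold=1e-6))
```

In this toy setup only the first frame joins with `speaker_a`. A continuous query in a real system would operate over sliding windows of the feed and would likely use a more robust feature set and distance measure than this sketch assumes.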