Speaker verification using heterogeneous neural network architecture with linear correlation speech activity detection



This paper presents a multi-level speaker verification system that uses 64 discrete Fourier transform spectrum components as input feature vectors. A speech activity detection technique is used as a pre-processing stage to identify vowel phoneme boundaries within a speech sample. A modified self-organising map (SOM) then filters the speech data using cluster information extracted from three vowels for a claimed speaker. This SOM filtering stage also provides coarse speaker verification. Finally, a second verification level, consisting of three multi-layer perceptron networks, classifies the filtered frames provided by the SOMs. These multi-layer perceptrons act as fine-grained, vowel-based speaker verifiers. The proposed verification algorithm achieves a performance of 94.54% when evaluated using 50 speakers from the Centre for Spoken Language Understanding speaker verification database. In addition, it is shown that the novel discrete Fourier transform spectrum-based linear correlation pre-processing technique presented here gives the system greater robustness against changes in speech volume than an equivalent energy frame analysis.
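The robustness claim can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not say how the linear correlation is computed, so the code below assumes a Pearson correlation between the 64-component DFT magnitude spectra of two frames. The frame length of 128 samples, the synthetic test signal, and all function names are hypothetical choices made for illustration. The sketch shows only that a correlation measure is unchanged when the signal is scaled by a constant gain, whereas frame energy scales with the square of the gain.

```python
import cmath
import math


def dft_magnitudes(frame, n_bins=64):
    """Magnitudes of the first n_bins DFT components of a real-valued frame."""
    n = len(frame)
    mags = []
    for k in range(n_bins):
        acc = 0j
        for t, x in enumerate(frame):
            acc += x * cmath.exp(-2j * math.pi * k * t / n)
        mags.append(abs(acc))
    return mags


def pearson(a, b):
    """Linear (Pearson) correlation coefficient of two equal-length vectors."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # correlation is undefined for a flat spectrum
    return cov / math.sqrt(var_a * var_b)


def frame_energy(frame):
    """Sum of squared samples, the quantity an energy-based detector thresholds."""
    return sum(x * x for x in frame)


def synthetic_frame(freqs, n=128, phase=0.0):
    """Hypothetical stand-in for a voiced frame: a sum of sinusoids."""
    return [
        sum(math.sin(2 * math.pi * f * t / n + phase) for f in freqs)
        for t in range(n)
    ]


# Two frames with the same spectral shape (assumed stand-ins for adjacent frames).
frame_a = synthetic_frame([5, 12, 20])
frame_b = synthetic_frame([5, 12, 20], phase=0.3)

# The same frames recorded at one tenth of the amplitude.
gain = 0.1
quiet_a = [gain * x for x in frame_a]
quiet_b = [gain * x for x in frame_b]

corr_loud = pearson(dft_magnitudes(frame_a), dft_magnitudes(frame_b))
corr_quiet = pearson(dft_magnitudes(quiet_a), dft_magnitudes(quiet_b))
energy_ratio = frame_energy(quiet_a) / frame_energy(frame_a)

print(f"correlation at full volume:  {corr_loud:.6f}")
print(f"correlation at 10% volume:   {corr_quiet:.6f}")
print(f"energy ratio (quiet / loud): {energy_ratio:.6f}")
```

In this sketch a fixed correlation threshold would behave the same at both volumes, while a fixed energy threshold tuned for the louder recording would see the quieter one at roughly 1% of the expected energy.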