Unsupervised segmentation of conversational transcripts



Contact centers provide dialog-based support to organizations to address various customer-related issues. We have observed that calls at contact centers mostly follow well-defined patterns. Such call flows could specify how an agent should proceed in a call, handle objections, persuade customers, follow compliance issues, etc., and could also help to structure the operational process of call handling. Automatically identifying such patterns in terms of distinct segments from a collection of transcripts of conversations would improve productivity of agents as well as enable easy verification of whether calls comply with guidelines. Call transcripts from call centers typically tend to be noisy owing to the noise arising from agent/caller distractions, and errors introduced by the speech recognition engine. Such noise makes classical text segmentation algorithms such as TextTiling, which work on each transcript in isolation, very inappropriate. But such noise effects become statistically insignificant over a corpus of similar calls. In this paper, we propose an algorithm to segment conversational transcripts in an unsupervised way utilizing corpus level information of similar call transcripts. We show that our approach outperforms the classical TextTiling algorithm and also describes ways to improve the segmentation using limited supervision. We discuss various ways of evaluating such an algorithm. We apply the proposed algorithm to a corpus of transcripts of calls from a car reservation call center and evaluate it using various evaluation measures. We apply segmentation to the problem of automatically checking the compliance of agents and show that our segmentation algorithm considerably improves the precision. Copyright © 2009 Wiley Periodicals, Inc., A Wiley Company