CLUSTERING TECHNIQUES AND DISCRETE PARTICLE SWARM OPTIMIZATION ALGORITHM FOR MULTI-DOCUMENT SUMMARIZATION

Authors


Address correspondence to Ramiz M. Aliguliyev, Department No 13, Institute of Information Technology of National Academy of Sciences, 9, F. Agayev Street, Az1141 Baku, Azerbaijan; e-mail: a.ramiz@science.az; aramiz@iit.ab.az

Abstract

Multi-document summarization is the automatic creation of a compressed version of a given collection of documents that provides useful information to users. In this article, we propose a generic multi-document summarization method based on sentence clustering. We introduce five clustering methods, which optimize various aspects of intra-cluster similarity, inter-cluster dissimilarity, and their combinations. To solve the clustering problem, we propose a modification of the discrete particle swarm optimization algorithm. Experimental results on open benchmark data sets from DUC2005 and DUC2007 show that our method significantly outperforms the baseline methods for multi-document summarization.
