Get access

Distributed data mining patterns and services: an architecture and experiments


Eugenio Cesario, ICAR-CNR, Via P. Bucci 41C, 87036 Rende (CS), Italy.



Distributed data mining implements techniques for analyzing data on distributed computing systems by exploiting data distribution and parallel algorithms. The grid is a computing infrastructure for implementing distributed high-performance applications and solving complex problems, offering effective support to the implementation and use of data mining and knowledge discovery systems. The Web Services Resource Framework has become the standard for the implementation of grid services and applications, and it can be exploited for developing high-level services for distributed data mining applications. This paper describes how distributed data mining patterns, such as collective learning, ensemble learning, and meta-learning models, can be implemented as Web Services Resource Framework mining services by exploiting the grid infrastructure. The goal of this work was to design a distributed architectural model that can be exploited for different distributed mining patterns deployed as grid services for the analysis of dispersed data sources. In order to validate such an approach, we presented also the implementation of two clustering algorithms on the developed architecture. In particular, the distributed k-means and distributed expectation maximization were exploited as pilot examples to show the suitability of the implemented service-oriented framework. An extensive evaluation of its performance was provided. Copyright © 2011 John Wiley & Sons, Ltd.

Get access to the full text of this article