Special Issue Paper
Hadoop framework: impact of data organization on performance
Article first published online: 23 MAY 2011
DOI: 10.1002/spe.1082
Copyright © 2011 John Wiley & Sons, Ltd.
Issue

Software: Practice and Experience
Early View (Online Version of Record published before inclusion in an issue)
Additional Information
How to Cite
Tan, Y. S., Tan, J., Chng, E. S., Lee, B.-S., Li, J., Date, S., Chak, H. P., Xiao, X. and Narishige, A. (2011), Hadoop framework: impact of data organization on performance. Softw: Pract. Exper. doi: 10.1002/spe.1082
Publication History
- Article first published online: 23 MAY 2011
- Manuscript Accepted: 27 FEB 2011
- Manuscript Revised: 30 DEC 2010
- Manuscript Received: 28 JUN 2010
Keywords:
- mapreduce;
- hadoop;
- performance tuning;
- distributed computing
Abstract
Hadoop, based on the popular MapReduce framework, is an open-source distributed computing framework that has been gaining popularity and usage. It aims to let programmers focus on building applications that process large amounts of data, without having to handle the other issues that arise in parallel computation. However, tuning the performance of Hadoop applications is not easy because of the framework's level of abstraction. In this paper, we present three case studies and some of the challenges and issues to be considered in performance tuning when running applications on Hadoop. The focus is mainly on the impact of input data on Hadoop's performance and how that performance can be tuned.
