Editorial
Special Issue: The Best of CCGrid'2007: A Snapshot of an ‘Adolescent’ Area
Article first published online: 22 JUL 2008
DOI: 10.1002/cpe.1360
Copyright © 2008 John Wiley & Sons, Ltd.
Issue
Concurrency and Computation: Practice and Experience
Special Issue: The Best of CCGrid'2007: A Snapshot of an ‘Adolescent’ Area
Volume 21, Issue 3, pages 257–263, 10 March 2009
Additional Information
How to Cite
Cirne, W. and Schulze, B. (2009), Special Issue: The Best of CCGrid'2007: A Snapshot of an ‘Adolescent’ Area. Concurrency Computat.: Pract. Exper., 21: 257–263. doi: 10.1002/cpe.1360
Publication History
- Issue published online: 30 JAN 2009
- Article first published online: 22 JUL 2008
1. THE CCGRID'2007 CONFERENCE
It is with great pleasure that we introduce this special issue with the best work presented at CCGrid'2007, the 7th IEEE International Symposium on Cluster Computing and the Grid, held in Rio de Janeiro, Brazil, on May 14–17, 2007. Over the years, CCGrid has become a premier conference of truly international coverage, bringing together researchers and practitioners, and enabling them to share their insight, results, and experience in the multi-faceted areas of Grid and cluster computing. Overall, 330 people attended the event, a CCGrid all-time record. This is a testimony to the fact that the Grid and cluster computing communities continue to grow, and that many early projects started at the beginning of the decade (when the conference series began) are now maturing and producing significant results.
The international coverage and impact of CCGrid are also worth highlighting. After Australia, Germany, Japan, the United States of America, the United Kingdom, and Singapore, CCGrid comes to Brazil without yet repeating a host country. Similarly, these proceedings carry papers from 22 different countries: Algeria, Australia, Austria, Brazil, Canada, China, Denmark, France, Germany, Greece, Holland, Israel, Italy, Korea, Japan, Poland, Singapore, Switzerland, Tunisia, the United States of America, the United Kingdom, and Singapore.
But, of course, the whole conference experience consisted of more than its strong technical program. It also included three thought-provoking keynote speeches: ‘Grids Sandwiched by Web 2.0 and Multicore’ by Geoffrey Fox from Indiana University [1], ‘Towards an International Computer Science Grid’ by Franck Cappello from INRIA, and ‘Scale-up and Scale-out: Evolution and Trends in Parallel Processing’ by José E. Moreira from IBM. There was also a very well-attended industry track with presentations on ‘Google Infrastructure for Massively Parallel Processing’ by Walfredo Cirne from Google, ‘Promoting Cooperation in BitTorrent Communities’ by Miranda Mowbray from HP, and ‘Opportunity and Challenges in e-Science’ by Fabrizio Gagliardi and Carlos Hulot from Microsoft.
Furthermore, in parallel with the main conference track, there were seven workshops on emerging topics: the IEEE Technical Committee on Scalable Computing Doctoral Symposium, Agent-based Grid Computing (AGC'07), Biomedical Computations on the Grid (BioGrid'07), Global and Peer-2-Peer Computing (GP2P'07), Context-Awareness and Mobility in Grid Computing (WCAMG'07), Programming Models for Grid Computing (PMGC'07), and Latin American Grid Workshop (LAGrid'07).
In the end, this massive wealth of high-quality papers, presentations, and discussions is the reason for the success of the conference. They turn the task of editing this special issue into a great challenge, albeit a delightful one. Selecting the best work among such great contributions is an arduous endeavor in itself, which we pursued together with the Program Committee Co-chairs, to whom we are very grateful. But it may be even more difficult to capture the ‘extra-technical’ discussions and conversations that happen in panels, Q&A sessions, coffee breaks, and even in the conference's activities. Although they are often more well-informed speculation than scientific results, they are an important aspect of the conference, as they create a sense of community among the researchers and practitioners in the area, and often influence what endeavor each of us decides to pursue next.
We addressed this challenge by inviting extended and updated versions of some of the strongest technical contributions to the conference, together with an article laying out the argument of the most discussed and controversial keynote. We also give our view on the current state of the area, and on why we say it is going through ‘adolescence’, as a way to complement the keynote article and to try to capture the rich debates that took place during CCGrid'2007.
2. GRID COMPUTING CIRCA 2007: AN ‘ADOLESCENT’ AREA
High-performance computing (HPC) plays an interesting and unique role in the computer industry. HPC is a small niche, albeit a very important one, as it often debuts technologies that later become mainstream. Therefore, the computer industry tends to invest much more in HPC, and to follow its developments much more closely, than its market importance would suggest. We believe that this ‘close monitoring’ of HPC was essentially what put Grid computing in the spotlight, turning it into one of the hottest research areas in computer science of the early 2000s.
By the early 2000s, cluster computing had been consolidated as the way to HPC. Grid computing then appeared as the promise of combining clusters, equipment, and talent into virtual organizations (VOs), delivering unprecedented levels of parallelism to high-performance applications, and making possible science that is just not achievable in any single place, by any single organization. The computer industry then observed that such a technology could also enable on-demand access to, and composition of, any computational service provided by multiple independent sources. This could truly revolutionize the way we use computers, turning the focus from hardware, software, and peopleware to services, which could be composed on demand to serve whatever need one may have.
This was great news for the community working on HPC Grids. Not only would their work be relevant to many more people, but more funding would also be available. Recent years saw great advances in Grid computing, as well as in cluster computing, as clusters remained the key ‘building block’ for Grids, HPC, and large Internet services. The technical papers we selected for this special issue provide an excellent example of how far we have come. Two of them cover real and practical applications, which have been deployed and used (climate studies [9] and image analysis [4]). The other works advance the state of the art in cluster and Grid computing in important ways, but always evaluate their contributions in real systems, with at least prototype implementations. The days of ‘architecture papers’ and ‘simplified simulation results’ are over. Grid computing has concrete results to show. Although the area is still young, one can no longer claim that it is in its ‘infancy’.
Alas, the application of Grid computing beyond the HPC domain has been much slower. Many people believe that it is just a matter of time for it to happen. Other people have maintained that Grid computing at large was a difficult proposition because the standards being developed, based on Web Services (the so-called WS* protocols), were just too complex for wide acceptance. It used to be that believers would outnumber skeptics by at least one order of magnitude. At CCGrid'2007, we saw these groups having about the same size. We even had a keynote speech toying with the idea that ‘the WS* protocols are just too complex and heavyweight’ [1]. Comparisons between the WS* standardization efforts and those of the ISO/OSI network protocols were often heard in the discussions.
We argue that this leaves the area in an ‘adolescent’ state, with its own identity at stake. Some people keep believing that Grid computing at large is going to happen based on WS* protocols, which have been the major focus of the area so far. Others argue that we should just forget about Grid computing at large and focus on scientific high-performance applications, for which the results are concrete and relevant. Yet a third group maintains that the fundamental computer science problems posed by dynamic composition of services remain hard and relevant, and that the scientific contribution we have been making in tackling them is the real ‘deliverable’ of the area. According to this third group, we just need to make a technological change, moving from the WS* technology to something simpler, likely Web 2.0 (an argument well articulated in [1]).
In reality, we do not know what the future holds for us. But it is clear that if the Grid computing community wants to have a real impact on computing as a whole, we need more people explicitly addressing problems and promoting use outside HPC. The buzzword appeal of the early days of Grid computing is over. It is time for concrete results. There are plenty of results in HPC, but can we do the same beyond it?
3. SUMMARY OF THIS SPECIAL ISSUE
The selected papers broadly cover key research issues in cluster computing and Grid computing as summarized below:
Fox and Pierce [1] observe that Grids (as envisioned around 2001) are being pressured by both emerging new computing resources (multicore, cell processors, GPUs, reconfigurable computing, etc.) and alternative approaches to service architectures (collectively, Web 2.0). They suggest that it is time to reappraise Grids, in terms of both the nature of the resources that they aggregate and the middleware that glues these resources together.
They discuss the application of Web 2.0 to support scientific research (e-Science) and related ‘e-moreorlessanything’ applications. Web 2.0 offers interesting technical approaches (protocols, message formats, and programming tools) to build core e-infrastructure (cyberinfrastructure) as well as many interesting services (Facebook, YouTube, Amazon S3/EC2, and Google Maps) that can add value to e-infrastructure projects. They discuss why some of the original Grid goals of linking the world's computer systems may not be so relevant today, as one has in many ways ‘too much computing’. Rather than cobbling together pre-existing heterogeneous systems, computing is so cheap that one can link together large scalable clouds with designed heterogeneity. Support for data-intensive systems is critical, but these require interoperability interfaces at the data level and not always at the infrastructure level, where Grids have focused. Web 2.0 can also support Parallel Programming 2.0, a better parallel computing software environment motivated by the need to run commodity applications on multicore chips. A ‘Grid on the chip’ will be a common use of future chips with tens or hundreds of cores. In spite of the unclear technology directions, they note that e-Science and, more generally, e-moreorlessanything are thriving, with the advantages of distributed electronic enablement of many fields being very clear.
According to M. Klemm et al. [2], most computational scientists have access to various worldwide computing resources in computational Grids. Alas, actually using these resources is far more difficult than desktop computing. Users have to deal with different architectures, network interconnects, etc., and also with the job scheduler that assigns CPUs to jobs. The user has to estimate the walltime it takes to run a job. As walltime is influenced by the degree of parallelism, by the input data, by the algorithms, and, more importantly, by unpredictable environmental issues (e.g. the amount of traffic in the communication network), it is difficult to predict a job's walltime correctly. If the estimate is too short, the scheduler terminates the job prematurely and the user loses the computed results. Overestimation causes long waiting times in the queues of the cluster scheduler.
Their paper proposes a solution to this problem that makes Grid computing more transparent. If an OpenMP application is about to exceed its time slice, it is automatically checkpointed and transparently migrated to either a new local reservation or a reservation on a different (possibly remote) system of the computational Grid. The key technical contribution is that the runtime system adapts the degree of parallelism such that the application can even run if the target platform has a different number of CPUs. Threads may be added or removed as necessary while a parallel region is active, making the application's degree of parallelism malleable. The approach only slows down the application by about 4% (reparallelization) plus 2% (checkpointing), which seems acceptable, given the overall gain in application throughput.
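The checkpoint-before-expiry idea summarized above can be illustrated with a minimal sketch. It is not the authors' runtime system: it assumes a simple iterative job, and the names `Checkpoint`, `run_slice`, and `margin_seconds` are hypothetical. Migration to another system and the adaptation of the degree of parallelism are not modeled.

```python
import time
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Checkpoint:
    next_iteration: int  # first iteration that still has to run
    partial_sum: int     # application state accumulated so far


def run_slice(state: Checkpoint, total_iterations: int, slice_seconds: float,
              margin_seconds: float,
              clock: Callable[[], float] = time.monotonic) -> Tuple[Checkpoint, bool]:
    """Run iterations until the job finishes or the time slice is about to end.

    Returns the state reached and a flag telling whether the job is finished.
    """
    deadline = clock() + slice_seconds
    i, acc = state.next_iteration, state.partial_sum
    while i < total_iterations:
        # Stop early, keeping a safety margin, instead of being killed
        # by the scheduler when the reservation expires.
        if clock() >= deadline - margin_seconds:
            return Checkpoint(i, acc), False
        acc += i * i  # stand-in for one unit of real work
        i += 1
    return Checkpoint(i, acc), True
```

A driver would call `run_slice` again with the returned `Checkpoint` under a new reservation, local or remote, until the flag is true.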
The contribution by A. Vishnu et al. [3] starts from the observation that InfiniBand is becoming a very popular interconnect, owing to its advanced features and its open standard. Large-scale InfiniBand clusters are becoming very popular, as reflected by the TOP 500 supercomputer rankings. However, even with popular topologies such as the constant bi-section bandwidth Fat Tree, hot-spots may occur with InfiniBand, due to inappropriate configuration of network paths, the presence of other jobs in the network, and the unavailability of adaptive routing. Their paper presents a hot-spot avoidance layer (HSAL) for InfiniBand, which provides hot-spot avoidance using path bandwidth estimation and multi-pathing based on the LMC mechanism, without taking the network topology into account. They propose an adaptive striping policy with a batch-based striping and sorting approach, for efficient utilization of disjoint network paths. Integration of HSAL with MPI, the de facto programming model of clusters, shows promising results with collective communication primitives and MPI applications.
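To make the idea of striping over several paths according to estimated bandwidth concrete, the sketch below splits a message proportionally to per-path estimates. This is a simplified stand-in, not the batch-based striping and sorting policy of the paper, whose details are not given here. The function name `stripe_sizes` and the example bandwidth figures are assumptions.

```python
from typing import List


def stripe_sizes(message_bytes: int, path_bandwidths: List[float]) -> List[int]:
    """Split a message across paths in proportion to estimated bandwidth."""
    total = sum(path_bandwidths)
    if message_bytes < 0 or total <= 0:
        raise ValueError("need a non-negative size and positive total bandwidth")
    # Proportional share per path, rounded down to whole bytes.
    sizes = [int(message_bytes * bw / total) for bw in path_bandwidths]
    # Bytes lost to rounding go to the path with the highest estimate.
    fastest = path_bandwidths.index(max(path_bandwidths))
    sizes[fastest] += message_bytes - sum(sizes)
    return sizes
```

With estimates of 1.0 and 3.0 for two paths, a 1000-byte message is split into 250 and 750 bytes, so a congested path carries less of the traffic.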
The research of S. Caton et al. [4] demonstrates the use of a MATLAB-based image processing system to support applications deployed using Condor over a campus-wide grid. These resources are donated on a voluntary basis, and consequently the availability of workstations changes with time. At Cardiff University, they have access to a Condor system using approximately 2500 Windows XP workstations. Condor, as well as other distributed queuing systems and resource managers, does not consider the particular requirements of image processing applications, thereby affecting the performance of such applications, especially when using a campus-wide grid. The main challenges are latency penalties for large data sets, granularity, and considerably shorter execution times than regular distributed computing applications. The use of volunteer resources adds further challenges to this list: unpredictable resource availability and longevity, and multiple machine owners and administrators. They have augmented Condor to support the deployment of such image processing applications using MATLAB. Their results demonstrate that opportunistic resources can facilitate disparate and short-running image processing applications without significant loss in performance or increase in cost.
L. Nassif et al. [5] address the resource selection problem, in which reaching the best solution is challenging, especially when the decision-making process must consider many factors. A set of related works shows that a broker can select a resource for job execution considering aspects such as resource availability, access restrictions, job requirements, job execution time prediction, execution cost, and job execution success probability. This paper presents a taxonomy of resource selection systems to help in analyzing the related works and in classifying them according to the decision maker, the decision-making process, the number of resources selected, and the selection objectives. The authors then designed a solution, named MASK, based on user objectives. MASK is a broker middleware that uses multi-agent system technology and consists of prediction, policy, and decision modules. The prediction model for job execution time was developed using the case-based reasoning paradigm: the execution time of a new job is predicted from the execution times of similar past jobs. A policy model was developed based on fine-grain policies for verifying access restrictions. The local policy verification takes place before job submission, avoiding unsuccessful submissions. A decision model based on decision theory was also developed. In this model, utility functions represent user preferences, and the maximization of a multi-attribute utility model selects the machine with the best conditions to run a job. Results show that the prediction model is accurate and efficient, and that the distributed system runs faster than centralized approaches and accommodates heterogeneous access restrictions.
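The decision step described above, choosing the machine that maximizes a multi-attribute utility, can be sketched as follows. The additive weighted form, the attribute names, and the sample values are assumptions made for illustration; the actual utility functions used in MASK are not given here. Each attribute is assumed to be already normalized to a utility between 0 and 1, where higher is better.

```python
from typing import Dict


def utility(attributes: Dict[str, float], weights: Dict[str, float]) -> float:
    """Additive multi-attribute utility: weighted sum of per-attribute utilities."""
    return sum(weights[name] * attributes[name] for name in weights)


def select_machine(candidates: Dict[str, Dict[str, float]],
                   weights: Dict[str, float]) -> str:
    """Return the name of the candidate machine with the highest utility."""
    return max(candidates, key=lambda name: utility(candidates[name], weights))


# Hypothetical candidates with normalized utilities per attribute.
candidates = {
    "node-a": {"time": 0.9, "cost": 0.2, "success": 0.8},
    "node-b": {"time": 0.6, "cost": 0.9, "success": 0.9},
}
balanced = {"time": 0.5, "cost": 0.3, "success": 0.2}
time_first = {"time": 0.8, "cost": 0.1, "success": 0.1}
```

Here `select_machine(candidates, balanced)` picks `node-b`, while `select_machine(candidates, time_first)` picks `node-a`, showing how the weights encode different user preferences over the same machines.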
In their paper, A. Elghirani et al. [6] note that data play a pivotal role in the execution of large-scale data-intensive applications, and that data transfer is a primary cause of job execution delay. In environments such as data Grids, where jobs that require large amounts of data are executed, a smart collaborative environment between the scheduling and data management services becomes essential to achieve a synergistic effect on the performance of the Grid. Their paper presents an intelligent data Grid framework where job scheduling and data and replica management are coupled to provide an integrated environment for efficient access to data and job scheduling. The data management service predicts and estimates the appropriate locations of replicas and proactively replicates the data sets in these locations, while the intelligent Tabu Search-based scheduler incorporates information about the data sets, dispatching the jobs to the sites expected to promote minimum job execution time and better overall system utilization. This approach attempts to achieve a double optimization effect from both the replica management and the scheduling phases, while integrating scheduling and data replication to improve the performance of the Grid system. They have developed the cost model used by the framework and a heuristic to show the feasibility of the approach. Several experiments show that the approach improves the job makespan by 8–35%, depending on the scheduling and replication strategies used.
The paper by S. Ayyub et al. [7] considers the formation of a VO across otherwise autonomous organizations, which aggregates many resources across a computational Grid. As the inclusion of more and more resources expands the Grid, the computational capabilities also increase. With this growth comes the ability to support increasingly larger applications. Certain large applications require the simultaneous commitment of many computational resources, execute for long periods of time, and make use of or generate large amounts of data. Several issues are inevitable for such applications.
Firstly, long-running applications are prone to failures, given the unpredictable nature of Grid environments. In such a scenario, it becomes beneficial to checkpoint applications, if possible, so that encountering a fault will only necessitate resuming the computation from the last successful checkpoint, rather than starting from scratch. However, this exacerbates both task and data management problems, by increasing the number of subtasks as well as intermediary files. The former impacts on how scheduling is performed effectively, while the latter may clash with disk storage quotas. Finally, although expanding the testbed across the resources of multiple VOs increases computational capabilities, it is rare that the computations and data movements are authorized across mutually exclusive trust domains.
They have dealt with all three issues through various means devised within the Nimrod/G parameter sweep meta-scheduler. First, long duration tasks may be checkpointed into subtasks, assuming that the application supports this feature. This should improve fault tolerance and augment scheduling flexibility in the face of faults. Second, they devised a garbage collection scheme that safely removes intermediate files between sub-computations once they have become completely unnecessary. Finally, they devised a trust delegation scheme that enables direct third party transfers across VOs with mutually exclusive trust domains, as long as the applications are launched with credentials that are trusted by both VOs. These three strategies, described in detail in the paper, are important and elegant augmentations of the potential to support large parameter sweeps in multi-VO domains.
The paper by K. El Maghraoui et al. [8] focuses on malleability for iterative MPI applications. Malleability enables a parallel application's execution system to split or merge processes, thereby modifying the executed program's granularity. This is important because process migration, which is widely used to adapt applications to dynamic execution environments, is limited by the granularity of the application's processes. Hence, malleability empowers process migration by allowing the application's processes to expand or shrink following the availability of resources.
They have implemented malleability as an extension to the process checkpointing and migration (PCM) library, a user-level library for iterative MPI applications. PCM is integrated with the Internet Operating System, a framework for middleware-driven dynamic application reconfiguration. The approach requires minimal code modifications and enables transparent middleware-triggered reconfiguration. The paper first presents the motivation of the work and describes the adopted approach to malleability in MPI applications. Then, it introduces the PCM library extensions for malleability and discusses its runtime system, including split and merge policies. To demonstrate the usefulness of malleability, experimental results from running a two-dimensional data parallel program that has a regular communication structure are presented.
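The effect of a split or merge on a data parallel program can be pictured as a repartitioning of the same data over a different number of processes. The sketch below assumes a row-wise block distribution of a two-dimensional grid. It does not use the PCM library or MPI, and `partition_rows` is a hypothetical helper.

```python
from typing import List


def partition_rows(n_rows: int, n_procs: int) -> List[range]:
    """Assign contiguous blocks of rows to processes as evenly as possible."""
    if n_procs < 1:
        raise ValueError("at least one process is required")
    base, extra = divmod(n_rows, n_procs)
    parts, start = [], 0
    for rank in range(n_procs):
        # The first `extra` ranks take one additional row each.
        size = base + (1 if rank < extra else 0)
        parts.append(range(start, start + size))
        start += size
    return parts


before_split = partition_rows(10, 2)  # two coarse-grained processes
after_split = partition_rows(10, 4)   # split: finer granularity on more CPUs
after_merge = partition_rows(10, 3)   # merge: fewer processes, coarser blocks
```

Comparing the old and new row ranges tells each process which rows it must send or receive when the number of processes changes.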
Acknowledgements
We would like to thank the authors for contributing papers on their research in cluster computing and the Grid [9–15] to this special publication, and to thank all the reviewers for providing constructive reviews and for helping to shape this special issue. Finally, we would like to thank Prof. Geoffrey Fox for providing us with the opportunity to bring forth this special issue.
REFERENCES
- 1. Fox G, Pierce M. Grids challenged by a Web 2.0 and multicore sandwich. Concurrency and Computation: Practice and Experience 2009; 21(3):265–280. DOI: 10.1002/cpe.1358.
- 2. Klemm M, et al. Reparallelization techniques for migrating OpenMP codes in computational grids. Concurrency and Computation: Practice and Experience 2009; 21(3):281–299. DOI: 10.1002/cpe.1356.
- 3. Vishnu A, et al. Topology agnostic hot-spot avoidance with InfiniBand. Concurrency and Computation: Practice and Experience 2009; 21(3):301–319. DOI: 10.1002/cpe.1359.
- 4. Caton S, et al. Distributed image processing over an adaptive Campus Grid. Concurrency and Computation: Practice and Experience 2009; 21(3):321–336. DOI: 10.1002/cpe.1357.
- 5. Nassif L, et al. Resource selection in grid: a taxonomy and a new system based on decision theory, case-based reasoning and fine-grain policies. Concurrency and Computation: Practice and Experience 2009; 21(3):337–355. DOI: 10.1002/cpe.1355.
- 6. Elghirani A, et al. Intelligent scheduling and replication: a synergistic approach. Concurrency and Computation: Practice and Experience 2009; 21(3):357–376. DOI: 10.1002/cpe.1354.
- 7. Ayyub S, et al. Fault-tolerant execution of large parameter sweep applications across multiple VOs with storage constraints. Concurrency and Computation: Practice and Experience 2009; 21(3):377–392. DOI: 10.1002/cpe.1353.
- 8. El Maghraoui K, et al. Malleable iterative MPI applications. Concurrency and Computation: Practice and Experience 2009; 21(3):393–413. DOI: 10.1002/cpe.1362.
- 9. Schulze B, Buyya R, Navaux P, Cirne W, Rebello V (eds.). Seventh IEEE International Symposium on Cluster Computing and the Grid (CCGrid'07). IEEE Computer Society Press: Silver Spring, MD, 2007.
- 10. Reinefeld A, Löhr K-P, Bal HE (eds.). Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGrid'06). IEEE Computer Society Press: Silver Spring, MD, 2006.
- 11. Walker D, Kesselman C, Rana O (eds.). Fifth IEEE International Symposium on Cluster Computing and the Grid (CCGrid'05). IEEE Computer Society Press: Silver Spring, MD, 2005.
- 12. Catlett C, Beckman P (eds.). Fourth IEEE International Symposium on Cluster Computing and the Grid (CCGrid'04). IEEE Computer Society Press: Silver Spring, MD, 2004.
- 13. Sekiguchi S, Lee S, Matsuoka S (eds.). Third IEEE International Symposium on Cluster Computing and the Grid (CCGrid'03). IEEE Computer Society Press: Silver Spring, MD, 2003.
- 14. Reinefeld A, Löhr K-P, Bal HE (eds.). Second IEEE International Symposium on Cluster Computing and the Grid (CCGrid'02). IEEE Computer Society Press: Silver Spring, MD, 2002.
- 15. Buyya R, Paul Roe GM (eds.). First IEEE International Symposium on Cluster Computing and the Grid (CCGrid'01). IEEE Computer Society Press: Silver Spring, MD, 2001.
