
In many areas, high-performance computing (HPC) and simulation have become determinants of industrial competitiveness and advanced research. Following the progress in the aerospace, automobile, environmental, energy, healthcare, and networking industries, most research domains nowadays measure the strategic importance of their developments vis-à-vis the mastery of these critical technologies. Intensive computing and numerical simulation are now essential tools that contribute to success in system design and to the effectiveness of public policies, such as the prevention of natural hazards and the consideration of climate risks, but also to security and national sovereignty. Yet, the fact that simulation is employed by a large number of users does not mean that they all contribute equally to the advancement of this science. It is widely anticipated that continued progress and investment in HPC and simulation will bring about innovations and technologies that contribute to growth and evolution in all major scientific domains. For instance, the simulation of complex phenomena, such as biological and living components, will lead to spectacular scientific breakthroughs.

In terms of hardware and software architectures, we can expect exaflop performance [1] to be reached before 2020. Exascale computing is, however, an inspiring challenge that implies difficult but invigorating technological obstacles. The arrival of General-Purpose Graphics Processing Units (GP-GPUs) has changed the pace of improvement in peak performance. However, this development implies rethinking the use of such architectures to obtain maximum performance, or peak return, whenever possible. In some cases, significant effort will be required to adapt existing applications to these technologies. At the same time, they will also affect the design of future applications. Furthermore, they will require acquiring and building new tools and infrastructure [2, 3, 4, 5]. HPC has so far been a laboratory for the development of techniques, technologies, services, and applications that sooner or later end up in consumer desktop computers. Nowadays, desktops and laptops have vector processing capabilities, with Streaming SIMD Extensions (SSE) instructions, similar to what Cray proposed in the seventies (Advanced Vector Extensions (AVX) have also been available since the introduction of Intel's Sandy Bridge processor). Equally, the introduction of the personal ‘super-computer’ in 2008 with NVIDIA's Tesla boards (1 Teraflop, single precision) changed the way we think about HPC [6]. Such components have been introduced in the design of supercomputers and clusters [7]. At the time of the High Performance Computing and Simulation (HPCS) 2010 conference, three of the first five supercomputers ranked in the ‘top500’ [8] were hybrid, some with Tesla boards and others with the Fermi architecture, which considerably improved double-precision performance [9, 10]. At the time of writing this editorial, dual Graphics Processing Unit (GPU) boards with thousands of cores are available.
An IBM BlueGene/Q system named Sequoia was recently installed at the Department of Energy's Lawrence Livermore National Laboratory. This supercomputer achieved 16.32 petaflop/s on the Linpack benchmark using 1,572,864 cores. It is also one of the most energy-efficient systems in the Top500 list. Next year, supercomputers are expected to be more energy efficient while surpassing the 20 petaflop/s milestone, and we anticipate even higher peak performance and efficiency in subsequent years. In addition, the introduction in 2009 of Intel's Sandy Bridge [11] and AMD's Accelerated Processing Unit (APU) [12] will also affect the way we design and program these new parallel architectures. These exciting developments are also a challenge that we will have to deal with. Generalist multicore architectures will arrive around 2013 with the commercial availability of the Intel Many Integrated Core (MIC) architecture [13, 14] (Xeon Phi is the name finally retained by Intel for the commercialization of this accelerator board). Meanwhile, researchers will have to improve or adapt their parallel processing skills and programming standards.

In addition to hardware revolutions, another paradigm shift has been under way since the introduction of the notion of Cloud Computing. Even though this approach was first dedicated to business and web-based applications, on-demand HPC is possible and presupposes that we will be able to obtain a set of resources as well as services adapted to a specific computing scenario [15, 16, 17]. There are indications that the scientific community is technologically ready for the implementation of private HPC clouds, although a full HPC cloud solution running on virtual machines (VMs) may remain application dependent [18]. The possibility of HPC cloud computing brings fast compute clusters within the reach of researchers and users for whom traditional HPC facilities are not an option. For example, with the BiG Grid HPC Cloud, users get access to a virtualized HPC cluster that they can configure to exactly match their needs. It also provides self-service and dynamically scalable high-performance computing facilities [19]. Similarly, in 2010, SGI announced its SGI Cyclone for large-scale, on-demand cloud computing services specifically dedicated to technical applications [20]. This area will understandably remain active in terms of research and development for the foreseeable future, particularly because the virtualization and management features of cloud systems make them an ideal design point for exascale operating systems and runtimes, and will enable exascale for a broader class of applications [21].

For some time, it has been known that modeling and simulation will continue to play a major role in the future in various aspects of sciences and engineering [22]. We believe that the fields of HPC and simulation will play major roles in industry and in society in general [23]. This supposes that we give more focus to the software tools and research projects that use HPC and simulation in the hope that we will improve our parallel programming tools, languages, and libraries [24]. Just as important, fault tolerance [25] and scalability of algorithms [26] will have to be considered.



This special issue contains research papers addressing the state of the art in high-performance and large-scale computing systems and their use in the modeling and simulation of real-world problems. A set of carefully selected works was invited on the basis of original presentations at the 2010 IEEE International Conference on High Performance Computing and Simulation (HPCS 2010) [27], which was held in Caen, France, June 28–July 2, 2010. The augmented works have been thoroughly reviewed, and only 10 papers covering a wide range of contemporary issues in HPC and simulation were retained in this issue. The manuscripts tackle research on topics that extend from accelerators, GPU, and manycore systems, to methodologies and algorithms, to applications in distributed services, computational biology, and environmental studies, to advances in simulation.

Relevant to the issues and trends brought up earlier, the papers presented in this volume can be classified into four categories. The first category deals with hardware accelerators, GPU, and manycore systems. Two papers in this category present research concerned with computational biology: the first one uses Field-Programmable Gate Arrays (FPGAs) and Application-Specific Integrated Circuits (ASICs) for DNA sequence alignment [28], and the second solves large planted motif problems with GPU computing [29]. Still within hybrid computing, the porting of applications to GPU accelerators has been achieved for a molecular dynamics framework [30] and for the dynamics part of a weather forecasting package [31]. The second group has two papers that deal with methodologies and algorithms: in [32], a cluster-based algorithm helps determine the catchment basin of rivers in large digital elevation models (DEMs), whereas the second paper proposes an interesting speedup-test protocol on the basis of well-known statistical tests [33]. The third category addresses simulation practices. The first paper in this category is a survey that explains the existing techniques dedicated to the partitioning of random streams for parallel stochastic simulations and how such techniques have been adapted to hybrid computing with GPUs [34]. The second paper in this category presents a manycore architecture simulator adapted to deal with thousands of cores [35]. This should be of interest, as we observe a growing trend toward building complex large-scale systems. The fourth category of papers deals with distributed services computing, whether with clouds [36] or with the mining of large data repositories [37].

The following paragraphs briefly introduce each of the special issue papers in more detail.

2.1 Hardware Accelerators, GPU and Manycore Systems

In [28], N. Sebastiao, N. Roma, and P. Flores present a new class of flexible and configurable hardware accelerators with high performance in DNA alignment based on the Smith and Waterman algorithm [38]. They introduce an innovative technique that significantly reduces the time and memory requirements for the traceback phase. In addition, their solution is able to concurrently process several small query sequences. The hardware accelerators proposed can be implemented either in FPGA or ASIC technologies.
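For readers unfamiliar with the Smith and Waterman recurrence that such accelerators implement in hardware, the following is a minimal software sketch of the scoring phase only. It assumes a linear gap penalty and illustrative score values (`match`, `mismatch`, `gap` are hypothetical parameters chosen for this example); it does not reproduce the traceback technique or the architecture described in [28].

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-1):
    """Return the best local alignment score between sequences a and b.

    Linear gap penalty; the score values are illustrative assumptions.
    Only two rows of the dynamic programming matrix are kept in memory.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            # Score of aligning a[i-1] with b[j-1] (match or mismatch).
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Local alignment: a cell never drops below zero.
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


if __name__ == "__main__":
    print(smith_waterman_score("TTACGTT", "GGACGGG"))
```

Each cell depends only on its left, upper, and upper-left neighbours, so the cells of one anti-diagonal can be computed concurrently, which is what makes the recurrence attractive for hardware implementations.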

In their paper, N.S. Dasari et al. [29] propose a parallel approach to solving the planted motif problem (PMP). Classical approaches to solving the PMP are difficult to parallelize. This paper gives a simple and easily parallelizable enumeration-based approach called BitBased. With smart memory usage, the BitBased approach is suited to solving large problem instances such as the (21,8) PMP, which had not previously been reported in the literature as solved. Improved performance is achieved using multicore and GPU devices.
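To make the problem statement concrete, here is a naive enumeration sketch of the (l, d) planted motif problem: it lists every string of length l that occurs in each input sequence with at most d mismatches. This is only a brute-force illustration under our own assumptions (a DNA alphabet and the hypothetical function names below); it is not the BitBased algorithm of [29] and is practical only for very small l.

```python
from itertools import product


def hamming(u, v):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(u, v))


def occurs_within(motif, seq, d):
    """True if some window of seq is within Hamming distance d of motif."""
    l = len(motif)
    return any(hamming(motif, seq[i:i + l]) <= d
               for i in range(len(seq) - l + 1))


def planted_motifs(seqs, l, d, alphabet="ACGT"):
    """Enumerate all len(alphabet)**l candidates and keep the valid motifs."""
    found = []
    for cand in product(alphabet, repeat=l):
        motif = "".join(cand)
        if all(occurs_within(motif, s, d) for s in seqs):
            found.append(motif)
    return found


if __name__ == "__main__":
    print(planted_motifs(["ACGTAC", "TTACGA", "GACGTT"], l=3, d=0))
```

Because each candidate is tested independently of the others, the outer loop is trivially parallel, which hints at why enumeration-based formulations map well onto multicore and GPU devices.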

P.K. Agarwal et al. [30] describe the optimization of LAMMPS, a popular molecular dynamics framework that is ported to a hybrid architecture using GPU accelerators. The most computationally expensive tasks without interactions are dedicated to the GPUs. This is achieved on top of a more classical code using the Message Passing Interface (MPI). Benchmarks are shown on four generations of NVIDIA's GPU devices with different biomolecular system sizes. A parameterized performance model is also proposed to explore the potential of future heterogeneous systems for biological simulations.

V.T. Vu et al. [31] present the implementation of dynamics routines for a weather forecast model (HIRLAM, originally written in Fortran) in C with the Compute Unified Device Architecture (CUDA). The optimal number of grid points per thread is empirically determined. Optimal device-to-CPU data transfer is computed, and the resulting code, using multiple CUDA streams, is generated automatically. This implementation is used to check the applicability of GPUs to speed up the computation of the dynamics part of numerical weather prediction.

2.2 Methodologies and Algorithms

H.T. Do et al. [32] describe a parallel algorithm that determines the catchment basin of rivers in large Digital Elevation Models (DEMs). This fast and scalable algorithm constructs a minimum spanning tree and combines hydrogeology, image processing, and graph theory techniques. Designed for cluster architectures, the algorithm is able to handle large DEMs and obtains fairly accurate results in terms of geomorphology.

Lastly, S-A-A. Touati, J. Worms, and S. Briais [33] propose a rigorous statistical methodology for the analysis of program performance. With their speedup-test protocol, which is based on a set of well-known statistical tests, we anticipate being in a position to improve the reproducibility of performance evaluation experiments. The implementation of this protocol is available as an open source R toolbox, which enables the certification of speedups with confidence intervals.
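The idea of reporting a speedup together with a confidence interval, rather than as a single ratio, can be illustrated with a short sketch. Note that the code below uses a generic bootstrap percentile interval on the ratio of median execution times; this is a stand-in chosen for brevity, not the speedup-test protocol or the R toolbox of [33], and the function name, sample timings, and parameters are all assumptions of this example.

```python
import random
import statistics


def median_speedup_ci(base_times, opt_times, n_boot=2000, level=0.95, seed=0):
    """Bootstrap percentile interval for median(base) / median(opt).

    Returns (point_estimate, lower_bound, upper_bound).
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    ratios = []
    for _ in range(n_boot):
        # Resample each set of measured run times with replacement.
        b = rng.choices(base_times, k=len(base_times))
        o = rng.choices(opt_times, k=len(opt_times))
        ratios.append(statistics.median(b) / statistics.median(o))
    ratios.sort()
    alpha = (1.0 - level) / 2.0
    lo = ratios[int(alpha * n_boot)]
    hi = ratios[min(n_boot - 1, int((1.0 - alpha) * n_boot))]
    point = statistics.median(base_times) / statistics.median(opt_times)
    return point, lo, hi


if __name__ == "__main__":
    base = [10.1, 9.9, 10.3, 10.0, 9.8, 10.2]   # hypothetical timings (s)
    opt = [5.0, 5.1, 4.9, 5.2, 5.0, 4.95]       # hypothetical timings (s)
    print(median_speedup_ci(base, opt))
```

A speedup would be reported as significant here only if the whole interval lies above 1; with few or noisy measurements the interval widens, which is precisely the information a single ratio hides.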

2.3 Simulation Practices

In [34], D.R.C. Hill et al. present the different partitioning techniques currently in use to provide independent streams to parallel processes in the case of stochastic simulations. A large set of research papers has shown that even if we have at our disposal statistically sound random number generators, their parallelization is still a delicate problem for many researchers and users. In addition to the classical parallelization approaches in use on regular processors, this paper also presents recent advances in pseudorandom number generation for General Purpose-Graphical Processing Units (GP-GPU).
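Two of the classical partitioning schemes, leapfrog and sequence splitting, can be sketched in a few lines. The example below uses a toy multiplicative congruential generator with the well-known Park and Miller "minimal standard" parameters purely for illustration; it is not a generator recommended for real stochastic simulations, the function names are our own, and the streams are obtained by naively skipping values rather than by the jump-ahead techniques a production library would use.

```python
from itertools import islice

M = 2**31 - 1   # modulus (Park-Miller "minimal standard")
A = 16807       # multiplier


def lcg(seed):
    """Toy multiplicative congruential generator, for illustration only."""
    x = seed
    while True:
        x = (A * x) % M
        yield x


def leapfrog(seed, rank, nprocs):
    """Process `rank` takes elements rank, rank + nprocs, rank + 2*nprocs, ..."""
    return islice(lcg(seed), rank, None, nprocs)


def sequence_split(seed, rank, block):
    """Process `rank` takes the contiguous block [rank*block, (rank+1)*block)."""
    return islice(lcg(seed), rank * block, (rank + 1) * block)


if __name__ == "__main__":
    for r in range(3):
        print("leapfrog", r, list(islice(leapfrog(1, r, 3), 4)))
        print("split   ", r, list(sequence_split(1, r, 4)))
```

Both schemes hand out disjoint parts of one base sequence, so they avoid overlap by construction; whether the resulting sub-streams remain statistically sound depends on the generator, which is the delicate point the survey discusses.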

R. Martino et al. [35] present a manycore architecture simulator, named SIMinG-1k, running on GP-GPUs. This simulator infrastructure is CUDA-based and is developed for design-space exploration of large-scale multicores and for applications research. SIMinG-1k offers a fine balance between simulation speed and modeling while remaining scalable. This simulator is able to simulate the ARM (Acorn RISC Machine) and Intel x86 instruction set architectures (ISAs) with up to thousands of cores.

2.4 Services Computing

In [36], V.C. Emeakaroha et al. introduce a new framework designed to monitor the provisioning of low-level resources and map them to high-level service-level agreements (SLAs). This framework, named LoM2HiS, includes an application deployment mechanism with monitored information and SLA violation prevention techniques to avoid penalty costs. The design and the implementation of this framework are discussed in detail, and a case study demonstrates its usage and performance in a real-world cloud environment. This framework can be considered a pointer towards self-governing Information & Communication Technologies (ICT) infrastructures.

In the last contribution of this special issue, E. Cesario et al. [37] discuss data mining services and workflows for analyzing scientific data in high-performance distributed environments such as grids and clouds. In their paper, the authors present the definition of services for supporting distributed data mining tasks in grids. They also propose a workflow formalism and a service-oriented programming framework named DIS3GNO. This framework, which is designed to support the various phases of knowledge discovery, is evaluated with a set of relevant case studies.

We hope the collection of manuscripts in this special issue will make a significant contribution toward future developments in the high performance computing and simulation areas.



The guest editors of this special issue wish to express their sincere gratitude to all authors, the Reviewing Committee, and the C&C EIC, Prof. Geoffrey Fox and staff for their efforts in making this special issue possible. The Reviewing Committee members are:

Emmanuel Agullo (USA), Sadaf Alam (Switzerland), Hesham Ali (USA), David I. August (USA), Bruno Bachelet (France), Francoise Baude (France), Pascal Bouvry (Luxembourg), Radu Calinescu (UK), Charlie Catlett (USA), Ann Chervenak (USA), Antonio Cofino (Spain), Camille Coti (France), Laurent D'Orazio (France), Ahmet Duran (USA), Schahram Dustdar (Austria), Randy Eubank (USA), Joel Falcou (France), Paolo Faraboschi (Spain), Bernhard Fechner (Germany), Udo Hönig (Germany), Neil Chue Hong (UK), Zhiyi Huang (New Zealand), Eric Innocenti (France), Chris Jesshope (The Netherlands), A. Jimenez-Madrid (Spain), Hai Jin (China), Ben Juurlink (Germany), Harald Koestler (Germany), Harald Kosch (Germany), Dieter Kranzlmüller (Germany), Erwin Laure (Sweden), Adrien Lebre (France), Peter Leong (Singapore), Sébastien Limet (France), Mariofanna Milanova (USA), Reagan Moore (USA), Lisandru Muzy (France), Antonio J. Nebro (Spain), Andy Pimentel (The Netherlands), Thierry Priol (France), Bruno Raffin (France), Romain Reuillon (France), Marinette Revenu (France), Erich Schikuta (Austria), Karolj Skala (Croatia), Domenico Talia (Italy), Osamu Tatebe (Japan), Mamadou Kaba Traoré (France), Ventzeslav Valev (Bulgaria), Lorenzo Verdoscia (Italy), Chien-Min Wang (Taiwan), Benedikt Wilbertz (France), Roman Wyrzykowski (Poland), Ramin Yahyapour (Germany), Chao-Tung Yang (Taiwan), Sangho Yi (France), Vesna Zeljkovic (Kingdom of Saudi Arabia), and Haibo Zhang (New Zealand).


  • 1
    Snir M, Gropp W, Kogge P. Exascale research: preparing for the post-Moore era. Technical Report, University of Illinois, June 2011. [Online]. (Available from:
  • 2
    Lawlor OS. Message passing for GPGPU clusters: cudaMPI. Workshop on Parallel Programming on Accelerator Clusters (PPAC 2009), New Orleans, LA, USA, August 31, 2009; 1–8.
  • 3
    Lawlor OS. Embedding OpenCL in C++ for expressive GPU programming. First International Workshop on Domain-Specific Languages and High-Level Frameworks for High Performance Computing (WOLFHPC 2011), ACM. As part of 25th International Conference on Supercomputing (ICS 2011), Lowes Ventana Canyon Resort, Tucson, Arizona, USA, May 31, 2011; 1–8.
  • 4
    Nakasato N, Makino J. A compiler for high performance computing with many-core accelerators. Workshop on Parallel Programming on Accelerator Clusters (PPAC 2009), New Orleans, LA, USA, August 31, 2009; 1–9.
  • 5
    van de Geijn R. Designing a library to be multi-accelerator ready: a case study. Workshop on Parallel Programming on Accelerator Clusters (PPAC2011), Austin, TX, USA, September 26, 2011. Held in conjunction with IEEE Cluster 2011.
  • 6
    Heinecke A, Klemm M, Bungartz H. From GPGPU to many-core: NVIDIA Fermi and Intel Many Integrated Core Architecture. IEEE Computing in Science & Engineering 2012; 14(2):78–83.
  • 7
    Kindratenko VV, Enos JJ, Shi G, Showerman MT, Arnold GW, Stone JE, Phillips JC, Hwu W. GPU clusters for high-performance computing. IEEE Workshop on Parallel Programming on Accelerator Clusters (PPAC 2009), New Orleans, LA, USA, August 31, 2009; 1–8.
  • 8
    Top500. (Available from: [8 September 2012].
  • 9
  • 10
    Wittenbrink CM, Kilgariff E, Prabhu A. Fermi GF100 GPU architecture. IEEE Micro March/April, 2011; 2: 50–59.
  • 11
    Jarp S, Lazzaro A, Leduc J, Nowak A. Evaluation of the Intel Sandy Bridge-EP Server Processor. CERN openlab: Switzerland, March 2012. version 2.2. CERN openlab. (Available at:
  • 12
    Doerksen M, Solomon S, Thulasiraman P. Designing APU oriented scientific computing applications in OpenCL. IEEE 13th International Conference on High Performance Computing and Communications (HPCC), Banff, Canada, September 2–4, 2011; 587–592.
  • 13
    Dubey PK. Manycore Computing and MIC. The 2011 OFA International Monterey Workshop: CA, USA, March 2011. OpenFabrics Alliance. (Available at:
  • 14
    Voran T, Garcia J, Tufo H. Evaluating Intel's many integrated core architecture for climate science. TACC-Intel Highly Parallel Computing Symposium, Austin, TX, USA, April 10th-11th, 2012. Also available at:
  • 15
    Evangelinos C, Hill CN. Cloud Computing for Parallel Scientific HPC Applications: Feasibility of Running Coupled Atmosphere-Ocean Climate Models on Amazon's EC2, ACM CCA-08. ACM: Chicago, IL, USA, October 22–23, 2008.
  • 16
    Iorio F, Snowdon JL. Leveraging cloud computing and high performance computing advances for next-generation architecture, urban design and construction projects. In Proceedings of the 2011 Symposium on Simulation for Architecture and Urban Design, in conjunction with the 2011 Spring Simulation Multi-conference (SpringSim ’11). Society for Computer Simulation International, San Diego, CA, USA: Boston, MA, USA, April 3–7, 2011; 118–125.
  • 17
    Vecchiola C, Pandey S, Buyya R. High-performance cloud computing: a view of scientific applications. In The 10th International Symposium on Pervasive Systems, Algorithms, and Networks (ISPAN 2009). IEEE Computer Society: Kaohsiung, Taiwan, December 14–16, 2009; 4–16.
  • 18
    Lu W. Cloud computing through virtualization and HPC technologies. Proceedings of the 2010 HPC Advisory Council European Workshop, Hamburg, Germany, May 30, 2010.
  • 19
    Sluiter F. BiG Grid HPC Cloud. SARA Computing & Networking Services: The Netherlands, March 2011.
  • 20
    Tanasescu C. Cyclone: SGI Cloud Computing for HPC. SGI: California, USA, 2010. (Available at:…/SGICycloneIDCHPC.pdf).
  • 21
    Pai VS, Crago SP, Kang D-I, Kang M, Singh K, Suh J, Walters JP, Younge AJ. Virtualized cloud computing for exascale performance, Position Paper, Workshop on Exascale Operating Systems and Runtime Software (ExaOSR-2012), October 4–5: Washington, DC, USA, 2012.
  • 22
    Simon H, Zacharia T, Stevens R. Modeling and Simulation at the Exascale for Energy and the Environment, Report on the Advanced Scientific Computing Research, Town Hall Meetings on Simulation and Modeling at the Exascale for Energy, Ecological Sustainability and Global Security (E3), The U.S. Department of Energy (DOE) Office of Advanced Scientific Computing Research (OASCR). Department of Energy, USA: Washington, DC, USA, 2007.
  • 23
    Sexton JC. Modelling, simulation and analytics in the exascale era. The International Conference on High Performance Computing & Simulation (HPCS 2012), Madrid, Spain, July 2–July 6, 2012. page xxxiii. Keynote speech available at:
  • 24
    PNNL Productive Programming Models for Exascale, Workshop on productive programming models for exascale scientific modeling and simulation, and data analysis applications Portland, OR, August 14-15, 2012.
  • 25
    Walters JP, Crago SP, Pai VS, Singh K, Suh J, Younge AJ, Zick KM. Enabling resilience through introspection and virtualization, Position Paper, Workshop on Exascale Operating Systems and Runtime Software (ExaOSR-2012), October 4–5: Washington, DC, USA.
  • 26
    Dongarra J. Architecture-Aware Algorithms for Scalable Performance and Resilience on Heterogeneous Architectures, DOE Advanced Scientific Computing Advisory Committee Meeting. ORNL, Department of Energy, TN, USA, August 2010. Presentation is available at
  • 27
    The 2010 IEEE International Conference on High Performance Computing and Simulation (HPCS 2010). (Conference web site is at Also, see [21 September 2012].
  • 28
    Sebastiao N, Roma N, Flores P. Configurable and scalable class of high performance hardware accelerators for simultaneous DNA sequence alignment. Concurrency and Computation: Practice and Experience; 25(10):1319–1339.
  • 29
    Dasari NS, Desh R, Zubair M. High performance implementation of planted motif problem. Concurrency and Computation: Practice and Experience; 25(10):1340–1355.
  • 30
    Agarwal PK, Hampton S, Poznanovic J, Ramanthan A, Alam SR, Crozier PS. Performance modeling of microsecond scale biological molecular dynamics simulations on heterogeneous architectures. Concurrency and Computation: Practice and Experience; 25(10):1356–1375.
  • 31
    Vu V-T, Cats G, Wolters L. GPU optimizations for the dynamics of the HIRLAM weather forecast model. Concurrency and Computation: Practice and Experience; 25(10):1376–1393.
  • 32
    Do H-T, Limet S, Melin E. A scalable parallel minimum spanning tree algorithm for catchment basin delimitation in large digital elevation models. Concurrency and Computation: Practice and Experience; 25(10):1394–1409.
  • 33
    Touati S-A-A, Worms J, Briais S. The speedup-test: a statistical methodology for program speedup analysis and computation. Concurrency and Computation: Practice and Experience; 25(10):1410–1426.
  • 34
    Hill DRC, Mazel C, Passerat-Palmbach J. Distribution of random streams for simulation practitioners. Concurrency and Computation: Practice and Experience; 25(10):1427–1442.
  • 35
    Martino R, Marongiu A, Raghav S, Pinto C, Atienza D, Benini L. SIMinG-1k: a thousand-core simulator running on GPGPUs. Concurrency and Computation: Practice and Experience; 25(10):1443–1461.
  • 36
    Emeakaroha VC, Brandic I, Maurer M, Dustdar S. Cloud resource provisioning and SLA enforcement via LoM2HiS framework. Concurrency and Computation: Practice and Experience; 25(10):1462–1481.
  • 37
    Cesario E, Lackovic M, Talia D, Trunfio P. Programming knowledge discovery workflows in service-oriented distributed systems. Concurrency and Computation: Practice and Experience; 25(10):1482–1504.
  • 38
    Smith TF, Waterman MS. Identification of common molecular subsequences. Journal of Molecular Biology 1981; 147(1):195–197.