Extending decoupled software pipeline to parallelize Java programs

Authors

  • André Loureiro,

    Corresponding author
    • Laboratório de Sistemas de Computação – Instituto de Computação – Universidade Estadual de Campinas, Av. Albert Einstein, 1251 – Cidade Universitária, Campinas, SP, Brazil
  • João Paulo Porto,

    1. Laboratório de Sistemas de Computação – Instituto de Computação – Universidade Estadual de Campinas, Av. Albert Einstein, 1251 – Cidade Universitária, Campinas, SP, Brazil
  • Guido Araujo

    1. Laboratório de Sistemas de Computação – Instituto de Computação – Universidade Estadual de Campinas, Av. Albert Einstein, 1251 – Cidade Universitária, Campinas, SP, Brazil

Correspondence to: André Loureiro, Laboratório de Sistemas de Computação – Instituto de Computação – Universidade Estadual de Campinas, A.C. de André Oliveira Loureiro do Baixo, Av. Albert Einstein, 1251 – Cidade Universitária, Campinas, SP, Brazil, CEP 13083–852.

E-mail: andre.oliveira@lsc.ic.unicamp.br

SUMMARY

Programmers can no longer rely solely on micro-architectural and technology improvements to make their programs run faster. On today's multicore chips, parallel code must be written explicitly to extract any benefit from the extra processing power. A recently proposed technique for parallelizing loops of general-purpose programs at the binary level, called decoupled software pipeline (DSWP), has shown good performance numbers only under the assumption of a fast hardware intercore communication queue. In this paper, we propose Java-DSWP, a source-level DSWP-based parallelization technique that is much simpler than the original DSWP and can be used to effectively parallelize Java applications. In addition, we propose and evaluate a software intercore communication scheme that enables code parallelized through Java-DSWP to run on commodity machines; unlike DSWP, it does not require a hardware intercore communication queue to be efficient. We analyze three memory communication queue implementations and show experimental results that reveal an average 48% speedup on some SPECjvm2008 benchmarks. Copyright © 2012 John Wiley & Sons, Ltd.
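
To make the idea in the summary concrete, the following is a minimal sketch of a DSWP-style two-stage pipeline in Java, in which the stages communicate through a software queue held in shared memory. It is an illustration under stated assumptions, not the authors' implementation: the class `DswpSketch`, the queue `SpscQueue`, and the functions `next` and `work` are hypothetical names, and the single-producer/single-consumer ring buffer shown here is only one plausible queue design, not necessarily one of the three analyzed in the paper. Stage 1 keeps the loop-carried dependence (the recurrence on `x`) on one thread, and stage 2 performs the remaining per-iteration work on another.

```java
import java.util.concurrent.atomic.AtomicLong;

public class DswpSketch {
    /** Hypothetical single-producer/single-consumer ring buffer in shared memory. */
    static final class SpscQueue {
        private final long[] buf;
        private final int mask;
        private final AtomicLong head = new AtomicLong(); // next slot to read
        private final AtomicLong tail = new AtomicLong(); // next slot to write

        SpscQueue(int capacityPow2) { // capacity is assumed to be a power of two
            buf = new long[capacityPow2];
            mask = capacityPow2 - 1;
        }

        void produce(long v) {
            long t = tail.get();
            while (t - head.get() == buf.length) Thread.yield(); // queue full: wait
            buf[(int) (t & mask)] = v;
            tail.set(t + 1); // volatile write publishes the filled slot
        }

        long consume() {
            long h = head.get();
            while (tail.get() == h) Thread.yield(); // queue empty: wait
            long v = buf[(int) (h & mask)];
            head.set(h + 1); // volatile write frees the slot for the producer
            return v;
        }
    }

    // Loop-carried recurrence: each value depends on the previous one.
    static long next(long x) { return x * 6364136223846793005L + 1442695040888963407L; }

    // Per-iteration work that depends only on the current value.
    static long work(long x) { return Long.bitCount(x); }

    /** Original sequential loop. */
    static long sequential(int n) {
        long x = 1, sum = 0;
        for (int i = 0; i < n; i++) { x = next(x); sum += work(x); }
        return sum;
    }

    /** Same loop split into two pipeline stages connected by the queue. */
    static long pipelined(int n) {
        SpscQueue q = new SpscQueue(1024);
        Thread stage1 = new Thread(() -> {
            long x = 1;
            for (int i = 0; i < n; i++) { x = next(x); q.produce(x); }
        });
        stage1.start();
        long sum = 0;
        for (int i = 0; i < n; i++) sum += work(q.consume()); // stage 2 runs here
        try {
            stage1.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Both versions compute the same result; only the schedule differs.
        System.out.println(sequential(100_000) == pipelined(100_000));
    }
}
```

Because values flow in one direction only, from stage 1 to stage 2, the producer never waits on the consumer except when the queue is full. Whether such a split pays off in practice depends on the cost of each queue operation relative to the work done per iteration, which is why the choice of queue implementation matters.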
