A segment-based sparse matrix–vector multiplication on CUDA

Authors

  • Xiaowen Feng
  • Hai Jin (corresponding author)
  • Ran Zheng
  • Zhiyuan Shao
  • Lei Zhu

All authors: Services Computing Technology and System Lab, Cluster and Grid Computing Lab, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, China

Correspondence to: Hai Jin, Services Computing Technology and System Lab, Cluster and Grid Computing Lab, School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan, 430074, China. E-mail: hjin@hust.edu.cn

SUMMARY

The main performance bottleneck of sparse matrix–vector multiplication (SpMV) is memory bandwidth, whose effective use depends largely on the input matrix and the underlying computing platform. To address this challenge, many researchers have explored a variety of optimization techniques, one of the most promising being the design of storage formats for representing sparse matrices. However, many prior storage formats cannot fully exploit the underlying computing platform, resulting in unsatisfactory performance and a large memory footprint. Therefore, a novel storage format, called Segmented Hybrid ELL + Compressed Sparse Row (CSR), or SHEC for short, is proposed to further improve throughput and reduce the memory footprint on the Graphics Processing Unit (GPU). The SHEC format employs an interleaved combination pattern that combines a certain number of compressed rows to form a new SHEC row, and segmentation is introduced to balance the load and further reduce the memory footprint. Guided by empirical data, an automatic SHEC-based SpMV is developed to fit all matrices. Experimental results show that the SHEC approach outperforms the best results of the NVIDIA SpMV library and achieves performance comparable with state-of-the-art storage formats on the standard dataset. Copyright © 2012 John Wiley & Sons, Ltd.
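The summary does not spell out the SHEC layout itself, but the memory-bandwidth argument can be illustrated with the CSR baseline that the format builds on. The CUDA kernel below is a minimal, generic scalar CSR SpMV (one thread per row), not the paper's SHEC implementation; all names (csr_spmv_scalar, row_ptr, col_idx, vals) are illustrative.

    // Minimal scalar CSR SpMV: y = A * x, one thread per matrix row.
    // Illustrative baseline only; not the SHEC kernel from the paper.
    __global__ void csr_spmv_scalar(int num_rows,
                                    const int   *row_ptr,   // size num_rows + 1
                                    const int   *col_idx,   // column index per nonzero
                                    const float *vals,      // value per nonzero
                                    const float *x,         // dense input vector
                                    float       *y)         // dense output vector
    {
        int row = blockIdx.x * blockDim.x + threadIdx.x;
        if (row < num_rows) {
            float sum = 0.0f;
            // Threads in a warp walk different rows of varying length, so these
            // loads are largely uncoalesced -- the bandwidth and load-balance
            // problems that motivate hybrid formats such as ELL + CSR.
            for (int j = row_ptr[row]; j < row_ptr[row + 1]; ++j)
                sum += vals[j] * x[col_idx[j]];
            y[row] = sum;
        }
    }

Launching with, for example, 256 threads per block and enough blocks to cover num_rows processes every row. ELL, by contrast, pads rows to a uniform length stored column-major so that consecutive threads read consecutive memory; that coalescing benefit, combined with CSR's compactness for irregular rows, is presumably what a hybrid ELL + CSR format such as SHEC exploits.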
