Generation of large finite-element matrices on multiple graphics processors

Authors

  • A. Dziekonski,

    Corresponding author
    • Wireless Communication Engineering (WiComm) Center of Excellence, Department of Microwave and Antenna Engineering, Faculty of Electronics, Telecommunications and Informatics, CUDA Research Center for Computational Electromagnetics at Gdansk University of Technology, Poland
  • P. Sypek,

    1. Wireless Communication Engineering (WiComm) Center of Excellence, Department of Microwave and Antenna Engineering, Faculty of Electronics, Telecommunications and Informatics, CUDA Research Center for Computational Electromagnetics at Gdansk University of Technology, Poland
  • A. Lamecki,

    1. Wireless Communication Engineering (WiComm) Center of Excellence, Department of Microwave and Antenna Engineering, Faculty of Electronics, Telecommunications and Informatics, CUDA Research Center for Computational Electromagnetics at Gdansk University of Technology, Poland
  • M. Mrozowski

    1. Wireless Communication Engineering (WiComm) Center of Excellence, Department of Microwave and Antenna Engineering, Faculty of Electronics, Telecommunications and Informatics, CUDA Research Center for Computational Electromagnetics at Gdansk University of Technology, Poland

Correspondence to: A. Dziekonski, Department of Microwave and Antenna Engineering, Faculty of Electronics, Telecommunications and Informatics, Gdansk University of Technology, Gdansk 80-233, Poland.

E-mail: adziek@eti.pg.gda.pl

SUMMARY

This paper presents techniques for generating very large finite-element matrices on a multicore workstation equipped with several graphics processing units (GPUs). To overcome the limited memory of the GPUs while also accelerating the generation process, we propose generating the large sparse linear systems that arise in finite-element analysis iteratively on several GPUs, using the graphics accelerators concurrently with the CPUs, which collect and add the matrix fragments using a fast multithreaded procedure. The threads are scheduled so that the CPU operations do not affect the performance of the process, and the GPUs are idle only while data are being transferred from GPU to CPU. The approach is verified on two workstations. The first has two 6-core Intel Xeon X5690 processors and two Fermi GPUs, each a GeForce GTX 590 with two graphics processors and 1.5 GB of fast RAM; the second has two 12-core Opteron 6174 processors and two Tesla C2075 boards with 6 GB of RAM each. For the latter setup, we demonstrate the fast generation of sparse finite-element matrices with as many as 10 million unknowns and over 1 billion nonzero entries.

Compared with single-threaded and multithreaded CPU implementations, the GPU-based version of the algorithm built on the ideas presented in this paper reduces the double-precision finite-element matrix-generation time by factors of 100 and 30, respectively. Copyright © 2012 John Wiley & Sons, Ltd.
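
The overlap between fragment generation and fragment collection described in the summary can be illustrated with a small producer/consumer sketch. This is not the authors' implementation: plain Python threads stand in for the GPUs, the 1D two-node element and every name used (`element_matrix`, `producer`, `collector`, `assemble`) are hypothetical, and the bounded queue only models the device-to-host transfer. It shows the general idea of generating the matrix in batches on several workers while a separate CPU thread sums the duplicate entries.

```python
import queue
import threading
from collections import defaultdict


def element_matrix(nodes):
    """Hypothetical 1D two-node element with local matrix [[1, -1], [-1, 1]]."""
    a, b = nodes
    return [(a, a, 1.0), (a, b, -1.0), (b, a, -1.0), (b, b, 1.0)]


def producer(batches, fragments):
    """Stand-in for one GPU: builds one matrix fragment per batch of elements."""
    for batch in batches:
        fragment = []
        for nodes in batch:
            fragment.extend(element_matrix(nodes))
        fragments.put(fragment)  # models the GPU-to-CPU transfer
    fragments.put(None)  # sentinel: this worker has no more fragments


def collector(fragments, n_producers, global_matrix):
    """CPU side: collects fragments and adds entries with equal (row, col)."""
    finished = 0
    while finished < n_producers:
        fragment = fragments.get()
        if fragment is None:
            finished += 1
            continue
        for row, col, value in fragment:
            global_matrix[(row, col)] += value


def assemble(elements, n_devices=2, batch_size=4):
    """Generate the matrix iteratively, batch by batch, on n_devices workers."""
    batches = [elements[i:i + batch_size]
               for i in range(0, len(elements), batch_size)]
    # A bounded queue keeps the number of fragments in flight small,
    # mimicking the limited memory available for intermediate data.
    fragments = queue.Queue(maxsize=2 * n_devices)
    global_matrix = defaultdict(float)

    sink = threading.Thread(target=collector,
                            args=(fragments, n_devices, global_matrix))
    workers = [threading.Thread(target=producer,
                                args=(batches[d::n_devices], fragments))
               for d in range(n_devices)]
    sink.start()
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    sink.join()
    return dict(global_matrix)


if __name__ == "__main__":
    chain = [(i, i + 1) for i in range(5)]  # five elements, six nodes
    matrix = assemble(chain, n_devices=2, batch_size=2)
    print(len(matrix), "nonzero entries")
```

Because the collector runs in its own thread, a worker can start on its next batch as soon as the previous fragment has been handed over, which is the property the summary attributes to the thread scheduling. A real implementation would replace `producer` with GPU kernels and asynchronous copies.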
