SU-E-T-531: Performance Evaluation of Multithreaded Geant4 for Proton Therapy Dose Calculations in a High Performance Computing Facility

Authors


Abstract

Purpose:

To evaluate the efficiency of multithreaded Geant4 (Geant4-MT, version 10.0) for proton Monte Carlo dose calculations using a high performance computing facility.

Methods:

Geant4-MT was used to calculate 3D dose distributions in 1×1×1 mm3 voxels in a water phantom and patient's head with a 150 MeV proton beam covering approximately 5×5 cm2 in the water phantom. Three timestamps were measured on the fly to separately analyze the required time for initialization (which cannot be parallelized), processing time of individual threads, and completion time. Scalability of averaged processing time per thread was calculated as a function of thread number (1, 100, 150, and 200) for both 1M and 50 M histories. The total memory usage was recorded.

Results:

Simulations with 50 M histories were fastest with 100 threads, taking approximately 1.3 hours and 6 hours for the water phantom and the CT data, respectively with better than 1.0 % statistical uncertainty. The calculations show 1/N scalability in the event loops for both cases. The gains from parallel calculations started to decrease with 150 threads. The memory usage increases linearly with number of threads. No critical failures were observed during the simulations.

Conclusion:

Multithreading in Geant4-MT decreased simulation time in proton dose distribution calculations by a factor of 64 and 54 at a near optimal 100 threads for water phantom and patient's data respectively. Further simulations will be done to determine the efficiency at the optimal thread number. Considering the trend of computer architecture development, utilizing Geant4-MT for radiotherapy simulations is an excellent cost-effective alternative for a distributed batch queuing system. However, because the scalability depends highly on simulation details, i.e., the ratio of the processing time of one event versus waiting time to access for the shared event queue, a performance evaluation as described is recommended.

Ancillary