Parallel implementations of quantum chemistry programs targeting supercomputers are challenging applications of dynamic load balancing algorithms. The implementation of work-stealing (WS) algorithms is discussed and their usefulness is demonstrated. Evaluating the four-center integrals of a Cu₁₀ cluster requires 25 core-hours overall; on 2048 cores, simple WS achieves 88% efficiency, and presorting the tasks by a cost estimate raises this to 97%. Limitations of cost sorting become noticeable for larger systems. When spatial symmetry is exploited together with integral screening, bundling the original tasks yields an efficiency of 98% for Cu₇₉ in Oₕ symmetry on 512, 1024, and 2048 cores. The advantage of the WS algorithms described in this work is not limited to the evaluation of four-center integrals. © 2014 Wiley Periodicals, Inc.
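
To illustrate why presorting tasks by estimated cost can improve the efficiency of work stealing, the following is a minimal sketch of a simulated WS scheduler. It is not the implementation discussed in this work. The function name `simulate_work_stealing`, the round-robin initial distribution, the choice of the victim with the longest queue, and the neglect of any stealing overhead are all assumptions made for illustration. Efficiency is taken here as the total task cost divided by the number of workers times the makespan.

```python
import heapq
from collections import deque


def simulate_work_stealing(costs, n_workers, presort=False):
    """Simulate work stealing; return (makespan, parallel efficiency).

    Hypothetical model: zero-cost steals, tasks dealt round-robin at start,
    idle workers steal from the back of the longest remaining queue.
    """
    # Optional presorting: most expensive tasks are started first.
    tasks = sorted(costs, reverse=True) if presort else list(costs)
    # Static initial distribution into one deque per worker.
    queues = [deque(tasks[w::n_workers]) for w in range(n_workers)]
    # Min-heap of (time at which the worker becomes idle, worker id).
    idle_at = [(0.0, w) for w in range(n_workers)]
    heapq.heapify(idle_at)
    makespan = 0.0
    while idle_at:
        now, w = heapq.heappop(idle_at)
        if queues[w]:
            cost = queues[w].popleft()       # own work: take from the front
        else:
            victim = max(range(n_workers), key=lambda v: len(queues[v]))
            if not queues[victim]:
                makespan = max(makespan, now)
                continue                     # no work left anywhere: retire
            cost = queues[victim].pop()      # steal from the back of the victim
        heapq.heappush(idle_at, (now + cost, w))
    if makespan == 0.0:
        return 0.0, 1.0                      # no tasks: trivially balanced
    return makespan, sum(costs) / (n_workers * makespan)


if __name__ == "__main__":
    costs = [1, 1, 1, 1, 1, 1, 1, 8]         # one expensive task among cheap ones
    print(simulate_work_stealing(costs, 2))                # makespan 11.0
    print(simulate_work_stealing(costs, 2, presort=True))  # makespan 8.0
```

In this toy case the expensive task is started last without presorting, so one worker idles while the other finishes it. With presorting it starts first, and the second worker steals the remaining cheap tasks in the meantime.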