Energy consumption has become a major design constraint in modern computing systems. With the advent of petaflops architectures, power-efficient software stacks have become imperative for scalability. Techniques such as dynamic voltage and frequency scaling (called DVFS) and CPU clock modulation (called throttling) are often used to reduce the power consumption of the compute nodes. To avoid significant performance losses, these techniques should be used judiciously during parallel application execution. For example, its communication phases may be good candidates to apply the DVFS and CPU throttling without incurring a considerable performance loss. They are often considered as indivisible operations although little attention is being devoted to the energy saving potential of their algorithmic steps. In this work, two important collective communication operations, all-to-all and allgather, are investigated as to their augmentation with energy saving strategies on the per-call basis. The experiments prove the viability of such a fine-grain approach. They also validate a theoretical power consumption estimate for multicore nodes proposed here. While keeping the performance loss low, the obtained energy savings were always significantly higher than those achieved when DVFS or throttling were switched on across the entire application run. Copyright © 2012 John Wiley & Sons, Ltd.