The evolution of processors into multi-core architectures has led to the acceleration of scientific codes on a variety of highly specialized processors: multi-core central processing units (CPUs), graphics processing units (GPUs), and devices that merge both technologies on a single die. Developing parallel codes that are both scalable and portable across these architectures is challenging. To address this challenge, we investigated the acceleration of the finite-difference time-domain (FDTD) method in computational electromagnetics on modern computing architectures, namely multi-core CPUs and GPUs, through the use of the Open Computing Language (OpenCL). Extending the OpenCL parallel programming model with the Message Passing Interface (MPI) additionally allows standard distributed-memory computer clusters, as well as GPU-accelerated clusters, to be targeted. The main advantage of the developed FDTD solvers is their portability across hardware from different vendors and across highly specialized parallel computing architectures. The codes were coupled with a commercial simulation platform to evaluate the performance of the solvers in real-world industrial scenarios. Although this portability comes at the cost of slightly reduced performance (10–35%) of the OpenCL-accelerated FDTD simulations relative to native Compute Unified Device Architecture (CUDA) or Open Multiprocessing (OpenMP) implementations, benchmarking of the OpenCL FDTD solvers on distributed-memory systems shows that the communication overhead can be hidden by computation for sufficiently large simulation domains, with a scaling efficiency higher than 90%. Copyright © 2012 John Wiley & Sons, Ltd.
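The computational core of such solvers is the leapfrog Yee update, in which the magnetic and electric field components are advanced alternately over the grid; the OpenCL kernels described above parallelize exactly this kind of update across work-items. As a rough, language-agnostic illustration only (written in Python for brevity rather than OpenCL C, reduced to a normalized 1-D scheme rather than the full 3-D solver, and not taken from the authors' code), one time-stepping loop might look like:

```python
import math

def fdtd_1d(num_cells=200, num_steps=150, src_pos=50):
    """Minimal 1-D FDTD (Yee) sketch in normalized units with a unit
    Courant number, a soft Gaussian source, and PEC boundaries."""
    ez = [0.0] * num_cells  # electric field samples
    hy = [0.0] * num_cells  # magnetic field samples (staggered grid)
    for t in range(num_steps):
        # Magnetic-field update: each hy cell sees the curl of ez.
        # In the accelerated solvers, this loop body is what each
        # parallel work-item computes for one grid point.
        for i in range(num_cells - 1):
            hy[i] += ez[i + 1] - ez[i]
        # Electric-field update: each ez cell sees the curl of hy.
        for i in range(1, num_cells):
            ez[i] += hy[i] - hy[i - 1]
        # Soft (additive) Gaussian pulse source.
        ez[src_pos] += math.exp(-((t - 30) ** 2) / 100.0)
    return ez

ez = fdtd_1d()
```

In a GPU or multi-core implementation, the two inner loops become data-parallel kernels, and the MPI extension mentioned in the abstract would partition the grid into subdomains that exchange only their boundary field values each step, which is why the communication cost can be overlapped with computation on large domains.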