Programmable graphics processing units (GPUs) offer tremendous computational resources for diverse applications. In this paper, we present an implementation of the dynamics routine of the HIRLAM weather forecast model on the NVIDIA GTX 480. The original Fortran code was converted manually to C and CUDA. We determine empirically the optimal number of grid points per thread and the best thread and block structures. A significant part of the elapsed time is spent transferring data between CPU and GPU. To reduce the impact of these transfer costs, we overlap calculation and data transfer using multiple CUDA streams. We developed an algorithm that enables our code generator CTADEL to automatically generate the optimal CUDA streams program. Experiments are performed to determine whether GPUs are useful for numerical weather prediction, in particular for the dynamics part. Copyright © 2012 John Wiley & Sons, Ltd.
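The overlap of calculation and data transfer mentioned above can be illustrated with a minimal CUDA sketch. It is not the HIRLAM dynamics code and not the program CTADEL generates: the kernel `scale_add`, the function `run_overlapped`, the block size of 256 threads, and the even split of the array across streams are all assumptions made for illustration. The idea is only that each stream copies its own chunk to the device, runs a kernel on it, and copies the result back, so that the transfer of one chunk can proceed while another chunk is being computed.

```cuda
#include <cuda_runtime.h>
#include <stdlib.h>

/* Return -1 on any CUDA error (sketch only: resources are not released on failure). */
#define CHECK(call) do { if ((call) != cudaSuccess) return -1; } while (0)

/* Hypothetical stand-in kernel, one grid point per thread; NOT the HIRLAM dynamics. */
__global__ void scale_add(const float *in, float *out, int n, float a, float b)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = a * in[i] + b;
}

/* Processes n values in nstreams chunks, one CUDA stream per chunk.
   Returns the number of wrong results (0 on success) or -1 on error. */
int run_overlapped(int n, int nstreams)
{
    if (n <= 0 || nstreams <= 0) return -1;

    size_t bytes = (size_t)n * sizeof(float);
    float *h_in = NULL, *h_out = NULL, *d_in = NULL, *d_out = NULL;

    /* Page-locked host memory is needed for copies to overlap with kernels. */
    CHECK(cudaMallocHost((void **)&h_in, bytes));
    CHECK(cudaMallocHost((void **)&h_out, bytes));
    CHECK(cudaMalloc((void **)&d_in, bytes));
    CHECK(cudaMalloc((void **)&d_out, bytes));

    for (int i = 0; i < n; ++i) h_in[i] = (float)(i % 1000);

    cudaStream_t *streams = (cudaStream_t *)malloc(nstreams * sizeof(cudaStream_t));
    if (streams == NULL) return -1;
    for (int s = 0; s < nstreams; ++s) CHECK(cudaStreamCreate(&streams[s]));

    int chunk = (n + nstreams - 1) / nstreams;  /* assumed: equal-sized chunks */
    for (int s = 0; s < nstreams; ++s) {
        int offset = s * chunk;
        int len = (n - offset < chunk) ? (n - offset) : chunk;
        if (len <= 0) continue;
        size_t cbytes = (size_t)len * sizeof(float);

        /* Copy in, compute, copy out: all queued in the same stream, so they
           run in order within the stream but may overlap with other streams. */
        CHECK(cudaMemcpyAsync(d_in + offset, h_in + offset, cbytes,
                              cudaMemcpyHostToDevice, streams[s]));
        scale_add<<<(len + 255) / 256, 256, 0, streams[s]>>>(
            d_in + offset, d_out + offset, len, 2.0f, 1.0f);
        CHECK(cudaGetLastError());
        CHECK(cudaMemcpyAsync(h_out + offset, d_out + offset, cbytes,
                              cudaMemcpyDeviceToHost, streams[s]));
    }

    /* Wait for all streams to finish before reading the results. */
    CHECK(cudaDeviceSynchronize());

    int errors = 0;
    for (int i = 0; i < n; ++i)
        if (h_out[i] != 2.0f * h_in[i] + 1.0f) ++errors;

    for (int s = 0; s < nstreams; ++s) CHECK(cudaStreamDestroy(streams[s]));
    free(streams);
    CHECK(cudaFree(d_in));
    CHECK(cudaFree(d_out));
    CHECK(cudaFreeHost(h_in));
    CHECK(cudaFreeHost(h_out));
    return errors;
}
```

Calling `run_overlapped(1 << 20, 4)` from a `main` function would process about one million values in four streams. How much overlap is actually achieved depends on the device and on the number and size of the chunks, which is why the number of streams is left as a parameter here rather than fixed.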