Incoherent dedispersion is a computationally intensive problem that appears frequently in pulsar and transient astronomy. For current and future transient pipelines, dedispersion can dominate the total execution time, meaning its computational speed acts as a constraint on the quality and quantity of science results. It is thus critical that the algorithm be able to take advantage of trends in commodity computing hardware. With this goal in mind, we present an analysis of the ‘direct’, ‘tree’ and ‘sub-band’ dedispersion algorithms with respect to their potential for efficient execution on modern graphics processing units (GPUs). We find all three to be excellent candidates, and proceed to describe implementations in c for cuda using insight gained from the analysis. Using recent CPU and GPU hardware, the transition to the GPU provides a speed-up of nine times for the direct algorithm when compared to an optimized quad-core CPU code. For realistic recent survey parameters, these speeds are high enough that further optimization is unnecessary to achieve real-time processing. Where further speed-ups are desirable, we find that the tree and sub-band algorithms are able to provide three to seven times better performance at the cost of certain smearing, memory consumption and development time trade-offs. We finish with a discussion of the implications of these results for future transient surveys. Our GPU dedispersion code is publicly available as a c library at http://dedisp.googlecode.com/.