In this paper we discuss our approach to the MPI/GPU implementation of an Interior Penalty Discontinuous Galerkin Time domain (IPDGTD) method to solve the time dependent Maxwell's equations. In our approach, we exploit the inherent DGTD parallelism and describe a combined MPI/GPU and local time stepping implementation. This combination is aimed at increasing efficiency and reducing computational time, especially for multiscale applications. The CUDA programming model was used, together with non-blocking MPI calls to overlap communications across the network. A 10× speedup compared to CPU clusters is observed for double precision arithmetic. Finally, for p = 1 basis functions, a good scalability with parallelization efficiency of 85% for up to 40 GPUs and 80% for up to 160 CPU cores was achieved on the Ohio Supercomputer Center's Glenn cluster.