Controlled treatment outcome studies of eye movement desensitization and reprocessing (EMDR) for posttraumatic stress disorder have yielded a range of results, with efficacy varying across studies. The current study sought to determine whether these differences in outcome were related to methodological differences. The research was reviewed to identify methodological strengths, weaknesses, and empirical findings. The relationship between effect size and methodology ratings was examined using the Gold Standard (GS) Scale (adapted from Foa & Meadows, 1997). Results indicated a significant relationship between GS Scale scores and effect size: studies rated as more rigorous on the GS Scale reported larger effect sizes. There was also a significant correlation between effect size and treatment fidelity. Additional methodological components not detected by the GS Scale were identified, and suggestions were made for a Revised GS Scale. We conclude that methodological rigor removes noise and thereby decreases measurement error, allowing more accurate detection of true treatment effects in EMDR studies. © 2002 John Wiley & Sons, Inc. J Clin Psychol 58: 23–41, 2002.