As the number of completed CASP (Critical Assessment of Protein Structure Prediction) experiments grows, so does the need for stable, standard methods for comparing performance in successive experiments. It is critical to develop methods for determining which areas are progressing and which are static. We have added an analysis of the CASP4 results to that previously published for CASPs 1, 2, and 3. We again use a unified difficulty scale to permit comparison of performance as a function of target difficulty in the different CASPs. The scale is used to compare performance in aligning target sequences to a structural template. There was a clear improvement in alignment quality between CASP1 (1994) and CASP2 (1996). No change is apparent between CASP2 and CASP3 (1998). There is a small, barely detectable improvement between CASP3 and the latest experiment (CASP4, 2000). Alignment remains the major source of error in all models based on less than about 30% sequence identity. Comparison of performance in the new fold modeling regime is complicated by the difficulty of devising an objective target difficulty scale. We have found limited numerical support for significant progress between CASP3 and CASP4 in this area. More subjectively, most observers are convinced that there has been substantial progress, although that progress is dominated by a single group. Proteins 2001;Suppl 5:163–170. © 2002 Wiley-Liss, Inc.