Here, we summarize the assessment of protein structure refinement in CASP8. Twenty-four groups refined a total of 12 target proteins. Averaging over all groups and all proteins, there was no net improvement over the original starting models. However, there are now some individual research groups who consistently do improve protein structures relative to a starting starting model. We compare various measures of quality assessment, including (i) standard backbone-based methods, (ii) new methods from the Richardson group, and (iii) ensemble-based methods for comparing experimental structures, such as NMR NOE violations and the suitability of the predicted models to serve as templates for molecular replacement. On the whole, there is a general correlation among various measures. However, there are interesting differences. Sometimes a structure that is in better agreement with the experimental data is judged to be slightly worse by GDT-TS. This suggests that for comparing protein structures that are already quite close to the native, it may be preferable to use ensemble-based experimentally derived measures of quality, in addition to single-structure-based methods such as GDT-TS. Proteins 2009. © 2009 Wiley-Liss, Inc.