Docking algorithms predict the structure of protein–protein interactions. They sample the orientations of two unbound proteins to produce candidate predictions of their interaction, and then apply a scoring step to rank those predictions. We present a statistical assessment of scoring functions used to rank near-native orientations. We apply this analysis to a benchmark dataset of decoys of protein–protein complexes and assess the statistical significance of the outcome in the Critical Assessment of PRedicted Interactions (CAPRI) scoring experiment. A P value was assigned that depended on the number of near-native structures in the sampling. We studied the effect of filtering out redundant structures and tested the use of pair-potentials derived using ZDock and ZRank. Our results show that for many targets, it is not possible to determine whether a successful reranking performed by scoring functions results merely from random choice. This analysis reveals that changes should be made in the design of the CAPRI scoring experiment. We propose including the statistical assessment in this experiment at either the preprocessing or the evaluation step. Proteins 2010. © 2010 Wiley-Liss, Inc.
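
The abstract does not state how the P value is computed. As a minimal sketch of one plausible reading, assume a scorer picks a fixed number of decoys uniformly at random, without replacement, from a pool that contains a known number of near-native structures. The chance of obtaining at least one near-native pick is then a hypergeometric tail probability. The function name, its parameters, and the numbers in the usage note are hypothetical illustrations, not the authors' published procedure.

```python
from math import comb


def p_value_random_hit(total, near_native, picks, min_hits=1):
    """Probability that random selection yields at least `min_hits` near-native decoys.

    Assumption (not taken from the paper): `picks` decoys are drawn uniformly
    at random, without replacement, from `total` decoys, of which
    `near_native` are near-native. This is a hypergeometric tail probability.
    """
    if not (0 <= near_native <= total and 0 <= picks <= total):
        raise ValueError("need 0 <= near_native <= total and 0 <= picks <= total")

    # Number of equally likely ways to choose `picks` decoys from the pool.
    all_selections = comb(total, picks)

    # Count the selections that contain k near-native decoys, for every k
    # from min_hits up to the largest value that is possible.
    favourable = sum(
        comb(near_native, k) * comb(total - near_native, picks - k)
        for k in range(min_hits, min(near_native, picks) + 1)
    )
    return favourable / all_selections
```

For instance, `p_value_random_hit(10, 1, 1)` returns `0.1`: with one near-native decoy among ten and a single pick, random choice succeeds one time in ten. Under this reading, a pool rich in near-native structures gives a large P value, so a successful reranking on such a target cannot be told apart from chance.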