Risk-ranking protocols are used widely to classify the conservation status of the world's species. Here we report on the first empirical assessment of their reliability by using a retrospective study of 18 pairs of bird and mammal species (one species extinct and the other extant) with eight different assessors. The performance of individual assessors varied substantially, but performance was improved by incorporating uncertainty in parameter estimates and consensus among the assessors. When this was done, the ranks from the protocols were consistent with the extinction outcome in 70–80% of pairs and there were mismatches in only 10–20% of cases. This performance was similar to the subjective judgements of the assessors after they had estimated the range and population parameters required by the protocols, and better than any single parameter. When used to inform subjective judgement, the protocols therefore offer a means of reducing unpredictable biases that may be associated with expert input and have the advantage of making the logic behind assessments explicit. We conclude that the protocols are useful for forecasting extinctions, although they are prone to some errors that have implications for conservation. Some level of error is to be expected, however, given the influence of chance on extinction. The performance of risk assessment protocols may be improved by providing training in the application of the protocols, incorporating uncertainty in parameter estimates and using consensus among multiple assessors, including some who are experts in the application of the protocols. Continued testing and refinement of the protocols may help to provide better absolute estimates of risk, particularly by re-evaluating how the protocols accommodate missing data.