Strictly proper scoring rules are considered as an alternative to the traditional error rate measures for evaluating univariate quantitative diagnostic tests. In principle, the posterior probability of disease is estimated as a function of the test value from training observations, and the score is then assessed on a set of test samples. The same subjects may serve as both training and test samples if a bootstrap procedure is used to estimate standard errors and to correct for bias. The method is demonstrated using serum bile acids and bilirubin in patients with liver disease. For some typical situations, the power to detect a difference between two tests when comparing their scores is contrasted with the power obtained from error rate measures.
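
The procedure outlined above can be illustrated with a minimal sketch. The abstract does not specify how the posterior probability is modelled, which scoring rules are used, or which bootstrap variant is applied, so the following choices are assumptions made for illustration only: a one-variable logistic model for the posterior probability, the Brier and logarithmic scores as two examples of strictly proper scoring rules (both expressed as losses, so lower is better), and an Efron-style optimism correction in which each bootstrap resample acts as the training set and the original sample as the test set. All function names are hypothetical, and the data are synthetic, not the bile acid or bilirubin measurements.

```python
import numpy as np


def fit_logistic(x, y, n_iter=50):
    """Fit P(disease | x) = 1 / (1 + exp(-(a + b*x))) by Newton-Raphson.

    Assumption: a logistic form for the posterior; the source does not
    state which model is used.
    """
    X = np.column_stack([np.ones(len(x)), x])
    beta = np.zeros(2)
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-np.clip(X @ beta, -30, 30)))
        w = p * (1.0 - p)
        # Small ridge term keeps the Hessian invertible in awkward resamples.
        hessian = X.T @ (X * w[:, None]) + 1e-8 * np.eye(2)
        step = np.linalg.solve(hessian, X.T @ (y - p))
        beta = beta + step
        if np.max(np.abs(step)) < 1e-10:
            break
    return beta


def predict(beta, x):
    """Estimated posterior probability of disease at test value x."""
    return 1.0 / (1.0 + np.exp(-np.clip(beta[0] + beta[1] * x, -30, 30)))


def brier_score(y, p):
    """Quadratic (Brier) score as a loss: mean squared error of p."""
    return np.mean((y - p) ** 2)


def log_score(y, p):
    """Logarithmic score as a loss: mean negative log-likelihood."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))


def bootstrap_corrected_score(x, y, score, n_boot=200, seed=0):
    """Return (apparent score, optimism-corrected score, bootstrap SE).

    Assumption: Efron-style optimism correction; the SE is taken as the
    standard deviation of the apparent score over bootstrap resamples.
    """
    rng = np.random.default_rng(seed)
    n = len(y)
    apparent = score(y, predict(fit_logistic(x, y), x))
    optimism, boot_apparent = [], []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample subjects with replacement
        xb, yb = x[idx], y[idx]
        if yb.min() == yb.max():  # skip resamples containing a single class
            continue
        beta_b = fit_logistic(xb, yb)
        app_b = score(yb, predict(beta_b, xb))  # trained and scored on resample
        test_b = score(y, predict(beta_b, x))   # scored on the original sample
        optimism.append(test_b - app_b)
        boot_apparent.append(app_b)
    corrected = apparent + np.mean(optimism)
    se = np.std(boot_apparent, ddof=1)
    return apparent, corrected, se


# Synthetic illustration: diseased subjects have test values shifted by 1 SD.
rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=200).astype(float)
x = rng.normal(loc=y, scale=1.0)
for name, score in [("Brier", brier_score), ("logarithmic", log_score)]:
    apparent, corrected, se = bootstrap_corrected_score(x, y, score)
    print(f"{name}: apparent={apparent:.4f} corrected={corrected:.4f} SE={se:.4f}")
```

Because the model is fitted and scored on the same subjects, the apparent score tends to be optimistic; the bootstrap term estimates that optimism and adds it back. Comparing two diagnostic tests would amount to running the same procedure for each test variable on the same subjects and examining the difference in corrected scores.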