In this paper we use simulations to compare the performance of new goodness-of-fit tests based on weighted statistical processes with three currently available tests: the Hosmer–Lemeshow decile-of-risk test, the Pearson chi-square test, and the unweighted sum-of-squares test. The simulations show that all tests have the correct size. For lack of fit due to an omitted quadratic term, the power of every test to detect moderate departures from linearity is close to or above 50 per cent with sample size 100 and over 90 per cent with sample size 500. All tests have low power with sample size 100 to detect lack of fit due to an omitted interaction between a dichotomous and a continuous covariate, whereas with sample size 500 the power to detect an extreme interaction exceeds 80 per cent. Power to detect an incorrect link function is low for every alternative link with sample size 100 and for most alternative links with sample size 500; it exceeds 80 per cent only with sample size 500 and an extremely asymmetric link. These simulations show that no single test, new or current, performs best in detecting lack of fit due to an omitted covariate or an incorrect link function. However, one of the new weighted tests has power comparable to the other tests in every simulated setting and has the highest power in the difficult case of an omitted interaction term. We illustrate the tests with a model for factors associated with abstinence from drug use in a randomized trial of residential treatment programmes. We conclude the paper with a summary and specific recommendations for practice. Copyright © 2002 John Wiley & Sons, Ltd.
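As a rough illustration of what the three comparison tests compute, the following is a minimal Python sketch. It assumes a fitted binary-outcome model (for example, a logistic regression) that supplies an estimated probability strictly between 0 and 1 for each subject. The function names `hosmer_lemeshow`, `pearson_chi2` and `unweighted_ss` are hypothetical. The decile-of-risk grouping here simply splits the sorted fitted probabilities into `groups` blocks of nearly equal size and ignores ties, which standard implementations may handle differently. The sketch returns only the test statistics: it does not implement the new weighted-process tests, nor the approximations needed to turn the individual-level Pearson and sum-of-squares statistics into p-values.

```python
from typing import Sequence


def hosmer_lemeshow(y: Sequence[int], p: Sequence[float], groups: int = 10) -> float:
    """Decile-of-risk statistic: sum over groups of (O - E)^2 / (n_k * pbar * (1 - pbar)).

    Assumes 0 < p[i] < 1. Subjects are sorted by fitted probability and split
    into `groups` blocks of nearly equal size; ties are not treated specially.
    """
    n = len(y)
    order = sorted(range(n), key=lambda i: p[i])
    stat = 0.0
    for k in range(groups):
        idx = order[k * n // groups:(k + 1) * n // groups]
        if not idx:  # a block can be empty when n < groups
            continue
        n_k = len(idx)
        observed = sum(y[i] for i in idx)  # observed events in the block
        expected = sum(p[i] for i in idx)  # expected events in the block
        pbar = expected / n_k
        stat += (observed - expected) ** 2 / (n_k * pbar * (1.0 - pbar))
    return stat


def pearson_chi2(y: Sequence[int], p: Sequence[float]) -> float:
    """Pearson chi-square statistic computed over individual subjects."""
    return sum((yi - pi) ** 2 / (pi * (1.0 - pi)) for yi, pi in zip(y, p))


def unweighted_ss(y: Sequence[int], p: Sequence[float]) -> float:
    """Unweighted sum of squared residuals (y - p)^2."""
    return sum((yi - pi) ** 2 for yi, pi in zip(y, p))


if __name__ == "__main__":
    # Hypothetical toy data: outcomes and fitted probabilities for 8 subjects.
    y = [1, 1, 0, 0, 1, 0, 1, 0]
    p = [0.8, 0.6, 0.3, 0.2, 0.7, 0.4, 0.5, 0.1]
    print(hosmer_lemeshow(y, p, groups=4), pearson_chi2(y, p), unweighted_ss(y, p))
```

The Hosmer–Lemeshow statistic is conventionally referred to a chi-square distribution with `groups - 2` degrees of freedom.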