Inferring causation from non-randomized studies of exposure requires that exposure groups can be balanced with respect to prognostic factors for the outcome. Although there is broad agreement in the literature that balance should be checked, there is confusion regarding the appropriate metric. We present a simulation study that compares several balance metrics with respect to the strength of their association with bias in estimation of the effect of a binary exposure on a binary, count, or continuous outcome. The simulations utilize matching on the propensity score with successively decreasing calipers to produce datasets with varying covariate balance. We propose the post-matching C-statistic as a balance metric and found that it had consistently strong associations with estimation bias, even when the propensity score model was misspecified, as long as the propensity score was estimated with sufficient study size. This metric, along with the average standardized difference and the general weighted difference, outperformed all other metrics considered in association with bias, including the unstandardized absolute difference, Kolmogorov–Smirnov and Lévy distances, overlapping coefficient, Mahalanobis balance, and L1 metrics. Of the best-performing metrics, the C-statistic and general weighted difference also have the advantage that they automatically evaluate balance on all covariates simultaneously and can easily incorporate balance on interactions among covariates. Therefore, when combined with the usual practice of comparing individual covariate means and standard deviations across exposure groups, these metrics may provide useful summaries of the observed covariate imbalance. Copyright © 2013 John Wiley & Sons, Ltd.