This paper deals with nestedness measures that are based on pairwise comparisons of sites, evaluates their performance, and suggests improvements and generalizations. There are several conceptual and technical criteria by which to judge their ecological applicability. It is of primary concern whether the measures 1) have a clear mathematical definition, 2) are influenced by the ordering of the data matrix, 3) incorporate similarity alone or similarity together with a dissimilarity component, 4) consider site pairs with identical species number negatively or positively, 5) show sensitivity to small changes in the data, and 6) are not vulnerable to type I and type II errors. We performed a detailed comparison of the nestedness metric based on overlap and decreasing fill (NODF), the percentage relativized nestedness function (PRN) and the percentage relativized strict nestedness function (PRSN), based on analytical results as well as on artificial and actual examples. We show that NODF is in fact the average Simpson similarity of sites with different species totals, and that its value depends on how the matrix is actually ordered. We modify NODF so that it always produces the maximum possible result (NODFmax), independently of the order of columns and rows. Being based on similarities, NODF and NODFmax overemphasize the overlap component of nestedness and underrate richness difference, which is also an important constituent of nested pattern in meta-community data. This latter feature is reflected adequately by PRN and PRSN. However, PRSN shares three disadvantages with NODF and NODFmax: 1) complete agreement and complete segregation in species composition are not distinguished, 2) a random matrix can have a higher value than truly nested patterns, and 3) the measures are statistically ill-conditioned. These problems stem mostly from the fact that site pairs with tied totals affect the result negatively. We emphasize that PRN is free from these difficulties.
PRN, PRSN, and NODFmax, together with mean Simpson similarity, exhibit highly similar statistical performance: they are resistant to type I and type II errors for the less constrained null models, although there are subtle differences depending on matrix fill and on the randomization algorithm. The most constrained null model, with all marginal totals fixed, makes all statistics more sensitive to type I errors, although vulnerability depends greatly on matrix fill.
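
The relationship between NODF and Simpson similarity, and the dependence of NODF on matrix order, can be illustrated with a minimal sketch. It is not the authors' implementation: it covers only the site-wise (row) component, and the function names `simpson`, `nodf_sites` and `nodf_max_sites` are hypothetical. It assumes a presence/absence matrix with sites as rows, and it reads NODFmax as letting every site pair with unequal species totals contribute its Simpson similarity, which is one interpretation of the order-independent modification described above.

```python
from itertools import combinations


def simpson(a, b):
    """Simpson similarity: shared species divided by the smaller species total."""
    shared = sum(x and y for x, y in zip(a, b))
    smaller = min(sum(a), sum(b))
    return shared / smaller if smaller else 0.0


def nodf_sites(matrix):
    """Site-wise NODF (%) in the given row order.

    A pair (i, j) with i above j contributes its Simpson similarity only
    when the upper site is strictly richer; otherwise it contributes 0.
    """
    pairs = list(combinations(range(len(matrix)), 2))
    total = sum(
        simpson(matrix[i], matrix[j])
        for i, j in pairs
        if sum(matrix[i]) > sum(matrix[j])
    )
    return 100 * total / len(pairs)


def nodf_max_sites(matrix):
    """Order-independent variant (assumed reading of NODFmax).

    Every pair with unequal totals contributes; tied pairs still count as 0.
    """
    pairs = list(combinations(range(len(matrix)), 2))
    total = sum(
        simpson(matrix[i], matrix[j])
        for i, j in pairs
        if sum(matrix[i]) != sum(matrix[j])
    )
    return 100 * total / len(pairs)


# Artificial example: a perfectly nested matrix, sites ordered by decreasing richness.
nested = [
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]
reversed_order = nested[::-1]

print(nodf_sites(nested), nodf_sites(reversed_order))          # 100.0 0.0
print(nodf_max_sites(nested), nodf_max_sites(reversed_order))  # 100.0 100.0

# Tied totals: identical and fully segregated site pairs both score 0.
print(nodf_max_sites([[1, 1, 0, 0], [1, 1, 0, 0]]))  # 0.0
print(nodf_max_sites([[1, 1, 0, 0], [0, 0, 1, 1]]))  # 0.0
```

In this sketch, reversing the row order changes NODF from 100 to 0 while the order-independent variant is unaffected, and the two tied-total cases show why complete agreement and complete segregation cannot be told apart when tied pairs are scored as zero.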