COEFFICIENTS OF ASSOCIATION AND SIMILARITY, BASED ON BINARY (PRESENCE-ABSENCE) DATA: AN EVALUATION
Article first published online: 21 JAN 2008
Volume 57, Issue 4, pages 669–689, November 1982
How to Cite
HUBÁLEK, Z. (1982), COEFFICIENTS OF ASSOCIATION AND SIMILARITY, BASED ON BINARY (PRESENCE-ABSENCE) DATA: AN EVALUATION. Biological Reviews, 57: 669–689. doi: 10.1111/j.1469-185X.1982.tb00376.x
- Issue published online: 21 JAN 2008
- Received 29 September 1981
Forty-three association (similarity) coefficients were collected and evaluated in this survey. Some of them are synonyms or direct correlates of earlier described indices (A8, A9, A12, A31, A33); others are mere transforms from one range of values to another (A10, A24, A33). Several coefficients are incompatible with the suggested admissibility conditions of minimum-maximum value (A13, A16, A27, A28, A29, A31), symmetry (A1, A2, A13, A16, A26), discrimination between positive and negative association (A27, A28, A31) or monotonicity with χ² (A19 to A24); A17 yields very low and erratic values.
As a result, 23 coefficients were excluded and the remaining 20 measures were subjected to an empirical trial on interspecific association data among fungi of the genus Chaetomium, using cluster analysis. The classification produced five main clusters of related coefficients, with several subgroups. It was then demonstrated that representative indices from different clusters yield different dendrograms of interspecific association among Chaetomium; A34, A14, and possibly also A36 and A40, seemed to be less sensible. A set of measures that generally work well (at least for interspecific association) comprises A4 (Jaccard), A4 (Dice-Sørensen), A7 (Kulczyński), A11 (Driver-Kroeber-Ochiai) and, with some reservation, A30 (Pearson tetrachoric) and A32 (Baroni-Urbani-Buser). For some purposes, however, other 'admissible' coefficients would be more suitable, and the choice of a measure should be related to the nature of the data. It is tentatively suggested that three or so alternative coefficients be used and the results compared on the same data; moreover, significance tests on association should be carried out whenever possible.
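To illustrate the suggestion of comparing several coefficients on the same data, here is a minimal Python sketch that computes four of the recommended measures from a pair of presence-absence vectors. It assumes the commonly cited textbook forms of the Jaccard, Dice-Sørensen, Kulczyński and Ochiai coefficients, written in the usual 2×2 notation (a = both present, b = first only, c = second only, d = both absent). The abstract does not give the paper's own definitions, so these formulas, the function names and the sample vectors are illustrative assumptions rather than the author's exact formulations.

```python
from math import sqrt

def contingency(x, y):
    """Return (a, b, c, d) for two equal-length 0/1 sequences.

    a: present in both, b: present only in x,
    c: present only in y, d: absent from both.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    a = sum(1 for i, j in zip(x, y) if i and j)
    b = sum(1 for i, j in zip(x, y) if i and not j)
    c = sum(1 for i, j in zip(x, y) if not i and j)
    d = len(x) - a - b - c
    return a, b, c, d

# Common textbook forms (assumed here, not quoted from the paper).
# None of these four uses d, the count of joint absences.
def jaccard(a, b, c):
    return a / (a + b + c) if (a + b + c) else float("nan")

def dice_sorensen(a, b, c):
    return 2 * a / (2 * a + b + c) if (a + b + c) else float("nan")

def kulczynski(a, b, c):
    # Mean of the two conditional proportions a/(a+b) and a/(a+c).
    if (a + b) == 0 or (a + c) == 0:
        return float("nan")
    return 0.5 * (a / (a + b) + a / (a + c))

def ochiai(a, b, c):
    # Geometric-mean counterpart of the Kulczynski form above.
    if (a + b) == 0 or (a + c) == 0:
        return float("nan")
    return a / sqrt((a + b) * (a + c))

# Hypothetical presence-absence records for two species over eight samples.
species_x = [1, 1, 1, 0, 0, 1, 0, 0]
species_y = [1, 1, 0, 1, 0, 1, 0, 0]

a, b, c, d = contingency(species_x, species_y)
for name, fn in [("Jaccard", jaccard), ("Dice-Sorensen", dice_sorensen),
                 ("Kulczynski", kulczynski), ("Ochiai", ochiai)]:
    print(f"{name}: {fn(a, b, c):.3f}")
```

For these sample vectors the table is a = 3, b = 1, c = 1, d = 3. Jaccard gives 0.600 while the other three give 0.750, a small instance of how the choice of coefficient changes the apparent strength of association on identical data.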