*Contribution No. 1252 from the Department of Entomology of The University of Kansas, Lawrence, Kansas, U.S.A. Expenses incurred during preparation of this paper were defrayed from grant G-21011 of the National Science Foundation, U.S.A. This investigation was supported in part by a Public Health Service research career program award (No. 3-K3-GM-z2, 021-SI) from the National Institute of General Medical Sciences to the author.
STATISTICAL METHODS IN SYSTEMATICS*
Article first published online: 21 JAN 2008
Volume 40, Issue 3, pages 337–389, August 1965
How to Cite
SOKAL, R. R. (1965), STATISTICAL METHODS IN SYSTEMATICS*. Biological Reviews, 40: 337–389. doi: 10.1111/j.1469-185X.1965.tb00806.x
- Issue published online: 21 JAN 2008
- Received 6 December 1964
From the biological viewpoint the tasks of systematics may be subdivided into analyses of infraspecific variation (both intra- and interpopulation studies), the separation of genetic from environmental effects on the phenotype, the definition of species (and possibly subspecies), the definition of supraspecific taxa, the measurement of similarity among taxa, life-history stages or organs, the measurement of evolutionary rates, the evaluation of biogeographical relationships and information-handling problems.
In all types of statistical analysis in systematics the correct choice of characters to be measured is of great importance. The sources of error and bias in measurement are listed here and ways are discussed for overcoming their effects. Ratios, frequently employed in systematics, have certain numerical disadvantages. Frequency distributions of characters provide insight into biological processes affecting the characters and they test assumptions about the distributions implied in the various statistical analyses. Simple description of single samples is carried out by statistics of location and dispersion, such as the mean and the variance.
Interpopulation variation usually involves two-dimensional comparison of geographically differing populations, but other comparisons are discussed as well. The analysis of variance is generally applicable. Care must be taken in choosing an appropriate sampling distribution for testing significance of differences among means. The frequently employed t-tests and the so-called ‘Dice-grams’ are not suitable. Graphical methods for representing geographical variation are still in need of considerable improvement. The definition of subspecies and the so-called percentage rules are critically reviewed.
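As an illustration of the analysis of variance applied to differences among population means, the following is a minimal sketch of a single-classification (one-way) analysis in Python. The function name `one_way_anova_f` and the three locality samples are hypothetical and are not taken from the paper; the measurements are invented solely to show the arithmetic.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of samples.

    Each element of `groups` is a list of measurements of one
    character taken from one population (locality).
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Sum of squares among population means, weighted by sample size.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Sum of squares of individuals around their own population mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return f_ratio, df_between, df_within


# Hypothetical measurements from three localities (illustrative only).
localities = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_ratio, df_b, df_w = one_way_anova_f(localities)
```

The resulting ratio would be compared with a critical value of the F distribution for `df_b` and `df_w` degrees of freedom. A single overall test of this kind replaces the many pairwise comparisons that repeated tests between pairs of means would require.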
Covariation between characters is studied by means of correlation and regression techniques. In correlation, individuals are randomly chosen with respect to two variables, while regression deals with the dependence of one (randomly chosen) variable upon the other, whose values are assumed to be fixed. Regression can be used in systematics descriptively (to show the relation of one variable to the other), correctively (to permit samples differing in an important causal variable to be compared for other variables), and as an explanatory device (to relate character variation to climatic or other ecological factors).
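The distinction between the two techniques can be made concrete with a short sketch in Python. The helper names (`sums_of_products`, `correlation`, `regression`) and the data are hypothetical and do not come from the paper. The correlation coefficient treats the two characters symmetrically, whereas the regression coefficient expresses the dependence of `y` upon `x`, whose values are taken as fixed.

```python
from math import sqrt


def sums_of_products(x, y):
    """Return means, sums of squares and the sum of products of x and y."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    sp_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    return mean_x, mean_y, ss_x, ss_y, sp_xy


def correlation(x, y):
    """Product-moment correlation coefficient (symmetric in x and y)."""
    _, _, ss_x, ss_y, sp_xy = sums_of_products(x, y)
    return sp_xy / sqrt(ss_x * ss_y)


def regression(x, y):
    """Least-squares slope and intercept for y regressed on fixed x."""
    mean_x, mean_y, ss_x, _, sp_xy = sums_of_products(x, y)
    slope = sp_xy / ss_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical paired measurements of two characters (illustrative only).
x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 2.0, 4.0]
r = correlation(x, y)
slope, intercept = regression(x, y)
```

For the corrective use, the residuals `y - (intercept + slope * x)` could serve as values of the second character adjusted for the first, assuming the fitted line is an adequate description of the dependence.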
Multivariate analyses permit simultaneous statistical treatment of several characters. They are laborious but the ever-growing speed and capacity of digital computers will increase their application. Resolution of covariation is carried out by analyses of covariance, partial correlations and factor analysis. It aims at understanding the interrelationships of characters among themselves. Partitioning of covariation into components representing environmental and genetic forces of various kinds helps to understand evolutionary factors in a given study.
The methods of numerical taxonomy for quantifying similarities among taxa and describing taxonomic structure are briefly reviewed. Similar methods may aid in biogeographic problems and in the measurement of rates of evolution. The discrimination of groups representing different taxa or sexes can be accomplished by means of discriminant functions.
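The idea of a discriminant function can be sketched for the simplest case of two groups measured for two characters. The code below is an illustrative implementation of a linear discriminant function based on the pooled within-group covariance matrix. It is an assumption about form rather than a reproduction of any computation in the paper, and the function names and data are hypothetical.

```python
def group_mean(sample):
    """Mean vector of a sample of (character 1, character 2) pairs."""
    n = len(sample)
    return (sum(p[0] for p in sample) / n, sum(p[1] for p in sample) / n)


def discriminant_function(group_a, group_b):
    """Return (weights, cutoff) of a two-character linear discriminant."""
    mean_a, mean_b = group_mean(group_a), group_mean(group_b)

    # Pooled within-group sums of squares and products.
    s11 = s22 = s12 = 0.0
    for sample, m in ((group_a, mean_a), (group_b, mean_b)):
        for p in sample:
            d1, d2 = p[0] - m[0], p[1] - m[1]
            s11 += d1 * d1
            s22 += d2 * d2
            s12 += d1 * d2
    df = len(group_a) + len(group_b) - 2
    s11, s22, s12 = s11 / df, s22 / df, s12 / df

    # Weights = inverse of the pooled covariance matrix times the
    # difference between the group means (explicit 2 x 2 inverse).
    det = s11 * s22 - s12 * s12
    diff = (mean_a[0] - mean_b[0], mean_a[1] - mean_b[1])
    weights = ((s22 * diff[0] - s12 * diff[1]) / det,
               (s11 * diff[1] - s12 * diff[0]) / det)

    # Cutoff = discriminant score of the midpoint between the two means.
    cutoff = (weights[0] * (mean_a[0] + mean_b[0]) / 2
              + weights[1] * (mean_a[1] + mean_b[1]) / 2)
    return weights, cutoff


def score(weights, point):
    """Discriminant score of one individual."""
    return weights[0] * point[0] + weights[1] * point[1]


# Hypothetical measurements for two taxa (illustrative only).
taxon_a = [(1.0, 2.0), (2.0, 3.0), (3.0, 3.0), (2.0, 2.0)]
taxon_b = [(5.0, 6.0), (6.0, 7.0), (7.0, 7.0), (6.0, 6.0)]
weights, cutoff = discriminant_function(taxon_a, taxon_b)
```

An individual whose score exceeds `cutoff` would be assigned to the first group and one whose score falls below it to the second. With more than two characters the same calculation requires a general matrix inversion, which is one of the computations that benefits from a digital computer.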
Profound changes in the cataloguing and data-handling aspects of systematics are resulting from the development of electronic data processing. Many statistical computations in systematics can also be appreciably accelerated by processing them on a computer.
This article was prepared during my tenure of a Fulbright professorship at Tel Aviv University in Israel. I am greatly indebted to Professor H. Mendelssohn, Chairman of the Zoology Department of that institution, for his cordial hospitality and encouragement. The following persons have read various drafts of this manuscript and their comments and criticisms are very much appreciated: Anthony J. Boyce (Oxford University), Paul R. Ehrlich, Larry G. Mason (Stanford University), K. Reuven Gabriel (Hebrew University, Jerusalem), and F. James Rohlf (University of California, Santa Barbara).