Cladistic analysis of languages: Indo-European classification based on lexicostatistical data

Authors

  • Kateřina Rexová,

    Corresponding author
    1. Department of Zoology, Charles University, Viničná 7, CZ-128 44 Praha 2, Czech Republic
    2. Department of Philosophy and History of Sciences, Charles University, Viničná 7, CZ-128 44 Praha 2, Czech Republic
  • Daniel Frynta,

    1. Department of Zoology, Charles University, Viničná 7, CZ-128 44 Praha 2, Czech Republic
  • Jan Zrzavý

    1. Department of Zoology, Faculty of Biological Sciences, University of South Bohemia, Branišovská 31, CZ-370 05 České Budějovice, Czech Republic

Corresponding author. E-mail address: kloskacka@centrum.cz (K. Rexová).

Abstract

The phylogeny of the Indo-European (IE) language family is reconstructed by applying cladistic methodology to the lexicostatistical dataset collected by Dyen (about 200 meanings, 84 speech varieties, with the Hittite language used as a functional outgroup). Three different methods of character coding provide trees that show: (a) the presence of four groups, viz., a Balto-Slavonic clade, a Romano-Germano-Celtic clade, an Armenian-Greek group, and an Indo-Iranian group (the last two groups possibly paraphyletic); (b) the unstable position of the Albanian language; (c) the unstable pattern of the basalmost IE differentiation; but (d) the probable existence of the Balto-Slavonic–Indo-Iranian (“satem”) and the Romano-Germano-Celtic (+Albanian?) superclades. These results are compared with those of the phenetic approach to lexicostatistical data, which are significantly less informative concerning the basal pattern. The results suggest a predominantly branching pattern of basic vocabulary phylogeny and little borrowing of individual words. Different scenarios of IE differentiation based on archaeological and genetic information are discussed.
