Genetics and Language
The homology between the evolution of languages and of human groups has been noted for several years. The work of Cavalli-Sforza and his collaborators opened up a new aspect in the genetic analysis of populations (Cavalli-Sforza et al. 1989; Piazza et al. 1995; Cavalli-Sforza, 1997). The general result from their work was that populations are likely to be genetically distinct, on the basis of several markers, when they speak different languages. Surnames are a special part of languages, and in many populations they may be considered neutral markers linked to the Y chromosome. As such, they represent the trait d'union between language and genetics.
In recent years we have studied several aspects of population structure which are derived from the distribution of surnames. We have observed that in countries where immigration has been relatively low in the past few centuries, surnames are strictly linked to local languages, and found that the difference between surnames in different countries in Europe is mainly linguistic (Barrai et al. 2000). Surnames originated as names of trades, of phenotypic traits, of places, and as patronymics. Although Ferrari, Herrera, Smith, and Schmidt are graphically and phonetically different, they refer to the same trade, and may indicate cultural affinity among groups at their origin. Cultural and religious affinity may also be indicated by patronymics, as in the case of Johnson, Jensen, Hansen, Jovanovic and Giovannini, to cite just one name among many possibilities. The presence of similar phenotypic variants in most groups is indicated by surnames like Bianchi, Blanco, White and Weiss, whereas toponyms like Campi, Campos, Dechamps, Vandevelde and Feldmann indicate provenance from a similar ecology.
In 1996 we initiated the study of differentiation of surnames within countries, and we were able to show that differentiation often results in significant isolation by distance within the same language, and is visible when large samples are available (Barrai et al. 1997, 1999, 2002; Rodriguez-Larralde et al. 1998a, 1998b, 2000). A special case, however, was the structure of the USA, where the large differentiation among surnames is poorly related to geographical location because of recent massive immigration and population mobility, and therefore does not result in appreciable isolation by distance (Barrai et al. 2001).
In Switzerland we studied the elements of population structure in the four areas where four different languages (German, French, Italian and Räto-Romanisch) are spoken, and found isolation both within and between languages, with isolation between being much larger than within (Rodriguez-Larralde et al. 1998a). However, in Switzerland languages are strongly isolated from each other by physical barriers. For example, the Alps separate Italian from French and German, and to a lesser extent from Räto-Romanisch. As a consequence, in Switzerland there is confounding between the isolating power of language and that of physical barriers. A comparison with a country with multiple languages and no physical barriers would give an indication of the relative importance of both isolating factors.
Therefore, we now turn our attention to Belgium, a second European country where three different languages are prevalent (Flemish, French, and German) but where no physical obstacles between languages seem to exist, or at least none comparable to the barriers which exist in the Swiss Confederation. We examine the surname structure of this small kingdom as a function of the three languages which are spoken there.
We recall that the main quantity from which we derive our considerations about structure is isonymy, as defined by Crow & Mange (1965), following Darwin (1875), who observed that the frequency of isonymous marriages (between persons of the same surname) exceeded the value expected under panmixia, $\sum_i p_i^2$, where $p_i$ is the frequency of the $i$-th surname in an isolate or group. Crow & Mange noted that the proportion of marriages isonymous by descent among all marriages with inbreeding coefficient $F$ would be $I = 4F$ if all sex combinations of intermediate ancestors of the spouses were equiprobable. This was applied to the estimation of $F_{ST}$ in populations, so that $F_{ST} = \frac{1}{4}\sum_i p_i^2$ will estimate the drift that has occurred up to the present, and from which a set of parameters defining structure can be derived. The method proposed by Crow & Mange has often been used since 1965, but we do not attempt to list the references here, as they would overload this short paper. As a key reference we suggest the work of Barrai et al. (2001) on isonymy in the USA.
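As an illustration of the quantities just recalled, the following minimal Python sketch computes the isonymy expected under panmixia, the sum of squared surname frequencies, and the corresponding estimate of FST as one quarter of that sum. The function names and the four-person sample are hypothetical and serve only to make the arithmetic concrete; the sketch uses the uncorrected sum of squared frequencies and makes no claim to reproduce the estimators actually applied to the Belgian data.

```python
from collections import Counter


def random_isonymy(surnames):
    """Return sum_i p_i^2, the isonymy expected under panmixia.

    `surnames` is an iterable with one surname per sampled individual.
    """
    counts = Counter(surnames)
    n = sum(counts.values())
    if n == 0:
        raise ValueError("at least one surname is required")
    # p_i = count_i / n for each distinct surname i
    return sum((c / n) ** 2 for c in counts.values())


def fst_from_isonymy(surnames):
    """Return F_ST estimated as (1/4) * sum_i p_i^2 (from I = 4F)."""
    return random_isonymy(surnames) / 4.0


# Hypothetical sample: one surname carried twice, two carried once.
sample = ["Peeters", "Peeters", "Janssens", "Dubois"]

isonymy = random_isonymy(sample)   # (2/4)^2 + (1/4)^2 + (1/4)^2 = 0.375
fst = fst_from_isonymy(sample)     # 0.375 / 4 = 0.09375
print(isonymy, fst)
```

In a group where every individual carries a different surname the sum tends to zero as the sample grows, whereas a group sharing a single surname gives an isonymy of 1 and hence an FST of 0.25.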
Aim of the Present Work
Belgium, the country in the present study, is a legally bilingual state where Flemish (58%) and French (31%) are prevalent, with a sizeable German speaking minority (10%) (CIA Internet Site, 2002). The language structure is correlated with the geographical structure, Flemish being spoken in the North of the country, French mostly in the Centre/South, and German mainly in the East (Encyclopaedia Britannica, 1962; see also the site http://www.euro-support.be/langbel/langbel.htm). The genetic structure of Belgium was studied by Dodinval (1970), who showed, using a large blood group database, that kinship decreases exponentially with distance in the country, so that isolation by distance is a feature of the Belgian population structure. The evolution of local consanguinity from 1918 to 1959 was described by Twiesselmann et al. (1962) using Catholic marriage dispensations. We now want to test whether isolation by distance in Belgium is also detectable through the surname structure, and whether the linguistic structure of Belgium is reflected in the surname distributions. To this end, we describe the surname distribution by linguistic area, and test whether the geographic location and the linguistic properties of the population result in isolation by distance. In so doing, we further hope that our results will help to discriminate between the isolating power of languages and that of physical barriers, since in Belgium physical barriers are almost non-existent, apart, of course, from sheer distance. Finally, we attempt to establish a quantitative equivalence between the isolating power of languages and that of geographic distance.