1. Top of page
  2. Summary
  3. Introduction
  4. Material and Methods
  5. Results and Discussion
  6. Conclusions
  7. Acknowledgments
  8. References

The isonymy structure of trilingual Belgium was studied using the surname distributions for 1,118,004 private telephone users. The users were distributed in 77 Flemish, 76 French, and 3 German speaking towns, selected on a geographic basis to form an approximately regular grid over Belgium. Lasker's distance was found to be considerably higher between languages than within languages. For the whole of Belgium, irrespective of language, it was highly correlated with linear geographic distance, with r = 0.721±0.014, which is the highest correlation observed in European countries to date. Within Belgium and within languages, the correlation was highest among the Flemish (r = 0.878 ± 0.007), and lowest among the French (r = 0.631±0.020). Isolation by distance in Belgium is the highest we have found in Europe, and as high as in Switzerland where the different languages are separated by geographical barriers. This is not the case in Belgium, so that the considerable isolating power of languages emerges clearly from the present analysis. From the comparison of Lasker's distance between (9.48) and within (8.16) languages, and from its regression over geographic distance (b = 0.01206), it was possible to establish a quantitative relationship between the isolating power of languages and that of geographic distance as (9.48–8.16)/0.01206 = 109 kilometres. This transformation of language distance into an equivalent geographic distance, given here for Belgium, can be applied to any similar geo-linguistic situation.


Introduction

Genetics and Language

The homology between the evolution of languages and of human groups has been noted for several years. The work of Cavalli-Sforza and his collaborators opened up a new aspect in the genetic analysis of populations (Cavalli-Sforza et al. 1989; Piazza et al. 1995; Cavalli-Sforza, 1997). The general result from their work was that populations are likely to be genetically distinct, on the basis of several markers, when they speak different languages. Surnames are a special part of languages, and in many populations they may be considered neutral markers linked to the Y chromosome. As such, they represent the trait d'union between language and genetics.

In recent years we have studied several aspects of population structure which are derived from the distribution of surnames. We have observed that in countries where immigration has been relatively low in the past few centuries, surnames are strictly linked to local languages, and found that the difference between surnames in different countries in Europe is mainly linguistic (Barrai et al. 2000). Surnames originated as names of trades, of phenotypic traits, of places, and as patronymics. Although Ferrari, Herrera, Smith, and Schmidt are graphically and phonetically different, they refer to the same trade, and may indicate cultural affinity among groups at their origin. Cultural and religious affinity may also be indicated by patronymics, as in the case of Johnson, Jensen, Hansen, Jovanovic and Giovannini, just to quote a single surname from a number of possibilities. The presence of similar phenotypic variants in most groups is indicated by surnames like Bianchi, Blanco, White and Weiss, whereas toponyms like Campi, Campos, Dechamps, Vandevelde and Feldmann indicate provenience from a similar ecology.

In 1996 we initiated the study of differentiation of surnames within countries, and we were able to show that differentiation often results in significant isolation by distance within the same language, and is visible when large samples are available (Barrai et al. 1997, 1999, 2002; Rodriguez-Larralde et al. 1998a, 1998b, 2000). A special case, however, was the structure of the USA, where large differentiation among surnames is poorly related to geographical location due to recent massive immigration and population mobility, hence not resulting in the relevant isolation by distance (Barrai et al. 2001).

In Switzerland we studied the elements of population structure in the four areas where four different languages (German, French, Italian and Räto-Romanisch) are spoken, and found isolation both within and between languages, with isolation between being much larger than within (Rodriguez-Larralde et al. 1998a). However, in Switzerland languages are strongly isolated from each other by physical barriers. For example, the Alps separate Italian from French and German, and to a lesser extent from Räto-Romanisch. As a consequence, in Switzerland there is confounding between the isolating power of language and that of physical barriers. A comparison with a country with multiple languages and no physical barriers would give an indication of the relative importance of both isolating factors.

Therefore, we now turn our attention to Belgium, a second European country where three different languages are prevalent (Flemish, French, and German) but where no physical obstacles between languages seem to exist, at least none comparable to the barriers which exist in the Swiss Confederation. We examine the surname structure of this small kingdom as a function of the three languages which are spoken there.

We recall that the main quantity from which we derive our considerations about structure is isonymy, as defined by Crow & Mange (1965), following Darwin (1875) who observed that the frequency of isonymous marriages (between persons of the same surname) exceeded the value expected under panmixia, $\sum_i p_i^2$, where $p_i$ is the frequency of the i-th surname in an isolate or group. Crow & Mange noted that the proportion of marriages isonymous by descent among all marriages with inbreeding coefficient F would be I = 4F if all sex combinations of intermediate ancestors of the spouses were equiprobable. This was applied to the estimation of $F_{ST}$ in populations, so that $F_{ST} = \tfrac{1}{4}\sum_i p_i^2$ will estimate the drift that has occurred up to the present, and from which a set of parameters defining structure can be derived. The method proposed by Crow & Mange was often used after 1965, but we do not attempt here to propose a list of references which would overload this short paper. We suggest for key references the work of Barrai et al. (2001) on isonymy in the USA.
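As a minimal sketch of the two quantities just defined, the following Python fragment computes random isonymy, I = Σ p_i², and the derived F_ST = I/4 from a table of surname counts. The surname counts and the function names are invented for illustration; they are not data or code from this study.

```python
from collections import Counter


def random_isonymy(counts):
    """Random isonymy I = sum_i p_i**2 for a mapping surname -> count."""
    total = sum(counts.values())
    return sum((n / total) ** 2 for n in counts.values())


def fst_from_isonymy(isonymy):
    """Crow & Mange relation I = 4F, rearranged to F_ST = I / 4."""
    return isonymy / 4.0


# Hypothetical group of 10 persons (illustrative only).
toy = Counter({"Janssens": 4, "Peeters": 3, "Maes": 2, "Dubois": 1})

I = random_isonymy(toy)        # 0.4**2 + 0.3**2 + 0.2**2 + 0.1**2
F_st = fst_from_isonymy(I)
print(f"I = {I:.3f}, F_ST = {F_st:.3f}")
```

With these toy counts the squared frequencies sum to 0.30, so F_ST comes out at 0.075.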

Aim of the Present Work

Belgium, the country in the present study, is a legally bilingual state where Flemish (58%) and French (31%) are prevalent, with a sizeable German speaking minority (10%) (CIA Internet Site, 2002). The language structure is correlated with the geographical structure, Flemish being spoken in the North of the country, French mostly in the Centre/South, and German mainly in the East (Encyclopaedia Britannica, 1962; see also the site …). The genetic structure of Belgium was studied by Dodinval (1970) who showed, using a large blood group database, that kinship decreases exponentially over distance in the country, so that isolation by distance is a trait of the Belgian population structure. The evolution of local consanguinity from 1918 to 1959 was described by Twiesselmann et al. (1962) using Catholic marriage dispensations. We now want to test whether isolation by distance is also perceptible in Belgium through the surname structure, and whether the linguistic structure of Belgium is reflected by the surname distributions. To this end, we describe the surname distribution by linguistic area, and test whether the geographic location and the linguistic properties of the population result in isolation by distance. In so doing, we further hope that our results, if any, will help to discriminate between the isolating power of languages and that of physical barriers, since in Belgium these are almost non-existent except, of course, for sheer distance. Finally, we attempt to establish a quantitative equivalence between the isolating power of languages and that of geographic distance.

Material and Methods



The population of Belgium in 1998 was estimated to be 10.2 million distributed over 30,510 square kilometres of prevalently flat land, with low hills in the East (CIA Internet Site 2002). In the present study, we include 1.12 million persons who are private telephone users, and thus most likely are over 14 years of age; in Belgium there are about 8.4 million persons above that age (CIA, ibid.) so that we are using a 13.3 per cent sample of the adult Belgian population. In 2000, we obtained a copy of the Telephone and Address book of Belgium in a CD-ROM produced by a commercial firm. The CD-ROM contains the files of all 4.5 million users registered by the telephone companies in the country for 1999. In order to assess isonymy structure as a function of geography, we selected 156 towns in a roughly regular grid across Belgium (Figure 1). Since each record in the database indicates if the user is non-private, we directly downloaded the private users for each town; hence we downloaded 1,118,004 private users for the 156 towns. In 77 of the towns, the prevalent language is Flemish, in 76 French, and in 3 German. For the towns selected the average distance between nearest neighbours is 11.2 km with a standard deviation of 3.2 km. The prevalent language, the location, the sample size, the number of different surnames and isonymy parameters for the towns analysed are given in Table 1.


Figure 1. Location of the Belgian towns included in the analysis. The polygons are obtained by Voronoi tessellation. The identification codes are as in Table 1. The three clusters delimited by dotted lines running from North to South result from the dendrogram obtained using the matrix of Lasker's distances. The thickest line separates languages.


Table 1.  Name of town, horizontal co-ordinate, X, and vertical co-ordinate, Y, sample size, N, number of different surnames, S, Fisher's α, Karlin's ν, random isonymy, I, identification code, C, and prevalent language, P (L = Flemish, R = French, G = German), of 156 Belgian towns. Alphabetic ordering
Town | X | Y | N | S | α | ν | I | C | P
De Haan | 10.2 | 55.40 | 3524 | 2326 | 1937 | 0.3547 | 0.000516 | 36 | L
De Panne | 2.6 | 50.60 | 3066 | 1989 | 1586 | 0.3409 | 0.000631 | 38 | L
Habay la Neuve | 56.7 | 12.90 | 836 | 554 | 445 | 0.3474 | 0.002247 | 53 | R
Knokke Heist | 14.5 | 57.50 | 13315 | 6506 | 1955 | 0.1280 | 0.000512 | 72 | L
La Roche en Ardent | 55.2 | 25.50 | 547 | 412 | 558 | 0.5050 | 0.001792 | 76 | R
Middelkerke Bad | 6.4 | 53.00 | 4802 | 2969 | 2022 | 0.2963 | 0.000495 | 93 | L
Nieuwpoort Bad | 4.6 | 51.90 | 2669 | 1689 | 1281 | 0.3243 | 0.000781 | 101 | L
St. Hubert | 51.6 | 21.00 | 601 | 343 | 232 | 0.2785 | 0.004310 | 126 | R
St. Niklaas | 29.5 | 52.30 | 16867 | 3996 | 586 | 0.0336 | 0.001706 | 127 | L
St. Sauveur | 20.0 | 39.50 | 621 | 463 | 630 | 0.5036 | 0.001587 | 129 | R
St. Vith | 64.8 | 28.50 | 1584 | 689 | 326 | 0.1707 | 0.003067 | 130 | G
Trois Ponts | 60.2 | 31.00 | 434 | 333 | 589 | 0.5758 | 0.001698 | 138 | R
Oost Vleteren | 5.1 | 45.90 | 744 | 423 | 381 | 0.3387 | 0.002625 | 144 | L
Belgium | | | 1118004 | 137380 | 4646 | 0.0042 | 0.000215 | |


Based on the surname distribution, the random isonymy between towns I and J ($I_{ij}$) was estimated as $I_{ij} = \sum_k p_{ki}\,p_{kj}$, where $p_{ki}$ and $p_{kj}$ are the relative frequencies of surname k in towns I and J respectively; the sum is over all surnames. Unbiased random isonymy within towns ($I_{ii}$), Alpha (α, the effective number of surnames), which is an index of surname heterogeneity (Fisher, 1943), and ν (Karlin & McGregor, 1967), which is an indicator of migration rate, were estimated according to Rodriguez-Larralde et al. (1994). At this point, we want to underline a possible cause of misunderstanding due to the use of the same symbol for different quantities. In isonymy, Fisher's α is used as a measure of surname heterogeneity; it is the inverse of isonymy, $\alpha = 1/I = 1/(4F_{ST})$. In inbreeding theory the total level of consanguinity in a group is also called ‘Bernstein α’ and it is the same as the Wrightian symbol $F_{IT}$. However, since in their work on consanguinity in Belgium Twiesselmann et al. (1962) use Bernstein α, and we refer to their work, it seems useful to state that, in isonymy,

  $\alpha_{\mathrm{Bernstein}} = F_{IT} \approx F_{ST} = \dfrac{1}{4\,\alpha_{\mathrm{Fisher}}}$

inasmuch as FST is the main component of the total level of inbreeding (or, faute de mieux, the only component which can be estimated).
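The within-town parameters can be sketched as follows. The estimators used here, I = Σ p_k² − 1/N for unbiased isonymy and ν = α/(N + α), are assumptions about the formulas of Rodriguez-Larralde et al. (1994), which the text cites without restating. The second relation does agree with the tabulated values, for example α = 1937 and N = 3524 for De Haan in Table 1. The toy surname counts and the function names are invented for illustration.

```python
def isonymy_parameters(counts):
    """Return (I, alpha, nu) for a mapping surname -> count."""
    n = sum(counts.values())
    # Assumed unbiased estimator: sum of squared frequencies minus 1/N.
    i_unbiased = sum((c / n) ** 2 for c in counts.values()) - 1.0 / n
    alpha = 1.0 / i_unbiased       # Fisher's alpha, the inverse of isonymy
    nu = alpha / (n + alpha)       # assumed Karlin-McGregor relation
    return i_unbiased, alpha, nu


def nu_from_alpha(alpha, n):
    """Assumed relation nu = alpha / (N + alpha)."""
    return alpha / (n + alpha)


# Hypothetical town of 100 users carrying three surnames.
i_toy, alpha_toy, nu_toy = isonymy_parameters({"A": 50, "B": 30, "C": 20})
print(f"I = {i_toy:.3f}, alpha = {alpha_toy:.2f}, nu = {nu_toy:.4f}")

# Consistency check against two rows of Table 1 (alpha and N as tabulated).
print(round(nu_from_alpha(1937, 3524), 4))   # De Haan, tabulated nu 0.3547
print(round(nu_from_alpha(326, 1584), 4))    # St. Vith, tabulated nu 0.1707
```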

However, for an ordered review of the quantities which can be derived from the surname distributions see Relethford (1988). Note that $I_{ij}$ is twice Lasker's coefficient of relationship, $R_i$ (Lasker, 1977). We note also that $I_{ij}$ is a function of the kinship $\phi_{ij}$ between towns I and J, as shown by Rodriguez-Larralde et al. (1998a). In practice, we calculate the linear correlation of Lasker's distance between towns I and J with their geographic distance. Lasker's distance is defined as $-\log(I_{ij})$ (Rodriguez-Larralde et al. 1998a). The linear geographic distance between towns I and J was estimated using the co-ordinates of each town on a map with a scale of 1:400,000. The significance of correlations was assessed using a permutation method (Smouse et al. 1986). A dendrogram was constructed from the matrix of Lasker's distances with the neighbour joining method (Saitou & Nei, 1987), using the software PHYLIP (Felsenstein, 1989, 1993). Such a dendrogram, representing the surname relationships between different towns of Belgium, shows affinities but does not imply origins. We do not show it here but it is available from the corresponding author.
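The distance calculation and the permutation test described above can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' program: the four towns, their positions and their surname frequencies are invented, and the test shown is a simple label-shuffling scheme in the spirit of Smouse et al. (1986) rather than a reproduction of their exact procedure.

```python
import math
import random


def isonymy_between(p_i, p_j):
    """Random isonymy I_ij = sum_k p_ki * p_kj (only shared surnames contribute)."""
    return sum(f * p_j[s] for s, f in p_i.items() if s in p_j)


def lasker_distance(p_i, p_j):
    """Lasker's distance -log(I_ij); raises ValueError if no surname is shared."""
    return -math.log(isonymy_between(p_i, p_j))


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_test(d_surname, d_geo, n_perm=999, seed=0):
    """Correlate two distance matrices; the one-sided p-value comes from
    shuffling the town labels of the geographic matrix."""
    n = len(d_surname)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    x = [d_surname[i][j] for i, j in pairs]
    r_obs = pearson(x, [d_geo[i][j] for i, j in pairs])
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        perm = list(range(n))
        rng.shuffle(perm)
        r = pearson(x, [d_geo[perm[i]][perm[j]] for i, j in pairs])
        hits += r >= r_obs
    return r_obs, (hits + 1) / (n_perm + 1)


# Hypothetical towns on a line (positions in km) with drifting frequencies.
positions = [0.0, 10.0, 20.0, 30.0]
towns = [
    {"A": 0.6, "B": 0.3, "C": 0.1},
    {"A": 0.4, "B": 0.4, "C": 0.2},
    {"A": 0.2, "B": 0.4, "C": 0.4},
    {"A": 0.1, "B": 0.3, "C": 0.6},
]
n = len(towns)
d_surname = [[0.0 if i == j else lasker_distance(towns[i], towns[j])
              for j in range(n)] for i in range(n)]
d_geo = [[abs(positions[i] - positions[j]) for j in range(n)] for i in range(n)]

r_obs, p_value = permutation_test(d_surname, d_geo)
print(f"r = {r_obs:.3f}, p = {p_value:.3f}")
```

With only four towns there are just 24 distinct relabellings, so the p-value here is coarse; with 156 towns the permutation distribution is far richer.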

Results and Discussion


Distribution of the Users

All of Belgium

The number of private users per town ranged in size from 48 in the tiny village of Champlon to 150,772 in the metropolis of Brussels (Table 1). The minimum number of different surnames was 45, again in Champlon, and the maximum number was 55,905 in Brussels. The number of different surnames for the whole sample was 137,380. The users were not distinguished by sex. The log-log frequency distribution of the occurrence of surnames (Fox & Lasker, 1983) is given in Figure 2a. The graph is fairly linear, as we have observed in other European countries. We do not perceive in Belgium the excess of surnames with intermediate occurrence which seems typical of Holland (Barrai et al. 2002). This is true also for the Flemish, the French, and the German speaking areas (Figure 2b). We fitted a straight line to the log-log frequency distribution for the whole of Belgium, and for the three different languages. The regression slope was −1.54±0.02 for Belgium, −1.68±0.04 for French, −1.50±0.03 for Flemish, and −1.97±0.07 for the German speaking towns. These values indicate that immigration is lowest in the Flemish speaking part of Belgium. For the German area, since we had only three towns, we calculated the regression of Lasker's distance between the 3 towns and the 153 remaining towns on their geographic distance. Thus, the regression is not language independent, and our considerations for the German speaking area are only indicative.
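A straight-line fit to the log-log distribution of surname occurrences can be sketched as below. Ordinary unweighted least squares is an assumption here, since the text does not state how the line was fitted, and the surname counts are synthetic, built so that the number of surnames occurring k times follows k^(−1.5) exactly.

```python
import math
from collections import Counter


def occurrence_spectrum(surname_counts):
    """Map k -> number of distinct surnames carried by exactly k persons."""
    return Counter(surname_counts.values())


def loglog_slope(spectrum):
    """Least-squares slope of log10(number of surnames) on log10(occurrence)."""
    xs = [math.log10(k) for k in spectrum]
    ys = [math.log10(n) for n in spectrum.values()]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Synthetic data: 512 surnames occur once, 64 occur 4 times,
# 8 occur 16 times and 1 occurs 64 times (an exact power law).
toy_counts = {}
for occurrence, n_surnames in [(1, 512), (4, 64), (16, 8), (64, 1)]:
    for idx in range(n_surnames):
        toy_counts[f"surname_{occurrence}_{idx}"] = occurrence

slope = loglog_slope(occurrence_spectrum(toy_counts))
print(f"slope = {slope:.2f}")
```

The synthetic slope of −1.5 is close to the value reported for the Flemish area, which is a coincidence of the toy construction and not a fit to the Belgian data.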


Figure 2. Distribution of the occurrence of surnames in Belgium, log-log scale. 2a) All Belgium 2b) occurrences in the Flemish speaking area, in the French area, and in the prevalently German area.


Belgian surnames are somewhat different from those of other countries, in the sense that patronymics and locatives dominate, whereas trade names and physical traits are not the most frequent surnames. Several patronymics are among the 100 surnames which are most frequent in the sample. We list them in Table 2. These 100 surnames comprise 111,733 users. The list is the result of pooling the three languages, since in our study we include 791,099 Flemish, 320,714 French, and 6,191 German speaking users, and group Brussels with the Flemish-speaking towns.

Table 2.  The 100 most frequent Belgian surnames in a sample of 1,118,004 telephone users

Among these 100 most frequent surnames the first is Janssens, followed by Peeters, Desmet, Maes and Devos. After these come Jacobs, Mertens, Declercq, Vandamme and Vandevelde. Janssens (with its graphic variant Janssen), Peeters and Jacobs are patronymics, while Maes is a locative from the river Meuse, or Maas in Flemish. Vandamme (from the dam) and Vandevelde (from the field) are locatives. Declercq is French. In the 100 most frequent surnames, we counted 26 clearly French, and 19 French-Flemish crossovers like Desmet, Devos, Dejonghe, Dewitte, Dewilde and several others beginning with the prefix De-. The other 55 seem clearly Flemish, 10 among them beginning with the prefix Van. However, a clearer picture emerges when we study the surnames within linguistic areas, as given in Table 3. In the Table, we list only the 50 most frequent surnames.

Table 3.  The 50 most frequent Belgian surnames in the three areas speaking Flemish (791,099 users including Brussels), French (320,714) and German (6,191)
Flemish French German 
Flemish Speaking Belgium

In this area, as for the whole country, the most frequent surname is Janssens, followed by Peeters, Desmet and Maes. Then come Devos, Jacobs, Mertens and Vandamme. As we said, Desmet and Devos are French-Flemish crossovers, as are the other surnames beginning with the prefix De-. Smet and Smets indicate a locative from a wet terrain, and Vos is simply fox. Although the surnames prefixed with De- may indicate immigration of French speaking Belgians into the Flemish area, the indication they give is not completely clear, whereas it is clearly given by the presence of Declercq and its graphic variant Declerck in the 50 most frequent surnames of this area. Overall, we count only three clearly French surnames (Declercq, Declerck, Debruyne) in the 50 most frequent surnames in the Flemish speaking part of Belgium. We perceive this as an indication that migration from the South of the country to the Northern area is not intense.

French Speaking Belgium

In the French speaking area, which is the Walloon or Southern half of Belgium, the most frequent surnames are Dubois, Lambert, Leclercq (here the determinative article has stayed and was not substituted by the prefix De-), along with Martin, Dupont and Dumont, in this order. Gerard is seventh from the top, followed by Leroy, Simon and Renard (compare the frequency of Renard, which means fox, with the frequency of Devos, which means ‘of the fox’). There are no clearly Flemish surnames amongst the fifty most frequent in the French area: we find Devos at 43rd place, and Desmet at 28th place. There is no flag surname which can clearly indicate North-South migration, in the way that Declercq indicates South-North, so that also movement from the North to the South of Belgium seems minor.

German Speaking Belgium

In our sample, we considered only three towns where German (really, a German dialect) is the prevalent language. They are Büllingen, Eupen and Sankt Vith, all of them at the far eastern border of Belgium. There were only 6,191 users in the three towns. The most frequent surnames are Schmitz, Schroder, Muller, Schumacher and Stoffels. Then come Radermacher, Meyer, Theissen, Peters and Henkes. Schmitz is a variant of the family Schmidt. Schroder and Muller have lost the umlaut over the first vowel. Theissen, Theisen and Theißen are common German names, and Theissen belongs to this family. The first and only French surname in the fifty most frequent of the German speaking area is Faymonville (a locative; there is a small town with that name in the area). Peters, Mertens and Niessen are frequent also in the Flemish speaking part of Belgium. Again, it is not easy to ascertain migration from either the Flemish or the French section of the population into the German part. The French surname Faymonville is a case in point, since there are only 29 persons in our sample with that surname, of which 6 are found in the rest of Belgium and 23 in the German speaking area, so they do not represent recent immigration there.

Isonymy Parameters in Belgian Towns

In Table 1 we give the sample size, the number of surnames and the isonymy parameters, Karlin's ν, Fisher's α, and unbiased isonymy I in the 156 towns analysed. Identification codes and prevalent language are given in the last two columns of the Table. Identification is also possible with the co-ordinates in linear scale (1/400,000) which are given in the first two columns. Minimum α (109) was computed in Herbeumont in West Flanders, and maximum in Brussels (5989). These values can be compared with α for the whole country, which was 4646, and with α averaged over all towns, which was 997. The variations seem larger than those seen in the European countries where large samples were studied, indicating that there is considerable heterogeneity in the Belgian surname structure. As in Switzerland, this may be attributed to the presence of the three languages which have a different geographic distribution. Average α is highest in the French speaking area, (α= 1187), intermediate in the Flemish area (α= 830) and lowest in the relatively small section of German speaking Belgium (α= 450). Overall, the lowest values are typical of East Flanders, East Namur and of Luxembourg.

From Table 1, we can calculate the correlations of sample size and surname number with isonymy parameters (we showed these also in Rodriguez-Larralde et al. 1994 and in Scapoli et al. 1997). The correlations are given in Table 4 and the thumbnail images of the scatter diagrams in Figure 3. Note the strict dependence of surname number and of the other parameters on the sample size for each town. The correlations are also calculated in logarithmic scale to compensate for the large variations in sample size. For example, in linear scale Brussels is 3000 times larger than Champlon, whereas in log scale the variation goes from 3.87 to 11.92.
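The figures quoted for the two scales can be checked directly. The short sketch below assumes natural logarithms, which is the base that reproduces 3.87 and 11.92 from the sample sizes of Champlon and Brussels given in the text.

```python
import math

n_min, n_max = 48, 150_772          # Champlon and Brussels, from the text

ratio = n_max / n_min               # spread in linear scale
log_min = math.log(n_min)           # natural logarithm (assumed base)
log_max = math.log(n_max)

print(f"ratio = {ratio:.0f}, log range = {log_min:.2f} to {log_max:.2f}")
```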

Table 4.  Correlations of isonymy parameters with demographic parameters. First row, linear scale, second row, log-log scale
N = sample size; C = surname number; I = unbiased isonymy; α = Fisher's indicator of heterogeneity; ν = Karlin-McGregor indicator of migration.
 | α | ν | I
C | 0.635 | −0.314 | −0.319
α | | 0.037 | −0.601
ν | | | 0.093

Figure 3. Scatter diagrams of the demographic and isonymic parameters. Note the decreasing predictivity of sample size on the number of surnames, on α, and Karlin-McGregor ν, in this order. Bilogarithmic scale.


Isolation by Language

To study the effect of languages on isolation, first we calculated the average Lasker's distance between adjacent towns along and across the boundary line which separates Flemish from French and German. We found that across this line the average distance and its standard error is 8.11±0.12, whereas the average distance between nearest neighbour towns in the whole country is 7.16 ± 0.07. The difference is highly significant (t = 6.84, P ≪ 0.001). Then, we studied the average Lasker's distance within and between languages, and obtained the results given in Table 5.

Table 5.  Average Lasker's distance between pairs of towns within and between languages in Belgium. 77 Flemish, 76 French, and 3 German speaking towns
Languages | Lasker's distance | Pairs of towns
Flemish-Flemish | 8.21 ± 0.014 | 2926
French-French | 8.12 ± 0.011 | 2850
German-German | 6.91 ± 0.291 | 3
Within languages | 8.16 ± 0.009 | 5779
French-Flemish | 9.47 ± 0.023 | 5852
French-German | 9.37 ± 0.046 | 228
Flemish-German | 10.10 ± 0.053 | 231
Between languages | 9.48 ± 0.008 | 6311
Belgium | 8.85 ± 0.009 | 12090

It is apparent that Lasker's distance, irrespective of geographic distance, is considerably larger between as opposed to within languages (t = 109.620, P vanishing). The three German towns are significantly more uniform than the Flemish (t = 4.462, P ≪ 0.01) and the French speaking towns (t = 4.155, P ≪ 0.01). Within Flemish, average distance is approximately the same as within French. The distance between French and Flemish is similar to the distance between French and German, and the distance between Flemish and German is the highest.
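The test statistics quoted in this section are consistent with a simple difference of means divided by the combined standard error. That reading is inferred from the printed numbers, not from a method stated in the text; the sketch below uses the means and standard errors exactly as reported, and also recomputes the pair counts of Table 5 from the numbers of towns.

```python
import math


def t_from_means(mean_1, se_1, mean_2, se_2):
    """Difference of two means over the standard error of the difference."""
    return (mean_1 - mean_2) / math.sqrt(se_1 ** 2 + se_2 ** 2)


# Means and standard errors as printed in Table 5 and in the text.
t_between_within = t_from_means(9.48, 0.008, 8.16, 0.009)   # reported 109.620
t_flemish_german = t_from_means(8.21, 0.014, 6.91, 0.291)   # reported 4.462
t_french_german = t_from_means(8.12, 0.011, 6.91, 0.291)    # reported 4.155
t_boundary = t_from_means(8.11, 0.12, 7.16, 0.07)           # reported 6.84

# Pair counts from 77 Flemish, 76 French and 3 German speaking towns.
n_fl, n_fr, n_de = 77, 76, 3
pairs_within = math.comb(n_fl, 2) + math.comb(n_fr, 2) + math.comb(n_de, 2)
pairs_between = n_fl * n_fr + n_fr * n_de + n_fl * n_de

print(f"{t_between_within:.2f} {t_flemish_german:.3f} "
      f"{t_french_german:.3f} {t_boundary:.2f}")
print(pairs_within, pairs_between, pairs_within + pairs_between)
```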

Isolation by Distance

We then took geographical position into account, and studied isolation due to geographic distance among the 156 towns. The elements of the matrix of Lasker's distance and the corresponding elements of the geographic distances were strongly correlated. Their scatter diagram is given in Figure 4. The correlation is the highest (with Switzerland) observed to date for European countries, with r = 0.721±0.014. This means that more than 50% of the surname heterogeneity is due to distance, a result similar to that obtained for Italy and Switzerland, and well above that obtained for the neighbouring Netherlands. However, since Belgium is a relatively small country compared to the previous ones we have studied, and it presents relatively few physical obstacles to movement, it seems reasonable to attribute the isolation both to physical distance and to the different geographical distribution of the three languages.


Figure 4. Scatter diagram of Lasker's and linear geographic distances. Belgium, with Switzerland, is the European country where the correlation between isonymic distance and geographic distance is highest. a) Flemish (L) speaking area b) French (R) speaking area c) three German (G) speaking towns against the other 153 Belgian towns.


In order to assess the effect of geographic distance independent of language, we studied separately the 77 Flemish towns in the Northern area, and the 76 towns in the South or Walloon or French area. For the 3 towns where German is the prevalent language, the analysis (as we have already underlined) is not independent of language, since we studied the isolation of the 3 towns versus the remaining 153.

The correlation between Lasker's and geographic distance for the Flemish towns (Figure 4a) is the highest we have ever measured in the course of our studies, with r = 0.878±0.007. This means that the Flemish speaking population of Belgium tends to be sedentary, so that migration and movement inside this area are not large. In fact, taking the correlation at face value, this would mean that 77% of the surname variation is due to distance. In the French speaking area, r = 0.631±0.020 (Figure 4b) indicates that only 40% of the variation of surnames within the French language is due to distance, about half of the explained variation in the Flemish. Also the internal mobility of the French speaking population must be accordingly larger.

The correlation for the three German towns with all others, r = 0.805±0.022 (Figure 4c), is also large but includes both the effects of isolation due to language and due to distance.
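The percentages of surname variation attributed to distance are the squares of the correlation coefficients. A minimal check with the values reported above (the dictionary labels are only illustrative names):

```python
# Correlations between Lasker's distance and geographic distance, as reported.
correlations = {
    "Belgium": 0.721,
    "Flemish": 0.878,
    "French": 0.631,
    "German vs rest": 0.805,
}

# Share of variation explained by distance, r squared.
explained = {area: r ** 2 for area, r in correlations.items()}

for area, share in explained.items():
    print(f"{area}: r^2 = {share:.2f}")
```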

The correlations of Lasker's distances with the log of linear distance give almost equal results. However, given the size of Belgium, plain linear distance seems more appropriate for this study. Considering again the work of Dodinval (1970) in Belgium, we underline that the results he obtained using large genetic databases are also obtained using large surname databases.

Relation Between the Isolating Power of Geographic Distance and the Isolating Power of Languages

The impact of linguistic boundaries on population differentiation was examined by Dupanloup et al. (2000) at the continental level. They measured the added genetic distance at the two sides of a boundary and proposed an estimator of the effect of the boundary. In our much smaller scale, we seem to be in a position to establish a quantitative equivalence between isolation due to languages and isolation due to geographic distance in Belgium.

We recall that, for any regression equation of the type

  $Y = a + bX$ (1)

one can obtain the length ΔX of the abscissa between two points Y1 and Y2 on the regression by

  $\Delta X = \dfrac{Y_1 - Y_2}{b}$ (2)

This can be applied to any pair of values Y1 and Y2 belonging to the same variable. In this work, this quantity indicates the expected distance between the barycentres of the three different linguistic areas of Belgium if all residents spoke the same language.

In practice, we calculated the regression of Lasker's distance over geographic distance, to obtain the rate of increase in linear scale. The regression of Lasker's distance, Z, over linear kilometres, W, is:

  $Z = a + 0.01206\,W$

where the coefficient 0.01206 indicates increase of Lasker's distance per kilometre at any distance. The three quantities Y1, Y2 and b, as indicated in equation (2), are:

  1. Average Lasker's distance between languages (Table 5), Y1 = 9.48
  2. Average Lasker's distance within languages (Table 5), Y2 = 8.16
  3. The coefficient of regression of the distance over kilometres in linear scale, b = 0.01206

The difference between-within languages is (9.48 − 8.16) = 1.32 Lasker's units, and the rate of increase of Lasker's distance per kilometre is 0.01206 units. Thus, the isolating effect of languages in Belgium is:

  $\Delta X = \dfrac{9.48 - 8.16}{0.01206} = 109$ kilometres

This means that the increase in surname variation due to language differences is equivalent to the increase that one would observe in Belgium at a distance of a little more than 100 kilometres. Applying this method to the towns on the two sides of the boundary line separating languages in Figure 1, we obtain:

  $\Delta X = \dfrac{8.11 - 7.16}{0.01206} = 79$ kilometres

which means that the difference between adjacent towns due to language only is equivalent to a physical distance of 79 kilometres.
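The conversion of a difference in Lasker's distance into an equivalent number of kilometres can be written as a one-line function. The function and variable names are illustrative; the numerical inputs are those reported in the text.

```python
def equivalent_distance_km(y_1, y_2, slope_per_km):
    """Equation (2): length on the distance axis between two regression values."""
    return (y_1 - y_2) / slope_per_km


b = 0.01206  # increase of Lasker's distance per kilometre, as reported

# Between versus within languages (Table 5).
language_km = equivalent_distance_km(9.48, 8.16, b)

# Adjacent towns across the language boundary versus nearest neighbours overall.
boundary_km = equivalent_distance_km(8.11, 7.16, b)

print(f"{language_km:.0f} km, {boundary_km:.0f} km")
```

The same function applies to any pair of average similarity distances, provided the slope is estimated on the same distance scale.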

This simple method could be applied to any geo-linguistic situation when linear geographic distances and similarity distances are available.

Other Indicators of Surname Similarity Among Belgian Towns

We constructed a dendrogram of the 156 towns based on the matrix of Lasker's distance. The dendrogram does not identify the main separation between languages, but identifies three main clusters which go across languages. The geographic location of the clusters is given in Figure 1 over the Voronoi tessellation of the towns (Voronoi, 1908; Byers, 1996). The provinces of East and West Flanders cluster with Hainaut (cluster I), and this is somewhat surprising because Hainaut belongs to the French speaking area. Then, the provinces of Antwerpen, Brabant and Limburg form a central cluster (II). The third (III) or Eastern cluster comprises Liege and Namur (French speaking) but also includes Luxembourg where German is prevalent. If we compare the clusters with Figure 5, where α is plotted in three dimensions over Belgium, we observe that the south-eastern cluster is distinctly more isonymous than the rest of the country, so that the level of inbreeding is expected to be higher there. One may wish to compare Figure 5 of the present paper with Figure 7 of the paper of Twiesselmann et al. (1962), so that the correlation between inbreeding levels estimated by dispensations and by isonymy becomes apparent.


Figure 5. Stereogram of α in 156 Belgian Towns. Note the high values in Brabant, on the Coast of the North Sea and in Liege. East Flanders, East Namur and Luxembourg have the lowest values of α.



Conclusions

The surname structure of Belgium seems to be the most heterogeneous among the comparable structures of other European countries. We attribute the heterogeneity to the presence of three languages in the nation, each having its own prevalent geographic distribution. Since in Belgium there are no geographic barriers, the isolating effect of language is not confounded by other factors.

An equivalent source of information (CD-ROM of telephone users), and the same methodology described in this paper, was used to analyse the isonymy structure of other European countries, and of Venezuela, where we used the surnames of electors, and of the USA (Barrai et al. 1996, 1997, 1999, 2000, 2001, 2002; Rodriguez-Larralde et al. 1998a,b, 2000). The average value of α over all cities (or States, in the case of Venezuela), and the isolation by distance measured by the correlation between isonymy and geographic distance, are given in Table 6 for the eight countries studied to date.

Table 6. Comparison of isonymy parameters in the USA, Venezuela and six European Nations
Country | Sample Size (Millions) | Surnames | α (average) | Isolation by distance (r) | Lasker Distance Min | Lasker Distance Max

Three main features emerge from the comparisons in Table 6. Firstly, the European nations are similar in abundance of surnames as measured by α. Secondly, again for European countries, isolation by distance is generally present, although with differing intensity in the different countries. Thirdly, the value of α in Venezuela is relatively small, of the order of 100–200, as discussed elsewhere (Rodriguez-Larralde et al. 2000), and isolation by distance is practically absent in the USA, due to population mobility and recent immigration.

From the present analysis, we observe a considerable difference between the surname structure of Belgium and that of the neighbouring Netherlands, where the only official language is Dutch. In the plain of the Netherlands, isolation by distance, although significant and relevant, is the lowest among the European countries studied. In Belgium it is the highest observed by us to date. The isolation is even higher in the Flemish speaking area, possibly indicating settlement in the remote past and limited movement of the Flemish population in Belgium. The lower correlation in the French speaking area, on the other hand, is indicative of higher population mobility, possibly linked to differences in economic, social, and cultural factors, compatible with the differing rate of industrialisation between areas.
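The correlation r and the regression slope b used as measures of isolation by distance can be sketched as follows. This is a plain Pearson correlation over town pairs; the paired values are hypothetical and are not taken from Table 6, and the sketch does not reproduce the significance testing needed for distance matrices, whose entries are not independent.

```python
import math


def pearson_r_and_slope(x, y):
    """Return (r, b) for the regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx
    return r, slope


# Hypothetical town pairs: geographic distance (km) and Lasker's distance.
km = [10, 50, 100, 150, 200]
lasker = [7.9, 8.4, 9.1, 9.5, 10.3]

r, b = pearson_r_and_slope(km, lasker)
print(f"r = {r:.3f}, b = {b:.5f} per km")
```

A high r with a positive b means that surname similarity decays steadily with distance, which is the pattern described for the Flemish speaking area.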

In conclusion, the similarity in the surname structure of Belgium and Switzerland, associated with the geographic heterogeneity of Switzerland and the homogeneity of Belgium, indicates that the isolating power of different languages is at least as large as that of physical barriers. In the present case it corresponds to the isolation generated by a linear geographic distance of about 100 km. The next question is whether the time taken for disappearance of the isolation, in modern populations, will be longer for languages or for physical barriers. Our results for the USA may indicate that, when a common language is adopted, isolation may disappear very rapidly.
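The conversion of language distance into an equivalent geographic distance can be written as a one-line calculation. The sketch below uses the values reported in the Summary (mean Lasker's distance 9.48 between languages and 8.16 within languages, regression slope b = 0.01206 per km); the function name is illustrative.

```python
def language_equivalent_km(between, within, slope_per_km):
    """Geographic distance producing the same increase in Lasker's distance."""
    return (between - within) / slope_per_km


equivalent_km = language_equivalent_km(9.48, 8.16, 0.01206)
print(round(equivalent_km))  # prints 109
```

The same function could be applied to another geo-linguistic setting by substituting that setting's own between-language and within-language distances and its own regression slope.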


Acknowledgments

This work was supported by grants from the Italian Ministry of Universities and of Research and by Agreements CNR/FONACIT 2002-2004 Number 132.36.1 (Italy) and PI-200001829 (Venezuela).


References
  • Barrai, I., Scapoli, C., Beretta, M., Nesti, C., Mamolini, E. & Rodriguez-Larralde, A. 1996 Isonymy and the genetic structure of Switzerland. I: The distributions of surnames. Ann Hum Biol 23, 431–455.
  • Barrai, I., Scapoli, C., Beretta, M., Nesti, C., Mamolini, E. & Rodriguez-Larralde, A. 1997 Isolation by distance in Germany. Hum Genet 100, 684.
  • Barrai, I., Scapoli, C., Mamolini, E. & Rodriguez-Larralde, A. 1999 Isolation by distance in Italy. Hum Biol 71, 947–962.
  • Barrai, I., Rodriguez-Larralde, A., Mamolini, E. & Scapoli, C. 2000 Elements of the surname structure of Austria. Ann Hum Biol 26, 1–15.
  • Barrai, I., Rodriguez-Larralde, A., Mamolini, E., Manni, F. & Scapoli, C. 2001 Elements of the surname structure of the USA. Am J Phys Anthr 114, 109–123.
  • Barrai, I., Rodriguez-Larralde, A., Manni, F. & Scapoli, C. 2002 Isonymy and isolation by distance in the Netherlands. Hum Biol 74, 263–281.
  • Byers, J.A. 1996 Correct calculation of Dirichlet polygon areas. J Anim Ecol 65, 528–529.
  • Cavalli-Sforza, L.L., Piazza, A., Menozzi, P. & Mountain, J. 1989 Genetic and linguistic evolution. Science 244, 1128–29.
  • Cavalli-Sforza, L.L. 1997 Genes, peoples, and languages. Proc Natl Acad Sci USA 94, 7719–7724.
  • Central Intelligence Agency. 2002 at Internet Site
  • Crow, J.F. & Mange, A.P. 1965 Measurements of inbreeding from the frequency of marriages between persons of the same surnames. Eugen Quart 12, 199–203.
  • Darwin, G.H. 1875 Marriages between first cousins in England and their effects. J Stat Soc 38, 153–184.
  • Dodinval, P. 1970 Population structure of A, B, O, AB blood groups in Belgium. Hum Hered 20, 169–177.
  • Dupanloup, I., Schneider, S., Langaney, A. & Excoffier, L. 2000 Inferring the impact of linguistic boundaries on population differentiation: application to the Afro-Asiatic-Indo-European case. Eur J Hum Genet 8, 750–756.
  • Encyclopaedia Britannica, William Benton Publisher. 1962 Vol 2, see Belgium.
  • Felsenstein, J. 1989 PHYLIP - Phylogeny Inference Package (Rel. 3.2). Cladistics 5, 164–166.
  • Felsenstein, J. 1993 PHYLIP (Phylogeny Inference Package) Rel. 3.5c. Distributed by the author. Department of Genetics, University of Washington, Seattle.
  • Fisher, R.A. 1943 The relation between the number of species and the number of individuals in a random sample of animal population. J Anim Ecol 12, 42–58.
  • Fox, W.R. & Lasker, G.W. 1983 The distribution of surname frequencies. Int Stat Rev 51, 81–87.
  • Karlin, S. & McGregor, J. 1967 The number of mutant forms maintained in a population. Proceedings of the Fifth Berkeley Symposium on Mathematics, Statistics and Probability, 4, pp. 415–438.
  • Lasker, G.W. 1977 A coefficient of relationship by isonymy: a method for estimating the genetic relationship between populations. Hum Biol 49, 489–493.
  • Piazza, A., Rendine, S., Minch, E., Menozzi, P., Mountain, J. & Cavalli-Sforza, L.L. 1995 Genetics and the origin of European languages. Proc Natl Acad Sci USA 92, 5836–5840.
  • Relethford, J.H. 1988 Estimation of kinship and genetic distance from surnames. Hum Biol 60, 475–492.
  • Rodriguez-Larralde, A., Pavesi, A., Scapoli, C., Conterio, F., Siri, G. & Barrai, I. 1994 Isonymy and the genetic structure of Sicily. J Biosoc Sci 26, 9–24.
  • Rodriguez-Larralde, A., Scapoli, C., Beretta, M., Nesti, C., Mamolini, E. & Barrai, I. 1998a Isonymy and the genetic structure of Switzerland. II. Isolation by distance. Ann Hum Biol 25, 533–540.
  • Rodriguez-Larralde, A., Barrai, I., Nesti, C., Mamolini, E. & Scapoli, C. 1998b Isonymy and isolation by distance in Germany. Hum Biol 70, 1041–1056.
  • Rodriguez-Larralde, A., Morales, J. & Barrai, I. 2000 Surname frequency and the isonymy structure of Venezuela. Am J Hum Biol 12, 352–362.
  • Saitou, N. & Nei, M. 1987 The neighbour-joining method: a new method for reconstructing evolutionary trees. Mol Biol Evol 4, 406–425.
  • Scapoli, C., Rodriguez-Larralde, A., Beretta, M., Nesti, C., Lucchetti, A. & Barrai, I. 1997 Correlations between isonymy parameters. Int J Anthropol 12, 17–37.
  • Smouse, P.E., Long, J.C. & Sokal, R.R. 1986 Multiple regression and correlation extensions of the Mantel test of matrix correspondence. Syst Zool 35, 627–632.
  • Twiesselmann, F., Moureau, P. & Francois, J. 1962 Evolution du taux de consanguinité en Belgique de 1918 à 1959. Population 17, 241–266.
  • Voronoi, M.G. 1908 Nouvelles applications des paramètres continus à la théorie des formes quadratiques, deuxième mémoire, recherche sur les paralléloèdres primitifs. J Reine Angewandte Math 134, 198–207.