Summary

  1. Top of page
  2. Summary
  3. Introduction
  4. Material and Methods
  5. Results and Discussion
  6. Conclusions
  7. Acknowledgments
  8. References

The isonymy structure of trilingual Belgium was studied using the surname distributions for 1,118,004 private telephone users. The users were distributed in 77 Flemish, 76 French, and 3 German speaking towns, selected on a geographic basis to form an approximately regular grid over Belgium. Lasker's distance was found to be considerably higher between languages than within languages. For the whole of Belgium, irrespective of language, it was highly correlated with linear geographic distance, with r = 0.721±0.014, which is the highest correlation observed in European countries to date. Within Belgium and within languages, the correlation was highest among the Flemish (r = 0.878 ± 0.007), and lowest among the French (r = 0.631±0.020). Isolation by distance in Belgium is the highest we have found in Europe, and as high as in Switzerland where the different languages are separated by geographical barriers. This is not the case in Belgium, so that the considerable isolating power of languages emerges clearly from the present analysis. From the comparison of Lasker's distance between (9.48) and within (8.16) languages, and from its regression over geographic distance (b = 0.01206), it was possible to establish a quantitative relationship between the isolating power of languages and that of geographic distance as (9.48–8.16)/0.01206 = 109 kilometres. This transformation of language distance into an equivalent geographic distance, given here for Belgium, can be applied to any similar geo-linguistic situation.
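The conversion of language distance into kilometres quoted above is a single division, and can be reproduced as a minimal sketch. The three numbers are those given in the Summary; the function and variable names are illustrative only.

```python
# Values quoted in the Summary.
between_languages = 9.48   # mean Lasker's distance, towns with different languages
within_languages = 8.16    # mean Lasker's distance, towns with the same language
slope_per_km = 0.01206     # regression of Lasker's distance on geographic distance (per km)


def language_equivalent_km(between, within, slope):
    """Geographic distance giving the same rise in Lasker's distance as a language boundary."""
    return (between - within) / slope


equivalent_km = language_equivalent_km(between_languages, within_languages, slope_per_km)
print(f"language boundary is worth about {equivalent_km:.0f} km")
```

The same function could be applied to another geo-linguistic setting by substituting the corresponding between-language mean, within-language mean, and regression slope.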


Introduction

Genetics and Language

The homology between the evolution of languages and of human groups has been noted for several years. The work of Cavalli-Sforza and his collaborators opened up a new aspect in the genetic analysis of populations (Cavalli-Sforza et al. 1989; Piazza et al. 1995; Cavalli-Sforza, 1997). The general result from their work was that populations are likely to be genetically distinct, on the basis of several markers, when they speak different languages. Surnames are a special part of languages, and in many populations they may be considered neutral markers linked to the Y chromosome. As such, they represent the trait d'union between language and genetics.

In recent years we have studied several aspects of population structure which are derived from the distribution of surnames. We have observed that in countries where immigration has been relatively low in the past few centuries, surnames are strictly linked to local languages, and found that the difference between surnames in different countries in Europe is mainly linguistic (Barrai et al. 2000). Surnames originated as names of trades, of phenotypic traits, of places, and as patronymics. Although Ferrari, Herrera, Smith, and Schmidt are graphically and phonetically different, they refer to the same trade, and may indicate cultural affinity among groups at their origin. Cultural and religious affinity may also be indicated by patronymics, as in the case of Johnson, Jensen, Hansen, Jovanovic and Giovannini, just to quote a single surname from a number of possibilities. The presence of similar phenotypic variants in most groups is indicated by surnames like Bianchi, Blanco, White and Weiss, whereas toponyms like Campi, Campos, Dechamps, Vandevelde and Feldmann indicate provenience from a similar ecology.

In 1996 we initiated the study of differentiation of surnames within countries, and we were able to show that differentiation often results in significant isolation by distance within the same language, and is visible when large samples are available (Barrai et al. 1997, 1999, 2002; Rodriguez-Larralde et al. 1998a, 1998b, 2000). A special case, however, was the structure of the USA, where large differentiation among surnames is poorly related to geographical location due to recent massive immigration and population mobility, hence not resulting in the relevant isolation by distance (Barrai et al. 2001).

In Switzerland we studied the elements of population structure in the four areas where four different languages (German, French, Italian and Räto-Romanisch) are spoken, and found isolation both within and between languages, with isolation between being much larger than within (Rodriguez-Larralde et al. 1998a). However, in Switzerland languages are strongly isolated from each other by physical barriers. For example, the Alps separate Italian from French and German, and to a lesser extent from Räto-Romanisch. As a consequence, in Switzerland there is confounding between the isolating power of language and that of physical barriers. A comparison with a country with multiple languages and no physical barriers would give an indication of the relative importance of both isolating factors.

Therefore, we now turn our attention to Belgium, a second European country where three different languages are prevalent (Flemish, French, and German) but where no physical obstacles between languages seem to exist, or at least none comparable to the barriers which exist in the Swiss Confederation. We examine the surname structure of this small kingdom as a function of the three languages which are spoken there.

We recall that the main quantity from which we derive our considerations about structure is isonymy, as defined by Crow & Mange (1965), following Darwin (1875) who observed that the frequency of isonymous marriages (between persons of the same surname) exceeded the value expected under panmixia, $\sum_i p_i^2$, where $p_i$ is the frequency of the i-th surname in an isolate or group. Crow & Mange noted that the proportion of marriages isonymous by descent among all marriages with inbreeding coefficient F would be I = 4F if all sex combinations of intermediate ancestors of the spouses were equiprobable. This was applied to the estimation of $F_{ST}$ in populations, so that $F_{ST} = \frac{1}{4}\sum_i p_i^2$ will estimate the drift that has occurred up to the present, and from which a set of parameters defining structure can be derived. The method proposed by Crow & Mange was often used after 1965, but we do not attempt here to propose a list of references which would overload this short paper. We suggest for key references the work of Barrai et al. (2001) on isonymy in the USA.
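As a minimal sketch of the two quantities just recalled, the following computes the isonymy expected under panmixia, $\sum_i p_i^2$, and the corresponding $F_{ST}$ estimate as one quarter of that value. The surname list is a hypothetical toy group, not data from the study, and the function names are illustrative.

```python
from collections import Counter


def random_isonymy(surnames):
    """Isonymy expected under panmixia: the sum of squared surname frequencies."""
    counts = Counter(surnames)
    total = sum(counts.values())
    return sum((c / total) ** 2 for c in counts.values())


def fst_from_isonymy(isonymy):
    """F_ST estimated as one quarter of random isonymy (I = 4F)."""
    return isonymy / 4.0


# Hypothetical group of six persons (illustration only).
toy_group = ["Janssens"] * 3 + ["Peeters"] * 2 + ["Maes"]

isonymy = random_isonymy(toy_group)   # (3/6)^2 + (2/6)^2 + (1/6)^2
fst = fst_from_isonymy(isonymy)
print(f"I = {isonymy:.4f}, F_ST = {fst:.4f}")
```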

Aim of the Present Work

Belgium, the country in the present study, is a legally bilingual state where Flemish (58%) and French (31%) are prevalent, with a sizeable German speaking minority (10%) (CIA Internet Site, 2002). The language structure is correlated with the geographical structure, Flemish being spoken in the North of the country, French mostly in the Centre/South, and German mainly in the East (Encyclopaedia Britannica, 1962; see also the site http://www.euro-support.be/langbel/langbel.htm). The genetic structure of Belgium was studied by Dodinval (1970) who showed, using a large blood group database, that kinship decreases exponentially over distance in the country, so that isolation by distance is a trait of the Belgian population structure. The evolution of local consanguinity from 1918 to 1959 was described by Twiesselmann et al. (1962) using Catholic marriage dispensations. We now want to test whether isolation by distance is perceptible in Belgium also through the use of the surname structures, and whether the linguistic structure of Belgium is reflected by the surname distributions. To this end, we describe the surname distribution by linguistic area, and test whether the geographic location and the linguistic properties of the population result in isolation by distance. In so doing, we further hope that our results, if any, will help to discriminate between the isolating power of languages and that of physical barriers, since in Belgium these are almost non-existent except, of course, for sheer distance. Finally, we attempt to establish a quantitative equivalence between the isolating power of languages and that of geographic distance.

Material and Methods

Surnames

The population of Belgium in 1998 was estimated to be 10.2 million distributed over 30,510 square kilometres of prevalently flat land, with low hills in the East (CIA Internet Site 2002). In the present study, we include 1.12 million persons who are private telephone users, and thus most likely are over 14 years of age; in Belgium there are about 8.4 million persons above that age (CIA, ibid.) so that we are using a 13.3 per cent sample of the adult Belgian population. In 2000, we obtained a copy of the Telephone and Address book of Belgium in a CD-ROM produced by a commercial firm. The CD-ROM contains the files of all 4.5 million users registered by the telephone companies in the country for 1999. In order to assess isonymy structure as a function of geography, we selected 156 towns in a roughly regular grid across Belgium (Figure 1). Since each record in the database indicates if the user is non-private, we directly downloaded the private users for each town; hence we downloaded 1,118,004 private users for the 156 towns. In 77 of the towns, the prevalent language is Flemish, in 76 French, and in 3 German. For the towns selected the average distance between nearest neighbours is 11.2 km with a standard deviation of 3.2 km. The prevalent language, the location, the sample size, the number of different surnames and isonymy parameters for the towns analysed are given in Table 1.

Figure 1. Location of the Belgian towns included in the analysis. The polygons are obtained by Voronoi tessellation. The identification codes are as in Table 1. The three clusters delimited by dotted lines running from North to South result from the dendrogram obtained using the matrix of Lasker's distances. The thickest line separates languages.

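Table 1.  Name of town, horizontal co-ordinate, X, and vertical co-ordinate, Y, sample size, N, number of different surnames, S, Fisher's α, Karlin's ν, random isonymy I, identification code, C, and prevalent language, P, of 156 Belgian towns. Alphabetic ordering
Town | X | Y | N | S | α | ν | I = 4F_ST | C | P
Aalst | 27.6 | 45.90 | 21784 | 4297 | 677 | 0.0301 | 0.001477 | 1 | L
Aarschot | 41.5 | 47.30 | 6232 | 1771 | 377 | 0.0570 | 0.002653 | 2 | L
Alveringem | 4.5 | 48.10 | 920 | 547 | 508 | 0.3557 | 0.001969 | 3 | L
Andenne | 46.4 | 33.90 | 4718 | 2346 | 1089 | 0.1875 | 0.000918 | 4 | R
Antwerpen | 34.0 | 53.70 | 115090 | 26448 | 2022 | 0.0173 | 0.000495 | 5 | L
Arlon | 59.7 | 11.70 | 5538 | 2806 | 1183 | 0.1760 | 0.000845 | 6 | R
Ath | 23.2 | 37.40 | 5876 | 2742 | 1092 | 0.1567 | 0.000916 | 7 | R
Awans | 52.8 | 38.50 | 1612 | 1173 | 1584 | 0.4956 | 0.000631 | 8 | R
Bastogne | 57.8 | 20.50 | 2020 | 1047 | 617 | 0.2340 | 0.001621 | 9 | R
Beaumont | 31.4 | 26.60 | 1285 | 807 | 725 | 0.3607 | 0.001379 | 10 | R
Beauraing | 44.1 | 23.10 | 602 | 453 | 749 | 0.5544 | 0.001335 | 11 | R
Beernem | 15.5 | 51.80 | 3119 | 1134 | 445 | 0.1249 | 0.002247 | 12 | L
Beloeil | 22.4 | 35.20 | 3439 | 1609 | 662 | 0.1614 | 0.001511 | 13 | R
Bertrix | 49.5 | 16.20 | 1056 | 629 | 485 | 0.3147 | 0.002062 | 14 | R
Binche | 29.9 | 31.50 | 7104 | 3433 | 1542 | 0.1783 | 0.000649 | 15 | R
Blankenberge | 11.2 | 56.50 | 5109 | 2830 | 1525 | 0.2299 | 0.000656 | 16 | L
Bocholt | 54.3 | 52.80 | 2262 | 837 | 301 | 0.1174 | 0.003322 | 17 | L
Boom | 33.3 | 50.20 | 4025 | 1594 | 502 | 0.1109 | 0.001992 | 18 | L
Bouillon | 46.4 | 14.40 | 629 | 430 | 395 | 0.3857 | 0.002532 | 19 | R
Brasschaat | 35.3 | 55.70 | 8496 | 4128 | 1502 | 0.1502 | 0.000666 | 20 | L
Brecht | 38.0 | 57.50 | 5316 | 2337 | 670 | 0.1119 | 0.001493 | 21 | L
Brugge | 13.5 | 53.50 | 27948 | 6682 | 1324 | 0.0452 | 0.000755 | 22 | L
Brussels | 33.2 | 43.40 | 150772 | 55905 | 5989 | 0.0382 | 0.000167 | 23 | L
Buggenhout | 30.5 | 48.00 | 884 | 477 | 396 | 0.3094 | 0.002525 | 24 | L
Büllingen | 67.0 | 32.10 | 902 | 361 | 173 | 0.1609 | 0.005780 | 25 | G
Cerfontaine | 34.3 | 24.70 | 716 | 493 | 627 | 0.4669 | 0.001595 | 26 | R
Champlon | 53.8 | 23.40 | 48 | 45 | 384 | 0.8865 | 0.002604 | 27 | R
Charleroy | 35.0 | 31.60 | 43911 | 17180 | 4006 | 0.0836 | 0.000250 | 28 | R
Chaudefontaine | 55.9 | 36.60 | 4128 | 2711 | 2196 | 0.3472 | 0.000455 | 29 | R
Chimay | 32.5 | 21.40 | 2126 | 1110 | 769 | 0.2656 | 0.001300 | 30 | R
Ciney | 46.6 | 28.30 | 3183 | 1572 | 883 | 0.2172 | 0.001133 | 31 | R
Chiny | 51.2 | 13.10 | 177 | 132 | 209 | 0.5415 | 0.004785 | 32 | R
Couvin | 35.9 | 21.60 | 2967 | 1411 | 582 | 0.1640 | 0.001718 | 33 | R
Damme | 14.5 | 54.60 | 2550 | 1184 | 664 | 0.2066 | 0.001506 | 34 | L
Daverdisse | 47.1 | 20.80 | 131 | 105 | 214 | 0.6203 | 0.004673 | 35 | R
De Haan | 10.2 | 55.40 | 3524 | 2326 | 1937 | 0.3547 | 0.000516 | 36 | L
Dendermonde | 28.8 | 48.50 | 11114 | 2677 | 584 | 0.0499 | 0.001712 | 37 | L
De Panne | 2.6 | 50.60 | 3066 | 1989 | 1586 | 0.3409 | 0.000631 | 38 | L
Diest | 45.2 | 47.60 | 5189 | 1978 | 590 | 0.1021 | 0.001695 | 39 | L
Diksmuide | 7.2 | 48.60 | 3173 | 1379 | 759 | 0.1930 | 0.001318 | 40 | L
Dinant | 43.3 | 26.80 | 2599 | 1553 | 1242 | 0.3234 | 0.000805 | 41 | R
Erezee | 54.8 | 28.50 | 572 | 399 | 423 | 0.4251 | 0.002364 | 42 | R
Eupen | 63.0 | 38.00 | 3705 | 1738 | 850 | 0.1866 | 0.001176 | 43 | G
Fauvillers | 56.9 | 16.40 | 292 | 248 | 723 | 0.7123 | 0.001383 | 44 | R
Gavere | 21.1 | 45.70 | 2882 | 1184 | 362 | 0.1116 | 0.002762 | 45 | L
Geel | 44.1 | 52.40 | 6849 | 1745 | 305 | 0.0426 | 0.003279 | 46 | L
Geetbets | 46.5 | 44.80 | 1096 | 563 | 381 | 0.2580 | 0.002625 | 47 | L
Gembloux | 39.3 | 35.60 | 4217 | 2464 | 1493 | 0.2615 | 0.000670 | 48 | R
Genk | 53.2 | 47.00 | 12594 | 6016 | 2059 | 0.1405 | 0.000486 | 49 | L
Gent | 22.4 | 49.20 | 60862 | 13690 | 1462 | 0.0235 | 0.000684 | 50 | L
Genval | 35.7 | 39.90 | 1602 | 1295 | 2826 | 0.6382 | 0.000354 | 51 | R
Geraardsbergen | 25.1 | 41.40 | 7883 | 2445 | 699 | 0.0814 | 0.001431 | 52 | L
Habay la Neuve | 56.7 | 12.90 | 836 | 554 | 445 | 0.3474 | 0.002247 | 53 | R
Halle | 31.2 | 40.30 | 7615 | 2938 | 765 | 0.0913 | 0.001307 | 54 | R
Hamoir | 54.0 | 32.20 | 732 | 570 | 974 | 0.5709 | 0.001027 | 55 | R
Hannut | 46.0 | 38.70 | 2446 | 1423 | 1019 | 0.2941 | 0.000981 | 56 | R
Hasselt | 50.4 | 46.00 | 15705 | 4508 | 889 | 0.0536 | 0.001125 | 57 | L
Havelange | 49.1 | 31.10 | 857 | 576 | 613 | 0.4170 | 0.001631 | 58 | R
Herbeumont | 49.3 | 14.20 | 97 | 78 | 109 | 0.5291 | 0.009174 | 59 | R
Herentals | 41.4 | 52.80 | 4916 | 1593 | 390 | 0.0735 | 0.002564 | 60 | L
Herselt | 42.3 | 49.40 | 3069 | 1077 | 355 | 0.1037 | 0.002817 | 61 | L
Heuvelland (Kemmel) | 6.5 | 41.80 | 250 | 194 | 434 | 0.6345 | 0.002304 | 62 | L
Hoogstraten | 40.1 | 58.90 | 3718 | 1216 | 270 | 0.0677 | 0.003704 | 63 | L
Houffalize | 58.9 | 24.10 | 2589 | 1591 | 1209 | 0.3183 | 0.000827 | 64 | R
Huy | 49.0 | 34.60 | 3807 | 2173 | 1477 | 0.2795 | 0.000677 | 65 | R
Ichtegem | 9.8 | 50.30 | 2820 | 1087 | 456 | 0.1392 | 0.002193 | 66 | L
Ieper | 7.6 | 43.70 | 7295 | 2345 | 983 | 0.1187 | 0.001017 | 67 | L
Jabbeke | 11.3 | 52.80 | 3012 | 1325 | 658 | 0.1793 | 0.001520 | 68 | L
Jodoigne | 42.3 | 40.10 | 2666 | 1766 | 1585 | 0.3729 | 0.000631 | 69 | R
Kasterlee | 43.6 | 54.70 | 3367 | 1096 | 272 | 0.0747 | 0.003676 | 70 | L
Keerbergen | 37.8 | 47.90 | 2698 | 1500 | 712 | 0.2088 | 0.001404 | 71 | L
Knokke Heist | 14.5 | 57.50 | 13315 | 6506 | 1955 | 0.1280 | 0.000512 | 72 | L
Kortemark | 10.4 | 48.50 | 2662 | 1000 | 483 | 0.1536 | 0.002070 | 73 | L
Kortrijk | 14.2 | 43.00 | 14590 | 4235 | 1048 | 0.0670 | 0.000954 | 74 | L
Langemark | 8.3 | 45.30 | 1719 | 808 | 569 | 0.2487 | 0.001757 | 75 | L
La Roche en Ardent | 55.2 | 25.50 | 547 | 412 | 558 | 0.5050 | 0.001792 | 76 | R
Lebbeke | 29.1 | 47.60 | 4668 | 1350 | 337 | 0.0673 | 0.002967 | 77 | L
Leopoldsburg | 48.5 | 51.00 | 3213 | 1623 | 725 | 0.1841 | 0.001379 | 78 | L
Lessines | 24.1 | 39.70 | 4480 | 2173 | 1136 | 0.2023 | 0.000880 | 79 | R
Libramont | 51.7 | 18.10 | 1015 | 643 | 573 | 0.3608 | 0.001745 | 80 | R
Liege | 56.0 | 39.00 | 40049 | 17005 | 3990 | 0.0906 | 0.000251 | 81 | R
Lier | 37.0 | 51.40 | 7832 | 2551 | 606 | 0.0718 | 0.001650 | 82 | L
Lokeren | 26.6 | 50.50 | 8619 | 2382 | 487 | 0.0535 | 0.002053 | 83 | L
Lommel | 49.6 | 54.40 | 6637 | 1884 | 354 | 0.0506 | 0.002825 | 84 | L
Louvain | 39.2 | 44.40 | 21739 | 7753 | 1598 | 0.0685 | 0.000626 | 85 | L
Maaseik | 58.0 | 50.80 | 5015 | 1758 | 489 | 0.0888 | 0.002045 | 86 | L
Malmedy | 62.8 | 32.50 | 2433 | 1170 | 498 | 0.1699 | 0.002008 | 87 | R
Mechelen | 35.3 | 48.40 | 17302 | 5843 | 1103 | 0.0599 | 0.000907 | 88 | L
Meerhout | 45.5 | 51.60 | 1985 | 637 | 231 | 0.1042 | 0.004329 | 89 | L
Meeuwen | 53.4 | 50.70 | 2770 | 857 | 248 | 0.0822 | 0.004032 | 90 | L
Menen | 11.5 | 42.10 | 6977 | 2637 | 1006 | 0.1260 | 0.000994 | 91 | R
Mettet | 38.8 | 28.90 | 2322 | 1401 | 1058 | 0.3130 | 0.000945 | 92 | R
Middelkerke Bad | 6.4 | 53.00 | 4802 | 2969 | 2022 | 0.2963 | 0.000495 | 93 | L
Modave | 50.1 | 32.50 | 636 | 473 | 746 | 0.5398 | 0.001340 | 94 | R
Mons | 26.2 | 32.50 | 18437 | 8276 | 2202 | 0.1067 | 0.000454 | 95 | R
Moorslede | 10.7 | 44.70 | 2347 | 975 | 528 | 0.1837 | 0.001894 | 96 | L
Mouscron | 13.4 | 40.60 | 14229 | 4784 | 1424 | 0.0910 | 0.000702 | 97 | R
Namur | 42.3 | 32.90 | 22405 | 8613 | 1962 | 0.0805 | 0.000510 | 98 | R
Nassogne | 51.0 | 23.80 | 426 | 317 | 531 | 0.5549 | 0.001883 | 99 | R
Neufchateau | 52.8 | 16.00 | 737 | 511 | 619 | 0.4565 | 0.001616 | 100 | R
Nieuwpoort Bad | 4.6 | 51.90 | 2669 | 1689 | 1281 | 0.3243 | 0.000781 | 101 | L
Nijvel | 32.8 | 36.70 | 4653 | 3034 | 2491 | 0.3487 | 0.000401 | 102 | R
Ninove | 27.3 | 43.10 | 8490 | 2442 | 648 | 0.0709 | 0.001543 | 103 | R
Noville | 44.4 | 35.70 | 505 | 403 | 850 | 0.6273 | 0.001176 | 104 | R
Ottignies | 37.0 | 38.40 | 2149 | 1863 | 5578 | 0.7219 | 0.000179 | 105 | R
Oostende | 8.1 | 54.00 | 17610 | 6466 | 1780 | 0.0918 | 0.000562 | 106 | L
Opglabbek | 54.5 | 49.20 | 2202 | 960 | 267 | 0.1081 | 0.003745 | 107 | L
Oudenarde | 20.2 | 43.30 | 6802 | 1998 | 511 | 0.0699 | 0.001957 | 108 | L
Paliseul | 47.4 | 17.50 | 285 | 207 | 242 | 0.4592 | 0.004132 | 109 | R
Philippeville | 36.7 | 25.50 | 1780 | 1193 | 1173 | 0.3972 | 0.000853 | 110 | R
Poperinge | 4.8 | 43.60 | 4176 | 1375 | 644 | 0.1336 | 0.001553 | 111 | L
Profondeville | 42.5 | 30.60 | 2232 | 1530 | 1420 | 0.3888 | 0.000704 | 112 | R
Quaregnon | 23.9 | 32.40 | 3987 | 2167 | 994 | 0.1996 | 0.001006 | 113 | R
Rijkevorsel | 40.1 | 57.50 | 2238 | 777 | 338 | 0.1312 | 0.002959 | 114 | L
Rochefort | 48.9 | 24.60 | 1427 | 900 | 758 | 0.3469 | 0.001319 | 115 | R
Roeselare | 11.8 | 46.40 | 10803 | 2874 | 795 | 0.0685 | 0.001258 | 116 | L
Ronse | 20.0 | 40.70 | 5081 | 2144 | 755 | 0.1294 | 0.001325 | 117 | L
Roosdaal | 28.4 | 43.20 | 2588 | 1011 | 351 | 0.1194 | 0.002849 | 118 | L
Ruddervoorde | 13.2 | 50.40 | 5098 | 1913 | 783 | 0.1331 | 0.001277 | 119 | L
Ruiselede | 16.5 | 48.80 | 902 | 511 | 468 | 0.3416 | 0.002137 | 120 | L
Soignes | 28.2 | 36.00 | 4801 | 2654 | 1551 | 0.2442 | 0.000645 | 121 | R
Spa | 60.2 | 34.30 | 2155 | 1423 | 1319 | 0.3797 | 0.000758 | 122 | R
Sprimont | 56.5 | 34.20 | 2297 | 1534 | 1578 | 0.4072 | 0.000634 | 123 | R
Staden | 9.8 | 47.00 | 2244 | 921 | 480 | 0.1762 | 0.002083 | 124 | L
Stavelot | 61.2 | 31.50 | 1412 | 796 | 548 | 0.2796 | 0.001825 | 125 | R
St. Hubert | 51.6 | 21.00 | 601 | 343 | 232 | 0.2785 | 0.004310 | 126 | R
St. Niklaas | 29.5 | 52.30 | 16867 | 3996 | 586 | 0.0336 | 0.001706 | 127 | L
Stoumont | 59.2 | 31.80 | 513 | 381 | 526 | 0.5063 | 0.001901 | 128 | R
St. Sauveur | 20.0 | 39.50 | 621 | 463 | 630 | 0.5036 | 0.001587 | 129 | R
St. Vith | 64.8 | 28.50 | 1584 | 689 | 326 | 0.1707 | 0.003067 | 130 | G
Thuin | 32.1 | 29.40 | 3170 | 1947 | 1593 | 0.3345 | 0.000628 | 131 | R
Thurnhout | 43.3 | 57.00 | 8799 | 2763 | 637 | 0.0675 | 0.001570 | 132 | L
Tielt | 15.3 | 47.60 | 4341 | 1422 | 511 | 0.1053 | 0.001957 | 133 | L
Tienen | 43.4 | 42.40 | 7418 | 2480 | 783 | 0.0955 | 0.001277 | 134 | L
Tintigny | 54.3 | 11.60 | 270 | 202 | 350 | 0.5645 | 0.002857 | 135 | R
Tongeren | 52.7 | 41.80 | 11135 | 3270 | 787 | 0.0660 | 0.001271 | 136 | L
Tournai | 16.4 | 36.70 | 16998 | 5538 | 1383 | 0.0752 | 0.000723 | 137 | R
Trois Ponts | 60.2 | 31.00 | 434 | 333 | 589 | 0.5758 | 0.001698 | 138 | R
Vencimont | 43.6 | 21.10 | 261 | 180 | 223 | 0.4607 | 0.004484 | 139 | R
Verviers | 60.0 | 36.90 | 10261 | 4846 | 2109 | 0.1705 | 0.000474 | 140 | R
Veurne | 3.7 | 49.80 | 2841 | 1416 | 898 | 0.2402 | 0.001114 | 141 | L
Vielsalm | 61.0 | 28.60 | 1708 | 889 | 524 | 0.2348 | 0.001908 | 142 | R
Virton | 54.8 | 8.50 | 1195 | 773 | 771 | 0.3922 | 0.001297 | 143 | R
Oost Vleteren | 5.1 | 45.90 | 744 | 423 | 381 | 0.3387 | 0.002625 | 144 | L
Vresse | 43.8 | 16.70 | 151 | 132 | 380 | 0.7156 | 0.002632 | 145 | R
Walcourt | 34.7 | 26.90 | 3694 | 2174 | 1439 | 0.2803 | 0.000695 | 146 | R
Waregem | 17.1 | 44.60 | 6980 | 2070 | 561 | 0.0744 | 0.001783 | 147 | L
Waremme | 49.1 | 39.60 | 2817 | 1585 | 1173 | 0.2940 | 0.000853 | 148 | R
Waterloo | 34.0 | 40.00 | 5511 | 4076 | 5381 | 0.4940 | 0.000186 | 149 | R
Wavre | 37.8 | 39.80 | 6781 | 4284 | 2958 | 0.3037 | 0.000338 | 150 | R
Wellin | 47.0 | 22.50 | 284 | 228 | 370 | 0.5657 | 0.002703 | 151 | R
Westmalle | 38.8 | 56.10 | 3123 | 1320 | 543 | 0.1481 | 0.001842 | 152 | L
Wetteren | 25.0 | 47.80 | 5814 | 1767 | 471 | 0.0749 | 0.002123 | 153 | L
Wilrijk | 33.5 | 52.50 | 10361 | 5065 | 1611 | 0.1346 | 0.000621 | 154 | L
Zeebrugge | 12.7 | 57.20 | 2254 | 1213 | 767 | 0.2539 | 0.001304 | 155 | L
Zoutleeuw | 46.3 | 43.40 | 1590 | 704 | 336 | 0.1745 | 0.002976 | 156 | L
Belgium |  |  | 1118004 | 137380 | 4646 | 0.0042 | 0.000215 |  | 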

Statistics

Based on the surname distribution, the random isonymy between towns I and J ($I_{ij}$) was estimated as $I_{ij} = \sum_k p_{ki} p_{kj}$, where $p_{ki}$ and $p_{kj}$ are the relative frequencies of surname k in towns I and J respectively; the sum is over all surnames. Unbiased random isonymy within towns ($I_{ii}$), Alpha (α, the effective number of surnames) which is an index of surname heterogeneity (Fisher, 1943), and ν (Karlin & McGregor, 1967), which is an indicator of migration rate, were estimated according to Rodriguez-Larralde et al. (1994). At this point, we want to underline a possible cause of misunderstanding due to the use of the same symbol for different quantities. In isonymy, Fisher's α is used as a measure of surname heterogeneity; it is the inverse of isonymy, $\alpha = 1/I = 1/(4F_{ST})$. In inbreeding theory the total level of consanguinity in a group is also called ‘Bernstein α’ and it is the same as the Wrightian symbol $F_{IT}$. However, since in their work on consanguinity in Belgium Twiesselmann et al. (1962) use Bernstein α, and we refer to their work, it seems useful to state that, in isonymy,

  • image

inasmuch as FST is the main component of the total level of inbreeding (or, faute de mieux, the only component which can be estimated).

However, for an ordered review of the quantities which can be derived from the surname distributions see Relethford (1988). Note that $I_{ij}$ is twice Lasker's coefficient of relationship, $R_i$ (Lasker, 1977). We note also that $I_{ij}$ is a function of the kinship $\phi_{ij}$ between towns I and J, as shown by Rodriguez-Larralde et al. (1998a). In practice, we calculate the linear correlation of Lasker's distance between towns I and J with their geographic distance. Lasker's distance is defined as $-\log(I_{ij})$ (Rodriguez-Larralde et al. 1998a). The linear geographic distance between towns I and J was estimated using the co-ordinates of each town on a map with a scale of 1:400,000. The significance of correlations was assessed using a permutation method (Smouse et al. 1986). A dendrogram was constructed from the matrix of Lasker's distances with the neighbour joining method (Saitou & Nei, 1987), using the software PHYLIP (Felsenstein, 1989, 1993). Such a dendrogram, representing the surname relationships between different towns of Belgium, shows affinities but does not imply origins. We do not show it here but it is available from the corresponding author.
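The chain of computations described in this section (random isonymy between towns, Lasker's distance, correlation with geographic distance, permutation test) can be sketched as follows. This is an illustration under stated assumptions, not the authors' software: the four towns and their surnames are hypothetical, the natural logarithm is assumed for Lasker's distance, and the permutation test is a simple town-label permutation in the spirit of Smouse et al. (1986).

```python
import math
import random
from collections import Counter


def frequencies(surnames):
    """Relative frequency of each surname in one town."""
    counts = Counter(surnames)
    return {name: c / len(surnames) for name, c in counts.items()}


def isonymy_between(freq_i, freq_j):
    """Random isonymy I_ij: sum over shared surnames of p_ki * p_kj."""
    return sum(p * freq_j[name] for name, p in freq_i.items() if name in freq_j)


def lasker_distance(freq_i, freq_j):
    """-log(I_ij), natural log assumed; undefined if the towns share no surname."""
    return -math.log(isonymy_between(freq_i, freq_j))


def pearson(xs, ys):
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def permutation_test(dist_a, dist_b, n_perm=999, seed=0):
    """Correlate two distance matrices; permute town labels of the second for a one-sided P."""
    n = len(dist_a)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    a = [dist_a[i][j] for i, j in pairs]
    observed = pearson(a, [dist_b[i][j] for i, j in pairs])
    rng = random.Random(seed)
    at_least = 0
    for _ in range(n_perm):
        order = list(range(n))
        rng.shuffle(order)
        b = [dist_b[order[i]][order[j]] for i, j in pairs]
        if pearson(a, b) >= observed:
            at_least += 1
    return observed, (at_least + 1) / (n_perm + 1)


# Hypothetical towns on a line, co-ordinates in km (illustration only).
towns = {
    "A": ((0.0, 0.0), ["Maes"] * 6 + ["Devos"] * 3 + ["Dubois"] * 1),
    "B": ((10.0, 0.0), ["Maes"] * 5 + ["Devos"] * 3 + ["Dubois"] * 2),
    "C": ((20.0, 0.0), ["Maes"] * 3 + ["Devos"] * 3 + ["Dubois"] * 4),
    "D": ((30.0, 0.0), ["Maes"] * 1 + ["Devos"] * 2 + ["Dubois"] * 7),
}
names = sorted(towns)
freqs = [frequencies(towns[t][1]) for t in names]
coords = [towns[t][0] for t in names]
n = len(names)
lasker = [[0.0 if i == j else lasker_distance(freqs[i], freqs[j]) for j in range(n)]
          for i in range(n)]
geo = [[math.dist(coords[i], coords[j]) for j in range(n)] for i in range(n)]

r, p_value = permutation_test(geo, lasker)
print(f"r = {r:.3f}, permutation P = {p_value:.3f}")
```

With only four towns there are just 24 distinct label permutations, so the P value here is purely illustrative; the 156 towns of the study give a far richer permutation distribution.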

Results and Discussion

Distribution of the Users

All of Belgium

The number of private users per town ranged from 48 in the tiny village of Champlon to 150,772 in the metropolis of Brussels (Table 1). The minimum number of different surnames was 45, again in Champlon, and the maximum number was 55,905 in Brussels. The number of different surnames for the whole sample was 137,380. The users were not distinguished by sex. The log-log frequency distribution of the occurrence of surnames (Fox & Lasker, 1983) is given in Figure 2a. The graph is fairly linear, as we have observed in other European countries. We do not perceive in Belgium the excess of surnames with intermediate occurrence which seems typical of Holland (Barrai et al. 2002). This is true also for the Flemish, the French, and the German speaking areas (Figure 2b). We fitted a straight line to the log-log frequency distribution for the whole of Belgium, and for the three different languages. The regression slope was −1.54±0.02 for Belgium, −1.68±0.04 for French, −1.50±0.03 for Flemish, and −1.97±0.07 for the German speaking towns. These values indicate that immigration is lowest in the Flemish speaking part of Belgium. For the German area, since we had only three towns, we calculated the regression of Lasker's distance between the 3 towns and the 153 remaining towns on their geographic distance. Thus, the regression is not language independent, and our considerations for the German speaking area are only indicative.
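The straight-line fit to the log-log distribution of surname occurrences can be sketched as below. This is a minimal illustration assuming an ordinary least-squares fit with natural logarithms; the toy distribution is hypothetical and was chosen to decay roughly as a power law, and the function names are not taken from the original analysis.

```python
import math
from collections import Counter


def occurrence_distribution(users_per_surname):
    """Map k -> number of surnames borne by exactly k users."""
    return Counter(users_per_surname.values())


def loglog_slope(distribution):
    """Least-squares slope of log(number of surnames) on log(occurrence k)."""
    xs = [math.log(k) for k in distribution]
    ys = [math.log(count) for count in distribution.values()]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx


# Tabulating occurrences from per-surname counts (hypothetical counts).
tiny = occurrence_distribution({"Maes": 3, "Devos": 1, "Dubois": 1})

# Hypothetical distribution: 1000 surnames occur once, 354 twice, and so on.
toy_distribution = {1: 1000, 2: 354, 4: 125, 8: 44, 16: 16}
slope = loglog_slope(toy_distribution)
print(f"log-log slope = {slope:.2f}")
```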

Figure 2. Distribution of the occurrence of surnames in Belgium, log-log scale. 2a) All Belgium 2b) occurrences in the Flemish speaking area, in the French area, and in the prevalently German area.

Belgian surnames are somewhat different from those of other countries, in the sense that patronymics and locatives dominate, whereas trade names and physical traits are not the most frequent surnames. Several patronymics are among the 100 surnames which are most frequent in the sample. We list them in Table 2. These 100 surnames comprise 111,733 users. The list is the result of pooling the three languages, since in our study we include 791,099 Flemish, 320,714 French, and 6,191 German speaking users, and group Brussels with the Flemish-speaking towns.

Table 2.  The 100 most frequent Belgian surnames in a sample of 1,118,004 telephone users
Surname | Frequency | Surname | Frequency
Bertrand | 656 | Vandaele | 891
Wuyts | 661 | Depauw | 894
Vandeputte | 663 | Smets | 897
Petit | 670 | Vanhove | 917
Charlier | 671 | Dumont | 938
Thomas | 671 | Stevens | 938
Lenaerts | 672 | Vandewalle | 939
Dejonghe | 676 | Francois | 943
Renard | 678 | Degroote | 948
Bogaerts | 681 | Vandenbossche | 962
Denis | 687 | Deridder | 965
Verstraete | 693 | Debruyne | 999
Mathieu | 697 | Leclercq | 1005
Lejeune | 704 | Decoster | 1018
Deprez | 705 | Martin | 1026
Christiaens | 709 | Leroy | 1028
Verbeke | 713 | Declerck | 1042
Vermeersch | 713 | Dewilde | 1088
Simon | 715 | Coppens | 1106
Lambrechts | 719 | Demeyer | 1152
Fontaine | 724 | Vandenbroeck | 1188
Vanhecke | 725 | Dupont | 1207
Lauwers | 726 | Aerts | 1230
Verheyen | 727 | Desmet | 1277
Bogaert | 729 | Martens | 1282
Noel | 730 | Hermans | 1284
Lefevre | 731 | Hendrickx | 1331
Debruyn | 732 | Claeys | 1336
Cools | 757 | Lambert | 1377
Verhoeven | 775 | Decock | 1391
Dewulf | 776 | Debacker | 1411
Dekeyser | 782 | Pauwels | 1424
Bauwens | 791 | Michiels | 1458
Dhondt | 791 | Vermeulen | 1538
Laurent | 801 | Vandenberghe | 1583
Lemaire | 801 | Dubois | 1611
Cornelis | 807 | Claes | 1612
Smet | 811 | Wouters | 1615
Moens | 812 | Goossens | 1742
Dewaele | 814 | Willems | 1804
Janssen | 815 | Vandevelde | 1836
Lemmens | 824 | Vandamme | 2008
Dewitte | 832 | Declercq | 2064
Wauters | 835 | Mertens | 2166
Vanacker | 838 | Jacobs | 2309
Baert | 847 | Devos | 2556
Gerard | 865 | Maes | 2923
Lefebvre | 866 | Desmet | 3103
Deconinck | 869 | Peeters | 3119
Segers | 875 | Janssens | 3690

Among these 100 most frequent surnames the first is Janssens, followed by Peeters, Desmet, Maes and Devos. After these come Jacobs, Mertens, Declercq, Vandamme and Vandevelde. Janssens (with its graphic variant Janssen), Peeters and Jacobs are patronymics, while Maes is a locative from the river Meuse, or Maas in Flemish. Vandamme (from the dam) and Vandevelde (from the field) are locatives. Declercq is French. In the 100 most frequent surnames, we counted 26 clearly French, and 19 French-Flemish crossovers like Desmet, Devos, Dejonghe, Dewitte, Dewilde and several others beginning with the prefix De-. The other 55 seem clearly Flemish, 10 among them beginning with the prefix Van. However, a clearer picture emerges when we study the surnames within linguistic areas, as given in Table 3. In the Table, we list only the 50 most frequent surnames.

Table 3.  The 50 most frequent Belgian surnames in the three areas speaking Flemish (791,099 users including Brussels), French (320,714) and German (6,191)
Flemish | Frequency | French | Frequency | German | Frequency
Lemmens | 755 | Toussaint | 313 | Lux | 14
Deconinck | 762 | Antoine | 325 | Reinartz | 14
Moens | 763 | Declercq | 329 | Schroeder | 14
Smet | 767 | Moreau | 332 | Zimmermann | 14
Depauw | 778 | Adam | 335 | Braun | 15
Baert | 781 | Carlier | 345 | Bruls | 15
Vanacker | 783 | Devos | 345 | Terren | 15
Stevens | 793 | Lebrun | 350 | Velz | 15
Vandaele | 797 | Delvaux | 356 | Weber | 15
Smets | 804 | Bastin | 359 | Falter | 16
Segers | 813 | Marechal | 360 | Hoffmann | 16
Vandewalle | 834 | Legrand | 361 | Pfeiffer | 16
Vanhove | 843 | Thiry | 361 | Arens | 17
Vandenbossche | 861 | Guillaume | 363 | Becker | 17
Decoster | 878 | Thomas | 363 | Scholzen | 17
Deridder | 878 | Henry | 369 | Ernst | 18
Degroote | 879 | Robert | 377 | Heinrichs | 18
Debruyne | 917 | Collard | 382 | Hennes | 18
Coppens | 948 | Lefevre | 391 | Thissen | 18
Declerck | 970 | Servais | 402 | Gillessen | 19
Dewilde | 990 | Marchal | 413 | Keller | 19
Demeyer | 1058 | Desmet | 426 | Willems | 19
Vandenbroeck | 1121 | Bernard | 429 | Backes | 20
Aerts | 1156 | Leonard | 435 | Kohnen | 20
Hermans | 1164 | Michel | 436 | Rauw | 20
Martens | 1178 | Denis | 441 | Cormann | 21
Desmet | 1203 | Andre | 446 | Heeren | 21
Decock | 1217 | Evrard | 453 | Hilgers | 21
Claeys | 1225 | Mathieu | 478 | Reuter | 21
Hendrickx | 1234 | Charlier | 482 | Breuer | 22
Debacker | 1275 | Bertrand | 484 | Hermann | 22
Pauwels | 1293 | Fontaine | 490 | Mertens | 22
Michiels | 1331 | Lejeune | 499 | Cremer | 23
Vandenberghe | 1351 | Lemaire | 503 | Faymonville | 23
Vermeulen | 1396 | Noel | 509 | Krings | 24
Claes | 1460 | Petit | 517 | Palm | 24
Wouters | 1507 | Francois | 523 | Heinen | 26
Goossens | 1568 | Laurent | 534 | Jost | 26
Willems | 1611 | Lefebvre | 562 | Niessen | 26
Vandevelde | 1663 | Renard | 568 | Henkes | 29
Declercq | 1734 | Simon | 595 | Peters | 30
Vandamme | 1788 | Leroy | 610 | Theissen | 30
Mertens | 1925 | Gerard | 611 | Meyer | 32
Jacobs | 2098 | Dumont | 622 | Radermacher | 33
Devos | 2207 | Dupont | 635 | Stoffels | 33
Maes | 2652 | Martin | 669 | Schumacher | 34
Desmet | 2675 | Leclercq | 698 | Muller | 38
Peeters | 2922 | Lambert | 880 | Schroder | 46
Janssens | 3381 | Dubois | 980 | Schmitz | 70

Flemish Speaking Belgium

In this area, as for the whole country, the most frequent surname is Janssens, followed by Peeters, Desmet and Maes. Then come Devos, Jacobs, Mertens and Vandamme. As we said, Desmet and Devos are French-Flemish crossovers, as are the other surnames beginning with the prefix De-. Smet and Smets indicate a locative from a wet terrain, and Vos is simply fox. Although the surnames prefixed with De- may indicate immigration of French speaking Belgians into the Flemish area, the indication they give is not completely clear, whereas it is clearly given by the presence of Declercq and its graphic variant Declerck in the 50 most frequent surnames of this area. Overall, we count only three clearly French surnames (Declercq, Declerck, Debruyne) in the 50 most frequent surnames in the Flemish speaking part of Belgium. We perceive this as an indication that migration from the South of the country to the Northern area is not intense.

French Speaking Belgium

In the French speaking area, which is the Walloon or Southern half of Belgium, the most frequent surnames are Dubois, Lambert, Leclercq (here the determinative article has stayed and was not substituted by the prefix De-), along with Martin, Dupont and Dumont, in this order. Gerard is seventh from the top, followed by Leroy, Simon and Renard (compare the frequency of Renard, which means fox, with the frequency of Devos, which means ‘of the fox’). There are no clearly Flemish surnames amongst the fifty most frequent in the French area: we find Devos at 43rd place, and Desmet at 28th place. There is no flag surname which can clearly indicate North-South migration, in the way that Declercq indicates South-North, so that also movement from the North to the South of Belgium seems minor.

German Speaking Belgium

In our sample, we considered only three towns where German (really, a German dialect) is the prevalent language. They are Büllingen, Eupen and Sankt Vith, all of them at the far eastern border of Belgium. There were only 6,191 users in the three towns. The most frequent surnames are Schmitz, Schroder, Muller, Schumacher and Stoffels. Then come Radermacher, Meyer, Theissen, Peters and Henkes. Schmitz is a variant of the family Schmidt. Schroder and Muller have lost the umlaut over the first vowel. Theissen belongs to a family of common German names which also includes Theisen and Theißen. The first and only French surname in the fifty most frequent of the German speaking area is Faymonville (a locative; there is a small town with that name in the area). Peters, Mertens and Niessen are frequent also in the Flemish speaking part of Belgium. Again, it is not easy to ascertain migration either from the Flemish or the French section of the population into the German part. The French surname Faymonville is a case in point, since there are only 29 persons in our sample with that surname, of which 6 are found in the rest of Belgium and 23 in the German speaking area, so they do not represent recent immigration there.

Isonymy Parameters in Belgian Towns

In Table 1 we give the sample size, the number of surnames and the isonymy parameters, Karlin's ν, Fisher's α, and unbiased isonymy I in the 156 towns analysed. Identification codes and prevalent language are given in the last two columns of the Table. Identification is also possible with the co-ordinates in linear scale (1/400,000) which are given in the first two columns. Minimum α (109) was computed in Herbeumont, in the province of Luxembourg, and maximum in Brussels (5989). These values can be compared with α for the whole country, which was 4646, and with α averaged over all towns, which was 997. The variations seem larger than those seen in the European countries where large samples were studied, indicating that there is considerable heterogeneity in the Belgian surname structure. As in Switzerland, this may be attributed to the presence of the three languages which have a different geographic distribution. Average α is highest in the French speaking area (α = 1187), intermediate in the Flemish area (α = 830) and lowest in the relatively small section of German speaking Belgium (α = 450). Overall, the lowest values are typical of East Flanders, East Namur and of Luxembourg.

From Table 1, we can calculate the correlations of sample size and surname number with isonymy parameters (we showed these also in Rodriguez-Larralde et al. 1994 and in Scapoli et al. 1997). The correlations are given in Table 4 and the thumbnail images of the scatter diagrams in Figure 3. Note the strict dependence of surname number and of the other parameters on the sample size for each town. The correlations are also calculated in logarithmic scale to compensate for the large variations in sample size. For example, in linear scale Brussels is about 3000 times larger than Champlon, whereas in natural log scale the variation goes from 3.87 to 11.92.
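The effect of the logarithmic scale can be checked directly from Table 1 values, as in this minimal sketch (natural logarithms are assumed, which reproduces the range quoted above). The second part illustrates why a log-log correlation between α and I must be essentially −1: since α is the inverse of I, log α + log I is close to zero for every town, up to rounding of the tabulated values. Variable names are illustrative.

```python
import math

# Sample sizes from Table 1.
n_champlon, n_brussels = 48, 150772

ratio = n_brussels / n_champlon
log_min, log_max = math.log(n_champlon), math.log(n_brussels)
print(f"linear ratio = {ratio:.0f}, log range = {log_min:.2f} to {log_max:.2f}")

# (alpha, I) pairs from Table 1 for Aalst, Herbeumont and Brussels.
alpha_and_isonymy = [(677, 0.001477), (109, 0.009174), (5989, 0.000167)]

# Since alpha = 1/I, log(alpha) + log(I) should vanish up to rounding error.
max_gap = max(abs(math.log(a) + math.log(i)) for a, i in alpha_and_isonymy)
print(f"largest |log(alpha) + log(I)| = {max_gap:.5f}")
```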

Table 4.  Correlations of isonymy parameters with demographic parameters. First row, linear scale, second row, log-log scale
  | C | α | ν | I
N | 0.967 | 0.525 | −0.349 | −0.271
  | 0.976 | 0.567 | −0.825 | −0.568
C |  | 0.635 | −0.314 | −0.319
  |  | 0.720 | −0.690 | −0.720
α |  |  | 0.037 | −0.601
  |  |  | −0.023 | −1.000
ν |  |  |  | 0.093
  |  |  |  | 0.023

  1. N = sample size; C = surname number; I = unbiased isonymy.

  2. α = Fisher's indicator of heterogeneity.

  3. ν = Karlin-McGregor indicator of migration.

Figure 3. Scatter diagrams of the demographic and isonymic parameters. Note the decreasing predictivity of sample size on the number of surnames, on α, and Karlin-McGregor ν, in this order. Bilogarithmic scale.

Isolation by Language

To study the effect of languages on isolation, first we calculated the average Lasker's distance between adjacent towns along and across the boundary line which separates Flemish from French and German. We found that across this line the average distance and its standard error is 8.11±0.12, whereas the average distance between nearest neighbour towns in the whole country is 7.16 ± 0.07. The difference is highly significant (t = 6.84, P ≪ 0.001). Then, we studied the average Lasker's distance within and between languages, and obtained the results given in Table 5.

Table 5. Average Lasker's distance between pairs of towns within and between languages in Belgium (77 Flemish, 76 French, and 3 German speaking towns)

Languages             Lasker's distance    Pairs of towns
Flemish-Flemish         8.21 ± 0.014             2926
French-French           8.12 ± 0.011             2850
German-German           6.91 ± 0.291                3
Within languages        8.16 ± 0.009             5779
French-Flemish          9.47 ± 0.023             5852
French-German           9.37 ± 0.046              228
Flemish-German         10.10 ± 0.053              231
Between languages       9.48 ± 0.008             6311
Belgium                 8.85 ± 0.009            12090

It is apparent that Lasker's distance, irrespective of geographic distance, is considerably larger between languages than within languages (t = 109.620, P vanishingly small). The three German towns are significantly more uniform than the Flemish (t = 4.462, P ≪ 0.01) and the French speaking towns (t = 4.155, P ≪ 0.01). Within Flemish, the average distance is approximately the same as within French. The distance between French and Flemish is similar to the distance between French and German, and the distance between Flemish and German is the highest.
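The pair counts in Table 5 follow directly from the number of towns per language, and the pooled rows are close to pair-weighted averages of the rows above them. The sketch below is a consistency check under the assumption that the pooled means are weighted by the number of pairs. Because the tabulated inputs are rounded to two decimals, agreement to about 0.01 is the most that can be expected.

```python
from math import comb

towns = {"Flemish": 77, "French": 76, "German": 3}

# Within a language, n towns give n(n-1)/2 unordered pairs.
within_pairs = {lang: comb(n, 2) for lang, n in towns.items()}

# Between two languages, every town of one pairs with every town of the other.
between_pairs = {
    ("French", "Flemish"): towns["French"] * towns["Flemish"],
    ("French", "German"): towns["French"] * towns["German"],
    ("Flemish", "German"): towns["Flemish"] * towns["German"],
}

total_pairs = comb(sum(towns.values()), 2)

def weighted_mean(values_and_weights):
    """Average of values weighted by the number of pairs behind each value."""
    total = sum(w for _, w in values_and_weights)
    return sum(v * w for v, w in values_and_weights) / total

# Mean Lasker's distances from Table 5, paired with their pair counts.
within_mean = weighted_mean([(8.21, 2926), (8.12, 2850), (6.91, 3)])
between_mean = weighted_mean([(9.47, 5852), (9.37, 228), (10.10, 231)])

print(within_pairs, between_pairs, total_pairs)
print(f"within = {within_mean:.3f}, between = {between_mean:.3f}")
```

The counts reproduce the last column of Table 5 exactly (2926, 2850, 3, 5852, 228, 231, and 12090 in total).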

Isolation by Distance

We then took geographical position into account, and studied isolation due to geographic distance among the 156 towns. The elements of the matrix of Lasker's distance and the corresponding elements of the geographic distances were strongly correlated. Their scatter diagram is given in Figure 4. The correlation is the highest (with Switzerland) observed to date for European countries, with r = 0.721±0.014. This means that more than 50% of the surname heterogeneity is due to distance, a result similar to that obtained for Italy and Switzerland, and well above that obtained for the neighbouring Netherlands. However, since Belgium is a relatively small country compared to the previous ones we have studied, and it presents relatively few physical obstacles to movement, it seems reasonable to attribute the isolation both to physical distance and to the different geographical distribution of the three languages.

Figure 4. Scatter diagram of Lasker's and linear geographic distances. Belgium, with Switzerland, is the European country where the correlation between isonymic distance and geographic distance is highest. a) Flemish (L) speaking area b) French (R) speaking area c) three German (G) speaking towns against the other 153 Belgian towns.

In order to assess the effect of geographic distance independent of language, we studied separately the 77 Flemish towns in the northern area and the 76 towns in the southern, Walloon (French speaking) area. For the 3 towns where German is the prevalent language, the analysis (as we have already underlined) is not independent of language, since we studied the isolation of the 3 towns versus the remaining 153.

The correlation between Lasker's and geographic distance for the Flemish towns (Figure 4a) is the highest we have ever measured in the course of our studies, with r = 0.878 ± 0.007. This suggests that the Flemish speaking population of Belgium tends to be sedentary, so that migration and movement inside this area are not large. In fact, taking the correlation at face value, 77% of the surname variation would be due to distance. In the French speaking area, r = 0.631 ± 0.020 (Figure 4b) indicates that only 40% of the variation of surnames within the French language is due to distance, about half of the explained variation in the Flemish area. The internal mobility of the French speaking population must accordingly be larger.
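The percentages quoted above are the squares of the correlation coefficients. A short check, using only the values of r given in the text:

```python
# Correlations between Lasker's and geographic distance quoted in the text.
correlations = {"Belgium": 0.721, "Flemish": 0.878, "French": 0.631}

# Share of surname variation attributed to distance, taking r at face value.
explained = {area: r ** 2 for area, r in correlations.items()}

for area, share in explained.items():
    print(f"{area}: r = {correlations[area]:.3f}, r^2 = {share:.2f}")

# Ratio behind the statement "about half of the explained variation".
ratio = explained["French"] / explained["Flemish"]
print(f"French / Flemish = {ratio:.2f}")
```

This gives roughly 0.52 for the whole country, 0.77 for the Flemish area and 0.40 for the French area, with a French to Flemish ratio of about 0.52.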

The correlation for the three German towns with all others, r = 0.805±0.022 (Figure 4c), is also large but includes both the effects of isolation due to language and due to distance.

The correlations of Lasker's distances with the log of linear distance give almost equal results. However, given the size of Belgium, plain linear distance seems more appropriate for this study. Considering again the work of Dodinval (1970) in Belgium, we underline that the results he obtained using large genetic databases are also obtained using large surname databases.

Relation Between the Isolating Power of Geographic Distance and the Isolating Power of Languages

The impact of linguistic boundaries on population differentiation was examined by Dupanloup et al. (2000) at the continental level. They measured the added genetic distance at the two sides of a boundary and proposed an estimator of the effect of the boundary. At our much smaller scale, we seem to be in a position to establish a quantitative equivalence between isolation due to languages and isolation due to geographic distance in Belgium.

We recall that, for any regression equation of the type

  • $Y = a + bX$    (1)

one can obtain the length ΔX of the abscissa between two points Y1 and Y2 on the regression by

  • $\Delta X = (Y_1 - Y_2)/b$    (2)

This can be applied to any pair of values Y1 and Y2 belonging to the same variable. In this work, this quantity indicates the expected distance between the barycentres of the three linguistic areas of Belgium if all residents spoke the same language.

In practice, we calculated the regression of Lasker's distance over geographic distance, to obtain the rate of increase in linear scale. The regression of Lasker's distance, Z, over linear kilometres, W, is:

  • $Z = a + 0.01206\,W$

where a is the intercept and the coefficient 0.01206 indicates the increase of Lasker's distance per kilometre at any distance. The three quantities Y1, Y2 and b, as indicated in equation (2), are:

  1. Average Lasker's distance between languages (Table 5), Y1 = 9.48
  2. Average Lasker's distance within languages (Table 5), Y2 = 8.16
  3. The coefficient of regression of the distance over kilometres in linear scale, b = 0.01206
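For readers who wish to obtain a coefficient of this kind from their own data, the following is a minimal sketch of an ordinary least-squares fit of a similarity distance on kilometres. The data points are hypothetical and are not elements of the Belgian matrices. Plain least squares is an assumption here, since the text does not specify the fitting procedure.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Hypothetical (kilometres, Lasker's distance) pairs for illustration only.
km = [10, 50, 100, 150, 200]
lasker = [7.8, 8.2, 8.9, 9.4, 10.1]

a, b = least_squares_line(km, lasker)
print(f"intercept a = {a:.3f}, slope b = {b:.5f} Lasker's units per km")
```

Only the slope b enters equation (2); the intercept cancels when two points on the same line are compared.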

The difference between-within languages is (9.48 − 8.16) = 1.32 Lasker's units, and the rate of increase of Lasker's distance per kilometre is 0.01206 units. Thus, the isolating effect of languages in Belgium is:

  • $\Delta X = (9.48 - 8.16)/0.01206 = 1.32/0.01206 \approx 109$ km

This means that the increase in surname variation due to language differences is equivalent to the increase that one would observe in Belgium at a distance of a little more than 100 kilometres. Applying this method to the towns on the two sides of the boundary line separating languages in Figure 1, we obtain:

  • $\Delta X = (8.11 - 7.16)/0.01206 = 0.95/0.01206 \approx 79$ km

which means that the difference between adjacent towns due to language only is equivalent to a physical distance of 79 kilometres.

This simple method could be applied to any geo-linguistic situation when linear geographic distances and similarity distances are available.
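The conversion described above can be sketched as a small function. The function name and argument names are illustrative, not part of the original method description. The inputs are the values quoted in this section.

```python
def equivalent_distance_km(dist_between, dist_within, slope_per_km):
    """Geographic distance (km) that would produce the same increase in
    similarity distance as the language difference, following equation (2).
    """
    if slope_per_km <= 0:
        raise ValueError("the regression slope must be positive")
    return (dist_between - dist_within) / slope_per_km

B = 0.01206  # Lasker's units per kilometre, from the regression above

# All pairs of towns: between-language vs. within-language averages (Table 5).
all_pairs_km = equivalent_distance_km(9.48, 8.16, B)

# Adjacent towns: across the language boundary vs. nearest neighbours overall.
boundary_km = equivalent_distance_km(8.11, 7.16, B)

print(round(all_pairs_km), round(boundary_km))  # 109 79
```

For another geo-linguistic setting, the three inputs would be replaced by the corresponding between-group distance, within-group distance and regression slope, provided both distances are measured on the same scale as the regression.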

Other Indicators of Surname Similarity Among Belgian Towns

We constructed a dendrogram of the 156 towns based on the matrix of Lasker's distance. The dendrogram does not identify the main separation between languages, but identifies three main clusters which cut across languages. The geographic location of the clusters is given in Figure 1 over the Voronoi tessellation of the towns (Voronoi, 1908; Byers, 1996). The provinces of East and West Flanders cluster with Hainaut (cluster I), which is somewhat surprising because Hainaut belongs to the French speaking area. The provinces of Antwerpen, Brabant and Limburg form a central cluster (II). The third, or Eastern, cluster (III) comprises Liege and Namur (French speaking) but also includes Luxembourg where German is prevalent. If we compare the clusters with Figure 5, where α is plotted in three dimensions over Belgium, we observe that the south-eastern cluster is distinctly more isonymous than the rest of the country, so that the level of inbreeding is expected to be higher there. One may wish to compare Figure 5 of the present paper with Figure 7 of Twiesselmann et al. (1962), so that the correlation between inbreeding levels estimated by dispensations and by isonymy becomes apparent.

Figure 5. Stereogram of α in 156 Belgian Towns. Note the high values in Brabant, on the Coast of the North Sea and in Liege. East Flanders, East Namur and Luxembourg have the lowest values of α.

Conclusions

The surname structure of Belgium seems to be the most heterogeneous among the comparable structures of other European countries. We attribute the heterogeneity to the presence of three languages in the nation, each having its own prevalent geographic distribution. Since in Belgium there are no geographic barriers, the isolating effect of language is not confounded by other factors.

An equivalent source of information (CD-ROM of telephone users) and the same methodology described in this paper were used to analyse the isonymy structure of other European countries and of the USA, and of Venezuela, where we used the surnames of electors (Barrai et al. 1996, 1997, 1999, 2000, 2001, 2002; Rodriguez-Larralde et al. 1998a,b, 2000). The average value of α over all cities (or States, in the case of Venezuela), and the isolation by distance measured by the correlation between isonymy and geographic distance, are given in Table 6 for the eight countries studied to date.

Table 6. Comparison of isonymy parameters in the USA, Venezuela and six European nations

Country         Sample size   Surnames   α (average)   Isolation by    Lasker's distance
                (millions)                             distance (r)     Min        Max
Europe:
  Austria           1.0        140,766        854          0.59         5.49       9.94
  Belgium           1.1        137,442        997          0.74         6.03      13.15
  Germany           5.2        462,526       1596          0.51         3.72       9.11
  Holland           2.4        126,504        786          0.47         5.12      10.19
  Italy             5.1        215,623       1236          0.61         6.43      11.61
  Switzerland       1.7        166,116        891          0.72         6.27       9.61
America:
  USA              18.0        899,585       1366          0.24         5.19       9.84
  Venezuela         3.9         68,665        122          0.78         4.56       5.80

Three main features emerge from the comparisons in Table 6. Firstly, European nations are similar in abundance of surnames as measured by α. Secondly, again for European countries, isolation by distance is generally present, although with differing intensity in the different countries. Thirdly, α is relatively small in Venezuela, where it is of the order of 100–200 and has been discussed elsewhere (Rodriguez-Larralde et al. 2000), and isolation by distance is practically absent in the USA, owing to population mobility and recent immigration.

From the present analysis, we observe a considerable difference between the surname structure of Belgium and that of the neighbouring Netherlands, where the only official language is Dutch. In the plain of the Netherlands, isolation by distance, although significant and relevant, is the lowest among European countries. In Belgium it is the highest observed by us to date. The isolation is even higher in the Flemish speaking area, possibly indicating remote settlement and scarce movement of the Flemish population in Belgium. The lower correlation in the French speaking area, on the other hand, is indicative of higher population mobility, possibly linked to differences in economic, social, and cultural factors, compatible with the differing rate of industrialisation between areas.

In conclusion, the similarity in the surname structure of Belgium and Switzerland, associated with the geographic heterogeneity of Switzerland and the homogeneity of Belgium, indicates that the isolating power of different languages is at least as large as that of physical barriers. In the present case it corresponds to the isolation generated by a linear geographic distance of about 100 km. The next question is whether the time taken for disappearance of the isolation, in modern populations, will be longer for languages or for physical barriers. Our results for the USA may indicate that, when a common language is adopted, isolation may disappear very rapidly.

Acknowledgments

This work was supported by grants of the Italian Ministry of Universities and of Research and by Agreements CNR/FONACIT 2002-2004 Number 132.36.1 (Italy) and PI-200001829 (Venezuela).

References

  • Barrai, I., Scapoli, C., Beretta, M., Nesti, C., Mamolini, E. & Rodriguez-Larralde, A. 1996 Isonymy and the genetic structure of Switzerland. I: The distributions of surnames. Ann Hum Biol 23, 431–455.
  • Barrai, I., Scapoli, C., Beretta, M., Nesti, C., Mamolini, E. & Rodriguez-Larralde, A. 1997 Isolation by distance in Germany. Hum Genet 100, 684.
  • Barrai, I., Scapoli, C., Mamolini, E. & Rodriguez-Larralde, A. 1999 Isolation by distance in Italy. Hum Biol 71, 947–962.
  • Barrai, I., Rodriguez-Larralde, A., Mamolini, E. & Scapoli, C. 2000 Elements of the surname structure of Austria. Ann Hum Biol 26, 115.
  • Barrai, I., Rodriguez-Larralde, A., Mamolini, E., Manni, F. & Scapoli, C. 2001 Elements of the surname structure of the USA. Am J Phys Anthr 114, 109–123.
  • Barrai, I., Rodriguez-Larralde, A., Manni, F. & Scapoli, C. 2002 Isonymy and isolation by distance in the Netherlands. Hum Biol 74, 263–281.
  • Byers, J.A. 1996 Correct calculation of Dirichlet polygon areas. J Anim Ecol 65, 528–529.
  • Cavalli-Sforza, L.L., Piazza, A., Menozzi, P. & Mountain, J. 1989 Genetic and linguistic evolution. Science 244, 1128–29.
  • Cavalli-Sforza, L.L. 1997 Genes, peoples, and languages. Proc Natl Acad Sci USA 94, 7719–7724.
  • Central Intelligence Agency. 2002 at Internet Site http://www.cia.gov/cia/publications/factbook/country.html
  • Crow, J.F. & Mange, A.P. 1965 Measurements of inbreeding from the frequency of marriages between persons of the same surnames. Eugen Quart 12, 199–203.
  • Darwin, G.H. 1875 Marriages between first cousins in England and their effects. J Stat Soc 38, 153–184.
  • Dodinval, P. 1970 Population structure of A, B, O, AB blood groups in Belgium. Hum Hered 20, 169–177.
  • Dupanloup, I., Schneider, S., Langaney, A. & Excoffier, L. 2000 Inferring the impact of linguistic boundaries on population differentiation: application to the Afro-Asiatic-Indo-European case. Eur J Hum Genet 8, 750–756.
  • Encyclopaedia Britannica, William Benton Publisher. 1962 Vol 2, see Belgium.
  • Felsenstein, J. 1989 PHYLIP - Phylogeny Inference Package (Rel. 3.2). Cladistics 5, 164–166.
  • Felsenstein, J. 1993 PHYLIP (Phylogeny Inference Package) Rel. 3.5c. Distributed by the author. Department of Genetics, University of Washington, Seattle.
  • Fisher, R.A. 1943 The relation between the number of species and the number of individuals in a random sample of animal population. J Anim Ecol 12, 42–58.
  • Fox, W.R. & Lasker, G.W. 1983 The distribution of surname frequencies. Int Stat Rev 51, 81–87.
  • Karlin, S. & McGregor, J. 1967 The number of mutant forms maintained in a population. Proceedings of the Fifth Berkeley Symposium on Mathematics, Statistics and Probability, 4, pp. 415–438.
  • Lasker, G.W. 1977 A coefficient of relationship by isonymy: a method for estimating the genetic relationship between populations. Hum Biol 49, 489–493.
  • Piazza, A., Rendine, S., Minch, E., Menozzi, P., Mountain, J. & Cavalli-Sforza, L.L. 1995 Genetics and the origin of European languages. Proc Natl Acad Sci USA 92, 5836–5840.
  • Relethford, J.H. 1988 Estimation of kinship and genetic distance from surnames. Hum Biol 60, 475–492.
  • Rodriguez-Larralde, A., Pavesi, A., Scapoli, C., Conterio, F., Siri, G. & Barrai, I. 1994 Isonymy and the genetic structure of Sicily. J Biosoc Sci 26, 9–24.
  • Rodriguez-Larralde, A., Scapoli, C., Beretta, M., Nesti, C., Mamolini, E. & Barrai, I. 1998a Isonymy and the genetic structure of Switzerland. II. Isolation by distance. Ann Hum Biol 25, 533–540.
  • Rodriguez-Larralde, A., Barrai, I., Nesti, C., Mamolini, E. & Scapoli, C. 1998b Isonymy and isolation by distance in Germany. Hum Biol 70, 1041–1056.
  • Rodriguez-Larralde, A., Morales, J. & Barrai, I. 2000 Surname frequency and the isonymy structure of Venezuela. Am J Hum Biol 12, 352–362.
  • Saitou, N. & Nei, M. 1987 The neighbour-joining method: a new method for reconstructing evolutionary trees. Mol Biol Evol 4, 406–425.
  • Scapoli, C., Rodriguez-Larralde, A., Beretta, M., Nesti, C., Lucchetti, A. & Barrai, I. 1997 Correlations between isonymy parameters. Int J Anthropol 12, 17–37.
  • Smouse, P.E., Long, J.C. & Sokal, R.R. 1986 Multiple regression and correlation extensions of the Mantel test of matrix correspondence. Syst Zool 35, 627–632.
  • Twiesselmann, F., Moureau, P. & Francois, J. 1962 Evolution du taux de consanguinité en Belgique de 1918 à 1959. Population 17, 241–266.
  • Voronoi, M.G. 1908 Nouvelles applications des parametres continus a la theorie des formes quadratiques, deuxieme memoire, recherche sur les paralleloedres primitifs. J Reine Angewandte Math 134, 198–207.