Gene Admixture in the Costa Rican Population

*Correspondence: Bernal Morera, Instituto Clodomiro Picado, Universidad de Costa Rica, 2060 San José, Costa Rica. Tel/fax: (506) 207 55 50; e-mail: rbt@cariari.ucr.ac.cr; bmorera@biologia.ucr.ac.cr

Summary

The general population of Costa Rica has sometimes been considered the product of an amalgamation of groups of diverse origin. To determine the magnitude of the admixture accumulated since Spanish colonization, 11 classic genetic markers were analyzed in a total of 2196 individuals originating from five distinct regions of the country, using a maximum likelihood approach. The proportions of genes of European, Amerindian and African ancestry in the total population were found to be 61%, 30% and 9%, respectively. Variation was observed at the regional level, with greater European influence in the North (66%) and Central (65%) regions, greater Amerindian ancestry in the South (38%), and a larger contribution of African genes in the coastal regions (13% in the Atlantic and 14% in the North Pacific). A principal component (PC) analysis showed that 76% of the existing variability can be explained by the first two PCs, in agreement with the variation observed in the admixture process by geographic area. We conclude that the Costa Rican population is truly trihybrid, like populations in other Latin American countries, but differs from them fundamentally in the proportions of gene flow from the ancestral populations.

Introduction

Diverse ethnic-historic sources have considered the Costa Rican population to be a hybrid, essentially the product of an amalgamation or contact of entire groups of Europeans, Africans and Amerindians (Thiel, 1902; Sanabria, 1957; Meléndez, 1982, 1985; Acuña-León & Chavarría-López, 1991; Meléndez-Obando, 1993, 1997) at the beginning of the Spanish colonization, as occurred in other Latin American countries (Sans, 2000). This population structure could be, in theory, very useful in quantitating the gene flow that occurred among the ancestral populations that constituted the population as a whole (Roberts, 1978; Morera & Barrantes, 1995). Notwithstanding, most studies of the genetics of Costa Rican populations have been conducted on Amerindian groups (Barrantes et al. 1985, 1990; Barrantes, 1993; Santos et al. 1994), and to a lesser degree on populations of African descent (Sáenz et al. 1971, 1984; Madrigal et al. 2001) and recent immigrants (Morera et al. 2001a).

Few genetic studies have examined the admixed (mestizo) majority (Morales-Cordero et al. 2001; Morera et al. 2001b), which is clearly expanding at present with the consequent loss of valuable information about its genetic variability. This situation restricts the application of such knowledge to problems of genetic epidemiology and health, and inhibits access to evidence about its origins and hence, the contribution of each ancestral population. Recently, some inherited illnesses that had been attributed to a possible single founder effect of European origin have been studied (Leon et al. 1992; Saborío, 1992; Freimer et al. 1996a; Shah et al. 1997; Auger et al. 1999; Frants, 1999; Bech-Hansen et al. 2000), although under the hypothesis of the existence of a certain genetic homogeneity of the Costa Rican population (Freimer et al. 1996b).

Hence, a major study of the genetic structure of the Costa Rican population is necessary. This research has three objectives: (1) to estimate the accumulated admixture from various genetic markers using the maximum likelihood method; (2) to analyze the variation of this admixture among the different regions of the country; and (3) to compare the results of this study with those obtained in other Latin American countries using the same methodology.

Materials and Methods

Brief Geographic Considerations

Costa Rica is located in the southern part of the Central American isthmus (8°–11°15′ North, 82°30′–86° West) and has an area of 51,100 km². In 2001, the population of Costa Rica surpassed 4 million inhabitants (Rosero-Bixbi, 2001). The distribution of the population is uneven, however, with the majority of the inhabitants residing in the Central Valley zone (60%), followed by the North Pacific region (13%) and the South Pacific region (10%). The areas with a hot climate and poor soil, like those of the Caribbean (9%) and of the North (8%), have the lowest population densities (Biosca, 1992).

Study Population and Genetic Markers

The analyzed sample includes information on 2196 unrelated adults of both sexes, from all parts of the country, who were examined in the Department of Forensic Sciences of the Judicial Branch of Costa Rica. The country was divided into five regions for historical-geographic reasons (Fig. 1). The majority of the individuals were grouped regionally, according to their birthplace, as follows: Atlantic (n = 100), Central (n = 1311), Chorotega (n = 451), North (n = 104) and South (n = 200). This grouping sought to minimize the effect of the migration from the countryside to the city that began in the mid-1960s (Maguid, 1986). All of the participants in the study were Costa Rican by birth, which implies that they were born within the national territory and that at least one of their parents was at some time Costa Rican, either by birth or by naturalization. Following the Spanish surname convention, each individual has two last names, one from the father and the other from the mother. A Spanish surname in the core Costa Rican population generally indicates an origin in the Colonial Epoch, either inherited directly from Spanish ancestors or adopted by indigenous people and African slaves, who assumed the surname of their guardians (encomenderos) or owners (González-Víquez, 1921). A Spanish surname could also originate from the second, but proportionately smaller, Spanish immigration to Costa Rica (Schmidt, 1979). In the sample studied, 95.7% of the individuals had two Hispanic surnames, 4% had one Hispanic and one non-Hispanic surname, and a mere 0.3% had two non-Hispanic surnames. In the last group, the surnames were of English (Anglo-Saxon or Afro-Caribbean), Italian, Chinese, French, German and Philippine origin. As a very general approximation, this reflects the limited contribution of the waves of immigration to Costa Rica at the end of the 19th and during the 20th centuries (Chaves-Camacho, 1969; Schmidt, 1979) as a possible source of genetic variability.

Figure 1. Geographic regions of Costa Rica, divided according to county.

The polymorphisms of 32 alleles or haplotypes of 11 genetic systems were determined, including the blood groups ABO, Rhesus (RH), MNSs (MNS), Duffy (FY), Kell (KEL), Kidd (JK), Secretor (SE), P system (P), Lewis (LE), Diego (DI), and the haptoglobin (HP) protein, as described elsewhere (Morera et al. 2001b). Nevertheless, not all of the blood groups were examined in all the individuals, and for this reason the sample size varied between loci.

Ancestral Gene Frequencies and Admixture Estimates

For this study, a detailed review of the literature and historic records of the 16th and 17th centuries (Anonymous 1909–1930) was carried out to establish the most probable ancestral populations. The estimated ancestral gene frequencies are shown in Table 1. The mean frequencies of Amerindians were weighted by sample size using the gene frequencies of each ethnic indigenous group currently living in Costa Rica (Barrantes et al. 1990; Barrantes, 1993). The European ancestral frequency was estimated using the regional genetic frequencies found in the Iberian Peninsula (Calafell, 1995), with the approach of historic weighting as suggested by Reed (1969) using the area of origin of the Europeans established in Costa Rica during Colonial times, namely, North of Spain (11%), Mediterranean (8%), Portugal (2%), Basque Country-Navarre (15%), Castile-Extremadura (28%), Andalusia (33%), and other regions (3%), as indicated previously (Sanabria, 1957; Meléndez, 1982). The average estimated ancestral frequency from Western Africa was obtained from the weighted average of the regional frequencies (Roychoudhury & Nei, 1988), following the historic origins of the zone of extraction of the slaves brought to Costa Rica during the Colonial Epoch, from Senegal-Gambia (12%), Gold Coast (42%), Congo-Angola (45%), and others (1%) (Morera & Barrantes, 1995).
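
The historic weighting described above amounts to a weighted mean of regional allele frequencies. The following is a minimal sketch, not the authors' actual computation: the weights are the colonial-origin percentages quoted in the text, the function name is hypothetical, and the regional frequencies in the usage lines are placeholders for illustration, not values from Calafell (1995).

```python
# Historic weights for the European ancestral pool, as quoted in the text.
EUROPEAN_WEIGHTS = {
    "North of Spain": 0.11,
    "Mediterranean": 0.08,
    "Portugal": 0.02,
    "Basque Country-Navarre": 0.15,
    "Castile-Extremadura": 0.28,
    "Andalusia": 0.33,
    "Other regions": 0.03,
}


def weighted_frequency(freqs, weights):
    """Weighted mean of regional allele frequencies (weights are renormalized)."""
    total = sum(weights[region] for region in freqs)
    return sum(freqs[region] * weights[region] for region in freqs) / total


# Hypothetical regional frequencies of one allele (illustration only).
example_freqs = {region: 0.05 for region in EUROPEAN_WEIGHTS}
example_freqs["Andalusia"] = 0.08

print(round(weighted_frequency(example_freqs, EUROPEAN_WEIGHTS), 4))
```

The same function would apply to the African pool with the Senegal-Gambia, Gold Coast, Congo-Angola and "others" weights, and to the Amerindian pool with sample sizes as weights.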

Table 1. Ancestral gene frequencies used in the accumulated admixture calculations

                     Ancestral gene frequencies
System  Alleles      African  Amerindian  European
ABO     O            0.665    1.000       0.650
        A1           0.121    0.000       0.246
        A2           0.060    0.000       0.040
        B            0.154    0.000       0.056
RH      CDE          0.000    0.000       0.064
        CDe          0.068    0.485       0.409
        cDE          0.029    0.413       0.068
        cDe          0.675    0.102       0.056
        cde          0.213    0.000       0.383
        Cde          0.015    0.000       0.018
        cdE          0.000    0.000       0.002
MNS     MS           0.085    0.232       0.250
        Ms           0.397    0.533       0.311
        NS           0.108    0.073       0.063
        Ns           0.410    0.162       0.376
KEL     K            0.007    0.000       0.039
        k            0.993    1.000       0.961
FY      Fya          0.001    0.576       0.415
        Fyb          0.012    0.424       0.585
        Fy           0.987    0.000       0.000
JK      Jka          0.770    0.415       0.536
        Jkb + Jk     0.230    0.585       0.464
P       P1           0.744    0.723       0.581
        P2 + p       0.256    0.277       0.419
LE      LE           0.308    0.453       0.664
        le           0.692    0.547       0.336
SE      Se           0.497    1.000       0.512
        se           0.503    0.000       0.448
DI      Dia          0.000    0.002       0.000
        Dib          1.000    0.998       1.000
HP      Hp-1         0.678    0.479       0.407
        Hp-2         0.322    0.521       0.593

Admixture was estimated under the trihybrid model, in accordance with the historic and genealogic postulates of the origin of this population. The values were computed for the entire sample and for each region, using the maximum likelihood method (Krieger et al. 1965). The MISTURA program from the software package GENIOC of the Instituto Oswaldo Cruz, Rio de Janeiro, Brazil, was used to analyze the trihybrid populations. The Diego group was excluded from the calculations in the Atlantic, North, and South regions because the sample size did not reach 50 individuals. The goodness of fit of each genetic marker to the admixture model was evaluated. The exact test of population differentiation was applied using the ARLEQUIN program (Schneider et al. 1997) to estimate the statistical significance of the variations among the different regions of Costa Rica.
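
The idea behind a maximum likelihood estimate under the trihybrid model can be sketched as follows. This is not the MISTURA program, and it is not the method of Krieger et al. (1965), which works from phenotype likelihoods. It is a simplified illustration that assumes gene copies can be counted directly and uses an expectation-maximization (EM) update, a technique swapped in here for brevity. The function names are hypothetical; the ancestral frequencies for FY and RH are taken from Table 1, and the counts are synthetic.

```python
# Ancestral allele frequencies from Table 1, ordered (African, Amerindian, European).
ANCESTRAL = {
    "FY": {"Fya": (0.001, 0.576, 0.415),
           "Fyb": (0.012, 0.424, 0.585),
           "Fy": (0.987, 0.000, 0.000)},
    "RH": {"CDE": (0.000, 0.000, 0.064),
           "CDe": (0.068, 0.485, 0.409),
           "cDE": (0.029, 0.413, 0.068),
           "cDe": (0.675, 0.102, 0.056),
           "cde": (0.213, 0.000, 0.383),
           "Cde": (0.015, 0.000, 0.018),
           "cdE": (0.000, 0.000, 0.002)},
}


def expected_frequency(freqs, m):
    """Admixture model: hybrid frequency is the m-weighted sum of ancestral ones."""
    return sum(mk * pk for mk, pk in zip(m, freqs))


def estimate_admixture(allele_counts, ancestral=ANCESTRAL, iterations=2000):
    """EM estimate of (African, Amerindian, European) proportions from allele counts."""
    m = [1 / 3, 1 / 3, 1 / 3]
    for _ in range(iterations):
        new_m = [0.0, 0.0, 0.0]
        for locus, counts in allele_counts.items():
            for allele, n in counts.items():
                freqs = ancestral[locus][allele]
                mix = expected_frequency(freqs, m)
                if mix == 0:
                    continue
                for k in range(3):
                    # Expected number of copies of this allele drawn from ancestry k.
                    new_m[k] += n * m[k] * freqs[k] / mix
        total = sum(new_m)
        m = [x / total for x in new_m]
    return m


# Synthetic check: build expected counts from known proportions, then recover them.
true_m = (0.09, 0.30, 0.61)
synthetic = {
    locus: {a: 1000 * expected_frequency(f, true_m) for a, f in alleles.items()}
    for locus, alleles in ANCESTRAL.items()
}
print([round(x, 3) for x in estimate_admixture(synthetic)])
```

The synthetic counts are generated from the model itself, so the check only shows that the estimator recovers known proportions; it says nothing about the fit of real data.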

The Principal Components (PCs) were extracted as a linear combination of variables (R-mode; Comrey, 1992) from the correlation matrix, using the statistical software package SPSS. In the analysis we included the five regional populations of Costa Rica. Only those genes studied in all the populations were included in the study.

Results

Table 2 shows the absolute phenotype frequencies of the studied markers in the different regions of the country and in the total population. The admixture estimates under the trihybrid model for Costa Rica (Table 3) show an admixed population with a strong overall component of genes of European origin (61%), followed by the Amerindian contribution (30%) and the smaller African influence (9%). A more detailed analysis reveals that all the regions have highly similar hybrid populations. Nevertheless, slight variations exist in the contribution of each ethnicity to the amalgamation, according to geographic area (Table 3). Thus, the coastal regions show a larger contribution of African genes (13% in the Atlantic and 14% in the Chorotega regions) than the central regions. The Amerindian genes reach a maximum proportion in the South (38%), followed by the Atlantic (34%), both of which today contain the largest Amerindian populations in the country. The North (66%) and Central (65%) regions show greater European ancestry. In all cases, the majority of genes are of European origin, ranging between 51% and 66%.

Table 2. Phenotype frequencies in the geographical regions of Costa Rica
                    Regions
System  Phenotype  Central  Chorotega  North  South  Atlantic  Costa Rica
ABOA1359111615132636
 A2301676372
 B14262201615267
 AB34973056
 O54828991118701154
DIDi*a(+)311006
 Di*a(−)20553202924334
FYFy*a253146434932532
 Fy*ab493185758057934
 Fy*b353141686424673
 Fy*121301733
KELK110002
 Kk391633567
 K10663911811901122014
JKJk*a(+)752311150146841497
 Jk*a(−)25684323623451
LELe(+)236111544221514
 Le(−)11044233010227
MNSMS812911159157
 Ms1206325208246
 NS12512121
 Ns12848162213233
 MSs17870363227354
 NSs6124784108
 MNS693111812136
 MNs214122345021457
 MNSs24890453724456
PP1773315139147761507
 P2250126393229493
RHCDEe12421120
 Cde20975404323406
 CcDE4302110
 CcDEe20694324321413
 CcDe328139545431622
 cDE70249125124
 cDEe16681302123332
 cDe46437910122
 cDEe200103
 Ccde421209
 cde67201165112
SESe9333521521651011760
 se16452322517306
HPHp*121164363416362
 Hp*1–2520196919861966
 Hp*2293115495031538

Table 3. Estimates of accumulated admixture in the Costa Rican populations

                    Estimated proportion of ancestral contribution (%)
Region              African     Amerindian  European
Atlantic            13 ± 3.4    34 ± 6.6    53 ± 7.4
Central              7 ± 2.9    28 ± 4.8    65 ± 5.6
Chorotega           14 ± 3.3    35 ± 5.1    51 ± 6.0
South                8 ± 4.1    38 ± 5.6    54 ± 6.9
North                7 ± 3.8    27 ± 6.6    66 ± 7.2
Costa Rica (total)   9 ± 2.8    30 ± 4.6    61 ± 5.3

The analysis of the heterogeneity of the admixture estimates among the loci shows significant distortion (χ², p < 0.05) in the blood groups ABO and P, relative to the values expected under the admixture model. The ABO*B allele appears disproportionately in the general population of Costa Rica and in the Central region, while the P*P1 allele shows distortions in the general Costa Rican population, as well as in the Central, Atlantic, and Chorotega regions, even though previous findings determined that both loci were in Hardy-Weinberg equilibrium.
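
A per-locus comparison of observed and expected values can be illustrated with a Pearson chi-square statistic. The sketch below is only an illustration of that kind of test, not the heterogeneity analysis actually performed: it treats gene copies as directly countable, which is a simplification for a system with a dominant allele such as P, and the observed counts are hypothetical. The expected P1 frequency uses Table 1 and the national estimates in Table 3.

```python
def chi_square(observed, expected):
    """Pearson chi-square statistic for paired observed and expected counts."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Expected P1 allele frequency under the trihybrid model (Table 1 frequencies,
# national estimates of 9% African, 30% Amerindian, 61% European).
p1 = 0.09 * 0.744 + 0.30 * 0.723 + 0.61 * 0.581

# Hypothetical sample of 1000 gene copies (illustration only, not study data).
n_copies = 1000
observed = [600, 400]  # P1, P2 + p
expected = [n_copies * p1, n_copies * (1 - p1)]

CRITICAL_1DF = 3.841  # chi-square critical value, 1 df, alpha = 0.05
stat = chi_square(observed, expected)
print(round(stat, 2), stat > CRITICAL_1DF)
```

A statistic above the critical value would flag the locus as departing from the admixture model at the 5% level.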

The Principal Components analysis among the regions of Costa Rica reveals that the first PC accounts for 44.7% of the variability contained in the matrix, while the second PC explains another 30.9%, together accounting for 75.6% of the variability. The third and fourth PCs each explain a smaller share, 12.3% and 12.1%, respectively. Plotting PC 1 against PC 2 (Fig. 2) reveals regional genetic heterogeneity: the coastal regions are similar to each other, the Central region lies in the middle, and the North and South regions lie far apart. This is in agreement with the variations observed in the proportions of each ethnic contribution to the admixture process, according to geographic area.
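
The cumulative shares can be checked directly from the reported percentages. Four components exhaust the variability, which is what one would expect in general when five populations are the observations, since centered data from n observations yield at most n − 1 non-zero components. This reading of the analysis design is an assumption, not something stated by the SPSS output.

```python
from itertools import accumulate

# Percentage of variability explained by each PC, as reported in the text.
explained = [44.7, 30.9, 12.3, 12.1]

cumulative = [round(c, 1) for c in accumulate(explained)]
print(cumulative)  # [44.7, 75.6, 87.9, 100.0]
```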

Figure 2. Principal Component (PC) analysis of the Costa Rican regional populations. The score values have been multiplied by 10. Abbreviations are: ATL Atlantic, CEN Central, CHO Chorotega, NOR North, SOU South.

Discussion

The results of this work agree with and confirm previous historical (Thiel, 1902; Gudmundson, 1978; Meléndez, 1982, 1985) and genealogical (Sanabria, 1957; Meléndez-Obando, 1993, 1997) studies that designate the Costa Rican population as a multi-origin hybrid. Other genetic studies of a qualitative nature (Sáenz et al. 1974, 1980; Marín-Rojas & León-Sánchez, 1996) had already pointed in the same direction. Nevertheless, the non-European contribution to the present Costa Rican population determined in this study, and especially the Amerindian contribution, is much higher than expected. Most of the African contribution to the Costa Rican gene pool must have occurred during the Colonial Epoch, as confirmed by the wide diffusion of the African-specific alleles FY*0 (= FY*null) (Morera et al. 2001b), HB*S and HB*C (Sáenz et al. 1980) in the general population and in all regions of the country. In contrast, there has been only a minor genetic contribution from the Afro-Caribbean population that arrived, principally from Jamaica, between 1872 and the first decades of the 20th century (Meléndez & Duncan, 1989). According to the 1950 Census, only 2% of individuals were of Afro-Jamaican origin (Anonymous, 1953), although their cultural integration and admixture intensified after that date (Meléndez & Duncan, 1989).

The only previous attempt to quantify the admixture in Costa Rica was the unpublished work of Roberts (1978), who estimated, using the ABO and Rh(D) blood groups, a presence of 48% African genes, 40% European, and 12% Amerindian. However, that work had some theoretical and practical limitations, such as the use of a very general methodology and of only 5 alleles at 2 loci, which were not highly informative. Even if Roberts's original data are reanalyzed with our trihybrid model, the discrepancies remain, since the African contribution is still elevated (41%) at the expense of the European (39%) and the Amerindian (20%). Of all the factors that can help explain the differences between the two studies, the one with the greatest impact is the elevated frequency of ABO*B, which is higher in Costa Rica (0.0769) than in Spain (0.0595) (Planas et al. 1966). This fact, also reported by Roberts (1978), contradicts the dilution of the frequency that is expected under the admixture model, and which is indeed observed for the ABO*A allele. On the other hand, even though the frequency of the ABO*B allele also appears distorted with respect to the admixture model in our study, its relative weight in the determination of African ancestry is small compared to other more informative alleles and haplotypes, such as FY*0, Rh*(cDe) and Hp*1. Alternatively, some populations of Andalusia and Extremadura (Calafell, 1995) possess frequencies of ABO*B higher than those found in Costa Rica, and since this divergence could fall within the expected error of the gene frequencies, a more detailed analysis is required.
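
The dilution expected under the admixture model can be made concrete with the ancestral frequencies in Table 1 and the national estimates in Table 3. This is a hedged worked example rather than a result reported in the study; the function name is hypothetical.

```python
# ABO*B and ABO*A (A1 + A2) ancestral frequencies from Table 1,
# ordered (African, Amerindian, European).
ABO_B = (0.154, 0.000, 0.056)
ABO_A = (0.121 + 0.060, 0.000, 0.246 + 0.040)

# National admixture estimates from Table 3.
M = (0.09, 0.30, 0.61)


def expected_frequency(freqs, m=M):
    """Frequency expected in the hybrid population under the admixture model."""
    return sum(mk * pk for mk, pk in zip(m, freqs))


print(round(expected_frequency(ABO_B), 3))  # 0.048
print(round(expected_frequency(ABO_A), 3))  # 0.191
```

Under these inputs the model predicts an ABO*B frequency of about 0.048, below both the European ancestral value (0.056) and the 0.0769 observed in Costa Rica, which is the sense in which the observed frequency runs against the expected dilution.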

Recently, Madrigal et al. (2001) carried out a study in Limón, on the Caribbean coast of Costa Rica (referred to in this study as the Atlantic region), using a different methodology (FST Analysis; Long, 1991). They studied the gene flow between the Afro-Limoneses (of Jamaican origin) and the Hispano-Limoneses (mestizos). Their admixture estimates for this mestizo group indicated an ancestry of 58.66% European, 33.83% Amerindian, and 7.51% African. Those results are notably similar to ours for the Atlantic region (Table 3), and it is possible to attribute the slight discrepancies found with the European and African contribution (6.1% and 6.0%, respectively) to differences in methodology.

If the admixture estimates for Costa Rica are compared with those obtained in other Latin American countries for which similar studies have been carried out using the method of Krieger et al. (1965), the Costa Rican population proves to be similar to other Latin American populations, like those of Brazil (Krieger et al. 1965; Schneider & Salzano, 1979; Salzano, 1982) and of certain cities of Mexico (Lisker et al. 1988, 1990, 1995). It is relevant to note that the populations of countries like Costa Rica and Uruguay, which at some time were officially reputed to be essentially European, are in reality trihybrid, similar to those of other countries whose populations are commonly accepted as mestizo, and differ only in the proportions of gene flow from the ancestral populations. This result confirms the position of Sans et al. (1993) that studies of gene admixture make it necessary for us to revise the concept we have of national identity.

The present study provides objective information about the genetic heterogeneity among the distinct regions of Costa Rica, estimated using two different, but complementary, methodologies. The observed variations in the contribution of each ethnicity by geographic area correlate with the heterogeneity found in the PC analysis, so both methods must in part be reflecting the same phenomenon: the historic process that constructed the current population by the admixture of Amerindians, Spaniards and Africans. Claims of a supposed homogeneity of the Costa Rican population, or more specifically of the Central Valley of Costa Rica, were officially put forward by politicians and intellectuals in the last decades of the 19th century (Biolley, 1892), when they tried to consolidate a national identity separate and different from the rest of Central America, and this misconception survives even in the national historiography (Acuña-Ortega, 2001). One of the principal architects of the present-day Costa Rican state synthesized the idea by describing Costa Rica as a country with a “white and homogeneous ethnic composition” (Facio, 1942). Even though this concept had wide local acceptance for most of the 20th century, it had never been formally tested for the entire country using biological methods and data prior to this study. In spite of its mythological origin (Meléndez-Obando, 1999; Acuña-Ortega, 2001), the supposed homogeneity of the Costa Rican population has been embraced by numerous recent gene mapping studies, and sustained by a wide variety of modes of communication (Uhrhammer et al. 1995; Freimer et al. 1996b; McInnes et al. 1996; Reus & Freimer, 1997; Telatar et al. 1998a; DeLisi et al. 2001; Mesén, 2001; Service et al. 2001). Nevertheless, the results of the present study do not support the proposal of a genetic homogeneity of the Costa Rican people, nor specifically of the population of the Central Valley of Costa Rica, but rather indicate a much more complex situation.

Even though a supposed founder effect has been the favourite explanation for the linkage disequilibrium observed in various gene mapping studies (Saborío 1992; Escamilla et al. 1996; Freimer et al. 1996a; Freimer et al. 1996b; McInnes et al. 1996; Shah et al. 1997; Escamilla et al. 1999; Frants 1999; Garner et al. 2001; Service et al. 2001), other causes such as natural selection or admixture are equally plausible, and at least the latter is more in accordance with the genetic history of this population. The genetic heterogeneity found here does not contradict the possibility of founder effects in certain specific cases, mainly at a local level (e.g. Solís, 2000; Leal et al. 2001).

An understanding of the differential ethnic contribution in the distinct regions of the country would help focus the search for simple and complex genetic factors responsible for hereditary illnesses. In the same way, it would aid genetic-epidemiological studies concerned with elucidating the role of natural selection in the etiology of complex ailments whose prevalence varies among continental groups, such as Non-Insulin Dependent Diabetes Mellitus (NIDDM) in people of Amerindian ancestry (Chakraborty & Weiss, 1986; Garza-Chapa et al. 2000) and arterial hypertension in individuals of African ancestry (MacLean et al. 1974; Sichieri et al. 2001). In fact, our results suggest that it is probably erroneous to assume a European origin a priori for each trait or disease found in the Costa Rican population. For example, probably the most frequent genetic mutation in all of Costa Rica is HB*S, of known African origin (Cavalli-Sforza et al. 1994; Vogel & Motulsky, 1996). It is possible that new studies on the Costa Rican population, particularly those using DNA analysis and considering in great detail the diverse components of the population structure, will provide better evidence to define the situation.

Acknowledgments

H. Krieger and J. Lobo generously provided us with the MISTURA program and offered their guidance during the development of this investigation. This work was sponsored by the Universidad de Costa Rica, Grants No. 111-90-068 and 111-97-522. B.M. received a grant from the Agencia Española de Cooperación Internacional (AECI).

Received: 28 May 2002

 Accepted: 11 September 2002
