Genetic portrait of the Amazonian communities of Peru and Bolivia: The legacy of the Takanan‐speaking people

During the colonial period in South America, many autochthonous populations were affected by relocation by European missionary reductions and other factors that impacted and reconfigured their genetic makeup. Presently, the descendants of some “reduced” and other isolated groups are distributed in the Amazonian areas of Peru, Bolivia, and Brazil, and among them, speakers of Takanan and Panoan languages. Based on linguistics, these peoples should be closely related, but so far no DNA comparison studies have been conducted to corroborate a genetic relationship. To clarify these questions, we used a set of 15 short tandem repeats of the non‐recombining part of the Y‐chromosome (Y‐STRs) and mitochondrial DNA (mtDNA) control region sequence data. Paternal line comparisons showed the Takanan‐speaking peoples from Peru and Bolivia descended from recent common ancestors; one group was related to Arawakan, Jivaroan, and Cocama and the other to Panoan speakers, consistent with linguistics. Also, a genetic affinity for maternal lines was observed between some Takanan speakers and individuals who spoke different Amazonian languages. Our results supported a shared ancestry of Takanan, Panoan, Cocama, and Jivaroan‐speaking communities who appeared to be related to each other and came likely from an early Arawak expansion in the western Amazonia of South America.

In the Peruvian rainforest ecoregion, about 51 languages are spoken, including Kichwa (Quechua from Andean populations), which was introduced by the Jesuit and Franciscan missionaries and used as a lingua franca. The Panoan (Shipibo-Conibo-Cashibo), Cocama, and Piro (or Yine, an Arawakan group) are distributed mainly along the Ucayali region (part of selva central), while Jivaroan, Shawi, and Kechwa-Lamista are in the San Martín and Loreto Departments. Furthermore, several Arawakan-speaking communities (such as Yanesha, Ashaninka, and Machiguenga) live in the ceja de selva, ranging from Pasco to Cusco Departments. We previously described the reductions established by missionaries in several populations of San Martín and Loreto Departments, and this scenario was similar for all Amazonian tribes. DNA studies have shown genetic relatedness between the Jivaroan, Panoan, Arawakan, Cocama, and others (Barbieri et al., 2014; Di Corcia et al., 2017; Sandoval et al., 2016).
In Bolivia, there are about 33 lowland ethnic groups distributed mainly in Beni, Pando, and Santa Cruz Departments. Most names of these groups are associated with Catholic missions, like Santiago de Chiquitos or Ascención de Guarayos. At the time of the Spanish military incursions (1538-1671), the Llanos de Mojos (in Beni Department) was inhabited mostly by agriculturalists like Arawak-speaking communities (nowadays also called "mojeños," and including the Trinitario, Ignaciano, Javeriano, Loretano, Joaquiniano, or groups with local names such as Baure) (Block, 1994; Eriksen, 2011). In 1587, the first wave of missionaries arrived in the Mojos and Santa Cruz regions from Peru and Paraguay (Román-López & Castro-Mojica, 2016). Between 1682 and 1744, the Jesuits founded about 25 villages, although after their expulsion in 1767, some villages were relocated and some were newly created (Block, 1994). In 1754, at least 16 ethnic groups were registered at San Ignacio de Mojos village (Bert et al., 2004). By contrast, in the Llanos de Mojos and near the Andean foothills, some ethnolinguistic groups remained isolated, such as Canichana, Movima, Cayubaba, Itonama, Yuracare, and Tsimane (Adelaar & Muysken, 2004; Eriksen, 2011).
Despite the widespread admixture, it seems that Tupian speakers like the Guarayo (descendants of Guaraní), Siriono (or Chori), Guaraní (scattered in Beni and Santa Cruz Departments), and Chiriguano (a pejorative name given by the Incas, also called "Guaraní" in colonial times) resisted Spanish incursions (Block, 1994; Métraux, 1942; Nordenskiold, 1917; Sala et al., 2019). The nomadic Siriono families avoided contact with foreigners and remained isolated, although some individuals were brought to missions such as the Chiquitos and Yuracare villages (Lehm et al., 2004). By 1984, the Siriono population was estimated at 265 inhabitants (Lehm et al., 2004). Several tribes were subservient to others; for example, the subjugated Tapiete group adopted a Tupian language from the Chiriguano tribes (Brown et al., 1974).
The Boreal Chaco was also probably a refuge for some hunter-gatherer groups such as the Wichi (Matacoan-Guaycuru language, known as Weenhayek in Bolivia) and Ayoreo (Zamucoan language), an isolated group in Bolivia (scattered clans in nearby localities), which is closely related to the "Moro" or Ayore tribe from Paraguay (Brown et al., 1974; Pérez-Diez & Salzano, 1978). With the exception of the Ayoreo, it has been shown that most populations of Gran Chaco are genetically homogeneous (Demarchi & García-Ministro, 2008). Moreover, after 1870, during the rubber-boom period, there was displacement (and enslavement) of several ethnic groups within the Peruvian-Brazilian-Bolivian tropical lowlands (Román-López & Castro-Mojica, 2016). Despite this complex demographic scenario, several native Amazonian groups have still preserved part of their original culture and traditions.
Recent ancient and modern DNA studies identified genealogies and connections between South American populations (Barbieri et al., 2019; Borda et al., 2020; Di Corcia et al., 2017; Gnecchi-Ruscone et al., 2019; Nakatsuka et al., 2020; Sandoval et al., 2016). However, many questions remain, particularly on the genetic relationships between Takanan- and Panoan-speaking communities, who have primarily settled between the Amazonian borders of Peru, Bolivia, and Brazil (Dixon & Aikhenvald, 1999; Métraux, 1942). The Takanan-speaking villages (also called "Esse ejas" in Peru) are scattered on the shores of the Madre de Dios, Tambopata, Heath, and Malinowski rivers (Department of Madre de Dios, Peru) and of the Madre de Dios, Beni, and Madidi rivers, near Portachuelo Alto and Bajo, Riberalta, and Nueva Villanueva (Departments of Beni, Pando, and La Paz, Bolivia). Also, there are some isolated groups near the Abuná River (a tributary of the Madeira River in Acre-Rondonia, Brazil), which are related to the Araona language (Takanan linguistic family). On the other hand, the Panoan-speaking communities are distributed in the Departments of Ucayali, Madre de Dios, Huánuco, and Loreto (Peru), the Departments of Pando and Beni (Bolivia), and the States of Acre, Rondonia, and Amazonia of Brazil (Dixon & Aikhenvald, 1999; Eriksen, 2011; Métraux, 1942). Takanan- and Panoan-speaking communities appear to have a deep demographic relationship. In this regard, the Panoan and Takanan linguistic families share phonological, morphological, and lexical characteristics (Adelaar & Muysken, 2004; Aikhenvald, 2012; Valenzuela & Guillaume, 2017), but controversy persists on whether the corresponding populations are genetically related, as there is a lack of comparative DNA data to corroborate or refute this hypothesis (Dixon & Aikhenvald, 1999).
Within this context, to investigate genetic relationships between Takanan- and Panoan-speaking communities, we compared data from uniparental DNA markers (Y-chromosome for the paternal line, mtDNA for the maternal line) with data from communities in Llanos de Mojos and other Amazonian populations, which were used as references. Following this framework, Wichi and Ayoreo communities from Bolivia were also included to understand their interrelationships or gene flow with lowland populations.
In addition, for Y-STR comparisons, Parecis-speakers (n = 10) from Brazil were included.
A map showing the approximate linguistic distribution of the Takanan communities of Peru and Bolivia and of the reference populations is presented in Figure 1.

Y-chromosome
A list of 17 Y-STR haplotypes from Native American sublineages in study populations is shown in Table S1a (including data from Jota et al., 2016). Also, geographical location and linguistic affiliation details are shown in Table S1b. Phylogenetic analyses showed two main subgroups of Takanan-speaking families, labeled T1 and T2, each comprising individuals from both Peru (Takanan_PE) and Bolivia (Takanan_BO) (Figures 2 and S1). From Figure 2, subgroup T2, which was part of the Takanan family of Peru and Bolivia (n = 16), included one Shawi from Northwest Amazonia Peru, one Chiquitano, which was connected to one Yine (Ucayali, Arawakan), and one Kichwa from the Loreto region (Jivaroan). This T2 subgroup was linked to Panoan-speaking individuals, along with some Arawaks and several Jivaros (e.g., only two mutation steps separated four Shipibo-Conibos and eight Takanas from Peru when only Peruvian populations were compared; Figure S2), suggesting a shared genetic affinity. We identified an individual who spoke the Canichana language who was related to the Panoan group. Furthermore, some Panoan-speaking individuals outside the main branch (Shipibo-Conibo, Amahuaca, Yaminahua, and Cashibo) were related to Arawakan individuals (only one mutation step; Figure S2). Also, two Cocamas and one Shipibo-Conibo shared a common haplotype, consistent with historical demographic interactions.
The larger subgroup of the two agglomerates of Takanas (T1, n = 47, which included one Reyesano) was related (only two mutation steps) to six individuals: four Arawakan-speaking individuals from Bolivia (Arawakan_BO) and two Jivaroan-speaking individuals from Peru (Figures 2 and S1). This group was related to one Cocama, two Arawaks from Bolivia, and 10 Arawak-speaking individuals from Brazil (Arawakan_BR, Parecis) (Figure 2).
In another branch of the phylogenetic tree, four Takanan-speaking individuals from Bolivia and one Tsimane shared a common haplotype (Figure S1). It should be noted that an individual from Takanan_BO was linked (by only two mutation steps) to a group of Arawaks from Peru (Arawakan_PE, especially Yanesha, n = 18). Likewise, among the Reyesano population, practically half were related to the Takanan (mainly to T2) and the other half was dispersed in the phylogenetic tree. This second group was connected to individuals from other populations, as two Reyesanos shared different haplotypes with two individuals speaking the Arawakan language of Bolivia and one with Canichana (Figure S1).
In other observations, despite a genetic affinity between most individuals with the Tupian language (including Guaraní, Guarayo "y," Siriono "s," and Tapiete "t"), other individuals were scattered across branches like the Chiquitano, such as Arawak language speakers (Figure S1). Also, genetic affinity was observed between the majority of Tsimane (n = 9) and Moseten-speaking (n = 22) individuals and one Yuracare-speaking individual. They were closely related to three Movima, one Canichana, and one Reyesano-speaking individual. Likewise, most Guarayos (n = 28; Figure S1) shared haplotypes with four Arawak speakers (Arawakan_BO), including one Canichana, two Itonama, and one Movima. They also showed affinities with five Guarayos, three Guaraní speakers, and one Chiquitano. Moreover, four Guaraní individuals were linked to a cluster of Panoan individuals (only one, two, or three mutation steps) (Figure 2).
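The "mutation steps" reported between haplotypes above can be illustrated with a minimal sketch. It assumes a simple stepwise model in which the distance between two Y-STR haplotypes is the sum of absolute repeat-count differences across shared loci; the source does not state how steps were counted, and the locus names and repeat values below are illustrative only, not data from this study.

```python
from typing import Dict

# A Y-STR haplotype as a mapping: locus name -> repeat count.
Haplotype = Dict[str, int]


def mutation_steps(a: Haplotype, b: Haplotype) -> int:
    """Sum of absolute repeat-count differences over loci typed in both haplotypes.

    Assumes a single-step (stepwise) mutation model; this is an illustrative
    assumption, not necessarily the counting rule used for the networks.
    """
    shared = a.keys() & b.keys()
    if not shared:
        raise ValueError("haplotypes share no loci")
    return sum(abs(a[locus] - b[locus]) for locus in shared)


# Hypothetical haplotypes (illustrative values only).
hap_x = {"DYS19": 13, "DYS390": 24, "DYS391": 10, "DYS392": 14, "DYS393": 13}
hap_y = {"DYS19": 13, "DYS390": 23, "DYS391": 10, "DYS392": 15, "DYS393": 13}

# The two haplotypes differ by one repeat at DYS390 and one at DYS392.
print(mutation_steps(hap_x, hap_y))
```

Under this assumption, two individuals described as sharing a haplotype would have a distance of zero.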
In terms of genetic variability at the population level (comparing 21 populations), AMOVA-based genetic distances displayed in a PCoA plot (Figure 3) showed that Takanan populations, from both Peru and Bolivia, were close to each other and also to Jivaroan-speaking populations of Peru and Arawakan-speaking populations of Bolivia (Arawakan_BO). Furthermore, with the exception of Jivaroan and Arawakan_PE, the genetic distance gradient between Peruvian populations corresponded with their respective geographical areas. For the Panoan population, the genetic diversity profile was closer to Bolivian than Peruvian populations. However, when observing spatial configurations between Peruvian populations, the Panoan were a little closer to Jivaroan than to Takanan populations, which was analogous to the analyses at the individual level (cluster T2). Also, populations distant from the central conglomerate had a very low haplotype diversity, such as Tsimane (h = 0.4189) and Yuracare (h = 0.6159) (Table S2).
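As a reading aid for the haplotype diversity values (h) quoted above, the following is a minimal sketch that assumes h is Nei's unbiased estimator, h = n/(n - 1) × (1 - Σ p_i²), where p_i is the frequency of haplotype i in a sample of n individuals. The source does not state which estimator or software was used, and the sample below is hypothetical.

```python
from collections import Counter
from typing import Hashable, Iterable


def haplotype_diversity(haplotypes: Iterable[Hashable]) -> float:
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum of squared frequencies)."""
    counts = Counter(haplotypes)
    n = sum(counts.values())
    if n < 2:
        raise ValueError("need at least two individuals")
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


# Hypothetical sample: 8 individuals carrying 3 distinct haplotypes.
sample = ["H1"] * 5 + ["H2"] * 2 + ["H3"]
print(round(haplotype_diversity(sample), 4))
```

A sample dominated by one haplotype gives a value near 0, whereas a sample in which every individual carries a different haplotype gives 1.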

mtDNA
Genetic analyses were performed using only Native American A2, B2, C1/C*, and D1/D4h3a mtDNA lineages. A list of mtDNA control region haplotypes with variants scored with respect to rCRS is shown in Table S3a, and haplogroup frequency data are shown in Table S3b. From the maternal line, phylogenetic relationships showed that several individuals from different linguistic families shared haplotypes, indicating common ancestries. For example, for the A2 lineage (Figure 4), 14 Takanan-speaking individuals from Peru shared a haplotype with two Takanan from Bolivia and two Arawakan from Peru, which in turn were related to some Panoan, one Cocama, and several Yuracare, Jivaroan, and Chiquitano speakers, including one Cayubaba and one Takanan speaker from Bolivia (n = 14, core central group sharing the same haplotype).
In C1/C* lineages (Figure 6), the Takanan of Peru and Bolivia shared some haplotypes, which through a Jivaro (only one mutation step) are related to the core central group (n = 25, Jivaroan, Shawi, Arawakan_PE, Takanan_BO, Reyesano, Moseten, Canichana, Arawakan_BO, and Yuracare) that share a haplotype. Also, other distinct haplotypes were shared between individuals with different languages. In particular, one Panoan and one Arawakan_PE shared a haplotype.
Finally, for D1/D4 lineages (Figure 7), the Takanan and Panoan did not share haplotypes with other individuals, unlike the core central group, where a haplotype was shared by other linguistic groups (n = 11, Arawakan, Cayubaba, and Tupian). However, at the population level (comparing 21 populations), AMOVA-based genetic distances displayed in a PCoA plot (Figure 8) showed that the Takanan of Peru (Takanan_PE) were differentiated from other populations (e.g., Wichi and Ayoreo). In contrast, the Panoan population was closer to Shawi. Furthermore, the Arawakan population of Peru was more closely linked to Jivaroan and Cocama than to other populations. Finally, the isolated Ayoreo population revealed a very low haplotype diversity (h = 0.2857), followed by Tsimane (h = 0.5824) and Moseten (h = 0.6444) (Table S4).
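The shared-haplotype observations reported for the maternal lineages amount to grouping individuals by haplotype and listing haplotypes carried by more than one linguistic group. The sketch below illustrates that bookkeeping only; the haplotype labels (hapA, hapB, hapC) and the sample list are hypothetical placeholders, not entries from Table S3a, and the function name is an assumption introduced for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def shared_haplotypes(samples: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Map each haplotype found in more than one group to the set of those groups.

    Each sample is a (linguistic_group, haplotype_label) pair.
    """
    groups_by_hap: Dict[str, Set[str]] = defaultdict(set)
    for group, haplotype in samples:
        groups_by_hap[haplotype].add(group)
    return {hap: groups for hap, groups in groups_by_hap.items() if len(groups) > 1}


# Hypothetical samples (placeholder haplotype labels, not real control-region motifs).
samples = [
    ("Takanan_PE", "hapA"),
    ("Takanan_BO", "hapA"),
    ("Arawakan_PE", "hapA"),
    ("Panoan", "hapB"),
    ("Arawakan_PE", "hapB"),
    ("Takanan_PE", "hapC"),
]

for hap, groups in sorted(shared_haplotypes(samples).items()):
    print(hap, sorted(groups))
```

In practice, each label would stand for a full control-region motif scored against rCRS, so that only identical motifs count as shared.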

DISCUSSION
Based on linguistic features, the Panoan- and Takanan-speaking peoples should be closely related, but until now this hypothesis was controversial due to a lack of comparative DNA studies confirming or refuting the notion (Dixon & Aikhenvald, 1999). Moreover, a more general hypothesis of the expansion of the Arawak language in South America approximately 2.4 thousand years ago (kya) remains unclear (Eriksen, 2011). Thus, in this study, we investigated these issues and included several tropical lowland populations from Peru and Bolivia and an Arawakan community from Brazil as references, as well as the Wichi and Ayoreo communities from Bolivia in the general framework.
Our results indicated that the Y chromosomes from a Takanan-speaking subgroup (T2 in Figure 2) were linked to Panoan-speaking communities. However, the Panoan and most individuals of the Takanan T1 subgroup were more closely related to Jivaroan- and Arawakan-speaking individuals. Moreover, the observation that one Cocama individual was close to T1, and that two Cocama and one Shipibo-Conibo-speaking individuals shared the same haplotype (Figure S2), indicated a close genetic affinity. This may have been due to traces of gene flow remaining from ancient pre-Columbian times, or to the more recent Cocama-Shipibo confederation against the priest missions, particularly in coalition with the Arawaks and Quechuas in the rebellion led by Juan Santos Atahualpa in 1742 in the central Amazonia of Peru (Adelaar & Muysken, 2004; Aikhenvald, 2012; Ludescher, 2001). These demographic interactions could partially explain the "lexical loans" in Panoan and Ashaninka, including Quechua, an imposed lingua franca in Amazonia (Fleck, 2013; Métraux, 1942). This also putatively explains the genetic affinities between three Panoan-speaking individuals and several Arawakan-speaking individuals from Peru.
Our previous studies on Amazonian populations in Peru also showed that one Shipibo-speaking individual not only shared a haplotype with the Cocamas, but also with two individuals from the Loreto region and one Kechwa-Lamista (an admixed Amazonian group) individual from the San Martín region. Remarkably, this observation showed that Takanan (from Peru-Bolivia) and Panoan (from Peru) speakers were closely related to Arawakan and Jivaroan speakers, including Cocama (Tupian).
Similarly, our results were consistent with other studies at the genomic scale (via SNP chip analysis), in which Panoan-speaking communities (Shipibo-Conibo, Cashibo, Matsés, Nahua) shared similar genomic profiles with Arawakan-speaking populations (Ashaninka, Yanesha), including Jivaroan groups (Huambisa, Awajun, Candoshi), Kechwa-Lamista, Piapoco (Arawakan from Colombia), Guarani, and others (Barbieri et al., 2019; Borda et al., 2020; Gnecchi-Ruscone et al., 2019; Nieves-Colón et al., 2020). The wide spatial distribution of this gene pool coincides with a hypothesis on the expansion of the Arawak language in South America, likely via the Madeira and Amazon rivers, which left archaeological vestiges dated to approximately 2.4 kya in areas of Llanos de Mojos (Bolivia) and 2.2 kya along the Ucayali river, Peru (Eriksen, 2011). This population expansion could be correlated with the dispersion of some domesticated plants in the last 6-4 kya (Clement et al., 2015). During this time, the following generations likely differentiated from their main linguistic branch in a similar way to the expansion history of the Macro-Je or Tupi-Guaraní (Tupian) linguistic groups in the central and southeastern parts of the Amazonia of Brazil (Aikhenvald, 2012; Eriksen, 2011; Ramallo et al., 2013).
Our interpretations are supported by the genetic affinity between ten Pareci from Brazil (Arawakan-speaking tribe) and several Arawakan-speaking individuals from Bolivia, who are related to the largest group of Takanas (T1). This observation agreed with linguistics and indicated that Saraveka (extinct Arawakan language from Bolivia) was closely related to the Pareci language (Adelaar & Muysken, 2004;Métraux, 1942).
In terms of the maternal line, our data suggested that high gene flow occurred in the Bolivian and Peruvian Amazonia region. For example, some motifs in the mitochondrial control region corresponding to the B2 lineage (73-263-499-16189-16217-16519) are widely shared between many Amazonian groups in South America, regardless of their linguistic affiliation, including Andean groups (Mazières et al., 2008; Ramallo et al., 2013; Sandoval et al., 2018). In addition, although some Takanas are differentiated from other individuals, one shared a haplotype with some Arawaks (from Peru and Bolivia). For the A2 lineage, a shared haplotype existed between one Arawakan (Arawakan_PE) and some Takanan, and was related to three Panoan individuals, one Cocama, and one Jivaro, suggesting a shared genetic affinity. This observation was consistent with a previous study showing two different shared haplotypes between Arawakan and Panoan groups (Di Corcia et al., 2017). Likewise, the core central group, which is shared between a Takanan (Takanan_BO), Yuracare, Chiquitano, Jivaroan, and Cayubaba, reflects a common origin. These individuals and the Takanan are linked through one Panoan-speaking individual who is closely related to them. Another similar case was identified in the C1 lineage, where several individuals from eight linguistic groups (core central group composed of Takanan, Arawakan, Shawi, Jivaroan, Yuracare, Canichana, Moseten, and Reyesano) shared the same haplotype, indicating they were derived from common ancestors. Also, a similar picture of shared haplotypes by different linguistic groups in some Beni region populations was previously reported (Bert et al., 2004).
Among other studied populations, our results showed that the maternal haplotype diversity of the Ayoreo-speaking community has been dramatically impacted by genetic drift (h = 0.29), a picture that coincides with previous genetic studies of Gran Chaco populations (Demarchi & García-Ministro, 2008). In the paternal line, like the Tsimane (h = 0.42), the Siriono showed the lowest haplotype diversity (h = 0.41; n = 21, only three different haplotypes, data not shown in Table S2), followed by the Guarayo (h = 0.43, data not shown in Table S2). These groups were separated from other populations (data not shown in the PCoA plot), which was consistent with the picture described at the individual level and with historical records (Lehm et al., 2004; Métraux, 1942). The isolated populations were likely impacted by lower gene flow or bottlenecks. Meanwhile, at the individual level, one Guaraní speaker shared a haplotype with one Wichi. Regarding this, an Argentinian Wichi community also shared haplotypes with other Matacoan-speaking groups such as Toba and Pilagá in the Gran Chaco region (Sala et al., 2019). Thus, both patrilineal and matrilineal analyses indicated close interrelationships and acculturation occurring in the lowlands of Bolivia, southern Brazil, western Paraguay, and northern Argentina, in congruence with ethnographical, linguistic, and genetic studies (Bert et al., 2004; Block, 1994; Brown et al., 1974; Demarchi & García-Ministro, 2008; Eriksen, 2011; Métraux, 1942; Ramallo et al., 2013).
Although some populations were isolated due to historical or socio-political reasons, for example, the distant position of the Tsimane and Yuracare Bolivian communities in two-dimensional PCoA spaces (paternal and maternal lines), we also showed that in the Peruvian and Bolivian Amazonia, there was intense inter-population migration in line with historical records.
Finally, we suggest that the Takanan, like Panoan, Cocama, and Jivaroan speakers, were derived from ancient Arawak groups that were spread through the western Amazonia of South America, providing a plausible explanation that overlaps with archaeological and linguistic arguments.

ACKNOWLEDGMENTS
We thank all volunteers who donated biological samples to the previous South American Genographic Projects. We also thank Oscar Acosta, Pedro Paulo R. Vieira, David Soria H., Daniela Arteaga, Alejandro Estrada, and Donaldo Pinedo for their support in fieldwork. This study received funds from the National Geographic Society (The Genographic Project-South America) and grants from USMP (E10012019010) of Peru, FAPEMIG and CNPq of Brazil.

CONFLICT OF INTEREST STATEMENT
The authors declare no conflict of interest.

DATA AVAILABILITY STATEMENT
The Y-STRs and mtDNA data are available in the Supplementary Material files: Figures S1 and S2 and Tables S1-S4.