Adaptation to milking agropastoralism in Chilean goat herders and nutritional benefit of lactase persistence

Abstract The genetic trait of lactase persistence (LP) evolved as an adaptation to milking pastoralism in the Old World and is a well‐known example of positive natural selection in humans. However, the specific mechanisms conferring this selective advantage are unknown. To understand the relationship between milk drinking, LP, growth, reproduction, and survival, communities of the Coquimbo Region in Chile, with recent adoption of milking agropastoralism, were used as a model population. DNA samples and data on stature, reproduction, and diet were collected from 451 participants. Lactose tolerance tests were done on 41 of them. The European −13,910*T (rs4988235) was the only LP causative variant found, showing strong association (99.6%) with LP phenotype. Models of associations of inferred LP status and milk consumption, with fertility, mortality, height, and weight were adjusted with measures of ancestry and relatedness to control for population structure. Although we found no statistically significant effect of LP on fertility, a significant effect (P = 0.002) was observed of LP on body mass index (BMI) in males and of BMI on fertility (P = 0.003). These results fail to support a causal relationship between LP and fertility yet suggest the idea of a nutritional advantage of LP. Furthermore, the proportion of European ancestry around the genetic region of −13,910*T is significantly higher (P = 0.008) than the proportion of European ancestry genome‐wide, providing evidence of recent positive selection since European–Amerindian admixture. This signature was absent in nonpastoralist Latin American populations, supporting the hypothesis of specific adaptation to milking agropastoralism in the Coquimbo communities.


Section 1: Communities of admixed goat herders in semiarid Chile
The agro-pastoralist groups under study are goat herders in a semi-arid region of Chile, who were selected as a model population because of their pastoralist livelihood, their high dependency on livestock and milk, and their mixed European/Amerindian ancestry and thus variable lactase persistence status. These groups have developed a set of practices over the last 400 years or so, which have been described by social scientists as a 'notable adaptation to their ecological conditions', such as their system of land management, collective property, and territorial organisation, their reliance on multiple sources of income which includes transhumant pastoralism (Gallardo 2002;Alexander 2008). The groups, locally known as "Agricultural Communities" are 180 scattered communities consisting altogether of around 30,000 people. They collectively own approximately 1,000,000 hectares of land with poor irrigation, little arable land and small carrying capacity. The region where these populations are settled is a transitional zone between the Atacama Desert, one of the driest places on Earth, and the Chilean central valleys. The zone has an average annual rainfall of 100-200 mm and constant threat of droughts. The region of Coquimbo, where most of these communities are settled, is a narrow area between the Pacific Ocean and the Andes Mountain Range, covering around 40,000 km 2 extending from 29°S to 32°S around meridian 71°W. Terrain is defined by a pronounced slope of incremental altitude towards the Andes (see Figure S1). This area has west-east oriented mountain ridges below 3,000 metres, and is cut by the three main rivers, named, from north to south: Elqui, Limarí and Choapa which lend their names to the surrounding valleys.

History
From 1470 until the arrival of Europeans in 1537, local native groups of Llama herders in the Coquimbo Region were invaded, and eventually conquered, by the Inca Empire. The native population was likely to have been diminished by this invasion, but there were still native settlements by the time of the first contacts with Spanish groups. Shortly after Spanish arrival the pandemic spread of diseases among Native Americans led to further decline. As in most of Latin America, European migrants were almost all male, so a fast process of admixing started.
The main activity in the southern part of the Spanish Empire was mining and included he 'Norte Chico' for which the region of Coquimbo became the main source of food and fuel for the people working in the mines. It was during this period that livestock and cattle were introduced (Gallardo, 2002). All this led to further land deterioration due to intensive farming, overgrazing, and logging.
There are several competing historical hypotheses to explain the origin of the system of communal property in the area (summarised by Gallardo 2002), as opposed to the development of the large Haciendas and Ranchos developed elsewhere in Spanish America, but it is generally agreed that by the end of the 18th century land deterioration made the system of large-scale production counterproductive. Since then, agriculture gradually lost its economic importance in favour of livestock rearing; and the colonial system of land tenure, economy and subsistence changed to the organisation distinctive of the Agricultural Communities today.

Population size
The population sizes for each collection site as obtained from a special bulletin of the Chilean National Institute of Statistics (INE) devoted to the Agricultural Communities of the Coquimbo region (Vergara et al., 2005) is shown in Table S1. These data are based on the Chilean National Census of 2002, which although outdated, is the most recent reliable data source available ** . This transhumant pastoralism has now become somewhat more restricted, the distance travelled being less. This is partly because selling and production of cheese is becoming more difficult because of hygiene regulations (Alexander, 2008) and partly because school is compulsory. Today most of the milk consumed is industrially produced cow's milk bought in stores, and milk consumption is similar to that of other Chilean rural populations (see Fernández et al. 2015). Nonetheless, at least until very recently, milk and dairy products were the dominant part of the daily intake of food in these groups for around five months every year, a diet that is somewhat surprising in view of the historic lactase non-persistence in Amerindians.

Section 2: Recruitment and Demographic Profile
The study volunteers were recruited from the Agricultural Communities of the Chilean region of Coquimbo in South America, a set of populations featuring milking pastoralism of recent adoption.

Data collection
451 adult volunteers were recruited in 9 villages and hamlets from the Coquimbo region (Table S1).
They were invited to participate after being fully informed about the project and the security of their personal data, and written consent was obtained for each of the participants who agreed to have their data analysed as part of this study. A demographic profile based on our sample of 451 participants can be found in Table S2.
As is normally the case in anthropological studies in isolated areas, villages were selected strategically according to population size and accessibility, and participants were all volunteers in compliance with ethical standards. Therefore, biases from sampling relatives and self-reported lactose intolerant individuals could not be avoided at this stage, but genetic methods, described in the main text, were adopted to account for their effects.
Local people from each community were approached during the first days of fieldwork before any recruitment, in order to let them know about our work there. Throughout this phase informal interviews were conducted to collect and record general information about the area, trends in migration and demography, dietary habits, goat rearing practices, milking, and milk processing.
Afterwards suitable prospective participants were approached, the details of our project were explained, and information sheets were provided. Suitable meeting space in communal venues was arranged in each village to interview those who agreed to participate.
For strategic reasons (time and complexity of the tests) all the LTT phenotype data collection (n=41) was done in the same village (Barraza) selecting non-smoker volunteers who were not currently undergoing treatment with antibiotics and prepared to undergo an overnight fast.
Questionnaires administered by the first author and trained interviewers were used to collect information as to the length of residency, residency at birth, place of origin of parents and grandparents, details of children, their birth-places, and date of death when applicable. Ownership of assets, livestock, communal rights, access to services and milk product consumption were also recorded. Participants reported whether they or their household have access to certain goods and services (such as tap water, electricity, etc. See Table S2). These data were used to measure wealth by processing all assets using a multiple correspondence analysis, to get a factorial weight based on the contribution to the inertia at the first principal axis of each item (Asselin and Anh 2008).
Interviews were carried out at local communal facilities (e.g. sports clubs, community associations, etc.), or at the house of the interviewed volunteer, according to their preference.
Quantities and frequencies of glasses of milk consumed were used to estimate milk consumption as numbers of glasses per day. The paper records were entered into a database using EpiData Entry (Lauritsen and Bruus 2008)

Lactose Tolerance Testing
Breath hydrogen levels were measured using a breath hydrogen monitor (MicroH2 Micromedical Ltd.). After an overnight fast and measuring baseline levels, subjects were given a load of 50 g of lactose in 250 ml of water. Afterwards, breath hydrogen levels were recorded at intervals of 30 minutes. A given test was stopped after two sustained increments of 20 ppm above baseline, when lactase persistence status was assessed as a lactose non-digester. Individuals showing no substantial rise in breath hydrogen after 3 hours were classified as lactose digesters. The phenotypes of subjects showing fluctuating levels of breath hydrogen were classified as indeterminate and those whole failed to produce breath hydrogen throughout the test as hydrogen non-producers (i.e. do not have appropriate hydrogen-producing colonic bacteria).

Section 3: Genetic markers used in this study
Samples of buccal cells were collected from cotton swabs to perform DNA extractions by modified versions of methods based on phenol/chloroform (Freeman et al. 2003) and salting out precipitation (Quinque et al. 2006). A segment of 706 bp of the LCT enhancer region (MCM6, intron 13) was amplified by PCR using primers MCM6i13 and MCM6778 described by Ingram et al. (2007).
Samples were sequenced in both directions on an ABI 3730xl DNA Analyzer (Applied Biosystems) by the UCL Centre for Comparative Genomics, using the Sanger Method.  (Table   S3.1) included 2 SNPs (rs3754689 and rs2278544) useful for identification of the core haplotypes described by Hollox et al. (2001) and rs182549 located at −22kb known to be in very high LD with −13,910C>T. Both sets of SNPs (30 AIMs and the 27 SNPs on chromosome 2) were typed by LGC Genomics (Hoddesdon, UK) using KASP chemistry. of rs4988235 (−13,910C>T, marked with an asterisk). The most frequent haplotype occurs 10 times.
The 34 most frequent haplotypes are the same as the most frequent 'A' core haplotype detected in Europeans with the same markers. Note that the STR markers do not reflect the same clustering as the AIMs and are clearly not detecting continental ancestry, but more likely more recent geographic differences in ancestry, which are likely to be the product of increased relatedness within villages.  Table S1.) have higher BMI than males in all but two villages. Note that the villages Gualliguaica and La Polvada have very small sample sizes (See Table S1.)

Haplotype-based local ancestry analysis
In an attempt to further validate this analysis using whole-genome high-density genotype data from the 1000 Genomes database (The 1000 Genomes Project Consortium 2012), the admixture-based analysis above was replicated with haplotype-based local ancestry estimates using high-density genotype data. As the high-density SNP panel is able to resolve all the three continental ancestries (Maples et al. 2013;Rishishwar et al. 2015), this analysis could be performed on all three mainland For whole-genome ancestry estimation, the merged dataset was LD-pruned using PLINKv1.9 (resulting in 150,858 SNPs being retained), and supervised Admixture (k = 3) was performed.
For local ancestry estimation, haplotype-based method RFMix (Maples et al. 2013) was used to provide higher accuracy in ancestry assignments using LD, and also meaning that more SNPs can be used towards the inference, as LD-based pruning is not required.
The genotype data (of both admixed and reference individuals together) was first phased using the software SHAPEIT2 (Delaneau et al. 2012) with default parameters. Local ancestry assignments were performed using the software RFMix to infer local continental ancestry in the subset of phased admixed individuals.
As reference continental panels, we used 175 Amerindians (from Chacon-Duque et al. 2018), 107 IBS (Iberian populations in Spain) and 101 YRI (Yoruba in Ibadan, Nigeria) individuals from The 1000 Genomes Project. We ran RFMix with the phase correction feature enabled and performed two rounds of the EM algorithm. We used the default settings except the number of reference haplotype per tree node, which was set to 5. This was done to take into account unbalanced reference panel sizes in the random forest algorithm, as recommended by Maples et al. 2013. RFMix assigns local continental ancestry to each allele of each admixed haplotype, allowing for errors in genotyping, slight admixture in the reference samples, etc. For each locus across both haplotypes the posterior probabilities were converted to the most likely ancestry and aggregated to estimate the proportion European ancestry for that person.
Average local European ancestry (obtained via RFMix) of the Lactase region under study (1Mb either side of rs4988235) was compared to the average genome-wide European ancestry (obtained via Admixture) using the same one-sided Wilcoxon signed rank test. All p-values were again nonsignificant: 0.4583 for MXL, 0.3032 for PEL, and 0.4467 for CLM.

Power calculations
In order to determine whether the Admixture Method to determine local ancestry had sufficient Power to detect a 3% difference in local ancestry (delta ancestry) in the control populations, simulations were done in which the sample numbers and ancestry proportions and distributions mirrored those groups. Power was estimated to be 0.2683 for MXL and 0.8034 for PEL.
Using the haplotype based method RFMix to determine local ancestry, simulations show that an increase of 3% European ancestry locally would be detected with power 0.3836 for MXL, 0.4750 for PEL and 0.6562 for CLM respectively. The variations in power depend on the varying sample sizes of the three groups.