America's red gold: multiple lineages of cultivated cochineal in Mexico

Cultivated cochineal (Dactylopius coccus) produces carminic acid, a valuable red dye used to color textiles, cosmetics, and food. Extant native D. coccus is largely restricted to two populations in the Mexican and the Andean highlands, although the insect's ultimate center of domestication remains unclear. Moreover, due to Mexican D. coccus cultivation's near demise during the 19th century, the genetic diversity of current cochineal stock is unknown. Through genomic sequencing, we identified two divergent D. coccus populations in highland Mexico: one unique to Mexico and another that was more closely related to extant Andean cochineal. Relic diversity is preserved in the crops of small-scale Mexican cochineal farmers. Conversely, larger-scale commercial producers are cultivating the Andean-like cochineal, which may reflect clandestine 20th century importation.


Introduction
Domesticated cochineal (Dactylopius coccus) is a New-World scale insect cultivated for carminic acid, a potent scarlet dye used to color textiles, cosmetics, and food (Ch avez- Moreno et al. 2009). With the use of mordants and adjuncts, carminic acid dyes produce colors ranging from pinks to deep purples and black (Phipps 2010). From the conquest of the Aztec Empire by the Spanish until the advent of laboratory-synthesized colorants in the 19th century, cochineal dye was the preeminent source of scarlet coloring. Cochineal was one of the primary exports from New Spain (after gold and silver) and played a critical role in the highland Mexican economy, where commercial production was centered (Ch avez- Moreno et al. 2009). Cochineal dye's monetary value was so high that its production was a Spanish state secret and pre-Columbian codices describing its use were destroyed to prevent piracy. After the development of artificial red dyes, cochineal production nearly disappeared, including from highland Mexico. Since the 1970s, cochineal production has started to resurge due to the discovery of carcinogenic and hazardous properties of many synthesized dyes (Ch avez- Moreno et al. 2009).
Cochineal insects (Dactylopius spp.) are endemic American phytophagous scale insects of the monogeneric family Dactylopiidae. Ten species are currently recognized (Van Dam and May 2012), although highly divergent biotypes within individual species have been identified, suggesting possible cryptic speciation (Mathenge et al. 2009). Four wild species are endemic to north and central Mexico (D. confusus, D. gracilipilus, D. opuntiae, and D. tomentosus), while an additional five wild species (D. austrinus, D. ceylonicus, D. confertus, D. salmianus, and D. zimmermanni) are endemic to South America (Rodr ıguez et al. 2001;Ch avez-Moreno et al. 2009;Van Dam and May 2012). As an antimicrobial and antipredatory defense mechanism, all cochineal insects (both wild and cultivated species) synthesize the anthraquinone carminic acid. Of the Dactylopius species, domesticated D. coccus produces the most carminic acid (~20% of dry body weight) (Wouters and Verhecken 1989;Ch avez-Moreno et al. 2009). Additionally, D. coccus lacks the protective waxy coating that the wild forms possess, making it more susceptible to both weather fluctuations and predation (Ch avez- Moreno et al. 2009 D. coccus's dispersed geographical pattern raises the question of whether the current day distribution is natural or the result of deliberate introduction of the insects in prehistory. The earliest known cochineal-dyed textiles were discovered in Paracas, Peru (10th to 12th century AD), but the first evidence of cochineal farming was found in Mexican Toltec (10th century AD) sites (Rodr ıguez et al. 2001;Ch avez-Moreno et al. 2009 (2005) argued that the presence of eight species that prey on domesticated cochineal in Mexico, as opposed to only one extant species in the Andes, indicates a Mexican origin. Genetic evidence is lacking: before this project, only 58 short DNA sequences (<800 bp each) were available for the entire Dactylopius genus.
Although genetic analyses could clarify the history of domesticated cochineal, they require phylogenetically informative variation to exist in extant populations. Whether extant Mexican cochineal exhibits such variation is unclear. While Oaxaca, Mexico was once the center of cochineal production, the Oaxacan cochineal industry nearly disappeared during the 19th century (Ch avez- Moreno et al. 2009). Cochineal crops were deliberately destroyed during the Mexican War of Independence. The industry never recovered due to the competition from foreign production and the development of synthetic dyes. This bottleneck may have greatly reduced the level of diversity. Furthermore, Mexican populations may have become introgressed with Peruvian stocks during the 20th century (Ch avez- Moreno et al. 2009). After the destruction of the Oaxacan cochineal industry, the center of production shifted to Peru. As the majority of Mexican D. coccus crops had gone extinct, some Mexican farmers may have been forced to obtain Peruvian stocks to start production. Trade of D. coccus stocks with the Canary Islands has also been noted in Mexico (Ch avez- Moreno et al. 2009), although this is less likely to obscure phylogeographic information since the Canary Island population was introduced from Mexico around 1825 A.D. (Piña Luj an 1980). Here, we assess the level of extant diversity of Mexican D. coccus through analysis of mitochondrial genetic markers and de novo whole-genomic sequencing.

Cochineal sample collection
Grana (dried female cochineal used for dye production) and fresh D. coccus females were obtained from smallscale farmers and large-scale commercial vendors in Mexico, Chile, and Peru (Table 1). As large-scale commercial vendors may conglomerate crops from different farmers in each year, we tested multiple crop years from several producers (Table 1). We also obtained historic grana of unknown provenance from the Peabody Museum of Archaeology and Ethnology (Harvard University) to evaluate whether extinct diversity might be preserved in historic specimens. Additionally, we collected wild female cochineal (Dactylopius spp.) by hand in Oaxaca, Mexico, for comparison with the cultivated species.

Mitochondrial marker analyses
DNA was extracted from 166 single insects using the PowerSoil kit (MO BIO, Carlsbad, California, USA) and the QIAamp â DNA Mini Kit (Qiagen, Valencia, California, USA) according to manufacturer's instructions. The dataset included 40 insects cultivated by small-scale Oaxacan farmers, 75 from large-scale commercial producers (15 Mexican, 40 Peruvian, and 20 Chilean), 10 historic grana samples without provenance, and 41 wild Dactylopius from Oaxaca (Table 1). The mitochondrial cytochrome c oxidase I (cox1) and 12S rRNA genes were amplified by the polymerase chain reaction and dideoxy-terminator sequenced (Appendix). The 12S rRNA experiments were omitted for most individuals as we found only three single-nucleotide polymorphisms in an initial subset of 30 individuals (10 Mexican and 20 Peruvian grana from commercial vendors), and the results were in agreement with the more informative cox1 results (Table 1 obtained from GenBank. While this sample is not representative of the entire Dactylopius genus, it includes all publicly available data for these genes. Table 1. Single-insect samples collected and analyzed for mitochondrial markers. The geographic and/or commercial source of the material as well as year of collection is given for each sample. Also noted is whether the sample was obtained from a small-scale cochineal farmer ("Smallscale"), a large-scale commercial vendor ("Commercial"), or wild-caught ("Wild"). "Sample Type" states whether the sample was derived from grana or fresh insects. The total sample size and the number of sequenced cytochrome c oxidase I (cox1) and 12S rRNA mitochondrial genes are also given.

Sample
Source

Identification and phylogenetic analysis of genomic sequence variants
A draft Dactylopius coccus genome assembly was constructed using JR-Assembler 1.02 (Chu et al. 2013; Table 2). The final assembly was 18.6 Mbp long with an N50 of 378,999 bp ( Table 2). The quality-controlled merged sequence reads were aligned against the D. coccus assembly using BWA 0.7.5 Durbin 2009, 2010)   ).

Selection on the Dactylopius coccus genome
To determine whether the cochineal genome was undergoing detectable natural or artificial selection, we predicted genic sequences using GeneMark-ES 2.3c (Borodovsky and Lomsadze 2011). The ratio of nonsynonymous to synonymous (N/S) SNPs was calculated using SnpEff 3.6a (Cingolani et al. 2012). Tajima's D was calculated using 500-bp windows with VCFtools 1.0.9 (Danecek et al. 2011).

Mitochondrial DNA analyses
The grana accessions' DNA preservation varied, probably due to different procedures used for preparation (e.g., boiling and air drying). We were unable to obtain sequences for all individuals due to the variation in DNA preservation. We obtained 68 cox1 ( Figure 2. Condensed maximum-likelihood trees of Dactylopius coccus cytochrome c oxidase I (cox1) and 12S rRNA mitochondrial genes. Topology robustness was tested with 100 bootstrap replicates.  Fig. 2). We sequenced 25 wild Oaxacan cochineal cox1 genes. All the Oaxacan wild cochineal we collected clustered with Dactylopius opuntiae (Fig. 2). We observed nine credible substitutions in 1003 bp of D. coccus mitochondrial DNA (0.90% divergence): six substitutions in 559 bp of cox1 sequence (1.1% divergence) and three substitutions in 454 bp of 12S rRNA (0.66% divergence). We identified three cox1 and two 12S rRNA D. coccus haplotypes (Fig. 2). Peruvian commercial cochineal cox1 sequences differed by one substitution from the Oaxacan small-scale farm insect specimens. A third divergent cox1 haplotype (an additional five substitutions) was found in Mexican commercial samples. The Chilean sample clustered with the Peruvian commercial grana. The 12S rRNA tree resolved the same two major clades (Peruvian commercial/Oaxacan small-scale farm insects versus Mexican commercial cochineal).

Genomic SNP phylogenetic analyses
A total of 11,517 genomic variants (including 10,598 polymorphic single-nucleotide polymorphisms [SNPs]) were identified in the three D. coccus pools. To account for sequencing errors, collapsed repetitive regions and apparent variants deriving from D. coccus-like environmental contaminants, we refined the SNP dataset by requiring that each site be sequenced a minimum depth of 59 per pool (159 total depth) and a maximum of 1009 per pool (3009 total depth). The refined SNP dataset included 82 high-confidence polymorphic SNPs (1359 mean total sequencing depth). Both the raw and filtered SNP datasets were analyzed by principal component (PCA) and identity-by-state relatedness analyses using SNPRelate 0.9.12 (Zheng et al. 2012 ; Fig. 3). While SNP-Relate was designed to analyze individuals, no similar software is yet available for Pool-Seq data. To corroborate the SNPRelate results, we calculated genomic differentiation (mean F ST ) of the informative sites using PoPoola-tion2 1.201 (Kofler et al. 2011) using the same SNP filtering criteria as in the SNPRelate analyses. Additionally, SNP-sharing analysis was performed on the raw SNP dataset using VCFtools 1.0.9 (Danecek et al. 2011).
All genomic SNP analyses had congruent results ( Fig. 3; Table 3). The first principal component separated the Oaxacan small-scale farm sample from the Mexican and Peruvian commercial vendor specimens. Similarly, in the identity-by-state relatedness analyses, the Mexican and Peruvian commercial samples form a clade, with the Oaxacan small-scale farm sample being more distantly

Dendrogram of All SNPs
Oaxacan small−scale Mexican commercial Peruvian commercial 0.00 0.10 0.20 0.30

Dendrogram of Filtered SNPs
Oaxacan small−scale Mexican commercial Peruvian commercial Figure 3. Relatedness between Oaxacan small-scale farm, Mexican commercial, and Peruvian commercial cochineal bulk samples. Principle component analysis (top row) separates the Oaxacan small-scale farm insects from the commercial specimens, with the first principle component explaining the majority of the variation (59.9% and 70.8% in the unfiltered and filtered SNP datasets, respectively). Identity-by-state analysis (bottom row) of these SNP datasets produces dendrograms with congruent topology. related (Fig. 3). Genomic differentiation analysis also separated the Oaxacan small-scale farm sample from the two commercial samples (Table 3). Additionally, the commercial samples from Mexico and Peru share more SNPs with each other than either do with the Oaxacan small-scale farm sample (Fig. 4). These results indicate that the Mexican and Peruvian commercial samples are more closely related to each other than they are to Oaxacan small-scale farm cochineal. Notably, both the genomic differentiation and the SNP-sharing analyses show that the Oaxacan small-scale farm sample is slightly closer related to the Mexican commercial cochineal than to the Peruvian cochineal (Table 3; Fig. 4). Unfortunately, we are unable to ascertain precise ages of these genomic clades as we have no paleontological calibration point and the most closely related sequenced genome, the pea aphid (Acyrthosiphon pisum), is too divergent to align against the D. coccus draft genome sequence (International Aphid Genomics Consortium 2010).

Discussion
We find no effect on the mitochondrial DNA diversity that can be attributed solely to human management. Nevertheless, the cox1 and 12S rRNA mitochondrial diversity is limited (three and two haplotypes, respectively) with one Mexican haplotype diverging from the other two, suggesting some form of bottleneck in the past. Nonfunctionally constrained mitochondrial markers (such as the control region) may be more variable. While it is tempting to attribute the observed bottleneck to human management, a more likely explanation is cytoplasmic incompatibility due to Wolbachia infection, a process that can produce false phylogeographic signal in arthropod phylogenetic trees (Hurst and Jiggins 2005). Dactylopius host numerous endosymbionts (Ram ırez-Puebla et al. 2010), including the Alphaproteobacterium Wolbachia (Pankewitz et al. 2007). We detected Wolbachia sequences in both the single-marker and genomic analyses (Appendix). Furthermore, we found only one mitochondrial haplotype in the wild Oaxacan cochineal (D. opuntiae), suggesting that limited mitochondrial diversity is common across Dactylopius species. Similarly, we found no conclusive evidence that the cochineal genome is under strong natural or artificial selection. Nevertheless, we observed only one D. coccus genomic sequence variant every~1600 nucleotides, which suggests a relatively slow mutation rate for insects (for comparison, Drosophila simulans has a SNP every~40 bases) (Begun et al. 2007;Hu et al. 2013). Further research is required to determine whether the slow mutation rate reflects selection.
The genomic phylogeny suggests that extant Mexican D. coccus derive from at least two source populations.  One of these populations appears to be Mexican in origin, while the other is more closely related to Peruvian cochineal. Moreover, the distinctiveness between the "Mexican" and "Peruvian" clades suggests long-term isolation between the populations, which does not support the hypothesis of continuous and extensive trading of cochineal stocks during the pre-Columbian era as has been proposed previously (Ch avez- Moreno et al. 2009). This observation supports contentions by local Mexican cochineal farmers that Peruvian stock may have been recently imported into Oaxaca with the renewed interest in cochineal production. However, our genomic differentiation and SNP-sharing results suggest that Mexican commercial cochineal may also have some local Mexican ancestry, even if it primarily derives from recently imported Peruvian stock. Notably, the mitochondrial and genomic phylogenies are incongruent. The cox1 tree clusters the Peruvian grana and Mexican fresh insect accessions, but the genomic SNP data indicate that the two grana samples form a clade. Wolbachia infection is a likely cause of the discrepancy between the mitochondrial DNA and the genomic variant phylogenies (Hurst and Jiggins 2005). Alternatively, this incongruence could reflect recent introgression (Zakharov et al. 2009), which would be consistent with recent importation of South American cochineal into Mexico.
Further genomic research is required to establish D. coccus's domestication center(s) with confidence. Our cochineal dataset does not permit us to identify the ultimate source population. Additionally, although Wolbachia strains exhibit phylogenetic and phylogeographic patterning (Russell et al. 2009), we were unable to pinpoint the source location of D. coccus through sequencing and analysis of its Wolbachia endosymbiont (Appendix). Our results, however, show that phylogenetically informative variation survives in the crops of Oaxacan small-scale cochineal farmers. Nevertheless, future analyses will need to carefully control for the effects of recent clandestine Peruvian introgression into Mexican stocks.

Acknowledgments
The Wenner-Gren Foundation (to NT and NRG), the David Rockefeller Center for Latin American Studies (to NT), Harvard University's Department of Human Evolutionary Biology, and the Science of the Human Past Initiative supported this research. Alejandro de Avila Blomberg (Jard ın Etnobot anico de Oaxaca) and Eric Ch avez Santiago (Museo Textil de Oaxaca) kindly provided specimens. Hannah Koon, Linda Reynard, and the Instituto Nacional de Antropolog ıa e Historia assisted in sample collection.

Data Accessibility
Sanger sequences have been deposited in GenBank (accessions KJ701865-KJ702008). Genome assemblies and associated sequence reads have been deposited in the BioProject archive (accession PRJNA244295).
columns in order to separate the DNA from carminic acid. DNA-rich fractions were collected and purified using the QIAquick PCR Purification Kit (Qiagen).
Bulk extracts were sheared to~200 bp average length using a S220 Focused-Ultrasonicator (Covaris, Inc., Woburn, Massachusetts, USA). DNA-sequencing libraries were constructed using the PrepX Illumina Kit (IntegenX) and NEXTflex TM DNA Barcodes (Bioo Scientific) on the Apollo 324 robotic platform (IntegenX) according to the manufacturer's instructions. Libraries were quality-controlled via analysis on an Agilent 2100 using a high-sensitivity DNA chip and quantified using the KAPA Library Quantification Kit -Illumina/Universal (KAPA Biosystems) and a Qubit â Fluorometer. A total of 13 PCR cycles using the NEXTflex TM kit enriched the indexed libraries to sequenceable concentrations. PCR-enriched libraries were requantified and pooled in equimolar ratios. Paired-end 150-bp sequences were generated on onequarter of an Illumina HiSeq 2500 lane.
A Dactylopius coccus genome assembly was constructed using JR-Assembler 1.02 from the original unpaired reads (Chu et al. 2013). The final four base pairs of each read were removed to improve sequence quality as recommended by Chu and colleagues (Chu et al. 2013). JR-Assembler uses complete reads to assemble the genome sequence via seed extension, which improves assembly of large genomes in comparison with de Bruijn graph assemblers such as SOAPdenovo2 (Luo et al. 2012) and ABySS (Simpson et al. 2009). We found that de Bruijn graph assemblers (SOAPdenovo2 and ABySS) produced unsatisfactory D. coccus assemblies, probably due to the relatively low sequencing depth and presence of repetitive regions. The original reads were aligned very poorly against the SOAPdenovo2 and ABySS assemblies, possibly due to misassemblies after chopping the reads into k-mers. Moreover, analysis of the sequence datasets using KmerGenie 1.5658 (Chikhi and Medvedev 2014) found no optimal k-mer solution.
The assembly was aligned against the GenBank nonredundant nucleotide database using MegaBLAST 2.2.27+ and the National Center for Biotechnology Information contamination screen (Zhang et al. 2000;). The Mega-BLAST results were analyzed in MEGAN 4.70.4 (Huson et al. 2011). Contigs and scaffolds matching contaminants (e.g., Proteobacteria) were removed from the assembly. Genome assembly statistics were calculated using the assemblathon_stats.pl script from the Assemblathon 2 competition (Bradnam et al. 2013). Genome completeness was evaluated using the Core Eukaryotic Gene (CEGs) approach implemented in CEGMA (Parra et al. 2009). The assembly included 47 of 248 complete CEGs (19%) with an additional 53 partial CEGs (21%). As a final test of assembly quality, the known mitochondrial cox1 and 12S rRNA sequences were identified in the assembly. JR-Assembler had correctly assembled these sequences and placed them in the same scaffold.

Phylogeographic Analysis of the Dactylopius coccus Strain of Wolbachia Genome
Wolbachia strains exhibit phylogenetic and phylogeographic patterning (Russell et al. 2009). We therefore assembled and analyzed the genome of the Dactylopius coccus strain of Wolbachia (strain "wCoc") in order to pinpoint the ultimate geographic source of D. coccus. The merged cochineal reads were aligned against two complete Wolbachia genomes (strains wMel and wPip) (Wu et al. 2004;Klasson et al. 2008) using BWA 0.7.4 (Li and Durbin 2009, 2010. Aligned reads were removed from the sequence pools using a custom script. These reads were used to de novo assemble wCoc using SOAPdenovo2 (127 bp k-mer length, 32 bp minimum mapped read length) (Luo et al. 2012). The wCoc genome was then iteratively aligned against the remaining merged reads, the newly aligned sequences were removed from the datasets, and then, the wCoc genome was reassembled including the newly removed sequences. This process was repeated until no more reads aligned against the draft genome (two iterations). The final wCoc genome sequence totaled 1.13 Mb with an N50 of 1387 bp (Table A2). Previously sequenced Wolbachia genomes range in length between 1.0 and 1.5 Mb, indicating that we have sequenced~75-100% of the wCoc genome.
Wolbachia strains are classified primarily by the ftsZ gene (Lo et al. 2002). After identification of this gene in wCoc, we aligned it against 797~428-bp partial Wolbachia ftsZ sequences obtained from GenBank. The analyzed Other Wolbachia strains Clade E Figure A1. Condensed maximum-likelihood tree of 797 partial Wolbachia ftsZ genes. The Wolbachia endosymbiont of Dactylopius coccus (strain "wCoc") falls in clade B. Clade nomenclature follows Lo et al. 2002. The tree was constructed under a Tamura-Nei substitution model with invariant sites and a gamma distribution for substitution rates (four gamma categories) and tested with 100 bootstrap replicates. Only clades supported by at least 50 replicates are noted.