A collection of barcoded natural isolates of Saccharomyces paradoxus to study microbial evolutionary ecology

Abstract While barcoded collections of laboratory microorganisms and barcode‐based cell tracking are developing rapidly in genetics and genomics research, tools to track natural populations are still lacking. The yeast Saccharomyces paradoxus is an emergent microbial model in ecology and evolution. More than five allopatric and sympatric lineages have been identified and hundreds of strains have been isolated for this species, making it possible to assess the impact of natural diversity on complex traits. We constructed a collection of 550 barcoded and traceable strains of S. paradoxus, including all three North American lineages SpB, SpC, and SpC*. These strains are diploid, many have a fully sequenced genome, and each is barcoded with a unique 20 bp sequence that allows its identification and quantification. This yeast collection is suited to pooled competition experiments, as the barcodes make it possible to measure the fitness of each lineage and of individual strains in common conditions. We used this tool to demonstrate that, in the tested conditions, there are extensive genotype‐by‐environment interactions for fitness among S. paradoxus strains, which reveals complex evolutionary potential in variable environments. This barcoded collection provides a valuable resource for ecological genomics studies that will help us gain a better understanding of S. paradoxus evolution and fitness‐related traits.

trees and associated soils (Charron, Leducq, Bertin, Dube, & Landry, 2014a; Naumov, Naumova, & Sniegowski, 1998). Genetic diversity within this species is structured into five main lineages in North America: SpA, originally from Europe but recently introduced in North America, and SpB, SpC, SpC*, and SpD, which are endemic to North America (Figure 1a). Other lineages have been identified worldwide, from Far East Asia to Hawaii (Hénault et al., 2017; Kuehne, Murphy, Francis, & Sniegowski, 2007; Leducq et al., 2016; Liti et al., 2009; Xia et al., 2017). The most recent research focuses on the endemic North American lineages SpB, SpC, and SpC* as models for speciation and hybridization (Charron, Leducq, & Landry, 2014b; Leducq et al., 2016, 2017) and for adaptation to climatic conditions (Leducq et al., 2014). The population structure observed in North America indicates that these lineages show partial postzygotic reproductive isolation. Also, no evidence of first-generation hybrids has been found in nature so far, suggesting that SpA, SpB, and SpC may actually represent fully isolated species (Charron, Leducq, & Landry, 2014b). Finally, Xia et al. (2017) recently described an additional highly diverse group, designated as the SpD lineage.
These lineages occupy a large geographic region with extensive environmental variation. It has been shown that the distribution of SpB, SpC, and SpC* is closely linked to their different ranges of temperature tolerance, potentially reflecting ecological specialization (Leducq et al., 2014, 2016; Figure 1b). The lineages perform differently at high temperature and do not survive freeze-thaw cycles equally well, with southern populations outperforming northern ones. They also appear to diverge in performance when grown on limiting nutrient media with different carbon or nitrogen sources (Leducq et al., 2016; Samani et al., 2015). The molecular basis of this ecological specialization has been examined and candidate genes have been identified as potential key players. For example, GRS2, which codes for an aminoacyl tRNA-synthetase, is expressed at high temperature and its protein level differs in abundance between the SpB and SpC lineages. Allele swapping experiments revealed that protein-coding changes at this gene could be partly responsible for the inability of SpC to grow at high temperature.
Most studies have compared the fitness of different strains in controlled conditions using colonies grown isolated from each other on solid media. Comparing their fitness when they are in contact with each other would make it possible to measure direct interactions among lineages or strains, or interactions with other microbial species in a shared environment. One approach that has been developed recently for the study of model organisms is the use of DNA barcodes to track strains individually within a pool (Mazurkiewicz, Tang, Boone, & Holden, 2006). For more than a decade, barcoded yeast collections have been a powerful genomic tool to advance our knowledge of genomics and cell biology (Giaever & Nislow, 2014). The method relies on a unique short DNA segment inserted in a strain, which enables its specific identification. Using barcode sequencing (also known as Bar-Seq; Filteau, Charron, & Landry, 2017; Gresham et al., 2011; Robinson, Chen, Storey, & Gresham, 2014; Smith et al., 2009), relative fitness is measured within and between conditions by monitoring the relative abundance of each barcode through time in a mixed pool of strains. The various applications of the S. cerevisiae knockout collection (Giaever et al., 2002), in which one gene is deleted and replaced by an antibiotic resistance cassette flanked by two unique barcodes, are a relevant example of the diversity of uses this tool can offer (Nislow et al., 2015; Novo et al., 2013; Sliva, Kuang, Meluh, & Boeke, 2016; VanderSluis et al., 2014). Other yeasts have been barcoded with similar approaches, for instance the collection of Schizosaccharomyces pombe insertion mutants (Chen, Hale, Ciolek, & Runge, 2012) and various isolates of S. cerevisiae (Cubillos, Louis, & Liti, 2009; Maclean et al., 2017).
Here, we barcoded a collection of wild North American S. paradoxus strains to facilitate the study of natural diversity in controlled conditions. This collection includes 198 SpB, 64 SpC, 47 SpC*, and 5 SpD barcoded strains. To illustrate the use of this collection, we performed an experiment in which the strains were pooled and competed in rich medium (YPD) at 25°C and 35°C and in synthetic medium supplemented with proline at 25°C. This allowed us to measure the relative fitness of each lineage and each strain. We demonstrate that the different lineages, and strains within lineages, show extensive variation in fitness in these conditions, including genotype-by-environment interactions.
FIGURE 1 The S. paradoxus population structure and geographical distribution in North America. (a) Representation of the evolutionary history of the S. paradoxus North American lineages (Leducq et al., 2016). The European SpA and American lineages diverged about 200,000 years ago. It is hypothesized that SpB and SpC were in allopatry during the last glaciation, from 110,000 to 12,000 years before present (BP). A secondary contact between SpB and SpC would have occurred after the glacial retreats, leading to the formation of SpC* by hybridization. The SpD clade was identified recently, and its origin is not yet elucidated (Xia et al., 2017). (b) Geographical distribution of the S. paradoxus strains used in this study. Circle size is proportional to the number of strains at the location

| Barcode and resistance cassette amplification
Barcodes from the S. cerevisiae deletion collection (Giaever et al., 2002) were amplified and combined with the hygromycin B (HPH) or the nourseothricin (NAT) resistance cassette and inserted by transformation and homologous recombination at the HO locus of selected diploid S. paradoxus strains (Figure 2, step I). The HO locus was chosen as the barcode integration site because it is a common neutral replacement site in laboratory strains. Indeed, HO is not required for growth and the deletion has no detectable effect on vegetative growth when replaced with a resistance cassette (Baganz, Hayes, Marren, Gardner, & Oliver, 1997). The endonuclease encoded by HO is responsible for mating-type switching and is consequently inactive in diploids (Haber, 2012). The deletion in diploids will allow the production of stable haploids by dissection of the barcoded strains if needed.
Genomic DNA was extracted following a protocol modified from Looke, Kristjuhan, and Kristjuhan (2011). Strains from the S. cerevisiae deletion collection were printed onto arrays of 384 colonies on solid yeast peptone dextrose (YPD) with 10 g/L of yeast extract, 20 g/L of tryptone, and 20 g/L of glucose, as outlined in Rochette et al. (2015) using a BMC-BC robotic platform (S&P robotics, North York, Canada). Cycling protocol details are listed in Supporting Information Table S4.
In 141 cases, we were not able to amplify the barcode of the initially selected deletion strain from the S. cerevisiae collection, so a second one was used and reported in our database (Supporting Information Table S1). Two different and unique barcodes were assigned to each S. paradoxus strain and associated with the HPH or NAT resistance cassette to form the Tag1-HPH and Tag2-NAT copies.

| Fusion PCR
The PCR products containing the barcodes were fused by PCR with the resistance cassettes, HPH or NAT (Table S2). This led to a higher rate of successful transformations.

| Barcodes insertion
All natural strains used in this study are listed in Supporting Information Table S1. Their geographic distribution is represented in Figure 1. Competent cells were prepared and transformed as in Gietz and Woods (2002) with the following modifications: cells were grown overnight in 5 ml of YPD at 30°C without agitation, diluted to an OD595 of 0.15/ml, and grown again to an OD595 of 0.4-0.7/ml. Each culture was harvested by centrifugation at 500 g for 5 min, and the  Table S1).

| Barcoded strains phenotypic analysis
The 370 parental strains and 594 barcoded strains of the collection were assembled in two arrays (omnitrays, 86 mm × 128 mm Petri dish) on solid YPD medium (Rochette et al., 2015). One contained all the SpB

| Barcoded strain competition assay
The S. paradoxus barcoded strains were pooled as in Smith et al. (2011) with the following modifications: the collection was replicated from glycerol stock onto YPD plates in a 384-array format and incubated for 2 days at 30°C. To maximize their growth before pooling, the strains were transferred for a second round of incubation on YPD plates for 2 days at 30°C. The SpB and SpD barcoded strains were first pooled together in an intermediate pool, and the SpC and SpC* strains in another one. The two pools were then combined into a single one to obtain a mix of the four lineages. Cells were adjusted to a concentration equivalent to an optical density (OD595) of 50 in YPD + 25% glycerol, and the pool was aliquoted in several 1 ml tubes and stored at −80°C. Strains LL12_028 (Tag1-HPH and Tag2-NAT copies, SpB), LL12_003 (Tag1-HPH and Tag2-NAT copies, SpB), LL13-025 (Tag1-HPH and Tag2-NAT copies, SpB), LL11_002 (Tag1-HPH copy, SpC), LL12_004 (Tag1-HPH copy, SpC), and LL12_007 (Tag1-HPH copy, SpC) were removed because they had abnormal colony morphologies, suggesting that they were either contaminated or affected by the transformation.
Competition assays were carried out in 96-deep-well plates in three conditions: YPD at 25°C and 35°C, and synthetic minimal me- We refer to t1 as our initial time, equivalent to five generations in YPD and 3.5 generations in proline medium.
To limit stochastic effects, we pooled four different wells at the end of the experiment to obtain at least four final replicates for each condition at the initial and final time points (YPD at 25°C at t1 and t5, YPD at 35°C at t1 and t5, and proline medium at 25°C at t1 and t7). We extracted DNA from these pools using a protocol adapted from Amberg, Burke, and Strathern (2005). The tubes were incubated at −20°C for 15 min and centrifuged for 15 min at 16,100 g at 4°C. The supernatant was removed, the cell pellets were dried by evaporation at 37°C, and DNA samples were resuspended in 50 μl water. DNA concentration was measured using a Nanodrop 2000c (Thermo-Fisher Scientific, Waltham, USA) and adjusted to 10 ng/μl for further use.

| Barcode sequencing
Forward and reverse primers were used for multiplex sequencing with Ion Torrent technology using predefined indexes (Faircloth & Glenn, 2012) and newly designed ones (listed in Supporting Information Table S2). Each 25 μl PCR reaction mix sample con-

| Barcode sequence analysis and quantification
All possible expected PCR products including the dual-index and the barcodes were concatenated with an NNNNN spacer. This reference sequence was used for mapping barcoded sequences using
where P_final is the frequency of the strain at the final time (t5 or t7) and P_initial is the frequency of the strain at the initial time (t1) of the competition assay. The frequency of each strain was calculated as its number of reads divided by the total number of reads in the library considered. We used 18 generations for the calculations. For each strain, the relative fitness of each of its copies, Tag1-HPH and Tag2-NAT, was estimated as the median relative fitness of the four replicates. The global relative fitness of each strain within a given condition was calculated as the mean of the relative fitness of its Tag1-HPH and Tag2-NAT copies. Finally, to determine the fitness of each lineage in each condition, the average fitness of all strains belonging to that lineage was calculated. Twelve outlier strains were removed from the analysis because fitness differed strongly between their two barcoded copies, Tag1-HPH and Tag2-NAT.
To define the filtering threshold, the mean and the standard deviation (SD) of the difference between the fitness of the Tag1-HPH and Tag2-NAT copies were calculated across all strains. Strains whose difference fell outside mean ± (2.5 × SD) were removed.
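The quantification steps described in this section can be sketched in Python. The equation defining relative fitness is not recoverable from the text at this point, so the per-generation log2 ratio of final to initial frequency used below is an assumption, not the authors' stated formula. The function names and the choice of the sample (rather than population) standard deviation are likewise hypothetical.

```python
from math import log2
from statistics import mean, median, stdev

GENERATIONS = 18  # number of generations used for the calculations (from the text)


def frequencies(read_counts):
    """Frequency of each barcode: its reads divided by the library's total reads."""
    total = sum(read_counts.values())
    return {barcode: n / total for barcode, n in read_counts.items()}


def relative_fitness(p_initial, p_final, generations=GENERATIONS):
    """ASSUMED form: log2 change in frequency per generation."""
    return log2(p_final / p_initial) / generations


def copy_fitness(replicate_values):
    """Fitness of one barcoded copy: median over its replicates."""
    return median(replicate_values)


def strain_fitness(tag1_fitness, tag2_fitness):
    """Global fitness of a strain: mean of its Tag1-HPH and Tag2-NAT copies."""
    return mean([tag1_fitness, tag2_fitness])


def filter_outliers(tag_pairs, k=2.5):
    """Keep strains whose Tag1 - Tag2 difference lies within mean +/- k * SD.

    tag_pairs maps a strain name to (tag1_fitness, tag2_fitness).
    The sample SD is used here; the text does not say which SD was computed.
    """
    diffs = {strain: t1 - t2 for strain, (t1, t2) in tag_pairs.items()}
    centre = mean(diffs.values())
    spread = stdev(diffs.values())
    return {
        strain: pair
        for strain, pair in tag_pairs.items()
        if abs(diffs[strain] - centre) <= k * spread
    }
```

Lineage-level fitness would then be the mean of `strain_fitness` over all retained strains of a lineage. Taking the median across replicates limits the influence of a single aberrant well.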

| Transformation and integration of the barcodes in the S. paradoxus strains
The S. cerevisiae deletion collection was used to amplify unique barcodes before integration in the S. paradoxus strains. This collection consists of strains in which nonessential genes have been individually replaced with a KanMX module, which confers resistance to geneticin, and two flanking unique DNA barcodes of 20 bp labeled as uptags and downtags (Giaever et al., 2002). Because the uptag barcodes were previously sequenced in Filteau et al. (2017) and presented fewer discrepancies with the database of the deletion collection than the downtags, we used the uptags only. Changes in barcode sequences that were detected by Filteau et al. (2017) are listed in Supporting Information Table S1.
One to three rounds of transformation were performed to ensure barcode insertion in as many strains as possible. This was achieved successfully for more than half of the parental strains (Supporting Information Table S3). Integration failure could be due to polymorphisms in the flanking regions of the HO locus where homologous recombination takes place or at the loci used for PCR confirmation, or to variation among strains in their level of competence for transformation. Proper integration could not be confirmed for every strain, suggesting that the barcode and the selection cassette were not always at the appropriate genome location. Such strains were not considered further.
Thus, the Saccharomyces paradoxus barcoded collection consists of 550 strains from the SpB, SpC, and SpC* lineages and the SpD group, barcoded either in two distinct copies, with the Tag1-HPH and Tag2-NAT modules (n = 238), or in a single copy with the Tag1-HPH (n = 34) or the Tag2-NAT module (n = 40; see details in Table 1). Parental SpD strains were few: two out of nine SpD strains were obtained in one copy, and three out of nine were barcoded with both copies. The sequencing of 539 inserted barcodes showed that seven barcodes carried a mutation, 18 barcodes were likely illegible due to sequencing errors, and 49 barcodes were not the ones expected in the constructed strains. These could result from barcode errors in the S. cerevisiae collection or from contamination that took place during the experiment. Among these unexpected barcodes, 19 were usable as they were not already associated with an S. paradoxus strain. However, we had to eliminate from the collection 44 strains that had a common barcode. Among the 594 transformed strains, 550 remained in the collection (93%).
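The counts reported in this paragraph can be cross-checked with a few lines of arithmetic. The sketch below assumes that the total of 550 counts each barcoded copy separately, so a strain carrying both modules contributes two entries; this reading is inferred from the numbers and is not stated explicitly in the text. Variable names are illustrative.

```python
# Counts taken from the text.
two_copy_strains = 238        # strains carrying both Tag1-HPH and Tag2-NAT
hph_only = 34                 # strains carrying Tag1-HPH only
nat_only = 40                 # strains carrying Tag2-NAT only
transformed = 594             # transformed strains before filtering
removed_shared_barcode = 44   # strains removed because they shared a barcode

# Assumption: every barcoded copy is counted as one entry of the collection.
collection_size = 2 * two_copy_strains + hph_only + nat_only
retained = transformed - removed_shared_barcode
percent_retained = round(100 * retained / transformed)

print(collection_size, retained, percent_retained)
```

Under that assumption both routes lead to the same collection size, and the retained fraction rounds to the percentage given in the text.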

| Comparison of the growth between the barcode and parental strains
To confirm that barcode insertion had no significant effect on growth compared to parental strains, a growth experiment was performed on solid rich medium (YPD) at 25°C and 35°C. Conditions were selected according to previous studies in which growth differences between lineages were observed (Leducq et al., 2016). For all lineages, no significant differences between the barcoded strains and their parental strains were found (Kruskal-Wallis tests, Figure 4 and Supporting Information Table S5).
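A comparison of this kind can be illustrated with a minimal, standard-library sketch of the Kruskal-Wallis test for the two-group case (barcoded versus parental). The function names and growth values are hypothetical, and the software the authors used is not stated here. The p-value relies on the chi-square approximation with one degree of freedom, which holds only when exactly two groups are compared.

```python
from collections import Counter
from math import erfc, sqrt


def average_ranks(values):
    """Rank values from 1 to N; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def kruskal_two_groups(a, b):
    """Return (H, p) for two samples, with the standard correction for ties."""
    pooled = list(a) + list(b)
    n = len(pooled)
    ranks = average_ranks(pooled)
    rank_sums = (sum(ranks[:len(a)]), sum(ranks[len(a):]))
    sizes = (len(a), len(b))
    h = 12 / (n * (n + 1)) * sum(r * r / m for r, m in zip(rank_sums, sizes))
    h -= 3 * (n + 1)
    tie_term = sum(c ** 3 - c for c in Counter(pooled).values())
    correction = 1 - tie_term / (n ** 3 - n)
    if correction <= 0:  # every value identical: no evidence of a difference
        return 0.0, 1.0
    h = max(h / correction, 0.0)  # guard against tiny negative rounding errors
    p = erfc(sqrt(h / 2))  # chi-square survival function for 1 degree of freedom
    return h, p


# Hypothetical colony-size measurements (arbitrary units).
parental = [10.2, 11.0, 9.8, 10.5, 10.9]
barcoded = [10.4, 10.8, 9.9, 10.6, 11.1]
h_stat, p_value = kruskal_two_groups(parental, barcoded)
```

A rank-based test is a natural choice when growth measurements cannot be assumed to be normally distributed.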

| Competition assay
To test whether quantitative barcode sequencing could be used with our collection and whether it is sensitive enough to characterize the mean relative fitness of the lineages as well as individual relative fitness, we performed a competition assay in three specific conditions: YPD at 25°C, YPD at 35°C, and proline medium at 25°C. These conditions were shown to differentiate the three lineages in previous studies (Leducq et al., 2016).
For each condition, read counts were highly reproducible across replicates (Supporting Information Figure S1, Pearson's r = 0.92-0.99, p < 0.01). After sequence data filtering, 206 out of the 238 strains barcoded in two copies were detected (for details, see Supporting Information Table S6) and 193 were used in the analysis, after eliminating the strains with highly differential fitness between their two tag modules. This suggests that a slightly higher sequencing depth would be required to cover the entire collection in future experiments.
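Reproducibility across replicates, as summarized by Pearson's r, can be checked with a short standard-library sketch. The read counts below are made up for illustration, the function name is hypothetical, and whether the counts were transformed before being correlated is not stated in the text.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(ss_x * ss_y)


# Hypothetical read counts for the same five barcodes in two replicates.
replicate_a = [120, 45, 300, 10, 80]
replicate_b = [110, 50, 280, 15, 90]
r = pearson_r(replicate_a, replicate_b)
```

Computing r for every pair of replicates within a condition gives a range of values comparable to the one reported in the text.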

| DISCUSSION
The budding yeast S. paradoxus has been providing insights into the evolution and ecology of fungi over the past ten years. For example, studies have illustrated the role played by ecological and historical parameters in shaping the ecological distribution of North American populations (Leducq et al., 2016). Here, we developed a collection of barcoded strains to further empower these investigations. The availability of two barcoded copies of S. paradoxus strains allows the use of biological replicates in further experiments and, eventually, the use of the different resistance markers for genetic studies. As the genome of a large fraction of these strains is fully sequenced (Leducq et al., 2016) aspects related to speciation and hybridization (Leducq et al., 2016, 2017). It is also possible to extend the collection by adding samples, as the barcode source is the S. cerevisiae KO collection that contains more than 4,000 unique barcodes, and many more could be created by designing new ones.
We showed that the insertion of the barcode does not significantly affect the growth of the strains in our tested conditions on solid medium (Figure 2). However, we observed that some strains show differences in fitness between the two cassette modules when grown in competition. These differences could arise from unwanted effects that occurred during transformation, including secondary mutations, multiple insertions, or genome instability that leads to the accumulation of variation among otherwise isogenic strains. Further investigation will be needed to assess the cause of this variation. A small fraction of barcode mis-assignment could also contribute to these differences. Nevertheless, the majority of strains show consistent behavior and thus provide an invaluable resource.
We revealed that even with a modest number of reads, significant fitness differences could be detected among strains and lineages. Our results are consistent with these observations. Furthermore, our analyses uncovered extensive genotype-by-environment interactions for fitness in these populations. Although the number of conditions tested is too small to draw general trends, these results suggest that no single lineage or strain would be able to outcompete the others in conditions that vary in space and/or time, because the fitness ranking appears to be largely independent among conditions, at least between the rich and defined conditions. These strains could be specialized in conditions that vary locally and across their respective geographical range, which would contribute to maintaining a large diversity of genotypes in North America.
FIGURE 6 Genotype-by-environment interaction for fitness. The interaction between genotype and environment was investigated by analyzing the correlation between strain fitness within (a, b, and c) and between (d, e, and f) conditions, considering the two tag modules as biological replicates. Fitness correlation is systematically high between tag modules in within-condition comparisons. This shows the extent of noise caused by strain transformations and/or biases or noise in barcode quantification (a, b, and c) and the maximum correlation possible between conditions. Correlations between conditions are systematically lower, showing a major effect of growth conditions on relative fitness among strains
shown the influence of competition on how specific microbial isolates cope with a novel environment or with a fluctuating environment (Bleuven & Landry, 2016; Lawrence et al., 2012; Osmond & de Mazancourt, 2013; Pekkonen, Ketola, & Laakso, 2013; Van Den Elzen, Kleynhans, & Otto, 2017). In its environment, S. paradoxus is surrounded by an abundant diversity of microbial species and its growth success will depend on the composition of these communities (Kowallik, Miller, & Greig, 2015). By using the S. paradoxus barcoded collection as a tractable and genetically well-characterized model system, it will become possible to study, in highly controlled conditions, the selective pressures of abiotic as well as biotic factors that shape the evolution of the species. Finally, further research could benefit from coupling experiments in controlled conditions, using tools such as our S. paradoxus barcoded collection, with methods that are developed to measure microbial fitness and persistence in nature (Anderson et al., 2018; Boynton, Stelkens, Kowallik, & Greig, 2017).

CONFLICT OF INTEREST
The authors have no conflict of interest.

AUTHORS CONTRIBUTION
AKD, IGA, and CRL designed the experiment; CB, AKD, IGA, and GN acquired the data; CB, GN, and HM analyzed and interpreted the data; CB wrote the manuscript with input from CRL and GN.
All authors contributed and agreed on the content of the final version.

ETHICS STATEMENT
None required.