Hybridization capture of larch (Larix Mill.) chloroplast genomes from sedimentary ancient DNA reveals past changes of Siberian forest

Siberian larch (Larix Mill.) forests dominate vast areas of northern Russia and contribute important ecosystem services to the world. It is important to understand the past dynamics of larches in order to predict their likely response to a changing climate in the future. Sedimentary ancient DNA extracted from lake sediment cores can serve as archives to study past vegetation. However, the traditional method of studying sedimentary ancient DNA—metabarcoding—focuses on small fragments, which cannot resolve Larix to species level nor allow a detailed study of population dynamics. Here, we use shotgun sequencing and hybridization capture with long‐range PCR‐generated baits covering the complete Larix chloroplast genome to study Larix populations from a sediment core reaching back to 6700 years from the Taymyr region in northern Siberia. In comparison with shotgun sequencing, hybridization capture results in an increase in taxonomically classified reads by several orders of magnitude and the recovery of complete chloroplast genomes of Larix. Variation in the chloroplast reads corroborates an invasion of Larix gmelinii into the range of Larix sibirica before 6700 years ago. Since then, both species have been present at the site, although larch populations have decreased with only a few trees remaining in what was once a forested area. This study demonstrates for the first time that hybridization capture applied directly to ancient DNA of plants extracted from lake sediments can provide genome‐scale information and is a viable tool for studying past genomic changes in populations of single species, irrespective of a preservation as macrofossil.


| INTRODUCTION
Siberian forests are unique as they cover a vast area of about 263.2 million ha (Abaimov, 2010) dominated by a single genus of tree, the deciduous conifer larch (Larix Mill.). As the only extensive forest biome growing on continuous permafrost, it plays an important role for local communities and it provides critical ecosystem services in a global context including carbon stocks, climate feedbacks, permafrost stability, biodiversity and economic benefits (Herzschuh, 2019). It is therefore important to understand how the genus and individual larch species have responded and will respond to changing climatic conditions. Frequent natural hybridization between larch species makes it difficult to distinguish taxa, and the number of accepted species is still under discussion (Abaimov, 2010). This is one of the reasons why there is still little known about the population dynamics of Siberian larch species and the question remains of whether there have been migrations of larches in the current postglacial period.
Sedimentary ancient DNA (sedaDNA) from lakes can act as an archive of the past and has been demonstrated to be a valuable tool in the study of past vegetation history (Jørgensen et al., 2012; Parducci et al., 2017; Wang et al., 2017; Willerslev et al., 2003).
Most sedaDNA studies focus on organellar DNA, as the higher copy number of organelles per cell compared with the nucleus allows a higher chance of retrieval. The metabarcoding approach applied to DNA extracted from sediments is the most common, robust and fast technique to study past vegetation (Alsos et al., 2018; Niemeyer et al., 2017; Pansu et al., 2015). For ancient DNA of plants, a very short, but highly variable DNA fragment from the chloroplast genome is PCR-amplified out of the pool of DNA fragments and subsequently sequenced using high-throughput sequencing (Taberlet et al., 2007). However, the method is not suited to resolve population dynamics of single species, as metabarcoding markers used for ancient degraded samples must be very short while at the same time flanked by primers that are conserved across a larger taxonomic group. Therefore, their taxonomic resolution is, in most cases, insufficient to resolve closely related species (Sønstebø et al., 2010; Taberlet et al., 2007), let alone show subspecific variation.
Sequencing of the entire DNA extracted from ancient sediments, termed metagenomic shotgun sequencing, has been shown to provide information on the entire taxonomic composition of the sample (e.g. fungi, bacteria, archaea; Ahmed et al., 2018; Parducci et al., 2019; Pedersen et al., 2016). By sequencing complete DNA molecules, it is possible to authenticate ancient sequences versus modern contaminants by their specific postmortem DNA damage patterns towards the ends of the molecules (Ginolhac et al., 2011).
As it is not restricted to a specific DNA fragment, it also allows the retrieval of many different loci belonging to single species provided they are sufficiently concentrated in the sample. A major drawback, however, is the immense sequencing effort that must be expended to achieve a sufficient overview of the DNA present in a sample. Most of the sequences retrieved from ancient environmental samples are not assignable to a specific taxon because available sequence databases are still limited, and most assigned sequences are assumed to be of noneukaryotic origin (Ahmed et al., 2018; Pedersen et al., 2016). Especially in the case of DNA extracted from lake sediments, the ratio of sequences assigned to terrestrial plants to total DNA sequenced is expected to be extremely low (Parducci et al., 2019).
A way to overcome the limitations of shotgun sequencing is to enrich the DNA of the focal species in the samples via hybridization capture prior to sequencing. To do this, one can use short fragments of DNA of the species and target sites of interest as baits, to which the corresponding sites of interest in ancient DNA libraries are hybridized. This technique, originally developed for modern DNA, is commonly applied in ancient DNA studies, particularly for use on single specimens (Ávila-Arcos et al., 2011; Maricic et al., 2010) and with a focus on mammals, mostly using mitochondrial DNA (Carpenter et al., 2013; Dabney et al., 2013; Enk et al., 2016). Successful capture enrichment from sedimentary ancient DNA has been reported only a few times so far. Cave sediments (Slon et al., 2017) and permafrost samples (Murchie et al., 2019) were successfully enriched for a range of terrestrial organisms, while an attempt to capture ancient mammalian DNA from lake sediments failed (Moore et al., 2019). Plants have received limited attention in ancient DNA studies (Parducci et al., 2017). Ancient plant DNA recovered directly from lake sediments, which is commonly targeted in metabarcoding studies (Epp et al., 2015; Liu et al., 2020; Pansu et al., 2015), has, however, not yet been used as a target for the capture of larger genomic regions of specific target species. Beyond the retrieval of short fragments useful for species identification (as used in metabarcoding), it is not clear how complete the genomic record of plants in sediment cores is. This is also true for chloroplast DNA, which holds valuable ecological and adaptive information, through the genes for photosynthesis, and is widely used for taxonomic identification and phylogenetic analyses (CBOL Plant Working Group et al., 2009; Jansen et al., 2007; Shaw et al., 2007).
In conifers such as Larix, chloroplast DNA is paternally inherited via pollen (Szmidt et al., 1987), which is associated with a higher intraspecific gene flow and a lower rate of introgression than in the maternally inherited mitochondrial DNA (Du et al., 2009). As a result, chloroplast DNA variation is more species-specific than mitochondrial DNA variation in this group (Du et al., 2009). Complete chloroplast genomes have, however, not yet been targeted for capture enrichment from lake sediment cores.
Here, we apply shotgun sequencing and a hybridization capture approach targeting the complete Larix gmelinii chloroplast genome to sedaDNA samples from a small lake in the Taymyr region of northeastern Siberia. The study site lies in the boundary zone of two larch species, L. gmelinii and Larix sibirica, with hybridization occurring between the boundary populations (Abaimov, 2010; Polezhaeva et al., 2010). It has been hypothesized for this region that a natural invasion of L. gmelinii into the range of L. sibirica occurred during the Holocene (Semerikov et al., 2013). The lake is situated in the treeline ecotone with scattered patches of L. gmelinii occurring in the area (Klemm et al., 2016). A sediment core of the lake has already been extensively studied using pollen analysis, DNA metabarcoding and mitochondrial variants (Epp et al., 2018; Klemm et al., 2016), making it an ideal site to study ancient larch population dynamics based on chloroplast DNA.
As a proof of concept, four samples were both shotgun-sequenced and enriched by hybridization capture for the chloroplast genome of L. gmelinii, to evaluate how well we can retrieve and assemble genome-scale data (here, the complete chloroplast genome of Larix) from sedimentary ancient DNA. We demonstrate the successful enrichment by comparing taxonomically classified reads of the shotgun and hybridization capture data sets and evaluate the degree of coverage of the Larix chloroplast genome across the different annotated regions of the genome. This study presents the first successful recovery of complete chloroplast genomes from ancient lake sediments.

| Sample material
Samples were obtained from a sediment core from lake CH12 (72.399°N, 102.289°E, 60 m a.s.l.) in the Khatanga region of the northern Siberian lowlands, located between the Taymyr Peninsula to the north and the Putorana Plateau to the south (Figure 1). The lake's position is in the northern part of the treeline ecotone and is currently surrounded by a vegetation of single-tree tundra. Samples from this core (core ID 11-CH-12A) have already been analysed by Klemm et al. (2016) and Epp et al. (2018).
Details of the chronology of the core are described in Klemm et al. (2016). Four new samples were chosen for the present study at depths/ages 121.5 cm/~6700 calibrated years before present (cal-BP), 87.5 cm/~5400 cal-BP, 46.5 cm/~1900 cal-BP and 2.5 cm/~60 cal-BP.

| Laboratory work
2.2.1 | Sampling, DNA extraction and library preparation
Core subsampling was performed as described in Epp et al. (2018). [...] and 6.7 ng/µl (60 cal-BP). 5 µl of each DNA extraction was used in the library preparation. Libraries were prepared following the single-stranded DNA library preparation protocol of Gansauge et al. (2017), which was specifically developed for ancient degraded DNA, with the following adjustment: as we had no access to a programmable shaking incubator in our ancient DNA laboratory, the ligation of the second adapter (CL53/CL73) was carried out in a rotating incubator. The libraries were quantified with qPCR as described by Gansauge and Meyer (2013). We first prepared a standard for qPCR by amplifying a part of the pUC19 vector (New England BioLabs) with primers carrying P5 and P7 binding sites.
The PCR contained 0.05 U Taq DNA Polymerase and 1× PCR buffer (Sigma-Aldrich), 0.25 µM CL105 and CL106, 1.25 mM dNTPs (Invitrogen) and 10 pg pUC19 DNA in a final volume of 100 µl and was carried out with the following cycling conditions: 5 min at 95°C, 30 cycles with 30 s at 95°C, 58°C and 72°C, each, followed by 5 min at 72°C. qPCR standards were purified using the MinElute PCR Purification Kit (Qiagen) following the manufacturer's recommendations. Standards were diluted in a series from 10^9 to 10^2 copies/µl. qPCR was carried out in 1× Maxima™ SYBR™ Green (Thermo Scientific), 0.2 µM IS7 and IS8 and 1 µl of the sample libraries diluted 1:20 with TET buffer in a total volume of 25 µl on a Rotor-GeneQ qPCR instrument (Qiagen). Cycler conditions were 10 min at 95°C, followed by 40 cycles of 30 s at 95°C, 60°C and 72°C, each. Fluorescence was measured after each extension step.
The prepared libraries were used downstream for both shotgun sequencing and hybridization capture of chloroplast genomes.
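The quantification against a dilution series can be sketched as follows. This is a minimal illustration and not the authors' analysis code: it assumes a linear relation between the cycle threshold (Ct) and log10 of the copy number, it assumes that the 1:20 dilution corresponds to a 20-fold dilution factor, and all Ct values and function names are hypothetical.

```python
import math


def fit_standard_curve(copies_per_ul, ct_values):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    xs = [math.log10(c) for c in copies_per_ul]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def copies_from_ct(ct, slope, intercept, dilution_factor=1.0):
    """Invert the standard curve and undo the dilution of the measured library."""
    return dilution_factor * 10 ** ((ct - intercept) / slope)


def amplification_efficiency(slope):
    """Efficiency implied by the slope (1.0 means perfect doubling per cycle)."""
    return 10 ** (-1.0 / slope) - 1.0


# Hypothetical, perfectly linear Ct values for the 10^9 ... 10^2 copies/µl series.
exponents = list(range(9, 1, -1))
standards = [10 ** e for e in exponents]
standard_cts = [5.0 + 3.32 * (9 - e) for e in exponents]

slope, intercept = fit_standard_curve(standards, standard_cts)
efficiency = amplification_efficiency(slope)

# Hypothetical library measured at Ct 18.28 after a 20-fold dilution.
library_copies = copies_from_ct(18.28, slope, intercept, dilution_factor=20)
```

A slope close to -3.32 corresponds to an efficiency close to 1, which is a common sanity check before the curve is used to estimate library copy numbers.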

| Shotgun sequencing
Twenty-four µl of the prepared DNA library was amplified and indexed by PCR with 13 cycles as described in Gansauge and Meyer (2013)

| Bait construction
Long-range PCR products covering the complete chloroplast genome of Larix gmelinii were generated using 18 primer pairs de-

| Hybridization capture
The enrichment was done following the protocol of Maricic et al. (2010).
Another 24 µl of prepared DNA libraries was PCR-amplified with 16 cycles with the same set of index primers, PCR products were purified, and fragment length and concentration were estimated in the same way as described for the shotgun samples. Libraries were pooled in equimolar amounts to a total of 2 µg, including the two blanks (extraction blank and library blank). The two blanks had a molarity of 20% compared with the samples. To prevent binding of library molecules to the adapter sequences, which would result in off-target capture, the adapter sequences were blocked prior to the capture experiment by blocking oligonucleotides. In addition to the blocking oligonucleotides BO3/4.P7.part1.F/R and BO5/6.P7.part2.F/R provided in the original protocol for single indexed libraries (Maricic et al., 2010)

| Alignment
Alignments against an L. gmelinii chloroplast reference genome were made using three data sets: the complete capture data set, a subset of this data set containing only Larix-classified reads (as described above) and the same subset of the shotgun data set. As reference, the chloroplast genome of an L. gmelinii individual from the Taymyr region was used (NCBI Accession No.: MK468637.1). Reads were mapped using the bwa aln algorithm (v.0.7.17-r1188, [...] (Table S1).
After trimming and filtering, 62.6% of the sample reads and 0.2% of the blank reads remained for the analysis. Comparable results were obtained by Ahmed et al. (2018), who retained 52% of shotgun-sequenced sedimentary DNA after trimming and quality control. Eighty-two per cent of the sample reads overlapped and were merged.
Using kraken2 with the nt database and a confidence threshold of 0.8, 0.3% of the shotgun reads that passed quality control (QC) could be classified. The majority of sample reads were classified as Bacteria (62.6%) and Eukaryota (23.4%). Across all samples, 2.8 thousand (k) reads were classified as Larix (Figure 2, Table S2).
When classifying against the custom chloroplast database using kraken2 with default confidence, 0.16% of the QC shotgun sample reads could be assigned to Viridiplantae (Table S3). In the samples, 3.6 k reads were classified as Larix and used in further analyses. In the blanks, no read was assigned to Larix with either of the databases or thresholds. Therefore, they were not considered further in the analysis.

| The hybridization capture data set
The sequencing of the hybridization capture experiment resulted in approximately 192 M paired-end reads for the four samples and 10 M reads for the blanks. After trimming and quality filtering, 66% were kept from the samples and 0.3% were kept from the blanks.
About 91% of sample reads and 95% of blank reads overlapped and were merged (Table S1).
Classification with kraken2 using the nt database with a confidence threshold of 0.8 could classify 28% of the capture sample reads. Of the classified sample reads, the majority was classified as Eukaryota (44%) and more specifically Viridiplantae (43%). Three M reads were assigned to Larix (8.9% of classified reads, Figure 2).
Classification against the custom chloroplast database using kraken2 with default confidence resulted in 46 M (36.5%) of the capture sample reads assigned to Viridiplantae. Of the assigned reads, 9.2% were classified as Larix (4.2 M reads).
In the two blanks, six and eight reads (classification with nt and chloroplast database, respectively) were assigned to Larix. Even when factoring in that the blanks were sequenced at only one fifth of the share of the samples, the number of reads assigned to Larix is still many orders of magnitude smaller in the blanks than in the samples (the lowest number of Larix-classified reads in a sample is 57 k reads; in the extraction blank it is 35 with the applied correction factor). Therefore, the blanks were not considered further in the analysis.
A comparison of the shotgun and capture data sets shows that 46.6- to 155.8-fold more reads were assigned to Eukaryota in the capture data set. Within the Viridiplantae, enrichment ranged from 77.8- to 236.9-fold in the capture data relative to the shotgun data. The number of Larix-classified reads per sample corresponds to an increase of around 800- to 1160-fold compared with the shotgun data. These reads were filtered for PCR duplicates when aligning to the Larix chloroplast genome. Comparing the aligned, deduplicated reads of both shotgun and capture data sets, the enrichment ranged from 6.4- to 16.2-fold (Table S4).
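The fold changes reported above are ratios between read counts in the two data sets. The sketch below shows one way such a ratio could be computed; whether the counts are normalized by sequencing depth is an assumption made here for illustration, and the counts and the function name are hypothetical rather than values from Table S4.

```python
def fold_enrichment(capture_count, shotgun_count,
                    capture_total=None, shotgun_total=None):
    """Ratio of assigned reads in capture versus shotgun data.

    If both totals are given, counts are first converted to fractions of all
    reads in each data set, so that sequencing depth does not drive the ratio.
    """
    if shotgun_count == 0:
        raise ValueError("fold enrichment is undefined without shotgun reads")
    if capture_total is None or shotgun_total is None:
        return capture_count / shotgun_count
    capture_fraction = capture_count / capture_total
    shotgun_fraction = shotgun_count / shotgun_total
    return capture_fraction / shotgun_fraction


# Hypothetical counts for a single sample.
raw_fold = fold_enrichment(900_000, 900)
normalized_fold = fold_enrichment(900_000, 900,
                                  capture_total=30_000_000,
                                  shotgun_total=60_000_000)
```

The difference between the raw and the normalized ratio shows why it matters to state which of the two is being reported when data sets differ in sequencing depth.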

| Ancient DNA authenticity
A mapDamage analysis (Jónsson et al., 2013) was applied to the alignment files of Larix-classified reads aligned to the Larix gmelinii chloroplast genome. The overlapping merged reads for the three ancient samples (from 1900, 5400 and 6700 cal-BP) show a clear increase in C-to-T substitutions at both ends, which is more pronounced at the 5′ ends. A clear increase in substitution rate with age is visible (Figure S1). The unmerged paired-end reads show comparable C-to-T substitution rates for the forward reads at the 5′ ends and for the reverse reads at the 3′ ends (Figure S2).
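The kind of positional C-to-T frequency that underlies such damage plots can be sketched in a few lines. This is a simplified illustration and not the mapDamage implementation: it assumes gap-free read and reference strings of equal length, and the toy sequences and the function name are hypothetical.

```python
def c_to_t_profile(aligned_pairs, n_positions=5, from_three_prime=False):
    """Fraction of reference C positions read as T, per distance from a read end.

    aligned_pairs: iterable of (read, reference) strings of equal length.
    Returns a list with one frequency per position, or None where no
    reference C was observed at that distance.
    """
    c_total = [0] * n_positions
    c_to_t = [0] * n_positions
    for read, ref in aligned_pairs:
        if len(read) != len(ref):
            raise ValueError("read and reference must have the same length")
        if from_three_prime:
            read, ref = read[::-1], ref[::-1]
        for i in range(min(n_positions, len(read))):
            if ref[i] == "C":
                c_total[i] += 1
                if read[i] == "T":
                    c_to_t[i] += 1
    return [t / c if c else None for t, c in zip(c_to_t, c_total)]


# Hypothetical toy alignments: (read, reference).
toy_pairs = [
    ("TCGA", "CCGA"),
    ("CTGA", "CCGA"),
    ("TCAA", "CCAA"),
    ("CCAT", "CCAT"),
]
five_prime = c_to_t_profile(toy_pairs, n_positions=2)
three_prime = c_to_t_profile(toy_pairs, n_positions=1, from_three_prime=True)
```

In authentic ancient DNA the frequency is expected to be highest at the outermost positions and to decay towards the interior of the molecule.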

| Retrieval of the Larix chloroplast genome
To evaluate the retrieval of Larix chloroplast genome sequences, alignments against a reference were made with three data sets: the complete capture data set and the Larix-classified subsets of capture and shotgun data sets. The alignment of the Larix-classified capture reads resulted in a near-complete retrieval of the Larix chloroplast genomes for all samples except for the most recent one. [...] (Table S5).
In the alignments, the coverage is not equal across the different annotated regions. When aligning the Larix-classified reads, the coverage is highest for inverted repeats and lowest for ribosomal RNA (Figure 3, dark shaded colours). In the same data set, the coverage is, on average, higher for intergenic regions, pseudogenes and conserved open reading frames (ORFs) than for protein-coding genes.
When aligning the complete capture data set against the same reference, the coverages of the different annotated regions show a different pattern: highest coverage is at the ribosomal RNA, followed by the photosystem complex coding region and the inverted repeats.
Considering the 294 sites that differ between the two reference genomes of L. gmelinii and Larix sibirica, 95.5% of all reads in all samples carry L. gmelinii-specific variants, and 4% of the reads carry L. sibirica variants (Figure 4). Very few reads (0.4%) carry a variant found in neither of the two species ('other').
Between 0.3% and 51% of the analysed positions contained at least one read which was classified as L. sibirica, with the highest percentage detected at 6700 cal-BP and the lowest percentage detected at 60 cal-BP (Table S6). The ratio of L. sibirica variants over all positions and reads varied from 5% (6700 cal-BP) to 1.6% (5400 cal-BP). Most of the variation between the two Larix species lies in the intergenic region and in conserved ORFs of unknown function (Figure 4). When using an L. sibirica chloroplast genome as reference, the vast majority of reads (79.6%) still carry variations assigned to L. gmelinii, while 19.1% of the reads are classified as L. sibirica and 1.3% as 'other' (Table S6).
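The assignment of read bases at species-diagnostic sites to the categories L. gmelinii, L. sibirica and 'other' can be sketched as below. This is a hedged illustration of the counting logic only: the positions, alleles, observations and function names are hypothetical, and steps such as base-quality or damage filtering are not modelled.

```python
from collections import Counter


def classify_base(base, gmelinii_allele, sibirica_allele):
    """Label one observed base at a diagnostic site."""
    if base == gmelinii_allele:
        return "gmelinii"
    if base == sibirica_allele:
        return "sibirica"
    return "other"


def tally_variants(observations, diagnostic_sites):
    """Count labels over all read bases that overlap a diagnostic site.

    observations: iterable of (position, base) taken from aligned reads.
    diagnostic_sites: {position: (gmelinii_allele, sibirica_allele)}.
    """
    counts = Counter()
    for position, base in observations:
        if position in diagnostic_sites:
            gmelinii_allele, sibirica_allele = diagnostic_sites[position]
            counts[classify_base(base, gmelinii_allele, sibirica_allele)] += 1
    return counts


def proportions(counts):
    """Convert label counts to fractions of all classified bases."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {label: counts[label] / total
            for label in ("gmelinii", "sibirica", "other")}


# Hypothetical diagnostic sites and read observations.
sites = {100: ("A", "G"), 250: ("C", "T")}
observed = [(100, "A"), (100, "A"), (100, "G"), (100, "A"),
            (250, "C"), (250, "C"), (250, "C"), (250, "A"),
            (250, "C"), (250, "C"), (300, "T")]
variant_counts = tally_variants(observed, sites)
variant_fractions = proportions(variant_counts)
```

Position 300 is ignored in this toy example because it is not among the diagnostic sites.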

| DISCUSSION
Ancient DNA from lake sediments constitutes a valuable resource to investigate the response of populations to past environmental changes. Previous studies using metabarcoding or shotgun sequencing have not yet explored the full potential of this resource. Here, we applied shotgun sequencing and hybridization capture using PCR-generated baits of the Larix chloroplast genome, to retrieve complete chloroplast genomes and study past changes in the population history of larches in northern Siberia.

| Taxonomic classification: conservative approach results in low numbers of assignments
In the shotgun data set, only 0.3% of quality-filtered reads could be classified against the nt database. This is a very low number compared with other studies (Ahmed et al., 2018; Slon et al., 2017). In our analysis, the parameter setting in the bioinformatic approach had a high impact on the rate of classification. We used kraken2 (Wood et al., 2019), a new version of kraken, which is a particularly conservative tool compared with others, reporting fewer false-positive but also fewer true-positive hits than other tools, even with default values (Harbert, 2018). We used it with a very high confidence threshold of 0.8; the threshold is applied to a score calculated for each taxonomic level and can be set between 0 (most sensitive) and 1 (most specific). We decided upon this high-confidence setting, as we found it gives the best results in terms of vegetation composition based on our knowledge of the vegetation history (Epp et al., 2018), but with the consequence of very low overall assignment rates. Indeed, when we used the default confidence threshold of kraken2, we could assign 10%-16% of the reads. However, more lenient classification causes a reinforcement of the database bias: few deeply sequenced taxa are more likely to be assigned than the majority of shallowly or fragmentarily sequenced taxa (Parducci et al., 2017).
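How a confidence threshold trades specificity against assignment rate can be illustrated with a toy model. The sketch assumes, as a simplification of the scoring described in the kraken2 documentation, that a label's score is the fraction of a read's k-mers assigned within the clade rooted at that label, and that a label failing the threshold is replaced by its parent until the threshold is met or the read is left unclassified. The taxonomy, k-mer counts and function names are hypothetical.

```python
def clade_kmers(label, kmer_hits, parent):
    """Number of k-mers assigned to `label` or to any of its descendants."""
    total = 0
    for taxon, n_kmers in kmer_hits.items():
        node = taxon
        while node is not None:
            if node == label:
                total += n_kmers
                break
            node = parent.get(node)
    return total


def apply_confidence(label, kmer_hits, parent, total_kmers, threshold):
    """Move a classification up the taxonomy until its score meets the threshold."""
    while label is not None:
        score = clade_kmers(label, kmer_hits, parent) / total_kmers
        if score >= threshold:
            return label
        label = parent.get(label)
    return "unclassified"


# Hypothetical taxonomy (child -> parent) and k-mer assignments for one read.
toy_parent = {"Larix": "Pinaceae", "Pinaceae": "Viridiplantae",
              "Viridiplantae": "root", "root": None}
toy_hits = {"Larix": 6, "Pinaceae": 2, "Viridiplantae": 1}
toy_total = 10  # one k-mer of the read has no match in the database

lenient = apply_confidence("Larix", toy_hits, toy_parent, toy_total, 0.0)
strict = apply_confidence("Larix", toy_hits, toy_parent, toy_total, 0.8)
very_strict = apply_confidence("Larix", toy_hits, toy_parent, toy_total, 0.95)
```

In this toy case the same read is labelled Larix without a threshold, is moved up to the family at 0.8 and is left unclassified at 0.95, which mirrors the drop in assignment rates observed with the conservative setting.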

| Target enrichment success: Larix reads increased by orders of magnitude along with other taxonomic groups
Hybridization capture resulted in an increase in taxonomically classified reads by orders of magnitude, especially with respect to the ratio of classified reads (0.3% to 28%), and also in absolute numbers of assigned reads (800 k reads to 35 M reads). These results show, for the first time, that DNA capturing of whole chloroplast genomes is effective even for DNA libraries that contain DNA from diverse origins and low on-target rates such as DNA from ancient lake sediments.
The number of reads classified as genus Larix using the nt database increased 800- to 1600-fold from shotgun to hybridization capture data set. However, in all samples there was a high level of [...] of life. This complex mixture corresponds to a higher sequence divergence than mixtures from pooled individuals from one taxonomic order, which have previously been used to measure the capability of capturing sequences highly diverged from the baits (Paijmans et al., 2016; Peñalba et al., 2014). This capability of baits capturing fragments from diverged taxa could potentially be refined and used to study wider taxonomic groups of interest in ancient lake sediments. Apart from Viridiplantae sequences, the capture data set also contains considerable amounts of reads classified as Bacteria from different groups. This can likewise be explained by the presence of highly conserved gene sequences in the chloroplast genome, which are also shared by bacteria. In particular, the chloroplast genome contains sequences coding for the 16S ribosomal RNA, which is widely used as a phylogenetic marker. Such marker genes are present in high amounts in the nt database, so they are very likely to be taxonomically assigned.

| Complete retrieval of ancient Larix chloroplast genomes
In the capture data set, complete chloroplast sequences could be retrieved from the three oldest samples (>99% with onefold coverage). Comparing the alignment of all reads to the chloroplast genome with the subset of only Larix-classified reads, coverages are most distinct for protein-coding genes, especially genes coding for the photosystem complex, transfer RNA and ribosomal RNA. The coverage of the complete data set in these regions is higher, in the case of ribosomal RNA, even orders of magnitude higher, whereas the same regions are low in coverage or even contain gaps in the alignment of Larix-classified reads. These coding regions are highly conserved across taxa (Green, 2011), and as the short reads can also be attributed to other organisms, they are classified to a higher taxonomic rank than Larix. The gaps in the alignment of Larix-classified reads can therefore be attributed to the conservative bioinformatic approach of only including unambiguously classified Larix reads and are not the result of missing sequences in the sample.
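The completeness criterion used above (fraction of the reference covered at least onefold) and the comparison of coverage between annotated regions can be sketched as follows. The per-position depths, region coordinates and function names are hypothetical; in practice the depths would come from the alignment files.

```python
def breadth_of_coverage(depth, min_depth=1):
    """Fraction of reference positions covered by at least `min_depth` reads."""
    if not depth:
        raise ValueError("depth vector is empty")
    covered = sum(1 for d in depth if d >= min_depth)
    return covered / len(depth)


def mean_depth_by_region(depth, regions):
    """Mean depth per annotation class.

    regions: {name: [(start, end), ...]} with 0-based, half-open intervals.
    """
    result = {}
    for name, intervals in regions.items():
        values = [d for start, end in intervals for d in depth[start:end]]
        result[name] = sum(values) / len(values) if values else 0.0
    return result


# Hypothetical per-position depth for a 10 bp toy reference.
toy_depth = [0, 3, 5, 5, 2, 0, 1, 8, 8, 4]
toy_regions = {"rRNA": [(0, 2)], "intergenic": [(2, 5), (7, 10)]}

breadth = breadth_of_coverage(toy_depth)
region_means = mean_depth_by_region(toy_depth, toy_regions)
```

Applied to a real alignment, a breadth above 0.99 at a minimum depth of one would correspond to the ">99% with onefold coverage" criterion.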

FIGURE 4 Alignment of Larix-classified reads from the hybridization capture data set against the Larix chloroplast reference genome. The coverage per position is depicted in grey. For the 294 sites that vary between the Larix gmelinii and Larix sibirica chloroplast genomes, colour indicates how many reads correspond to the variation found in each of the two species or contain a variation found in neither of the two species ('other'). The x-axis shows genome position.

Analysis of DNA damage patterns in the Larix chloroplast alignment revealed C-to-T substitution rates typical for ancient DNA (Figures S1 and S2). Typical for the preparation of single-stranded libraries, these substitutions could be observed both at the 3′ end and at the 5′ end of the molecules. C-to-T substitution rates increased with sample age, in line with previous observations (Pedersen et al., 2016; Sawyer et al., 2012). Mapped read and insert lengths ranged from 50 to 340 bp (mean 92 bp), showing the short fragment length typical for ancient DNA (Green et al., 2008).

| Larix sibirica variants present over time
When comparing the ancient reads to chloroplast reference genomes from Larix gmelinii and L. sibirica, the great majority of reads carry L. gmelinii variants with a low frequency of L. sibirica variants in all four samples. In contrast, the analysis of one mitochondrial marker derived from the same core by Epp et al. (2018) showed a mixture of mitotypes typical for each of the respective species, with relatively high rates of the L. sibirica mitotype (except for the most recent sample, which showed clear dominance of the L. gmelinii mitotype), pointing to a co-occurrence of both species throughout most of the sediment core. In the genus Larix, chloroplasts are predominantly inherited paternally (Szmidt et al., 1987) whereas mitochondrial DNA is inherited maternally (DeVerno et al., 1993), a phenomenon which has been reported for almost all members of the conifers (Neale & Wheeler, 2019). This contrasting inheritance of the two organelles results in different rates of gene flow and subsequently asymmetric introgression patterns (Du et al., 2009; Petit et al., 2004). Simulations (Currat et al., 2008) and molecular studies on a range of Pinaceae (Du et al., 2009, 2011; Godbout et al., 2012) showed that the seed-transmitted mitochondria, which experience little gene flow, introgress more rapidly than the pollen-dispersed chloroplasts, which experience high gene flow. A second finding of these studies is that introgression occurs asymmetrically from the resident species into the invading species. An expected result of introgression is therefore a population carrying mitotypes of the former local species and chlorotypes of the invader.
In the case of the population history of L. sibirica and L. gmelinii in their contact zone, Semerikov et al. (2013) found evidence for the asymmetric introgression of L. sibirica mitotypes in a population carrying only L. gmelinii chlorotypes, confirming the natural invasion of L. gmelinii into the range of L. sibirica. Here, we corroborate these findings with a distinct discrepancy between relatively high rates of L. sibirica mitotypes as reported before (Epp et al., 2018) and low rates of L. sibirica in the chloroplast reads found in this study.
This points to an invasion of L. gmelinii in a former population of L. sibirica prior to the date of our oldest sample (6700 cal-BP).
Further evidence in support of this scenario is found in the results from a lake sediment core 250 km southwest of the study site (Epp et al., 2018), where samples reaching back to 9300 cal-BP show exclusively L. sibirica mitotypes, before they were gradually replaced by the L. gmelinii mitotype.
Our study shows that by capturing the complete chloroplast genome, we achieve a high resolution and can detect species-specific variants even at low frequencies. Further studies should also include mitochondrial sequences in the target enrichment to collect data from several markers or potentially the complete mitochondrial genome. By combining the two organelle genomes in a hybridization capture experiment, it would be possible to study hybridization and introgression events in detail, which would help to deepen our understanding of population dynamics over long time scales.
Holocene climate deterioration has also been inferred by various studies (Andreev et al., 2004; MacDonald et al., 2000, 2008; Pisaric et al., 2001) and is in correspondence with the reconstructed global cooling trend of the middle to late Holocene (Marcott et al., 2013).

| CONCLUSIONS
Siberian larch forest covers vast areas of northern Asia with Larix as the only tree-forming species. Lake sediments containing ancient DNA constitute an archive to answer the question of how larch forests respond to changing climate, but the low amount of target DNA in combination with a complex mixture of sequences makes them challenging material to study the population dynamics of a specific species. Here, we have shown the success of hybridization capture of complete chloroplast genomes from 6700-year-old lake sediments originating from northern Siberia. Shotgun sequencing of sedaDNA prior to enrichment showed that, depending on the cautiousness of the bioinformatic approach, only very low rates of reads can be securely assigned to taxa even at the domain level. By using PCR-generated baits covering the whole chloroplast of Larix for hybridization capture, we could achieve increases by several orders of magnitude of assignable reads. The enrichment of Larix reads was most distinct, but plant DNA in general was also enriched. With ancient DNA from lake sediments, hybridization capture thus offers the potential of not only analysing the target species in depth, but also studying the taxonomic diversity of the sample in a similar way to traditional molecular barcoding approaches. The method is more costly than the metabarcoding approach, and computationally more complex, but brings the advantage of not being restricted to a spe-

ACKNOWLEDGEMENTS
We thank our Russian and German colleagues who helped in fieldwork in 2011 to obtain the samples. Nick Mewes is highly acknowledged for assistance in the laboratory. We also thank Cathy

DATA AVAILABILITY STATEMENT
The Illumina sequence data are submitted to the European Nucleotide Archive under Project Number PRJEB35838, Accession Numbers ERS4197088-ERS4197099 (Schulte et al., 2020).