Combining the least cost path method with population genetic data and species distribution models to identify landscape connectivity during the late Quaternary in Himalayan hemlock

Abstract Himalayan hemlock (Tsuga dumosa) experienced a recolonization event during the Quaternary period; however, the specific dispersal routes are remain unknown. Recently, the least cost path (LCP) calculation coupled with population genetic data and species distribution models has been applied to reveal the landscape connectivity. In this study, we utilized the categorical LCP method, combining species distribution of three periods (the last interglacial, the last glacial maximum, and the current period) and locality with shared chloroplast, mitochondrial, and nuclear haplotypes, to identify the possible dispersal routes of T. dumosa in the late Quaternary. Then, both a coalescent estimate of migration rates among regional groups and establishment of genetic divergence pattern were conducted. After those analyses, we found that the species generally migrated along the southern slope of Himalaya across time periods and genomic makers, and higher degree of dispersal was in the present and mtDNA haplotype. Furthermore, the direction of range shifts and strong level of gene flow also imply the existence of Himalayan dispersal path, and low area of genetic divergence pattern suggests that there are not any obvious barriers against the dispersal pathway. Above all, we inferred that a dispersal route along the Himalaya Mountains could exist, which is an important supplement for the evolutionary history of T. dumosa. Finally, we believed that this integrative genetic and geospatial method would bring new implications for the evolutionary process and conservation priority of species in the Tibetan Plateau.


Introduction
The study of measurement and modeling of dispersal of organisms associated with genetic data is of growing inter-est in the field of landscape ecology (Tischendorf and Fahrig 2000;Sork and Smouse 2006;Wang et al. 2009;Lowe and Allendorf 2010). Evidence suggests that dispersal is a key processes determining the spatial population structure of species, in particular historical dispersal and resulting gene flow among populations affected by the Quaternary climate fluctuations, profoundly influenced the diversity of current species (e.g., Hewitt 2000Hewitt , 2004Lindborg and Eriksson 2004;Lessa et al. 2010;Sandel et al. 2011). Therefore, locating historical dispersal and assessing its effects are crucial in understanding the demographic and evolutionary history of species, and more practical, in providing significant illustrations for conserving endangered species in the face of climate change (Brown and Yoder 2015).
In past decades, the developing field of phylogeography has shown that a number of temperate plants experienced repeated expansion-contraction in species ranges during the late Quaternary period, and several hypothesized colonization routes from the glaciation refugia to ice-free areas have been predicted on the European and North American continents (e.g., Taberlet et al. 1998;Soltis et al. 2006). For these studies, population demography associated with (re)colonization events can usually be inferred using coalescent theory and statistical methods (Nielsen and Wakeley 2001;Excoffier 2004;Hey and Nielsen 2004); however, the dispersal routes in a spatial context remain unclear. Fortunately, given habitat heterogeneity as a friction layer using geospatial and environmental data, several approaches such as the least cost path (LCP) method and circuit theory have made it possible to explore the location of dispersal corridors across the landscapes (Vignieri 2005;McRae and Beier 2007;Zeller et al. 2012;Graves et al. 2014). Among these methods, the LCP calculation coupled with species distribution models (SDMs) and population genetic data have been applied to identify landscape connectivity in a spatially explicit framework (Chan et al. 2011;Brown 2014). This integrated LCP method is mainly based upon two premises (Chan et al. 2011): one states that species distribution within suitable habitat could have facilitated population connectivity, namely high probability of occurrence in the SDMs has low cost to dispersal through the landscape matrix; the second precondition is that at molecular level, populations with shared haplotypes usually experienced dispersal between sample localities. Thus, if you obtained all the shared haplotypes between populations and past species distribution, it would be feasible to determine historical dispersal routes of plant species during the late Quaternary using the combined LCP method.
Within the subalpine zone of the eastern and central Himalaya and the Hengduan Mountains (HHM) region, several cold-inferring forest species that belong to Picea, Abies, Tsuga, and Taxus, are widely distributed (Singh and Singh 1987;Xu et al. 2014). During the late Quaternary, most of plants in the HHM region experienced little change in their distribution, resulting from stable climate and a large geographical barrier ("Mekong-Salween Divide") that hindered species dispersal in the Hengduan Mountains ( Fig. 1; e.g., Opgenoorth et al. 2010;Li et al. 2011;Yue and Sun 2014). However, a few cold-tolerant species, such as Tsuga dumosa, Picea likiangensis, and Taxus wallichiana, experienced unusual population expansion from the Hengduan Mountains to the Himalaya in this period (Cun and Wang 2010;Liu et al. 2013); moreover, the population expansion mode of these cold-tolerant species is quite contrary to the contraction reflection of species in other temperate regions when in response to the last glaciation (Taberlet et al. 1998;Soltis et al. 2006). Thus, a question should be raised, which whether or not several expansion routes could really exist in the HHM region during the late Quaternary, if they existed, we want to know the location of these dispersal corridors.
In this study, we chose a widespread subalpine species T. dumosa as our model to identify its dispersal routes. There are three ideal reasons for choosing this species: first of all, according to its previous phylogeographical work, a series of conclusions inferred from molecular evidence have revealed that this hemlock species underwent range expansion after the largest glaciation (0.8-0.6 Ma) in the HHM region (Cun and Wang 2010), which brings about a hypothesis that there could be one or several historical dispersal corridors exist in this region; secondly, because paternal chloroplast (cp), maternal mitochondrial (mt), and biparental nuclear (n) DNA adopted by its previous work represent various inheritance and dispersal features, if we use these three kinds of genomic markers, we can study the effects of different levels of gene flow through seed and pollen-mediated; thus, a more comprehensive prediction of dispersal corridors become more convincing; eventually, substantial fossil records of Tsuga would provide actual evidence for the dispersal estimation. Therefore, for these three particular reasons, T. dumosa is a finer case study for exploring the dispersal corridors using the LCP method. For this study, our main objectives are (1) to identify the dispersal routes during the late Quaternary in the HHM region and assess their reliability from various evidence, and (2) to compare the concordance/discordance in dispersal corridors across time periods and molecular markers; (3) we expected to reveal some valuable implications for the evolutionary history and conservation priority of plants in the Tibetan Plateau (TP).

Determination of shared haplotypes
To explore the possible dispersal routes of T. dumosa, which populations have shared haplotypes needed to be firstly determined. According to the previous phylogeographical work from Cun and Wang (2010), they col- lected 18 sample sites (503 individuals), and 19, 8, and 90 haplotypes were determined for cpDNA, mtDNA, and nDNA, respectively. Among these haplotypes, 7, 4, and 33 are the number of shared haplotypes. For this study, we adopted the haplotype information of 18 localities, and populations with shared haplotypes from the three types of molecular markers were shown in the haplotype distribution maps (Fig. 2).

Species distribution modeling
We used environmental and occurrence data to build species distribution models, identifying areas that T. dumosa could exist across three time periods. The three periods are the current conditions (~1950-2000), the last glacial maximum (LGM,~21-18 ka) based on the Community Climate System Model (CCSM) and the last interglacial (LIG,~140-120 ka). Environmental data in this study, including altitude and 19 bioclimate variables (Table S1), were obtained from the WorldClim database (http://www.worldclim.org/; Hijmans et al. 2005) with a resolution of 2.5 min for each layer. It is widely known that several climate variables are highly correlated with each other, in order to avoid overfitting of SDMs, we conducted an autocorrelation test of 19 bioclimate variables. Several bioclimate variables of the three periods that had relatively low Spearman's coefficients (r < 0.75) were retained for subsequent analysis (Table S2).
For the second necessary data set for SDMs, species occurrence localities were mainly gathered from literature (Cun and Wang 2010), the Global Biodiversity Information Facility (GBIF, http://data.gbif.org), the Chinese Virtual Herbarium (CVH, www.cvh.org.cn) and field surveys in the year of 2013. For SDMs to be performed well, occurrence data require to be spatially independent, so the elimination of spatial clusters of localities is important for model calibration and evaluation. Because when spatial clusters of localities exist, models are often over-fit toward environmental biases (reducing the model's ability to predict spatially independent data) and model performance values are inflated (Veloz 2009;Hijmans 2012;Boria et al. 2014). Hence, a spatially rarefy occurrence data method at several distances was applied according to topographic and climatic heterogeneity (Boria et al. 2014). Here, localities of occurrence were spatially filtered at 5, 15, and 25 km 2 in areas of high, medium and low climatic heterogeneity, respectively. For the estimation of climatic heterogeneity, we conducted a principal component (PC) analysis for 19 bioclimate variables, and then we calculated the mean standard deviation of the first three climate PCs. Finally, the remaining 65 localities together with low correlated environmental layers were used for species distribution modeling.
In this study, we used the maximum entropy algorithm in MAXENT v3.3 (Phillips et al. 2006), to estimate current and past distributions of T. dumosa, with the settings for convergence threshold (10 À5 ), number of iterations (500) and localities of occurrence divided into testing data sets and training data sets (20 and 80%, respectively). In addition, we carried out pairwise comparison of the binary SDMs for the three periods (e.g., from the last interglacial to the last glacial maximum) to predict range shifts since the last interglacial period.

Visualizing dispersal corridors
By applying the LCP method, we mapped the dispersal routes of T. dumosa since the late Quaternary using SDMtoolbox (Brown 2014) in ArcGIS 10.1 (Environmental Systems Research Institute, Inc., Redlands, CA). The detailed procedures for this method were as follows ( Fig. 3): firstly, we obtained a dispersal cost layer (resistance layer) by inverting the SDMs (1-SDM), and subsequently applied the resistance layer to create a cost distance raster for each sample locality (18 localities in all); secondly, based on the cost distance raster, corridor layers were established between two localities that only had shared haplotypes; thirdly, while most previous studies (e.g., Vignieri 2005;Wang et al. 2009) always obtained a linear dispersal route, which may oversimplify the demographic process of species; hence, we applied the categorical LCP approach to better depict environmental heterogeneity in dispersal. Here, we classified the value of each corridor layer into four intervals (three cutoff values), if the lowest value of corridor layer was hypothesized as 1.00, the four intervals are 1.00-1.01, 1.01-1.02, 1.02-1.05, and 1.05-highest value, then these four intervals were reclassified as new values 5, 2, 1, and 0, respectively; finally, we summed up all of the pairwise reclassified corridor layers and standardized (0-1), and the dispersal maps of T. dumosa were eventually identified in an explicit landscape. For the dispersal corridors across three periods and molecular markers, the LCP approach was carried out nine times in all with same shared haplotypes patterns.
Estimation of gene flow among regional groups Estimation of gene flow among regional groups has been able to provide a fine proof for investigating dispersal corridors of species. As mentioned above, evidence suggests plants migrated from the Hengduan Mountains to the Himalaya, we hypothesized there was higher gene flow along this colonization route. To verify the hypotheses, according to the publication of Cun and Wang (2010), we divided the distribution range of T. dumosa into four regional groups: the Himalaya (Hi) group, the eastern Hengduan Mountains (EHe) group, the western and central Hengduan Mountains (W&CHe) group, and the southern Hengduan Mountains (SHe) group. To assess the level of gene flow among the four groups, a maximum-likelihood estimation of migration rates based on coalescence theory was implemented using the software Migrate v3.6.4 (Beerli 1997;Beerli and Felsenstein 1999). Here, only adjacent groups were estimated based on cpDNA/mtDNA haplotype sequence (Accession No. HM162943-HM162973, HM163057-HM163059). Because in the study of Cun and Wang (2010), they did not provide the number of private haplotype in each population, that made it impossible for estimation of gene flow and subsequent analysis of mapping genetic divergence based on nDNA. For this approach, we concentrated on asymmetrical gene flow and unequal effective population sizes estimated from F ST values. We conducted the program three times with different random seed numbers under the same condition (10 short chains, 500 trees used out of 50,000 sampled and 3 long chains, 5000 trees used out of 500,000 sampled).

Mapping genetic divergence pattern
Geographical barriers usually lead to genetic divergence and hinder species dispersal. So a genetic divergence pattern could indicate which areas represent geographical barriers, and which areas with low values of genetic divergence could be suitable for dispersal. Here, we mapped the pattern of genetic divergence using the Genetic Landscape GIS Toolbox (Vandergast et al. 2011) in ArcGIS v10.1 (ESRI). Based on cpDNA/mtDNA haplotype sequence, the net number of sequence differences (D A ; Nei and Li 1979) among 18 populations as the index of genetic divergence was firstly calculated using the ARLE-QUIN v3.5 package (Excoffier et al. 2005) under the Tamura and Nei model of nucleotide evolution (Tamura and Nei 1993). Subsequently, genetic divergence values (D A ) were mapped at the midpoints between populations and finally a continuous surface (namely genetic divergence pattern, 1-km 2 grid cell size) was created using an inverse distance weighted interpolation algorithm.

Distribution changes since the last interglacial
Due to cycles of glaciation in the Quaternary period, species distribution was commonly altered within the temperate regions. With no exception in this study, when comparing with species distribution of three periods, we identified the phased difference in ranges of T. dumosa. The detailed range changes showed that (1) from the last interglacial to the last glacial maximum, species distribution in the Himalaya mainly presented an east-westward contraction and little downstream movement within the Himalayan valleys, in the meanwhile, for the EHe regional group, the species expanded northward (Fig. 4A); (2) after the last glacial maximum, the W&CHe group developed a wide range of eastward and northeastward expansion (Fig. 4B); (3) as a whole, the range shifts of Himalayan hemlock species have demonstrated little east-westward and northeastsouthwestward trends since the last interglacial (Fig. 4C).

Dispersal corridors across time periods and genetic markers
Putative dispersal corridors in the three periods were visualized based on three genomic markers (Fig. 5). When comparing the dispersal routes across time periods and genetic markers, dispersal generally followed the southern slope of Himalaya, but the degree of likely dispersal varied across time and genomes, with higher dispersal in the present and in the mtDNA haplotypes. Another relatively higher area of dispersal from the western and central Hengduan Mountains to the eastern Hengduan Moun-

5787
tains was identified using cpDNA haplotypes. Besides, the dispersal route of T. dumosa is partially overlaid with the Yarlung Zangbo valley; we assumed this valley might be another important historical dispersal corridor.

Patterns of gene flow and genetic divergence
According to gene flow patterns estimated from coalescent analyses based on cp/mt DNA (Fig. 6), the path from the Hengduan Mountains to the Himalaya had strong gene flow, which was mostly overlapped with the Himalayan dispersal corridor. Further, within the genetic divergence map (Fig. 7), area of Himalaya had low genetic divergence value, suggesting that no obvious barrier existed in this dispersal area during the Quaternary.

Discussion
Himalayan dispersal route and other dispersal corridors in the TP By virtue of the LCP approach (which is commonly used in the field of landscape ecology), integrated with SDMs and molecular data, we have been able to identify the dispersal corridors of T. dumosa in the late Quaternary. In the TP, one directional dispersal approximately from the Hengduan Mountains to the Himalaya has been repeatedly assumed by several phylogeographical studies (e.g., Yang et al. 2008;Liu et al. 2013), until now, we understand the location of the dispersal route for this coniferous forest species, which might have migrated along the southern slope of the Himalaya from the Hengduan Mountains during the late Quaternary period. Meanwhile, series of consequences in this study, such as distribution changes since the last interglacial, the high strength of gene flow and no noticeable barriers in this direction, also imply the existence of this route. Moreover, a wide occurrence of Tsuga pollen records (Cun and Wang 2010) on the southern slope of the Himalaya provides more convincingly evidence. Apart from the Himalayan route, we assumed the Yarlung Zangbo Valley may be another dispersal corridor in the HHM region. According to a small amount of Quaternary Tsuga pollen and pollen of other coniferous species found within the valley (Song and Liu 1982;Kong et al. 1996;Tang and Shen 1996), it suggests that coniferous forests ever distributed in this valley. In the other hand, we know that the Yarlung Zangbo Valley with an average altitude below 4000 m is the largest vapor channel in the TP (Lin and Wu 1990), the features of lower topography together with strong airflow were likely to be helpful in the movement of seed and pollen of species. As a result, all above-mentioned evidence demonstrates that the Yarlung Zangbo Valley may be another historical dispersal path. Furthermore, with regard to the sympatric alpine species Pedicularis longiflora, the LCP estimation for its dispersal corridor also focused on the area of Yarlung Zangbo Valley (Yu et al. 2014).

Concordance and discordance in dispersal corridors across time periods and genetic markers
Generally, common geological history and climate conditions may lead to similar biogeographic patterns (Taberlet et al. 1998;Soltis et al. 2006). In view of the LCP method, the similar patterns containing species distributions across time periods and spatial haplotype structures among different genetic markers could lead to congruence in dispersal routes. For the hemlock species within the HHM, one side, the relatively stable climate during the late Quaternary region usually resulted in little change of species distribution (Cun and Wang 2010;Liu et al. 2013), generating the almost consistent in dispersal areas across time. From another point of view, the Indian monsoon offered strong driving force to make wind-dispersed pollen and seed move from the Hengduan Mountain to the Himalaya, accounting for the purification of haplotypes because of successive founder effects, and finally leading to similar haplotype patterns of three genetic markers. Consequently, although cpDNA, mtDNA, and nDNA reflect propagation characteristics of pollen and seed, the relative concordance in dispersal corridors was still exhibited across the three genomic markers.
Even if the overall accordance in dispersal routes was apparent, little distinction was still detected, especially for the degree of dispersal identified by different genomes. In hemlock species, the inheritance of mtDNA is typical maternal, reflecting seed movement exclusively (Neale and Sederoff 1989). Because seed dispersal was limited, it is more liable to generate genetic differentiation and haplotypes purification among the Himalayan populations. Go back to the LCP method, haplotype composition of colonized populations is more closer, indicating the higher degree of dispersal along the colonization route; in contrast, the paternal cpDNA and biparental nDNA, both represent the feature of pollen dispersal (nDNA also represents seed dispersal). Although extensive pollen transmissions occurred among populations, it would counteract the dispersal strengthen in colonization direction, resulting in lower level of dispersal than mtDNA along the Himalaya Mountains. Additionally, because tracing the imprint of historical seed colonization could reveal the true demographic history of species (Schaal et al. 1998), along with our results, thus we assumed that the estimation of dispersal route based on maternal inheritance of genome may be more reliable.

Implications for the evolutionary history and conservation of alpine plants
Undoubtedly, determining the location of dispersal is an important supplement for the evolutionary history of species. The dispersal routes we identified would bring valuable insights to other alpine species in the HHM region. Except for T. dumosa, numerous alpine species migrate from the Hengduan Mountains to the Himalaya during the late Tertiary or Quaternary, we speculated they may spread along the southern slope of Himalaya or Yarlung Zangbo Valley or other uncovered corridors. Besides the location of dispersal routes, we really want to understand what factors drive the dispersal within the HHM region? It is well known that the Hengduan Mountains was regarded as an important region of origin and radiation of alpine plants and refugia during the Quaternary glaciation (Wu 1988a,b;Qiu et al. 2011;Yu and Zhang 2013). When they were confronted with rapid uplift of the TP in the late Tertiary and cycles of climatic oscillations in the Quaternary, in order to respond to the vast environmental changes, various evolutionary processes took place, such as local adaptation, long distance dispersal, and recolonization into the interior of the TP from glacial refugia after the glaciation. Besides, the Indian monsoon and East Asian monsoon originated in the late Miocene ) also could offer continuous force to facilitate plant pollen and seed spread westward to the TP interior. Thus, during the Quaternary period, both Quaternary climatic fluctuations and Indian monsoon contributed to promote the dispersal of alpine plants in the HHM region (Liu et al. 2014;Wen et al. 2014).
The higher degree of dispersal area we identified is equally important for the conservation of species diversity in the HHM region. Based on the trends of species range shift responding to past climate change, we predict plant species will experience expansion toward the Himalaya Mountains under the future global warming. Thus, the Himalaya dispersal area should be prioritized to protect when making conservation strategies. Besides, in face of ongoing habitat destruction relating to human activities, such as building hydropower station and increasing number of tourists, weighed and strategic management for dispersal corridor should be taken into account.

Conclusion
Our main objective in this study was to identify the dispersal routes of T. dumosa. Superficially, this appeared to be relatively straight-forward, however, we realized that it could complement the evolutionary history of alpine plants in the TP. We can imagine that during the Quaternary period, in response to cycles of glaciation, plant species may have migrated along the Himalaya Mountains from the Hengduan Mountains, and evidence now demonstrates that this Himalayan dispersal route actually existed. However, we suggest that this may not be the only path from the Hengduan Mountains to the interior of the TP, there might be several corridors to be identified till more species involve to predict and more fossil records were discovered in future studies. Furthermore, for some endangered species, the estimation of potential dispersal activity is important for their conservation under future climate change.