Cryptic speciation in gentoo penguins is driven by geographic isolation and regional marine conditions: Unforeseen vulnerabilities to global change

The conservation of biodiversity is hampered by data deficiencies, with many new species and subspecies awaiting description or reclassification. Population genomics and ecological niche modelling offer complementary new tools for uncovering functional units of phylogenetic diversity. We hypothesize that phylogenetically delineated lineages of gentoo penguins (Pygoscelis papua) distributed across Antarctica and sub‐Antarctic Islands are subject to spatially explicit ecological conditions that have limited gene flow, facilitating genetic differentiation, and thereby speciation processes.


| INTRODUC TI ON
Evolutionary ecology aims to elucidate the spatial pattern of intraspecific genetic diversity and the evolutionary and ecological processes that underpin such patterns. These data enable policymakers to make informed decisions regarding biodiversity conservation and management. However, our understanding of the spatial patterns of biodiversity is often based on incomplete information (Hortal et al., 2015), with many new species and subspecies awaiting description or reclassification and an immense pool of intraspecific diversity having gone largely undocumented. As a result, biodiversity conservation is hampered by such data deficiencies that limit our understanding of the evolutionary patterns and processes that give rise to biodiversity, a situation referred to as the "Darwinian shortfall" (Diniz-Filho, Loyola, Raia, Mooers, & Bini, 2013). In this context, new techniques for studying population-level genomics and spatial variation of the ecological niche offer complementary tools for uncovering functional units of phylogenetic diversity that have heretofore been obscured (Chen et al., 2019;Pahad, Montgelard, & van Vuuren, 2019). how conservation policy can be directly impacted by new insights obtained through the integration of larger genomic datasets with novel approaches to ecological modelling. This is particularly pertinent to polar environments that are among the most rapidly changing environments on earth.

K E Y W O R D S
diversification, ecological niche overlap, gentoo penguin, subspecies Over the past decade, an increasing number of studies have revealed that the macrofauna of the Southern Ocean shows contrasting patterns of intraspecific diversity, from the existence of single evolutionary units distributed all the way around Antarctica (Cristofari et al., 2016;Díaz, Féral, David, Saucède, & Poulin, 2011) and/or throughout sub-Antarctic, to a multitude of geographic clades, each restricted to a specific area (González-Wevar et al., 2019). Such endemism suggests that the isolation of populations has led to diversification through vicariance after colonization (Chenuil et al., 2018;Halanych & Mahon, 2018;Price, 2007). Oceanic fronts and the great geographic distance that separates Antarctica from the sub-Antarctic archipelagos and islands can limit dispersal and promote the divergence of evolutionary units within species (Clucas et al., 2018;Vianna et al., 2017). Moreover, this divergence may be greater when regional populations become exposed to dissimilar environments.
Ultimately, the inter-regional differentiation of their ecological niches can result in shifts in allele frequency among populations that may lead to local adaptation, and given sufficient time, to speciation (De Queiroz, 2007;Graham, Ron, Santos, Schneider, & Moritz, 2004).
Molecular studies of the gentoo penguin P. papua have revealed old and cryptic lineage diversification across the Antarctic and sub-Antarctic. This deep genetic structure among populations can be explained by the Antarctic Polar Front (APF) separating colonies, by the large geographic distance among breeding colonies, and life history traits such as high natal philopatry and the coastal lifestyle of gentoo penguins limiting dispersal (Clucas et al., 2018;de Dinechin et al., 2012;Levy et al., 2016;Vianna et al., 2017).
Diversification between gentoo penguin colonies from Crozet, Kerguelen, the Falkland/Malvinas Islands and Antarctica took place between 3.6 and 1.3 million years ago (Mya; Vianna et al., 2017).
The geographic distribution of the gentoo lineages as recovered using molecular DNA data (Vianna et al., 2017) is partially inconsistent with the present classification of subspecies using morphology: northern gentoo (P. papua papua) are distributed north of 60°S across the sub-Antarctic region, and southern gentoo (P. papua ellsworthii) are distributed between 60° and 65°S around Antarctica (Stonehouse, 1970). Diversification of gentoo penguin clades (lineages) could be explained in terms of vicariance processes induced and/or reinforced by geographic barriers, followed by selective forces in response to local environmental variables.
Penguin species require both terrestrial breeding areas with suitable conditions for thermoregulation that favour their reproduction and nearshore marine habitats that supply sufficient food resources. Understanding the association of each cryptic lineage with its local environment shows affinities or local adaptation which may in turn be used to investigate the drivers and limitations of how lineages may respond to future environmental change.
This is particularly relevant in species with deep intraspecific genetic structure to enable accurate designation of the conservation status of member of a species complex by, for example, IUCN or Birdlife International. The gentoo penguin is currently listed as a single species widely distributed across the sub-Antarctic region and part of the Antarctic Peninsula, whose category by IUCN is "Least Concern" due to its stable population trends. However, consideration of the spatial structure of P. papua lineages might necessitate revising their conservation status.
In the present study, we first explore the biogeographic extent and drivers of the global P. papua distributional range by modelling the respective marine and terrestrial ecological niches of this penguin species as a whole. Using mtDNA sequences and genome-wide SNP, we then establish the genetic relationships among a comprehensive sampling of gentoo penguin populations distributed across the range of the species. Our sampling includes previously unstudied colonies, such as from Marion, Martillo and Macquarie Islands.
In addition, we determine and compare the terrestrial and marine ecological niche and macroenvironment envelopes that each lineage occupies. Finally, we investigate the role of ecology as a driver of lineage differentiation among gentoo penguin colonies.

| Species distribution modelling
We evaluated the Pygoscelis papua biogeographic range and associated spatial ecological niche drivers with a species distribution model (SDM; Figure 1, Figures S1-S4). Available georeferenced data points for all known gentoo breeding colonies were compiled and supplemented with information on the spatial presence of penguins taken directly from XY-coordinated Maritime Antarctica monitoring sites (Woehler, 1993), from additional literature (Lescröel & Bost, 2006) and from governmental reports on South Atlantic and sub-Antarctic territory dependencies. All point data were filtered by the spatial resolution of environmental data to a 5 arc-min resolution. Climatic and macroecological variables used as environmental predictors for SDMs were extracted from the databases WorldClim2 (terrestrial; Fick & Hijmans, 2017) and Bio-orAClE 2.0 (marine; Assis et al., 2017). We used a Pearson correlation test (r > .90) to examine collinearity between all pairs of variables offered by these repositories, and when these pairs had r > .90, we kept those with higher potential biological relevance. The final selection was composed of 7 variables for modelling the marine environment: sea ice cover maximum (o1), max/min primary productivity (o9/o10), max/min salinity (o11/o12) and max/ min surface water temperature (o13/o14); and another set of 7 variables for modelling the terrestrial environment: mean diurnal temperature range (bio2), temperature isothermally (bio3), temperature annual range (bio7), mean temperature of the warmest calendar quarter (bio10), precipitation seasonality (bio15), mean precipitation of the wettest quarter (bio16) and mean precipitation of the warmest quarter (bio18). All SDMs were built with the MaxEnt algorithm (Phillips, Anderson, & Schapire, 2006). Logistic MaxEnt outputs provided suitability gradients that helped us to visualize terrestrial and marine macroenvironmental preferences for the species as a whole ( Figure 1).
Where the two habitats overlapped, we displayed the terrestrial output in "hybrid" cells and inferred marine suitability based on adjacent cells. We tested the models with a 30% random subset and calculated the true statistic skill (TSS) on the minimum training presence threshold as an indication of the robustness of the models.

| Samples collection for genetic data
We evaluated genome-wide SNP using ddRAD data and the mtDNA control region for gentoo penguin across the Southern Ocean ( Figure 2). For ddRAD data, we analysed a total of 110 individuals, up to 13 individuals per population (Table 1) sampling procedures and permit details are provided in Appendix S1.
DNA was isolated from blood samples using the salt protocol from Aljanabi and Martinez (1997) with modifications described in Vianna et al. (2017), and from faecal samples using the QIAamp DNA Stool kit (Qiagen). We evaluated degradation of genomic DNA through electrophoresis on a 1% agarose gel. Extractions were quantified using a Qubit fluorometer (Thermo Fisher Scientific).

F I G U R E 2
Genomic SNP data analyses supporting the existence of four main genetic groups and several sub-clusters or lineages (a-d).

| ddRAD library preparation
We prepared ddRAD libraries for gentoo penguins, following the protocol described in Peterson, Weber, Kay, Fisher, and Hoekstra (2012).
Genomic DNA (500 ng) of each individual was digested using 0.

| ddRAD data processing
SNP sets were produced from raw reads assembled to the gentoo penguin reference genome (Appendix S1) using stACks version 2.

| Diversity indices
We evaluated the differences in genetic diversity for whole genome SNP coverage across the breeding colonies. This was done by calculating the expected heterozygosity (H e ), observed heterozygosity (H o ) and allelic richness (A r ) with rarefied allele counts, using the hiErfstAt package version 0.04-22 (Goudet, 2005) in r v 3.5.1 (R Core Team, 2018).

| Population genetic structure
To assess the influence of varying numbers of loci on determining population genetic structure, we generated 6 random subsets each of 50, 500, 1,000, 2,000, 3,000 and 4,000 loci ( Figure S5). For each set of the above number of loci, we performed a DAPC analysis (Jombart, Devillard, & Balloux, 2010) in Adegenet (Jombart, 2008;Jombart & Ahmed, 2011) to estimate both, the number of genetic groups using the Bayesian information criterion (BIC), and to determine the genetic structure of each subset using a number of principal components equal to N/3 where N is the total number of individuals. We also calculated the pairwise F ST for the total number of SNP, and F ST and ɸ ST for the mtDNA HVRI data, among locations using ArlEquin v. 3.5.2.2. We summarized the results from Arlequin graphically using the R functions for Arlequin XML files ( Figure S6).
Statistical significance of the estimates was determined with 10,000 permutations. The p-value for pairwise F ST and ɸ ST between populations was corrected using a FDR.
To determine the number of genetic groups using the total number of SNP after filtering procedures, a Bayesian clustering approach was implemented using struCturE v 2.3.4 (Pritchard, Stephens, & Donnelly, 2000). Clusters (K) varied from one to eleven, corresponding to the number of breeding colonies sampled plus one ( Figure 2b).
Ten replicate runs were performed in parallel using StrAuto (Chhatre & Emerson, 2017). For each run, the genetic ancestry of each individual was estimated based on the admixture model without any prior population assignment under a correlated frequency model, with 500,000 Markov chain Monte Carlo (MCMC) replicates and with a 10% burn-in period. The 10 replicates obtained for each value of K were summarized with Clumpp (Jakobsson & Rosenberg, 2007) and plotted using distruCt (Rosenberg, 2004). The optimal value of K was identified according to the Evanno's method (Evanno, Regnaut, & Goudet, 2005) as implemented in Structure Harvester (Earl & vonHoldt, 2012).

| Species Tree, phylogenetic reconstruction and divergence time
The species tree SNP data were generated in snApp version1.3.0  (Figure 2c, Figure S7). Five Adélie penguin samples were sequenced and then incorporated into the phylogeny (KX925508-KX925512), and a sequence from the emperor penguin (Aptenodytes forsteri) was used as the outgroup (Li et al., 2014). The model of nucleotide substitution implemented was determined using jmodEltEst2 v.

| Genomic-based species delimitation
Species delimitation hypotheses were tested using the SNP data with a species delimitation method that use Bayes factors (BFD*; Leaché, Fujita, Minin, & Bouckaert, 2014) implemented in SNAPP.
Alternative species delimitation scenarios were allowed to be compared with this method in an explicit MSC framework by calculating and comparing marginal likelihood estimates (MLE) for each evaluated model.
We conducted several independent runs in Path Sampler (Lartillot & Philippe, 2006) in BEAst with 12 steps each consisting of 100,000 MCMC generations. We used a burn-in of 10,000 generations, after which we sampled every 100 steps using an alpha value of 0.3. These settings were sufficient to ensure convergence and obtain ESS > 500. The Bayes factor (BF) test statistics were calculated, where BF is the difference in MLE (Marginal L-Estimate) between all competing models. Three competing species delimitation hypotheses were defined based on current taxonomy follow- ing Stonehouse, (1970), geographic distribution of putative species and our phylogenomic analyses. To avoid over-parametrization, we ran each model using a gamma distribution (2, 2,000) as prior distribution for the ancestral population size parameter (h), that is the "intermediate population size" scenario used for SNAPP analyses.

| mtDNA species delimitation
Two different species delimitation methods were employed to evaluate the importance of mtDNA lineage structure across the geographic range of gentoo penguins, the Automatic Barcoding Gap Discovery (ABGD) method (a non-tree-based method) and Generalized Mixed Yule Coalescent (GMYC) method (a single locus, tree-based method). The ABGD method uses genetic distance to detect a "barcoding gap" between candidate species based on genetic distance values that are not overlapping among intra-and interspecific comparisons and are independent of tree topology.
The ABGD method was performed on the online web server (http://wwwabi.snv.jussi eu.fr/publi c/abgd/) and was run with the default settings (Pmin = 0.001, Pmax = 0.1, Steps = 10, X (relative gap width) = 1.5, Nb bins = 20). The mtDNA HVRI sequence alignment (without outgroup) was used to compute a matrix of pairwise distances using simple uncorrected distance. The GMYC method was implemented in R package splits (Ezard, Fujisawa, & Barraclough, 2009). This method is based on an ultrametric phylogenetic tree such as one calibrated using a molecular clock with dissimilarities of branching rates used to infer species boundaries following a Yule process and neutral coalescent events.

| Quantification of ecological niche overlap
We examined the ecological niche overlap between the main genetic  Figure 3).

| Bioclimatic variables and population genetic structure
To determine the relative contribution of geographic position and each bioclimatic or ecological variable to the genetic structure of the neutral genotypes, we followed genotypic association analyses with a partial redundancy analysis (RDA; Figure 4) in the vEgAn package (Oksanen et al., 2019). For this, first the spatial genetic structure was estimated using the geographic coordinates of the sampling sites based on distance-based Moran's eigenvector maps, dbMEMs (Dray, Legendre, & Peres-Neto, 2006;Legendre & Legendre, 2012).
dbMEMs were determined by converting latitude and longitude into cartesian coordinates using the sodA package in R 3.22, and with these, a matrix of Euclidean distances was calculated using the dist function in vEgAn in R (Oksanen et al., 2019). Using this matrix, a rectangular matrix was created with the dbMEMs associated with latitude (dbMEM2) and longitude (dbMEM1) using the create.
dbMEM.model function in the AdEspAtiAl package. Prior to the analysis, genotype data were standardized by removing the broadscale trend using the decostand function with the Hellinger's method in vEgAn. A partial RDA was used to evaluate the environmental variables as fixed factors and dbMEM vectors as co-variables to control for the effect of the spatial distribution in the genetic structure. We determined the optimal model with respect to the environmental factors that best explained the genetic variability using the ordistep function in vEgAn according to their significance, F-ratio and AIC. We used a marginal ANOVA with 10,000 permutations to evaluate the significance of each fixed factor considered.  Figure 1 matrices were first standardized (x−mean(x)/SD(x)). We then performed (a) a partial Mantel test using genetic distance (i.e. F ST ) and geographic distance between colonies, using environmental distances (marine, terrestrial and combined) as co-variates, and, conversely, (b) we performed a partial Mantel test using genetic distance and environmental distances with geographic distances as a covariate ( Figure 5, Figure S8). Partial Mantel test was carried out using the ECodist package in R (Goslee & Urban, 2007). Finally, to estimate the joint effect of geographic and environmental distances we performed a Multiple Matrix Regression with Randomization analysis (MMRR; Wang, 2013) using the package popgEnrEport (Adamack & Gruber, 2014;Gruber & Adamack, 2015). For each set of variables, we previously

(a) (c) (b)
performed a Mantel test to evaluate the correlation with the geographic distance in order to fulfil the no-correlation assumption of the MMRR analysis (Wang, 2013). With the weight of their relative contribution on genetic differentiation as measured by MMRR analysis, we constructed a new distance matrix and conducted a Mantel test between the genetic distance and the combined effects of geographic distances and environmental distance.

| Species distribution modelling
Gentoo penguins are widely distributed across the Southern Ocean and inhabit many of its islands and coasts. Our SDM approach allowed us to identify preferred conditions in both the terrestrial and marine environments inhabited by gentoo penguins during the breeding season. Predictive performance (TSS) was high for both marine (0.85) and terrestrial (0.96) models. Gentoo penguins have a strong preference for breeding around waters of high primary productivity across a range of temperature and salinity levels, with summer temperatures at breeding colonies oscillating around a few degrees above zero, a limited diurnal range of temperatures, and moderate summer precipitation levels (Figure 1, Figures S1-S4).

| Genetic structure and lineages
Using a reduced genome approach (ddRAD) after HWE filtering, we obtained 4,429 SNP for 110 individuals across the Southern Ocean, with a median coverage of 97.33× (Table S1) and a mean quality score of 35. Using mitochondrial DNA, we identified a total of 145 haplotypes from 303 gentoo penguins. Genetic diversity was similar across populations (SNP H o = 0.30-0.39; mtDNA H d = 0.64-1.0; Table 1).
The SNP data showed agreement across the six random subsets of data comprising different numbers of loci for DAPC analyses ( Figure S5) and our coalescent-based trees generated using SNAPP with the total number of SNPs revealing four main clusters of gentoo penguins (Figure 2a These results support the existence of four main clades, along with the colony from Macquarie Island as distinct lineages.

| Environmental niche overlap
Once we observed the genetic structure among regional gentoo populations, we examined the question of what could be the underlying macroecological driver of lineage divergence. Niche overlap techniques, indicate the degree of ecological characteristics shared by two or more functional groups (Broennimann et al., 2012). Our environmental niche overlap analysis performed independently for the terrestrial and marine environments shows that all pairwise combinations of the ecological niches belonging to the four clades differ significantly in terms of equivalence (i.e., all have non-identical macroecological envelopes) for both the terrestrial and marine environments ( Table 2). The observed overlap in D-values was very low, ranging from 0% to 5% in the marine environment where only the Kerguelen clade had a small overlap with the Antarctic (1%) and South American (5%) clades. In the terrestrial environment, the overlap was also relatively small but always present, reaching a high of 15% between the Antarctic and north APF clades. The north APF clade evaluated here comprises gentoo penguins from Crozet, Marion and Macquarie Islands, not including Kerguelen Island which lies on the APF. Notably, the Kerguelen, Antarctic and South American clade overlapped with each other by 6% and 7%, respectively. These findings indicate that the ecological segregation between clades is consistently stronger in the marine than the terrestrial environment.
As all marine and terrestrial niches of the four clades are non-equivalent, we explored the niches in relation to one another (Warren, Glor, & Turelli, 2008). We found no evidence of niche evolution (dissimilarity) for any pairwise comparison. However, our niche similarity tests revealed significant results for the analyses conducted in the context of niche conservation (i.e. the niche for one clade showing greater relatedness to that of another clade than to a random simulation; Table S2). Here, both the Antarctic and South American clades had significant similarities to each other in the terrestrial environment (p = .04 and p = .03, respectively, Table S2), suggesting that these two recently diverged lineages are retaining some common ecological features from the shared ancestral macroclimatic niche; that is, they have not fully differentiated into their respective environments. In contrast, we did not find evidence of niche conservation (similarity) between the north AFP and Kerguelen clades despite their genetic relationship as sister lineages (p = .39 and p = .28, respectively, on terrestrial environment similarity, and p = .19 and p = .23, respectively, on marine environment similarity; Table S2).
Therefore, clade niches are more highly differentiated among the oldest sister populations (Crozet vs. Kerguelen) and reduced on the recent diverged ones (Antarctica and South America).
Consistent with the niche overlap scores, the results of our spatial PCA show a stronger segregation of the multivariate ellipsoids in the marine environment. As shown on the first axis in Figure 3a, the increasing temperature and salinity of the marine environment for northern populations such as the Falkland/Malvinas Islands contrast with the marine environment of the southernmost Antarctic populations, related to surrounding sea ice at the latter site. The second axis (Figure 3a) shows a strong positive effect for primary productivity, with the latitudinal extremes (Antarctica and South America) sharing similar tendencies to occupy higher productivity areas. The north APF clade and Kerguelen breeding sites have lower productivity values, possibly related to the observed mismatch of the locally preferred marine and terrestrial environments (seen in Figure 1 insets), where the most favourable (greener) feeding areas are distant from the breeding coastlines. Terrestrial PCA effects are less evident (Figure 3b). Axis 1 shows a positive effect for higher summer temperatures, effectively segregating non-Antarctic populations; the locations from north of APF clade and South American sites sustain higher precipitation in the same warmer period. Variation in seasonal distribution of temperature is higher in the Antarctic and South America than to the north of the APF clade, suggesting a small oceanic effect. Axis 2 indicates higher isothermality and diurnal temperature ranges, but this effect cannot be attributed to any clade in particular.

| Environmental and genetic redundancy analyses
Optimal models of RDAs for the marine and bioclimatic variables were in general consistent with the niche overlap results described above. The first two axes of the RDA explain 80.8% and 74.1% of the total variance for the terrestrial bioclimatic (F = 4.472) and marine model (F = 4.113), respectively, making both the general models highly significant (p = .001, Figure 4). The best-fit model for bioclimatic variables included temperature and precipitations (Table S3, Figure 4). Both variables were mainly associated with the South American group (Falkland/Malvinas and Martillo Islands) and Crozet/Marion group, respectively. Our modelling using ordistep for marine variables revealed the best-fit models included sea water temperatures, salinity and primary productivity, which were strongly associated with populations from the Falkland/Malvinas Islands and Antarctic populations, whereas primary productivity was associated with localities north of APF such as Crozet Island (Table S3; Figure 4)

TA B L E 2
Results of the (lower than random) equivalence tests for PCA between the four clades of gentoo penguin and positive effects of higher summer temperatures, whereas Antarctica, Signy Island (S. Orkney) and Martillo Island (S. America) were in turn associated mainly with reduced precipitation regimes ( Figure 4).
The partial Mantel test was highly significant (p < .05) for distance as a response variable (Figure 5a)

| D ISCUSS I ON
Conservation focuses on protecting species and their habitats while inherently assuming a strong degree of niche conservation (Wiens, Stralberg, Jongsomjit, Howell, & Snyder, 2009). The wide geographic distribution of gentoo penguins around the Southern Ocean and part of the South Atlantic spans diverse marine and terrestrial abiotic conditions and suggests that the species, seen as a whole, has a wide tolerance of climatic regimes. Nonetheless, our data indicate that the species is divided into several distinct regional lineages that have adapted to exploit local environmental conditions. These lineages are spread over large distances in the Southern Ocean and subjected to spatially dynamic changes in environmental conditions associated with climate change (Swart, Gille, Fyfe, & Gillett, 2018).
Changing conditions in the Southern Ocean involve, among others, coastal water becoming less saline and ocean acidification, which is expected to produce major impacts on the Antarctic biota (Convey & Peck, 2019). The growth trend of gentoo penguins as a whole is not to be taken as a representative fate for each of the genetic and ecologically different clades we identify. Changes in the Southern Ocean will likely affect more intensively peripheral colonies situated at the edges of the distributional range where the species are at the limit of their tolerances (Forcada & Trathan, 2009).
However, this issue remains to be explored in depth. Such rapid changes in environmental conditions mean that at least some, if not all, breeding habitat will be at risk of becoming suboptimal over time. This raises important questions about the ecological resilience of previously overlooked cryptic lineages, which lack broad dispersal capabilities and occupy specialized niches. Thus, some of these lineages may not be equally able to adapt to the currently changing macroecological conditions and could be under local risk of extinction (Thomas et al., 2004).
Unveiling cryptic diversification events is essential to implementing informed conservation management strategies. Here, we employed multiple methods centred on using a combination of molecular (genome-wide SNP and mtDNA) and ecological data (niche models and overlap analyses) to detect pronounced diversification among gentoo penguin colonies across the Southern Ocean and to explore the underlying processes that may have led to the observed extent of lineage differentiation. High ecological variability has been described for gentoo penguins across their biogeographic range, with resulting impacts on feeding and breeding biology including laying time, chick growth (Williams, 1995), expression of colour ornaments (Barbosa, Palacios, Valera, & Martinez, 2012), the duration of foraging trips and the availability of prey among colonies (Lescröel, Bajzak, & Bost, 2009). Some of these traits, such as the timing of laying, have a genetic basis, as genotypes may be selected to match resource availability and chick rearing requirements (Charnov & Krebs, 1974), or by photoperiod-, climatic-or resource-related plasticity (Lambrechts, Blondel, Maistre, & Perret, 1997).
The behaviour of gentoo penguins may provide support for the existence of genetic differences because limited gene flow among colonies promotes differentiation and diversification. Gentoo penguins have a greater propensity for being sedentary during the non-breeding period than do other pygoscelid penguin species (Dodino, Hart, Harris, & Rey, 2018;Friesen, Burg, & Mccoy, 2007;Williams, 1995), which partly explains the degree of isolation among colonies. The gentoo penguin is a resident inshore forager (Dimitrijevic et al., 2018;Lescröel & Bost, 2005;Lescröel, Ridoux, & Bost, 2004), an attribute that may limit its dispersal, in contrast with the pelagic behaviour of other penguin species which facilitates inter-colony gene flow (Clucas et al., 2018). Moreover, natal philopatry may explain the population genetic structure detected from genomic data among breeding colonies within each clade we studied. Hence, natural selection may operate across different environments at sea, enabling local adaptation, isolation and over time speciation. At a regional scale, using SNP data, Clucas et al. (2018)  Peninsula, which has smaller body sizes and bill proportions than P.
p. papua from the northern parts of the species' distribution across the sub-Antarctica (Stonehouse, 1970). However, spatial variation in morphometrics is also evident in other populations, with a tendency of decreasing size towards the south of the gentoo penguin distribution (Stonehouse, 1970) and within Antarctica (Valenzuela-Guerra, Morales-Moraga, González-Acuña, & Vianna, 2013). Gentoo penguins from Macquarie Island were first described as a distinct subspecies (P. papua taeniata; Mathews, 1927) from those distributed across the rest of the sub-Antarctic. The subspecies, P. papua taeniata, was later grouped with individuals from Heard, Kerguelen and Marion Island (Peters, 1934), but the penguin population from Crozet Island was not evaluated. Gentoo penguins from Crozet have been reported in the literature as being larger than their counterparts from other locations (Falla, 1937;Stonehouse, 1970), and to resemble those from Marion Island (Crawford, 1952), which are consistently identified in this paper as part of the same genetic clade.  (Funk & Omland, 2003;Maddison, 1997).
In terms of macroecology, gentoo penguin terrestrial niches are less differentiated than those in the marine environment, a distinction we attribute to the high intra-clade homogeneity of sea conditions within feeding areas, in particular locally stable water temperatures and salinity, with larger inter-clade differences across regions appearing to be caused by latitudinal gradients of ocean stratification. Terrestrial features of penguin rookeries are locally more variable within breeding areas due to changing weather conditions, and they sustain a more homogeneous inter-regional optimum driven by the general oceanic climate present across latitudes.
This pattern, which reduces the degree of climatic differentiation across terrestrial regions while promoting a rich variety of marine ecosystems, is typical of the Southern Ocean territories. In the case of gentoo penguins, we attribute genetic differentiation primarily to conditions at sea, whereas land conditions are subjected primarily to more local characteristics related to topographic features that drive nesting habitat availability.
Equivalence tests indicate that the niches of all four clades of gentoo penguins differ in terms of both the marine and terrestrial macroenvironment. Our results also suggest one instance of niche conservatism, but only in one of the terrestrial pairwise comparisons: Antarctica and South America. This could be explained by the recent age of these lineages, thereby retaining some common ecological features from the shared ancestral macroclimatic niche; that is, each lineage has not yet diverged to occupy distinct terrestrial environments. This degree of niche conservationism could also explain why gentoo penguin populations in the Antarctic Peninsula are responding positively to a changing clime, increasing their population numbers and expanding southwards as the macroecological conditions become more favourable for them (Trivelpiece et al., 2011). By extrapolation, this similarity in niches would have eroded in clades that have experienced a longer time period of climatic variation enabling local selection to occur for optimal rookery selection, such as on Crozet and Kerguelen Islands. Interestingly, in the Antarctica versus South America comparison, the marine sPCA suggests that these sister clades occurring at environmental extremes on the Scotia Arc have sought to acquire greater marine feeding resources (seen from their position in areas of higher primary production) and diverged from the ancestral state (north of APF clade) which lies between cold and warm water adapted lineages. Thus, we postulate that the marine evolutionary trade-off between thermal stress and gain in primary production where the species expanded its niche towards broader temperature ranges thanks to a higher availability of resources at both thermal extremes. In contrast, our analyses suggest that the terrestrial macroenvironment poses less of a challenge for gentoo penguins than for other penguin species, perhaps due to the species' ability to withstand a wide variation in summer temperatures. Moreover, oscillations in temperature and precipitation are less apparent across breeding sites of sister clades north and south of the APF than changes in the marine environment. This leads us to propose that an important mechanism driving diversification of gentoo penguin lineages, on top of isolation by distance, comes from a trade-off between dispersal (gene flow) and local adaptation to the spatially changing conditions in the marine environment (isolation by environment).
Overall, in the MMRR analyses we find that the genetic distances among gentoo penguin colonies are best explained by the combined effects of geographic distance and marine environmental distance. In the case of geographic distance, this is primarily expressed through the vast longitudinal distribution of oceanic islands and continental land masses across the Southern Ocean. In the case of environmental distances, this is attributed to the rapid change of water conditions due to circulation patterns of oceanic currents that occur across a short latitudinal gradient. The profound environmental gradients and long distances between genetically distinctive regional clades of P. papua suggest that the unique functional units will be faced with varying challenges in the face of climate change and as such should be evaluated separately, and not lumped together for the species as a whole.

| CON CLUS IONS
Given ongoing processes associated with global change, gentoo penguins face more significant challenges than other penguin species in maintaining healthy population numbers. This is because gentoo penguins are resident and do not migrate to more favourable habitats after breeding, instead relying upon habitats that must supply both summer and winter needs. Gentoo penguins have comparatively limited mobility and rely on the availability of suitable coastal areas for breeding and feeding during the reproductive season (Kowalczyk, Reina, Preston, & Chiaradia, 2015). In the case of the Antarctic populations, an intra-regional expansion southward may be feasible, than is currently present, but other regional populations in the sub- irrevocably lost. This is particularly pertinent to polar environments that are among the most rapidly changing environments on earth.

DATA AVA I L A B I L I T Y S TAT E M E N T
The raw VCF of ddRAD and bioclimatic/oceanographic data are