Genetic structure in the European endemic seabird, Phalacrocorax aristotelis, shaped by a complex interaction of historical and contemporary, physical and nonphysical drivers

Geographically separated populations tend to be less connected by gene flow, as a result of physical or nonphysical barriers preventing dispersal, and this can lead to genetic structure. In this context, highly mobile organisms such as seabirds are interesting because the small effect of physical barriers means nonphysical ones may be relatively more important. Here, we use microsatellite and mitochondrial data to explore the genetic structure and phylogeography of Atlantic and Mediterranean populations of a European endemic seabird, the European shag, Phalacrocorax aristotelis, and identify the primary drivers of their diversification. Analyses of mitochondrial markers revealed three phylogenetic lineages grouping the North Atlantic, Spanish/Corsican and eastern Mediterranean populations, apparently arising from fragmentation during the Pleistocene followed by range expansion. These traces of historical fragmentation were also evident in the genetic structure estimated by microsatellite markers, despite significant contemporary gene flow among adjacent populations. Stronger genetic structure, probably promoted by landscape, philopatry and local adaptation, was found among distant populations and those separated by physical and ecological barriers. This study highlights the enduring effect of Pleistocene climatic changes on shag populations, especially within the Mediterranean Basin, and suggests a role for cryptic northern refugia, as well as known southern refugia, on the genetic structure of European seabirds. Finally, it outlines how contemporary ecological barriers and behavioural traits may maintain population divergence, despite long‐distance dispersal triggered by extreme environmental conditions (e.g. population crashes).


Introduction
Studies of population genetics evaluate how genetic variation is partitioned across a species' range. Often, a species does not exist as a single panmictic population, but rather as several subdivided populations, each experiencing the effects of independent genetic drift and local adaptation to the various geographical locations (Avise 2000). Geographically separated populations may show a low rate or complete lack of immigration by individuals, and so reduced gene flow can lead to population differentiation, the first stage of speciation (Avise 2000). The geographical separation of populations may be a result of physical or nonphysical barriers that prevent dispersal and restrict gene flow, or simply of long distances between the ranges of each population (isolation by distance). The vagility of organisms like birds allows them to overcome many of these barriers. Seabirds in particular can disperse long distances, so high levels of gene flow might be expected among populations. Despite this, many seabird species exhibit substantial genetic structure in terms of significant divergence among breeding populations (Friesen et al. 2007). Thus, seabirds provide good models for studying speciation because physical barriers may have a small effect on their diversification, so the role of nonphysical ones may be more evident in seabirds than in less mobile organisms (Friesen 2015).
Physical barriers shaping the genetic structuring of organisms may be either historical (e.g. advance of ice sheets) or contemporary (e.g. land masses, ice or open sea for inshore species) (Avise 2000;Friesen et al. 2007). Many studies suggest that closely related species and subspecies of seabirds began to diverge in the Pleistocene, when distributional changes associated with glacial cycles may have induced population fragmentation and subsequent differentiation as a result of allopatric divergence in multiple glacial refugia (e.g. Avise et al. 2000;Moum & Arnason 2001;Morris-Pocock et al. 2008. Moreover, continental land masses can isolate populations in different ocean basins; for example, the presence of the narrow Isthmus of Panama may serve as an effective physical barrier to gene flow for several tropical seabirds Steeves et al. 2005).
In the absence of an obvious physical barrier, other, nonphysical barriers must be responsible for the existence of genetic structure between populations of seabirds (Friesen 2015). Factors related to species life histories can act as behavioural barriers, which sufficiently prevent gene flow and promote genetic differentiation in birds (Avise 2000;Newton 2003). Species with narrow foraging ranges might be more likely to be genetically structured, as a consequence of lower rates of gene flow. Alternatively, population-specific nonbreeding or foraging distribution may be important in reducing contact between individuals from different colonies. Finally, strong philopatry is expected to lead to genetic differentiation among breeding populations through drift and⁄or selection (e.g. Friesen et al. 2007;Morris-Pocock et al. 2008).
Although several studies have been published to date (reviewed in Friesen 2015), approximately 1/3 of seabird species lack any population genetic analysis. For example, only eight of the approximately 40 extant Phalacrocoracidae species have been studied (Marion & Le Gentil 2006;Duffie et al. 2009; Barlow et al. 2011;Mercer et al. 2013;Calder on et al. 2014;Rawlence et al. 2014). Moreover, few of the published phylogeographical analyses concern the European avifauna and even fewer the seabirds of the Mediterranean region (e.g. G omez-Diaz et al. 2006;Gay et al. 2007). Comparative phylogeographical studies, mostly of terrestrial plants and animals (e.g. Hewitt 2000), suggest that the palaeogeographical history of this region has shaped European biodiversity. Particularly during glacial periods, fragmented populations survived in southern refugia (Balkan, Italian and Iberian Peninsulas) and expanded northward in interglacial periods (Avise 2000;Hewitt 2000), although cryptic northern refugia have also been reported (Stewart et al. 2010). Sea level oscillations significantly altered the Atlantic and Mediterranean coasts and revealed land bridges within the Mediterranean (Taberlet et al. 1998;Bonfiglio et al. 2002), facilitating or limiting the dispersal of terrestrial and marine organisms, respectively. The Atlantic Ocean and the Mediterranean Sea were completely separated during the Messinian salinity crisis (MSC), until the tectonic collapse of the Gibraltar Strait (5.3 My ago) (Krijgsman et al. 1999). Nowadays, only this narrow channel allows passage between them, retaining their distinct oceanographic characteristics (Coll et al. 2010) and seabird communities' diversity (Zotier et al. 1999). As seabirds are usually associated with specific marine habitat features (Zotier et al. 1999), the Gibraltar Strait might represent both a historical physical barrier and a nonphysical ecological one.
Seabirds thus represent a useful system for assessing the importance of physical and nonphysical barriers, investigating the role of European refugia and specifically of the ecological and historical interplay between the Atlantic and the Mediterranean on phylogeographical patterns of marine species. Here, we focus on a Palaearctic endemic seabird, the European shag (Aves: Suliformes, Phalacrocoracidae), Phalacrocorax aristotelis. Within the recently published phylogeny of cormorants, Kennedy & Spencer (2014) suggest that this might be a monotypic genus (Gulosus). Currently, the species comprises three morphologically distinct subspecies: P. a. aristotelis (hereafter Atlantic shag), with a breeding distribution from northern Russia to the Atlantic coast of Iberia, P. a. desmarestii (Mediterranean shag), which breeds within the Mediterranean, and P. a. riggenbachi (African shag) distributed along the northern and western African coasts. So far, there is no available fossil evidence of the historical distribution of the shag (see Ml ıkovsk y 2009) and no published studies regarding the putative biogeographical events that could have shaped its present distribution and genetic structuring.
Ecological or behavioural barriers might also reduce gene flow among shags, as they show high breeding philopatry (Aebischer 1995;Velando & Freire 2002) and wide stretches of open sea probably impose a physical barrier to their movements (Wanless & Harris 2004). On the other hand, immature shags make occasional longdistance movements, up to several hundred kilometres (Wanless & Harris 2004), and adult shags also move between locations during nonbreeding periods (Sponza et al. 2013;Grist et al. 2014). However, it remains unknown to what extent such movements translate into effective gene flow.
Previous genetic analysis of this species (Barlow et al. 2011) focused mainly on the genetic structure of the Atlantic shag, although including one Mediterranean population (Corsica). Weak genetic structure was found among Atlantic populations, suggesting the existence of historical and present gene flow, while analyses failed to distinguish between the Atlantic shag and the Mediterranean shag. The authors hypothesized that the Corsican population could represent the extreme end of a cline in variation between P. a. aristotelis to the west and populations of P. a. desmarestii to the east and that the Atlantic colonies could have been colonized from southern Mediterranean refugia.
In this study, we sampled widely throughout the Mediterranean and collected new sequence data to develop a robust phylogeographical framework for the shag. We quantified population genetic structure, estimated gene flow rates and assessed phylogeographical patterns, timing of lineage divergence and historical demography, using recently developed coalescent models. We aimed to evaluate the effect of (i) historical events, such as the MSC or Pleistocene glacial cycles on intraspecific diversification and (ii) contemporary barriers, physical or otherwise (e.g. land masses and long distance between colonies) on the genetic structure of shag colonies. In the first case, we expected timing of intraspecific divergence points to mirror that of known historical events, helping identify the number and location of past refugia and reconstruct (re)colonization patterns. In the second case, we expected populations to show isolation-by-distance patterns and restricted contemporary gene flow between distant colonies.

Study area and sampling strategy
We analysed a total of 519 specimens from 28 shag colonies (Table 1, Fig. 1A). The Mediterranean populations were represented with 72 samples (oral swabs, feathers, eggshells, blood and tissue) collected during 2006-2012 from eight shag colonies in the Aegean and Adriatic seas, which were pooled with samples from 20 colonies from the Atlantic and Corsica (Barlow et al. 2011). Samples were stored dry at À20°C or in 95% ethanol at À80°C.

DNA isolation, mtDNA sequencing and microsatellite genotyping
We extracted total DNA from the Aegean and Adriatic samples with the DNeasy Blood and Tissue Extraction Kit (QIAGEN), following the manufacturer's procedure.
For the mitochondrial DNA (mtDNA) analyses, we amplified a segment of the NADH dehydrogenase two (ND2) protein-coding gene and part of the noncoding control region one (CR1). We amplified the ND2 marker with primers and conditions as in Barlow et al. (2011), while for the amplification of CR1, we used several combinations of primers used in other Phalacrocoracidae species or specifically designed for P. aristotelis (for detail on amplification procedures, see Table S1, Supporting information). We then excised the amplification products from 1% agarose gel and purified them with the NucleoSpin Extract II DNA purification kit (MACHEREY-NAGEL). Both strands of the mtDNA segments were sequenced on an ABI Prism 3100 capillary sequencer (CEMIA, Larissa, Greece) for the Aegean and Adriatic samples, and at the Natural Environment Research Council (NERC) Biomolecular Analysis Facility (NBAF) at the University of Edinburgh for the Atlantic and Corsican samples.
We used the same amplification conditions and primers as in Barlow et al. (2011), to genotype the Aegean and Adriatic samples for seven microsatellite markers, that is three dinucleotide (Phaari02, Phaari05 and Phaari06) and four tetranucleotide (Phaari08, Phaari11, Phaari12 and Phaari16) motifs. We genotyped the products on an 8-capillary ABI 3500 Genetic Analyser (Applied Biosystems) and scored them in comparison with a size standard (GeneScan 600LIZ, Applied Biosystems), using the GENEMAPPER Software version 4 (Life Technologies/Applied Biosystems). As previously published genotypes for the Atlantic and Corsican shags resulted from fragments visualized via 6% denaturing polyacrylamide gels on a LI-COR 4200 IR2 genotyper (Barlow et al. 2011), we regenotyped 10 randomly chosen samples from the Atlantic and Corsica alongside the Aegean and Adriatic samples and cross-checked the results to ensure consistency.

Mitochondrial markers
Phylogenetic analyses. We used CLUSTALX version 2.0 (Larkin et al. 2007) to align the sequences for each mtDNA segment with homologous sequences from other phalacrocoracid and suliform species as outgroups for alignment and tree rooting (Table S4, Supporting information). We performed final phylogenetic analyses on the mtDNA concatenated data set.
We computed within-group uncorrected genetic distances (pi and standard errors) and among groups pi for populations grouped according to their geographical origin (e.g. Aegean and Adriatic) and their respective phylogroup, using MEGA version 6 (Tamura et al. 2013).
We built phylogenies using Bayesian inference (BI) and maximum likelihood (ML). We used jMODELTEST version 2.1.4 (Darriba et al. 2012) to select the most suitable model of DNA substitution (Table S2, Supporting information). We tested three different partitioning strategies: (a) a single model of evolution for the entire mtDNA segment, (b) two separate evolution models and parameters for the two partitions (CR1 and ND2) and (c) four separate models of evolution, one for CR1 and three for the codon positions of ND2. We performed preliminary Bayesian analyses in MrBAYES version 3.2.3 Table 1 Details on the Phalacrocorax aristotelis sampling localities and genetic diversity per colony. Genetic diversity values represent the number of individuals (n) analysed in each case, the nucleotidic diversity (P) and haplotype diversity (Hd), based on the concatenated mtDNA sequences of ND2 and CR1 markers, and mean number of alleles (Na), number of private alleles (Pa), allelic richness (Ra), averaged values of observed heterozygosity (H o ) and expected heterozygosity (H e ), and inbreeding coefficient (F IS ; significant values are bold, P < 0.05). based on the microsatellite data. MtDNA and microsatellite diversity values are given in populations with n ≥ 2 and n ≥ 5, respectively   Markov chains of 3*10 6 generations and sampling intervals of 100 generations. Substitution model parameters were unlinked across partitions, and each subset was allowed to have its own rate. We run four independent analyses so that global likelihood scores, individual parameter values, topology and nodal support could be compared to check for local optima. We used TRACER to confirm stationarity (all parameters with an ESS > 200) and decide on the number of 'burn-in' trees (25%). All the remaining (post-burn-in) trees were used to estimate the topology of a 50% majority rule tree and its posterior nodal probabilities (pP).
We carried out partitioned ML analysis with RAxML 7.5.2 (Stamatakis 2006), with each partition having its own GTR-GAMMA model. Nodal support of the trees was tested via 1000 bootstrap replicates (bp).
Estimation of divergence times. We estimated the time of the most recent common ancestor (TMRCA) for nodes of interest observed in the mtDNA gene tree with the package BEAST version 1.8.0 (Drummond et al. 2012). To calibrate the molecular clock, we used the split between P. aristotelis (Gulosus aristotelis) and Nannopterum + Leucocarbo (Kennedy & Spencer 2014). To account for possible errors due to the use of a secondary calibration point, we applied a normal distribution with a mean value and standard deviation of 10 AE 0.5 My (setting the 97.5 intervals of the distribution at 9.0-11.2 Mya, as inferred by the primary analysis of Kennedy & Spencer 2014). Such an ancient calibration might underestimate substitution rates and therefore overestimate divergence times. To avoid that, we also applied a second prior on the substitution rates for the ND2 and CR1 using the ranges of published rates for Suliformes and other avian species (for detail, see Table S3, Supporting information). Specifically, we allowed uniform distributions for both priors with maximum and minimum values of substitutions/site/My of 0.0048-0.0090 and 0.025-0.1 for ND2 and CR1, respectively. Analyses were run under a relaxed molecular clock (uncorrelated lognormal), with a 'Coalescent: Constant Size' prior on rates of cladogenesis. We performed four independent runs, each with a chain length of 2*10 7 iterations and trees sampled every 1000 generations. We checked for convergence with TRACER to (ESS values > 200), and after a burn-in of 10%, runs were combined in LogCombiner and TreeAnnotator from the BEAST package and the outcome was visualized with FIGTREE version 1.4.2 (Rambaut 2014).
Estimation of dispersal/vicariant events. To better understand the biogeographical history of the species, and to reconstruct ancestral distribution areas, we used a statistical approach of the dispersal-vicariance analysis (S-DIVA; Yu et al. 2010) as implemented in the program RASP version 3 (Yu et al. 2015). This approach accounts for uncertainty in the phylogenetic estimate, producing a probability P of the proposed ancestral range at each node. We used the mtDNA tree produced with BEAST and assigned the terminal nodes to geographical areas, based on their present distribution. We used an input of 10 000 trees produced with BEAST to estimate the probability and run the analysis with default 'Optimize' settings and under the 'Allow Reconstructions' option.
We inferred the demographic fluctuations of the main identified phylogenetic units and for the total of our samples by two separate approaches. First, we tested for deviations from the neutral Wright-Fisher model consistent with a population expansion under the neutrality hypothesis. We calculated Fu's Fs and R 2 statistics in DnaSP, based on segregating sites and assuming no recombination, and their respective significance, with 10 000 coalescent simulations and 0.95 confidence interval. Significantly negative Fs values (for large population sample sizes~50) and significantly positive R 2 values (for small ones~10) can be interpreted as signatures of population expansion (Ramos-Onsins & Rozas 2002).
Second, we constructed Bayesian skyline plots (BSPs) with BEAST, under the coalescent tree prior, with the piecewise constant model and number of groups set to half the sample size per group with a maximum of 10. We used the most suitable model for the entire mtDNA segment, a strict clock prior and a uniform distribution rate of 0.0055 substitutions/site/MY, which corresponds to the mean mutational rate estimated for our concatenated data set. We performed two runs, each with 8*10 7 generations and log parameters sampled every 1000 generations. We assessed the ESS of parameters and generated BSPs with TRACER, after discarding a burn-in of 10%.

Microsatellite markers
Genetic variability and population differentiation. Due to low sample sizes, we excluded three Aegean populations (SP, TL and FO) from the following analyses. We checked genotypic data for amplification errors and for the presence of null alleles using MICROCHECKER version 2.2.3 (Van Oosterhout et al. 2004). We tested for linkage disequilibrium and Hardy-Weinberg equilibrium (HWE) with GENEPOP version 4.2 on the online server (Rousset 2008), using an exact probability test (Markov chain parameters: 10 000 dememorizations, 100 batches, 1000 iterations per batch), with the Bonferroni's correction. In case of deviation from HWE, we also tested whether this was attributed to heterozygosity deficit or excess.
We calculated estimators of genetic variability, such as allele frequency, average number of alleles per locus (N a ), observed (H o ) and expected (H e ) heterozygosities for each microsatellite locus and each population with GenALEx 6.5 (Peakall & Smouse 2012). We estimated the inbreeding coefficient (F IS ) with randomization test for its significance and allelic richness (Ra) as a standardized measure of the number of alleles corrected by sample size using FSTAT version 2.9.3.2 (Goudet 2001). We also obtained the traditional estimators of genetic population differentiation, F ST (Weir & Cockerham 1984) and R ST (Slatkin 1995), as well as pairwise estimates of F ST and R ST between all populations with sample sizes ≥5, under the Bonferroni's correction, with FSTAT and SPAGeDi version 1.4 (Hardy & Vekemans 2002), respectively.
Testing for bottleneck and isolation effects. We tested the hypothesis of a bottleneck effect with BOTTLENECK version 1.2.02 (Cornuet & Luikart 1997). As the best-fit mutation model was unknown, we tested for both the SMM (stepwise-mutation model) and the two-phase model (TPM = 70% SMM, geometric distribution = 0.36). We evaluated the significance of heterozygosity excess using the Wilcoxon signed-rank test (10 000 simulation replicates) and the inspection of the allele frequency distribution curve; swift from a normal L-shaped distribution is an indicator which discriminates bottlenecked populations from stable ones (Piry et al. 1999).
We also tested for the presence of isolation by distance using Mantel's test (Mantel 1967) implemented in GenALEx between the matrix of linearized pairwise F ST [F ST /(1 À F ST )] and a matrix of log-transformed geographical distance between populations estimated from geographical coordinates. We used two different approaches to construct the geographical matrixes: (i) using the modified Haversine formula algorithm which takes into account the spherical shape of the earth and (ii) calculating Euclidean distances in R (version 3.2.1, R Core Team 2014, packages 'sp', 'raster', 'rgdal', 'geosphere' and 'gdistance'), where distance between two colonies was taken following the coastline with crosssea distances measured between land masses at the closest points. We estimated the significance for each test with 10 000 permutations.
Population clustering. We used the Bayesian clustering analysis implemented in STRUCTURE version 2.3.4 (Pritchard et al. 2000) to infer genetic structure for all the studied populations and samples. We tested the probability of different numbers (1-10) of independent genetic clusters (K) with five runs (1*10 6 iterations each, 50% burn-in), assuming correlated allele frequencies and using the admixture model with and without the LOCPRIOR setting, which takes into account sample location and is expected to perform better when genetic structure is weak or when the number of loci is small (< 20; Hubisz et al. 2009). The vector specifying the degree of admixture between each subpopulation (alpha) was inferred from our data, the distribution of allele frequencies (lambda) was estimated from our data (k = 0.62 for K = 1), and the prior for F ST was specified with a mean value and standard deviation of 0.01 AE 0.05.
For choosing the optimal number of clusters for our data, we estimated the log likelihood given the number of clusters [ln P(X/K)] (Pritchard et al. 2000) and the second-order rate of change of ln P(X/K) (DK) (Evanno et al. 2005) over all five runs using STRUCTURE HARVESTER online Web server (Earl & Von Holdt 2012). As different solutions may result from replicated cluster analyses, we used CLUMPAK software (Kopelman et al. 2015) with the 'greedy' option and 10 000 random input orders to find the optimal individual alignments of replicated cluster analyses and to visualize the final results.

Mitochondrial DNA phylogeography
We amplified 109 sequences of 578 bp for ND2 and 121 sequences of 624 bp for CR1, resulting in ten haplotypes (10 polymorphic sites) and 28 haplotypes (30 polymorphic sites), respectively. In the concatenated mtDNA data set, we included 79 sequences and identified 39 haplotypes (38 polymorphic sites) (Table S4, Supporting information). We found higher values of nucleotide diversity (P = 0.26-0.39%) in the Aegean, Corsican and Spanish populations, while the Adriatic and N. Atlantic populations showed lower values (0-0.09%) ( Table 1). The genetic distances within and among major geographical groups were also relatively low (pi = 0-0.8%) (Table S5, Supporting information).
All phylogenetic analyses (gene trees) grouped populations in three strongly supported and distinct groups: the first phylogroup clustered all the N. Atlantic populations (Norway, Iceland, Faroe Is., England, Ireland, Scotland and France), the second the Spanish and Corsican populations and two individuals from the Aegean (XR) and the third all the remaining individuals from the Aegean and the Adriatic populations (Fig. 2). The phylogenetic relationships among the three phylogroups were not fully resolved, but all southern populations (Spanish, Corsican, Adriatic and Aegean) were grouped in one clade with medium support (bootstrap P < 90, pP < 70). Each of the three phylogroups was further divided, although most subgroups showed medium-to-low support. The N. Atlantic populations (N. Atlantic phylogroup) were clustered in two subgroups (A1 and A2). Several subgroups were included within the Spanish/Corsican (S1, S2, C1 and C2) and the Aegean/Adriatic phylogroups (G1-G4). In some cases, these subgroups correspond to geographical regions; for example, the majority of the Corsican and most of the Adriatic shags were clustered in distinct clades (C1 and G3, respectively; Fig. 2). According to our molecular dating (Fig. 2), the MRCA of all the European shags lived approximately 700 kya. The N. Atlantic phylogroup was then isolated from its sister clade of Spanish/Corsican + Aegean/ Adriatic shags. The latter was further split approximately 500 kya, separating the western populations (Spain and Corsica) from the eastern Mediterranean ones (Adriatic and Aegean). Subsequent radiation occurred within each of the three phylogroups resulting in the divergence observed in current populations, Fig. 2 The chronogram produced by BEAST, showing phylogenetic relationships and estimated divergence times of the European shag lineages. Letter codes with numbers in terminal nodes refer to mtDNA haplotypes of the concatenated data set (Table S4, Supporting information). Letters correspond to the respective population and locality shown in Fig. 1A. Statistical support is given above branches; numbers shown in order for the respective analysis (BEAST posterior probabilities/MrBAYES posterior probabilities/ RAxML bootstrap values), only if pP > 0.50 and bootstrap values P > 50. Numbers at the axis on the bottom refer to MY since present; mean estimated times in MY and the 95% intervals are noted for each major node corresponding to the three phylogroups (N. Atlantic, Spain and Corsica, and Aegean and Adriatic) and their common ancestor. Subgroups within each major clade are shaded. The cladogram obtained with S-DIVA is also presented, imposed over the chronogram. On each node, circles and within letters correspond to the defined distribution area of the common ancestor (as explained in the inset) and the effect of vicariant/dispersal events are shown with dashed-line circles or shaded squares, respectively. which was dated at approximately 350 kya for the Spanish/Corsican and the Aegean/Adriatic phylogroups and more recently at approximately 250 kya, for the N. Atlantic one.
The biogeographical history of the species seems to be a result of several vicariant and dispersal events and their combinations, according to the S-DIVA analysis (Fig. 2). Of the 14 divergence events found, six corresponded to vicariance, one to a vicariance/dispersal event and seven to dispersal events, all showing high values of probability (P = 1.0), with the exception of the nodes connecting subgroups S2/C2 and G3/G4 (P = 0.31-0.56). One inferred area of ancestral distribution was proposed for each of the studied nodes. The common ancestor of European shags was distributed from the N. Atlantic through Spain and as far as the eastern Mediterranean, covering all the areas of the species' present distribution. Vicariance has acted to split (a) N. Atlantic populations from all the remaining ones and (b) the latter populations into two groups: one in the eastern Mediterranean and one in the western Mediterranean and Spain. In the west, ancestral populations were originally distributed in Spain (AS) and Corsica, while the east Mediterranean ancestral populations were restricted in the Aegean. Several independent dispersal events were inferred from internal nodes, particularly throughout the Mediterranean Sea and between Corsica and Spain.

Mitochondrial DNA demographic analyses
The network reconstruction (Fig. 1B) retrieved the same relationships as in the phylogenetic analysis. Haplotypes were connected in three major clusters matching the three phylogroups, with the Aegean/Adriatic haplotypes placed in the central position. The N. Atlantic and the Aegean/Adriatic clusters showed a starlike topology with one or two common haplotypes in the middle, connected to several rare ones in the periphery. Most of the Spanish/Corsican haplotypes were rare or even unique to individual populations.
Fs values for each of the three phylogroups and for all samples were significantly negative and with strong statistical support (P < 0.01; Table 2). All R 2 values were positive, but none showed a statistical significance (P > 0.05; Table 2). Given the relatively small size of the samples tested for each phylogroup (n = 17-39), we expect the R 2 values to better represent their demographic history. Thus, although there is evidence of population expansion in all the phylogroups, it seems to be rather gradual than sudden. Moreover, it is probably recent, as shown from the estimated times for each phylogroup ( Table 2). The skyline plots gave analogous results (Fig. 3). The Spanish/Corsican and the Aegean/ Adriatic populations underwent a gradual and mild increase of their population sizes during the last 200 000 and 80 000 years, respectively, while the N. Atlantic populations had a constant population size during the last 30 000 years.

Genetic diversity based on microsatellite data
In total, we genotyped 519 European shags from 28 populations at seven polymorphic microsatellite loci (Table S6, Supporting information). Across all populations, the total number of alleles detected per locus ranged from two (Phaari05 in populations HN, AD, FT, SV, IE, LB, IB) to 17 (Phaari11 in ST). There was evidence for a low frequency of null alleles at loci Phaari02 (BB, CS, XR) and Phaari11 (XR), with possibility P ≤ 0.01. In most populations, there was no evidence of departure from gametic equilibrium between any pair of loci (P > 0.05). Only two locus pairs in individual populations (Phaari06-Phaari12; IM and Phaari11-Phaari16; BD and IE) strongly deviated from equilibrium (P < 0.001), showing significant deficit or excess of heterozygotes; the former was found in SK, KJ, BB, ST, AS and XR for Phaari11, CA for Phaari16, BD for Phaar-i12 and AS for Phaari06, and the latter in BD and XR for Phaari05 and Phaari08, and in GC for Phaari02 (P < 0.01).
Based on the mean multilocus inbreeding coefficient per population (mean F IS = À0.202 to 0.210), only XR, GC, IB, BD and SV returned significant departures from random mating. For XR, a deficit of heterozygotes was detected and a slight excess of heterozygosity was found in all other cases. There were consistently high levels of genetic diversity (mean H e = 0.46-0.74) and consistently low levels of allelic richness (mean Ra = 2.67-3.75) for all locus-colony combinations. Private alleles were found in nine populations (Table 1).
The genetic differentiation between populations was moderate but revealed a statistically significant structure. The pairwise F ST values ranged from 0 (for the population pair SK-MS) to 0.35 (CR-RS) with a global F ST of 0.085 (P ≤ 0.001, SE = AE0.043), and R ST ranged from to 0 (for several pairs of populations within the North Atlantic, within the Aegean and between Aegean and Adriatic) to 0.57 (OR-KJ), with a global estimate of 0.23 (P ≤ 0.001, SE = AE0.043) (Table S7, Supporting information).

Population structure based on microsatellite data
The results from cluster analysis with STRUCTURE converged to K = 2 in all independent runs (assuming correlated allele frequencies, using LOCPRIOR, and an estimated k = 0.6239; mean ln P = 11 959/mean similarity score of five runs was 98%). According to the estimated values of ln P(X/K), Evanno's DK and Q (membership probability of each individual to a given cluster), this scenario is strongly supported against all alternative clustering for K = 3-10 (Fig. S1a, Supporting information). The same outcome persisted when different options regarding the allele frequencies, admixture model and values of lambda were tested, although alternative options returned lower mean lnP values. These two clusters (Fig. 1Cb) grouped all Atlantic and all Mediterranean populations, respectively. Although clearly placed within the Atlantic cluster, the Spanish populations appear admixed, especially GC, which showed the lowest membership proportion (Q = 0.52).
To further investigate the population structure within each of the two groups, we repeated two independent analyses, one including only the Atlantic and the other including only the Mediterranean populations. The Atlantic populations (Fig. 1Ca) were assigned to three clusters (Fig. S1b, Supporting information) (K = 3, five runs with mean ln P = À9 407/mean similarity score at 96%). The first clustered all Norwegian populations, SV, FT and the Scottish populations BD and CA, the second clustered the remaining N. Atlantic populations and IB (France), and the third clustered the Spanish ones; populations FT, IB, VZ and AS were admixed. For the Mediterranean populations (Fig. 1Cc), an ambiguous clustering was observed, resulting in either K = 4 or K = 6 (Fig. S1c, Supporting information), with the latter showing a slightly higher probability (mean ln P = À2 051/mean similarity score at 85%). The ambiguity arose from different clustering of the Aegean populations, whereas Corsican and Adriatic populations formed two consistently distinct groups. Within the Aegean, at least three additional clusters can be defined for the XR, EV and CR populations.
The bottleneck hypothesis was rejected for most populations under either mutational model (normal Lshaped allele frequency distributions and no evidence of heterozygosity excess; P > 0.01). Under the SMM model, some of the N. Atlantic populations (HN, RS, KJ, CA, BB, IM, ST, IE), as well as VZ (Spain) and XR (Greece), showed a normal distribution despite some evidence of heterozygote deficiency (0.05 ≥P > 0.01). Only the TPM model revealed a shifted distribution and statistically significant evidence of heterozygosity excess (P > 0.01), suggesting past bottleneck in three populations, namely MS, GC and OR.
Genetic distances showed a significant positive correlation with geographical distance between populations only when distances were calculated through sea and along the coastlines (R = 0.730, r 2 = 0.4922, P < 0.001). Distances measured over land with the Haversine formula algorithm were not found to have a significant effect on genetic differentiation between populations (R = À0.020, r 2 = 0.0004, P > 0.05). Table 2 MtDNA diversity (for both ND2 and CR1 markers) and summary statistics (Fu's Fs and Ramos-Onsins and Rozas's R 2 ) of population size changes for each phylogroup presented in Fig. 2. Tests of significance are provided (10 000 coalescent simulations at 0.95 confidence interval) (significant values are bold, P < 0.05). Values of tau and t (s = 2lt, where l is the mutational rate and t the time since expansion) are also provided, using the mutational rate l = 0.0055 substitutions/MY estimated in BEAST

Discussion
Wider sampling and independent marker analyses revealed a more complex population genetic structure than previously suggested for the European shag (Barlow et al. 2011). This structure was mostly concordant for nuclear and mitochondrial markers; the two extremes of the species' distribution evaluated here, that is the North Atlantic and the eastern Mediterranean, are highly diverged. The placement of Spanish and Corsican populations was not conclusively resolved, although mtDNA gene trees clustered them in one phylogroup and favoured their closer relationship with the eastern Mediterranean. Spanish populations (and, to a lesser degree, the French population) were admixed, showing evidence of current gene flow from the north (N. Atlantic) and the east (mainly Corsica). These points are discussed below in the light of historical and present, physical and nonphysical barriers that drive the differentiation of a European marine organism.

Historical biogeography
The first hypothesis tested here was that intraspecific diversification of the shag has been driven primarily by historical events, either during the MSC (at approximately 5 Mya) or during the more recent Pleistocene climatic changes. Extant shag populations are clearly separated into two phylogenetic groups, one in the northern (N. Atlantic) and the other in the southern part of the species' current distribution (Mediterranean and possibly Spain). Our time estimates indicate that this split postdated the MSC and occurred more recently, during the Pleistocene. This is in agreement with the absence of fossil records for the Mediterranean Sea avifauna, suggesting that most seabird species did not survive the development and maintenance of a desiccated Mediterranean basin during the Late Miocene MSC (Ml ıkovsk y 2009). According to the S-DIVA analysis and molecular dating (Fig. 2), the European shag had a wide past distribution throughout the Mediterranean and Atlantic which was later fragmented by a vicariant event that occurred between 1 Mya and 450 000 ya. The Pleistocene epoch (2.6-0.11 Mya) was a time of extraordinary oscillations in global climate. The effect of these climatic changes on the geographical distributions of species was profound, and it has been shown that many of the phylogenetic separations that led to extant sister species of birds were initiated in the Pliocene (5.3-2.6 Mya) and completed in the Pleistocene (Klicka & Zink 1997), while many pronounced phylogeographical separations within avian species have occurred within the last 2 My (Avise 2000). These separations, also revealed in the shag phylogeny, were probably the result of extended ice masses driving isolation in combination with environmental changes on land and at sea (e.g. sea level fluctuations, vegetation, current circulation, productivity and food availability).
Pleistocene effects on the phylogeographical structure of conspecific avian populations have been demonstrated in seabirds across a wide range of environments, for example arctic: razorbill (Alca torda, Moum & Arnason 2001) and two murre species (Uria aalge, Morris-Pocock et al. 2008;Uria lomvia, Tigano et al. 2015); tropical: masked booby (Sula dactylatra, Steeves et al. 2005); and temperate: Cory's shearwater (Calonectris diomedea, G omez-Diaz et al. 2006). In most of these cases, Atlantic populations were genetically differentiated from their conspecifics in other marine regions (Moum & Arnason 2001;Morris-Pocock et al. 2008). Additionally, a similar phylogeographical boundary between the Atlantic and the Mediterranean has been found among seabirds (e.g. the European storm petrel Hydrobates pelagicus, Cagnon et al. 2004;Cory's shearwater, G omez-Diaz et al. 2006), showing congruent differentiation patterns with the shag and divergence times dating back to the Pleistocene.
In Europe, pronounced global cooling on a 100 000year cycle resulted in continental ice sheets that extended as far south as the 52nd parallel (Berger 1984). The separation zone for the shag populations currently lies between 43°and 47°N (between France and Spain; Fig. 1A) or at approximately 50°N, excluding the admixed France colony (IB). Land masses (or ice masses in this case) can provide a substantial physical barrier to dispersal, as shags do not typically fly over land. The accumulation of ice resulted in lower sea level during the glacial periods, establishing land connections between Mediterranean islands and the mainland (e.g. Sicily and the southwestern tip of Italy, Bonfiglio et al. 2002) or coastlines (e.g. the northeastern and northwestern Adriatic coasts, Taberlet et al. 1998). This could have formed a physical barrier to dispersal within the Mediterranean and promoted isolation between the east and the west Mediterranean approximately 500 000 ya (Fig. 2).
The hypothesis of a postglacial colonization of the North Atlantic from southern refugia is not supported by our molecular evidence. The network analysis (Fig. 1B) suggests the existence of two originally isolated ancient stocks in the N. Atlantic and the eastern Mediterranean. The respective isolation of shag populations during glaciations and the inferred areas of ancestral distribution (S-DIVA; Fig. 2) imply the existence of both northern and southern refugia. The location of refugia in southern Europe has been well documented for several species of terrestrial animals with diverged populations originating from the Iberian, Italian or Balkan peninsulas (Avise 2000;Hewitt 2000;Stewart et al. 2010). This study provides further evidence for an important role of these refugia in the divergence of seabirds. In contrast, the existence of cryptic refugia in northern Europe has only recently been proposed (Stewart et al. 2010). Studies of genetic variation in the herring gull complex (Larus argentatus, Liebers et al. 2004) and the thick-billed murre (Tigano et al. 2015) suggested the existence of a refugium in the Northeast Atlantic during the last glacial maximum (LGM), probably located on the islands of the Arctic Ocean. Interestingly, both in our study and in that of Tigano et al. (2015) the Hornøya population (HN) showed private haplotypes, suggesting that this may represent a refugial colony within the Northeast Atlantic. This region may have acted as a glacial refugium for many organisms, coldadapted or with a high tolerance of cold conditions, due to the presence of ice-free corridors along the Atlantic coastal areas of north Scandinavia during most of the LGM (Parducci et al. 2012).
After the subdivision of the ancestral shag population, the three diverged phylogroups expanded almost simultaneously at approximately 350 000-250 000 ya (Fig. 2). In the case of the N. Atlantic, populations possibly expanded following the retreat of ice masses in a southwest direction, while Aegean populations colonized the Adriatic Sea. Multiple dispersal events were documented in the west Mediterranean and the Spanish coasts, showing evidence of movement through the Gibraltar Strait in both directions, as well as from Corsica to the eastern Mediterranean (Fig. 1B, 2). A general trend towards population stability within each of the three phylogroups has continued over the past 300 000 years, as shown by the BSP analyses (Fig. 3).

Influence of current physical and nonphysical barriers
Regardless of the detectable imprint that historical events have left on the population structure of the European shag, current physical and nonphysical barriers also act to shape the genetic diversity of the species. Microsatellite data (Fig. 1C) depict two genetic clusters that correspond to the populations distributed on either side of the Gibraltar Strait, with some admixture in the Spanish populations. This suggests a barrier restricting movements between Atlantic/Spanish and Mediterranean populations. The Gibraltar Strait may act as a major physical boundary between Atlantic and Mediterranean populations of seabirds and other marine organisms (see G omez-Diaz et al. 2006; Qu er e et al. 2012 and references therein). However, we cannot exclude the possibility of rare events of immigration between Corsica and Spain, given the genetically admixed Spanish populations and especially Galicia (GC), and the evidence provided by S-DIVA analysis (Fig. 2) that suggests past interglacial dispersal across the Gibraltar Strait.
Genetic structuring was also present within the Atlantic (Fig. 1Ca) and the Mediterranean (Fig. 1Cc). The Atlantic populations were divided into three clusters: one of them included the Spanish populations and the other two the North Atlantic ones in a northwestern-southeastern transition (Norway, Iceland and the northwestern colonies of the UK vs. southeastern UK and France). A similar population structuring among N. Atlantic colonies is weakly reflected in the mtDNA phylogeny (Fig. 1B, 2); therefore, it could partially be attributed to historic fragmentation of populations. As demonstrated, these populations probably originate from a small genetic stock; it is likely that northwestern and southeastern colonies are related to genetically distinct founder populations.
Within the Mediterranean, the observed genetic structure corresponds to at least four geographical regions, that is Corsica, Adriatic, Aegean and Crete. There is no evident physical barrier between them that might prevent overseas movements, but several hydrological boundaries have been proposed, that is underwater ridges that affect the prevailing current flows in the different Mediterranean basins, shaping their hydrological features (e.g. salinity and cold-water upwelling) (Coll et al. 2010). Examples of such barriers are the Siculo-Tunisian Strait (separating the west and the east Mediterranean) and the Otranto sill (isolating the Aegean/Ionian and Adriatic seas), which have promoted genetic structure within the last 30 000 years in several marine organisms (e.g. common cuttlefish Sepia officinalis, P erez-Losada et al. 2007; sea bass Dicentrarchus labrax, Qu er e et al. 2012; goby Pomatoschistus tortonesei, Merji et al. 2009). Therefore, it is possible that genetic structuring within Mediterranean shags is induced by ecological barriers such as local adaptation to prey movements.
Shag populations also display genetic structure within geographical regions (North Atlantic and Aegean) in the absence of any apparent physical barriers, suggesting a possible role for other factors such as (i) founder or bottleneck events that affect local population dynamics, (ii) philopatry and (iii) isolation by distance between colonies in respect of the species' flight capacity. Breeding populations of shags may exhibit periodic crashes followed by rapid population growth (Frederiksen et al. 2008). Although several crashes have been reported throughout the species range, the bottleneck hypothesis was rejected in most of the populations studied here, while both historical and contemporary demographic analyses (Tables 1, 2 and Fig. 3) suggest population stability. One explanation could be that crashes affect a given colony, but their overall impact on the shag population is minimal and/or cannot be detected with our approach. Another possible explanation could be that crashes are sometimes the result of reproductive failure affecting the population size during the crash period but not the survival of adults who will probably breed in their natal colony during the following reproductive period (Potts et al. 1980;Aebischer 1986;Thanou 2013). Moreover, even after periodic large-scale mortality events, shag populations are known to recover rapidly (boom-and-bust population dynamics, Frederiksen et al. 2008) and the philopatric behaviour of the surviving adults and chicks (Aebischer 1995;Barlow et al. 2013) might be sufficient to retain the colony's genetic diversity.
Geographical distance between shag colonies had a strong influence on the extent of population genetic structure, but only when cross-sea distance was considered. This outcome highlights the role of continental land masses as a physical barrier that promotes isolation between shag populations. Shags are coastal seabirds with limited foraging ranges, usually within a few kilometres during breeding and postbreeding periods, although they can fly a few hundred kilometres between breeding and postbreeding grounds (Sponza et al. 2013;Bogdanova et al. 2014;Grist et al. 2014). Occasional longdistance, cross-sea movements have been also reported for adults and juveniles prior to the colony recruitment (Wanless & Harris 2004). Following crashes, juvenile shags have been shown to move further than during noncrash years (Potts 1969), while adults could emigrate to other colonies (Velando & Freire 2002;Thanou 2013). These events, even if they happen rarely, might promote gene flow among populations, despite high levels of natal and breeding philopatry (Barlow et al. 2013). However, long-distance dispersal seems unable to erase the overall genetic structure, and thus, breeding immigrants more often originate from geographically close populations, probably of the same genetic cluster.
Finally, processes such as sex-biased dispersal may affect population genetic structure. A few cases of adjacent shag populations sharing common mtDNA haplotypes despite their significant nuclear divergence (high F ST values) were found within the N. Atlantic and the Aegean, while evidence of a female-biased dispersal to new territories was reported in years with unfavourable weather conditions (Baros et al. 2013). Female-biased dispersal within a close range around natal colonies may act to homogenize genetic variation, but further research is needed to clarify whether long-distance female immigration could have a strong effect on the genetic structuring of shag populations.

Taxonomy and conservation
Microsatellite data provided evidence of genetic differentiation between Atlantic and Mediterranean shags, but the transition cline in variation between P. a. aristotelis and P. a. desmarestii moved to the north and was located among the admixed Spanish populations. Thus, currently described morphological subspecies appear to be paraphyletic and the morphological differences between them might reflect local adaptations. Representation of African and Black Sea shags in future phylogenetic work would help infer the centre of the species' diversification and resolve taxonomic implications.
The North Atlantic and the eastern Mediterranean shags represent two evolutionarily significant units (ESUs), while the Spanish/west Mediterranean populations are probably better described by a meta-population model. Molecular data suggest that past bottlenecks are relatively rare and stable populations have probably persisted throughout the species' range. During recent decades, however, several Atlantic and Mediterranean populations have declined (BirdLife International 2015). Thus, focusing conservation strategies on the specified ESUs could be effective, especially considering that they are distinct units of a monotypic, European endemic genus (Kennedy & Spencer 2014).

Conclusions
The complex phylogeographical history of the shag corroborates the effect of the European palaeogeography on the diversification of marine organisms and particularly the effect of Pleistocene glacial cycles on restricted marine areas, such as the Mediterranean Sea. Moreover, it highlights the importance of lesser studied northern refugia along with well-known southern ones and the role of contemporary ecological barriers within marine areas. As expected, philopatry and local adaptation to specific marine habitat features act to maintain population genetic structure in this seabird. However, neither physical nor nonphysical barriers seem to prevent rare episodes of long-distance dispersal typically associated with unfavourable conditions. This flexibility in dispersal behaviour and tolerance of a wide range of climate conditions (e.g. temperate vs. near-arctic refugia) suggest that different populations may respond individually to habitat and climatic changes, ensuring species survival at least in the case of the shag. Dalsgaard (Anda), T. Anker-Nilssen (Røst), G. Bangjord (Mel-