Fig. S1 Influence of potential ascertainment bias. A. SNPs linked to alleles specific to samples from either Cumaná or upper Quare were excluded before a network was reconstructed using SPLITSTREE4. The clustering was unchanged compared with the network based on all SNPs (Fig. 2), but genetic distances between Cumaná and the remaining samples, and between upper Quare and the remaining samples, decreased. B. Site frequency spectra using the major allele in upper Quare samples as the reference. Frequencies of the reference allele were calculated for the populations included in the ascertainment panel (Cumaná and upper Quare) and for samples within each major region.
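The reference-allele spectra in panel B can be illustrated with a minimal sketch. It assumes genotypes are stored as diploid allele pairs (missing genotypes as `None`), takes the major allele of a reference population (here standing in for upper Quare) at each SNP, and bins that allele's frequency in a target population. The function names, the 10-bin layout and the alphabetical tie-breaking are illustrative assumptions, not the authors' pipeline.

```python
from collections import Counter


def major_allele(genotypes):
    """Most common allele among diploid genotypes (allele pairs); None entries are skipped.

    Ties are broken alphabetically so the result is deterministic (an assumption).
    """
    counts = Counter()
    for g in genotypes:
        if g is not None:
            counts.update(g)
    if not counts:
        return None
    return min(counts, key=lambda a: (-counts[a], a))


def reference_frequency(genotypes, ref):
    """Frequency of allele `ref` among non-missing allele copies, or None if all missing."""
    alleles = [a for g in genotypes if g is not None for a in g]
    if not alleles:
        return None
    return sum(a == ref for a in alleles) / len(alleles)


def site_frequency_spectrum(ref_pop, target_pop, n_bins=10):
    """Histogram of reference-allele frequencies in target_pop across SNPs.

    ref_pop and target_pop are lists of SNPs in the same order; each SNP is a
    list of genotypes (allele pairs or None) for the individuals of that population.
    """
    bins = [0] * n_bins
    for ref_snp, tgt_snp in zip(ref_pop, target_pop):
        ref = major_allele(ref_snp)  # reference allele defined in the reference population
        if ref is None:
            continue
        freq = reference_frequency(tgt_snp, ref)
        if freq is None:
            continue
        # A frequency of exactly 1.0 falls into the last bin.
        idx = min(int(freq * n_bins), n_bins - 1)
        bins[idx] += 1
    return bins
```

Because the reference allele is fixed by one population, SNPs ascertained in that population tend to shift the target spectra towards high reference-allele frequencies, which is the kind of signal panel B inspects.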

Fig. S2 Principal component analysis (PCA) with k-means clustering. The 10 laboratory strains were excluded from the PCA to ensure that they did not distort the results. Clusters were obtained by k-means using all 18 significant PCs and k = 19. Each individual is represented by a vertical bar. Horizontal black lines indicate predefined regions, rivers and sampling sites. Major geographical regions are colour-coded as in Figs 1 and 2, with finer colour gradations distinguishing sub-clusters.
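The clustering step can be sketched as plain Lloyd's k-means applied to precomputed PC scores. This assumes the PCA has already been run and each individual is a tuple of its scores on the significant PCs; the function names and the seeded random choice of initial centroids are assumptions, since the caption does not describe how clustering was initialised.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length coordinate sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=100, seed=0):
    """Lloyd's k-means on a list of coordinate tuples (e.g. PC scores per individual).

    Returns (labels, centroids). Initial centroids are k distinct input points
    drawn with a fixed seed (an assumption made for reproducibility).
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(n_iter):
        # Assignment step: each individual joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

With 18-dimensional score tuples in a hypothetical `pc_scores` list, the setting described above would correspond to `kmeans(pc_scores, k=19)`. k-means only finds a local optimum, so in practice it is usually rerun from several starting points and the lowest within-cluster sum of squares kept.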

Fig. S3 Detailed investigation of regional subpopulations. The 239 individuals were subdivided into five smaller sets representing the five major geographical regions. Populations that appeared as likely sources of admixture in the previous analysis (Fig. 4) were included in every subset. Shown are the most likely STRUCTURE results for each subset (see Materials and Methods). The subset analyses revealed finer sub-clustering within regions than the global analysis in Fig. 4. PL, Pitch Lake; PD, Palm Drive; UElC, Upper El Cedro; Tra, Tranquille; UO, Upper Oropouche; LM, Lower Matura; PA, Poza Azufre; ElC, El Cedro; Lop, Lopinot.

Table S4 Analysis of molecular variance (AMOVA) excluding laboratory guppy strains. Note: major regions formed the uppermost hierarchical level, followed by river basins within regions, with sampling sites at the lowest level. ***P < 0.001, **P < 0.01, *P < 0.05; NS, not significant.

Table S5 Pairwise FST values calculated between all populations defined by sampling site (Excel file). ***P < 0.001, **P < 0.01, *P < 0.05; NS, not significant.
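The caption does not state which FST estimator was used; the file name of Table S5 (`TableS5_pairwise_fst_byReich`) hints at Reich's formulation. As an assumption, the sketch below implements a Hudson-type ratio-of-averages estimator with a finite-sample correction, taking per-SNP allele counts for two sampling sites. The function name and input layout are hypothetical, and the significance tests reported in the table (typically by permutation) are not shown.

```python
def hudson_fst(counts1, counts2):
    """Ratio-of-averages Hudson-type FST between two populations.

    counts1, counts2: lists of (allele_count, total_allele_copies) per SNP, in
    the same SNP order. SNPs with fewer than two sampled allele copies in
    either population are skipped because the correction divides by n - 1.
    Returns None if no SNP contributes to the denominator.
    """
    num = 0.0
    den = 0.0
    for (a1, n1), (a2, n2) in zip(counts1, counts2):
        if n1 < 2 or n2 < 2:
            continue
        p1, p2 = a1 / n1, a2 / n2
        # Squared frequency difference minus its expected sampling noise.
        num += (p1 - p2) ** 2 - p1 * (1 - p1) / (n1 - 1) - p2 * (1 - p2) / (n2 - 1)
        # Probability that two alleles, one from each population, differ.
        den += p1 * (1 - p2) + p2 * (1 - p1)
    if den == 0:
        return None
    return num / den
```

Summing numerators and denominators over SNPs before dividing (rather than averaging per-SNP ratios) keeps low-diversity SNPs from dominating the estimate. Because of the sample-size correction, identical populations can give slightly negative values.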

Fig. S6 Linkage plot. Clustering was inferred with STRUCTURE using the linkage model, with a burn-in of 100 000 steps followed by 100 000 iterations and k = 4. Each individual is represented by two consecutive rows; columns represent genotypes ordered by their position on the linkage groups. Colours indicate the most likely cluster membership of each genotype.

Table S7 SNP data. Markers are named according to Tripathi et al. (2009). Population numbers refer to Table 1. Heterozygous SNPs are coded using IUPAC ambiguity codes; ‘N’ denotes a missing genotype.
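Reading the genotype calls in Table S7 requires expanding the IUPAC codes into allele pairs. The sketch below assumes each call is a single character: a plain base (A, C, G, T) for a homozygote, a two-base IUPAC ambiguity code for a heterozygote, and ‘N’ for missing data. The dictionary and function names are hypothetical; the code mapping itself is the standard IUPAC nucleotide convention.

```python
# Standard IUPAC ambiguity codes that denote exactly two bases (heterozygotes).
IUPAC_HETEROZYGOTES = {
    "R": ("A", "G"),
    "Y": ("C", "T"),
    "S": ("C", "G"),
    "W": ("A", "T"),
    "K": ("G", "T"),
    "M": ("A", "C"),
}


def decode_genotype(call):
    """Convert a one-character genotype call into a diploid allele pair.

    Homozygous calls (assumed to be a single base) return (base, base),
    heterozygous IUPAC codes return their two bases, and 'N' returns None.
    """
    code = call.strip().upper()
    if code == "N":
        return None  # missing genotype
    if len(code) == 1 and code in "ACGT":
        return (code, code)
    if code in IUPAC_HETEROZYGOTES:
        return IUPAC_HETEROZYGOTES[code]
    raise ValueError(f"unexpected genotype code: {call!r}")
```

Raising an error on unrecognised codes, rather than silently treating them as missing, makes transcription problems in a large table visible early.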

Please note: Wiley-Blackwell are not responsible for the content or functionality of any supporting materials supplied by the authors. Any queries (other than missing material) should be directed to the corresponding author for the article.

MEC_4528_sm_FigureS1_biasDetection.pdf (846K)
MEC_4528_sm_FigureS2_labstrains.pdf (216K)
MEC_4528_sm_FigureS3_Structure_subsets.pdf (426K)
MEC_4528_sm_FigureS6_linkageModel_subtitle.jpg (2181K)
MEC_4528_sm_TableS4_AMOVA_labstrains.xls (16K)
MEC_4528_sm_TableS5_pairwise_fst_byReich.xls (43K)
MEC_4528_sm_TableS7_SNPdata_transposed.xls (3384K)

Please note: Wiley Blackwell is not responsible for the content or functionality of any supporting information supplied by the authors. Any queries (other than missing content) should be directed to the corresponding author for the article.