Data S1 Materials and methods.

Table S1 Drosophila melanogaster microsatellite primer pairs.

Table S2 Populations used for the human population study.

Table S3 Microsatellites used for the human population study.

Table S4 Heterozygosities for the different chromosomes in African and Non-African Drosophila melanogaster populations.

Table S5 Results of the linear models.

Table S6 Relationship between the number of loci and divergence for the estimate of population structure.

Fig. S1 Microsatellite marker design. Schematic of a chromosome in which one of the regions where microsatellites were genotyped has been magnified. The enlarged region shows eight red marks representing the positions of the microsatellites within the region. The X, 2nd and 3rd chromosomes each have five such regions along their length, whereas the 4th chromosome, owing to its small size, has only one.

Fig. S2 Clustering solutions for each chromosome.

Fig. S3 Clustering solutions for each genomic region. Each clustering solution is labeled with its corresponding region label.

Fig. S4 Histograms of the distribution of ml differences between the regions and each of the chromosomes. (A) Comparison for the X-chromosome regions, (B) comparison for the 2nd-chromosome regions, (C) comparison for the 3rd-chromosome regions and (D) comparison for the single region on the 4th chromosome.

Fig. S5 Correlation plots of the regions’ properties against the regions’ ranking positions. Log(ml): logarithm of the marginal likelihood of the regions’ clustering solutions.

Fig. S6 Simplified illustration of the effect of genealogical lineage sorting. Each line represents the frequency with which the true clustering solution [i.e. A(B,C)] occurs among 1000 random draws of a set of n loci (n from 10 to 100). The lines show the results of drawing loci from a distribution of markers in which the proportion of loci yielding the true population structure is 20% (dashed blue), 30% (dotted gray), 40% (dash-dotted green), 50% (long-dashed orange) or 60% (dashed red). When loci yielding the true population structure make up 70% or more of the genome, sets of 10 or more loci recover the expected clustering solution (solid black line).
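The resampling described for Fig. S6 can be sketched as a small Monte Carlo loop. This is a minimal illustration under stated assumptions only: the caption does not say how a set of loci is judged to recover A(B,C), so the majority rule used here (the `threshold` parameter) and the function name `true_structure_frequency` are hypothetical, and the numbers it prints are not expected to reproduce the curves in the figure.

```python
import random


def true_structure_frequency(p_true, n_loci, n_draws=1000, threshold=0.5, seed=None):
    """Fraction of random draws of `n_loci` loci that recover the true clustering.

    ASSUMPTION (hypothetical decision rule, not taken from the source):
    a draw recovers the true structure A(B,C) when strictly more than
    `threshold` of its loci individually support that structure.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_draws):
        # Each locus independently supports the true structure with probability p_true.
        supporting = sum(rng.random() < p_true for _ in range(n_loci))
        if supporting > threshold * n_loci:
            hits += 1
    return hits / n_draws


if __name__ == "__main__":
    # Proportions of "true-structure" loci matching the curves in Fig. S6, n = 10..100.
    for p in (0.2, 0.3, 0.4, 0.5, 0.6, 0.7):
        freqs = [true_structure_frequency(p, n, seed=1) for n in range(10, 101, 10)]
        print(p, [round(f, 2) for f in freqs])
```

Swapping in a different decision rule (for example, a plurality among several competing topologies) only requires changing the `if` test inside the loop.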

Fig. S7 Relationship between population differentiation and genealogical inference. We used computer simulations to determine how often the clustering solution was inferred correctly, in relation to the number of loci used and the level of population differentiation, when using two Bayesian methods to infer population structure, i.e. BAPS and STRUCTURE. Results are shown for 50 simulations of five populations with average FST values of 0.01 (dashed grey line), 0.05 (solid blue line) and 0.1 (dashed light-blue line). For a detailed explanation of the simulated datasets, see Data S1, Supporting information.

Please note: Wiley Blackwell is not responsible for the content or functionality of any supporting information supplied by the authors. Any queries (other than missing content) should be directed to the corresponding author for the article.