As a demonstration of some of the envisaged applications of diveRsity, two reproducible examples are provided below. These examples assume that the diveRsity, shiny, doParallel, sendplot and plotrix packages have been installed as well as their dependencies. For additional examples, users are encouraged to read the package manual.
Example 1. Using visualisation tools to investigate large genetic differentiation matrices
Pairwise genetic differentiation is an important parameter in the assessment of relationships among populations within a geographical context. To date, the true potential of pairwise genetic differentiation statistics has not been fully realised, owing mainly to difficulties in identifying meaningful trends in often very large numbers of population comparisons.
However, using both the divPart and difPlot functions, diveRsity allows users to visualise large pairwise matrices of genetic differentiation, making the identification of particularly differentiated population samples relatively straightforward. This procedure is demonstrated below.
Load diveRsity into the current R session:
# Load the diveRsity package
library(diveRsity)
In this example, the Big_data data set (distributed with diveRsity) will be used. The data were simulated under a hierarchical island model (i.e. five island groups with 10 subpopulations each allowing high geneflow within island groups and low geneflow among island groups), using the software EASYPOP v1.7 (Balloux 2001). Population samples within the Big_data data file were arranged in order of geographical proximity for the purpose of demonstrating how diveRsity can be used to identify broad-scale geographical trends from genetic data.
data(Big_data, package = "diveRsity")
The divPart function is first used to calculate the required pairwise statistics matrices. In this example, the argument parallel will be set to TRUE as a large number of comparisons have to be computed (i.e. for N = 50).
# Assign the results to the variable 'pwStats'
pwStats <- divPart(infile = Big_data, outfile = "Big_results",
                   gp = 2, WC_Fst = TRUE, bs_locus = FALSE,
                   bs_pairwise = FALSE, bootstraps = 0,
                   plot = FALSE, parallel = TRUE)
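To give a sense of why parallel computation helps here, the number of unique population pairs grows quadratically with the number of samples. The short base R calculation below is an illustration only (it does not call diveRsity) and counts the pairs for the 50 samples described above.

```r
# Number of population samples: five island groups of 10 subpopulations
n_pops <- 5 * 10

# Unique unordered pairs: N * (N - 1) / 2
n_pairs <- choose(n_pops, 2)
n_pairs
# [1] 1225
```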
The resulting R object, pwStats, contains the required pairwise statistics, which can be passed to the function difPlot for visualisation.
difPlot(x = pwStats, outfile = "Big_results",
This command will write four .png files (one for each estimated statistic) and four .html files to the folder Big_results under the current R working directory. An example of the functionality of the .html tooltips is given in Fig. 2. From this figure, it is clear that the data are represented by five distinct genetic groups, which corresponds to the simulation conditions described above. There are clearly high levels of differentiation among island groups (light blue/white) and low levels of differentiation within island groups (dark blue). This graphical representation accurately reflects what is known to be genetically/evolutionarily true (though natural population systems will rarely be so ideal).
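The block pattern described above can be mimicked with a toy matrix in base R. The sketch below is not the difPlot implementation and does not use Big_data: the matrix values are hypothetical, drawn so that differentiation is low within five groups of 10 and high among them, and the colour ramp is chosen only to resemble the dark blue to white scale described in the text.

```r
set.seed(1)  # reproducible toy values

n_groups    <- 5    # island groups, as in the simulation described above
n_per_group <- 10   # subpopulations per island group
n_pops      <- n_groups * n_per_group

# Group membership of each population sample, in geographical order
grp <- rep(seq_len(n_groups), each = n_per_group)

# Hypothetical differentiation values: low within groups, high among groups
same_group <- outer(grp, grp, "==")
toy <- ifelse(same_group,
              runif(n_pops^2, 0.00, 0.05),
              runif(n_pops^2, 0.80, 0.95))

# Make the matrix symmetric with a zero diagonal, like a pairwise matrix
toy[lower.tri(toy)] <- t(toy)[lower.tri(toy)]
diag(toy) <- 0

# Heat map: dark blue = low differentiation, white = high differentiation
image(x = seq_len(n_pops), y = seq_len(n_pops), z = toy,
      col = colorRampPalette(c("darkblue", "white"))(20),
      xlab = "Population", ylab = "Population")
```

With samples ordered by geographical proximity, the five dark squares along the diagonal correspond to the five island groups.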
Figure 2 also illustrates the ability to rapidly identify population pairs of interest by simply positioning the mouse pointer over a particular comparison square/pixel. In this example, the pairwise comparison between populations 18 and 23 (GST = 0·8883, θ = 0·9408, G′ST = 0·9927 and DJost = 0·8802) indicates that these two populations are highly differentiated from one another.
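The same kind of lookup can be done programmatically once a pairwise matrix is available as an ordinary R matrix. The sketch below uses a small hypothetical matrix (the names pw and pops, the values and the 0.8 threshold are all illustrative assumptions, and the structure of the divPart output is not assumed) to list the pairs exceeding a chosen threshold.

```r
# Hypothetical 4 x 4 pairwise differentiation matrix (illustrative values)
pops <- paste0("pop", 1:4)
pw <- matrix(c(0.00, 0.02, 0.85, 0.90,
               0.02, 0.00, 0.88, 0.91,
               0.85, 0.88, 0.00, 0.03,
               0.90, 0.91, 0.03, 0.00),
             nrow = 4, byrow = TRUE, dimnames = list(pops, pops))

# Look only at the upper triangle so that each pair is reported once
idx <- which(upper.tri(pw) & pw > 0.8, arr.ind = TRUE)

# Tabulate the selected pairs with their values
pairs <- data.frame(pop_a = pops[idx[, "row"]],
                    pop_b = pops[idx[, "col"]],
                    value = pw[idx])

# Most differentiated pairs first
pairs[order(-pairs$value), ]
```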
Example 2. Assessing polymorphism bias in diversity partitioning estimators
As discussed above, diversity partitioning statistics such as GST and θ are negatively dependent on within-subpopulation heterozygosity. Where this negative dependence is present (e.g. when using highly polymorphic microsatellites), it is important to ensure that inferences made from calculated values do not violate important assumptions. Using the functions divPart, readGenepop and corPlot, it is possible to carry out an ad hoc assessment of polymorphism bias in diversity statistics, allowing users to make informed decisions about, for example, whether to proceed with inference of demographic processes. A reproducible example is given below.
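Before turning to that example, the source of the dependence can be sketched numerically. Under the standard definition GST = (HT − HS)/HT, total heterozygosity HT cannot exceed 1, so GST can never exceed 1 − HS. The helper functions and heterozygosity values below are illustrative assumptions and are not part of diveRsity.

```r
# G_ST from within-subpopulation (hs) and total (ht) expected heterozygosity
gst <- function(hs, ht) (ht - hs) / ht

# Because ht <= 1, G_ST can never exceed 1 - hs
gst_max <- function(hs) 1 - hs

# Illustrative within-subpopulation heterozygosities
hs <- c(0.20, 0.50, 0.80, 0.95)

# The ceiling on G_ST shrinks as within-subpopulation heterozygosity rises
data.frame(hs = hs, gst_upper_bound = gst_max(hs))
```

For highly polymorphic markers with HS near 0.95, for instance, GST is capped near 0.05 however strongly the subpopulations differ.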
# Load the diveRsity package
library(diveRsity)
Next, an example data set (Test_data) provided with diveRsity should be loaded into the R session.
data(Test_data, package = "diveRsity")
Initially, Test_data is analysed by the function divPart to calculate locus θ, GST, G′ST and DJost estimators.
# Assign the results to the variable 'difStats'
difStats <- divPart(infile = Test_data, outfile = "Test",
                    gp = 3, WC_Fst = TRUE, bs_locus = TRUE,
                    bs_pairwise = FALSE, bootstraps = 1000,
                    plot = TRUE, parallel = TRUE)
Next, Test_data is analysed by readGenepop to count the total number of alleles per locus.
# Assign the result to the variable 'numAlleles'
numAlleles <- readGenepop(infile = Test_data, gp = 3,
The package has now generated two results objects in the R environment: difStats and numAlleles. These objects can be passed to the function corPlot.
corPlot(x = numAlleles, y = difStats)
Figure 3 provides an example of the output from this analysis. As can be seen in this example, both θ and GST are negatively correlated with the number of alleles per locus, while G′ST and DJost are strongly positively correlated. This discordance is indicative of a case where the mutation rate is likely to obscure past demographic processes (e.g. geneflow); thus, such a data set is unsuitable for addressing such questions.
Figure 3. Correlation assessment of locus estimators θ, GST, G′ST and Dest (DJost unbiased estimator), with locus polymorphism (total number of alleles), returned from the corPlot function. Red lines represent the line of best fit and r values are Pearson product moment correlation coefficients.
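The quantities named in the caption (a Pearson correlation coefficient and a line of best fit) can be computed for any pair of per-locus vectors with base R. The sketch below uses hypothetical values rather than Test_data, and is not the corPlot implementation; it simply shows one way of checking a statistic against locus polymorphism by hand.

```r
# Hypothetical per-locus values (illustrative only, not from Test_data)
n_alleles <- c(4, 6, 9, 12, 15, 20, 26, 31)
gst_locus <- c(0.21, 0.19, 0.15, 0.12, 0.10, 0.07, 0.05, 0.04)

# Pearson product moment correlation coefficient
r <- cor(n_alleles, gst_locus, method = "pearson")

# Least-squares line of best fit
fit <- lm(gst_locus ~ n_alleles)

# Scatter plot with the fitted line in red and r in the title
plot(n_alleles, gst_locus,
     xlab = "Number of alleles", ylab = "Locus statistic",
     main = paste("r =", round(r, 3)))
abline(fit, col = "red")
```

A clearly negative r for these toy values reflects the kind of negative relationship with polymorphism described above for θ and GST.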
Users executing the above code will also see a range of other graphical outputs in a folder named ‘Test’ within their working directory. These plots allow users to assess the variability of parameter estimation for individual loci, which can in turn inform decisions about, for example, ‘misbehaving’ loci.