The field of population genetics is traditionally considered to have been founded in the early 1930s, but existed without nucleotide data until the first description of variation in the Adh locus in Drosophila in 1983 (Kreitman 1983). As such, it has been a largely theoretical field longer than it has been an empirical one. The advent of the empirical era resulted in a rapid flowering of techniques for extracting meaningful information from sequence data through application of theory, but recent years have seen the arrival of a new ‘big data’ empirical era. Data sets composed of thousands of sequenced genomes or thoroughly genotyped individuals now offer novel opportunities for applying population genetic theory to massive data sets for practical purposes.
A study in this issue of Molecular Ecology by Nkhoma et al. (2013) explores the potential epidemiological applications of ‘big’ genotype data gathered from 96 SNPs typed in 1731 clinical malaria samples collected over a decade in South-East Asia (Fig. 1). Malaria has been on the decline in this region, largely owing to increased treatment of the disease with artemisinin combination therapy. The authors explored the genomic data for signatures of reduced disease transmission and made a number of observations that are informative in the light of the life cycle and biology of Plasmodium malaria parasites. The power of this approach is impressive, considering that it is based on a small collection of SNPs rather than complete genome sequences. It suggests that, for many applications, full genome sequencing of population samples may be economically justifiable only for the initial identification of the most informative (high-frequency) variants within a population.
The most revealing genetic indicator of reduced disease transmission over time was one of the simplest to infer: the incidence of infections composed of more than one parasite clone. Because Plasmodium parasites are haploid in humans, an infection by a single clone should yield only one allele at each SNP, and mixed-clone infections may therefore be reliably detected through heterozygous genotype calls. The authors saw mixed-clone infections drop from 63% to 14% of samples in annual collections over the study period. Sexual outcrossing in malaria parasites happens in mosquitoes and occurs most commonly when a mosquito bites a multiply infected person, so it would be reasonable to expect that reduced sexual outcrossing during the study period, coupled with reduced transmission, could affect the genetic effective population size (Ne) of the parasite population.
However, this is not what Nkhoma et al. observed. Effective population size, measured via variance in allele frequency between transmission seasons, was unchanged during the course of the study, despite falling infection prevalence. The authors proffer several biological explanations for this null result, including migration-based stabilization of allele frequencies and an unchanged reservoir of asymptomatic infections between transmission seasons. A third explanation, that this finding is a false negative resulting from insufficient power, also deserves consideration. Figure 2 illustrates the results of a binomial sampling-based simulation to explore the expected drift-based allele frequency variance over the course of 1 year in a parasite population, assuming eight parasite generations per year. As the plot shows, discerning changes in Ne on the basis of allele frequency variance may be difficult when Ne exceeds 1000. Even with massive sample sizes designed to reduce sampling variance, expected differences in allele frequency may be too small in large populations to be detectable (Hare et al. 2011), regardless of minor allele frequency. Consequently, detectable changes in variance Ne may be most useful for identifying populations on the brink of collapse.
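The power argument above can be made concrete with a minimal sketch of a binomial sampling simulation. This is not the code behind Figure 2. It assumes a haploid Wright-Fisher population, a starting allele frequency of 0.5 that is treated as known, and an illustrative sample of 200 genotyped infections at the end of the year. The function names and default values are hypothetical. Only the eight generations per year come from the text.

```python
import numpy as np


def expected_drift_variance(ne, p0=0.5, generations=8):
    """Analytic allele frequency variance after drift alone.

    Assumes a haploid Wright-Fisher population of ne gene copies.
    """
    return p0 * (1.0 - p0) * (1.0 - (1.0 - 1.0 / ne) ** generations)


def simulate_drift_variance(ne, p0=0.5, generations=8, sample_size=None,
                            n_reps=20000, seed=1):
    """Mean squared change in allele frequency across independent replicates.

    Each generation draws ne gene copies binomially from the previous
    generation's frequency. If sample_size is given, the final frequency
    is estimated from a finite binomial sample, which adds sampling
    variance on top of drift. The starting frequency p0 is treated as
    known (a simplifying assumption).
    """
    rng = np.random.default_rng(seed)
    p = np.full(n_reps, p0, dtype=float)
    for _ in range(generations):
        p = rng.binomial(ne, p) / ne  # one generation of drift
    if sample_size is not None:
        p = rng.binomial(sample_size, p) / sample_size  # finite sample
    return float(np.mean((p - p0) ** 2))


# Compare drift alone with drift plus sampling noise for several Ne values.
for ne in (100, 1000, 10000):
    drift_only = simulate_drift_variance(ne)
    with_sampling = simulate_drift_variance(ne, sample_size=200)
    print(ne, expected_drift_variance(ne), drift_only, with_sampling)
```

Under these assumptions, the drift component shrinks roughly in proportion to 1/Ne, whereas the sampling component depends only on the number of infections genotyped, so the two become hard to separate as Ne grows.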
For an endangered species subject to conservation efforts, such information may come too late to be of practical use. For campaigns against infectious diseases, however, this genetic signature could be a critical indicator of an opportunity for local disease elimination (Volkman et al. 2012). Even when estimates of Ne fail to provide resolution, Nkhoma et al. show that genomic data may illuminate other aspects of parasite transmission dynamics, such as the persistence of multilocus genotypes. Given that Plasmodium parasites facultatively outcross when distinct strains co-infect a mosquito, such information is another useful means of inferring the incidence of mixed infections and profiling the nature of infectious reservoirs in the human population. Collectively, these data appear to indicate that, although P. falciparum parasite populations in South-East Asia are small relative to those in some other disease-endemic regions and continue to shrink, they are not yet on the cusp of elimination. Full realization of the potential for genomic tools to provide useful surveillance of population dynamics will ultimately require the collection of benchmark data sets across a range of populations, calibrated using traditional estimators of population size or disease transmission rate.