Determining the parentage of individuals collected from wild populations is important for addressing a broad range of ecological and evolutionary questions (Jones & Ardren 2003; Pemberton 2008). A primary challenge confronting successful parentage analysis in natural populations is to control the number of false assignments (Jamieson & Taylor 1997; Christie 2010), which can occur when individuals that are not part of a parent–offspring relationship are incorrectly assigned as such. False assignments can readily occur in data sets collected from natural populations due to the large number of pairwise comparisons that must be made between putative parents and offspring. For example, if 100 adults and 100 juveniles are collected from the wild without any supplementary demographic or observational data, then 10 000 pairwise comparisons are required (i.e. each adult must be compared with each juvenile). With limited numbers of genetic loci, these large numbers of comparisons can result in pairs of individuals sharing alleles across all loci by chance alone, falsely mimicking the patterns of alleles shared between parents and their offspring. False assignments are a serious concern for most parentage studies because they can lead to incorrect conclusions regarding reproductive success, individual fitness and pedigree reconstruction (Marshall *et al*. 1998; Araki & Blouin 2005; Kalinowski *et al*. 2007). Thus, a primary objective of many parentage methods is to minimize the number of false assignments (see Jones & Ardren 2003 and Jones *et al*. 2010 for a review of parentage methods). To determine which methods identify the most parent–offspring pairs while minimizing the number of false assignments, Harrison *et al*. (2013a) applied three different parentage methods to simulated data sets. The authors employed a kinship program, COLONY (Jones & Wang 2010), a likelihood-based approach, FaMoz (Gerber *et al*. 2003), and a Bayesian parentage method (Christie 2010), which they suggested has a high rate of false assignment. Here, I show that Harrison *et al*. used incorrect settings for the Bayesian approach and demonstrate that if more appropriate settings are used, then this approach actually performs better than the others at controlling the number of false assignments.

The Bayesian parentage method (Christie 2010) was initially developed to identify parent–offspring pairs in data sets where a low proportion of parents and offspring were sampled. For example, the majority of marine larvae are minuscule and are nearly impossible to directly track in their ocean environment. As such, parentage analyses can be a useful way to determine how far and to what extent larvae are dispersing, a result that has important implications for the successful design and implementation of marine protected areas (Planes *et al*. 2009; Christie *et al*. 2010a,b; Harrison *et al*. 2012). When sampling from large natural populations, the probability of collecting any parent–offspring pairs can be low given the small size of the sample relative to the size of the population. Thus, in some cases, there may not be any true parents or offspring within the sample, such that it is vital to accurately control the rate of false assignments.

With that end in mind, the approach presented in Christie (2010) first calculates a prior probability of any putative pair being false within a data set. Within this framework, a putative pair is any pair of individuals that share at least one allele at all loci (see Christie *et al*. 2013 for a more flexible approach). The prior is calculated directly from the genetic data and is obtained by dividing the expected number of pairs that share alleles at all loci by chance alone by the observed number of putative pairs. For example, if the number of pairs expected to share an allele at all loci by chance was equal to 300 and the observed number of putative pairs in the data set equalled 1000, then the prior would be equal to 0.30. Thus, the probability that a randomly selected putative pair in the data set is false would equal 30%. This prior is next incorporated into Bayes' theorem to calculate the posterior probability for each individual putative parent–offspring pair. Even if the prior probability of a randomly selected pair being false is high, the posterior probability for an individual pair may be low because true parent–offspring pairs often share alleles that are less common than those observed in pairs that share alleles by chance (see Box 1 for a brief summary of the method).
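
The arithmetic behind the prior can be illustrated with a minimal Python sketch. The function name `prior_false` and its arguments are hypothetical and are not part of any published software; in the actual method, the expected number of false pairs is obtained by simulation from the genotype data, whereas here it is simply passed in as a number.

```python
def prior_false(expected_false_pairs: float, observed_putative_pairs: int) -> float:
    """Prior probability that a randomly selected putative pair is false.

    Illustrative only: expected_false_pairs would normally come from
    simulations based on the genotype data, not from a hand-typed value.
    """
    if observed_putative_pairs <= 0:
        raise ValueError("at least one putative pair is required")
    if expected_false_pairs < 0:
        raise ValueError("expected number of false pairs cannot be negative")
    # The prior is capped at 1 when more false pairs are expected
    # than putative pairs were observed.
    return min(1.0, expected_false_pairs / observed_putative_pairs)


# Example from the text: 300 expected false pairs among 1000 putative pairs.
print(f"{prior_false(300, 1000):.2f}")  # 0.30
```

Capping the ratio at 1 keeps the result a valid probability when the expected number of false pairs exceeds the number of putative pairs actually observed.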

### Box 1. Bayesian parentage analysis

To identify parent–offspring pairs, Bayes' theorem is employed to determine the posterior probability of a putative parent–offspring pair being false given the frequencies of shared alleles. Briefly, the method takes into account allele frequencies such that pairs that share common alleles are considered much less likely to be true than are pairs that share rare alleles. In accordance with Mendelian expectation, each parent–offspring pair will share at least one allele across all loci. If a limited number of loci are employed, then pairs of individuals can share alleles by chance alone. The rate of false matching for a given marker set increases exponentially with a linear increase in sample size (Christie 2010). The first step is to calculate a prior equal to the probability of any randomly selected putative pair sharing alleles by chance:

$$\Pr(\phi) = \frac{F_{\text{pairs}}}{N_{\text{putative}}}$$

where $F_{\text{pairs}}$ equals the expected number of false parent–offspring pairs and $N_{\text{putative}}$ equals the total number of putative parent–offspring pairs. Thus, if a data set was expected to contain 10 pairs that shared alleles by chance, but was observed to contain 100 pairs, then $\Pr(\phi)$ would equal 0.1. Calculating these values requires simulations based on the genotype data (see 'Methods' in Christie *et al*. 2013 for details). In rare cases where the expected number of false pairs is greater than the number of putative pairs, the prior is rounded to 1. Most, if not all, observed false pairs will share common alleles, because the probability of sharing an allele by chance is approximately proportional to the square of the allele frequency. In contrast, the probability that a true parent–offspring pair will share a particular allele is simply proportional to the allele frequency. Therefore, pairs sharing rare alleles are much more likely to be true parent–offspring pairs. Bayes' theorem is invoked to exploit this principle by calculating the probability of a putative parent–offspring pair being false given the frequencies of shared alleles:

$$\Pr(\phi \mid \lambda) = \frac{\Pr(\lambda \mid \phi)\,\Pr(\phi)}{\Pr(\lambda \mid \phi)\,\Pr(\phi) + \Pr(\lambda \mid \phi^{c})\,\Pr(\phi^{c})}$$

where $\Pr(\phi)$ is the prior, calculated as described above, and $\Pr(\phi^{c})$ is its complement, $1 - \Pr(\phi)$. $\Pr(\lambda \mid \phi)$ equals the probability of sharing the observed alleles given that the putative pair in question is false. This value is calculated for each putative pair using simulated multilocus genotypes. Notice that when a putative pair shares the most common alleles across all loci, $\Pr(\lambda \mid \phi) = 1$ and, consequently, $\Pr(\phi \mid \lambda) = \Pr(\phi)$. To calculate $\Pr(\lambda \mid \phi^{c})$, which is the probability of sharing alleles given that a putative pair is true, the same approach is employed, but the observed allele frequencies are used rather than the frequencies at which alleles were shared. Notice also that when the prior $\Pr(\phi)$ equals 1, the posterior, $\Pr(\phi \mid \lambda)$, also equals 1.
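
As a rough illustration of how the posterior follows from Bayes' theorem, the sketch below treats the two conditional probabilities as plain inputs. The function `posterior_false` and the numerical values are hypothetical; in the method itself both conditional probabilities are derived from simulated multilocus genotypes and observed allele frequencies, a step that is not reproduced here.

```python
def posterior_false(prior: float, lik_false: float, lik_true: float) -> float:
    """Posterior probability that a putative pair is false, Pr(phi | lambda).

    prior     -- Pr(phi): prior probability that a putative pair is false
    lik_false -- Pr(lambda | phi): probability of the shared alleles if the pair is false
    lik_true  -- Pr(lambda | phi^c): probability of the shared alleles if the pair is true
    """
    for name, value in (("prior", prior), ("lik_false", lik_false), ("lik_true", lik_true)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie between 0 and 1")
    numerator = lik_false * prior
    denominator = numerator + lik_true * (1.0 - prior)
    if denominator == 0.0:
        raise ValueError("posterior is undefined when both weighted terms are zero")
    return numerator / denominator


# Hypothetical pair sharing rare alleles: a prior of 0.30 drops to a low posterior.
print(f"{posterior_false(0.30, 0.02, 0.50):.3f}")  # 0.017

# A prior of 1 forces the posterior to 1, as noted in the box.
print(posterior_false(1.0, 0.02, 0.50))  # 1.0
```

The first call shows the behaviour described in the text: a pair that shares alleles unlikely to match by chance can have a low posterior probability of being false even when the prior is fairly high.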

For each putative pair in the data set, the Bayesian approach calculates a posterior probability of a pair being false given the frequencies of shared alleles. For example, if a particular putative pair has a posterior probability equal to 0.05, then it has a 5% probability of being false. If there are 100 other putative pairs from the same data set with posterior probabilities equal to 0.05, then five of those pairs will be false on average. The appropriate way to employ the Bayesian parentage method is to define an a priori cut-off value (hereafter: alpha) for the calculated posterior probabilities. It is generally not advisable to accept pairs with a posterior value >0.1. These pairs may not necessarily be false, but as the posterior probability increases, there is an increasing probability that the putative pair is false. Using the same example as above, if alpha is set to 0.1, then up to 10 of 100 assignments may be false; if alpha is set to 0.5, then up to 50 of 100 assignments may be false; and if alpha is set to 1, as was done by Harrison *et al*., then all assignments may be false. For most studies, setting alpha between 0.01 and 0.05 will exclude most false assignments (see Christie *et al*. 2013). In rare cases, slightly lower (e.g. 0.001) or higher (e.g. 0.1) cut-off values may be warranted.
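
The link between the cut-off and the expected number of false assignments can be sketched as follows. The names `accept_pairs` and `expected_false` are hypothetical, the posterior values are made up for illustration, and treating a posterior exactly equal to alpha as accepted is an assumption of this sketch.

```python
def accept_pairs(posteriors: dict, alpha: float = 0.05) -> dict:
    """Keep putative pairs whose posterior probability of being false is <= alpha."""
    return {pair: p for pair, p in posteriors.items() if p <= alpha}


def expected_false(accepted: dict) -> float:
    """Expected number of false pairs among those accepted (sum of posteriors)."""
    return sum(accepted.values())


# Illustrative data: 100 pairs with posterior 0.05 plus three weakly supported pairs.
posteriors = {f"pair{i}": 0.05 for i in range(100)}
posteriors.update({"pairA": 0.40, "pairB": 0.90, "pairC": 0.95})

accepted = accept_pairs(posteriors, alpha=0.05)
print(len(accepted))                        # 100
print(f"{expected_false(accepted):.1f}")    # 5.0

# With alpha = 1 every putative pair is accepted, including the weakest ones.
everything = accept_pairs(posteriors, alpha=1.0)
print(len(everything))                      # 103
```

Summing the posteriors of the accepted pairs gives the average number expected to be false, which matches the example in the text of five false pairs among 100 pairs with a posterior of 0.05.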

In the paper by Harrison *et al*., the authors used a two-step process when applying the Bayesian method. First, the authors set α = 1, thereby accepting all putative pairs, regardless of the posterior probability. Because the Bayesian approach from Christie (2010) reports every putative pair that matches at all loci, this procedure is identical to performing simple exclusion without allowing for a locus to mismatch (Fig. 1A, orange line). Next, for each offspring, the authors accepted only the assignments with the lowest posterior value (Fig. 1A, blue line). Thus, if an offspring matched to one candidate father with a posterior probability equal to 0.01 and to a different candidate father with a posterior probability of 0.09, only the first assignment was included. However, if a different offspring matched one candidate father with a posterior probability of 0.9 and another candidate father with a posterior probability of 0.95, then the first pair would be accepted, even though the probability of that pair being false equalled 0.9. If an offspring only matched a single candidate parent, it was also accepted, regardless of the posterior probability. Not surprisingly, the number of type I errors (false assignments) reported in Harrison *et al*. for the Bayesian method is quite large because alpha (the cut-off value) was set to 1. Here, I reanalyse the exact data presented in Harrison *et al*. using a more appropriate cut-off value.
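
The difference between the two acceptance rules can be made concrete with a small sketch. It is not the code used by Harrison *et al*. or in the reanalysis; the identifiers and the third offspring are hypothetical, and the posterior values for the first two offspring are taken from the example in the text.

```python
# Each record: (offspring, candidate parent, posterior probability of being false).
putative = [
    ("off1", "dadA", 0.01),
    ("off1", "dadB", 0.09),
    ("off2", "dadC", 0.90),
    ("off2", "dadD", 0.95),
    ("off3", "dadE", 0.60),  # hypothetical offspring with a single match
]


def best_match_per_offspring(pairs):
    """Two-step rule: accept everything (alpha = 1), then keep the lowest posterior per offspring."""
    best = {}
    for offspring, parent, post in pairs:
        if offspring not in best or post < best[offspring][1]:
            best[offspring] = (parent, post)
    return {(off, parent): post for off, (parent, post) in best.items()}


def alpha_cutoff(pairs, alpha=0.05):
    """A priori cut-off: keep only pairs with a posterior <= alpha."""
    return {(off, parent): post for off, parent, post in pairs if post <= alpha}


print(best_match_per_offspring(putative))
# {('off1', 'dadA'): 0.01, ('off2', 'dadC'): 0.9, ('off3', 'dadE'): 0.6}
print(alpha_cutoff(putative, alpha=0.05))
# {('off1', 'dadA'): 0.01}
```

Under the two-step rule, two of the three accepted pairs have a posterior probability of being false of 0.6 or higher, whereas the a priori cut-off retains only the well-supported pair.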