On minimizing assignment errors and the trade-off between false positives and negatives in parentage analysis

Authors

  • Hugo B. Harrison,

    Corresponding author
    1. School of Marine and Tropical Biology, James Cook University, Townsville, Qld, Australia
    2. Australian Research Council Centre of Excellence for Coral Reef Studies, James Cook University, Townsville, Qld, Australia
    3. Laboratoire d'Excellence ‘CORAIL’, USR 3278 CRIOBE CNRS-EPHE, CRIOBE, Moorea, French Polynesia
  • Pablo Saenz-Agudelo,

    1. Red Sea Research Center, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
  • Serge Planes,

    1. Laboratoire d'Excellence ‘CORAIL’, USR 3278 CRIOBE CNRS-EPHE, CRIOBE, Moorea, French Polynesia
  • Geoffrey P. Jones,

    1. School of Marine and Tropical Biology, James Cook University, Townsville, Qld, Australia
    2. Australian Research Council Centre of Excellence for Coral Reef Studies, James Cook University, Townsville, Qld, Australia
  • Michael L. Berumen

    1. Red Sea Research Center, King Abdullah University of Science and Technology, Thuwal, Saudi Arabia
    2. Biology Department, Woods Hole Oceanographic Institution, Woods Hole, MA, USA

Abstract

Genetic parentage analyses provide a practical means with which to identify parent–offspring relationships in the wild. In Harrison et al.'s study (2013a), we compared three methods of parentage analysis and showed that the number and diversity of microsatellite loci were the most important factors defining the accuracy of assignments. Our simulations revealed that an exclusion-Bayes theorem method was more susceptible to false-positive and false-negative assignments than the other methods tested. Here, we analyse and discuss the trade-off between type I and type II errors in parentage analyses. We show that controlling for false-positive assignments, without reporting type II errors, can be misleading. Our findings illustrate the need to estimate and report the rates of both false-positive and false-negative assignments in parentage analyses.

The objective of parentage analyses can vary depending on the nature of the study, although a common goal is to correctly assign each and every offspring from a population to its true mother and/or father (Blouin 2003; Jones & Ardren 2003; Jones et al. 2010). If not all putative parents have been sampled, correct assignments and correct exclusions must be distinguished from false assignments (false positive – type I error) and false exclusions (false negative – type II error). In our earlier study (Harrison et al. 2013a), we carried out simulations to assess how the number and allelic diversity of microsatellite loci, the proportion of candidate parents sampled and genotyping error could affect the susceptibility of different methods of parentage analysis to type I and type II errors. We showed that the number and diversity of loci were the most important factors defining the accuracy of parentage analyses. We found that full- and pairwise-likelihood methods were systematically better at minimizing type I and type II errors than an exclusion-Bayes theorem approach, although all methods could accurately distinguish correct assignments and correct exclusions with 20 highly diverse loci.

In his comment, Christie (2013) cautions that an error using the exclusion-Bayes theorem approach (Christie 2010) led us to wrongly conclude that this method could not control the rate of false-positive assignments. However, minimizing only false-positive assignments was not the objective of our study, and to do so neglects the other decision types of single-parent assignment tests (Harrison et al. 2013a). We defined accuracy as the ability to distinguish correct assignments and correct exclusions from type I and type II errors, a metric that takes into account all possible decision types in parentage analyses (Harrison et al. 2013a) and is the most relevant to comparative studies. We accept that applying a maximum posterior probability of a putative pair being false (alpha) prior to accepting putative parent–offspring pairs, as Christie (2013) has done, can control the number of false-positive assignments and that for many purposes, this may be desirable. However, minimizing the rate of false assignments affects the rate of false exclusions, a trade-off that is contingent on the different objectives of parentage studies. For instance, if the alternative goal is to maximize the number of true parent–offspring pairs that are assigned, setting alpha too low may inadvertently reject a large number of correct parent–offspring relationships.

To fully evaluate the effects of fixing alpha at different arbitrary levels, we reran all 60 simulated scenarios (Harrison et al. 2013a,b), accepting either all putative parent–offspring pairs (α = 1) or only pairs with a probability of being false below 0.05 or 0.01, and analysed the effects of such measures on the accuracy of assignments. Using the same N1000 high-diversity data set with 1% genotyping error as presented in Harrison et al. (2013a), we assessed the performance of each method depending on three potential objectives of parentage analysis: (i) maximize the proportion of assignments that are correct, (ii) maximize the number of true parent–offspring pairs that are identified and (iii) obtain an accurate estimate of the proportion of true parent–offspring pairs that are present in the sample.
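The decision types involved can be made concrete with a minimal sketch. It assumes that each offspring comes with one best candidate parent and a probability that this pairing is false, and that a pair is accepted when that probability is at or below the cut-off (so that α = 1 accepts every putative pair). The record layout, the example values and the names `classify`, `accuracy` and `tally` are hypothetical and are not taken from the deposited scripts. Accuracy is computed as the fraction of decisions that are correct assignments or correct exclusions, which is one reading of the definition given above.

```python
from collections import Counter


def classify(true_parent, candidate, p_false, alpha):
    """Return the decision type for one offspring.

    true_parent: ID of the true parent if it was sampled, else None.
    candidate:   ID of the best putative parent, or None if none was proposed.
    p_false:     probability that the putative pair is false.
    alpha:       cut-off; the pair is accepted when p_false <= alpha (assumption).
    """
    assigned = candidate is not None and p_false <= alpha
    if assigned:
        if true_parent is None:
            return "type_Ib"             # assigned, but true parent was not sampled
        if candidate == true_parent:
            return "correct_assignment"
        return "type_Ia"                 # assigned to the wrong parent, true parent sampled
    if true_parent is None:
        return "correct_exclusion"
    return "type_II"                     # true parent sampled but no assignment made


def accuracy(counts):
    """Fraction of decisions that are correct assignments or correct exclusions."""
    total = sum(counts.values())
    return (counts["correct_assignment"] + counts["correct_exclusion"]) / total


# Invented records: (true parent, best candidate, probability the pair is false)
records = [
    ("P1", "P1", 0.001),
    ("P2", "P2", 0.030),
    ("P3", "P7", 0.200),
    (None, "P4", 0.008),
    (None, "P5", 0.600),
    ("P6", "P6", 0.080),
]


def tally(alpha):
    """Count decision types across all records for a given cut-off."""
    return Counter(classify(t, c, p, alpha) for t, c, p in records)


for alpha in (1.0, 0.05, 0.01):
    counts = tally(alpha)
    print(alpha, dict(counts), round(accuracy(counts), 3))
```

In these invented records, lowering the cut-off removes false assignments but also converts correct assignments into type II errors, which is the trade-off examined in the rest of this Reply.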

Fixing alpha at 0.05 or 0.01 did not improve the overall accuracy of the exclusion-Bayes method in our simulated scenarios unless the proportion of candidate parents was low (Fig. 1). Across all simulated scenarios, a cut-off value of 1, as in Harrison et al. (2013a), resulted in an overall accuracy of 0.653 ± 0.283, whereas cut-off values of 0.05 and 0.01 resulted in an overall accuracy of 0.650 ± 0.301 and 0.599 ± 0.305, respectively. Here, reducing alpha results in an explicit trade-off where the decrease in type Ia and type Ib errors (falsely assigning parentage when the true parent is or is not present in the sample of candidate parents) is outweighed by the increase in type II errors (Figs S1–S3, Supporting information). Even when using this trade-off to control the rate of false-positive assignments, the exclusion-Bayes method appears to be comparatively less effective at distinguishing between true and false parent–offspring pairs than either the pairwise-likelihood approach implemented in famoz (Gerber et al. 2003) or the full-likelihood approach implemented in colony (Wang 2004; Jones & Wang 2010).

Figure 1.

Proportion of accurate assignments of three approaches to parentage analysis. Each method was tested on high- and low-diversity simulated microsatellite data sets with high (1%) and low (0.1%) levels of genotyping error for varying numbers of loci and proportions of candidate parents sampled. Continuous black lines correspond to results from the full-likelihood method implemented in colony, dashed black lines are the results from the pairwise-likelihood method implemented in famoz and dotted black lines are the results from the exclusion-Bayes method using a cut-off value of 1.0 as presented in the study by Harrison et al. (2013a). Blue and red dot-dash lines correspond to results from the exclusion-Bayes method using cut-off values of 0.05 and 0.01, respectively. A value of 1.0 represents the optimal performance in each panel.

In some circumstances, the trade-off between type I and type II errors can be adjusted to meet specific objectives of parentage studies. For example, if the aim is to maximize the proportion of assignments that are correct (Fig. 2; Objective 1), using the exclusion-Bayes method with a stringent cut-off value (α = 0.01) to minimize type Ia and type Ib errors does appear to perform well compared with other methods, especially when the proportion of sampled parents and the number of loci are low. However, even in scenarios where the proportion of correct assignments equals that of famoz or colony, it identifies comparatively fewer assignments (Fig. 3). Alternatively, if the aim is to maximize the number of true parent–offspring pairs that are identified (Fig. 2; Objective 2), both type Ia (falsely assigning to a parent when the true parent was in the sample) and type II errors must be minimized. In this situation, the exclusion-Bayes method improves by allowing all putative parent–offspring pairs to be assigned (α = 1.0; Figs 2 and 3). If the aim is to obtain an accurate estimate of the proportion of true parent–offspring pairs that are present in the sample (Fig. 2; Objective 3), the primary objective is to balance type Ib errors (falsely assigning to a parent when the true parent was not in the sample) and type II errors. The number of true parent–offspring pairs present in the sample is correctly estimated when the number of type Ib errors equals the number of type II errors. In this case, minimizing type I errors without controlling type II errors underestimates the number of true parent–offspring pairs in the sample by a factor of 2–4. Regardless of the objective, increasing the number or allelic diversity of loci is the most effective way to reduce both type I and type II errors (Figs 2 and 3, Harrison et al. 2013a) and increase the performance of parentage analyses.
Simulations with known parent–offspring pairs are integral to estimating error rates and therefore to optimizing the performance of parentage analyses.
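The three objectives can be expressed as simple ratios of the decision-type counts. The sketch below is illustrative only: the function name, the dictionary keys and the example counts are hypothetical, and the exact performance indicators used for Fig. 2 are those described in Figs S1–S3, which may be defined differently. It relies on two identities that follow from the definitions in the text: the number of pairs a method reports is the sum of correct assignments, type Ia and type Ib errors, whereas the number of offspring whose true parent was sampled is the sum of correct assignments, type Ia and type II errors.

```python
def objective_scores(ca, type_ia, type_ib, type_ii):
    """Score one analysis against the three objectives (hypothetical indicators).

    ca:      correct assignments
    type_ia: false assignments when the true parent was sampled
    type_ib: false assignments when the true parent was not sampled
    type_ii: false exclusions (true parent sampled, no assignment made)
    """
    assigned = ca + type_ia + type_ib      # pairs reported by the method
    true_pairs = ca + type_ia + type_ii    # offspring whose true parent was sampled
    nan = float("nan")
    return {
        # Objective 1: proportion of assignments that are correct
        "prop_assignments_correct": ca / assigned if assigned else nan,
        # Objective 2: proportion of true pairs that are identified
        "prop_true_pairs_identified": ca / true_pairs if true_pairs else nan,
        # Objective 3: reported pairs relative to true pairs (1.0 when Ib == II)
        "estimated_over_true_pairs": assigned / true_pairs if true_pairs else nan,
    }


# Invented counts for a stringent cut-off and for a balanced analysis
stringent = objective_scores(ca=40, type_ia=0, type_ib=0, type_ii=60)
balanced = objective_scores(ca=80, type_ia=5, type_ib=15, type_ii=15)

print(stringent)
print(balanced)
```

In the invented stringent case every reported assignment is correct, yet only 40% of the true pairs are recovered and the number of true pairs is underestimated by the same margin. In the balanced case the type Ib and type II counts cancel and the estimate equals the true number of pairs.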

Figure 2.

Performance of three methods of parentage analysis under study-specific objectives. Each method was assessed using the N1000 high-diversity data set with 1% genotyping error as described in the study by Harrison et al. (2013a). The specific objectives are 1) maximizing the proportion of assignments that are correct; 2) maximizing the number of true parent–offspring pairs that are identified; and 3) obtaining an accurate estimate of the proportion of true parent–offspring pairs that are present in the sample (see Figs S1–S3, Supporting information for a description of each performance indicator). Line representations are identical to Fig. 1. A value of 1.0 represents the optimal performance in each panel.

Figure 3.

Number of correct assignments, correct exclusions, false-positive (Type Ia and Type Ib) and false-negative (Type II) assignments in the analysis of the N1000 high-diversity data set with 1% genotyping error as described in the study by Harrison et al. (2013a). We used the exclusion-Bayes method with three different cut-off values (α = 0.01, 0.05 and 1.0) and present results from famoz and colony as they were presented in the study by Harrison et al. (2013a).

The methods described by Christie (2010) and implemented in solomon (Christie et al. 2013) do appear well suited where marker information is scarce and where avoiding false assignments is a priority. Rejecting putative parent–offspring pairs above a certain threshold alpha did not improve the overall accuracy of the exclusion-Bayes method, although it did improve its performance when the objective was to maximize the proportion of assignments that were correct. This, however, is not a distinct advantage over other methods such as famoz or cervus that employ likelihood estimators (Marshall et al. 1998; Gerber et al. 2003; Kalinowski et al. 2007). These methods identify a threshold of assignment based on the distributions of likelihood scores for simulated true and false parent–offspring pairs. If the distributions overlap, the threshold value is usually set at the intersection of the two distributions to minimize both type I and type II errors, or it can be set higher (e.g. a value that is equal to or higher than 95% or 99% of the LOD scores of all simulated false pairs) or lower to minimize type I or type II errors, respectively.
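The effect of moving a likelihood threshold can be illustrated with a minimal sketch. It is not the procedure implemented in famoz or cervus: the function names and the LOD scores are invented, the threshold is taken as an order statistic of the simulated false-pair scores, and a pair is assumed to be accepted only when its score is strictly above the threshold.

```python
import math


def quantile_threshold(false_lods, coverage=0.95):
    """Smallest simulated false-pair score that is >= `coverage` of those scores."""
    ordered = sorted(false_lods)
    index = math.ceil(coverage * len(ordered)) - 1
    return ordered[max(index, 0)]


def error_rates(true_lods, false_lods, threshold):
    """Return (type I rate, type II rate) when pairs scoring above threshold are accepted."""
    type_i = sum(s > threshold for s in false_lods) / len(false_lods)
    type_ii = sum(s <= threshold for s in true_lods) / len(true_lods)
    return type_i, type_ii


# Invented LOD scores for simulated false and true pairs (overlapping ranges)
false_lods = [x * 0.5 for x in range(-10, 10)]   # -5.0 ... 4.5
true_lods = [x * 0.5 for x in range(4, 24)]      #  2.0 ... 11.5

strict = quantile_threshold(false_lods, coverage=0.95)
print(strict, error_rates(true_lods, false_lods, strict))

# A lower threshold inside the overlap trades type II errors for type I errors
print(3.0, error_rates(true_lods, false_lods, 3.0))
```

With these invented scores, the 95% threshold holds false acceptances to 1 in 20 at the cost of rejecting a quarter of the true pairs, whereas a threshold placed lower in the overlap gives equal rates of both error types.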

Clearly, there can be different objectives of parentage analysis that may favour minimizing false positives, minimizing false negatives or maximizing overall accuracy. In some circumstances, where the cost of false-positive assignments is too high, minimizing type I errors to ensure that all assignments are correct may be necessary. In other cases, minimizing type II errors to ensure that all true parent–offspring pairs are identified may be more important. In our studies, where we have used parentage analysis to examine patterns of juvenile recruitment and the reproductive success of adults in fishes (Jones et al. 2005; Planes et al. 2009; Saenz-Agudelo et al. 2011; Berumen et al. 2012; Harrison et al. 2012; Almany et al. 2013), we consider that minimizing both type I and type II errors will provide the best estimate of these parameters. Whatever the goal or the method used, type I and type II errors should always be estimated and reported. Fixing alpha at the expense of type II errors and then reporting only type I errors can be misleading and may result in a false depiction of accuracy and inaccurate estimates of population parameters that rely on parentage. Lastly, increasing the quantity and quality of marker information reduces both false-positive and false-negative assignments, which can only improve the outcome of parentage studies. We concur that in the future, with next-generation techniques for sequencing large numbers of markers, all methods will be able to be applied with extremely high accuracy, and arguments about the relative merits of trading false-positive and false-negative assignments will be of marginal concern.

All authors contributed to the development and structure of this Reply. H.B.H. analysed the data and wrote the original manuscript, and all authors contributed to revisions.

Data accessibility

Simulated data sets and R scripts deposited in the Dryad Digital Repository: http://dx.doi.org/10.5061/dryad.2ht96.
