Abstract
The effective population size, N_{e}, is an important parameter in population genetics and conservation biology. It is, however, difficult to estimate N_{e} directly from demographic data in many wild species. As an alternative, the use of genetic data has received much attention in recent years. In the present study, I propose a new method for estimating the effective number of breeders, N_{eb}, from a parameter of allele sharing (molecular coancestry) among sampled progeny. The bias and confidence interval of the new estimator are compared with those of a published method, the heterozygote-excess method, using computer simulation. Two population models are simulated: the non-inbred population, consisting of non-inbred and unrelated parents, and the inbred population, composed of inbred and related parents. Both methods give essentially unbiased estimates of N_{eb} when applied to the non-inbred population. In the inbred population, the proposed method gives a downward-biased estimate, but its confidence interval is remarkably narrower than in the non-inbred population. The estimate from the heterozygote-excess method is nearly unbiased in the inbred population but has a wider confidence interval. Combining the estimates from the two methods as a harmonic mean remarkably improves reliability.
Introduction
The effective population size, N_{e}, is one of the most important parameters in population genetics and conservation biology, because it determines both the amount of genetic drift and the rate of inbreeding (Crow and Kimura 1970; Falconer and Mackay 1996). N_{e} can be estimated from demographic data such as the number of parents and the variance in their progeny number (Caballero 1994). However, the demographic data needed to estimate N_{e} are often unavailable for wild species. As an alternative, methods for estimating N_{e} from genetic data have been developed (for reviews, see Waples 1991; Schwartz et al. 1999; Beaumont 2003; Leberg 2005; Wang 2005). These methods differ in the time scale over which N_{e} is measured: some infer the long-term N_{e} in the past on an evolutionary time scale, whereas others estimate the current or short-term N_{e} (Waples 1991; Wang 2005). For practical problems such as managing a small population of an endangered species, an accurate estimate of the current or short-term N_{e} is of special importance, and this is the main concern of the present study.
To date, three methods are available for this purpose: the temporal method (Nei and Tajima 1981; Pollak 1983; Waples 1989), the linkage disequilibrium method (Hill 1981) and the heterozygote-excess method (Pudovkin et al. 1996; Luikart and Cornuet 1999). These methods actually assess the effective number of breeders (N_{eb}) of the cohort from which a sample is obtained. If the sample consists of reproductive adults, N_{eb} is nearly equivalent to N_{e} in populations with non-overlapping generations (Schwartz et al. 1999; see also the discussion below). In populations with overlapping generations, N_{e} can be estimated from N_{eb} if the age structure is known (Waples 1991).
The logic behind the temporal method is that the change in allele frequency between samples separated in time reflects genetic drift. This method is the most tested of the genetic N_{eb} estimators and has been used to estimate N_{eb} in various species (Schwartz et al. 1999). Its primary weakness is that two or more samples separated in time are necessary (Schwartz et al. 1999), which can be expensive and is, by nature, time-consuming. The linkage disequilibrium method is based on the fact that genetic drift generates nonrandom associations among alleles at different loci. Despite the obvious advantage that this method can estimate N_{eb} from a single cohort sample, it has several drawbacks (Schwartz et al. 1999; Wang 2005). Perhaps the most critical is that the estimator assumes an isolated equilibrium population with a constant effective size, which may not hold for natural populations of endangered species. The heterozygote-excess method is based on the fact that when the breeding population is small, binomial sampling error produces allele frequency differences between male and female breeders, resulting in an excess of heterozygotes in their progeny (Robertson 1965). Like the linkage disequilibrium method, it has the advantage that only a single cohort sample is required, and it is further appealing because the estimate is easily computed. However, there are few applications of this method, presumably because of its low precision, as empirically shown by Luikart and Cornuet (1999).
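The heterozygote excess can be made explicit with a one-locus calculation. As a sketch, assume a biallelic locus whose allele frequencies are p_{m} in male and p_{f} in female breeders (symbols introduced here for illustration), with progeny formed by random union of male and female gametes. The expected heterozygosity of the progeny then exceeds the Hardy–Weinberg value computed from the pooled frequency \bar{p} = (p_{m} + p_{f})/2:

$$
\begin{aligned}
H_{\text{progeny}} &= p_{m}(1 - p_{f}) + p_{f}(1 - p_{m}),\\
H_{\text{HW}} &= 2\bar{p}(1 - \bar{p}),\\
H_{\text{progeny}} - H_{\text{HW}} &= \tfrac{1}{2}(p_{m} - p_{f})^{2} \ge 0.
\end{aligned}
$$

Because binomial sampling of a few breeders tends to make p_{m} and p_{f} differ, the expected excess grows as the number of breeders decreases.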
Several authors (Waples 1991; Pudovkin et al. 1996; Luikart and Cornuet 1999) have emphasized the importance of developing a method that gives an estimate independent of those from existing methods, because combining several independent estimates is expected to improve on the precision of each separate estimate. In the present study, a novel method is proposed for estimating N_{eb} from genetic data of a single cohort sample. The estimator is obtained from a simple parameter of allele sharing (molecular coancestry) among sampled individuals. The reliability of the new estimator is compared with that of the heterozygote-excess method using computer simulation. The improvement in reliability attained by combining the two methods is also examined.
Computer simulation
Computer simulation was carried out to evaluate the reliability of the proposed method. Genotypes of individuals in the initial population were generated by assigning alleles randomly sampled from an infinite (conceptual) gene pool with a uniform allele frequency distribution, with two alleles for the ‘low-polymorphic’ marker loci case or 10 alleles for the ‘high-polymorphic’ marker loci case. The number of loci was 80 in both cases. Prior to progeny sampling for the estimation of N_{eb}, eight generations of random mating under a breeding system defined below were simulated to accumulate inbreeding and relationship. Two breeding systems, monogamy and polygyny, were modeled. Under the monogamy model, equal numbers of male and female parents (N/2 each) were randomly paired to form N/2 permanent couples. An offspring (a parent of the next generation) was produced from a randomly sampled couple, and the sampling of a couple and reproduction were repeated until N/2 replacements of each sex had been obtained. Under the polygyny model, N_{m} males and N_{f} (>N_{m}) females were generated, and each female was mated with a randomly sampled male (thus, there were N_{f} fixed matings). An offspring was produced from a randomly sampled mating, and this was repeated to obtain N_{m} males and N_{f} females as parents of the next generation. In the final generation, a sample of n progeny was obtained in the same manner as reproduction under the respective breeding system. From the loci with at least two segregating alleles in the sampled progeny, L = 5–30 loci were randomly chosen as marker loci. The standard parental population sizes were N = 10 under monogamy, and N_{m} = 5 males and N_{f} = 20 females under polygyny. The sample size of progeny (n) in the final generation was 100 for both breeding systems.
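The breeding simulation described above can be sketched compactly in Python. This is a minimal illustration, not the author's program, and it relies on assumptions the text does not fix: loci are unlinked, offspring sex is assigned with probability 1/2, and an offspring of a sex whose quota is already full is discarded. All function names are hypothetical; the optional `weights` argument allows unequal couple contributions such as those examined under monogamy.

```python
import random

def random_genotype(rng, n_loci, n_alleles):
    """Initial genotype: alleles drawn from an infinite pool with uniform frequencies."""
    return [(rng.randrange(n_alleles), rng.randrange(n_alleles)) for _ in range(n_loci)]

def make_child(rng, sire, dam):
    """Mendelian transmission with unlinked loci: one random allele from each parent per locus."""
    return [(rng.choice(s), rng.choice(d)) for s, d in zip(sire, dam)]

def monogamy_matings(rng, males, females):
    """N/2 permanent couples formed by random pairing."""
    return list(zip(males, rng.sample(females, len(females))))

def polygyny_matings(rng, males, females):
    """Each female is mated with a randomly sampled male: N_f fixed matings."""
    return [(rng.choice(males), dam) for dam in females]

def next_generation(rng, matings, n_males, n_females, weights=None):
    """Sample a mating, produce one child, and repeat until both sex quotas are filled.
    Sex is assigned with probability 1/2; children of an already-full sex are discarded."""
    new_m, new_f = [], []
    while len(new_m) < n_males or len(new_f) < n_females:
        sire, dam = rng.choices(matings, weights=weights, k=1)[0]
        child = make_child(rng, sire, dam)
        if rng.random() < 0.5:
            if len(new_m) < n_males:
                new_m.append(child)
        elif len(new_f) < n_females:
            new_f.append(child)
    return new_m, new_f

def sample_progeny(rng, matings, n, weights=None):
    """Final generation: n progeny produced in the same way, without sex quotas."""
    return [make_child(rng, *rng.choices(matings, weights=weights, k=1)[0]) for _ in range(n)]

def choose_marker_loci(rng, progeny, n_markers):
    """Randomly choose n_markers loci among those with >= 2 alleles in the progeny sample."""
    segregating = [l for l in range(len(progeny[0]))
                   if len({a for g in progeny for a in g[l]}) >= 2]
    return rng.sample(segregating, n_markers)

# Monogamy with N = 10 parents, 80 loci with 10 initial alleles, eight generations of mating
rng = random.Random(1)
males = [random_genotype(rng, 80, 10) for _ in range(5)]
females = [random_genotype(rng, 80, 10) for _ in range(5)]
for _ in range(8):
    males, females = next_generation(rng, monogamy_matings(rng, males, females), 5, 5)
progeny = sample_progeny(rng, monogamy_matings(rng, males, females), 100)
markers = choose_marker_loci(rng, progeny, 15)
```

Polygyny follows the same pattern: build the matings with `polygyny_matings` and pass `len(males)` and `len(females)` as the quotas. Passing `weights=[0.4, 0.3, 0.1, 0.1, 0.1]` to `next_generation` or `sample_progeny` mimics one of the unequal-contribution patterns; in which generations the original program applied such weights is not stated in this section.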
In the low-polymorphic marker loci case, all marker loci have exactly two alleles (n_{l} = 2), as with single nucleotide polymorphisms, but the allele frequency distribution varies among loci. In the high-polymorphic marker loci case, both the allele frequency distribution and the number of alleles vary among loci. At the standard population sizes above, the average number of alleles per marker locus was 3.83 under monogamy and 5.31 under polygyny, comparable with the allele numbers of microsatellite markers in practical surveys. This type of data generation is referred to as the ‘inbred population’ model, in the sense that the parental population of the sampled progeny consists of inbred and related individuals, which is likely to be the general situation in populations of endangered species.
As another type of data generation, the ‘non-inbred population’ model was also simulated. The assignment of initial genotypes and the advancement of generations were exactly the same as in the inbred population, except that seven generations were advanced. In the final generation, the allele frequency distribution of each locus was recorded. Genotypes of parents were then regenerated by assigning alleles randomly sampled from an infinite gene pool with the recorded allele frequency distribution. The sampling of progeny and the choice of marker loci were the same as in the inbred population. These procedures produce a parental population consisting of non-inbred and unrelated individuals, but with the same quality of molecular information as the corresponding inbred population. This type of data generation may approximate a recently recolonized population in an ephemeral habitat.
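The regeneration step of the non-inbred model can be sketched as follows. In this minimal illustration (hypothetical function names), allele frequencies are recorded per locus and both alleles of every new parent are drawn independently from them, which removes inbreeding and relationship while keeping the allele frequency spectrum of the population they came from.

```python
import random
from collections import Counter

def allele_frequencies(population):
    """Per-locus allele frequencies of a list of diploid genotypes (lists of allele pairs)."""
    freqs = []
    for l in range(len(population[0])):
        counts = Counter(a for g in population for a in g[l])
        total = sum(counts.values())
        freqs.append({a: c / total for a, c in counts.items()})
    return freqs

def regenerate(rng, freqs, n_individuals):
    """Non-inbred, unrelated parents: both alleles at every locus are drawn independently
    from the recorded frequencies, as if from an infinite gene pool."""
    population = []
    for _ in range(n_individuals):
        genotype = [tuple(rng.choices(list(f), weights=list(f.values()), k=2)) for f in freqs]
        population.append(genotype)
    return population

rng = random.Random(7)
parents = [[(0, 1), (2, 2)], [(1, 1), (2, 3)]]  # toy population: 2 individuals, 2 loci
freqs = allele_frequencies(parents)
print(freqs)  # [{0: 0.25, 1: 0.75}, {2: 0.75, 3: 0.25}]
new_parents = regenerate(rng, freqs, 10)
```

In the simulation described above, this step would be applied to the population obtained after seven generations, and the regenerated parents would then produce the progeny sample.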
In additional computations, different sizes of parental population and progeny sample were examined. The effect of unequal contributions of parents on the estimates was also evaluated under monogamy with N = 10, by considering the following two patterns of contribution of the N/2 = 5 couples: (0.4, 0.3, 0.1, 0.1, 0.1) and (0.6, 0.1, 0.1, 0.1, 0.1). The number of replicated runs for each combination of population model, breeding system and variables was 5000.
The derivation of this equation is shown in Appendix B. N_{eb} from pedigree coancestry was also computed, simply by substituting the average parent-based pedigree coancestry of the sampled progeny into (7). The computed N_{eb} agreed well with N_{eb,demo}. Thus, only N_{eb,demo} is presented in the results, where it is referred to as the true value of the simulation. In addition to the molecular coancestry estimate obtained from (7), the estimate from the heterozygote-excess method (Pudovkin et al. 1996) was computed for comparison. The locus-specific heterozygote-excess estimate is

$$ \hat{N}_{eb(HE)} = \frac{1}{2D} + \frac{1}{2(D + 1)}, $$

where D is the relative heterozygote excess at the locus, computed from H_{obs,i} and H_{exp,i}, the observed and expected proportions of heterozygotes carrying allele i, respectively. The multilocus estimate was simply computed as the harmonic mean of the locus-specific estimates over the marker loci, following previous simulation studies (Pudovkin et al. 1996; Luikart and Cornuet 1999). In both methods, a negative estimate was regarded as infinite (∞).
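The heterozygote-excess calculation can be sketched in Python. The formula N_eb = 1/(2D) + 1/(2(D + 1)) is that of Pudovkin et al. (1996). How the allele-specific excesses (H_obs,i − H_exp,i)/H_exp,i are combined into the locus value D is not shown in this section, so the unweighted mean over alleles used below is an assumption, as is H_exp,i = 2p_i(1 − p_i). Function names are hypothetical.

```python
import math

def locus_d(genotypes):
    """Relative heterozygote excess at one segregating locus, averaged over alleles
    (unweighted mean: an assumption). genotypes: list of (allele, allele) tuples."""
    n = len(genotypes)
    alleles = sorted({a for g in genotypes for a in g})
    d_values = []
    for i in alleles:
        p = sum(g.count(i) for g in genotypes) / (2 * n)
        h_exp = 2 * p * (1 - p)  # expected proportion of heterozygotes carrying allele i
        h_obs = sum(1 for a, b in genotypes if a != b and i in (a, b)) / n
        d_values.append((h_obs - h_exp) / h_exp)
    return sum(d_values) / len(d_values)

def neb_from_d(d):
    """N_eb = 1/(2D) + 1/(2(D+1)); a negative estimate is recorded as infinity.
    D = 0 or D = -1 make the formula undefined and are also mapped to infinity (assumption)."""
    if d == 0 or d == -1:
        return math.inf
    n_hat = 1 / (2 * d) + 1 / (2 * (d + 1))
    return math.inf if n_hat < 0 else n_hat

def harmonic_mean(values):
    """Harmonic mean; an infinite value contributes 1/inf = 0 to the sum of reciprocals."""
    inv = sum(1 / v for v in values)
    return math.inf if inv == 0 else len(values) / inv

def multilocus_neb_he(loci):
    """loci: one genotype list per marker locus, for the same sample of progeny."""
    return harmonic_mean([neb_from_d(locus_d(g)) for g in loci])

excess = [(0, 1)] * 4                    # all heterozygous: D = 1
hwe = [(0, 0), (0, 1), (0, 1), (1, 1)]   # Hardy-Weinberg proportions: D = 0
print(neb_from_d(locus_d(excess)))       # 0.75
print(multilocus_neb_he([excess, hwe]))  # 1.5
```

An uninformative locus (D = 0) yields an infinite locus estimate, which drops out of the harmonic mean instead of dominating it.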
As a criterion of evaluation, the harmonic mean of the estimates over 5000 replicates was computed. Furthermore, to characterize the variation and distribution of the estimates, the 10th, 50th and 90th percentiles over replicates were calculated. The xth percentile was taken as the [5000 × (x/100)]th smallest of the 5000 replicated estimates.
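These summary statistics follow directly from the rule above. In the minimal sketch below (hypothetical function names), infinite estimates are kept and sort after every finite value, so they fall into the upper percentiles; truncating a non-integer rank is an assumption.

```python
import math

def percentile(estimates, x):
    """x-th percentile taken as the (R * x / 100)-th smallest of R replicate estimates."""
    ranked = sorted(estimates)                  # math.inf sorts after all finite values
    rank = max(1, int(len(ranked) * x / 100))   # 1-based rank; truncation is an assumption
    return ranked[rank - 1]

def harmonic_mean(values):
    """Harmonic mean over replicates; an infinite estimate contributes 1/inf = 0."""
    inv = sum(1 / v for v in values)
    return math.inf if inv == 0 else len(values) / inv

# Toy replicate set: 85 finite estimates and 15 infinite ones (negative estimates)
estimates = [float(v) for v in range(1, 86)] + [math.inf] * 15
print([percentile(estimates, x) for x in (10, 50, 90)])  # [10.0, 50.0, inf]
```

Under this rule, once more than 10% of replicate estimates are infinite, the 90th percentile is infinite as well, whereas the harmonic mean stays finite as long as at least one estimate is finite.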
Results and discussion
The left and middle panels of Fig. 1 (A: monogamy; B: polygyny) show the 10th, 50th and 90th percentiles and the harmonic mean of 5000 replicated estimates of the effective number of breeders (N_{eb}) from the heterozygote-excess and molecular coancestry methods applied to the non-inbred population with L = 5–20 high-polymorphic marker loci. The three percentiles indicate that the distributions of estimates from both methods are skewed upward. The 50th percentile and the harmonic mean were, however, close to N_{eb,demo} (10 for monogamy and 13.79 for polygyny) in both methods. Under monogamy, the interval between the 10th and 90th percentiles of the heterozygote-excess estimates tended to be wider than that of the molecular coancestry estimates, whereas the reverse tendency was observed under polygyny.
In a strict sense, the heterozygote-excess method is valid only when the progeny are produced by random union of gametes (Pudovkin et al. 1996; Luikart and Cornuet 1999). When the progeny are produced by individual-based pairwise matings, such as monogamy and polygyny, the sample of progeny is family-structured. In such a sample, the heterozygote deficiency generated by the inter-family Wahlund effect may mask the heterozygote excess, reducing the usefulness of the heterozygote-excess method (Luikart and Cornuet 1999). Using computer simulation, Luikart and Cornuet (1999) examined the effect of a family-structured sample on the reliability of the heterozygote-excess method. They found that the method gives an essentially unbiased estimate even with a family-structured sample. However, family structure in the sampled progeny substantially increased the variance of estimates under monogamy. The simulation data of Luikart and Cornuet (1999) were generated in the same manner as the non-inbred population of the present study; thus, their progeny samples contained only sib families. In contrast, the sample of progeny from the inbred population consists of families with various degrees of relationship (e.g. cousins). The wider confidence interval observed in Fig. 2 indicates that applying the heterozygote-excess method to such a sample reduces its reliability, although the method still gives an unbiased estimate. This reduction in reliability is more serious under monogamy (Fig. 2).
As detailed information on the estimation process of the molecular coancestry method, Table 1 gives the observed and estimated [from equation (6)] AIS probability (s_{l}) in the parental population, and the average estimated parent-based coancestry among actual non-sibs (NS), actual half-sibs (HS), actual full-sibs (FS) and all pairs of sampled progeny, for monogamy and polygyny with L = 15 high-polymorphic marker loci. All values are averages over 5000 replicates (and over the 15 marker loci for s_{l}). In the non-inbred population, the estimated AIS probability was close to the observed value, so that the average estimates of parent-based coancestry in the three categories (NS, HS and FS) were close to the pedigree coancestries, i.e. 0, 0.125 and 0.25 for NS, HS and FS, respectively. Thus, the molecular coancestry method gives an essentially unbiased estimate of N_{eb} for the non-inbred population (Fig. 1). However, the process of selecting putative non-sibs causes a problem when the method is applied to the inbred population. The selection method may pick actual non-sibs with reasonably high probability, but the putative non-sibs selected from the inbred population may be less related, through more remote ancestors, than the average non-sib pair among the sampled progeny. As seen from Table 1, this causes an underestimation of the AIS probability, implying that the base population for coancestry is set at a generation further back than the parental generation. This overshoot in setting the base population results in an overestimation of the parent-based coancestry, leading to the downward bias of the molecular coancestry estimate observed in Fig. 2. Despite this drawback, the narrow confidence interval of the molecular coancestry estimate in the inbred population is attractive for practical use. Although the molecular coancestry method may be less useful for a point estimate of N_{eb} in inbred populations, it should be useful for detecting a small N_{eb}.
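A short sketch shows why an underestimated AIS probability inflates the parent-based coancestry. Equations (6) and (7) are not reproduced in this section, so the code assumes the common alike-in-state model f_M = f + (1 − f)s, where f_M is molecular coancestry, f is parent-based coancestry and s is the AIS probability; the paper's equations may differ in detail. Function names and the illustrative values of s are hypothetical.

```python
def molecular_coancestry(gx, gy):
    """One locus: probability that an allele drawn at random from x and one from y
    are identical in state."""
    return sum(a == b for a in gx for b in gy) / 4

def parent_based_coancestry(f_m, s):
    """Assumed AIS model f_M = f + (1 - f) * s, solved for f (s = AIS probability)."""
    return (f_m - s) / (1 - s)

def mean_parent_based(gx, gy, s):
    """Average over loci; gx, gy are lists of (allele, allele) tuples and
    s[l] is the AIS probability at locus l."""
    vals = [parent_based_coancestry(molecular_coancestry(a, b), s_l)
            for a, b, s_l in zip(gx, gy, s)]
    return sum(vals) / len(vals)

print(molecular_coancestry((0, 2), (0, 3)))  # 0.25
# Same molecular coancestry with two values of s: the smaller s gives the larger f
print(parent_based_coancestry(0.5, 0.36), parent_based_coancestry(0.5, 0.34))
```

Under this model, f = (f_M − s)/(1 − s) has derivative (f_M − 1)/(1 − s)² with respect to s, which is never positive, so any underestimation of s raises f. This matches the direction of the inflated coancestries for the inbred population in Table 1.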
Table 1. Observed and estimated AIS probability, and estimated parent-based coancestries among actual non-sibs (NS), actual half-sibs (HS), actual full-sibs (FS) and all pairs of sampled progeny from the non-inbred and inbred parental populations under monogamy with N = 10 parents or polygyny with N_{m} = 5 male and N_{f} = 20 female parents, for the case of L = 15 high-polymorphic marker loci and a sample size of n = 100.

| Breeding system | Population | AIS probability (observed) | AIS probability (estimated) | Coancestry, actual NS | Coancestry, actual HS | Coancestry, actual FS | Coancestry, all pairs |
|---|---|---|---|---|---|---|---|
| Monogamy | Non-inbred | 0.3587 | 0.3571 | 0.0045 | – | 0.2552 | 0.0546 |
| | Inbred | 0.3565 | 0.3366 | 0.0346 | – | 0.2651 | 0.0806 |
| Polygyny | Non-inbred | 0.2967 | 0.2972 | 0.0008 | 0.1259 | 0.2503 | 0.0370 |
| | Inbred | 0.2981 | 0.2830 | 0.0237 | 0.1418 | 0.2592 | 0.0579 |

–, not applicable (no half-sibs arise under monogamy).
The simulation results for estimation with low-polymorphic marker loci are shown in the left and middle panels of Fig. 3 (A: non-inbred; B: inbred populations) under monogamy. Results under polygyny (data not shown) were essentially similar to those under monogamy. As seen from the 10th and 90th percentiles of the heterozygote-excess estimates, this method suffers from a wide confidence interval. In fact, even with L = 30 marker loci, the 90th percentile of the heterozygote-excess estimates was still infinite in both the non-inbred and inbred populations. In contrast, the molecular coancestry method gave estimates with a practically acceptable confidence interval when L = 30 marker loci were available.
Table 2 shows the results of simulation runs with additional combinations of the number of parents and sample size, for L = 15 high-polymorphic marker loci. Because the harmonic mean of the replicated estimates agreed well with the 50th percentile, it is not shown in the table. The general properties of the estimates, e.g. the small bias of both methods in the non-inbred population and the downward bias of the molecular coancestry estimate in the inbred population, were similar to those observed in Figs 1–3. A remarkable point in Table 2 is the narrower confidence interval of the molecular coancestry estimate for a small sample of progeny from a small inbred population. For example, under monogamy with N = 10 parents, the 90th percentile of the molecular coancestry estimate from n = 10 progeny was 38.2, whereas the corresponding percentile of the heterozygote-excess estimate was infinite. In most practical situations in conservation biology, the population in question will be small and inbred, and may suffer from low reproductive ability. The molecular coancestry method could contribute significantly to detecting a small N_{eb} in such populations. The downward bias of the molecular coancestry estimate increased in larger inbred populations, as seen from the 50th percentiles under monogamy with N = 50 and under polygyny with N_{m} = 20 and N_{f} = 80, which may limit the usefulness of the method. However, even in these populations, the narrow confidence interval of the molecular coancestry estimate would be of practical value for obtaining a conservative estimate of N_{eb}.
The effect of unequal contributions of parents on the estimates of N_{eb} is shown in Table 3, which assumes monogamy with N = 10 parents (half of each sex) and a sample size of n = 100 offspring. In all cases computed, the 90th percentile for the molecular coancestry method was much smaller than that for the heterozygote-excess method. As unequal contribution of parents is an important reason why N_{e} is smaller than the census number of breeders (Frankham 1995), the higher precision of the present method observed in Table 3 is a practically appealing feature.
Table 3. Percentiles (10th, 50th and 90th) of the estimated effective number of breeders over 5000 replicated simulation runs with unequal contribution of parents under monogamy, in the non-inbred and inbred populations with N = 10 parents (half of each sex) and a sample size of n = 100. HE, heterozygote-excess method; MC, molecular coancestry method; Combined, harmonic mean of the HE and MC estimates.

| Contribution | N_{eb,demo} | Population | HE 10th | HE 50th | HE 90th | MC 10th | MC 50th | MC 90th | Combined 10th | Combined 50th | Combined 90th |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.4, 0.3, 0.1, 0.1, 0.1 | 7.18 | Non-inbred | 4.53 | 8.14 | 302.02 (9.3) | 3.59 | 6.91 | 18.55 (2.1) | 4.81 | 7.31 | 13.46 (0.2) |
| | | Inbred | 4.07 | 8.30 | ∞ (16.9) | 2.69 | 5.45 | 14.09 (1.1) | 4.09 | 6.31 | 10.95 (0) |
| 0.6, 0.1, 0.1, 0.1, 0.1 | 5.03 | Non-inbred | 3.80 | 6.82 | 107.07 (8.8) | 2.26 | 4.74 | 13.90 (2.0) | 3.40 | 5.42 | 9.94 (0.1) |
| | | Inbred | 3.63 | 7.24 | ∞ (14.6) | 1.76 | 4.17 | 12.50 (1.6) | 2.96 | 5.02 | 8.90 (0.1) |
Figure 4 shows the joint distribution of estimates from the heterozygote-excess and molecular coancestry methods applied to the inbred population under polygyny with N_{m} = 5 and N_{f} = 20 parents and L = 15 high-polymorphic marker loci. The product-moment and Spearman’s rank correlations, excluding pairs with an infinite estimate, were −0.003 and −0.164, respectively. Correlations of similar magnitude were obtained in all other cases simulated. An interesting point in Fig. 4 is that overestimation by the two methods tends to occur in different replicates, i.e. the incidences of overestimation are nearly exclusive. At present, it is not theoretically obvious how to combine several estimates of N_{eb} optimally into a single best estimate (Wang 2005). As a tentative method, I combined the two estimates as their harmonic mean, following the suggestion of Waples (1991):

$$ \hat{N}_{eb(C)} = \frac{2}{1/\hat{N}_{eb(HE)} + 1/\hat{N}_{eb(MC)}}, $$

where \hat{N}_{eb(HE)} and \hat{N}_{eb(MC)} are the heterozygote-excess and molecular coancestry estimates, respectively, and an infinite estimate contributes 1/∞ = 0.
The harmonic mean is expected to work well here because overestimation by the two methods tends to be exclusive: an overestimated N_{eb} returned by one method is filtered out, and the combined estimate is largely determined by the estimate from the other method. The properties of the combined estimate are shown in the right panels of Figs 1–3 and in the combined-estimate column of Tables 2 and 3. The combined estimate in the inbred population was biased downward because of the downward bias of the molecular coancestry estimate. However, as expected, its confidence interval was substantially narrower than those of the separate estimates. Notably, the improvement was larger for lower marker quality, i.e. for fewer marker loci and/or fewer alleles per locus (Figs 1–3), and for smaller sample sizes (Table 2). Although an optimal method for combining separate estimates into a single estimate deserves further investigation with more sophisticated statistical tools, the above results strongly suggest that a highly reliable estimate can be obtained from an optimal combination.
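The combination rule is a one-line calculation. A minimal sketch (hypothetical function name) that also handles infinite inputs, which arise whenever a negative estimate is recorded as ∞:

```python
import math

def combined_estimate(n_he, n_mc):
    """Harmonic mean of the heterozygote-excess and molecular coancestry estimates.
    An infinite estimate contributes 1/inf = 0 to the sum of reciprocals."""
    inv = 1 / n_he + 1 / n_mc
    return math.inf if inv == 0 else 2 / inv

print(combined_estimate(8.0, 8.0))            # 8.0
print(combined_estimate(math.inf, 7.0))       # about 14: set entirely by the finite estimate
print(combined_estimate(math.inf, math.inf))  # inf
```

Because the harmonic mean of two positive numbers never exceeds twice the smaller one, a large overestimate from one method can at most double the other method's estimate, which is the filtering property described above.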
Some limitations of the proposed method are shared by most published methods: marker alleles are assumed to be selectively neutral, mating within the population is assumed to be random, and immigration from other populations is assumed to be absent (Leberg 2005). In addition, the present method involves a problem associated with age at sampling. Estimation of N_{e} from the recurrence equation (1) assumes that the average coancestries in two successive generations are measured at the same age stage. In fact, applying the present method to a sample of juveniles gives an estimate of ‘the effective number of breeders’. But even in a population with non-overlapping generations, this estimate can differ greatly from N_{e}, depending on the pattern of survival from juvenile to adult. Following Crow and Morton (1955), we consider two extreme patterns of survival: (i) random survival and (ii) survival of the family as a unit. In the random survival model, survival from juvenile to adult is determined at random with expected survival rate s. Under this pattern, the average coancestry among adults is expected to be unchanged from that among juveniles. Thus, if the present method is applied to a population with non-overlapping generations, the estimate from juveniles is expected to approximate N_{e}. Under survival of the family as a unit, all juveniles in a family either survive or die together. With average survival rate s in the population, the estimate obtained from a sample of juveniles is then related to N_{e} through s (for the theoretical basis of this consideration, see Appendix C). Although this model describes an extreme pattern of survival, estimates for animals with low fecundity and high survival rate, such as mammals and birds, in which parents generally nurse their brood, should be interpreted cautiously.
On the other hand, the estimate should approximate N_{e} when the method is applied to animals with high fecundity and low survival rate, such as marine invertebrates and fishes, whose survival seems to be essentially random.
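A toy simulation illustrates the contrast between the two survival patterns. It rests on simplifying assumptions not stated in the text: equal-sized full-sib families from non-inbred, unrelated parents, so that pedigree coancestry is 0.25 within families and 0 between them. It tracks only the average pairwise coancestry, not the estimator itself or the relation derived in Appendix C; all names are hypothetical.

```python
import random
from collections import Counter

def mean_pair_coancestry(family_ids):
    """Average pedigree coancestry over all distinct pairs: 0.25 for full sibs,
    0 between families (non-inbred, unrelated parents assumed)."""
    n = len(family_ids)
    sib_pairs = sum(c * (c - 1) / 2 for c in Counter(family_ids).values())
    return 0.25 * sib_pairs / (n * (n - 1) / 2)

def random_survival(rng, family_ids, s):
    """Each juvenile survives independently with probability s."""
    return [f for f in family_ids if rng.random() < s]

def family_unit_survival(rng, family_ids, s):
    """Each family survives or dies as a whole with probability s."""
    alive = {f for f in sorted(set(family_ids)) if rng.random() < s}
    return [f for f in family_ids if f in alive]

rng = random.Random(3)
juveniles = [f for f in range(50) for _ in range(20)]  # 50 full-sib families of 20
print(mean_pair_coancestry(juveniles))
print(mean_pair_coancestry(random_survival(rng, juveniles, 0.3)))
print(mean_pair_coancestry(family_unit_survival(rng, juveniles, 0.3)))
```

In this toy setting, random survival leaves the average close to the juvenile value, whereas family-unit survival leaves fewer families, so a larger share of surviving pairs are sibs and the average coancestry among adults rises.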
The present method involves additional problems associated with the selection method for putative non-sibs. One is how to determine the number (n_{0}) of pairs selected as putative non-sibs. Although the selection method applied in the present study automatically sets n_{0} equal to the number (n) of sampled progeny, this is an arbitrary choice. With a smaller n_{0}, the selected pairs are more likely to be actual non-sibs, but the coancestry among them will underestimate the AIS probability, and vice versa. Another problem is drift-induced linkage disequilibrium among marker loci. In small populations, drift-induced linkage disequilibrium may be an important factor (Hill 1981) and may reduce the extent to which loci provide independent information about coancestry. This may reduce the effectiveness of the selection criterion for putative non-sibs defined by equation (8). One potential way of solving these problems and improving the estimates of N_{eb} from molecular coancestry is the use of a sibship reconstruction technique. To date, several methods for sibship reconstruction from molecular markers have been developed using different algorithms, such as Markov chain Monte Carlo (MCMC) (Almudevar and Field 1999; Thomas and Hill 2002; Wang 2004) and simulated annealing (Almudevar 2003; Fernández and Toro 2006); these have been reviewed by Blouin (2003) and Butler et al. (2004). Here I take the method proposed by Fernández and Toro (2006) as a trial example of using a sibship reconstruction technique to estimate N_{eb}. With their method, we can find the sibships among sampled individuals that yield a parent-based coancestry matrix with the highest correlation with the molecular coancestry matrix. A notable feature of their method is that it does not assume linkage equilibrium among marker loci.
Two ways of using the reconstructed sibships were examined. In the first (SR1), the reconstructed sibships were used directly to compute the average parent-based coancestry entering equation (7). In the second (SR2), the average locus-specific coancestry among the inferred non-sib pairs was used to estimate s_{l} as in equation (6). Simulations with 200 replicates were run for polygyny in the inbred population with N_{m} = 5 and N_{f} = 20 parents, a sample of n = 100 progeny and L = 15 high-polymorphic marker loci. The results are summarized in Table 4. Both methods with sibship reconstruction worked quite well, giving nearly unbiased estimates and narrower confidence intervals. Although further evaluation, including other published methods for sibship reconstruction, should be carried out under a wide range of scenarios, the results in Table 4 suggest the potential for improving the molecular coancestry method.