Nearly unbiased estimator of Ne/N based on kinship relationships

This study develops a nearly unbiased estimator of the ratio of the contemporary effective mother size to the census size in a population, as a proxy of the ratio of contemporary effective size to census size (Ne/N). The proposed estimator is based on both known mother–offspring (MO) and maternal-sibling (MS) relationships observed within the same cohort, in which sampled individuals in the cohort probably share MO relationships with sampled mothers. The rationale is that the frequency of MO and MS pairs contains information regarding the contemporary effective mother size and the (mature) census size, respectively. Therefore, the estimator can be obtained only from genetic data. Moreover, We also evaluate the performance of the estimator by running an individual-based model. The results of this study provide the following: i) parameter range for satisfying the unbiasedness, and ii) guidance for sample sizes to ensure the required accuracy and precision, especially when the order of the ratio is available. Furthermore, the results demonstrate the usefulness of a sibship assignment method for genetic monitoring, providing insights for interpreting environmental and/or anthropological factors fluctuating Ne/N, especially in the context of conservation biology and wildlife management.


INTRODUCTION
The Estimation of the ratio of the contemporary effective population size to the census size (N e /N) has attracted much research attention for providing information about a current population, especially in the context of conservation biology and wildlife management (Frankham, Bradshaw, & Brook, 2014;Palstra & Fraser, 2012). Small N e /N demonstrates large variance in reproductive success (Akita, 2019;Wang, Santiago, & Caballero, 2016;Waples, 2016), resulting from the variance of reproductive potential (e.g., the big old fat fecund female fish hypothesis; Hixon, Johnson, & Sogard, 2014) or from the situation in which only some families successfully reproduce (referred to as the "Sweepstakes reproductive success" hypothesis, Hedgecock & Pudovkin, 2011).
Moreover, if N e /N is invariant across years, then N e may behave like an index of N, and vice versa (Luikart, Ryman, Tallmon, Schwartz, & Allendorf, 2010). However, if N e /N fluctuates across years, the trends can clarify the interpretation of environmental and/or anthropological factors, causing the variance of reproductive potential, family-correlated survivorship, or fluctuating population dynamics.
The estimation of N e /N has been performed by utilizing the estimated values of contemporary effective population size (N e ) and census size (N), unless complete pedigree data and/or full census data are available. Additionally, there are numerous methods for estimating N e from genetic markers (Wang et al., 2016, and the references contained therein). There are also numerous methods for estimating N, such as a mark-recapture method or population dynamics modeling with survey data (e.g., Kéry & Schaub, 2011;Methot & Wetzel, 2013;Quinn & Deriso, 1999;Seber, 1982). It is known that there are large variations in both estimators; thus, their combination (i.e., the estimator of N e /N) also shows large variation (Marandel et al., 2018;Palstra & Fraser, 2012). There is currently a little theoretical foundation for the estimator of N e /N, indicating no guidance for a sample size to ensure the required accuracy and precision.
Close-kin mark-recapture (CKMR) is a recently developed method for estimating N that utilizes the information about kinship in a sample. This was possible owing to the recent advances in genetic methods for kinship determination (Bravington, Grewe, & Davies, 2016;Bravington, Skaug, & Anderson, 2016;Hillary et al., 2018;Skaug, 2017) although similar methods have been proposed in the beginning of the 21st century (Nielsen, Mattila, Clapham, & Palsbøll, 2001;Pearse, Eckerman, Janzen, & Avise, 2001;Skaug, 2001). Besides, the rationale is that the presence of a kinship pair in the sample is analogous to the recapture of a marked individual in mark-recapture.
Therefore, kinship pairs in the sample are less likely to be observed in larger populations; thus, the number of kinship pairs may reflect N. Since the original CKMR is designed to estimate adult abundance (i.e., N), the monitoring data for CKMR also produce the estimator of N e by detecting half-sibling (HS) pairs within the same cohort (Akita, 2019). Since kinship determination is more accurate, this kinship-oriented estimation of N e was presented in the context of the sibship assignment method (Wang, 2009) and is expected to provide a much more accurate estimator.
In this study, we propose a new method for estimating the ratio of contemporary effective mother size to the census size (N e,m /N m ) in a population, as a proxy of N e /N. Assuming that kinships are genetically detected without any error, this method is based on the numbers of maternalsibling (MS) and mother-offspring (MO) pairs in a sample. Thus, sampling is completed at a single breeding time; sampling offspring within the same cohort and mothers probably shares MO relationship with sampled offspring. Our model provides a nearly unbiased estimator of N e,m /N m that explicitly incorporates two types of overdispersed reproduction (i.e., parental and nonparental variations; Akita, 2019), making it possible to target a species that shows iteroparity (i.e., multiple reproductive cycles during the lifetime) and/or sweepstakes reproductive success. First, we explain the modeling assumption and sampling scheme. Then, we analytically determine (nearly) the un-biased estimators of N e,m , 1/N m , and N e,m /N m , which are based on the numbers of MS and/or MO pairs. Finally, by running an individual-based model, we investigate the performance of the estimator and provide a guide for a sample size. Moreover, it is noteworthy that our modeling framework can be applied to diverse animal species. However, the description of the model focuses on fish species, which are presently the best candidate target of our proposed method.

THEORY
The main symbols used in this paper are summarized in Table 1.

Hypothetical population
We suppose that there is a hypothetical population comprising N m mothers and there is also no population subdivision or spatial structure. In this study, a mature female is called a mother even if she does not produce offspring. For mathematical tractability, we assume that only one spawning ground exists in which the mothers remain for the entire spawning season. Following Akita (2019), our modeling framework employs two types of overdispersed reproduction: parental and nonparental variations. Thus, the former indicates a variation caused by the mother's covariates, such as age, weight, and residence time on the spawning ground, while the latter indicates a variation caused by non-random stochastic events during a series of reproductive episodes, which are independent of the mother's covariates, such as family-correlated survivorship or the mating behavior effects (e.g., competition for males/females and correlation between mating opportunities of the mother and the number of her offspring). Let k i denote the number of surviving offspring of mother i (i = 1, 2, . . . , N m ) during sampling.
It is noteworthy that k i is assumed to be observed at the sampling, as explained in the next subsection. Following Akita (2019) and giving the expected number of the surviving offspring per mother during the sample timing, λ i (> 0), k i is assumed to follow a negative binomial distribution through a conventional parametrization: where φ (> 0) denotes the overdispersion parameter that describes the degree of the nonparental variation. Presently, φ is assumed to be constant across mothers, whereas the expected number of the surviving offspring (λ i ) varies across mothers. The mean and variance of this distribution are denoted by λ i and λ i + λ 2 i /φ , respectively. In the limit of infinite φ , this distribution becomes a Poisson distribution, which is given by Pr[k i |λ i ] = λ k i i e −λ i /(k i !). We assumed that λ i is independent and identically distributed with the density function f (λ ), which produces a parental variation.
The shape of the density function is often complex, but it may be described by information, for example, the mother's weight composition in the population. The specific form of f (λ ) is given in Supporting Information and is used for running an individual-based model.

Sampling
To obtain the estimator of N e,m /N m , we utilize the number of MS and MO pairs observed in a sample. For the mathematical tractability, only one reproductive season is targeted for sampling.
Thus, whether the sampling method does not affect our modeling framework whether it is invasive or noninvasive. In the population, n M mothers are randomly sampled immediately after the end of the reproductive season. Additionally, in the population, n O offspring are also randomly sampled but allowed to die before sampling. The numbers of MS and MO pairs are determined by pairwise comparison of all the sample individuals ( n O C 2 and n M n O comparisons, respectively). As depicted in Fig 1, six offspring and three mothers are sampled and two MS and three MO pairs are observed.
In our modeling framework, if an MS pair also shares a paternal-sibling (PS) relationship, we count it as an MS pair and ignore the existing full-sibling (FS) relationship.
2.3 Effective mother size (N e,m ) and the ratio to census size (N e,m /N m ) Akita (2019) derived the approximate probability showing that two offspring share an MS relationship with an arbitrary mother (denoted by π MS ) as follows: where Without both parental and nonparental variations (i.e., λ is constant among mothers and φ → ∞), π MS converges to 1/N, corresponding to the Poisson variance for all mothers in a population.
Moreover, the effect of the two factors causing a deviation from the Poisson variance can be combined as parameter c (≥ 1). In the sequel, we refer to "overdispersion" as the distribution of the offspring number that results from this combined effect. By applying the combined effect, the variance of the offspring number can be given by suggesting that the variance substantially increases with c.
Akita (2019) defined the contemporary effective mother size as follows: Besides, this definition is related to the inbreeding effective population size (Nordborg & Krone, 2002;Wang, 2009). When sampling from a single cohort in a population with overlapping generations, the effective mother size in our definition corresponds to the effective breeding mother size that produces a single cohort. We obtain the ratio of the effective mother size to census size using Eqs. 2 and 4 (Akita, 2019), and it is given by where N m 1 is assumed for approximation.

Estimator of N e,m /N m
This subsection proposes the estimator of N e,m /N m as follows: A "hat" denotes the estimator of a variable in this study. The requisite condition that satisfies Eq. 6 is independent of N e,m and 1/N m . This property will be shown later in this subsection. Akita (2019) derived the nearly unbiased estimator of N e,m , which is given by where H obs MS denotes the observed number of MS pairs in a sample. This estimator works well unless n O is very small. Akita (2018) obtained a probability in which a randomly sampled mother and her offspring shares an MO relationship, which is given by This indicates that π MO is independent of the distribution of the offspring number. By definition of π MO , we can set its estimator by H obs MO /(n M n O ), where H obs MO denotes the observed number of MO pairs in a sample. Thus, the estimator of 1/N m can be determined as follows: Finally, substituting N e,m (Eq. 7) and 1/N m (Eq. 9) into Eq. 6, we obtain the estimator of N e,m /N m as follows: Let both n M and n O be given. We numerically confirmed that there is no correlation between H obs MO and H obs MS (results are not shown). To intuitively explain this independency, we consider three mothers (i = 1, 2, 3) and their offspring, and assume that (k 1 , k 2 , k 3 ) = (3, 1, 1) and (n M , n O ) = (1, 3). Moreover, when the three offspring born to the first mother are sampled (i.e., H obs an offspring is sampled from each mother's offspring (i.e., H obs MS = 0), the expected number of MO relationship is also one (= 1/3 × 1 + 1/3 × 1 + 1/3 × 1). Therefore, we conclude that both N e,m and 1/N m are independent, and N e,m /N m is expected to work well (see details in the RESULTS section).
The bias of N e,m /N m is defined by b, which is approximately given by (see APPENDIX for It is noteworthy that N e,m /N m is downwardly biased, especially when n O is very small. However, this bias may be ignored for a wide range of parameters (see details in the RESULTS section). Theoretically, b asymptotically converges to zero as n O increases, making N e,m /N m a nearly unbiased estimator. Moreover, as demonstrated in the RESULTS section, it is observed that an extremely skewed reproduction breaks down the unbiasedness (e.g., in the case that c = 20 and 100 in the results).

Individual-based model
We developed an individual-based model that tracks kinship relationships to evaluate the estimator's performance (e.g., N e,m /N m ). The population structure was assumed to be identical to that in the development of the estimators. In addition, the population comprised mothers and their offspring, and it was assumed to follow a Poisson or negative binomial reproduction. The expected number of the surviving offspring of a mother followed the density distribution f (λ ) (see Sup- porting Information for details). We calculated overdispersion parameter (c) from φ and f (λ ), as well as confirmed numerically that the value of c gives the same statistics of the estimators even if the combination of φ and f (λ ) differs (results are not shown). Therefore, each offspring retained the mother's ID, making it possible to trace an MS and MO relationship.
Let a parameter set (n O , n M , N m , φ , and parameters that determine f (λ )) be given. We simulated a population history in which N m mothers generated offspring; this process was repeated 100 times. The sampling process for each history was repeated 10,000 times, acquiring 1,000,000 data points that were utilized to construct the distribution of the estimators for each parameter set.
Furthermore, true value of N e,m was calculated from N m and c (Eqs. 2 and 4).

RESULT
We  Fig. 2. For most of the investigated parameter sets, we observed that their relative error is less than 10%. As expected, the relative error is not affected by n M since 1/N m is exactly an unbiased estimator of 1/N m (see Eq. A2 in APPENDIX and also Fig. S2 in Supporting Information). Meanwhile, N e,m is downwardly biased when n O is relatively small to true N e,m (e.g., see c = 1 in Fig. 2 and also Fig. S1 in Supporting Information), as presented in Akita (2019); thus, N e,m /N m is downwardly biased.
Contrary to the theoretical prediction for the direction of the bias (Eq. 11), relatively strong overdispersion results in an upwardly bias for N e,m /N m when c is relatively large (e.g., c = 20 and 100 in in Supporting Information, the dependency results from the combined effects of variation of both 1/N m and N e,m . As c increases, it is noteworthy that the parameter space of sample sizes demonstrating large variation of 1/N m (e.g., CV > 30%) expands; however, when c is small (e.g., c = 1), relatively small n O results in large variation of N e,m because of a relatively large N e,m .

DISCUSSION
We theoretically developed a nearly unbiased estimator of the ratio of contemporary effective mother size to the census size (N e,m /N m ) in a population (Eq. 10). The proposed estimator is based on known MO relationship and MS relationships observed within the same cohort, in which sampled individuals in the cohort probably share MO relationships with sampled mothers (Fig 1). Moreover, the performance of the estimator (accuracy and precision) was quantitatively evaluated by running an individual-based model (Figs. 2 and 3). Meanwhile, the bias is analytically provided (Eq. 11). Our modeling framework utilizes two types of reproductive variations (Akita, 2019): variance of the average offspring number per mother (parental variation, denoted by f (λ )), and variance of the offspring number across mothers with the same reproductive potential (nonparental variation, denoted by φ ). Additionally, these two effects result in a skewed distribution of offspring number and are summarized into one parameter (c) in the framework.
Thus, our estimator can be calculated from sample sizes of mother and offspring ( Our modeling framework is presented by combining the context of the sibship assignment method (for estimating N e,m ) and the CKMR method (for estimating 1/N m ), which defines a kinship-oriented estimation of effective/census population size. Therefore, improvements to these methods directly contribute to the estimation of N e /N. Furthermore, the original theory of the sibship assignment method requires HS and FS pairs but does not require a distinction between the MS and PS pairs. This is a significant advantage due to the difficulty of the distinction from genetic data. However, the limitation of using MS or PS pair enables us to employ a nearly unbiased estimator of N e for particular sex (Akita, 2019), which greatly improves the accuracy of the estimation of the N e,m in this study and thus that of N e,m /N m . It is noteworthy that the estimator of 1/N is given by where n P and H obs PO denotes the sample size of the parent and the observed number of parentoffspring (PO) pairs in a sample, respectively (Bravington, Skaug, & Anderson, 2016). The development of the unbiased estimator of N e without a distinction between MS and PS pairs that could provide an unbiased estimator of N e /N coupled with Eq. 12, is a study for the future. Furthermore, using cross-cohort HS pairs, the CKMR method also provides the estimator of N (Bravington, Skaug, & Anderson, 2016) that does not require the sampling of the parent, which probably provides the estimator of N e /N only from unmatured samples. This perspective of the study will also be conducted in the future. π MO Probability that a randomly selected pair (mother and offspring) shares a mother-offspring relationship π MS Probability that a randomly selected pair (two offspring) shares a maternal-sibling relationship b Bias of N e,m /N m As stated in the main text, both N e,m and 1/N m are independent. Thus, the first term in the righthand side of Eq. A1 can be ignored. The expectation of 1/N m is given by which is illustrated in Appendix D of Akita (2019). Together with these relationships, we can obtain the bias of N e,m /N m described in Eq. 11.

Supporting Information
Probability density function and its moment of λ Suppose that the mean fecundity of a mother depends on her age. Let λ a denote the mean fecundity, which is a function of age (denoted by a). The moment can be defined as E[λ m ] = ∑ a max a=0 λ m a h mat (a), where h mat (a) is the frequency of mature mothers at a given age, and a max denotes the maximum age. Thus, we can numerically obtain the moment by applying λ a and h mat (a).
For marine species with a type-III survivorship curve, it is generally assumed that individual fecundity is proportional to weight. By utilizing the von Bertalanffy growth equation for body weight, λ a is explicitly defined as a function of age as follows: where κ, a 0 , and β are conventionally used parameters in the von Bertalanffy equation, and they denote the growth rate, the adjuster of the equation for the initial size of the animal, and the allometric growth parameter, respectively. To obtain a specific value of λ , a coefficient value of 10 multiplied by the right-hand side of Eq. S1 was used when running the individual-based model.
The frequency of mature mothers at a given age can be given as the following: h mat (a) ∝ h(a)Q(a), satisfying ∑ a max a=0 h mat (a) = 1, where h(a) and Q(a) denote the frequency and maturity at a given age, respectively. Although f (a) is affected by historical population dynamics and age-dependent survival, for simplicity, the mortality rate is assumed to be constant (i.e., age independent): where S denotes a survival probability. The maturity at age (Q(a)) is assumed to be a knife-edge function, which is given by where a mat denotes the mature age.