Pedigree reconstruction from SNP data: parentage assignment, sibship clustering and beyond
Abstract
Data on hundreds or thousands of single nucleotide polymorphisms (SNPs) provide detailed information about the relationships between individuals, but currently few tools can turn this information into a multigenerational pedigree. I present the r package sequoia, which assigns parents, clusters half‐siblings sharing an unsampled parent and assigns grandparents to half‐sibships. Assignments are made after consideration of the likelihoods of all possible first‐, second‐ and third‐degree relationships between the focal individuals, as well as the traditional alternative of being unrelated. This careful exploration of the local likelihood surface is implemented in a fast, heuristic hill‐climbing algorithm. Distinction between the various categories of second‐degree relatives is possible when likelihoods are calculated conditional on at least one parent of each focal individual. Performance was tested on simulated data sets with realistic genotyping error rate and missingness, based on three different large pedigrees (N = 1000–2000). This included a complex pedigree with overlapping generations, occasional close inbreeding and some unknown birth years. Parentage assignment was highly accurate down to about 100 independent SNPs (error rate <0.1%) and fast (<1 min) as most pairs can be excluded from being parent–offspring based on opposite homozygosity. For full pedigree reconstruction, 40% of parents were assumed nongenotyped. Reconstruction resulted in low error rates (<0.3%), high assignment rates (>99%) in limited computation time (typically <1 h) when at least 200 independent SNPs were used. In three empirical data sets, relatedness estimated from the inferred pedigree was strongly correlated to genomic relatedness.
Introduction
Pedigrees have many uses in a wide variety of fields, ranging from animal breeding and human genealogy to wildlife genetics and ethology. Parentage assignment remains essential for unbiased estimation of trait heritabilities, as even though pairwise relatedness coefficients can now be estimated more precisely directly from genomic data than from a pedigree (Visscher et al. 2006; Bérénos et al. 2014), heritability estimates still require proper accounting for the similarity due to shared parents (Kruuk & Hadfield 2007; Bérénos et al. 2014). The relevant shared parent is unobservable in many marine species, den‐sharing social mammals or seed‐dispersing plants, and in such cases, a pedigree is required to distinguish parents from full‐siblings and offspring, or between paternal and maternal half‐siblings. Moreover, in natural populations, pedigrees provide estimates of reproductive success, the key indicator of individual fitness. Thus, pedigree reconstruction remains useful in the current genomics era.
A plethora of methods have been developed to reconstruct pedigrees based on a dozen or so multi‐allelic microsatellites (see Jones et al. (2010) for an overview). High‐resolution SNP data can open up new ways of pedigree reconstruction, by utilizing the more reliable distinction between different categories of relatives. Simultaneously, the lower information content per SNP necessitates a large number of markers to obtain the same accuracy as with a dozen microsatellites. This puts a considerable strain on machinery intended to deal with variable number of alleles per marker, while the binary nature of typical SNPs allows some computational short cuts to be taken. For example, dealing with genotyping errors and missing data requires summation of probabilities over all possible actual genotypes (Wang 2004; Hadfield et al. 2006. For an offspring–mother–father trio, there are 33 = 27 possible genotype combinations per SNP, and all probabilities for each locus are easily calculated once and stored in look‐up tables. This is less practical for a microsatellite locus with say 10 alleles and (102 + 10)3/2 = 166 375 possible trio genotypes, and alternative tactics have been developed (e.g. Wang 2004). Therefore, new tools are required, specifically designed for SNPs.
Pedigree reconstruction not only entails parentage assignment, but when sampling of candidate parents is incomplete, also clustering of (half‐)siblings sharing the same, nongenotyped parent. This is often performed using colony (Wang 2004, 2012; Jones & Wang 2010), and can substantially increase the number of within‐cohort pedigree links (e.g. Walling et al. 2010). However, amalgamating sibships across multiple cohort is not straightforward, and reconstructed sibships are typically unconnected to earlier parts of the pedigree, affecting amongst others estimates of inbreeding coefficients (Taylor et al. 2015). Assigning grandparents to sibship clusters would overcome the latter limitation and involves highly similar comparisons to assigning half‐siblings. To my knowledge, this is not attempted in any available software, although methods to assign grandparents to individuals have been described (e.g. Letcher & King 2001; VanRaden et al. 2013).
Pedigree reconstruction methods
Most pedigree reconstruction methods can be grouped into three broad categories: exclusion methods, relatedness‐based methods and likelihood‐based methods, which are of increasing power, but have increasing computational cost as a trade‐off. The first simply excludes all candidate parents which do not share at least one allele with the focal individual at each marker locus, and has been used with both microsatellites (see Thompson & Meagher 1987) and SNP data (Calus et al. 2011; Hayes 2011). Often some genotyping errors or mutations are allowed for, and the main advantage is that it is very fast. When a very large number of SNPs are used, the number of opposing homozygotes can also be used to differentiate full‐siblings and half‐siblings from unrelated pairs (Calus et al. 2011). The major caveat is that when several candidate parents are nonexcluded, other methods are required to differentiate between them.
Methods in the second category estimate pairwise relatedness r or kinship coefficients between individuals, and use these to categorize the data into first‐degree relatives, second‐degree relatives and unrelated (Thompson 1975). In systems with nonoverlapping generations and no inbreeding, this may be sufficient to fully reconstruct a pedigree. When generations overlap, different statistics are required to differentiate between parent–offspring pairs and full‐siblings, for example, which are both related by r = 0.5. Parent–offspring and full‐sibling pairs can be distinguished using the Cotterman coefficients, the probabilities that the pair share 0, 1 or 2 alleles identical by descent at a locus, but neither pairwise measure can distinguish between half‐siblings, grandparents and full aunts/uncles (all r = 0.25).
In comparison, likelihood methods (the third category) are considerably more powerful (Thompson 1986; Hill et al. 2008), although computationally notably slower. The likelihood of a particular pedigree configuration is the probability of observing the observed genotypes, conditional on the genotypes of the assigned parents, multiplied over all individuals and, when loci are assumed independent, multiplied over all loci. This approach makes use of heterozygous genotypes, which are ignored by exclusion methods, and can be calculated over many individuals jointly, whereas relatedness is typically calculated pairwise (although see Wang (2007) for a triadic version). Likelihoods allow more powerful distinction between alternative candidate fathers when one can condition on the genotype of a known mother, as implemented in cervus (Marshall et al. 1998), colony (Wang 2004) and masterbayes (Hadfield et al. 2006), amongst others. Likelihood calculations that condition on at least one parent each of a pair of individuals can distinguish between the three types of second‐degree relatives (see 2 in Appendix S1, Supporting information), which is impossible when considering only the genotypes of the two focal individuals and (presumed) unlinked markers (Epstein et al. 2000).
Likelihood maximization
Maximizing the total likelihood over all individuals is challenging, as the number of possible pedigree configurations increases quickly with the number of individuals. A common way to reduce computational cost is to consider only pairwise likelihoods, and find the most likely parent(s) for each individual in turn (e.g. cervus, Marshall et al. 1998). One caveat with this is that close relatives who are not parent and offspring (not PO) may have a higher pairwise likelihood to be PO than to be unrelated (U) and thus a positive log‐likelihood ratio ΛPO/U (Thompson 1986). To put it differently, when PO and U are not the only possible alternatives, rejecting hypothesis U is not equivalent to accepting PO, and ΛPO/U is no longer the most powerful test statistic (the Neyman–Pearson lemma, Anderson & Garza 2006). Consequently, there is often considerable overlap in the distribution of ΛPO/U of true PO pairs and other types of relatives (Thompson & Meagher 1987; Marshall et al. 1998). Those true full‐siblings who have at least one allele in common at every locus have a higher expected ΛPO/U than parent–offspring pairs, but have an even higher expected likelihood to be full‐siblings (Thompson & Meagher 1987). Therefore, while ΛPO/U and ΛPO/FS are necessarily highly correlated, each provides information that the other does not (Thompson 1986).
Thus, one solution to ensure that one indeed maximizes the total likelihood is to calculate for each set of candidate relatives the likelihoods under many possible alternative relationships. This is implicit to Kinship (Goodnight & Queller 1999) and has been implemented for parentage assignment in franz (Riester et al. 2009), and is implemented more comprehensively here. One reason for the limited implementation of this approach with microsatellites is the large computational costs involved with calculating likelihoods of many relationship alternatives over the very large number of possible true genotypes. Moreover, with a typical number of 10–20 microsatellites, it is nearly infeasible to distinguish reliably between the various relationship classes. In contrast, with a large number of SNPs, different relationships can be distinguished reliably.
Inbred and complex bilineal relationships (see Fig. 10) are often excluded from consideration, to keep computations feasible and tractable (Goodnight & Queller 1999; Wang 2004; Jones & Wang 2010; Anderson & Ng 2016) However, pedigree reconstruction in small populations is regularly performed with the specific aim to study the amount of inbreeding. Moreover, in a range of mammal species, female relatives live together and are therefore likely to mate with the same male (Stopher et al. 2012, and references therein). The resulting offspring are related by more than r = 0.25 and can therefore easily be misclassified as full‐siblings when full‐sibling, half‐sibling and unrelated are the only alternatives considered.
Here, I present an algorithm that compares likelihoods for seven different relationship alternatives, including their inbred derivatives, speeded up by steps to exclude unlikely relatives. It (1) assigns parents, (2) clusters sibling groups across multiple cohorts, (3a) assigns grandparents to sibships and singletons and (3b) identifies avuncular links between sibships (Fig. 1), using presumed independent SNPs. Pedigree inference based on the length and distribution of genome segments shared between individuals is theoretically a more powerful approach (Hill & White 2013), but for many species, a reliable linkage map is not (yet) available. Performance of sequoia is illustrated on simulated data sets from three different pedigree structures, and empirical data sets from wild red deer (Cervus elaphus), great tits (Parus major) and domestic pigs (Sus scrofa). I show that several hundred independent SNPs with high minor allele frequency are sufficient to obtain a high assignment rate (>99%) and low error rate (<0.1%).

Methods
Overview
The input format for sequoia is easily obtained from a genotype file in standard plink format (Fig. 2, details in r vignette) and should be provided together with sex and birth year information for the majority of genotyped individuals.

When sequoia is called, first a check for duplicate identities and genotypes is performed to avoid downstream problems. Next, several iterations of parentage assignment are performed, until the total likelihood (defined in Eqn 1 below) asymptotes. This provides a robust, conservative ‘pedigree scaffold’, as distinguishing parents from nonparents has a lower false‐positive rate than distinguishing between various other classes of relatives (see 3). The pedigree scaffold is returned for user inspection, to check for swapped or mislabelled samples, for example. In addition, a list is returned of identified parent–offspring pairs for which polarity could not determined, due to absent or incompatible age or sex information.
Then, clusters of half‐siblings with an unsampled parent are found and assigned a ‘dummy’ parent. Subsequently, parents may get assigned to these dummy individuals, providing pedigree links across generations. This is again done in an iterative fashion. Alternative orders of the various steps were explored but resulted in higher error rates (see Appendix S1, Supporting information).
Biological feasibility of the resulting pedigree is achieved by ensuring that, given the current pedigree, (i) an individual cannot be its own ancestor; (ii) ancestors are born prior to their descendants, or either or both have an unknown birth year; and (iii) the two parents of an individual are of opposite sex, or either one is of unknown sex (i.e. no hermaphrodites or asexual reproduction allowed and thus no selfing).
Filtering steps
Use of opposite homozygosity as a filtering step is a computationally fast method to dramatically reduce the number of potential parent–offspring (PO) pairs (Hill et al. 2008; Hayes 2011; Anderson 2012). By default, a liberal threshold of TOH = 3 + εL is used to avoid exclusion of true PO pairs, where L is the number of loci and ε the per‐locus genotyping error rate. Typically some pairs of non‐PO close relatives will be nonexcluded, particularly full‐sibling (FS) pairs (see Calus et al. 2011).
A second filtering step for parentage assignment, and the only filtering step for the other stages, consists of calculating the log‐likelihood ratio
between the focal relationship R and unrelated U, without conditioning on the parents in the current pedigree to simplify and speed up computations. A liberal, log‐scale negative threshold (the user‐adjustable TFilter) is used to again avoid exclusion of true relatives.
Parentage assignment
For each individual in turn, from earliest born to last born to unknown birth years, all individuals with which the focal individual is nonexcluded as a PO pair and which are older or of unknown age difference are considered as candidate parents, and the likelihoods for the seven alternative relationships are calculated (Table 1,
–
). If the focal relationship R (here PO) has a higher likelihood than the most likely alternative relationship (denoted by ∨ for brevity), by a user‐defined margin Tassign, an assignment is made (ΛR/∨ > Tassign; glossary provided in Table 2). If there are multiple candidate parents, these likelihoods are calculated for all possible opposite‐sex candidate parent pairs and all possible single candidate parents (details in Appendix S1, Supporting information). Parent assignments are made according to the highest likelihood, which may include removal of earlier‐assigned parents. This approach maximizes assignment rate and minimizes the chance that, for example, full‐siblings or double‐grandparents are assigned as parents.
| Relationship | Code | Mean r | |
|---|---|---|---|
| H1 | Parent‐offspring | PO | 1/2 |
| H2 | Full‐siblings | FS | 1/2 |
| H 3 | Half‐siblings | HS | 1/4 |
| Maternal siblings (full or half) | MS | 1/2 or 1/4 | |
| Paternal siblings (full or half) | PS | 1/2 or 1/4 | |
| H4 | Grandparent–grand‐offspring | GG | 1/4 |
| H5 | Full aunt/uncle–niece/nephew | FA | 1/4 |
| H6a | Half aunt/uncle–niece/nephew | HA | 1/8 |
| H6b | Great‐grandparent–great‐grand‐offspring | GGG | 1/8 |
| H6c | Full cousins | CC | 1/8 |
| H0 | Unrelated | U | 0 |
- Double full first cousins (r = 1/4) are currently not explicitly considered
| Definition | |
|---|---|
| A | Focal individual |
| A | Focal sibship (group of half‐siblings) |
| D A | Mother (Dam) of focal individual |
| k | Parent or sibship type; maternal or paternal |
| l | Locus |
| Pε | Genotyping error term |
| PM | Mendelian inheritance term |
| PP | Parental probability term |
| R | Focal relationship |
| SA | Father (Sire) of focal individual |
| Tassign | Threshold ΛR/∨ for assignments |
| Tfilter | Threshold for to differentiate ‘possibly relatives’ from ‘certainly not relatives’
|
| X | Observed genotype |
| x | Actual genotype |
| ∨ | Most likely alternative relationship |
|
Likelihood under H0 |
|
Likelihood ratio, does not condition on current parents |
| ΛR/∨ | Likelihood ratio, does condition on current parents |
Likelihood calculations
of the pedigree configuration P over all N genotyped individuals,
(eqn 1)
(eqn 2)The first term (Pε) is a function of A's actual genotype x and the genotyping error rate ε, which is assumed constant across loci. Details of the genotyping error model are given in Methods in Appendix S1 (Supporting information). The second term (PM) is the probability that individual A inherited actual genotype x from its parents DA and SA, conditional on their actual genotypes y and z. This probability can take values of 0, 1/4, 1/2 and 1. As SNP genotypes can only take three possible values (0, 1 or 2 copies of the minor allele), the likelihood components Pε and PM can be calculated once at initiation and stored in look‐up tables, for increased computational efficiency. In contrast, the parental genotype probabilities PP (the third term) are continuously updated. They give the probability that A's parents carry actual genotypes y and z and come in three different flavours, denoted by a superscripted prefix:

, 2ql(1 − ql) and (1 − ql)2, that is unknown parents are assumed a random draw from the population. When DA is a known genotyped individual, the probabilities for all possible actual genotypes y are calculated conditional on DA's observed genotype Y and its parents, if any. Using that P(A|B) = P(B|A)P(A)/P(B) (Bayes’ theorem) and dropping subscripts l for brevity,
(eqn 3)
and
are grandparents of A. When DA is not genotyped at a particular locus, the term Pε(DA = Y|DA = y) is omitted from Eqn 3, and gPP(DA = y) becomes dependent on the grand‐parental genotypes only. When both
and
are unknown, gPP(DA = y) reduces further to hPP(DA = y|ql). The probability dPP for dummy parents is defined in the section ‘Sibship likelihoods’ (Eqn 5).
Pε, PM and PP can be combined to calculate the likelihood of observing the genotypes of a group of individuals (n ≥ 1) under any relationship configuration. Single‐locus likelihoods are illustrated in Fig. 3 for the special case of two focal individuals A and B, when neither individual has any parent yet assigned. In this case, second‐degree relatives (HS, GG and FA) cannot be distinguished from each other. However, when one can condition on the genotype of a parent or dummy parent of each individual, such a distinction can be made. Details on these likelihood equations, and those for inbred relationships, are given in Appendix S1 (Supporting information).

Sibship clustering
A sibship is here defined as a group of half‐siblings sharing an unsampled parent, containing zero or more sets of full‐siblings. During each iteration of sibship clustering, first all pairs of likely HS and FS are identified using
, followed by calculation of
–
for the pair. These pairs are clustered into sibships using likelihoods calculated over the pair and all putative siblings. Assignments are made when (max (ΛHS/∨, ΛFS/∨) >Tassign. Subsequently during each iteration, all sibships of the same type are considered for merging to minimize erroneous splitting of true sibships, and all individuals who lack a parent of type k are considered for addition to each sibship of type k to maximize assignment rate (Methods in Appendix S1, Supporting information).
Sibship likelihood equations
(eqn 4)
, Si of opposite sex than DA, and sibship A consists of nA full‐sib families. This is a standard expression, used by for example colony (Wang 2004; Eqn 3) and Fullsniplings (Anderson & Ng 2016, implicit). A more general expression allowing for inbreeding (Equation S16 in Appendix S1, Supporting information) is implemented in the algorithm.
(eqn 5)Note that when Si also is a dummy parent, dPP(Si = yi) in Eqn (4) is calculated without the contribution of the joined offspring Ai, to avoid double counting. Most often, the joined likelihood over A and all directly connected sibships is calculated, as PP(Si = yi) will be a function of the presumed genotype of DA, and therefore, the PP(Si = yi)'s of different connected sibships are nonindependent (Methods in Appendix S1, Supporting information).
Parents and grandparents of sibships
Initial parentage assignment may have been incomplete, for example when the true parent has an unknown birth year. Therefore, replacement of dummy parents by genotyped individuals is attempted for all sibships, as well as assignment of parents to singletons, as described above.
Lastly, in each iteration, grandparents are assigned, in a process similar to parentage assignment. This includes potential assignments of the dummy parent of one sibship (say DB) as the grandparent of sibship A, when DB is more likely to be the grandparent of
than related in any of the alternative ways listed in Table 1). To minimize false positives, grandparent assignment to sibship is conducted from the second iteration onwards, and assignment to singletons from the third iteration onwards; this should not prevent assignment of any true grandparents (Results: Algorithm order in Appendix S1, Supporting information). Grand‐offspring–grandparent pairs are treated as sibship clusters with a single member, to which additional siblings may be added in subsequent iterations.
Age information
The age difference between individuals can be highly informative to distinguish between, for example, parents and full‐siblings, or between grandparents and half‐siblings. sequoia makes use of an age‐difference‐based prior, which in its simplest form is an indicator whether a given relationship is possible (1) or not (0) given the age difference between the two individuals. After parentage assignment, the empirical age distribution of fathers and mothers and between maternal and paternal siblings is used as prior to assist subsequent sibship clustering (Methods in Appendix S1, Supporting information). For each hypothesized relationship, the genetic‐based likelihoods are multiplied by these age‐difference‐based prior probabilities, that is genotypes and age differences are treated as independent sources of information. Methods are implemented to deal with missing age or sex information (Methods in Appendix S1, Supporting information).
Assignment confidence
In the returned pedigree, a value ΛPO/∨ is associated with each assigned parent and dummy parent, which is the log10 likelihood ratio between the candidate parent being the parent and the most likely alternative relationship, calculated conditional on all other pedigree links. The ΛPO/∨ for the parent pair is calculated relative to the highest likelihood scenario with one or neither parent assigned. For dummy individuals, a similar approach is followed with respect to the sibship grandparents; calculations are always conditional on all its offspring. Assignment confidence is currently not expressed as a probability, but various post hoc approaches could be considered if these are required (see 3 and 4 in Appendix S1, Supporting information).
Data sets
The algorithm was tested on simulated data sets generated from three different pedigree structures, described below, to give a general indication of assignment and error rates. For each pedigree, after simulation of genotype data (Methods in Appendix S1, Supporting information), a varying proportion of parental genotypes was discarded to assess sibship clustering. For all simulated data sets, 0.5% of per‐locus genotypes were set to missing, and 0.1% were replaced by a random genotype, which is a low but realistic error rate (2, see also Fig. S7 in Appendix S1, Supporting information).
In addition, the algorithm was run on empirical SNP data sets from red deer (Huisman et al. 2016), great tits (Santure et al. 2015), and pigs (Cleveland et al. 2012). In each case, plink (Purcell et al. 2007) was used to select 400–600 SNPs for pedigree inference, with minor allele frequency above 0.4 and in low linkage disequilibrium with each other.
Pedigree I: Full‐sib families
Pedigree I consisted of 1157 genotyped individuals in a single generation, divided over 432 full‐sib families (Table 3) with 1–11 individuals each (mean: 2.68, 143 singletons). It is identical to the pedigree structure used in Anderson & Ng (2016) to compare performance of colony (Wang 2012) and Fullsnplings (Anderson & Ng 2016), and is derived from an empirical salmon data set.
| Pedigree I | Pedigree II | Pedigree III | |
|---|---|---|---|
| Total | 2021 | 1000 | 1998 |
| Mother known | 1157 | 960 | 1642 |
| Father known | 1157 | 960 | 1202 |
| Unique mothers | 432 | 80 | 462 |
| Unique fathers | 432 | 120 | 193 |
- Pedigree I consists of a single generation of full‐sib families, Pedigree II of 5 discrete generations of full‐ and half‐sib families, and Pedigree III is the empirical pedigree of the 17 most recent birth year cohorts of a wild Red deer population.
Pedigree II: Multigenerational half‐sib
The second pedigree mimicked a small closed population and consisted of five nonoverlapping generations, with full‐sib families nested within interconnected half‐sib clusters. Each female mated with two random males and each male with three random females, producing four full‐sib offspring per mating (Fig. 4). Each generation, 24 female and 16 male breeders were drawn at random from the 192 offspring born. Matings between full or half‐siblings were allowed, and average inbreeding coefficient in the fifth generation was 0.053 (range: 0.008–0.289). This artificial pedigree is provided with the r package.

Pedigree III: Red deer
The third set of simulated data sets was based on the empirical pedigree of red deer detailed below. It consists of the last 17 birth year cohorts (1999–2015) and their parents, totalling 1998 individuals.
Empirical data set 1: Insular red deer
The pedigree from the study population of wild red deer on the Isle of Rum is characterized by extensively overlapping generations, matrilineal association of females, and numerous instances of close and moderate inbreeding (Clutton‐Brock et al. 1982). Each breeding season, immigration of males born elsewhere on the island occurs. The previous pedigree was reconstructed based on 9–15 microsatellite markers using masterbayes and colony (Walling et al. 2010), and includes 441 founders and 2340 nonfounders born up until 2012. The SNP data set used consisted of 2572 individuals born up until 2013 genotyped for 37 410 polymorphic autosomal SNPs (Huisman et al. 2016), of which 440 SNPs were used for pedigree inference.
Empirical data set 2: Pig breeding line
This data set was made available for comparing genomic prediction methods (Cleveland et al. 2012), and contained 3534 individuals with genotypes for 52 843 SNPs, of which 652 SNPs were used here. The provided pedigree consisted of 6473 individuals and included the parents and grandparents of genotyped individuals, where known. No birth year information is publicly available; therefore, the generation numbers in the provided pedigree (1 = founders, 2 = offspring of founders, 3 = offspring of g2 or g2 × founders, etc.) were treated as cohorts.
Empirical data set 3: Wild great tit
The second data set was the larger of the two data sets used for a study on the genetic architecture of quantitative traits by Santure et al. (2015), from a open population of great tits in Oxfordshire. It consisted of genotype data for 2497 individuals on 5592 SNPs, of which 488 SNPs were used here. The provided social pedigree included 1035 founders and 1674 nonfounders, and birth year data for 1558 individuals was extracted from the excel file with phenotypic data.
Comparison to other software
sequoia's performance was compared to that of colony 2.0.6.1 (Wang 2013), using its full‐likelihood–pair‐likelihood score combined (FPLS) analysis method, with otherwise default settings: without inbreeding (as recommended by the colony user guide when the inbreeding level is not high), medium run length, weak sibship size priors of 1.0, and with sibship scaling.
In addition, the program franz (Riester et al. 2009) was run, which performs parentage assignment only, optionally assisted by clustering of full‐siblings. Lenient settings were used throughout, with a maximum number of candidate parents of 500, reproductive ages of females and males between 1 and 20, and otherwise default settings.
Lastly, exclusion based on the number of opposing homozygous loci was evaluated as a parentage assignment method, assigning the first nonexcluded parent of each sex. The same allowance for genotyping errors was used as in sequoia, of maximum 3 mismatching loci.
Assignment and error rates
The assignment rate (AR) for the simulated data sets was calculated as the number of individuals with a correctly inferred parent, divided by Nk, the number of individuals with a parent of sex k in the true pedigree, averaged over maternal and paternal links. The error rate (ER) was calculated as the fraction of the total number of individuals (founders + nonfounders) with an incorrectly assigned parent. A sibship parent, say dummy father, was deemed correct if the majority (>50%) of inferred paternal siblings (PS) were true PS. For both erroneous merging and erroneous nonmerging, the error count equalled the size of the smaller of the two sibships.
Results
Distribution of Λ
Simulated distributions of ΛPO/∨ showed a clearer divide between true PO pairs and non‐PO pairs than did ΛPO/U (Fig. 5, left panels). A similar pattern is apparent for FS (middle), and HS (right), although the latter shows less clear separation. Note that when both parents of both individuals are unknown, no HS assignments can be made as it is impossible to distinguish between maternal HS, paternal HS, FA and GP.

The thresholds for an optimal trade‐off between AR and ER will depend on the proportions of different categories of relatives in the sample, which by definition are not known a priori, as well as the number of SNPs and their allele frequencies. Initial explorations showed that for the three different types of simulated data sets and 200 SNPs, results were largely insensitive to varying TFilter between −3 and −1, while a value of TAssign = + 0.5 gave the best overall trade‐off between AR and ER (Appendix S1, Supporting information). Results will be shown using the same thresholds across all simulations, of TFilter = −2 and TAssign = + 0.5.
Parentage assignment
When all individuals are genotyped, assignment rates are high (AR > 99.8% in pedigrees I and II) and error rates low (ER < 0.1%) when at least 100 SNPs are used (Fig. 6, Table S4 in Appendix S1, Supporting information). When using over 400 SNPs, opposite‐homozygosity‐based exclusion (OH‐Exclusion) performs similar to sequoia (ER < 0.1%), in a fraction of the time. franz is somewhat slower than sequoia, but the difference is negligible compared to for example masterbayes (Hadfield et al. 2006) which takes many hours for a data set of similar size (C. Berenos, pers. comm.). In Pedigree III, some parents with unknown birth year are never assigned by sequoia or OH‐Exclusion, while franz appears less conservative, resulting in higher AR but also higher ER. Performance of franz in pedigree II was unchanged or worsened when using the option to assist parentage assignment by clustering of full‐siblings (Fig. S6 in Appendix S1, Supporting information).

Full‐sib clustering
Clustering of full‐sib families within a single generation, without any parental genotypes, gave high ARs (>98.4%) and low ER (<0.1%) when at least 200 SNPs were used with sequoia, but ER was consistently higher than for colony (Fig. 7). Even at high marker numbers sequoia erroneously inferred some FS as HS (Fig. S8, see 7 in Appendix S1, Supporting information). Both colony and sequoia performed better when a monogamous breeding system was assumed (grey filled symbols in Fig. 7; Fig. S9 in Appendix S1, Supporting information).

Combination of parentage assignment and sibship clustering
The combination of parentage assignment, sibship clustering and grandparent assignment resulted in reconstruction of 99% of parent–offspring links in Pedigree II when at least 20% of parental genotypes was treated as known (Fig. 8). When simulating 60% of parental genotypes as known, AR was somewhat lower in Pedigree III at 86%–89% (Table 4), partly because for some identified likely HS it could not be determined whether they were paternal or maternal half‐siblings, or FA. Additionally, when generations overlap and one of a pair of individuals truly is a founder, sequoia cannot differentiate between GG or FA, which would require that one parent is already assigned to each individual. AR for parentage assignment (e.g. franz) is necessarily limited by the number of PO pairs where both are genotyped, while the upper limit for colony is determined by the number of dummy individuals (=number of sibships), to which it does not assign parents.

| Pedigree | SNPs | Assignment rate | Error rate | Computational timeaa
On a laptop with a quadcore intel i7 2.3 GHz processor and 8 GB RAM.
|
||||||
|---|---|---|---|---|---|---|---|---|---|---|
| franz | sequoia |
colony
bb
AR = within‐cohort AR – 0.042, to take into account that no grandparents are assigned to the on average 48 + 32 sibships (see data set description).
|
franz | sequoia | colony | franz | sequoia | colony | ||
| II | 75 | 0.550 | 0.802 | 0.951 | 8.38E−2 | 4.41E−2 | <4.5E−4 | 01:44 | 03:45 | 2:23:00 |
| 100 | 0.561 | 0.927 | 0.955 | 5.17E−2 | 1.42E−2 | <4.5E−4 | 01:03 | 02:54 | 2:36:00 | |
| 200 | 0.559 | 0.989 | 0.958 | 1.37E−2 | 1.25E−3 | <4.5E−4 | 00:36 | 01:53 | 4:34:00 | |
| 400 | 0.559 | 0.993 | 6.50E−4 | <5.0E−5 | 00:42 | 01:51 | ||||
| 800 | 0.564 | 0.991 | <5.0E−5 | <5.0E−5 | 01:13 | 03:59 | ||||
| III | 75 | 0.540 | 1.25E−1 | 04:12 | ||||||
| 100 | 0.546 | 0.725 | 6.62E−2 | 2.40E−2 | 02:11 | 49:32 | ||||
| 200 | 0.554 | 0.861 | 1.38E−2 | 1.48E−3 | 00:52 | 27:19 | ||||
| 400 | 0.549 | 0.888 | 2.98E−3 | 6.51E−4 | 00:50 | 27:38 | ||||
| 800 | 0.555 | 0.894 | 2.20E−3 | 7.26E−4 | 01:28 | 57:30 | ||||
- For franz (parentage only) and sequoia, averages over 10 simulations are given, and for colony (polygamous), numbers are extrapolated from running on generations 1 and 5 (founders = 0) for three replicates. Times in minutes: seconds for franz and sequoia, and hours: minutes: seconds for colony.
- a On a laptop with a quadcore intel i7 2.3 GHz processor and 8 GB RAM.
- b AR = within‐cohort AR – 0.042, to take into account that no grandparents are assigned to the on average 48 + 32 sibships (see data set description).
Error rates for sequoia were low when at least 200 SNPs were used (0.1%–0.3%) and were undetectably low for colony (Table 4) despite both data sets containing closely related parents, which is not explicitly dealt with by this program. Computational time had a minimum around 200 SNPs, increased approximately quadratically with the number of individuals (Fig. S10 in Appendix S1, Supporting information), and was considerably longer for the more complex Pedigree III. A slight increase in ER with increased pedigree size and depth (to 0.26%) and with decreased proportion of genotyped parents (to ER= 0.9%) was observed (Fig. S10 in Appendix S1, Supporting information).
For Pedigree II and 200 SNPs, ER increased and AR decreased approximately exponentially with an increase in simulated genotyping error rate (Fig. S7 in Appendix S1, Supporting information).
Empirical data sets
As a proxy for the true pedigree relatedness between pairs of individuals, the genomic relatedness rgrm as estimated by gcta (Yang et al. 2011) from all 40 000–50 000 SNPs was used. For each of the three data sets, the relatedness estimated from the sequoia‐reconstructed pedigree (rped, sequoia) was more strongly correlated to rgrm than rped, FRANz (Table 5). Note that correlations differed between the three data sets not only due to the pedigree accuracy, but also due to the proportion of close relatives in the sample (see Fig. 9; if fewer pairs were closely related, the correlation would be lower) and the amount of Mendelian variance, determined by the number and size of chromosomes. Correlations were lowest in the pig data set, amongst others because maternal siblings were often fully nested within paternal sibling groups, which cannot be differentiated from paternal siblings nested within maternal sibling groups when none of the parents are genotyped (see also 4). Correlations between rped, sequoia and rped, provided ranged from 0.72 for the red deer data set to 0.87 in the great tits and 0.89 in the pigs.
| Pedigree | cor(rgrm, rped) | rped − rgrm < −0.2 | rped − rgrm > 0.2 | ||||||
|---|---|---|---|---|---|---|---|---|---|
| Deer | Pig | Tits | Deer | Pig | Tits | Deer | Pig | Tits | |
| Providedaa
Correlation over genotyped individuals present in the pedigree only.
|
0.66 | 0.55 | 0.70 | 270 | 1700 | 110 | 20 | 4.1 | 6.1 |
| Providedbb
Assuming that individuals not present in the pedigree are unrelated to all others.
|
0.45 | 0.55 | 0.56 | 550 | 1700 | 110 | 8.5 | 4.1 | 2.9 |
| franz | 0.72 | 0.34 | 0.53 | 130 | 2500 | 140 | 0.37 | 0.13 | 3.2 |
| sequoia | 0.81 | 0.47 | 0.64 | 5.3 | 2200 | 71 | 0.091 | 4.5 | 0.096 |
- a Correlation over genotyped individuals present in the pedigree only.
- b Assuming that individuals not present in the pedigree are unrelated to all others.

The fraction of pairs with a much higher rped than rgrm provides a rough estimate of ER, and was consistently lower for sequoia than for franz, and lower than the provided pedigree for the two wild species. The pattern for the fraction of pairs with much higher rgrm than rped (likely but nonassigned relatives) showed a similar pattern across data sets and pedigrees (Table 5). Note that pairwise AR and ER are not directly comparable to the per‐individual AR and ER reported elsewhere in the Results, as a single erroneous assignment typically results in erroneous rped between multiple pairs (see also Fig. S8 in Appendix S1, Supporting information).
As illustrated for the red deer data set (Fig. 9), rgrm was more closely correlated to rped, sequoia than to the genomic relatedness estimated from the 440 SNPs used for pedigree reconstruction. This may partly be an artefact of the different average allele frequencies in the two sets of markers, but is probably largely due to Mendelian noise. It suggests that when only a few hundred SNP markers are available, it can be better to estimate quantitative genetic parameters using rped than rgrm.
Discussion
sequoia enables pedigree inference even with complex mating structures, extensively overlapping generations and inbreeding. Parentage assignment performs very well down to about 100 independent highly informative SNPs, while for subsequent sibship clustering, at least a few hundred SNPs are required. For these marker numbers, false‐positive rates in the simulated data sets are low (<0.1%) and assignment rates high (>99%). As for any software, performance in real data sets will be somewhat lower, but results in three empirical data sets are favourable compared to existing pedigrees and parentage assignment only.
Comparison to other methods
The main difference in approach between sequoia and most other methods is that a high likelihood solution is found in a handful of iterations, rather than the tens of thousands of iterations typical of MCMC approaches. sequoia's sequential, heuristic method requires a conservative approach to assignments, which results in lower AR than colony under identical conditions. There is also some loss of accuracy, but this can be overcome using a few hundred extra SNPs. When less than approximately 200 independent, high frequency SNPs are available, due to a small genome size or for budgetary reasons, the methods initially developed for a dozen or so microsatellite markers still perform best. For limited marker numbers, Mendelian noise can be substantial, and as a result, the true configuration may not be amongst those with the highest partial likelihood, violating a core assumption underlying sequoia. The true pedigree will typically still have the highest global likelihood, which can be more easily found by MCMC or simulated annealing algorithms such as colony, than by a hill‐climbing algorithm such as sequoia.
Parentage assignment
When interest is solely in parentage assignment, sequoia performs intermediately between opposite‐homozygosity‐based exclusion and franz (Riester et al. 2009). The former performs very well when a large number of markers is available, although allowing for genotyping errors creates room for false‐positive assignments (Strucken et al. 2015). franz explicitly deals with genotyping errors and makes use of birth year, death year and gender information, but is less conservative than sequoia when this life‐history information is lacking for some individuals. Note that while franz performs clustering of full‐siblings, it does so only to support parentage assignment, and in a less integrated way than sequoia.
Sibships
It has been observed that likelihood scores tend to favour more complex explanations (Thomas & Hill 2002; Almudevar 2007), resulting in splitting true sibling groups (Almudevar 2007) as well as creation of spurious sibling groups (Anderson & Ng 2016). With sequoia, the number of unrelated pairs spuriously inferred as HS or FS was orders of magnitude lower than nonassignment of true siblings (Fig. S8 in Appendix S1, Supporting information). Nonassignment in Pedigree I was predominantly due to a limited likelihood difference for true full‐siblings to be FS (r = 1/2) versus paternal HS and maternally related as HA or CC, for example (r = 1/4 + 1/8, Fig. 10). Such configurations might be comparatively common in some species, but very rare in others. A priori estimates of the fraction of pairs in each type of relationship (PO, FS, HS, CC, etc.) are likely to lessen this problem, as implemented in franz (Riester et al. 2009) and snppit (Anderson 2012). Assuming a monogamous breeding system could be seen as a special case of this and did indeed improve performance. However, in real data sets, there is typically no a priori certainty about monogamy.

In the empirical red deer data set, sequoia identified many paternal half‐sib links across cohorts, which cannot be identified with per‐cohort sibship clustering using colony. Several birth year cohorts may be analysed together using a sliding‐window approach, but combining the results into a single pedigree is hindered by the presence of erroneous sibship clusters, and the lack of concordance between a sibship's posterior probability and its correctness (Anderson & Ng 2016). More generally, separate reconstruction within each cohort may lead to biologically impossible pedigrees when combining results (Taylor et al. 2015) and complicates inclusion of individuals with unknown birth year. In the red deer example, immigrant males were never considered as offspring during paternity assignment, but sequoia identified various paternal links between immigrants.
Potential caveats
Real‐world data sets are often incomplete and imperfect, especially those for wild populations. For example, birth years may be unknown for many individuals, as was the case for the great tit data set. Nonetheless, the pedigree reconstructed by sequoia showed strong correlation with rgrm, and 81 unique fathers were assigned despite unknown hatching year. In such cases, lists of per‐cohort candidate parents, such as used by masterbayes and colony, may be more convenient than estimating birth years, although great care should be taken to not inadvertently leave out the true parent. Candidate parent lists allow implicit incorporation of data on year of death, when available, which is used explicitly by franz but currently cannot be used by sequoia. Note as well that uncertainty around birth year estimates is currently not accounted for by sequoia, although parent–offspring pairs with impossible or unknown age differences will be flagged.
One general problem with pedigree reconstruction is the differentiation between maternal and paternal relatives. For example for a full‐sib family (n ≥ 1) and in absence of parental genotypes, it is impossible to distinguish between maternal and paternal HS. Programs incorporating prior information on observation‐based parents are then preferred, such as colony (Wang 2012). I am not aware of any programs that incorporate data on sex‐linked markers, which would provide an alternative way to differentiate between maternal and paternal relatives.
Currently, no confidence probability is attached to sequoia's assignments, but various methods to estimate these exist (3 and discussion in Appendix S1, Supporting information). For example, one may simulate many SNP data sets according to the observed allele frequencies and inferred pedigree, and count the mismatches between pedigrees reconstructed from simulated data and the initial pedigree, similar to cervus (Marshall et al. 1998). To this end, and to investigate the sensitivity of pedigree inference to specific relatedness structures or various other properties of real‐world data sets, including genotyping errors, the r package includes functions to simulate genotypes of unlinked SNPs through any pedigree and to count mismatches between two pedigrees. The sensitivity to various aspects is likely to be data set specific and therefore not explored in detail here.
Acknowledgements
I wish to thank JM Pemberton for the liberty to pursue this project and use of the red deer data, the International Centre for Mathematical Sciences for organizing an inspirational workshop on pedigree reconstruction, J Hickey for motivating me to relearn Fortran and A Santure for answering my queries regarding the great tit data. JM Pemberton, T Ellis, JD Hadfield, WG Hill and three anonymous reviewers provided many constructive comments that substantially improved the manuscript. This work was funded by ERC (ERC‐2009‐AdG) and NERC (NE/L00688X/1) grants to JM Pemberton.
References
Data accessibility
The r package is available on cran (CRAN.R‐project.org/package=sequoia) and includes a pedigree and life‐history file for Pedigree II, a user manual, and the Fortran source code. The red deer SNP data are available from github.com/JiscaH/Manuscripts‐Papers, the great tit data were retrieved from datadryad.org/resource/doi:10.5061/dryad.5t32v (UK part), and the pig data were from www.g3journal.org/content/suppl/2012/04/06/2.4.429.DC1.
Citing Literature
Number of times cited according to CrossRef: 45
- Bridgett M. vonHoldt, Alexandra L. DeCandia, Elizabeth Heppenheimer, Ilana Janowitz‐Koch, Ruoyao Shi, Hua Zhou, Christopher A. German, Kristin E. Brzeski, Kira A. Cassidy, Daniel R. Stahler, Janet S. Sinsheimer, Heritability of interpack aggression in a wild pedigreed population of North American grey wolves, Molecular Ecology, 10.1111/mec.15349, 29, 10, (1764-1775), (2020).
- Maria G. Nayfa, David B. Jones, Curtis E. Lind, John A.H. Benzie, Dean R. Jerry, Kyall R. Zenger, Pipette and paper: Combining molecular and genealogical methods to assess a Nile tilapia (Oreochromis niloticus) breeding program, Aquaculture, 10.1016/j.aquaculture.2020.735171, (735171), (2020).
- Laura Gervais, Aidan J. M. Hewison, Nicolas Morellet, Maria Bernard, Joël Merlet, Bruno Cargnelutti, Yannick Chaval, Benoit Pujol, Erwan Quéméré, Pedigree‐free quantitative genetic approach provides evidence for heritability of movement tactics in wild roe deer, Journal of Evolutionary Biology, 10.1111/jeb.13594, 33, 5, (595-607), (2020).
- Keats R. Conley, Jonathan D. Ebel, John S. Hargrove, Wyatt Petersen, Lytle P. Denny, In‐Stream Egg Incubators Produce Hatchery Chinook Salmon with Similarities to and Differences from Natural Juveniles, North American Journal of Fisheries Management, 10.1002/nafm.10409, 40, 1, (256-277), (2020).
- Anubhab Khan, Kaushalkumar Patel, Subhadeep Bhattacharjee, Sudarshan Sharma, Anup N. Chugani, Karthikeyan Sivaraman, Vinayak Hosawad, Yogesh Kumar Sahu, Goddilla V. Reddy, Uma Ramakrishnan, Are shed hair genomes the most effective noninvasive resource for estimating relationships in the wild?, Ecology and Evolution, 10.1002/ece3.6157, 10, 11, (4583-4594), (2020).
- Julie Gauzere, Josephine M. Pemberton, Sean Morris, Alison Morris, Loeske E. B. Kruuk, Craig A. Walling, The genetic architecture of maternal effects across ontogeny in the red deer, Evolution, 10.1111/evo.14000, 74, 7, (1378-1391), (2020).
- Shi‐Yi Chen, Cao Li, Zhihao Luo, Xiaowei Li, Jia Gan, Xianbo Jia, Song‐Jia Lai, Wei Wang, Genotyping‐free parentage assignment using RAD‐seq reads, Ecology and Evolution, 10.1002/ece3.6503, 10, 14, (7783-7791), (2020).
- Emily Humble, Anneke J. Paijmans, Jaume Forcada, Joseph I. Hoffman, An 85K SNP Array Uncovers Inbreeding and Cryptic Relatedness in an Antarctic Fur Seal Breeding Colony, G3: Genes|Genomes|Genetics, 10.1534/g3.120.401268, 10, 8, (2787-2799), (2020).
- Carmen Piza-Roca, David Schoeman, Celine Frere, Fitness benefits of male dominance behaviours depend on the degree of individual inbreeding in a polyandrous lizard, Proceedings of the Royal Society B: Biological Sciences, 10.1098/rspb.2020.0097, 287, 1927, (20200097), (2020).
- Jacek Maselko, Kimberly R. Andrews, Paul A. Hohenlohe, Long‐lived marine species may be resilient to environmental variability through a temporal portfolio effect, Ecology and Evolution, 10.1002/ece3.6378, 10, 13, (6435-6448), (2020).
- Katherine A. Farquharson, Carolyn J. Hogg, Katherine Belov, Catherine E. Grueber, Deciphering genetic mate choice: Not so simple in group‐housed conservation breeding programs, Evolutionary Applications, 10.1111/eva.12981, 13, 9, (2179-2189), (2020).
- E. A. McLennan, C. E. Grueber, P. Wise, K. Belov, C. J. Hogg, Mixing genetically differentiated populations successfully boosts diversity of an endangered carnivore, Animal Conservation, 10.1111/acv.12589, 0, 0, (2020).
- Alina K. Niskanen, Anna M. Billing, Håkon Holand, Ingerid J. Hagen, Yimen G. Araya-Ajoy, Arild Husby, Bernt Rønning, Ane Marlene Myhre, Peter Sjolte Ranke, Thomas Kvalnes, Henrik Pärn, Thor Harald Ringsby, Sigbjørn Lien, Bernt-Erik Sæther, Stefanie Muff, Henrik Jensen, Consistent scaling of inbreeding depression in space and time in a house sparrow metapopulation, Proceedings of the National Academy of Sciences, 10.1073/pnas.1909599117, (201909599), (2020).
- Kaylee A. Byers, Tom R. Booker, Matthew Combs, Chelsea G. Himsworth, Jason Munshi‐South, David M. Patrick, Michael C. Whitlock, Using genetic relatedness to understand heterogeneous distributions of urban rat‐associated pathogens, Evolutionary Applications, 10.1111/eva.13049, 0, 0, (2020).
- Samantha R. Mulvena, Jennifer C. Pierson, Katherine A. Farquharson, Elspeth A. McLennan, Carolyn J. Hogg, Catherine E. Grueber, Investigating inbreeding in a free-ranging, captive population of an Australian marsupial, Conservation Genetics, 10.1007/s10592-020-01278-2, (2020).
- Caitlin E. Morrison, Rebecca N. Johnson, Catherine E. Grueber, Carolyn J. Hogg, Genetic impacts of conservation management actions in a critically endangered parrot species, Conservation Genetics, 10.1007/s10592-020-01292-4, (2020).
- Jared Crain, Steve Larson, Kevin Dorn, Traci Hagedorn, Lee DeHaan, Jesse Poland, Sequenced-based paternity analysis to improve breeding and identify self-incompatibility loci in intermediate wheatgrass (Thinopyrum intermedium), Theoretical and Applied Genetics, 10.1007/s00122-020-03666-1, (2020).
- Nicola Kent, Romane H Cristescu, Carme Piza-Roca, Bethan L Littleford-Colquhoun, Kasha Strickland, Céline H Frère, Maternal nesting behaviour in city dragons: a species with temperature-dependent sex determination, Journal of Urban Ecology, 10.1093/jue/juz005, 5, 1, (2019).
- Sarah P. Flanagan, Adam G. Jones, The future of parentage analysis: From microsatellites to SNPs and beyond, Molecular Ecology, 10.1111/mec.14988, 28, 3, (544-567), (2019).
- Lídia Escoda, Ángel Fernández‐González, Jose Castresana, Quantitative analysis of connectivity in populations of a semi‐aquatic mammal using kinship categories and network assortativity, Molecular Ecology Resources, 10.1111/1755-0998.12967, 19, 2, (310-326), (2019).
- Brenna A. Levine, Marlis R. Douglas, Amy A. Yackel Adams, Björn Lardner, Robert N. Reed, Julie A. Savidge, Michael E. Douglas, Genomic pedigree reconstruction identifies predictors of mating and reproductive success in an invasive vertebrate, Ecology and Evolution, 10.1002/ece3.5694, 9, 20, (11863-11877), (2019).
- Darren C Hunter, Josephine M Pemberton, Jill G Pilkington, Michael B Morrissey, Pedigree-Based Estimation of Reproductive Value, Journal of Heredity, 10.1093/jhered/esz033, 110, 4, (433-444), (2019).
- Stefanie Muff, Alina K. Niskanen, Dilan Saatoglu, Lukas F. Keller, Henrik Jensen, Animal models with group-specific additive genetic variances: extending genetic group models, Genetics Selection Evolution, 10.1186/s12711-019-0449-7, 51, 1, (2019).
- H.K.A. Premachandra, Nguyen H. Nguyen, Wayne Knibb, Effectiveness of SNPs for parentage and sibship assessment in polygamous yellowtail kingfish Seriola lalandi, Aquaculture, 10.1016/j.aquaculture.2018.09.022, 499, (24-31), (2019).
- Laura Gervais, Charles Perrier, Maria Bernard, Joël Merlet, Josephine M. Pemberton, Benoit Pujol, Erwan Quéméré, RAD‐sequencing for estimating genomic relatedness matrix‐based heritability in the wild: A case study in roe deer, Molecular Ecology Resources, 10.1111/1755-0998.13031, 19, 5, (1205-1217), (2019).
- Vivienne Foroughirad, Alexis L. Levengood, Janet Mann, Céline H. Frère, Quality and quantity of genetic relatedness data affect the analysis of social structure, Molecular Ecology Resources, 10.1111/1755-0998.13028, 19, 5, (1181-1194), (2019).
- Katherine A. Farquharson, Carolyn J. Hogg, Catherine E. Grueber, A case for genetic parentage assignment in captive group housing, Conservation Genetics, 10.1007/s10592-019-01198-w, (2019).
- G. Schiavo, F. Bertolini, G. Galimberti, S. Bovo, S. Dall’Olio, L. Nanni Costa, M. Gallo, L. Fontanesi, A machine learning approach for the identification of population-informative markers from high-throughput genotyping data: application to several pig breeds, animal, 10.1017/S1751731119002167, (1-10), (2019).
- Timothée Bonnet, Michael B. Morrissey, Alison Morris, Sean Morris, Tim H. Clutton-Brock, Josephine M. Pemberton, Loeske E. B. Kruuk, The role of selection and evolution in changing parturition date in a red deer population, PLOS Biology, 10.1371/journal.pbio.3000493, 17, 11, (e3000493), (2019).
- Alexandra M. Sparks, Kathryn Watt, Rona Sinclair, Jill G. Pilkington, Josephine M. Pemberton, Tom N. McNeilly, Daniel H. Nussey, Susan E. Johnston, The genetic architecture of helminth-specific immune responses in a wild population of Soay sheep (Ovis aries), PLOS Genetics, 10.1371/journal.pgen.1008461, 15, 11, (e1008461), (2019).
- Mario P L Calus, Jérémie Vandenplas, Ina Hulsegge, Randy Borg, John M Henshall, Rachel Hawken, Assessment of sire contribution and breed-of-origin of alleles in a three-way crossbred broiler dataset, Poultry Science, 10.3382/ps/pez458, (2019).
- Tetsuya Akita, Nearly unbiased estimator of contemporary effective mother size using within-cohort maternal sibling pairs incorporating parental and nonparental reproductive variations, Heredity, 10.1038/s41437-019-0271-6, (2019).
- Andrew Whalen, Gregor Gorjanc, John M. Hickey, Parentage assignment with genotyping‐by‐sequencing data, Journal of Animal Breeding and Genetics, 10.1111/jbg.12370, 136, 2, (102-112), (2018).
- Susan E. Johnston, Jisca Huisman, Josephine M. Pemberton, A Genomic Region Containing REC8 and RNF212B Is Associated with Individual Recombination Rate Variation in a Wild Population of Red Deer ( Cervus elaphus ) , G3: Genes|Genomes|Genetics, 10.1534/g3.118.200063, 8, 7, (2265-2276), (2018).
- Hugo Cayuela, Quentin Rougemont, Jérôme G. Prunier, Jean‐Sébastien Moore, Jean Clobert, Aurélien Besnard, Louis Bernatchez, Demographic and genetic approaches to study dispersal in wild animal populations: A methodological review, Molecular Ecology, 10.1111/mec.14848, 27, 20, (3976-4010), (2018).
- Michael B. Morrissey, Maria João Janeiro, Alexandra M. Sparks, Stephen White, Gabriel Pigeon, Céline Teplitsky, Denis Réale, Emmanuel Milot, Into the wild—WAMBAM goes to Canada, Molecular Ecology, 10.1111/mec.14510, 27, 5, (1098-1102), (2018).
- Eva Troianou, Jisca Huisman, Josephine M. Pemberton, Craig A. Walling, Estimating selection on the act of inbreeding in a population with strong inbreeding depression, Journal of Evolutionary Biology, 10.1111/jeb.13376, 31, 12, (1815-1827), (2018).
- Derrick J. Thrasher, Bronwyn G. Butcher, Leonardo Campagna, Michael S. Webster, Irby J. Lovette, Double‐digest RAD sequencing outperforms microsatellite loci at assigning paternity and estimating relatedness: A proof of concept in a highly promiscuous bird, Molecular Ecology Resources, 10.1111/1755-0998.12771, 18, 5, (953-965), (2018).
- Lauren Lee, Nathan Tirrell, Christian Burrell, Scott Chambers, Steve Vogel, Eric T. Domyan, Genetic tests reveal extra‐pair paternity among Gentoo penguins (Pyogoscelis papua ellsworthii) at Loveland Living Planet Aquarium: Implications for ex situ colony management, Zoo Biology, 10.1002/zoo.21432, 37, 4, (236-244), (2018).
- C. Perrier, B. Delahaie, A. Charmantier, Heritability estimates from genomewide relatedness matrices in wild populations: Application to a passerine, using a small sample size, Molecular Ecology Resources, 10.1111/1755-0998.12886, 18, 4, (838-853), (2018).
- Kirsty Lee Moore, Conrad Vilela, Karolina Kaseja, Raphael Mrode, Mike Coffey, Forensic use of the genomic relationship matrix to validate and discover livestock pedigrees, Journal of Animal Science, 10.1093/jas/sky407, (2018).
- Alexandra M. Sparks, Kathryn Watt, Rona Sinclair, Jill G. Pilkington, Josephine M. Pemberton, Susan E. Johnston, Tom N. McNeilly, Daniel H. Nussey, Natural Selection on Antihelminth Antibodies in a Wild Mammal Population, The American Naturalist, 10.1086/700115, (000-000), (2018).
- Sangeet Lamichhaney, Fan Han, Matthew T. Webster, Leif Andersson, B. Rosemary Grant, Peter R. Grant, Rapid hybrid speciation in Darwin’s finches, Science, 10.1126/science.aao4593, 359, 6372, (224-228), (2017).
- H. Froy, E. J. Bird, R. V. Wilbourn, J. Fairlie, S. L. Underwood, E. Salvo-Chirnside, J. G. Pilkington, C. Bérénos, J. M. Pemberton, D. H. Nussey, No evidence for parental age effects on offspring leukocyte telomere length in free-living Soay sheep, Scientific Reports, 10.1038/s41598-017-09861-3, 7, 1, (2017).
- Susan E. Johnston, Jisca Huisman, Philip A. Ellis, Josephine M. Pemberton, A High-Density Linkage Map Reveals Sexual Dimorphism in Recombination Landscapes in Red Deer ( Cervus elaphus ) , G3: Genes|Genomes|Genetics, 10.1534/g3.117.044198, 7, 8, (2859-2870), (2017).



to differentiate ‘possibly relatives’ from ‘certainly not relatives’

