Keywords:

  • microsatellites;
  • molecular methods;
  • relationships;
  • gene flow;
  • heritability;
  • selection

Abstract

  1. Top of page
  2. Abstract
  3. Introduction
  4. Uses in animal conservation studies
  5. An overview of pedigree inference
  6. Relatedness
  7. Pairwise methods
  8. Full-likelihood methods
  9. Common assumptions of pedigree inference methods
  10. Complete sampling of population
  11. Large and randomly mating population
  12. Linkage equilibrium
  13. Error and mutation
  14. Recent developments
  15. Inclusion of field observation data in the genetic framework
  16. Conclusion
  17. Acknowledgements
  18. References
  19. Supporting Information

Pedigrees, depicting the genealogical relationships between individuals in a population, are of fundamental importance to several research areas including conservation biology. For example, they are useful for estimating inbreeding, heritability and selection, for studying kin selection and for measuring gene flow between populations. Pedigrees constructed from direct observations of reproduction are usually unavailable for wild populations. Therefore, pedigrees for these populations are usually estimated using molecular marker data. Despite their obvious importance, and the fact that pedigrees are conceptually well understood, the methods and limitations of marker-based pedigree inference are often less well understood. Here we introduce animal conservation biologists to molecular marker-based pedigrees. We briefly describe the history of pedigree inference research, before explaining the underlying theory and basic mechanics of pedigree construction using standard methods. We explain the assumptions and limitations that accompany many of these methods, before going on to explain methods that relax several of these assumptions. Finally, we look to the future and discuss some recent exciting advances such as the use of single-nucleotide polymorphisms, inference of multigenerational pedigrees and incorporation of non-genetic data such as field observations into the calculations. We also provide some guidelines on efficient marker selection in order to maximize accuracy and power. Throughout we use examples from the field of animal conservation and refer readers to appropriate software where possible. It is our hope that this review will help animal conservation biologists to understand, choose, and use the methods and tools of this fast-moving field.


Introduction

Pedigrees, depicting the genealogical relationships between individuals in a population, are one of the best-understood biological concepts. They are of fundamental importance to research areas including conservation biology. For example, they can be used in making estimates of inbreeding (e.g. Collevatti et al., 2007; Blackmore & Heinsohn, 2008; Richard et al., 2009), heritability (e.g. Garant & Kruuk, 2005; Charmantier et al., 2006) and gene flow (Zeyl et al., 2009). They are also important for examining captive populations (e.g. Nielsen, Pertoldi & Loeschcke, 2007) and inferring breeding behaviour.

Pedigrees from direct observations of reproduction are rare for wild populations. Instead, workers use molecular marker data, derived from samples taken from the animals themselves or from scat, fur or feathers, to infer relationships statistically. Despite the importance of these pedigrees and the fact that they are, in principle, well understood, the methods and limitations of their construction are often less well understood. In this review we shed light on these processes and explain the mechanics of pedigree construction, together with its assumptions and limitations, with particular reference to conservation biology.

Uses in animal conservation studies

Molecular marker-based pedigrees are of great importance in evolutionary and conservation biology. One area where pedigrees are particularly useful is in the study of wild animal mating systems where direct observation of mating is impossible or misleading. Such studies are fundamental to our understanding of life-history strategies, and have important implications for conservation biology.

Before the advent of molecular marker-based pedigrees, accurate estimates of extra-pair paternities (EPPs) were impossible and studies of cooperative breeding were similarly handicapped. Now, numerous studies have used molecular marker-based pedigrees and these indicate that EPPs are common in nature (Akcay & Roughgarden, 2007; Cohas et al., 2007). We now know that relationships in social pedigrees cannot be trusted and genetic measures such as selection or heritability derived from such pedigrees may be biased (Charmantier & Reale, 2005).

An interesting example is provided by Gottelli et al. (2007), who examined the mating behaviour of cheetahs Acinonyx jubatus in the Serengeti National Park, Tanzania. Generally, male reproductive success is expected to increase with multiple mating and it is, therefore, usually assumed that males are promiscuous while females, for whom the benefits of multiple mating are not obvious, are coy. However, their paternity analysis demonstrated that a high proportion of litters included offspring of more than one male indicating that, for cheetahs, female promiscuity is high. They note that a large proportion of the paternities are inferred to be from unsampled males from outside of the study area and conclude that promiscuity is a strategy to minimize inbreeding. Carpenter et al. (2005) also made use of a molecular marker-based pedigree in their recent study of the Eurasian badger Meles meles mating system. They also found that badgers exhibit high levels of extra-group matings, attributed to inbreeding avoidance, and suggest that the tactic could help social cohesion by reducing the cost of philopatry.

Estimating inbreeding is often a specific goal for conservation studies but inbreeding measures that do not require pedigrees, such as heterozygosity, have been criticized (Balloux, Amos & Coulson, 2004). It is thus preferable to use estimation methods based on a well-resolved pedigree. For example, Charpentier et al. (2006) used a long-term dataset including a molecular-marker-based pedigree to estimate inbreeding and examine its correlates with life-history traits in mandrills Mandrillus sphinx. They found that inbreeding in females was associated with small body size and an earlier age at first conception. Clearly, inbreeding could have important effects on the dynamics of the population.

Robust estimation of evolutionary parameters like heritability and selection (see Garant & Kruuk, 2005) also requires pedigree data. These parameters can help us understand how species cope with environmental perturbation, or longer-term environmental change, and are therefore fundamental to conservation biology.

Although some parameter estimation methods require only pairwise-relatedness estimates rather than pedigrees (Ritland, 2000), these approaches perform poorly compared with pedigree-based methods (Coltman, 2005). Recently, the ‘animal model’ approach, which requires a pedigree, has gained favour because it allows an entire multigenerational pedigree structure, rather than simply the pairwise relationships, to contribute to parameter estimation (Kruuk, 2004).

A recent study of heritability and selection in lemon sharks in Bimini, Bahamas, demonstrates the utility of molecular marker-based pedigrees (DiBattista et al., 2009). Sharks are heavily harvested worldwide, both directly and as by-catch (Barker & Schluessel, 2005), and because this tends to remove larger individuals from the population, there may be significant selection for smaller body size (Fenberg & Roy, 2008) resulting in evolutionary change in the population. Similar changes in other fish species have been implicated in fishery collapse (Olsen et al., 2004). DiBattista et al. (2009) found that both body mass and size, which are predictors of survival, are moderately heritable. Therefore, harvesting that targets large individuals can potentially lead to the population becoming smaller bodied, less fecund and less viable.

It is clear from these examples that molecular marker-based pedigrees can make a great contribution to our understanding of evolutionary and ecological processes in wild-animal populations, all of which have implications for conservation biology. We now offer a brief overview of pedigree inference methods.

An overview of pedigree inference

Until the late 1980s, wild animal pedigrees were constructed using field observation data (e.g. social pedigrees). The biggest advance in the inference and use of pedigrees came with the discovery, in the mid-1980s, of highly variable neutral genetic markers (e.g. microsatellites; Bennett, 2000) that allowed multilocus genetic fingerprinting of individuals (Jeffreys, Wilson & Thein, 1985). These molecular methods allowed the inference of relationships from genetic data rather than from observational data, which is often unreliable (Coltman et al., 1999). Typically, individuals in a population are observed on a generation-by-generation basis and pedigree construction consists of assigning relationships between individuals from different generations (parent–offspring relationships) or among individuals of the same generation (sibship relationships).

Relatedness

Perhaps the simplest approach to handling these kinds of genetic data is to construct a pairwise-relatedness matrix for the population, rather than assigning discrete relationships. Relatedness coefficients measure the probability of identity-by-descent (IBD) between individuals over the whole genome and several estimators exist (Lynch & Ritland, 1999; van de Casteele, Galbusera & Matthysen, 2001; Wang, 2002). In other words, they are estimates of the probability that two alleles at the same locus, one randomly selected from each individual in a pair (dyad), will have recently descended from a single ancestral allele. At any locus, dyads may share zero, one or two alleles that are IBD and the probabilities of these events depend on their relationship. In a large outbred population, monozygotic (identical) twins, for example, will share two alleles that are IBD at each locus and will have a relatedness of 1. Parent–offspring dyads will share one allele that is IBD per locus and have a relatedness of 0.5. Parent–offspring dyads and full-sib dyads have the same total relatedness (0.5) but distinguishing between them is not difficult because the pattern of IBD sharing is different. While a parent–offspring dyad always shares one allele IBD, a full-sib dyad shares zero, one or two alleles IBD, with probabilities 1/4, 1/2 and 1/4, respectively. Bink et al. (2008) recently compared several relatedness estimators and found that they all performed approximately equally well.
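
The arithmetic behind these values can be sketched in a few lines. The following is a minimal illustration, not one of the cited estimators: it assumes the IBD-sharing probabilities (here called k0, k1 and k2, names chosen for this example) are known exactly, and uses the standard identity r = k1/2 + k2.

```python
from fractions import Fraction


def relatedness(k0, k1, k2):
    """Expected relatedness of a dyad from its IBD-sharing probabilities.

    k0, k1, k2 are the probabilities that the dyad shares zero, one or two
    alleles identical by descent at a locus (illustrative names).
    """
    if k0 + k1 + k2 != 1:
        raise ValueError("IBD-sharing probabilities must sum to 1")
    return Fraction(k1) / 2 + Fraction(k2)


# IBD patterns quoted in the text for a large outbred population.
IBD_PATTERNS = {
    "monozygotic twins": (Fraction(0), Fraction(0), Fraction(1)),
    "parent-offspring": (Fraction(0), Fraction(1), Fraction(0)),
    "full-sibs": (Fraction(1, 4), Fraction(1, 2), Fraction(1, 4)),
}

for relationship, pattern in IBD_PATTERNS.items():
    print(relationship, relatedness(*pattern))
```

Parent–offspring and full-sib dyads both give 1/2 here, which is why the pattern of sharing, rather than the total, is what separates them.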

The utility of relatedness estimates depends on how they are used. Although relatedness estimates of particular dyads are unlikely to be accurate with the typical marker information in current practice, the error in these estimates would be averaged away across numerous dyads in a moderate-sized sample. It follows that using relatedness estimates to address questions like, ‘are juveniles more related than adults?’ or ‘are females more related than males?’, for example to investigate age- or sex-biased migration, is likely to produce useful results given sufficient statistical power. However, other questions require assignment to discrete relationships, so although relatedness measures are useful for addressing a range of questions, it is often expedient to infer discrete relationships rather than relatedness per se. Several approaches exist for doing this and they can be split into two camps: pairwise and full-likelihood methods.

Pairwise methods

Pairwise methods consider dyads for a number of candidate relationships. When considering a particular dyad, all other individuals, and their relationships, have no influence on the inference of the focal dyad's relationship. For parentage inference, in an ideal world, we might use an exclusionary approach. Exclusionary approaches for sibship inference are limited to the case of large full-sibship groups. This is because dyads are never excludable as full-siblings in diploid species no matter how many markers are used in the analysis.

With exclusionary parentage inference several genetic markers are considered, and any candidate that shares no allele with the offspring at one or more loci is excluded. In paternity analysis, for example, this would ideally leave a single candidate as the father (the principle is the same with maternity analysis). Unfortunately, incomplete candidate sampling (i.e. missing individuals) and insufficient markers mean that it is rarely possible to exclude all but one individual with certainty. In addition, genotyping errors and mutations contribute to false exclusions (Wang, 2004). Therefore, we must turn to likelihood-based methods, which assess the likelihood of one hypothesis relative to another.
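
The exclusion rule itself is simple enough to sketch. The example below is illustrative only: the genotypes, locus names and candidate names are invented, genotypes are assumed to be error-free pairs of allele sizes, and no missing data are handled.

```python
def shares_allele(genotype_a, genotype_b):
    """True if two diploid genotypes have at least one allele in common."""
    return bool(set(genotype_a) & set(genotype_b))


def non_excluded_candidates(offspring, candidates):
    """Names of candidates sharing an allele with the offspring at every locus."""
    kept = []
    for name, genotype in candidates.items():
        if all(shares_allele(offspring[locus], genotype[locus]) for locus in offspring):
            kept.append(name)
    return kept


# Hypothetical microsatellite genotypes (allele sizes in base pairs).
offspring = {"locus1": (120, 124), "locus2": (88, 92), "locus3": (201, 205)}
candidates = {
    "male_A": {"locus1": (120, 130), "locus2": (92, 96), "locus3": (205, 209)},
    "male_B": {"locus1": (124, 126), "locus2": (90, 94), "locus3": (201, 201)},
    "male_C": {"locus1": (118, 124), "locus2": (88, 88), "locus3": (199, 201)},
}

print(non_excluded_candidates(offspring, candidates))
```

In this invented case male_B is excluded at locus2 but two candidates remain, which mirrors the practical problem described above: with few markers, exclusion often fails to single out one parent.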

Likelihood-based methods are either categorical (discrete) or fractional. Categorical methods aim to assign dyads to particular relationships. In sibship inference, for example, they ask, ‘Given the genotypes of two individuals, what is the most likely relationship: full-sib, half-sib or unrelated?’, while in parentage inference they ask, ‘Given the genotypes, which of these candidate individuals is most likely to be the father (or mother)?’. Fractional methods, which are restricted to parentage analysis, split the relationship probabilistically among compatible individuals. With both approaches, likelihood is calculated using the rules of Mendelian segregation of alleles between parents and offspring. Equations for calculating these probabilities are available elsewhere (e.g. Thompson, 1975; Marshall et al., 1998; Wang, 2004). Both the polymorphism of the markers and the number of markers used will influence the efficacy of these approaches (Box 1).

Box 1. Marker selection
Every marker used in an analysis contributes information. Butler et al. (2004), based on the performance of a number of parentage assignment algorithms on simulated and empirical datasets, suggested that six to eight loci, with at least eight alleles each, should be used. However, the number of markers required will depend on such factors as the (unknown) family structure and marker polymorphism. Workers should select markers that maximize the amount of useful information they provide, while minimizing any increase in potential error. Increasing both the number of loci, and allelic diversity (i.e. degree of polymorphism at each locus), will tend to increase the confidence in relationship assignments [although the number of loci is more influential than their allelic diversity (Bernatchez & Duchesne, 2000)].
However, workers should note that, where the sample itself is used to estimate population allele frequencies, using highly polymorphic loci can have the side-effect of inflating the error in allele frequency estimates unless the sample size is large (Gomez-Uchida & Banks, 2005; Kalinowski, 2005). Another factor to consider is that, as polymorphism at a locus increases, so does the potential for scoring error (Buchan et al., 2005; Hoffman & Amos, 2005). The addition of noise in this manner will only be a problem when using methods that do not account for error (see ‘Error and mutation’).
Markers are usually assumed to be independent. Therefore, the use of tightly linked, non-independently assorting loci introduces a pseudo-replication problem to the analysis. Failure to account for linkage results in an overestimate of precision, and therefore overconfidence in the inference made. A number of methods apply a correction to account for this problem (see main text), but for all but the most tightly linked loci the problem is likely to be minor.
Lastly, for conventional methods that only consider one or two generations, the inclusion of non-neutral loci is not problematic. However, for multigenerational approaches (which, like conventional methods, assume allele frequencies are fixed) these types of markers may pose problems because allele frequencies may change systematically via selection across generations (Estoup et al., 2002), and should be avoided as they will reduce the accuracy of inference. Selkoe & Toonen (2006) provide an overview of marker selection issues.

With the categorical approach we examine the likelihoods of the putative relationships and accept the candidate relationship that is significantly more likely than any other. With the fractional parentage approach the parentage is split among every non-excluded candidate in proportion to their relative likelihood, such that the one with the highest likelihood receives the highest proportion and the others receive smaller proportions (summing to 1). Although the fractional approach has some advantages (i.e. it can be used even when the discriminatory power of loci is low, it uses all available data and it produces a probability distribution of the pedigree), it is not as commonly used as categorical methods. Henceforth, we concentrate on categorical assignment methods.
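
The fractional split described above amounts to normalizing the candidates' likelihoods. A minimal sketch follows; the likelihood values and candidate names are invented for illustration, and real implementations may also weight candidates by prior information.

```python
def fractional_parentage(likelihoods):
    """Split one offspring's parentage among candidates in proportion to likelihood.

    `likelihoods` maps candidate name to the likelihood of parentage;
    excluded candidates should carry a likelihood of 0.
    """
    total = sum(likelihoods.values())
    if total <= 0:
        raise ValueError("at least one candidate must have a positive likelihood")
    return {name: value / total for name, value in likelihoods.items()}


# Hypothetical likelihoods for three candidate fathers; male_B is excluded.
shares = fractional_parentage({"male_A": 0.03, "male_B": 0.0, "male_C": 0.01})
print(shares)
```

A categorical method would instead award the whole offspring to male_A, provided its support is significantly greater than that of male_C.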

The precise algorithms used to make categorical assignments of parental or sibship relationships vary. For parentage analysis, the simplest method is to award parentage to the individual with the highest logarithm of the likelihood ratio, or LOD score (the likelihood ratio is the likelihood of parentage of a particular individual relative to the likelihood that the individual is unrelated to the offspring in question). However, we should only award parentage if the best candidate is significantly more likely than the second best candidate (Marshall et al., 1998).
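
To make the LOD score concrete, the sketch below computes it for one candidate parent when the other parent is unknown. It is a simplified illustration under stated assumptions, not the equations of any particular program: loci are independent, genotype frequencies follow Hardy–Weinberg proportions, there is no genotyping error, and the natural logarithm is used. All function and variable names are chosen for this example.

```python
import math


def genotype_frequency(genotype, freqs):
    """Hardy-Weinberg frequency of a diploid genotype."""
    a, b = genotype
    return freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]


def transmission_probability(offspring, parent, freqs):
    """P(offspring genotype | this parent, other allele drawn from the population)."""
    a, b = offspring
    prob = 0.0
    for allele in parent:  # each parental allele is passed on with probability 1/2
        if a == b:
            if allele == a:
                prob += 0.5 * freqs[a]
        else:
            if allele == a:
                prob += 0.5 * freqs[b]
            if allele == b:
                prob += 0.5 * freqs[a]
    return prob


def lod_score(offspring, candidate, freqs_by_locus):
    """Sum over loci of log[P(offspring | candidate is parent) / P(offspring | unrelated)]."""
    lod = 0.0
    for locus, freqs in freqs_by_locus.items():
        numerator = transmission_probability(offspring[locus], candidate[locus], freqs)
        if numerator == 0.0:
            return -math.inf  # mismatch; with no error model this is an exclusion
        lod += math.log(numerator / genotype_frequency(offspring[locus], freqs))
    return lod


# One hypothetical locus with a rare allele A (frequency 0.1).
freqs_by_locus = {"locus1": {"A": 0.1, "B": 0.9}}
offspring = {"locus1": ("A", "B")}
print(lod_score(offspring, {"locus1": ("A", "A")}, freqs_by_locus))  # positive
print(lod_score(offspring, {"locus1": ("B", "B")}, freqs_by_locus))  # negative
```

Sharing the rare allele A is much stronger evidence of parentage than sharing the common allele B, which is why the first candidate scores higher.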

We will now describe the algorithm used by the popular parentage inference program, cervus (Marshall et al., 1998). cervus begins by calculating the LOD scores for all possible pairings between a focal individual and the candidate parents. Rather than simply assigning paternity or maternity to the individual with the highest LOD score it compares the LOD score of the most likely candidate with that of the next most likely candidate (by taking the difference in the LOD score, Δ). The magnitude of Δ indicates our confidence in assigning parentage to this particular individual and its statistical significance is tested by comparing with a null distribution of Δ generated from simulated parentage. If the magnitude of Δ is deemed satisfactory then parentage is assigned to that individual, otherwise it remains unassigned. The simulations carried out by cervus can also be informative of the power of a study. For example, in a pilot study one could examine the change in assignment rate with increasing number of alleles or loci, and use the information to decide how many markers to use.
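
The decision step can be sketched as follows. This is only an outline of the logic described above, not the cervus implementation: the critical value of Δ is assumed to have been obtained beforehand from simulation and is passed in as a hypothetical parameter, and the sketch requires at least two candidates.

```python
def assign_parent(lod_scores, critical_delta):
    """Assign parentage only if the top candidate beats the runner-up by enough.

    `lod_scores` maps candidate name to LOD score; `critical_delta` is the
    threshold for the difference between the two highest scores
    (assumed to come from a separate simulation step).
    Returns (assigned candidate or None, delta).
    """
    if len(lod_scores) < 2:
        raise ValueError("this sketch needs at least two candidates")
    ranked = sorted(lod_scores.items(), key=lambda item: item[1], reverse=True)
    (best_name, best_lod), (_, second_lod) = ranked[0], ranked[1]
    delta = best_lod - second_lod
    return (best_name if delta >= critical_delta else None), delta


# Invented LOD scores: the first case is too close to call, the second is not.
print(assign_parent({"male_A": 3.2, "male_B": 2.9, "male_C": -1.0}, critical_delta=1.5))
print(assign_parent({"male_A": 4.1, "male_B": 1.0}, critical_delta=1.5))
```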

Sibship assignment algorithms tend to be slightly more straightforward, usually assigning the most likely of a number of candidate relationships. Some algorithms assign both full- and half-sibships while others assign only full-sibships.

Full-likelihood methods

Although pairwise methods have been fruitful, they discard potentially valuable information and are therefore inefficient. In addition, they can result in incompatibilities in the pedigrees produced. For example, when examining the relationship between individuals A, B and C, the dyads A–B and A–C may be inferred as full-sibs, while B–C may be inferred as half-, or non-sibs. In a parentage analysis the dyads A–C and B–C may be inferred as father–offspring and mother–offspring, respectively, but when considered jointly, the trio A–B–C may be revealed as incompatible with a parent-pair and offspring relationship.
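
The first kind of incompatibility is easy to detect after the fact, because full-sibship is transitive: if exactly two of the three dyads in a trio are called full-sibs, the calls cannot all be correct. The check below is a small illustrative sketch with invented individuals; it flags such trios but does not say which call is wrong.

```python
from itertools import combinations


def inconsistent_triads(full_sib_pairs, individuals):
    """Trios in which exactly two of the three dyads were inferred as full-sibs."""
    pairs = {frozenset(pair) for pair in full_sib_pairs}
    flagged = []
    for trio in combinations(sorted(individuals), 3):
        n_links = sum(frozenset(dyad) in pairs for dyad in combinations(trio, 2))
        if n_links == 2:  # A-B and A-C are full-sibs, yet B-C is not
            flagged.append(trio)
    return flagged


# Pairwise calls matching the example in the text: B-C was not called full-sib.
print(inconsistent_triads([("A", "B"), ("A", "C")], "ABC"))
# Adding the missing B-C call makes the trio consistent.
print(inconsistent_triads([("A", "B"), ("A", "C"), ("B", "C")], "ABC"))
```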

A major advance in the field, to which we now turn, was the development of full-likelihood methods which are more accurate than pairwise methods (Thomas & Hill, 2002; Wang, 2004). Several algorithms exist (Emery et al., 2001; Smith, Herbinger & Herbinger, 2001; Thomas & Hill, 2002; Wang, 2004; Wang & Santure, 2009) but their common feature is that, unlike pairwise methods, they retain the information lost when individuals other than the focal pair are ignored. For example, in parentage assignments, a single offspring only provides information for a single allele at a parental locus. However, with more offspring in the set considered (e.g. a group of siblings), the probability that both parental alleles are represented is increased, and consequently the power of parental assignment is improved. Another major benefit is that relationship incompatibilities of the kind described above are avoided.
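
The gain from considering several offspring together can be quantified with a short calculation. Assuming a heterozygous parent, Mendelian segregation and offspring that inherit independently, the probability that both parental alleles appear among n offspring is 1 − 2(1/2)^n. The function name is chosen for this example.

```python
from fractions import Fraction


def prob_both_alleles_seen(n_offspring):
    """P(both alleles of a heterozygous parent are transmitted to at least one of n offspring)."""
    if n_offspring < 1:
        raise ValueError("need at least one offspring")
    # Subtract the two ways of failing: every offspring received the same allele.
    return 1 - 2 * Fraction(1, 2) ** n_offspring


for n in (1, 2, 3, 5):
    print(n, prob_both_alleles_seen(n))
```

A single offspring never reveals both alleles, whereas five full-sibs do so with probability 15/16 at a given locus.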

The methods aim to reach a solution that maximizes the likelihood of the entire pedigree configuration given the marker data. There is an astronomical number of potential configurations to test, even for small samples of individuals, and the approach is computationally intensive. Therefore, techniques such as Markov Chain Monte Carlo or simulated annealing are used to explore the space of possible pedigree configurations to find a solution that maximizes likelihood. This space is traversed with certain rules, aiming to go to areas with higher likelihood values, but avoiding getting stuck in local maxima (Wang & Santure, 2009).
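
The search strategy can be illustrated with a generic simulated annealing loop. This is a toy sketch and not the algorithm of any cited program: the "likelihood" is a made-up score over family labels for four individuals, and the cooling schedule, step count and function names are arbitrary choices for the example.

```python
import math
import random


def accept_move(delta_loglik, temperature, u):
    """Always accept uphill moves; accept downhill moves with probability exp(delta / T)."""
    if delta_loglik >= 0:
        return True
    return u < math.exp(delta_loglik / temperature)


def anneal(initial, loglik, propose, n_steps=2000, t_start=1.0, cooling=0.995, seed=0):
    """Search for a high-scoring configuration, tracking the best one visited."""
    rng = random.Random(seed)
    current, current_ll = initial, loglik(initial)
    best, best_ll = current, current_ll
    temperature = t_start
    for _ in range(n_steps):
        candidate = propose(current, rng)
        candidate_ll = loglik(candidate)
        if accept_move(candidate_ll - current_ll, temperature, rng.random()):
            current, current_ll = candidate, candidate_ll
            if current_ll > best_ll:
                best, best_ll = current, current_ll
        temperature *= cooling  # downhill moves become rarer as the system cools
    return best, best_ll


def toy_loglik(labels):
    """Toy stand-in for a pedigree likelihood: 0 and 1 are sibs, 2 and 3 are sibs."""
    return -((labels[0] != labels[1]) + (labels[2] != labels[3]) + (labels[0] == labels[2]))


def toy_propose(labels, rng):
    """Move one randomly chosen individual to a randomly chosen family (0 or 1)."""
    new = list(labels)
    new[rng.randrange(len(new))] = rng.randrange(2)
    return tuple(new)


best, best_ll = anneal((0, 0, 0, 0), toy_loglik, toy_propose)
print(best, best_ll)
```

Accepting occasional downhill moves early on is what lets the search leave a local maximum; in a real analysis the score would be the full pedigree likelihood and the proposals would be changes to sibship or parentage assignments.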

Full-likelihood methods are best-suited to populations that have large family groups; for example, populations with large full-sibships, or that are highly polygamous (Wang & Santure, 2009). For practical purposes, the family group size depends both on the actual genetic structure of the population and on the sampling regime used to collect the data. As family group size declines, the amount of extra information exploited by full-likelihood methods compared with pairwise-likelihood methods also declines and the computational challenges posed by analysing large datasets with little family structure (e.g. Almudevar, 2007) may eventually outweigh the increased accuracy of the method.

Most full-likelihood approaches consider either sibships, or parentage, not both. However, the algorithm presented by Wang & Santure (2009) allows the joint inference of both parentage and sibship and as such represents a major advance in pedigree inference.

Common assumptions of pedigree inference methods

We have described the general principles of pedigree inference, but there are several assumptions commonly made by these methods. It is imperative that workers understand these assumptions so that they can evaluate potential biases that result from violations of them. Although different methods make different assumptions, there are four very common assumptions. We now describe these assumptions, highlighting the practical implications and, where possible, describe methods that relax them or even make use of their violation.

Complete sampling of population

The most important assumption of earlier parentage inference methods was probably that the population of candidate parents is completely sampled. If this assumption is violated we risk assigning the wrong parent. Several methods relax this assumption including those presented by Wang & Santure (2009) (implemented in colony), and Marshall et al. (1998) (implemented in cervus). In both colony and cervus this is accomplished by defining a prior probability that the candidate is present in the sample. Because full-likelihood methods, such as those implemented in colony, maximize likelihood over the entire pedigree, rather than for single relationships, the inferences with these methods are robust to uncertainty in this sampling rate (Wang & Santure, 2009). However, with pairwise methods (like those implemented in cervus) the prior can have a significant impact on inferences made (Marshall et al., 1998), and therefore workers using these methods should examine the effect of sampling uncertainty on inference. This assumption is not made for sibship inference.

Large and randomly mating population

Another typical pair of assumptions is that the sampled population is large and randomly mating. The underlying assumption here is that parental genotype frequencies can be calculated under Hardy–Weinberg equilibrium and that the probability of a given mating type is the product of the frequencies of the parental genotypes (Thomas & Hill, 2000; Wang, 2004). Of course, in the real world, populations are often not large, especially those of conservation concern. Most inference approaches do not attempt to deal with this assumption. However, violation of the assumptions may not be particularly important in terms of the accuracy of relationship assignment and the resulting pedigree structure. Simulations carried out by Wang & Santure (2009) using the colony full-likelihood approach indicate that violation of these assumptions (in the form of an increasing inbreeding coefficient) has little effect on accuracy. The effect of violating the assumption with pairwise methods has yet to be investigated, but it is likely that the assumption would be more important for these methods than for full-likelihood methods.
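
In practice the two assumptions translate into two small calculations, sketched below for a single biallelic locus. The allele frequencies and function names are invented for illustration.

```python
def hwe_genotype_frequency(genotype, freqs):
    """Genotype frequency under Hardy-Weinberg equilibrium: p^2 or 2pq."""
    a, b = genotype
    return freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]


def mating_type_probability(father, mother, freqs):
    """Under random mating, the product of the two parental genotype frequencies."""
    return hwe_genotype_frequency(father, freqs) * hwe_genotype_frequency(mother, freqs)


freqs = {"A": 0.6, "B": 0.4}  # hypothetical allele frequencies
for genotype in (("A", "A"), ("A", "B"), ("B", "B")):
    print(genotype, hwe_genotype_frequency(genotype, freqs))
print(mating_type_probability(("A", "A"), ("A", "B"), freqs))
```

Inbreeding or population structure would shift the genotype frequencies away from these proportions, which is the kind of violation the simulations described above examined.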

Linkage equilibrium

Usually, methods assume that genetic markers are unlinked between loci, and in linkage equilibrium. When two loci are close together on a chromosome, alleles at the loci may not assort independently, and will tend to be transmitted to the offspring as a pair. Although loci may not be linked functionally, they are clustered on the genome (Bachtrog et al., 1999) and as researchers use more markers in their analyses [e.g. single nucleotide polymorphisms (SNPs), below] significant linkage becomes increasingly likely (Abecasis & Wigginton, 2005) and must be tested, and perhaps accounted, for.

Linkage between loci, and the resulting non-independence of the information they provide, is essentially a statistical pseudo-replication problem for relationship inference. Although the estimate may not be biased, its precision would be overestimated. Therefore, to avoid Type 1 error (false rejection of the null hypothesis), one of the pair of linked loci should either be discarded or down-weighted in the analysis. Fortunately, methods to account for linkage exist, mainly relying on an estimated linkage map depicting the strength of association between pairs of loci (Epstein, Duren & Boehnke, 2000; McPeek & Sun, 2000). The quantitative accuracy (i.e. recombination rate between loci) of such a map is not as important as the qualitative accuracy (i.e. relative positions of markers on a chromosome).
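
As a simple screening step before any such correction, the association between two loci can be summarized by the disequilibrium coefficient D and the squared correlation r². The sketch below uses the standard textbook definitions for two biallelic loci and assumes phased haplotype counts are available, which is often not the case for field data; the counts are invented.

```python
def linkage_disequilibrium(n_AB, n_Ab, n_aB, n_ab):
    """Return (D, r^2) for two biallelic loci from phased haplotype counts."""
    total = n_AB + n_Ab + n_aB + n_ab
    if total == 0:
        raise ValueError("no haplotypes supplied")
    p_AB = n_AB / total
    p_A = (n_AB + n_Ab) / total
    p_B = (n_AB + n_aB) / total
    d = p_AB - p_A * p_B  # zero when alleles at the two loci associate at random
    denominator = p_A * (1 - p_A) * p_B * (1 - p_B)
    r_squared = d * d / denominator if denominator > 0 else float("nan")
    return d, r_squared


print(linkage_disequilibrium(40, 10, 10, 40))  # strong association
print(linkage_disequilibrium(25, 25, 25, 25))  # linkage equilibrium
```

Pairs of loci with high r² are the ones that would need to be dropped, down-weighted or handled by a method that models linkage explicitly.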

Linkage is not all bad news though: linkage between loci provides additional information that can be used to distinguish between certain relationship types (Boehnke & Cox, 1997). In addition, for specific relationship–sex combinations, the inclusion of X-linked markers can improve our ability to correctly infer relationships, for example by allowing the improved differentiation of second degree relationships such as cousins (Epstein et al., 2000).

Error and mutation

The final assumptions are that genetic data are free of error and mutation. Although these are two separate assumptions, error and mutation are usually not distinguishable and their effects on the analysis are the same. We therefore deal with them together here. The assumptions are universally violated (Bonin et al., 2004; Wang, 2004; Hoffman & Amos, 2005; Soulsbury et al., 2007) because mutations are widespread and errors cannot be totally eliminated. For microsatellites, errors include allelic dropouts (where PCR fails to amplify one of an individual's two homologous genes, one from each parent, at a locus; Dakin & Avise, 2004), false alleles (polymerase errors rendering an allele other than the true one), miscalling (allele identification error), contaminant DNA and data entry error (Dakin & Avise, 2004; Wang, 2004). Such errors can produce apparent failures of Mendelian inheritance and lead to incorrect relationship assignments, departures from Hardy–Weinberg equilibrium, overestimated inbreeding, etc. Even minor errors may lead to the incorrect classification of a monozygotic twin relationship as a full-sib relationship. Butler et al. (2004) showed that sibship algorithms that follow strict Mendelian inheritance rules are not robust to most kinds of errors. In fact, they showed that errors can cause >70% of individuals to be misclassified. This underlines the importance of selecting methods that are robust to error.
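To see how a single allelic dropout produces an apparent failure of Mendelian inheritance, consider the minimal sketch below. The genotypes, allele sizes and function names are hypothetical, and the mismatch tolerance shown at the end is a crude device for illustration only, not one of the error models discussed in this review.

```python
def is_compatible(parent, offspring):
    """True if parent and offspring share at least one allele at a locus."""
    return bool(set(parent) & set(offspring))

def count_mismatches(parent, offspring):
    """Number of loci at which a parent-offspring pair shares no allele."""
    return sum(not is_compatible(p, o) for p, o in zip(parent, offspring))

def accept_parent(parent, offspring, max_mismatches=0):
    """Strict exclusion when max_mismatches is 0; more tolerant otherwise."""
    return count_mismatches(parent, offspring) <= max_mismatches

# Hypothetical microsatellite genotypes (allele sizes) at three loci.
true_parent = [(120, 124), (98, 102), (150, 150)]
offspring = [(124, 128), (98, 100), (150, 156)]

# Allelic dropout at the second locus: allele 98 fails to amplify, so the
# heterozygote (98, 102) is recorded as the homozygote (102, 102).
observed_parent = [(120, 124), (102, 102), (150, 150)]

print(count_mismatches(true_parent, offspring))      # 0
print(count_mismatches(observed_parent, offspring))  # 1
print(accept_parent(observed_parent, offspring))     # False: true parent excluded
print(accept_parent(observed_parent, offspring, max_mismatches=1))  # True
```

Under strict exclusion the true parent is rejected because of one mistyped locus, which is the kind of fragility that motivates error-tolerant methods.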

Several workers provide methods to identify and cope with errors in pairwise parentage inference (e.g. Marshall et al., 1998; Kalinowski, Taper & Marshall, 2007) and full-likelihood sibship and parentage inference (e.g. Wang, 2004; Wang & Santure, 2009). Marshall et al.'s (1998) approach was to include an error rate parameter to account for imperfections in the data, while Wang's (2004) approach explicitly models error by distinguishing between observed and actual genotypes while estimating the likelihood of a particular pedigree configuration. The approach can identify and account for genotype errors (or mutations) at each locus of each sampled individual.
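The effect of an error rate parameter can be illustrated with a deliberately simplified single-locus likelihood. The sketch below is not the published formulation of Marshall et al. (1998), Kalinowski et al. (2007) or Wang (2004). It assumes that, with probability `error_rate`, the observed offspring genotype is uninformative and is treated as a random draw under Hardy-Weinberg proportions, and that the other parent is unknown. All names and allele frequencies are hypothetical.

```python
def hwe_prob(genotype, freqs):
    """Probability of a diploid genotype under Hardy-Weinberg proportions."""
    a, b = genotype
    return freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]

def transmission_prob(offspring, parent, freqs):
    """P(offspring genotype | one known parent), other allele drawn from the population."""
    a, b = offspring

    def from_parent(allele):
        # Chance that the parent transmits this allele (0, 0.5 or 1).
        return parent.count(allele) / 2

    if a == b:
        return from_parent(a) * freqs[a]
    return from_parent(a) * freqs[b] + from_parent(b) * freqs[a]

def locus_likelihood(offspring, parent, freqs, error_rate):
    """Mixture of error-free Mendelian transmission and an uninformative (erroneous) genotype."""
    return ((1 - error_rate) * transmission_prob(offspring, parent, freqs)
            + error_rate * hwe_prob(offspring, freqs))

freqs = {98: 0.2, 100: 0.3, 102: 0.5}  # hypothetical allele frequencies
offspring = (98, 100)
parent = (102, 102)                     # shares no allele with the offspring

strict = locus_likelihood(offspring, parent, freqs, error_rate=0.0)
tolerant = locus_likelihood(offspring, parent, freqs, error_rate=0.01)
print(strict)    # 0.0: the pair is excluded outright
print(tolerant)  # small but non-zero: the mismatch is penalized, not fatal
```

With a non-zero error rate, a single mismatching locus lowers the likelihood of a candidate relationship without ruling it out, so evidence from the remaining loci can still carry the assignment.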

Recent developments

SNPs

Microsatellites are the prevalent molecular markers for implementing pedigree inference. Recently, however, SNPs have been highlighted as a potential alternative (Anderson & Garza, 2006; Slate et al., 2009). Individually, SNPs are less informative than microsatellites because they are less polymorphic (Glaubitz, Rhodes & DeWoody, 2003; Wang, 2006): most SNPs have just two alleles, and their allele frequencies tend to be skewed, resulting in low heterozygosity and information content (Marth et al., 2001). On the other hand, they are considerably more abundant than microsatellites, and technological advancements in high-throughput microarray methods will soon make it feasible to assay hundreds, or thousands, of loci relatively easily (Kwok, 2001; Anderson & Garza, 2006; Slate et al., 2009). In addition, SNPs seem less prone to typing error than microsatellites (Ranade et al., 2001) and, therefore, their combined power is potentially much greater than that of microsatellites (Anderson & Garza, 2006). The processes of SNP discovery and typing are reviewed by Morin, Luikart & Wayne (2004) and Slate et al. (2009).
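The difference in per-locus information can be made concrete with expected heterozygosity, H = 1 - sum of squared allele frequencies. The allele frequencies below are hypothetical and chosen only to contrast a skewed biallelic SNP with a highly polymorphic microsatellite.

```python
def expected_heterozygosity(freqs):
    """Expected heterozygosity under random mating: 1 minus the sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs)

snp_skewed = [0.9, 0.1]       # biallelic SNP with skewed frequencies
snp_even = [0.5, 0.5]         # best case for a biallelic marker
microsatellite = [0.1] * 10   # ten equally frequent alleles

h_skewed = expected_heterozygosity(snp_skewed)
h_even = expected_heterozygosity(snp_even)
h_msat = expected_heterozygosity(microsatellite)

print(round(h_skewed, 2))  # 0.18
print(round(h_even, 2))    # 0.5
print(round(h_msat, 2))    # 0.9
```

A biallelic marker can never exceed H = 0.5, so matching the information in a panel of microsatellites requires many more SNPs, which is where their abundance becomes important.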

There is one major potential disadvantage to using SNPs: linkage disequilibrium. Simulations have shown that as the density of markers on the genome increases, the accuracy of inference decreases (fig. 3 in Wang & Santure, 2009). Failure to use methods that account for linkage limits the utility of SNPs in genealogical studies (Glaubitz et al., 2003).

Multigenerational pedigrees

Typically, when workers construct multigenerational pedigrees, they carry out parentage and sibship analysis recursively on a cohort-by-cohort basis and stitch the resulting single-generation pedigrees together (Pemberton, 2008). Recently, however, methods to construct multigenerational pedigrees within a single analysis have been developed (Gasbarra et al., 2007a,b). These methods use a Bayesian approach and coalescence theory to reconstruct genealogies of several generations from partially observed populations (i.e. populations where only a subset of individuals is genotyped).

The approach could be useful in populations where individuals are more likely to be related through unsampled individuals than through sampled individuals, and the resulting genealogy could potentially be used in an animal model framework to estimate population-level parameters (Kruuk, 2004). This approach is clearly promising, but its ability to detect specific relationships such as full-/half-sibships and parent–offspring relationships has yet to be tested. In addition, the method does not currently account for genotyping error.
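For contrast, the conventional cohort-by-cohort route amounts to merging per-cohort parentage assignments into one lookup table and then tracing it upwards. The sketch below assumes the assignments have already been produced by some parentage analysis. The individual identifiers, cohort labels and function names are hypothetical, and `None` marks an unsampled or unassigned parent.

```python
def stitch(cohort_assignments):
    """Merge per-cohort {offspring: (mother, father)} maps into a single pedigree."""
    pedigree = {}
    for assignments in cohort_assignments:
        pedigree.update(assignments)
    return pedigree

def ancestors(individual, pedigree):
    """All known ancestors of an individual, following assigned parents upwards."""
    found = set()
    stack = [individual]
    while stack:
        for parent in pedigree.get(stack.pop(), ()):
            if parent is not None and parent not in found:
                found.add(parent)
                stack.append(parent)
    return found

# Hypothetical single-generation results for two successive cohorts.
cohort_2001 = {"B1": ("A1", "A2"), "B2": ("A3", None)}
cohort_2002 = {"C1": ("B1", "B2")}

pedigree = stitch([cohort_2001, cohort_2002])
print(sorted(ancestors("C1", pedigree)))  # ['A1', 'A2', 'A3', 'B1', 'B2']
```

Because each cohort is analysed separately, an assignment error in one generation is carried into every later one, and relationships that run only through unsampled individuals (the `None` entries) are lost.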

Inclusion of field observation data in the genetic framework

The ultimate goal of most studies that infer pedigrees is to estimate a population-level parameter, rather than generating the pedigree itself. Categorical methods result in a pedigree that is assumed to be true, and uncertainty is usually ignored. Fractional methods result in a probability distribution of a pedigree and thus take uncertainty into account.

However, both fractional and categorical methods suffer because the processes of pedigree inference and parameter estimation are divorced from each other. Without modification, estimates of population-level parameters are biased towards estimates that would be expected under random mating (e.g. the naïve prior that all fathers are equally likely to be the true father). Adjusting priors to correct for this (e.g. Neff, Repka & Gross, 2001) is one solution, but a more recent alternative is to estimate population-level parameters jointly with the pedigree. Hadfield, Richardson & Burke (2006) illustrated this concept for paternity inference with the example of a fictional study with 20 candidate fathers and 20 offspring, where each father is the social father of one offspring (i.e. it behaves as the father, even though it may not be the true father). For 19 of the offspring, the genetic data support the social father as the true father. However, for the remaining individual, the genetic data give equal support for the social father and an unknown male (i.e. a potential case of extra-pair paternity, EPP). Using traditional methods, support for the two potential fathers remains equal. However, using their novel method, support for the social male is greater because the data from the other 19 offspring indicate that the social male is inherently more likely to be the true father.

This method can be adapted so that any other population-level parameter can be estimated simultaneously with the pedigree, and can therefore contribute to paternity inference. It is easy to envisage a case where geographic distance could contribute to the pedigree estimation. Simulations show that their approach is more accurate than the simple categorical approach that would be implemented by, for example, CERVUS (Hadfield et al., 2006). We envisage that this method, implemented in MasterBayes (Hadfield et al., 2006), an R package (R Development Core Team, 2008), could be broadened in scope to include sibship inference within the same framework.
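The intuition behind the fictional 20-offspring example can be caricatured with a few lines of arithmetic. The sketch below is a two-step plug-in shortcut, not the joint estimation implemented in MasterBayes: it assumes a uniform Beta(1, 1) prior on the probability that a social father is the true father, updates it using the 19 unambiguous offspring, and then applies Bayes' rule to the ambiguous one. The function and variable names are hypothetical.

```python
def posterior_social_father(lik_social, lik_other, prior_social):
    """Posterior probability that the social father is the true father (two candidates)."""
    numerator = prior_social * lik_social
    return numerator / (numerator + (1 - prior_social) * lik_other)

# The genetic data support both candidates equally for the ambiguous offspring.
lik_social, lik_other = 1.0, 1.0

# Naive prior: both candidates are equally likely a priori.
naive = posterior_social_father(lik_social, lik_other, prior_social=0.5)

# Informed prior: 19 of 19 unambiguous offspring were sired by their social
# father; the posterior mean of a Beta(1, 1) prior updated with these data is
# (19 + 1) / (19 + 2).
n_clear, n_social_sired = 19, 19
informed_prior = (n_social_sired + 1) / (n_clear + 2)
informed = posterior_social_father(lik_social, lik_other, informed_prior)

print(naive)               # 0.5
print(round(informed, 3))  # 0.952
```

With equal genetic support, the posterior simply equals the prior, so everything hinges on what the rest of the data say about the mating system. A joint analysis lets that information flow in both directions rather than in the single pass shown here.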

Conclusion

We have provided an overview of common conventional methods of pedigree inference using molecular marker data and highlighted some important recent developments. Several software options are available to infer pedigree structure (Supporting Information), and although the methodologies in the field can be traced back to simple Mendelian inheritance rules, the methods are varied and rapidly advancing. It is our hope that this review will help animal conservation biologists understand, choose and use the methods and tools of this fast-moving field, as well as understand its more cutting-edge developments.

Supporting Information

Appendix S1. Some commonly used software available for pedigree inference.

File: ACV_324_sm_SupplInfo.doc (63K). Description: Supporting info item.
