### Abstract

A number of procedures have been developed that allow the genetic parameters of natural populations to be estimated using relationship information inferred from marker data rather than known pedigrees. Three published approaches are available: the regression, pair-wise likelihood and Markov chain Monte Carlo (MCMC) sib-ship reconstruction methods. These were applied to body weight and molecular data collected from the Soay sheep population of St. Kilda, which has a previously determined pedigree. The regression and pair-wise likelihood approaches do not specify an exact pedigree and yielded unreliable heritability estimates that were sensitive to alteration of the fixed effects. The MCMC method, which specifies a pedigree prior to heritability estimation, yielded results closer to those determined using the known pedigree. In populations of low average relationship, such as the Soay sheep population, determination of a reliable pedigree is more useful than indirect approaches that do not specify a pedigree.

### Introduction

In recent years there has been increasing interest in estimating genetic variance components in natural populations, with heritabilities being estimated in hundreds of studies (see meta-analyses by Mousseau & Roff, 1987; Roff & Mousseau, 1987; Weigensberg & Roff, 1996). Accurate estimates are important in the understanding of patterns of short-term evolution, the reconstruction of historical patterns of natural selection (Lande, 1979) and the prediction of genetic responses to selection. In addition they allow inference to be made about the underlying causes of clinal variation, through the comparison of the variance components describing the same traits within subpopulations of the same species (Coyne & Beecham, 1987).

Variance component estimates can provide information on the number of individuals required in order to maintain a viable population, and so are useful for the management of captive populations (Storfer, 1996). The loss of genetic variation is a restricting factor in a species' ability to respond to natural selection, and hence a limitation on its potential to evolve (Lande, 1982; Mousseau & Roff, 1987; Falconer & Mackay, 1996; Lande & Shannon, 1996). Variation is therefore critical for maintenance of species within a changing environment.

Whether variance components are sought for evolutionary insight or conservation biology, standard estimation methods such as regression of offspring phenotypes against parental phenotypes, sib-ship analyses or restricted maximum likelihood (REML) techniques under the animal model (see Lynch & Walsh, 1998) are often difficult or impossible to apply in the wild because they require known pedigrees. However, by typing individuals at marker loci, information may be inferred about relationships (Thompson, 1975; Queller & Goodnight, 1989; Lynch & Ritland, 1999), and using this information a number of indirect methods have recently been developed (described below) that allow variance component estimation with limited pedigree information. Unfortunately, the inherent inaccuracy of such indirect approaches may restrict their use in practice, where accurate estimates are required in order to avoid erroneous conclusions about the underlying population parameters.

The regression approach includes relationship information in the form of estimates of pair-wise relatedness. It uses a between and within locus ANOVA to remove the sampling error variance of relatedness estimation within pairs from the total variance of relatedness. The ANOVA therefore provides a `noise-free' estimate of the actual variance of the relationships within the population for use in subsequent variance component analysis (Ritland, 1996). The likelihood approach also works on pairs, and accounts for the uncertainty of the relationship data by attaching a likelihood to each of a number of relationship classes into which the pair might be assigned (Mousseau *et al*., 1998). However, the likelihood approach requires that the relative size of each of the relationship classes considered in the analysis is known prior to study. Its application is therefore limited to populations where such information is available.
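The regression idea described above can be sketched in a few lines. This is a minimal illustration, not the published implementation: it assumes relatedness is expressed on the kinship scale, so that the expected phenotypic similarity of a pair is roughly twice the relatedness times the heritability, and it takes the `noise-free' actual variance of relatedness as a given input rather than deriving it from the between and within locus ANOVA. The function name `regression_h2` and its arguments are hypothetical.

```python
import numpy as np

def regression_h2(y, r_hat, var_r_actual):
    """Sketch of a pair-wise regression estimate of heritability.

    y            : phenotypes, one per individual
    r_hat        : square matrix of marker-based pair-wise relatedness estimates
    var_r_actual : 'actual' variance of relatedness (sampling error removed),
                   assumed to be supplied by a separate ANOVA step
    The factor 2 assumes relatedness on the kinship scale (an assumption).
    """
    y = np.asarray(y, dtype=float)
    r_hat = np.asarray(r_hat, dtype=float)
    z = (y - y.mean()) / y.std()          # standardised phenotypes
    i, j = np.triu_indices(len(y), k=1)   # every unordered pair once
    z_pair = z[i] * z[j]                  # phenotypic similarity per pair
    r_pair = r_hat[i, j]                  # estimated relatedness per pair
    cov = np.mean((z_pair - z_pair.mean()) * (r_pair - r_pair.mean()))
    return cov / (2.0 * var_r_actual)
```

Because the covariance is divided by the actual variance of relatedness, the estimate becomes unstable whenever that variance is small, which is the situation in a population with few related pairs.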

The regression approach has been used previously to determine heritabilities of traits in a wild plant population, *Mimulus guttatus* (Ritland & Ritland, 1996). Resulting estimates were larger than those determined under more controlled conditions. This result is contrary to expectation because, under controlled conditions, environmental variance might be expected to be lower (Coyne & Beecham, 1987; Ritland & Ritland, 1996), although meta-analysis of studies fails to support this idea (Weigensberg & Roff, 1996). However, the result may also reflect the large sampling variance associated with this approach (Thomas *et al*., 2000). The likelihood technique was applied to a captive salmon population (*Oncorhynchus tshawytscha*), resulting in estimated heritabilities that were similar to previously derived estimates (Mousseau *et al*., 1998). However, the salmon population was set up under rather specific conditions so that a full-sib population structure, with known prior probabilities, could be assumed.

Alternatively, in a third approach, marker information can be used to infer exact relationships, thereby reconstructing a pedigree suitable for use in traditional variance component analysis, e.g. REML techniques (Patterson & Thompson, 1971; Lynch & Walsh, 1998). The Markov chain Monte Carlo (MCMC) approach (Thomas & Hill, 2000) is based upon relationship assignment. First, a likely set of sib-ships is reconstructed, and then, under the assumption that the pedigree is correct, REML techniques are used to estimate variance components. Because the MCMC approach allows well-established methods of variance component estimation such as REML to be used, family-specific and relationship-specific information is weighted more efficiently than in pair-wise analysis (Thomas & Hill, 2000). However, incorrectly assigned relationships can lead to large bias in variance component estimation (Thomas & Hill, 2000).

For clarification, the MCMC approach to sib-ship reconstruction is also based on likelihood techniques, but it will be referred to here as the MCMC approach and the likelihood-based pair-wise approach as the likelihood approach. The adaptable nature of both likelihood-based approaches allows any information derived from known relationships to be included in the analysis (Thomas & Hill, 2000).

In a recent study, Milner *et al*. (2000) used a pedigree which was determined through field observation of mother-offspring pairs combined with paternity inference using genetic markers, to estimate the heritabilities of several traits in an unmanaged population of Soay sheep (*Ovis aries*). Paternities were inferred using CERVUS 1.0 (Marshall *et al*., 1998), which attaches confidence values to an assigned paternity. Paternities achieving an average confidence of 95% were used in variance component analysis (Milner *et al*., 2000). Variance components were estimated for both males and females using REML methodology with the data analysed under an `animal' model (Lynch & Walsh, 1998). It was found that heritability estimates for body weight were about 50% lower in males when the pedigree was based upon paternities assigned with 80% confidence and about 30% lower in females (Milner, 1999). This observed reduction, although not statistically significant, might have been because of the bias introduced through inaccurate relationship information.

In this study a Soay sheep data set is analysed using the marker-based systems of variance component estimation. The exact data set used is a modified form of the set used by Milner *et al*. (2000), and comprised animals born between 1995 and 1999 (inclusive), whose maternal identity was known from observation at lambing. Body weight is used as an example trait and an attempt is made to address the question `which of the approaches produces a good (reliable) estimate of the heritability?' rather than addressing the question `how heritable is body weight?' Because both sexes are promiscuous, the study population is one of the most problematic with regard to paternity inference (Pemberton *et al*., 1999), and it is of interest to see how heritability estimates derived from the newer marker-based approaches compare. Estimates of the heritability are made using all the pedigree-free approaches, and comparison is made with approaches that do specify a pedigree. Alternative approaches to pedigree reconstruction are also examined.

### Results

Estimates of heritability obtained using only inferred relationship data and either the regression or likelihood (with `actual' priors) approaches were unreliable and were sensitive to the choice of fixed effects fitted (Table 3). Calculation of the actual variance of the relationship from the 95% confidence pedigree showed that, given the sample size, the variance of the relationships was low (≈0.0005), reflecting a low number of related pairs. The population structure of the Soay sheep therefore makes analysis using just 12 marker loci very unreliable.

Table 3. Summary of the heritability estimates and standard errors (bracketed) obtained using the different estimators and under the different models. Likelihood analyses that used `flat' priors are excluded from Table 3 because the use of flat priors resulted in all estimates of the additive genetic variance components being negative and thus fixed at the zero boundary of the parameter space. Several nonzero estimates were obtained when known mothers were incorporated into the flat prior analysis but these were very close to zero, because of the incorrect prior information. In situations where estimates of the additive genetic variance were fixed at zero, estimates of the genetic variance obtained from bootstrap samples also tended to be fixed. Meaningful standard errors could therefore not be found in those situations.

Low levels of marker information and low relatedness may be partly compensated for through the inclusion of known data into the analysis. When maternal data or the 95% confidence pedigree were used in the likelihood analysis, the estimates approached the REML estimates obtained using the 95% confidence pedigree (Pedigree IV), although they generally had larger standard errors (Table 3). Estimates of the heritability obtained using the 95% confidence pedigree information ranged from about 0.2 to 0.4 regardless of the method of analysis, with only small deviation when different fixed effects were fitted (Table 3). Differences in the heritability estimates made using the pedigree-free and pedigree-determined approaches, when using the 95% confidence pedigree, reflect differences in the weighting of family and relationship information between the techniques. Alternatively they might reflect lower efficiency in the two-step estimation procedure of the fixed and random effects for the pedigree-free analysis compared with the one-step estimation for the pedigree-determined approaches. Likewise, differences in the sampling errors of the heritability estimates made from the 95% confidence pedigree reflect either the greater efficiency of REML techniques, or the poor estimation of the sampling errors obtained using bootstrap methodology.

Estimates made using REML techniques and the different pedigrees with assigned relationships indicated that the greater the number of assigned relationships included in the pedigree, the lower the estimate of the heritability. At one extreme, when only inferred relationships were analysed (i.e. Pedigree I – when only MCMC reconstruction of half-sibs was used) heritability estimates were either zero, or negligibly small (not shown in table). At the other extreme, with only known relationships included (Pedigree II – known mother-offspring links only), heritabilities were estimated as between 0.29 and 0.39 (Table 3). This pattern may be explained by downward bias introduced as a result of incorrectly assigned relationships. It may also be explained by the presence of a maternal effect, which would increase the similarity of sibs. Heritability estimates would therefore be biased upwards. The bias would be greatest when only mother-offspring relationships are used to form the pedigree, and would diminish as further (i.e. paternal) relationships are included in the analysis.
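The two competing explanations can be made concrete with a simplified half-sib argument. The expressions below are a sketch under textbook assumptions (covariances estimated from half-sib pairs only, no dominance or common-environment terms); the misassignment fraction $e$, the maternal variance $\sigma^2_M$ and the phenotypic variance $\sigma^2_P$ are symbols introduced here for illustration and do not appear in the analysis itself.

```latex
% True half-sibs share one quarter of the additive genetic variance:
\operatorname{Cov}(\text{HS}) = \tfrac{1}{4}\,\sigma^2_A
\quad\Rightarrow\quad
h^2 = \frac{4\,\operatorname{Cov}(\text{HS})}{\sigma^2_P}

% If a fraction e of assigned half-sib pairs are in fact unrelated,
% the expected covariance among assigned pairs is diluted:
\mathrm{E}\bigl[\operatorname{Cov}(\text{assigned HS})\bigr]
  = (1-e)\,\tfrac{1}{4}\,\sigma^2_A
\quad\Rightarrow\quad
\mathrm{E}\bigl[\hat{h}^2\bigr] \approx (1-e)\,h^2

% If maternal half-sibs also share a maternal effect with variance \sigma^2_M
% that is not fitted in the model, the covariance is inflated:
\operatorname{Cov}(\text{maternal HS})
  = \tfrac{1}{4}\,\sigma^2_A + \sigma^2_M
\quad\Rightarrow\quad
\mathrm{E}\bigl[\hat{h}^2\bigr] \approx h^2 + \frac{4\,\sigma^2_M}{\sigma^2_P}
```

Under these assumptions both mechanisms predict the observed ordering: estimates are highest when only maternal links are used and fall as inferred paternal relationships are added, so the pattern alone cannot distinguish between them.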

Visual comparison of the 95% confidence pedigree (IV) and the pedigree from the MCMC approach with known mothers (III) indicated that a number of the same half sib-ships were recovered, although some were specific to the method of reconstruction. Further comparison of III and a pedigree determined using paternity inference set at 80% showed the same pattern, but with more sib-ships in common. Hence greater numbers of inferred relationships were present in the pedigree when information from the 95% confidence pedigree and sib-ship reconstruction was combined (V) than when information from each was analysed on its own. This helps explain why, when either pedigree III or IV was analysed, the estimates of heritability were intermediate between estimates made from II and V (Table 3).

### Conclusions and discussion

The objective of this study was to assess methods that use relationship information inferred from marker data to estimate variance components on an actual population. Two approaches to make use of the marker information were examined, either to gain nonspecific relationship data or to specify exact relationships. The estimates of the heritability obtained from the 95% confidence pedigree using the pair-wise frameworks were regarded as the `best' achievable estimates using the pair-wise approaches. Deviations from these values when inferred relationship information was used were a result of the inaccuracies introduced through relationship inference.

The regression approach gave unreliable results, which deviated wildly when the fixed effects were changed. Low amounts of marker data and low numbers of relatives in the sample resulted in poor estimates of the actual variance of the relationship, which was greatly under-estimated (by 100 times). Estimation of the heritability using the regression approach requires division by the actual variance of the relationship, and therefore, as the actual variance of the relationship was considerably less than one, small changes caused by alteration of the fixed effects were greatly amplified.
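The amplification can be illustrated with simple arithmetic. The sketch below assumes the regression estimate has the form covariance divided by twice the actual variance of relatedness (the factor 2 is an assumption tied to the kinship scale); `h2_shift` is a hypothetical helper, the covariance change of 0.0001 is an arbitrary illustrative value, and only the variance of about 0.0005 comes from the pedigree-based calculation reported in the Results.

```python
def h2_shift(delta_cov, var_r):
    """Change in the heritability estimate caused by a change `delta_cov`
    in the similarity-relatedness covariance, for a given actual variance
    of relatedness `var_r`. Assumes h2 = cov / (2 * var_r)."""
    return delta_cov / (2.0 * var_r)

# Same small covariance change, three denominators:
#   0.05      hypothetical population with many close relatives
#   0.0005    approximate value from the 95% confidence pedigree
#   0.000005  a value 100 times smaller, as under-estimation would give
for var_r in (0.05, 0.0005, 0.000005):
    print(f"var_r={var_r:g}  shift in h2={h2_shift(1e-4, var_r):g}")
```

Since heritability is bounded between zero and one, a shift of this size from a minor change in the fixed effects is enough to make the estimate uninformative.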

The likelihood approach gave negative estimates of the heritability and so estimates were fixed at the boundary of the parameter space, especially in the situation where the priors were inaccurate. Again this is because of insufficient amounts of marker data to gain useful relationship data, and low numbers of relatives in the sample upon which to partition the variance. The MCMC approach also failed for similar reasons. For these techniques to operate successfully in a natural situation, much greater numbers of relatives are required in the sample as well as a greater amount of marker information. Incorporation of known relationship information into the likelihood and the MCMC approaches allowed more reliable estimates of the variance to be determined.

The likelihood approach requires that population structure be known prior to study. Its application is therefore limited to situations where such information is available. Alternatively, prior probabilities may be inferred from existing knowledge, such as the average life-time reproductive success and the age structure of individuals in the study population. Most of the information on the genetic variance components is derived from close relatives (e.g. full-sib, half-sib and parent-offspring groups), and accurate prediction of the prior probabilities for these groups is important.
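The role of the priors can be seen in a generic mixture formulation of a pair-wise likelihood. This is a sketch of the general idea, not necessarily the exact form used in the published method; the symbols $\pi_k$ (prior proportion of pairs in relationship class $k$), $G$ (marker genotypes), $y$ (phenotypes) and $\theta$ (variance components) are introduced here for illustration.

```latex
% Posterior probability that pair (i,j) belongs to relationship class k,
% given marker genotypes and the prior class proportions \pi_k:
P(R_{ij}=k \mid G_i, G_j)
  = \frac{\pi_k \, P(G_i, G_j \mid k)}
         {\sum_{k'} \pi_{k'} \, P(G_i, G_j \mid k')}

% Pair-wise likelihood of the variance components \theta: each pair
% contributes a mixture over classes, weighted by those posteriors:
L(\theta) = \prod_{(i,j)} \sum_{k}
  P(R_{ij}=k \mid G_i, G_j)\; f\!\left(y_i, y_j \mid k, \theta\right)
```

When markers are only weakly informative, the posterior weights stay close to the priors, so a flat prior in a population where nearly all pairs are unrelated assigns far too much weight to the close-relative classes.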

In this study, the environmental covariance of full-sibs was not fitted into the model as there are few full-sibs in the Soay sheep population (Table 2), although the likelihood approach and the approaches that assign relationships allow for its inclusion. When a pedigree was used that contained only the known mother–offspring links, estimates of the heritability were larger than when assigned relationships were included. This could be because of bias introduced through the inclusion of inferred relationships in the other pedigrees, or because of the existence of a maternal effect that would inflate the similarity between maternal sibs. Attempts to fit maternal effects into the model often resulted in REML analysis that failed to converge, presumably because of insufficient contrasts within the pedigree. Maternal effects were therefore excluded from the analysis. Milner *et al*. (2000) found that heritabilities tended to be lower when maternal effects were included, although in body weight the change was not significant.

A problem with all of these approaches is the calculation of the standard errors of variance component estimates. In this study, bootstrap methodology was used to estimate errors for the pair-wise approaches that did not specify a pedigree. In cases where a pedigree was specified, large sample estimates of the variance of the parameters (from ASREML) were used to calculate the standard errors. Neither of these approaches provided a reliable means of estimating the standard errors. In the case of estimates obtained using ASREML, no account is taken of the inaccuracy of the pedigree and so estimates of sampling errors are likely to be underestimates. In the case of the bootstrap-derived estimates, simulation studies of balanced populations with known relationships indicated that the sampling errors were overestimated. Ideally the bootstrap would resample over independent data points, a condition clearly violated when resampling over pairs. The individuals within the sample are not independent either, because they share relationships, and so the conditions for the bootstrap are also violated when resampling over individuals. As a result, when the level of relatedness in the population increases, the accuracy of parameter estimation increases, but the accuracy of standard error estimation decreases.
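A pair-level bootstrap of the kind discussed above might look as follows. This is a generic sketch rather than the procedure actually used in the study: the function names, the number of replicates and the plain least-squares slope used as the estimator are all assumptions made for illustration.

```python
import numpy as np

def slope_estimator(z_pair, r_pair):
    """Least-squares slope of pair similarity on pair relatedness
    (a stand-in for whichever pair-wise estimator is being studied)."""
    return np.cov(z_pair, r_pair)[0, 1] / np.var(r_pair, ddof=1)

def pair_bootstrap_se(z_pair, r_pair, estimator, n_boot=1000, seed=0):
    """Standard error from resampling pairs with replacement.

    Caveat: pairs that share an individual are not independent,
    so this resampling scheme violates the bootstrap's assumptions.
    """
    rng = np.random.default_rng(seed)
    z_pair = np.asarray(z_pair, dtype=float)
    r_pair = np.asarray(r_pair, dtype=float)
    n = len(z_pair)
    reps = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)   # draw n pairs with replacement
        reps[b] = estimator(z_pair[idx], r_pair[idx])
    return reps.std(ddof=1)                # spread of the replicate estimates
```

With n individuals there are n(n-1)/2 pairs but only n independent phenotypes, which is one way to see why the resampled pairs cannot be treated as independent observations.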

In the Soay sheep population, sib-ships could be reconstructed via paternity inference. In many cases paternities could be assigned with high confidence, and so would probably lead to the most reliable estimates of variance components. In the absence of information on candidate fathers, MCMC reconstruction of half sibs using the known maternal information provides a means to recreate the lost sib-ships. Indeed, a number of the same sib-ships were reconstructed using the MCMC approach including maternal data, as were reconstructed through assignment of individuals to the same sire using CERVUS (Marshall *et al*., 1998) although some sib-ships determined were specific to each approach. Increasing the number of assigned relationships led to a decrease in the size of the estimated heritability, probably due to an increase in the number of misassigned relationships, an effect also noted by Milner *et al*. (2000). Therefore, only relationships assigned with a high degree of confidence should be included in the analysis. Confidence levels may be determined for relationship assignments using sib-ship reconstruction and paternity inference by simulation (Marshall *et al*., 1998; Thomas & Hill, 2000).

In testing the marker-based approaches, we examined individuals born from 1995 to 1999 inclusive, in order to mimic a field project where each new cohort is sampled and phenotyped. Consequently, there were fewer related pairs within the sample (see Table 2) than existed in the standing population at any one time (data not shown). Although a `standing population' sampling approach would yield more related pairs, the related pairs would be more distantly related on average, and the number of unrelated pairs would increase at a much faster rate; paradoxically therefore the variance in relationship would decrease. We would not expect a `standing population' sampling approach to yield improved estimates over the `cohort sampling' approach described in this paper, especially in populations with small family sizes.
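The apparent paradox is easy to reproduce numerically. In the sketch below the pair counts for the two sampling schemes are invented purely for illustration (they are not taken from Table 2), and `relationship_variance` is a hypothetical helper that computes the variance of relatedness across all pairs from the number of pairs in each relatedness class.

```python
def relationship_variance(pair_counts):
    """Variance of pair-wise relatedness, given a mapping from
    relatedness value to the number of pairs with that value."""
    total = sum(pair_counts.values())
    mean = sum(r * n for r, n in pair_counts.items()) / total
    return sum(n * (r - mean) ** 2 for r, n in pair_counts.items()) / total

# Hypothetical counts: the standing population has more related pairs,
# but they are more distant on average and unrelated pairs dominate.
cohort = {0.5: 20, 0.25: 200, 0.0: 20000}
standing = {0.5: 30, 0.25: 300, 0.125: 600, 0.0: 120000}

print("cohort   :", relationship_variance(cohort))
print("standing :", relationship_variance(standing))
```

In this invented example the standing sample contains more than four times as many related pairs, yet its variance of relatedness is lower, because the number of unrelated pairs grows roughly with the square of the sample size.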

In summary, it is clear from previous simulation studies (Thomas *et al*., 2000) that for the marker-based approaches to be reliable, the relationship structure of the population is important. There is a basic need for large families or, equivalently, a large variance of relationship in the sample. In addition, these methods require a considerable amount of polymorphic marker data to be typed per individual before estimated heritabilities become reliable, unless known relationships are included in the analysis (e.g. maternal information, Thomas & Hill, 2000). In consequence, the techniques are not appropriate for use on all natural populations of interest, and indeed misleading results may be obtained in populations that violate the above requirements. Considerable caution must therefore be exercised before field studies are undertaken, with simulation being a useful tool in deciding the merit of the marker-based techniques. Finally, given that marker-based estimates are less reliable than estimates based on known relationships, the determination of a reliable pedigree is of more use in populations of low average relationship than the more indirect approaches.