Factors affecting accuracy of estimated effective number of chromosome segments for numerically small breeds

Abstract For numerically small breeds, obtaining a sufficiently large breed‐specific reference population for genomic prediction is challenging or simply not possible, but may be overcome by adding individuals from another breed. To prioritize among available breeds, the effective number of chromosome segments (M e) can be used as an indicator of relatedness between individuals from different breeds. The M e is also an important parameter in determining the accuracy of genomic prediction. The M e can be estimated both within a population and between two populations or breeds, as the reciprocal of the variance of genomic relationships. However, the threshold for number of individuals needed to accurately estimate within or between populations M e is currently unknown. It is also unknown if a discrepancy in number of genotyped individuals in two breeds affects the estimates of M e between populations. In this study, we conducted a simulation that mimics current domestic cattle populations in order to investigate how estimated M e is affected by number of genotyped individuals, single‐nucleotide polymorphism (SNP) density and pedigree availability. Our results show that a small sample of 10 genotyped individuals may result in substantial over or underestimation of M e. While estimates of within population M e were hardly affected by SNP density, between population M e values were highly dependent on the number of available SNPs, with higher SNP densities being able to detect more independent chromosome segments. When subtracting pedigree from genomic relationships before computing M e, estimates of within population M e were three to four times higher than estimates with genotypes only; however, between M e estimates remained the same. For accurate estimation of within and between population M e, at least 50 individuals should be genotyped per population. Estimates of within M e were highly affected by whether pedigree was used or not. 
For within M e, even the smallest SNP density (~11k) resulted in accurate representation of family relationships in the population; however, for between M e, many more markers are needed to capture all independent segments.


| INTRODUCTION
Numerically small breeds often have difficulty competing with larger, high-performing mainstream breeds, which endangers their existence (Addo et al., 2017; Hiemstra et al., 2010). These small breeds, however, are well worth preserving as they possess unique genetic diversity and are highly adapted to local environments. In other words, they can fulfil a sustainable role in society (Oldenbroek, 2007). To improve the long-term perspectives of small breeds, it is necessary to maintain and preferably enhance their economic competitiveness. In recent years, genomic prediction of breeding values, that is prediction based on marker data alone, has revolutionized the field of animal breeding (Meuwissen et al., 2001). In dairy cattle breeding, genomic selection has significantly reduced the generation interval through selection of animals earlier in life, which has resulted in higher genetic gains per year (Bouquet & Juga, 2013; Pryce et al., 2011). Genomic selection can therefore be used in small breeds to improve their competitiveness and to make them economically more attractive for farmers. In addition, methods such as genomic optimal contribution selection (Sonesson et al., 2012) can be applied to simultaneously assure genetic improvement of the breed and maintenance of its diversity.
The principle of genomic prediction is that the reference population, which consists of individuals that are both phenotyped and genotyped for thousands of single-nucleotide polymorphisms (SNPs), is used to estimate SNP effects. The estimated SNP effects are subsequently used to infer genomic estimated breeding values (EBVs) of selection candidates, which only have genotypes. The size of the reference population is one of the key parameters that affect the accuracy of genomic prediction (Daetwyler et al., 2008; Meuwissen et al., 2001; VanRaden et al., 2009). For numerically small breeds, however, obtaining a sufficiently large breed-specific reference population for genomic prediction may be challenging or simply not possible, either because limited resources are available for genetic improvement of the breed, or because limited numbers of animals are available within the breed. Adding individuals from other breeds to the reference population may help to overcome this issue. The benefit of reference individuals from another breed relies strongly on the relatedness between the breeds: a larger increase in accuracy is expected when closely related breeds are combined in the reference population, while no or only small increases in accuracy are expected when the breeds are more distant (Brøndum et al., 2011; Habier et al., 2007, 2010; Hozé et al., 2014). To prioritize among available breeds, the effective number of chromosome segments (M e) can be used as an indicator of relatedness between individuals from different breeds.
The M e is an important parameter in determining the accuracy of genomic prediction in breeds with a single-breed (Goddard, 2009) or multi-breed reference population. The M e can be estimated both within a population and between two populations or breeds. The M e within a population describes the number of chromosome segments that are segregating independently in the population. Effects for each of these segments need to be estimated in order to predict genomic breeding values of individuals from a given population (Meuwissen et al., 2013; Wientjes et al., 2016). The accuracy of genomic prediction increases as the number of segments decreases (Daetwyler et al., 2008).
The M e within a population is directly related to the effective population size (N e) (Brard & Ricard, 2015; Goddard, 2009; Lee et al., 2017). Low N e is associated with higher relatedness among individuals, a higher extent of linkage disequilibrium (LD) (Falconer & Mackay, 1996; Sved, 1971) and a lower number of segregating chromosome segments. Hence, populations or breeds with similar selection history and LD structure are expected to have similar values of M e. The M e between populations gives insight into the consistency of LD between the two populations. A low M e between populations indicates high relatedness between the two populations, whereas a higher value of M e is usually observed between populations that split more generations ago (see general discussion in Wientjes, 2016).
In general, before all genotypes are available for both reference animals and selection candidates, a population parameter such as M e can be used to predict the anticipated accuracy of genomic selection (Vandenplas et al., 2017; VanRaden, 2008). The predicted accuracies can then help to decide whether implementation of genomic selection is expected to be beneficial. To keep initial costs minimal, the number of animals genotyped to estimate M e, and to predict the accuracies of genomic selection, should preferably be as small as possible. Previous studies aiming to estimate within and between population M e used 100 or more individuals (van den Berg et al., 2015; Erbe et al., 2013). The threshold for the number of individuals needed to accurately estimate within or between population M e is currently unknown. It is also unknown whether a discrepancy in the number of genotyped individuals in two breeds affects the estimates of M e between populations.
The main objective of our study was to investigate number of individuals needed to accurately estimate M e within and between populations, and the size of difference in number of individuals in two breeds that allows for accurate estimation of between population M e . For this purpose, we simulated two populations that were separated by 100 generations. We evaluated how fast M e changes across generations after separation and we also investigated if the absence of pedigree, a frequent occurrence in small breeds, affects the value of estimated M e . Finally, we studied the effect of marker density on the estimates of within and between population M e .

| Population structure
Two populations were simulated to reflect current domestic cattle breeds, specifically in terms of population size, selection history and LD structure. These populations were related through common ancestry, originating from a historical population. The historical population consisted of 8,000 individuals in the base generation. Over the next 300 generations, population size gradually decreased (by ~25 individuals per generation) to 400 individuals, and remained at that size for the following 20 generations, that is until generation 320. The bottleneck was used to generate LD. From generation 320 until generation 340, the population size gradually increased to 5,000 individuals. The number of males in generation 340 was 50; the number of females was 4,950. The genome consisted of 30 chromosomes, each of 100 cM. A total of 720,000 SNP markers were distributed equally over the chromosomes, and randomly within them, so that each chromosome contained 24,000 markers, similar to the high-density Bovine BeadChip. As most traits of economic importance are quantitative traits, and to ensure a sufficient number of segregating QTL in the final data, the number of simulated QTLs was high, that is 9,000, distributed equally over the chromosomes so that each chromosome contained 300 QTLs. Within chromosomes, QTLs were randomly distributed, and their effects followed a gamma distribution with a shape parameter of 0.4. SNPs and QTLs had equal allele frequencies in the base generation of the historical population. The mutation rate of QTLs and markers was set to 2.5 × 10⁻⁵. All markers and QTLs were segregating in the last generation of the historical population.
The last generation of the historical population (i.e. generation 340) was randomly divided into two equally sized populations (A and B), so-called founder populations, of each 2,500 individuals. In the next generation, the size of both populations was increased to 5,000, and in each population, 30 breeding males and 2,500 breeding females were available to produce 5,000 individuals for the next generation. Total number of individuals was kept constant for the following 100 generations. Number of offspring per female was set to 2, with 1:1 sex ratio. Throughout these 100 generations, both populations underwent selection based on EBVs, estimated from a best linear unbiased prediction method via an animal model, using phenotypic records and pedigree data. In each generation, 12 males and 500 females were replaced with individuals with the highest EBVs (a replacement ratio of 0.4 for the males and of 0.2 for the females). Thus, overlapping generations were present in the data. Selected males and females were randomly mated to each other, keeping the number of matings per male on average ~83.
Simulations were performed using QMSim software (Sargolzaei & Schenkel, 2009) and consisted of 10 replicates. Appendix S1 contains the QMSim parameter file, and Appendix S2 contains the seed file used for simulation.

| Estimating M e
Different approaches can be applied to estimate within population M e, relying either on N e or on the variation in genomic relationships between individuals (Goddard, 2009; Goddard et al., 2011; Hayes, Visscher, & Goddard, 2009). In this study, we used the latter (see Discussion). The within population M e was estimated using the following equation (Wientjes et al., 2013):

$$M_e = \frac{1}{\mathrm{Var}(G_{ij} - A_{ij})} \qquad (1)$$

where $G_{ij}$ is the genomic and $A_{ij}$ is the pedigree relationship between individuals i and j, and the variance is taken over all pairs ij in the population. In analogy to this equation, M e between populations can be estimated as follows (Wientjes et al., 2013):

$$M_e = \frac{1}{\mathrm{Var}(G_{\mathrm{pop1}_i\,\mathrm{pop2}_j} - A_{\mathrm{pop1}_i\,\mathrm{pop2}_j})} \qquad (2)$$

where $G_{\mathrm{pop1}_i\,\mathrm{pop2}_j}$ is the genomic relationship between individual i from population 1 and individual j from population 2, and $A_{\mathrm{pop1}_i\,\mathrm{pop2}_j}$ is the corresponding pedigree relationship, with the variance taken across all pairs of individuals from populations 1 and 2. Conceptually, the two populations can be considered as one reference population, and M e is estimated as the effective number of chromosome segments that are segregating in the combined population. The genomic relationship between unrelated individuals is expected to be 0.
The M e was estimated with calc_grm software (Calus & Vandenplas, 2016), using an exponential function to adjust G − A values to be on average 0 across the range of pedigree relationship values. The matrix G was calculated using the following equation:

$$G = \begin{bmatrix} G_{11} & G_{12} \\ G_{21} & G_{22} \end{bmatrix} = \begin{bmatrix} \dfrac{Z_1 Z_1'}{2\sum_k p_{1k}(1 - p_{1k})} & \dfrac{Z_1 Z_2'}{2\sqrt{\sum_k p_{1k}(1 - p_{1k})}\,\sqrt{\sum_k p_{2k}(1 - p_{2k})}} \\[2ex] \dfrac{Z_2 Z_1'}{2\sqrt{\sum_k p_{1k}(1 - p_{1k})}\,\sqrt{\sum_k p_{2k}(1 - p_{2k})}} & \dfrac{Z_2 Z_2'}{2\sum_k p_{2k}(1 - p_{2k})} \end{bmatrix}$$

where $G_{11}$ is a matrix with genomic relationships in population 1, $G_{22}$ is a matrix with genomic relationships in population 2, and $G_{12}$ and $G_{21}$ are matrices with genomic relationships between populations 1 and 2. The matrix $Z_1$ ($Z_2$) contains genotypes for all individuals from population 1 (population 2) at all loci, centred by subtracting twice the allele frequency per locus, and $p_{1k}$ ($p_{2k}$) is the allele frequency of marker k in population 1 (population 2). $Z_1 Z_2'$ and $Z_2 Z_1'$ are matrices of genetic covariance between the genetic values of the two populations, which are divided by the SDs of the genotypes in each population.
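The estimation steps described above can be illustrated with a minimal NumPy sketch. It is not the calc_grm implementation: the function names are hypothetical, allele frequencies are taken from the sample itself, the variance is taken over off-diagonal pairs only, and the exponential adjustment of G − A mentioned above is omitted. The genotypes are random toy data, not output of the simulation.

```python
import numpy as np

def multibreed_grm(geno1, geno2):
    """Genomic relationship blocks using breed-specific allele frequencies.

    geno1, geno2: (individuals x markers) arrays of allele counts (0/1/2).
    Assumption: allele frequencies are estimated from the samples themselves.
    """
    p1 = geno1.mean(axis=0) / 2.0
    p2 = geno2.mean(axis=0) / 2.0
    z1 = geno1 - 2.0 * p1                  # centre by twice the allele frequency
    z2 = geno2 - 2.0 * p2
    s1 = 2.0 * np.sum(p1 * (1.0 - p1))     # scaling for population 1
    s2 = 2.0 * np.sum(p2 * (1.0 - p2))     # scaling for population 2
    g11 = z1 @ z1.T / s1
    g22 = z2 @ z2.T / s2
    g12 = z1 @ z2.T / np.sqrt(s1 * s2)     # equals 2*sqrt(sum1)*sqrt(sum2) scaling
    return g11, g12, g22

def me_within(g, a=None):
    """1 / Var(G_ij - A_ij) over pairs i < j; A is ignored when not supplied."""
    d = g if a is None else g - a
    upper = np.triu_indices_from(d, k=1)   # assumption: exclude self-relationships
    return 1.0 / np.var(d[upper])

def me_between(g12, a12=None):
    """1 / Var(G - A) over all pairs of one individual from each population."""
    d = g12 if a12 is None else g12 - a12
    return 1.0 / np.var(d)

# Toy data: 40 and 30 unrelated individuals, 500 independent markers.
rng = np.random.default_rng(1)
freq = rng.uniform(0.1, 0.9, size=500)
geno1 = rng.binomial(2, freq, size=(40, 500))
geno2 = rng.binomial(2, freq, size=(30, 500))

g11, g12, g22 = multibreed_grm(geno1, geno2)
me_w = me_within(g11)
me_b = me_between(g12)
print(f"within M_e (pop 1): {me_w:.0f}, between M_e: {me_b:.0f}")
```

Because the toy markers are independent and the individuals unrelated, both estimates are driven by the number of markers rather than by population history; with real or simulated LD the values would differ.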

| Scenarios
To get insight into the effect of number of genotyped individuals used on the accuracy of estimated within population M e , we tested five different sample sizes of 10, 50, 100, 500 and 1,000 individuals, respectively. M e was also estimated for the whole population of 5,000 individuals using 720k SNPs, which was considered closest to the true within M e value, and was used for comparison with all other estimates. To test the effect of discrepancy in sample sizes from two populations on the accuracy of between M e , each sample size from each population was tested against each sample size from another population, resulting in 25 combinations in total. Similarly as for within M e , between M e was also estimated using all 5,000 individuals from both breeds and 720k SNPs, and this estimate was used for comparison with all other estimates. All sampling of individuals was performed 50 times within each replicate, and the mean and standard deviation of 50 estimates of within and between M e within a replicate were computed. Results are presented as averages of those means and standard deviations, across the 10 replicates. The estimates of M e using all 5,000 individuals are presented as average values across the 10 replicates. The described estimation of M e was done at generation 10, 50 and 100, in order to infer changes of M e across generations. The pedigree consisted of 20,000 individuals that traced each population back four generations.
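The repeated-sampling scheme described above (50 draws per sample size, summarised by mean and standard deviation) could be sketched as follows. This is an illustration under stated assumptions, not the pipeline used in the study: `resample_me` is a hypothetical helper, the relationship matrix is built from random toy genotypes, and each draw reuses a submatrix of the full G instead of recomputing G from the sampled individuals.

```python
import numpy as np

def me_from_grm(g):
    """1 / Var(G_ij) over pairs i < j (no pedigree correction)."""
    upper = np.triu_indices_from(g, k=1)
    return 1.0 / np.var(g[upper])

def resample_me(g, sample_size, n_draws=50, seed=0):
    """Mean and SD of M_e over repeated random samples of individuals."""
    rng = np.random.default_rng(seed)
    estimates = np.empty(n_draws)
    for d in range(n_draws):
        idx = rng.choice(g.shape[0], size=sample_size, replace=False)
        estimates[d] = me_from_grm(g[np.ix_(idx, idx)])
    return estimates.mean(), estimates.std(ddof=1)

# Toy population: 300 unrelated individuals, 500 independent markers.
rng = np.random.default_rng(42)
freq = rng.uniform(0.1, 0.9, size=500)
geno = rng.binomial(2, freq, size=(300, 500)).astype(float)
p = geno.mean(axis=0) / 2.0
z = geno - 2.0 * p
g = z @ z.T / (2.0 * np.sum(p * (1.0 - p)))

summary = {n: resample_me(g, n) for n in (10, 50, 100)}
for n, (mean_me, sd_me) in summary.items():
    print(f"n = {n:3d}: mean M_e = {mean_me:.0f}, SD = {sd_me:.0f}")
```

Even in this toy setting the spread of the estimate is much larger for 10 individuals than for 100, simply because the variance is computed from 45 pairs instead of 4,950.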
Detected levels of LD may be affected by marker density, such as SNPs compared to genome-wide sequence data (Erbe et al., 2013; Qanbari et al., 2014), which subsequently can affect estimates of M e. In the default scenario, we simulated 720k SNPs in the last generation of the historical population, to reflect the high marker density used in dairy cattle. To study the influence of different marker densities, we reduced the number of markers to subsets of 360, 180, 90, 45, 22.5 and 11.25k, which was achieved by selecting every 2^x-th marker, where x ranged from 1 to 6.
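The thinning rule can be checked with a few lines of Python. This is a sketch only: `thin_markers` is a hypothetical helper, and starting the selection at the first marker is an assumption, since the text does not state the offset.

```python
def thin_markers(marker_index, x):
    """Keep every 2**x-th marker, starting from the first one (assumed offset)."""
    step = 2 ** x
    return marker_index[::step]

n_markers = 720_000
densities = {x: len(thin_markers(range(n_markers), x)) for x in range(1, 7)}

for x, count in densities.items():
    print(f"x = {x}: every {2 ** x}-th marker -> {count:,} SNPs")
```

The six subsets halve the density at each step, from 360,000 down to 11,250 markers, which matches the densities listed above.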
Calculation of M e with Equations 1 and 2 requires pedigree to estimate additive genetic relationships between pairs of individuals in the same or between different populations. When this information is missing, M e may be underestimated, especially for within population M e . For between M e , absence of pedigree may be less of an issue, since depending on the distance between the breeds, no or only a small number of individuals may have recent ancestry with individuals from another breed. We investigated the effect of pedigree absence on the estimation of M e at generation 10, 50 and 100 after the split of the two breeds.
TABLE 1 Estimates of within population M e with and without pedigree, across generations and SNP densities using all 5,000 individuals

The effective population size (N e) of the simulated populations was 118, calculated from the number of breeding males and the number of breeding females (N f). This value of N e is close to those found in previous empirical cattle studies, where N e was approximately 100 (Hall, 2016; Leroy et al., 2013). The squared correlation between pairs of SNPs (r²) (Hill & Robertson, 1968) had on average (±SD) a value of 0.22 ± 0.23 at pairwise distances of 20-30 kb and 0.18 ± 0.20 at 60-70 kb in generation 100 for both populations, similar to LD patterns observed in real cattle populations (Qanbari et al., 2009, 2014). Allele frequencies of the SNPs followed a U-shaped distribution.

Table 1 presents estimates of within M e in the whole population, at different SNP densities and at generations 10, 50 and 100. Since populations 1 and 2 had the same population history and therefore similar estimates, one value is presented in the table, calculated as the average within M e of the two populations, with the average standard deviation. Using the whole population of 5,000 individuals, 720k SNPs, and no pedigree, the estimated within M e across 10 replicates was 254 (SD ± 7) in generation 100. Regardless of the SNP density, similar values of M e were obtained when the number of sampled individuals was 50 or higher; however, when the number of individuals was 10, within M e was overestimated, with an average estimated value of 361 (Figure 1 and Table 2). In addition, with 10 individuals, the average standard deviation of the estimated M e across replicates was large, 189. Within M e was overestimated and showed high variation when the number of individuals was 10, regardless of whether M e was estimated 10, 50 (Appendix S3) or 100 generations after splitting the populations (Table 2).
These results indicate that at least 50 individuals are needed for accurate estimates of within M e and that decreasing SNP density had a very small effect on the estimated M e .

| Within population M e
With pedigree, estimated within M e was ~3x higher in both populations, 776 (SD ± 44) on average, at generation 100 when 720k SNPs were used. When pedigree was included, estimates of within M e were slightly more affected by SNP density (Table 1). With the smallest sample size, within M e on average had similar value as other sample sizes; however, variation around the mean remained high (Figure 1, Table 2, Appendix S3). Across generations, within M e values showed a decreasing trend in all scenarios (Figure 2).

| Between population M e
The estimated M e between the two populations using all individuals, 720k SNPs and no pedigree, was 16,036 (SD ± 529) at generation 100 (Table 3). Unlike for estimation of within M e , where different SNP densities had small effect, between M e was highly influenced by number of available SNPs (Table 3). For example, at generation 100, using 45k SNPs between M e was underestimated by 23%, and the lowest SNP density of ~11k SNPs, often used to genotype cows, underestimated between M e by more than 46% (Figure 3, Appendix S4). Regardless of SNP density, when the number of sampled individuals was 50 or more in both populations, estimates of between M e were close to that of the whole population. On the other hand, whenever one population had only 10 individuals, between M e was on average overestimated with a large standard deviation (Figure 3, Appendix S4). These results suggest that at least 50 individuals from both populations are needed for accurate estimation of between M e .
From generation 10 to 100, between M e increased by ~9,000 when 720k SNPs were used. An increase of between M e is expected as populations diverge over time, especially when there is no exchange of individuals, which was the case in our simulation. Since the pedigrees used contained no shared ancestors between the two populations, either 10 or 100 generations after the split from the historical population, pedigree-based relationships between populations were effectively ~0, and between M e estimates were the same as those without pedigree (results not shown).

| DISCUSSION
In this study, we conducted a simulation that mimics current domestic cattle populations in order to investigate how the estimated effective number of chromosome segments (M e), within and between populations, is affected by the number of genotyped individuals, SNP density and pedigree availability. Our results show that a small sample of genotyped individuals is expected to lead to overestimation of M e and therefore may not accurately represent population structure. Based on our findings, at least 50 genotyped individuals are needed for accurate estimation of both within and between population M e. While estimates of within population M e were hardly affected by SNP density, between population M e values were highly dependent on the number of available SNPs, with higher SNP densities being able to detect more independent chromosome segments. When pedigree was used, estimates of within population M e were approximately three to four times higher than estimates with genotypes only; however, between M e estimates remained the same. Although the two populations used here had a similar population history, in terms of implications our results may equally well represent situations where the reference population of a local breed is complemented with animals from another local or mainstream breed. This is because the effective population size of the simulated populations of 118 (calculated based on the numbers of breeding males and females) is close to estimates for local and mainstream breeds.
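As a worked check of the effective population size quoted above, the sketch below applies the standard sex-ratio formula N e = 4 N m N f / (N m + N f) to the 30 breeding males and 2,500 breeding females of the simulation. That this is the formula behind the reported value of 118 is an assumption; the function name is illustrative.

```python
def effective_population_size(n_males, n_females):
    """Sex-ratio based effective population size: 4 * Nm * Nf / (Nm + Nf)."""
    return 4.0 * n_males * n_females / (n_males + n_females)

# Breeding animals per generation in each simulated population.
ne = effective_population_size(n_males=30, n_females=2500)
print(f"N_e = {ne:.1f}")
```

The result, about 118.6, agrees with the reported value of 118 and shows that N e is dominated by the small number of breeding males.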

| Within population M e
Table note: Estimates are averages over 10 simulation replicates, averaged over populations 1 and 2. Within each replicate, sampling and M e estimation were repeated 50 times, and the average M e and standard deviation of a replicate were calculated. Standard deviations of a replicate are given in brackets as an average over 10 simulation replicates, averaged over populations 1 and 2. Appendix S3 contains estimates of within population M e for generations 10 and 50.

Estimated within M e using all individuals and no pedigree had a value of ~254 in both populations at generation 100. In previous studies on cattle populations, within M e values varied considerably depending on the breed and the method used to estimate within M e (Brard & Ricard, 2015). When formulas based on N e were used, within M e ranged between 800 and 8,000, based on the results from 76 studies (Brard & Ricard, 2015). Back-solving M e from deterministic formulas for genomic prediction accuracy, after equating those to empirical cross-validation accuracies for milk yield and somatic cell score, yielded within M e of ~1,000-2,000 for a Holstein Friesian population, and M e values of 150-400 for Brown Swiss (Erbe et al., 2013). As M e is linked to effective population size, it is expected that breeds with lower genetic diversity have smaller M e values. In a recent study that analysed five numerically small Dutch Red cattle breeds, within M e ranged between 100 and 300, corresponding to values in our simulation (Marjanovic et al., 2018). From generation 10 to 100, within M e in our study decreased by ~50, which is expected since artificial selection reduces genetic variation and increases relatedness among individuals. Hence, empirical estimates of within M e are expected to depend strongly on the selection history of the population. When within M e was estimated using pedigree, the values increased approximately fourfold at generation 10 and threefold at generation 100 in both populations. Estimated M e of similar magnitude (~1,390 at generation 10) has been reported for a Holstein Friesian population, where M e was computed using the same approach as in our study. Considering the computation using $M_e = 1/\mathrm{Var}(G - A)$, it is worthwhile noting that all variance in the genomic relationships is likely also present in the pedigree relationships, since $\mathrm{E}(G \mid A) = A$, meaning that $\mathrm{Var}(A_{ij})$ may be a lower limit of $\mathrm{E}(\mathrm{Cov}(G_{ij}, A_{ij}))$.
Assuming $\mathrm{E}(\mathrm{Cov}(G, A)) \approx \mathrm{Var}(A)$ for simplicity, we get:

$$\mathrm{Var}(G - A) = \mathrm{Var}(G) + \mathrm{Var}(A) - 2\,\mathrm{Cov}(G, A) \approx \mathrm{Var}(G) - \mathrm{Var}(A)$$

and

$$M_e \approx \frac{1}{\mathrm{Var}(G) - \mathrm{Var}(A)}.$$

Within livestock populations, relatively high relationships, such as those between full- and half-sibs, parent-offspring, and grandparent-grand offspring, are abundant. The presence of such relationships will add considerably to the variance across all relationships in the population. The above reformulated equation for M e clearly shows that subtracting the pedigree from the genomic relationships will considerably reduce the variance in the denominator, and thus increase the estimated M e.
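The step from E(G|A) = A to the approximation used above can be spelled out with the law of total covariance. The following is a sketch under the idealised assumption that the condition holds exactly, so that the approximation becomes an equality; with estimated relationships it will only hold approximately.

```latex
\begin{align}
\mathrm{Cov}(G, A)
  &= \mathrm{E}\!\left[\mathrm{Cov}(G, A \mid A)\right]
     + \mathrm{Cov}\!\left(\mathrm{E}[G \mid A],\, A\right)
   = 0 + \mathrm{Cov}(A, A) = \mathrm{Var}(A), \\
\mathrm{Var}(G - A)
  &= \mathrm{Var}(G) + \mathrm{Var}(A) - 2\,\mathrm{Cov}(G, A)
   = \mathrm{Var}(G) - \mathrm{Var}(A).
\end{align}
```

The first term vanishes because A is constant once it is conditioned on. The second line shows why ignoring the pedigree leaves Var(A) in the denominator and lowers the estimated M e.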
In numerically small breeds, pedigrees may be incomplete or not available, which could result in underestimation of M e and therefore overestimation of genomic prediction accuracy. In such cases, where the aim is to predict the accuracy of within-breed genomic prediction, it would be advisable to derive the pedigree from genotypic information, and use this to build the pedigree relationship matrix. Such an approach, however, may result in an incomplete pedigree if not all relationships are reconstructed. With an incomplete pedigree, some pedigree relationships will incorrectly be considered zero, and therefore not appropriately corrected in G − A, leading to an increase in Var(G − A) and a decrease in M e. The majority of small breeds, however, may require a multi-breed reference population, which also requires the M e values between breeds. Those are not influenced by pedigree information unless recent introgression occurred, and in general can be safely computed while ignoring pedigree information.
We tested the effect of five different sample sizes on the estimates of within M e. When the number of genotyped individuals was 50 or more, the estimates varied only slightly across the 50 repeated samples, and average M e corresponded to that from the whole population, both for scenarios with and without pedigree. However, when the sample size was 10, average M e was substantially overestimated when pedigree was not used. A possible explanation is that with 10 animals, the relative contribution of high pedigree relationships to the term Var(G − A) is greater than when a larger number of animals is selected, which inflates the M e but gets corrected with the pedigree. Nevertheless, even when using the pedigree relationships, there was a large standard deviation of the M e across iterations, suggesting that a single estimate based on 10 animals could still deviate considerably from the true value.
The within M e value can be computed using different formulas. In our study, the within M e was based on the variance of genomic relationships, and in some scenarios, the additive genetic relationships were used as well (Equation 1). This approach has two important benefits. Firstly, it can be extended to two breeds, allowing for computation of between M e, necessary for across-breed prediction, which is not possible with other formulas. Other frequently used approaches rely on effective population size (N e) and size of the genome (L), for example $M_e = 2N_eL/\ln(4N_eL)$ (Goddard, 2009) […] value decomposition of the genomic relationship matrix (Misztal, 2016; Pocrnic et al., 2016). The estimates from different formulas can vary considerably, consequently affecting the predicted accuracy of genomic selection (Brard & Ricard, 2015). In addition, equations based on N e introduce another source of variation, as N e can be estimated in several different ways (Leroy et al., 2013; Wang et al., 2016). Secondly, computing M e based on the variance of relationships makes it possible to account for specific characteristics of a population, such as population structure, as disclosed by the observed genotypes of the population. In a recent study, van den Berg et al. (2019) found that predicting accuracy with within M e derived from the genomic relationship matrix resulted in overestimation of the accuracy. It should be noted, however, that M e is not the only parameter affecting the accuracy of genomic prediction (Goddard, 2009). Nevertheless, in the study by van den Berg et al. (2019), the true within M e may have been underestimated due to close relationships among some animals in the reference population, which could also be expected in numerically small breeds. However, using breed-specific allele frequencies, as done in our study, reduced overestimation for between M e.

[Figure: between M e per sample combination at SNP densities of 720k, 45k and 11k]
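To give a sense of scale for the N e based alternative, the sketch below evaluates the Goddard (2009) formula M e = 2 N e L / ln(4 N e L) for the simulated populations. The inputs are taken from this study (N e = 118; 30 chromosomes of 100 cM, so L = 30 Morgan), but the calculation itself is an illustration and was not part of the reported analysis; the function name is hypothetical.

```python
import math

def me_goddard_2009(ne, genome_length_morgan):
    """M_e = 2 * Ne * L / ln(4 * Ne * L), with L expressed in Morgan."""
    nel = ne * genome_length_morgan
    return 2.0 * nel / math.log(4.0 * nel)

# N_e of the simulated populations and a genome of 30 chromosomes x 1 Morgan.
me_formula = me_goddard_2009(ne=118, genome_length_morgan=30.0)
print(f"M_e from the N_e based formula: {me_formula:.0f}")
```

Under these inputs the formula gives a value of roughly 740, which illustrates how strongly the result of an N e based formula can differ from a variance-based estimate obtained without pedigree.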

| Between population M e
At generation 100, between population M e had a value of 16,036 (SD ± 529) when all individuals and 720k SNPs were used. This value is ~63 times larger than within population M e computed without pedigree, and ~21 times larger than within M e estimated with pedigree. A larger between population M e compared to within M e is expected, since the LD structure, upon which M e depends, is at least partly different between the two populations, as generally observed between different breeds (De Roos et al., 2008; Wientjes, Calus, Goddard, & Hayes, 2015). Indeed, between M e in a study on the Groningen White Headed, Holstein Friesian and Meuse-Rhine-Yssel (MRY) breeds was 10× higher than within M e, and ranged between 18,000 and 24,000. The between M e value in our study increased by ~9,000 from generation 10 to 100, indicating that closely related breeds, that is those that have split recently, are expected to have smaller between M e. Our recent study showed that M e between MRY and the Deep Red breed, which was derived from MRY, was ~3,600, but ~17,000 between these two breeds and the distantly related Groningen White Headed (Marjanovic et al., 2018). The SNP density used to compute between population M e substantially affected its value, with a higher number of SNPs giving a higher between M e value. This finding is related to the number of independent segments, which is much larger between breeds than within a breed; hence, many more markers are needed to capture all independent segments.

| Implications
One of the challenges of numerically small breeds is that they may lag behind mainstream breeds in terms of performance. In that respect, their survival can be significantly aided by using genomic selection to speed up genetic gain, as an alternative to increasing revenues by, for instance, focusing on specific niche markets. Whether implementation of genomic selection for small breeds is cost-effective depends not only on the additional genetic improvement achieved, but also on the costs of implementation. It has been suggested that genotyping costs can be shared across multiple applications, including use in conservation programs to manage genetic diversity and control inbreeding (Fernández et al., 2016), and parentage and pedigree verification (Berry et al., 2016). Also, given the continuously dropping costs of genotyping, it has been envisaged that entire cattle populations, or at least large proportions thereof, may be routinely genotyped in the near future (Boichard et al., 2015). Within a small breed, the size of the reference population, and hence the additional genetic improvement, is restricted by limited investments or limited numbers of available animals. To overcome this, a lot of research in recent years has been dedicated to multi-breed reference populations as an attractive approach to increase the accuracy of genomic prediction for numerically small populations (Hayes, Bowman, Chamberlain, Verbyla, & Goddard, 2009; Hozé et al., 2014; Lund et al., 2016). In general, reliabilities of across-breed predictions tend to be lower than those of within-breed genomic prediction, due to differences in LD structure, allele frequencies and independent chromosome segments between the breeds (De Roos et al., 2009; Wientjes, Calus, Goddard, & Hayes, 2015). Close family relationships between the breeds are often missing, which further affects the reliabilities.
High SNP density gives a more accurate representation of the consistency of LD phase across populations, which at short distances is expected to be conserved across populations (De Roos et al., 2008), possibly resulting in increased accuracy. Our study showed that accurate computation of between M e requires a SNP density higher than the common 50k. Genotyping individuals with high-density SNP chips is more expensive than with the commonly used 50k SNP chip. Alternatively, if possible, individuals could be genotyped at lower SNP density and imputed to higher density, although the impact of using imputed genotypes on the estimated M e is currently unknown. Nevertheless, high-density genotyping will likely become more affordable in the coming years. Based on our results, genotyping 50 individuals per population is sufficient to assess the potential benefit of genomic selection for that population, which should help keep costs down.
In conclusion, our results showed that for accurate estimation of within and between population M e, 50 or more animals should be genotyped per population. Pedigree information was not relevant for between M e in our simulation, which is expected to be true for real populations as well, unless recent introgression occurred. Estimates of within M e were highly affected by whether pedigree was used or not. For numerically small breeds, pedigree may often be absent, in which case a pedigree relationship matrix could be built using a pedigree derived from genotypic information. For within M e, even the smallest SNP densities resulted in an accurate representation of family relationships in the population; however, for between M e, many more markers are needed to capture all independent segments. The presented findings can be used as guidelines for studies investigating possibilities for genomic prediction in numerically small populations.