Examination of the efficacy of small genetic panels in genomic conservation of companion animal populations

Abstract In many ways, dogs are an ideal model for the study of genetic erosion and population recovery, problems of major concern in the field of conservation genetics. Genetic diversity in many dog breeds has been declining systematically since the beginning of the 1800s, when modern breeding practices came into fashion. As such, inbreeding in domestic dog breeds is substantial and widespread and has led to an increase in recessive deleterious mutations of high effect as well as general inbreeding depression. Pedigrees can in theory be used to guide breeding decisions, though they are often incomplete and do not reflect the full history of inbreeding. Small microsatellite panels are also used in some cases to choose mating pairs to produce litters with low levels of inbreeding. However, the long‐term impact of such practices has not been thoroughly evaluated. Here, we use forward simulation on a model of the dog genome to examine the impact of using limited marker panels to guide pairwise mating decisions on genome‐wide population‐level genetic diversity. Our results suggest that in unmanaged populations, where breeding decisions are made at the pairwise, rather than population, level, such panels can lead to accelerated loss of genetic diversity at genome regions unlinked to panel markers, compared to random mating. These results demonstrate the importance of genome‐wide genetic panels for managing and conserving genetic diversity in dogs and other companion animals.

In lieu of deep, high-quality pedigrees, molecular data are used. While there has been a push in recent years to move toward monitoring and managing populations of conservation concern with genomic data in the form of whole-genome genotyping or whole-genome sequencing (Flanagan, Forester, Latch, Aitken, & Hoban, 2017;Ivy, Putnam, Navarro, Gurr, & Ryder, 2016;Leroy et al., 2017;Shafer et al., 2015), many populations are still monitored with small sets of neutral markers, typically tens to hundreds of microsatellites (Abdul-Muneer, 2014;Attard et al., 2016;Kaczmarczyk, 2016;Kirk & Freeland, 2011;Song et al., 2018;Pedersen, Pooch, & Liu, 2016;Toro, Fernández, & Caballero, 2009). While many researchers have cautioned against using small marker panels alone to guide captive breeding (see for example Toro, Silió, Rodrigáñez, & Rodriguez, 1998;Wang & Hill, 2000), at least some previous research has suggested that small microsatellite panels can be used to maintain genetic diversity in captive breeding programs (Kaczmarczyk, 2016). Importantly though, as Nicholas, Mellersh, and Lewis (2018) recently pointed out, such small microsatellite panels do not effectively survey genome-wide genetic diversity. Rather, they survey genetic diversity at and near (depending on the extent of linkage disequilibrium) the assayed microsatellites.
Direct comparison of microsatellite panel and genome-wide (e.g. single nucleotide polymorphisms [SNPs]) panel estimates of genetic diversity is rare, but the argument of Nicholas and colleagues is supported by a recent direct comparison in Arabidopsis halleri, which found that microsatellite-based estimates of genetic diversity and population differentiation differ substantially from unbiased estimates from SNPs (Fischer et al., 2017).
In many ways, dogs are an ideal model for the study of genetic erosion and population recovery. Genetic diversity in many common domestic dog breeds has been declining systematically since the beginning of the 1800s, when modern breeding practices came into fashion (Jansson & Laikre, 2018). As such, inbreeding in domestic dog breeds is substantial and widespread (Freedman et al., 2014;Kettunen, Daverdin, Helfjord, & Berg, 2017;Pedersen et al., 2016;Sams & Boyko, 2018) and has led to an increase in recessive deleterious mutations of high effect (Jagannathan et al., 2019;Marsden et al., 2016) as well as general inbreeding depression (Chu et al., 2019).
Dog breeders and breed clubs are increasingly aware of the serious consequences of diversity loss, and with robust panels of both microsatellite markers and genome-wide SNP arrays widely available commercially, there is great potential for breeders to use genetic testing in ways that ultimately improve (or worsen) genetic diversity. However, a key challenge, at least in the United States, is a lack of population-level management. Rather, individual dog breeders or groups of breeders typically manage small subsets of a breed, often relying on pedigrees or commercially available molecular tests to minimize known genetic health risks and sometimes overall inbreeding in individual litters of dogs. For breeds still managed in such a way, it is critical to long-term breed health and survival to understand the long-term impacts on genome-wide genetic diversity of chosen mating strategies and the molecular tools used to guide those strategies.
As a first step in understanding the impact of such population management, we conducted individual-based forward-time population genetic simulations of linked genetic diversity on a model of the dog genome using SLiM 3 (Haller & Messer, 2019). We apply a range of human-directed mate choice models to ask how well different mate choice schemes applied to a restricted panel of 33 "microsatellite" locations in our model dog genome (referred to throughout as MS33) affect genetic diversity genome-wide. More specifically, we evaluate several combinations of metrics calculated on this set of 33 multi-allelic markers to guide diversity-based mate choice including heterozygosity, internal relatedness (IR; Amos et al., 2001), and average genetic relatedness (AGR; Wang, 2002). Importantly, we do not model the generally recommended strategy of selecting parents to minimize population-level kinship. This optimal strategy is ideal for populations that are small and managed by humans. Dog breeds, however, at least in the United States, are not managed as a whole by any single entity. Rather, mating decisions are most often made by individual breeders and dog owners. Successful strategies for preserving genetic diversity in individual dog breeds will need to take this into account. Therefore, as a first step, we simulate individual mating decisions aimed at optimizing genetic measurements for single offspring. Using these methods, we ask how mate choice impacts genome-wide genetic variation as reflected by heterozygosity and allelic richness (average number of alleles per locus) compared to random mate choice, as well as mate choice guided by two-generation pedigree awareness, genome-wide heterozygosity, and relatedness calculated from genome-wide identity by descent (see Section 2).

| Genome model
We implemented all simulations in SLiM (v3.3) (Haller & Messer, 2019), and all simulations were run in parallel using Amazon Web Services EC2, SQS, and auto-scaling services. The genome model in our simulations is a rough approximation of the canine genome. For computational efficiency, we model genetic variation as nonrecombining 0.5 megabase (Mb) multi-allelic haplotype blocks. These haplotype blocks are represented by mutation type 1 (m1) in the simulation template (Appendix S1). Chromosomes are created by dividing the genome into 38 sets of 120 haplotype blocks, approximating a genome size of 2.28 gigabases. The average recombination rate in dogs is approximately 1 cM/Mb (probability of crossover per base pair of 10⁻⁸) and is likely more uniform than in humans (Auton et al., 2013;Axelsson, Webster, Ratnakumar, Ponting, & Lindblad-Toh, 2012). Therefore, we model the recombination rate between chromosomes as 0.5 and within chromosomes as 0.005 (500,000 bp × 10⁻⁸). Additionally, we modeled 33 microsatellite loci (m2 in the simulation template) spaced across the first 25 chromosomes as such: 3 on chromosome 1, 2 each on chrs 2-7, 1 each on chrs 8-25, and with no microsatellites on chrs 26-38. This distribution of markers across chromosomes is similar to the 33 STR panel used by Pedersen et al. (2016).
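The arithmetic behind this genome model can be checked with a short sketch. The following Python is only an illustration of the layout described above, not the SLiM template from Appendix S1; all constant and function names are hypothetical.

```python
# Illustrative reconstruction of the genome-model arithmetic described in the text.
# All names here are hypothetical; this is not the authors' SLiM template.
N_CHROMOSOMES = 38
BLOCKS_PER_CHROMOSOME = 120
BLOCK_SIZE_BP = 500_000       # one nonrecombining 0.5 Mb haplotype block
CROSSOVER_PER_BP = 1e-8       # roughly 1 cM/Mb


def microsatellites_per_chromosome():
    """Marker counts per chromosome: 3 on chr 1, 2 on chrs 2-7, 1 on chrs 8-25, 0 on chrs 26-38."""
    counts = {}
    for chrom in range(1, N_CHROMOSOMES + 1):
        if chrom == 1:
            counts[chrom] = 3
        elif chrom <= 7:
            counts[chrom] = 2
        elif chrom <= 25:
            counts[chrom] = 1
        else:
            counts[chrom] = 0
    return counts


def recombination_rates():
    """Rate between each pair of adjacent blocks: 0.5 across a chromosome boundary,
    block size times the per-bp crossover rate otherwise."""
    within = BLOCK_SIZE_BP * CROSSOVER_PER_BP
    n_blocks = N_CHROMOSOMES * BLOCKS_PER_CHROMOSOME
    return [0.5 if i % BLOCKS_PER_CHROMOSOME == 0 else within
            for i in range(1, n_blocks)]


genome_size_bp = N_CHROMOSOMES * BLOCKS_PER_CHROMOSOME * BLOCK_SIZE_BP
ms_counts = microsatellites_per_chromosome()
rates = recombination_rates()
print(genome_size_bp / 1e9, "Gb;", sum(ms_counts.values()), "microsatellites")
```

Under these assumptions the block layout yields 37 chromosome boundaries with free recombination and a within-chromosome rate of about 0.005 between adjacent blocks.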
Given a number of unique haplotype blocks and microsatellite alleles at the start of the simulation burn-in, we evenly distributed those alleles across individuals in the founding population (see Figure B1, Appendix S2 for a graphical example of this model).

| Demographic model
We created a relatively simple demographic model in which a single ancestral population evolves for a burn-in and drift period (200 generations) to allow founding genetic diversity to recombine sufficiently and experience sufficient genetic drift. This is followed by a short immediate bottleneck of five generations. Finally, the population expands and goes through 40 mate choice generations.
Population genetic data are collected at the beginning of the first generation of mate choice, once every five generations, and again at the very end of the simulations (see Figure B2, Appendix S2 for a graphical example of this model).

| Mate choice models
Mate choice models that we implemented in this study primarily differ in how a second parent is selected. First parents are selected randomly from the entire population as described above.

| Random
Second parent is randomly selected from the sampled mating pool.

| Pedigree
Second parent is the individual with the lowest relatedness as calculated from three-generation pedigrees, randomly sampled in ties.
In other words, using the keepPedigrees = True option in SLiM 3, pedigree relatedness between individuals in the current generation can be calculated from pedigrees. This option maintains pedigrees that include each current individual's parents and grandparents.

| Heterozygosity models [Microsatellite (MS33-HET) and Genome-wide (GW-HET)]
For these models, we calculate the expected heterozygosity for offspring between the first parent and all individuals in the mating pool as the average pairwise observed heterozygosity across all four pairwise combinations of parental genomes (assuming no recombination). We choose as the best mate the individual that would produce offspring with the highest heterozygosity and select randomly among ties.
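A minimal Python sketch of this selection rule follows. It assumes each individual is represented as a pair of equal-length haplotypes (sequences of allele identifiers), which is a simplification chosen here for illustration; the function names are hypothetical and do not come from the simulation template.

```python
import random
from itertools import product


def expected_offspring_heterozygosity(parent1, parent2):
    """Average fraction of differing loci over the four haplotype pairings
    (one haplotype from each parent, no recombination).

    Each parent is a pair of equal-length sequences of allele identifiers.
    """
    values = []
    for hap1, hap2 in product(parent1, parent2):
        differing = sum(a != b for a, b in zip(hap1, hap2))
        values.append(differing / len(hap1))
    return sum(values) / len(values)


def choose_mate(first_parent, mating_pool, rng=random):
    """Return the index of the candidate with the highest expected offspring
    heterozygosity, choosing at random among ties."""
    scores = [expected_offspring_heterozygosity(first_parent, candidate)
              for candidate in mating_pool]
    best = max(scores)
    tied = [i for i, score in enumerate(scores) if score == best]
    return rng.choice(tied)


first = ([1, 2, 3], [1, 2, 4])
pool = [([1, 2, 3], [1, 2, 3]), ([5, 6, 7], [1, 2, 3])]
chosen = choose_mate(first, pool)
```

The same function applies to either marker set: passing only the microsatellite positions corresponds to an MS33-HET-style choice, while passing all haplotype blocks corresponds to a GW-HET-style choice.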

| Internal relatedness (MS33-IR)
For this model, as above, we calculate mean internal relatedness (Amos et al., 2001) among four possible gametic pairs at each microsatellite locus and then average across all loci and choose the individual that produces the lowest IR value, selecting randomly from ties.
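For reference, internal relatedness as defined by Amos et al. (2001) is IR = (2H − Σf) / (2N − Σf), where H is the number of homozygous loci, N is the number of loci, and Σf sums the population frequencies of both alleles at every locus. The Python sketch below applies that formula to each of the four possible gametic pairs and averages the results. This ordering (multilocus IR per gametic pair, then the mean) is a simplifying assumption and may differ from the per-locus averaging described above; all names are hypothetical.

```python
from itertools import product


def internal_relatedness(genotype, allele_freqs):
    """Multilocus internal relatedness, IR = (2H - sum f) / (2N - sum f).

    genotype: list of (allele_a, allele_b) tuples, one per locus.
    allele_freqs: list of {allele: population frequency} dicts, one per locus.
    Undefined (division by zero) if every locus is fixed for a single allele.
    """
    n_loci = len(genotype)
    homozygous = sum(a == b for a, b in genotype)
    freq_sum = sum(freqs[a] + freqs[b]
                   for (a, b), freqs in zip(genotype, allele_freqs))
    return (2 * homozygous - freq_sum) / (2 * n_loci - freq_sum)


def mean_offspring_ir(parent1, parent2, allele_freqs):
    """Mean IR over the four gametic pairs (one haplotype from each parent)."""
    values = []
    for hap1, hap2 in product(parent1, parent2):
        offspring = list(zip(hap1, hap2))
        values.append(internal_relatedness(offspring, allele_freqs))
    return sum(values) / len(values)


freqs = [{"A": 0.5, "a": 0.5}, {"B": 0.25, "C": 0.25, "D": 0.5}]
example_ir = internal_relatedness([("A", "A"), ("B", "C")], freqs)
```

In the worked example, one of two loci is homozygous and the allele frequencies sum to 1.5, giving IR = (2 − 1.5) / (4 − 1.5) = 0.2. Because rarer shared alleles raise IR more than common ones, a candidate carrying common alleles can score lower than a candidate carrying rare ones.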

| Average genetic relatedness (MS33-AGR)
Here, we apply a method designed to calculate relatedness between individuals based on small panels of SNPs or microsatellites (Wang, 2002). We based the code in our SLiM template on the implementation of this calculation found in the R package Demerelate (https://github.com/cran/Demerelate) and ran several tests to ensure that our SLiM implementation and the R version produced identical results. We chose as the second parent the individual that produced the lowest AGR value, selecting randomly from ties.

| Whole-genome relatedness (GW-REL)
We calculate whole-genome relatedness as in Hedrick and Lacy (2015) and, as above, calculate relatedness across all four possible gametic pairs between two individuals. We choose as the second parent the individual least related to the first randomly selected parent and choose among ties randomly.

| Layered mate choice (MS33_IR_AGR)
We additionally investigated a single microsatellite mate choice model combining the IR and AGR statistics. First, a fraction of individuals in the mating pool are chosen based on the IR statistic, and then, an individual is chosen from that sample based on AGR (see Section 2.3 above).
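The two-stage structure of this model can be sketched as follows. The text does not state the fraction retained at the first stage, so `keep_fraction` below is a hypothetical parameter, and the sketch assumes that candidates with the lowest IR are the ones retained. The scores are taken as precomputed inputs.

```python
import math
import random


def layered_choice(ir_scores, agr_scores, keep_fraction=0.5, rng=random):
    """Two-stage mate choice sketch.

    Stage 1: keep the candidates with the lowest IR (assumed direction),
             retaining a hypothetical fraction of the mating pool.
    Stage 2: among those, return the index with the lowest AGR,
             choosing at random among ties.
    """
    n_keep = max(1, math.ceil(len(ir_scores) * keep_fraction))
    ranked = sorted(range(len(ir_scores)), key=lambda i: ir_scores[i])
    shortlist = ranked[:n_keep]
    best = min(agr_scores[i] for i in shortlist)
    tied = [i for i in shortlist if agr_scores[i] == best]
    return rng.choice(tied)


picked = layered_choice([0.3, 0.1, 0.2, 0.4], [0.0, 0.5, 0.2, 0.1])
```

In this example candidate 0 has the lowest AGR overall but is removed at the IR stage, so the choice falls to candidate 2.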

| Observed heterozygosity
We calculated observed heterozygosity per individual as the fraction of all genotypes in an individual that are heterozygous.
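As a small illustration, assuming an individual is stored as two equal-length haplotypes of allele identifiers (a representation chosen here for convenience, with a hypothetical function name), the statistic reduces to:

```python
def observed_heterozygosity(hap_a, hap_b):
    """Fraction of loci at which the two haplotypes of one individual differ."""
    if len(hap_a) != len(hap_b):
        raise ValueError("haplotypes must cover the same loci")
    return sum(a != b for a, b in zip(hap_a, hap_b)) / len(hap_a)


example_het = observed_heterozygosity([1, 2, 3, 4], [1, 5, 3, 6])
```

Here two of the four loci are heterozygous, so the value is 0.5.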

| Allelic richness
We calculated allelic richness as the total number of unique alleles at each position. In some outputs, we also calculated richness for alleles at >=0.05 frequency (raw data available but not presented here).
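A minimal Python sketch of both variants, assuming the population is given as a flat list of haplotypes (two per diploid individual); the function names and the `min_frequency` argument are hypothetical:

```python
from collections import Counter


def allelic_richness(haplotypes, min_frequency=0.0):
    """Number of distinct alleles at each locus.

    haplotypes: list of equal-length allele sequences.
    min_frequency: only count alleles at or above this frequency
                   (0.05 would mirror the thresholded variant in the text).
    """
    n_haps = len(haplotypes)
    richness = []
    for locus_alleles in zip(*haplotypes):
        counts = Counter(locus_alleles)
        richness.append(sum(1 for c in counts.values()
                            if c / n_haps >= min_frequency))
    return richness


def mean_allelic_richness(haplotypes, min_frequency=0.0):
    """Average number of alleles per locus."""
    per_locus = allelic_richness(haplotypes, min_frequency)
    return sum(per_locus) / len(per_locus)


haps = [[1, 1], [1, 2], [2, 3], [1, 4]]
per_locus = allelic_richness(haps)
```

Averaging the per-locus counts gives the "average number of alleles per locus" summary used to compare mate choice models.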

| Coefficient of inbreeding
We calculated the coefficient of inbreeding (COI) for individual dogs as the fraction of all mutations of type "m1" that are identical within an individual divided by the total genomic length. Because each mutation of this type represents 0.5 Mb, this method is identical to that presented in Sams and Boyko (2018).
FIGURE 1 Small marker panel mate selection performs worse and genome-wide marker panel mate selection improves over time relative to random mating. (a) Percent loss of heterozygosity versus percent loss in allelic richness, (b) percent increase in coefficient of inbreeding versus percent loss in allelic richness. Both panels illustrate data from parameter set 1 and represent change relative to the random mate selection model. Dots represent the mean difference between each mate choice model and random mating using a randomization method. Each model includes eight time points, one for every five generations of mate selection, with the final generation indicated with a black box. Height and width of boxes represent 95% confidence intervals for richness and heterozygosity, respectively (see Section 2). Gray box represents the quadrant in which both loss of heterozygosity and richness is slowed relative to random mating. GW models (including Pedigree) improve over the course of simulations, while MS models lose more diversity over time than random mating.
With a few exceptions, all other MS33-based models performed worse than random mating using heterozygosity, coefficient of inbreeding, and allelic richness as metrics of diversity loss. In these cases, diversity loss is as great as or in some cases substantially greater than under random mate choice and recent pedigree-based mate choice.

| Genetic variation in simulated populations compared to present-day dogs
Importantly, even in cases where MS models preserve heterozygosity and limit increases in coefficient of inbreeding more than random mating, the trajectory of diversity loss over time in these simulations suggests that given enough time these models would also perform worse overall than random mating (Figure 1; Appendix S2, Figure S1).
Among the MS33-based models, MS33-IR mate choice lost the least amount of genetic diversity, and MS33-AGR mate choice lost the greatest amount of genetic diversity. In all cases, the accelerated loss of genetic diversity compared to random mating is due to preserving diversity at a small number of loci at the expense of the remainder of the genome. In other words, by avoiding inbreeding with individuals more closely related at a small number of loci scattered throughout the genome, the effective population size at unlinked loci, which are evolving under drift, is further reduced.
As direct evidence of this reduction of effective size at unlinked loci, we observe that genetic diversity at MS loci is preserved well, but is lost faster than under random mating away from these MS positions (Figure 2a). This pattern is consistent regardless of the MS-based mate selection model we examine and is not observed in genome-wide models described below (Figure 2b). Although the rate of decay of diversity preservation near MS loci should correlate with local variation in recombination rate, the consistent difference between MS and GW models suggests that on average, using sparse microsatellites to manage genetic diversity will be ineffective.

| Number of repeated matings can moderate "popular parent" effects
We examined whether adjusting the total number of individuals that any single individual can access as potential mates within the population, termed here "mating pool size," as well as the maximum number of times any single individual can contribute to the next generation,

| Genome-wide metrics improve diversity preservation
In addition to the MS-based mate selection models, we also included several models meant to capture the viability of using genome-wide metrics in general to preserve genetic diversity. These include tracking two-generation pedigrees (Pedigree) to avoid matings between very close relatives, genome-wide heterozygosity (GW-HET) to select mates that maximize heterozygosity in the offspring, and genome-wide relatedness (GW-REL), which prefers the most distantly related individuals as mates. We find that these three models all lead to greater preservation of genetic diversity than random mating. Perhaps more importantly, these models reduce the rate of genetic drift over time relative to the random model, whereas the MS models accelerate the rate of genetic drift (Figure 1).

| DISCUSSION
In this study, we used forward population genetic simulations of a model of the canine genome to investigate the efficacy of using a small genetic marker panel (e.g. a microsatellite panel) to guide mating aimed at preserving existing genetic variation in a population, when mating decisions are made at the pairwise, rather than population, level. We ran these simulations across a range of mate choice models and demographic parameterizations using a genomic model that included both genome-wide genetic markers and a set of 33 microsatellite markers (MS33).
Most previous work on conservation management with molecular data has focused on cases where a population can be managed by selecting the entire configuration of parents for the next generation with marker-assisted selection (MAS). For example, Fernández, Toro, and Caballero (2004) demonstrated that in a single population, where parental contributions are chosen to maximize either heterozygosity or allelic richness at a set of multi-allelic markers, management programs are optimal at maintaining each of those statistics, but that heterozygosity can be better at maintaining allelic richness than vice versa. However, to our knowledge, no prior work has addressed whether similar strategies that optimize at the level of individual mating pairs, rather than the entire population of parents, can similarly act to preserve diversity. Consistent with this prior work, our results suggest that selecting optimal mates for individuals from an entire population using heterozygosity and other kinship metrics can act to preserve genetic diversity at markers used to calculate the test statistic (for example see Figure 2).
Importantly, however, we also found that, given enough generations, using small panels of markers in such a mating scheme does not preserve diversity genome-wide. In fact, mate choice models using the MS33 marker set led over time to greater loss of genetic variation than random mating, in the form of reduced heterozygosity and allelic richness measured using the GW marker set.

López-Cortegano et al. (2019) simulated management of subdivided populations and found that using a restricted number of markers was less effective than whole-genome data but still more effective than random mating. However, the density of markers in their simulations is greater than typical microsatellite panels and they acknowledge that less dense panels would likely be less effective.
Nonetheless, our results are partially consistent with this result, in that the MS33-IR model does preserve genome-wide genetic diversity better than random mating (but not allelic richness) during the course of our simulations.
Our results suggest that reducing the number of times that any given individual can contribute offspring to the next generation, either explicitly in the form of the "maximum number of matings" parameter or implicitly by reducing the "mating pool size" parameter, can act to moderate the severity of diversity loss compared to the random mating model. This finding is generally consistent with theory and prior simulation work which has demonstrated that optimal management schemes to preserve genetic diversity include limiting variance in family size, in other words, ensuring that no single individual contributes disproportionately to the next generation (Toro et al., 1998).
The better performance of the MS33-IR model compared to other microsatellite-based statistics may be explained by this statistic being less susceptible to such "popular parent" effects. As formulated, when choosing between two potential mates, internal relatedness may favor a mate that is more genetically similar if the alleles carried by that individual are on average more common. We suspect that within each generation, this leads to fewer instances of the same outlier individual being repeatedly chosen as the optimal mate for other individuals. Comparing baseline parameter sets with parameter sets where the maximum number of matings is most restricted (e.g. PS1 vs PS13), we see that this reduced allowance of multiple matings has a much weaker impact in MS33-IR mate selection compared to other MS-based mate selection schemes.
Toro, Silió, Rodrigañez, Rodriguez, and Fernández (1999) demonstrated that irrespective of variance in family size, MAS should lead to better preservation of diversity than using no genetic information at all. In contrast, our results suggest that in an unsupervised pairwise parental selection scheme, limited marker panels lead to substantially more diversity loss than using no genetic information at all. A general consensus in MAS of parental populations is that pedigrees should be the primary source of kinship calculations and that small microsatellite panels are generally only useful to supplement pedigrees (Toro et al., 2009). Our results from pairwise parental selection are consistent with this, as we have shown that using only shallow pedigrees to minimize loss of genetic diversity is preferable to using a small panel of genetic markers alone.
Genetic drift comes from two primary sources in a diploid population: variation in genetic contribution between individuals in a population and variation in genetic diversity at a given locus within an individual (Wang & Hill, 2000). Here, we have shown (Figure 2) that the added loss of genetic variation in our simulations relative to random mating is due to accelerated loss of diversity throughout the majority of the genome that is untagged by MS33 markers. While we did not specifically explore the causes of this difference, we suspect that, even in our simulations which reduce the contributions of any given individual to the next generation, groups of individuals which happen to be most distantly related to all other individuals in a given generation across the MS33 marker set are disproportionately chosen as mates for the next generation. This variance in contribution of families (as measured by the MS33 set) across generations will act to consistently reduce effective size at markers unlinked to the MS33 marker set.
Finally, for comparison we simulated several GW mate choice models and found, by and large, that using genome-wide genetic data to monitor genetic diversity and make mate selection decisions is far superior to small marker panels and, typically, random mating. This result has important implications for the preservation of domestic dog breeds. Most academic effort in the field of genetic diversity management over the past few decades has primarily been focused on optimal management for small populations of conservation concern where mating in the entire population can be controlled (Ballou & Lacy, 1995;Fernández, Toro, & Caballero, 2001, 2004;Kettunen et al., 2017;López-Cortegano et al., 2019;Sonesson & Meuwissen, 2001). Similarly, in livestock, conservation of genomic diversity in combination with genomic selection can occur at the level of entire herds or regional populations, although this approach also suffers from geographic partitioning and localization of conservation efforts (Bosse et al., 2015;Bruford et al., 2015;Herrero-Medrano et al., 2014;Ramljak et al., 2018;Zhao et al., 2019). In contrast, dog breeds, as well as breeds in other companion species such as cats and horses, have populations which are typically maintained by networks of individual breeders. Therefore, it is very important to understand the long-term impact of different types of breeding practices in these systems.
Here, we have shown that using small panels of molecular markers is no substitute for quality pedigree information or, more importantly, whole-genome characterization of genetic diversity using dense genetic markers or whole-genome sequence data. For context, assuming each microsatellite in our study can act to maintain diversity at up to 4 Mb each (basing this number on the distance over which diversity preservation decays in Figure 2), only ~6% of the genome (33 × 4 Mb/2.28 Gb) is covered by our simulated microsatellite panel. Our results suggest that optimal management of unsupervised companion animal populations should include (a) strict limits on individual and family contributions to the next generation and (b) the selection of mating pairs to minimize inbreeding in offspring using deep pedigree information or, more optimally, using dense genotype data to maximize heterozygosity/minimize inbreeding in offspring, as pedigrees are often incomplete and do not incorporate variance in inheritance of IBD segments among related individuals (Cassell et al., 2003;Hill & Weir, 2011;Keller, Visscher, & Goddard, 2011).
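The coverage estimate in the preceding paragraph can be reproduced directly. The sketch below treats the 4 Mb windows as non-overlapping, which is an assumption made here and makes the result an upper bound; variable names are hypothetical.

```python
# Back-of-envelope check of the genome fraction tagged by the MS33 panel.
# Assumes the 4 Mb windows around markers never overlap (an upper bound).
N_MARKERS = 33
TAGGED_SPAN_MB = 4                   # per-marker span, read from Figure 2 in the text
GENOME_SIZE_MB = 38 * 120 * 0.5      # 38 chromosomes x 120 blocks x 0.5 Mb = 2,280 Mb

covered_mb = N_MARKERS * TAGGED_SPAN_MB
covered_fraction = covered_mb / GENOME_SIZE_MB
print(f"{covered_mb} Mb of {GENOME_SIZE_MB:.0f} Mb = {covered_fraction:.1%}")
```

That is 132 Mb out of 2,280 Mb, or a little under 6% of the modeled genome, in line with the approximate figure quoted in the text; the remaining roughly 94% is unlinked to any panel marker.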
We note that we did not directly compare our results to whole-population management schemes, as such management strategies are not currently feasible for most companion animals. We suspect that such schemes will be generally superior to the unsupervised mating methods examined here, as they are better able to optimize contributions from individuals and choose the optimal (or near optimal) configuration of pairwise matings to preserve existing genetic diversity.
Most companion animal species remain relatively unmanaged with respect to genetic diversity at the breed level. As such, genetic diversity has rapidly decayed in many breeds over the past century (Jansson & Laikre, 2018). While we have not focused on optimizing the use of whole-genome molecular data to preserve genetic diversity in this study, future species-specific analyses should aim to develop specific recommendations to individual breeders. For example, more realistic (non-Wright-Fisher) models would better reflect the breeding practices used in companion animal breeding. Further, in our whole-genome mate choice methods we have focused primarily on maximizing heterozygosity, but preservation of allelic diversity is also an important metric to optimize, as the number of unique alleles creates a limit on the maximum heterozygosity attainable (Fernández et al., 2004). Finally, we have not considered here the needs of diversity management schemes to also consider balancing the goals of preserving genetic diversity with simultaneously eliminating deleterious variation from a population. In particular, due to the lack of past management, many companion animal breeds carry high effect deleterious mutations, and care must be taken to purge such variation without reducing linked neutral variation (Fernández et al., 2004;Hedrick & Garcia-Dorado, 2016).
Eventually, companion animal breeding may benefit from largescale participation in databases and services aimed at tracking breed-wide whole-genome genetic diversity, including awareness of adaptive and deleterious variation, to limit variance in family contributions, maximize the inclusion of genetic variation in subsequent generations, and purge deleterious variation over time.
Experimentation and optimization of such a system applied to breeds in a large and diverse species such as domestic dogs would provide critical case studies to the conservation genetics community (Shafer et al., 2015), help breeders and breed organizations understand the limits of truly closed breeding, and better conserve some of the world's most precious animal resources.

ACKNOWLEDGEMENTS
We thank the customers of Embark, whose participation and curiosity made this work possible. We also thank our colleagues at Embark ance and encouragement. This study was funded by Embark.