Genomics-informed captive breeding can reduce inbreeding depression and the genetic load in zoo populations

Abstract


Introduction
More than 28% of the 150,388 species on the Red List of the International Union for Conservation of Nature (IUCN) are threatened with extinction (IUCN, 2022). A relatively small subset of these species is kept as "insurance populations" in zoos (Gilbert et al., 2017). However, because their effective population sizes are often small, the long-term viability of captive-bred populations is not guaranteed, and many show signs of inbreeding depression (Boakes et al., 2007). Deleterious mutations create harmful genetic variants in the genome, collectively known as the genetic load (Bertorelle et al., 2022). A high genetic load can compromise the viability and recovery potential of species, especially if they have experienced a recent decline in population size (Jackson et al., 2022; Sachdeva et al., 2022). In declining populations, the impact of the genetic load on fitness is not immediately apparent: it can take many generations before the harmful effects of mutations are expressed at homozygous loci (Pinto et al., 2023).
Consequently, the long-term viability of many zoo populations could be at risk, even though individuals and populations currently appear to be thriving.
In the past 50 years, conservation geneticists have focused on maintaining genetic variation (DeWoody et al., 2021; García-Dorado & Caballero, 2021; Kardos et al., 2021), as genome-wide diversity generally correlates positively with fitness and adaptive potential (Willi, van Buskirk and Hoffmann, 2006; Charlesworth, 2009; Harrisson et al., 2014; but see Wood, Yates and Fraser, 2016). Recently, the Group on Earth Observations Biodiversity Observation Network (GEO BON) developed Essential Biodiversity Variables (EBVs) to assess spatiotemporal variation in biodiversity and proposed four genetic EBVs: genetic diversity, genetic differentiation, inbreeding, and effective population size (Ne) (Hoban et al., 2022). Notably, the risks posed by the genetic load are generally not considered a conservation priority (van Oosterhout, 2020). This may be an oversight. However, recent advances in genomics and bioinformatics could change that.
Leveraging the extensive genomic research on humans and model organisms enables us to estimate the potential fitness impact of mutations in species of conservation concern (Bertorelle et al., 2022). The fitness impact of deleterious alleles can be estimated with the Combined Annotation-Dependent Depletion (CADD) framework (Rentzsch et al., 2019). Initially developed for humans (Kircher et al., 2014), CADD has since been applied successfully to other species, including the mouse (Groß et al., 2018), pig (Groß, Derks, et al., 2020), and chicken (Groß, Bortoluzzi, et al., 2020). CADD ranks genetic variants such as single nucleotide polymorphisms (SNPs) and insertions and deletions (indels) throughout the genome, integrating surrounding sequence context, gene model annotation, evolutionary constraint (e.g., GERP scores), epigenetic measurements, and functional predictions into a single CADD score.
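CADD reports both raw scores and PHRED-like scaled scores, in which a variant's scaled score reflects its rank among all scored variants: a scaled score of 10 marks the 10% most deleterious variants, and 20 the top 1%. The minimal sketch below illustrates that rank-based scaling on an arbitrary list of raw scores. The function name is hypothetical, and this is not the CADD software itself, which ranks variants against a genome-wide set of possible substitutions.

```python
import math
from typing import List, Sequence


def phred_scaled_scores(raw_scores: Sequence[float]) -> List[float]:
    """Rank-based, PHRED-like scaling of raw deleteriousness scores.

    The highest raw score gets rank 1; each score becomes -10 * log10(rank / n),
    so the top 10% score >= 10 and the top 1% score >= 20. Ties keep their
    input order (sorted() is stable, also with reverse=True).
    """
    n = len(raw_scores)
    order = sorted(range(n), key=lambda i: raw_scores[i], reverse=True)
    scaled = [0.0] * n
    for rank, idx in enumerate(order, start=1):
        scaled[idx] = -10.0 * math.log10(rank / n)
    return scaled


# Hypothetical raw scores for 100 variants; the largest maps to a scaled score of 20.
example = phred_scaled_scores([float(v) for v in range(100)])
```

Because the scaling depends only on rank, scaled scores are comparable across variant types even when the raw scores are not.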

CADD was adapted to the chicken as chicken Combined Annotation-Dependent Depletion (chCADD) (Groß, Bortoluzzi, et al., 2020), which has been used to investigate conserved elements and has helped identify regions of the chicken genome associated with known genetic disorders reported in Online Mendelian Inheritance in Animals (OMIA). Therefore, by identifying deleterious alleles, CADD can be used to estimate the genetic load within an individual's genome.
At present, we cannot translate mutation impact scores such as CADD scores into fitness effects. Nevertheless, we can sum the CADD scores of all deleterious mutations present in an individual's genome and compare this proxy of the genetic load between individuals. Similarly, we can estimate the proportion of the genetic load that is expressed (the realised load) and the proportion whose fitness effects remain hidden (the inbreeding load or masked load) (Bertorelle et al., 2022). The realised load is the part of the genetic load that reduces fitness because the harmful effects of the mutations are expressed. Inbreeding increases the realised load because more deleterious mutations become homozygous and are therefore fully expressed. By minimising the realised load, conservation managers can reduce inbreeding depression. This could be particularly useful in captive-bred populations, where breeding pairs can be chosen to improve the fitness of the offspring.
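To make the last point concrete, consider one biallelic locus in a large population where the deleterious allele has frequency q, and weight homozygous carriers fully and heterozygous carriers by 0.5 (genetic load) or by the dominance coefficient h (realised load). Under the standard inbreeding model, the genetic load stays at sq whatever the inbreeding coefficient F, whereas the realised load rises with F whenever h < 0.5. The sketch below is illustrative only; the function name and parameter values are assumptions, not part of the study.

```python
def expected_load_per_locus(q: float, s: float, h: float, F: float):
    """Expected genetic, realised and masked load at one biallelic locus.

    q: frequency of the deleterious allele; s: its score (or selection
    coefficient); h: dominance coefficient; F: inbreeding coefficient.
    Genotype frequencies follow the standard inbreeding model.
    """
    p = 1.0 - q
    hom = q * q + F * p * q          # homozygous for the deleterious allele
    het = 2.0 * p * q * (1.0 - F)    # heterozygous
    genetic = s * hom + 0.5 * s * het   # every deleterious copy counts
    realised = s * hom + h * s * het    # heterozygotes only partially expressed
    masked = genetic - realised         # equals s * het * (0.5 - h)
    return genetic, realised, masked


outbred = expected_load_per_locus(q=0.1, s=1.0, h=0.1, F=0.0)
inbred = expected_load_per_locus(q=0.1, s=1.0, h=0.1, F=0.25)
```

Inbreeding does not create new deleterious alleles here; it only shifts load from the masked to the realised component.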
A considerable amount of genetic variation underlies polygenic or quantitative traits.
Mutations that affect the value of a quantitative trait (e.g., body size) can be harmful or beneficial, depending on whether they move the trait value closer to or further from the optimum. In contrast, unconditionally deleterious mutations are harmful irrespective of genetic background or environmental conditions. Mutations in ultraconserved elements (UCEs) are likely to be unconditionally deleterious (Silla et al., 2014) and thereby contribute substantially to the genetic load. UCEs are regions of the genome that are conserved across phylogenetically divergent taxa (Bejerano et al., 2004). Their high level of sequence conservation is thought to be maintained by strong purifying selection (Lee & Venkatesh, 2013). Some polymorphisms in UCEs are associated with genetic diseases or phenotypic traits (Habic et al., 2019), and UCEs have been linked to enhancers active in early development in both mammals (Visel et al., 2008) and flies (Warnefors et al., 2016). Given their high level of phylogenetic conservation, comparative genomic approaches can be used to obtain a proxy of the genetic load, building on knowledge from model organisms and humans. Studying UCEs in reference genomes allows between-species comparisons of the proxies of genetic load, realised load and masked load. Additionally, analysis of the genetic load at UCEs shows promise for captive breeding and the conservation management of zoo populations.
Here, we conduct a proof-of-concept study to demonstrate the utility of genomics-informed breeding in the conservation management of captive populations. We quantify the genetic load of six pink pigeon individuals using chCADD scores assigned to single nucleotide variants in UCEs derived from the chicken genome. We show that genetic load components can be estimated using CADD scores calculated for a phylogenetically related species and cross-mapped to the genome annotation of the pink pigeon, our focal species. We also calculate the realised load and genetic load of the potential offspring of all possible crosses. Finally, we use computer simulations to demonstrate the potential of genomics-informed conservation, showing how it can help to reduce inbreeding depression and maximise the long-term viability of zoo populations.

Study species
Six pink pigeon (Nesoenas mayeri) individuals from the captive-bred populations of Jersey Zoo (n = 4) and Bristol Zoo (n = 2) were genome sequenced. The birds shared common ancestry within the last 3-6 generations (Supplementary Figure S1) and are highly related (F = 0.064 to 0.346; Supplementary Table 2), which is typical of many zoo populations (Boakes et al., 2007). See Supplementary Information for further details.
Genome sequencing and bioinformatics
DNA was extracted from blood using Qiagen MagAttract. Linked-read libraries were prepared with 10x Genomics Chromium technology and sequenced on an Illumina HiSeq X with 2 × 150 bp reads (Ryan, 2021). The sequencing reads were mapped to a previously generated pink pigeon reference genome (Albeshr, 2016). The resulting variant calls were used to calculate a per-SNP pink pigeon CADD (ppCADD) score for the UCEs of each individual's genome (Figure 1). A Snakemake pipeline (Mölder et al., 2021) that reproduces this approach is available on GitHub (https://github.com/saspeak/LoadLift). The pipeline takes as input the sequencing reads of the subject individuals, the reference genome of the subject species, and the CADD scores and reference genome of a model species (here the chicken: chCADD scores (Groß, Bortoluzzi, et al., 2020) and the Galgal6 reference genome (Warren et al., 2017)). The pipeline is divided into six sections (Figure 1; https://github.com/saspeak/LoadLift). Previously published tetrapod ultraconserved element (UCE) probes, based on the chicken reference genome (Warren et al., 2017) and the Tibetan ground-jay (Pseudopodoces humilis) (Faircloth et al., 2012), were used to harvest UCEs from the pink pigeon reference genome with the Phyluce workflow (Faircloth, 2016). A chain file was created for annotation lift-over, and the CADD scores of the chicken genome (Groß, Bortoluzzi, et al., 2020) were cross-mapped to the pink pigeon reference genome using CrossMap.py (Zhao et al., 2014). CADD scores were filtered to remove non-scoring and fixed sites. The genotype at each locus was then used to calculate the genetic load components. The genetic load, realised load and masked load of individual k were calculated using the following formulas (Bertorelle et al., 2022):

$$\text{Genetic load}_k = \sum_{i} s_i + \frac{1}{2}\sum_{j} s_j \qquad [1]$$

$$\text{Realised load}_k = \sum_{i} s_i + \sum_{j} h_j s_j \qquad [2]$$

$$\text{Masked load}_k = \sum_{j} \left(\frac{1}{2} - h_j\right) s_j \qquad [3]$$

Here, $s_i$ (and $s_j$) is the ppCADD score at locus i (and j), summed across all homozygous (or heterozygous) loci at the UCEs of individual k. In the computer simulations (see below), s and h denote the selection and dominance coefficients, and the fitness impact of the load can be expressed in lethal equivalents (Bertorelle et al., 2022). For simplicity, the dominance coefficient is assumed to be $h_j = 0.1$ at all loci. Note that part of the realised load comprises heterozygous mutations, which are assumed to be partially dominant. Inbreeding coefficients ($F_{RoH}$) of the six pink pigeons were calculated from runs of homozygosity (RoH) using bcftools roh (Narasimhan et al., 2016). For further details, see Supplementary Information.
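The per-individual load components can be computed in a few lines of Python. This is a minimal sketch, not the LoadLift code: it assumes genotypes are coded as the number of copies (0, 1 or 2) of the allele carrying the cross-mapped score, that non-scoring and fixed sites have already been removed, and that a single dominance coefficient (h = 0.1, as in the text) applies to all heterozygous sites. Homozygous scores count fully; heterozygous scores are weighted by 0.5 for the genetic load and by h for the realised load. The function name is hypothetical.

```python
from typing import Sequence, Tuple


def load_components(genotypes: Sequence[int], scores: Sequence[float],
                    h: float = 0.1) -> Tuple[float, float, float]:
    """Genetic, realised and masked load of one individual.

    genotypes: copies (0, 1 or 2) of the scored allele at each UCE site.
    scores:    per-site ppCADD-like score s at the same sites.
    h:         dominance coefficient applied to every heterozygous site.
    """
    if len(genotypes) != len(scores):
        raise ValueError("genotypes and scores must have the same length")
    hom = sum(s for g, s in zip(genotypes, scores) if g == 2)  # sum of s_i
    het = sum(s for g, s in zip(genotypes, scores) if g == 1)  # sum of s_j
    genetic = hom + 0.5 * het
    realised = hom + h * het
    masked = (0.5 - h) * het
    return genetic, realised, masked


# Hypothetical individual with four scored sites.
g, r, m = load_components([0, 1, 2, 1], [5.0, 10.0, 20.0, 4.0])
```

By construction, the masked and realised loads add up to the genetic load, which is a convenient sanity check on real data.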

Computer simulations of breeding regimes
We conducted computer simulations in SLiM3 (Haller & Messer, 2019) to examine the impact of four breeding regimes on the genetic and realised load, neutral genetic diversity, and fitness. In the "Minimise load" regime, we examined whether mate pair selection can reduce the realised load of the offspring and alleviate inbreeding depression.
However, purifying selection against the genetic load can reduce genetic diversity (Cvijović et al., 2018) and result in the fixation of mildly deleterious mutations (Chen et al., 2020). To address this concern, we explored the impact of reducing the relatedness (or kinship) of parents in the "Minimise relatedness" regime.
Additionally, we simulated a regime that aimed to minimise the realised load of the offspring whilst maintaining genetic diversity (the "Minimise load and relatedness" regime).
Here, exactly one male and one female from each family were selected to mate with an optimal partner from another family, to minimise the realised load of their offspring.
Finally, we simulated random mating (the "Random mating" regime). In each regime, we randomly sampled 20 monogamous pairs of males and females and allowed each pair to produce a brood of 64 offspring per generation. We ran 100 replicates of each regime for 50 generations. Further details about the breeding regimes and the SLiM model are given in the Supplementary Information.
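The pair-choice step that distinguishes the regimes can be illustrated outside SLiM. In this sketch (hypothetical function names; not the SLiM model used in the study), each candidate male-female pair has a cost: the predicted realised load of their offspring would correspond to "Minimise load", and parental kinship to "Minimise relatedness", while "Random mating" ignores costs altogether. The family constraint of the combined regime is not modelled here.

```python
import random
from itertools import permutations
from typing import List, Sequence, Tuple

Pairs = List[Tuple[int, int]]


def best_pairing(cost: Sequence[Sequence[float]]) -> Tuple[Pairs, float]:
    """Monogamous male->female pairs with the lowest summed pair cost.

    cost[m][f] is a hypothetical score for pairing male m with female f,
    e.g. the predicted realised load of their offspring or their kinship.
    Exhaustive search: only practical for a handful of birds.
    """
    n_males, n_females = len(cost), len(cost[0])
    if n_males > n_females:
        raise ValueError("this sketch needs at least as many females as males")
    best_pairs: Pairs = []
    best_total = float("inf")
    for females in permutations(range(n_females), n_males):
        total = sum(cost[m][f] for m, f in enumerate(females))
        if total < best_total:
            best_pairs, best_total = list(enumerate(females)), total
    return best_pairs, best_total


def random_pairing(n_males: int, n_females: int, rng: random.Random) -> Pairs:
    """Random monogamous pairs, ignoring any cost (a 'Random mating' analogue)."""
    females = rng.sample(range(n_females), n_males)
    return list(enumerate(females))
```

Exhaustive search keeps the logic transparent; for 20 pairs, an assignment solver based on the Hungarian algorithm (for example `scipy.optimize.linear_sum_assignment`) would be the practical choice.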

Distribution of UCEs and CADD scores
The 4976 UCEs along the 34 chromosomes of the chicken reference genome are not evenly distributed (Fig. 2A): 15 chromosomes were significantly depleted of UCEs, whilst 9 chromosomes were significantly enriched for UCEs (Supplementary Table 1).

Genetic load components and kinship load
We analysed the genetic load in the hypothetical offspring of our six pink pigeons. This kinship load is calculated by theoretically crossing all possible combinations of individuals, assuming Mendelian segregation ratios. As kinship between two individuals increases, the homozygosity of their offspring increases (Figure 3). Similarly, increased kinship between parents elevates the offspring's realised load and reduces their masked load (Figure 3). Optimal mate pairing can significantly reduce the realised load of the offspring (R² = 0.258, F(1,13) = 8.32, p = 0.00918). Next, we performed an analysis to identify the optimal crosses that minimise the genetic load (Figure 4). Figure 4A shows the average genetic load of potential offspring. In essence, these are the deleterious mutations that offspring are predicted to inherit from both parents.
Figure 4 (caption fragment): … The genetic load (grey), realised load (blue) and masked load (orange) of the hypothetical offspring of all possible crosses (including "selfing"). (E) The distribution of total realised load in the offspring generation calculated by crossing all individuals at random. In this procedure, each individual was crossed twice without self-mating or repeating the same crosses, and this was repeated 10,000 times. The optimal crossing combination is shown in blue.
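The kinship-load calculation can be sketched as follows, assuming per-site genotypes coded as copies (0, 1 or 2) of the scored allele, independent segregation of sites, and the same weighting as for individuals (homozygous sites count fully; heterozygous sites are weighted by 0.5 for the genetic load and by h for the realised load). Function names are hypothetical, and this is not the LoadLift implementation.

```python
from itertools import combinations_with_replacement
from typing import Dict, Sequence, Tuple

Load = Tuple[float, float, float]


def offspring_genotype_probs(g1: int, g2: int) -> Tuple[float, float, float]:
    """Mendelian probabilities that an offspring carries 0, 1 or 2 scored alleles."""
    t1, t2 = g1 / 2, g2 / 2  # probability that each parent transmits the scored allele
    return (1 - t1) * (1 - t2), t1 * (1 - t2) + (1 - t1) * t2, t1 * t2


def expected_offspring_load(parent1: Sequence[int], parent2: Sequence[int],
                            scores: Sequence[float], h: float = 0.1) -> Load:
    """Expected genetic, realised and masked load of the offspring of one cross."""
    genetic = realised = 0.0
    for g1, g2, s in zip(parent1, parent2, scores):
        _, p_het, p_hom = offspring_genotype_probs(g1, g2)
        genetic += s * (p_hom + 0.5 * p_het)
        realised += s * (p_hom + h * p_het)
    return genetic, realised, genetic - realised


def all_crosses(genotypes: Sequence[Sequence[int]], scores: Sequence[float],
                h: float = 0.1) -> Dict[Tuple[int, int], Load]:
    """Expected offspring load for every pair of individuals, including selfing (i, i)."""
    return {
        (i, j): expected_offspring_load(genotypes[i], genotypes[j], scores, h)
        for i, j in combinations_with_replacement(range(len(genotypes)), 2)
    }
```

Under these assumptions, the expected genetic load of the offspring is simply the mean of the two parents' genetic loads, so mate choice mainly changes how much of it is realised: two parents heterozygous at the same site produce homozygous offspring a quarter of the time, whereas a homozygous parent crossed with a non-carrier produces only heterozygotes.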
To predict the degree of inbreeding depression, we calculated the realised load of the offspring of each cross. Blue tiles in the correlogram in Figure 4B show the realised load of the offspring of the optimal crosses. The realised load of these offspring is 7.4% lower than that of the offspring of random crosses (Figure 4E), so these offspring are predicted to show less inbreeding depression. Note that the offspring of the 2 × 3 cross, which have the lowest genetic load, have a relatively high realised load. Individuals 2 and 3 are closely related (aunt and niece), and each possesses a low genetic load. However, because they are related, their offspring would express a high realised load, even though their genetic load is low.
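The optimisation behind Figure 4E can be outlined for six individuals, because the number of ways to cross each bird exactly twice, without selfing or repeated crosses, is small enough (70) to enumerate. This sketch assumes a symmetric matrix `pair_load` of predicted offspring realised loads (hypothetical input) and draws random crossing schemes uniformly from the valid sets, which is one way to implement the random-crossing procedure, not necessarily the one used by the authors.

```python
import itertools
import random
from typing import Iterator, List, Sequence, Tuple

Cross = Tuple[int, int]


def valid_cross_sets(n: int, crosses_each: int = 2) -> Iterator[Tuple[Cross, ...]]:
    """Sets of distinct, non-selfing crosses using every individual exactly `crosses_each` times."""
    pairs = list(itertools.combinations(range(n), 2))
    n_crosses = n * crosses_each // 2
    for subset in itertools.combinations(pairs, n_crosses):
        degree = [0] * n
        for a, b in subset:
            degree[a] += 1
            degree[b] += 1
        if all(d == crosses_each for d in degree):
            yield subset


def total_realised_load(cross_set: Sequence[Cross],
                        pair_load: Sequence[Sequence[float]]) -> float:
    """Summed predicted realised load of the offspring of all crosses in one set."""
    return sum(pair_load[a][b] for a, b in cross_set)


def optimal_and_random(pair_load: Sequence[Sequence[float]], n_random: int = 10_000,
                       seed: int = 1) -> Tuple[Tuple[Cross, ...], float, List[float]]:
    """Best crossing scheme, its total, and a random-crossing null distribution."""
    sets = list(valid_cross_sets(len(pair_load)))
    best = min(sets, key=lambda cs: total_realised_load(cs, pair_load))
    rng = random.Random(seed)
    null = [total_realised_load(rng.choice(sets), pair_load) for _ in range(n_random)]
    return best, total_realised_load(best, pair_load), null
```

The relative reduction of the optimum against the mean of `null` corresponds to the kind of comparison reported above; brute-force enumeration grows quickly with the number of birds, so larger populations would need a dedicated optimiser.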
Computer simulations of the genetic load
Finally, we performed computer simulations examining the impact of genomics-informed captive breeding on neutral nucleotide diversity, the genetic load, the realised load, and the fitness of individuals. The "Random mating" and "Minimise relatedness" regimes showed a steady increase in the genetic load (Fig. 5A) and realised load (Fig. 5B) over generations. Both regimes also suffered a large decline in fitness due to mutational meltdown (Fig. 5C). In contrast, both the genetic load and the realised load were reduced in the "Minimise load" and "Minimise load and relatedness" regimes (Fig. 5A,B). Therefore, genomics-informed captive breeding can effectively purge deleterious mutations and reduce their homozygosity, regardless of whether relatedness is also taken into account. Consequently, mean fitness remained high in these regimes, increasing during the first ten generations (Fig. 5C). However, populations lost neutral genetic diversity at a relatively fast rate in the "Minimise load" regime (Fig. 5D). This loss of diversity was not observed in the "Minimise load and relatedness" regime, which, after 10 generations, maintained more diversity than the "Random mating" regime (Fig. 5D).

Figure 5 (caption fragment): … fitness of adults, and (D) neutral nucleotide diversity (π). Each coloured line corresponds to a specific mating regime: "Random mating" (grey), "Minimise relatedness" (blue), "Minimise load" (orange), and "Minimise load and relatedness" (green). The genetic load and realised load are expressed in lethal equivalents calculated using equations [1] and [2] in the Materials & Methods (see Bertorelle et al., 2022). The values presented in the figure are means of 100 replicates.

Discussion
We conducted a proof-of-concept study to evaluate the utility of genomics-informed conservation for the management of captive populations in zoos. Our aim was to examine whether we could use genomic data to reduce the level of inbreeding depression and the genetic load, thereby increasing both short- and long-term population viability. We developed a novel bioinformatics pipeline to estimate the genetic load using CADD scores calculated for a model species (the chicken). We piloted our pipeline on the genomes of six pink pigeons from the captive-bred population in two UK zoos (Jersey Zoo and Bristol Zoo). We quantified the realised load of hypothetical offspring by virtually crossing these six individuals, showing that inbreeding depression may be reduced in the captive pink pigeon population. Furthermore, we found that UCEs contain the most severely deleterious mutations, with the highest CADD scores, and that mutations in UCEs occur at a lower SNP density and frequency than polymorphisms in the flanking regions. These observations are consistent with purifying selection.
Substantial genetic drift and inbreeding in zoo populations reduce long-term viability.
Since the early 1970s, conservation biologists have used pedigrees and neutral genetic markers to assess and minimise inbreeding (Rabier et al., 2020). However, the genetic load cannot be effectively measured or managed with this approach, because neither neutral markers nor pedigrees contain information about the segregation of deleterious mutations. Furthermore, pedigree data do not capture possible relatedness between founder individuals. This can be especially problematic in populations that experienced a bottleneck before being sampled.
We showed that our bioinformatics pipeline can identify optimal crosses that produce offspring with, on average, a 7.4% lower realised load than random crosses. These offspring are expected to show less inbreeding depression. This reduction in realised load was modest because, after nearly 10 generations in captivity, all pink pigeon individuals are closely related to one another. Crosses between close relatives have been minimised in the captive management of this species by exchanging pigeons between zoos. However, as a result, all individuals are now similarly related.
More substantial reductions in the realised load can be achieved with genomics-informed breeding in zoo populations whose individuals are less closely related.
Genomics-informed breeding will be especially efficient at reducing inbreeding depression in captive populations that were founded by many individuals or have spent fewer generations in captivity, and in species that have not been bottlenecked or that have a large ancestral population size (Bertorelle et al., 2022). Populations in all of these scenarios are likely to carry a high genetic load of segregating deleterious mutations that have not yet been purged (Dussex et al., 2023), with considerable differences between individuals.
We do not know how CADD scores translate into fitness effects, and hence we cannot calculate the exact benefits of genomics-informed breeding for survival rates. Nevertheless, if survival is assumed to decline exponentially with the realised load ($S = e^{-L}$, with L in lethal equivalents, LEs), then for a population carrying a realised load of one LE, a 7.4% reduction in realised load increases the survival rate from 36.8% to 39.6%. This is a 7.7% relative increase. With a higher realised load of 2 LEs, the survival rate improves from 13.5% to 15.7%, a relative increase of nearly 16%. More generally, reducing the realised load is likely to reduce inbreeding depression and increase fitness (Bertorelle et al., 2022).
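These survival figures follow from the standard lethal-equivalent convention, in which survival equals exp(-L) for a realised load of L lethal equivalents. The short sketch below reproduces them; the function names are hypothetical, and the 7.4% reduction is taken from the results above.

```python
import math
from typing import Tuple


def survival_from_load(lethal_equivalents: float) -> float:
    """Survival probability under the lethal-equivalent convention S = exp(-L)."""
    return math.exp(-lethal_equivalents)


def survival_gain(load_le: float, reduction: float) -> Tuple[float, float, float]:
    """Survival before and after a proportional cut in realised load, plus the relative gain."""
    before = survival_from_load(load_le)
    after = survival_from_load(load_le * (1.0 - reduction))
    return before, after, after / before - 1.0


for load in (1.0, 2.0):
    before, after, gain = survival_gain(load, reduction=0.074)
    print(f"{load:.0f} LE: {before:.1%} -> {after:.1%} (relative gain {gain:.1%})")
```

The relative gain equals exp(L × reduction) - 1, so the same proportional reduction in realised load yields a larger relative benefit the higher the realised load.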
Our simulations indicate that the genetic load and realised load can be reduced by the "Minimise load" and "Minimise load and relatedness" regimes. This resulted in a substantial increase in fitness compared to the "Random mating" and "Minimise relatedness" regimes. Although the "Minimise load" regime resulted in a substantial loss of nucleotide diversity, this was avoided by also reducing relatedness in the "Minimise load and relatedness" regime. In theory, this regime is the optimal approach to maximise the long-term viability of captive populations, both in terms of a reduced genetic load and increased adaptive potential.
To conclude, CADD scores for model species can be successfully lifted over to provide an initial assessment of the genetic load from whole-genome sequence data of non-model species. Optimal mate pairs can be identified to reduce the realised load and inbreeding depression in the offspring generation. Computer simulations show that genomics-informed breeding can reduce the genetic load and realised load without significantly reducing nucleotide diversity in the population. Genomics-informed management can increase the long-term viability of captive populations and help to select the optimal individuals for reintroduction and genetic rescue programs.

Figure 1 - The pipeline for the creation of per single nucleotide polymorphism (SNP) … (1) … Extraction of UCEs from the reference genome using Phyluce. (2) (Dark blue) Mapping the sequencing reads of individuals to the reference genome, indicating two parallel approaches: for 10X Chromium read data (used in this paper) and for Illumina read data. (3) (Light blue) Variant calling for SNPs within the UCEs. (4) (Light grey) Creation of a chain file for the lift-over of annotation from the chicken genome. (5) (Dark grey) Conversion of chCADD scores to the pink pigeon (subject species) annotation. (6) …

Figure 2B shows the distribution of all chCADD scores along a single UCE (UCE-2729).

Figure 2C shows the distribution of chCADD scores along chromosome 1 of the …

Figure 3 - The composition of the genetic load in six pink pigeon individuals … blue tiles representing offspring with a low genetic load, and red tiles offspring with a high genetic load. The genetic load is lowest in the offspring of a cross between individuals 2 and 3.

Figure 4 - The genetic load at UCEs of six pink pigeons calculated using cross-…

Figure 5 - Impact of the four breeding regimes, simulated over 50 generations.