High MHC gene copy number maintains diversity despite homozygosity in a Critically Endangered single-island endemic bird, but no evidence of MHC-based mate choice

Small population sizes can, over time, put species at risk due to the loss of genetic variation and the deleterious effects of inbreeding. Losing diversity in the major histocompatibility complex (MHC) could be particularly harmful, given its key role in the immune system. Here, we assess MHC class I (MHC-I) diversity and its effects on mate choice and survival in the Critically Endangered Raso lark Alauda razae, a species restricted to the 7 km2 islet of Raso (Cape Verde) since ~1460, whose population size has dropped as low as 20 pairs. Exhaustively genotyping 122 individuals, we find no effect of MHC-I genotype/diversity on mate choice or survival. However, we demonstrate that MHC-I diversity has been maintained through extreme bottlenecks by retention of a high number of gene copies (at least 14), aided by co-segregation of multiple haplotypes comprising 2–8 linked MHC-I loci. Within-locus homozygosity is high, contributing to comparably low population-wide diversity. Conversely, each individual had comparably many alleles, 6–16 (average 11), and the large and divergent haplotypes occur at high frequency in the population, resulting in high within-individual MHC-I diversity. This functional immune gene diversity will be of critical importance for this highly threatened species’ adaptive potential.

for two minutes. An aliquot of the clean PCR product was run on a 2% agarose gel, to verify 159 fragment length and to roughly estimate concentration of the PCR product based on band 160 intensity. The individual PCR products were then differentially evaporated at room temperature, 161 to achieve even concentrations. 162 In the second PCR, MIDs were added to each of the samples, as well as 360 (first library) and The indexed amplicons were cleaned with Agencourt AMPure XP-PCR Purification Kit 172 (Beckman Coulter, Indianapolis, USA) as specified above, but with a ratio of PCR product to 173 beads at 1:1.12. The cleaned PCR products were checked on a 2% agarose gel, and quantified 174 using a Quant-iT PicoGreen dsDNA Assay Kit (ThermoFisher Scientific/Invitrogen, Waltham, 175 USA) modified for a 96-well plate, measured on a plate reader.

| Bioinformatic pipeline 186
The sequence output of each sample was first trimmed using Cutadapt (Martin, 2011), based on 187 base quality of the 3'-end (-q 15) followed by linked adapter validation and removal of the paired 188 reads (-a, -A). Trimmed paired reads were then imported to DADA2 v. 1.0.3 (Callahan et al.,189 2016) in R v 3.5.0 (R Core Team, 2018) where we filtered with fastqPairedFilter, adjusting the 190 criterion for expected error rate for reverse reads from 2 to 5 (maxEE=c(2,5)); dereplicated with 191 derepFastq; learned error rates with the learnErrors function called through dada; and merged read 192 pairs with mergePairs. Thereafter, a sequence table was created with makeSequenceTable, after 193 which sequence length distributions were checked and any spurious sequences of very deviating 194 lengths were excluded, and the remaining sequences were filtered for sequence chimeras using 195 removeBimeraDenovo. 196

| Evaluation and selection of primers 197
The output from dada2 based on the six individuals amplified with four primer pairs in the first 198 library was scrutinized and cross referenced. HNalla + HN46 rendered significantly fewer alleles, 199 whereas most alleles were recovered by the three other primer combinations, some with 200 drastically varying coverage, however. For library 2, we selected the primer pair 3Fus35us8 + 201 3Rds20ex6, which generated >500 reads in ≥1 of 6 test samples for alleles A-V (see Figure 1, 202 Table S3) with the exceptions of alleles L, S, and T (all of which did not occur in those samples). 203 Further, additional allele 1 occurred with 179-240 reads and additional allele 2 with only 65-102 204 reads (see Figure S2), whereas they (and allele V) occurred in high frequency (>1000 reads) with 205 two other primer combinations (which, in turn, did not amplify other alleles well). Finally, 206 another seven alleles (additional putative alleles 3-9; Figure S2) were amplified at very low 207 frequency. 208

| Further filtering and allele calls 209
The combined output of library 1+2 from dada2 comprised 136 samples with an effective 210 coverage (number of reads used by dada2) between 0 (one failed sample) and 119,942, with 50% 211 of samples having 20,044-26,070 (median 22,322) reads. It was further filtered as follows: (1) 212 Five samples with less than 5,000 reads were excluded. (2) Alleles for which the exon contained 213 an uninterrupted open reading frame were accepted if the average read count for individuals in 214 which they were called exceeded 500. In addition, there were 11 alleles with very low read 215 counts, which were not included since the call whether present in an individual would not be 216 reliable. We made an exception for two of those 11 low-frequency alleles (R, V), because (a) they 217 had considerably higher read counts in the six samples from the first library, using the final primer 218 pair; (b) these two alleles were called consistently between multiple primer pairs in library 1, with 219 some primer pairs producing high read counts; and (c) there was concordance between technical 220 duplicates, with the exception of a single allele call in one duplicate with few total reads. (3) For 221 an allele to be called in an individual, its effective coverage had to meet either (a) a threshold 222 value of 10% of the average coverage for that allele for samples in which it was present, or (b) a 223 threshold value of within-individual read frequency of 5% (i.e. >5% of all reads in that individual 224 had to belong to that allele). 225 Allele calls were compared between the eight pairs of technical replicates, and repeatability 226 calculated as 2 × shared allele calls / (allele calls of replicate A + allele calls of replicate B). 227 Among the remaining 122 unique samples used for analyses, there was no effect of coverage on 228 number of alleles (F1,120 = 0.010, r 2 = 0, p = 0.76; Figure S3). We lacked individual data (sex, age, 229 morphometric) for eight samples, which were included in overall diversity analyses, but excluded 230 from analyses of survival and mate choice. 231

| MHC-I diversity and test of trends over time or sex effects 232
MHC-I diversity was computed with PhyML (Guindon & Gascuel, 2003) as the total length of a 233 phylogenetic tree of the amino acid sequences of an individual's alleles, using the LG substitution 234 model (Le & Gascuel, 2008). The relationship between number of alleles and MHC-I diversity 235 was explored by linear models. 236 Because few individuals were ringed as nestlings or recently-fledged juveniles, and because 237 Raso larks cannot be aged accurately from plumage after their complete post-juvenile moult 238 around three months of age, we instead approximated the age of individuals when first captured 239 using information on claw damage. Previous data from the population indicates that, while birds 240 less than two years old have undamaged feet, approximately one third of all birds known to be at 241 least two years of age show clear signs of toe or claw damage. We therefore assumed that birds 242 with claw damage on their first capture were two years old, whereas any birds with no toe damage 243 were in their first year of life (Age 1). This is most likely to be strictly true after a year of strong 244 population growth (e.g. 125% increase from 2009 to 2010). The data on the 114 individuals is 245 compiled in Supporting Information 2. 246 Temporal trends were tested with a linear regression (MHC-I diversity ~ inferred birth year) 247 and differences between males and females with a two-tailed t-test. 248

| Randomisation tests of (dis-)assortative mating 249
We used randomisation tests to determine whether Raso larks mate (dis-)assortatively. Of the 114 250 birds with individual data that were genotyped, there were 46 pairings where both the male and 251 female bird in a pair were genotyped. For each pairing (n = 46), we calculated the proportion of 252 shared MHC-I alleles and the pairwise mean amino acid distance. The preliminary analyses 253 identified two alleles that were fixed in the population, another allele present in all but one 254 sample, and three co-segregating blocks of alleles (see Results; Figure 1b-c). Because of the 255 potential influence of these co-segregating alleles on estimates of allele sharing we calculated 256 shared allelic diversity in two ways; treating the co-segregating blocks as separate alleles (termed 257 'allele sharing α'), and as single allelic blocks ('allele sharing β'). The empirical values for each 258 of the metrics were then compared to frequency distributions of the mean values generated from 259 9,999 permutations of 46 randomly selected parent pairs (including the observed value generates a 260 distribution of 10,000 values). For each randomised mating within a permutation, we assumed 261 that females could mate with any genotyped male present in the population in the year of mating, 262 sampling with replacement. We set the year of mating as the first year in which a pair were 263 observed together, and, because Raso larks are socially monogamous and pairs frequently mate 264 together in successive years, we did not allow females to mate with males that were present in the 265 population but which were known to have been paired to a female with whom they had mated 266 previously (i.e. removing the confounding effect of breeding history; see Table S4 for breakdown 267 of birds in each year). Observed values falling outside the 2.5-97.5% confidence intervals of the 268 frequency distribution for each metric would indicate significant departures from random mating 269 at an alpha level of 0.05. 270

| Survivorship 271
We used a Cox proportional hazards analysis to investigate the effect of MHC-1 diversity on age-272 related survivorship, where MHC-1 diversity represents the total tree length per genotype. Age 273 categorisation was carried out using information on claw damage and plumage (juveniles) as 274 described above. The Cox analysis fitted the time at death as the response term, with the age 275 category, sex, and MHC-1 diversity specified as fixed effects. Birds that were last seen in 2017 276 and 2018 were right censored to account for uncertainty in re-sighting, though annual re-sighting 277 rate is high for this population (c. 88%; own data). All statistical tests were performed in R v.

286
Fixed or almost fixed alleles, as well as alleles in co-segregating clusters, labelled in bold.

287
Bootstrap values >70%, based on 1,000 replicates, are written with grey font (at the branch 288 upstream of supported node), with 100% indicated by an asterisk. See Figure S2 Table S3; Supporting Information 3), many of which were present at high levels in the 300 sampled population (Figure 1b), and several of which co-occurred within individuals (Figure 1c). 301 Of the 22 alleles, which we named alphabetically following decreasing across-population allele 302 read depth in the second library, 20 (alleles A-Q, S-U) occur with higher within-individual (or 303 within-sample) frequency (read depth) whereas two (alleles R, V) occur with markedly lower 304 within-individual frequency (Table S3). We consider these two alleles valid, but under-amplified 305 by our primers, which means that we risk false negatives, when fewer than sufficient reads qualify 306 for allele calls in certain individuals (see Methods). The repeatability among 75 pairwise allele 307 comparisons in eight replicated samples was high, 99%: there was a single difference when the 308 low within-individual frequency allele R was not called in a replicate that had the sixth lowest Out of the 11 alleles or co-segregating blocks other than ABC and EFGHK, six deviated 327 significantly from expected random association to ABCEFGHK|ABC genotypes (Table S5). Five 328 alleles/blocks (three significantly so) associated with the ABC haplotype and were never observed 329 in ABCEFGHK homozygotes, whereas two (both significantly) associated with the ABCEFGHK 330 haplotype and was under-represented in ABC homozygotes (Table S5). Allele O co-segregated 331 with EFGHK in all 64 ABCEFGHK|ABC heterozygotes and ABCEFGHK homozygotes, but was 332 also observed in 2 of 58 ABC homozygotes (Table S5). 333 The raw distances between alleles ranged 1-42 nucleotides or 0-26 amino acids ( Figure S4). 334 However, since the differences between biochemical properties of amino acids vary greatly in 335 magnitude, we defined amino acid divergence as the patristic distances (branch length distance) 336 computed with the LG substitution model (Le & Gascuel, 2008): the average divergence between 337 all alleles was 0.337 ± 0.009. The fixed (or almost fixed) A, B and C alleles were found in 338 different parts of the allele tree (Figure 1a), with an average divergence of 0.364 ± 0.068. This 339 was true also for the co-segregating blocks EFGHK (0.356 ± 0.039), PQU (0.257 ± 0.034), and ST 340 (0.420). 341 MHC-I diversity in an individual is determined by the number of alleles and their degree of 342 dissimilarity, a metric that we defined as the total branch length of a phylogenetic tree comprising 343 an individual's alleles (computed with the LG substitution model (Le & Gascuel, 2008)). Using 344 this definition, the average within-individual MHC-I diversity was 1.205 ± 0.021, but number of 345 alleles was a good proxy for diversity, explaining 90% of the variation (adjusted r 2 ) in a simple 346 linear model (diversity ~ N alleles; b = 0.075, F1,120 = 1,086, p < 2.2×10 -16 ). Individuals with the 347 five co-segregating EFGHK alleles had on average 14.2 alleles ± standard error (SE) 0.8, which is 348 5.9 alleles (71%) more than individuals without the five EFGHK alleles (average 8.3 ± 0.2 349 alleles). When separating EFGHK bearers into homozygotes for ABCEFGHK and heterozygotes 350 for ABCEFGHK|ABC, heterozygotes had higher diversity than ABCEFGHK homozygotes, both of 351 which had higher diversity than ABC homozygotes (diversity ~ N alleles + ABCEFGHK|ABC 352 genotype; F3,118 = 373.1, p < 2.2×10 -16 ; Figure 2b). There was no effect of sex (t1,112 = 0.38, r 2 = 0,

| No MHC-I based mate choice 368
There was no difference in the total allele number of males and females, irrespective of whether 369 the RV alleles were included (male mean = 11.30, SD = 3.06; female mean = 10.95, SD = 2.90; 370 Welch's t-test t109.2 = -0.62, p = 0.54), or excluded (male mean = 9.67, SD = 3.20; female mean = 371 9.25, SD = 3.04; Welch's t-test t109.5 = -0.71, p = 0.48). 372 Raso lark pairs typically shared a high proportion of alleles (mean = 0.67, SD = 0.2), but this 373 was not any more or less than that expected under random mating, irrespective of how we handled 374 the co-segregating blocks (Table 1; Figure 3a-b), or treated the RV alleles (Table S6). 375 Randomisation tests likewise found no evidence for assortative or disassortative mating according 376 to divergence (mean amino acid distance; Table 1; Figure 3c; Table S6). Though our sample size 377 was modest, in all cases, the empirical mean of the different complementarity metrics sat far from 378 the 2.5% tails of the distribution of means generated under random mating (0.23 < p < 0.79). 379 Table 1 Randomisation testing of (dis-)assortative mating in Raso larks based on allelic variation 380 in MHC-I. The empirical means calculated from observed pairings were compared to randomly 381 assigned pairings generated from 9,999 permutations; significant departures from random mating 382 would fall outside the 95% confidence intervals (CI). α and β refer to the whether the co-383 segregating block of five alleles were treated as five separate alleles (α), or a single allelic block 384 (β). p is the ranked position of the observed mean in the distribution. Note that here, all results to 385 randomisations that included the RV alleles; for results excluding RV alleles, see    Table 2), where birds ringed as a pullus 400 survived longer than adult birds without claw damage on first capture, which in turn tended to 401 survive longer than adult birds with claw damage on first capture ( Figure S6). After accounting 402 for this age pattern, we found no effect of the MHC-I diversity on survival in either sex (Table 2). 403 404 rather large number of MHC-I alleles and these alleles were highly divergent. The explanation for 416 the discrepancy between population and within-individual estimates is that the within-locus 417 genetic diversity is low, as shown by a high degree of homozygosity, whereas the gene copy 418 number is high. Overall the immune gene diversity in the genome of Raso larks is surprisingly 419 high, which is promising for this threatened species. 420

| Inheritance and genomic architecture of MHC-I loci 421
The three fixed or near fixed alleles (A, B, C) most likely represent three core MHC-I loci, not 422 only due to their commonness but also because of their relatively high divergence (average amino 423 acid distance 0.364 ± 0.068; Figure 1a; Figure S4 S4). We therefore find it likely that the three loci A, B and C have been duplicated and that the 432 corresponding alleles found in the co-segregating EFGHK block are the alleles G, F and K (Figure  433 1a). The two rarer co-segregating blocks PQU and ST (Figure 1b-c), as well as several single 434 alleles, may also have resulted from duplications ( Figure 1a; Table S3). 435 For the (near) fixed ABC alleles and the frequent co-segregating allele block EFGHK, the 436 most likely evolutionary scenario is that the A, B and C alleles correspond to loci that have been 437 duplicated and formed the G, F and K loci ( Figure 1a; Table S3, Figure S4) -after which further 438 gene duplications have occurred resulting in the E and H loci within the co-segregating block -439 and that there are two main haplotypes in the population: ABC (three loci) and ABCEFGHK (eight 440 loci). This is concordant with the observed bimodal ratios of average allele read depth for EFGHK 441 over ABC alleles in individuals with EFGHK alleles, corresponding to two genotypes: 442 heterozygous for ABCEFGHK|ABC (low ratio, as EFGHK occurs with one copy) and 443 homozygous for ABCEFGHK (doubled ratio, as EFGHK occurs with two copies; Figure 2a) (Table S5)

528
In comparison with other bottlenecked populations, the Raso lark displays high diversity. The 529 only passerine where there is data to draw comparison is the Seychelles warbler (also within the 530 superfamily Sylvioidea), which went through a population bottleneck of less than 30 individuals 531 in the 1960s but has since recovered to a current estimate of 3,000 individuals. This species only 532 has 10 alleles over at least four loci (Richardson & Westerdahl, 2003

| Effects of MHC-I diversity on mate choice and fitness 558
Based on the above, we hypothesised that female Raso larks would choose males with MHC 559 genotypes complementary to theirs, in order to maximise the genetic diversity of their offspring. 560 However, we found no evidence for this phenomenon. One explanation for the absence of effect 561 of MHC complementarity on mate choice is that, given that the levels of genetic diversity are still 562 relatively high in the population (Dierickx, Sin, et al., 2019), and that the population contraction 563 around 550 years ago, coinciding with human settlement, was recent from an evolutionary point 564 of view (Dierickx, Sin, et al., 2019), maybe there has been no strong selection yet to maximise 565 diversity through mate choice; maybe not enough time has passed yet for that behaviour to evolve. 566 It is also possible that other mating criteria override the importance of MHC in mate choice. 567 For example, males occupy and defend territories, and these could play a role in the females' 568 choice (e.g. Alatalo partner probably provides enough indirect benefits (e.g. nest building, feeding of the chicks) to the 574 female for her not to take MHC diversity into account when choosing him. However, Richardson 575 and Komdeur (2005) found that, when the social partner had a low level of MHC diversity, the 576 female was more likely to engage in extra-pair copulations, with a male that had a higher level of 577 MHC diversity than the partner (Richardson et al., 2005). We do not know enough about the 578 details of the Raso lark's socially monogamous mating system to be able to hypothesize whether a 579 similar phenomenon might be at play in our study system. Our results are based on social 580 pairings. It is possible that female Raso larks engage in extra-pair matings, and indeed the 581 frequency of extrapair offspring is 20 percent in a close relative of the Raso lark, the Eurasian 582 skylark (Hutchinson & Griffith, 2008). 583 A final functional explanation for the absence of MHC-based mate choice might be due to the 584 fact that larks in arid environments generally have very few parasites (Horrocks et al., 2012). This 585 would reduce the importance of the immune system for survival and reproductive fitness and 586 thereby reduce selection for MHC-based mate choice. Our analyses did not find any effect of 587 MHC-I diversity on survivorship. That this is the case though does then beg the question of why 588 the Raso lark has such high levels of MHC-I diversity, if not to fight a wide array of diseases. The 589 high diversity may reflect past selection, as the Raso lark probably diverged from the ancestor of 590 two currently widespread continental species, the skylark and the oriental skylark A. gulgula, 591 about six million years ago (Alström et al., 2013), and nothing is known about MHC-I diversity in 592 the congeneric species. Further, based on our results, it seems as though while within-locus 593 diversity may be low (i.e. there seems to be a single allele for many loci), between-locus diversity 594 is high, and overall functional MHC-I diversity has been maintained through population 595 bottlenecks at a high level by the co-segregation of large blocks of linked loci (e.g. the haplotype 596 ABCEFGHK). 597

| Conclusions 598
This study shows that it is possible for a population to maintain relatively high levels of MHC  Table S1: Sequence, melting temperature, and other details for primers. Table S2: PCR  786 conditions and amplification success in four species for 13 primer pairs evaluated in this study. 787 Table S3: Properties, such as read counts and presumed parental allele, for alleles A-V and 788 excluded additional alleles 1 and 2. Table S4: Genotyped Male Raso larks present in the 789 population in each year of the study. Table S5: Association of alleles and co-segregating blocks 790 to ABCEFGHK|ABC genotypes. Table S6: Randomisation testing of (dis-)assortative mating 791 based on allelic variation in MHC-I; corresponds to Table 1, but excludes RV alleles. Figure S1: 792 Evaluation of amplification success by electrophoresis in four species for 13 primer pairs 793 evaluated in this study. Figure S2: Unrooted MHC-I amino acid allele tree, including the alleles 794 called and used for analyses (A-V) and additional alleles that are certainly (alleles 1 & 2) or 795 possibly (alleles 3-9) functional, but yielded too few reads with our primer pairs to enable 796 confident allele calls.