Genome of an iconic Australian bird: High‐quality assembly and linkage map of the superb fairy‐wren (Malurus cyaneus)

The superb fairy‐wren, Malurus cyaneus, is one of the most iconic Australian passerine species. This species belongs to an endemic Australasian clade, Meliphagides, which diversified early in the evolution of the oscine passerines. Today, the oscine passerines comprise almost half of all avian species diversity. Despite the rapid increase of available bird genome assemblies, this part of the avian tree has not yet been represented by a high‐quality reference. To rectify that, we present the first high‐quality genome assembly of a Meliphagides representative: the superb fairy‐wren. We combined Illumina shotgun and mate‐pair sequences, PacBio long‐reads, and a genetic linkage map from an intensively sampled pedigree of a wild population to generate this genome assembly. Of the final assembled 1.07‐Gb genome, 975 Mb (90.4%) was anchored onto 25 pseudochromosomes resulting in a final superscaffold N50 of 68.11 Mb. This high‐quality bird genome assembly is one of only a handful which is also accompanied by a genetic map and recombination landscape. In comparison to other pedigree‐based bird genetic maps, we find that the fairy‐wren genetic map more closely resembles those of Taeniopygia guttata and Parus major maps, unlike the Ficedula albicollis map which more closely resembles that of Gallus gallus. Lastly, we also provide a predictive gene and repeat annotation of the genome assembly. This new high‐quality, annotated genome assembly will be an invaluable resource not only regarding the superb fairy‐wren species and relatives but also broadly across the avian tree by providing a novel reference point for comparative genomic analyses.

radiation of birds, these resources are limited to five species from the Passerides clade. Yet the early evolution of the Oscines involved multiple branching in the Australo-Papuan region in the Oligocene before the emergence of the Corvides and Passerides, the two groups that ultimately gave the oscines their numerical and ecological dominance (Marki et al., 2017;Oliveros et al., 2019). The superb fairy-wren is an exemplar of the largest clade of that early radiation, the Meliphagides, which has almost 300 species. Furthermore, we complement this high-level assembly with a detailed genetic map from which we can infer the variation in recombination across the genome. Although genome assembly and genetic map resources have been developed separately for various species, there are currently only four other bird species for which we have both resources readily available.
Superb fairy-wrens are small, insectivorous birds found throughout south-eastern Australia. The males boast a familiar, striking bright blue breeding plumage contrasted against dark black patches.
They have become a model species in behavioural and evolutionary ecology because they combine two conceptually interesting behaviours in an unpredicted way. First, they were the first bird anywhere in the world shown to exhibit cooperative breeding, and the single female on each territory can be assisted by as many as five males in provisioning offspring (Boland & Cockburn, 2002;Rowley & Russell, 1997). Experimental studies provided a textbook example of how limitations to mating opportunities preclude dispersal by young males, which instead defer dispersal and help the breeding pair raise offspring (Pruett-Jones & Lewis, 1990). Second, this cooperative breeding occurs in a novel context. Paternity in this species is dominated by extra-group fertilizations, which are sought by the female during predawn forays to the territories of a small number of attractive males (Cockburn, Brouwer, Double, Margraf, & Pol, 2013;Cockburn et al., 2009;Double & Cockburn, 2000;Mulder, Dunn, Cockburn, Cohen, & Howell, 1994). As a consequence, males often provision offspring to which they are completely unrelated (Dunn, Cockburn, & Mulder, 1995). This paradox has provoked more than three decades of demographic, behavioural and genetic F I G U R E 1 Phylogenetic coverage of available bird genome assemblies. The placement of bird genome assemblies available in NCBI GenBank (as of August 19, 2019) on the Jetz, Thomas, Joy, Hartmann, and Mooers (2012) avian phylogeny. The dark blue circles and associated silhouettes represent the chromosome-scale genomes and light blue circles represent the scaffold-scale genome assemblies [Colour figure can be viewed at wileyonlinelibrary.com] study of cooperative breeding in this species (Cockburn et al., 2016;Cockburn, Sims, et al., 2008). Further issues investigated include mate choice and sexual selection (Cockburn et al., 2016;Cockburn, Sims, et al., 2008;Peters, 2000), inbreeding (Hajduk et al., 2018), dispersal (Cockburn, Osmond, Mulder, Green, & Double, 2003), life history evolution and parental investment (Russell, Langmore, Cockburn, Astheimer, & Kilner, 2007), parent-offspring communication (Colombelli-Négrel et al., 2012), evolution of brood parasitism (Langmore, Hunt, & Kilner, 2003) and behavioural responses to predator risk (Magrath & Bennett, 2012;Magrath, Haff, McLachlan, & Igic, 2015;Potvin, Ratnayake, Radford, & Magrath, 2018), as well as population responses to climate change (Kruuk, Osmond, & Cockburn, 2015;van de Pol, Osmond, & Cockburn, 2012).
Upgrading genome assemblies from scaffold-level to chromosome-level superscaffolds provides additional genomic context by orienting genes relative to each other and other genomic features such as centromeres, telomeres, various repeat elements and regulatory regions. Knowledge of this organization aids in understanding how genome architecture can influence variation in evolution within the genome but also provide insight into genome evolution between populations and species Sävilammi et al., 2019;Thind et al., 2018). To date, there are only 16 bird species with genome assemblies classified as 'chromosome' level in GenBank (as of August 19, 2019). Twelve of these assemblies were assembled de novo and four of these were assembled using other reference assemblies as a guide. These assemblies are distinguished from 'scaffold' level by higher contiguity where scaffolds are anchored onto chromosomes. This is performed using molecular cytogenetics (BAC and FISH; Damas et al., 2017;O'Connor et al., 2018), physical mapping (HiC, BioNano, Burton et al., 2013) or genetic mapping (Fierst, 2015). In the near future, we expect to see an exponential accumulation of bird genome assemblies to be released, particularly from the B10K consortium which is currently sequencing at least one bird species per family (~300 assemblies; Zhang et al., 2015). Despite this, only a handful of species are in the pipeline to be sequenced to chromosome level, all using HiC chromatin interaction maps (Stiller & Zhang, 2019). Genome assembly aided by a genetic map will remain valuable despite the onslaught of new assemblies being, as yet, limited to a few exemplar species.
The utility of a genetic map extends far beyond upgrading a genome assembly. The maps themselves serve as an invaluable resource by enabling us to associate a particular phenotypic trait to specific loci through quantitative trait loci (QTL) mapping (Su et al., 2017), associate different traits through tight linkage of the genes that code for them (Schwander, Libbrecht, & Keller, 2014), and provide an understanding of the variation in genetic diversity within the genome (Burri et al., 2015). It is also of interest to understand how the recombination landscape itself varies between individuals, populations and species, and how it may be influenced by selection, demographic history, genomic features and chromosomal rearrangements (Barton, 1995;Dapper & Payseur, 2017;Ortiz-Barrientos, Engelstädter, & Rieseberg, 2016;Stapley, Feulner, Johnston, Santure, & Smadja, 2017). From the existing chromosome-level genome assemblies, only four other species have a high-density, single nucleotide polymorphism (SNP)-based genetic map to complement the assembly (Groenen et al., 2009;Kawakami et al., 2014;Stapley, Birkhead, Burke, & Slate, 2008;van Oers et al., 2014). Providing a genetic map and recombination landscape to accompany a high-quality genome assembly greatly expands the array of possible research questions and avenues.
Here we combine both conventional short-read (Illumina shotgun and mate-pair) and long-read sequencing (PacBio) with an extensive pedigree of our long-term study population to achieve a highly contiguous assembly and recombination map. The fairy-wren genome assembly has 975 Mb (out of the 1.07 Gb total assembled) of sequences anchored onto 25 pseudochromosomes (out of n = 36; L. Christidis, pers. comm.) with a contig and superscaffold N50 of 465 kb and 68.11 Mb, respectively. We also provide comparisons of recombination rate and other genomic features to help understand the sources of variation of these features within the genome. Lastly, we add prediction-based repeat and gene annotations using existing libraries from other bird species. This assembly will greatly facilitate ecological and evolutionary studies of fairy-wrens, and provide a resource for understanding the evolution and diversification of the Meliphagides as a whole.

| Reference genome sequencing
We chose a female individual (ANWC:B45704) from the Flinders Island subspecies (Malurus cyaneus samueli) for reference genome sequencing ( Figure 2); a previous microsatellite survey of M. cyaneus suggested that this population has the lowest heterozygosity across the species, (D. Etemadmoghadam, et al., undated thesis, unpub-lished data, Dept. of Genetics, University of Melbourne) making it an ideal candidate for genome assembly. We extracted the DNA from a liver tissue sample using a standard salting out procedure. For small insert sizes, we prepared two size ranges centred on 250 and 500 bp using the Meyer and Kircher (2010) protocol. The DNA was sheared using a bioruptor (Diagenode), and a double-sided bead size selection was then used to obtain the correct insert size. The 250-bp library was sequenced using half a lane of an Illumina HiSeq 2500 (100 bp, paired-end) and the 500-bp library was sequenced using a lane of Illumina MiSeq (300 bp, paired-end). Three Illumina mate-pair libraries were prepared and sequenced for insert sizes centred around 3.5, 5.5 and 7.5 kb by the ACRF Biomolecular Resource Facility (ANU). Each mate-pair library was sequenced using 1/6th of an

| Linkage map sampling and sequencing
We used a subset of the extensively sampled, wild pedigree of a population of the subspecies M. cyaneus cyanochlamys in the Australian National Botanical Garden for linkage mapping (Figure 2).
We collected a small sample of blood from all captured birds. Male philopatry and high skew in extragroup mating success resulted in a complex pedigree ( Figure S1; Dunn & Cockburn, 1999). We chose 273 individuals spanning three to nine generations. DNA was extracted from blood samples with the ammonium acetate precipitation protocol (Burke, Bruford, Hanotte, & Brookfield, 1998). Paternity was inferred a priori using microsatellite data and these paternity assignments were used to select the individuals for mapping. The paternity analysis through microsatellites relied on an exclusion approach described most fully by Hajduk et al. (2018). We then sequenced tens of thousands of SNPs distributed randomly across the genome using a method provided by Diversity Arrays Technology Pty Ltd (DaRTseq; Kilian et al., 2012). All prior paternity assignments through microsatellite data were later supported by the SNP genotyping data.

| De novo genome assembly
The nuclear genome assembly pipeline can be divided into five main stages which improved contiguity, reduced gaps and/or improved quality. Each stage is detailed in Figure S2. The first stage was the Illumina assembly. The raw Illumina reads for both the short insert sizes and mate-pair libraries were trimmed for adapter sequences and lowquality bases using the libngs toolkit (https ://github.com/sylva infor F I G U R E 2 Superb fairy-wren range map and sampling. The entire range of the superb fairy-wren is shown here with the six subspecies designations. The individual chosen for the reference genome (ANWC:B45704) comes from the samueli subspecies from Flinders Island off Tasmania, while the linkage mapping population comes from the mainland cyanochlamys subspecies [Colour figure can be viewed at wileyonlinelibrary.com] et/libngs). The genome size was initially estimated using sga preqc (Simpson, 2014). This size was used to roughly guide the assembly and assessment of its resultant contiguity. We used the allpaths-lg assembler for the initial assembly and scaffolding (Butler et al., 2008).
The second stage was PacBio gap filling: spanning the gaps within and between scaffolds using error-corrected PacBio subreads. Coverage of the PacBio reads was not sufficient to be included in the de novo assembly but was sufficient for filling gaps. These PacBio reads were first error-corrected using the higher quality Illumina reads using the tool lordec (Salmela & Rivals, 2014). We then used the error-corrected PacBio reads to upgrade the Illumina scaffolds by filling in the gaps and extending the assembly using pbjelly (English et al., 2012). The third stage was superscaffolding: anchoring and orienting the scaffolds into pseudochromosome superscaffolds. This assumes that the genetic order, as inferred from the pedigree-based linkage map, corresponds to the physical order along the chromosome. Details on how the genetic map was built for scaffolding is described in more detail below and in the supporting methods. We used the software lepmap3 to build our genetic map using the DaRTseq data of our pedigree (Rastas, 2017).
Before assembling the superscaffolds, we first identified the large misassemblies in the scaffolds and split them accordingly. We then used allmaps to anchor and orient the scaffolds into 25 superscaffolds using the information from the genetic map (Tang et al., 2015). This stage resulted in new gaps and new associations within superscaffolds, which led to the fourth stage: superscaffold gap filling. Similar to the second stage, the fourth stage used the error-corrected PacBio reads and pbjelly to fill in any new gaps. The fifth and final stage was to polish the genome assembly and convert it to a pseudohaploid reference. This stage involved mapping all of the Illumina reads onto the stage four assembly using bwa. We then used pilon (version 1.22) to correct incorrectly called bases and small indels (Walker et al., 2014).
The remaining steps were curating and naming of the superscaffolds. All superscaffolds were aligned onto the Taeniopygia guttata assembly (version 3.2.4, Zhang, Jarvis, & Gilbert, 2014) using lastz and the 25 superscaffolds were assigned to pseudochromosomes based on the match to the T. guttata genome, which in turn had been initially named for homology to the Gallus gallus genome. The main nomenclature adopted from the T. guttata genome was the fissions relative to the G. gallus genome such as chromosomes 1A and 4A.
W-linked scaffolds were identified using genotyping of the DArTseq SNPs in the mapping population. Any variants that were found only to be genotyped in females and missing in males were considered putative W-linked SNPs. The scaffolds which these SNPs map to were labelled as unmapped W chromosome scaffolds. The remaining small scaffolds were also aligned to the other passerine chromosome assemblies: Parus major (version 1), Ficedula albicollis (version 1.5) and Passer domesticus (version 1). If these unmapped scaffolds were associated with the same chromosome in at least two out of the four genomes, they were assigned as 'unplaced' to that chromosome. Any remaining scaffolds were labelled as 'chromosome unknown'.
The mitochondrial genome was assembled independently from the nuclear genome. To assemble the mitochondrial genome, we used mitobim, which used a reference mitochondrial genome of a closely related species to seed the assembly (Hahn, Bachmann, & Chevreux, 2013). We used a reference assembly from a congeneric species, Malurus melanocephalus (GenBank NC024873), to bait mitochondrial sequences. We used the Illumina HiSeq (~250-bp insert size) library as it yielded the most consistent results through multiple trials. After choosing the best assembly, we mapped the reads again and corrected any misincorporated sequencing errors using the majority call for each base. Finally, we used MITOS Webserver to identify the location of the genes and tRNAs in the genome (Bernt et al., 2013).

| Genetic linkage mapping for assembly
The third stage of the genome assembly involved linkage mapping to inform the placement and orientation of each scaffold along each chromosome. First, each DaRTseq library was trimmed to remove Illumina adapters and barcode sequences using trimmomatic (version 0.32, Bolger, Lohse, & Usadel, 2014). The trimmed reads were then mapped onto the PacBio gap-filled scaffolds (stage 2) using bwa mem (Li & Durbin, 2009). The remaining steps follow the pipeline provided by lepmap3 (version 0.2). This used genotype likelihoods for mapping (Rastas, 2017). The pedigree was split into 37 full-sibling families with a total of 273 individuals. Individuals were included more than once if they belonged to multiple families and grandparents were included to aid phasing.
We first built a framework map by following the lepmap3 (LM3) pipeline. We highlight the relevant parameters here but a detailed description of the mapping can be found in the supporting methods and Figure S3. The SeparateChromosomes2 module split the loci into linkage groups that should be associated with a chromosome. We decided on log of odds (LOD) = 13 as our cut-off for the linkage groups after testing various cut-offs for their generation groups ( Figure S4).
We then added remaining markers using the JoinSingles2All module with a LOD threshold of 10. The OrderingMarkers2 module finds the best order for markers within a linkage group. This module was run repeatedly, and spuriously mapped markers were manually removed.
We continued this process until no more spuriously mapped markers remained. The final marker order constituted our 'framework map'.
Next, we built a 'forced map' using the framework map and forcibly grouping all SNPs within a scaffold into the linkage group to which that scaffold belonged. We performed another round of JoinSingles2All to map any additional scaffolds that were not mapped in the framework map. Within each linkage group, we then filtered out all SNPs that were below a LOD threshold of 3. Finally, we performed multiple rounds of the OrderingMarkers2 and manual curation to build the map we used to anchor and orient the scaffolds.

| Genome assembly assessment
We summarized the contiguity of the genome assembly using standard metrics provided by the assemblathon 2 stats perl script (https ://github.com/ucdav is-bioin forma tics/assem blath on2analysis). The contiguity of the genome assembly was assessed between each major assembly step and the final assembly was compared to other chromosome-level bird assemblies. The completeness and quality of the genome assembly was assessed using busco (3.0.2) analyses, which searches the assembly for 4,915 universal single-copy orthologues from the aves (odb9) database (Simão, Waterhouse, Ioannidis, Kriventseva, & Zdobnov, 2015).
As for assembly contiguity, this was measured between each assembly step and between the final assembly and existing chromosome-level assemblies.

| Predictive gene and repeat annotation
The predictive gene annotation followed the same annotation pipeline that was run for other avian assemblies in the B10K project for comparability . Briefly, the primary gene set was derived from ensembl85 (from the G. gallus galGal4 and T. guttata tae-Gut2 genome assemblies) totalling 20,194 genes. A supplementary gene set was also compiled using 20,169 human genes, and genes from 71 transcriptomes for a second set. The two rounds of annotation using the two different gene sets involved a rough alignment using tblastn (version 2.2.2) and generating a gene model using genewise (wise2.4.1). We then filtered out short proteins (<30 amino acids), pseudogenes, retrogenes, highly duplicated genes with 70% repeats or with a single exon, and redundant or overlapping genes.
For de novo repeat discovery, we ran repeatmodeler (version 1.0.8) on the assembly to create a Malurus-specific repeat library. To characterize the final repeat content, we ran repeatmasker (version 4.0.7) using the database containing the Malurus-specific repeat library and those from other avian databases: T. guttata, F. albicollis and Corvus cornix (Vijay et al., 2016).

| Synteny comparisons and intrachromosomal rearrangement (inversion) inference
Large-scale synteny between the final fairy-wren genome assembly and existing chromosome-scale assemblies were compared using lastz (version 1.04.00) alignments. The synteny was represented in circos plots to qualitatively compare interchromosomal rearrangements such as chromosomal fissions, fusions and translocations. We performed a preliminary exploration of potential inversions between M. cyaneus and other species. For this inference we included only the autosomes which were homologous to the 10 largest autosomes in the G. gallus genome (i.e. Chromosomes 1-10, 1A and 4A) and the Z chromosome. We also omitted Meleagris gallopavo, Aquila chrysaetos, Falco peregrinus and Strigops habroptila from the analysis due to the pronounced levels of interchromosomal rearrangements that made collinear comparisons intractable. First, the chromosomes were masked using repeatmasker (version 4.0.7). We then performed pairwise alignments between M. cyaneus and comparison species using progressivemauve (version 2.4.0, Darling, Mau, & Perna, 2010) for each chromosome separately. We then used grimm-synteny (version 2.02, Tesler, 2002) to identify syntenic blocks between alignments using two block sizes of 50 and 500 kb with gap sizes of 10 and 50 kb, respectively.
We used two sizes to see if we observe similar trends, although we are mostly interested in larger (megabase-scale) inversions and will mostly focus on the 500-kb block size results. Lastly, we used mgr (version 2.01; Bourque & Pevzner, 2002) to find the optimal number of intrachromosomal rearrangements necessary to obtain the current order and orientation of syntenic blocks.

| Recombination landscape and comparisons
The process to generate the recombination landscape was identical to the linkage mapping but used the information from the physical position on the final genome assembly as the physical order of the markers. First, we remapped the DaRTseq reads on the final polished genome assembly using bwa. The first stages of the LM3 pipeline were run as previously. As with the forced map, within each chromosome we filtered out markers which fell below the LOD score limit of 3. We then ran the OrderMarkers2 module using the physical order and obtained the sex-specific and sex-averaged genetic maps for the 24 autosomes and the male-specific map for the Z chromosome.
We obtained recombination rate estimates from the Marey map representation: genetic distance plotted against physical distance. First, we smoothed the map using a loess (local polynomial regression) smoothing with a span of 0.2. This regression smooths over windows of a fixed number of SNPs instead of physical length to reduce bias in regions where more SNPs were recovered. Using this regression, we estimated recombination rates in 200-kb nonoverlapping windows and correlated these estimates with other genomic features. GC content was calculated as the percentage of G or C bases within the given window. Gene density was the percentage of coding sequence within the 200-kb window.
Relative distance from chromosome end was standardized from 0 to 1, 0 corresponding to the chromosome end and 1 corresponding to chromosome centre. Before analyses, recombination rate was log 10 -transformed to reduce skewness. Gene density and GC content were both square root-transformed. We explicitly tested the correlation of recombination rate to both gene density and distance to chromosome end using Pearson's R. GC content was not explicitly correlated with recombination rate as this feature was not an explanatory variable but rather probably a byproduct of the variation in recombination (Bolívar, Mugal, Nater, & Ellegren, 2016;Clément & Arndt, 2013). To minimize the effects of spatial autocorrelation within the genome, we performed permutation tests to obtain the distribution of Pearson's R statistic under a null model (shuffled recombination rate and explanatory variables among windows) and our observed data. Furthermore, we performed bootstrapping by subsampling 20% of the windows with replacement. We performed 2,000 iterations of bootstrapping to obtain the null distribution of Pearson's R and another 2,000 for the observed distribution. Because chromosome sizes ranged over an order of magnitude, standardizing distance from a chromosome end from 0 to 1 across all chromosomes may not always be entirely comparable. We therefore included a separate representation of the effect of the distance to the chromosome end by comparing the log 10 -transformed recombination rate to the log 10 -transformed exact distance to each chromosome end in megabases. We then performed a loess smoothing of this correlation to illustrate the shift in recombination rates from the ends of the chromosomes to the centre. Between chromosomes, we also

| Genome sequencing effort
The raw data comprised 128,683,241 read pairs from the 250-bp insert library, 12,456,651 read pairs from the 500-bp insert library, 47,009,583 from the 3.5-kb mate-pair library, 56,672,592 from the 5.5-kb mate-pair library, and 39,069,768 from the 7.5-kb mate-pair library. The sga preqc results estimated the genome size to be 1.07 and 1.04 Gb from the 250-and 500-bp insert size libraries, respectively. Using the 1.07-Gb genome size estimate, sequencing coverage corresponded to 24.0× from the 250 bp insert library, 7.0× from the 500-bp insert library, 8.7× from the 3.5-kb mate-pair library, 10.5× from the 5.5-kb mate-pair library and 7.3× from the 7.5-kb mate-pair library for a total of 55.7× coverage from Illumina sequencing. After filtering and cleaning of the raw reads, we retained 91,417,772 read pairs and 21,503,225 merged or unpaired reads from the 250-bp insert size Illumina library and 6,303,745 read pairs and 3,496,087 merged or unpaired reads from the 500-bp insert size Illumina library.
For the mate-pair libraries, we retained 12,287,646 read pairs for the 3.5-kb insert size, 15,656,984 read pairs for the 5.5-kb insert size and 9,326,889 read pairs for the 7.5-kb insert size libraries (Table S1).
From the 27 PacBio RSII SMRT cells, we sequenced 2,676,850 subreads. The mean length of the subreads was 8.7 kb with the longest subread being 53.6 kb. Total coverage of the PacBio filtered subreads was 22× using the estimated 1.07-Gb size genome. The distribution of subread lengths is given in Figure S5. Of the total number of reads, 1,271,158 (47%) were error-corrected by lordec.

| Genome assembly
Here we focus on the stages of genome assembly, but the specific sections that require more attention (such as linkage mapping and misassembly detection) will be discussed in detail below. The initial  Table S2. The progression of contiguity and the busco assessment of the stages of genome assembly are presented in Figure 3. The assembled chromosome sizes are given in Table 1. The final mitobim mitochondrial genome assembly size was 17,031 bp. The annotation resulted in the expected 13 genes (COI-III, ND1-6, cytB, ATP6 and ATP8), two ribosomal RNAs (12S and 16S) and 22 tRNAs ( Figure S6).

| Genetic linkage mapping for superscaffolding (Stage 2 to Stage 3)
We recovered 35,276 SNPs after mapping the DArTseq reads onto the stage 2 assembly. The SeparateChromosomes2 analysis with LOD limit of 13 yielded 150 linkage groups with 7,331 markers (minimum of three markers per linkage group). From the linkage groups, 36 had at least six markers or more. For the remaining stages we retained 36 linkage groups (6,589 loci) ensuring each group had multiple markers that mapped to two or more scaffolds.
After joining additional markers to these 36 groups, we retained a total of 11,414 markers. Multiple iterations of the OrderMarkers2 F I G U R E 3 Genome assembly quality assessments. The assembly contiguity metrics on the left plots show scaffold and contig N50 estimated and the assembly completeness metrics on the right plots show the BUSCO gene search using the aves database. The plots on the top row show the progression of increasing assembly quality between the five stages of genome assembly. module and manual curation eventually resulted in 26 linkage groups that we can confidently assign to chromosomes. Some of the smaller linkage groups were associated with the same microchromosome and were merged when appropriate. Because average recombination rates were markedly higher in microchromosomes, they would probably have required a lower LOD score limit relative to the macrochromosomes. The first stage of the forced map resulted in 32,708 SNPs associated with an existing linkage group. to reconstruct all of the microchromosomes, we were unable to fully characterize the chromosomal rearrangements with confidence. A summary of the number of SNPs retained for each step can be found in Figure S3. TA B L E 1 Summary chromosome metrics: assembly lengths, linkage map lengths, and annotation of each chromosome in the final superb fairy-wren genome assembly; Taeniopygia guttata chromosomes 19 and 26 are within chromosomes 2 and 5, respectively, in these metrics

| Gene and repeat annotation
In total, the gene annotation pipeline predicted 15,458 genes across the genome. Of these, 13,483 (87.2%) were annotated in assembled pseudochromosome superscaffolds, 774 (5.0%) were annotated in unplaced scaffolds associated with chromosomes, and 1,202 (7.8%) were annotated in scaffolds of unknown location. Of the unplaced scaffolds, we annotated three genes associated with the W chromosome (Table 1). With the repeat annotation and masking, repeatmasker masked 7.90% of the total genome length. Of this fraction of the genome, 41.1% were LINEs (long interspersed elements), 28.5% were LTR (long terminal repeat) elements, 1.4% were SINEs (short interspersed elements), 1.0% were DNA elements and 8.2% were unclassified. Of the remaining, 15.4% were simple repeats, 3.8% were low complexity, 0.6% were small RNAs and 0.5% were satellite DNA (Table S3). as complete and single-copy for the other genomes and 92.2% for the superb fairy-wren assembly (Figure 3).

| Chromosomal rearrangements
Synteny of the macrochromosomes is largely conserved between the fairy-wren and other bird genome assemblies (Figure 4) (Nishida et al., 2008;O'Connor et al., 2018). Interestingly the other raptor genome, Aquila chrysaetos, also exhibits high levels of rearrangements despite being distantly related and converging on raptorial lifestyles. S. habroptila is the closest relative of F. peregrinus here and also shows many interchromosomal rearrangements, although not as substantial (Figure 1). Comparisons with nonpasserine species show quite a few interchromosomal rearrangements, even among the macrochromosomes, relative to the oscine passerine species, which show fewer. The circos plots also show probable fusions or translocations of microchromosomes to macrochromosomes in the M. cyaneus genome relative to all other genomes. This is indicated by fewer chromosomes in the M. cyaneus karyotype but may also be due to misassemblies. Currently, there are not enough data to distinguish these alternatives.
Inference of intrachromosomal rearrangement yielded a sizable number of rearrangements for each pairwise comparison. Note that the software assumes that the discrepancy between the order of syntenic blocks was generated by inversions rather than intrachromosomal translocations. Chromosomes were concatenated if the synteny suggested fusions and fissions ( Among chromosomes, inversion number and chromosome size were positively correlated ( Figure S7). In the 50-kb syntenic block sizes, chromosome Z showed a much higher number of inversions, as expected for a chromosome its size but that is lost in the more conservative 500-kb syntenic block size inference.

| Genetic map and recombination landscape comparisons
On average, the male-specific, autosomal genetic map of the fairywren was longer than that of the female-specific map ( Figure S8).
This resulted in the total male-specific map being 1.08 times longer than that of the females. Other passerine systems also show deviation from sex-equal recombination rates but differ regarding which sex has the longer map. In the F. albicollis system, the male genetic map is 1.13× longer than that of the females, and in two P. major populations the female maps are longer by a factor of 1.04 and 1.05 (Kawakami et al., 2014;van Oers et al., 2014). The difference between male and female map lengths also varied with respect to chromosome (Table 1). When map length was converted to recombination rate within the chromosomes, we did not always consistently find that male-specific local recombination rate was higher than that of the female-specific rate.
We compared the sex-averaged genetic distance (

| D ISCUSS I ON
The superb fairy-wren genome is particularly important in that it fills a gap in the passerine tree as it is the first member of the

Meliphagides infraorder to have a reference genome. Not only does
it provide a useful resource for species within that clade (Peñalba, Joseph, & Moritz, 2019) and the nonpasserine part of the tree but also a useful comparison point in the broader passerine and avian tree for comparative evolution. As part of the early Australasian radiation within oscine passerines, the Malurus cyaneus reference genome anchors the clade which comprises almost half of all avian diversity. This filled gap would allow us to search for shared genomic features that evolved uniquely within the oscine clade. On the other side of the coin, we will also be able to compare how rapidly large-scale genomic-scale variation has accumulated in oscine passerines, a relatively young radiation, in comparison to the remaining nonpasserines.
One initial example of potential for comparative genomics is the evolution of chromosomal rearrangements. Avian genomes have generally been shown to have high synteny (Ellegren, 2010).
Previous karyotypic and genome-scale studies, however, have shown that species from certain clades have genomes that have undergone more chromosomal rearrangements than others. Further across birds certain types of rearrangements are more common than others (Ellegren, 2010;O'Connor et al., 2018;Skinner & Griffin, 2012). Although gene synteny along a chromosome might remain high, chromosome number between bird species has a wide range, suggesting that chromosome fusions and fissions might be fairly common (Kapusta & Suh, 2017 whether certain chromosomes or particular motifs found in chromosomes might be more prone to rearrangements than others. Furthermore, the dynamic fission and fusion of the gene-rich microchromosomes may have implications regarding species or clade-specific adaptation (Guerrero & Kirkpatrick, 2014;Hansmann et al., 2009;Wellband et al., 2019). Physically creating or breaking linkages between sets of genes will change the local recombination rates. In turn, this would affect the efficiency of selection acting on certain combinations of alleles.
Our preliminary inference of intrachromosomal rearrangements yielded many potential inversions among the macrochromosomes. Gene density (genes/Mb) in the sex chromosome are expected to fix at a higher rate due to their role in sex chromosome evolution, local adaptation and reproductive isolation, it is also possible that this is more relevant over shallower evolutionary timescales and may slow down through time (Charlesworth, Coyne, & Barton, 1987;Connallon et al., 2018;Kirkpatrick, 2010). Although we use the terms inversion and intrachromosomal rearrangement interchangeably here, we cannot rule out intrachromosomal translocations. Furthermore, variation in assembly quality, particularly in properly orienting scaffolds within chromosomes, will affect the inference. The reported numbers are probably an overestimate but provide a good starting point for comparison. As more highly contiguous bird genome assemblies accumulate, it will be possible to explore both inter-and intrachromosomal rearrangements in greater detail using more robust inference methods and in a phylogenetic context.
A fine-scale genetic map and recombination landscape is the first step in understanding the causes and consequences of variation in local recombination rates. In particular, a pedigree-based recombination landscape is not as biased by effective population size or selection as a population-based recombination landscape might be.
From the existing chromosome-scale genomes, this is only the fifth that is accompanied by a genetic map. In comparison with the four available genetic maps, we find that the M. cyaneus map bears more similarity to the Taeniopygia guttata map and Parus major maps. By contrast, the map for Ficedula albicollis, which is also a passerine, is more similar to the distantly related Gallus gallus ( Figure 5). Within the M. cyaneus map, we also find variation between male and female recombination rate. Genetic maps from a larger variety of populations and species would be required to start characterizing the variation in this trait and testing hypotheses as to what may be driving this variation between sexes and between species.
Delving into the fine-scale variation within the genome, we found a positive correlation between local recombination rate and GC content, and gene density as well as a stronger and negative relationship with distance from chromosome ends. More broadly we also see a relationship between average recombination rate and chromosome size. These correlations are consistent with previous studies of recombination (Kawakami et al., 2014;Paape et al., 2012).
The correlation with GC content is probably due to G-biased gene conversion. Errors during repair of double strand breaks during meiosis tend to be biased towards guanine, resulting in a higher GC content in regions which have high rates of recombination (Bolívar et al., 2016;Clément & Arndt, 2013;Fullerton, Bernardo Carvalho, & Clark, 2001). Our sliding window analysis shows only a slight positive correlation between gene density and local recombination rate.
Selection pressure on local recombination rate would depend probably not only on the density of genes but also the type of genes in a given location and how they are regulated (cis-vs. trans-). In the between-chromosome comparison, we found that smaller chromosomes tend to have much higher recombination rate and also tend to have much higher gene density (Axelsson, Webster, Smith, Burt, & Ellegren, 2005). Furthermore, recombination tends to be higher closer to the ends of chromosomes (Haenel, Laurentino, Roesti, & Berner, 2018 This initial draft of the M. cyaneus genome is a well-developed foundation that can be further improved through rapidly developing sequencing technologies and scaffolding methods. As with many of the existing bird assemblies with assembled chromosomes, not all of the microchromosomes are fully resolved and may still benefit from physical mapping methods such as HiC or BioNano optical mapping (Dudchenko et al., 2017;Jiao et al., 2017;Lehmann et al., 2019). A larger sampling of the pedigree would also provide more informative meioses. That would help in orienting scaffolds already placed in chromosomes. It would also better scaffold the smaller microchromosomes, which tend to have higher recombination rate and lower linkage (Fierst, 2015). The capability of long-read sequencing technology has also advanced such that longer fragments can be sequenced. This would help close even more gaps and potentially extend through the telomeres (Kingan et al., 2019;Michael et al., 2018). Highly contiguous, chromosome-scale reference genomes are becoming easily accessible, further expanding our ability to test various hypotheses. More work can still be done, however, to further refine these assemblies and get closer to the true structure of the genome.
Here we provide the first high-quality reference assembly of an intensively studied Australian bird species, the superb fairy-wren, and the exceptionally diverse family and infraorder to which it belongs. The species has been a focal organism in studying cooperative breeding and sexual selection. This new resource will complement the wealth of behavioural and ecological data, particularly the longterm, multidecade study of the population in the Australian National Botanical Gardens, Canberra. This reference assembly accompanied by a genetic map and annotation will open up the system to the field of behavioural genomics and strengthen the system as a key player in evolutionary and ecological studies. This reference genome provides a good foundation in future genomic studies in the superb fairy-wren and more broadly across other bird species.

ACK N OWLED G EM ENTS
We would like to acknowledge the late Sylvain Forêt who guided the design for the sequencing strategy and assembly pipeline of the fairy-wren genome. We thank Kaiman Peng and Angela