A high‐quality walnut genome assembly reveals extensive gene expression divergences after whole‐genome duplication

The common walnut Juglans regia and its related species in the genus Juglans are important economic trees, which have been widely grown for nut and wood productions in many counties. In the Juglans genus, there are ~21 diploid species all with 2n=32 chromosomes, which can be divided into four major sections based on phylogenetic analyses and fruit morphology - Cardiocaryon (e.g., J. cathayensis and J. mandshurica), Juglans (e.g., J. sigillata and the most popular one J. regia), Trachycaryon (e.g. J. cinerea) and Rhysocaryon (e.g., J. hindsii, J. nigra, and J. microcarpa) (Aradhya et al., 2017). For walnuts, the draft genome sequences of several Juglans species had been generated from second-generation sequencing platform (J. regia Chandler N50: 465Kb, Martınez-Garcıa et al., 2016; J. regia Chandler N50: 640Kb, Stevens et al., 2018).

The common walnut Juglans regia and its related species in the genus Juglans are important economic trees, which have been widely grown for nut and wood productions in many counties. In the Juglans genus, there are~21 diploid species all with 2n = 32 chromosomes, which can be divided into four major sections based on phylogenetic analyses and fruit morphology -Cardiocaryon (e.g. J. cathayensis and J. mandshurica), Juglans (e.g. J. sigillata and the most popular one J. regia), Trachycaryon (e.g. J. cinerea) and Rhysocaryon (e.g. J. hindsii, J. nigra and J. microcarpa) (Aradhya et al., 2007). For walnuts, the draft genome sequences of several Juglans species had been generated from second-generation sequencing platform (J. regia Chandler N50: 465Kb, Martınez-Garcıa et al., 2016;J. regia Chandler N50: 640Kb, Stevens et al., 2018). Very recently, the J. microcarpa 9 J. regia hybrid was sequenced using single molecule sequencing technology, and the new methods for haplotype phasing facilitated the genome assembling of the two species (J. regia Serr N50: 2.90 Mb, Zhu et al., 2019).
In this study, we aimed to generate a high-quality referencelevel de novo assembly and gene annotations of J. regia for genetic and genomic studies in walnut. To search for J. regia with low heterozygous rate, we selected 7 diverse J. regia accessions for Illumina sequencing to estimate the whole-genome-level heterozygosity. The accession Zhongmucha-1 (an ancient tree from Tibet, China; N29°04.889' E92°44.432'; 3277 ASL) with the lowest (0.385%) heterozygosity was used to generate highquality genome sequences. The resulting genome assemblies of Zhongmucha-1 contain a total of 353 contigs, with the contig N50 size of 3.34 Mb. Combined with Hi-C data generated in this work and the linkage and physical maps reported previously (Luo et al., 2015;Zhu et al., 2019), the contigs were anchored and ordered, generating chromosome-level sequences of 540 Mb. We found~95% of the RNA-seq reads and 97.25% of the Illumina sequencing reads could be aligned onto the final assembly. The completeness of the genome assembly was then evaluated using BUSCO data sets (Simão et al., 2016), and~94% of the core eukaryotic genes were able to be retrieved. To annotate the walnut genome for protein-coding genes, we used a hybrid gene prediction protocol with combinations of ab initio gene predictions and homologs sequence searching, which was further integrated with RNA-seq data from 9 representative tissues in J. regia (Figure 1a).
Based on the collinear relationship with our J. regia genome, the scaffolds from other five Juglans species (J. cathayensis, J. mandshurica, J. sigillata, J. hindsii and J. nigra) were anchored onto chromosomes followed with ordering and orientation, generating 16 pseudochromosomes for each Juglans species. Taken together, coupled with the high-quality assembly of J. microcarpa (Zhu et al., 2019), the chromosome-level sequences are now available for totally seven species from the three major sections in Juglans. Through aligning the assemblies of the six related species with that of J. regia, we identified highly confident variants within coding regions, including 2 270 791 single nucleotide polymorphisms (SNPs), 235 764 small indels of ≤ 20 bp and 52 363 structural variants (SVs). The genome assemblies of all Juglans and the variation data have been deposited at http://xhhuanglab.cn/data/juglans.html, and the assembly of Juglans regia can also be downloaded from BIGD under Bioproject number PRJCA002070.
Using single-copy genes, we constructed a phylogenetic tree for the seven Juglans species, their close relatives with reported genome sequences and the core eudicots genomes, along with Oryza sativa from monocots as outgroup ( Figure 1b). Selfalignment of the walnut genome sequences based on the 39 432 gene models identified 11 446 paralogous gene groups with 555 synteny blocks, which indicated that the gene pairs were actually due to one-to-one chromosome pairs on the walnut genome (dividing into two sub-genomes -subA and subB in this work; Figure 1c). A total of 11 938 (between red bayberry and walnut subA) and 7978 (between red bayberry and walnut subB) one-toone orthologous gene pair blocks were found and used to visualize the detailed orthologous chromosome-to-chromosome relationships (Figure 1d). There were 9457 cases with both copies retained in J. regia, 7530 cases with singletons retained in subA and 3204 retained in subB, meaning more than half of the duplicated gene pairs had lost one copy in J. regia after the WGD  ( Figure 1e). According to the estimation of the WGD time (~24.48 million years ago), coding genes were lost at a rate range from 0.56% to 1.62% per million years for each chromosome. We further focused on 6981 one-to-one collinear gene pairs between sub-genomes to compare the expression patterns in each pair. The correlation coefficient of the expression levels of the two copies in nine tissues was calculated for each gene pair (Figure 1g), and~22% pairs showed significant correlations (850 pairs with P < 0.05). The other~78% pairs showed weak or no expression pattern correlations between copies in subA and subB, suggesting the duplicates of each other begin to diverge in expression levels and patterns. As a typical example for coexpressed gene pairs (Figure 1h), the two copies JreCHS subA and JreCHS subB (a key enzyme involved in the biosynthesis of flavonoids) showed closely correlated gene expressions, which may indicate that both copies were responsible for the biosynthesis of the flavonoids with less functional divergence. Another contrasting example is JreSRG1 (a gene involved in plant senescence; Figure 1h), of which the subA copy were normally expressed in immature fruit and somatic embryo while the subB copy had no transcripts in these tissues, with numerous changes in promoter regions between the two copies (sequence identity = 45%). The abundant gene copies exhibiting diverse expression patterns were probably due to the sequence divergence in the promoter and other regulation regions (e.g. UTRs and enhancers) between sub-genomes. We then performed hierarchical clustering for these gene pairs based on their expression patterns in each tissue. For the tissue 'immature fruit' (Figure 1i), gene pairs could be classified into four distinctive clusters. Frequency distributions of the Ka/Ks ratio in the four clusters showed the clusterD (high expressions in both copies) had the lowest Ka/Ks ratio peak value (ClusterA: 0.2582; ClusterB: 0.2257; ClusterC: 0.2243; ClusterD: 0.1736), which suggested genes in this cluster tended to be conserved (Figure 1j).
Along with J. microcarpa (Zhu et al., 2019), now we totally have 7 well-assembled Juglans genomes. Through the estimation of the peak Ks value of whole-genome orthologous genes between J. regia and each of the related species, J. regia speciated with J. sigillata roughly 0.84 million years ago and with other related species~2.64 million years ago, much later than the WGD event (24.48 million years). Moreover, based on the Ks values, we can estimate the divergence time between M. rubra and J. regia was about 31 million years ago close to previous estimation (Jia et al., 2019), and the ancestry of the J. regia and J. sigillata genome and the origin of J. regia and J. sigillata was dated to~0.82 Mya. Recently, Zhang et al, 2019 discovered that J. regia (and its landrace J. sigillata) arose as a hybrid between the American and the Asian lineages around 3.45 million year ago. Based on these clues, now we proposed a summarized model for the phylogeny of Juglans, as displayed in Figure 1k.