Genome-wide characterization and expression analysis of genetic variants in sweet orange



Heterozygosity is an important feature of many plant genomes and is related to heterosis. Sweet orange, a highly heterozygous species, is thought to have originated from an interspecific hybrid between pummelo and mandarin. To investigate the heterozygosity of the sweet orange genome and examine how it affects gene expression, we characterized the genome of Valencia orange for single nucleotide variations (SNVs), small insertions and deletions (InDels) and structural variations (SVs), and determined their functional effects on protein-coding genes and non-coding sequences. Almost half of the genes containing large-effect SNVs and InDels were expressed in a tissue-specific manner. We identified 3542 large SVs (>50 bp), including deletions, insertions and inversions. Most of the 296 genes located in large-deletion regions showed low expression levels. Analysis of RNA-Seq reads and DNA sequencing reads revealed that the alleles of 1062 genes were differentially expressed. In addition, by de novo assembly of unmapped reads, we detected approximately 42 Mb of contigs that were not found in the reference genome of a haploid sweet orange, and annotated 134 protein-coding genes within these contigs. We discuss how this heterozygosity affects the quality of genome assembly. This study advances our understanding of the genome architecture of sweet orange and provides a global view of gene expression at heterozygous loci.
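The three variant classes named above can be illustrated with a minimal sketch that assigns a variant to a size class from its reference and alternate alleles. The function name `classify_variant` and the allele-string representation are hypothetical and are not taken from the study's pipeline. The only threshold taken from the text is the >50 bp cutoff for large SVs; treating every shorter length change as a small InDel is an assumption. Inversions, which do not change sequence length, are not handled here.

```python
SV_MIN_LENGTH = 50  # bp; the text defines large SVs as > 50 bp


def classify_variant(ref: str, alt: str) -> str:
    """Assign a variant to a size class from its REF and ALT allele strings.

    Hypothetical helper for illustration only; not the authors' method.
    """
    # A single-base substitution is an SNV.
    if len(ref) == 1 and len(alt) == 1:
        return "SNV"

    size = abs(len(ref) - len(alt))
    if size == 0:
        # Equal-length multi-base change (e.g. a multi-nucleotide substitution).
        return "other"

    kind = "insertion" if len(alt) > len(ref) else "deletion"
    # Assumption: anything at or below the SV cutoff counts as a small InDel.
    if size > SV_MIN_LENGTH:
        return f"large {kind} (SV)"
    return f"small {kind} (InDel)"


if __name__ == "__main__":
    print(classify_variant("A", "G"))             # single-base substitution
    print(classify_variant("A", "ATG"))           # 2 bp insertion
    print(classify_variant("A" + "C" * 60, "A"))  # 60 bp deletion
```

In practice the allele strings would come from a variant call file, and large SVs are often reported symbolically rather than as explicit sequences, so a real classifier would need to read the reported length instead.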