Genome assembly and transcriptomic analysis to elucidate the ability of Nasonovia ribisnigri to break host plant resistance

Aphid genomic resources enable the study of complex life history traits and provide information on vector biology, host adaption and speciation. The currant–lettuce aphid (Nasonovia ribisnigri (Hemiptera: Aphididae) (Mosley)) is a cosmopolitan pest of outdoor lettuce (Lactuca sativa (Asterales: Asteraceae) (Linnaeus)). Until recently, the use of resistant cultivars was an effective method for managing N. ribisnigri. A resistant cultivar containing a single gene (Nr‐locus), introduced in the 1980s, conferred complete resistance to feeding. Overreliance of this Nr‐locus in lettuce resulted in N. ribisnigri's ability to break resistance mechanism, with first reports during 2003. Our work attempts to understand which candidate gene(s) are associated with this resistance‐breaking mechanism. We present two de novo draft assembles for N. ribisnigri genomes, corresponding to both avirulent (Nr‐locus susceptible) and virulent (Nr‐locus resistant) biotypes. Changes in gene expression of the two N. ribisnigri biotypes were investigated using transcriptomic analyses of RNA‐sequencing (RNA‐seq) data to understand the potential mechanisms of resistance to the Nr‐locus in lettuce. The draft genome assemblies were 94.2% and 91.4% complete for the avirulent and virulent biotypes, respectively. Out of the 18,872 differentially expressed genes, a single gene/locus was identified in N. ribisnigri that was shared between two resistant‐breaking biotypes. This locus was further explored and validated in Real‐Time Quantitative Reverse Transcription PCR (qRT‐PCR) experiments and has predicted localisations in both the cytoplasm and nucleus. This is the first study to provide evidence that a single gene/locus is likely responsible for the ability of N. ribisnigri to overcome the Nr‐locus resistance in the lettuce host.


INTRODUCTION
Aphids are an economically important insect group with over 450 species impacting major cropping systems worldwide (Blackman & Eastop, 2017).Over 100 species of the world's aphid fauna are of significant economic importance, transmitting 50% of all insect-borne Graham Teakle, Rosemary Collier, James R. Bell, Sergio Cerezo-Medina, Ramiro Morales-Hojas contributed equally to this work.
[Correction added on 23 February 2024, after first online publication: The funding information has been updated with Grant numbers.]plant viruses, and they are drawn largely from one family, the Aphididae (Blackman & Eastop 2017;Katis et al., 2007;Nault, 1997).
Genome sequencing has provided several resources and tools that have facilitated the study of the genetic basis of the aphid-plantvirus interactions (Biello et al., 2020;Field et al., 2017;Jiang et al., 2019;Mathers et al., 2020;Tagu et al., 2008).
Over the last decade and with advances in sequencing technology, computational power and the reduction of associated costs, reference genomes have been easily generated.However, in 2010, the only aphid genomic model available to study aphid genetics and aphid-plant interactions was the pea aphid (Acyrthosiphon pisum) (Harris) (International Aphid Genomics Consortium, 2010).This was a milestone for aphid genomics, and the genome is still used 11 years later in a variety of genomic comparative studies (Biello et al., 2020;Li et al., 2019;Mathers et al., 2020;Shahid et al., 2021).Since the A. pisum genome, there are now 28 published genomes for 12 different aphid species, ranging from draft assemblies to chromosome-level assembly and annotation located on the BioInformatics Platform for Agroecosystem Arthropods (BIPAA) (BIPAA, 2021).Along with the increasing number of genomes being assembled and annotated, researchers now have excellent genomic resources to address a multitude of fundamental questions, such as those regarding aphid evolution and adaption, the impact of virus acquisition on individuals and the modes of resistance to insecticides (Li et al., 2020;Singh et al., 2021;Troczka et al., 2021).
The currant-lettuce aphid, Nasonovia ribisnigri (Mosley), is a major pest of outdoor lettuce crops (Lactuca sativa L. (Linnaeus)) and is responsible for large economic losses (Diaz et al., 2012;Liu & Mccreight, 2006;Niemann et al., 2022).The nymphs and adults preferentially feed on young developing leaves of lettuce, so they predominantly reside in the centre of the lettuce head, making the application of foliar insecticides with contact action ineffective (Aarts et al., 1999).This is particularly a problem in well-developed lettuce plants, in which the older leaves surround feeding aphids, leading to contamination of the harvested crop and thus economic loss.Biological control by both predatory insects and parasitoids is also hindered by their concealment in the centre of the lettuce, and even when successful, contamination by beneficial insects remains an issue (Tatchell et al., 2017).
To tackle these complex issues concerning insecticide use and the changing legislation pressures (e.g., the banning of systemic neonicotinoids on both flowering and non-flowering crop in the United Kingdom), a lettuce cultivar resistant to N. ribisnigri (Nr) was introduced into the growers' recommended list during the early 1980s.The single-gene resistance (Nr-locus) used originated from a wild species of lettuce (L. virosa (Galsso, Banfi, Bartolucci and Ardenghi, 1753)) that showed natural resistance to N. ribisnigri, and it was introgressed into lettuce via an interfertile bridging species (L.serriola).(Eenink et al., 1982).This single dominant gene conferred nearcomplete resistance to N. ribisnigri and was incorporated into many lettuce cultivars to reduce the loss of lettuce crops (Tatchell et al., 2017).The resistance in N. ribisnigri associated with the Nr-locus is unlikely to be mechanical or due to a change in host plant quality (antibiosis), as it is expected that other aphid species would also be affected, so the resistance mechanism(s) involved are most likely species-specific (Tjallingii & Esch, 1993).Research has shown that the Nr-locus has little effect on other species of aphids that feed on lettuce, with complete susceptibility to Macrosiphum euphorbiae and partial resistance to M. persicae (Reinink & Dieleman, 1989).The variation in partial resistance observed in three lines of M. persicae is most likely a result of the interaction of the Nr-locus with additional genes, as opposed to the Nr-locus itself (Reinink & Dieleman, 1989).
As the Nr resistance in lettuce cultivars is conferred by a single gene/locus, and since there are few other effective control strategies, the overuse and reliance on this mechanism resulted in the first reports of resistance breakdown in early 2000 (van der Arend, 2003).
This led to the identification of a resistance-breaking (Rb) biotypes of N. ribisnigri, which could overcome the Nr-locus.In both France and Germany during 2007, these new Rb biotypes were identified on resistant cultivars and subsequently an Rb biotype was identified in Kent, United Kingdom during 2009 (Hough, 2013).In the absence of lettuce cultivars that exploit new resistance mechanisms, with the lack of effective pesticides, and with the increasing demand for fresh salads, N. ribisnigri infestations are predicted to become more frequent and have greater impact on the availability of leafy salads to the consumer (van der Arend, 2003).A selection of these avirulent and virulent lineages from the United Kingdom has been collected and cultured to study the potential mechanisms of resistance within the aphid, as they are still unknown.
Gene-for-gene resistance has been highlighted in several insects such as the brown plant hopper (Nilaparvata lugens), Asian rice gall midge (Orseolia oryzae) (Wood-Mason) and the barley midge (Mayetiola destructor) (Rawat et al., 2012;Wei et al., 2009;Zhao et al., 2015).The gene-for-gene hypothesis states that 'for each gene conditioning resistance in the host, there is a corresponding gene conditioning pathogenicity in the parasite' (Flor, 1971;Kerr, 1987).Since then, evidence for gene-for-gene resistance has been published several times, with the first example being between the barley midge (M.destructor) and wheat (Triticum spp.) (Hatchett & Gallun, 1970).More recent examples have identified several R genes (Gm1-Gm11) in rice (Oryza sativa) against the Asian rice gall midge (O.oryzae) which have been mapped in the rice genome (Himabindu et al., 2010) and a gene (BPH15) in the brown planthopper (N.lugens) that mediates a unique defence mechanism in rice (Lv et al., 2014).With the presence of a single resistance gene in the lettuce host, and the strong selection pressure to overcome it in the currant-lettuce aphid, it is hypothesised that a single gene or locus is responsible for the breakdown of resistance.
The genome for N. ribisnigri has not been sequenced to date, and only the mitochondrial cytochrome c oxidase subunit I barcode and a selection of odorant binding proteins have been sequenced.This project aimed to generate two de novo assembled reference genomes for N. ribisnigri, corresponding to both a biotype avirulent to host plant resistance and a virulent biotype.Second, gene expression of the two N. ribisnigri biotypes were investigated using RNA-sequencing (RNA-seq) to study the potential mechanisms of resistance to the Nr-locus in lettuce.By providing two draft genome assemblies of N. ribisnigri and a transcriptomics analysis of an avirulent and virulent biotype, this article provides additional insight and knowledge to understand the genetic basis of host plant resistance breaking by N. ribisnigri, bringing science closer to elucidating the resistance mechanisms involved.

Genome de novo assembly and annotation
Altogether, 45.2 Gb Illumina short read data were generated for the avirulent biotype N. ribisnigri genome, and 20.6 Gb was produced for the virulent biotype N. ribisnigri genome.From the two SpotON flow cells, a total of 2,966,975 (average sequence length; 2155) and 5,167,875 (average sequence length; 2401) long-read nanopore sequences were generated for the Kent CL (virulent) and Nr8 (avirulent) genomes, respectively.This produced an additional 30.3Gb of long-read Nanopore data for the hybrid genome assembly.The hybrid de novo genome assembly of the avirulent (Nr8_123) N. ribisnigri genome was assembled into 4778 scaffolds with an N50 of 295 Kb, whereas, the resistant Nr:1 (Kent CL 1) was assembled into 6264 scaffolds with an N50 of 163 Kb (Table 1).Multiple N. ribisnigri lineages from the sequence read data were assembled but only two most complete genomes of the avirulent (Nr8_123) and virulent (Kent CL 1) were annotated and used for further downstream analysis.The guanine and cytosine (GC) contents of the N. ribisnigri genomes ranged between 29.75% and 30.1%.After thorough contaminant removal using MEGAN, no further contaminant sequences could be identified using this contaminant removal pipeline and suggesting that most contaminant sequences were removed.
The genome size of N. ribisnigri was estimated to be between 384 and 423 Mbp, for the virulent and avirulent strains, respectively.
The genome size of N. ribisnigri is comparable to other previously sequenced aphid genomes (Chen et al., 2019;Nicholson et al., 2015;Wenger et al., 2020).The number of scaffolds in the genome assemblies of the different strains sequenced in this study ranged between 4778 and 12,004 and the maximum scaffold size between 721 and 3391 Kbp (Table 1).The k-mer spectra analyses (k-mer length of 21) on both the Rb and wild-type (WT) genome indicates that the WT genome has a k-mer coverage of 24.9 with a 0.41% error, whereas the Rb genome has a k-mer coverage of 14.4 and an error of 2.37% (Figure S1).
The completeness of the genome assemblies as determined using Benchmarking Universal Single-Copy Orthologs (BUSCO) (Insec-ta_odb9), indicated that the N. ribisnigri genomes of the avirulent and virulent (resistance-breaking) aphids were 94.2% (complete: 90.5%, duplicated: 3.6%, missing: 4.5%) and 91.4% (complete: 88.1%, duplicated: 4.7%, missing: 5%) complete, respectively (Figure 1).A de novo hybrid of multiple N. ribisnigri biotypes (PAN) genome assembly was attempted using the combination of reads from all biotypes to determine whether this would improve overall assembly statistics but was unsuccessful.After removal of contaminants and further quality control, completeness of the PAN genome was 93% (BUSCO, Insecta odb9).Augustus predicted a total of 38,389 gene models, including partial gene models, genes incorrectly shown or duplicated and unsupported ab initio models.After running an Interpro Scan (IPS) search of all 38,389 predicted genes for the WT genome, $8000 genes had no IPS matches, $23,000 genes had no gene ontology (GO) terms but had IPS matches and $ 7000 had both GO terms and IPS matches.
Out of these, a further 3044 genes had either a start or stop codon missing and were subsequently removed.For the Rb genome, IPS identified 36,018 predicted genes, $7500 genes had no IPS match, T A B L E 1 Summary statistics of the currant-lettuce aphid, Nasonovia ribisnigri genomes.1), ( 2), ( 12) and ( 123) denote different sequencing read data from the same N. ribisnigri line that was used in the genome assembly and annotation process.Abbreviation: BUSCO, Benchmarking Universal Single-Copy Orthologs.
$22,000 genes had no GO terms but had IPS matches and $6800 had both GO terms and IPS matches.There were several speciesspecific genes identified from the BLAST results, with over 12,000 to be associated with A. pisum and M. persicae (Figure S2A,B).

Transcriptome de novo assembly and annotation
A total of 90 Gb of strand-specific paired-end RNA-seq data from isogenic lines of N. ribisnigri were generated and provided by Warwick Crop Centre, and this was used in the de novo assembly of the N. ribisnigri transcriptome.This generated a near-complete transcriptome (94.8%,Arthropoda gene set [n = 1562]) with 76,782 transcripts totalling 83,926,066 bases.The length of the transcripts varied from 201 to 18,524 nucleotides with an average of 1093 (Figure S3A).The transcriptome assembly N50 was 2341, in which 50% of all bases in the assembly are covered by sequences equal or larger than 2341 and is comparatively high for a non-model organism (Francis et al., 2013).The transcriptome completeness (BUSCO) for the avirulent N. ribisnigri biotype was 94.8% and is similar to other available aphid transcriptomes, such as the pea aphid (A.pisum), which was 96.2% (Figure S3B).

RNA-seq analysis
In total, there were 183 million RNA-seq reads for N. ribisnigri for all conditions (Figure S4).Over 115 million RNA-seq reads were obtained for the N. ribisnigri, which were avirulent to the Nr-locus in the lettuce host plant, with the remaining 68 million RNA-seq reads obtained for N. ribisnigri, which were resistant to the Nr-locus in the lettuce host plant.The power analysis highlighted that the number of replicates used in the analysis has a marginal power of 0.68-0.76(Supporting Information S16).Doubling the RNA-seq depth would have only increased this to 0.7-0.78 and 10 replicates would have provided a power of 0.88-0.95.In the case of genes with very low read counts (i.e., 0-10), replicates below five had low power (0-0.2) but, when read counts were above 10, the power increased to >0.4 for four or more replicates, and >0.7 with read counts >320.
The DESeq2 data identified 689 differentially expressed (DE) genes between Rb and WT strains with an alpha of p < 0.05 out of the 18,872 genes (Figure S5).Of these 689 DE genes, 351 up-and 338 down-regulated genes were identified between the avirulent and virulent groups of N. ribisnigri.Further stringent refinement (alpha = p < 0.00001), identified 32 DE genes out of the 18,872 genes.Out of the DE genes, maker-Scaffold_2698-augustus-gene-0.8 ( 2698) showed the highest level of log2 fold change (LFC) of 3.52, being up-regulated in the resistance-breaking biotype (Figure S6).

Differential gene expression of avirulent and virulent N. ribisnigri biotypes
A principal component analysis (PCA) highlighted some differences between the N. ribisnigri biotypes (Figure 2).The PCA indicated that To produce a robust and tractable dataset for analysis, the initial 689 DE genes (p < 0.05) were further refined by increasing the p-value cut-off to p < 0.00001.This revealed the 32 DE genes of highest significant difference between the virulent and avirulent biotypes.The RNAseq analysis identified clusters of up-and down-regulated genes between the two conditions, highlighted using a heatmap (Figure 3).The RNA-seq analysis revealed that 13 DE genes were upregulated in the resistance-breaking biotypes, with the remaining 19 being down-regulated (Table 2).The 13 DE genes in the resistance-breaking biotypes clustered into four main groups.The first cluster consists of uncharacterised proteins and a carbonic anhydrase 2; the second, uncharacterised proteins and a cuticle protein 7; third, transcription factor and a zinc finger MYM-type; and fourth, uncharacterised protein, peptide transporter and a cytochrome P450 (Table 2).The 19 down-regulated DE genes are also loosely clustered into four main groups.
A possible candidate gene responsible for the Nr-locus in resistance-breaking N. ribisnigri between both Rb biotypes (Figure 4a).Due to both the log fold change and being the only gene shared between both virulent N. ribnisnigri biotypes, this gene was further investigated as a potential candidate responsible for the Nr-locus.This gene was located on scaffold 2698; it was a sequence coding for 284 amino acids with a predicted protein size of 32.51 KDa and was identified as a hypothetical protein in Aphis glycines (A0A6G0SZ22_APHGL) using UniProt (Table 2).Using deep learning protein prediction software, gene 2698 with an R 2 of >0.98 (Figure S9).Only one primer pair was used for the gene expression analysis.
The gene 2698 was highly expressed in the resistance-breaking N. ribisnigri biotypes compared with the avirulent N. ribisnigri biotype (Figure 6).
T A B L E 2 Differentially expressed (DE) genes in virulent (resistance-breaking) and avirulent biotypes of Nasonovia ribisnigri (Nr:1) feeding on both resistant (Nr-locus present) and susceptible (Nr-locus absent) lettuce host plants.A significant reduction in gene expression was observed in gene 2698 in N. ribisnigri unable to feed on lettuce containing the Nr-locus (Figure 6).adjusted p-value (Figure S6).Further analysis confirmed that gene 2698 was the only DE gene that both resistance-breaking biotypes shared (Figure 4a) and was the most promising candidate gene conferring resistance to the Nr-locus in the lettuce host plant.The qRT-PCR confirmed that this gene is significantly up-regulated in the resistance-breaking biotypes (Figure 6).The monogenic Nr resistance in lettuce cultivars is thought to reside within the phloem of lettuce during sap ingestion from the sieve element, but the exact resistance mechanism or pathway involved is still unknown and yet to be described (ten Broeke et al., 2013).The results of the present study therefore gives further support to the hypothesis that a gene-for-gene resistance exists between the Nr-locus in the lettuce host plant and a corresponding gene (gene 2698) in the resistance-breaking N. ribisnigri biotype.Unfortunately, the function of the gene identified in this study is not known in any other species, and further work will be required to determine its biological function.
Interestingly, it was shown in the qRT-PCR results (Figure 6) that gene 2698 is up-regulated in virulent biotypes when they are feeding on lettuce not containing the Nr-locus, as well as lettuce containing the Nr-locus.This suggests that gene 2698 is constitutively expressed regardless of the presence or absence of the Nr-locus in the lettuce host plant.Due to the unknown location of the mechanism/s of resistance within N. ribisnigri, whole organisms were used for the RNA extractions and subsequent RNA-seq analysis.It has been shown that using whole organisms for RNA-seq studies can produce false negatives and difficult to interpret positive signals, when the local expression pattern is variable (Johnson et al., 2013).Every effort was made to reduce the likelihood of false negative results, both with the experimental design and data analysis, but there are some limitations by using whole organisms as opposed to tissue-specific RNA extractions.
The overreliance and overuse of a single resistance gene in lettuce (Nr-locus) has resulted in the evolution of resistance-breaking N. ribisnigri biotypes (Thabuis et al., 2014).Combining multiple resistance genes or deleting susceptibility loci greatly reduces the selection pressure and can increase the longevity of resistant or tolerant cultivars (Delmas et al., 2016).For example, if a pest evolves and gains the ability to break one control gene, the plant is still protected by the remaining resistance genes.To provide more durable resistance in crop plants, it has been suggested to 'pyramid' resistance genes with mutations in susceptibility genes (Stuart, 2015).Pyramiding resistance genes in this way generally results in an increase durability of resistance in 'pyramid crops' compared to monogenetic resistance ( Mundt, 2018).Pyramiding resistance genes could be a more successful approach in controlling N. ribisnigri outbreaks in the future but will require the identification of new resistance loci.However, pyramiding resistance genes in one cultivar can still be overcome by pest insects if no other measures are applied to reduce the selection pressure and therefore is best used as a component of integrated pest management (Barzman et al., 2015).
Thorough homology searches were conducted on gene-2698, but unfortunately, no similarities were found.It is currently categorised as a hypothetical protein in a few aphid species and would benefit from further investigation into its exact function.The resistance mechanism has previously been associated with the phloem sieve elements, as avirulent N. ribisnigri biotypes were unable to feed on lettuce containing the Nr-locus (ten Broeke et al., 2013).The use of excised lettuce leaves in experiments resulted in a loss of resistance which suggested that the feeding mechanism is 'mobile' (Liu & Mccreight, 2006), but this result could be due to changes in metabolism resulting from the excision of the leaves (Gao et al., 2008).The 'deterrent' effect of the resistance has been shown not to be toxic, as normal feeding behaviour and growth is resumed when avirulent aphids are transferred to lettuce cultivars not containing the Nr-locus (Van Helden et al., 1993).Since the present study has identified a likely candidate gene involved in the ability of N. ribisnigri to overcome the Nr-locus resistance in lettuce, more targeted research can be conducted.For example, the use of gene-editing technologies, such as CRISPR-Cas9 (Cong et al., 2013;Le Trionnaire et al., 2019), could be incorporated to mediate gene knockout of the SNPs identified in gene-2698 for the N. ribisnigri resistance-breaking biotypes.By knocking out gene-2698, it would be possible to determine whether this gene is responsible for the ability of resistance-breaking N. ribisnigri to feed on lettuce containing the Nr-locus (loss-of-function strategy) (Housden et al., 2017).In addition, it could be possible to complement this approach with a gain-of-function, in which avirulent N. ribisnigri could be modified to over express gene-2698 and therefore determine if they are able to feed on lettuce containing the Nr-locus (Zimmer et al., 2018).
The two SNPs identified in gene 2698 in nucleotide positions 9 and 50 are unlikely the cause for the resistance-breaking biotype to overcome the Nr-locus in the lettuce host plant as these do not cause significant changes to the mRNA's expression.Interestingly, the SNP for the WT is in the 5 0 untraslated region (UTR) of the mRNA, located 243 bp upstream of the start codon for gene 2698 is homozygous for an A which converts a leucine to a methionine (start codon).This start codon located in the WT promotor could result in the formation of an upstream open reading frame, which could subsequently affect expression of the protein and prevent WT N. ribisnigri from feeding on lettuce containing the Nr-locus.Further research into gene 2698 is required to confirm these findings, but it is an interesting discovery.

Construction of the first reference genome for N. ribisnigri
This study presents the first genome assembly and annotation for N. ribisnigri, both for a virulent and virulent (resistance-breaking) biotype.The avirulent (Nr:0) genome had the highest completeness score (94.2% complete and single copy) and the virulent biotype (Nr:1) had the most contiguous assembly with a completeness of 92.9% complete and single copy (Table 1).These scores represent a high standard of completeness, compared with previously published aphid genomes (Figure 1) (Biello et al., 2020;Li et al., 2019;Mathers et al, 2020;Shahid et al., 2021).The removal of both the aphid symbiont (Buchnera aphidicola) and host plant (L.sativa), overall, further improved both the assembly statistics and completeness of the genomes by $1%.
From the assembled genomes, there was a discrepancy between genome size of the Nr:0 and Nr:1 biotypes (Table 1).The genome assembles for UK631 (Nr:0) and Ely (Nr:1) biotypes were similar in genome size ($366 Mb), N50, number of scaffolds and completeness (BUSCO), but this is likely to be a result of lower assembly qualities.
The higher quality genome assembles of Nr8 (Nr:0), WT Kent (Nr:0) and Kent CL (Nr:1) in fact differ in genome size, 423, 423 and 384 Mb, respectively.The differences observed in genome size is likely a result of the variation in assembly quality as Nr8 had fewer duplications and missing genes than Kent CL.The higher N50 and fewer scaffolds in the Nr8 genome would have also attributed to the higher estimated genome size.Additionally, the K-mer coverage for the Nr8 was higher with fewer errors than Kent CL and supports this conclusion.Therefore, an estimated genome size of 423 Mb for N. ribisnigri is more accurate.
The genome sizes of both the avirulent and virulent N. ribisnigri biotypes were shown to be comparable to other aphid species, although the estimation for the resistance-breaking biotype is likely low (Figure 1).The pea aphid (A.pisum) genome is estimated to be a size of $446.6 Mb (The International Aphid Genome Consortium 2010), a draft assembly of the soybean aphid (A.glycines) (Matsumura) has been estimated at 317.1 Mb (Wenger et al., 2020), the corn leaf aphid (Rhopalosiphum maidis) (Fitch) near-complete (95.8%) genome is estimated at 321 Mb (Chen et al., 2019), Russian wheat aphid (Diuraphis noxia) (Kurdjumov) is 421 Mb (Nicholson et al., 2015), the peachpotato aphid (Myzus persicae) (Sulzer) is estimated to be 347 Mb (Mathers et al., 2017) and an improved bird-cherry oat aphid (Rhopalosiphum padi) (Linneaus) genome is estimated to be of 321 Mb (Morales-Hojas et al., 2020).The number of genes identified from IPS in both the avirulent and virulent were $30,000 and $28,800, respectively.The number of genes identified by IPS in existing aphids (BIPAA, 2021) range between 19,182 (A.glycines) and 31,604 (M.persicae), and it is likely that the estimation produced by Augustus is reasonable.The transcript prediction of $77,000 is likely an over estimation and an artefact of the assembly process.
This study has provided strong foundations for future work on an aphid, which has very few effective controls.The availability of a draft reference genome of both the virulent (resistance breaking) and avirulent N. ribisnigri biotypes provides researchers with a tool to study N. ribisnigri at much greater resolution.Additionally, elucidating a potential candidate gene of resistance in the resistance-breaking biotype will enable targeted approaches to thoroughly investigate this elusive mechanism of resistance.

EXPERIMENTAL PROCEDURES
Aphid cultures, biotypes and near isogenic lines To create isogenic cultures for each biotype used in the experiments, one female from each strain was placed on a single plant and left to establish a new culture (Figure S10).All N. ribisnigri lines derive from different starting females from the original isogenic culture reared at Rothamsted Research.Once $10 nymphs had been produced, the founding mother was removed and stored in 100% ethanol.A total of $20 individuals from each isogenic line were used to obtain enough high molecular weight DNA for sequencing.

DNA extraction
N. ribisnigri high molecular weight (HMW) DNA was extracted using a QIAGEN ® Genomic-tip (20/G) extraction kit.An adapted User-Developed Protocol for mosquitoes (https://www.qiagen.com/us/resources/download.aspx?id=b45c3cc3-7f2b-4f4a-aa37-21d814ed3 730&lang=en) and other insects was utilised to help increase overall DNA yield as follows: 10-20 isogenic aphids were placed into a 1.5 mL microcentrifuge tube, submersed in liquid nitrogen for 10 s and homogenised using a sterile pestle.Subsequently, 1 mL of genomic-tip lysis buffer with DNase-free RNase A (to remove any RNA contaminants) was added to suspend the samples and incubated for 30 min at 37 C, followed by the addition of Proteinase K (0.8 mg/ mL) (removing any protein contaminants) and incubation for 2 h at 50 C.This solution was centrifuged for 20 min at 21,130 g to pellet the debris.The clarified lysate was transferred to a Genomic-tip and the solution was left to flow-through by gravity.The Genomic-tips were washed four times with 1 mL wash buffer and HMW DNA was finally eluted into 2 mL Eppendorf tubes with 1 mL elution buffer.
The addition of an equal volume of 100% isopropanol to the samples was used to precipitate DNA, which was centrifuged for 20 min at 21,130 g, washed with 70% ethanol, and centrifuged again at 10 min

Genome assembly
Prior to assembly, Fast QC (Andrews, 2019) was used to check the quality of all Illumina sequence data using the in-house Galaxy server (Afgan et al., 2018).MaSuRCA (Maryland Super Read Cabog Assembler) was used to generate a hybrid assembly for each biotype with the short Illumina and long ONT reads.The advantage of MaSuRCA over other assemblers is its ability to combine the benefits of deBruijn graph and Overlap-Layout-Consensus assembly approaches using short Illumina reads and long high-error prone data, such as MinION and PacBio, to create a consensus genome assembly (Zimin et al., 2013).After assembly, a set of summary statistics was computed using CEGMA (Parra et al., 2007) to analyse the quality of the assembly.To improve the draft assemblies, the genomes for all biotypes were polished using the relevant Illumina HiSeq RNA-seq libraries for each biotype with Pilon v.1.23(Walker et al., 2014).

Removal of contaminants
To identify and remove sequence contamination by non-target organisms, Rothamsted Research's dedicated server for DNA and protein similarity searches, DeCypher, was used.Of the contaminants in the DeCypher results, the majority were identified to be the host plant lettuce (L.sativa) and the symbiotic bacteria found in aphids, Buchnera aphidicola.Since all N. ribisnigri used for genome assembly were collected from long-term laboratory cultures, the risk of contamination from parasitoid wasps was incredibly low.By running the newly assembled aphid genome against the NCBI database of all known nucleotide sequences, DeCypher generates a text file containing all positive nucleotide matches to sequences in the database (Wheeler et al., 2007).The taxonomic analysis program, MEGAN (12.0.1) (Huson et al., 2007)

Assessment of N. ribisnigri genome completeness using BUSCO
Once all contaminants had been removed, BUSCO (Simão et al., 2015) was used to assess the completeness of the N. ribisnigri genomes using an Arthropoda gene set (insecta_odb9) (n = 1562) with default settings (Waterhouse et al., 2018).

Genome annotation
Prior to annotation, RepeatMasker (v4.1.0)(Smith et al., 2013) was conducted to identify any low-complexity DNA sequences within the genomes and soft-masked with an 'N'.RepeatMasker removes lowcomplexity DNA sequences and interspersed repeats and masks them within a given sequence.RepeatMasker was run in conjunction with NCBI BLAST and Crossmatched with the 'Sternorrhyncha', a suborder of the Hemiptera, which includes all aphids and other similar taxonomic groups on the NCBI sequence database.Genome annotation files were uploaded onto Galaxy and annotation was performed using a configured MAKER annotation pipeline (Campbell et al., 2014).Initial gene prediction was run using Augustus (Hoff & Stanke, 2013), using the pea aphid (A.pisum) and the peach-potato aphid (M.persicae) as a training set.M. persicae protein sequences were initially used to indirectly infer gene predictions but not directly from all protein alignments.After the initial round of training, the outputs from the preliminary runs were used to automatically retrain the gene prediction algorithm to produce higher quality gene models.Genome annotations were considered complete when no further improvements could be made to the gene models evaluated by BUSCO (insec-ta_odb9).The genome annotations were visually explored using the Integrative Genomics Viewer against the newly assembled N. ribisnigri genomes (resistance-breaking biotype, Kent_CL and wild type, Nr8) (Robinson et al., 2011).Functional annotations of the predicted gene models were generated in OmicsBox using InterProScan (Blum et al., 2021;Huerta-Cepas et al., 2019).

RNA-seq methods and differential gene expression analysis
RNA extraction and RNA-seq library preparation RNA-seq data for N. ribisnigri were obtained from Warwick Crop Centre (Wellesbourne Campus, University of Warwick).Aphid cultures were maintained long-term on a lettuce cultivar Pinokkio, which does not contain the Nr-locus.Prior to raising aphids for sequence analysis, their expected ability to survive or not on the Nr-locus-containing cultivar (Eluarde) was verified from culture experiments.For RNA-seq analysis, two Rb aphid strains (Kent CL and UK631) were reared on both lettuce cultivars containing the Nr-locus (cv.Eluarde) and without (cv.Pinokkio).The WT aphids were only reared on Pinokkio (Table 3).For each biotype, a single apterous female was placed onto Quality control of the RNA-seq reads was conducted on all datasets with FastQC prior to further downstream analysis (Andrews, 2010;Wingett & Andrews, 2018).The newly assembled reference genome of N. ribisnigri avirulent strain (Nr8_123) was used for mapping the reads using HISAT2 software with default options.

Reference gene discovery
The reference genes, RPS9, ribosomal protein S18 (RPS18) and ribosomal protein L13a (RPL13), were used as internal controls for the qRT-PCR.The RPS9 gene was chosen based on previous work which demonstrated that it had a stable expression among soybean aphids (Aphis glycines (Matsumura)) on avirulent and virulent host plants (Bansal et al., 2013).The ribosomal proteins RPS18 and RPL13 have been shown to be good reference genes for gene expression analyses in the mustard aphid (Lipaphis erysimi (Kaltenbach)) (Koramutla et al., 2016).Protein homology searches for these genes' transcript sequences were performed on the newly assembled N. ribisnigri transcriptome (76,782 transcripts) which was uploaded onto an in-house dedicated server (DeCypher v9.1) (TimeLogic Division, 2013).This server uses an onsite high-performance computer with parallel processing chips and enables DNA/protein similarity searches to be performed, like the NCBI BLAST function with a computer hardware acceleration (Luethy & Hoover, 2004).The identity of putative cDNAs was validated using the Blastx search function in NCBI-GenBank.
Identification and validation differentially expressed genes in N. ribisnigri One significantly differentially expressed (DE) gene of interest (gene-2698) was selected from the RNA-seq experiment for further analysis using qPCR in conjunction with the HKGs.The identified gene was located and visualised in the N. ribisnigri transcriptome using OmicsBox v. 1.4.11 (OmicsBox, 2019).Once cDNA identities had been confirmed, specific PCR and qRT-PCR primers for the potential resistance gene were designed using PrimerQuest available online: (https://eu.idtdna.com/pages/tools/primerquest) (Figure S13).Methodology for qRT-PCR analysis on N. ribisnigri to validate RNA-seq data can be found in the supporting information (S14).A t-test (Modelling of Extreme Values (MeV) package) was used to test for differences between each condition, with a Bonferroni corrected p-value of 0.008 or less was considered statistically significant.

Reference genes stability analysis
The software algorithm GeNorm was used to determine the stability of the three candidate reference genes (RPS9, RPS18 and RPL13a) (Andersen et al., 2004;Vandesompele et al., 2002), using three biological replicates (Figure S10) and three technical replicates to validate the genes.The inputs for GeNorm were the raw expression values for each gene (using the equation, 2 (ÀΔCt) ).An M-score is calculated by GeNorm, with a lower M value of <1.5 suggests a more stable gene expression or low variation, and an M-score >1.5 indicating high variation and therefore is not a suitable reference gene (Figure S15).Fold change and relative expression level were determined using the comparative Ct method (2 ÀΔΔCt ) (Schmittgen & Livak, 2008).Statistical analysis was performed using a t-test (MeV package).

1
Benchmarking Universal Single-Copy Orthologs (BUSCO) analysis score of both Nasonovia ribisnigri genomes (WT = Nr:0, avirulent; R = Nr:1, virulent (resistance-breaking) and previously published aphid genomes (obtained from the BioIformatics Platform for Agroecosystem Arthropods (BIPAA) genomic resources) using an arthropod gene set (Insecta_odb9) (n = 1562).C, complete; D, duplicate (dark blue); F, fragmented (yellow); M, missing (red); S, single copy (light blue).therewas little variation within the virulent biotypes feeding on either lettuce containing the Nr-locus (Eluarde) or lettuce without the gene (Pinokkio) but with a notable variation (particularly in PC2) between different resistance-breaking biotypes.A close grouping of three avirulent biotypes (Nr4, Nr8 and Nr29) was apparent, which are all N. ribisnigri cultures known to be unable to feed on lettuce containing the Nr-locus but have some resistance to pyrethroids or other insecticides.The remaining two avirulent biotypes (4850a and WT Kent, blue points in Figure2) are loosely clustered together and are biotypes that are unable to feed on lettuce containing the Nr-locus, nor do they have any insecticide resistance.
Both resistance-breaking (Rb) biotypes were grouped together according to their origin (Rb1 and Rb2-Kent_CL; Rb3 and Rb4-UK631) and had similar gene expression levels despite feeding on lettuce with and without the Nr-locus.Out of the top 32 DE genes, one of the avirulent biotypes (SUS4-4850a) was down-regulating two of the DE gene clusters that were up-regulated in the other avirulent biotypes.Additionally, there was little differentiation in some of the DE genes down-regulated by the other avirulent biotypes.
Comparing the 689 DE genes ( p < 0.05) between the WT and Rb in a 3-way Venn diagram, identified only one DE shared gene (gene 2698) Principal component analysis of the RNA-read count data (featurecounts) for each Nasonovia ribisnigri biotype grouped into three groups (avirulent (blue points), virulent (red points) (Kent CL) and virulent (green points) (UK631).Susceptible denotes all five N. ribisnigri cultures that are unable to feed on lettuce containing the Nr-locus (avirulent)).(E), cv.Eluarde, virulent; (P), cv.Pinokkio, avirulent.has predicted localisations in both the cytoplasm and the nucleus (FigureS7).The mRNA encoding this protein was shown to be upregulated in the resistance-breaking biotypes, with a log fold change of 11.47 and a highly significant p-value, after LFC shrinkage estimates.The number of RNA-seq counts identified for gene 2698 have a high level of associated transcripts for both resistance-breaking biotypes and a much lower number of transcripts for the avirulent biotypes, for all conditions and replicates (Figure4b).Two SNPs were identified in gene 2698 in nucleotide positions 9 and 50 at the start of the protein sequence.In the first single nucleotide polymorhism (SNP), from the WT_Kent, avirulent biotype, the allele was homozygous for a T residue (97% of 34 sequence reads) in the third-codon position of serine (position 9) residue AGT, whereas the Kent_CL (fed on resistant lettuce) virulent (resistance-breaking) biotype was homozygous for a C residue (100% of 248 sequence reads), which converted the serine to a threonine AGC.In the second SNP, from the WT_Kent avirulent biotype, the allele was homozygous for an A residue (96% of 43 sequence reads) in the first-codon position of methionine (position 50) residue ATG, whereas Kent_CL (fed on resistant lettuce) resistance-breaking biotype was homozygous for a T (100% of 290 sequence reads), which converted the methionine to a leucine TTG.These SNPs were not present in the low number of RNA-seq counts for the WT strains.Additionally, there were two SNPs identified on the WT strains (WT Kent) $300 bp upstream of the promtor sequence for gene 2698.These SNPs are located 284 and 243 bp upstream of gene 2698, respectively.In the first SNP, from the WT_Kent avirulent biotype, the allele was homozygous for a T residue (97% of 35 sequence reads) in the first-and second-codon position, which converts a valine (GTG) and cysteine (TGT) to an alanine (GCG) and arginine (CGT) in the resistance-breaking biotype, respectively.The second SNP, from the WT_Kent avirulent biotype, the allele was homozygous for an A residue (96% of 45 sequence reads) in the first-codon position which converts a methionine (ATG) to a leucine (TTG) in the resistance-breaking biotype.These had good RNA-seq coverage and were high quality with no strand bias, suggesting that they are not an artefact.qRT-PCRvalidation of N. ribisnigri resistant-breaking gene against the Nr-locus found in lettuce Three previously used housekeeping genes (HKGs) (ribosomal protein S9 [RPS9], RPS18, RPL13) were identified in the newly assembled N. ribisnigri transcriptome.The screening results of these HKGs presented two ranges of Ct values between the three HKGs (26.7-27.1 and 31.7-32.4)(Figure5).There was little variation between Ct values for both conditions (N. ribisnigri avirulent biotype feeding on host plants without the Nr-locus and N. ribisnigri feeding on host plants containing the Nr-locus) for all three HKGs.The Real-Time Quantitative Reverse Transcription PCR (qRT-PCR) analysis was highly optimised and had amplification efficiencies (E) for primers from 97.28% to 99.31%, and R 2 were >0.98 (FigureS8).Genorm highlighted that all F I G U R E 3 Heatmap of the top 32 differentially expressed (DE) genes ( p < 0.00001) between Nasonovia ribisnigri resistance-breaking (virulent) and susceptible (avirulent) biotypes, showing clear division.RES = resistance-breaking (virulent); SUS = susceptible (avirulent).SUS1 = Nr4, SUS2 = Nr8, SUS3 = Nr29, SUS4 = 4850a, SUS5 = WT Kent, RES1 = Kent_CL (Eluarde), RES2 = Kent_CL (Pinokkio), RES3 = UK631 (Eluarde), RES4 = UK631 (Pinokkio).Colours of expression levels correspond to log fold change.Gene 2698 is indicated by a red arrow in the heat map.three HKGs were stably expressed across both virulent and avirulent N. ribisnigri biotypes (M < 1.5).The primer efficiency (E) for RPL13 was <95% and was excluded from the validation of the 2698 gene.The primer efficiencies (E) for the 2698 gene were 101.25 and 102.73%

Note:
Both up-regulated and down-regulated DE genes are shown for the resistance-breaking (Nr:1) biotype.Gene names are derived from contig number (e.g., 2698) and transcript ID (0.8).Both up-regulated and down-regulated genes are ordered by P-value significance identified from DESeq2 using log2 fold change (LFC) estimates.Fold change values were considered significant if p < 0.0001.Statistical analysis (t-test-MeV package) confirmed that WT_Kent (susceptible N. ribisnigri fed on susceptible lettuce -SS) was significantly different (F = 96.54,p < 0.001) from Kent_CL (resistance-break N. ribisnigri fed on resitant lettuce -RR), UK631 (RR) and UK631 (resistance-breaking N. ribisnigri fed on susceptible lettuce -RS).Resistance-breaking N. ribisnigri biotypes fed on lettuce containing the Nr-locus (cv.Eluarde) and without the gene (cv.Pinokkio) had similar gene expression levels (non-significant).

DISCUSSION
Identification of the potential resistance-breaking gene in N. ribisnigriThe differential expression analyses results indicated that one gene was significantly up-regulated in the resistance-breaking N. ribisnigri biotypes (gene 2698).Despite other significantly up-and downregulated DE genes in the analysis, gene 2698 had the smallest log10 F I G U R E 5 Ct values (±SE) obtained for the three different housekeeping genes (HKGs) under two experimental conditions.Green circles represent avirulent Nasonovia ribisnigri feeding on susceptible lettuce (without the Nr-locus), orange circles represent virulent N. ribisnigri feeding on resistant lettuce (with the Nr-locus).Three biological and three technical replicates were used to validate the HKGs.See Supporting Information for calculations.F I G U R E 4 (a) Three-way Venn diagram of the differentially expressed genes in three contrasts (Kent_CL/susceptible; UK631/susceptible; Kent_CL/UK631) identified by DESeq2.Differentially expressed genes were considered significant and used in the diagram if p < 0.05.(b) RNAsequencing counts for gene 2698 between resistance-breaking (Rb) and susceptible (SUS) biotypes.Susceptible comprise all avirulent biotypes and both Kent_CL and UK631 are virulent.

F
I G U R E 6 Comparison of gene-2698 expression levels using qRT-PCR between four conditions of Nasonovia ribisnigri cultures feeding on resistant (cv.Eluarde) and susceptible (cv.Pinokkio) lettuce cultivars.Both Kent_CL and UK631 are resistance-breaking biotypes able to feed on resistant lettuce cultivars.WT Kent is an avirulent biotype unable to feed on the resistant lettuce cultivars containing the Nr-locus.RR, resistance-breaking N. ribisnigri fed on resistant lettuce cultivar (cv.Eluarde); RS, resistance-breaking N. ribisnigri fed on susceptible lettuce cultivar (cv.Pinokkio); SS, susceptible N. ribisnigri fed on susceptible lettuce cultivar (cv.Pinokkio).In total, three biological and three technical replicates were used to compare gene-2698 expression level (F = 96.54,p < 0.001).***p < 0.001.
All aphids used for the genome sequencing were obtained from Warwick Crop Centre (Warwickshire, UK) or field-collected and subsequently cultured at Rothamsted Research (Harpenden, UK) and reared parthenogenetically in a laboratory on whole plants of L. sativa cv.'Auvona RZ', a variety that does not contain the Nr-locus resistance.The N. ribisnigri strain Nr8 originates from a clone collected from a field in Yorkshire in 1999 and cultured at Warwick Crop Centre in a laboratory at a constant temperature of 21 C with a 16:8 h (light: dark) photoperiod.Strains Kent wt and Kent CL were collected from two different fields in Kent during 2010-2011 and were also subsequently cultured at Warwick Crop Centre.The Ely culture was collected in a field containing resistant lettuce varieties (containing the Nr-locus) at G's Fresh (Cambridgeshire, UK) in 2018.All clones were determined and validated to be virulent or avirulent by subsequent culturing (same conditions as above) on a resistant (Nr) variety (cv.Eluarde) at Rothamsted Research prior to DNA extraction and downstream pipelines.
at 50002347g).Finally, the DNA pellet was left to dry for $1 min, resuspended in an appropriate volume of molecular grade water, and placed into an orbital shaker for 2 h, with gentle agitation to dissolve the DNA pellet.The DNA was quantified using a Qubit fluorometer (dsDNA HS Assay Kit, Thermo Fisher Scientific), visualised for quality on an electrophoresis gel and stored in a À20 C freezer for later down-steam processes.SequencingDNA that passed in-house quality checks (Qubit BR Assay Kit [Thermo Fisher Scientific] and electrophoresed on a 1% agarose gel) was sent to Novogene (Cambridge Science Park, Cambridge, UK) to generate Illumina sequence data using x20 HiSeq-PE 150.Previous Illumina sequence data were provided by Warwick Crop Centre from a resistance-breaking biotype (Kent CL), and this was incorporated into the assembly of the Kent CL genome.To complement the Illumina sequence data, long-read sequence data were generated using a MinION (Oxford NANOPORE Technologies) at Rothamsted Research.Two SpotON flow cells (R9.4.1) were used to produce both virulent and avirulent long-read sequence data.N. ribisnigri HMW DNA was extracted following the method above, and the Genomic DNA by Ligation SQK-LSK109 protocol was used to construct the library (Oxford NANOPORE Technologies).This protocol includes DNA repair and end-prep, adapter ligation and clean-up followed by priming and loading the SpotON flow cell.For each run, approximately 1 μg of N. ribisnigri genomic DNA was loaded into each SpotON flow cell.The genomic DNA samples were sequenced using MinKNOW software on a Windows based computer.Kent CL was used to generate the resistance-breaking sequence data and Nr8 was used to produce the avirulent (wild type) sequence data.All reads have been submitted to the National Center for Biotechnology Information (NCBI) (BioProject ID: PRJNA857679) (Figures S11 and S12).
, was incorporated, along with the Decypher, to identify any sequences which matched other non-target organisms against the NCBI reference database.MEGAN conveniently organises the DeCypher text file containing the NCBI-positive blast hits of the newly assembled genomes into categories (cellular organisms: Bacteria, Archaea, Eukaryota (animal, plant); and other sequences: viruses, unclassified sequences, No hits and Not assigned).As a result, MEGAN enables the identification of sequences in the newly assembled genome belonging to other organisms.All identified contaminant sequences in the genome in MEGAN (e.g., bacterial and plant) were identified using the NCBI Web BLAST (nucleotide) database.Any confirmed contaminants were manually removed from the genome using the bioinformatics software Geneious (10.1).The aphid symbiotic bacteria sequences of B. aphidicola were removed but retained for future analysis.
the various conditions listed in FigureS10.Once the aphid colonies had established, approximately 50 aphids (mixed apterae and alates) of each treatment were pooled for RNA extraction.Whole organisms were used due to the uncertainty of where the mechanism/s of resistance were located (e.g., stylet/salivary, metabolic or within the midgut).Total RNA was isolated using a trizol-based method followed by purification on RNeasy clean-up columns (Qiagen) and removal of any DNA contamination by treatment with DNase I. RNA quality was checked on a Bioanalyzer (Agilent).Then, 100 bp paired-end sequencing was performed on an Illumina HiSeq platform at Welcome Trust Centre for Human Genetics, Oxford, with the resulting read counts ranging from12,634,191 to 37,227,165 per sample.
Validation of RNA-seq data by qRT-PCR Aphid and plant cultures qRT-PCR was incorporated to confirm the results of the RNA-seq data.Two avirulent cultures (Nr:0) and two virulent cultures (Nr:1) were used for the qRT-PCR gene expression validation experiment, which were also used in the initial RNA-seq analysis experiment.Once the founding mother had produced several nymphs, the founder was removed, and the nymphs left to start a new colony.In total there were four treatments, and each culture was triplicated to create three biological replicates per treatment (12 conditions) (FigureS10).Each culture was maintained in previous laboratory conditions (described above) until $50 individuals were produced.Each isogenic line was collected in 1.5 mL Eppendorf tubes containing RNAprotect (Qiagen) and frozen at À80 C prior to RNA extraction.
Nr8 and WT_Kent are avirulent biotypes and UK631, Kent CL and Ely are virulent biotypes.Only the most complete genome assemblies for the avirulent and virulent were selected and used for downstream annotation and analysis, highlighted in red.All genomes generated and presented in the article are available at the National Center for Biotechnology Information (NCBI) (BioProject ID: PRJNA857679).(