Identification of maize lethal necrosis disease causal viruses in maize and suspected alternative hosts through small RNA profiling

Received: 22 October 2019 | Revised: 25 February 2020 | Accepted: 24 April 2020 | DOI: 10.1111/jph.12908


| INTRODUCTION
Maize is an important cereal crop whose global production was estimated at 1.134 billion tonnes per annum in 2017 (FAO, 2019). In Kenya, the average maize production was estimated to be 3.6 million tonnes in 2017, which is a drop from 4.25 million tonnes, the average for the five earlier years (FAO, 2018). The reduced production has been attributed to various biotic and abiotic challenges. Maize lethal necrosis disease (MLND), a viral disease first reported in Kenya in 2011, is one of the biotic factors associated with declining maize yields in Kenya (Fatma, Tileye, & Patrick, 2016; Mahuku et al., 2015).
In Kenya, yield losses worth USD 53.2, 180 and 198 million were reported in 2012, 2013 and 2014, respectively (De Groote, Oloo, Tongruksawattana, & Das, 2016; Marenya et al., 2018). The disease results from a synergistic double infection of the host plant by Maize chlorotic mottle virus (MCMV) and any member of the family Potyviridae, such as Wheat streak mosaic virus (WSMV), Sugarcane mosaic virus (SCMV) or Maize dwarf mosaic virus (MDMV) (Kusia et al., 2015; Stewart, Quality, States, & Agricultural, 2017; Wangai et al., 2012). In Kenya, the disease was first reported in Bomet county and it has since spread to other regions of the country as well as neighbouring countries including Tanzania, Rwanda, Burundi, Ethiopia and Uganda (Wamaitha et al., 2018).
Currently, MLND management entails good agronomic practices such as weed and pest management, crop rotation, clean seed and the use of tolerant germplasm (Fatma et al., 2016; Redinbaugh & Zambrano, 2014). Breeding for tolerance through development of inbred lines is an ongoing strategy by the International Maize and Wheat Improvement Centre (CIMMYT) and other independent research teams (Beyene et al., 2017). This strategy is supported by studies involving genetic analysis of quantitative trait loci for resistance as well as genome-wide association studies aimed at identifying tolerant germplasm for breeding against MLND (Awata, Beyene, et al., 2019; Gowda et al., 2015; Gowda et al., 2018; Nyaga et al., 2019). Through these studies, various SNPs and loci markers for MLND tolerance have been validated through germplasm screening at a quarantine MLN screening facility in KALRO Naivasha, Kenya. Despite the existing strategies in MLND control, the disease is still a major threat to maize-growing areas in sub-Saharan Africa, where susceptible germplasm is still widely cultivated. Effective detection tools can form part of an integrated management package against MLND, especially during implementation of surveillance and quarantine measures. Tools applied in MLND surveillance include double-antibody sandwich enzyme-linked immunosorbent assay (DAS-ELISA) and real-time polymerase chain reaction (Fatma et al., 2016), but these could have limited success due to viral divergence across regions (Braidwood, Müller, & Baulcombe, 2019) as well as low sensitivity in maize seeds due to low viral titres (Quito-Avila, Alvarez, & Mendoza, 2016). The use of next-generation sequencing (NGS) technologies in detection and characterization of MLND viruses has led to better understanding of these viruses (Braidwood et al., 2018; Xia et al., 2014, 2016, 2018).
Recently, metagenomic analysis identified viruses that had never been associated with MLND, including mastreviruses, totiviruses and poleroviruses (Wamaitha et al., 2018). Small RNA (sRNA) sequencing is a novel high-throughput next-generation approach that has enabled the unravelling of the synergistic interactions between MCMV and Potyviridae members involved in MLND development (Mbega et al., 2016; Xia et al., 2014, 2016). Furthermore, virus-sourced small RNAs (vsiRNAs) derived from the RNA interference (RNAi) mechanism exist at high levels in plants and can therefore be assembled into viral genomes, providing insight into the viral agents infecting plant systems, including their genetic variability (Braidwood et al., 2019; Burgyán & Havelda, 2011; Xia et al., 2018; Younis, Siddique, Kim, & Lim, 2014).
In this study, we report the use of sRNA sequencing in detection of viruses involved in the development of MLND in maize using samples from three MLND endemic areas in Kenya. Further, we used sRNA analysis pipelines to identify alternative markers for detection and characterization of MLND causal viruses and validated these using quantitative real-time PCR. The developed markers could form part of a more efficient toolkit for detection, monitoring and management of MLND-causing viruses in maize agroecosystems in Kenya and probably the wider East Africa region, which seems to share the same strains of MLN causal viruses (Braidwood et al., 2019).

| Sample collection and preparation
Leaf samples from maize plants showing typical MLND symptoms (Wangai et al., 2012) were collected from Kericho, Bomet and Nyamira counties of Kenya according to sampling procedures described by Mahuku et al. (2015). Leaf samples were collected from twelve plants from each of the six different farms sampled per county. The leaf subsamples from every county were separately macerated and then pooled to form a county sample for further processing. The sample was then placed in RNase-free Eppendorf tubes containing RNA Shield reagent (Zymo Research). The samples were taken to the laboratory and kept at −80°C to await RNA extraction.
The same procedure was adopted for the second round of sample collection for validation of markers using qPCR.

| RNA extraction and sequencing
Total RNA was extracted using the Direct-zol™ RNA extraction kit (Zymo Research). Briefly, 50 mg of leaf tissue was frozen in liquid nitrogen and ground into fine powder by vortexing with steel beads, and the remaining steps of RNA extraction were done according to the manufacturer's instructions (Zymo Research). RNA quality and concentration were determined using a NanoDrop spectrometer (Maestrogen Inc). RNA samples were shipped on dry ice to BGI Hong Kong (http://en.genomics.cn/) where library construction and small RNA sequencing were performed. Sequencing of libraries was done on the BGISEQ-500 platform, a high-throughput sequencing technique based on combinatorial probe-anchor synthesis (cPAS) and DNA Nanoballs (DNB) technology (Huang et al., 2017).

| Sequence analysis and sRNA profiling
Small RNA sequence analysis was done using standard RNA-Seq pipelines. Raw sequence reads were retrieved from the BGI server and filtered to remove low-quality bases at the 5' and 3' ends, and reads without the insert tag were eliminated using Cutadapt (Martin, 2011).
Reads shorter than 15 nucleotides were also discarded. The data criteria were set at 10% adapter and null rate, q20 of 90% and above, and small RNA tag rate of less than 20%. Clean reads were then exported to the University of East Anglia (UEA) sRNA Workbench pipeline (Stocks et al., 2012) and host-derived miRNAs were filtered out through subtractive mapping to the Zea mays miRBase and MirGeneDB databases.
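The length and quality criteria above can be illustrated with a minimal Python sketch. It is not the pipeline used in the study (which relied on Cutadapt and the UEA sRNA Workbench); the function names, the Phred+33 quality encoding and the toy reads are assumptions made only for illustration.

```python
from collections import Counter

MIN_LENGTH = 15        # minimum read length stated in the text
PHRED_OFFSET = 33      # assumption: Phred+33 quality encoding


def filter_by_length(records, min_length=MIN_LENGTH):
    """Keep (sequence, quality) pairs with at least min_length nucleotides."""
    return [(seq, qual) for seq, qual in records if len(seq) >= min_length]


def q20_rate(records, offset=PHRED_OFFSET):
    """Fraction of all bases whose Phred score is 20 or higher."""
    total = sum(len(qual) for _, qual in records)
    if total == 0:
        return 0.0
    passed = sum(1 for _, qual in records for ch in qual if ord(ch) - offset >= 20)
    return passed / total


def length_distribution(records):
    """Count reads per read length."""
    return Counter(len(seq) for seq, _ in records)


# Toy records (hypothetical data, not from the study)
reads = [
    ("ACGTACGTACGTACGTACGT", "I" * 20),   # 20 nt, every base Q40
    ("ACGT", "IIII"),                      # 4 nt, discarded by the length filter
    ("A" * 18, "I" * 9 + "#" * 9),         # 18 nt, half of the bases Q2
]
clean = filter_by_length(reads)
print(len(clean), round(q20_rate(clean), 3), dict(length_distribution(clean)))
```

In this toy set the q20 rate of the retained reads falls below the 90% threshold, so a data set like this would not meet the stated criteria.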
Genome assembly of host-filtered sRNAs was performed using the VirusDetect pipeline (Zheng et al., 2017) and contigs with a coverage >75% were reported. Viruses were detected based on contigs generated by pooling those from reference-guided mapping of sRNA reads and those from de novo assembly. For each identified virus, the longest continuous contig was selected for phylogenetic analysis. These sequences were deposited at NCBI GenBank and were assigned accession numbers MK481075, MK481076 and MK481077 for SCMV, and MK491604, MK491605 and MK491606 for MCMV.
Genome sequences from viral isolates previously reported from various parts of the world were retrieved from NCBI and used to generate the phylogenetic tree and to infer ancestry of the reported isolates using the Maximum Parsimony method in MEGA 7. For target prediction analysis, plant sRNA target analysis server (psRNATarget) (Dai, Zhuang, & Zhao, 2018) was used with default parameters.
Complete RefSeq genomes for MCMV (NC_003627.1) and SCMV (NC_003398.1) were retrieved from the National Center for Biotechnology Information (NCBI) and used for mapping, annotation and identification of highly expressed domains of the viruses based on host-filtered siRNAs. Mapping of sRNA reads was done using Bowtie 2 within Geneious version 11.5, followed by calculation of expression levels as normalized reads per kilobase million (RPKM) with reference to the mature peptide domains (mtr-PD) of SCMV and the CDS of MCMV. This normalization was done to avoid bias arising from differences in length among the mtr-PD or CDS. For MCMV, seven CDS were targeted including those spanning ORFs P31, P50, P111, P7a, P7b, replicase-associated protein and P32. For SCMV, eight well-characterized mtr-PD including Nla Vpg, Nlb replicase, Nla-pro, HC-Pro, 6k2, P3, CI and 6k1 protein were targeted. Sequences for the highly expressed regions were retrieved and primers designed using Primer3Plus software (Untergasser et al., 2007).
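The length normalization described above can be sketched in a few lines of Python. This is only an illustration of the standard RPKM arithmetic, not the Geneious implementation; the region names, read counts and lengths are hypothetical.

```python
def rpkm(region_reads, region_length_nt, total_mapped_reads):
    """Reads per kilobase of region per million mapped reads."""
    if region_length_nt <= 0 or total_mapped_reads <= 0:
        raise ValueError("region length and total mapped reads must be positive")
    return region_reads * 1e9 / (region_length_nt * total_mapped_reads)


def rank_regions(counts, lengths, total_mapped_reads):
    """Return (region, RPKM) pairs sorted from highest to lowest RPKM."""
    values = {
        name: rpkm(counts[name], lengths[name], total_mapped_reads)
        for name in counts
    }
    return sorted(values.items(), key=lambda item: item[1], reverse=True)


# Hypothetical counts and lengths, chosen only to illustrate the arithmetic
counts = {"region_A": 6000, "region_B": 6000, "region_C": 1000}
lengths = {"region_A": 300, "region_B": 1500, "region_C": 500}
ranking = rank_regions(counts, lengths, total_mapped_reads=2_000_000)
for name, value in ranking:
    print(name, value)
```

Regions A and B receive the same number of reads here, yet the shorter region A ranks higher after normalization, which is the length bias the RPKM step is meant to remove.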

| Validation of alternative putative markers using real-time PCR
Total RNA from three representative leaf samples collected earlier in each region for sRNA sequencing was also used here for cDNA synthesis and validation of the identified markers using real-time PCR.
Briefly, DNase-treated RNA was first quantified and 5 µg used for cDNA synthesis using the FIREPol cDNA Synthesis Kit (Solis BioDyne) according to the manufacturer's instructions. Quantitative real-time PCR was carried out using the 5X HOT FIREPol EvaGreen® qPCR Mix Plus Kit in a reaction mix comprising 5X HOT FIREPol mix (1 µl), 0.5 µl of 0.25 µM for each primer (forward and reverse) and 1 µl of cDNA. The volume was topped up to 20 µl with RNase-free water.
The reaction was carried out in a Roche LightCycler (https://lifescience.roche.com/) and included p7b, CI and p31 markers for MCMV and P3, CI peptide, Nla-pro and 6K1 markers for SCMV. The qPCR conditions were an initial denaturation at 95°C for 15 min, followed by denaturation at 95°C for 15 s, annealing at 60-65°C for 20 s (depending on the primers) and elongation at 72°C for 60 s. The Nlb replicase and replicase CDS regions of SCMV and MCMV, respectively, were used as references for calculating relative viral expression following the 2^ΔCT method of Pfaffl (2004). The real-time PCRs were done in triplicate for each sample. Standard curves were generated for each gene to confirm the efficiencies of the PCRs.
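The standard-curve and relative-expression calculations mentioned above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' analysis: it assumes that ΔCT is defined as Ct(reference) minus Ct(target), that both amplicons share the same amplification factor, and that the dilution series and Ct values shown are hypothetical.

```python
import math


def slope_of_standard_curve(log10_dilutions, ct_values):
    """Least-squares slope of Ct against log10(template amount)."""
    n = len(log10_dilutions)
    if n < 2 or n != len(ct_values):
        raise ValueError("need two or more paired points")
    mean_x = sum(log10_dilutions) / n
    mean_y = sum(ct_values) / n
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(log10_dilutions, ct_values))
    sxx = sum((x - mean_x) ** 2 for x in log10_dilutions)
    return sxy / sxx


def amplification_factor(slope):
    """Per-cycle amplification factor 10^(-1/slope); 2.0 means 100% efficiency."""
    return 10 ** (-1.0 / slope)


def relative_expression(ct_target, ct_reference, factor=2.0):
    """factor ** (Ct_reference - Ct_target), assuming equal efficiency for both amplicons."""
    return factor ** (ct_reference - ct_target)


# Hypothetical 6-point, 10-fold dilution series with ideal doubling per cycle
log10_dilutions = [0, -1, -2, -3, -4, -5]
ct_values = [15.0 - math.log2(10) * x for x in log10_dilutions]
slope = slope_of_standard_curve(log10_dilutions, ct_values)
factor = amplification_factor(slope)

# Hypothetical Ct values for one marker and its reference region
fold = relative_expression(ct_target=20.0, ct_reference=23.0, factor=2.0)
print(round(slope, 3), round(factor, 3), fold, round(math.log10(fold), 3))
```

A slope near -3.32 corresponds to a doubling of product per cycle; when the measured factor departs noticeably from 2, an efficiency-corrected ratio would be the more cautious choice.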

| Identification of alternate hosts for MCMV and SCMV using small RNA markers
We also investigated whether plants not previously reported to be MLND hosts and growing near or in infected maize fields could harbour MLND-causative viruses. We collected seventeen plant species. One microlitre (1 μl) of the cDNA was used for RT-PCR in a 10 µl reaction volume consisting of 1 μl of 5X FIREPol® PCR Master Mix and 0.5 μl of 0.25 µM of each primer. Primers for two markers validated by sRNA-Seq and qPCR above (SCMV P3 and MCMV capsid protein; Table S1) were used alongside PPDK (FP 5'CGCGACGAATTAACAACGCT3' and RP 5'ATCGTGTTGCTAGCGTCCAA3') to confirm the success of library preparation. PCR conditions were an initial denaturation step at 95°C for 3 min, followed by 35 cycles of denaturation at 95°C for 30 s, annealing at 50°C for 30 s (for both primers) and extension at 72°C for 1 min. PCR products were confirmed by gel electrophoresis.

| Subtractive mapping revealed five classes of microRNAs
Following a stringent quality check and cleaning of sequence reads, an average of 25 million clean reads per region was obtained. The reads ranged from 15 to 50 nucleotides long with a GC content of between 50% and 55%. Sequence reads ranging from 18 to 24 nucleotides long accounted for the highest percentage of the sRNAs (Figure S2). On mapping the reads onto the Zea mays miRBase database, hairpin antisense, sense and mature antisense miRNAs were detected at extremely low abundance of 1.05%, 1.23% and 1.15% for Bomet, Kericho and Nyamira (Borabu), respectively. From the total mature sense miRNAs, five classes of microRNAs including zma-miR159a-3p, zma-miR168a-5p, zma-miR166a-3p, zma-miR167e-5p and zma-miR444a were highly expressed across the three regions as shown by the reads per kilobase million (RPKM) values, and zma-miR159a-3p was the most frequently mapped miRNA (Table S2).

FIGURE 1 Identification of the MLND-causing viruses by VirusDetect using siRNA size profiles from one sample per region (consisting of six pooled representative subsamples). The detection of viruses was based on contigs generated by pooling those from reference-guided mapping of sRNA reads to RefSeq genomic CDS (NCBI GenBank accession nos. NC_003627.1 for MCMV and NC_003398.1 for SCMV) and those from de novo assembly. Blue tracks represent the best-hit reference virus genomes from the GenBank database, and red tracks represent assembled virus contigs. The longest contigs, (a) contigs 122, 120 and 69 for MCMV and (b) contigs 43, 1 and 136 for SCMV, from the three sampling regions were considered for alignments and generation of the phylogenetic tree. These contigs were submitted to GenBank and assigned the accession numbers MK481075, MK481076 and MK481077 for SCMV and MK491604, MK491605 and MK491606 for MCMV.

Figure S3). Alignment of the fully assembled genomes and a Maximum Parsimony tree generated from the alignment revealed evolutionary relationships for each virus with those reported across the world. For SCMV, the isolates in this study clustered close to those earlier reported from Elgeyo Marakwet, Embu and Bomet counties in Kenya. In general, SCMV isolates showed a high evolutionary divergence compared to those from across the globe (Figure 2). Similarly, the MCMV isolates showed a close relationship with those from across the world but also clustered together with those previously reported in Kenya.

| siRNA expression profile mapping to different viral domains in maize
To identify highly expressed viral domains in the isolates, we mapped a total of 3,046,799, 6,537,375 and 4,865,520 reads from Bomet, Nyamira and Kericho, respectively, onto the MCMV RefSeq genome (accession no. NC_003627.1). There was variation in how the sRNA reads mapped across the entire MCMV genome (Table 1). The viral capsid protein, p7a and p7b CDS were the most frequently mapped domains followed by the replicase CDS, while p32 CDS domain

FIGURE 2 Evolutionary relationships of whole-genome MCMV and SCMV isolates assembled in this study (marked with *) with those from different regions of the world or from the same region in a previous study. The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test (1,000 replicates) are shown next to the branches. The MP tree was obtained using the Subtree-Pruning-Regrafting (SPR) algorithm with search level 1 in which the initial trees were obtained by the random addition of sequences (10 replicates).

| Validation of viral marker expression using quantitative real-time PCR analysis
To evaluate expression of the identified viral domains, we carried out real-time PCR on cDNA from the three replicates of the same tissues used for RNA sequencing experiments. A total of eight markers for highly expressed and conserved regions of both MCMV (capsid protein, P31 and p7b) and SCMV (P3, CI, p70 and Nla-pro) (Table S1) were used in the qRT-PCR. The location of the markers in the MCMV and SCMV genomes is shown in Figure S5 and Figure S6, respectively. The efficiencies of the Nlb replicase and replicase reference genes were confirmed by 6-point, 10-fold dilutions of the template (Figure S4). Primer efficiencies were first confirmed through serial dilution using SCMV- and MCMV-positive samples (obtained from KALRO) before the primers were applied in amplification (Table S3). The qRT-PCR analysis revealed a high relative expression of MCMV domains and one SCMV domain. For MCMV, the P7b together with all three capsid protein regions amplified with a relative expression of log 7-fold compared with the replicase gene (Figure 4). In SCMV, the P3 domain showed the highest relative expression, log 3.5-fold relative to the Nlb replicase gene. The rest of the regions tested recorded relative expression of <log 1.5-fold (Figure 4).

| Application of siRNA markers in identification of alternative hosts of MLND causal viruses
A total of 17 plant species were collected and screened for MLND causal viruses. This, together with the positive controls, proved conclusively that the bands were indeed products of the two viral markers.

| In silico applicability of host-derived miRNAs in detection of SCMV and MCMV
Through an in silico approach, three main miRNAs derived from the host mature miRNA (zma-miR167b-3p, zma-miR168b-3p and zma-miR528a-3p) were identified to target MCMV using psRNATarget

| DISCUSSION
The present study successfully describes detection of MCMV and SCMV, the main causal viruses of MLND in samples from three highly   was an indication that these miRNAs could be playing a role in the MCMV and SCMV interaction that included possible binding to viral motifs including HC-Pro. Nevertheless, there was little or no chance that further analysis of this interaction could yield any diagnostic tool. As in this study, past studies have either drawn inferences purely from in silico data, especially for predictive purposes (Iqbal et al., 2017), or in some instances combined in silico and wet-lab approaches to obtain more confirmatory results where the prediction was more definitive (Xia et al., 2016).
We fully assembled the MCMV genome but only managed to assemble 95% of SCMV across the three regions. This is evidence that the typical MLND symptoms observed during sample collection were due to co-infection by MCMV and SCMV. SCMV, the most re- were also detected although their assembly was poor. This could be attributed partly to the low depth of sequencing (Zheng et al., 2017) or to the low complexity characterized by repeat sequences in the genomes of these viruses, which leads to generation of misassembled contigs. As reported by Claros et al. (2012), repeat sequences are difficult to assemble because high-identity reads could come from different portions of the genome, generating gaps, ambiguities and collapses in alignment and assembly. The presence of these viruses in infected plant material could, however, be contributing to development of the disease since they have been reported to code for a P0 protein that inhibits both local and systemic RNA silencing.  (Bol, 2008). A well-characterized feature of MCMV's capsid protein is its terminal-encoded amino acids, which allow subcellular localization of MCMV (Zhan, Lang, Zhou, & Fan, 2016). For SCMV, the P3 protein peptide, 6K2 and Nla protein regions were frequently mapped. The SCMV genome encodes a single, large polypeptide encoding ten pro- to infect members of the grass family including millet and sorghum (Scheets, 2004; Toler, 1985 therefore, some of the highly expressed domains could be used to develop detection kits for these viruses. Identification of MLND causal viruses in plant species not previously reported to harbour the disease shows an expanded host range for these viruses, pointing to a need to develop more integrated control strategies.

CONFLICT OF INTERESTS
The authors have no conflict of interest to declare.