Use of next-generation sequencing for the identification and characterization of Maize chlorotic mottle virus and Sugarcane mosaic virus causing maize lethal necrosis in Kenya




The diagnosis of novel unidentified viral plant diseases can be problematic, as conventional methods such as real-time PCR or ELISA may be too specific to a particular species or even strain of a virus, whilst alternatives such as electron microscopy (EM) or sap inoculation of indicator species do not usually give a species-level diagnosis. Next-generation sequencing (NGS) offers an alternative solution in which sequence is generated in a non-specific fashion and identification is based on similarity searching against GenBank. The conventional and NGS techniques were applied to a damaging and apparently new disease of maize, which was first identified in Kenya in 2011. ELISA and TEM provided negative results, whilst inoculation of other cereal species identified the presence of an unidentified sap-transmissible virus. RNA was purified from material showing symptoms and sequenced using a Roche 454 GS-FLX+. Database searching of the resulting sequence identified the presence of Maize chlorotic mottle virus and Sugarcane mosaic virus, a combination previously reported to cause maize lethal necrosis disease. Over 90% of both viral genome sequences were obtained, allowing strain characterization and the development of specific real-time PCR assays, which were used to confirm the presence of the viruses in material with symptoms from six different fields in two different regions of Kenya. The availability of these assays should aid the assessment of the disease and may be used for routine diagnosis. The work shows that next-generation sequencing is a valuable investigational technique for rapidly identifying potential disease-causing agents such as viruses.

Introduction

The first critical step of controlling a plant disease is a rapid diagnosis that accurately identifies the causal agent(s). Once this has been achieved, appropriate control measures such as fungicide application or control of insect vectors can be deployed. However, establishing disease aetiology is often complex, especially where the disease in question is novel, e.g. a known pathogen changing its host or geographical range, and it is further complicated where the causal agent is a new species or variant. For bacterial and fungal diseases, methods based on culturing provide a good first-line screen for new and emerging diseases, as they tend to be generic in nature and, provided that the pathogens are not obligate, offer a good approach for isolating potential new or variant pathogens. After isolation, a range of more specific (e.g. immunofluorescence or PCR) or non-specific (e.g. morphological identification) methods can then be deployed. For viruses, on the other hand, most of the methods that can be used are species-specific (e.g. PCR or ELISA), and those that are not, such as sap inoculation and PCR with degenerate primers, are often applicable to a relatively small range of potential viruses. More recently, methods based on next-generation sequencing (NGS) technology have been published (Adams et al., 2009; Kreuze et al., 2009; Rwahnih et al., 2009) that provide a powerful and generic front-line screen especially suited to viruses. Harnessing the massively parallel and de novo nature of the method enables the viral genome to be sequenced within a background of host nucleic acid, whilst bioinformatics approaches enable the identification of pathogen sequences by similarity to known viruses or virus-like sequence motifs.

This paper describes the deployment of just such an approach, using an NGS technique first described by Adams et al. (2009), to elucidate the cause of a damaging and apparently new disease of maize first identified in the autumn of 2011 in the South Rift region (Narok North, Narok South, Chepalungu, Sotik), Eastern Province (Embu, Meru and Kibwezi) and Central Province (Murang'a, Kirinyaga and Nyeri) of Kenya. The initial symptoms of the disorder were varied, but apparently restricted to the leaves, stems and ears. Typically, early leaf symptoms were mottling and chlorosis that progressed to some deformity and then extensive necrosis. Stems also showed deformity, which in severe cases gave rise to a ‘shepherd's hook’. Foliar infection was often associated with the leading growth, with early-formed leaves remaining green. In plants with foliar or stem infection, grain filling was markedly reduced. Figure 1a shows some of these symptoms.

In affected fields, infection rates approaching 100% were typical and yields were severely affected. Initial attempts at diagnosis using traditional approaches failed to identify any potential pathogens (fungal, bacterial or viral) and as a result the NGS approach was deployed.

The design included two RNA extraction methods: one preferentially extracting double-stranded RNA, to target replicating viruses, and one extracting total RNA (viral RNA and mRNA of microbial and plant origin). For cost efficiency, 454-sequencing was undertaken on a pooled sample, from which candidate pathogens were identified. The sequencing highlighted the presence of Maize chlorotic mottle virus (MCMV) and a variant isolate of Sugarcane mosaic virus (SCMV) previously identified in southeast Asia (Xie et al., 2011), a combination of viruses known to cause maize (or corn) lethal necrosis disease (Uyemote et al., 1981; Goldberg & Brakke, 1987). Based on this, specific real-time PCR assays were designed and applied to individual samples in order to associate the potential pathogens found in the 454-generated sequence with individual samples.

The study highlights the potential of non-targeted molecular diagnostics, in particular next-generation sequencing, to make rapid progress in elucidating the causal agents of disease and in providing sequences from which rapid diagnostics can be developed. The method is especially useful where new or emerging variants are present that have not been identified using conventional techniques.

Materials and methods

Sample collection

Between February and March 2012, 11 maize (Zea mays) leaf samples from Kenya were received by the Fera Plant Clinic via the CABI Plantwise Plant Clinic. Ten samples (10 plants from five fields) originated from Bomet District and one sample was from Naivasha District. These samples displayed a range of leaf symptoms including spotting (‘early-stage’), streaking (‘mid-stage’) and necrosis on the margin (‘late-stage’).

Indicator screening

The following indicator species were sap-inoculated from the original samples using standard methods (Hill, 1984): Zea mays, Chenopodium quinoa, Nicotiana benthamiana, Nicotiana occidentalis P1, Nicotiana hesperis and Hordeum vulgare. Non-inoculated control plants were included for all species used. Inoculum was prepared by grinding plant material with symptoms in chilled 0·1 M phosphate buffer, pH 7·0, with Celite as an abrasive. Following inoculation, plants were maintained in a quarantine greenhouse held at a mean temperature of 22°C (±2°C) with a natural 18 h photoperiod and observed weekly.

ELISA

DAS-ELISA was performed using standard methods modified from Clark & Adams (1977) with antisera raised to the following viruses (manufacturer in brackets): Maize streak virus, Maize dwarf virus and Cucumber mosaic virus (Agdia), Sugarcane mosaic virus (BIOREBA), Maize chlorotic mottle virus (AC Diagnostics), potyvirus (DSMZ) and begomovirus (ADGEN). Appropriate positive control samples were used with each test.

Electron microscopy

Transmission electron microscopy (TEM) was performed with a Philips CM100 instrument using a leaf-dip preparation method, with uranyl acetate staining and carbon-coated grids (Hill, 1984).

RNA extraction and 454-sequencing

A pooled sample of maize leaves was prepared from the samples obtained from Kenya and then split into two halves. The first half was extracted using the total RNA method described by Adams et al. (2009). Briefly, the leaf material was ground in liquid nitrogen, CTAB buffer (2% CTAB, 100 mM Tris pH 8·0, 20 mM EDTA, 1·4 M NaCl, 1% sodium sulphite, 2% PVP) was added and a chloroform extraction was performed. RNA from this extraction was then precipitated with 4 M LiCl and purified on a QIAGEN RNeasy column with on-column DNase treatment (QIAGEN). The second half of the sample was subjected to a double-stranded RNA extraction as described by Valverde et al. (1990). Briefly, the leaf material was ground in liquid nitrogen, extracted with phenol saturated with STE (100 mM NaCl, 1 mM EDTA, 50 mM Tris pH 7·0) and then bound to CF-11 cellulose in the presence of STE buffer containing 16·5% ethanol. After washing with STE buffer containing ethanol, double-stranded RNA was eluted with STE buffer.

Double-stranded cDNA was produced separately from the two RNA extracts using the cDNA Synthesis System kit (Roche) following the manufacturer's protocols. A MID-labelled GS-FLX Titanium Rapid Library was then produced from each cDNA preparation using kits supplied by 454-Roche. Sequencing was performed on one-eighth of a picotitre plate on a 454 GS-FLX+ (Roche) using a GS-FLX Titanium run kit following the manufacturer's protocols.
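The two libraries were distinguished by their MID (multiplex identifier) labels, which allows both to share one region of the plate. As a minimal illustration of how reads from such a run could be assigned back to their library, the Python sketch below matches the barcode at the 5′ end of each read and trims it. The barcode sequences and library names are assumptions, the one-mismatch tolerance is an arbitrary choice, and in practice this step is normally handled by the sequencing vendor's software.

```python
from collections import defaultdict

# Illustrative 10-nt MID barcodes mapped to library names. These are
# assumptions for the sketch; the MIDs used in the study are not stated.
MIDS = {
    "ACGAGTGCGT": "total_RNA",
    "ACGCTCGACA": "dsRNA",
}


def demultiplex(reads, mids=MIDS, max_mismatches=1):
    """Split (read_id, sequence) pairs into libraries by their 5' MID.

    A read is assigned only if exactly one MID matches its first bases with
    the fewest mismatches (at most max_mismatches); the MID is then trimmed.
    Everything else goes to the 'unassigned' bin.
    """
    bins = defaultdict(list)
    for read_id, seq in reads:
        candidates = []
        for mid, library in mids.items():
            prefix = seq[:len(mid)]
            if len(prefix) < len(mid):
                continue  # read is shorter than the barcode
            mismatches = sum(a != b for a, b in zip(prefix, mid))
            if mismatches <= max_mismatches:
                candidates.append((mismatches, mid, library))
        candidates.sort()
        # Require a unique best match so ambiguous reads are not misassigned.
        unique_best = candidates and (
            len(candidates) == 1 or candidates[0][0] < candidates[1][0]
        )
        if unique_best:
            _, mid, library = candidates[0]
            bins[library].append((read_id, seq[len(mid):]))
        else:
            bins["unassigned"].append((read_id, seq))
    return dict(bins)
```

Requiring a unique best match is a conservative design choice: a read that could belong to either library is better discarded than counted in the wrong one.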

Sequencing data analysis

The individual reads were assembled using the Roche software newbler v. 2.6. The resulting contigs and unassembled reads were then compared with a local copy of the NCBI nr GenBank database (Benson et al., 2011) using blast+ (Camacho et al., 2009), and megan (Huson et al., 2007) was used to assign reads to possible taxa. A custom perl script was used to convert contig blast results into individual read blast results, allowing the quantitative elements of megan to be used. Viral reads were extracted from megan and their blast results examined. To construct partial viral genomes from the new sequences, reference assemblies against the most similar existing viral genomes were carried out using newbler v. 2.6 (Roche). The resulting sequences were compared with the de novo assembled contigs to confirm that no artefacts had been introduced during the reference assembly. Open reading frame prediction and protein translation were performed in vector nti v. 11 (Invitrogen), and alignments and phylogenetic trees (neighbour-joining with 500 bootstrap replicates) were constructed using mega v. 5 (Tamura et al., 2011). To produce the tree of Tombusviridae genomes (Fig. 4), all genomes were trimmed to allow comparison with the partial MCMV genome produced in this study, which lacks the first nine nucleotides.
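The conversion of contig-level to read-level blast results can be sketched as follows. This is a hypothetical Python analogue of the custom perl script, not the script itself: it assumes tabular blast output (query identifier in the first column, as produced with `-outfmt 6`) and a read-to-contig mapping already parsed from the assembler output (parsing not shown). Every read in a contig simply inherits that contig's hits, so read-count-based summaries reflect the number of reads rather than the number of contigs.

```python
from collections import defaultdict


def expand_contig_hits(blast_lines, read_to_contig):
    """Turn contig-level BLAST tabular hits into read-level hits.

    blast_lines: lines of BLAST tabular output, query ID in the first column.
    read_to_contig: {read_id: contig_id} for assembled reads (assumed input).
    Each read placed in a contig inherits every hit of that contig; queries
    that are not contigs (unassembled reads) are passed through unchanged.
    """
    contig_to_reads = defaultdict(list)
    for read_id, contig_id in read_to_contig.items():
        contig_to_reads[contig_id].append(read_id)

    read_hits = []
    for line in blast_lines:
        fields = line.rstrip("\n").split("\t")
        members = contig_to_reads.get(fields[0])
        if members is None:
            read_hits.append("\t".join(fields))  # unassembled read
        else:
            for read_id in sorted(members):
                read_hits.append("\t".join([read_id] + fields[1:]))
    return read_hits
```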

Real-time PCR

Real-time (TaqMan) primers and probes were designed using primer express v. 2 (Applied Biosystems) based on the partial viral genome sequences identified by 454-sequencing. RNA was extracted from individual maize samples using the CTAB and RNeasy (QIAGEN) method described previously (Adams et al., 2009). Real-time RT-PCR was performed in 96-well plates on an ABI 7500 instrument (Applied Biosystems). Reactions consisted of 1× buffer A (Applied Biosystems), 0·2 mM each dNTP, 5·5 mM MgCl2, 0·025 U μL−1 AmpliTaq Gold (Applied Biosystems), 0·4 U μL−1 RevertAid (Fermentas), 300 nM each primer, 100 nM probe and 1 μL of extracted RNA (used at the concentration as extracted), giving a final reaction volume of 25 μL. The cycling conditions were 30 min at 48°C, 10 min at 95°C, then 40 cycles of 15 s at 95°C and 1 min at 60°C. Negative controls consisted of water in place of template and extracts from uninfected plants. The sensitivity of the newly designed assays was determined using a serial dilution of known virus-positive RNA in water.
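Preparing the 25 μL reactions above from stock solutions is a matter of applying C1V1 = C2V2 to each component, with water making up the remainder after the 1 μL of template. In the minimal sketch below the final concentrations follow the text, but the stock concentrations and the 10% pipetting excess are assumptions chosen for illustration, not values from the study.

```python
REACTION_UL = 25.0  # final reaction volume (from the text)
TEMPLATE_UL = 1.0   # RNA template added per reaction (from the text)

# (component, final concentration, stock concentration) in matching units.
# Final values follow the text; the stock values are hypothetical.
COMPONENTS = [
    ("Buffer A (x)", 1.0, 10.0),
    ("dNTPs, each (mM)", 0.2, 10.0),
    ("MgCl2 (mM)", 5.5, 25.0),
    ("AmpliTaq Gold (U/uL)", 0.025, 5.0),
    ("RevertAid (U/uL)", 0.4, 200.0),
    ("Forward primer (uM)", 0.3, 10.0),
    ("Reverse primer (uM)", 0.3, 10.0),
    ("Probe (uM)", 0.1, 10.0),
]


def master_mix(n_reactions, excess=0.1):
    """Return the volume (uL) of each stock for n reactions plus excess."""
    scale = n_reactions * (1.0 + excess)
    per_reaction = {}
    for name, final, stock in COMPONENTS:
        # C1 * V1 = C2 * V2, so V1 = C2 * V2 / C1
        per_reaction[name] = final * REACTION_UL / stock
    water = REACTION_UL - TEMPLATE_UL - sum(per_reaction.values())
    if water < 0:
        raise ValueError("stocks too dilute for the reaction volume")
    per_reaction["Water"] = water
    return {name: vol * scale for name, vol in per_reaction.items()}
```

With these assumed stocks, each reaction receives 24 μL of master mix before the 1 μL of template is added; `master_mix(96)` scales that up for a full plate.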

Results

Samples of maize showing disease symptoms (Fig. 1a) were collected from the Bomet and Naivasha Districts of Kenya and received in the UK in fresh or semi-dry condition. They were first examined visually for a range of potential pathogens, and further tests were carried out for those considered to be potential causes. These further tests included ELISA, TEM and sap inoculation for viruses.

Figure 1.

Symptoms of stippling and streaks on a maize leaf from a field in Bomet District, Kenya (a), and disease symptoms showing necrosis on experimentally inoculated maize (b) and barley (c).

ELISA tests were carried out for Maize streak virus, Maize dwarf virus, Cucumber mosaic virus, Sugarcane mosaic virus, Maize chlorotic mottle virus, generic potyviruses and generic begomoviruses. Using a positive threshold of 3 × the negative control, all positive controls gave positive results, and all samples tested negative for the target viruses except one sample from Bomet, which tested positive for Maize streak virus. TEM revealed no virus particles that were consistent between the samples or with the scale of disease found in the field.
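The positive/negative call described above can be expressed as a simple rule. The sketch assumes that ‘the negative control’ means the mean absorbance of the negative-control wells and that a sample must strictly exceed three times that value; the study does not specify either detail, and blank correction is omitted.

```python
from statistics import mean


def elisa_calls(sample_od, negative_od, factor=3.0):
    """Call samples positive when their absorbance exceeds factor x the
    mean absorbance of the negative-control wells (assumed rule).

    sample_od: {sample_name: absorbance}; negative_od: list of absorbances.
    Returns the threshold used and {sample_name: True/False}.
    """
    threshold = factor * mean(negative_od)
    calls = {name: od > threshold for name, od in sample_od.items()}
    return threshold, calls
```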

Indicator species (Zea mays, Chenopodium quinoa, Nicotiana benthamiana, Nicotiana occidentalis P1, Nicotiana hesperis and Hordeum vulgare) were inoculated with sap from a leaf sample from Bomet. After 5 weeks, systemic chlorotic speckling symptoms were observed on the maize plants. At this stage, no symptoms were seen on non-inoculated control maize or on the other species used. Over the next 2–5 weeks, further systemic symptoms of chlorotic streaking, mottling and finally striping and necrosis developed on the inoculated maize leaves (Fig. 1b). During this second period the barley plants also developed virus symptoms; these were more subtle (Fig. 1c) and consisted of spiky leaves, necrotic leaf tips, distortion and only partial ear development, with some ears dying while other parts of the plant were still growing.

In addition to the traditional viral diagnostics, NGS was used. A combined sample of leaves from both districts was sequenced by 454-pyrosequencing, producing 54 967 reads with an average length of 333 bp; 46 187 of these reads originated from the total RNA library and 8780 from the dsRNA library. The reads were assembled and compared with the NCBI GenBank nr database using blast+. Following blast analysis, 57 and 73% of total RNA and dsRNA reads, respectively, had significant similarity to sequences from fungi, 4 and 12% to bacterial sequences, 35 and 1·5% to virus sequences, and 4 and 14% to plant sequences. Figure 2 shows the likely taxa of origin for the combined reads. Closer examination of the fungal and bacterial assignments, based on taxa and prevalence, did not reveal any pathogens likely to be responsible for the symptoms and epidemiology seen in the diseased plants. Of all reads in the total RNA and dsRNA libraries, 35 and 1·3%, respectively, had significant similarity to Maize chlorotic mottle virus (MCMV) and a further 0·1 and 0·2% to Sugarcane mosaic virus (SCMV), together accounting for almost all of the viral reads. Dual infection of maize with MCMV and a potyvirus (including SCMV) has been reported to produce symptoms of necrosis and a disease known as corn or maize lethal necrosis (Morales et al., 1999; Xie et al., 2011).
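The percentages quoted above are computed separately for each library. A minimal sketch of that bookkeeping follows, assuming each read has already been assigned to a single top-level taxon and that the denominator is the number of assigned reads in the library (both assumptions; megan itself assigns reads with a lowest-common-ancestor algorithm, which is not reproduced here).

```python
from collections import Counter


def taxon_percentages(assignments):
    """assignments: iterable of (library, taxon) pairs, one per read.

    Returns {library: {taxon: percentage of that library's assigned reads}}.
    """
    counts = {}
    for library, taxon in assignments:
        counts.setdefault(library, Counter())[taxon] += 1
    result = {}
    for library, counter in counts.items():
        total = sum(counter.values())
        result[library] = {t: 100.0 * n / total for t, n in counter.items()}
    return result
```

Keeping the libraries separate matters here: pooling the counts would let the much larger total RNA library dominate the combined percentages.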

Figure 2.

megan output showing assignment of 454 reads to different taxa.

MCMV is a member of the genus Machlomovirus, family Tombusviridae. Assembly of the MCMV sequences produced an almost complete genome (accession number JX286709). Examination of the assembly using tablet software (Milne et al., 2010) showed no evidence of multiple common SNPs, suggesting that, even though a pooled sample was sequenced, the viral isolates present were identical. Comparison of the new sequence with existing genomes of MCMV (ACC: GU18674, EU358605, NC002627) suggests that the sequence lacks nine bases at the 5′ end. Translation of the open reading frames of the new viral genome produced the expected proteins (Nutter et al., 1989). Comparison of the coat protein (CP) sequences with those of other members of the family Tombusviridae confirms the identity of this virus as MCMV (Fig. 3). Table 1 shows the percentage identity for the individual proteins between the new and existing MCMV isolates. This confirms that the new isolate is MCMV and shows that it is more than 96% identical to the Yunnan strain from China (Xie et al., 2011).
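Inspecting an assembly for evidence of mixed isolates, as done visually in tablet, amounts to looking for alignment columns where a second base is common. The Python sketch below automates that check on per-position base counts, which could be obtained from a pileup of the reads (that step is not shown). The depth and minor-base-fraction thresholds are hypothetical defaults, not values used in the study.

```python
def minor_variant_sites(base_counts, min_depth=20, min_fraction=0.2):
    """Flag reference positions where a second base is common.

    base_counts: list of dicts, one per reference position (1-based order),
    mapping base -> read count. Returns (position, major, minor, fraction)
    for positions with depth >= min_depth whose second most frequent base
    makes up at least min_fraction of the reads.
    """
    flagged = []
    for pos, counts in enumerate(base_counts, start=1):
        depth = sum(counts.values())
        if depth < min_depth:
            continue  # too few reads to judge
        ranked = sorted(counts.items(), key=lambda kv: kv[1], reverse=True)
        if len(ranked) < 2:
            continue  # only one base observed
        (major, _), (minor, n_minor) = ranked[0], ranked[1]
        fraction = n_minor / depth
        if fraction >= min_fraction:
            flagged.append((pos, major, minor, fraction))
    return flagged
```

An empty result, as reported here for both viruses, is consistent with a single dominant genotype, although low-frequency variants below the chosen threshold would not be detected.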

Figure 3.

A neighbour-joining tree constructed with mega 5 using 500 bootstrap replicates for coat proteins of various members of the family Tombusviridae.

Table 1. Percentage identities between proteins of Maize chlorotic mottle virus from this study and isolates from GenBank for which there are complete sequences
Isolate (accession)            Percentage identity, proteins 1–6 (individual protein headings not available)
Yunnan isolate (ACY82512.1)    98   99   99   100   96   99
Nebraska isolate (ACA57844)    97   96   98   98    94   99
Refseq isolate (NP619722.1)    97   96   98   100   95   99
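Percentage identities such as those in Table 1 are computed over pairwise alignments of the protein sequences. The sketch below shows one common definition: the fraction of identical residues over ungapped alignment columns. It assumes the sequences are already aligned. How gaps were treated for the values reported here is not stated, and other conventions (for example, dividing by the length of the shorter sequence) give slightly different numbers.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percentage identity of two pre-aligned sequences of equal length.

    Columns where either sequence has a gap are excluded from the
    denominator (one common convention; tools differ on this point).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = identical = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        identical += a == b
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared
```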

Comparison of the genome sequence with genomes of other members of the family Tombusviridae confirms the identification of this virus as MCMV (Fig. 4) and further confirms that it is most closely related to the Yunnan isolate from China (Xie et al., 2011).
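The trees in Figs 3, 4 and 6 were built by neighbour-joining in mega. To illustrate what that algorithm does, the minimal Python sketch below implements the standard neighbour-joining procedure on a precomputed distance matrix and returns a Newick string. It is not the mega implementation: it omits the calculation of distances from an alignment and the 500 bootstrap replicates, and it breaks ties between equally good pairs by input order.

```python
def neighbour_joining(labels, dist):
    """Neighbour-joining on a symmetric distance matrix; returns Newick.

    labels: taxon names (at least three, free of Newick punctuation);
    dist: list of lists with dist[i][j] the distance between labels i and j.
    Bootstrap resampling is not included.
    """
    if len(labels) < 3:
        raise ValueError("need at least three taxa")
    d = {a: {b: dist[i][j] for j, b in enumerate(labels)}
         for i, a in enumerate(labels)}
    nodes = list(labels)
    while len(nodes) > 3:
        n = len(nodes)
        r = {a: sum(d[a][b] for b in nodes if b != a) for a in nodes}
        # Pick the pair minimising Q(a, b) = (n - 2) d(a, b) - r(a) - r(b).
        best = None
        for x in range(n):
            for y in range(x + 1, n):
                a, b = nodes[x], nodes[y]
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from a and b to their new parent node u.
        la = d[a][b] / 2 + (r[a] - r[b]) / (2 * (n - 2))
        lb = d[a][b] - la
        u = f"({a}:{la:.4g},{b}:{lb:.4g})"
        d[u] = {}
        for k in nodes:
            if k not in (a, b):
                d[u][k] = d[k][u] = (d[a][k] + d[b][k] - d[a][b]) / 2
        nodes = [k for k in nodes if k not in (a, b)] + [u]
    # Join the last three nodes at a single unrooted centre.
    a, b, c = nodes
    la = (d[a][b] + d[a][c] - d[b][c]) / 2
    lb = (d[a][b] + d[b][c] - d[a][c]) / 2
    lc = (d[a][c] + d[b][c] - d[a][b]) / 2
    return f"({a}:{la:.4g},{b}:{lb:.4g},{c}:{lc:.4g});"
```

Bootstrap support, as reported for the published trees, would be obtained by resampling alignment columns, recomputing the distances and repeating this procedure for each replicate.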

Figure 4.

A neighbour-joining tree constructed with mega 5 using 500 bootstrap replicates for genomes of various members of the family Tombusviridae.

SCMV is a member of the genus Potyvirus, family Potyviridae. Assembly of the SCMV reads produced seven contigs covering 92% of the SCMV genome, with eight short gaps, all below 150 bp. Figure 5 shows the positions of the contigs on the SCMV genome and the coding sequences for the different parts of the polyprotein. As for MCMV, examination of the contigs showed no SNP evidence of multiple viral isolates. Complete putative protein sequences were produced for P1, NIa-Pro and CP (accession numbers JX286706–8). Phylogenetic comparison of the CP sequences (Fig. 6) with those of other potyviruses suggests that the new virus is a member of the species Sugarcane mosaic virus, most similar to a strain from China (Gao et al., 2011) but distinct from other SCMV strains. This is confirmed by comparison of the percentage identity for the individual P1, NIa-Pro and CP proteins between the new and existing SCMV isolates (Table 2), which also shows that the P1 proteins of the new isolate and the Chinese strain (Gao et al., 2011) are distinct from those of the other strains. The Chinese strain has been described as a particularly virulent strain from southeast Asia (Gao et al., 2011).
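Coverage figures such as the percentage of the genome spanned by contigs, and the positions of the remaining gaps, follow from merging the contig placements on the reference. The sketch below assumes contig placements are given as 1-based inclusive coordinates lying within a reference of known length; it is an illustration, not the procedure used to produce Fig. 5.

```python
def coverage_and_gaps(intervals, genome_length):
    """Merge contig placements and report coverage and uncovered gaps.

    intervals: (start, end) 1-based inclusive coordinates on the reference.
    Returns (percentage of the genome covered, list of (gap_start, gap_end)).
    """
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1] + 1:
            # Overlapping or adjacent contigs form one covered block.
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    covered = sum(end - start + 1 for start, end in merged)

    gaps = []
    previous_end = 0
    for start, end in merged:
        if start > previous_end + 1:
            gaps.append((previous_end + 1, start - 1))
        previous_end = end
    if previous_end < genome_length:
        gaps.append((previous_end + 1, genome_length))
    return 100.0 * covered / genome_length, gaps
```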

Figure 5.

Map of the mature proteins and sequenced contigs of the Sugarcane mosaic virus isolate. Complete sequences of the mature proteins P1, 6K2, VPg, NIa-Pro and CP were obtained.

Figure 6.

A neighbour-joining tree constructed with mega 5 using 500 bootstrap replicates for coat proteins of various members of the genus Potyvirus.

Table 2. Percentage identities between proteins of Sugarcane mosaic virus from this study and isolates from GenBank for which complete sequences are available. Values for the Yunnan isolate are shown in brackets.
Protein   Percentage identity with isolates 1–14 (accession numbers not available)
P1        65  67  68  69  69  66  67  [98]  75  75  75  74  68  68
NIa-Pro   90  92  90  90  90  91  91  [98]  91  91  90  90  91  91
CP        85  85  86  86  86  85  84  [99]  79  80  81  86  82  86

Using the partial genomes produced for MCMV and SCMV, two real-time reverse transcriptase PCR assays were designed for each virus. A blast analysis of the proposed primer and probe sequences indicated that no cross-reaction would be expected with other viruses in the GenBank nr database. Using these assays, MCMV and SCMV were detected in the maize samples from Kenya. With one of these samples as a positive control, the assays were compared and a preferred assay for each virus was chosen based on sensitivity (as determined by a serial dilution) and signal strength (based on ΔRn); the selected assays were used for all further testing. The preferred MCMV assay consisted of a forward primer (all 5′–3′) CCG GTC TAC CCG AGG TAG AAA, a reverse primer TGG CTC GAA TAG CTC TGG ATT T and a TaqMan probe FAM-CAG CGC GGA CGT AGC GTG GA-BHQ1. This assay gave a linear response to virus concentration over three logs and was able to detect over a 50 000-fold dilution of a field-infected sample. The preferred assay for SCMV consisted of a forward primer (all 5′–3′) CCA GGC CAA CTT GTA ACA AAG C, a reverse primer CAT CAT GTG TGG ATA AAT ACA GTT GAA and a TaqMan probe FAM-TGT CGT TAA AGG CCC ATG TCC GCA-BHQ1. This assay gave a linear response to virus concentration over three logs and was able to detect over a 1 000 000-fold dilution of a field-infected sample.
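The sensitivity comparison above rests on a serial-dilution standard curve. The sketch below fits the usual linear relationship between Ct and log10 template amount and derives the amplification efficiency as 10^(−1/slope) − 1, so that a slope of about −3.32 corresponds to roughly 100% efficiency. Slopes and efficiencies were not reported for these assays; the function and its inputs are illustrative assumptions.

```python
import math


def standard_curve(dilution_factors, ct_values):
    """Fit Ct = slope * log10(relative amount) + intercept by least squares.

    dilution_factors: fold dilution of each standard, e.g. [1, 10, 100].
    Returns (slope, intercept, r_squared, efficiency).
    """
    # Relative template amount is 1 / dilution, so its log10 is -log10(dilution).
    x = [-math.log10(f) for f in dilution_factors]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(ct_values) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (slope * xi + intercept)) ** 2
                 for xi, yi in zip(x, ct_values))
    ss_tot = sum((yi - mean_y) ** 2 for yi in ct_values)
    r_squared = 1 - ss_res / ss_tot
    efficiency = 10 ** (-1 / slope) - 1  # 1.0 means 100% efficiency
    return slope, intercept, r_squared, efficiency
```

The linear range of an assay, such as the three logs reported above, is the span of dilutions over which the points stay on this fitted line.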

Both assays were used to screen total RNA extracted from the 10 samples from Bomet District and the one sample from Naivasha District. In addition, RNA extracted from the experimentally inoculated maize and barley was tested. MCMV was found in all the Kenyan maize samples (Table 3) and in the maize and barley sap inoculated plants. SCMV was found in nine of the 11 maize samples from Kenya, but was not detected in either the sap inoculated maize or barley. Neither virus was found in the negative control maize and barley plants.

Table 3. Results of real-time RT-PCR virus assays on RNA from maize and barley plants
Sample origin                         No. of samples   SCMV-positive   MCMV-positive
Naivasha, with symptoms (Kenya)       1                1               1
Bomet, with symptoms (Kenya)          10               8               10
Non-inoculated control maize (UK)     1                0               0
Non-inoculated control barley (UK)    1                0               0
Inoculated maize (UK)                 2                0               2
Inoculated barley (UK)                2                0               2

Discussion

This paper describes the use of NGS to identify the presence of MCMV and SCMV in Kenyan maize showing symptoms of a newly emerged disease. The method not only rapidly identified the presence of two viruses in the original field samples, but also produced nucleic acid sequence for substantial parts of both viral genomes. This allowed strain characterization and the development of specific high-throughput real-time PCR assays for confirmation of the findings and potentially for routine diagnosis (Adams et al., 2012). The availability of these assays should aid in the further confirmation of the cause of the maize disease in Kenya and help in epidemiological research towards developing management strategies.

In order to maximize the chance of detecting disease-causing agents in the samples, a pool of leaves was extracted using two different methods, one targeting viral double-stranded RNA (dsRNA) and one untargeted (total RNA). The results showed that a range of bacterial, fungal and viral sequences was detected, but that the proportion of viral reads was greater with the untargeted method. This suggests that, for MCMV at least, sequencing of total RNA would be sufficient. Both extraction methods revealed substantial background populations of microbes and viruses. This would have been problematic had a wide range of candidate causal agents been identified: because the samples were pooled for 454 analysis, each candidate would then have needed to be associated with individual samples. One way to lessen this risk would be to sequence disease-free leaf samples to establish a subtractive baseline of the microbial and viral community, indicating what is most likely unique to the diseased samples. Samples were pooled prior to sequencing to minimize cost and to reduce the risk of missing potential pathogens, as could happen if only a single leaf were tested. A further potential drawback of pooling is the production of chimeric sequences; however, none could be identified in the virus sequencing data produced. Before the metagenomic sequencing approach was adopted, standard plant pathology techniques had been applied with inconclusive results. Indeed, ELISA for MCMV and SCMV had given negative results. Only sap inoculation provided additional information, giving evidence of a transmissible and pathogenic agent (only one of the two viruses present was transmitted), but it offered limited insight into causality. One explanation for the failure of TEM is that, owing to its limited sensitivity, it is not unusual for TEM to miss a viral infection even when a pathogenic virus is present.
The apparent failure of ELISA may have a number of causes, among them low sensitivity and poor reactivity with unusual or variant isolates. No information was provided on the range of isolates the antibodies detect, so broad-spectrum reactivity was assumed when the ELISA reagents were purchased. Pursuing the identification by these methods took more than a month, as candidate viruses had to be tested and ruled out stepwise and new ELISA kits purchased. By contrast, once the 454-metagenomics approach was initiated, the identification was complete within two weeks from start to finish. The MCMV isolate detected was similar to other previously sequenced strains but, based on the complete genome sequence and translated protein sequences, was most similar to a Chinese isolate (Xie et al., 2011) rather than to the more widespread US strains (Scheets, 2000; Stenger & French, 2008). This may explain why the ELISA reagents obtained from a US company failed to detect this virus in the infected material, as the reagents were most likely raised to US strains of MCMV. The SCMV isolate was most closely related to a recently characterized Chinese isolate (Gao et al., 2011) but quite distinct from other more common SCMV isolates, including the genome reference sequence (Chen & Adams, 2002; Pruitt et al., 2009). Gao et al. (2011) describe this Chinese isolate as an East Asian strain, noting that it has also been found in Vietnam and Thailand, and describe it as highly virulent, able to cause severe mosaic in up to 100% of inoculated plants within 20 days. The high degree of sequence variation identified may also explain why the isolate in the current study was not detected by ELISA reagents that, according to the manufacturer's instructions (BIOREBA), were raised to a German isolate of SCMV.
Given the sequence divergence from common isolates, new real-time assays were designed rather than using published assays (Zhang et al., 2011), in order to increase the chances of detecting the specific isolates discovered in the Kenyan samples. This also highlights one of the advantages of the NGS approach over more conventional specific assays such as ELISA or PCR. Conventional assays require reagents designed to detect their viral target exclusively, and any variation in the virus genome may cause the assays to fail. Metagenomic sequencing, on the other hand, is non-targeted and requires no a priori knowledge of the target, and hence is able to detect existing strains, new variants and even completely new viruses (Adams et al., 2009).

This research has shown the presence of two viruses, MCMV and SCMV, that in combination have been previously reported to cause maize lethal necrosis (MLN). To the authors' knowledge, neither MCMV nor MLN has previously been recorded in Africa. SCMV has previously been detected in Kenya (Louie, 1980) and South Africa (Handley et al., 1998). The symptoms of MLN previously described (Uyemote et al., 1981; Goldberg & Brakke, 1987) are remarkably similar to those exhibited in the fields of Kenya (Fig. 1), which, taken with the absence of any other pathogen known to cause these symptoms (Fig. 2), lends further support to the identification of MLN as the disease that has emerged to cause severe problems for maize growers in Kenya.

This work further demonstrates the power of NGS for the rapid identification of new and unusual plant viruses and for facilitating the rapid development of high-throughput diagnostic assays. It should now be considered a tool of choice for front-line diagnosis of novel plant diseases.

Acknowledgements

This research was partially supported by the UK Department for International Development through a partnership with CAB International and the Plantwise Plant Clinic.