Genetic diversity analysis of papaya leaf distortion mosaic virus isolates infecting transgenic papaya “Huanong No. 1” in South China

Abstract The commercialized genetically modified papaya “Huanong No. 1” has been utilized to successfully control the destructive virus‐papaya ringspot virus (PRSV) in South China since 2006. However, another new emerging virus, papaya leaf distortion mosaic virus (PLDMV), was found in some PRSV‐resistant transgenic plants in Guangdong and Hainan provinces of South China through a field investigation from 2012 to 2019. The survey results showed that “Huanong No. 1” papaya plants are susceptible to PLDMV, and the disease prevalence in Hainan Province is generally higher than that in Guangdong Province. Twenty representative isolates were selected to inoculate “Huanong No. 1,” and all of the inoculated plants showed obvious disease symptoms similar to those in the field, indicating that PLDMV is a new threat to widely cultivated transgenic papaya in South China. Phylogenetic analysis of 111 PLDMV isolates in Guangdong and Hainan based on the coat protein nucleotide sequences showed that PLDMV isolates can be divided into two groups. The Japan and Taiwan China isolates belong to group I, whereas the Guangdong and Hainan isolates belong to group II and can be further divided into two subgroups. The Guangdong and Hainan isolates are far different from the Japan and Taiwan China isolates and belong to a new lineage. Further analysis showed that the Guangdong and Hainan isolates had a high degree of genetic differentiation, and no recombination was found. These isolates deviated from neutral evolution and experienced population expansion events in the past, which might still be unstable. The results of this study provide a theoretical basis for clarifying the evolutionary mechanism and population genetics of the virus and for preventing and controlling the viral disease.


Papaya leaf distortion mosaic virus in the genus Potyvirus of the family
Potyviridae and was identified as early as 1954 in the northern area of Okinawa, Japan (Kawano & Yonaha, 1992). It was found in Taiwan China in 1993 (Kiritani & Su, 1999) and spread to the entire island in 2008 (Bau et al., 2008). Xiao et al. (1997) and He et al. (2001), respectively, investigated papaya viruses in papaya planting areas in South China and found only papaya ringspot virus (PRSV) and not PLDMV. Until 2011, PLDMV was reported in some areas of Hainan province, China Tuo et al., 2013;Zhao et al., 2018).
From 2012 to 2015, we detected the presence of PLDMV in some areas of Hainan and Guangdong provinces and observed that the virus quickly expanded to multiple papaya plantations in South China, causing some fruit farmers to suffer heavy economic losses (unpublished). Papaya plants infected with PLDMV exhibit disease symptoms, such as leaf chlorosis, young leaf distortion and mosaic, water-soaked oily streaks on the petiole, and ringspots on the fruit (Kawano & Yonaha, 1992). Given that these symptoms, including the subsequent analysis of the transmission vectors and physical and chemical properties, are very similar to PRSV, the virus was initially considered to be PRSV (Maoka et al., 1996). However, further analysis via electron microscopy, serological studies, and whole genome sequencing revealed that the virus is a new species of Potyvirus (Bau et al., 2008;Kawano & Yonaha, 1992;Maoka & Hataya, 2005).
According to the host range, PLDMV is classified into two types: PLDMV-P and PLDMV-C, which are difficult to identify by serological methods but can be distinguished by specific hosts (Maoka & Hataya, 2005). PLDMV-P mainly infects papaya and few cucurbits (Cucurbitaceae), whereas PLDMV-C does not infect papaya but infects cucurbits (Kiritani & Su, 1999). Similar to PRSV, its genome has a single-stranded positive-sense RNA of approximately 10 kb, encoding a polyprotein that is cleaved into 10 mature functional proteins (Yeh et al., 1992). In addition, a short polypeptide (PIPO) is expressed within the P3 cistron by frame shifting (Chung et al., 2008). Among the 11 encoded functional proteins, the coat protein (CP) sequence has been frequently used in strain identification, species classification, and phylogenetic analysis of potyviruses (Chaves-Bedoya & Ortiz- Rojas, 2015;Cuevas et al., 2012;Gao et al., 2016;Gibbs & Ohshima, 2010;Martinez, 2014;Wu et al., 2018).
In the 1990s, in order to prevent and control PRSV, American scientists developed antiviral transgenic papaya containing the PRSV CP sequence, which was commercialized in Hawaii in 1998 (Gonsalves, 1998). Later, we obtained the transgenic papaya "Huanong No. 1" containing the PRSV replicase sequence and was approved by the Ministry of Agriculture of China in 2006 for commercial cultivation in South China (Guo et al., 2009;Li et al., 2007;Rao et al., 2012). Since then, transgenic papaya "Huanong No. 1" plants have massively been commercially planted in Guangdong and Hainan provinces and sustained high resistance to PRSV. However, some "Huanong No. 1" plants in several plantations are found to be susceptible in recent years, and most of them are identified to be infected with PLDMV. Current research on PLDMV is only limited to detection and identification, and the molecular evolution and population genetic structure of PLDMV have not been reported. Studying molecular evolution helps to understand the molecular basis of the adaptation, geographic expansion, and emergence of the virus, which are the key to its management and control (Lauring & Andino, 2010).
With the rapid development of sequencing technology, the sequencing analysis of whole genome sequence provides a good condition for the study of molecular evolution mechanism. To understand the genetic diversity and population genetic structure of PLDMV, we investigated and sequenced PLDMV isolates collected from Guangdong and Hainan provinces of South China from 2012 to 2019. We further analyzed the molecular genetic structure of PLDMV in terms of genetic diversity, recombination, phylogeny, selection pressure, and neutral evolution detection. A systematic molecular genetic structure analysis was performed to understand the genetic diversity and population genetic structure of PLDMV in Guangdong and Hainan provinces, clarify the evolutionary mechanism of the virus and the genetic mechanism of the population and further provide theoretical basis for the prevention and control of the viral disease.

| Field investigation and sample collection
A total of 48 plantations of transgenic "Huanong No.1" papaya from ten counties (Guangzhou, Jiangmen, Zhanjiang, Zhongshan, Foshan, Shenzhen, Qingyuan, Huizhou, Luoding, and Meizhou) in Guangdong Province and six counties (Sanya, Ledong, Dongfang, Changjiang, Haikou, and Lingshui) in Hainan Province were investigated from 2012 to 2019 ( Figure 1). Random sampling method was adopted in each plantation, and the disease prevalence was recorded and calculated. A total of 200 infected fresh leaves were sampled, stored on ice-cold package, and taken back to the laboratory. RT-PCR and sequencing analysis were employed to identify the PLDMV isolates (Table S1).

| Artificial inoculation
According to two types of leaf symptoms (mosaic and distortion), all diseased papaya samples collected in Guangdong Province were divided into two groups. Similarly, the diseased papaya samples collected in Hainan Province were also divided into two groups. 0.1 M phosphate buffer (pH 7.0) was used to grind leaf samples as inoculum. Five representative samples were selected from each group and inoculated the seedlings with 2 leaves of squash cultivar "Yucuixihulu" (Cucurbita pepo) and with 3-5 leaves of transgenic papaya cultivar "Huanong No.
1" (Cai & Fan, 1994). Phosphate buffer was administered in the control group. Symptoms were recorded at 7-15 days postinoculation (dpi) from 15 plants per treatment, and the experiment was repeated thrice. step was performed at 72°C for 5 min. PCR products were separated by 1% agarose gel electrophoresis, extracted, and then purified from the gel with a Tiangen Gel Extraction kit (TianGen). The purified products were cloned into the pMD20-T vector (Takara) in accordance with the manufacturer's protocol and then transformed into Escherichia coli DH5α competent cells. Three positive clones from each transformation were selected and sequenced in both directions by Sangon Biotech (Sangon). All sequences generated in the study were deposited in the GenBank database (Table S1). In addition, 11 PLDMV nucleotide sequences of different isolates from different countries and regions were downloaded from GenBank as the reference sequences (Table S1).

| Genetic diversity and population genetic differentiation
The nucleotide sequences of 111 PLDMV isolates were aligned by MAFFT v7.149b software (Katoh & Standley, 2013). Nucleotide sequence identity matrices were calculated using BioEdit software after all gaps were removed (Hall, 1999). The Boxplots of 111 Guangdong and Hainan isolates and three representative isolates (J56P, J199C, and WF) previously reported from Japan and Taiwan China were mapped using R 2.9.1 (R Project for Statistical Computing website). The amino acid sequences of 10 representative isolates from Guangdong and Hainan and three isolates (J56P, J199C, and WF) from Japan and Taiwan China were selected for further analysis of mutation sites. After sequence comparison with NCBI, the conserved regions and mutations were analyzed and edited in Word software.
Haplotype and nucleotide diversities were estimated using DnaSP 5.0 (Librado & Rozas, 2009). Haplotype diversity refers to the frequency and number of haplotypes in the population. Nucleotide diversity estimates the average pairwise differences among the sequences. Genetic differentiation among populations was calculated by F ST using Arlequin 3.5 (Excoffier & Lischer, 2010). Genetic differentiation among populations was also calculated by K ST and S nn using DnaSP 5.0 (Hudson et al., 1992;Librado & Rozas, 2009). The hypothesis of deviation from the null population differentiation was tested by 1,000 permutations of the raw data. The level of gene flow between populations was measured using the N m value calculated by DnaSP 5.0 (Librado & Rozas, 2009).

| Recombination analysis
The nucleotide sequences of 111 PLDMV isolates from Guangdong and Hainan were subjected to recombination analysis using seven methods (RDP, GENECONV, BOOTSCAN, MaxChi, CHIMAERA, SiSCAN, and 3SEQ ) implemented in the RDP v4.71 software package (Martin et al., 2015). The probability of a putative recombination event was corrected by a Bonferroni procedure with a cutoff of p-value <.01. To avoid misidentification, only events supported by at least four of the seven methods were considered to be recombinants.

| Phylogenetic analysis
The CP nucleotide sequences of the PLDMV isolates were aligned using the Muscle algorithm (Edgar, 2004) implemented in MEGA 6 (Kumar et al., 2018). A phylogenetic tree of the PLDMV isolates excluding the recombinants was reconstructed by the neighborjoining method (Saitou & Nei, 1987) (Tajima, 1989). Fu and Li's D and Fu and Li's F values are sensitive to population expansion and are usually negative when the population is expanded (Fu & Li, 1993).

| Field investigation of the virus infection
In recent years, the disease prevalence in Guangdong province was generally 5%-30% but ≥60% in few seriously diseased areas, whereas that in Hainan province was usually 80% but 90%-100% in some areas. This finding suggests that "  (Table 1).

| Symptoms of squash and papaya plants inoculated with representative isolates
The inoculation results of three repetition showed that all of the representative isolates from four groups of the samples caused similar viral symptoms in the inoculated plants of papaya. Specifically, in the early stage (10-15 dpi), mosaic, irregular chlorosis, and green islands on the young leaves of papaya were observed. In the later stage (30 dpi), the leaves were gradually deformed in the shape of chicken feet. However, no symptom was observed on the squash plants ( Figure 2). The total RNAs were extracted from the inoculated leaves of plants, and the PLDMV CP sequences were amplified by RT-PCR. The results showed that PLDMV was detected in inoculated papaya but not in squash. The products of RT-PCR from the inoculated papaya plants were further cloned and sequenced, and the results showed that the nucleotide sequences were essentially coincident with those of the original inoculated isolates. These results indicate that the PLDMV isolates from Guangdong and Hainan provinces can infect transgenic papaya but not squash.  Figure 3). Five representative isolates from Guangdong (GZ20, GZ41, FM80, FM186, and NS195) and five isolates from Hainan (S3, S19, SD11, SD55, and HA5) were selected for comparison of the amino acid sequence of CP with three Japan and Taiwan China isolates (J56P, J199C, and WF). As a result, multiple sites of variation were found ( Figure 4). Among them, only J199C lacks two amino acids in "EK" (glutamic acid and lysine) repeat region. The E in the partial region is converted to N, presenting an "NK" (asparagine and lysine) repeat region that is different from other isolates of potyvirus. In addition, the amino acids of 10 isolates at the three sites at the N-terminus were different from those of J56P, J199C, or WF, that is, from proline (Pro8) to serine (Ser), alanine (Ala10) to valine (Val), and alanine (Ala46) to threonine (Thr). The two sites were changed at the C-terminal, that is, alanine (Ala237) to serine (Ser) and alanine (Ala277) to threonine (Thr). However, only one site was different (Asn117 and Asp117) between the Guangdong and Hainan isolates.

| Analysis of sequence variation in the CP regions of PLDMV
Meanwhile, three conserved DAG, WCIEN, and QMKAAA regions were found in the 13 isolates.

| Analysis of genetic differentiation
The genetic differentiation of 111 CP sequences of the Guangdong

| Recombinant analysis
The RDP software was used to detect the possible recombination events of 111 isolates in Guangdong and Hainan. The results showed that none of the recombination event was supported by at least four of the seven methods. This result indicates that recombination does not occur in all of the Guangdong and Hainan isolates.

| Phylogenetic analysis
A total of 122 CP nucleotide sequences were used to construct a phylogenetic tree by combining 113 sequences from Guangdong and Hainan isolates with 9 sequences from Japan and Taiwan China isolates downloaded from NCBI ( Figure 5). The phylogenetic tree diagram showed that 122 isolates were divided into two groups ( Figure 5). Japan and Taiwan China isolates were clustered into group I, whereas Guangdong and Hainan isolates were clustered into group II, which was further divided into two subgroups. Most Hainan isolates were grouped into subgroup I, and all Guangdong isolates were grouped into subgroup II. The genetic distance value was 0.046 ± 0.006 between groups I and II, 0.040 ± 0.005 within group I, and 0.005 ± 0.001 within group II (

| Selection pressures acting on the CP regions of PLDMV isolates
Numerous selection pressure sites (Table 4)

| Neutrality tests
Although Tajima's D, Fuand Li's D and Fu, and Li's F values for two groups from Guangdong and Hainan were all negative, the data were

F I G U R E 4
The amino acid sequences of the coat protein of papaya leaf distortion mosaic virus isolates from Japan and Southern China showing the deletions (−) in the J199C isolate, conserved DAG, WCIEN, and QMKAAA domains (blue regions) and mutation sites. J56P and J199C were the previously reported main isolates in Japan, and WF was the main isolate in Taiwan China. GZ20, GZ41, FM80, FM186, and NS195 are Guangdong isolates, while S3, S19, SD11, SD55, and HA5 are Hainan isolates. Different letters in the same column indicate different amino acids in that position. The yellow regions mean the glutamic acid and lysine (EK) repeat patterns, the purple regions mean the different amino acids between this study's isolates and three previously predominant isolates in Japan and South China, and the green region means the different amino acid between Guangdong and Hainan isolates. Multiple alignments were performed with CLUSTALW program included in MEGA X

TA B L E 2 Statistical tests for differentiation between two papaya leaf distortion mosaic virus populations from Guangdong and Hainan based on the CP nucleotide sequences
.000 * 1.000 .000 * 0.530 0.220 Note: K ST , S nn , and N m were implemented in DnaSP 5.10, while F ST was in Arlequin 3.5. The hypothesis of deviation from null population differentiation was tested by 1,000 permutations of the raw data.
*0.01 < p < .05; **.001 < p<.01; ***p < .001. significant or extremely significant (Table 5). The results indicate that PLDMV populations in Guangdong and Hainan deviate from neutral mutation and that the two populations may not be stable.

| D ISCUSS I ON
Papaya is an important fruit crop that is widely cultivated in tropical and subtropical regions of countries worldwide (Manshardt, 1992). PRSV is the most widespread and damaging virus that infects papaya and the main factor limiting the development of papaya industry (Mishra et al., 2016;Yeh et al., 2010). The largescale commercial cultivation of antiviral transgenic papaya plants has been used to successful control the disease in Hawaii and South China for a long time (Gonsalves, 1998;Li et al., 2007;Mishra et al., 2016;Tecson Mendoza et al., 2008;Yeh et al., 2010).
However, recent studies have found that PRSV infects "Huanong No.1" transgenic papaya plants in some regions of South China, F I G U R E 5 Phylogenetic tree of 111 isolates of papaya leaf distortion mosaic virus from Guangdong and Hainan and 11 isolates from Japan, Taiwan China, and Hainan derived from GenBank was reconstructed by neighbor-joining method implemented in MEGA6 based on the CP nucleotide sequences. The sequenced PLDMV as reference sequences presented with GenBank Accession number followed the region code: HN for Hainan, TW for Taiwan China and JP for Japan. ▲ is for Hainan, and ■ is for Guangdong and the occurrence tendency increases gradually with time (Zhao et al., 2015). We found that J199C lacked two amino acids in the "EK" (glutamic acid and lysine) repeat region, and E was converted to N, showing an "NK" repeat region different from that of other potyviruses. The "EK" region is located on the outer surface of CP (Shukla & Ward, 1989), which is closely related to aphid transmission element (Atreya et al., 1990). Thus, the change of amino acids in the "EK" repeat region may affect the spread of PLDMV by aphids. Given that PRSV has been restricted for a long time by the widely grown commercial anti-PRSV transgenic papaya in South China, PLDMV may have evolved a new "EK" repeat region suitable for aphid transmission and gradually disseminated and substituted for the original PRSV, which may also be the reason why PLDMV can spread rapidly once it is found in the field. Outside of the "EK" domain, five amino acid sites (three at the N-terminus and two at the C-terminal) in the 10 representative isolates from Guangdong and Hainan were different from those in J56P, J199C, and WF. The N-and C-terminals of CP are related to the long-distance movement, viral particle assembly, and interaction with the 3′-UTR of the virus (Dolja et al., 1995;Jagadish et al., 1991;Rojas et al., 1997;Varrelmann & Maiss, 2000;Yusibov et al., 1996;Zamora et al., 2017). A point mutation from Ser47 to Pro of pea seed-borne mosaic virus or from Asp5 to Lys of tobacco vein mottling virus in the N-terminal of CP alters the systematic spread ability of viruses in Chenopodium quinoa or tobacco, respectively (Andersen & Johansen, 1998;López-Moya & Pirone, 1998).
A single substitution (Ser7 to Gly) at the CP N-terminal of potato virus A reduces virus accumulation by 10-fold but does not affect the rate of systematic transportation (Andrejeva et al., 1999). In the present study, most of the variation sites were located at the N-or C-terminals of CP. Given that China and Japan isolates are much different in terms of virus hosts, we speculate that those variations may contribute to difference of the host range. This study also found that one site (Asn117 to Asp) of Guangdong and Hainan isolates was TA B L E 3 Genetic distances between and within variant groups of papaya leaf distortion mosaic virus isolates based on the CP sequences inconsistent, which provides a basis for distinguishing isolates from the two regions.
Among the indicators for studying species diversity, the haplotype diversity index (Hd) and nucleotide diversity index (Pi) are widely used. Hd refers to the frequency of two different haplotypes randomly selected in sample (Excoffier et al., 1992), and Pi refers to the average number of nucleotide differences or substitutions of a set of DNA sequence unit sites in a population (Nei & Li, 1979). Grant and Bowen (1998)  showed that all of the PLDMV isolates were distinctly divided into two groups ( Figure 5). Although Guangdong and Hainan isolates were divided into different subgroups on the basis of geographical locations, they all belong to group II, whereas J56P, J199C, WF (Bau et al., 2008;Maoka et al., 1995), and other Japan isolates and Taiwan China isolates were classified as group I. The genetic distance values of the isolates between group II and group I were 0.046 ± 0.006, which was significantly greater than that within in groups I or II, which was 0.040 ± 0.005 or 0.005 ± 0.001, respectively. This result suggests that PLDMV isolates from Guangdong and Hainan have evolved into a new lineage.
Recombination is an important factor to promote virus evolution, which can increase genomic biodiversity (Dietrich et al., 2007), reduce mutations in specific genome sequences (Worobey & Holmes, 1999), and restore genome integrity (Garcíaarenal et al., 2001). In addition, recombination enhances the virulence of the virus and extends its host ranges (Escriu et al., 2007). In the present study, obvious recom-  (Whitlock & Mccauley, 1999). When N m < 1, genetic drift between groups is prone to occur, gene flow between groups is not frequent, and genetic differentiation is large; when N m > 1, gene flow can occur between groups through some channel or the geographical distance between two groups is within the group. In the present study, the N m value of PLDMV populations in Guangdong and Hainan was less than 1, indicating that the gene flow between populations is infrequent and that genetic differentiation is obvious, which is consistent with the high genetic differentiation between PLDMV populations obtained from K ST or F ST values.
Given that Guangdong and Hainan provinces are separated by the Qiongzhou Strait, they may have relatively limited migration and diffusion opportunities between different geographical populations, thus limiting gene exchange between populations to some extent.
The positive selection of viral population may endow the virus increased fitness to adapt to new hosts or environments, whereas rapid divergence driven by positive selection has been seldom demonstrated (Nielsen, 1999 (Du et al., 2013). These differences may lead to the gradual differentiation of Guangdong and Hainan isolates and eventually induce those isolates to evolve into two subgroups. Tajima (1989) has developed a statistical method for testing the neutral mutation hypothesis by using the average number of nucleotide differences and the number of segregating sites. If a population experiences a bottleneck or balance, Tajima's D value is significantly higher than zero. If a population experiences a size expansion or directional selection, Tajima

ACK N OWLED G M ENTS
The authors sincerely thank Mr. Fubin Deng for his assistance of the field investigation and sample collection.

CO N FLI C T O F I NTE R E S T S
The authors declare that there are no competing interests.