A putative antimicrobial peptide from Hymenoptera in the megaplasmid pSCL4 of Streptomyces clavuligerus ATCC 27064 reveals a singular case of horizontal gene transfer with potential applications

Abstract Streptomyces clavuligerus is a Gram‐positive bacterium that is a high producer of secondary metabolites with industrial applications. The production of antibiotics such as clavulanic acid or cephamycin has been extensively studied in this species; nevertheless, other aspects, such as evolution or ecology, have received less attention. Furthermore, genes that arise from ancient events of lateral transfer have been shown to be involved in important functions of host species, and this approach has uncovered relevant genes that other genomic analyses overlooked. Thus, we studied the impact of horizontal gene transfer on the S. clavuligerus genome. To perform this task, we applied whole‐genome analysis to identify sequences laterally transferred from other domains of life. The most relevant result was a putative antimicrobial peptide (AMP) with a clear origin in the Hymenoptera order of insects. Next, we determined that two copies of this gene were present in the megaplasmid pSCL4 but absent from the S. clavuligerus ATCC 27064 chromosome. Additionally, we found that these sequences were exclusive to the ATCC 27064 strain (and so were not present in any other bacteria), and we verified the expression of the genes using RNAseq data. Next, we used several AMP predictors to validate the original annotation extracted from Hymenoptera sequences and explored the possibility that these proteins undergo post‐translational modifications using peptidase cleavage prediction. We suggest that the Hymenoptera AMP‐like proteins of S. clavuligerus ATCC 27064 may be useful both for the adaptation of the species and as antimicrobial molecules with industrial applications.

Streptomyces clavuligerus ATCC 27064 contains a linear chromosome (6.7 Mb), three linear short plasmids, pSCL1 (10.5 kb), pSCL2 (149.4 kb), and pSCL3 (442.2 kb), and a linear megaplasmid named pSCL4 (1.8 Mb) (Song et al., 2010). These plasmids can recombine with the chromosome, enhancing the genetic interchange of secondary metabolites (Medema et al., 2010). Interestingly, S. cattleya NRRL 3841 and S. lipmannii NRRL 3584 are two species of the genus lacking plasmids (Netolitzky, Wu, Jensen, & Roy, 1995), and even the strain S. clavuligerus F613-1 has a different number of plasmids (Cao et al., 2016). These findings suggest that plasmids could provide a vehicle for genetic interchange by horizontal gene transfer or some other mechanism in this species and even in the genus.
Horizontal gene transfer (HGT) corresponds to the transmission of genetic information between related or unrelated organisms (Soucy, Huang, & Gogarten, 2015). This mechanism has been recognized as a driving evolutionary force, and it allows for gene flow between distant evolutionary taxa (Boto, 2010;Syvanen, 2012).
Therefore, HGT provides a new combination of sequences that usually confer selective advantages to the host, and it can survive in the "new" genome for long periods of time (Soucy et al., 2015).
Furthermore, there are neutral transferred genes that do not confer an immediate benefit but could provide material for variation and posterior innovations (Boto, 2010;Syvanen, 2012).
Horizontal gene transfer has traditionally been associated with prokaryotes, in which its mechanisms (conjugation, transformation, and transduction) are well understood. However, the advent of sequencing technologies has uncovered abundant evidence that this process also occurs in eukaryotes (Dunning Hotopp et al., 2007; Hirt, Alsmark, & Embley, 2015; Husnik et al., 2013; Sieber, Bromley, & Dunning Hotopp, 2017; Wu et al., 2013). Well-known instances include the transfer of genes from organelles, such as chloroplasts and mitochondria, to the nucleus (Kleine, Maier, & Leister, 2009) and reported cases of gene transmission in symbiotic relationships between prokaryotes and eukaryotes (Acuña et al., 2012; Boto, 2014; Dunning Hotopp, 2011; Klasson et al., 2014). Examples of HGT involving animals are detected less frequently, presumably because the germline is isolated from somatic cells, meaning that contact with foreign DNA is reduced (Acuña et al., 2012). Despite this observation, several studies have reported animals as recipients in this process. Genomic sections of the endosymbiotic bacterium Wolbachia pipientis have been found in fruit flies and wasps (Conner et al., 2017; Dunning Hotopp et al., 2007). Notably, HGT in the opposite direction has also been observed: genomic regions of the mosquito Aedes aegypti were detected in W. pipientis. The data support the hypothesis that those genes are still expressed after an extended evolutionary period because the transfer has functional significance and produces evolutionary innovation (Woolfit, Iturbe-Ormaetxe, McGraw, & O'Neill, 2009).
Antimicrobial peptides (AMPs), also known as host defense peptides, are molecules from the innate immune system of all living organisms that protect against a host of bacterial, yeast, fungal, and viral infections and modulate immune responses during infections (Zhang & Gallo, 2016). The best-known mode of action of these molecules involves the membrane disruption of microbes by forming cavities and producing cell death (Mahlapuu, Håkansson, Ringstad, & Björn, 2016). However, mechanisms affecting intracellular processes such as cell wall formation, DNA, RNA and protein synthesis, and protein folding have also been described (Cudic & Otvos, 2002;Ho, Shah, Chen, & Chen, 2016;Le, Fang, & Sekaran, 2017). Moreover, certain AMPs have been reported as products of an HGT event.
Finally, a similar case was suggested for thaumatins from nematodes, ticks, and insects, which probably appeared due to HGT from plants (Petre, Major, Rouhier, & Duplessis, 2011).
Antimicrobial peptides exhibit certain properties that give them extraordinary antimicrobial activity and, in some cases, even antitumoral activity. Their short length (usually <100 residues) enables the easy penetration of membranes. In the case of antibacterial peptides, a net positive charge ensures electrostatic interaction with negatively charged membranes as well as selective action against prokaryotes, given the neutrality of eukaryotic membranes.
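The two properties described above (short length and net positive charge) can be checked with a minimal sketch. It assumes a deliberately crude charge model at neutral pH in which Lys and Arg count +1 and Asp and Glu count -1, while histidine and the free termini are ignored; the function names are hypothetical, and the example sequence is the 12-residue fragment of EFG03676 reported in the Results.

```python
# Crude net-charge estimate at neutral pH (assumption: K/R = +1, D/E = -1;
# histidine and the free N/C termini are ignored for simplicity).
POSITIVE = set("KR")
NEGATIVE = set("DE")


def net_charge(peptide: str) -> int:
    """Return the approximate net charge of a peptide sequence."""
    peptide = peptide.upper()
    positives = sum(aa in POSITIVE for aa in peptide)
    negatives = sum(aa in NEGATIVE for aa in peptide)
    return positives - negatives


def is_short_cationic(peptide: str, max_len: int = 100) -> bool:
    """True if the peptide is shorter than max_len and has a positive net charge."""
    return len(peptide) < max_len and net_charge(peptide) > 0


fragment = "TIRFKSQRGHGI"  # residues 126-137 of EFG03676, as given in the text
print(len(fragment), net_charge(fragment), is_short_cationic(fragment))
```

A more careful estimate would use residue pKa values and the actual pH; this count is only meant to illustrate why a Lys/Arg-rich fragment is a plausible membrane-interacting candidate.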
Antimicrobial peptides have been classified based on their biological activity (antibacterial, antiviral, antifungal, anticancer, and so on), synthesis machinery (gene encoded and nongene encoded), and different properties, such as peptide charge, length, hydrophobic content, chemical modifications, and others (Wang, 2017). The most widespread classification is based on the secondary structure of peptides and consists of four families (α-helices, β-sheets, αβ structures, and non-αβ structures) determined by the presence or absence of α and β secondary structures in the three-dimensional structures (Wang, 2017). However, most AMPs lack a known 3D structure; thus, a classification has been proposed considering the connection mode of polypeptide chains (Wang, 2015).
Insects are one of the largest producers of AMPs in nature. This is explained by the lack of an adaptive immune system in these animals, meaning that they need to produce a large amount of AMPs from different types as a defense mechanism against pathogens (Bulet & Stöcklin, 2005;Mylonakis, Podsiadlowski, Muhammed, & Vilcinskas, 2016;Yi, Chowdhury, Huang, & Yu, 2014). They release AMPs from the fat body (analogous to a vertebrate's liver) into the hemolymph; thus, they can be distributed throughout the insect body, although other epithelial cells and hemocytes in certain species can also secrete AMPs (Bulet & Stöcklin, 2005). Other important sources of AMPs are insect venom (e.g., melittins and apamins from bees and mastoparans from wasps), which serves as a defense against pathogens or for infecting prey (Moreno & Giralt, 2015), and saliva (e.g., drosomycin from flies and termicin from termites) (Bulet & Stöcklin, 2005;Mylonakis et al., 2016), which protects eggs from infections (Józefiak & Engberg, 2017).
In this study, we report and characterize a putative AMP of the S. clavuligerus strain ATCC 27064 transferred from Hymenoptera insects. We used several programs to predict the AMP annotation.
Additionally, public RNAseq data were used to corroborate the expression of the transferred gene, and we predicted the potential cleavage sites of the protein in congruence with a post-translational scenario. To our knowledge, this is the first time that such an event has been reported and provides hints of the complexity of S. clavuligerus and its capacity to produce antimicrobial compounds.

| Alignments and phylogenetic reconstruction
We performed a global alignment of the BLAST hit candidates. DNA alignments were also performed with MAFFT 7 using the same parameters described above.

| Contamination elimination and HGT region determination
To avoid obtaining HGT candidates due to contamination in the S. clavuligerus genome project, we searched for proteins similar to our HGT candidates through BLASTp searches in the PATRIC database (https://www.patricbrc.org/). The first searches were performed using PATRIC's representative proteome database (e-value threshold 10). Next, we used the Streptomyces database and finally the S. clavuligerus database, which contains the complete proteomes of the two strains (ATCC 27064 from three different projects and F613-1) plus several plasmids.
To verify the expression of HGT candidates, we used the RNAseq data available in GEO (GSE104738, bioproject PRJNA413703). These data contain deep sequencing of S. clavuligerus (strains F613-1 and ATCC 27064) mRNA. We selected the genetic region that encodes one of the HGT protein candidates and used it as a template to map reads from the RNAseq experiments. To perform the mapping, we used the Geneious mapper (up to five iterations) from Geneious 10 (http://www.geneious.com; Kearse et al., 2012).
To detect putative HGT regions in the plasmid pSCL4 of S. clavuligerus ATCC 27064, we used the software Alien Hunter (Vernikos & Parkhill, 2006), optimizing predicted boundaries with a change-point detection 2 state 2nd order HMM.

| AMP candidate annotation and posttranslational modification prediction
Annotation of our candidates as AMPs was made through several web servers available for this purpose. To estimate the accuracy of the AMP prediction methods, we selected a list of verified AMPs as positive controls and housekeeping genes as negative controls (see Supporting Information Table S1).

| Interdomain HGT candidates in Streptomyces clavuligerus ATCC 27064
We applied a pipeline adapted from Armijos-Jaramillo et al. (2015) and Armijos-Jaramillo, Santander-Gordón, Soria, Pazmiño-Betancourth, and Echeverría (2017).
Thus, it could be that the original copy of the gene was in the chromosome and was then excised to the plasmid. An alternative scenario is the appearance of one copy in the pSCL4 megaplasmid without passing through the chromosome. In any case, the similarity between these two paralogs suggests a recent duplication event.
The topology observed in the tree (Figure 1) strongly suggests that the direction of the transfer was from Hymenoptera to S. clavuligerus, or at least that this is the last bacterial descendant (discovered so far) that maintains a copy of that gene in its lineage.
Despite the fact that the origin of HGT events cannot be determined with certainty, several reports conclude that close ecological interactions increase the chances of lateral gene transmissions (Degnan, 2014;Venner et al., 2017;Wang & Liu, 2016).
In that sense, interactions between Streptomyces species and Hymenoptera have been repeatedly reported. That is the case for the fungus-growing ants, which use antibiotics produced by Streptomyces bacteria to protect their crops (Currie, Scott, Summerbell, & Malloch, 1999

| Expression evidence of the HGT candidates
We mapped RNAseq data from S. clavuligerus ATCC 27064 to the genomic scaffold DS570654 (BioProject PRJNA28551), from 42,968 to 43,734. This region is the same in the three sequencing projects (PRJNA28551, PRJNA42475, and PRJNA19249) and encodes one of the two copies of Hymenoptera AMP-like proteins (EFG03588). We observed a clear expression signal along this region (Figure 2).

| Inference of HGT recentness
Lawrence and Ochman (1997) proposed that foreign genes undergo amelioration, adapting their sequence to the host genome. This proposition suggests that laterally transferred genes can be differentiated from indigenous ones by features specific to the donor and recipient genomes. However, because of amelioration itself, only "recently" transferred genes can be identified by this approach: genes transferred long ago ("old" transferred genes) have evolved to resemble the genomic features of the recipient. Considering this argument, we predicted unusual regions in the pSCL4 plasmid genome (contig CM001019).
We performed this task through interpolated variable order motifs, a technique implemented in the software Alien Hunter (Vernikos & Parkhill, 2006). This program located six potentially foreign regions in pSCL4: 1-7,500, 19,165-26,714, 46,125-62,763, 73,381-80,082, 102,991-113,397, and 134,187-144,263 (these positions are relative to CM001019). The coding gene of EFG03588 is located in the region 108,662-109,105 and that of EFG03676 in the region 196,642-197,085. Thus, only the coding gene of EFG03588 is located inside a region that Alien Hunter predicted as HGT. EFG03676 and EFG03588 are 98% identical; thus, it is improbable that the difference in the predicted regions was produced by these sequences.
[...] before the appearance of Apocrita (Hymenoptera suborder). In agreement with the TimeTree database (Kumar, Stecher, Suleski, & Hedges, 2017), this event occurred between 227 and 254 million years ago. The correlation of this time range with HGT occurrence should be considered with caution because it is based upon the impossibility of associating S. clavuligerus proteins with a particular Apocrita clade. When more Apocrita sequences become available, a more precise determination of the time of lateral transfer can be achieved. Additionally, it may become possible to identify the specific Apocrita species involved in the transfer.
In this scenario, the estimated time of the transfer could be ostensibly lower.
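The comparison between the gene coordinates and the Alien Hunter regions listed in this section amounts to an interval-containment check. The following is a minimal sketch, assuming that all coordinates are 1-based, inclusive, and relative to CM001019 exactly as printed in the text; the function and variable names are hypothetical.

```python
# Alien Hunter regions on pSCL4 as listed in the text (1-based, inclusive).
ALIEN_REGIONS = [
    (1, 7_500),
    (19_165, 26_714),
    (46_125, 62_763),
    (73_381, 80_082),
    (102_991, 113_397),
    (134_187, 144_263),
]

# Coding-gene coordinates as printed in the text.
GENES = {
    "EFG03588": (108_662, 109_105),
    "EFG03676": (196_642, 197_085),
}


def containing_region(gene, regions):
    """Return the region that fully contains the gene, or None if there is none."""
    start, end = gene
    for region_start, region_end in regions:
        if region_start <= start and end <= region_end:
            return (region_start, region_end)
    return None


for name, coords in GENES.items():
    print(name, coords, containing_region(coords, ALIEN_REGIONS))
```

Full containment is used here as the criterion; a partial-overlap test would be an equally reasonable design choice for genes that straddle a predicted boundary.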

| Size estimation of the HGT region
We aligned the coding sections of the Hymenoptera AMP-like paralogs (EFG03676 and EFG03588) and their neighbor genes in the pSCL4 genome (Figure 3a). In this alignment, we identified a region of 736 bp with 97.8% identity that contains the coding genes of EFG03676 and EFG03588, the last section of the upstream genes coding for the hypothetical proteins EFG03677 and EFG03589, and part of the intergenic section before the downstream genes that encode proteins EFG03675 and EFG03587. Thus, we propose that the entire section of 736 bp was duplicated inside the pSCL4 plasmid.
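As an illustration of how a percent identity such as the one reported for the duplicated region can be computed from a pairwise alignment, a minimal sketch follows. It assumes one particular convention (identical columns divided by all alignment columns, excluding columns that are gaps in both sequences); alignment programs differ in the denominator they use, and the sequences below are toy examples, not the pSCL4 data.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two rows of a pairwise alignment ('-' = gap)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Drop columns where both rows carry a gap; they hold no information.
    columns = [
        (a, b)
        for a, b in zip(aligned_a.upper(), aligned_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        raise ValueError("alignment has no informative columns")
    # A gap against a residue counts as a mismatch under this convention.
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Toy alignment (hypothetical sequences): one mismatch in ten columns.
print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))
```

Under this convention, 97.8% identity over 736 aligned columns would correspond to roughly 720 identical positions.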
To explore the size of the HGT event in the current S. clavuligerus genome, we performed a BLASTn search using the 736-bp section annotated as duplicated as a query. We found that only a region of 293 nucleotides has sufficient similarity to retrieve results from Hymenoptera genome sequences (this region excludes the signal peptide) (Figure 3b). In addition, we performed BLASTp searches of the contiguous proteins of AMP-like sequences (data not shown).
The results for the left contiguous sequences of EFG03676 and EFG03588 showed that these proteins had a vertical transmission phyletic pattern and were highly similar to other Streptomyces proteins. By contrast, the BLAST results for the right contiguous sequences of EFG03676 and EFG03588 showed similarity only to themselves or to partial sequences of S. clavuligerus. This result suggests either a misannotation or that these proteins are unique to S. clavuligerus. In any case, the sequences contiguous to the Hymenoptera AMP-like proteins showed no HGT signal, at least in the current annotation state.
FIGURE 3 (a) Schematic view of the alignment between CM00914 sections that contain the coding regions of EFG03676-EFG03588 paralogs and their neighbor genes. The section labeled "Duplicated region" shares 97.8% identity. (b) Query centric view of the BLASTn performed on the duplicated region observed in (a). BLAST hits with NCBI codes are shown in the left column with the higher taxonomic level. The region with metazoan hits was annotated as the putative HGT region. The percentage of identity is proportional to the color intensity (with black denoting more likeness and white, less). The genomic annotations are labeled over and under the alignment in yellow or gray.

| Antimicrobial peptide prediction
To explore whether the HGT candidates maintain the same annotation as their xenologs, we used several AMP prediction software platforms (see Section 2). First, we established the sensitivity and specificity of each predictor (Supporting Information Table S1) because we observed heterogeneity in the results obtained from preliminary analysis. Next, we submitted the candidate sequences to the predictors with the highest sensitivity and specificity values (APD3, AMPA, MLAMP, and AMP Scanner Vr.2).
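The benchmarking step described above can be sketched as follows, assuming that each predictor returns a yes/no AMP call for every control sequence. Sensitivity is the fraction of verified AMPs called as AMPs, and specificity is the fraction of housekeeping proteins not called as AMPs. The function name and the control calls below are hypothetical and are not the values of Table S1.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    positive_calls: Sequence[bool], negative_calls: Sequence[bool]
) -> Tuple[float, float]:
    """Compute (sensitivity, specificity) for one predictor.

    positive_calls: prediction for each verified AMP (True = called AMP).
    negative_calls: prediction for each housekeeping protein (True = called AMP).
    """
    if not positive_calls or not negative_calls:
        raise ValueError("both control sets must be non-empty")
    true_positives = sum(bool(call) for call in positive_calls)
    true_negatives = sum(not call for call in negative_calls)
    sensitivity = true_positives / len(positive_calls)
    specificity = true_negatives / len(negative_calls)
    return sensitivity, specificity


# Hypothetical calls for a single predictor (illustrative values only).
amp_controls = [True, True, True, False]
housekeeping_controls = [False, False, True, False]
sens, spec = sensitivity_specificity(amp_controls, housekeeping_controls)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
```

Ranking predictors on both values, rather than on sensitivity alone, guards against tools that label nearly every sequence as an AMP.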
APD3 predicted the entire EFG03588 and EFG03676 sequences as AMPs. Its output provides estimations of physicochemical parameters and sequence structure. In the case of the HGT candidates, the formation of alpha helices and hydrophobic surfaces was predicted, so they may interact with membranes and function as AMPs.
AMPA predicted the peptide TIRFKSQRGHGI (from 126 to 137) of the sequence EFG03676 as an AMP; meanwhile, the sequence EFG03588 was not predicted as an AMP. Both sequences differ in position 134 with G instead of S. Using the regions between 126 and 137 of the sequences EFG03588 and EFG03676, the MLAMP program predicted these peptides as AMPs with a probability of 0.8 and 0.9, respectively, and as antibacterial AMPs with probabilities higher than 0.8 in both cases. AMP Scanner Vr.2 predicted that neither the entire sequence nor the fragments are AMPs.
Despite the increasing number of AMP predictors (Liu, Fan, Sun, Lao, & Zheng, 2017; Porto, Pires, & Franco, 2017) and the various approaches used for this task, the diversity of AMPs makes the accurate prediction of these molecules difficult. Considering this limitation, the prediction of our candidates as AMPs by 3 of the 4 most sensitive programs evaluated in this study makes us confident of the veracity of the annotation.

| Post-translational modification of AMP candidates
Given the predictions of the AMPA and MLAMP predictors, we explored the possibility that our candidates were encrypted. Such peptides need post-translational processing from a mature protein in order to be functional (Brand et al., 2012). There is evidence that many AMPs originate from precursor proteins, so these sequences have to undergo cleavage or other modifications in order to be functional (Zhang & Gallo, 2016). Accordingly, the APD database (http://aps.unmc.edu/AP/) has a collection of 24 different post-translational modifications that are common among AMPs.
We predicted the cleavage sites of our candidate proteins using the web server PROSPER and found that they could be cut at different positions by several proteases. Remarkably, the server predicted cleavage sites (for serine proteases) near the AMP predicted by the AMPA and MLAMP programs. The PROSPER results suggest that an elastase-2 cuts the sequence EFG03588 at positions 25, 96, 99, 127, 140, and 141 and the sequence EFG03676 at positions 25, 33, 96, 99, 127, 140, and 141. The region between the cleavage sites at residues 127 and 141 includes the peptide predicted with AMPA and MLAMP. The peptide resulting from cleavage at these two positions (RFKSQRG/SHGIDFV) was also predicted as an AMP by the AMPA and MLAMP programs.
Elastase-2 belongs to the serine protease family, whose members are widely distributed in all kingdoms of life (Tripathi & Sowdhamini, 2008), including S. clavuligerus, which has thirteen of these proteins annotated in its proteome. We performed BLASTp searches of S. clavuligerus serine proteases in the MEROPS database (https://www.ebi.ac.uk/merops/index.shtml) and found eight serine proteases of the S1 family, four of the S8 family and one of the M6 family. Elastase-2 belongs to the S1A subfamily, and we found three of these sequences in the S. clavuligerus proteome. The presence of these enzymes suggests that S. clavuligerus possesses the machinery to cleave the EFG03676 and EFG03588 proteins. What is more, the signal peptide of these proteins could be used to translocate the sequence to some transporter (like an ABC transporter) to process the protein and then secrete the peptide outside the cell. A similar mechanism has been observed in class IIa bacteriocins (Drider, Fimland, Héchard, McMullen, & Prévost, 2006; Perez, Zendo, & Sonomoto, 2014). In this scenario, Hymenoptera AMP-like sequences could be translocated, guided by the signal peptide, and then undergo post-translational proteolytic cleavage by serine proteases (and likely many other enzymes and complexes) before being secreted outside the cell (Figure 4).
Although some other peptides (especially with cyclic conformation) with antimicrobial and/or antitumoral effects have been reported in the Streptomyces genus (Shetty, Buddana, Tatipamula, Naga, & Ahmad, 2014; Um et al., 2013; Zhou et al., 2014), this is the first study to detect a laterally transferred AMP in this group of organisms. More than 2,500 natural or encrypted AMPs have been described previously (Zhang & Gallo, 2016). However, some of those have low antimicrobial activity or an unknown function. By contrast, stably retained HGT genes have been shown to play relevant roles in recipient organisms (Belbahri, Calmin, Mauch, & Andersson, 2008; Friesen et al., 2006; Slot & Rokas, 2010; Tiburcio et al., 2010; Wenzl, Wong, Kwang-won, & Jefferson, 2005). The rationale behind this observation is that the increase in fitness generated by the HGT gene in the recipient organism compensates for or exceeds the negative selection that a gene of this type would undergo: a foreign gene (especially in interdomain HGT) must overcome the incompatibility of promoters and an alternative genetic code, not to mention the cost generated by the synthesis and replication of new genetic material, especially in prokaryotic genomes, which tend to be highly efficient (Kuo & Ochman, 2009; Mira, Ochman, & Moran, 2001). Thus, the Hymenoptera AMP-like genes in S. clavuligerus ATCC 27064 should play an important role in this strain. This assertion is supported by the estimated time during which the Hymenoptera transferred genes have been retained in the S. clavuligerus genome. Additionally, the presence of two copies in the genome suggests an expansion of the family, perhaps to produce a dosage effect. These lines of evidence lead us to propose that Hymenoptera AMP-like proteins could be useful antimicrobial molecules for S. clavuligerus ATCC 27064 with several biotechnological applications that can be explored.

ACKNOWLEDGMENTS
This research was supported by internal funds of the Universidad de Las Américas-Ecuador under the project BIO.VA.17.04. We would also like to thank Jennifer Gallardo for her valuable work on this study.

CONFLICT OF INTEREST
None declared.

AUTHOR CONTRIBUTIONS
SA-R performed experiments and wrote the first draft of the manuscript; DS-G performed experiments and edited the manuscript; ET and YP-C helped in experimental design and in manuscript editing; and VA-J designed and performed the experiments and wrote and edited the manuscript.