Triggering of cancer cell cycle arrest by a novel scorpion venom‐derived peptide—Gonearrestide

Abstract In this study, a novel scorpion venom‐derived peptide named Gonearrestide was identified in an in‐house constructed scorpion venom library through a combination of high‐throughput NGS transcriptome and MS/MS proteome platforms. In total, 238 novel peptides were discovered from two scorpion species, and 22 peptides were selected for further study after a battery of functional prediction analyses. Following a series of bioinformatics analyses alongside in vitro biological functional screenings, Gonearrestide was found to be a highly potent anticancer peptide that acts on a broad spectrum of human cancer cells while causing few, if any, observed cytotoxic effects on epithelial cells and erythrocytes. We further investigated the anticancer mechanism of Gonearrestide by focusing on its effects on the colorectal cancer cell line HCT116. NGS RNA sequencing was employed to obtain full gene expression profiles of HCT116 cells cultured in the presence and absence of Gonearrestide, to dissect signalling pathway differences. Taken together, the in vitro, in vivo and ex vivo validation studies showed that Gonearrestide could inhibit the growth of primary colon cancer cells and solid tumours by triggering cell cycle arrest in G1 phase through inhibition of cyclin‐dependent kinase 4 (CDK4) and up‐regulation of the expression of the cell cycle regulators/inhibitors cyclin D3, p27 and p21. Furthermore, predictions of the signalling pathways and potential binding sites used by Gonearrestide are also presented in this study.

high capacity for structural modifications and high target specificity. [5][6][7][8] Unlike most conventional chemotherapies, many anticancer peptides can specifically and selectively target cancer cells, and they can also be used in combination with other anticancer therapeutics, where the observed synergistic effects have been found to improve outcomes. 9,10 Nature has always been a capable and predictable source of remarkable pharmaceutical substances with potential use in the treatment of many diverse diseases. Peptides and proteins are generally the major components of venoms, and many of these have shown, or have already realised, high potential against diseases such as bacterial/fungal infections, cardiovascular disorders, diabetes and cancers. [11][12][13] However, less than 1% of the peptide/protein components of venoms have been well studied at present. 6 One reason is that traditional proteomic approaches are slow. Recently, improvements in liquid chromatography-tandem mass spectrometry (LC-MS/MS) have made it more accurate, more sensitive and amenable to high-throughput de novo sequencing of venom peptidomes/proteomes. [14][15][16] In addition, the development of next-generation DNA sequencing technology facilitates higher throughput in this process and is powerful and cost-efficient enough for ultrahigh-throughput transcriptome analysis. [17][18][19][20][21] Some venom researchers have started to apply these two approaches, but erroneous assembly is a major limitation of next-generation sequencing technology, while LC-MS/MS requires an existing database of structures to validate the de novo sequencing results. 
[22][23][24] Hence, a high-throughput platform combining transcriptome and proteome sequencing was established in this study and employed successfully to enable large-scale, high-throughput identification of novel bioactive peptides in venoms.
Based on this platform, we have identified a panel of novel potential anticancer peptides in scorpion venoms. Anticancer peptide research has been carried out at a low level for around 50 years with limited success, and understanding of the mechanisms of action of these peptides is still at an early stage. To reveal the potential anticancer mechanisms of the candidate peptides, a recently invented transcriptome-centric strategy was employed to predict their putative functions and targeted signalling pathways. Compared with traditional mechanistic study approaches, this strategy can monitor the collective responses of all relevant genes without specific mechanistic or targeting hypotheses. [25][26][27][28][29][30][31] Although the transcriptome-centric approach has been applied in some research, the enormous challenges of data processing, storage and interpretation, as well as sequencing quality control, have hindered the translation from sequence data to clinical practice; nevertheless, a few studies have succeeded in proving the concept. [32][33][34][35] Hence, here, we performed related in vitro, in vivo and ex vivo experiments to lend further substance to the validation of this approach.
In this study, we initially isolated a panel of potential anticancer peptides from venoms using this state-of-the-art high-throughput platform. Subsequently, a transcriptome-centric method was applied to address and to reveal the putative anticancer mechanism of the lead peptide candidate, followed by in vitro, in vivo and ex vivo experimental validation. In addition, we hypothesized as to the involvement of specific signalling pathways and potential binding sites through bioinformatic analyses and use of 3D modelling construction software. Finally, one peptide candidate (Gonearrestide) was identified to affect cancer cell proliferation in vitro and reduce tumour growth in vivo, while negligible cytotoxicity was observed on normal human epithelial cells and erythrocytes. These current data suggest that Gonearrestide has a high potential for development as an anticancer drug. In addition, the findings presented here have further validated the ever-increasing potential of this high-throughput platform and transcriptome-centric mechanistic study strategy to reveal additional novel anticancer peptide drug candidates.

| Transcriptomic procedures
A 5 mg sample of each lyophilized scorpion venom was dissolved in 1 mL of cell lysis buffer (Thermo Fisher Scientific). Polyadenylated mRNA was extracted using a Dynabeads® mRNA DIRECT™ Kit (Ambion by Life Tech), then subjected to a cDNA library construction procedure using a NEBNext® Ultra™ Directional RNA Library Prep Kit for Illumina® (New England BioLabs Inc.). According to the instructions, the cDNA library construction consisted of several main steps. First, mRNA fragmentation was achieved by incubation at 94°C with random primers. The RNA fragments were then subjected to first strand and second strand cDNA syntheses.
After purification with 1.8X Agencourt AMPure XP beads, end repair of the cDNA library was performed by incubation with the end repair reaction buffer, and the nucleotide A was added to the 3′ end of the DNA fragments to avoid self-ligation. Adapters carrying the nucleotide T at the 3′ end were then ligated to the DNA fragments. Finally, the DNA fragments with adapters were enriched by PCR and purified with AMPure XP beads. The quality of the cDNA library was verified using an Agilent 2100 Bioanalyzer (Agilent Technologies) with an Agilent DNA 1000 kit (Agilent Technologies).
LI ET AL.
The quantity of the cDNA library was validated by qPCR with a KAPA SYBR® FAST qPCR kit (KAPA Biosystems). The cDNA library was then loaded into a flow cell with oligos complementary to the adapters to generate clusters through bridge amplification. Finally, the transcriptome was obtained by performing RNA sequencing on the Illumina MiSeq platform. The raw data obtained from the MiSeq platform were analysed as follows. Firstly, the index primers used to identify different samples were removed. Secondly, the data were transferred to the program FastQC 0.1.0.1 to filter out low-quality reads and reads of fewer than 25 nucleotides. The filtered data were saved as fastq files. Finally, the clean reads were de novo assembled using Trinity 2.0.6 software with default parameters to obtain the transcriptome.
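The read-filtering step described above can be illustrated with a minimal Python sketch. It is not the FastQC/Trinity workflow used in the study; it only shows the idea of discarding reads that are shorter than 25 nucleotides or of low quality. The mean Phred threshold of 20, the Phred+33 encoding and all function names are assumptions made for illustration.

```python
from typing import Iterator, List, Tuple

Read = Tuple[str, str, str]  # (header, sequence, quality string)

MIN_LENGTH = 25        # from the text: reads shorter than 25 nt are discarded
MIN_MEAN_PHRED = 20.0  # hypothetical quality threshold, not stated in the text


def parse_fastq(lines: List[str]) -> Iterator[Read]:
    """Yield (header, sequence, quality) from four-line FASTQ records."""
    for i in range(0, len(lines) - 3, 4):
        header, seq, _plus, qual = (s.strip() for s in lines[i:i + 4])
        yield header, seq, qual


def mean_phred(qual: str, offset: int = 33) -> float:
    """Mean Phred score, assuming Phred+33 encoding (an assumption)."""
    return sum(ord(c) - offset for c in qual) / len(qual) if qual else 0.0


def filter_reads(reads: Iterator[Read]) -> List[Read]:
    """Keep reads that pass both the length and the mean-quality filter."""
    return [
        r for r in reads
        if len(r[1]) >= MIN_LENGTH and mean_phred(r[2]) >= MIN_MEAN_PHRED
    ]
```

In practice, a dedicated trimming tool would normally handle this step; the sketch only makes the two filtering criteria explicit.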

| Proteomic procedures
A second 5 mg sample of lyophilized scorpion venom was dissolved in phosphate buffer (50 mmol/L, pH 7, containing 0.15 mol/L NaCl) and subjected to AKTA Avant 150 (GE Healthcare Life Science) fractionation using a Superose® 12 10/300 GL column (Sigma Aldrich) to filter out large proteins with molecular masses higher than 10 kDa. After this, tris(2-carboxyethyl)phosphine (TCEP) was used to reduce disulphide bonds, and iodoacetamide (IAA) was used to alkylate free cysteine residues. Finally, desalting was performed using Pierce™ C18 Spin Tips (Thermo Scientific). Proteomics was performed using the Q Exactive™ Hybrid Quadrupole-Orbitrap Mass Spectrometer (Thermo Scientific) with a nano-LC system. Samples were loaded onto the Acclaim PepMap100, 75 μm × 2 cm C18 trap column (Thermo Scientific) using LC-MS water (containing 2% ACN, 0.1% formic acid). Samples were trapped onto the column at 6 μL/min for 5 minutes for pre-concentration, then the trap column was connected automatically via a 10-port valve to an EASY-Spray column, 15 cm × 75 μm ID, 3 μm C18 particle size (Thermo Scientific). Elution was performed at a flow rate of

| Comparison between transcriptome and proteome
The software PEAKS (version 8.0) (Bioinformatics Solutions Inc., Waterloo, ON, Canada) was used to interrogate the transcriptome database with the following parameters: precursor ion mass tolerance, 3 ppm; fragment ion mass tolerance, 0.01 Da; no enzymes employed; a variety of post-translational modifications (PTMs), including cysteine carbamidomethylation and oxidation; and a false discovery rate (FDR) of ≤1%. The peptides identified with high confidence and accuracy constituted the prospective venom-derived peptide libraries.
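To make the precursor ion mass tolerance concrete, the following minimal Python sketch shows how a tolerance expressed in ppm translates into an absolute mass window. It is an illustration only and not the PEAKS implementation; the function names and the example masses are hypothetical.

```python
def ppm_error(observed_mass: float, theoretical_mass: float) -> float:
    """Signed mass error in parts per million, relative to the theoretical mass."""
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6


def within_tolerance(observed_mass: float, theoretical_mass: float,
                     tol_ppm: float = 3.0) -> bool:
    """True if the observed precursor mass lies within +/- tol_ppm."""
    return abs(ppm_error(observed_mass, theoretical_mass)) <= tol_ppm


def window_da(theoretical_mass: float, tol_ppm: float = 3.0) -> float:
    """Half-width of the tolerance window in daltons."""
    return theoretical_mass * tol_ppm / 1e6
```

For a peptide with a theoretical mass of 2000 Da, a 3 ppm tolerance corresponds to a window of ±0.006 Da, which gives a sense of the mass accuracy this search setting demands.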

| Molecular cloning procedures
A third 5 mg sample of lyophilized scorpion venom was dissolved in 1 mL of cell lysis/mRNA protection buffer, and polyadenylated mRNA was isolated from each by magnetic oligo-dT beads as described by the manufacturer (Thermo Fisher Scientific). The isolated mRNA was subjected to 5′- and 3′-rapid amplification of cDNA ends (RACE) procedures using a SMART-RACE kit (Clontech, UK).
The 3′-RACE reactions were purified and cloned using a TOPO® TA Cloning® Kit (Invitrogen) and sequenced using an ABI 3100 automated sequencer. The nucleic acid sequences were isolated by employing the primers below, which were designed to highly-con-

| Novel venom-derived peptide library construction and peptide synthesis
The prospective novel venom-derived peptide libraries were subjected to BLAST analyses against the NCBI non-redundant database, and the peptide sequences that produced hits in this database were then subjected to Pfamscan online searching to identify whether they aligned to any reported toxins. Previously reported peptide sequences were removed manually during this step.
The novel peptides were manually filtered by removing very short sequences and finally, the venom-derived peptide library was constructed with all the peptide sequences selected as above. The

| Haemolysis assay
Human erythrocytes were washed with PBS until the supernatant was clear, and samples of a 4% (v/v) suspension were then treated with peptide at concentrations ranging from 1 to 250 μmol/L at 37°C for 1 hour. Lysis of cells was assessed by haemoglobin release, measured as the change in optical density at λ = 550 nm, and calculated relative to positive controls. Positive and negative control groups were treated with equal volumes of Triton X-100 and PBS, respectively, instead of peptide.
were stored at −80°C prior to use. Equal amounts of protein samples at each time-point, including those from the blank control group, were subjected to SDS-PAGE and then transferred to a membrane. Membranes were then blocked with 5% non-fat dry milk in TBST (0.1% Tween) for 2 hours at room temperature, and then incubated with primary antibodies (against Cyclin D1, Cyclin D3, CDK2, CDK4, CDK6, p18, p21, p27) at 4°C overnight. The next day, membranes were washed three times with TBST (0.1% Tween), after which they were incubated with HRP-conjugated anti-mouse/anti-rabbit secondary antibody for 1 hour at room temperature. After a further three washes with TBST (0.1% Tween), Immobilon Western HRP Substrate was employed for chemiluminescent detection.
Tumours were snap-frozen for further histological analysis. On the one hand, tumours were fixed and sectioned onto slides. Immunohistochemical staining of tumour slides was performed by de-paraffinizing in xylene and rehydrating in graded ethanol, followed by incubation in sodium citrate buffer (pH 6.0) for high-pressure antigen retrieval. Afterwards, slides were incubated in 3% hydrogen peroxide to block endogenous peroxidase activity, and incubated with an antibody against the cell cycle checkpoint protein CDK4 at 4°C overnight. On the second day, the slides were incubated with secondary antibody and subjected to diaminobenzidine staining. After counterstaining with 20% haematoxylin, slides were dehydrated and mounted with cover slips.

| LDH assay
The IHC-stained tissue sections were reviewed under a microscope.
In addition, mRNA and proteins were isolated from frozen tumours for qPCR and Western blot analysis. The procedures for both techniques were as described above.

| RNA sequencing with next-generation sequencing technology
In total, 514 974 and 625 389 raw reads were generated for the scorpion species Androctonus mauritanicus (AMa) and Androctonus australis (Egypt) (AAu), respectively. As neither species has an assembled genome so far, the transcriptome data were assembled de novo using Trinity software. Consequently, 112 235 and 46 360 nucleic acid sequences, respectively, were obtained for the two species (Table 1A). The transcripts were saved as Fastq files (S1 and S2).

| Isolation and de novo sequencing of proteomes through use of a highly accurate and sensitive LC-MS/MS system
The raw spectra generated from the AMa venom contained 20 606 peptide fragments. After alignment, the remaining 9225 peptide fragments were filtered using an average local confidence threshold of more than 50%. The de novo sequencing results, comprising 3346 peptide fragments, were saved as Excel files (S3; Table 1A). Similarly, from the AAu venom, we identified 2393 different peptide fragments after filtering 7464 peptide fragments (S4; Table 1A).

| Comparison between respective transcriptomes and proteomes
In the AMa transcriptome and proteome analysis, 1176 peptide fragments fell in the overlapping region between the transcriptome and proteome databases, and these aligned to 128 peptides (Table 1A). With the same method, 110 peptides were aligned from 924 peptide fragments in the AAu transcriptome and proteome analysis (Table 1A). The overlapping peptides were regarded as the prospective venom-derived peptide libraries and saved as Excel files (S5).
To further validate our findings, we also compared the AMa and AAu proteomes with an online scorpion protein database (UniProt). In total, 130 peptides matched entries in the UniProt database, which indicated the robustness of our data.
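The idea behind the transcriptome/proteome overlap described in this section can be sketched in Python as a six-frame translation of each assembled transcript followed by a search for each de novo sequenced peptide fragment. This is a simplified illustration under stated assumptions, not the PEAKS database search used in the study: the function names are hypothetical, exact substring matching is assumed, and isoleucine is treated as leucine because the two residues have identical masses.

```python
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codons enumerated with bases in the order T, C, A, G.
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate one reading frame; unknown codons (e.g. with N) become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna: str) -> list:
    """Translate both strands in all three reading frames."""
    dna = dna.upper()
    strands = (dna, reverse_complement(dna))
    return [translate(s[offset:]) for s in strands for offset in range(3)]


def peptide_in_transcript(peptide: str, transcript: str) -> bool:
    """True if the peptide occurs in any of the six translated frames.

    Ile and Leu are isobaric, so both are compared as 'L' here
    (an illustrative choice, not a setting reported in the study).
    """
    query = peptide.upper().replace("I", "L")
    return any(
        query in frame.replace("I", "L")
        for frame in six_frame_translations(transcript)
    )
```

A fragment that matches a translated transcript is supported by both data sets, which is the rationale for treating the overlapping peptides as the prospective library.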

| Molecular cloning
Eight peptide sequences were isolated through the molecular cloning approach, including three novel peptide sequences. All eight peptide sequences were found in the overlapping region of the transcriptome and proteome results, which validated the new high-throughput approach. The nucleotide sequences and translated open-reading frames of the three novel sequences were saved (Figure S1).

| Novel venom-derived peptide library construction and peptide synthesis
The overlapping regions described previously were subjected to a variety of bioinformatic analyses and the filtered results were saved (Figure S2). As a result, novel venom-derived peptide libraries with 41 and 30 peptides were obtained for scorpion species AMa and AAu, respectively (S5). Twenty-two peptide sequences were initially selected for synthesis and use in further analyses (Table S1).
Table S1) was found to have the most potent activity on the human colon cancer cell line HCT116, and was therefore chosen as the lead peptide for further investigation. The dose-dependent anti-proliferative effect of Gonearrestide is shown in Figure 1A.

| Haemolysis assay
This demonstrated that Gonearrestide had negligible cytotoxicity on human erythrocytes, with a haemolytic activity below 5% (the positive control was regarded as 100%), indicating that Gonearrestide was a worthy candidate for further study with a view to its clinical application (Figure 1B).
Gonearrestide affected cancer cell proliferation through inhibition of growth (Figure 1C).
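The haemolytic activity reported in this section is expressed relative to the Triton X-100 positive control, taken as 100%. A minimal Python sketch of that normalisation is given below. Subtracting the PBS negative control as a baseline is a common convention and is assumed here, as the exact formula is not stated in the text; the function name and the absorbance values in the usage note are hypothetical.

```python
def percent_haemolysis(a_sample: float, a_negative: float,
                       a_positive: float) -> float:
    """Haemolysis as a percentage of the positive control.

    a_sample   -- optical density of the peptide-treated sample
    a_negative -- optical density of the PBS (negative) control
    a_positive -- optical density of the Triton X-100 (positive) control
    """
    span = a_positive - a_negative
    if span <= 0:
        raise ValueError("positive control must exceed negative control")
    return (a_sample - a_negative) / span * 100.0
```

With hypothetical readings of 0.10 for a sample, 0.05 for PBS and 1.05 for Triton X-100, the function returns approximately 5%, the level below which the peptide was regarded as having negligible haemolytic activity.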

| Determination of cellular location of peptide through use of confocal microscopy
Taken together, these results demonstrated that Gonearrestide bound specifically to the cancer cell membrane and induced subsequent reactions via membrane-related signalling pathways, which are hypothesized and analysed further in our "cell-peptide" RNA sequencing experiments.

| Apoptosis assay (Phosphatidylserine exstrophy detection)
Annexin V and PI staining along with flow cytometry can differentiate individual cells into different stages. The results of these experiments are shown in Figure 1E. The data showed that Gonearrestide did not induce apoptosis of HCT116 cells when compared with blank controls, a finding that was also consistent with our RNA sequencing data in this study.

| IncuCyte live cell imaging system
As shown in Figure 1F, the proliferation of HCT116 cells treated with Gonearrestide was much slower than that observed for the blank controls. The results proved that Gonearrestide could inhibit cancer cell growth which again was consistent with our RNA sequencing data in this study.

| RNA sequencing revealed the biological function and signalling pathways altered by peptide treatment
As Gonearrestide had the best activity on the cancer cell line HCT116 with no obvious cytotoxicity on normal human cell lines and erythrocytes, it was chosen for further experiments to identify its molecular mechanism of action. There were three groups employed in this assay: HCT116 treated with Gonearrestide (HCT116_Gonearrestide), HCT116 treated with a negative control peptide (HCT116_NC) and HCT116 treated with PBS (HCT116_BC).
The negative control peptide was one of the previously identified peptides which showed no anticancer cell activity.
Approximately 40 million reads were generated per sample and around 90% of these were aligned to the human genome, with a correlation value of 0.9 between and across replicates (Figure S3). The top 500 variable genes (based on standard deviation) demonstrated that the replicates were consistent, but slight differences were present between the blank control and the negative control peptide groups (Figure S4). Genes were determined to be significantly differentially expressed (DEGs) based on an absolute log2 fold-change cut-off of 0.6 and a q-value cut-off of 0.05 (S6). After filtration by fold change and statistical significance, the remaining significantly differentially expressed genes are shown in Table 1B
Figure 2C, such as phagosome, endocytosis and lysosome, which are all related to membrane reactions. Hence, we hypothesized that Gonearrestide acted by first binding to cancer cell membranes and then producing subsequent effects.
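The selection of differentially expressed genes by fold change and q-value can be sketched in Python as follows. This is a minimal illustration, not the pipeline used in the study. It assumes that the fold-change cut-off of 0.6 applies to the absolute log2 fold change and that genes with q below 0.05 are retained; the gene names in the usage note are placeholders.

```python
from typing import Dict, List, Tuple

LOG2FC_CUTOFF = 0.6  # absolute log2 fold-change threshold (as read from the text)
Q_CUTOFF = 0.05      # q-value (FDR-adjusted p-value) threshold from the text


def select_degs(results: Dict[str, Tuple[float, float]]) -> Dict[str, List[str]]:
    """Split significant genes into up- and down-regulated lists.

    `results` maps gene name -> (log2 fold change, q-value).
    """
    up, down = [], []
    for gene, (log2fc, q) in results.items():
        # Discard genes that fail either the significance or the effect-size filter.
        if q >= Q_CUTOFF or abs(log2fc) < LOG2FC_CUTOFF:
            continue
        (up if log2fc > 0 else down).append(gene)
    return {"up": sorted(up), "down": sorted(down)}
```

For example, given placeholder entries `{"GENE_A": (1.2, 0.001), "GENE_B": (-0.9, 0.01), "GENE_C": (0.3, 0.001)}`, only `GENE_A` (up) and `GENE_B` (down) pass both filters, while `GENE_C` is excluded because its fold change is too small despite being statistically significant.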
Based on this, confocal imaging experiments were designed and conducted as described in Figure 1D. Gonearrestide was shown in Figure 3A and B. These data demonstrated that Gonearrestide arrested the cancer cell cycle in G1 phase, consistent with our findings in Figure 2D, which showed that cell cycle signals were down-regulated after Gonearrestide treatment.

| Identification of proteins involved in cell cycle processes by Western blotting
As Gonearrestide could arrest HCT116 cells in G1 phase, G1/S checkpoint protein antibodies were employed to identify the regula-

| Prediction of signalling pathways involved
By combining the RNA sequencing data with the biological in vitro/in vivo results, we hypothesized the signalling pathways involved in cell cycle progression, which are shown in Figure 5A.
We believed that after Gonearrestide bound to the cancer cell membrane, the cationic peptide could affect PIP3 through electrostatic attraction to its anionic trisphosphate group, resulting in inhibition of the Akt pathway. PTEN, a lipid phosphatase that catalyses the dephosphorylation of PIP3 to produce PIP2, is a major negative regulator of Akt signalling. The inhibition of Akt was followed by the up-regulation of FOXO1/3, GSK-3β and p21, inducing the dysregulation of CDK4/6 and CDK2. The decline in CDK4/6 and CDK2 would inhibit the phosphorylation of retinoblastoma (RB) protein. As RB phosphorylation is what releases the transcription factor E2F/DP from RB-E2F/DP complexes and thereby promotes cell entry from G1 into S phase, this inhibition would hold cells in G1.
Among all these genes, p27, p21, cyclin D3 and CDK4/6 were evaluated by Western blot assays, which further supported this prediction. In addition, bioinformatics analysis identified a series of biomarkers related to colon cancer that were involved in peptide treatment (Figure S5).

| Hypothesis of potential binding site
It has been reported that phosphatidylserine (PS) and phosphatidylethanolamine (PE) can enhance membrane poration by a peptide with anticancer properties. 48

| DISCUSSION
Venom from animals has been proven to be a useful source of drug candidates in natural drug discovery to fight against diseases. 49,50 Venom-based peptides/toxins in particular have opened new insights into therapeutic and diagnostic potential for cancer treatment in the last decade. 51
snake venom from Walterinnesia aegyptia and Vipera ammodytes meridionalis, respectively, induced apoptosis in a broad spectrum of cancer cell lines. [52][53][54][55] Furthermore, the combination of WEV and silica nanoparticles efficiently enhanced the in vivo suppressive effect in mouse models. [52][53][54] To increase the identification of potential venom-sourced anticancer peptides, a high-efficiency, low-cost screening platform is urgently required. In this study, a high-throughput screening platform consisting of transcriptome and proteome sequencing is described and shown to be effective. Based on this platform, complete peptide libraries of venoms from the scorpions Androctonus mauritanicus (AMa) and Androctonus australis (Egypt) (AAu) have been constructed. To confirm the validity of this platform, a traditional cloning approach was also applied in parallel. The cloning data provided evidence of the robustness of the described high-throughput screening platform. The coupling of transcriptomic and proteomic/peptidomic approaches using bioinformatics produced venom-derived peptide panels, which could be used to explore the potentially therapeutically useful peptides present in the respective venoms. 58 Within the scorpion peptide panels, bioinformatics filtration and MTT screening were initially applied to detect novel candidate anticancer peptides. As a result, peptide 13 (Gonearrestide) was identi-

CONFLICTS OF INTEREST
The authors confirm that there are no conflicts of interest.