Prediction of B cell and T‐helper cell epitopes candidates of bovine leukaemia virus (BLV) by in silico approach

Abstract The bovine leukaemia virus (BLV) is a retrovirus responsible for enzootic bovine leukaemia (EBL) disease, the most common cattle disease leading to high annual economic losses to the cattle breeding industry. Virus monitoring among the sheep and cattle herds is usually done by vaccination. Inactivated virus vaccines can partially protect the livestock from viral challenge. However, vaccinated animals are likely to be infected. So, there is an essential need for producing vaccine by other methods. Gp60 SU, encoded by Env gene, is the surface glycoprotein of BLV detected to be the major target for the host immunity against the virus. Different stages were performed to predict the potential B and T‐helper cell epitopes. The general framework of the method includes retrieving the amino acid sequence of gp60 SU, conducting the sequence alignment, getting the entropy plot, retrieving the previously found epitopes, predicting the hydropathy parameters, modelling the tertiary structure of the glycoprotein, minimizing the structure energy, validating the model by Ramachandran plot, predicting the linear and discontinuous epitopes by various servers and eventually choosing the consensus immunogenic regions. Ramachandran plot scrutiny has demonstrated that the modelled prediction is accurate and suitable. By surveying overlaps of various results, 4 and 2 immunogenic regions were selected as linear and conformational epitopes respectively. Amino acids 35–53, 67–97, 288–302 and 410–421 and those of numbers 37–58 and 72–100 were the regions selected as linear and conformational epitopes respectively. The tertiary structure of the final epitope was modelled as well. A comparison of the predicted epitopes structure with that of gp60 SU envelope, illustrated that the tertiary structure of these epitopes does not change after being separated from the primary complete one. The present achievements will lead to a better interpretation of the antigen–antibody interactions against gp60 in the designing process of safe and efficient vaccines.


| INTRODUC TI ON
Enzootic bovine leukaemia (EBL) is the most common neoplasia and prevalent disease of cattle, caused by a retrovirus called bovine leukaemia virus (BLV; Ghysdael et al., 1984). The EBL disease is one of the most prevalent diseases causing high economic losses to the cattle breeding industry each year (Gillet et al., 2007). BLV is usually controlled by vaccination among sheep and cattle herds. Inactivated virus vaccines are obtained from persistently infected cell lines. The inactivation process is done through treating the virus with various chemical agents. Inactivated virus vaccines, including specific neutralizing humoral response, partially protect sheep and cattle from low dose viral challenge. However, vaccinated animals are infected with high challenge doses. Thus, it is of high necessity to develop vaccines using other methods (Rodríguez et al., 2011).
Many researches have been conducted on the different proteins of the virus. Altaner, Ban, Altanerova, and Janik (1991) illustrated the vaccination with dissimilar alive cells of the virus which lead to the productions of Env gene (gp60 SU surface glycoprotein) in the body protecting the cattle against bovine Leukaemia (Altaner et al., 1991). Portetelle et al. (1991) surveyed the gene of BLV surface proteins.
They demonstrated that the vaccination with Env vaccine recombinants is capable of protecting the cattle against infection .
Experimental approaches to predict immunogenic residues are too expensive and time consuming. Several computational tools have been developed in order to predict B and T-helper cell epitopes (Yang & Yu, 2009). One of the essential challenges in the field of in silico vaccine design is the epitopes prediction (Almagro, 2004).
According to the above-mentioned points about the immunogenicity of gp60 SU envelope glycoprotein and its function in causing EBL disease, the present investigation aims to predict regions involved with B and T-helper cell immune responses against gp60 SU envelope glycoprotein through in-silico approaches. The results lead to a better interpretation of the antigen-antibody interactions against the envelope glycoprotein, thus assisting in the design of more efficient vaccines (Haste Andersen, Nielsen, & Lund, 2006).
First, linear epitopes are predicted by different servers. Second, modelling of the tertiary structure of gp60 SU is conducted. Energy minimization and validation of the structure are checked before predicting the conformational epitopes. In addition to the linear and conformational epitopes, the entropy plot, sequence alignment, previously found epitopes and hydropathy parameters are achieved as well. Finally, the consensus reliable immunogenic regions of gp60 SU envelope glycoprotein are chosen.

| Sequence retrieval
The complete amino acid sequence related to the reference gp60 SU one with the accession number of Q77YG2, was obtained from UniProtKB database (accessible via the link https://www.unipr ot.org/help/unipr otkb). The other 895 Env protein sequences corresponding to BLV were also retrieved from the database.
Four different parts of the reference sequence were not mentioned in the server annotation. Although, 100% similar sequence with the accession number of P51519, was mentioned. The alignment of the sequences Q77YG2 (reference sequence) and P51519 was performed by BioEdit 7.9 software (Ma et al., 2019). This was done to make sure that the sequences are 100% similar. Then, according to the server annotation of P51519 sequence, four different parts of the reference sequence, (signal peptide, extracellular, helical and cytoplasmic sequences) were clarified. Also, according to UniProtKB annotations presented for the P51519 sequence, post translational modifications (PTMs) were diagnosed.

| BLAST
BLAST was done using the NCBI server (accessible via the link https://blast.ncbi.nlm.nih.gov/Blast.cgi) in order to find regions of similarity between the biological sequences. In this way, other species that may be affected by the predicted epitopes can be found.
In fact, the vaccine strain coverage can be found by BLAST. The server compares the sequence with a database of all available sequences and similar sequences with higher than a certain threshold can be obtained (Boratyn et al., 2013). This has been accomplished by the protein to protein tool (being available in the server). The none-redundant protein sequences (nr) has been used as the default database.

| Alignment
The protected regions of the reference sequence have been obtained by alignment with the aid of BioEdit 7.9 software. The entropy plot has been also presented by this software (Haokip et al., 2016;Ziv & Merhav, 1993).

| Modelling, energy minimization and validation of the tertiary structure
As mentioned before, the reference sequence contains four parts.
The signal peptide is not transcribed, thus having no impact on the epitope structure. The helical part of the protein is transmembrane without any direct contact with the antibodies. Also, the cytoplasmic part is located in the cell without any relation with the environment outside it. Thus, the extracellular part of the sequence, amino acids number 34-436, is the only part of the protein with the potential of being immunogenic. For this reason, only amino acids number 34-436 were used for the tertiary structure modelling (Chou, 2004).
Since, the structure could not be found in PDB server or other reliable databases, the tertiary structure of gp60 SU structure was modelled by I-TASSER (accessible via the link https://zhang lab.ccmb. med.umich.edu/I-TASSE R/). This server, working based on the multiple-threading alignments and iterative template fragment assembly simulations, has been ranked as the best one for modelling the 3-D protein structure in recently conducted surveys. C-score (confidence score) presented by the server, defines the quality of the predicted models. Higher level of C-score points out to the high confidence of the model. TM-score and RMSD (room-mean-square deviation) are the standards used for measuring the structural similarity between two structures. There is a strong correlation among C-score, TMscore and RMSD. The lower values of RMSD and TM-score indicate better fit and high-resolution models (Wu, Skolnick, & Zhang, 2007).
The aim of ABCpred server (accessible via the link http://crdd. osdd.net/ragha va/abcpr ed/) is to predict B cell epitopes in an antigen sequence, using artificial neural network. This server is developed based on the recurrent neural network (machine based technique) using the fixed length patterns. It also uses a threshold between +0.1 to +1.0. An increment in the threshold value, results in better specificity but worse sensitivity. Another simulation study like the present one, indicated an accuracy of 65.93% with equal sensitivity and specificity using window length of 16 at the threshold value of 0.5. So, a default value of 0.51 was used and the default windows size was considered to be 16 (Han et al., 2015;Saha & Raghava, 2006).
The predicted B-cell epitopes were ordered by their score obtained by the trained recurrent neural network (Han et al., 2015). Data with higher antigenicity are closer to our predictive results. So we selected the first 10 data as the most relevant results and used them in our analysis (Ebrahimi et al., 2019).
IEDB server (accessible via the link http://www.iedb.org/) is a freely available resource cataloging the experimental data on antibody and T-cell epitopes investigated in humans, non-human primates and other animal species in the context of infectious disease, allergy, autoimmunity and transplantation. The server also contains tools to assist in predicting and analysing the epitopes (Zhang et al., 2008). The present prediction method was Bepipred Linear Epitope Prediction (Larsen, Lund, & Nielsen, 2006). As stated before, there is a relation among the threshold and sensitivity/specificity of the prediction method. The server threshold of 0.35 was used here which results in the sensitivity and specificity of 0.49 and 0.75 respectively (Chou, 1978).
CBTOPE server (accessible via the link http://crdd.osdd.net/ ragha va/cbtop e/) predicts the discontinuous B-cell epitope without any homology, just using the antigen primary sequence. This server uses amino acid composition as an input feature. The prediction accuracy of the server has been reported to be 85%. During the prediction, each amino acid of the sequence gets a SVM score. The users need to set a threshold value. Amino acids with SVM scores above this threshold would be considered as epitope, otherwise non-epitope. By default, it is −0.3 which was observed during the server development. Using the default threshold, leads to high specificity in the prediction (Ansari & Raghava, 2010). Thus, the SVM threshold used in current study was taken as −0.3.
SVMTriP server (accessible via the link http://sysbio.unl.edu/ SVMTr iP/) has developed a new method in which support vector machine (SVM) has been utilized by combining the tri-peptide similarity and propensity scores (SVMTriP) in order to achieve a better prediction performance. Using sequences with 20 amino acids leads to a sensitivity (Sn) of 80.1% and precision (P) of 55.2%. The mentioned sensitivity and precision values are the ideal ones for the server (Yao et al., 2012), thus this sequence length has been applied in the current research.
LBtope server (accessible via the link http://crdd.osdd.net/ragha va/lbtop e/) has developed several models using various techniques (e.g. SVM and IBk) on a large dataset of epitopes and non-epitopes (12,063 epitopes and 20,589 non-epitopes obtained from IEDB database). During the prediction, each overlapping 20-mers of the sequence is given a SVM score. The scores are scaled from 20% to 100%. User need to set the probability threshold (%) above which epitopes should be printed. By default, this threshold is 60%, as this value or above ones results in high specificity in the prediction (Singh et al., 2013). Therefore, this percentage was used in the current study.
Discotope 2.0 server (accessible via the link http://www.cbs. dtu.dk/servi ces/Disco Tope/) predicts the discontinuous B cell epitopes from the protein's 3-D structures. The method utilizes the calculation of surface accessibility (estimated in terms of the contact numbers) and a novel epitope propensity amino acid score. The final scores are calculated by combining the propensity scores of residues in spatial proximity and the contact numbers. The different thresholds for the Discotope 2.0 score can be translated into the different sensitivity/specificity values. The lower threshold amounts correspond to the higher sensitivity. The threshold value of −3.7 has been applied here as it was suggested to yield the best sensitivity/specificity combination, resulting in the sensitivity of 0.47 (Kringelum et al., 2012).
The Ellipro server (accessible via the link http://tools.iedb. org/ellip ro/) predicts the linear and discontinuous antibody epitopes on the basis of the protein antigen's 3-D structure. ElliPro accepts the protein structure in PDB format as an input. The server associates each predicted epitope with a score, defined as a PI (protrusion index) value averaged over epitope residues. The discontinuous epitopes are defined based on the PI values and clustered based on the distance R between the residue's centres of mass (in Å). The larger R value is associated with the larger discontinuous epitopes being predicted. The minimum score and maximum distance should be declared to be able to use the server calculation. The minimum score falls within the range of 0.5-1.0.
Higher scores predict fewer epitopes and vice versa. The suggested minimum score of the server is 0.5 which has been applied here. Also, the maximum distance ranges from 4 to 8 Å. Longer distances lead to the prediction of discontinuous epitopes which span larger regions and vice versa. The distance of 6 Å being suggested as the best maximum distance, was applied here. Ellipro server has been also reported to predict the linear epitopes (Ponomarenko et al., 2008).

| Previously found epitopes
As mentioned before, IEDB server is an important resource of epitopes. It has been attempted to search for epitopes relating to BLV among the epitope sequences existing in the server, in order to compare them with the present results (Zhang et al., 2008).

| Final sequences
Having the results of linear and discontinuous predicted epitopes, entropy plot, sequence alignment, previously found epitopes and hydropathy parameters, the best sequences can be selected as the probable epitopes. The predicted epitopes were linked to each other by the appropriate linkers.
In silico predicted epitopes must be connected to each other using the empirical linkers. Gly linkers are the most frequent flexible ones (Chen, Zaro, & Shen, 2013). Hence, 'GGG' linker was used to link the linear epitopes to each other. Flexible linkers contain small amino acids which lead to the flexibility, movement or interaction for domains. Incorporating Ser linkers, hydrogen bonds with water molecules will be formed. Thus, not only the linker stability in the aqueous solution will be maintained, but also unfavourable interaction between the linker and protein moieties will be reduced as well (Chen et al., 2013). Therefore, 'GGSSGG' linker was used to link the conformational epitopes to each other.

| Final structure
Modelling, energy minimization and validation of the predicted B-cell epitope structure was performed by I-TASSER, Swiss-PdbViewer and Rampage respectively. Then, the 3-D structure of the predicted discontinuous epitope was compared with that of the extracellular part of the reference sequence, in order to ensure that the 3-D structural shape of the epitope (being separated from the complete protein structure) remains unchanged with respect to that when located in the complete protein structure.

| Sequences
The length of the reference sequence with accession number of Q77YG2, is 515 amino acids. As mentioned earlier, the ref-

| Blast
The list of 100 sequence accession numbers presented by BLAST tool of NCBI server is:

| Alignment
The alignment result is reported through the Supplementary File.

| Modelling, energy minimization and validation of the reference tertiary structure
The Modelled structure of gp60 SU extracellular region is shown in Figure 2. The values of C-score, estimated TM-score and RMSD of the modelled structure were −1.50, 0.53 ± 0.15 and 10.3 ± 4.6 Å respectively. After the energy minimization, the total energy was computed as E = −18,395.275 kj/mol. The modelled structure was validated by Ramachandran plot after energy minimization. The number of residues in the favoured, allowed and outlier regions are 300 (74.8%), 67 (16.7%) and 34 (8.5%) respectively. The obtained results indicate that the structure modelling is suitable enough.

| Entropy plot
The entropy plot, presented by BioEdit 7.9 software, is demonstrated by Figure 3.

| Prediction of the linear epitopes
The linear epitope prediction results of the above-mentioned servers are listed in Table 1. As mentioned before, Ellipro server has also presented linear epitope prediction. The conformational epitope prediction results of Ellipro server are presented in Table 2.

| Prediction of the conformational epitopes
The raw results associated with the conformational epitope prediction obtained via the two servers of Ellipro and Discotope 2.0, are presented in Table 2.

| Final sequences
Gathering all information together, four protein sequences were predicted as linear epitopes, including amino acids number 35-53, 67-97, 288-302 and 410-421. The predicted linear epitopes, linked to each other by GGG linker, resulted in the following sequence: Also, the predicted conformational epitopes which are amino acids number 37-58 and 72-100, are linked to each other by GGSSGG linker and lead to the following sequence:

S L S L G N Q Q W M T T Y N Q E A K F S I S G G S S G G C P R S
PRYTLDFVNGYPKIYWPPPQGRRRF.
Amino acids of the predicted conformational epitope regions were also predicted as the linear epitopes. Also, they were checked to be on the surface of the gp60 SU tertiary structure in order to be accessible by the antibodies.
Epitopes predicted in the current study were compared with the results of previously found epitopes retrieved from IEDB server.
Comparison of the linear and conformational epitopes are given by Tables 3 and 4 respectively. The epitope IDs are reported through parentheses. All retrieved epitopes from IEDB belonging to BLV, are encoded from Env gene and related to gp60 SU glycoprotein. The first three predicted linear and both predicted discontinuous epitopes have been also reported by the previous experimental studies.
It means that the current in silico prediction has been sufficiently accurate. The fourth linear epitope, amino acids number 410-421, has not been previously predicted by the experimental studies. Thus, the epitope can be considered as a novel predicted one which can be surveyed more by future experiments.

| Modelling, energy minimization and validation of the conformational epitope structure
The modelling, energy minimization and validation of the conformational epitope structure was performed using the above-mentioned software and servers. The estimated values of C-score, TM-score and RMSD are −2.95, 0.38 ± 0.13 and 9.1 ± 4.6 Å respectively. The total energy has been computed as E = −2,466.034 kj/mol. The number of residues in the favoured, allowed and outlier regions are 32 (58.2%), 19 (34.5%) and 4 (7.3%) respectively. The surveys demonstrate that the model results are of acceptable accuracy. The final model is exhibited in Figure 4. In this figure, both predicted epitopes, epitope containing amino acids number 37-58 and epitope containing amino acids number 72-100, are illustrated from left to right respectively.
The 3-D structures of both predicted discontinuous epitopes have been compared with that of the extracellular part of the reference sequence. It has been indicated that the structures of both conformational epitopes are the same with their structure in the whole extracellular part of gp60 SU glycoprotein. Thus, the 3-D structural shape of both epitopes remains unchanged after being separated from the primary complete structure.

| D ISCUSS I ON
As mentioned in above, BLV contains different proteins and glycoproteins. Gp60 SU glycoprotein is detected to be the main infectious agent. Therefore, the immunization with gp60 SU, can lead to the host immunity against BLV . In the current study, several computational methods have been used to predict the reliable B and T-helper cell epitopes.
Epitopes are able to stimulate immune response. Thus, epitope identification, either by experimental or computational methods, is essential for purposes such as vaccine design, laboratory kit design, etc. (Irving, Pan, & Scott, 2001).
X-ray crystallography, nuclear magnetic resonance (NMR) and monoclonal antibodies are introduced as the most reliable methods for the epitope identification. Since, these experimental methods are time-consuming and costly (Yang & Yu, 2009) ( Resende et al., 2012).
Given the results associated with the linear and discontinuous predicted epitopes, entropy plot, sequence alignment, previously found epitopes and hydropathy parameters, the best sequences have been selected as the probable epitopes and presented in Tables 1 and 2 as well. Amino acids of the predicted conformational epitope regions were also predicted as the linear one. Moreover their locations on the 3-D structure of the glycoprotein were examined and two regions were eventually chosen as the conformational epitopes.
As would be observed, both peptides were on the surface of the gp60 SU tertiary structure, indicating that the epitopes are accessible by the antibodies.  Callebaut et al. (1991). Also, the predicted discontinuous epitopes with amino acids 37-58 and the linear ones with amino acids 35-53, are consistent with the epitope sequences reported by Portetelle et al. (1989) and Bai et al. (2015).
The predicted linear epitope with amino acids 97-67, overlaps with the epitope sequence achieved by Gatei, Good, Daniel, and Lavin (1993) and Callebaut et al. (1993). Further to these, the predicted linear epitope with amino acids 288-308, overlaps with the epitope sequence obtained by Callebaut et al. (1993).
However, the fourth linear epitope with amino acids 410-421, has not been previously predicted by the experimental studies.
Thus, this newly predicted epitope can be surveyed more by future experiments.
The present study has led to information about gp60 SU glycoprotein immunogenic regions which may be helpful for conducting future laboratory experiments. The present findings can be used to develop novel, safer, more effective as well as precise vaccines against BLV.

ACK N OWLED G EM ENT
The authors appreciatively acknowledge Dr. Ranjbar for his suggestion to improve the result validation, and the Agricultural Sciences and Natural Resources University of Khuzestan for their support.

CO N FLI C T O F I NTE R E S T
The authors declare that they have no conflict of interest. TA B L E 4 Comparison of conformational predicted epitopes with previously found epitopes retrieved from IEDB F I G U R E 4 Tertiary structure of predicted B cell epitope

ETHICS
We hereby declare all ethical standards have been respected in preparation of the submitted article.