T and B cell Epitope analysis of SARS‐CoV‐2 S protein based on immunoinformatics and experimental research

Abstract COVID‐19 caused by SARS‐CoV‐2 is pandemic with a severe morbidity and mortality rate across the world. Despite the race for effective vaccine and drug against further expansion and fatality rate of this novel coronavirus, there is still lack of effective antiviral therapy. To this effect, we deemed it necessary to identify potential B and T cell epitopes from the envelope S protein. This can be used as potential targets to develop anti‐SARS‐CoV‐2 vaccine preparations. In this study, we used immunoinformatics to identify conservative B and T cell epitopes for S proteins of SARS‐CoV‐2, which might play roles in the initiation of SARS‐CoV‐2 infection. We identified the B cell and T cell peptide epitopes of S protein and their antigenicity, as well as the interaction between the peptide epitopes and human leucocyte antigen (HLA). Among the B cell epitopes, ‘EILDITPCSFGGVS’ has the highest score of antigenicity and great immunogenicity. In T cell epitopes, MHC‐I peptide ‘KIADYNYKL’ and MHC‐II peptide ‘LEILDITPC’ were identified as high antigens. Besides, docking analysis showed that the predicted peptide ‘KIADYNYKL’ was closely bound to the HLA‐A*0201. The results of molecular dynamics simulation through GROMACS software showed that ‘HLA‐A*0201~peptide’ complex was very stable. And the peptide we selected could induce the T cell response similar to that of SARS‐CoV‐2 infection. Moreover, the predicted peptides were highly conserved in different isolates from different countries. The antigenic epitopes presumed in this study were effective new vaccine targets to prevent SARS‐CoV‐2 infection.


| INTRODUC TI ON
SARS-CoV-2 was discovered in Wuhan City, Hubei Province in early December 2019. This novel coronavirus unlike the previous ones, not only has higher infectivity and pathogenicity, but faster transmission. 1,2 The present science stood at droplets at the main mode of transmission. Direct contact [3][4][5][6] as well is common, and some studies have shown that it can be transmitted through faecal-oral route. 7 SARS-CoV-2 has also been found in urine, but whether the virus can be transmitted through urine remains to be proven. 8 On 31 January 2020, the World Health Organization declared the COVID-19 epidemic as a 'public health emergency of international concern'. This was followed by raising concerns as the global risk level of the epidemic to 'very high' on February 28 and defined it as a 'pandemic' on March 12. A certain number of asymptomatic and mild patients make prevention and containment more difficult.
SARS-CoV-2, a single-stranded plus-strand RNA virus, belongs to the genus beta CoV. 9,10 The four major structural genes encode the nucleocapsid protein (N), the spike protein (S), the envelope protein (E) and the membrane glycoprotein (M); an additional membrane glycoprotein (HE) is present in the HCoV-OC43 and HKU1 beta-coronavirus genus. 11 It has been reported that coronavirus S protein is the main determinant of virus entry into host cells. 12 Therefore, S protein is an effective choice for vaccine design. Immunoinformatics can be used to predict the conformational (discontinuous) and linear epitopes of viral antigens and to evaluate the immunogenicity and virulence of pathogens. The reverse vaccinology approach has been successful in development of FDA-licensed vaccines. 14 Immunoinformatics tools have been used to design a vaccine for MERS-COV. 15 The infusion of bioinformatics analysis of vaccine candidates in the decision-making pipeline could potentially help save billions of dollars and guide to researchers to make an informed choice on selection of candidates. In addition, immunoinformatics methods are time-saving in designing new vaccines. Using this method, we set the main purpose of this study to identify potential B cell and T cell epitopes from the envelope S protein. This can be used as potential targets to develop anti-SARS-CoV-2 vaccine preparations.

| Data retrieval and structural analysis
The primary sequence (QHU79173.2) of S protein was searched from the NCBI database. 16 The 3D structure (PDB ID:6VXX) of S protein was searched by Protein-Data-Bank. 17 The chemical and physical properties of protein sequences including GRAVY (mean hydrophilicity), half-life, molecular weight, stability index and atomic composition of amino acids were analysed by online tools Protparam 18 (https://web.expasy.org/Protp aram/). The secondary structure was analysed by PSIPRED 19 (http://bioinf.cs.ucl.ac.uk/ psipr ed/). Using the TMHMM Server v.2.0 (http://www.cbs.dtu. dk/servi ces/TMHMM/) online tool, the transmembrane topology was checked. The presence of disulphide bonds through the online tool DIANNA v1.1 20 (http://clavi us.bc.edu/~clote lab/DiANN A/) was checked. DiANNA is a unified software for Cysteine state and Disulfide Bond partner prediction. Vaxijen v2.0, the prediction of protective antigens and subunit vaccines, was also used for antigenicity examination. 21

| B cell epitope prediction
The free online access server IEDB 22 (Immune-Epitope Database And Analysis-Resource) was used to predict B cell epitopes. Vaxijen 2.0 server was used to study the antigenicity of selected epitopes.
Bepipred linear epitope prediction and Parker hydrophilic prediction algorithm, Kolaskar and Tongaonkar antigenicity scale, Emini surface accessibility prediction tool, and Karplus and Schulz predictability tool were used to analyse hydrophilicity, linear epitope separation, surface accessibility and flexibility. 23 The β-rotation in polyproteins was predicted using Chou and Fasman β-rotation prediction algorithm. 24 As the discontigous epitopes are clearer and have more advantages than linear epitopes, 25 the prediction of discontiguous epitopes had been carried out separately through the DiscoTope server. 26 The parameter was set to ≥0.5. The position of the predicted epitope on the 3D structure of S protein was examined by Pymol. 27

| T cell epitope prediction
Epitopes of cytotoxic T lymphocytes (CTL) are playing a key role in vaccine research. Two online tools, Propred-1 28 and Propred, 29 were used to predict the CTL epitopes of MHC class I and MHC-II target proteins, respectively. For Propred-1 proteasome and immune proteasome, the threshold was maintained at 5%.

| Molecular dynamic simulation
The analysis of structural stability of the HLA-A*0201~peptide complex is essential to understand its binding affinity. In the present study, we used GROMACS 2019.6 software (downloaded from www.groma cs.org/) on Linux Centos 7 operating system for molecular dynamics simulation. The structure was processed by pdb2gmx and the force field that we used is the all-atom OPLS. And then, we started defining the unit cell and adding solvent. The solvated, electroneutral system was then assembled. Before we begin dynamics, we must ensure that the system has no steric clashes or inappropriate geometry. The structure was relaxed through a process called energy minimization (EM). To equilibrate the system, we conducted a 100-ps NVT equilibration (the NVT equilibration, stabilized the temperature of the system) and a 100-ps NPT equilibration (the NPT ensemble, wherein the number of particles, pressure, and temperature were all constant) for this system. And then, run production MD for data collection. All data were processed by Originlab (downloaded from www.origi nlab.com).

| Isolation of peripheral blood mononuclear cell (PBMC)
Peripheral blood mononuclear cells from healthy donors were isolated from fresh blood samples using Ficoll-Paque density gradient centrifugation in our BSL-3 laboratory at the same day of blood collection. The majority of purified PBMCs were used for immune cell phenotyping where was plasma samples were subjected to antibody testing. The rest of the cells were cryopreserved in freezing medium (90% FBS + 10% DMSO) at 5 × 10 6 cells/mL at −150°C.

| Intracellular cytokine staining (ICS)
To measure T cell activation, PBMCs were stimulated with the commercially available cell activation cocktail (BioLegend) containing

| Western blot analysis and Immunoprecipitation (IP)
Primary T cells stimulated by peptide were collected, washed with cold PBS twice and lysed with NP-40 lysate at 4°C for 20 minutes. The protein extracts were separated by electrophoresis with 12% prefabricated sodium dodecyl sulphate-polyacrylamide microgel (SDS-PAGE) and transferred to PVDF membrane. The membrane was incubated with the indicated antibody and detected by chemiluminescence.
Immunoprecipitation: The total cell lysates were first rotated with the labelled peptide at 4°C overnight. The total protein suspension was then incubated with appropriate antibodies (anti-HLA-A2: Abcam, ab74674 and ab168405; anti-Flag: Cell Signaling Technology, 14793) overnight, and then rotated with protein A/G beads overnight at 4°C. Wash with np-40 lysis buffer for 3 times, mix with 4 × SDS sample buffer and boil for 10 minutes. Western blot analysis of co-precipitation.

| Structural analysis of S protein
The  (Table S1). In addition, the secondary and 3-D structures of S protein by PSIPRED and Pymol, including its Beta sheets, Helixes and Loops structures were shown in Figure S1 and S2, and the positive and negative 3D conformations of S protein were shown in Figure S3.
In addition, the position of the disulphide bond (S-S) was calculated using DiANNA1.1 tool and assigned a score (Table S2). The high specificity of the protein was evaluated by Vaxijen 2.0 by setting the threshold to ≥0.5. The antigenicity analysis of the full-length protein showed that the S protein was the expected antigen with a antigenicity of 0.4700. Through TMHMM, an online tool for examining the topology of transmembrane proteins, it was found that residues from 1 to 1213 were exposed on the surface, 1214 to 1236 were inside the transmembrane region, and 1237 to 1273 were buried in the core-S protein region.

| Recognition of B cell epitopes
The primary sequence of S protein was scanned by the IEDB server to predict B cell epitopes. From all the predicted epitopes, 10 epitopes exposed to the surface of S protein with high antigenicity scores were selected (Table 1)  Note: The predicted B cell epitopes are sorted according to the scores obtained by the trained recurrent neural network. The higher the score, the higher the probability of being an epitope. The higher the score of antigenicity, the more likely it is to be used as an antigen.
In order to further increase the specificity and range of B cell epitopes, the Discotope 2.0 server (http://www.cbs.dtu.dk/servi ces/ Disco Tope/) was used to calculate the surface availability according to the number of residue contacts. The 3D structure (PDB ID: 6vxx) of S protein was used to predict discontinuous epitopes with 90% specificity, −3.700 threshold, and 22.000 A tendency score radius.
A total of 64 discontiguous epitopes were calculated on different exposed surface areas ( Table 2: the first 20 epitopes were shown in Table S4). The position of each predicted epitope on the 3D structural surface of S protein was analysed using Pymol, as shown in  According to their antigenicity scores, 10 potential peptides were selected for further analysis (Table 3)

| Eminent features profiling of selected T cell epitopes
In order to support our findings, we analysed some important features of the selected epitopes. Peptides digested by fewer enzymes are highly stable and more favourable vaccine candidates. The protein digestion server showed a total of 13 digesting enzyme sites.
Allergen FP was used to predict the allergenicity of epitopes, which was a novel method to classify proteins into allergens and non-allergens according to Tanimoto coefficients. ToxinPred was used to F I G U R E 1 Recognition of B cell epitopes. A, Kolaskar and Tongaonkar antigenicity scales were used to predict antigenic determinants; B, Parker hydrophilicity was used to predict hydrophilicity; C, Emini surface accessibility scale for surface accessibility analysis; D, Chou and Fasman β metamorphosis prediction were used to analyse β variants of structural polyproteins; E, Karplus and Schultz flexibility scale analysis predict the toxicity of the selected epitopes. All T cell epitopes along with their digestion, mutation, toxicity, allergenicity, hydro and physiochemical results were given in Table 5.

| Conservation analyses of selected epitopes
Isolates  Figure S4). Analysis the conservatism of the selected epitopes were analysed. As predicted, all selected epitopes were highly conservative as shown in Table S5. At the same time, a phylogenetic tree was established by MEGA7.0 to represent the evolutionary relationship of SARS-CoV-2 from 8 different countries (Figure 3). The epitope protection study conducted by the IEDB epitope protection analysis tool showed that, the selected B cell and T cell (MHC-I class II) epitopes had 100% homology, as shown in Table S5.

| Interaction of predicted peptides with HLA alleles
According to the antigenicity of MHC type I allele and binding peptides predicted by Propred-I, and in order to further improve the effectiveness of the predicted peptide epitopes, we conducted molecular modelling and molecular docking between the peptide epitopes and their predicted MHC molecules. The

TA B L E 4 MHC Class II Binding Peptide Prediction Results with their antigenicity scores
Peptides

MHC class II alleles
Vaxijen score

| 'MHC-I -peptide epitope-T cell TCR receptor' complex
Considering the biological function of MHC, which form complexes with peptide epitopes, and then is presented to T cells, thus initiating specific immune response, we also simulated the 'MHC-I -peptide epitope-T cell TCR receptor' polymer complex. We downloaded the 3D structure (PDB ID:1AO7) of T cell receptors from the PDB database, a complex between human T cell receptors, viral peptide (TAX) and HLA-A*0201. 35 Similarly, we did molecular docking through Autodock vina tool. We bind the peptide 'KIADYNYKL' to the T cell receptor ( Figure 5) with a docking fraction of −9.7562.
The interacting residues include the A chain 'Asp57 and Ile46' of the T cell receptor and the 'GLu3, Arg102 and Tyr107' of the B chain.

| Analysis of HLA-A2 interacting with the peptide
We verified the interaction of HLA-A2 with the peptide by Molecular dynamic simulation and Immunoprecipitation (IP). RMSF means to take samples for a certain period of time, calculate the average position of each atom during this period of time and then find the square deviation of the atom position during this time. Our result showed that the RMSF was at a very stable value (0.1nm) ( Figure 6A).
This result showed that the RMSD level was as low as ~0.175 nm (1Å), indicating that the structure was very stable. The slight difference between the curves showed that the structure at t = 0 ns (RMSD = 0.0327 nm) was slightly different from the crystal structure. This was to be expected to be energy-minimized.
The radius of gyration (Rg) is a measure of its compactness. If a protein is stably folded, it will likely maintain a relatively steady value of Rg. We can see that over the course of 1 ns, the Rg value remains very stable, about 2.33 nm (Figure 7).
Co-IP analysis with anti-Flag or anti-HLA-A2 antibody in T cells.
Whole cell lysates were immunoprecipitated with an anti-Flag or anti-HLA-A2 antibody and blotted with an anti-HLA-A2 or anti-Flag antibody, respectively ( Figure 8).

| CD8 + T cell responses
Proliferated T cells were identified by the percentage of CFSE low cells. The results showed that, compared with the negative control group, the predicted peptide could stimulate CD8 + T cell proliferation as significantly as S protein of SARS-CoV-2 ( Figure 9A

| D ISCUSS I ON
SARS-CoV-2 has spread rapidly throughout the world and become a major threat to human health worldwide due to its extremely high transmissibility and pathogenicity. Therefore, there is an urgent need for effective therapy and prevention for COVID-19. Its main clinical symptoms include varying degrees of pneumonia, impaired organ function and obvious gastrointestinal symptoms in some patients. 36 The virus underwent rapid evolution after infection with host cells due to recombination between the genomes of different virus particles, which creates great difficulties for virus-therapy, especially vaccine development. Vaccine development is also ongoing, without

F I G U R E 3
The phylogenetic tree showed that the S protein of SARS-CoV-2 isolates from 8 different countries was highly conserved vaccines or drugs approved to treat COVID-19, and many therapies with initial good clinical response are still being tested in clinical trials. In consequence, we hope to find a vaccine against SARS-CoV-2 that will lead to effective treatment and prevention strategies.
The S protein of coronavirus is vital for vaccine production. On the one hand, due to be exposed to the cell surface, it has an advantage antigenicity; on the other hand, S protein initiates viral infection by binding to the receptor ACE2. 37 The S protein contains TA B L E 6 Docking results of predicted peptide epitopes and MHC class I molecules two subunits (S1 and S2). The S1 subunit can be further defined by The peptide we selected 'KIADYNYKL' had a high affinity binding to HLA-a*0201. However, this study has not yet conducted a combined response analysis of hybrid peptides among different HLA-specific groups. For example, we predict that both MHC-I and MHC-II binding peptide 'VVVLSFELL' with HLA have a very good binding force. So the mixed peptide vaccine may be more effective, which deserves further study.
In conclusion, our results indicated that the peptide epitopes we predicted had high antigenicity, stability and conservatism and were potential targets for peptide vaccines. It is hoped that our results will help with ongoing vaccine development and in particular reduce the difficulties in vaccine design.

CO N FLI C T O F I NTE R E S T
The authors have declared no conflicts of interest.

AUTH O R CO NTR I B UTI O N S
Chen Ziwei: Conceptualization (equal); Data curation (equal); Formal analysis (equal); Software (equal); Visualization (equal); Writing-original draft (equal); Writing-review & editing (equal). Project administration (lead); Resources (equal); Supervision F I G U R E 9 T cell proliferation and T cell response to the predicted peptide. Flow cytometry analysis(A) and statistics(B) of the percentage of CFSE low cells. Flow cytometric plots(C) representing CD8 + T cells expressing TNF-α (x axis), IFN-γ (y axis) and GZMB (y axis) upon stimulation with peptide and the SARS-CoV-2 S spike. And comparison(D) of TNF-α, IFN-γ and GZMB of the three groups (NC, Peptide and SARS-CoV-2 S spike). Student's t test was used for the analysis. ***P < 0.001 (equal); Validation (lead); Visualization (equal); Writing-original draft (equal); Writing-review & editing (lead).

DATA AVA I L A B I L I T Y S TAT E M E N T
The data sets used and/or analysed during the current study are available from the corresponding author on reasonable request.