Identification of Potential Vaccine Candidates Against SARS-CoV-2 to Fight COVID-19: Reverse Vaccinology Approach

Background The recent emergence of COVID-19 has caused an immense global public health crisis. The etiological agent of COVID-19 is the novel coronavirus SARS-CoV-2. More research in the field of developing effective vaccines against this emergent viral disease is indeed a need of the hour. Objective The aim of this study was to identify effective vaccine candidates that can offer a new milestone in the battle against COVID-19. Methods We used a reverse vaccinology approach to explore the SARS-CoV-2 genome among strains prominent in India. Epitopes were predicted and then molecular docking and simulation were used to verify the molecular interaction of the candidate antigenic peptide with corresponding amino acid residues of the host protein. Results A promising antigenic peptide, GVYFASTEK, from the surface glycoprotein of SARS-CoV-2 (protein accession number QIA98583.1) was predicted to interact with the human major histocompatibility complex (MHC) class I human leukocyte antigen (HLA)-A*11-01 allele, showing up to 90% conservancy and a high antigenicity value. After vigorous analysis, this peptide was predicted to be a suitable epitope capable of inducing a strong cell-mediated immune response against SARS-CoV-2. Conclusions These results could facilitate selecting SARS-CoV-2 epitopes for vaccine production pipelines in the immediate future. This novel research will certainly pave the way for a fast, reliable, and effective platform to provide a timely countermeasure against this dangerous virus responsible for the COVID-19 pandemic.


Introduction
COVID-19 began in December 2019 with an outbreak of a novel virus in Wuhan city of China [1].The disease gained a rapid foothold worldwide, resulting in the World Health Organization (WHO) declaring it a global pandemic by March 2020.As of March 10, 2021, there has been a worldwide total of 118,159,602 cases and 2,622,101 deaths due to COVID-19 reported by the WHO.The virus causing COVID-19, SARS-CoV-2, spreads primarily through saliva, droplets, or discharges from the nose of an infected person after coughing or sneezing.Coronaviruses are enveloped RNA viruses with the largest genome among all RNA viruses [2].As continuous transmission of the virus across borders increases, imposing a major health burden on the global scale, more studies are urgently required to understand SARS-CoV-2.Moreover, in the absence of effective cures and drugs, vaccination or immunization therapy is imperative to target the entire population.In particular, immunoinformatics tools have proven to be crucial to move the vaccine development pipeline forward XSL • FO RenderX [3].Since there is relatively little knowledge about the pathogenesis of the virus, an immunoinformatics-based approach to investigate the immunogenic epitopes for further vaccine development is required [4].
Since COVID-19 has affected almost the entire world's population, binding of promiscuous epitopes to a variety of human leukocyte antigen (HLA) alleles is vital for larger dissemination.Toward this end, in silico approaches will be remarkably useful in helping to develop a cure as quickly as possible [5].Antibody generation by the activation of B cells as well as acute viral clearance by T cells along with virus-specific memory generation by CD8+ T cells are analogously important to develop immunity against the virus [6].The SARS-CoV-2 spike (S) protein is considered to be highly antigenic, and thereby can evoke strong immune responses and generate neutralizing antibodies that can block attachment of the virus to host cells [7].
In reverse vaccinology, various in silico biology tools are used to discover novel antigens by studying the genetic makeup of a pathogen and the genes that could lead to identification of good epitopes.The reverse vaccinology approach thus offers a fast and cost-effective vaccine discovery platform [8].With this approach, a novel antigen is identified using omics analysis of the target organism.In silico analysis combined with the reverse vaccinology approach facilitates an easier and time-and labor-saving process of antigen discovery [9].Herein, we explored the proteome of SARS-CoV-2 strains prominent in the Indian geographical region against the human host to identify potential antigenic proteins and epitopes that can effectively elicit a cellular-mediated immune response against the virus.With this approach, we identified a promising antigenic peptide, GVYFASTEK, from a surface glycoprotein (protein accession number QIA98583.1) of SARS-CoV-2, which was predicted to interact with human major histocompatibility complex (MHC) alleles and displayed up to 90% conservancy and significant antigenicity.Molecular docking analysis further confirmed the molecular interaction of the prime antigenic peptide with the residues of the HLA-A*11-01 allele for MHC class I.An overview of the study design is provided in Figure 1.After careful evaluation, this peptide was determined to be an appropriate epitope for eliciting a strong cell-mediated immune response against SARS-CoV-2.The outcomes from this significant analysis could help to select appropriate SARS-CoV-2 epitopes for multiepitope vaccine production pipelines in the near future.This novel research will certainly pave the way for a fast, reliable, and effective platform to provide a timely countermeasure against this dangerous pandemic disease.

Physicochemical Property Prediction
The online tool ProtParam of ExPASy [11] was used to predict various physicochemical properties of the selected protein sequences.

Protein Antigenicity
VaxiJen v2.0 [12] was used to predict the antigenicity of the selected proteins.This software uses the FASTA file format of amino acid sequences as input and then predicts antigenicity based on the physicochemical properties of proteins.The output is denoted according to an antigenic score [13].During analysis, the threshold was maintained at 0.4 [9].

B Cell and T Cell Epitope Prediction
The B cell and T cell epitopes of the selected surface glycoprotein sequence were predicted via the Immune Epitope Database (IEDB), which contains a large amount of experimental data on epitopes and antibodies [14].The IEDB enables performing a robust analysis on several epitopes in the context of various tools, including conservation across antigens, population coverage, and clusters with similar sequences [15].To obtain MHC class I-restricted CD8+ cytotoxic T lymphocyte epitopes of the selected surface glycoprotein sequence, the NetMHCpan EL 4.0 prediction method was applied for the HLA-A*11-01 allele.MHC class II-restricted CD4+ helper T lymphocyte epitopes were obtained for the HLA DRB1*04-01 allele using the Sturniolo prediction method.The top 10 MHC class I and top 10 MHC class II epitopes were randomly selected based on their percentile scores and antigenicity scores.Five random B cell lymphocyte epitopes were selected based on their greater lengths using the Bipipered linear epitope prediction method [8].

Antigenicity and Allergenicity of the Predicted Epitopes
VaxiJen v2.0 was utilized to predict protein antigenicity.During antigenicity analysis, the threshold was maintained at 0.4 [9].The allergenicity of the selected epitopes was predicted via AllerTOP v2 [16].

Transmembrane Helix and Toxicity Prediction of the Predicted Epitopes
The transmembrane helix of the selected epitopes was predicted using the TMHMM v2.0 server [17], which predicts whether the epitope would be in the transmembrane region, or remain inside or outside of the membrane.The toxicity prediction of the selected epitopes was carried out via the ToxinPred server [18].

Prediction of Conservation of the Selected Epitopes
The conservation analysis of the epitopes was performed via the epitope conservancy analysis tool of the IEDB server [15].During analysis, the sequence identity threshold was maintained at ≥50 [8].

Cluster Analysis of MHC Alleles
Cluster analysis was carried out by MHCcluster 2.0 [19,20].During cluster analysis, the number of peptides to be included was kept at 50,000 and the number of bootstrap calculations was set to 100.For cluster analysis, the NetMHCpan-2.8prediction method was used.

Generation of 3D Structures of Selected Epitopes
The PEP-FOLD3 online tool [21] was used to predict the 3D structures of the selected best epitopes [22][23][24].

Molecular Docking and Molecular Dynamics Simulation
Molecular docking was carried out to depict the binding pattern of inhibitors with respective proteins.Predocking was carried out by UCSF Chimera [25].The peptide-protein docking of the selected epitopes was carried out by the online docking tool PatchDock [26].The results of PatchDock were refined and rescored by the FireDock server [27].Docking was then performed by the HPEPDOCK server [28].Docking pose analysis was performed using Ligplot [29].The molecular simulation was executed with the GROMACS 2018.1 package using the Gromos43a1 force field [9].Protein solvation was performed with the SPC water model in a cubic box (10.8 × 10.8 × 10.8 nm 3 ).The solvated protein system was processed for energy minimization using the steepest algorithm up to a maximum of 25,000 steps or until the maximum force was not greater than 1000 kJ/mol/nm, which is the default threshold.The NVT and NPT ensembles for 50,000 steps (100 ps) were run at 300 K and 1 atm.The system was first equilibrated using the NVT ensemble followed by the NPT ensemble.The final molecular dynamic simulation was performed for the dock complex of the GVYFASTEK epitope docked against the HLA-A*11-01 allele (Protein Data Bank [PDB] ID 5WJL).Finally, the simulations were evaluated according to the root mean square deviation (RMSD) and root mean square fluctuation (RMSF) of atomic positions for the complete episode of simulations.All steps were similar across simulations, except that the final molecular dynamics simulation was carried out for 50 ns.

Selection and Retrieval of Viral Protein Sequences
The SARS-CoV-2 strain was identified and 12 viral protein sequences against the human host in India were retrieved from the ViPR database and selected for possible vaccine candidate identification (Table 1).The FASTA sequences of the proteins are given in Multimedia Appendix 1.

Physicochemical Property Analysis and Protein Antigenicity
Analysis of physicochemical properties of the 12 proteins, including amino acids, molecular weight, theoretical isoelectric point (pI), extinction coefficient (M -1 cm -1 ), estimated half-life (in mammalian cells), instability index, aliphatic index, and grand average of hydropathicity (GRAVY), were predicted (Table 2).With a fixed threshold of 0.4, all proteins were predicted to be antigenic (Table 3).The physicochemical analysis revealed that the surface glycoprotein (QIA98583.1)had the highest extinction coefficient of 148,960 M -1 cm -1 and the lowest GRAVY value of -0.077 among the proteins.In addition, the surface glycoprotein was stable and antigenic; therefore, we selected this protein for further analysis.

T Cell and B Cell Epitope Prediction
The T cell epitopes of MHC class I were determined by the NetMHCpan EL 4.0 prediction method of the IEDB server with the sequence length set to 9. The server-generated epitopes were further analyzed based on the antigenicity scores and percentile scores, and the top 10 potential epitopes were selected randomly for antigenicity, allergenicity, toxicity, and conservancy tests.The server ranks the predicted epitopes in ascending order of percentile scores (Table 4).The T cell epitopes of MHC class II (HLA-DRB1*04-01 allele) of the protein were also determined by the IEDB server (Table 5) using Sturniolo prediction methods.The top 10 ranked epitopes of the protein were selected randomly for further analysis.Additionally, the B cell epitopes of the protein were selected using the Bipipered linear epitope prediction method of the IEDB server, with the selection of epitopes based on greater lengths (Figure 2).

Topology Identification of Epitopes
The topology of the selected epitopes was determined by the TMHMM v2.0 server.Table 4 and Table 5 represent the potential T-cell epitopes of selected surface glycoprotein.Table 6 shows the potential B cell epitopes with their respective topologies.

Antigenicity, Allergenicity, Toxicity, and Conservancy Analysis of Epitopes
The selected T cell epitopes were found to be highly antigenic as well as nonallergenic, nontoxic, and had a conservancy greater than 90%.Among the 10 selected MHC class I epitopes and 10 selected MHC class II epitopes, a total of four epitopes were selected based on the above-mentioned criteria: GVYFASTEK, TLADAGFIK, NFRVQPTESI, and LLIVNNATNV.

Cluster Analysis of MHC Alleles
The cluster analysis of the MHC class I alleles that possibly interact with the predicted epitopes was carried out by the online tool MHCcluster 2.0, which generates clusters of alleles phylogenetically.The results are shown in Figure 3, in which the red zone indicates a strong interaction and the yellow zone corresponds to a weaker interaction.

Three-Dimensional Structure Prediction (Modeling) of Epitopes
All T cell epitopes were subjected to 3D structure prediction with the PEP-FOLD3 server, which were used for peptide-protein docking (Figure 4).

Peptide-Protein Docking and Vaccine Candidate Prioritization
Molecular docking was performed to determine whether all of the identified epitopes could bind with MHC class I and MHC class II molecules.The selected epitopes docked with the HLA-A*11-01 allele (PDB ID 5WJL) and HLA-DRB1*04-01 allele (PDB ID 5JLZ).The docking was performed using the PatchDock online docking tool and refined by the FireDock online server.Results were also analyzed by the HPEPDOCK server (see Figure S1 in Multimedia Appendix 1).Among the four epitopes, the selected glycoprotein QIA98583.1,GVYFASTEK (MHC class I epitope), showed the best result with the lowest global energy of -52.82.Further, the docking pose was analyzed via Ligplot (Figure 5a) and the docking site can be visualized in Figure 5b.We also identified highly antigenic and nonallergenic B cell vaccine candidates LTPGDSSSGWTAG and VRQIAPGQTGKIAD from the selected surface glycoprotein (QIA98583.1).

Molecular Dynamics Simulation
Molecular dynamics simulation of the dock complex of the GVYFASTEK epitope docked against the HLA-A*11-01 allele (PDB ID 5WJL) was successfully executed for 50 ns.The complex became stable throughout the simulation with an RMSD fluctuation of 0.3-1.0nm from the original position (Figure 6a).In most cases, residues lying in the core protein regions have low RMSF values while exposed loops have high RMSF values (Figure 6b).The peaks in the graph show a value between 0.1 and 0.6 nm.Both these results indicate that the protein complexes were stable throughout the molecular docking simulations, demonstrating that the proteins possess good ability for stability.

Principal Findings
A vaccine is an enormously imperative and expansively formed therapeutic product.Millions of infants, children, and adults are vaccinated every year.However, the development and research processes of vaccines are expensive and occasionally require countless months to prepare and advance an appropriate vaccine candidate toward eliminating a pathogen.There are currently innumerable tools and approaches of immunoinformatics, computer-aided drug design, bioinformatics, and converse/reverse vaccinology to extensively progress vaccine design and preparations, which in turn help to reduce the duration and cost investment for vaccine expansion [8,30].
In this study, physicochemical analysis revealed that the SARS-CoV-2 surface glycoprotein QIA98583.1 exhibited the highest extinction coefficient of 148,960 M -1 cm -1 and the lowest GRAVY value of -0.077 among the identified viral proteins.In addition, this selected surface glycoprotein was highly stable (instability index <40) and antigenic.The antigenicity of the protein was determined by the VaxiJen V2.0 server.If a compound has a variability index greater than 40, it means that the product is considered to be unbalanced [31].The extinction coefficient refers to the quantity of light that is captured by a complex at a particular wavelength [32,33].Various physicochemical properties, including the number of amino acids, molecular mass/weight, theoretical pI, extinction coefficient, uncertainty index, aliphatic index, and GRAVY, were resolved by the ProtParam server [34].
The two major functioning immune cells are B and T lymphocytic cells, which are responsible for several defensive roles in the body.Once identified by an antigen-presenting cell (APC; eg, dendritic cells and macrophages), the antigen is accessible by the MHC class II molecule existing on the surface of APCs to helper T cells.Subsequently, the helper T cell acquires a CD4+ fragment on its surface, designated as a CD4+ T cell.Once stimulated by an APC, helper T cells subsequently stimulate B cells, yielding antibody-producing plasma B cells alongside memory B cells.Plasma B cells harvest several antibodies and memory B cells function in long-term immunological memory.Moreover, macrophages and CD8+ cytotoxic T cells are also triggered by helper T cells to ultimately abolish the target antigen [35][36][37][38][39].
The possible B and T cell epitopes of the selected SARS-CoV-2 viral protein were identified by the IEDB server [14], which generates and ranks the epitopes based on their antigenicity scores and percentile scores.The top 10 MHC class I and class II epitopes were engaged for this investigation.The topology of the precise epitopes was resolved by the TMHMM v2.0 server [17].In all inflammatory situations such as allergenicity, antigenicity, toxicity, and conservancy examinations, the T cell epitopes were found to be exceedingly antigenic with a higher immune response without allergenicity or toxicity, and showed a conservancy of over 90%.Among the 10 certain MHC class I and 10 selected MHC class II epitopes of the protein, four epitopes were designated based on the revealed properties, GVYFASTEK, TLADAGFIK, NFRVQPTESI, and LLIVNNATNVV, along with antigenic and nonallergenic B cell epitopes that were selected for additional vaccine candidate investigation.Cluster examination of the conceivable MHC class I and MHC class II alleles that might interact with the XSL • FO RenderX predicted epitopes was performed by the online tool MHC cluster 2.0 [20].The antigenicity, demarcated as the capability of an extraneous ingredient to act as an antigen and stimulate B and T cell responses over their epitope, correspondingly identifies the antigenic determinant portion [40].The allergenicity is defined as the capability of that ingredient to act as an allergen and induce latent allergic responses in the host [41].
Moreover, cluster analysis of the MHC class I and II alleles was similarly performed to categorize their association with each other and group them based on their functionality and predicted specificity [19].In the following steps, peptide-protein docking was performed among the selected epitopes and MHC alleles.The MHC class I epitopes remained docked to the MHC class I molecule (PDB ID 5WJL) and the MHC class II epitopes were docked to the MHC class II molecule (PDB ID 5JLZ) correspondingly.The peptide-protein docking was performed to evaluate the capability of the epitopes to interact with the MHC molecules.Predocking was performed by UCSF Chimera and then 3D structure generation of the epitopes was performed.The docking was executed by the PatchDock and FireDock servers and analyzed by the HPEPDOCK server constructed on global energy.The GVYFASTEK epitope demonstrated the best scores in the peptide-protein docking.All of the vaccine candidates proved to be potentially antigenic and nonallergenic, indicating that they should not cause any allergenic reaction within the host.However, more in vitro and in vivo examinations should be performed to confirm the safety, usefulness, and potential of the predicted vaccine candidates.

Conclusion
In the face of the enormous tragedy of suffering, demise, and social adversity caused by the COVID-19 pandemic, it is of extreme importance to develop an effective and safe vaccine against this disease.Bioinformatics, reverse vaccinology, and related technologies are widely used in vaccine design and development, since these technologies reduce costs and time.In this study, we first identified proteins belonging to SARS-CoV-2 against the human host from strains in India.The potential B cell and T cell epitopes that can effectively elicit cellular-mediated immune responses related to these selected proteins were then determined through robust processes.The potential T cell epitope (GVYFASTEK) and B cell epitopes (LTPGDSSSGWTAG, VRQIAPGQTGKIAD, QIAPGQTGKIAD, and ILPDPSKPSKRS) can play major roles in the development of new subunit and multiepitope vaccines.In brief, reverse vaccinology is confirmed as a reliable means to recognize novel vaccine candidates and their consequential application.This study can motivate further research in an innovative and efficient direction to deliver a fast, reliable, and significant platform in search of an effective and timely cure of COVID-19 caused by SARS-CoV-2.

Figure 3 .
Figure 3. Major histocompatibility complex (MHC) class cluster analysis.(A) Heat map.(B) Specificity tree.The red zone indicates a strong interaction and the yellow zone corresponds to a weaker interaction.

Figure 5 .
Figure 5. (A) Docking pose analysis via LigPlot (GVYFASTEK epitope docking against the HLA-A*11-01 allele [PDB ID: 5WJL]).Molecular docking result showing protein-ligand interaction.Oxygen (O), nitrogen (N), and carbon (C) atoms are represented by red, blue, and black circles, respectively.(B) Molecular docking analysis showing that the docking site of the ligand (GVYFASTEK epitope) in our study is similar to the ligand used in the crystal structure of the HLA-A*11-01 allele (PDB ID: 5WJL).