The novel Coronavirus enigma: Phylogeny and mutation analyses of SARS-CoV-2 viruses circulating in India during early 2020

Background: This is a comprehensive analysis of 46 Indian SARS-CoV-2 genome sequences available from the NCBI and GISAID repositories during early 2020. Evolutionary dynamics, gene-specific phylogeny and the emergence of novel co-evolving mutations in nine structural and non-structural genes among SARS-CoV-2 strains circulating in ten states of India have been assessed.

Materials and methods: 46 SARS-CoV-2 nucleotide sequences submitted from India were downloaded from the GISAID (39/46) or NCBI (7/46) database. Phylogenetic and mutation analyses were based on the nine structural and non-structural genes of the SARS-CoV-2 strains. The secondary structure of the RdRP/NSP12 protein was predicted with respect to the novel A97V mutation.

Results: Phylogenetic analyses revealed the evolution of "genome-type clusters" and adaptive selection of "L"-type SARS-CoV-2 strains, which are genetically closer to the bat SARS-like coronaviruses than to pangolin coronaviruses or MERS-CoVs. With regard to the novel co-evolving mutations, two groups are seen to circulate in India at present: the "major group" (52.2%) and the "minor group" (30.4%), harboring four and five co-existing mutations, respectively. The "major group" mutations fall in the A2a clade. All the "minor group" mutations except 11083G>T (L37F, NSP6) were unique to the Indian isolates.

Conclusion: The study highlights a rapidly evolving SARS-CoV-2 virus and the co-circulation of multiple clades and sub-clades driving this pandemic worldwide. This comprehensive study is a potential resource for monitoring novel mutations in the viral genome and changes in viral pathogenesis, and for designing vaccines and other therapeutics.


Introduction
through which the virus made its way into humans from bats during the early phase of the pandemic, but many questions remain unanswered even as more sequence data become available. Studying the heterogeneous genomic constellations within specific geographical settings would help to understand its complex epidemiology and to formulate region-specific strategies.

The first three cases from India were reported in Kerala during January 2020 having travel history to Wuhan

24933G>T (G1124V) (Table 1, Mutation 3) and 22444C>T (D294D) (Table 5) were also observed in the S gene of the "major group". 16 out of the 24 isolates revealed three novel mutations 28854C>T (S194L) (5 samples), (Table 2, Mutation 2, 3 and 4 respectively), whereas two isolates of the "minor group" had the 28396G>A (R41R) change in the N gene (Table 5). Intriguingly, 28854C>T (S194L) in the N gene was found to co-evolve with the 22444C>T (D294D) mutation in the S gene. We also observed the 1059T>A (T85I) change within NSP2 (Table 4) and 6310C>A (S1197R), 7392C>T (P1558L) (Table 4) and 6466A>G (K1249K) (

Because RdRP is the crucial enzyme for viral RNA replication and for maintaining genomic fidelity, any significant change in its structure could affect its functions, thereby increasing the rate of mutagenesis in the genome. We have identified two missense mutations in the RdRP protein: P323L, associated with the "major group" isolates, and A97V, associated with the "minor group" isolates. The effect of P323L on the secondary structure of RdRP has already been described [8]. Therefore, we analyzed the effect of the novel A97V mutation on the secondary structure of RdRP using the CFSSP server. The A97V mutation resulted in substitution of the α-helices at positions 94, 95 and 96 with β-sheets in the RdRP secondary structure, which might alter its tertiary conformation, with significant functional implications (Fig. 2).

as the origin of this zoonotic virus, which has eventually been transmitted worldwide [11][12][13].
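The substitution labels used in the Results, such as 28854C>T (S194L) or 22444C>T (D294D), pair a genomic nucleotide change with the resulting amino acid change. As an illustration only, the following minimal Python sketch parses such labels and separates synonymous from missense changes. The `Mutation` class, the `parse_mutation` helper and the regular expression are hypothetical names introduced here; they are not part of the authors' pipeline, and the sketch assumes labels of exactly this single-substitution form.

```python
import re
from typing import NamedTuple


class Mutation(NamedTuple):
    """One substitution, e.g. 28854C>T (S194L). Hypothetical helper type."""
    position: int      # genomic nucleotide position
    ref_base: str      # reference nucleotide
    alt_base: str      # observed nucleotide
    ref_aa: str        # reference amino acid (one-letter code)
    aa_position: int   # residue number within the protein
    alt_aa: str        # observed amino acid (one-letter code)

    @property
    def kind(self) -> str:
        # Same residue before and after means the change is synonymous.
        return "synonymous" if self.ref_aa == self.alt_aa else "missense"


# Assumed label format: "<pos><ref>><alt> (<refAA><aaPos><altAA>)"
_LABEL = re.compile(r"^(\d+)([ACGT])>([ACGT])\s*\(([A-Z])(\d+)([A-Z])\)$")


def parse_mutation(label: str) -> Mutation:
    """Parse a label such as '22444C>T (D294D)' into a Mutation."""
    match = _LABEL.match(label.strip())
    if match is None:
        raise ValueError(f"unrecognised mutation label: {label!r}")
    pos, ref, alt, ref_aa, aa_pos, alt_aa = match.groups()
    return Mutation(int(pos), ref, alt, ref_aa, int(aa_pos), alt_aa)


if __name__ == "__main__":
    # Labels copied from the text above.
    for label in ["28854C>T (S194L)", "22444C>T (D294D)", "28396G>A (R41R)"]:
        m = parse_mutation(label)
        print(label, "->", m.kind)
```

Under these assumptions, S194L is reported as missense, while D294D and R41R are reported as synonymous, matching how the text describes them.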

The origin of the SARS-CoV-2 is still a debatable issue but identification of its (2/46, 4.3%), indicating the L-type to predominate over S-type in this geographical region.

Convoluted mutational analysis also revealed co-circulation of two groups of mutated SARS-CoV-2 strains in

The authors acknowledge the hard work and dedication of the scientists performing next-generation genome sequencing and submitting the sequences to public databases for the benefit of the scientific community.

Rakesh Sarkar and Mahadeb Lo were supported by fellowships from the University Grants Commission and the Council of Scientific and Industrial Research, India, respectively.

The authors declare that no conflicts of interest exist.