Second wave of COVID-19 in India could be predicted with genomic surveillance of SARS-CoV-2 variants coupled with epidemiological data: A tool for future

India has witnessed a devastating second wave of COVID-19, which peaked between the last week of April and the second week of May 2021. We aimed to understand whether the arrival of the second wave was predictable and whether it was driven by the existing SARS-CoV-2 strains or by any of the emerging variants. We analyzed the monthly distribution of the genomic sequence data for SARS-CoV-2 from India and correlated it with the epidemiological data for new cases and deaths for the corresponding period of the second wave. Our analysis shows that the first indications of the arrival of the second wave were observable by January 2021, and that by March 2021 the wave was clearly predictable. B.1.617 lineage variants drove the wave, particularly B.1.617.2 (a.k.a. the delta variant). We propose that genomic surveillance of SARS-CoV-2 variants augmented with epidemiological data can be a promising tool for predicting future COVID-19 waves.


Methods (Study design, participants, data sources, and statistical analyses)
A time series (weekly and monthly) distribution of the genomic sequence data for SARS-CoV-2 from India was analyzed along with the epidemiological data for new cases and deaths from COVID-19 for the period from 1st December 2020 to 26th July 2021 (a total of 34 weeks). Further, a phylodynamic analysis of the individual variants circulating in the population during the period of study was performed.
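The 34-week count can be checked directly from the two dates. The following is a minimal sketch, assuming that study weeks are consecutive 7-day bins counted from 1st December 2020; the text does not state the binning convention, and the names `STUDY_START`, `STUDY_END`, and `study_week` are hypothetical.

```python
from datetime import date

# Study period as stated in the Methods.
STUDY_START = date(2020, 12, 1)
STUDY_END = date(2021, 7, 26)


def study_week(collection_date: date) -> int:
    """Return the 1-based study week containing collection_date.

    Assumption: weeks are consecutive 7-day bins starting on STUDY_START.
    """
    if not STUDY_START <= collection_date <= STUDY_END:
        raise ValueError(f"{collection_date} is outside the study period")
    return (collection_date - STUDY_START).days // 7 + 1


# Both end dates are inclusive, hence the +1 before dividing into weeks.
total_weeks = ((STUDY_END - STUDY_START).days + 1) // 7
print(total_weeks)                    # 34
print(study_week(date(2021, 2, 28)))  # 13
```

Under this assumption the study period spans exactly 238 days, i.e. 34 full weeks, and the end of February falls in week 13.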

Weekly and monthly distribution of SARS-CoV-2 variants and new cases and deaths due to COVID-19
The genomic sequence data for SARS-CoV-2 and the official epidemiological data for COVID-19 from India for the period from 1st December 2020 to 26th July 2021 were downloaded from the EpiCoV™ database of the Global Initiative on Sharing All Influenza Data (GISAID) and from Worldometer (https://www.worldometers.info/coronavirus/coronavirus/country/india/), respectively. A total of 40359 SARS-CoV-2 genomic sequences were analyzed. The number of sequences for each SARS-CoV-2 variant was retrieved using the automatic search function of the EpiCoV™ database, supplying the lineage/sub-lineage and the collection dates. The total number of sequences per week and per month over the study period was noted for each variant, and their relative proportions were calculated (in percent). The data were tabulated, and the weekly and monthly distribution of each variant was charted against the COVID-19 epidemiological data (new cases and deaths) and statistically analyzed. Graphs were plotted to visualize the trends. Further, the genomic sequences of SARS-CoV-2 variants were analyzed for the individual states and union territories to check for any deviation from the collective data trends.
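The relative-proportion step described above can be illustrated as follows. This is only a sketch under stated assumptions: the study tabulated counts retrieved from EpiCoV™, whereas here each sequence is represented by a hypothetical `(week, lineage)` pair, the function name `weekly_proportions` is invented for illustration, and the toy records are made up and are not GISAID data.

```python
from collections import Counter, defaultdict


def weekly_proportions(records):
    """Compute each lineage's share of sequences per week, in percent.

    records: iterable of (week, lineage) pairs, one pair per sequence.
    Returns {week: {lineage: percent of that week's sequences}}.
    """
    counts = defaultdict(Counter)
    for week, lineage in records:
        counts[week][lineage] += 1

    proportions = {}
    for week in sorted(counts):
        total = sum(counts[week].values())
        proportions[week] = {
            lineage: 100.0 * n / total for lineage, n in counts[week].items()
        }
    return proportions


# Made-up toy records for illustration only.
toy_records = [
    (1, "B.1.617.1"), (1, "B.1.617.2"), (1, "other"), (1, "other"),
    (2, "B.1.617.2"), (2, "B.1.617.2"), (2, "B.1.617.2"), (2, "other"),
]
shares = weekly_proportions(toy_records)
print(shares[1]["B.1.617.2"])  # 25.0
print(shares[2]["B.1.617.2"])  # 75.0
```

Because the shares are computed within each week, they describe the composition of the uploaded sequences and not the absolute number of infections.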

Phylodynamics of SARS-CoV-2 variants
A phylodynamic analysis of the variants circulating in the Indian population during the period of study was performed on GISAID sequences using the bioinformatics tool available at EpiCoV™.

Statistical analysis
All statistical analyses were performed using XLSTAT. Descriptive statistics were calculated for all of the variables. Normal distribution and homogeneity of the data were assessed using the Anderson-Darling and Levene's tests, respectively. A correlation matrix was generated, and a linear regression analysis was then conducted between the variables being compared (presented as R values ranging from 0 to 1). For all comparisons, the threshold for statistical significance was set at p ≤ 0.05.
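For readers without access to XLSTAT, the correlation and regression step can be approximated in a few lines. The sketch below is not the XLSTAT procedure used in the study; it is a plain-Python illustration of the Pearson correlation coefficient, a least-squares line, and the usual t statistic for testing a correlation, with hypothetical function names and made-up input values. The p-value itself (from Student's t distribution with n - 2 degrees of freedom) is not computed here.

```python
import math


def _centered_sums(x, y):
    """Return (mean_x, mean_y, sxy, sxx, syy) for paired samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length of at least 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return mx, my, sxy, sxx, syy


def pearson_r(x, y):
    """Pearson correlation coefficient between x and y."""
    _, _, sxy, sxx, syy = _centered_sums(x, y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def linear_fit(x, y):
    """Least-squares (slope, intercept) of y regressed on x."""
    mx, my, sxy, sxx, _ = _centered_sums(x, y)
    if sxx == 0:
        raise ValueError("slope is undefined for constant x")
    slope = sxy / sxx
    return slope, my - slope * mx


def t_statistic(r, n):
    """t statistic for H0 'no correlation', with n - 2 degrees of freedom."""
    if n < 3:
        raise ValueError("at least 3 paired observations are required")
    if abs(r) >= 1.0:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Made-up weekly values for illustration only (not study data).
variant_share = [5.0, 12.0, 20.0, 41.0, 60.0, 72.0, 79.0]
new_cases = [40.0, 55.0, 90.0, 170.0, 260.0, 340.0, 390.0]
r = pearson_r(variant_share, new_cases)
print(round(r, 3), round(t_statistic(r, len(variant_share)), 2))
```

The t statistic would then be compared against the chosen significance threshold (p ≤ 0.05 in this study) using a t table or a statistics package.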

Ethics approval
Approval from the institute ethics committee was not required, as the data used in this study were retrieved from publicly available databases.

Funding
No substantial funding was received for this study.

Results and Discussion
A retrospective analysis of the epidemiological data showed that the second COVID-19 wave started rising by the end of February 2021 and peaked by the end of the first week of May 2021. Based on the distinct epidemiological trends observable in the graph (Fig. S1), we divided the period of study (1st December 2020 to 26th July 2021, a total of 34 weeks) into a pre-peak period (weeks 1-23) and a post-peak period (weeks 24-34). The weekly averages of new cases and deaths were strongly correlated over the complete study period (R = 0.98, p < 0.001), indicating that the two epidemiological series were consistent with each other and suitable for the further comparisons. We then analyzed the distribution of SARS-CoV-2 variants circulating in the Indian population in relation to new cases and deaths in the period before the peak and thereafter.
For the purpose of description, based on the epidemiological trends, the pre-peak period was further divided into three time-series intervals: 'very early' (weeks 1-8), 'early' (weeks 9-16), and 'near peak' (weeks 17-23). In the 'very early' period, new cases and deaths showed a downward trend, which flattened into a plateau in the 'early' period (except towards the end, when cases and deaths started rising, indicating the start of the second wave). In the 'near peak' period, a steep rise in new cases and deaths was observable (Fig. 1). The correlation between the rise of B.1.617 lineage variants and the rise in new cases strengthened over time, from weak in the 'very early' (R = 0.36, p > 0.05) and 'early' (R = 0.26, p > 0.05) intervals to strong in the 'near peak' interval (R = 0.87, p = 0.01), indicating that the second wave intensified as the relative proportion of these variants increased (Fig. 1). Of note, this trend was not observed with any other variant, signifying that the B.1.617 lineage was the prime driver of the rising second wave.
Further, we wanted to know which particular B.1.617 lineage variants were dominating in the studied time period. Interestingly, an intra-lineage competition was distinctly visible between the sub-lineages of B.1.617 (Fig. 2). A single sample of B.1.617 was reported on 25th February 2021 (date of collection) (EPI_ISL_1544002), and the variant has not been detected thereafter. The first case of B.1.617.2 was detected as early as 21st November 2020 (EPI_ISL_2373501), followed by B.1.617.1 on 1st December 2020 (EPI_ISL_1372093) and B.1.617.3 on 14th December 2021 (EPI_ISL_2099648). Up to the end of February (week 13), B.1.617.1 was detected in a higher number of sequences than B.1.617.2, which was followed by B.1.617.3. However, by the end of March (week 18), B.1.617.1 took a downturn, while B.1.617.2 continued to rise along with a similar rise in the number of new cases. By the end of April 2021, B.1.617.2 alone was detected in about 72% of the SARS-CoV-2 samples from India uploaded to the GISAID database, rising to about 79% in the week of the peak (week 23). Notably, B.1.617.2, but no other B.1.617 lineage variant, matched the rise in new cases leading to the peak of the wave (Fig. 2) (very early: R = 0.35, p > 0.05; early: R = 0.17, p > 0.05; near peak: R = 0.92, p = 0.0028) (Table S1). (Data source: SARS-CoV-2 genomic sequence-GISAID database: https://www.gisaid.org. Data were analyzed for the period of 1st December 2020 to 26th July 2021.)
The phylodynamic analysis of the variants circulating in the period of study strongly corroborated the trends presented in the graphs, showing an exclusive increase in the cluster density of B.1.617.2 in comparison to the other variants in the near-peak period (Fig. 3).
To determine whether the rise in the B.1.617.2 variant was localized to certain geographical regions, which may have influenced the collective data trends, we compared the monthly distribution of genomic sequences of the SARS-CoV-2 variants for the individual states and union territories of the country. A similar increase in the detection of the B.1.617.2 variant was observable in most of the states and union territories (Fig. S2), except Kerala, where a different pattern was visible (Fig. S2-O). In Kerala, the rise of B.1.617.2 was slower than in the rest of the country (55.5% vs. 72% of total cases by the end of April 2021), which was further confirmed by the state-wise sero-survey data from the period of the second wave (44.4% vs. 67.7% national average) (11). Notably, an intensive rise in B.1.617.2 cases was observed in Kerala in a later period.
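The interval scheme used in the analysis (pre-peak weeks 1-23 split into 'very early', 'early', and 'near peak', followed by post-peak weeks 24-34) can be written down compactly. The sketch below is illustrative only: the names `INTERVALS`, `interval_for_week`, and `group_by_interval` are hypothetical, and the weekly values are placeholders, not study data.

```python
# Week ranges as defined in the text; range() upper bounds are exclusive.
INTERVALS = {
    "very early": range(1, 9),    # weeks 1-8
    "early": range(9, 17),        # weeks 9-16
    "near peak": range(17, 24),   # weeks 17-23
    "post peak": range(24, 35),   # weeks 24-34
}


def interval_for_week(week: int) -> str:
    """Return the name of the interval that contains the given study week."""
    for name, weeks in INTERVALS.items():
        if week in weeks:
            return name
    raise ValueError(f"week {week} is outside the 34-week study period")


def group_by_interval(weekly_values):
    """Split {week: value} into {interval name: [values in week order]}."""
    grouped = {name: [] for name in INTERVALS}
    for week in sorted(weekly_values):
        grouped[interval_for_week(week)].append(weekly_values[week])
    return grouped


# Placeholder series: one dummy value per study week.
placeholder = {week: float(week) for week in range(1, 35)}
grouped = group_by_interval(placeholder)
print({name: len(values) for name, values in grouped.items()})
# {'very early': 8, 'early': 8, 'near peak': 7, 'post peak': 11}
```

Grouping the weekly series this way makes explicit that each per-interval correlation rests on only seven or eight weekly data points.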
Thus, the analyses collectively delineate that the formation of the second COVID-19 wave in India was closely associated with the rise of B.1.617 lineage variants, particularly its sub-lineage B.1.617.2. The first indication of an imminent COVID-19 wave was observable by the end of January 2021, when cases by B.1.617.2 surpassed all other variants, and the rise of the wave was clearly observable by the end of March 2021, when cases by B.1.617.2 showed a steep rise matched with the total new cases. Our findings are corroborated by a recent article published by a group of scientists affiliated with INSACOG (Indian SARS-CoV-2 Genomic Consortia), who observed a similar pattern in the rise of the B.1.617 lineage, primarily the B.1.617.2 variant, in Delhi before the second wave (1). The findings of this study signify that genomic surveillance of SARS-CoV-2 variants augmented with epidemiological data can be a potential tool for forecasting imminent COVID-19 waves. Nevertheless, the accuracy of the prediction would largely depend on population-matched viral genomic sequencing and consistent uploading of the data from all geographical regions, as well as accurate reporting of the epidemiological data; shortfalls in these currently appear to be a major hindrance to timely predictions.
Plausibly, an exclusive rise in the proportion of an emerging SARS-CoV-2 variant matched with a concomitant rise in new cases should signal the arrival of a new COVID-19 wave. However, other epidemiological factors, such as previous exposure to related viral strains and the immunization status of the population, will also need to be considered to determine the extent of an imminent wave (12). Notably, the first COVID-19 wave in India was limited in extent, as indicated by the sero-survey data, and only a very limited part of the population was vaccinated at the beginning of 2021; with the emergence of a new variant, both of these factors may have created optimum conditions for the rise of a massive second wave. Further, the preventive measures in place, such as lockdowns or restrictions on gatherings, and the use of face masks can also influence the prospects and extent of a new wave.
Our study has multiple limitations which could have affected the interpretation of the findings. Firstly, the samples used in our analyses were not representative of the population, as sampling was greatly disproportionate for many geographical regions. Hence, the genomic sequence data presented in this study do not necessarily reflect the true epidemiological scale of spread of the variants in the reported geographical regions, but only show their relative proportions in the samples for which genomic sequences were uploaded to the GISAID database. We assumed that similar proportions between variants exist in the actual population. Secondly, there has been inconsistency in the reporting and uploading of genomic sequences, which prevented us from examining daily trends in the spread of the variants. The paucity of genomic sequences and the inconsistency in their uploading to the databases used for some states/union territories made the determination of variant dominance difficult.

Data sharing
Primary data used for this study are publicly available at: SARS-CoV-2 genomic sequence-GISAID database: https://www.gisaid.org/; Epidemiological data-Worldometer: https://www.worldometers.info/coronavirus/coronavirus/country/india. The categorized data for the study period can be obtained from the corresponding author on reasonable request.

Figure 1
Weekly distribution of SARS-CoV-2 variants in genomic sequence data from India and their correlation with daily new COVID-19 cases and deaths for the period of 1st December 2020 to 26th July 2021. The data were analyzed for the period before the peak of the second wave (week 23) and thereafter. For the purpose of description, based on the epidemiological trends, the pre-peak period was further divided into three time-series intervals: 'very early' (weeks 1-8), 'early' (weeks 9-16), and 'near peak' (weeks 17-23).

Figure 3
Phylodynamics of SARS-CoV-2 variants in the Indian population for the period of 1st December 2020 to 26th July 2021. (Data source: SARS-CoV-2 genomic sequence-GISAID database: https://www.gisaid.org.)