Genomic analysis of SARS-CoV-2 strains among Indians returning from Italy, Iran & China, & Italian tourists in India
*For correspondence: hellopragya22@gmail.com
This is an open access journal, and articles are distributed under the terms of the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 License, which allows others to remix, tweak, and build upon the work non-commercially, as long as appropriate credit is given and the new creations are licensed under the identical terms.
This article was originally published by Wolters Kluwer - Medknow and was migrated to Scientific Scholar after the change of Publisher.
Sir,
The single-stranded RNA genome of the 2019 novel severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), about 29.9 kb in length and encoding about 9860 amino acids, was annotated to possess 14 open reading frames (ORFs) and 27 proteins [1,2]. The orf1ab and orf1a genes at the 5´-terminus of the genome encode the pp1ab and pp1a proteins, respectively, which together form 15 non-structural proteins (nsps), nsp1-nsp10 and nsp12-nsp16. The 3´-terminus of the genome encodes four structural proteins: the spike surface glycoprotein (S), the small envelope protein (E), the membrane protein (M) and the nucleocapsid protein (N). There are eight accessory proteins denoted as 3a, 3b, p6, 7a, 7b, 8b, 9b and ORF14 [2].
The epidemiology of SARS-CoV-2 has been ever expanding since its emergence in December 2019, with an increase in the number of cases and its spread globally [3,4]. The number of SARS-CoV-2 cases in India as of March 31, 2020 was 1,071, with mortality crossing 29 [4]. In this context, it is vital to understand the genetic nature of circulating SARS-CoV-2. In India, as per the guideline of the Ministry of Health and Family Welfare, samples from suspected cases of SARS-CoV-2 infection were collected and tested at the designated Viral Research and Diagnostic Laboratories (VRDL) [5]. As a part of this activity, a total of 15 SARS-CoV-2 positive specimens were obtained during the first week of March 2020, from Italian tourists and travellers from Italy and their contact cases in India. Further, in an effort to screen Indian nationals in Iran to enable their evacuation, throat swabs were collected from 1,920 individuals from March 5 to 17, 2020; of these, 281 were positive. In addition, a team of Indian doctors visited Italy and collected a total of 380 swabs from Indian citizens, of which four were positive. In an earlier study, the authors identified the first three cases of SARS-CoV-2 in Kerala, India, as imported cases from Wuhan, China, and presented the first two full-genome sequences along with the potential B-cell and T-cell epitopes on the spike protein [6]. Further, in another study, the SARS-CoV-2 viruses were isolated in Vero CCL-81 cells [7]. The present study was undertaken to understand and compare the genetic makeup of representative samples of the imported cases of SARS-CoV-2 to India from Wuhan, China, those of Italian tourists in India and the Indians evacuated from Iran and Italy.
Throat swab/nasal swab specimens collected from the 1,920 individuals in Iran were tested at the Indian Council of Medical Research-National Institute of Virology (ICMR-NIV), Pune, using real-time reverse transcription-polymerase chain reaction (RT-PCR) protocols to detect RdRp (1), RdRp (2), E and N genes as described elsewhere [8]. Next-generation sequencing (NGS) was performed on a total of 41 SARS-CoV-2 positive clinical samples from Italy and Iran. Table I presents the details of the full genomes obtained (n=19) as a part of this study as well as the two earlier genomes retrieved from the Kerala samples (n=2) from individuals with a history of travel from China [6,7].
Sample ID | Travel history/details | Ct value of E gene for clinical samples | Per cent of relevant reads | Genome length (bp) | GISAID ID |
---|---|---|---|---|---|
hCoV-19/India/1-27/2020* | Wuhan, China travel history of Indian citizens (Group A) | 34.5 | 0.36 | 29,854 | EPI_ISL_413522 |
hCoV-19/India/1-31/2020* | Group A | 28.98 | 0.80 | 29,851 | EPI_ISL_413523 |
hCoV-19/India/1073/2020 | Specimens collected at Iran from Indian citizens (Group B) | 25 | 1.53 | 29,855 | EPI_ISL_421662 |
hCoV-19/India/1093/2020 | Group B | 23 | 0.10 | 29,847 | EPI_ISL_421663 |
hCoV-19/India/1100/2020 | Group B | 23 | 0.79 | 29,862 | EPI_ISL_421664 |
hCoV-19/India/1104/2020 | Group B | 22 | 34.88 | 29,890 | EPI_ISL_421665 |
hCoV-19/India/1111/2020 | Group B | 22 | 3.36 | 29,861 | EPI_ISL_421666 |
hCoV-19/India/1115/2020 | Group B | 22 | 3.04 | 29,864 | EPI_ISL_421667 |
hCoV-19/India/1125/2020 | Group B | 25 | 0.18 | 29,873 | EPI_ISL_421668 |
hCoV-19/India/1616/2020 | Group B | 23 | 0.60 | 29,857 | EPI_ISL_421669 |
hCoV-19/India/1621/2020 | Group B | 18 | 5.28 | 29,860 | EPI_ISL_421671 |
hCoV-19/India/1644/2020 | Group B | 22 | 1.23 | 29,855 | EPI_ISL_421672 |
hCoV-19/India/1652/2020 | Group B | 24 | 0.17 | 29,847 | EPI_ISL_424363 |
hCoV-19/India/3118/2020 | Indian contacts of an Indian citizen having travel history to Italy (Group C) | 24 | 3.30 | 29,857 | EPI_ISL_424364 |
hCoV-19/India/3239/2020 | Group C | 20 | 22.58 | 29,862 | EPI_ISL_424365 |
hCoV-19/India/770/2020‡ | Italian tourists who arrived in Delhi, India and an Indian contact of the cohort (Group D) | 18 | 93.08 | 29,862 | EPI_ISL_420545 |
hCoV-19/India/773/2020‡ | Group D | 25.1 | 19.98 | 29,858 | EPI_ISL_420549 |
hCoV-19/India/777/2020‡ | Group D | 22.1 | 26.93 | 29,856 | EPI_ISL_420551 |
hCoV-19/India/781/2020‡ | Group D | 22.1 | 35.47 | 29,871 | EPI_ISL_420553 |
hCoV-19/India/31/2020 | Close contacts in Agra, of an infected Delhi-based person who returned from Italy (Group E) | 25 | 2.13 | 29,860 | EPI_ISL_426179 |
hCoV-19/India/32/2020‡ | Group E | 16 | 88.50 | 29,903 | EPI_ISL_420555 |
Multiple sequence alignment of 21 full genomes obtained and 1563 full-genome sequences (
Phylogenetic trees based on full-genome sequences deposited and available at GISAID revealed the diversification and the clustering of sequences into groups, based on the genetic variants. Specific amino acid substitutions in the nsp3 region, spike protein and ORF8, in general, lead to the formation of V, G and S genetic variants/clades, respectively. The S clade corresponds to the C28144T nucleotide polymorphism that results in a non-synonymous substitution Leu84Ser in ORF8. Clades V, G and a group of unclassified strains possess mainly C28144 and are referred to as the L type [11]. The phylogenetic analyses of the study strains and the other global sequences revealed that the SARS-CoV-2 sequences derived from Italy (n=8) in this study clustered in clade G, while the SARS-CoV-2 sequences (n=11) of Indians evacuated from Iran belonged to the unclassified group, which also included one of the SARS-CoV-2 sequences imported from Wuhan (hCoV-19/India/1-27/2020) (Figure). The other sequence imported from Wuhan (hCoV-19/India/1-31/2020) possessed Leu84Ser in ORF8b, classifying it in clade S.

Figure. Phylogenetic tree, constructed by the neighbour-joining method, of selected representative full-genome sequences of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) derived from clinical samples of coronavirus disease 2019-positive patients who had a travel history to Wuhan, China, Italy and Iran. Strains sequenced at ICMR-NIV are shown in magenta. The clades as per Global Initiative on Sharing All Influenza Data (GISAID) nomenclature are indicated in blue (clade G), red (clade V), green (clade S) and black (unclassified).
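The clade markers described above can be written as a small lookup. The following is a minimal sketch and not the authors' pipeline: the function names are hypothetical, and only the two markers spelled out in the text are handled (C28144T/Leu84Ser in ORF8 for clade S, D614G in the spike protein for clade G). Clade V is left in the L-type remainder because its defining substitution is not specified here.

```python
ORF8_MARKER_POS = 28144  # 1-based nucleotide position in Wuhan-Hu-1 coordinates (from the text)


def orf8_type(genome: str) -> str:
    """Type a genome as 'S' (T at 28144) or 'L' (C at 28144).

    Assumption: `genome` is already in reference coordinates, i.e. index 0
    is reference position 1 and insertions relative to the reference have
    been removed. Anything else at the marker site returns 'unknown'.
    """
    if len(genome) < ORF8_MARKER_POS:
        return "unknown"
    base = genome[ORF8_MARKER_POS - 1].upper()
    return {"T": "S", "C": "L"}.get(base, "unknown")


def assign_clade(spike_614: str, orf8_84: str) -> str:
    """Assign a clade label from two amino acid markers.

    The order of the checks (S before G) is an arbitrary choice of this
    sketch; the text does not describe strains carrying both markers.
    """
    if orf8_84.upper() == "S":   # Leu84Ser in ORF8
        return "S"
    if spike_614.upper() == "G":  # D614G in the spike protein
        return "G"
    return "L-type (V or unclassified)"
```

Under these assumptions, hCoV-19/India/1-31/2020 (Leu84Ser) would be labelled S, and the Italy-origin sequences carrying D614G would be labelled G.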
The sequences of Italian origin were noted to segregate into at least two subgroups. The percentage nucleotide divergence (PND) within these sequences was found to be 0.01 per cent. The SARS-CoV-2 sequences from the Italian tourists (n=6) showed relatedness to other European SARS-CoV-2 sequences from Scotland, Finland, England, Spain, Ireland and the Czech Republic, with a strain from Shanghai, China, as the outgroup (Figure). Two other sequences (hCoV-19/India/3118/2020 and hCoV-19/India/3239/2020) clustered more closely with sequences from Belgium and Switzerland. The two sequences (hCoV-19/India/31/2020 and hCoV-19/India/32/2020) from the Agra contacts of the Italy-returned, Delhi-based individual were more distinct and clustered in a strongly supported subgroup consisting of strains from Brazil and European countries including Switzerland, Germany, France, Hungary and The Netherlands.
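The text does not state how the PND was calculated. As an illustration only, the sketch below assumes it is the mean pairwise uncorrected p-distance between aligned sequences, expressed as a percentage, with gaps and ambiguous bases skipped. The function names are hypothetical.

```python
from itertools import combinations

VALID_BASES = set("ACGT")


def p_distance(a: str, b: str) -> float:
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped.
    """
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in VALID_BASES and y in VALID_BASES:
            compared += 1
            if x != y:
                differences += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differences / compared


def mean_percent_divergence(sequences: list[str]) -> float:
    """Mean pairwise p-distance over all sequence pairs, as a percentage."""
    pairs = list(combinations(sequences, 2))
    if not pairs:
        raise ValueError("need at least two sequences")
    return 100.0 * sum(p_distance(a, b) for a, b in pairs) / len(pairs)
```

On a genome of about 29.9 kb, a value of 0.01 per cent under this definition corresponds to roughly three differing nucleotides per pair of sequences on average.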
The variable amino acid sites based on the alignment of the 21 sequences of this study with respect to the Wuhan Hu-1 strain are shown in Table II. All the Italy-origin sequences possessed the substitution D7711G/D614G in the S protein, characteristic of the G clade, along with another mutation, P4715L (nsp12-323), that is also shared with sequences from many other countries. Mutation S1515F (nsp3-697) was specific to the Italian cohort strain; D8726G (M-3) was specific to hCoV-19/India/3118/2020 and hCoV-19/India/3239/2020 (Indian contacts of an Indian citizen having travel history to Italy), similar to sequences from Scotland, Belgium, Finland, Switzerland and England. The mutations R9455K and G9456R (N-203 and 204) were found to be specific to the two strains hCoV-19/India/31/2020 and hCoV-19/India/32/2020, but were shared with sequences from a few more countries. A recent study has traced the earliest Italian importation of SARS-CoV-2 to a case from Shanghai, China, and has also identified at least two circulating variants in Italy [12]. Thus, it is likely that the former strain (Italian cohort) has its origin in China, whereas the latter strain (contacts in Agra, n=2) appears to have come from a European cluster involving an entry into Germany that preceded the first cases in Italy by almost a month [12,13].
Amino acid position in genome | 207 | 378 | 476 | 671 | 1515 | 2079 | 2144 | 2796 | 3606 | 4715 | 4798 | 5538 | 7505 | 7535 | 7711 | 8027 | 8726 | 9082 | 9214 | 9455 | 9456 |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
NC_045512 Wuhan-Hu-1 | R | V | I | I | S | P | P | M | L | P | A | T | R | S | D | A | D | V | L | R | G |
hCoV-19/India/1-27/2020 (EPI_ISL_413522) | . | . | . | T | . | . | S | . | . | . | V | . | I | . | . | . | . | . | . | . | . |
hCoV-19/India/1-31/2020 (EPI_ISL_413523) | . | . | V | . | . | L | . | . | . | . | . | I | . | . | . | V | . | . | S | . | . |
hCoV-19/India/1073/2020 | C | I | . | . | . | . | . | I | F | . | . | . | . | . | . | . | . | F | . | . | . |
hCoV-19/India/1093/2020 | C | I | . | . | . | . | . | I | F | . | . | . | . | . | . | . | . | F | . | . | . |
hCoV-19/India/1100/2020 | C | I | . | . | . | . | . | I | F | . | . | . | . | . | . | . | . | F | . | . | . |
hCoV-19/India/1104/2020 | C | I | . | . | . | . | . | I | F | . | . | . | . | . | . | . | . | . | . | . | . |
hCoV-19/India/1111/2020 | . | I | . | . | . | . | . | . | F | . | . | . | . | . | . | . | . | . | . | . | . |
hCoV-19/India/1115/2020 | C | I | . | . | . | . | . | I | F | . | . | . | . | . | . | . | . | F | . | . | . |
hCoV-19/India/1125/2020 | . | I | . | . | . | . | . | . | F | . | . | . | . | . | . | . | . | . | . | . | . |
hCoV-19/India/1616/2020 | C | I | . | . | . | . | . | I | F | . | . | . | . | . | . | . | . | . | . | . | . |
hCoV-19/India/1621/2020 | C | I | . | . | . | . | . | I | F | . | . | . | . | . | . | . | . | . | . | . | . |
hCoV-19/India/1644/2020 | C | I | . | . | . | . | . | I | F | . | . | . | . | . | . | . | . | . | . | . | . |
hCoV-19/India/1652/2020 | C | I | . | . | . | . | . | I | F | . | . | . | . | . | . | . | . | . | . | . | . |
hCoV-19/India/3118/2020 | . | . | . | . | . | . | . | . | . | L | . | . | . | . | G | . | G | . | . | . | . |
hCoV-19/India/3239/2020 | . | . | . | . | . | . | . | . | . | L | . | . | . | . | G | . | G | . | . | . | . |
hCoV-19/India/770/2020 | . | . | . | . | F | . | . | . | . | L | . | . | . | . | G | . | . | . | . | . | . |
hCoV-19/India/773/2020 | . | . | . | . | F | . | . | . | . | L | . | . | . | . | G | . | . | . | . | . | . |
hCoV-19/India/777/2020 | . | . | . | . | F | . | . | . | . | L | . | . | . | . | G | . | . | . | . | . | . |
hCoV-19/India/781/2020 | . | . | . | . | F | . | . | . | . | L | . | . | . | . | G | . | . | . | . | . | . |
hCoV-19/India/31/2020 | . | . | . | . | . | . | . | . | . | L | . | . | . | . | G | . | . | . | . | K | R |
hCoV-19/India/32/2020 | . | . | . | . | . | . | . | . | . | L | . | . | . | . | G | . | . | . | . | K | R |
The Wuhan Hu-1 strain of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is used as the reference strain; a dot indicates a residue identical to the reference. Strains and mutations specific to China, Iran and Italy are shown in orange, violet and brown colour, respectively. In the case of Iran and Italy, only those amino acid sites are shown where at least two of the sequences share the same mutation. R, arginine; V, valine; I, isoleucine; S, serine; P, proline; M, methionine; L, leucine; A, alanine; T, threonine; D, aspartic acid; G, glycine; C, cysteine; F, phenylalanine; K, lysine
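A table of this kind can be derived mechanically from an amino acid alignment. The sketch below is illustrative only and does not reproduce the authors' procedure: `variable_sites` is a hypothetical helper that assumes every sample is already aligned to the reference at equal length. It reports each position where at least one sample differs, using a dot for residues identical to the reference.

```python
def variable_sites(reference: str, samples: dict[str, str]):
    """List variable positions relative to a reference sequence.

    Returns a list of (position, reference_residue, marks) tuples, where
    position is 1-based and marks maps each sample name to '.' (same as
    reference) or to the differing residue. Gap characters, if present,
    are treated like any other differing character.
    """
    for name, sequence in samples.items():
        if len(sequence) != len(reference):
            raise ValueError(f"{name} is not aligned to the reference")

    rows = []
    for index, ref_residue in enumerate(reference):
        marks = {
            name: "." if sequence[index] == ref_residue else sequence[index]
            for name, sequence in samples.items()
        }
        # Keep only columns where at least one sample differs.
        if any(mark != "." for mark in marks.values()):
            rows.append((index + 1, ref_residue, marks))
    return rows


# Toy usage with made-up five-residue sequences (not study data):
toy = variable_sites("MKDLR", {"a": "MKGLR", "b": "MKDLK"})
for position, ref_residue, marks in toy:
    print(position, ref_residue, marks)
```

The additional filter used for the Iran and Italy groups (sites shared by at least two sequences) is not implemented in this sketch.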
Analysis of the strains from the SARS-CoV-2 positive cases in Iran (Figure) showed that these sequences (n=11) clustered with other strains having a global spread, with Canada, the USA, several European countries, New Zealand, Australia and Southeast Asian countries noted in this group (moderate support of 64%). The PND among these study sequences was found to be 0.24 per cent. Common mutations shared among SARS-CoV-2 sequences in the group included R207C (nsp2-27), V378I (nsp2-198), M2796I (nsp4-33) and L3606F (nsp6-37). A mutation, V9082F (ORF7a-74), was unique to four of the study sequences (hCoV-19/India/1073/2020, hCoV-19/India/1093/2020, hCoV-19/India/1115/2020 and hCoV-19/India/1100/2020) that clustered with a strain from Kuwait, KU12, which was also noted to possess this mutation. To date, there are no other sequences from Iran in the GISAID database. However, a phylogenetic study [14] of full-genome sequences has identified a distinct SARS-CoV-2 clade linked to travellers returning from Iran to Australia and New Zealand. Some of these representative sequences were included in this study as well.
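The mutations above are quoted in two coordinate systems: a genome-wide amino acid position (as in Table II) and a position within the individual protein. The offset between the two can be recovered from the pairs given in the text itself. The sketch below does that and checks that pairs for the same protein agree; the function names are hypothetical, and no protein boundaries beyond those implied by the quoted pairs are assumed.

```python
# (genome-wide position, protein, position within protein), as quoted in the text
QUOTED_PAIRS = [
    (207, "nsp2", 27), (378, "nsp2", 198), (1515, "nsp3", 697),
    (2796, "nsp4", 33), (3606, "nsp6", 37), (4715, "nsp12", 323),
    (7711, "S", 614), (8726, "M", 3), (9082, "ORF7a", 74),
    (9455, "N", 203), (9456, "N", 204),
]


def derive_offsets(pairs):
    """Compute one offset per protein and reject inconsistent pairs."""
    offsets = {}
    for genome_pos, protein, local_pos in pairs:
        offset = genome_pos - local_pos
        # setdefault stores the first offset seen and returns the stored one.
        if offsets.setdefault(protein, offset) != offset:
            raise ValueError(f"inconsistent offsets quoted for {protein}")
    return offsets


OFFSETS = derive_offsets(QUOTED_PAIRS)


def to_local(protein: str, genome_pos: int) -> int:
    """Convert a genome-wide amino acid position to a within-protein position."""
    return genome_pos - OFFSETS[protein]


def to_genome(protein: str, local_pos: int) -> int:
    """Convert a within-protein position to a genome-wide position."""
    return local_pos + OFFSETS[protein]
```

All quoted pairs are mutually consistent under this scheme; for example, both nsp2 mutations imply the same offset of 180. The conversion is only valid for proteins that appear in the quoted pairs.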
In terms of the overall divergence of SARS-CoV-2, the strains in this study were 99.97 per cent identical to the earliest strain Wuhan Hu-1. However, it is vital to track the evolutionary dynamics of the strains vis-à-vis the strains circulating globally and monitor any specific changes in the functional sites of the major viral proteins.
Delineation of circulating strains into three major evolving clades has been reflected in GISAID, with clade G apparently being one of the dominant ones. From the start of the pandemic, severity or transmission patterns have not been associated with any clade in particular. A limitation of this study was the non-availability of full genomes from other parts of India, which would have enabled a pan-India comparison of the circulating strains in the country. Overall, the present study revealed genetic variants in India that were similar to strains circulating in the specific regions of their origin. Continued surveillance of SARS-CoV-2 strains in India is warranted to get the complete picture of all circulating strains and identify changes that could be associated with increased virulence.
Supplementary Table: Acknowledgement for the list of the sequences downloaded from GISAID database that were used in the study
Acknowledgment
Authors thank Prof. (Dr) Balram Bhargava, Director-General, Indian Council of Medical Research (ICMR) & Secretary, Department of Health Research (DHR), Ministry of Health & Family Welfare (MoHFW), New Delhi for the support. Authors acknowledge the support from Dr P. Ravindran, Director, Emergency Medical Response (EMR), MoHFW, Dr R. Lakshminarayan, ICMR and the team from the DHR, MoHFW, for the logistic support. The National Centre for Disease Control (NCDC) team is acknowledged for sample collection from Italy. Shri Santosh Jadhav, Bioinformatics Group, ICMR-National Institute of Virology, Pune, is thanked for his inputs.
Financial support & sponsorship: Financial support provided by the Indian Council of Medical Research, New Delhi, is acknowledged.
Conflicts of Interest: None.
References
1. Genomic characterization of the 2019 novel human-pathogenic coronavirus isolated from a patient with atypical pneumonia after visiting Wuhan. Emerg Microbes Infect. 2020;9:221-36.
2. Genome composition and divergence of the novel coronavirus (2019-nCoV) originating in China. Cell Host Microbe. 2020;27:325-8.
3. A pneumonia outbreak associated with a new coronavirus of probable bat origin. Nature. 2020;579:270-3.
4. Novel coronavirus (2019-nCoV) situation reports. WHO; 2020. Available from: https://www.who.int/emergencies/diseases/novel-coronavirus-2019/situation-reports
5. Laboratory preparedness for SARS-CoV-2 testing in India: Harnessing a network of virus research & diagnostic laboratories. Indian J Med Res. 2020;151. doi: 10.4103/ijmr.IJMR_594_20
6. Full-genome sequences of the first two SARS-CoV-2 viruses from India. Indian J Med Res. 2020;151. doi: 10.4103/ijmr.IJMR_663_20
7. First isolation of SARS-CoV-2 from clinical samples in India. Indian J Med Res. 2020;151. doi: 10.4103/ijmr.IJMR_1029_20
8. Development of in vitro transcribed RNA as positive control for laboratory diagnosis of SARS-CoV-2 in India. Indian J Med Res. 2020;151. doi: 10.4103/ijmr.IJMR_671_20
9. MAFFT: A novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 2002;30:3059-66.
10. MEGA7: Molecular evolutionary genetics analysis version 7.0 for bigger datasets. Mol Biol Evol. 2016;33:1870-4.
11. On the origin and continuing evolution of SARS-CoV-2. Natl Sci Rev. 2020. doi: 10.1093/nsr/nwaa036
12. Genomic characterisation and phylogenetic analysis of SARS-CoV-2 in Italy. medRxiv. 2020. doi: 2020.03.15.20032870
13. First cases of coronavirus disease 2019 (COVID-19) in the WHO European Region, 24 January to 21 February 2020. Euro Surveill. 2020;25:2000178.
14. An emergent clade of SARS-CoV-2 linked to returned travellers from Iran. bioRxiv. doi: 2020.03.15.992818