Several viruses of the coronavirus family can infect a wide range of hosts, including birds, humans and other mammals. They are single-stranded, positive-sense RNA viruses with genomes of 27 to 32 kb, and they are divided into four genera: alpha, beta, gamma and delta.
Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), the causative agent of the ongoing coronavirus disease 2019 (COVID-19) pandemic, was first detected in Wuhan, Hubei Province, China, in December 2019. The World Health Organization declared COVID-19 a pandemic on March 11, 2020.
When a virus mutates, identifying the mutation sites is essential for vaccine development. Several phylogenetic tree-based analyses have been performed to understand the evolutionary relationship between SARS-CoV-2 and other β-coronaviruses. One earlier study constructed a phylogenetic tree and found that the genome sequence of SARS-CoV-2 has 88% homology with BAT-CoV. In another study, scientists isolated about 70 SARS-CoV-2 genome sequences from COVID-1
Although the genome sequences of SARS-CoV, MERS-CoV and SARS-CoV-2 have been compared and studied, gaps remain in the comparison of all four coronaviruses, namely SARS-CoV, MERS-CoV, BAT-CoV and SARS-CoV-2.
A new study comparing the genomes of these four coronaviruses has been published in the Journal of Medical Virology. The research used a variety of genetic markers, including single nucleotide polymorphisms (SNPs), whole-genome sequence phylogeny, protein mutations and microsatellites. The sequences were compared with the SARS-CoV-2 reference genome sequence (called Wuhan-Hu-1). All sequences were obtained from NCBI GenBank.
The SARS-CoV, MERS-CoV and SARS-CoV-2 sequences were obtained from human hosts (Homo sapiens), and the BAT-CoV sequences were collected from eight different types of bats. The results of the study are described below.
To perform phylogenetic analysis on the different coronavirus sequences, a maximum likelihood method with 1000 bootstrap replicates was used. The analysis revealed distinct lineages of coronaviruses. The whole-genome phylogeny places MERS-CoV as the outgroup, while the other three viruses form the ingroup. Within the ingroup, two lineages were found, one consisting of SARS-CoV-2 and the other consisting of SARS-CoV and BAT-CoV. The branches of the tree indicate that SARS-CoV and BAT-CoV diverged very early. The tree also shows that SARS-CoV-2 diverged independently from BAT-CoV, and that SARS-CoV-2 is more closely related to BAT-CoV and SARS-CoV than to MERS-CoV. SimPlot software was used to visualize a similarity plot of the four selected viruses. It revealed that BAT-CoV has about 98% similarity with the reference sequence (the Wuhan strain of SARS-CoV-2), whereas the similarity between SARS-CoV and the reference sequence is 92% and the similarity between MERS-CoV and the Wuhan strain is 58%.
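The idea behind a similarity plot can be illustrated with a minimal sliding-window sketch. It assumes the two sequences have already been aligned to the same length. The function name, window size and step are hypothetical choices for illustration, and this is not SimPlot's own implementation.

```python
def window_identity(ref, query, window=200, step=20):
    """Percent identity between two aligned sequences in sliding windows.

    Returns a list of (window_start, percent_identity) tuples. Alignment
    columns where either sequence has a gap ('-') are skipped.
    """
    if len(ref) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    points = []
    for start in range(0, len(ref) - window + 1, step):
        pairs = [
            (a, b)
            for a, b in zip(ref[start:start + window], query[start:start + window])
            if a != "-" and b != "-"
        ]
        if not pairs:
            continue  # window contains only gaps, nothing to compare
        matches = sum(a == b for a, b in pairs)
        points.append((start, 100.0 * matches / len(pairs)))
    return points


# Toy example with made-up sequences (not real coronavirus data).
print(window_identity("ACGTACGTAC", "ACGTTCGTAC", window=5, step=5))
```

Plotting the returned points against genome position gives a curve whose dips mark the regions where the query differs most from the reference.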
Genetic variation analysis
Variant-based analysis shows that, relative to the Wuhan reference strain, the MERS-CoV genomes differ at 134.21 sites, the BAT-CoV genomes at 136.72 sites, the SARS-CoV genomes at 26.64 sites, and the SARS-CoV-2 genomes at 0.66 sites. In addition, the study shows that MERS-CoV and SARS-CoV-2 carry more mutations at missense sites than SARS-CoV and BAT-CoV. The lower number of missense variants in SARS-CoV and BAT-CoV is attributed to selection pressure acting on the missense sites.
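For readers unfamiliar with the term, a missense variant is a nucleotide change that alters the encoded amino acid. The sketch below classifies a single codon substitution using the standard genetic code. The function name and labels are hypothetical, and the study's own variant-calling pipeline is not described in this article.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: DNA codon -> one-letter amino acid ('*' = stop).
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def classify_substitution(ref_codon, alt_codon):
    """Label a codon change as synonymous, missense, nonsense or stop_lost."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"  # same amino acid, protein unchanged
    if alt_aa == "*":
        return "nonsense"    # new stop codon truncates the protein
    if ref_aa == "*":
        return "stop_lost"   # original stop codon now encodes an amino acid
    return "missense"        # a different amino acid is encoded


print(classify_substitution("GAT", "GGT"))  # Asp -> Gly
print(classify_substitution("CTA", "TTA"))  # Leu -> Leu
```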
The numbers of mutations in the structural proteins, namely the spike protein (S), envelope protein (E), membrane protein (M) and nucleocapsid protein (N), were calculated. SNPs in the S, M, E and N gene regions were filtered with a Python script, and all four genes show the presence of various SNPs. The MultAlin online tool was used to detect similarities between the four coronaviruses selected for the study.
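The authors' script is not reproduced in this article, so the following is only a sketch of how such a filter might look. It assumes SNPs are given as (position, reference base, alternate base) tuples with 1-based positions, and it uses the gene coordinates annotated for the SARS-CoV-2 reference record NC_045512.2. The function and variable names are hypothetical.

```python
# Gene coordinates (1-based, inclusive) as annotated for the SARS-CoV-2
# reference NC_045512.2. Other coronaviruses would need their own coordinates.
GENE_REGIONS = {
    "S": (21563, 25384),
    "E": (26245, 26472),
    "M": (26523, 27191),
    "N": (28274, 29533),
}


def filter_snps(snps, regions=GENE_REGIONS):
    """Group SNPs by the structural gene they fall in.

    `snps` is an iterable of (position, ref_base, alt_base) tuples.
    SNPs that fall outside every region are dropped.
    """
    by_gene = {gene: [] for gene in regions}
    for pos, ref, alt in snps:
        for gene, (start, end) in regions.items():
            if start <= pos <= end:
                by_gene[gene].append((pos, ref, alt))
                break  # the regions do not overlap, so stop at the first hit
    return by_gene


# Example positions chosen for illustration only.
snps = [(241, "C", "T"), (23403, "A", "G"), (28881, "G", "A")]
for gene, hits in filter_snps(snps).items():
    print(gene, len(hits), hits)
```

Counting the entries per gene then gives the per-protein SNP numbers that the paragraph above refers to.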
Microsatellite analysis is used to identify repetitive sequences in a genome. These sequences can have a major impact on the onset and evolution of disease. In this study, microsatellite analysis was performed with the IMEX (Imperfect Microsatellite Extractor) and FMSD (Fast Microsatellite Discovery) online tools. IMEX found no notable microsatellites. FMSD, however, revealed more microsatellites in MERS-CoV than in the other viruses, while the SARS-CoV-2 genome shows the highest incidence of compound microsatellites.
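To make the idea concrete, here is a minimal sketch that finds perfect microsatellites (a short motif repeated in tandem) with a regular expression. It does not reproduce IMEX or FMSD: imperfect repeats and compound microsatellites are not handled, and the function names and thresholds are hypothetical.

```python
import re


def is_primitive(motif):
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ATAT')."""
    n = len(motif)
    return not any(
        n % k == 0 and motif == motif[:k] * (n // k) for k in range(1, n)
    )


def find_microsatellites(seq, max_motif=6, min_repeats=3):
    """Return sorted (start, motif, repeat_count) tuples for perfect tandem repeats.

    `start` is a 0-based index into `seq`. Motifs of length 1..max_motif are
    searched, and a repeat must occur at least `min_repeats` times in a row.
    """
    seq = seq.upper()
    hits = []
    for k in range(1, max_motif + 1):
        # A k-base motif followed by at least (min_repeats - 1) more copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (k, min_repeats - 1))
        for m in pattern.finditer(seq):
            motif = m.group(1)
            if is_primitive(motif):
                hits.append((m.start(), motif, len(m.group(0)) // k))
    return sorted(hits)


# Toy sequence (made up): contains (AT)x4 and (AGT)x3.
print(find_microsatellites("GGCATATATATCCAGTAGTAGTC"))
```

The `is_primitive` check avoids reporting the same repeat twice, for example a run of `AT` units also matching as the longer motif `ATAT`.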
In summary, phylogenetic tree analysis shows that SARS-CoV-2 is most closely related to BAT-CoV, with SARS-CoV as its second closest relative. All MERS-CoV strains are only distantly related to SARS-CoV-2. In the genetic variation analysis, more mutations were found in MERS-CoV than in SARS-CoV and BAT-CoV. Taken together, the phylogenetic, genetic variation, multiple sequence and microsatellite analyses indicate that bats are the natural host of SARS-CoV-2. The study also concluded that BAT-CoV is closely related to SARS-CoV-2. An intermediate host may have initiated the spread of COVID-19 from bats to humans, but more research is needed to verify this hypothesis. The FMSD tool, by contrast, shows SARS-CoV to be more closely related to SARS-CoV-2 than BAT-CoV is.
- Rehman, A. H., et al. (2021). Comprehensive comparative genomic and microsatellite analysis of SARS, MERS, BAT-SARS and COVID-19 coronaviruses. Journal of Medical Virology. https://doi.org/10.1002/jmv.26974, https://onlinelibrary.wiley.com/doi/10.1002/jmv.26974