In a recent study posted to the Research Square* preprint server, researchers conducted a forensic analysis of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2)-related coronaviruses (SARS2r-CoVs).

Study: Forensic analysis of novel SARS2r-CoV identified in game animal datasets in China shows evolutionary relationship to Pangolin GX CoV clade and apparent genetic experimentation. Image Credit: Andrii Vodolazhskyi/Shutterstock

Other than bats, pangolins are the only animals found to be infected with SARS2r-CoVs before the coronavirus disease 2019 (COVID-19) pandemic. Various theories have sought to explain the source of SARS-CoV-2. Some implicate the Huanan seafood market as a potential viral source. Others suggest that a SARS-CoV-2 progenitor virus was developed in a Wuhan laboratory that conducted SARS2r-CoV research and was accidentally released.

About the study

In the present study, researchers examined novel SARS2r-CoVs detected in metatranscriptomic datasets derived from game animals.

The team analyzed sequence read archive (SRA) data from BioProjects PRJNA795267 and PRJNA793740. They identified GX_ZC45r-CoV sequences (GX denoting the pangolin Guangxi CoV clade) in SRA datasets corresponding to two Myocastor coypus (coypu, or nutria), two Hystrix brachyura (Malayan porcupine), Rhizomys pruinosus (hoary bamboo rat), Meles leucurus (Asian badger), and Paguma larvata (masked palm civet).

The team aligned each SRA dataset to the GX_ZC45r-CoV gap_filled reference genome, a GX_ZC45r-CoV sequence in which regions without coverage were filled in with the corresponding bat-SL-CoVZC45 sequence. The datasets were also subjected to systematic mitochondrial mapping to identify reads matching mitochondrial genomes.
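The gap-filling idea can be illustrated with a minimal Python sketch. It assumes the partial sequence and the backfill sequence are already aligned to the same length and that uncovered positions are marked with "N". The function name `gap_fill` and the toy sequences are hypothetical and are not taken from the preprint, which does not describe its tooling at this level of detail.

```python
def gap_fill(consensus: str, backfill: str, gap_char: str = "N") -> str:
    """Return `consensus` with every gap character replaced by the base
    found at the same position in `backfill`.

    Assumption: both sequences are pre-aligned and equal in length.
    """
    if len(consensus) != len(backfill):
        raise ValueError("sequences must be aligned to the same length")
    return "".join(
        b if c.upper() == gap_char else c
        for c, b in zip(consensus, backfill)
    )


# Toy example (made-up sequences, not real viral data)
partial = "ACGNNNTTAN"    # 'N' marks positions with no read coverage
reference = "TCGTACTGAG"  # stand-in for the backfill genome
filled = gap_fill(partial, reference)
print(filled)
```

Covered positions keep the base from the partial sequence, so only the uncovered stretches inherit the backfill genome's bases.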

Furthermore, the team used the reads that mapped to the human reference mitochondrial genome to determine the human mitochondrial haplogroups present in the datasets.


The study results showed that read coverage in the 16 game animal datasets fell within the nonstructural protein 4 (NSP4), NSP10, and ribonucleic acid (RNA)-dependent RNA polymerase (RdRp) coding regions. In seven additional game animal datasets, the number of reads mapped to GX_ZC45r-CoV gap_filled was very low, ranging from one to eight reads. The team also noted that several single nucleotide variants (SNVs) relative to bat-SL-CoVZC45 were found consistently in all the samples, indicating that the same strain was present in each.

Alignment of GX_ZC45r-CoV to the Guangxi (GX) pangolin CoV (PCoV) GX-P4L revealed 47 SNVs in the NSP4 region, while alignment to bat-SL-CoVZC45 showed 104 SNVs in the NSP10/RdRp region. Coverage of the NSP4 coding region was incomplete overall. It was complete at the 3’ end, spanning the entire C-terminus, but covered only 50% of the transmembrane domain.
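As a rough illustration of how SNV counts and percent identity are derived from a pairwise alignment, the following Python sketch compares two pre-aligned sequences position by position. The function names and toy sequences are hypothetical, and skipping gaps and ambiguous bases is an assumption made here rather than the preprint's stated method.

```python
from typing import Tuple


def count_snvs(ref: str, query: str) -> Tuple[int, int]:
    """Return (snv_count, compared_positions) for two aligned sequences.

    Only positions where both sequences carry an unambiguous base
    (A, C, G, or T) are compared; gaps and 'N' are skipped.
    """
    if len(ref) != len(query):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    snvs = 0
    compared = 0
    for r, q in zip(ref.upper(), query.upper()):
        if r in valid and q in valid:
            compared += 1
            if r != q:
                snvs += 1
    return snvs, compared


def percent_identity(ref: str, query: str) -> float:
    """Percentage of compared positions that are identical."""
    snvs, compared = count_snvs(ref, query)
    if compared == 0:
        raise ValueError("no comparable positions")
    return 100.0 * (compared - snvs) / compared


# Toy example (made-up sequences): one mismatch, one gap that is skipped
print(count_snvs("ACGTACGT", "ACGAAC-T"))
print(round(percent_identity("ACGTACGT", "ACGAAC-T"), 2))
```

Because positions without coverage are excluded from the denominator, identity values computed this way describe only the regions that were actually sequenced.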

The team also observed that while the RdRp coding region showed complete coverage, 47% of the NSP10 coding region, at its 5’ end, was not covered by GX_ZC45r-CoV-matching reads. Moreover, the 590-nucleotide (nt) region at the 5’ end of the RdRp coding region had higher read coverage than the rest of the RdRp region. Another anomaly in the distribution of read coverage was observed at position 14758 nt relative to bat-SL-CoVZC45.

Mitochondrial mapping analysis showed that the datasets were contaminated with several unexpected eukaryotic species. Mammalian species frequently detected in the datasets included Homo sapiens, with 16% to 98% mitochondrial genome coverage, and Mus musculus, with 13% to 71% coverage. Moreover, the genomic coverage of both Paguma larvata and Homo sapiens was higher than that of Meles leucurus.
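Coverage percentages of this kind express the share of reference positions touched by at least one mapped read. The Python sketch below is a minimal illustration under stated assumptions: read alignments are given as 0-based, half-open (start, end) intervals, and the function name and example intervals are hypothetical. The length of 16,569 nt is that of the human reference mitochondrial genome (rCRS).

```python
from typing import Iterable, Tuple


def covered_fraction(intervals: Iterable[Tuple[int, int]], genome_length: int) -> float:
    """Fraction of genome positions covered by at least one interval.

    Intervals are 0-based and half-open: (start, end) covers start..end-1.
    Overlapping intervals are counted once; out-of-range parts are clipped.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    covered = [False] * genome_length
    for start, end in intervals:
        for pos in range(max(start, 0), min(end, genome_length)):
            covered[pos] = True
    return sum(covered) / genome_length


# Toy example: made-up read intervals on a 16,569 nt mitochondrial genome
HUMAN_MT_LENGTH = 16569
reads = [(0, 5000), (4000, 8000), (16000, 16569)]
print(f"{100 * covered_fraction(reads, HUMAN_MT_LENGTH):.1f}% covered")
```

A high percentage for an unexpected species, as reported for Homo sapiens and Mus musculus, therefore means that reads from that species span most of its mitochondrial genome.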

Mitochondrial haplogroup analysis showed that a dominant haplogroup, F1c1 (a1), was present in 12 of the 14 tested datasets. Of the remaining two datasets, one had no mapped reads, while in the other all the human mitochondrial reads were associated with the H27/H27e haplogroup. The minor haplogroups detected included H1aw and H1t2 in the MC-HuN-T-1 dataset, C in MC-HeB-T-1, and H27/H27e in MJ-ZJ-MO-4.

Phylogenetic analysis was performed on the NSP10 region because it plays an essential role in the methylation of the messenger RNA (mRNA) cap. The analysis showed that GX_ZC45r-CoV was a basal sister to the GX CoV clade. In contrast, the Guangdong (GD) PCoVs were more closely associated with the SARS-CoV-2/BANAL clade and formed a basal sister clade to it. Furthermore, bat-SL-CoVZC45 was substantially more divergent from the GD, GX, and SARS2r-CoV clades. The team also noted that the RdRp of GX_ZC45r-CoV was 95.85% similar to that of bat-SL-CoVZC45.

Overall, the study findings showed that specific regions of the partial SARS2r-CoV genome share a common ancestor with the GX PCoV clade.

*Important notice

Research Square publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, used to guide clinical practice or health-related behavior, or treated as established information.

