In a recent study posted to the bioRxiv* preprint server, researchers surveyed over 360 coronavirus disease 2019 (COVID-19) patients to characterize the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) genomic sequence diversity of individual infections.

Study: COVID-19 Infection and Transmission Includes Complex Sequence Diversity. Image Credit: Dana.S/Shutterstock

Throughout the COVID-19 pandemic, whole-genome sequencing (WGS) helped researchers identify polymorphisms in the SARS-CoV-2 genome and track its continuing evolution.

Convention required that the researchers report the majority consensus sequence; however, this approach under-reported around 79% of the observed sequence variation. Accordingly, many nucleotide variations remained undetected, e.g., lineage-defining single nucleotide polymorphisms (SNPs) in Omicron BA.1 and BA.2 sub-variants.

This raises the possibility that under-reported minority alleles in earlier SARS-CoV-2 infections might be playing an important role in the continuing evolution of new SARS-CoV-2 variants of concern (VOCs).

There are over 29,000 loci in the SARS-CoV-2 genome. WGS has generated complex viral sequence data sets, including consensus sequences that report the majority nucleotide at each position of the viral genome. Unfortunately, focusing on the majority consensus sequence during WGS diverted attention from genetic variations that contribute to SARS-CoV-2 evolution.
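To make the distinction concrete, the following is a minimal Python sketch of how a majority consensus call can hide minority alleles. It is an illustration only, not the study's pipeline: the function name `consensus_and_minor`, the 5% reporting cut-off, and the read counts at the three positions are all hypothetical.

```python
from collections import Counter


def consensus_and_minor(pileup, min_minor_freq=0.05):
    """Return the majority base per position plus any minority alleles.

    pileup: dict mapping genome position -> Counter of base -> read count.
    min_minor_freq: hypothetical cut-off below which minority bases are ignored.
    """
    consensus = {}
    minor = {}
    for pos, counts in pileup.items():
        total = sum(counts.values())
        if total == 0:
            continue  # no reads at this position, nothing to call
        major_base, _ = counts.most_common(1)[0]
        consensus[pos] = major_base
        # Everything that is not the majority base is lost in a consensus sequence.
        hidden = {
            base: n / total
            for base, n in counts.items()
            if base != major_base and n / total >= min_minor_freq
        }
        if hidden:
            minor[pos] = hidden
    return consensus, minor


# Made-up read counts at three illustrative positions.
example_pileup = {
    100: Counter({"T": 940, "C": 60}),
    200: Counter({"C": 600, "T": 400}),
    300: Counter({"G": 1000}),
}
consensus, minor = consensus_and_minor(example_pileup)
print(consensus)
print(minor)
```

In this toy input, the consensus reports a single base at positions 100 and 200 even though a second allele is present in 6% and 40% of reads, respectively.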

Extensive and concerted phylogenetic assessments have documented the continuous global emergence of major SARS-CoV-2 lineages that share sequence variations. As of April 2021, over 1.8 million SARS-CoV-2 genomic sequences had been submitted from 172 countries, documenting the evolution of SARS-CoV-2 strains.

With the surge of each new variant, including Alpha (B.1.1.7), Beta (B.1.351), Gamma (P.1), Delta (B.1.617.2), and Omicron (B.1.1.529), there were concerns about strain-specific transmission and virulence, as well as loss of the effectiveness of COVID-19 diagnostics, therapeutics, and vaccines.

Data from the Global Initiative on Sharing All Influenza Data (GISAID) and Nextstrain provided continuously updated reports on SARS-CoV-2 lineage classification. These data illustrated that a low mutation rate does not necessarily translate to a limited capacity for variation in SARS-CoV-2.

Some studies have described sequence variation within infected individuals as intra-host single nucleotide variants (iSNVs). Although these studies have not revealed details of individual infection diversity and transmission outcomes, they have acknowledged that variations exist within a single infection.

About the study

In the present study, researchers evaluated, from multiple perspectives, SARS-CoV-2 WGS data from patients and staff members of the VA Northeast Ohio healthcare system in the United States who contracted COVID-19.

There were 254 samples from 179 patients and 75 healthcare personnel with COVID-19, with a wide demographic range. The team performed high-resolution contact tracing of SARS-CoV-2 infections to gain insight into viral transmission dynamics between donor and recipient individuals.

Further, they assessed the extent of sequence variability within SARS-CoV-2 infections across different surges between June 2020 and October 2021. To assess genetic diversity introduced from outside the local region, the team analyzed an additional 110 samples from the sequence read archive (SRA).

They evaluated each sample by reverse transcription-polymerase chain reaction (RT-PCR) to estimate infection levels, enabling a high-resolution evaluation of individual infection complexities. Finally, they aligned SARS-CoV-2 consensus sequences to observe iSNVs.

Study findings

Around 140 sequences met technical thresholds, i.e., 80% of the genome covered at 100X read-depth, and had an average cycle threshold (CT) value of 27.96. The sequences that passed had 279,408 uniquely mapped reads per sample, with an average coverage of 1,081X read-depth.
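The coverage criterion above can be expressed as a simple filter. The sketch below assumes a per-position read-depth list is already available for each sample; the function name `passes_coverage_threshold` and the toy depth values are hypothetical and are not taken from the preprint.

```python
def passes_coverage_threshold(depths, min_depth=100, min_fraction=0.80):
    """Check whether enough of the genome is covered at the required depth.

    depths: per-position read depths for one sample (one value per locus).
    min_depth / min_fraction: the 100X and 80% thresholds quoted in the article.
    """
    if not depths:
        return False  # no coverage data, so the sample cannot pass
    covered = sum(1 for d in depths if d >= min_depth)
    return covered / len(depths) >= min_fraction


# Toy samples with 100 positions each (a real genome has over 29,000 loci).
good_sample = [1000] * 85 + [10] * 15  # 85% of positions at >= 100X
poor_sample = [1000] * 70 + [10] * 30  # 70% of positions at >= 100X

print(passes_coverage_threshold(good_sample))
print(passes_coverage_threshold(poor_sample))
```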

The detection of SNPs was not influenced by CT if it reached a minimum threshold for coverage. Therefore, regardless of the CT value, the study samples showed a comparable number of SNPs defined by alternate allele frequency (AAF) ≥ 5% or AAF ≥ 50%.
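As a rough illustration of the two cut-offs, the following Python sketch counts variant sites at AAF ≥ 5% and AAF ≥ 50%. It assumes AAF is computed as alternate reads divided by the sum of reference and alternate reads, which the article does not spell out; the helper names and read counts are hypothetical.

```python
def alternate_allele_frequency(ref_reads, alt_reads):
    """AAF = alt / (ref + alt); assumed definition, returns 0.0 with no reads."""
    total = ref_reads + alt_reads
    return alt_reads / total if total else 0.0


def count_snps(sites, threshold):
    """Count sites whose AAF meets or exceeds the given threshold."""
    return sum(
        1
        for ref_reads, alt_reads in sites
        if alternate_allele_frequency(ref_reads, alt_reads) >= threshold
    )


# Made-up (reference reads, alternate reads) pairs for four sites in one sample.
sites = [(940, 60), (600, 400), (100, 900), (990, 10)]

snps_at_5_percent = count_snps(sites, 0.05)
snps_at_50_percent = count_snps(sites, 0.50)
print(snps_at_5_percent, snps_at_50_percent)
```

With these toy counts, three sites clear the 5% cut-off but only one clears 50%, which mirrors why the lower threshold reports more SNPs per sample.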

There was a significant increase in the number of SNPs detected over time, with the average count approximately doubling (30 to 65 with AAF ≥5%) or tripling (10 to 35 with AAF ≥50%) from April 2020 to August 2021.

The alignment of the 140 SARS-CoV-2 consensus sequences revealed 1,096 iSNV positions, of which 406 SNPs were in multiple infection samples. The authors observed that variant positions across the SARS-CoV-2 genome with AAF >5% had multiple biallelic sequence polymorphisms, suggesting heterogeneity of SARS-CoV-2 infections.


The authors observed that most mixtures of alternate and reference allele frequencies (AAF:RAF) in donor and recipient sequences were in nearly identical ratios, suggesting that the majority of SARS-CoV-2 strain diversity was introduced during transmission, resulting in subsequent infection complexity. Intriguingly, the authors did not observe more iSNVs in immunocompromised patients.
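One way to picture the donor-recipient comparison is to line up the AAF at positions that are variable in both individuals and check how closely the values agree. The sketch below is an assumption-laden illustration, not the authors' method: the function `compare_pair`, the 0.10 tolerance, and the frequencies are all hypothetical.

```python
def compare_pair(donor_aaf, recipient_aaf, tolerance=0.10):
    """Compare AAF at positions that are variable in both donor and recipient.

    donor_aaf / recipient_aaf: dict mapping genome position -> AAF (0 to 1).
    tolerance: hypothetical maximum AAF difference still counted as "similar".
    Returns a list of (position, donor AAF, recipient AAF, similar?) tuples.
    """
    shared_positions = sorted(set(donor_aaf) & set(recipient_aaf))
    rows = []
    for pos in shared_positions:
        difference = abs(donor_aaf[pos] - recipient_aaf[pos])
        rows.append((pos, donor_aaf[pos], recipient_aaf[pos], difference <= tolerance))
    return rows


# Made-up allele frequencies for one transmission pair.
donor = {100: 0.30, 200: 0.85, 300: 0.10}
recipient = {100: 0.28, 200: 0.50, 400: 0.20}

rows = compare_pair(donor, recipient)
for pos, d, r, similar in rows:
    print(pos, d, r, "similar" if similar else "different")
```

Positions seen in only one individual (300 and 400 here) are left out of the comparison; a fuller analysis would need to decide how to treat them.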

Furthermore, the authors accurately identified two individuals who harbored B.1.1.7/B.1.617.2 co-infections in mid-May 2021. This finding suggested that minority alleles in earlier COVID-19 surges most likely contributed to the evolution of new SARS-CoV-2 VOCs.


Every SARS-CoV-2-infected person carries 1 to 100 billion virions during peak infection. The efficient replication and transmission of SARS-CoV-2 increasingly favor the dispersal of mutations globally. As levels of immunity stimulated by infection and vaccination fluctuate in humans, SARS-CoV-2 will encounter substantial heterogeneity in natural and acquired host defense mechanisms. Inequity in the access and uptake of vaccines will further accentuate deleterious and advantageous mutations in SARS-CoV-2.

Together, these factors necessitate the study of the genetic complexity of SARS-CoV-2 infections. This approach could help researchers better understand SARS-CoV-2 infection and transmission dynamics involving immunocompetent and immunocompromised patients, and assess the efficacy of COVID-19 drugs as well as the resistance and vaccine escape mechanisms of SARS-CoV-2. Furthermore, it could help address the problems due to the under-reporting of infection complexity represented in data repositories.

*Important notice

bioRxiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, used to guide clinical practice or health-related behavior, or treated as established information.

