From viral democratic genomes to viral wild bunch of quasispecies

The vast majority of RNA genomes from pathogenic viruses analyzed and deposited in databases are consensus or “democratic” genomes. They represent the genomes most frequently found in the clinical samples of patients but do not account for the huge genetic diversity of coexisting genomes, which is better described as quasispecies. A viral quasispecies is defined as the dynamic distribution of nonidentical but closely related mutant, variant, recombinant, or reassortant viral genomes. Viral quasispecies have collective behavior and dynamics and are subject to internal interactions that comprise interference, complementation, or cooperation. In the setting of SARS‐CoV‐2 infection, intrahost SARS‐CoV‐2 genetic diversity has notably been reported for immunocompromised, chronically infected patients, for patients treated with monoclonal antibodies targeting the viral spike protein, and for different body compartments of a single patient. A question that deserves attention is whether such diversity is generated postinfection from a clonal genome in response to selection pressure or is already present at the time of infection as a quasispecies. In the present review, we summarize the data supporting the view that hosts are infected by a “wild bunch” of viruses rather than by multiple virions sharing the same genome. Each virion in the “wild bunch” may have different virulence and tissue tropisms. As the number of viruses replicated during host infections is huge, a viral quasispecies at any time of infection is wide and is also influenced by host‐specific selection pressure after infection, which accounts for the difficulty in deciphering and predicting the appearance of more fit variants and the evolution of epidemics of novel RNA viruses.


| VIRUS SPECIES DEFINITION
Viruses are among the most difficult biological elements to define, and the creation of names to define them by analogy with previously defined elements very quickly shows its limits. Thus, the notion of a virus as a living or nonliving element has given rise to debates that only make sense semantically, as the very definition of a virus cannot be exhaustive and ignores either giant viruses or "selfish" genes, without any real boundaries being established between each of these replicative life forms. 1 Under these conditions, the definition of viral species is an almost impossible challenge. Indeed, the definition of species, as first derived from eukaryote species, theoretically assumes a monophyly that is impossible to truly assert for viruses because of their ease of recombination. 2 For example, we have recently described a recombinant between the vaccine poliovirus and an enterovirus C in Africa 3 and recombinants between different SARS-CoV-2 variants, 4 and it is not clear how they can be classified using a monophyletic approach. It should be noted that monophyly should no longer be considered conceivable for other living entities that are likely to harbor genetic sequences from other kingdoms of life. 5 The massive genetic work of the 21st century has shown that all living entities are in fact mosaics and that no living entity is monophyletic [6][7][8]; the idea of a tree of life with monophyletic entities at its extremities and with a unique ancestor (LUCA, last universal common ancestor) has been definitively ruled out of scientific reality.
Furthermore, thinking about viruses has been biased from the very beginning. Early work on the toxicity of a plant extract (in this case tobacco) led investigators to assume that the disease was due not to an organism but to a toxin. 9,10 Later, the toxic product was identified as tobacco mosaic virus, containing RNA and proteins, and infectivity of the viral RNA was reported. 11,12 However, crystallization of the tobacco mosaic virus by WM Stanley (Nobel Prize in Chemistry in 1946) was nonetheless the basis for the long-standing view that viruses are inanimate, nonliving objects. 13 Moreover, other authors, such as FM Burnet (Nobel Prize in Medicine in 1960), who compared viruses to intracellular bacteria, considered that viruses are merely cellular organisms that had lost their reproductive apparatus, in particular what were subsequently identified as ribosomes (Claude A, de Duve C, Palade GE, Nobel Prize in Physiology or Medicine in 1974). 14 Thus, from the onset, there was an assumption of a molecule originating from cellular living entities and of reductive evolution of previous cellular entities. In 1957, A. Lwoff proposed a concept of virus that largely opposed viruses to cellular organisms by defining viruses based on negative criteria. 15 Viral species were later defined in part on the basis of the existence of isolates representative of those species. 16
A viral isolate could be obtained from any culture medium, including animals, plants, and cells, and these isolates were perceived as defined objects with regular mutations that eventually allowed for adaptation by natural selection. An isolate was considered clonal by definition, with a single nucleic acid sequence. However, the variability of these viruses appeared to be massive, through mutations but also through loss of genes, as identified by the study of herpesviruses 17 or poxviruses 18,19 as well as by that of the giant mimivirus in successive culture passages on its host (Acanthamoeba spp.). 20 Mimivirus could indeed lose 16% of its genome in allopatric culture, particularly the genes not initially transcribed. 21 This experiment, imitating Pasteur's experiments on fowl cholera with Pasteurella multocida, 22 showed that the adaptation mechanism could be linked to gene loss and not to point mutations or "slow changes in progress," as postulated by Darwin. 23 The first sequences of viral genomes (the very first was that of RNA bacteriophage MS2 24) were obtained by manual, time-consuming, and tedious techniques that were thereafter replaced by the Sanger sequencing method, 25 which made it possible to obtain entire virus genomes relatively easily.
However, from the outset, the perception that a virus had a defined genome was an essential aspect found in virtually all descriptions of viruses until the recent past. The evolution and sophistication of sequencing methods quickly showed that this was not the case and that the frequency of mutations had been grossly underestimated by creating genome definitions based on the most frequent form of genomes observed after sequencing, which has been called the "consensus" genome 26 and which we call the "democratic" genome. These genomes represent the majority of genomes but by no means their diversity, and thus representing an isolate by a single sequence is a basic scientific error. In practice, viruses remain very difficult to define. Even today, viral species are not precisely definable unless we create an artifact that we know will have difficulty standing the test of time given the amplification of virus analyses and the tremendous increase in sequencing data. 8][29] Moreover, our current ability to understand the importance of recombination between viruses, and between viruses and their hosts, leads to a much more complex view of the nature of viruses. Given this complexity, the discovery and description of quasispecies obtained by deep sequencing methods is leading to a revolution in thinking about the evolution of viruses.

| THE CONCEPT OF VIRAL QUASISPECIES AND THEIR GENERATION, EMERGENCE, AND EVOLUTION
1][32][33][34][35][36] During the 1970s, the nonhomogeneity of RNA viral populations was highlighted by studies of site-specific mutations conducted on Qbeta RNA bacteriophage. 37,38 This led to the conclusion that this phage was not a "defined unique structure" but a "weighted average of a large number of individual sequences." This was linked to the high mutation rate of the phage, estimated at 10⁻⁴ mutations per nucleotide copied, and involved a tolerance of each individual genome to an indeterminate proportion of mutations. 37 The genetic diversity of RNA viruses was also deduced from the variety of mutant phenotypes observed during the course of persistent infections in vitro with animal viruses such as vesicular stomatitis virus 36 and foot-and-mouth disease virus. 39 1][42][43] Nevertheless, the vast majority of sequences produced, including those produced by next-generation sequencing during the 21st century, have been stored and analyzed in the form of a consensus/"democratic" sequence. This sequence is the succession, at each nucleotide position, of the majority nucleotide among those detected by sequencing in the sample. Two or more coexisting nucleotides can be indicated by an extended International Union of Pure and Applied Chemistry (IUPAC) nomenclature code, but in a very large majority of cases only a single nucleotide (the majority one) is reported, which erases from the consensus/"democratic" sequence any information about the other components of the quasispecies. Hence, such a consensus/"democratic" sequence is a simple but inadequate approach to describe and consider the genetic diversity of viruses in a population of hosts, in a single host, or even in a single host sample.
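The construction of a consensus/"democratic" sequence described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the pipeline used in the cited studies: the function name `consensus`, the `min_fraction` parameter, and the toy reads are hypothetical, and the reads are assumed to be already aligned, of equal length, and restricted to A, C, G, and T.

```python
from collections import Counter

# Standard IUPAC ambiguity codes for sets of coexisting nucleotides.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def consensus(reads, min_fraction=None):
    """Return one symbol per alignment column.

    With min_fraction=None, only the majority nucleotide is kept
    (the "democratic" sequence). Otherwise, every nucleotide whose
    frequency reaches min_fraction is kept and the set is written
    as an IUPAC ambiguity code.
    """
    symbols = []
    for column in zip(*reads):
        counts = Counter(column)
        if min_fraction is None:
            # Majority vote; sorting makes ties deterministic.
            symbols.append(max(sorted(counts), key=counts.get))
        else:
            depth = len(column)
            kept = frozenset(
                base for base, n in counts.items() if n / depth >= min_fraction
            )
            # Fall back to "N" if no nucleotide reaches the threshold.
            symbols.append(IUPAC.get(kept, "N"))
    return "".join(symbols)


# Hypothetical sample: 7 reads carry C at position 2, 3 reads carry T.
reads = ["ACGT"] * 7 + ["ATGT"] * 3
majority_only = consensus(reads)
with_minorities = consensus(reads, min_fraction=0.2)
```

In this toy sample, the majority-only sequence reports C at the second position and the 30% of reads carrying T leave no trace, whereas the IUPAC-coded version records the mixture as Y.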
The concept of quasispecies enables a far more comprehensive representation of the diversity of sequences than do consensus/"democratic" sequences. The quasispecies theory was developed during the 1970s by Manfred Eigen, a biophysical chemist, and Peter Schuster, a theoretical chemist, to explain the evolution and self-organization of primitive replicons. 44,45 This theory was built independently from laboratory experiments conducted on the genetic diversity of RNA viruses, but it was noticed that it fitted the results of these experiments quite well. The theory derives from the species concept and from the molecular theory of evolution. A quasispecies was defined as the complex mutant spectrum (or swarm or cloud) produced in replication systems through regular generation of error copies, and it was first named "comet tail." It echoes at the genetic level the paradigm that populations are usually not composed of fully identical individuals but of "types" that are distinct from each other by some criteria. 46 Thus, amplification by copying from a single ("wild-type") clone generates mutants, and the mutations produced delineate a sequence space that expands depending on the mutation rate and the selection of mutants, and that reaches a stable steady-state mutant spectrum in which, for each quasispecies component, the contributions of mutation and selection balance each other.
The quasispecies concept is linked to an "error threshold" rate above which genetic information can no longer be propagated, which corresponds to extinction. There are many alternative terms to quasispecies, including intrahost variation, intrahost diversity, mixture of mutants, nucleotide degeneracy, or heterospecies.
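The error threshold can be made concrete with the classical single-peak approximation of the quasispecies model, in which a master sequence of length L with selective superiority σ over its mutants is maintained only if σ(1 − μ)^L > 1, so that the maximal tolerable error rate is approximately ln(σ)/L. The Python sketch below applies this textbook relation. The genome length of 10 000 nucleotides and σ = 10 are hypothetical values chosen for illustration; the rate of 10⁻⁴ is the estimate quoted above for Qbeta, used here only as an example input.

```python
import math


def error_threshold(genome_length, superiority):
    """Approximate maximal per-nucleotide error rate, ln(sigma) / L."""
    return math.log(superiority) / genome_length


def master_is_maintained(mu, genome_length, superiority):
    """Single-peak condition: sigma * (1 - mu)**L must exceed 1."""
    return superiority * (1.0 - mu) ** genome_length > 1.0


# Hypothetical genome length and superiority (assumptions, not source data).
L = 10_000
SIGMA = 10.0

mu_max = error_threshold(L, SIGMA)                     # about 2.3e-4
below_threshold = master_is_maintained(1e-4, L, SIGMA)
above_threshold = master_is_maintained(1e-3, L, SIGMA)
```

Under these assumptions, a rate of 10⁻⁴ per nucleotide lies below the threshold and the master sequence persists, whereas a tenfold higher rate exceeds it. The relation also shows why the threshold tightens as genomes get longer.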
A viral quasispecies is a dynamic distribution of nonidentical but closely related mutant, recombinant, or reassortant viral genomes, as viral quasispecies are also known to be the consequence of recombinations and reassortments. 47,48 A viral quasispecies undergoes continuous genetic variation, is subject to random drift, competition, and selection, and responds to various selective constraints. [51][52][53][54] Spectra of viral mutants, recombinants, and reassortants, rather than individual genomes, are the target of evolutionary events.
They are the source of viral adaptability, as they comprise continuously changing dynamic reservoirs of genotypic and phenotypic viral variants with given fitness landscapes. 55 Such intrahost genomic variability leads to antigenic variability and phenotypic specificities. 6][57][58] Viral quasispecies may also undergo a drastic reduction in population size due to isolation of one or several individuals from the mutant spectrum through bottlenecks during inter- as well as intra-host transmission. 59 Moreover, viral quasispecies have collective behavior and are subject to internal interactions that comprise interference, complementation, or cooperation. For instance, a low-fitness mutant can suppress a high-fitness mutant, 60 or an attenuated virus can suppress a virulent virus. 61 The viral quasispecies concept encompasses an error threshold that delineates the maximal mutation rate of the dominant sequence at which the quasispecies can be metastable, and above which a drift of the sequence population occurs in the sequence space. 26,48 Considering the above principles, each viral quasispecies is unique in its composition and differs from those that have preexisted and from those that will exist, whether in the same host or in different hosts and whether in an identical context or in different contexts. 62 It continuously evolves even if the consensus sequence remains the same.
It is worth noting that for a given and unique sequence, diversity can be represented in the form of a profile (e.g., profile hidden Markov models [HMMs]). 63 Profiles are constructed by converting multiple sequence alignments into position-specific scoring systems (PSSM).
Profile HMMs are probabilistic models that capture the evolutionary changes observed in a set of homologous sequences. This principle may be applied during genome assembly by integrating the diversity of reads as genomic profiles. Deep sequencing further increases the need to integrate the sequence variability of reads. Ideally, in the future, the complete genomic sequence will be described in the form of a global genomic profile that would take into account the diversity at each position in a genome.
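The idea of describing a genome as a profile rather than as a single sequence can be sketched as follows. This is a deliberately simplified per-position frequency profile, not a full profile HMM (it has no insertion or deletion states and no transition probabilities). The function names, the `pseudocount` parameter, and the toy alignment are hypothetical, and the sequences are assumed to be aligned, gap-free, and restricted to A, C, G, and T.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def frequency_profile(aligned, pseudocount=0.0):
    """Return, for each alignment column, a dict of nucleotide frequencies."""
    profile = []
    for column in zip(*aligned):
        counts = Counter(column)
        total = sum(counts[b] for b in ALPHABET) + pseudocount * len(ALPHABET)
        profile.append(
            {b: (counts[b] + pseudocount) / total for b in ALPHABET}
        )
    return profile


def shannon_entropy(freqs):
    """Diversity of one column in bits (0 when a single nucleotide is present)."""
    return -sum(p * math.log2(p) for p in freqs.values() if p > 0)


# Hypothetical aligned reads covering the same four positions.
aligned = ["ACGT", "ACGT", "ATGT", "ACGA"]
profile = frequency_profile(aligned)
entropies = [shannon_entropy(col) for col in profile]
```

Here the first column is invariant (entropy 0), while the second carries C at 75% and T at 25% (entropy of about 0.81 bits), information that a consensus sequence would discard.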

| THE HIV PARADIGM: FROM SANGER TO NEXT-GENERATION SEQUENCING
Sequencing of HIV-1 and HIV-2 RNA and DNA has been a major tool to study and manage HIV infection. To date, more than 1.2 million sequences, including approximately 15 089 genomes, have been deposited in GenBank (https://www.ncbi.nlm.nih.gov/genbank/), and HIV genotypic resistance testing is an unavoidable tool for managing HIV treatment, 64 with several algorithms available worldwide to interpret the mutation patterns of genes encoding targets of antiretroviral therapies (e.g., https://hivdb.stanford.edu/, https://rega.kuleuven.be/cev/avd/publications/european_guidelines_page, https://www.iasusa.org/resources/hiv-drug-resistance-mutations/, https://hivfrenchresistance.org/). Sequencing of HIV genes developed in the 1990s with the advent of molecular biology techniques applied to diagnosis in clinical virology, particularly measurement of plasma viral load. 65 Use of Sanger sequencing for genotypic evaluation of resistance to antiretrovirals appeared in the mid-1990s. 66,67 Nonetheless, drug resistance genotyping was not immediately recognized as a tool of interest for the management of patients, but sequencing developed alongside phenotypic tests and then replaced them. 68,69 Above all, the concept of quasispecies was not initially considered in the reading and interpretation of sequencing results.
In fact, HIV-1 genetic diversity in infected patients is tremendous. 1][72] Moreover, HIV-1 quasispecies are compartmentalized, for instance, in peripheral blood mononuclear cells, lymph nodes, the genital sphere, or the central nervous system. 73 Interindividual transmission of minority drug-resistant HIV-1 variants was evidenced at the time of primary infection by detection of these minority variants in transmitters or by their decreased prevalence after several weeks in seroconvertors in the absence of antiviral therapy, as explained by their lower fitness compared with wild-type HIV-1. 74,75 By sequencing HIV-1 genes, the emergence of mutations associated with HIV-1 resistance to antiretrovirals can be detected early not only in "democratic" sequences but also in minority components of the quasispecies, the detection of which can predict their expansion and clinical significance. 76,77 This notion that minority HIV-1 variants preexist echoes in virology the existence of preadaptive mutations evidenced during the 1950s with the observation of the clonal occurrence of mutants in bacteria based on replica plating. 78 Sanger sequencing allows for visualization of some minority quasispecies when reading an electropherogram (Figure 1A). 76 Nonetheless, Sanger technology only enables the detection of genomes representing at least 10%-20% of the overall virus population of the sample. 80 With the development of next-generation sequencing techniques, detection of mutations representing less than 10% of quasispecies is now possible, with 5% as the prevalence threshold conventionally used to consider these mutations as significantly involved in clinical resistance to antiretrovirals. 81 A map of next-generation sequencing reads generated from the blood of an HIV-1-infected patient (Figure 1B,C) clearly shows that

quasispecies composition by Sanger (A) and next-generation sequencing (B, C). (A) The electropherogram is a screenshot taken from the computer of an ABI 3500 instrument (Thermo Fisher Scientific). Arrows mark double peaks that indicate mixed populations of sequences. (B, C) Screenshot of mapping reads generated by Illumina technology using a MiSeq instrument (Illumina) from an HIV-1-infected blood sample, as visualized with the Integrative Genomics Viewer tool. 79 In (B), sequence reads are shown for the same genome coordinates as the electropherogram. Mixed viral populations can only be seen at two positions on the electropherogram, whereas additional mutations are evidenced by Illumina sequencing reads. In (B, C), each gray rectangle is a read. Colored horizontal lines in reads indicate mutations relative to the "democratic" sequence for the sample. Only a very small fraction of sequence reads can be shown considering the huge sequencing depth. In (C), the region near 2600 bases corresponds to an area of the genome with a low depth of next-generation sequencing reads. In fact, some reads exist that could have been mapped to this region, but they do not appear on this size-limited screenshot of the mapping visualization software. See Box 1.

BOX 1. Potential technical issues in deciphering the composition of viral quasispecies
• Exhaustiveness and accuracy in the characterization of the composition of viral quasispecies: The full composition of the viral quasispecies is not comprehensively known, particularly for mutants below the threshold of 1%. With current technologies and tools, detection of every single molecule composing the quasispecies is not possible. Indeed, the sensitivity threshold of next-generation sequencing, commonly known as "depth bias," 142 does not allow for accurate exhaustive detection. The estimated sensitivity of Illumina next-generation sequencing to detect minor viral variants in a mixture of standards was reported to be 97.5% for a minor variant present at 1%. 143 In another work, the lowest threshold of detection of minor variants was estimated at 0.5%-1% with Illumina next-generation sequencing. In addition, the rate of sequencing errors was estimated to range between 0.5% and 0.1% (medians) with Illumina technology (on MiSeq and NovaSeq instruments, respectively) 144 and reaches approximately 10% or more with Nanopore technology. 145 Such introduction of artifactual mutations hence limits the accuracy with which genuine mutations present at very low frequency can be detected. A greater coverage of the sequences with reads (so-called sequencing depth) increases both accuracy and sensitivity, but sequencing accuracy decreases as the threshold of detection of variants increases, owing to biases affecting cDNA synthesis, PCR amplification, library preparation, and sequencing itself. In summary, the diversity of the wild bunch can vary considerably according to the tool used to measure it.
• Obtaining long sequences by next-generation sequencing: The availability of the longest possible NGS reads, and beyond that of whole virus genome sequences, is a valuable goal for the characterization of the composition of viral quasispecies. Indeed, a major goal is to produce reads that each correspond to a single molecule, that is, to one component of the quasispecies. In contrast, obtaining only reads shorter than the whole genome length allows mutations to be detected as being part of the viral quasispecies but leaves unresolved whether these mutations are carried by the same viral genome or are present in different combinations on different genomes. For instance, Illumina NGS technology currently only allows the generation of short reads, on the order of a few hundred bases in length.
In contrast, Oxford Nanopore and PacBio technologies allow the generation of much longer reads that can cover whole viral genomes. 7][148][149][150] Nevertheless, until recently, the error rate of these latter NGS technologies was high, around 10%, 145 which hampered accurate characterization of the quasispecies components. Improvements in these error rates will significantly improve quasispecies studies.
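The interplay between sequencing error rate, sequencing depth, and the detection of minority variants discussed in this box can be illustrated with a simple binomial test. The Python sketch below asks how likely it is that sequencing errors alone would produce at least the observed number of variant reads at one position. It is a simplified model under stated assumptions, not a validated variant caller: the function names, the depth of 10 000 reads, the count of 100 variant reads, and the significance level `alpha` are hypothetical, all errors are conservatively assumed to yield the same nucleotide, and error rates must lie strictly between 0 and 1.

```python
import math


def log_binom_pmf(k, n, p):
    """Log of the binomial probability of exactly k successes in n trials."""
    return (
        math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
        + k * math.log(p) + (n - k) * math.log1p(-p)
    )


def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed in log space for stability."""
    if k <= 0:
        return 1.0
    total = sum(math.exp(log_binom_pmf(i, n, p)) for i in range(k, n + 1))
    return min(1.0, total)


def variant_is_supported(count, depth, error_rate, alpha=0.01):
    """True if errors alone are unlikely to explain the variant reads."""
    return prob_at_least(count, depth, error_rate) < alpha


# Hypothetical position: 100 variant reads out of 10 000 (1% frequency).
DEPTH, COUNT = 10_000, 100
with_low_error = variant_is_supported(COUNT, DEPTH, error_rate=0.001)
with_high_error = variant_is_supported(COUNT, DEPTH, error_rate=0.10)
```

With an error rate of 0.1%, as quoted above for NovaSeq medians, a 1% variant at this depth stands out clearly from noise. With an error rate of about 10%, it cannot be distinguished from sequencing artifacts under this model.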
FIGURE 2 Screenshot of mapping of next-generation sequencing reads from a SARS-CoV-2-infected respiratory sample. Reads were generated by Illumina technology using a NovaSeq instrument (Illumina). Mapping was visualized with the Integrative Genomics Viewer tool. 79 Each gray rectangle is a read. Colored horizontal lines in reads indicate mutations relative to the "democratic" sequence for the sample. Only a very small fraction of sequence reads can be shown considering the huge sequencing depth. See Box 1.
reads are not devoid of mutation(s) compared to the "democratic" genome and that reads are not identical to their mates. In fact, the sequences are nonidentical, and the "democratic" genome is thus virtual.

| REDEFINING VIRUS ISOLATES AS A WILD BUNCH
Our perception of viruses was first that of biomolecules that can be reproduced in a host 13; eventually, the notion of quasispecies acknowledged the considerable variation of viruses, particularly RNA viruses, which can be thought to play a considerable role in evolutionary strategies. 48 Until recently, quasispecies seemed to consist of multiple sequences, the most efficient (fit) of which, as represented by the "democratic" genome, can be transmitted and is the only one subject to selective pressures. Another approach is now more reasonable. In fact, it seems that in at least a few cases, it is the whole "bunch" of molecules that is transmitted (Figures 1C and 2). Indeed, the phenomenon of multiple invasive sequences has been well described for measles virus. 82,83 This virus has two tropisms: one for lymphocytes and lymph nodes and one for epithelial cells. The most efficient viral form in lymphocytes differs from that in epithelial cells. Several hypotheses have been proposed on this subject. The first is that viral mutants appear during infection, allowing the virus to replicate in one of its targets and to produce a second form from the quasispecies that can reach the second target. Currently, the common hypothesis is that hosts are infected by a whole "wild bunch," which is distributed according to its tropism in either lymphocytes or epithelial cells. This phenomenon has apparently been confirmed in many situations. In the case of infection of mice with poliovirus, for example, different viral sequences are found in different organs. 84 The same phenomenon has been reported in SARS-CoV-2 infections, including, in some cases, dominant mutations in certain samples that had not yet been found in the "democratic" genomes identified later in epidemic infections. 85 This probably means that the conditions for selection of mutants differ between organs and allow for different "democratic" genomes to emerge.
In this sense, it is plausible that we can no longer consider virus isolates or strains as being linked to a single sequence but rather as a bunch of sequences.It is possible in these circumstances that the differences in mortality in humans and experimental animals are related to the composition of the bunch, just as it is possible that the role of inoculum size in infection is also which will take some time.Indeed, a number of philosophical debates on viruses and even the definition of viruses themselves are still at a prescientific stage, which will make this conceptual approach difficult.

| THE CASE FOR COVID-19, INCLUDING SARS-COV-2
In December 2019, a novel coronavirus, named severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2), emerged in Wuhan, Hubei Province, China 86 and later caused a pandemic. 87 Phylogenetic analyses based on its complete genome revealed that it belongs to the subgenus Sarbecovirus of the genus Betacoronavirus, a genus that also encompasses SARS-CoV and Middle East respiratory syndrome (MERS)-CoV, two other coronaviruses causing severe respiratory syndromes in humans. 88 In addition, the SARS-CoV-2 genome showed high genetic similarity (approximately 96%) with that of the bat coronavirus RaTG13. 86 SARS-CoV-2 is a linear, nonsegmented, single-stranded, positive-sense RNA virus. Its genome comprises 29 903 nucleotides with 12 open reading frames (ORFs). 88 As in the case of other RNA viruses, SARS-CoV-2 RNA replication through the viral RNA-dependent RNA polymerase (RdRp) is associated with mutations due to the low fidelity of this enzyme. One particularity of coronaviruses is that their genome encodes a 3′-5′ exonuclease (NSP14 in SARS-CoV-2) with corrective activity.

SARS-CoV-2 mutations may also have other sources and notably occur through the action of apolipoprotein B mRNA editing enzyme, catalytic polypeptide (APOBEC), or adenosine deaminases acting on RNA (ADAR) cellular RNA-editing enzymes, which are host antiviral defense mechanisms that can restrict viral infection 93,94 but also increase the frequency of mutations. Finally, recombinations between SARS-CoV-2 genomes can generate intrahost diversity, and multiple SARS-CoV-2 recombinants have been reported, mostly during the most recent period of the pandemic since 2022. 4,95,96 Therefore, mutated SARS-CoV-2 genomes are generated at a substantial rate, leading to a soup of nonidentical genomes in which each single genome can differ from another by one or several nucleotides and thus deviates from the "democratic" genome sequence.

| Quasispecies for non-SARS-CoV-2 coronaviruses
Viral quasispecies have already been reported for coronaviruses other than SARS-CoV-2. In nonhuman animals, the existence of intraindividual quasispecies was reported in 1995 for the mouse hepatitis virus (MHV). 97 Adami et al. detected an increase in genetic diversity in persistently infected mice, with apparent abnormal accumulation of mutations in the spike gene. The existence of quasispecies in the spike domain in mice chronically infected with MHV was confirmed in two other studies. 98,99 Interestingly, in one of these studies, mice with the most severe chronic paralysis were those that harbored the most complex quasispecies. 99 In 1999, the presence of quasispecies for the feline coronavirus was reported. 100 A study on the bovine coronavirus (BCoV) also observed the presence of spike mutations, with 104 nucleotide changes associated with several intrahost clones within a single sample. 101 Another study involving nasopharyngeal samples of pigs with documented HKU15 coronavirus infection detected six quasispecies in a single sample. 102 Regarding human coronaviruses other than SARS-CoV-2, Viehweger et al. reported in 2019 the presence of diverse subgenomic mRNAs for human coronavirus 229E (HCoV-229E) through direct RNA and long-read sequencing. 103 Additionally, Gorse et al. detected quasispecies for HCoV-OC43 but with a lower intrasample frequency than for hepatitis C virus. 104 Regarding MERS-CoV, Lu et al. reported in 2017 the presence of different spike domain S1 variants (wild type or with a 530-nucleotide-long deletion) in the same human sample. 105

| Quasispecies for SARS-CoV-2
SARS-CoV-2 was reported in several studies to exhibit genetic intrasample and intrahost diversity in clinical samples and in virus culture. The mutational patterns of the minority quasispecies were heterogeneous across studies. One issue is the extent to which the wild bunch is clinically significant, that is, in what proportion minority SARS-CoV-2 variants can be infectious and transmitted between two individuals. It has been estimated that 10⁹ to 10¹¹ viruses are carried by an infected person during peak infection. 106 Based on a mutation rate of 10⁻³ substitutions per nucleotide per year, or 3 × 10⁻⁶ mutations per nucleotide per replicative cycle, and 3-7 replicative cycles per infection, 10⁵ to 10⁸ mutations occur per host. Thus, people are infected by a wild bunch and not by a single, clonal type of molecule.
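Part of the arithmetic behind this reasoning can be made explicit. The Python sketch below uses only the per-cycle mutation rate quoted above and the genome length of 29 903 nucleotides given earlier to estimate the expected number of new mutations per genome copy and the fraction of genomes that differ from their template. The function names are hypothetical, mutations are assumed to be independent and uniformly distributed, and following a single lineage over 3 or 7 consecutive cycles is a simplification. The sketch does not attempt to reproduce the per-host total quoted above, which additionally depends on the number of genome copies produced in the host.

```python
def expected_mutations_per_copy(mu, genome_length):
    """Mean number of new mutations introduced in one genome copy."""
    return mu * genome_length


def fraction_mutated(mu, genome_length, cycles=1):
    """Probability that a lineage carries at least one new mutation
    after the given number of consecutive replicative cycles."""
    return 1.0 - (1.0 - mu) ** (genome_length * cycles)


MU = 3e-6          # mutations per nucleotide per replicative cycle (from the text)
GENOME = 29_903    # SARS-CoV-2 genome length in nucleotides (from the text)

per_copy = expected_mutations_per_copy(MU, GENOME)      # about 0.09
after_one_cycle = fraction_mutated(MU, GENOME)          # about 0.086
after_three_cycles = fraction_mutated(MU, GENOME, 3)    # about 0.24
after_seven_cycles = fraction_mutated(MU, GENOME, 7)    # about 0.47
```

Under these assumptions, roughly one genome copy in eleven acquires a new mutation at each cycle, and between about a quarter and a half of lineages differ from the founding genome after 3 to 7 cycles. Even before any selection, a population of 10⁹ or more virions therefore cannot be clonal.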
Studies that reported intrahost SARS-CoV-2 genetic diversity have notably provided data for immunocompromised, chronically infected patients, patients treated with monoclonal antibodies targeting viral spike, and different body compartments from the same patients (Supporting Information: Table S1).Choi et al.  112 They reported that intrahost diversity was higher in the persistent than nonpersistent patients, with minority N501Y and P681H mutations in the patients of both groups.Caccuri et al. described the case of a 59year-old male patient who was immunocompromised due to follicular lymphoma and chronically infected with SARS-CoV-2 for 222 days. 113They reported progressive emergence of the majority quasispecies of viruses with critical spike mutations Q493K and N501T, but that the prevalence of quasispecies with deletion of spike amino acid 144 fluctuated over time.In another work, Sonnleitner et al. studied the mutational dynamics of SARS-CoV-2 in serial passages in vitro. 114They used 10 ancestral SARS-CoV-2 strains for consecutive serial incubation in 10 passages in Vero B4 cells, observing that the spike gene was a hot spot for mutations, including substitutions at codons 484 and 501, which are backbone mutations of several variants of concern (https://covariants.org/). 115e therapeutic use of monoclonal antibodies was reported to have a significant impact on the composition of SARS-CoV-2 quasispecies by promoting the emergence of quasispecies that are less susceptible to this therapy.Peiffer-Smadja et al. and Jensen et al.
detected the emergence of spike E484K-harboring SARS-CoV-2 in 10 of 12 patients treated with bamlanivimab using next-generation sequencing. 116,117Lohr et al. reported a case of emergence of spike E484K-harboring SARS-CoV-2 of the Alpha variant after bamlanivimab therapy in a patient with acute myeloid leukemia. 118 5patients). 120These patients included 11 solid organ transplant recipients and 6 patients who were immunocompromised due to other causes.SARS-CoV-2 with the spike protein Q493R mutation was detected in two cases on Day 7 post-administration of bamlanivimab/etesevimab and in one case on Day 14.In a fourth case, the Q493K spike mutation was detected on Day 7, and in a fifth case, the E484K spike mutation was detected on Day 21.These findings were associated in three cases with the absence of viral load decrease or with viral rebound.In contrast, in another work, in 36 patients who were administered casirivimab/imdevimab, no spike mutations were detected in the SARS-CoV-2 quasispecies associated with reduced monoclonal antibody activity. 121 addition, compartmentalization of SARS-CoV-2 quasispecies has been reported in a few studies.Jary et al. performed nextgeneration sequencing on PCR amplification products for genes encoding the spike, envelope, membrane, and nucleocapsid proteins in a SARS-CoV-2-infected patient, and minority viral populations represented up to 1% of the whole population during the course of infection. 122Quasispecies differed in sequential samples and between lower and upper respiratory tract samples collected on the same day, suggesting independent sites of SARS-CoV-2 replication.By studying samples of extrapulmonary organs, including the heart, kidney, liver, and spleen, Van Cleemput et al. described organspecific intrasample diversity in 13 postmortem cases involving immunocompromised patients. 
85 Strikingly, they identified the spike mutation of concern N501Y, which mostly emerged in the Alpha, Beta, and Gamma variants worldwide during late 2020 and early 2021, though the 13 patients had been sampled in March 2020. In addition, the spike substitution T1027I present in the Gamma variant and Y453F, initially described in mink, 123 were detected as minority quasispecies in several organs. Hartard et al. performed a postmortem study of multiple specimens collected from a patient, which revealed a heterogeneous distribution of viral quasispecies according to the tissue examined, including V341A, E654A, and H655R substitutions in spike. 124 Additionally, Gaiarsa et al. described quasispecies based on the S gene in 109 clinical samples collected from 77 patients, including from the upper respiratory tract and lower respiratory tract. 125 They identified a greater incidence among minority quasispecies of nonsynonymous mutations and indels in the lower respiratory tract, which they asserted could be the result of an ability of the virus to invade cells without interacting with ACE2. In addition, they observed that intrasample diversity was more prevalent in the gene region encoding the spike cleavage site, particularly in samples from the upper respiratory tract. Also, an association of five mutations with the upper respiratory tract (U2055G, U2058G, U2060G, U2100G, and A3483S) and four mutations with the lower respiratory tract (C212A, A3005U, C3485U/G, and A3596G/U) was found. Finally, Normandin et al.
performed next-generation sequencing on 120 specimens using formalin-fixed, paraffin-embedded tissues from six patients who experienced fatal COVID-19. 126 They reported that a majority of the 180 SARS-CoV-2 mutational patterns identified in these patients were unique to given tissue samples, indicating compartmentalized infections in extrapulmonary sites. The spike protein amino acid substitution Q675H, present as a minority mutation in heart tissue, was not circulating at the time the patient died but was later detected in the consensus genomes from several SARS-CoV-2 lineages. In addition, a prevalence of minority patterns greater than 10% was more frequently detected in patients who experienced a long duration of COVID-19 before death.
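The minority populations reported in the studies above are per-site frequencies among sequence reads, set against the majority ("democratic") base at the same site. As a minimal sketch only, and not the pipeline of any cited study, the Python example below counts bases at one position in a toy set of aligned reads and reports the majority base together with any minority base above a frequency threshold. The function names, the 1% threshold, the two-read minimum, and the indel-free read representation are all assumptions made for illustration.

```python
from collections import Counter


def site_counts(reads, position):
    """Count the bases that aligned reads show at one 0-based reference position.

    Each read is a (start, sequence) pair; alignment without indels is a
    simplifying assumption of this sketch.
    """
    counts = Counter()
    for start, sequence in reads:
        offset = position - start
        if 0 <= offset < len(sequence):
            counts[sequence[offset]] += 1
    return counts


def call_site(reads, position, min_fraction=0.01, min_reads=2):
    """Return the majority base and the minority bases passing both thresholds."""
    counts = site_counts(reads, position)
    depth = sum(counts.values())
    if depth == 0:
        return None, {}  # position not covered by any read
    consensus = counts.most_common(1)[0][0]
    minority = {
        base: n / depth
        for base, n in counts.items()
        if base != consensus and n >= min_reads and n / depth >= min_fraction
    }
    return consensus, minority


# Toy data (hypothetical): ten 4-nucleotide reads, two of which carry G>T at position 2.
reads = [(0, "ACGT")] * 8 + [(0, "ACTT")] * 2
print(call_site(reads, 2))
```

Requiring the same change in several reads follows the idea that a single divergent read is not enough to define a quasispecies. Real analyses would also need to account for sequencing error rates and base quality, which this sketch ignores.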

| Interactions between SARS-CoV-2 and host cells and the quasispecies paradigm
The plasma membrane of host cells acts as a machine for concentrating viruses. Two mechanisms are at work: (i) a reduction in dimensionality due to the transition from a three-dimensional space containing the viruses to the two-dimensional space of the membrane, which favors the concentration of the viruses on the surface of target cells 127; and (ii) the presence on the plasma membrane of lipid rafts (microdomains enriched in gangliosides and cholesterol), which are the preferred sites for the landing of most viruses on host cells. 128 In fact, regardless of the target tissue or cell, rafts display a negative electrostatic field that attracts viral particles, the surfaces of which are studded with electropositive zones. 129 Viruses from the "wild bunch" of an inoculum, which have already undergone selection in the previous host, necessarily fulfill these conditions and will therefore be globally attracted by target cells. It is only after this initial interaction with lipid rafts that more specific interactions involving protein receptors expressed by the target cells can occur. It is nevertheless striking to note that these latter interactions are based on recognition by the virus not of a single receptor but of several. For instance, in the case of SARS-CoV-2, several membrane receptors have been identified: ACE-2 (in the respiratory tract, lung, brain, gastrointestinal tract, blood vessels, heart, kidneys, testis); β1 and β3 integrins (on pulmonary epithelial cells); GRP78 (thyroid gland, olfactory cells, lung macrophages); CD26 (CD4+ T cells, myeloid cells, myocardial cells, blood capillaries); AXL (pulmonary epithelium); CD147 (neural cells, lymph nodes, platelets, red blood cells, myeloid cells, skin); NRP-1 (olfactory epithelium, bone marrow-derived macrophages); and CD209L/CD209 lectins (human endothelium, renal tissue, megakaryocytes, platelets).
129 This extreme diversity of protein receptors and coreceptors provides a broad range of possibilities for the "wild bunch," ensuring rapid dissemination of viruses in the host and increasing virus genetic diversity as a consequence of adaptation to the receptor. 130 From a molecular point of view, it is also of interest to note that the region of the SARS-CoV-2 spike protein that faces the host cell membrane is organized into two separate domains: the N-terminal domain (NTD), which mediates attachment of the virus to negatively charged heparan sulfates and lipid raft gangliosides; and the receptor binding domain, which is initially masked at the center of the trimeric viral spike and becomes accessible to ACE-2 and alternative receptors after conformational reorganization of the spike trimer. 131 These geometrical features clearly optimize dissemination of the virus throughout the host body by allowing attachment of the virus to a wide range of cells and then increasing the likelihood of finding a compatible receptor on different cell types. Overall, the molecular biology of host-virus interactions is therefore essentially compatible with the "wild bunch" concept, given the attraction of viruses in an inoculum to lipid rafts according to the double facilitation mechanism (reduction of dimensionality/electrostatic attraction) and the biochemical diversity of viral protein receptors. In the case of SARS-CoV-2, the wide diversity of symptoms in various organs occurring during the acute phase of the infection is also consistent with the "wild bunch" concept.

| QUASISPECIES SELECTION
In RNA viruses in particular, the frequency and importance of quasispecies lead, as described by Domingo and Perales, to adaptive pluripotency, 48 although these authors did not discuss the probably quite important role of recombination in the evolution of RNA viruses, as evidenced by enteroviruses and SARS-CoV-2. 3,4 The large number of quasispecies is probably at the origin of the rebounds in virulence or transmissibility that have been postulated not to occur according to the Muller's ratchet principle. 132 In this theory, the lack of recombination can lead to a gradual decrease in fitness and extinction. It is likely that this phenomenon is at the origin of the usual epidemic curves of acutely infecting RNA viruses, which exhibit bell-shaped curves, appearing after a favorable mutation and gradually decreasing due to accumulation of neutral or unfavorable mutations. 133 We have estimated that, for SARS-CoV-2, the number of acceptable unfavorable mutations is on the order of 6-7, beyond which the "democratic" genome clone goes extinct if no new favorable mutations appear. 133 These favorable mutations exist in quasispecies long before they are found in epidemic "democratic" genomes, indicating the existence of a large phenotypic reservoir in RNA viruses. Conserving this large diversity is beneficial because it allows selection according to the circumstances. In the case of SARS-CoV-2, the rapid use of monoclonal antibodies has favored the emergence of minority components of quasispecies that have become "democratic" genomes under the effect of their selection pressure. This phenomenon has been observed in new viruses with various antiviral treatments. 134 Vaccines also have the ability to select mutations that change the majority target of the immune response. 43 All these elements highlight that for antiviral therapeutic strategies, the choice of a narrow target entails the risk of emergence of an unknown variant existing only as a minority population in quasispecies.
A less commonly discussed fact is the opposite effect, whereby lower virulence allows for longer persistence. This has been referred to as the "flattest advantage." 48,135 It appears that less aggressive mutants are less likely to decline rapidly in activity due to counterselection, which may be related to the host response or to the stronger effect of favoring sequences in the more active population.
Finally, as more than 10¹² viruses can be quantified in some viral infections, the quasispecies present at any one time of infection is extremely wide, which helps explain why it is so difficult to predict the evolution of epidemics of novel viruses, especially RNA viruses.
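A back-of-the-envelope calculation illustrates why such a population size implies a wide quasispecies. The sketch below is an order-of-magnitude illustration, not a result from the studies cited here: the genome length is that of the SARS-CoV-2 reference genome (29,903 nucleotides), the population size is the 10^12 figure mentioned above, and the per-site mutation rate of 10^-6 per replication is an assumed, illustrative value. Selection, replication history, and unequal substitution rates are deliberately ignored.

```python
GENOME_LENGTH = 29_903   # nucleotides in the SARS-CoV-2 reference genome (Wuhan-Hu-1)
VIRAL_GENOMES = 1e12     # order of magnitude of viruses quoted in the text
MUTATION_RATE = 1e-6     # ASSUMPTION: substitutions per site per replication (illustrative only)


def single_substitution_count(genome_length):
    """Number of distinct genomes that differ from a reference by one substitution."""
    return 3 * genome_length  # each site can change to three other bases


def expected_copies_per_mutant(n_genomes, mutation_rate):
    """Expected copies of one given single-substitution mutant.

    Simplification: every genome is treated as one copy of the reference made
    with the given per-site error rate, and the three alternative bases are
    taken to be equally likely.
    """
    return n_genomes * mutation_rate / 3


print(single_substitution_count(GENOME_LENGTH))
print(expected_copies_per_mutant(VIRAL_GENOMES, MUTATION_RATE))
```

Under these assumptions, there are fewer than 100,000 possible single-substitution mutants, and each one is expected to be present in hundreds of thousands of copies, so every viable point mutant would plausibly exist somewhere in the population before any selection pressure is applied.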

| FINDING BLACKLISTED MUTATIONS
Overall, the analysis of the mutations produced in circulating SARS-CoV-2 genomes shows a very large variety of mutations. In a recent work, we highlighted 22 225 nucleotide mutations observed in a library of 61 397 genomes produced in our institute and available in databases. 133 These observed mutations cannot, by definition, be lethal because lethal mutations are not followed by reproduction. We have separated these types of mutations into "hyperfertile," weakly "fertile," neutral or weakly deleterious, or absent. Other authors have classified mutations as "whitelist" mutations or "blacklist" mutations, which are never found and assumed to be lethal. 136 The analysis of lethal mutations is interesting; at the moment, such analysis can probably only be conducted in the real world through the study of quasispecies. Quasispecies analysis will reveal whether the distribution of lethal mutations is random (in the Court Jester model 133,137) or whether certain lethal mutations are more frequently represented than would be expected by chance, as has been observed for beneficial mutations. Some beneficial mutations are strongly overrepresented, especially in spike; conversely, some regions have very few beneficial mutations, especially regions containing genes essential for replication, such as that encoding the RNA polymerase.
133 It will be interesting to evaluate the frequency of point mutations never expressed in transmitted genomes to identify particular areas where such mutations originate as a probable consequence of the host. It is clear, for example, that changes related to apolipoprotein-B (ApoB) mRNA editing enzyme catalytic polypeptide-like protein (APOBEC) activity 138 should be more common in this type of mutation, as APOBEC function is effectively to neutralize viral activity by replacing one base with another through deamination. [140][141] It will be interesting to assess the nature of the changes observed and their frequency among unexpressed mutations to determine if other host-related mechanisms of virus neutralization exist. This might reflect an incomplete resistance mechanism that reduces the severity of the disease. Thus, specific study of the frequency of lethal unexpressed mutations and their percentage may be an indicator of the efficiency of an immune defense that is little or poorly explored in the context of acute viral infections.
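The idea of mutations that are never found in transmitted genomes can be made concrete by enumerating every possible single-nucleotide substitution of a reference sequence and subtracting those observed in a genome library. The sketch below is a hypothetical illustration on toy data, not the method of the cited works: the function names, the toy reference, the toy observed set, and the region coordinates are all assumptions. A substitution absent from a library is only a candidate "blacklist" mutation, since absence may also reflect limited sampling.

```python
BASES = "ACGT"


def all_single_substitutions(reference):
    """Every (1-based position, reference base, alternative base) for a sequence.

    Assumes the reference contains only A, C, G, and T.
    """
    return {
        (pos, ref, alt)
        for pos, ref in enumerate(reference, start=1)
        for alt in BASES
        if alt != ref
    }


def never_observed(reference, observed):
    """Substitutions that are possible but absent from the observed set."""
    return all_single_substitutions(reference) - set(observed)


def unobserved_fraction_by_region(reference, observed, regions):
    """Fraction of possible substitutions never observed, per named region.

    `regions` maps a name to inclusive 1-based (start, end) coordinates.
    """
    missing = never_observed(reference, observed)
    fractions = {}
    for name, (start, end) in regions.items():
        possible = 3 * (end - start + 1)
        absent = sum(1 for pos, _, _ in missing if start <= pos <= end)
        fractions[name] = absent / possible
    return fractions


# Toy example (hypothetical data): a 6-nucleotide reference split into two regions.
reference = "ACGTAC"
observed = {(1, "A", "G"), (2, "C", "T"), (3, "G", "A")}
regions = {"region_a": (1, 3), "region_b": (4, 6)}
print(unobserved_fraction_by_region(reference, observed, regions))
```

Comparing such per-region fractions with the minority mutations detected in quasispecies would be one way to ask whether absent mutations are spread at random or concentrated in particular genes, which is the question raised in this section.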

| CONCLUSION
Current knowledge is evolving extremely rapidly due to the explosion of the number of available viral genomes. Until recently, apart from the major virological fields such as HIV or viral hepatitis, the number of available viral sequences clearly lagged behind. For example, if we compare the number of sequences available for Escherichia coli, Klebsiella pneumoniae, or Salmonella typhi with the number of sequences available for influenza or chickenpox viruses, we can only be astonished by the lack of data regarding viruses.
Currently, available data, particularly in the field of measles, 82 show that we must definitely move away from the concept of the virus as a single object, as a clone represented by a single sequence. The habit of representing sequences by a single molecule is hardly acceptable, given that with current tools, we know that in most cases there are quasispecies in both natural samples and viral cultures. Interestingly, work on measles viruses has shown not only that these quasispecies develop during infection but also that infection itself can occur due to quasispecies with different organ affinities. Thus, partly as a result of the SARS-CoV-2 pandemic, for which more than 15 million "democratic" genome sequences are now available and for which studies and attempts to understand the different epidemic waves caused by distinct variants as well as their host and even organ affinity have been paramount, we should turn a page in the understanding of viral infections and viruses. Viruses are not inert biomolecules with a rare mutation but organisms with many variations, with different mutants or variants causing infections.
In fact, we realize that almost all published virus genome sequences are approximate and do not represent a biological reality.
It is not possible to define a virus by a single sequence as the "democratic" sequence, and the actual experience of deep sequencing reveals huge diversity when several thousand sequences are generated from a given genomic region. This suggests that the genomes currently available are merely a representation of reality underlining the most common nucleotide at each site of the genomes but not representing an authentic reality. Figures 1C and 2 indicate that in the patient's blood or respiratory sample, many sequence reads generated by next-generation sequencing, despite being very short (approximately 50 nucleotides long), differ from the "democratic" sequence because they have mutations relative to the virtual sequence. Clearly, this means that even the notion of quasispecies is not precise enough. The existence of a mutation relative to a virtual sequence is compelling. However, only accumulation of mutations at the same genomic sites in several sequence reads can lead to the definition of quasispecies, which does not prevent the appearance of mutations at sites other than those that define the quasispecies. In practice, the amazing amounts of sequence data accumulated for HIV and more recently for SARS-CoV-2 are leading us to radically rethink viruses as a wild bunch of virus particles shuttling related but different genomic sequences; they are constantly innovating and producing epidemics and pathological episodes of a complexity that is difficult to analyze. This also calls into question in a very significant way the possibilities of creating a single genome representing a specific invariant pathotype or creating RNA vaccines with a unique sequence, as was shown in various studies. 151 It is therefore becoming essential for databanks (as has started to occur) to not only make available the majority, or "democratic," genomes but also all quasispecies as supplementary data so that they can be analyzed.
related to the extent to which the sequences are dispersed throughout the bunch. This allows for adaptation to different organs, likely resulting in diseases of different natures in terms of virus localization to different organs. This evolution of thought about viruses was made possible by the evolution of tools, from the Chamberland filter to next-generation sequencing, by way of electron microscopy, which successively allowed for observation of the toxic effect and observation of the virus, obtaining the first genome by detecting only the majority sequence, discovery of quasispecies, and highlighting multiple species as multiple genetic sequences during an infection by "the same virus." To be established, this latter, new view will require a paradigm shift in the sense of Thomas Kuhn (1962),
described the case of a 45-year-old male patient immunocompromised due to severe antiphospholipid syndrome, with SARS-CoV-2 shedding for 151 days, during which the patient was notably administered remdesivir and casirivimab/imdevimab monoclonal antibodies. 107 During this period of prolonged infection, a total of 45 nucleotide substitutions, including 24 nonsynonymous mutations, and 34 deletions occurred. In the spike, 12 amino acid substitutions occurred, including the N501Y and/or E484K substitutions and the deletion Y144−. Kemp et al.
reported SARS-CoV-2 genetic evolution and reduced sensitivity to neutralizing antibodies in an immunosuppressed male in his 70s who was diagnosed in 2012 with marginal B-cell lymphoma and who was chronically infected with SARS-CoV-2 and notably received convalescent plasma for this infection. 108 Whole-genome sequences obtained by next-generation sequencing from samples collected at 23 time points over 101 days showed substantial shifts in viral quasispecies, notably with emergence of a predominant virus harboring amino acid substitution D796H in the spike protein S2 subunit and a deletion of amino acids 69 and 70 of the S1 N-terminal domain. In vitro tests indicated that the spike mutation D796H appeared to be the main contributor to the reduced susceptibility to neutralizing antibodies and was also associated with reduced infectivity, which might be compensated for by spike deletion ΔH69/ΔV70. Interestingly, this predominant virus exhibiting an immune escape genotype decreased in prevalence as passively transferred antibodies waned. Laubscher et al. analyzed 12 clinical specimens collected from the upper or lower respiratory tracts of 12 oncological patients and reported intrapatient evolution of minority quasispecies mainly involving the ORF1ab gene. 109 Ip et al. used Illumina next-generation sequencing and nanopore strategies with two respiratory specimens, sputum and saliva collected 2 days apart, from a 75-year-old patient with severe disease. 110 They identified the spike gene nucleotide mutation G22017T, the prevalence of which increased from ≤5% in the first sample to ≥60% in the second sample. This mutation corresponds to amino acid substitution W152L within the N-terminal domain of the spike protein, which interacts with neutralizing antibodies. Siquiera et al.
studied intrahost genetic diversity in nasopharyngeal and oropharyngeal samples from 57 SARS-CoV-2-positive cancer patients and 14 SARS-CoV-2-positive healthcare workers from the Brazilian National Cancer Institute who were sampled between April and May 2020. 111 In addition to the majority of mutations, 85 other mutations were detected at frequencies varying between 1.4% and 19.7%. The cancer patients exhibited significantly higher intrahost viral genetic diversity than the healthcare workers. Dudouet et al. performed next-generation sequencing with Nanopore technology of SARS-CoV-2 genomes from 78 patients with nasopharyngeal virus persistence beyond 17 days and from 47 patients without such virus persistence.
Additionally, Guigon et al. detected the spike mutation Q493R in a 63-year-old patient with cutaneous T-cell lymphoma at Day 40 of bamlanivimab/etesevimab administration that was deemed compatible with reduced efficacy or lack of efficacy of the monoclonal antibodies. 119 A study by Vellas et al. involved the use of the PacBio SMRT system to examine the intrahost diversity of the spike gene in 32 patients who received bamlanivimab alone (4 patients), bamlanivimab/etesevimab (23 patients), or casirivimab/imdevimab (
The technical tools needed to analyze such databases, including quasispecies, should be developed in the coming years to decipher these data in this completely new era of virology.
ORCID
Philippe Colson http://orcid.org/0000-0001-6285-0308
Didier Raoult http://orcid.org/0000-0002-0633-5974