The return of the “Mistigri” (virus adaptative gain by gene loss) through the SARS‐CoV‐2 XBB.1.5 chimera that predominated in 2023

Severe acute respiratory syndrome coronavirus 2 XBB.1.5 is the first recombinant lineage to predominate at the country and global scales. Notably, like the Marseille‐4B subvariant (or B.1.160) and the pandemic variant B.1.1.7 (or Alpha) previously, it has its ORF8 gene inactivated by a stop codon. We aimed here to study the distribution of stop codons in ORF8 of XBB.1.5 and non‐XBB.1.5 genomes. We identified that a stop codon was present at 89 (74%) ORF8 codons in ≥1 of 15 222 404 genomes available in GISAID. The mean proportion of genomes with a stop codon per codon was 0.11% (range, 0%–7.8%). In addition, a stop codon was detected at 15 (12%) codons in at least 1000 genomes. These 15 codons are notably located in seven stem‐loop hairpin regions and, in the case of the XBB.1.5 lineage (codon 8), in the signal peptide region. Thus, it is very likely that stop codons in the ORF8 gene contributed, independently and on at least three occasions during the pandemic, to the evolutionary success of a lineage that became transiently predominant. Such an association of gene loss with evolutionary success, which fits the recently described Mistigri rule, is an important biological phenomenon that is largely unknown in virology although widely described in cellular organisms.


| INTRODUCTION
Coronaviruses are, like many other RNA viruses, known to have an evolution partly linked to genetic recombinations, intra- or intergenomes. 1 The occurrence of recombinations in severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) was suspected from 2020 on the basis of bioinformatics predictions, 2 and demonstrated from 2021 with the culture isolation of the first recombinant viruses, among which those in our center corresponding to mosaics of genomes from Pangolin lineages 3 B.1.160 (Marseille-4) and B.1.1.7 (or Alpha), then from B.1.617.2 (or Delta) and Omicron BA.1 lineages and from different Omicron subvariants. 4 The rate of detection of SARS-CoV-2 recombinants has accelerated during the era of Omicron variants, likely due to several factors, including the cocirculation of several subvariants with high incidence rates over periods of significant duration; infection of immunocompromised and chronically infected patients; and/or the growing ability to bioinformatically detect combinations of haplotypes suggestive of recombinations, due to the increase in the number of variant signature mutations and their increasingly abundant distribution throughout the genome. 1,4 However, until very recently, despite the proliferation of SARS-CoV-2 hybrid genomes, no recombinant had become the predominant virus in any given period. This was achieved with the emergence of the XBB.1.5 lineage (Figure 1A-C). 8
As with the other previously predominant SARS-CoV-2 variants, the majority of studies and attention have thus focused on the spike protein of XBB.1.5. However, attention should also be directed towards other proteins, in particular the one encoded by the ORF8 gene. Indeed, very interestingly, the XBB.1.5 recombinant, as formerly described for the Marseille-4B subvariant and the pandemic variant B.1.1.7 (Figure 1A,B), 9-11 has its ORF8 gene inactivated by a stop codon. We reported in a recent study the Mistigri rule (see Box 1) to describe the emergence of the Marseille-4B lineage, a subvariant of the Pangolin B.1.160 lineage (originating from a French mink farm) that we detected during the summer of 2020 and named Marseille-4 variant. 10 The Mistigri rule corresponds to the phenomenon of the loss of a viral gene, and therefore of the protein it encodes, associated with an evolutionary gain. This echoes, in the viral world, the notion of non-virulence gene described several years ago in bacteria and linked to the "Use it or lose it" rule. 12,13 We investigated here the distribution of stop codons in the ORF8 gene of SARS-CoV-2 genomes classified in the XBB.1.5 lineage and in other lineages.

| MATERIALS AND METHODS
The presence of stop codons at each of the SARS-CoV-2 ORF8 gene codons was iteratively searched in SARS-CoV-2 genomes available in the GISAID database (https://gisaid.org/) 7 through the Cov-Spectrum web application (https://cov-spectrum.org/). 9 In addition, Cov-Spectrum was used to search for SARS-CoV-2 genomes available in the GISAID database that harbored a stop codon at any of the SARS-CoV-2 ORF8 gene codons, using the advanced search option. Phylogenetic analysis and visualization were performed through the Nextstrain tool (https://nextstrain.org/) 5 using default parameters. The set of SARS-CoV-2 genomes analyzed was composed of those selected by the Nextstrain web application, plus 200 SARS-CoV-2 genomes obtained from respiratory samples in our institute and classified into the B.1.160 lineage (Marseille-4 variant). 10 These latter genomes were incorporated into the set of analyzed genomes as the B.1.160 lineage was poorly represented in the Nextstrain data set, and they were selected based on an even distribution during the circulation period of this lineage between July 2020 and April 2021. Time distribution of SARS-CoV-2 variants and recombinants in various countries in North America and Europe over the whole pandemic period was obtained from the CoVariants web application (https://covariants.org/). 15 Structural and functional elements of the ORF8 gene and protein were collected from the UCSC genome browser web application (https://genome.ucsc.edu/cgi-bin/hgGateway) 16 for genome no. NC_045512.2 as well as from previous reports by Flower et al. 17 and Arduini et al. 18 The structure of the ORF8 protein was obtained by submitting file #A0A6V7APK3•A0A6V7APK3_SARS2, collected from the Uniprot database, 19 to the Robetta server (robetta.bakerlab.org). 20
Culture isolation of a SARS-CoV-2 classified as XBB.1.5, based on the sequencing and analysis in our laboratory 21 of its genome (GenBank 22 Accession no. OR076443), was performed by inoculating 200 µL of respiratory sample on Vero E6 cells in our NSB3 level laboratory and observing the cytopathic effect by inverted microscopy, as previously described. 23 SARS-CoV-2 virions were visualized in the culture supernatant by negative staining on a Tecnai G20 electron microscope (FEI).

| RESULTS
Based on the analysis conducted as of 23/05/2023 through the Cov-Spectrum web application (https://cov-spectrum.org/), 6 a stop codon has been detected at 89 (74%) of the 122 codons of the ORF8 gene in at least one of the 15 222 404 genomes available (Figure 2; Supporting Information: Table S1). The mean number of genomes harboring a stop codon per codon position was 17 117 (range, 0-1 181 627), and the mean proportion of genomes harboring a stop codon per codon was 0.11% (range, 0%-7.8%). In addition, a stop codon has been detected at 15 (12%) codons in at least 1000 genomes. These 15 latter codons are located on an RNA modification site (codon 18); on seven stem-loop hairpin regions (codons 8, 18, 19, 23, 29, 45, 50, 54, 59, 64, 68, 91, 106, and 110); on three beta sheet regions at the protein level (codons 23, 45, and 59); and in the signal peptide region for the case of the XBB.1.5 lineage (codon 8). At these 15 codons, the original codon encoded a glutamic acid or a glutamine in five cases each; a glycine in two cases; and a lysine, a serine or a tryptophan in one case each. For all these 15 codons, a single nucleotide substitution enabled generating the stop codon, whereas it was the case for 31 of 106 codons that exhibited a stop codon in <1000 genomes and for 2 of 71 codons that exhibited a stop codon in less than 10 genomes (Figure 2). For eight of the 15 codons that were turned to stop and were detected in at least 1000 genomes, codon change was due to a C>U substitution (Supporting Information: Table S1).
Seven lineages, namely BA.2; B.1.1.7; XBB.1.5; BG.2 (an Omicron BA.2-derived lineage); B.1.258.17 (that predominated in Slovenia between August 2020 and May 2021); and AY.117 and AY.48 (two Delta sublineages), were the majority lineages with at least 1000 genomes harboring a stop codon at a given codon. Besides, XBB.1.5 was the majority lineage to harbor a stop codon at four different codons, essentially at codons 8 (n = 353 930) and 90 (n = 388). B.1.1.7 was the majority lineage to harbor a stop codon at 13 different codons, essentially at codons 27 (n = 1 181 627 genomes), 68 (n = 402 673), and 110 (n = 3621). Also, BA.2 was the majority lineage to harbor a stop codon at 10 different codons, essentially at codons 45 (n = 4063) and 29 (n = 1487).
The SARS-CoV-2 ORF8 protein displays a signal sequence cleavage site at the 15th peptide bond, allowing its extracellular secretion as a functional homodimer (Figure 3A). 24 A stop codon at codon 8, such as that observed in XBB.1.5 genomes in the signal peptide sequence, means that the biosynthesis of the protein is terminated very early, even before the chain reaches the cleavage site. A SARS-CoV-2 classified as XBB.1.5 (genome GenBank Accession no. OR076443) was isolated here in culture in our NSB3 level laboratory. It caused a cytopathic effect during the first passage 10 days postinoculation on Vero E6 cells and could be visualized in the culture supernatant by electron microscopy (Figure 1C).

| DISCUSSION
In XBB.1.5, the stop codon is present at the very beginning of the ORF8 gene, at codon 8, in the signal peptide region. This means that the biosynthesis of the protein is terminated very early, even before the chain reaches the cleavage site. Although translational readthrough (suppression of termination at a stop codon) may be used by some viruses to expand their gene expression, as in the case of HIV-1 for which the stop codon is found at the border between gag and pol genes, 25 this should obviously not be the case when a stop codon appears at the very beginning of a protein coding sequence. Hence, it can be reasonably anticipated that the stop codon at position 8 of the ORF8 gene sequence results in the total suppression of the biosynthesis of the ORF8 protein. 9-11
The genomes of many other recombinant XBB.1 derivatives, such as the XBB.1.16 and XBB.1.9 lineages that are particularly surveyed, 26 also have a stop codon in the ORF8 gene, as recently reported 11 (Supporting Information: Table S2). Stop codons in ORF8 have in fact been observed extremely frequently and at multiple codons of this gene for human SARS-CoV-2 (Figure 2; Supporting Information: Table S1), but also for SARS-CoV-2 from other mammals such as mink and pangolin. 9 The inactivation by stop codons of other SARS-CoV-2 accessory genes was also reported. 27 In addition, a remnant of the ORF8 gene was reported in human endemic coronavirus HCoV-229E, and it was hypothesized that this gene would have been degraded when dispensable on passage to humans. 28 Thus, it is very likely that during the SARS-CoV-2 pandemic the functional loss of ORF8 contributed, independently and on at least three occasions, to the evolutionary success of a lineage, having led it to temporarily become the one that predominated (Figure 1A,B; Supporting Information: Figure S1).
The SARS-CoV-2 ORF8 protein is 121 amino acids long and is composed of an N-terminal signal sequence followed by a predicted Ig-like fold (Figure 3A). 17,18 The RNA genome region that encodes it was predicted to form several stem-loop hairpins. Observation of stop codons in SARS-CoV-2 ORF8, which turns out to be a process of convergent evolution, is critical. Indeed, it highlights that genomic reduction and loss, or at least inactivation, of genes in a context of host adaptation is a general biological phenomenon, not only in cellular organisms but also in viruses; and not only in giant viruses such as Mimivirus and those with a large arsenal of genes such as poxviruses 30,31 but also in canonical viruses such as SARS-CoV-2. Mimivirus is known to survive and multiply in the environment in amoebae of the genus Acanthamoeba, potentially in coinfection with other microorganisms resistant to amoebae, 30 which allows the transfer of sequences between their different genomes. Its prolonged culture in the laboratory in an allopatric environment, in the absence of any other infectious agent, was associated with a reduction in the size of its genome by 16%. 32 The genes lost or degraded were found to be significantly more frequently those less expressed during the initial isolation of the virus in the laboratory, 30 suggesting that these genes were those not used in these experimental conditions. Overall, the successive emergence of SARS-CoV-2 variants whose ORF8 gene harbors one or several stop codons is a process of convergent evolution, which highlights that viral gene losses and their association with a gain in viral fitness deserve henceforth to be given more consideration as an evolutionary process in viruses.

BOX 1. The Mistigri rule
The Mistigri is a card game in which the winner is the one who gets rid of the "jack of clubs" card. By analogy, it has been used to describe the renewed incidence within the Pangolin B.1.160 lineage associated with the inactivation of the ORF8 gene by a stop codon in the Marseille-4B subvariant viruses. 10 The explanation may be 10 that, according to the rule of "use it or lose it", getting rid of a gene that is not used has been beneficial for viruses; 12,13 or that the ORF8 gene is a major target for the immune response 14 and that loss of this protein would hence lead to a lower sensitivity of the virus to immune responses of patients previously infected by SARS-CoV-2, but also by SARS-CoV or viruses harboring a common epitope.
This observation of the inactivation of the ORF8 gene corresponds to an important biological phenomenon, largely unknown in virology but widely described for cellular organisms, of genomic reduction in contexts of adaptation to a new host and/or to a specialization. 12,13

FIGURE 1 Phylogeny (A) and time distribution (B) of SARS-CoV-2 genomes harboring a stop codon in the ORF8 gene, and electron microscopy image of XBB.1.5 viruses in a culture supernatant (C). (A) Phylogenetic analysis and visualization were performed through the Nextstrain tool. 5 Figure 1A is an annotated screenshot of the Nextstrain web application (https://nextstrain.org/). 5 SARS-CoV-2 genomes with a stop codon in ORF8 are indicated by a blue circle. The set of SARS-CoV-2 genomes analyzed was composed of those selected by the Nextstrain web application plus 200 SARS-CoV-2 genomes of the B.1.160 lineage (Marseille-4 variant) obtained from respiratory samples evenly distributed during the circulation period of this lineage, which were incorporated into the set of analyzed genomes as they were poorly represented in the Nextstrain data set. (B) Data were collected from the Cov-Spectrum web application (https://cov-spectrum.org/) 6 that collects data from the GISAID database (https://gisaid.org/), 7 using the advanced search option. # Marseille-4B spread in France between December 2020 and April 2021; Alpha (B.1.1.7) variant spread between December 2020 and September 2021. SARS-CoV-2, severe acute respiratory syndrome coronavirus 2.

FIGURE 2 Frequency and distribution of stop codons in the SARS-CoV-2 ORF8 gene. This figure is based on data collected from the Cov-Spectrum web application (https://cov-spectrum.org/) 6 that collects data from the GISAID database (https://gisaid.org/). 7 It is also partially based on SARS-CoV-2 ORF8 features previously reported by Flower et al. 17 and Arduini et al. 18 The numbers of genomes harboring stop codons at ORF8 gene codons 8, 27, and 64 are indicated by pink, yellow and light blue colors, respectively. The YIDI motif mediates noncovalent dimer interactions. β = beta sheet. SARS-CoV-2, severe acute respiratory syndrome coronavirus 2.

FIGURE 3 Structure of the full-length ORF8 protein with the signal sequence (A) and schematic of the generation of the SARS-CoV-2 XBB.1.5 genome (B). (A) Codon 8 in ORF8 encodes a glycine residue. Gly 8 is represented as atomic yellow spheres. The cleavage site is indicated by a double arrow. The structure has been obtained by submission of Uniprot 19 file #A0A6V7APK3•A0A6V7APK3_SARS2 to the Robetta server (robetta.bakerlab.org). 20 (B) XBB.1.5 was generated through two successive main events that consisted in a recombination between two BA.2 lineages that generated the XBB (22F) lineage, then the loss of the ORF8 protein due to a stop codon in the ORF8 gene. This is why, in our laboratory, we have called this virus "the Hydra of Lerna," in reference to the creature in Greek mythology, offspring of Typhon and Echidna, sometimes described as a chimera with the body of a dog and the heads of serpents, which was not killed by the cutting off of one of its heads. SARS-CoV-2, severe acute respiratory syndrome coronavirus 2.
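
The Results state that, at the 15 codons most frequently converted to stop, a single nucleotide substitution was sufficient to generate the stop codon, and that eight of these changes were C>U. The following minimal Python sketch illustrates how such one-step routes to a stop codon can be enumerated. It assumes the standard genetic code (stop codons UAA, UAG, UGA). The function names and the example codons are hypothetical illustrations: they are not taken from the ORF8 sequence, and this is not the authors' analysis pipeline.

```python
# Sketch: enumerate single-nucleotide substitutions that turn a sense codon
# into a stop codon (standard genetic code assumed).

BASES = "ACGU"
STOP_CODONS = {"UAA", "UAG", "UGA"}


def single_step_stops(codon):
    """Return (position, old_base, new_base, stop_codon) tuples for every
    single substitution converting `codon` into a stop codon.
    Positions are 1-based; DNA input (T) is converted to RNA (U)."""
    codon = codon.upper().replace("T", "U")
    if len(codon) != 3 or any(b not in BASES for b in codon):
        raise ValueError(f"not a valid codon: {codon!r}")
    routes = []
    for pos, old in enumerate(codon):
        for new in BASES:
            if new == old:
                continue
            mutated = codon[:pos] + new + codon[pos + 1:]
            if mutated in STOP_CODONS:
                routes.append((pos + 1, old, new, mutated))
    return routes


def has_c_to_u_route(codon):
    """True if at least one one-step route to a stop codon is a C>U change."""
    return any(old == "C" and new == "U"
               for _, old, new, _ in single_step_stops(codon))


if __name__ == "__main__":
    # Illustrative codons for amino acids named in the Results
    # (standard genetic code; hypothetical choices, not the ORF8 codons).
    examples = {"Gln": "CAA", "Glu": "GAA", "Gly": "GGA", "Trp": "UGG"}
    for residue, codon in examples.items():
        print(residue, codon, single_step_stops(codon), has_c_to_u_route(codon))
```

Under these assumptions, a glutamine codon such as CAA reaches a stop codon through a single C>U change, whereas a codon such as GGC has no one-step route to a stop codon, in line with the observation that the most frequent stop codons arose at codons one substitution away from a stop.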