Genome evolution is driven by gene expression‐generated biophysical constraints through RNA‐directed genetic variation: A hypothesis

The biogenesis of RNAs and proteins is a threat to the cell. Indeed, the act of transcription and nascent RNAs challenge DNA stability. Both RNAs and nascent proteins can also initiate the formation of toxic aggregates because of their physicochemical properties. In reviewing the literature, I show that co‐transcriptional and co‐translational biophysical constraints can trigger DNA instability that in turn increases the likelihood that sequences that alleviate the constraints emerge over evolutionary time. These directed genetic variations rely on the biogenesis of small RNAs that are transcribed directly from challenged DNA regions or processed from the transcripts that directly or indirectly generate constraints or aggregates. These small RNAs can then target the genomic regions from which they initially originate and increase the local mutation rate of the targeted loci. This mechanism is based on molecular pathways involved in anti‐parasite genome defence systems, and implies that gene expression‐related biophysical constraints represent a driving force of genome evolution.


Introduction
The modern synthesis theory of evolution postulates that the evolution of living organisms results from random mutations that are selected or filtered by natural or sexual selection, or that are under neutral evolution when they do not affect gene product functions. "Random mutations" means that mutations occur at random locations in the genomic DNA or that mutations occur randomly with respect to the biological effects they may drive. However, increasing evidence indicates that the DNA location of mutations is not random. For example, the sequencing of an increasing number of human genomes from parent-offspring trios indicates that de novo mutations in the germline are not randomly distributed across the genome [1][2][3][4][5]. It seems instead that the DNA location of mutations is associated with local genomic factors including local sequence context (e.g. CpG dinucleotides), DNA biochemical (e.g. methylation) and structural features (e.g. non B-DNA structures), or chromatin marks and topology [1][2][3][4][5]. This is likely because these genomic factors impact on different mutational processes such as DNA damage or editing, or processes involved in DNA repair [6][7][8][9]. This interplay raises the possibility that some cellular processes could impact on the mutational rate of genomic loci by impacting on local genomic factors.
In this context, increasing evidence indicates that RNA molecules target specific genomic loci through several mechanisms (e.g. base-pairing to complementary nascent RNAs or to complementary DNA strands), and can impact on the local genomic factors described above. RNAs can direct chromatin or DNA biochemical and structural modifications, and can guide enzymes involved in DNA sequence modifications (e.g. DNA endonucleases and editing enzymes) to targeted loci [10][11][12][13]. RNAs are also involved in DNA repair and recombination, and they direct complex DNA rearrangements [14][15][16]. These observations have led several authors to propose that RNAs are involved in "re-writing" the genome [14][15][16][17][18][19]. Consequently, some biological processes could result in the biogenesis of RNAs that target complementary genomic loci and that modulate the mutation rate of the targeted loci. If such a process exists, where might these RNAs originate, and could they increase the local mutational rate of targeted genomic locations in a non-random way with respect to the biological effects they potentially drive?
While it has long been established that transcription is coupled to translation in prokaryotic cells, one of the most exciting recent discoveries in eukaryotic cells is that the different steps of the gene expression process are physically coupled. Indeed, transcription is coupled to RNA processing, and mRNA metabolism is coupled to protein metabolism [20,21]. In this article, I show that the physical proximity or the tight interplay between different steps of gene expression provides the molecular framework for a potential feedback from the gene expression process back to the DNA mutation process. More specifically, since the biogenesis of any gene product (i.e. RNA or protein) in the crowded interior of a cell can threaten cellular and genomic integrity, I will propose that the cell might overcome these problems through molecular pathways that facilitate changes to genomic sequences that produce "toxic molecules" (Box 1). Since parasitic nucleic acids are also a threat to the integrity of the cell and its genome, the same molecular pathways, which rely on small RNAs to mutate or remove exogenous toxic nucleic acids, are presumably also at work to modify or remove toxic endogenous nucleic acid sequences.
Co-transcriptional biophysical constraints shape gene architecture
The interplay between co-transcriptional biophysical constraints, DNA instability, and RNA processing drives DNA sequence evolution
It is now clearly established that transcription generates topological and biophysical constraints on DNA, with consequences for genome stability. For example, negative and positive supercoils are generated behind and in front of the transcribing RNA polymerases, respectively. This then results in the formation of non-B DNA structures and transcription or replication roadblocks that can induce DNA instability (e.g. DNA breaks) [22,23]. In addition, the newly synthesized RNA molecules can rehybridize to the template DNA strand to create co-transcriptional R-loops (composed of an RNA:DNA hybrid and a displaced single-stranded DNA). These R-loops are a major source of DNA instability [23,24]. Transcription can therefore challenge DNA integrity (e.g. by inducing DNA breaks) in both an RNA-dependent and an RNA-independent manner, through co-transcriptional biophysical constraints.
As well as generating genome instability, co-transcriptional constraints trigger the co-transcriptional processing of nascent RNAs (e.g. splicing, 3′-end RNA processing) in eukaryotic cells. For example, chromatin compactness and transcriptional roadblocks impact on co-transcriptional splicing or trigger transcription termination, RNA cleavage and 3′-end RNA processing [20,[25][26][27][28]. The formation of R-loops has also been implicated in 3′-end RNA processing [29,30]. It is believed that co-transcriptional physical constraints induce pausing of RNA polymerase, which increases the time window available for the co-transcriptional recruitment of RNA processing factors on the nascent RNAs [20,26,31,32].
While co-transcriptional physical constraints impact on co-transcriptional RNA processing, increasing evidence indicates that co-transcriptional RNA processing itself alleviates these constraints, and protects DNA from transcription-mediated damage. Indeed, inhibition of different RNA processing steps, including splicing, leads to genome instability [33][34][35][36]. Several mechanisms could explain the protective effect of co-transcriptional RNA processing on DNA integrity. First, RNA-binding proteins involved in RNA processing may coat the nascent RNA and prevent it from hybridizing back to the DNA template. Supporting this model, depletion of several splicing factors leads to R-loop formation and genome instability [33][34][35][36][37][38][39]. Second, co-transcriptional RNA processing could favor the removal of newly synthesized transcripts from chromatin since splicing is coupled to exon junction complex recruitment, which contributes to co-transcriptional RNA packaging and RNA export. Supporting this model, depletion of factors involved in the coupling between RNA processing, RNA packaging and export results in DNA instability [40,41]. Co-transcriptional translation in prokaryotes may have a similar protective role to eukaryotic co-transcriptional RNA processing, by taking nascent RNA off the DNA template [42].

Box 1
The concept of toxicity used throughout this manuscript must be understood in a general sense and covers several notions. For example, transcription is toxic to DNA because it can cause DNA breaks. RNAs also have some degree of toxicity to DNA since they can re-hybridize to DNA and generate R-loops that are genotoxic. RNAs produced from some genomic repeated elements, like retrotransposons, are toxic to genomes because they can invade them and alter their functioning by interfering with the activity of genes. RNAs are also potentially toxic because of their ability to form molecular aggregates that are toxic to the cell, as is now clearly established in a variety of pathologies. Similarly, protein biogenesis can lead to the formation of toxic molecular aggregates that can be linked, for example, to nascent protein misfolding. The concept of toxicity used in this manuscript therefore relies on the notions that (i) a genome is used by a cell to produce RNA molecules, some of which are used to produce proteins; (ii) these biogenesis processes are potentially "dangerous" for the cell because the act of production of these molecules or these molecules themselves can create deleterious biophysical constraints within the crowded intracellular environment.
Because of the interplay between co-transcriptional biophysical constraints, DNA instability and co-transcriptional RNA processing, I propose that co-transcriptional biophysical constraints within a transcriptional unit increase DNA damage locally, that is, in the vicinity of the physical constraints (Fig. 1). As long as the constraints persist, the resulting DNA instability increases the probability that mutations occur at the challenged locus. The high local mutational rate may lead to the emergence of RNA processing sites over evolutionary time because they alleviate transcription-mediated genotoxicity and thus increase the DNA stability of their host gene.
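The logic of this proposal can be illustrated with a deliberately simple calculation. The sketch below is not part of the hypothesis as published: it assumes that a constraint-relieving mutation arises independently in each generation with a constant probability, so that the waiting time is geometric. The function name, the parameters and every numerical value are hypothetical and chosen only for illustration.

```python
def expected_generations_until_relief(base_rate, n_relieving_sites, fold_increase=1.0):
    """Mean number of generations before the first constraint-relieving mutation.

    Toy assumptions (not taken from the article):
    - base_rate: per-site, per-generation mutation probability at the locus
    - n_relieving_sites: number of sites where a mutation would relieve the constraint
    - fold_increase: factor by which the constraint raises the local mutation rate
    The waiting time is treated as geometric, so its mean is 1 / p.
    """
    p = base_rate * fold_increase * n_relieving_sites
    if not 0.0 < p <= 1.0:
        raise ValueError("per-generation probability must lie in (0, 1]")
    return 1.0 / p


# Hypothetical values: compare a locus mutating at the background rate
# with the same locus under a 100-fold constraint-induced increase.
background = expected_generations_until_relief(1e-8, 5)
challenged = expected_generations_until_relief(1e-8, 5, fold_increase=100.0)
print(f"background locus: about {background:.3g} generations")
print(f"challenged locus: about {challenged:.3g} generations")
```

Under these assumptions the expected waiting time shrinks in direct proportion to the local increase in mutation rate. The model ignores selection, drift and the fact that most mutations at an unstable locus would not relieve the constraint.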
Are small RNAs involved in directing genetic variations in response to transcription-induced genome instability?
The mutational process in Fig. 1 that ultimately overcomes co-transcriptional DNA instability likely involves small RNAs. Indeed, small noncoding RNAs are produced in the vicinity of double-stranded DNA break (DSB) sites [43][44][45][46]. These small RNAs can induce chromatin modifications to assist DSB repair or drive the local recruitment of proteins involved in DNA repair [43][44][45][46][47]. Recent work also suggested that RNAs can be used as templates for homologous recombination in bacteria, yeast, and humans. Indeed, RNAs can anneal to complementary DNA sequences and serve as templates for DNA repair by reverse transcription [48][49][50][51]. RNAs can also direct endonucleases or DNA-editing enzymes, like the activation-induced cytidine deaminase, to genomic loci, which results in increased DNA instability and mutation there [12,13,16].
Collectively, these observations raise the possibility that transcription-mediated DNA damage induces the biogenesis of small RNAs (either by transcription or nascent RNA cleavage) that subsequently modulate the local mutational rate. It would be interesting to look, in the future, at whether co-transcriptional biophysical constraints induce specific types of small RNAs, DNA damage and mutations that are more likely to generate splicing or polyadenylation sites or some RNA processing regulatory sequences. For example, since co-transcriptional R-loops contain a displaced single-stranded DNA that is the preferential substrate for DNA-editing enzymes, it would be interesting to test whether the mutations mediated by DNA-editing enzymes increase the generation of RNA processing (or regulatory) sites. Even if this is not the case, the important take-home message from Fig. 1 is that local DNA instability induced by co-transcriptional constraints persists until sequences (e.g. RNA processing sites) that alleviate the co-transcriptional physical constraints emerge over evolutionary time.
Understanding genomic features in light of co-transcriptional physical constraints driving DNA sequence evolution
The mutational process hypothesis outlined in Fig. 1 implies that biophysical parameters that impact co-transcriptional events (and that therefore more or less directly impact DNA stability) can be modified during evolution, but not because of randomly located mutations. Rather, mutations preferentially occur where co-transcriptional parameters induce genomic instability. In prokaryotes, this model implies that transcriptional elongation rate directly contributes to codon-use evolution. Indeed, a fast transcriptional elongation rate would create genetic instability when the nascent RNA is not efficiently translated. Therefore, RNA-mediated genotoxicity would favor the emergence of codons that synchronize the kinetic parameters of translation and transcription [42,52]. A consequence in eukaryotes of the mutational process in Fig. 1 is the emergence, over evolutionary time, of a large number of cryptic and alternative RNA processing sites. Indeed, if co-transcriptional constraints trigger both DNA instability and RNA processing, which in turn alleviates these constraints, transcription-mediated genome instability would constantly favor the emergence of new RNA processing sites. This conclusion leads to a novel interpretation of the huge number of available RNA processing sites (splicing and polyadenylation sites) present in most eukaryotic genes. It is often assumed that alternative RNA processing sites are generated by random mutations and are then selected during evolution depending on the cellular functions of the gene products whose synthesis they allow [53]. However, it is conceivable that RNA processing sites are generated over evolutionary time because of the DNA instability triggered by co-transcriptional biophysical constraints.
Neo-formed RNA processing sites would be passed down the generations because they contribute to the genetic stability of the transcribed loci they are embedded in. Neo-formed RNA processing sites within a gene would next be filtered depending on their potential impact on the cellular function of the gene products (see below).
The interplay between co-transcriptional constraints, DNA instability and co-transcriptional RNA processing could also help to explain the evolutionary success of some mobile elements, like the Alu elements in primates. Alu elements, which belong to a class of retroelements termed SINEs (short interspersed elements), make up 11% of the human genome. Despite the fact that the expansion of these elements increases the size of transcribed genomic regions and is a threat to the hosting genome [54][55][56], Alu elements may benefit the hosting genome by reducing transcription-mediated genomic instability. Indeed, it is now well established that Alu elements provide both polyadenylation and splicing sites [57][58][59][60]. This means that the genotoxicity of Alu elements might be counterbalanced by the ability of these elements to spread RNA processing sites within their host genomes.
It is also interesting to note that Alu sequences are co-transcriptionally wrapped up by RNA processing factors and favor nascent RNA folding, which may collectively help to take Alu-containing nascent RNAs off chromatin [61,62]. Although it is known that T-rich sequences are important for Alu element insertion [59], we do not yet know the rules that could direct the insertion of Alu elements into specific DNA locations. Could Alu elements be preferentially inserted where co-transcriptional biophysical constraints are creating genomic instability? Alu elements could take advantage of the single-stranded DNA in co-transcriptional R-loops. T-rich sequences that induce RNA polymerase pausing [63] could increase the likelihood of Alu insertion, especially if they are downstream of GC-rich sequences that increase the likelihood of co-transcriptional R-loops [64]. Insertion of Alu elements within these unstable DNA regions would introduce pseudo-RNA-processing sites, which would in turn confer RNA processing-mediated genome stability. The evolutionary success of Alu elements might therefore be due to their ability to disseminate alternative (or pseudo-) RNA processing sites within their host genomes, as these elements simultaneously stabilize transcribed genomic loci and increase the molecular diversity generated from these loci [53].
Mutations generated by the transcription-mediated mutational process could next be filtered during evolution based on the gene products' functions. A mutation may alleviate a co-transcriptional constraint but disturb the function of the gene product and give rise to deleterious biological consequences. However, the flexibility of RNA processing pathways [65] could "buffer" the potential deleterious effects of newly generated RNA processing sites on gene product functions. Indeed, newly generated and weak alternative RNA processing sites could be used in cases of "emergency," when an RNA polymerase gets stuck in a locus. The strength of RNA processing sites in a given locus therefore likely relies on the evolutionarily controlled equilibrium between the sites' effects on the stability of the locus and on the functions of its gene products. Having focused above on co-transcriptional biophysical constraints, I will next address the potential interplay between biophysical constraints occurring during translation and the evolution of coding gene sequences.
Co-translational biophysical constraints shape coding sequences
Formation of RNA and protein aggregates is a threat to cell homeostasis
Before describing how co-translational biophysical constraints might shape coding sequences over evolutionary time, it is important to first highlight that one of the main pitfalls of the gene expression process is the formation of toxic RNA and protein aggregates. Aggregation can result from an increase in the local concentration of proteins and RNAs since their physicochemical properties make them prone to form aggregates. For example, protein aggregates can be seeded by increased local protein concentration, which might be critical at the protein production site, and by the aggregation-prone intrinsically disordered regions that many proteins contain [66,67]. Protein aggregates can also be initiated by protein unfolding during translation, as peptides emerging from ribosomes can form "spurious" contacts with peptides from the same nascent polypeptide [66,67]. Likewise, the physicochemical properties of RNAs, their ability to interact with each other through base pairing, and their ability to interact with RNA-binding proteins that contain aggregation-prone intrinsically disordered regions, make RNAs prone to form aggregates [68,69]. Therefore, the main question is not why proteins and RNAs form aggregates, but rather which mechanisms prevent the formation of aggregates in the crowded interior of cells.
One straightforward mechanism is RNA cleavage. For example, it has been shown that co-translational unfolding of nascent proteins can induce co-translational mRNA cleavage. This has been observed during the endoplasmic reticulum stress response, where an mRNA can be co-translationally cleaved if it is in the process of producing an unfolded nascent protein that is translocating into the endoplasmic reticulum [70]. A similar process also occurs widely in the cytoplasm, where nascent protein unfolding can induce co-translational mRNA cleavage and thus translation arrest [71,72]. In addition, several translation-associated processes, including the nonsense-mediated mRNA decay, "no-go" decay and nonstop decay pathways, induce mRNA cleavage if a specific step of the protein synthesis process (e.g. translation termination) is inefficient [73,74]. It has also been recently demonstrated that synonymous codons affect the kinetics of translation elongation, which impacts co-translational mRNA cleavage [74][75][76]. Therefore, mRNAs can be co-translationally cleaved when their translation is inefficient or results in the synthesis of nascent proteins that initiate aggregate formation.
While RNA cleavage is often associated with RNA degradation, there is increasing evidence that cleaved RNAs can give rise to small functional RNAs. There are indeed numerous examples of mature coding and noncoding RNA molecules being cleaved by endoribonucleases and giving rise to small RNAs that regulate diverse biological processes [77][78][79][80][81][82]. An emerging concept is that cleavage-derived small RNAs are involved in feedback loops, and allow cells to "fight" potentially toxic RNAs. This is illustrated by the piRNA pathway [83][84][85][86][87]. piRNAs were originally identified as small RNAs that are cut out of transcribed retrotransposons. The cleavage of retrotransposon RNAs decreases their ability to invade their host's genome [88,89]. In addition, the retrotransposon-derived piRNAs are loaded onto proteins of the Argonaute family, which direct the cleavage of any transcripts that contain retrotransposon-complementary sequences [88][89][90][91]. But these small RNAs can also target the genome regions that produce their precursors and induce targeted transcriptional gene silencing [83,85,87,88]. Interestingly, the piRNA pathway is not restricted to retrotransposon-derived RNAs as (i) piRNA-like molecules can also be derived from pseudogenes, 5′- or 3′-UTRs and even coding mRNA sequences [91][92][93][94][95]; (ii) piRNA-like molecules have also been shown to post-transcriptionally regulate mRNAs [91,94]; (iii) piRNA-like molecules have been shown to target genomic regions that do not contain retrotransposons [96][97][98][99][100]. The piRNA pathway illustrates how toxic RNAs (e.g. retrotransposon RNAs) can be cleaved and trigger the biogenesis of small RNAs that next contribute to targeted RNA cleavage (i.e. post-transcriptional gene silencing, PTGS) or transcriptional gene silencing (TGS). Both pathways can be described as feedback loops since they inhibit the production or accumulation of the toxic RNAs (Fig. 2).
The question now to be addressed is whether small RNAs deriving from potentially toxic precursor RNAs impact on the precursors' DNA sequences as part of a cellular process that "fights" against genome-generated toxic RNAs? This possibility is supported by the direct and indirect roles of small RNAs in chromatin and DNA biochemical modifications, DNA repair, and DNA sequence modifications and recombination at targeted loci, as described in the previous part.
In summary, (i) co-translational events can trigger RNA cleavage; (ii) cleaved RNAs can be further processed into small RNAs; and (iii) small RNAs can impact on DNA stability. The next section describes how these molecular pathways could work together to provide a molecular framework to link co-translational biophysical constraints to directed-mutations within coding sequences.
One sequence can impact on DNA, RNA, and protein features
Protein coding sequences are clearly shaped by functional constraints depending on the amino acid chain sequence. However, there is now clear evidence that the biophysical processes of protein synthesis and folding also contribute to shaping coding sequences, as even synonymous sites appear to be under evolutionary constraints [101][102][103][104][105]. It is believed that synonymous sites, which are neutral at the amino acid chain level, are not neutral in terms of quantitative and qualitative parameters of biophysical processes like protein synthesis and folding. The preference for specific synonymous codons depends on their effects on RNA secondary structures, and on the fact that they determine the nature of the anti-codons (tRNAs) to be used during translation. Both RNA secondary structure and codon usage impact on translation kinetics and protein folding [66,[103][104][105]. Therefore, coding sequences could be selected over evolutionary time, based not only on the encoded amino acids, but also on their impact on co-translational biophysical processes. Could these co-translational biophysical processes drive genetic variations? I propose that co-translational biophysical constraints that cause nascent protein misfolding and aggregate formation trigger co-translational mRNA cleavage. Cleaved mRNAs could then initiate the biogenesis of small RNAs that next target the loci they originate from and increase the local mutational rate of the targeted loci. A special class of nucleic acid sequences, namely G-rich tracts, illustrates how such a molecular framework could be straightforward. G-rich DNA or RNA strands can form G-quadruplexes, which are topologically polymorphic secondary structures. These structures, which are mutation hotspots, impact several processes at the DNA, RNA and protein level (Fig. 3, left panel) [106][107][108][109][110][111][112][113][114][115].
Because of these G-quadruplex features, RNA molecules containing G-quadruplexes may allow a feedback from translation to DNA mutational processes. Indeed, if a co-translational event is inefficient or altered (e.g. if a nascent peptide is misfolded and initiates aggregation), there is a probability that translationally repressed mRNAs will be co-translationally cleaved in the vicinity of structures, like G-quadruplexes, that reduce the motion of ribosomes and that can act as translational roadblocks (Fig. 3, right panel) [106,[111][112][113][114][115]. These RNA structures may help the recruitment of endoribonucleases or, alternatively, the translationally repressed mRNAs might be cleaved anywhere and trimmed by exoribonucleases that can be blocked at stable RNA secondary structures, like G-quadruplexes [116][117][118][119]. Therefore, co-translational biophysical constraints may result in the production of G-quadruplex-containing RNA fragments and mRNA-derived small RNAs. Accordingly, G-quadruplexes are involved in regulating the biogenesis of small RNAs, such as piRNAs, and they are present in several kinds of cleavage-derived small RNAs [116,119,120]. Therefore, RNA secondary structures like G-quadruplexes might play a role in the biogenesis of small RNAs from precursor RNAs by impacting on RNA cleavage, by protecting RNA fragments from exoribonucleases, or by initiating the biogenesis of small RNAs.
G-quadruplex-containing small RNAs may next target the genomic loci they originate from, by base pairing with nascent RNAs or with strands of opened DNA. Indeed, it has been shown that G-quadruplex-containing RNAs can form stable RNA:DNA hybrids, that is, where G-quadruplexes are made of half RNA and half DNA [109,121,122]. These RNA:DNA hybrids form R-loops, which can lead to DNA instability. G-quadruplex-containing RNAs may also direct enzymes like DNA-editing enzymes (e.g. activation-induced cytidine deaminase) to targeted loci. Indeed, it has been shown that after transcription and splicing, the lariats produced from immunoglobulin gene introns that contain repeated sequences (i.e. the switch regions) are de-branched and used for the biogenesis of G-quadruplex-containing small RNAs. These small RNAs can be bound by activation-induced cytidine deaminase and guide the enzyme to the genomic intronic switch regions in a sequence-specific manner [13]. The interaction between the G-quadruplex-containing RNAs and one of the DNA switch-region strands may lead to the formation of R-loop structures, within which the single-stranded DNA is the preferential substrate of the enzymatic activity of activation-induced cytidine deaminase [13]. Deaminated DNA next engages the base excision and mismatch repair machineries to generate double-stranded DNA breaks, which creates genetic variability within the immunoglobulin loci [13,123]. This mechanism also occurs outside the immunoglobulin loci ("off-targets") in DNA regions that can form G-quadruplex structures [13,123]. In conclusion, RNAs containing G-quadruplexes can direct genetic variability.
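G-rich tracts of the kind discussed here can be located computationally. As a minimal sketch, and not a method used in this article, the following Python snippet scans a DNA or RNA sequence with a widely used heuristic pattern for putative G-quadruplex-forming sequences: four runs of at least three guanines separated by loops of 1 to 7 nucleotides. The function name and the example sequence are illustrative assumptions, and a match indicates sequence potential only, not that a G-quadruplex actually folds in vivo.

```python
import re

# Heuristic motif (an assumption of this sketch, not taken from the article):
# three (G-run + loop) units followed by a final G-run, with runs of >= 3 G
# and loops of 1 to 7 nucleotides. U is accepted so RNA can be scanned too.
PUTATIVE_G4 = re.compile(r"(?:G{3,}[ACGTU]{1,7}){3}G{3,}")


def find_putative_g4(sequence):
    """Return (start, end, motif) for each non-overlapping putative G4 motif.

    Coordinates are 0-based and end-exclusive, on the given strand only.
    """
    seq = sequence.upper()
    return [(m.start(), m.end(), m.group()) for m in PUTATIVE_G4.finditer(seq)]


# Illustrative input: a telomere-like repeat flanked by arbitrary bases.
example = "AAGGGTTAGGGTTAGGGTTAGGGCC"
for start, end, motif in find_putative_g4(example):
    print(start, end, motif)
```

Only the given strand is scanned; a C-rich stretch would have to be reverse-complemented first to detect a motif on the opposite strand.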
Other specific DNA and RNA sequences and structures likely play a similar role to G-quadruplexes. Of particular interest are short tandem repeats like trinucleotide repeats that are involved in many genetic diseases. These sequences generate biophysical constraints during DNA replication and transcription and are highly mutagenic [124][125][126]. They also contribute to the formation of structured RNAs and can induce ribosome stalling during the elongation phase of translation [127][128][129][130]. Remarkably, mRNAs containing trinucleotide repeats can be cleaved and initiate the biogenesis of small functional RNAs [131][132][133]. Finally, RNAs containing trinucleotide repeats can form DNA:RNA hybrids or triplexes in trans [134]. In conclusion, some sequences (e.g. G-quadruplexes, trinucleotide repeats) have features that could allow a straightforward feedback from translation to directed genetic variation.
Do co-translational physical constraints drive DNA sequence co-evolution?
In the molecular pathway in Figs. 2 and 3, biophysical parameters impacting on co-translational events can be modified during evolution not because of randomly located mutations but because co-translational events trigger co-translational cleavage of mRNA and the biogenesis of small RNAs that next increase the local mutational rate of targeted loci. A consequence of this model is that protein chaperones that help protein folding during translation should "buffer" this mutational process (as the proteins involved in RNA processing do, see part 1). Recent evidence has shown that protein chaperones, like heat shock proteins (HSPs), which help protein folding during translation, couple protein and RNA homeostasis, and are involved in piRNA biogenesis pathways, impact on genome evolution [135][136][137][138][139][140]. It has been suggested that, on the one hand, HSP chaperones buffer mutations, as they allow some protein sequence variations by helping protein folding, and, on the other hand, HSP knockdown induces the appearance of de novo mutations [139][140][141][142][143][144][145][146]. An interesting possibility is that the absence of HSP chaperones increases co-translational aggregate formation and results in the production of mutagenic small RNAs (Fig. 4A).
Consequently, proteins interacting with a nascent polypeptide could also contribute to directed-mutations by impacting co-translational protein folding. Indeed, in contrast to what is often believed, many events affecting proteins occur during translation, which includes protein-protein interactions [147][148][149]. If a mutation affects a protein A that interacts with a nascent protein B and alters its folding during translation, this could trigger the mutational process described above and lead to mutation of the protein B encoding gene (Fig. 4B). Therefore, if biophysical constraints trigger mutations that alleviate the initiating constraints, these mutations can in turn create other constraints anywhere within interacting networks, which will trigger mutations in other genes. Consequently, the interplay between biophysical constraints and mutational processes could explain the evolution of protein interaction networks.
Evolution of coding sequences may not just be fuelled by random mutations. Mutational processes may also be triggered by co-translational biophysical parameters that can feed back on DNA sequences through the biogenesis of structured and mutagenic small RNAs.
A widespread driving force shaped in a species-specific manner
Genome defence systems and RNA-mediated genome evolution are the two faces of the same coin
If gene expression-generated biophysical constraints drive genetic variations, this process is likely to be ubiquitous. First, co-transcriptional and co-translational biophysical constraints rely on the physicochemical properties of nucleic acids and proteins. Second, transcription and translation are universal. Finally, this concept relies on the basic notion (that could apply to any cell) that some expressed genomic sequences are toxic (Box 1) and challenge the integrity of the cell and its genome. Since every living organism has evolved specific molecular pathways to "fight" parasitic nucleic acids, which rely on RNA-guided immunity, nucleic acid cleavage or editing, it would be expected that each organism uses the same (or similar) molecular pathways to fight both parasitic nucleic acids and endogenous toxic sequences [83][84][85][86][87][150][151][152].
The porous nature of the frontier between "self" cellular nucleic acids and endogenous or exogenous parasitic "non-self" nucleic acids supports the notion of an interplay between genome defence systems and RNA-mediated genome evolution. Indeed, it was believed that cellular RNA decay and small RNA biogenesis pathways were distinct pathways, the first one being involved in "self" RNA degradation and the second one being used to fight "non-self" parasite RNAs. However, there is now considerable evidence that both pathways are tightly connected [83-87, 153, 154]. For example, and as already mentioned, piRNAs that allow the cell to fight against retrotransposon invasion are produced from retrotransposons, coding genes and pseudogenes. In addition, cellular RNAs are massively edited, as parasite RNAs are, and can be used to generate RNAs that activate the immune response [155]. The L1-ribonucleoprotein particle, which is responsible for the genomic insertion of retrotransposon-derived RNAs, can also create processed pseudogenes when it allows genomic integration of mRNA sequences (self RNAs) after reverse transcription [156,157]. Collectively, these observations suggest that cellular "self" RNAs can at some point get entangled in the biological pathways normally used to fight parasite "non-self" nucleic acids (Fig. 5). Extrapolating from the crosstalk between antiparasitic nucleic acid pathways and cellular RNA metabolism pathways would explain how and why some cellular RNAs could become "mutagenic." The genome defence systems against parasitic nucleic acids and the RNA-mediated genetic variations of endogenous toxic genomic sequences could actually be two faces of the same coin, as both processes simply remove or modify toxic sequences. As a consequence, the evolutionary trajectory of a genome would directly depend on the parasitic nucleic acids it has encountered.
This means that although biophysical constraint-mediated genome evolution could be a widespread driving force, the precise molecular pathways that drive genetic variations could be specific to each organism depending on the precise genome defence systems it has. While cells may take advantage of molecular pathways involved in fighting parasitic nucleic acids to modify their own genome, it is interesting to underline that the eukaryotic gene expression process has recently been proposed to be shaped by the "combat" against parasitic nucleic acids [158].
Are RNA-directed genetic variations triggered by co-transcriptional and co-translational biophysical constraints interconnected?
In the first section, I proposed that co-transcriptional biophysical constraints shape gene architecture. In this context, it is interesting to underline that genome-wide waves of transcription occur during the development and differentiation of male germ cells. These genome-wide waves of transcription are a consequence of the genome-wide epigenetic "reprogramming" occurring during male germ cell development and differentiation [159][160][161][162]. As a consequence, male germ cells produce the most complex set of coding and noncoding transcripts and alternative splicing variants [159][160][161][162]. Therefore, male germ cells may experience extensive transcription-mediated genomic instability that could explain the large-scale apoptosis of immature sperm cells [4,163,164]. Another consequence of these genome-wide waves of transcription in male germ cells is the expression of a wide variety of retrotransposon-derived RNAs, which results in the activation of the piRNA pathway [159][160][161][162]. As described in section 1, the expression of retrotransposon-derived RNAs (e.g. Alu) and the genomic insertion of these elements could, at some point, help male germ cells to alleviate local co-transcriptional constraints and therefore increase the survival of germ cells carrying this particular kind of de novo mutation.
However, de novo mutations could generate deleterious gene products. Mattick et al. recently described a molecular pathway by which de novo mutation filtering could be performed during male germ cell development. In this scenario spermatogonia die if they have mutations that do not pass molecular "quality control" [159]. This filtering of de novo mutations during gametogenesis likely reduces transmission of deleterious genetic variants to the next generation.
If de novo mutations resulting from insertion of Alu elements within an intronic locus pass the spermatogenic "quality control," then this could result in exonization of the Alu element, which would initially be weakly recognized as an alternative exon [53]. The buffering activity of the splicing process would first decrease the likelihood of generating deleterious gene products (i.e. weak splice sites are more often missed). However, if co-transcriptional physical constraints "push" toward the acquisition of stronger RNA processing sites, this would lead to an increase in the inclusion rate of new exons (e.g. Alu exons) during splicing. Meanwhile, the newly included exons would create constraints during translation (e.g. nascent protein misfolding). These co-translational constraints would, in turn, trigger mutations in the corresponding coding sequences through G-quadruplex-containing small RNAs deriving from Alu sequences. Indeed, retrotransposons are prone to form G-quadruplex structures and they may have contributed to the spread of G-quadruplex structures within genomes [165,166]. Therefore, some retrotransposons (e.g. Alu) may spread not only RNA processing sites (see first part) but also G-quadruplexes that could help retrotransposon-derived exons to rapidly evolve as coding exons through the mutational process relying on co-translational biophysical constraints. In this way, Alu-derived exons could evolve to encode peptides that do not create deleterious constraints during translation. One general prediction of this model is that biophysical parameters impacting on co-translational events evolve together with biophysical parameters impacting on co-transcriptional events. This would imply the existence of a relationship between codon optimization, translation, and transcription, as was recently suggested [167][168][169].
In addition to favoring germ cell survival by decreasing transcription-mediated genomic instability, some de novo mutations could provide a germ cell growth advantage. Massive production of undifferentiated spermatogonia and their large-scale apoptosis is thought to reflect a spermatogonial selection process ("selfish spermatogonial selection") [4,163,164]. Recent evidence indicates that many evolutionarily new genes are specifically expressed first in the testis. This has led to the view of the testis as a "nursery" for new gene products and the view that genes can emerge dependent on testis-specific function (the "out of testis" hypothesis) [160,170,171]. Therefore, not only do de novo mutations occur during gametogenesis but some of them might be filtered and eventually selected during this process.

Conclusion and outlook
If gene expression-generated biophysical constraints direct genome evolution, then organismal evolution may not just be fuelled by random mutations. First, some de novo mutations would preferentially occur in an RNA-directed manner in genomic regions that generate constraints or some kind of toxicity when they are expressed (Box 1). Second, de novo mutations would not randomly occur with respect to the biological effects they may drive. Indeed, small RNA-directed genetic variations would start because genomic sequences are toxic and would end when the sequences are no longer toxic. Different experimental designs could be developed to test the proposed hypothesis (Box 2). Since the same gene expression-generated constraints exist in genomes of different individuals from the same species, mutations in certain genomic locations could recur frequently amongst individuals. This is in contrast with random mutations that have a low probability of occurring several times at the same location. The high frequency of these constraint-derived mutations in a population would increase their penetrance. An RNA-directed mutation process may also explain why some mutations are more frequent than others and have been recurrently generated during evolution [172][173][174][175]. It is also interesting to note that many disease-associated mutations affect RNA processing sites [176].
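The contrast drawn above between random and constraint-derived recurrence can be illustrated with a back-of-the-envelope calculation. The sketch below is not part of the original hypothesis. It assumes that each individual mutates at a given site independently and with the same per-generation probability, so that the number of carriers follows a binomial distribution. The baseline rate, the fold increase and the population size are illustrative placeholders rather than measured values, and the function name `prob_recurrent` is hypothetical.

```python
from math import comb


def prob_recurrent(n_individuals: int, rate: float, min_carriers: int = 2) -> float:
    """Probability that at least `min_carriers` of `n_individuals` independently
    acquire a de novo mutation at one given site (binomial model)."""
    # Sum the binomial probabilities of having fewer than min_carriers carriers...
    p_fewer = sum(
        comb(n_individuals, k) * rate**k * (1 - rate) ** (n_individuals - k)
        for k in range(min_carriers)
    )
    # ...and take the complement.
    return 1.0 - p_fewer


# Illustrative placeholders (assumptions, not measured values).
BASELINE_RATE = 1e-8   # per-site, per-generation probability of a random mutation
FOLD_INCREASE = 1_000  # hypothetical local increase at a constraint-generating site
N_INDIVIDUALS = 10_000

random_case = prob_recurrent(N_INDIVIDUALS, BASELINE_RATE)
directed_case = prob_recurrent(N_INDIVIDUALS, BASELINE_RATE * FOLD_INCREASE)
print(f"random:   {random_case:.3e}")
print(f"directed: {directed_case:.3e}")
```

Under these assumptions, the probability of independent recurrence grows roughly with the square of the local mutation rate while that rate remains small, so even a modest local increase would make recurrent mutations at the same site far more likely than under a uniform random model.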
What could be the link between gene expression-generated biophysical constraints directing genome evolution and phenotype? First, it cannot be excluded that the cellular micro-environment can impact on biophysical parameters (e.g. co-translational protein folding), which could trigger a mutational pathway that would end up alleviating the environment-mediated constraints. Therefore, modifications of the cellular environment could trigger a mutational process that increases the likelihood of generating genetic variants impacting gene products involved in the environment-mediated constraints. Related to this phenomenon, I recently proposed that mutations in cancer cells might be directed and adapted to the tumoral micro-environment. This would help explain why most (if not all) anticancer therapies failed because of tumor cell resistance [177].

Box 2
This box aims at describing some experimental settings that could help test the proposed hypothesis, focusing first on prokaryotic cells and next on eukaryotic cells. Since translation is coupled to transcription in prokaryotic cells, replacing optimal codons with non-optimal codons in highly transcribed genes is expected to increase local genomic instability. This would favor the re-emergence of optimal codons that would "synchronize" the dynamics of transcription and translation, and therefore decrease genomic instability. Likewise, it may be possible to engineer coding sequences leading to the biogenesis of proteins having a high probability to misfold when emerging from ribosomes. Co-translational (therefore co-transcriptional) misfolding of nascent proteins is expected to increase local genomic instability, which would end when new sequences alleviating protein misfolding emerge. Since RNA processing is coupled to transcription in eukaryotic cells, it might be possible to insert within gene bodies (e.g. in introns) DNA sequences that create constraints during transcription (e.g. R-loop prone sequences). These sequences would increase the local DNA instability, which would favor the emergence of RNA processing sites. It might also be possible to engineer a eukaryotic cellular model in which the folding of a protein can be challenged during translation in a controlled manner. A prediction resulting from the proposed hypothesis is that small RNAs corresponding to pieces of the parent mRNAs should be detectable, as should an increase of the mutational rate of the corresponding locus. While the experimental settings described above are expected to be associated with genetic variations at targeted loci, the characterization of the underlying mutational processes involved in prokaryotic and eukaryotic cells would make it possible to determine whether genetic variations are random or can be driven by dedicated processes.
The interplay between antiparasite genome defence systems and RNA-directed genetic variations could be addressed in unicellular organisms by exposing them to different stressful environments, when their genome defence systems are either active or inactive. One prediction is that mutation-mediated organismal adaptation would be strongly impaired in cells without an efficient antiparasite genome defence system.
Second, the concept of gene expression-generated biophysical constraints directing genome evolution implies that (i) the force driving the mutational process keeps operating until the constraints are alleviated; and (ii) any new mutation generated in response to a constraint can create new constraints. The resulting network of interdependent constraints could help explain some kinds of protein co-evolution [178] and therefore the "coordinated" evolution of genes involved in the same genetic circuit driving complex phenotypes.
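Points (i) and (ii) describe a feedback loop that can be sketched as a toy simulation. The code below is only an illustration under strong simplifying assumptions. Genes are nodes in a hypothetical interaction network, and a constrained gene mutates with a high "directed" probability per generation. A mutation always alleviates the constraint on the mutated gene, and each mutation may impose a new constraint on an interacting partner with a fixed "spill-over" probability. None of these parameters, names or values come from the original text.

```python
import random


def simulate(network, constrained, generations, directed_rate, baseline_rate,
             spill_prob, seed=0):
    """Toy model: a constraint raises the local mutation rate until it is
    alleviated, and each mutation may constrain an interacting partner."""
    rng = random.Random(seed)
    constrained = set(constrained)
    mutations = {gene: 0 for gene in network}
    for _ in range(generations):
        newly_constrained = set()
        for gene in network:
            rate = directed_rate if gene in constrained else baseline_rate
            if rng.random() >= rate:
                continue  # no mutation in this gene during this generation
            mutations[gene] += 1
            constrained.discard(gene)  # assumption: the mutation alleviates the constraint
            for partner in network[gene]:
                if rng.random() < spill_prob:
                    newly_constrained.add(partner)  # new constraint on a partner
        constrained |= newly_constrained
    return mutations, constrained


# Hypothetical three-gene interaction chain A - B - C, with A initially constrained.
network = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
mutations, still_constrained = simulate(
    network, constrained={"A"}, generations=200,
    directed_rate=0.2, baseline_rate=0.001, spill_prob=0.3,
)
print(mutations, still_constrained)
```

In such a model, genes that carried no initial constraint can still accumulate mutations once a partner has mutated, which is the qualitative behaviour that point (ii) describes. Setting `spill_prob` to zero removes this propagation and leaves only the initial constraint to be resolved.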