Newly discovered mechanisms that mediate tumorigenesis and tumour progression: circRNA‐encoded proteins

Abstract Proteins produced by cap‐independent translation mediated by an internal ribosome entry site (IRES) in circular RNAs (circRNAs) play important roles in tumour progression. To date, numerous studies have been performed on circRNAs and the proteins they encode. In this review, we summarize the biogenesis of circRNAs and the mechanisms regulating the expression of circRNA‐encoded proteins. We also describe relevant research methods and their applications to biological processes such as tumour cell proliferation, metastasis, epithelial‐mesenchymal transition (EMT), apoptosis, autophagy and chemoresistance. This paper offers deeper insights into the roles that circRNA‐encoded proteins play in tumours. It also provides a theoretical basis for the use of circRNA‐encoded proteins as biomarkers of tumorigenesis and for the development of new targets for tumour therapy.


| INTRODUCTION
In recent years, owing to the rapid development and extensive application of high-throughput RNA sequencing and new bioinformatics algorithms, a large number of circular RNAs (circRNAs) have been identified in the cells of many eukaryotes. circRNAs are RNA molecules produced from precursor mRNAs by back-splicing (reverse splicing), which differs from the production of conventional linear RNAs. This unconventional process results in a closed-loop RNA structure with neither a 3′ poly(A) tail nor a 5′ cap structure, rendering circRNAs more stable and more resistant to ribonuclease degradation than linear RNAs. Considerable research has been reported on the general functions of circRNAs: they can act as competing endogenous RNAs such as microRNA (miRNA) sponges, interact with RNA-binding proteins (RBPs), regulate the transcription of parental genes, and form short double-stranded RNAs to suppress protein kinase R (PKR). [1][2][3] It is generally accepted that circRNAs constitute a regulatory noncoding RNA subtype that usually does not encode proteins but performs important functions such as regulating transcription, RNA splicing modification, mRNA translation, and protein stabilization and translocation. They also play roles in chromosome formation and structural stability. 4 Recently, bioinformatics analysis has suggested that circRNAs have the potential to encode proteins and can perform biological functions through the proteins they encode. Most circRNAs consist of exonic sequences, localize to the cytoplasm and carry open reading frames (ORFs) with start codons, which indicates their potential to encode proteins. 5 The current view suggests that cap-independent translation of circRNAs that encode proteins is typically initiated through internal ribosome entry sites

| BIOGENESIS OF circRNAs
Most circRNAs are formed from exons or introns by back-splicing of pre-mRNA, and their formation competes with that of linear mRNAs during splicing. CircRNAs are broadly classified into three types on the basis of their source RNA sequence and method of circularization: exonic circRNAs (ecircRNAs), intronic circRNAs (ciRNAs) and exon-intron circRNAs (eicircRNAs); ecircRNAs are the most abundant. It is generally accepted that circRNAs are circularized by the following four mechanisms. In the exon-skipping mechanism, exons are skipped as the pre-mRNA partially folds during transcription, and these structural changes lead to the formation of specific regions called lariat structures. A 'hetero-lariat' containing exons and introns is formed first, and this type of circRNA is called eicircRNA. Removing the introns from eicircRNA can generate ecircRNA containing only exons. 2,[6][7][8] In the intron lariat circularization mechanism, some introns are thought to form lariat structures during splicing; most of these introns are rapidly degraded after debranching, and only introns containing essential nucleic acid sequences escape debranching and remain as lariats after splicing. 9 Pre-mRNA is cleaved at the 5′ end, near the intron lariat, in a process facilitated by small nuclear RNA (snRNA) U1. The cleaved RNA is then further processed to form a ciRNA. 10 In the intron-pairing-driven mechanism, base pairing between complementary sequences (such as Alu repeats) in the introns flanking the exons induces back-splicing, and the splice sites are covalently joined to form circRNA. In the RBP-driven mechanism, RBPs bind to the introns flanking the exons in the pre-mRNA, and RBP dimerization brings adjacent exons together to promote their circularization into circRNA. These two mechanisms can generate all three types of circRNAs [11][12][13][14] (Figure 1A).

| MECHANISM OF circRNA-ENCODED PROTEINS
Translation of RNA in eukaryotic cells requires the eukaryotic translation initiation factor eIF4F, a complex consisting of the RNA helicase eIF4A, the cap-binding subunit eIF4E and the scaffolding protein eIF4G. eIF4G recruits the 43S pre-initiation complex (PIC), which includes the 40S small ribosomal subunit, eIF1, eIF1A, eIF3, eIF5 and the eIF2/Met-tRNAi/GTP ternary complex, by interacting with eIF3. 15,16 Upon recognizing the 5′-end cap structure of mRNA, eIF4F recruits the 43S PIC, enabling the translation process. 17 This canonical mechanism is called cap-dependent translation initiation and is the main mechanism of translation initiation in eukaryotic cells. 17,18 Because they lack a 5′ cap structure, circRNAs follow a noncanonical cap-independent translation mechanism and depend on IRESs or m6A-induced ribosome engagement sites (MIRESs) to bind the initiation factor eIF4G2 complex and anchor the 43S complex for protein translation (Figure 1B). Interestingly, under cell stress conditions, such as hypoxia, amino-acid starvation, endoplasmic reticulum (ER) stress and apoptosis, as well as after viral infection and exposure to physiological cell differentiation or synaptic network formation stimuli, this translation mechanism is used as an alternative to the canonical translation of endogenous mRNAs. 19,20

| Internal ribosome entry sites
IRESs are RNA elements that promote the recruitment of the 40S ribosomal subunit to the internal regions of mRNAs and thereby initiate translation through a 5′-cap-independent mechanism. 21 IRESs were first identified in the 5′-untranslated region (5′-UTR) of the RNA of small RNA viruses (picornaviruses) that infect animals. 22 Specifically, an IRES is a 200- to 500-nucleotide-long sequence located in the 5′-UTR with a specific stem-loop structure that facilitates the translation of the RNA of many pathogenic viruses. Later studies revealed IRESs in eukaryotic mRNAs; under stress conditions, IRES-mediated cap-independent translation can serve as an alternative mechanism for protein production from linear mRNAs in eukaryotic cells. 23 Follow-up studies have shown that circRNAs with IRESs can translate polypeptide chains from their ORFs. Interestingly, owing to the special structure of circRNAs, an IRES sequence in a circRNA may be read multiple times; that is, in the first read, the IRES sequence is recognized, and in the following reads, the information encoded in the IRES sequence is translated. Multiple reads result in multiple rounds of translation, which is very common in circRNA-encoded protein production (Figure 1C).

| N6-methyladenosine (m6A) modification
m6A is methylation at the N6 position of the adenosine base in eukaryotic RNA and is the most prevalent, abundant and dynamically reversible epitranscriptomic modification in mammals. 24 The m6A modification process relies on a recognized motif sequence 'RRm6ACH' (R = G or A; H = A, C or U). 25 The m6A modification is regulated by m6A methyltransferases, m6A demethylases and m6A-binding proteins. Because abnormalities in m6A regulatory mechanisms can lead to the dysregulation of gene expression, including the activation and repression of oncogenes, they are often associated with tumour progression and play important roles in malignant progression and the acquisition of drug resistance in various types of tumours. 26,27 In addition, m6A plays a regulatory role in the translation of RNA. Like an IRES, a MIRES stimulates selective mRNA translation under stress conditions by directly binding to the initiation factor eIF3. For circRNAs, m6A modification regulates not only their expression, distribution and function but also their translation. Many translatable endogenous circRNAs carry m6A modification sites, and MIRESs have been reported to act as IRES-like elements to drive the translation of circRNAs. 28,29 CircRNAs are efficiently translated via short 19-nucleotide sequences that carry m6A sites in the RRm6ACH consensus motif. 30 The m6A-binding protein YTHDF3 recognizes m6A and recruits eIF4G2 to the m6A site. eIF4G2 recognizes an IRES and initiates the assembly of the eIF4 complex, recruiting ribosomes and initiating translation. Moreover, a single m6A site is sufficient to initiate cap-independent translation. 31,32
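As a minimal sketch of how the RRACH consensus described above might be screened for in a circRNA sequence, the Python snippet below scans a circular sequence for the motif (R = G or A; H = A, C or U), including matches that span the back-splice junction. The function name `find_rrach_sites` and the example sequences are hypothetical, introduced only for illustration. A sequence match indicates only a candidate site; whether the adenosine is actually methylated has to be established experimentally.

```python
import re

# Consensus from the text: R = G or A; H = A, C or U.
# A lookahead is used so that overlapping matches are all reported.
RRACH = re.compile(r"(?=([GA][GA]AC[ACU]))")


def find_rrach_sites(circ_seq: str) -> list[tuple[int, str]]:
    """Return (index of the candidate adenosine, matched motif) pairs.

    The input is treated as a circular RNA: the first four bases are
    appended to the end so that motifs spanning the back-splice junction
    are found. Indices are 0-based positions in the original sequence.
    """
    seq = circ_seq.upper().replace("T", "U")  # accept DNA-style input
    n = len(seq)
    if n < 5:
        return []
    extended = seq + seq[:4]
    sites = []
    for match in RRACH.finditer(extended):
        # The candidate m6A is the third base of the five-base motif.
        adenosine_index = (match.start() + 2) % n
        sites.append((adenosine_index, match.group(1)))
    return sites


if __name__ == "__main__":
    # Hypothetical toy sequence: the motif GGACU spans the junction.
    print(find_rrach_sites("ACUUUGG"))
```

Appending the first four bases is a simple way to handle circularity for a five-base motif without duplicating matches, because every match in the extended string starts inside the original sequence.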

| Rolling circle translation (RCT)
When a ribosome translating a covalently closed circRNA does not encounter a stop codon within the ORF during its first pass around the circle, translation may continue for multiple rounds until the ribosome encounters an in-frame stop codon. Under certain circumstances, infinite RCT (iRCT) occurs. Specifically, when the ribosome initiates translation at the start codon and never encounters an in-frame termination codon in the circRNA, the result is an infinite open reading frame (iORF), which eventually leads to iRCT. 33,34 Theoretically, once this mode of translation has been initiated, extremely high-molecular-weight proteins are eventually produced. Thus, RCT may be a novel mechanism of circRNA-encoded protein translation 35 (Figure 1C).

| Prediction of necessary coding factors
ORF prediction, IRES prediction and m6A prediction are methods of determining whether circRNAs have protein-encoding capacity. An ORF is a nucleic acid sequence in RNA that begins at the AUG start codon and continues in consecutive nucleotide triplets (codons) to a stop codon. In addition, regulatory elements upstream of the ORF, that is, an IRES or a MIRES, mediate the initiation of translation. For example, ORF Finder and sORFs.org can be used to find possible ORFs in a complete circRNA sequence, IRESite and IRES Finder can predict potential IRESs in a provided circRNA sequence, and integrated bioinformatics tools such as circPRO and circRNADB are also available. [36][37][38][39][40]
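To make the ORF definition above concrete, the following Python sketch enumerates ORFs on a circular sequence, translates them with the standard genetic code, and flags ORFs that read across the back-splice junction. It is a simplified illustration and does not reproduce how ORF Finder, sORFs.org or any other tool named above works internally. The names `CircOrf` and `circular_orfs`, the `min_aa` threshold and the toy sequences are assumptions made for this example.

```python
from itertools import product
from typing import NamedTuple

# Standard genetic code, codons ordered with bases U, C, A, G.
BASES = "UCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


class CircOrf(NamedTuple):
    start: int              # 0-based index of the A in AUG
    peptide: str            # translated sequence without the stop
    crosses_junction: bool  # ORF reads past the last base into the first
    has_stop: bool          # False when no in-frame stop codon exists


def circular_orfs(circ_seq: str, min_aa: int = 2) -> list[CircOrf]:
    """List AUG-initiated ORFs on a circular RNA, longest peptide first."""
    seq = circ_seq.upper().replace("T", "U")
    n = len(seq)

    def codon(pos: int) -> str:
        return seq[pos % n] + seq[(pos + 1) % n] + seq[(pos + 2) % n]

    orfs = []
    for start in range(n):
        if codon(start) != "AUG":
            continue
        pos, peptide, has_stop = start, [], False
        while True:
            aa = CODON_TABLE.get(codon(pos), "X")  # X for ambiguous bases
            if aa == "*":
                has_stop = True
                break
            peptide.append(aa)
            pos = (pos + 3) % n
            if pos == start:  # frame closed on itself: no stop reachable
                break
        # Nucleotides read from the start codon up to and including the stop.
        nt_read = 3 * (len(peptide) + 1)
        crosses = (not has_stop) or (start + nt_read > n)
        if len(peptide) >= min_aa:
            orfs.append(CircOrf(start, "".join(peptide), crosses, has_stop))
    return sorted(orfs, key=lambda o: len(o.peptide), reverse=True)


if __name__ == "__main__":
    # Hypothetical toy circle whose only ORF spans the junction.
    for orf in circular_orfs("GCCUAAAUG"):
        print(orf)
```

Flagging junction-crossing ORFs is a deliberate choice in this sketch: a peptide segment translated across the back-splice junction cannot come from the linear transcript, so such ORFs are natural candidates for follow-up validation.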

| Dual-luciferase reporter assay and m6A analysis
To determine whether circRNAs are translated, the presence of ORFs and functional IRES-like elements can be confirmed with luciferase assays, while the presence of m6A sites can be confirmed by m6A analysis. 41

| Insertion of protein tags (Flags) in combination with western blotting (WB)
Overexpression vectors can be constructed by inserting a protein tag upstream of an ORF putative stop codon; the following experi-

| How circRNA-encoded proteins regulate tumour cell proliferation and metastasis
The upregulated expression of circRNA-encoded proteins in tumour cells is important for tumour cell proliferation. Peng's team found that circAXIN1 expression was increased in gastric cancer tissues and promoted the proliferation and migration of gastric cancer cells by encoding the novel protein AXIN1-295aa. Mechanistically, AXIN1-295aa competitively bound and saturated the available APC, so that AXIN1, CK1 and GSK3β could not form a normal β-catenin destruction complex with APC; β-catenin then translocated to the nucleus, where it activated downstream genes and ultimately promoted cell proliferation and migration. 45 Li et al. found that circ-EIF6 expression was increased in breast cancer tissues and that its encoded protein EIF6-224aa interacted with the oncoprotein MYH9 and inhibited the ubiquitin-dependent degradation of MYH9, thereby activating the MYH9/Wnt/β-catenin pathway and ultimately promoting the proliferation and metastasis of triple-negative breast cancer cells. 46 Similarly, another study identified increased expression of circHER2 and its encoded protein HER2-103aa specifically in HER2(-) triple-negative breast cancers. Mechanistically, HER2-103aa induced the formation of epidermal growth factor receptor (EGFR)/EGFR homodimers and/or EGFR/HER3 heterodimers to maintain AKT phosphorylation, activate the downstream PI3K-AKT pathway and promote triple-negative breast cancer cell proliferation and progression. 47 A recent study showed that circG-  61 Liang's team found that circPLCE1 was also expressed at low levels in colorectal cancer cells and inhibited the proliferation and metastasis of colorectal cancer cells by encoding the PLCE1-411aa protein.
Mechanistically, circPLCE1-411aa binds to the HSP90α/RPS3 complex and promotes the dissociation of RPS3, an important regulator of NF-κB, from the complex. This leads to ubiquitin-dependent degradation of RPS3 and inhibition of NF-κB signalling, which ultimately suppresses the proliferation and metastasis of colorectal cancer cells. Another recent paper reported that circSEMA4B, which is expressed at low levels in breast cancer tissues, inhibits the proliferation and metastasis of breast cancer, and showed experimentally that circSEMA4B exerts its biological effect not only by sponging miR-330-3p but also by encoding a novel protein, SEMA4B-211aa. SEMA4B-211aa inhibits the production of PIP3 by competing with p110 for p85 binding, which inhibits phosphorylation of AKT at Thr308, negatively regulates the PI3K/AKT signalling pathway and ultimately inhibits breast cancer progression. 64 The circAKT-encoded protein AKT3-174aa, circHEATR5B-encoded protein HEATR5B-881aa, circSHPRH-encoded protein SHPRH-146aa, circPINTexon2-encoded protein PINT-87aa and circFBXW7-encoded protein FBXW7-185aa were also found to function as inhibitors of tumour progression, and their expression was decreased in tumours (Table 1) ability through this abnormal glycometabolism process. This particular glycolytic behaviour not only provides nutrition for cancer cells but also makes the tumour environment more acidic, leading to extracellular matrix destruction and thus inducing tumour metastasis. 66 In contrast, Liu's team found that high expression of circPGD in gastric cancer tissues promoted EMT and inhibited the apoptosis of gastric cancer cells. Moreover, circPGD has been shown to exert biological effects not only by sponging miR-16-5p but also by encoding a novel protein, PGD-219aa, which promotes gastric cancer progression by acting on the SMAD2/3 and YAP signalling pathways 67 (Figure 3C).

| CircRNA-encoded proteins regulate tumour cell proliferation by regulating the tumour microenvironment
A study revealed that hsa-circ-0000437 expression was decreased in endometrial cancer tissues versus normal tissues.
hsa-circ-0000437 has been found to function as an inhibitor of    Figure 3D).

| CircRNA-encoded proteins regulate apoptosis to interfere with tumour cell death
Zhang's team found that circDIDO1 expression was decreased in gastric cancer tissues and that circDIDO1 promoted apoptosis. Their experimental study revealed that circDIDO1 exerted its biological function by encoding a novel protein, DIDO1-529aa. Poly(ADP-ribose) polymerase 1 (PARP1) recognizes single-strand breaks in DNA and repairs them, and inhibition of PARP1 leads to decreased DNA damage repair and causes apoptosis in cancer cells. DIDO1-529aa interacts with PARP1 and inhibits its activity, thereby suppressing PARP1-mediated DNA repair and ultimately promoting the apoptosis of gastric cancer cells 69 (Figure 3A).

| CircRNA-encoded proteins regulate autophagy to intervene in tumour cell survival
Autophagy is an orderly process of intracellular self-digestion, a biological function that enables cells to overcome nutrient defi- in turn leads to reduced gefitinib resistance in lung adenocarcinoma cells. 68 Duan's team found that circMAP3K4 expression was increased in hepatocellular carcinoma and that IGF2BP1 recognized the m6A site on circMAP3K4 and promoted its translation to generate circMAP3K4-455aa. Mechanistically, circMAP3K4-455aa binds to endogenous AIF in mitochondria and protects its N-terminus, thereby preventing AIF cleavage and inhibiting its nuclear distribution to prevent cisplatin-induced apoptosis of hepatocellular carcinoma cells. 70 Another study found that circMRPS35 was highly expressed after sorafenib treatment in hepatocellular carcinoma. circATG4B encodes the protein ATG4B-222aa, which acts as a decoy that competitively binds TMED10 and prevents TMED10 from binding ATG4B, thereby releasing more ATG4B, which increases autophagy and ultimately induces chemoresistance in colorectal cancer cells 72 (Figure 4D).

| PROSPECTS FOR CLINICAL TREATMENT APPLICATIONS
Based on the review of the above studies, considering that circRNA-

ACKNOWLEDGEMENTS
We would like to acknowledge the reviewers for their helpful comments on this paper.

This research was funded by the National Natural Science Foundation of China (81902515) and the Natural Science Research Project of Higher Education in Anhui Province (KJ2021A0857).

CONFLICT OF INTEREST STATEMENT
The authors confirm that there are no conflicts of interest.

DATA AVAILABILITY STATEMENT
Data sharing is not applicable to this article as no datasets were generated or analysed during the current study.