Deciphering the Functional Long Non‐Coding RNAs Derived from MicroRNA Loci

Abstract Although the majority of eukaryotic genomes are pervasively transcribed into a diverse population of lncRNAs, and various lncRNA subtypes have been discovered, a genome-wide study of miRNA-derived lncRNAs is still lacking. Here, it is reported that over 800 miRNA gene-originated lncRNAs (molncRNAs) are generated from miRNA loci. One of them, molnc-301b, derived from the miR-301b and miR-130b locus, functions as an “RNA decoy” that facilitates dissociation of the chromatin remodeling protein SMARCA5 from chromatin and thereby suppresses transcription and mRNA translation. Specifically, molnc-301b attenuates erythropoiesis by reducing the transcription of erythropoietic and translation-associated genes, such as GATA1 and FOS. In addition, a CRISPR screen platform is established to characterize the biological functions of molncRNAs at large scale and at the single-cell level, and 29 functional molncRNAs in hematopoietic cells are identified. Collectively, this work focuses on miRNA-derived lncRNAs, deciphers their landscape during normal hematopoiesis, and comprehensively evaluates their potential roles.


Introduction
In mammals, the transcriptional landscape is far more complex than originally imagined. Only a small proportion of the mammalian genome is transcribed into protein-coding mRNA, while the vast majority is pervasively transcribed into diverse and numerous long non-coding RNAs (lncRNAs). Typically, lncRNAs are defined as transcripts that are longer than 200 nucleotides (nt) and lack protein-coding potential. LncRNAs exhibit a surprisingly wide range of sizes, shapes, and functions and are derived from introns or antisense transcripts of protein-coding genes, enhancers, and other open genomic regions. Over the past decade, a wide variety of lncRNAs have been identified due to advances in transcriptome sequencing technology. For example, sno-lncRNAs are intron-derived lncRNAs with snoRNA caps at both ends. [1] When introns with snoRNA ends are processed by snoRNA machinery and the internal sequences are not degraded, sno-lncRNAs lacking 5′-caps or poly(A)-tails accumulate. [1] In contrast, introns without snoRNA ends can be trimmed into circular intronic lncRNAs, [2] which are generated by sequences that escape debranching of intron lariats. [2] Natural antisense transcripts are transcribed from the opposite strands of protein-coding or sense strands and are presumably mRNA-like lncRNAs, which can interfere with transcription or mRNA stability. [3] Additionally, the 5′-snoRNA-ended and 3′-polyadenylated lncRNAs derived from read-through transcripts sequester multiple RNA-binding proteins in human embryonic stem cells (hESCs) and regulate alternative splicing. [4] Moreover, transcripts derived from enhancers are defined as enhancer RNAs. [5,6]
Virtually all RNA species in eukaryotes are subject to complicated post-transcriptional processing. In particular, microRNA (miRNA) biogenesis occurs via multiple steps: nuclear processing of primary miRNA (pri-miRNA) by the Drosha/DGCR8 complex, nuclear export of precursor miRNA (pre-miRNA) by Exportin 5, cytoplasmic processing of pre-miRNA by Dicer, and strand selection in the Argonaute complex. [7] Studies by us and others have shown that miRNA processing is tightly regulated, [8][9][10] leading to the accumulation of certain intermediates, such as pri-miRNAs or pre-miRNAs. Like lncRNAs, the pri-miRNAs transcribed from miRNA loci are usually >200 nt. However, only the stem-loop precursors are required to generate mature miRNAs. [7] Therefore, these long transcripts might provide the raw material for the evolution of diverse lncRNAs. Indeed, several studies have revealed the existence of such functional pri-miRNAs, including H19 (processed from the host gene of miR-675), which play important roles in different biological processes. [11,12] In addition, linc-MD1 (processed from the host gene of miR-133b) has been shown to regulate muscle differentiation by acting as a ceRNA in mouse and human myoblasts, [13] and LEADeR (processed from the host gene of miR-205) orchestrates human prostate basal-luminal differentiation. [14] Moreover, Chang et al. developed an approach to annotate genomic locations of pri-miRNAs from RNA sequencing (RNA-seq) data, [15] while Dhir et al. described a novel transcription termination mechanism of lncRNA genes that host miRNAs (lnc-pri-miRNAs), [16] and Daniel found that one of the functional lnc-pri-miRNA loci, LOC646329/MIR29HG, could regulate glioblastoma multiforme cell growth independent of its cognate miRNA. [17] However, it is still unknown whether lncRNAs are derived from the vast majority of other miRNA loci, and how these miRNA gene-originated lncRNAs (molncRNAs) exert their biological functions in normal biological processes.
Here, we identified >860 molncRNAs from 303 annotated human miRNA loci and clarified a global molncRNA landscape during hematopoiesis. Among the molncRNAs identified, molnc-301b coordinated the expression of key erythroid regulators at both the transcriptional and translational levels to attenuate erythropoiesis in both human hematopoietic stem cells (HSCs) and ESCs, and this effect is independent of its two cognate miRNAs, miR-130b and miR-301b. Finally, we combined pooled CRISPR perturbation with single-cell transcriptome sequencing to screen for potentially functional molncRNAs on a genome-wide scale and identified 29 functional molncRNAs.

Identification of molncRNAs by Single-Molecule Long-Read Sequencing in Human Hematopoietic Cells
To determine the dynamic expression of molncRNAs during erythropoiesis, we employed a multi-step strategy combining PacBio Iso-seq with Illumina short-read RNA-seq and RNA stability assays of differentiating hematopoietic cells (Figure 1A).
The PacBio Iso-seq of K562 cells, which represent a robust model for studying erythroid gene expression in vitro, yielded 41,909,348 high-quality subreads, from which 1,221,669 full-length non-chimeric (FLNC) reads were identified after poly(A) tail and concatemer removal. We then identified a high-quality set of 434,485 non-redundant PacBio transcripts (see Experimental Section), including 12,005 annotated mRNAs (286,092 transcripts) and 2,194 lncRNAs (10,789 transcripts) (Figure S1A; Table S1, Supporting Information).
Subsequently, we generated the molncRNA set (860 transcripts), defined as any long PacBio isoform (>200 nt) overlapping with a pre-miRNA on the same strand, with no detectable protein-coding potential and no splice sites shared with annotated protein-coding genes (see Experimental Section). Moreover, based on the location of miRNAs in the exonic or intronic region of their host genes, molncRNAs were divided into two subtypes, exonic miRNA gene-originated lncRNA (ex-molncRNA; n = 277) and intronic miRNA gene-originated lncRNA (in-molncRNA; n = 587) (Figure 1B). Notably, because some molncRNAs (n = 4) cover different miRNAs, they may be classified as both ex- and in-molncRNAs. We subsequently compared the identified molncRNAs with annotated genes (Figure 1C; Figure S1B, Supporting Information) and classified them into three categories: i) 55 molncRNAs derived from the antisense strand of an annotated gene (e.g., molnc-3124; Figure S1C, Supporting Information, upper panel); ii) 19 molncRNAs derived from intergenic regions (e.g., molnc-3685; Figure S1C, Supporting Information, lower panel); and iii) 786 molncRNAs located in intragenic regions, which are further classified into three subcategories based on the splice sites they share with annotated genes (Figure 1C; Figure S1B, Supporting Information): full splice match (FSM), complete splice mismatch (CSM), and partial splice match (PSM). Among them, 116 FSM molncRNAs are identical to annotated lncRNAs; 145 CSM molncRNAs have splice junctions completely different from those of known genes and are unannotated lncRNAs derived from the same genic loci; the remaining 525 PSM molncRNAs partially share splice sites with known genes and might be different isoforms of known lncRNAs (Figure 1C; Figure S1B, Supporting Information).
Similar to actively transcribed mRNAs and lncRNAs, the promoter regions of molncRNAs are also enriched with active markers such as H3K4 tri-methylation (H3K4me3), H3K27 acetylation (H3K27ac) modifications, and Pol II signals (Figure 1D). The majority of molncRNAs ranged in length from 1800 to 3500 nt (25%-75%), which was similar to both the mRNAs and classic lncRNAs captured by PacBio Iso-seq (Figure 1E). Nevertheless, like classic lncRNAs, the molncRNAs were expressed at much lower levels than mRNAs (Figure 1F), especially the in-molncRNAs. Unlike protein-coding genes and classic lncRNAs, molncRNAs were generally generated as a unique transcript (Figure S1D, Supporting Information), suggesting that they are derived from relatively fixed transcription units and are less commonly processed by alternative splicing. In addition, most of the ex-molncRNAs had only one exon, whereas in-molncRNAs had a number of exons comparable to mRNAs and even greater than classic lncRNAs (Figure S1E, Supporting Information). Thus, ex-molncRNAs exist mainly as intronless genes, which have been reported to show stronger tissue specificity and are often associated with signal transduction (e.g., G protein-coupled receptors), transcriptional regulation (e.g., histones), and other biological functions. [18,19]
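The FSM/CSM/PSM subcategories described above can be illustrated with a minimal sketch. It assumes that each transcript is represented by its set of splice junctions, encoded as (donor, acceptor) coordinate pairs on one strand; the function name `classify_splice_match` and this representation are illustrative assumptions, not the pipeline used in the study.

```python
from typing import Iterable, Set, Tuple

# Hypothetical representation: a splice junction is a (donor, acceptor) coordinate pair.
Junction = Tuple[int, int]


def classify_splice_match(
    isoform: Iterable[Junction],
    annotated_transcripts: Iterable[Iterable[Junction]],
) -> str:
    """Return 'FSM', 'PSM', or 'CSM' for one isoform at one annotated locus.

    FSM: the isoform's junction set equals that of some annotated transcript.
    PSM: at least one junction is shared with the annotation, but no full match.
    CSM: no junction is shared with any annotated transcript.
    """
    iso: Set[Junction] = set(isoform)
    references = [set(t) for t in annotated_transcripts]

    # Full splice match: identical junction chain to an annotated transcript.
    if any(iso == ref for ref in references):
        return "FSM"

    # Pool every annotated junction at the locus and test for any overlap.
    known: Set[Junction] = set().union(*references)
    if iso & known:
        return "PSM"
    return "CSM"
```

Under this convention, an isoform that reuses one annotated junction but introduces a new one is reported as PSM, which corresponds to the "different isoforms of known lncRNAs" interpretation given in the text.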

Characterization of molncRNA
To systematically characterize molncRNAs, we also performed small RNA-seq. We found that half of the annotated miRNAs (n = 980) were detected in hematopoietic cells, of which 567 were located in annotated coding gene regions and 169 were located in ncRNA regions (Figure S1F, Supporting Information). The aforementioned 277 ex-molncRNA and 587 in-molncRNA transcripts were derived from 303 annotated human miRNA genes (138 exonic and 204 intronic), accounting for approximately 16% (303 of 1881) of the total miRNA genes (miRBase v21). Among these, 78% (215 of 277) of ex-molncRNA (105 miRNAs) and 78% (455 of 587) of in-molncRNA (147 miRNAs) transcripts were co-detected with their corresponding mature miRNAs. Of note, corresponding miRNAs were not detected for 62 ex-molncRNAs and 132 in-molncRNAs, further indicating the possibility that molncRNAs fulfill independent roles. In addition, we did not detect any molncRNAs from the other 1578 miRNAs, which might be due to the rapid processing of their primary transcripts to generate mature miRNAs (Figure S1F; Table S1, Supporting Information). Moreover, miRNAs co-expressed with molncRNAs showed higher expression levels than those derived from loci without detected molncRNAs (Figure S1G; Table S1, Supporting Information), which might be due to a higher transcriptional activity of both ex- and in-molncRNA loci favoring the output of both molncRNAs and miRNAs (Figure S1H, Supporting Information). In addition, we found that miRNAs tended to be enriched at the 3′ end of in-molncRNA loci and were evenly distributed in ex-molncRNA loci (Figure S1I, Supporting Information).
Interestingly, the comparison between molncRNA sequences and their cognate miRNAs revealed some new information regarding either miRNA clusters or their neighboring lncRNA genes (Figure 1G; Table S1, Supporting Information). a) The revision of the defined miRNA cluster, which means that the actual miRNA cluster is either larger or smaller than the current definition in miRBase v21. Distant miRNAs may be located on the same transcription unit. For example, miR-6724-2, a miRNA 40 kb away from the miR-3648-1-3687-1 cluster locus, also mapped to molnc-3648-1. In addition, clustered miRNAs might be derived from different pri-miRNAs, as in the miR-17-92 cluster. Four miR-17-92 cluster-derived molncRNAs were identified. Among them, molnc-17 was presumed to contain all six miRNAs and molnc-19b-1 covered only miR-19b-1 and −92a-1, while the two molnc-92a-1 isoforms were specific to miR-92a-1. It is worth noting that molnc-19b-1 and molnc-92a-1 were also supported by RNA-seq short reads (Figure 1G). b) The redefinition of an existing lncRNA, such as molnc-301b, a novel transcript that was distinct from the annotated lncRNA AP000553.1 (Figure 1C,G). c) The validation of a reported miRNA host gene. For example, the molnc-210 splice sites matched those of the MIR210HG transcript (Figure 1G).
To further develop an overview of the tissue- or lineage-specificity of molncRNAs, we compared their expression profiles with published RNA-seq datasets. Here, to avoid confusion with known protein-coding genes, molncRNAs overlapping by >500 nt were excluded, thereby yielding a dataset consisting of 239 ex-molncRNAs and 557 in-molncRNAs (Figure S1J, Supporting Information). PacBio Iso-seq was first used to construct the molncRNA reference sequences for RNA-seq read mapping (Figure S1K, Supporting Information). We then constructed a comprehensive atlas of molncRNA expression across 16 human tissues by comparison with the Human BodyMap 2.0 project. [20] Interestingly, we found that 45% of ex-molncRNAs and 30% of in-molncRNAs displayed tissue-specific expression. These included the white blood cell-specific ex-molnc-223, −128-1, and −558, and the testes-specific in-molnc-379, −432, and −15a (Figure S1L; Table S2, Supporting Information). Subsequently, the hematopoietic expression atlas was established in 13 distinct blood cell populations. [21] We found that 64% of ex-molncRNAs and 67% of in-molncRNAs showed lineage-specific accumulation, such as the top-ranked erythrocyte-specific molncRNAs ex-molnc-22, −378a and in-molnc-15a, and −210 (Figure 1H; Table S2, Supporting Information).
We also assessed the conservation of molncRNAs based on phastCons scores across 100 species by comparison with different genomic elements. The molncRNAs were less conserved than dominant mature miRNAs, pre-miRNAs, and CDSs, and showed a low level of conservation similar to that of classic lncRNAs (Figure 1I). Thus, we speculated that the primary route of functional lncRNA evolution is from junk transcripts, corresponding to the "transcription first" model. [22]

Dynamic molncRNA Expression During Human Hematopoiesis
To further investigate the potential function of molncRNAs, we induced erythroid differentiation of CD34+ hematopoietic stem/progenitor cells (HSPCs) in vitro. Illumina poly(A)+ RNA-seq and small RNA-seq were performed to profile the expression of molncRNAs and their corresponding miRNAs, respectively (Figure 2A; Figure S2A,B, Supporting Information). We identified 82 ex-molncRNAs and 75 in-molncRNAs showing dynamic expression during erythropoiesis (Figure 2B; Figure S2C, Supporting Information). In addition, analysis of our previous RNA-seq datasets of monocytic and granulocytic differentiation [9] revealed that 22 molncRNAs showed a consistent decrease (e.g., ex-molnc-3121, −4449, −4461 and in-molnc-4453-2) or increase (e.g., ex-molnc-616) in expression during erythroid, monocytic and granulocytic differentiation. Furthermore, the expression of 39 molncRNAs changed only during erythroid differentiation (e.g., decreased expression of ex-molnc-1254-1, in-molnc-1207, −3159 or increased expression of ex-molnc-421), and 30 molncRNAs displayed opposite changes between erythroid and the other two myeloid lineages (e.g., decreased expression of ex-molnc-let-7d, -let-7i, and −223 or increased expression of ex-molnc-548c and in-molnc-6739 in erythropoiesis) (Figure 2B,D; Figure S2C,D; Table S3, Supporting Information). We also compared the changes in the expression of molncRNAs with their cognate miRNAs during erythropoiesis. Pearson correlation analysis revealed that 42 pairs showed a significantly high correlation. Of these, 22 exhibited changes consistent with those of miRNAs (e.g., ex-molnc-223, −616, −22, and in-molnc-374b), and 20 showed inverse changes in expression compared with miRNAs (e.g., ex-molnc-let-7i, −4449, −378a, and in-molnc-3157) (Figure 2B; Figure S2C,E, Supporting Information).
Figure 1 (caption fragment): Iso-seq (n = 2 replicates). E) Length distribution of molncRNA, mRNA, and lncRNA based on PacBio Iso-seq. The median length is shown in brackets. F) Expression levels of molncRNA, mRNA, and lncRNA, determined by PacBio Iso-seq. p-Values were calculated by a two-sided Mann-Whitney test. ****p < 0.0001. G) Schematic illustration and Integrative Genomics Viewer (IGV) plot of three kinds of relations among annotated lncRNAs, pre-miRNAs, and molncRNAs. The reference sequence in Ensembl and miRBase, full-length molncRNAs generated by Iso-seq, and corresponding mapped short reads in poly(A)+ RNA-seq are displayed in order from top to bottom in the IGV plot. The thick lines represent exons and the thin lines represent introns. The arrows represent the transcription orientation. H) Heatmap of specific molncRNA expression across 13 distinct hematopoietic lineages (n = 2 replicates). I) Evolutionary conservation analysis of molncRNA and other annotated genomic elements based on phastCons scores.
Figure 2 (caption fragment): S3, Supporting Information) collected from the GO database using the following keywords: hematopoiesis, hematopoietic, and erythrocyte. The expression tendency of molncRNAs is shown on the left. D) Number of ex- and in-molncRNAs differentially expressed during
Since ex-molncRNAs contain miRNA sequences, they are more likely to serve as pri-miRNAs. Therefore, to specifically evaluate pri-miRNAs as a new type of lncRNA, we then focused mainly on ex-molncRNAs. To predict the involvement of ex-molncRNAs in erythropoiesis, we analyzed the correlation of their expression with critical erythroid genes defined by hematopoiesis-related terms from the Gene Ontology (GO) database [23] (Table S3, Supporting Information). We found that upregulated ex-molncRNAs correlated positively with essential erythroid activators (e.g., GATA1, KLF1, TAL1, and FOXO3) and negatively with erythroid repressors (e.g., GATA2 and SPI1), while downregulated ex-molncRNAs exhibited the opposite correlations (Figure 2C), indicating the potential roles of these ex-molncRNAs.
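The expression correlations described above (molncRNA versus cognate miRNA, or versus an erythroid gene, across differentiation time points) can be sketched as a plain Pearson coefficient. This is a minimal illustration using only the Python standard library; the function name `pearson_r` and the idea of passing one expression value per time point are assumptions, and the significance testing used in the study is not reproduced here.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equally long expression series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two series of equal length with at least 2 points")

    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)

    # Sum of co-deviations and the two sums of squared deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)

    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(ss_x * ss_y)
```

A coefficient near +1 corresponds to the "consistent" pairs and a coefficient near −1 to the "inverse" pairs reported in the text.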

Analysis of molncRNA Stability During Human Erythroid Differentiation
To further screen functional molncRNAs, we performed RNA stability profiling to evaluate their half-lives (Figure S2F, Supporting Information). Interestingly, molncRNAs had longer half-lives than annotated mRNAs and classic lncRNAs (Figure 2E; Figure S2G; Table S3, Supporting Information), which is consistent with a functional role for molncRNAs. RNAs were divided into five intervals (Q1-Q5) based on mRNA half-life quintiles ranked from the longest to the shortest (Figure S2H; Table S3, Supporting Information). Compared with annotated mRNAs and lncRNAs, molncRNAs were more commonly distributed in the relatively long half-life intervals (Q1 and Q2).
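As a rough illustration of the stability analysis above, the sketch below estimates a half-life from a decay time course and assigns it to one of the Q1-Q5 intervals defined by the mRNA half-life distribution. It assumes first-order decay, so that the half-life is ln(2)/k with k fitted by least squares on log-transformed abundances; the source does not state the fitting procedure, so this model and the function names `half_life` and `quintile_label` are assumptions.

```python
import math
from typing import Sequence


def half_life(times: Sequence[float], remaining: Sequence[float]) -> float:
    """Half-life under assumed first-order decay: fit ln(remaining) = a - k * t."""
    logs = [math.log(f) for f in remaining]
    n = len(times)
    mean_t = sum(times) / n
    mean_l = sum(logs) / n
    slope = sum((t - mean_t) * (v - mean_l) for t, v in zip(times, logs)) / sum(
        (t - mean_t) ** 2 for t in times
    )
    if slope >= 0:
        raise ValueError("abundance does not decay over the time course")
    return math.log(2) / -slope


def quintile_label(value: float, mrna_half_lives: Sequence[float]) -> str:
    """Place a half-life into Q1 (longest) to Q5 (shortest) using mRNA-derived cutoffs."""
    ranked = sorted(mrna_half_lives, reverse=True)
    n = len(ranked)
    if n < 5:
        raise ValueError("need at least 5 reference half-lives to define quintiles")
    for q in range(1, 5):
        # Shortest reference half-life that still belongs to quintile q.
        cutoff = ranked[(q * n) // 5 - 1]
        if value >= cutoff:
            return f"Q{q}"
    return "Q5"
```

Because the cutoffs come from the mRNA distribution only, a class of RNAs that is enriched in Q1 and Q2 is more stable than the typical mRNA, which is the comparison made in the text.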

Molnc-301b Regulates Erythroid Differentiation Independent of its Cognate miRNAs
To investigate the function of molnc-301b, we first used 5′ and 3′ rapid amplification of cDNA ends (RACE) to obtain the accurate transcript of molnc-301b in K562 cells (Figure S3A, Supporting Information). We obtained a 2,460 nt transcript with a poly(A) tail, which matched the sequences captured by PacBio Iso-seq almost exactly (identity: 99.53%). Thus, we concluded that the human molnc-301b locus was located on chromosome 22 and comprised two annotated miRNA genes (miRBase v21): miR-301b and miR-130b (Figure 3A). To further determine the involvement of molnc-301b in erythropoiesis and distinguish it from the mature miRNAs, we constructed a series of molnc-301b mutants comprising miR-301b-3p or miR-130b-3p seed sequence mutants (301b_m1 and 301b_m2), a miR-301b-3p deletion mutant (301b_del), and two vectors expressing only the mature miRNAs (miR-301b and miR-130b). As expected, the wild-type molnc-301b (301b_wt) generated both molnc-301b and the two mature miRNAs, while 301b_m1 and 301b_m2 expressed molnc-301b but not the mutated miRNAs, and the miR-301b and miR-130b vectors generated only the miRNAs (Figure 3B,C; Figure S3B, Supporting Information). Of note, 301b_del expressed only molnc-301b and neither of the two mature miRNAs (Figure 3C; Figure S3B, Supporting Information), which might be due to the disruption of pri-miRNA structure caused by the sequence deletion.
We transduced these constructs into HSPCs (Figure 3B) to evaluate their effects on erythroid differentiation. Flow cytometric analysis revealed that, with the exception of miR-130b, overexpression of either wild-type or mutant molnc-301bs uniformly decreased the percentage of differentiated erythroid cells (CD71high/CD235a+ and CD71low/CD235a+) and globin-positive (HbF+) cells (Figure 3D,E; Figure S3C, Supporting Information). CFU assays also showed consistent repression of CFU-Es following the introduction of all molnc-301bs except miR-130b (Figure 3F,G). Among these molnc-301b constructs, 301b_del, which generates only molnc-301b, still inhibited erythropoiesis, thus indicating its independent role in this process. In contrast, loss-of-function analysis showed that molnc-301b reduction (Figure S3D, Supporting Information) significantly increased the percentage of differentiated erythroid cells (CD71high/CD235a+ and CD71low/CD235a+) and HbF+ cells (Figure S3E,F, Supporting Information) and significantly increased the number of CFU-Es (Figure S3G,H, Supporting Information). Collectively, our findings show that molnc-301b regulates erythroid differentiation independent of its cognate mature miRNAs, suggesting that different transcripts (molnc-301b, miR-301b, and miR-130b) derived from the same genomic locus play divergent functions.
In addition to CD34+ HSPCs, we employed a hESC differentiation system to genetically confirm the impact of molnc-301b on erythropoiesis. Using a CRISPR/Cas9 genome editing strategy, we generated the miR-301b-4×poly(A) knockin H1 ESC (H1 KI), in which a 4×poly(A) sequence was knocked into the sequence
Figure 2 (caption fragment, continued): human hematopoiesis. Arrows indicate continuously up- or down-regulated molncRNAs. E) Half-life distribution of molncRNA, mRNA, and lncRNA in erythrocytes. The median half-lives are shown in brackets (n = 2 replicates). F) qPCR analysis of the relative expression levels of selected ex-molncRNAs and their cognate miRNAs during HSPC erythroid differentiation. Two primer pairs were used for each molncRNA (primers 1 and 2). Data represent the mean ± SD (n = 3 replicates). G) Decay curves of seven candidate ex-molncRNAs in K562 cells. LncRNA XIST and GATA1 mRNA were used as controls. Data represent the mean ± SD (n = 3 replicates).
Subsequently, we subjected the H1 WT and H1 KI ESCs to a multi-stage erythroid induction process. [24] Molnc-301b deficiency promoted the generation of cobblestone-like cells during hematopoietic differentiation (Figure 3H). Compared to H1 WT cells, an increase in the proportion of CD34+/CD45+ hematopoietic cells was observed in H1 KI ESCs. Moreover, CD71+/CD235+ erythroid cells were increased when the expression of molnc-301b was terminated (Figure 3I). In accordance with the flow cytometry results, we detected more CFU-Es derived from H1 KI ESCs than from H1 WT cells (Figure 3J; Figure S3L, Supporting Information). Collectively, this hESC-based genetic system further characterizes and validates the essential biological function of molnc-301b during erythropoiesis.

Molnc-301b Acts as a "Decoy" to Promote the Dissociation of the Chromatin Remodeling Protein SMARCA5 from Chromatin
To determine the molecular mechanism of molnc-301b, we first analyzed its subcellular localization and observed that molnc-301b is predominantly associated with chromatin in HSPCs and K562 cells (Figure 4A; Figure S4A, Supporting Information), implying that it acts at the chromatin interface. Subsequently, we employed a multi-omics strategy to identify its target genes. We used RNA-seq following molnc-301b overexpression to identify the RNAs regulated by molnc-301b, defined its binding proteins by RNA pull-down combined with mass spectrometry (MS), and determined its associated genomic loci via chromatin isolation by RNA purification and sequencing (ChIRP-seq) (Figure 4B).
Given that the vast majority of molnc-301b is located in the nucleus rather than the cytosol, we mainly focused on the 96 nucleus-specific proteins identified by our RNA pull-down assays coupled with MS for further study (Figure S4B; Table S4, Supporting Information). Furthermore, because molnc-301b binds mainly to chromatin, we focused on six biological processes related to chromatin function: DNA conformation change, chromatin remodeling, regulation of DNA metabolic process, histone deacetylation, positive regulation of chromosome organization, and peptidyl-lysine modification (Figure 4E; Table S4, Supporting Information). Among the molnc-301b-associated proteins, SMARCA5 is a SWI/SNF-related chromatin remodeling factor and plays indispensable roles during early hematopoiesis and erythropoiesis. Loss of SMARCA5 abrogates definitive hematopoiesis within the fetal liver and leads to anemia and embryonic lethality at day 18.5. [25] The interaction between molnc-301b and SMARCA5 was validated through RNA pull-down assays (Figure 4F). Molnc-301b was also specifically enriched in SMARCA5 immunoprecipitates in the reciprocal immunoprecipitation (Figure 4G).
Next, to elucidate the downstream effects of the association of molnc-301b with SMARCA5, we compared the DEGs induced by SMARCA5 knockdown with those regulated by molnc-301b overexpression. Two SMARCA5-specific shRNAs (shA and shB) were used to suppress its endogenous expression (Figure S5A,B; Table S4, Supporting Information). Comparative analysis indicated that 366 genes were co-regulated by molnc-301b and SMARCA5 (Figure 5A). Among them, nearly 80% (289 genes) showed consistent changes (Figure 5A,B; Table S4, Supporting Information), suggesting an antagonistic relationship between molnc-301b and SMARCA5. These findings indicate that molnc-301b acts as a "decoy" by binding to SMARCA5 and disrupting its association with chromatin in hematopoietic cells (Figure 5C).
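The comparison of the two DEG sets above amounts to intersecting the gene lists and counting how many shared genes change in the same direction. The sketch below assumes each DEG set is a mapping from gene name to log2 fold change; the function name `direction_concordance` and the placeholder gene names in the usage note are hypothetical.

```python
from typing import Dict, Set, Tuple


def direction_concordance(
    fc_overexpression: Dict[str, float],
    fc_knockdown: Dict[str, float],
) -> Tuple[Set[str], Set[str], float]:
    """Return (shared genes, same-direction genes, same-direction fraction)."""
    # Genes called differentially expressed in both experiments.
    shared = set(fc_overexpression) & set(fc_knockdown)

    # Same direction: both up or both down (product of fold changes is positive).
    same = {g for g in shared if fc_overexpression[g] * fc_knockdown[g] > 0}

    fraction = len(same) / len(shared) if shared else 0.0
    return shared, same, fraction
```

For example, calling it with `{"GENE_A": -1.2, "GENE_B": 0.8}` and `{"GENE_A": -0.9, "GENE_B": -0.4}` reports two shared genes, of which one changes consistently. With the numbers in the text, 289 consistent genes out of 366 shared genes gives a fraction of about 0.79. Consistent changes after molnc-301b overexpression and SMARCA5 knockdown are what the text interprets as an antagonistic relationship.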
To further reveal the DNA targets of molnc-301b, we performed chromatin isolation by RNA purification (ChIRP) of molnc-301b with antisense oligos tiled along the entire molnc-301b transcript sequence. Both "even" and "odd" probe sets yielded comparable enrichment of expected molnc-301b sites over GAPDH (Figure 5D). Deep sequencing analysis of the retrieved chromatin fragments revealed that molnc-301b binding sites were localized mainly in genic regions. Furthermore, molnc-301b was significantly enriched in the promoter, exon, and TTS regions relative to the whole genome, suggesting its potential roles in regulating gene transcription (Figure S5C, Supporting Information).
Figure 4 (caption): Molnc-301b interacts with SMARCA5. A) Fractions of RNAs located in the chromatin, nucleoplasm, and cytoplasm of HSPCs. GAPDH, U1, and XIST RNAs were used as positive controls for cytoplasm, nucleoplasm, and chromatin location, respectively (n = 3 replicates). B) Experimental design of the molnc-301b mechanism study. C) Scatterplots showing DEGs of 301b_wt-, 301b_m1-, 301b_m2-, and 301b_del-overexpressing HSPCs compared with the control. Significantly up- or downregulated genes were determined by p < 0.05 (n = 2 replicates). D) GO functional enrichment analysis of coordinately activated and coordinately repressed genes. E) GO functional network showing chromatin-related terms of molnc-301b pull-down nuclear proteins in MS analysis. F) Western blot validation of proteins associated with molnc-301b in K562 cells. NC refers to proteins pulled down by magnetic beads. G) SMARCA5 RIP assay of K562 cells. Western blot showing SMARCA5 immunoprecipitation (upper panel). The relative fold enrichment of molnc-301b using SMARCA5 compared with IgG was determined by qPCR analysis (lower panel). SLC25A21-AS1 and pri-124 transcripts were used as negative controls. Data represent the mean ± SD (n = 3 replicates). p-Values were calculated by unpaired t-test. ****p < 0.0001; ns, not significant.
To determine the target genes co-regulated by molnc-301b and SMARCA5, we compared their antagonistic targets with molnc-301b-occupied genes revealed by ChIRP-seq and found that 149 of 289 genes were bound by molnc-301b (Figure 5E; Table S4, Supporting Information). As SMARCA5 has been reported to promote chromatin remodeling and activate gene transcription, [26] we focused mainly on genes that were positively regulated by SMARCA5; that is, the genes that were negatively controlled by molnc-301b due to their antagonistic relationship (defined as "co-targeting genes"). GO enrichment analysis showed that the molnc-301b and SMARCA5 co-targeting genes were enriched in ribosome biogenesis, ribonucleoprotein complex assembly, and HSC differentiation (Figure 5F). The enrichment of RNA translation-related pathways attracted our attention because previous studies have shown that reduced translation of certain transcripts in HSPCs specifically impairs erythroid lineage commitment. [27,28] Supporting this view is the observation that the majority of Diamond-Blackfan anemia cases are caused by heterozygous loss-of-function mutations of ribosomal proteins, with selective perturbation of the erythroid lineage. [27] Therefore, among the co-targeting genes, we selected the key erythroid regulators, including GATA1, KLF1, PUS7, and PCBP2, and translation-associated genes, including EIF3I, EIF3B, MRPL14, and NAT10, for validation.
Peak calling showed enrichment of the high-confidence molnc-301b ChIRP-seq peaks on these target genes (Figure S5D, Supporting Information). ChIRP-qPCR confirmed that molnc-301b bound strongly to most of the target genes, such as GATA1 and NAT10 (Figure 5G). ChIP-qPCR analysis showed that SMARCA5 enrichment on these genes (except EIF3I) decreased following overexpression of either wild-type or mutant molnc-301bs (Figure 5H). As expected, the transcription of all selected target genes was repressed (Figure 5I), verifying the transcriptional repression effects of molnc-301b on these genes. Furthermore, the protein levels of most of the target genes were also reduced (Figure 5J; Figure S5E, Supporting Information). In addition, we performed an electrophoretic mobility shift assay (EMSA) to test whether the association of molnc-301b plays a role in segregating SMARCA5 from chromatin. The results showed that molnc-301b RNA could reduce the affinity of SMARCA5 for target DNA (Figure S5F, Supporting Information). Overall, these data suggest that molnc-301b, acting as a "decoy", antagonizes the function of SMARCA5 by attenuating its chromatin binding activity and thereby repressing gene transcription in hematopoietic cells.

Molnc-301b Orchestrates Protein Synthesis by Controlling Translation-Associated Genes
Intriguingly, the expression of translation-associated genes was significantly suppressed by molnc-301b in the antagonistic molnc-301b-SMARCA5 axis. Thus, we speculated that molnc-301b affects mRNA translation indirectly by regulating the expression of translation-associated genes. To test this hypothesis, we performed polysome profiling in cells overexpressing molnc-301b, which revealed a slight reduction in the polysome pools (Figure S6A, Supporting Information). In parallel, we tested the effects of molnc-301b on global RNA translation by employing L-homopropargylglycine (HPG), a methionine analog that is specifically incorporated during de novo protein synthesis, allowing the detection of nascent peptide synthesis by fluorescence (Figure 6A). Compared with the control cells, HPG incorporation into nascent proteins was moderately reduced following molnc-301b overexpression (Figure 6B,C). An OP-puro incorporation assay showed similar results (Figure S6B, Supporting Information). Additionally, the surface sensing of translation (SUnSET) assay [29] also indicated reduced protein production in molnc-301b-overexpressing cells (Figure S6C, Supporting Information). Therefore, these findings indicate that overall protein translation is influenced to some extent by molnc-301b.
To further determine the potential effects of molnc-301b on the translation of individual genes, we performed Ribo-seq [30] analysis of K562 cells overexpressing wild-type or mutant molnc-301bs. The quality control results of Ribo-seq (Figure S6E-G, Supporting Information; see Experimental Section) were consistent with previous studies. [31] Calculation of the translation efficiency (TE) as the ratio of ribosome-protected fragments (RPFs) to mRNA abundance showed that TE was decreased for most mRNAs in molnc-301b-overexpressing cells (Figure 6D; Table S5, Supporting Information), indicating the translational suppression activity of molnc-301b. Among the genes exhibiting downregulated TE, GATA1, FOS, GYPC, and PRDX2 are essential hematopoietic regulators (Figure 6E). We verified the molnc-301b-mediated translation inhibition of GATA1 and FOS mRNAs via polysome profiling coupled with qPCR analysis (Figure 6F; Figure S6H,I, Supporting Information). Moreover, the translational repression activity of molnc-301b ultimately resulted in decreased levels of GATA1 and FOS proteins (Figure 6G; Figure S6J-L, Supporting Information). Indeed, it has been observed that GATA1 exhibits a shorter and more unstructured 5′ UTR than other transcripts, which might explain its translational sensitivity to reduced ribosome levels. [27]
Furthermore, we performed "rescue" assays by overexpressing GATA1 or FOS in molnc-301b-overexpressing cells (Figure S7A,B, Supporting Information). Flow cytometric analysis revealed that overexpression of either wild-type or mutant molnc-301bs decreased the percentage of differentiated erythroid cells (CD71high/CD235a+ and CD71low/CD235a+) and HbF+ cells. Subsequently, the re-introduction of GATA1 or FOS restored the […]
Figure 5. (continued) […] validation of four erythroid-associated and four translation-associated gene loci. I) qPCR validation of four erythroid-associated and four translation-associated genes in HSPCs. J) Western blot validation of four erythroid-associated and four translation-associated genes. For Figure 5G-I […]
To systematically identify functional molncRNAs in hematopoietic cells, we combined pooled CRISPR-Cas9-based screening with scRNA-seq. In addition, we performed targeted amplification to more efficiently recover the guide RNAs (gRNAs) present in each cell for better characterization of the genotype-to-phenotype relationships in single cells (Figure 7A). [32,33] Here, we performed loss-of-function screening of 361 miRNA loci (Table S6, Supporting Information). First, to assess the function of molncRNAs, we introduced 12 internal control (IC) genes, consisting of six negative regulators (IC-: GATA2, SPI1, ID2, SATB1, MAFB, and miR-221/222) and six positive regulators (IC+: HMGB2, GATA1, NFE2, KLF1, STAT5A, and MYB) of erythroid differentiation. To distinguish the functions of molncRNAs and miRNAs, we designed two pooled gRNA libraries. Library 1 (molncRNA (-) and IC) contained more than two pairs of gRNAs for each of the 12 ICs and the 361 miRNA loci; the miRNA loci were targeted by D1 and D2 gRNAs recognizing the 1-3 kb or 0.5-1 kb downstream region of the miRNA precursors, to specifically perturb molncRNA expression (Figure S8A; Table S6, Supporting Information). Library 2 (molncRNA (-/-) and IC) contained gRNAs for the same target genes as Library 1 but recognizing the 1-2 kb upstream or 1-3 kb downstream region of the miRNA precursors (U1 and D1 gRNAs), to knock out both the molncRNAs and their cognate miRNAs (Figure S8A; Table S6, Supporting Information). Cells without gRNA were used as the non-targeting control (NTC).
The subsequent analysis was based on 56509 high-quality single-cell transcriptome profiles with unique gRNA assignments and an average of ≈100 cells per targeted gene (Figure S8C-E; Table S6, Supporting Information). Of note, most gRNA-targeted cells (79%-84%) were assigned to one gRNA, except for a small fraction of cell doublets that matched more than one gRNA (Figure S8D, Supporting Information). After scRNA-seq quality control (Figure S8F, Supporting Information), 3848 NTC cells and 8729 cells containing IC gRNAs were used as references.
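As an illustration only, the assignment of cells to gRNAs described above can be sketched as follows. The function name `assign_grna` and the thresholds `min_umis` and `min_fraction` are hypothetical; the source does not report the exact assignment criteria.

```python
def assign_grna(umi_counts, min_umis=3, min_fraction=0.8):
    """Assign one cell to a gRNA from a {gRNA name: UMI count} dict.

    Both thresholds are illustrative assumptions, not values from the paper.
    Returns a (label, gRNA) tuple.
    """
    total = sum(umi_counts.values())
    if total < min_umis:
        # Too few gRNA reads: treat the cell as carrying no gRNA.
        return ("no_grna", None)
    top_grna, top_count = max(umi_counts.items(), key=lambda kv: kv[1])
    if top_count / total >= min_fraction:
        # One gRNA dominates: unique assignment.
        return ("unique", top_grna)
    # Several gRNAs at comparable levels: likely a doublet.
    return ("multiple", None)


cells = {
    "cell_1": {"molnc-301b_D1": 10, "GATA1_g1": 1},
    "cell_2": {"molnc-301b_D1": 5, "GATA1_g1": 5},
    "cell_3": {},
}
calls = {cell: assign_grna(counts) for cell, counts in cells.items()}
```

Only cells labelled `unique` would be carried forward in such a scheme, which matches the use of profiles with unique gRNA assignments in the text.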
We also compared the transcriptomes of cells carrying different gRNAs targeting the same internal control genes. The multiple gRNA groups targeting the same gene tended to cluster together, confirming that independent gRNAs targeting the same gene had similar phenotypic consequences (Figure S8G, Supporting Information). In the molncRNA (-) and molncRNA (-/-) libraries, ≈44-46% of the gRNAs targeting the same molncRNA clustered closely with each other. To reliably detect transcriptome perturbation after IC gene knockdown, we selected for further analysis those cells in which the targeted IC was successfully repressed (Figure S8H, Supporting Information).

Identification of Functional molncRNAs
Next, we performed uniform manifold approximation and projection (UMAP) of gRNAs. In the molncRNA (-/-) and molncRNA (-) libraries, ten distinct clusters were identified (Figure S8I, Supporting Information, left panel). Nevertheless, cell clusters did not match gRNA groups (Figure S8I, Supporting Information, right panel). GO functional enrichment analysis showed that the top 50 genes in the first and second principal components (PC1 and PC2) were involved in cell cycle-associated pathways, reflecting differences in cell states (Figure S8J,K, Supporting Information). Therefore, the UMAP analysis primarily reflects differences between cell states rather than molecular phenotypes caused by gRNAs. Then, to examine the transcriptome changes caused by each gRNA, we performed differential expression analysis between molncRNA- or IC-targeted cells and the NTC cells. We found that the knockdown of IC genes did cause expression changes of hematopoiesis-related genes (Figure S8L, Supporting Information). Subsequently, we merged the differentially expressed genes (DEGs) in the IC+ and IC- groups separately and removed the DEGs shared by both groups, which might reflect the phenotype triggered by gRNA infection. The filtered DEGs could separate cells with IC+ or IC- gRNAs (Figure S8M, Supporting Information).
Next, we determined the functional molncRNAs by comparing the similarities in transcriptome changes between molncRNA- and IC-targeted cells (Figure 7C). Positive molncRNAs (Figure 7C) were defined as those whose transcriptome changes were more similar to those of positive erythroid regulators (IC+), while negative molncRNAs (Figure 7C) were those whose transcriptome changes were more similar to those of negative regulators (IC-; see Experimental Section). Based on this, we identified 124 functional molncRNA gRNAs (112 negative and 12 positive gRNAs) in erythropoiesis. Among them, 46 molncRNA gRNAs target high-confidence miRNA loci (see Experimental Section). To distinguish the functions of lncRNAs and miRNAs, molncRNAs were divided into three categories by comparing the phenotypes of libraries 1 and 2: i) same, lncRNA and miRNA might have similar phenotypes; ii) different, lncRNA and miRNA have opposite phenotypes; iii) molncRNA (-) only, lncRNA has functions independent of its cognate miRNA (Figure 7D). Finally, we screened out 29 functional molncRNAs (in the "same", "different", and "molncRNA (-) only" groups). Accordingly, cells expressing gRNAs targeting negative molncRNAs showed increased expression of erythroid positive regulators (e.g., MED1 and PRMT1) but decreased expression of negative regulators (e.g., GATA2 and ID2) (Figure 7E, upper panel). In contrast, in most cases, the opposite expression pattern of erythroid regulators was observed in cells expressing gRNAs targeting positive molncRNAs (Figure 7E, lower panel).
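The comparison of the two libraries can be expressed as a small decision rule. The sketch below is an assumption-laden illustration: the function name `categorize` and the encoding of each library's call as `"positive"`, `"negative"`, or `None` are hypothetical, and the fourth label ("molncRNA (-/-) only") follows the categories listed in the Experimental Section.

```python
def categorize(lib1_call, lib2_call):
    """Compare the call for one locus in library 1 (molncRNA (-)) and
    library 2 (molncRNA (-/-)).

    Each call is "positive", "negative", or None (no functional call).
    The encoding is a hypothetical convention for this sketch.
    """
    if lib1_call and lib2_call:
        # Called in both libraries: same or opposite direction.
        return "same" if lib1_call == lib2_call else "different"
    if lib1_call:
        return "molncRNA (-) only"
    if lib2_call:
        return "molncRNA (-/-) only"
    return "no call"


examples = [
    categorize("negative", "negative"),
    categorize("positive", "negative"),
    categorize("negative", None),
]
```

In this sketch, only the first three labels would count toward the functional molncRNAs reported in the text, since a call seen only in library 2 is attributed to the miRNA.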
Among the 29 functional molncRNAs, 21 were only detected in the molncRNA (-) library ("molncRNA (-) only" group), such as molnc-4454, which affects the expression of genes related to myeloid cell differentiation and lymphocyte proliferation (Figure 7F); 5 ("same" group) have functions similar to those of their cognate miRNAs, such as molnc-301b, supporting the accuracy of the screening results. Moreover, CROP-seq also showed that molnc-301b knockdown could down-regulate the expression of hematopoietic differentiation- or translation-related genes (Figure 7F). Additionally, 3 molncRNAs ("different" group) were shown to have functions opposite to those of their cognate miRNAs, such as molnc-15a/DLEU2. Molnc-15a/DLEU2 is predicted to promote erythroid differentiation and affect the expression of myeloid cell differentiation- and cell cycle-related genes (Figure 7F). Meanwhile, miR-15a knockdown upregulated the expression of cell cycle-related genes and inhibited erythroid differentiation, which is consistent with previous studies [35] (Figure 7F). Notably, 17 functional molncRNAs were also identified by PacBio Iso-seq in Figure 1. The negative regulators molnc-1206 and molnc-3648-1 were downregulated at days 14 and 18 during erythroid differentiation, providing further evidence of their roles as functional lncRNAs.

Conclusion
More than 70% of miRNA genes in mammals are located in introns. [36] When miRNAs are derived from the introns of noncoding transcripts, host genes may be considered as only pri-miRNAs, non-functional by-products of miRNA processing, or lncRNAs with independent functions. [14] This led us to speculate that, apart from participating in miRNA processing, pri-miRNAs have other mechanisms of action that remain to be discovered. Here, we conducted systematic screening and identification of miRNA gene-originated lncRNAs expressed in hematopoietic cells based on single-molecule long-read sequencing technology.
Using PacBio Iso-seq combined with RNA-seq, we found that molncRNAs had long half-lives and showed dynamic expression during hematopoietic differentiation. Among them, molnc-301b functions as a "decoy" of SMARCA5 to suppress the expression of erythropoiesis- and translation-associated genes at the transcriptional level. In addition, we showed that molnc-301b also impeded the translation of several erythroid gene-derived RNAs and thereby attenuated erythropoiesis at the post-transcriptional level. Finally, we developed a CRISPR-based platform for molncRNA screening at single-cell resolution and identified 29 functional molncRNAs in hematopoietic cells. Such a large-scale screening system is also applicable to identifying context-specific and functionally essential molncRNAs in other tissues and disease models.
Based on whether miRNAs were located in the exonic or intronic region of their host genes, molncRNAs were divided into exonic miRNA gene-originated lncRNAs (ex-molncRNAs) and intronic miRNA gene-originated lncRNAs (in-molncRNAs). Accordingly, examples of both exonic and intronic miRNA gene-originated lncRNAs have been reported in previous studies. Among them, several ex-molncRNAs had been functionally investigated before they were identified as pri-miRNAs (Figure 7G), such as lncRNA-H19 from the miR-675 locus. H19 maintains HSC quiescence in the adult bone marrow by restricting IGF2-IGF1R signaling in a miRNA-dependent manner. [11] Nevertheless, during embryonic hematopoiesis, H19 promotes pre-HSC and HSC specification by regulating the demethylation of hematopoietic transcription factors (e.g., Runx1 and Spi1) in a miR-675-independent manner. [12] These findings suggest the functional diversity of H19 in regulating embryonic emergence versus adult homeostasis of HSCs. Furthermore, whether the dominant function in these different phenotypes is mediated by lncRNAs or miRNAs requires further investigation. Another lncRNA, the muscle-specific linc-MD1, is transcribed from a genomic locus containing miR-206 and miR-133b. Linc-MD1 "sponges" miR-133 and miR-135 and abolishes miRNA repression of MAML1 and MEF2C. [13] Additionally, in-molncRNAs are also biologically relevant in different contexts (Figure 7G). For instance, miR-100, miR-let7a-2, and miR-125b1 are embedded in the intron of the MIR100 host gene (MIR100HG), and the MIR100HG-encoded lncRNA regulates the cell cycle by interacting with HuR to influence its association with target mRNAs. [37] Similarly, pre-miR-205 is located in the last intron/exon junction of MIR205HG. The processed transcript serves as a lncRNA that operates autonomously from miR-205 in prostate basal cell differentiation and has been reannotated as long epithelial Alu-interacting differentiation-related RNA (LEADeR). [14] MiR-31 is harbored in the first intron of MIR31HG, which was identified as a HIF-1 co-activating lncRNA and renamed LncHIFCAR. LncHIFCAR enhances HIF-1 complex binding to the target loci and facilitates the recruitment of p300, driving oral cancer progression. [38]
It became evident that the repertoire of genome-encoded RNAs is far more extensive and complex than previously thought. Daniel He et al. also found four miRNA gene-originated lncRNAs that regulate cell proliferation even when the miRNA production machinery is knocked down. [17] The finding of molncRNAs gives insight into bifunctional loci, one locus with two roles. Likewise, different functions might be assigned to molncRNAs and the derived miRNAs in different biological processes. For example, our CRISPR-based screen revealed that molnc-301b and miR-301b act synergistically to inhibit erythroid differentiation, whereas molnc-15a and miR-15a exhibit antagonistic functions in erythropoiesis. Thus, the pervasive transcription of complex genomes provides ample material for the functional innovation of diverse lncRNAs.
Figure 7. (continued) […] MolncRNA (-) containing more than two pairs of gRNAs for each of the 361 miRNA loci, targeted by D1 and D2 gRNAs recognizing the 1-3 kb or 0.5-1 kb downstream region of the miRNA precursor, to specifically perturb molncRNA expression. MolncRNA (-/-) containing gRNAs for the same 361 target genes, but recognizing the 1-2 kb upstream or 1-3 kb downstream region of the miRNA precursors (U1 and D1 gRNAs), to knock out both molncRNAs and the cognate miRNAs. IC, internal control gRNA. C) Pairwise similarity of transcriptome changes between molncRNA and IC+/IC- genes. The similarity of transcriptome changes was calculated using the formula in the lower panel (the method follows Tian's work [77]). D) Classification of functional molncRNA gRNAs by comparing library 1 and library 2. E) Plot displaying the differential expression of erythroid-associated genes in cells with the indicated functional molncRNA gRNAs (upper panel, negative molncRNA regulators; lower panel, positive molncRNA regulators) compared to NTC cells. Circle size represents the proportion of cells expressing the indicated erythroid differentiation genes. F) Volcano plot showing the DEGs caused by molncRNA or miRNA knockdown. DEGs are shown in red (p-value < 0.05) or in other colors for genes belonging to the specific GO functional terms labeled in the diagram. G) i-iii) Models of ex-molncRNA function. i) Linc-MD1 binds miR-133 and miR-135 to act as a competing endogenous RNA (ceRNA) that abolishes miRNA repressing activity on MAML1 and MEF2C and controls muscle differentiation. ii) H19 binds to SAHH and inhibits its activity, thus mediating the demethylation of hematopoietic transcription factors (Runx1 and Spi1) in mouse embryonic hematopoiesis. iii) Molnc-301b acts as an "RNA decoy" molecule by binding to SMARCA5 to perturb its interaction with chromatin. iv-vi) Models of in-molncRNA function. iv) MIR100HG facilitates the interaction between HuR and its target mRNAs to regulate the cell cycle. v) LEADeR binds to promoters with an Alu element in the proximity of the interferon regulatory factor (IRF) binding site and negatively regulates prostate basal luminal differentiation. vi) LncHIFCAR/MIR31HG enhances the recruitment of HIF-1 and p300 to HIF-1 target promoters and activates the HIF-1 transcriptional network, with the ultimate effect of regulating cancer development.
In summary, taking advantage of state-of-the-art technologies, including PacBio Iso-seq, CRISPR screening, and scRNA-seq, we have characterized over 860 miRNA gene-originated lncRNAs, which expands our understanding of the transcripts overlapping miRNA loci, previously reported as lnc-pri-miRNAs or miRNA host gene-encoded transcripts. In addition, we clarified their essential biological functions during hematopoiesis and uncovered the underlying mechanisms by which molncRNAs steer target gene expression at both the transcriptional and post-transcriptional levels. Our study suggests a previously unappreciated potential regulatory role of miRNA host genes, the functions of which can be mediated by either miRNAs or the derived molncRNAs.
hESC Cells: The H1 hESC line (NIH code WA01), obtained from the WiCell Research Institute, was maintained on Matrigel (Corning, New York, USA)-coated dishes (6-well plate) in mTeSR1 medium (Stem Cell Technologies). The medium was replaced with fresh medium every day.
CRISPR/Cas9 Mediated Knockin in hESC: For knockin, the targeting vector contained two homology arms (≈1500 bp in length) and an expression cassette of the puromycin-resistance gene flanked by two loxP sites, followed by the 3×SV40 poly(A) signal sequence and a BGH poly(A) signal (a 4×poly(A) stop signal in total). Notably, to exclude confounding effects from mature miR-301b, a miR-301b precursor sequence was also incorporated upstream of the 4×poly(A) to maintain miR-301b expression upon molnc-301b termination. The targeting vectors were co-electroporated into H1 hESCs with plasmids expressing Cas9 and two gRNAs. After 48 h of puromycin selection, hESC colonies were picked, expanded, and analyzed to identify insertions. The gRNAs and primers are listed in Table S7 (Supporting Information).
Cell Lines and Culture Conditions: The human erythroleukemia cell line K562 was maintained in RPMI 1640 medium supplemented with 10% FBS (Gibco, Carlsbad, CA, USA). 293T cells were obtained from the cell resource center of the Institutes of Basic Medical Sciences, Chinese Academy of Medical Sciences, and grown in DMEM with 10% FBS.
RNA Extraction and Quantitative Real-Time PCR (qRT-PCR): Total RNA was extracted using TRIzol (Invitrogen, Carlsbad, CA, USA), and cDNA was synthesized using M-MLV reverse transcriptase (Promega, Madison, WI, USA) from 1-4 μg of total RNA. qRT-PCR was carried out on the Bio-Rad CFX-96 (Bio-Rad, Foster City, CA, USA) using the SYBR Premix Ex Taq kit (Takara, Dalian, China). Each assay was performed in triplicate. Primer sequences used for qRT-PCR are shown in Table S7 (Supporting Information).
Flow Cytometry: CD34+ HSPCs were harvested at the indicated times and washed twice at 4 °C in PBS. Approximately 1 × 10⁵ cells were washed and resuspended in PBS and stained with PE-conjugated anti-CD235a and APC-conjugated anti-CD71 (eBioscience, CA, USA) at 4 °C for 30 min. After the incubation, cells were washed with PBS, resuspended in 4% PFA (Solarbio, Beijing, China), and subjected to flow cytometric analysis. For HbF analysis, HSPCs were washed with PBS and resuspended in 4% PFA at room temperature (RT) for 10 min. Cells were then washed with PBS and resuspended in PBS/0.1% Triton X-100 at RT for 10 min. Next, cells were washed and resuspended in PBS and stained with PE-conjugated anti-HbF at 4 °C for 1 h. Flow cytometry was carried out on a C6 Flow Cytometer Instrument (BD Biosciences, San Jose, CA, USA). hESCs were collected on the fourth day of suspension culture and stained with PE-conjugated anti-CD34 and PE-conjugated anti-CD45, respectively. On the fourth day of erythroid differentiation, hESCs were collected and stained with PE-conjugated anti-CD235a and APC-conjugated anti-CD71 at 4 °C for 30 min.
RNA Stability Assay and Sequencing for molncRNA Lifetime: K562 cells were treated with 1 μg mL⁻¹ actinomycin D and collected at the indicated time points. Total RNA was extracted using TRIzol (Invitrogen). For RNA sequencing, an equal amount of ERCC RNA spike-in control (Thermo Scientific, Waltham, MA) was added to the total RNA samples as an internal control before library construction. Sequencing libraries were prepared using the NEBNext Ultra Directional RNA Library Prep Kit. RNA stability data were generated from two biological replicates.
RACE Analysis: To isolate the full-length molnc-301b, 5′ and 3′ RACE reactions were performed on poly(A)+ RNA of K562 cells using the SMARTer RACE 5′/3′ Kit (TaKaRa, Dalian, China) according to the manufacturer's protocol. Primers used for the RACE experiment are listed in Table S7 (Supporting Information).
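A half-life can be estimated from such an actinomycin D time course under the assumption of first-order decay. The sketch below is not the authors' pipeline, which the source does not describe: the function name `half_life` and the example values are hypothetical, and the abundances are assumed to be already normalized to the ERCC spike-in.

```python
import math


def half_life(times_h, abundances):
    """Estimate RNA half-life (hours) by a least-squares fit of
    ln(abundance) against time, assuming first-order decay.

    Abundances are assumed to be spike-in-normalized and positive.
    """
    logs = [math.log(a) for a in abundances]
    n = len(times_h)
    mean_t = sum(times_h) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times_h)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_h, logs))
    slope = sxy / sxx  # equals minus the decay constant
    if slope >= 0:
        # No measurable decay over the time course.
        return math.inf
    return math.log(2) / -slope


# Hypothetical time course: abundance halves every 2 h.
t_half = half_life([0, 2, 4, 8], [1.0, 0.5, 0.25, 0.0625])
```

A log-linear fit is the simplest choice; transcripts with biphasic decay would need a different model.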
Lentivirus Production and Cell Infection: Recombinant lentiviruses for molnc-301b and miR-301b/130b overexpression were produced using a pWPXL vector. For lentivirus production, lentiviral vectors were cotransfected into 293T cells with the packaging vectors psPAX2 (#12260, Addgene) and pMD2.G (#12259, Addgene) using Lipofectamine LTX (Invitrogen, USA). Infectious lentivirus particles were harvested 48 h after transfection and filtered through 0.45 μm PVDF filters. The harvested viral particles were added to the HSPCs together with 8 μg mL⁻¹ polybrene for 12 h. The medium was then replaced with fresh complete medium, and the cells were subjected to the subsequent experiments. Two unique shRNA constructs targeting SMARCA5 in a lentiviral GFP vector (TL309248, ORIGENE) were used to package lentivirus and infect K562 cells. After 48 h of infection, the cells were selected with 1 μg mL⁻¹ puromycin until the end of the culture. For RNA-seq, differentiated HSPCs at days 11, 14, and 18 and infected K562 cells were collected, and poly(A)+ RNA was enriched and sequenced at Novogene (Tianjin, China).
Subcellular Fractionation: Cells (6 × 10⁶) were washed in PBS and suspended in 400 μL Solution A (10 mM HEPES, 10 mM KCl, 1.5 mM MgCl₂, 0.34 M sucrose, 10% glycerol, 1 mM DTT, 1× protease inhibitor cocktail (Roche Life Science, Indianapolis, USA), 0.1% Triton X-100). The suspension was mixed gently and incubated on ice for 5 min. The cytoplasmic and nuclear fractions were separated by centrifugation at 1300 g for 4 min. The isolated nuclei were washed in 1 mL Solution A and incubated with 400 μL Solution B (3 mM EDTA, 0.2 mM EGTA, 1 mM DTT, protease inhibitor cocktail) on ice for 30 min. The nucleoplasm and chromatin fractions were separated by centrifugation at 1700 g for 4 min.
RNA Pull-Down: molnc-301b RNA was transcribed in vitro using the RiboMAX Large Scale RNA Production System-T7 (Promega). A single biotinylated nucleotide was added to the 3′ terminus of the RNA strand with the Pierce RNA 3′ End Desthiobiotinylation Kit (Thermo Scientific). The RNA pull-down assay was performed with the Pierce Magnetic RNA-Protein Pull-Down Kit (Thermo Scientific) according to the manufacturer's instructions. In brief, 50 pmol of biotinylated RNA was mixed with 50 μL washed streptavidin agarose beads (Invitrogen) and incubated at room temperature with agitation for 30 min. Then, 2 × 10⁷ K562 cells were lysed in standard lysis buffer (25 mM Tris-HCl pH 7.4, 150 mM NaCl, 1% NP-40, 1 mM EDTA, 5% glycerol) supplemented with protease inhibitor cocktail and 1 U μL⁻¹ RNase inhibitor on ice for 30 min, followed by centrifugation. The RNA-bound beads were then added to the cell lysate and rotated at 4 °C for 1 h. Beads were washed three times with 20 mM Tris-Cl and boiled in 1× SDS loading buffer. The proteins were detected by Western blot or separated by gradient gel electrophoresis followed by MS identification.
Western Blotting and Silver Staining: Protein samples were separated on 10% SDS-PAGE gels and then transferred to PVDF membranes. Membranes were blocked with 5% skimmed milk in TBS-T and incubated with primary antibodies at 4 °C overnight. Membranes were washed and incubated with HRP-conjugated secondary antibodies for 1 h at room temperature. Finally, membranes were washed and visualized with ECL substrate (Millipore).
Chromatin Isolation by RNA Purification (ChIRP): The ChIRP procedure was performed as described previously with the following modifications. [39] DNA probes were 22 nt and biotinylated (Tianyi Biotech). molnc-301b ChIRP probes are listed in Table S7 (Supporting Information).
About 2 × 10⁷ K562 cells were harvested and washed twice with ice-cold PBS. Cells were then cross-linked with 1% glutaraldehyde at room temperature for 10 min, and the cross-linking reaction was quenched with 1/10th volume of 1.25 M glycine at room temperature for 5 min. The cells were then washed twice with ice-cold PBS, snap-frozen in liquid nitrogen, and stored at −80 °C.
Cross-linked cell pellets were resuspended in 1 mL nuclei lysis buffer (50 mM Tris-Cl pH 7.0, 10 mM EDTA, 1% SDS) with the addition of a protease inhibitor cocktail, PMSF, and RNase inhibitor. Chromatin fractions were sonicated to an average DNA size of 100-500 bp and spun at 4 °C at 16 100 g for 10 min. The cleared chromatin supernatant was saved for probe hybridization. For a typical ChIRP sample using 1 mL of lysate, 10 μL was removed for RNA INPUT and 10 μL for DNA INPUT; these aliquots were placed in Eppendorf tubes and kept on ice until further use. Next, 2 mL Hybridization Buffer (750 mM NaCl, 1% SDS, 50 mM Tris-Cl pH 7.0, 1 mM EDTA, 15% formamide, protease inhibitor cocktail, PMSF, RNase inhibitor) and 100 μM probes were added to 1 mL of lysate, mixed well, and incubated at 37 °C with shaking overnight. Then, 100 μL of T-1 magnetic beads were added and incubated at 37 °C for 30 min with shaking. The beads were then washed 5 times at 37 °C with 1 mL wash buffer (2× SSC, 0.5% SDS, PMSF), with 5 min of end-to-end rotation per wash. At the last wash, the beads were resuspended well; 100 μL was set aside for RNA isolation and 900 μL was reserved for the DNA fraction.
For DNA isolation, the beads, as well as the DNA INPUT, were further washed twice with DNA elution buffer (50 mM NaHCO₃, 1% SDS, 100 μg mL⁻¹ RNase A, and 100 U mL⁻¹ RNase H) at 37 °C for 30 min. The cross-linking was then reversed in the presence of 1 μg μL⁻¹ proteinase K at 50 °C for 45 min. DNA was purified by phenol:chloroform:isoamyl alcohol extraction. Libraries were produced using the NEBNext Ultra RNA Library Prep Kit for Illumina (New England BioLabs).
Chromatin Immunoprecipitation and qPCR (ChIP-qPCR): K562 cells (1 × 10⁷ per sample) were cross-linked with 1% formaldehyde for 15 min. Cross-linking was neutralized with 0.125 M glycine, and cells were rinsed twice in PBS. Chromatin was then sonicated using a Diagenode Bioruptor (Diagenode, Seraing, Belgium) for 30 min with 30 s pulse/pause cycles in polycarbonate tubes on ice to break the chromatin into 200-500 bp fragments. Unbroken debris was spun down, and the chromatin was split into two equal portions. One portion was incubated with the control IgG antibody (Millipore, Darmstadt, Germany), and the other with the SMARCA5 antibody (Abcam, ab3749). Salmon sperm-coated protein A/G beads (Millipore) were added in equal volume to the two portions of the chromatin. The mixture of chromatin, antibody, and protein A/G beads was then incubated overnight at 4 °C. After washing four times, immunoprecipitated DNA was eluted from the beads and purified for the subsequent qPCR test. All the ChIP-qPCR primers used in this study are listed in Table S7 (Supporting Information).
DNA EMSA: The biotin-labeled target DNA probes, as well as the corresponding cold probes, were synthesized by Tianyibiotech (Beijing, China). A total of 20 fmol of biotin-labeled DNA probes were incubated with 1 μg SMARCA5 recombinant protein purchased from Origene (Origene Technologies, Rockville, MD, USA) using the LightShift Chemiluminescent EMSA Kit (Pierce, IL, USA) according to the manufacturer's protocol.
Competition experiments were performed with a 200-fold molar excess of the unlabeled probes (cold probes) or with preincubation with 7 or 14 μg of molnc-301b RNA. The reactions were incubated at room temperature for 20 min before DNA loading dye was added, and were separated by native 8% PAGE. The probes used for the DNA EMSA experiment are listed in Table S7 (Supporting Information).
SUnSET: For SUnSET assays, [29] K562 cells treated with ctrl or molnc-301bs were seeded at 4 × 10⁵ cells mL⁻¹ in 6-well plates. The control group was treated with 100 μg mL⁻¹ of cycloheximide (CHX, Sigma) and puromycin. Puromycin pulses were performed by incubating the cells with 15 μL of 10 μg mL⁻¹ puromycin for 15 min at 37 °C. Cells were then washed with cold PBS and lysed in RIPA buffer supplemented with 1 mM PMSF and protease inhibitor mixture. Then, 5-10 μg of the whole cell lysate was assayed by Western blot analysis using the anti-puromycin antibody.
Measurement of Protein Synthesis: HPG/OPP incorporation assays were performed to detect nascent protein synthesis using Click-iT HPG Alexa Fluor Protein Synthesis Assay Kits and Click-iT Plus OPP Protein Synthesis Assay Kits (Life Technologies). HPG or OP-Puro was added to the culture medium for 1 h, then cells were removed from the wells and washed twice with PBS. Cells were fixed in 0.5 mL of 1% paraformaldehyde in PBS for 15 min on ice. Cells were washed in PBS, then permeabilized in 200 μL PBS supplemented with 0.5% Triton X-100 (Sigma) for 20 min at room temperature. The azide-alkyne cycloaddition was performed using the Click-iT Cell Reaction Buffer Kit and azide conjugated to Alexa Fluor 488 (Life Technologies) at 5 μM final concentration. After the 30-min reaction, the cells were washed twice in PBS supplemented with 3% fetal bovine serum and 0.1% saponin, then resuspended in PBS supplemented with 4′,6-diamidino-2-phenylindole (DAPI; 4 μg mL⁻¹ final concentration) and analyzed by flow cytometry.
Ribosome Profiling: The ribosome profiling procedure was performed as described previously with the following modifications (n = 2 biological replicates). [40] Briefly, 500 μL lysate was prepared as described under polysome profiling. Of this, 100 μL lysate was used to prepare the total RNA library, while ribosome footprinting and subsequent library preparation of ribosome-protected RNA fragments (RPFs) were performed with the remaining 400 μL lysate. For footprinting, 10 U RNase I was added to the 400 μL lysate and incubated for 45 min at room temperature with gentle mixing. The nuclease digestion was then stopped by adding 1 μL RNase inhibitor. RPFs were purified with MicroSpin S-400 columns (GE Healthcare Life Sciences), followed by size selection on a 15% TBE-urea gel. rRNAs were depleted with the NEBNext rRNA depletion kit (New England BioLabs). Following end repair and 3′ adaptor ligation, RNAs were reverse transcribed using SuperScript III (Thermo Fisher, USA).
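The translation efficiency used in the Results is defined as the ratio of RPFs to mRNA abundance. A minimal sketch of that ratio and of its change between conditions is given below; the pseudocount, the function names, and the example values are assumptions made for illustration and are not taken from the source.

```python
import math


def translation_efficiency(rpf, mrna, pseudocount=1.0):
    """TE = RPF abundance / mRNA abundance.

    The pseudocount (an assumption of this sketch) avoids division by zero
    for lowly expressed genes.
    """
    return (rpf + pseudocount) / (mrna + pseudocount)


def log2_te_change(ctrl, treated, pseudocount=1.0):
    """log2 ratio of TE in treated versus control cells for one gene."""
    te_ctrl = translation_efficiency(ctrl["rpf"], ctrl["mrna"], pseudocount)
    te_trt = translation_efficiency(treated["rpf"], treated["mrna"], pseudocount)
    return math.log2(te_trt / te_ctrl)


# Hypothetical normalized abundances for a single gene.
ctrl = {"rpf": 199.0, "mrna": 99.0}  # TE = 200 / 100 = 2
oe = {"rpf": 99.0, "mrna": 99.0}     # TE = 100 / 100 = 1
delta = log2_te_change(ctrl, oe)     # negative value: TE is reduced
```

Because the mRNA term is in the denominator, a gene whose footprints fall only in proportion to its mRNA shows no TE change, which separates translational from transcriptional effects.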
CRISPR Screen: The single gRNAs were designed according to the pre-miRNA sequences using the publicly available online tool (http://crispr.mit.edu/). Upstream 1 (U1) gRNAs were designed within the range of 1-2 kb upstream of the target miRNA precursors; Downstream 1 (D1) gRNAs were designed within the range of 1-3 kb downstream of the target miRNA precursors; Downstream 2 (D2) gRNAs were designed within the range of 0.5-1 kb downstream of the target miRNA precursors. In addition, we selected 12 positive/negative regulatory genes in erythroid differentiation as IC+ or IC-. At least two pairs of gRNAs were designed for each gene. The gRNAs are listed in Table S6 (Supporting Information).
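The U1, D1, and D2 design windows can be translated into genomic intervals once the precursor coordinates and strand are known. The following sketch assumes 1-based inclusive coordinates and interprets "upstream" and "downstream" relative to the direction of transcription; the function name `design_window` and the example coordinates are hypothetical.

```python
# Window definitions from the text: (side, nearest bp, farthest bp).
WINDOWS = {
    "U1": ("upstream", 1000, 2000),
    "D1": ("downstream", 1000, 3000),
    "D2": ("downstream", 500, 1000),
}


def design_window(pre_start, pre_end, strand, name):
    """Return the (start, end) genomic interval for a gRNA design window.

    pre_start and pre_end are the 1-based inclusive coordinates of the miRNA
    precursor (pre_start <= pre_end); strand is "+" or "-".
    """
    side, near, far = WINDOWS[name]
    # Upstream on the plus strand and downstream on the minus strand both
    # lie at lower genomic coordinates than the precursor.
    toward_lower = (side == "upstream") == (strand == "+")
    if toward_lower:
        return (max(1, pre_start - far), max(1, pre_start - near))
    return (pre_end + near, pre_end + far)


# Hypothetical precursor at positions 10000-10080.
u1_plus = design_window(10000, 10080, "+", "U1")
d2_minus = design_window(10000, 10080, "-", "D2")
```

Handling the strand explicitly matters here, because a downstream window on a minus-strand locus falls at lower coordinates and would otherwise target the wrong side of the precursor.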
The library construction strategy was as follows: Library 1 was designed to knock out the 3′ end of molncRNAs and retain the expression of the corresponding miRNA. Library 1 (molncRNA (-) and IC) included D1 and D2 gRNAs and also contained 102 pairs of IC gRNAs. Library 2 was designed to knock out the molncRNA and its cognate miRNA at the same time. Library 2 (molncRNA (-/-) and IC) included U1 and D1 gRNAs and IC gRNAs. Finally, these gRNAs were cloned into the tandem double gRNA expression plasmid (pLV-hEF1a-EGFP-2A-Puro-U6-gRNA1-7sK-gRNA2) to obtain the gRNA libraries expressing EGFP green fluorescence.
To sort red+ and green+ populations, 3 × 10⁶ K562 cells in 7.5 mL media plus 8 μg mL⁻¹ polybrene were transduced with either the EGFP-expressing (MOI ≈4) or the mCherry-expressing (MOI ≈50) viruses, which had been packaged separately. Infected cells were collected and subjected to scRNA-seq. The libraries were prepared using a Single Cell 3′ Library Gel Bead kit V2 in replicate 1 and a Single Cell 3′ Library Gel Bead kit V3 in replicates 2 and 3, following the manufacturer's instructions.
Specific Amplification of Guide Barcodes: The enrichment PCR procedure was performed as described previously [33] with the following modifications. A heminested PCR starting from 40 ng of full-length cDNA was used to enrich for gRNA barcodes. Q5 High-Fidelity DNA Polymerase (New England BioLabs) was used for PCR amplification according to the following PCR protocol: 1) 98 °C for 45 s; 2) 98 °C for 15 s, 65 °C for 20 s, then 72 °C for 60 s (14-16 cycles). PCRs were cleaned with Zymo DNA Clean & Concentrator-5 columns (Zymo, Orange County, CA, USA), and 15 μL of a 1:2 dilution of the first PCR and of a 1:2 dilution of the second PCR were carried over into each reaction. Fragments of 400-500 bp were then selected using an 8% TBE gel. Guide barcode libraries were sequenced on the Illumina HiSeq X Ten platform. The sequences of the primers used for specific amplification of guide barcodes are listed in Table S7 (Supporting Information).
The similarity of transcriptome changes was calculated as the ratio of the number of overlapping DEGs to the total number of DEGs between molncRNA-targeted cells and IC+ or IC- gene-targeted cells. Every molncRNA gRNA had two similarity values, one calculated between molncRNA-targeted cells and IC+ gene-targeted cells and one between molncRNA-targeted cells and IC- gene-targeted cells. The cutoff was defined as plus and minus one standard deviation around the mean of the similarity ratio (similarity NTC, IC- / similarity NTC, IC+), giving a threshold of 0.57 to 1.43. Based on this cutoff, a molncRNA gRNA was considered to have the potential to promote erythroid differentiation if its similarity molnc, IC+ was 1.75-fold greater than its similarity molnc, IC-, and to have the potential to inhibit erythroid differentiation if its similarity molnc, IC- was 1.43-fold greater than its similarity molnc, IC+. Finally, the functional gRNAs in the molncRNA (-) and molncRNA (-/-) libraries were compared, and molncRNAs and miRNAs were divided into four categories: i) same, lncRNA and miRNA have similar phenotypes; ii) different, lncRNA and miRNA have opposite phenotypes; iii) molncRNA (-) only, lncRNA has functions independent of its cognate miRNA; iv) molncRNA (-/-) only, miRNA has functions independent of molncRNA. The molncRNAs hosting high-confidence miRNAs were retained, namely miRNAs that were conserved in mammals, [75] disturbed specific biological processes in CRISPR knockdown assays, or had a maximum expression value > 10 in human tissues (miTED database; [76] the types of human tissues are the same as in Figure S1L, Supporting Information).
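The similarity-based classification described above can be illustrated with a minimal Python sketch. This is not the authors' code: the function names are hypothetical, the gene sets are made-up toy inputs, and "total DEGs" is assumed here to mean the union of the two DEG sets, which the text does not state explicitly. The fold cutoffs (1.75 and 1.43) are taken from the text; 1.75 is approximately the reciprocal of the lower ratio bound 0.57.

```python
def deg_similarity(degs_a, degs_b):
    """Overlapping DEGs divided by total DEGs.

    Assumption: 'total DEGs' is the union of both DEG sets.
    """
    a, b = set(degs_a), set(degs_b)
    total = a | b
    return len(a & b) / len(total) if total else 0.0


def classify_grna(sim_ic_pos, sim_ic_neg, promote_fold=1.75, inhibit_fold=1.43):
    """Label a molncRNA gRNA from its two similarity values.

    'promote': similarity to IC+ exceeds promote_fold x similarity to IC-.
    'inhibit': similarity to IC- exceeds inhibit_fold x similarity to IC+.
    'none':    the ratio lies inside the 0.57 to 1.43 window.
    Multiplication is used instead of division to avoid dividing by zero.
    """
    if sim_ic_pos > promote_fold * sim_ic_neg:
        return "promote"
    if sim_ic_neg > inhibit_fold * sim_ic_pos:
        return "inhibit"
    return "none"


# Toy example with invented DEG lists (for illustration only).
molnc_degs = {"GATA1", "FOS", "KLF1", "TAL1"}
ic_pos_degs = {"GATA1", "FOS", "KLF1", "HBG1"}
ic_neg_degs = {"FOS", "MYC", "JUN", "SPI1"}

sim_pos = deg_similarity(molnc_degs, ic_pos_degs)  # 3 shared of 5 total
sim_neg = deg_similarity(molnc_degs, ic_neg_degs)  # 1 shared of 7 total
label = classify_grna(sim_pos, sim_neg)
```

Running the same classification on the molncRNA (-) and molncRNA (-/-) libraries and comparing the two labels per locus would then yield the four categories listed above.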

Figure 1. Identification and characterization of molncRNAs. A) Schematic illustration of molncRNA identification and characterization. B) MolncRNAs were divided into two subtypes, ex-molncRNAs (277) and in-molncRNAs (587), based on whether mature miRNAs are embedded in the exonic or intronic sequences of their host molncRNAs. C) Proportion of molncRNAs located in different gene regions. D) Metaplot showing the distribution of H3K4me3, H3K27ac, and Pol II ChIP-seq fragment depth within −3000 to 3000 bp centered around the 5′ end of molncRNA, mRNA, and lncRNA based on PacBio

Figure 4. Molnc-301b interacts with SMARCA5. A) Fractions of RNAs located in the chromatin, nucleoplasm, and cytoplasm of HSPCs. GAPDH, U1, and XIST RNAs were used as positive controls for cytoplasm, nucleoplasm, and chromatin location, respectively (n = 3 replicates). B) Experimental design of the molnc-301b mechanism study. C) Scatterplots showing DEGs of 301b_wt-, 301b_m1-, 301b_m2-, and 301b_del-overexpressing HSPCs compared with the control. Significantly up- or downregulated genes were determined by p < 0.05 (n = 2 replicates). D) GO functional enrichment analysis of coordinately activated and coordinately repressed genes. E) GO functional network showing chromatin-related terms of molnc-301b pull-down nuclear proteins in MS analysis. F) Western blot validation of proteins associated with molnc-301b in K562 cells. NC refers to proteins pulled down by magnetic beads. G) SMARCA5 RIP assay of K562 cells. Western blot showing SMARCA5 immunoprecipitation (upper panel). The relative fold enrichment of molnc-301b using SMARCA5 compared with IgG was determined by qPCR analysis (lower panel). SLC25A21-AS1 and pri-124 transcripts were used as negative controls. Data represent the mean ± SD (n = 3 replicates). p-Values were calculated by unpaired t-test. ****p < 0.0001; ns, not significant.
Quantification and Statistical Analysis: Statistical parameters are reported either in individual figures or in the corresponding figure legends. Quantification data are in general presented as bar/line plots, with the error bars representing mean ± SD, or as boxplots showing the median (middle line), the first and third quartiles (box boundaries), and the furthest observation or 1.5 times the interquartile range (end of whisker). Whenever asterisks are used to indicate statistical significance, * stands for p < 0.05, ** for p < 0.01, *** for p < 0.001, and **** for p < 0.0001. n.a. represents "not available", and ns represents "not significant". All statistical analyses were done in R and GraphPad.
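The reporting conventions above can be restated as a short Python sketch. It is illustrative only: the analyses themselves were done in R and GraphPad, the function names are hypothetical, and the whisker rule is assumed to follow the common Tukey convention (the furthest observation lying within 1.5 times the interquartile range of the box), with quartiles computed by inclusive linear interpolation.

```python
import statistics


def significance_label(p):
    """Map a p-value to the asterisk notation used in the figures."""
    if p is None:
        return "n.a."  # not available
    for cutoff, label in ((0.0001, "****"), (0.001, "***"), (0.01, "**"), (0.05, "*")):
        if p < cutoff:
            return label
    return "ns"  # not significant


def whisker_ends(values):
    """Return (low, high) whisker ends under the assumed Tukey convention."""
    data = sorted(values)
    # Quartiles by inclusive linear interpolation (an assumption of this sketch).
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    reach = 1.5 * (q3 - q1)
    low = min(v for v in data if v >= q1 - reach)
    high = max(v for v in data if v <= q3 + reach)
    return low, high


# Toy data: the value 30 lies beyond the upper fence and is excluded from the whisker.
ends = whisker_ends([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
```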