Conservation of a core neurite transcriptome across neuronal types and species

The intracellular localization of mRNAs allows neurons to control gene expression in neurite extensions (axons and dendrites) and respond rapidly to local stimuli. This plays an important role in diverse processes including neuronal growth and synaptic plasticity, which in turn serve as a foundation for learning and memory. Recent high-throughput analyses have revealed that neurites contain hundreds to thousands of mRNAs, but an analysis comparing the transcriptomes derived from these studies has been lacking. Here we analyze 20 datasets pertaining to neuronal mRNA localization across species and neuronal types and identify a conserved set of mRNAs that robustly localized to neurites in a high number of the studies. The set includes mRNAs encoding ribosomal proteins and other components of the translation machinery, mitochondrial proteins, cytoskeletal components, and proteins associated with neurite formation. Our combinatorial analysis provides a unique resource for future hypothesis-driven research.

Specific patterns of localization for these mRNAs are mediated by cis-regulatory elements (zipcodes) usually found in their 3′ UTRs. These elements are bound by specific RNA-binding proteins (RBPs) that link their targets to the transport machinery and direct their trafficking to the sites where they function. Many of these "zipcodes" and RBPs are conserved across diverse cell types and species, indicating that RNA localization likely has fundamental biological functions. RBPs can both control mRNA localization, for example, by interacting with motor proteins, and regulate translation of localized mRNAs.
Recent high-throughput studies have identified hundreds to thousands of mRNAs that specifically localize to neurites (Briese et al., 2016; Cajigas et al., 2012; Ciolli Mattioli et al., 2019; Farris et al., 2019; Feltrin et al., 2012; Lein et al., 2007; Maciel et al., 2018; Middleton, Eberwine, & Kim, 2019; Minis et al., 2014; Poon, Choi, Jamieson, Geschwind, & Martin, 2006; Poulopoulos et al., 2019; Rotem et al., 2017; Saal, Briese, Kneitz, Glinka, & Sendtner, 2014; Taliaferro et al., 2016; Taylor et al., 2009; Toth et al., 2018; Tushev et al., 2018; Zappulo et al., 2017). A systematic analysis comparing these datasets has been lacking. Such an analysis could determine the extent to which the datasets overlap, and thus provide important functional insights, and could identify factors that contribute to differences (e.g., neuronal type, neurite isolation method, or library preparation technique). Here we compare 20 datasets on neuronal mRNA localization from the literature, all of which carried out analysis of separated neuronal compartments, to identify a core neurite transcriptome shared across different experimental setups. We found that this common set encodes components of the translation machinery and cytoskeleton, and proteins associated with mitochondria and neurite formation. The analysis clearly distinguished the transcriptomes of cell lines and neurons generated in vitro from those of primary neurons, but surprisingly, did not reveal clear differences in those obtained from different types of primary neurons. Our comprehensive analysis provides a valuable resource for future studies on mRNA localization.

| STUDYING mRNA LOCALIZATION: FROM IMAGING TO IN SITU SEQUENCING
Studies of neurite-specific transcriptomes have been based on different approaches, each of which entails particular methodological challenges. In situ hybridization is based on the use of fluorescent probes complementary to mRNAs of interest and has been long used to study mRNA localization. Single-molecule fluorescence in situ hybridization (smFISH) represents a further improvement of this technique; it applies multiple fluorescent probes for each transcript to be analyzed and permits the detection of individual RNA molecules. Many of the first mRNAs known to be localized to neurites were identified using these techniques, and they are still frequently used as markers of mRNA localization. However, determining the full complement of neurite-localized transcripts required transcriptome-wide sequencing methods.
Only recently have advances in imaging methods permitted high-throughput smFISH, which had been a challenge due to the limited number of fluorophores that can be distinguished. To overcome this and increase the number of potential parallel targets, labs have developed sequential hybridization and combinatorial labeling techniques including seqFISH (La Manno et al., 2016; Lubeck, Coskun, Zhiyentayev, Ahmad, & Cai, 2014; Shah, Lubeck, Zhou, & Cai, 2016) and multiplexed error-robust FISH (MERFISH; K. H. Chen, Boettiger, Moffitt, Wang, & Zhuang, 2015). Multiplexing is achieved via multiple cycles of hybridization and stripping or photobleaching, with each hybridization step applying distinct fluorescent probes. MERFISH (K. H. Chen et al., 2015) further introduced a two-step labeling procedure in which transcripts of interest are first hybridized with non-fluorescent probes that contain a sequence complementary to the selected transcript and two arms that are complementary to fluorescent readout probes. These readout probes are then used in multiple hybridization rounds, alternating with photobleaching, to detect the transcripts. Multiplexed smFISH permits evaluating both the copy numbers and localizations of thousands of mRNAs and holds great potential for studies of their localization in neurons. Another challenge in detecting mRNA localization through light microscopy is its limited resolution. To address this issue, F. Chen, Tillberg, and Boyden (2015) developed a method called expansion microscopy (ExM), which permits enlarging biological specimens. The authors introduced a polymer gel into fixed samples, triggering a chemically induced swelling that expands them by almost two orders of magnitude in volume. ExM permits conventional microscopes to achieve nanoscale imaging.
Another approach to combine the spatial context of imaging with the high-throughput aspect of RNA sequencing (RNA-seq) is spatial transcriptomics (Ke et al., 2013; Lee et al., 2014; Rodriguez-Munoz et al., 2015; Stahl et al., 2016; Vickovic et al., 2019). This approach relies on either spatial barcoding or in situ sequencing (ISS). In spatial barcoding (Rodriguez-Munoz et al., 2015; Stahl et al., 2016; Vickovic et al., 2019), tissue sections are placed on a glass slide coated with oligonucleotides that bear localization-specific barcodes and oligo(dT) to capture mRNAs. After their capture and reverse transcription, cDNA is released from the slide, sequenced and mapped to the tissue using barcodes. ISS involves the generation of cDNA within fixed tissue, followed by rolling-circle amplification and sequencing (ISS; Ke et al., 2013; fluorescence in situ sequencing, FISSEQ; Lee et al., 2014). Further advances in these methodologies and their applications to subcellular compartments promise to yield important new insights into questions of mRNA localization.

| ISOLATION OF SUBCELLULAR NEURONAL COMPARTMENTS FOR RNA SEQUENCING
While microscopy-based methods excel at analyzing subcellular patterns of mRNA localization, RNA-seq provides the best unbiased transcriptome-wide analysis. A number of approaches have been developed to isolate subcellular neuronal compartments for subsequent RNA-seq. (a) Manual (Cajigas et al., 2012; Tushev et al., 2018) or laser capture microdissection (Farris et al., 2019; Simone, Bonner, Gillespie, Emmert-Buck, & Liotta, 1998; Zivraj et al., 2010) permits the isolation of subcellular compartments including dendrites, axons or axonal growth cones directly from the brain. However, material obtained from tissue samples is heterogeneous, containing not only different types of neurons but also non-neuronal cells. This limitation can be partially overcome by using immunopanning or fluorescence-activated cell sorting (FACS) to purify neuronal populations genetically labeled with a fluorescent marker (Lobo, Karsten, Gray, Geschwind, & Yang, 2006; Zhang et al., 2014). (b) Density gradient centrifugation has been used to isolate axonal growth cones (Poulopoulos et al., 2019) and synaptosomes, structures composed of pre- and postsynaptic compartments (Dunkley, Jarvie, & Robinson, 2008). This method relies on the homogenization of brain tissue to break off nerve terminals, which are then isolated by centrifugation. Similar to microdissection, this approach produces a heterogeneous population of neuronal types and can be combined with FACS (Poulopoulos et al., 2019).
Subcellular compartments of neurons can also be separated by culturing the cells (c) in compartmentalized microfluidic chambers (Briese et al., 2016; Taylor, Dieterich, Ito, Kim, & Schuman, 2010) or (d) on microporous membranes (Ciolli Mattioli et al., 2019; Ludwik, von Kuegelgen, & Chekulaeva, 2019; Pertz, Hodgson, Klemke, & Hahn, 2006; Taliaferro et al., 2016; Zappulo et al., 2017). The latter separate cell bodies, which grow on top of the membrane, from neurites, which stretch through the pores and emerge on the lower side of the membrane. This method can be easily scaled up to produce enough material for multiple omics analyses, but does not always permit a separation of axons from dendrites.
Neurons can be isolated from the brain (yielding primary neurons) or generated through in vitro procedures from embryonic or induced pluripotent stem cells (iPSCs), using small molecules that direct neuronal differentiation (Wichterle & Peljto, 2008) or through the expression of neurogenic transcription factors (Chanda et al., 2014; Heinrich et al., 2011; Liu et al., 2015; Zappulo et al., 2017). Each system has distinct advantages. While primary cultures are more likely to recapitulate the properties of neuronal cells in vivo, stem cell-derived neurons are more homogeneous, easy to genetically modify, and reduce animal use. Moreover, generation of neurons from patient-derived iPSCs is the only way to obtain neurons with the genetic background of patients.
An alternative to separating subcellular compartments is proximity-dependent RNA labeling (Fazal et al., 2019; Wang et al., 2019). This method labels RNA pools using the peroxidase enzyme APEX2 (Fazal et al., 2019), which can be targeted to specific subcellular compartments, or the light-activated, proximity-dependent photo-oxidation of RNA nucleobases. The RNA pools localized in these ways can then be isolated and analyzed through RNA-seq. This approach has the advantage of providing subcellular high-throughput data from intact tissue samples.
Further improvement of precise subcellular targeting approaches will be important to future studies of RNA localization.

4 | COMPARATIVE ANALYSIS OF TRANSCRIPTOME-WIDE NEURONAL mRNA LOCALIZATION DATASETS

4.1 | Local transcriptome shared across multiple datasets
Advances in RNA-seq and imaging technologies over the last decade have yielded a number of datasets pertaining to neurite-localized transcriptomes. But these datasets have not been subjected to a comparative analysis, which hampers their interpretation; such an analysis would represent an essential resource for researchers in the fields of RNA localization and neurobiology. We have collected 20 datasets pertaining to levels of RNA expression in the subcellular compartments of neurons from mice, rats, and humans (Figure 1). These include two datasets from cortical tissues (Lein et al., 2007; Poulopoulos et al., 2019), two datasets from cultured cortical neurons (Taliaferro et al., 2016; Taylor et al., 2009), three datasets from hippocampal tissue slices (Cajigas et al., 2012; Farris et al., 2019; Tushev et al., 2018), two datasets from cultured hippocampal neurons (Middleton et al., 2019; Poon et al., 2006), one dataset from dorsal root ganglia (DRG; Minis et al., 2014), three datasets from primary motor neurons (Briese et al., 2016; Rotem et al., 2017; Saal et al., 2014), two datasets from mouse embryonic stem cell (mESC)-derived neurons (Ciolli Mattioli et al., 2019; Zappulo et al., 2017), two datasets from human induced pluripotent stem cell (hiPSC)-derived neurons (motor neurons, Maciel et al., 2018, as well as a mixture of GABAergic and glutamatergic neurons, Toth et al., 2018), and three neuroblastoma cell lines (Feltrin et al., 2012; Taliaferro et al., 2016). These datasets are derived from three different species, multiple model systems and different methods for separating subcellular compartments.
This suggests that widely shared patterns likely represent a core set of components that have been conserved in a transcriptome localized to neurites.
Since these RNA-seq datasets were produced with custom settings that differed from study to study, we re-analyzed them all using the same pipeline (Wurmus et al., 2018). An exception was made for three RNA-seq datasets for which raw sequencing data were unavailable (Maciel et al., 2018; Poulopoulos et al., 2019; Rotem et al., 2017). Depending on the study, from 54 to ~10,000 transcripts were detected with high confidence in neurites (transcripts per million [TPM] of library reads >10, Figure 1); this range reflects the differing sensitivity of the methods used over the last decade. The vast majority of transcripts were detected in multiple datasets (≥3, Figure 1, dark bars).
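The detection criterion described above (TPM > 10 in neurites, counted across datasets) can be illustrated with a minimal Python sketch. The study names, gene values and helper functions below are hypothetical and serve only to show the counting logic; they are not the scripts or data used for the analysis.

```python
from collections import Counter

TPM_CUTOFF = 10.0   # detection threshold stated in the text
MIN_DATASETS = 3    # "detected in multiple datasets" (>= 3)


def detected(tpm_by_gene, cutoff=TPM_CUTOFF):
    """Return the set of genes whose neurite TPM exceeds the cutoff."""
    return {gene for gene, tpm in tpm_by_gene.items() if tpm > cutoff}


def count_detections(datasets, cutoff=TPM_CUTOFF):
    """Count, for every gene, the number of datasets in which it is detected."""
    counts = Counter()
    for tpm_by_gene in datasets.values():
        counts.update(detected(tpm_by_gene, cutoff))
    return counts


# Toy neurite TPM values, invented purely for illustration.
toy = {
    "study_A": {"Actb": 850.0, "Rpl13": 120.0, "Gap43": 40.0, "Xyz1": 2.0},
    "study_B": {"Actb": 400.0, "Rpl13": 95.0, "Gap43": 8.0},
    "study_C": {"Actb": 300.0, "Rpl13": 15.0, "Xyz1": 30.0},
}

counts = count_detections(toy)
shared = sorted(gene for gene, n in counts.items() if n >= MIN_DATASETS)
print(shared)
```

Raising `MIN_DATASETS` to a stricter value selects progressively smaller, more widely shared sets of transcripts.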
This analysis yielded a set we call the core neurite transcriptome, comprising 70 transcripts detected in >15 datasets (Table 1 and extended online Table 1). The set includes mRNAs encoding ribosomal proteins and other translation-associated proteins (translation initiation and elongation factors), components of the cytoskeleton (β-actin, Map1b, Tubulinβ2A, Tau/Mapt, Arpc5, Dynlrb1), calcium-binding proteins (Calm1, Calm2), and proteins with roles in axon and dendrite formation (Ywhaz, Ywhae, Gap43, Tpt1, Stmn1). While well-studied transcripts such as β-actin have long been known to localize to neurites (Bassell et al., 1998; Micheva, Vallee, Beaulieu, Herman, & Leclerc, 1998), the detection of mRNAs encoding ribosomal and mitochondrial proteins has mostly come through RNA-seq studies ( 2010). Additionally, the localization of transcripts encoding ribosomal proteins has also been confirmed by FISH (Poulopoulos et al., 2019; Zivraj et al., 2010). Recent work by our lab has shown that up to half of the local proteome in neurites is likely to be established through translation of localized mRNAs (Zappulo et al., 2017). To determine whether the mRNAs we identified as components of the core neurite transcriptome are actually translated there, we checked for their presence in four datasets reporting locally translated mRNAs (Ainsley, Drane, Jacobs, Kittelberger, & Reijmers, 2014; Ouwenga et al., 2017; Shigeoka et al., 2016; Zappulo et al., 2017). Notably, ~70% of transcripts consistently detected in neurites were also determined to undergo local translation in at least one of these studies (Table 1 and extended online Table 1).
Surprisingly, some of the transcripts commonly detected in neurites encode proteins with nuclear functions: these include histone H3F3b, chromosomal protein Hmgn1 and the transcription factors Tcf4 and Rnf10. For H3F3b, imaging-based analysis showed that the transcript is indeed present in dendrites but only at the proximal end (Cajigas et al., 2012). Such unexpected localization patterns might reflect regulatory mechanisms, such as an activity-dependent transport of proteins from neurites to the nucleus. In fact, this mechanism has been reported for the protein RNF10 (Dinamarca et al., 2016). The consistent detection of Rnf10 mRNA both in neurites and among ribosome-associated transcripts (extended online Table 1) suggests that this postsynaptic protein is locally translated and then transported to the nucleus upon synaptic stimulation. It remains to be determined whether a similar mechanism also regulates other transcripts that have known nuclear functions and are localized to neurites.

| Differences between the subcellular transcriptomes of neurons
The datasets included in our comparison are derived from neurites of different types of neurons. This means that some components of the transcriptomes are likely to be cell-type specific for major classes such as motor neurons or hippocampal neurons. To analyze the differences between the datasets in more detail and in a statistically unbiased manner, we performed principal component analysis (PCA, see Box 1; Figure 2a,b). This type of analysis requires a large overlap between the datasets it draws on, so our analysis included only datasets with high coverage and complexity (Figure 1, datasets on the left of the dashed line). PCA produced the following clusters: (a) rat hippocampal neuropil (principal component 2 or PC2); (b) two neuroblastoma cell lines, CAD and N2A (PC1 and PC3); (c) mESC-derived neurons (PC1 and PC3); (d) all other datasets, including different types of mouse primary neurons and hiPSC-derived motor neurons (PC1 and PC3). To understand which differences of expression underlie this clustering, we generated heatmaps for the transcripts which made the greatest contributions to each PC (Figure 2c). This analysis showed that the dataset from rat hippocampal neuropils lacks a number of transcripts detected in others, separating it from other datasets. This is most likely due to the fact that the annotation of the rat genome is still incomplete, and the study in question (Tushev et al., 2018) applied 3′-mRNA-seq, which sequences only the very ends of 3′-UTRs (Figure 2d, Figure S1).
Datasets derived from the neuroblastoma lines and, to a lesser degree, mESC-derived neurons, differ most significantly from primary neurons in terms of transcripts associated with neurogenesis and synaptic activity (Figure 2d; full transcript list in subsets of extended online Table 1). Contrary to our expectations, different types of primary neurons and tissue-derived samples (motor neurons, cortical neurons, hippocampal neurons and tissues, DRG) cluster together and cannot be clearly separated by PCA (Figure 2a,b).

BOX 1 RNA-seq DATA COMPARISON USING PCA
Comparison of properly normalized RNA-seq data can be performed with different methods: clustered heatmaps visualize groups of transcripts with similar expression values; PCA as a dimensionality reduction method can show similarity of samples or datasets based on underlying numerical values, most often expression data. PCA (Mardia, Kent, & Bibby, 1979; Venables, Ripley, & Venables, 2002) produces multiple independent orthogonal principal components (PCs), represented as axes along which samples can be grouped. This step requires values to be measured for each sample and transcript or gene; therefore, only transcripts that are detected in all samples can be analyzed. Because PCA performs a linear dimensionality reduction, it is possible to calculate the contribution of each transcript to each PC. Nonlinear dimensionality reduction methods, like t-SNE (van der Maaten & Hinton, 2008) or uniform manifold approximation and projection (UMAP) (Becht et al., 2018), allow the compression of the multidimensional relationships, calculated in PCA, into two or three dimensions. This enables visualization of all results on the same plot. However, due to the nonlinearity of the transformation, the contribution of individual transcripts to specific PCs or relationships between the clusters can no longer be extracted.
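As a rough illustration of the PCA described in Box 1, the following Python sketch computes the first principal component of log10-transformed expression values and the contribution of each transcript to it. It uses plain power iteration rather than a statistics library, and the sample labels and TPM values are invented for the example; it is a sketch under these assumptions, not the analysis code behind Figure 2.

```python
import math


def center_columns(rows):
    """Subtract each column (transcript) mean so that PCA captures variance only."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[value - mean for value, mean in zip(row, means)] for row in rows]


def first_pc(rows, iterations=200):
    """Return (loadings, scores) for PC1 via power iteration on X^T X."""
    x = center_columns(rows)
    p = len(x[0])
    # Deterministic, non-uniform start vector; power iteration only fails if
    # this start happens to be exactly orthogonal to PC1.
    v = [float(j + 1) for j in range(p)]
    norm = math.sqrt(sum(c * c for c in v))
    v = [c / norm for c in v]
    for _ in range(iterations):
        scores = [sum(a * b for a, b in zip(row, v)) for row in x]            # X v
        w = [sum(s * row[j] for s, row in zip(scores, x)) for j in range(p)]  # X^T (X v)
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:  # no variance at all in the data
            break
        v = [c / norm for c in w]
    scores = [sum(a * b for a, b in zip(row, v)) for row in x]
    return v, scores


# Toy neurite TPM values (rows: samples, columns: transcripts), invented for illustration.
labels = ["line_1", "line_2", "primary_1", "primary_2"]
tpm = [
    [1000.0, 10.0, 100.0],
    [800.0, 12.0, 110.0],
    [20.0, 500.0, 100.0],
    [25.0, 600.0, 90.0],
]
log_tpm = [[math.log10(value) for value in row] for row in tpm]

loadings, scores = first_pc(log_tpm)
contributions = [c * c for c in loadings]  # share of PC1 per transcript; sums to 1
for label, score in zip(labels, scores):
    print(label, round(score, 3))
```

Because the loadings form a unit vector, their squares give each transcript's share of the component, which is the property that makes linear methods such as PCA interpretable at the level of individual transcripts. The sign of a principal component is arbitrary, so only the relative positions of the samples are meaningful.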
To determine whether neuronal cell type-specific differences are detectable in cell bodies, we applied the PCA to somatic expression levels (Figure S2a,b). Here, too, we were unable to distinguish different types of primary neurons, and differences in expression are related to similar functional terms as in neurites (Figure S2c). We next applied t-distributed Stochastic Neighbor Embedding (t-SNE), which collects all PCs into a two-dimensional space. Here, too, we were unable to distinguish neuronal cell types (Figure S2d,e).
To determine whether these effects were due to contaminations by non-neuronal cell types, we assessed levels of the expression of genes specific to particular neuronal and non-neuronal cell types (Figure 2e). We observed higher levels of specific non-neuronal markers (for astrocytes and microglia) in primary neurons. This suggested that glia contamination might be a confounding factor that prevented us from discriminating neuronal cell types. Others might include an intrinsic heterogeneity among primary neurons, differences in the approaches used to separate the transcriptomes of neurites and soma, or in library preparation methods or general experimental handling. Notably, well-defined motor neuron (Hb9, Chat) and sensory neuron (Prph, Trpv1) markers were detected in the corresponding datasets. However, we did not observe clear signatures for pre- or postsynaptic markers (Figure 2e). Possible reasons might include the technical challenges involved in separating axons from dendrites and the fact that our analysis focused on levels of expression of mRNAs, while the best-established markers rely on immunostaining assays for synaptically localized proteins.
One possible strategy to deal with glial contamination, employed by Tushev et al. (2018), is to computationally filter out all transcripts that do not show enrichment in cultured primary neurons when compared with astrocyte cultures. To test whether the removal of non-neuronal contaminants improves neuronal cell type clustering, we applied the same filtering procedure to all of the datasets before performing the PCA (Figure S3). This filtering did not permit PCA to produce a clearer division of datasets according to cell type. It is important to note that this filtering removes not only astroglial transcripts, but also transcripts shared between neurons and astroglia that might have functions in neurites. It therefore drastically reduced the overall number of transcripts shared between the datasets (from 8,809 to only 2,352) (Figure S3).

| Robustly neurite-enriched transcriptomes
The term "mRNA localization" has been used in two different ways: (a) to signify the mere presence of a given mRNA in neurites; (b) as an enrichment of an mRNA in neurites compared to soma. Given that enrichment points to an active localization process, we decided to identify transcripts that are consistently enriched in neurites versus soma across multiple datasets. We detected 61 transcripts significantly enriched in neurites in at least nine out of the analyzed 11 high coverage datasets (Table 2; given this small number of shared transcripts we were not able to perform further analyses, such as PCA). Strikingly, the majority of these transcripts (41 out of 61) encode ribosomal proteins. Interestingly, these transcripts are also ribosome-associated (Ainsley et al., 2014; Ouwenga et al., 2017; Shigeoka et al., 2016; Zappulo et al., 2017; Table 2 and extended online Table 2). These factors strongly hint at their local translation in neurites. Although ribosome assembly is generally assumed to happen in the nucleolus (reviewed in Klinge & Woolford, 2019), recent studies report that the cytosolic replacement of ribosomal proteins can serve as a mechanism of ribosome maintenance (Mathis et al., 2017; Shigeoka et al., 2019). Moreover, it has recently become clear that ribosomes differ in their protein composition, resulting in heterogeneous ribosome pools that translate specific subsets of mRNAs (Genuth & Barna, 2018; Shi et al., 2017). Given the consistent enrichment of mRNAs encoding ribosomal proteins in neurites, it is tempting to speculate that the local transcriptome is translated by specialized ribosomes. Neurite-enriched transcripts include mRNAs that encode mitochondrial proteins (Cox6a1, Cox8a, Cyb5r3). These, alongside ribosomal proteins, were also the most abundant transcripts in neurites (Table 1). An analysis that emphasizes enrichment over mere presence points out a number of transcripts which are less abundant but consistently enriched in neurites (Table 2).
These include the Ca2+-binding protein S100-a13, which facilitates signal peptide-independent protein secretion (Kathir et al., 2007); Nestin, involved in axon growth cone formation and early axon guidance (Bott et al., 2019); Upstream stimulatory factor 2 (Usf2), implicated in response to Ca2+-activated signaling pathways in neurons (W. G. Chen et al., 2003); and Rab-13, a key regulator of membrane trafficking and neurite outgrowth (Sakane et al., 2010) (Table 2).
Notably, most transcripts commonly used as markers of axons, dendrites, and growth cones (such as Actb, Map1b, Map2, Dlg4, Camk2a, Fmr1, Gria1, Nrgn, Grin1, Bdnf, Arc) are not robustly enriched in neurites across multiple datasets, but rather equally distributed or even somatically enriched (Table 3). This is not surprising, given that most of these markers were established based on their detection in axons and dendrites by microscopy rather than through comparisons of their enrichment in neurites versus soma. Thus, depending on the methodology, the term "localized mRNA" could mean either presence or enrichment of the given mRNA in the specified subcellular compartment.

Note: Table of all transcripts that are significantly enriched in at least nine out of the 11 analyzed datasets (Figure 1, left of dashed line). The data are presented largely as in Table 1, with the following extra columns: number of datasets reporting significant neurite enrichment, average of significant log2 neurite enrichment, rank of enriched transcript in extended online Table 1.

| CONCLUDING REMARKS
Advances in sequencing and imaging techniques have made it possible to carry out transcriptomic profiling at subcellular resolutions. Our comprehensive analysis of relevant datasets has identified a core set of transcripts localized and enriched in neurites that is common to multiple types of neurons from three species. This set includes mRNAs encoding ribosomal and mitochondrial proteins, cytoskeletal components, factors involved in the formation of neural processes, and nuclear proteins that may regulate neuronal gene expression in response to localized activity in neurites (Tables 1 and 2). The commonly accepted synaptic mRNA markers are mostly detected, but not enriched, in neurites. Remarkably, differences in neurite transcriptomes do not suffice to distinguish different types of primary neurons (Figure 2). This might be explained in two ways: (a) the core transcriptome of neurites we have identified is highly conserved between different neuronal types, hinting at potentially important general functions of localized transcripts; (b) differences between cell types might be masked by sample heterogeneity, through factors such as the presence of multiple neuronal types, contamination with non-neuronal cells, the maturity of the neurons analyzed and other confounders which are more highly represented in the dataset than type-specific factors. Our analysis of soma transcriptomes (Figure S2) suggests that the second explanation is more likely. Clarifying this issue will require studying diverse types of neurons using the same experimental workflow. Advances in new technologies that can generate high-throughput data from intact tissue at a subcellular resolution will surely provide more insights into processes that govern RNA localization in cell-type-specific ways.
In addition to the core neurite transcriptome, our comparative analysis allows an informed choice of the test system for future localization studies. Due to their homogeneity, ease of genetic manipulation and compliance with the replacement principle aimed at reducing animal use in research, in vitro differentiated neurons and neuronal cell lines represent a system of choice for general mechanistic studies. They do not, however, express as many synaptic markers as primary neurons (Figure 2e) and should be carefully evaluated before being used for studies on biological functions of specific neuronal genes. Our extended online Table 1 makes it possible to test whether the expression pattern of a transcript of interest is recapitulated in one of the neuronal cell lines or stem cell-derived neurons, and thus whether these would represent a proper test system or whether studies should preferably be performed in primary cells. Our analysis thus provides an indispensable resource for future localization studies.

Table 2, with the extra column: rank in extended online Table 2. For information on enrichment in specific datasets see extended online Table 2.

| METHODS
Raw RNA-seq data were downloaded from the NCBI-GEO or EBI ArrayExpress databases (Table S1) and processed using the PiGx RNA-seq pipeline version 0.0.10 (Wurmus et al., 2018) using default settings and Ensembl genome assemblies for mouse (GRCm38.p6, release 96), human (GRCh38.p13, release 97), and rat (Rnor_6.0, release 98). Based on the initial analysis reports generated by the PiGx pipeline, individual replicates that did not separate well between neurite and soma samples were excluded from further analysis (Table S1). Transcript count data based on genome-mapped reads (STAR counts) generated by the pipeline were gathered, normalized to TPM and averaged for neurite and soma samples. Transcript count data or detection status from studies that did not deposit raw datasets (Maciel et al., 2018; Poulopoulos et al., 2019; Rotem et al., 2017) or did not use RNA-seq methods (Cajigas et al., 2012; Feltrin et al., 2012; Lein et al., 2007; Poon et al., 2006; Saal et al., 2014; Taylor et al., 2009) were obtained from the corresponding supplementary materials of those studies and normalized accordingly or treated as binary (detected or not detected). All Ensembl gene IDs from human or rat were mapped to orthologous mouse IDs using the Ensembl BioMart database. To prevent species-specific genes from affecting the analysis, genes without orthologs between mouse, human and rat were excluded from the comparative analysis.
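The normalization of counts to TPM and the averaging over replicates can be sketched as follows. This is a minimal Python illustration of the standard TPM definition, assuming that transcript lengths are available; the function names, lengths and counts are hypothetical, and the exact normalization performed by the pipeline is not specified here.

```python
def counts_to_tpm(counts, lengths_bp):
    """Convert raw read counts to transcripts per million (TPM).

    Counts are first divided by transcript length (reads per base), then
    rescaled so that the values of one sample sum to one million.
    """
    rates = {gene: counts[gene] / lengths_bp[gene] for gene in counts}
    total = sum(rates.values())
    if total == 0:
        raise ValueError("sample contains no mapped reads")
    return {gene: rate / total * 1e6 for gene, rate in rates.items()}


def average_tpm(replicates):
    """Average TPM per gene across replicates of one compartment."""
    genes = set().union(*replicates)
    return {
        gene: sum(rep.get(gene, 0.0) for rep in replicates) / len(replicates)
        for gene in genes
    }


# Hypothetical transcript lengths and neurite counts for two replicates.
lengths = {"geneA": 1000, "geneB": 3000}
neurite_rep1 = counts_to_tpm({"geneA": 100, "geneB": 300}, lengths)
neurite_rep2 = counts_to_tpm({"geneA": 300, "geneB": 300}, lengths)
neurite_mean = average_tpm([neurite_rep1, neurite_rep2])
print(neurite_mean)
```

In the first replicate both genes have the same read density per base, so they receive equal TPM values even though geneB has three times as many reads; this length correction is what makes TPM comparable across transcripts.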
All further analyses were performed using custom R scripts. PCA was performed with log10-normalized average neurite expression values. Hierarchical clustering of genes and datasets for heatmaps was performed using a Euclidean distance metric and the "complete" method. Gene ontology analysis was performed using the gProfileR package (Reimand, Kull, Peterson, Hansen, & Vilo, 2007); functional terms with more than 2000 proteins were excluded. We considered those transcripts detected in neurites that showed an average neurite expression level of TPM greater than 10 in at least three independent studies (extended online Table 1).
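To make the clustering step concrete, the sketch below implements agglomerative clustering with a Euclidean distance metric and complete linkage in plain Python. The analysis itself was done in R; this stand-alone version, with hypothetical labels and coordinates, only illustrates what "complete" linkage means: the distance between two clusters is the largest distance between any pair of their members.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Euclidean distance between two expression profiles."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def complete_linkage(points, labels):
    """Agglomerative clustering; returns merges as (cluster_a, cluster_b, distance)."""
    coords = dict(zip(labels, points))
    clusters = [(label,) for label in labels]

    def linkage(c1, c2):
        # Complete linkage: the farthest pair of members defines the distance.
        return max(euclidean(coords[i], coords[j]) for i in c1 for j in c2)

    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=lambda pair: linkage(*pair))
        merges.append((a, b, linkage(a, b)))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Hypothetical two-dimensional profiles for four datasets.
labels = ["A", "B", "C", "D"]
points = [(0.0, 0.0), (0.0, 1.0), (5.0, 5.0), (5.0, 6.5)]
merges = complete_linkage(points, labels)
for a, b, dist in merges:
    print(a, b, round(dist, 3))
```

The sequence of merges and their distances corresponds to the dendrogram drawn alongside a clustered heatmap. For real data, an established library routine would normally be preferred to this quadratic-time illustration.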
For the comparative analysis of different datasets and the calculation of transcript enrichment in neurites, only those datasets with high coverage (>5,000 transcripts with TPM > 10 detected) were chosen. For PCA, we considered those transcripts that had average expression levels of TPM > 10 both in neurite and soma samples of at least three independent datasets (8,809 transcripts). We decided to exclude transcripts detected in only one or two studies to ensure that differences between all included transcripts represent biological variability and not technical differences from different experimental setups. Calculation of transcript enrichment in neurites was performed using the DESeq2 R package (Love, Huber, & Anders, 2014), using raw counts for all transcripts with TPM > 1 and accounting for pairing of neurite and soma samples as a covariate. We considered those transcripts enriched in neurites that passed the criteria for inclusion in PCA and showed significant (p < 0.1) enrichment (log2-fold change >0) in at least one dataset (extended online Table 2).
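The enrichment criteria can be expressed compactly in code. The following Python sketch assumes that per-dataset differential expression results are already available as (log2 fold change, p value) pairs per transcript; the data structure, names and numbers are hypothetical and only illustrate how transcripts enriched in a minimum number of datasets would be tallied.

```python
ALPHA = 0.1  # significance cutoff stated in the Methods


def enriched_in(result, alpha=ALPHA):
    """Genes with significant (p < alpha) positive neurite/soma log2 fold change."""
    return {gene for gene, (lfc, p) in result.items() if p < alpha and lfc > 0}


def consistently_enriched(results, min_datasets):
    """Map each gene to the number of datasets reporting enrichment, keeping
    only genes enriched in at least `min_datasets` datasets."""
    tally = {}
    for result in results.values():
        for gene in enriched_in(result):
            tally[gene] = tally.get(gene, 0) + 1
    return {gene: n for gene, n in tally.items() if n >= min_datasets}


# Toy results as (log2 fold change, p value); values invented for illustration.
toy_results = {
    "dataset_1": {"Rpl13": (1.2, 0.01), "Cox8a": (0.8, 0.05), "Actb": (-0.3, 0.02)},
    "dataset_2": {"Rpl13": (0.9, 0.03), "Cox8a": (0.4, 0.30), "Actb": (0.1, 0.50)},
    "dataset_3": {"Rpl13": (1.5, 0.001), "Cox8a": (0.7, 0.08), "Actb": (-0.5, 0.01)},
}

robust = consistently_enriched(toy_results, min_datasets=2)
print(robust)
```

With 11 high coverage datasets, setting `min_datasets=9` would correspond to the threshold used for the robustly neurite-enriched set in Table 2.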

ACKNOWLEDGMENTS
We thank Vedran Franke for discussions on data analysis strategies, and Russ Hodge and members of the Chekulaeva lab for helpful comments on the manuscript and data representation. Data analysis was performed by N.K. and the manuscript was written by N.K. and M.C.

CONFLICT OF INTEREST
The authors have declared no conflicts of interest for this article.