Osmoregulation of the transcriptome of the hypothalamic supraoptic nucleus: A resource for the community

The hypothalamic supraoptic nucleus (SON) is a core osmoregulatory control centre that deciphers information about the metabolic state of the organism and orchestrates appropriate homeostatic (endocrine) and allostatic (behavioural) responses. We have used RNA sequencing to describe the polyadenylated transcriptome of the SON of the male Wistar Han rat. These data have been mined to generate comprehensive catalogues of functional classes of genes (enzymes, transcription factors, endogenous peptides, G protein coupled receptors, transporters, catalytic receptors, channels and other pharmacological targets) expressed in this nucleus in the euhydrated state, and that together form the basal substrate for its physiological interactions. We have gone on to show that fluid deprivation for 3 days (dehydration) results in changes in the expression levels of 2247 RNA transcripts, which have similarly been functionally catalogued, and further mined to describe enriched gene categories and putative regulatory networks (Regulons) that may have physiological importance in SON function related plasticity. We hope that the revelation of these genes, pathways and networks, most of which have no characterised roles in the SON, will encourage the neuroendocrine community to pursue new investigations into the new ‘known‐unknowns’ reported in the present study.

large peptidergic neurones project axons via the internal zone of the median eminence to axon terminals located adjacent to blood capillaries in the pars nervosa (PN) neurovascular interface of the pituitary gland. 3,4 Two discrete populations of SON MCNs produce two major neuropeptide hormones secretory products, namely arginine vasopressin (AVP, encoded by the Avp gene) and oxytocin (OXT, encoded by the Oxt gene). 5 The mature peptide hormone processing products of these genes are stored in PN nerve terminals until secreted into the systemic circulation in response to appropriate physiological cues. Dehydration results in a rise in plasma osmolality that is detected by intrinsic MCN osmosensory mechanisms, 6 and by osmoreceptive neurones in the circumventricular organs such as the subfornical organ which provide excitatory inputs to MCNs. 7 Both of these processes alter the firing of AVP MCNs leading to hormone release. AVP travels through the blood stream to specific receptors located in the kidney collecting duct, where it promotes water reabsorption, 8 and the thick ascending limb of the loop of Henle, where it elicits sodium reabsorption. 9 As a consequence of the secretion of AVP during dehydration, there is a need to replenish PN stores. This starts with an up-regulation in Avp gene transcription, 10 which results in an increase in the abundance of the precursor heteronuclear RNA, 11 and the mature mRNA. 12 In addition to its well-established roles in parturition and lactation, OXT is assumed to have natriuretic activity at the level of the kidney. 13 Chronic osmotic stimulation evokes a dramatic functional remodelling of the rat SON that affects both neurones and glia. 14,15 A plethora of activity-dependent changes in the morphology, electrical properties and biosynthetic and secretory activity of the SON have been described, 16 which contribute to the facilitation of hormone production and delivery, and hence survival. We 17-21 and others 22 have sought to understand this function-related plasticity at the level of the expression of the genome. This strategy involves subjecting rats to physiological cues, then carrying out transcriptomic analysis of specific brain regions. These data are then subjected to bioinformatic analysis to identify nodal target genes, the functions of which are then investigated in vivo. Thus, we used GeneChip™ (Affymetrix, Santa Clara, CA, USA) analysis [17][18][19][20][21] to document the transcriptomes of the adult male and female Sprague-Dawley rat SON and PVN, and to describe how these change following the chronic osmotic challenge of 72 hours of complete fluid deprivation (dehydration). Some of the novel differentially expressed genes (DEGs) identified have been subjected to detailed functional investigation, for example Creb3l1, [23][24][25][26][27] Azin1, 28 Slc12a1, 29 Caprin2 30,31 and Rasd1. 32 We have thus demonstrated that the SON is a tractable model that enables sustained progress from transcriptome datasets to novel physiological understanding. 33 Note that gene symbols are used throughout this manuscript; the corresponding gene descriptions are provided in the Supporting information (Table S1).
Our 2006 Affymetrix analysis 17 utilised the Rat 230 2.0 array, which enabled the simultaneous analysis of 31 099 probe sets, corresponding to 28 700 genes. Although robust and extensive, as well as state-of-the art at the time, these studies were limited with respect to gene number and annotation. Over the subsequent years, transcriptome technologies have become truly comprehensive with the development of RNA-sequencing (RNA-seq). Here, we employ next generation sequencing to describe the polyadenylated transcriptomes of the Wistar Han rat SON from 3-month-old euhydrated and dehydrated rats. We have mined these datasets to catalogue the basal expression profiles of categories of functionally relevant genes within the SON. We have gone on to describe how the transcriptome profile changes with dehydration. These data have been subject to detailed and robust analyses that revealed enriched functional classes of DEGs, and putative regulatory networks (Regulons).
Comparison of the Wistar RNA-seq data with our Sprague-Dawley microarray data 17 revealed a core set of common responsive genes that are presumably crucial for the orchestration of SON functionrelated plasticity in the rat.

| Animals
All experiments were performed under a Home Office UK licence held under, and in strict accordance with, the provisions of the UK Animals (Scientific Procedures) Act (1986); they had also been approved by the University of Bristol Animal Welfare and Ethical Review Board. We choose to use male Wistar Han rats from the international standardisation programme (IGS) in our study (Charles River, Portishead, England). This carefully managed breeding program minimises the impact of genetic drift so that colonies bred in different locations around the world are not significantly divergent from each other giving a level of continuity and reproducibility in studies performed in laboratories worldwide. Twelve rats aged 11 weeks were purchased from Charles River for this study. Rats were housed in groups of three for 2 weeks under a 14:10 hour light/dark photocycle (lights on 5.00 am) at 22°C and 50%-60% relative humidity (v/v) with food and water available ad lib. Cages contained sawdust, bedding material and cardboard tubing for enrichment.
Rat cages were randomly assigned to two groups: control (free access to drinking water) and dehydrated (removal of drinking water for 72 hours). All rats were humanely killed by striking of the cranium (stunning) and then immediately decapitated with a small animal guillotine (Harvard Apparatus, Holliston, MA, USA). Trunk blood was collected in heparin-coated tubes and centrifuged at 1600 g for 15 minutes at 4°C. Plasma osmolality was measured by freezing point depression using a Roebling micro-osmometer (Camlab, Over, UK). Plasma for AVP measures was collected in 1-mL aliquots and snap frozen in liquid nitrogen before being stored at −80°C. Brains were rapidly removed from the cranium and placed into a chilled rat brain matrix for separation of the forebrain from the hindbrain. The forebrain was placed cut edge down onto aluminium foil resting on pellets of dry ice and covered with powdered dry ice (within 3 minutes of stunning). Animal experiments were performed between 10.00 am and midday.

| Plasma AVP measures
Trunk blood was extracted from 1 mL of plasma. Two sample volumes of ice-cold acetone were added and samples were vortexed for 1 minute. Protein precipitates were removed by centrifugation at 2500 g for 25 minutes at 4°C. The supernatant was transferred to a new tube and mixed with 2 mL of cold petroleum ether by vortexing for 1 minute. The tubes were left to stand for 1 minute at room temperature before discarding the upper phase. The lower phase solution was lyophilised using a freeze dryer (Benchtop Pro; Biopharma, Winchester, UK). Plasma AVP concentrations were determined by specific radioimmunoassay. 34

| RNA-seq
Bilateral punches of the SON were collected from 12 coronal slices with a 1 mm sample corer (Fine Scientific Tools, Heidelberg, Germany) using the optic chiasm as a reference as described. 23 The microtubes containing SON punches were maintained on dry ice prior to re-suspension by continuous vortexing for 1 minute in 400 µL of QIAzol lysis reagent (Qiagen, Valnecia, CA, USA). After a 10-minute incubation at room temperature, debris was pelleted by centrifugation at 12 000 g for 3 minutes. Then, 350 µL was removed and mixed with an equal volume of absolute ethanol. This mix was applied to a Zymo-Spin IIC column and total RNA was extracted

| RNA-seq data mining
RNA-seq alignment and subsequent data analysis were all performed in house using our high-performance computer; "Hydra" (PowerEdgeR820 12 core supercomputer equipped with 512 GB RAM and 12 × 1 TB HDD; Dell, Round Rock, TX, USA).
RNA-seq reads were processed using RTA and CASAVA from Illumina's suite of sequencing software. This produced a series of compressed FASTQ files per library. All raw reads were merged per library, and pre-processed for quality assessment, adaptor removal, quality trimming and size selection using the FastQC toolkit to generate quality plots for all read libraries. 35 We adopted a phred30 quality cutoff (99.9% base call accuracy).
Our pipeline accepts RNA-seq post-trimmed data as input, before ultimately producing output tables of differentially expressed transcripts. Paired-end (2 × 75 bp) raw input data is initially aligned with STAR 36 to the sixth iteration of the Rattus norvegicus reference genome (Rn6). FeatureCounts is used to generate read counts for each gene present in the Rn6 genome annotation. 37 Our pipeline then uses the DEseq2 (v1.28.1) package in R to call DEGs. 38 This allows us to predict DEGs with high confidence, and to utilise the predictions with low P-values in our downstream validation.
Benjamini-Hochberg correction was used for multiple comparison.
The analysis was sufficiently powered (n = 5) to reduce the false discovery rate, and to enable systems level analysis. 39 DEGs with P adjusted (P Adj ) values of < 0.05 are considered significant. ClusterProfiler. 50 Over-representation analysis was carried by filtering genes for average expression in normalised reads across all samples (baseMean) > 10, P Adj < 0.05 and log2 fold-change (log2FC) = 0.5. All expressed genes filtered for baseMean > 10 were used as the background in the over-representation analysis and as the input gene list in the gene-set enrichment analysis (GSEA). 51 Benjamini-Hochberg correction (P Adj < 0.05) was used for multiple comparison correction for the pathway analysis.
Transcriptional regulatory network (Regulon) analysis was carried out using RTN (version 2.12.1) package in R. 52,53 A list of transcription factors screened for regulatory network enrichment in the SON was generated by annotating all expressed genes for associated GO terms and filtering the unique results for matching 'DNA-binding transcription factor activity' (GO:0003700) term. Additionally, a list of validated human transcription factor orthologue genes 54 was also screened for regulon enrichment. This has led to the identification of a single additional regulon (Rbck1) in the Wistar SON.
All derived gene expression data was visualised using custom scripts written in R. 55

| Experimental protocol
Our experimental protocol is summarised in Figure 1. Euhydrated (control) rats (n = 5) were compared with rats subjected to 72 hours of complete fluid deprivation (dehydration) (n = 5). As expected, dehydration resulted in a significant decrease in body weight and a significant increase in plasma osmolality and plasma AVP content (see Supporting information, Table S2). We sequenced the polyadenylated SON transcriptomes from these animals. Reads were processed using a bespoke RNA-seq pipeline as described in detail in the Materials and methods. A complete catalogue of expressed genes and DEGs can be found in the Supporting information (Table S3).

| The basal euhydrated transcriptome of the rat SON
In total, 22 219 genes are predicted to be expressed in the SON region (no base mean cut-off). The basal transcriptome of the eu-  (1) Adult male Wistar rats were divided into two experimental groups: one with free access to standard food and water and another with free access to standard food but water deprived for 72 hours. Rats were killed, brains were collected, rapidly frozen and stored at −80°C until processed. (2) Using a cryostat, brains were sliced at 60 µm and the SON was collected by using a 1-mm punch. (3) Total RNA was extracted and purified and its purity and integrity were analysed. (4) Libraries were generated from polyadenylated RNA using Illumina TruSeq Stranded mRNA kits. Sequencing was performed on an Illumina NextSeq platform. Generated reads were processed using a bespoke RNA-seq pipeline.  Figure S8) and 144 other pharmacological targets (see Supporting information, Figure S9). Figure 3 summarises the top 5% most highly expressed genes in each of these different categories.

| Transcriptome dynamics in the SON following dehydration
Differential expression analysis using DEseq2 identified a total of 2247 DEGs, 924 of which were down-regulated and 1323 were upregulated ( Figure 4A). Notably, the majority of genes (19 912) with non-zero expression values did not breach the statistical significance threshold (Benjamini-Hochberg correction, P Adj ≤ 0.05) ( Figure 4A).

Principal component analysis using all genes revealed distinct separation between the transcriptomes of control and dehydrated
samples. Principal component 1 (PC1) explained 51% of total variance between the samples and was attributable to the experimental condition ( Figure 4B). Samples separated based on experimental condition following hierarchical clustering of all identified DEGs, highlighting a distinct expression phenotype in dehydration ( Figure 4C). Figure 5A presents the top 100 most significant DEGs (P Adj < 5.01 × 10 -15 ) ordered by fold-change in dehydration (relative expression). Next, we selected all DEGS (P Adj < 0.005, n = 2247) and highlighted the top 100 genes with the greatest expression fold-change in dehydration (relative expression) ( Figure 5B). This highlighted significant genes with the greatest response to dehydration disregarding the underlying variation between the samples.
We observed that log2FC (relative expression in dehydration compared to the control) tends to favour lowly expressed transcripts over the abundant genes ( Figure 5B). To account for this, we selected the top 100 genes with the greatest expression change favouring highly expressed genes. Genes were filtered for differential expression (P Adj < 0.005) and the difference between the average normalised read counts between the control and dehydrated conditions (delta) was calculated. The top 100 genes with the greatest difference (delta) in dehydration are shown in Figure 5C and listed in the Supporting information (Table S4). This comparison is inherently biased towards highly expressed transcripts and rep-  Figure 6G) and six additional pharmacological targets ( Figure 6H). These genes are promising candidates mediating the function related plasticity of the SON. 14-16

| GO and pathway analysis of SON dehydration DEGs
To identify gene categories, pathways and networks that might be functionally involved in the osmoregulatory plasticity of the SON, we performed pathway analysis on DEGs identified in response to dehydration using GO, KEGG and Reactome databases. Cckbr (log2FC = 1.51; P Adj = 1.24 × 10 -41 ) genes associated with both the 'Hormone binding' and 'G protein-coupled receptor activity' terms ( Figure 7A; see also Supporting information, Table S5).
Moreover, genes related to polymerase-II related transcription ac-  . We observed that log2 fold-change (relative expression in dehydration compared to the control) favours lowly expressed transcripts over the abundant genes (B) thus we selected the top 100 genes with the greatest expression change favouring highly expressed genes. For this, genes were filtered for differential expression (P Adj < 0.05) and the difference between the average normalised read count between the control and dehydrated conditions (delta) was calculated. The top 100 genes with the greatest delta values were selected. This represents an expression change in dehydration measured by absolute units (uniquely mapped reads) rather than relative change from the control condition. This analysis highlights the greatest expression changes in dehydration of highly expressed transcripts  (Table S5).

F I G U R E 6
Gene expression changes in the rat supraoptic nucleus (SON) as a consequence of dehydration categorised according to functions as transcription factors or pharmacological classification. The table summarises the total number of statistically significant regulated genes (P Adj < 0.05), the number of genes with fold change (FC) bigger than 25% (P Adj < 0.05; FC > 25%) and the number of genes with fold change bigger than 50% (P Adj < 0.05; FC > 50%). The graphs present the 567 genes with a fold change greater than 25%. Data are presented as the power of 2 of the mean of normalised counts ± SD. GPCRs, G protein coupled receptors; WD, water deprived.  (Table S5) Figure 10B; see also Supporting information, Table S6). To as-   Table S3).

F I G U R E 7
To visualise the change in gene expression of the enriched regulons in dehydration, we plotted the log2FC values of genes comprising the regulatory networks for each regulon ( Figure 9D). All  Table S7).
We note that the largest regulon, Tcf25, contained two other TFs  Table S7). Interestingly, the pathway analysis on the Tcf25, Hes6 and Sin3a regulatory networks revealed all three TFs to be associated with eukaryotic translation (see Supporting information, Table S7). This coincides with the pathway analysis results that 'Structural constituent of ribosome' and 'Ribosome' pathways

| Common dehydration genes
To reveal the core set of gene transcripts differentially expressed in the SON as a consequence of dehydration, we compared the newly Wistar -RNA-seq). The bioinformatics pathway used to enable this comparison is illustrated in the Supporting information ( Figure S10).
Overall, fewer DEGs were identified in Sprague-Dawley (n = 1244; P Adj < 0.005) compared to Wistar (n = 2259; P Adj < 0.05) rats ( Figure 10A). To further compare the Wistar and SD datasets, we plotted DEGs identified in each experiment (P Adj < 0.05) according to their expression change (log2FC) ( Figure 10B; see also Supporting information, Figure S11). The RNA-seq approach detected more small expression changes compared to microarray as the greatest difference between the datasets was in genes with expression change values of 0.5 < log2FC < −0.5 ( Figure 10B). In addition, the Next, we analysed the overlap between the Wistar and Sprague-Dawley datasets and found 504 genes to commonly change their expression in response to dehydration ( Figure 9A). We suggest these genes to represent a core set of genes that change their expression in the rat hypothalamic SON in response to water deprivation. The majority of common DEGs had low expression change values (0.5 < log2FC < −0.5) ( Figure 10B,D) and a high confidence of DEG calling (P Adj < 0.05) ( Figure 10C; see also Supporting information, Figure S11C). Interestingly, a much greater number of common DEGs were up-regulated in dehydration (422 up-regulated vs 82 down-regulated genes) ( Figure 10B,D). Common DEGs showed a high degree of correlation in their expression change in response to dehydration (Spearman rho = 0.78; P = 2.2 × 10 -16 ) ( Figure 10C).
Notably, several of common DEGs had an opposite direction of expression change ( Figure 10C). Nine transcripts (Fam180a, Svil, Slc6a20, Mrvi1, Plscr1, Cavin2, Gramd2b, Flnb and Des) were detected to be down-regulated in SD rats, whereas it was up-regulated in the Wistar rats. Only two transcripts (Wdr6, Rbfox1) were detected to have an opposite mismatch of expression change. Lastly, we used a volcano plot to visualise common DEGs in dehydration ( Figure 10D). we outlined how many of the common DEGs were among the genes with the greatest expression change in response to dehydration as shown in Figure 5 (see also Supporting information, Figure S11D,E).
A full list of DEGs common and unique to each analysis can be found in the Supporting information (Table S8). Many of these genes have been previously validated using either a quantitative reverse transcription polymerase chain reaction, or by in situ hybridisation, in Sprague-Dawley and Wistar rats ( Table 1). The remaining 95% of genes have been defined as the ignorome. 64 Stoeger et al 66 investigated the reasons why potentially important genes are ignored. They found that "biological research is primarily guided by a handful of generic chemical and biological characteristics of genes, which facilitated experimentation during the 1980s and 1990s, rather than the physiological importance of individual genes". 65 In our view, this convergence is detrimental to science and hinders discovery. 66 This work was intended to serve as a resource for the diverse neuroendocrine research community, persuading it to acknowledge the unexplored frontiers of the HNS. We hope that the revelation of a core set of genes and molecular pathways governing the osmotic response, many with no characterised roles in the SON, will encourage the neuroendocrine community to pursue new investigations into the 'known-unknowns' presented in this study.

| RNA-seq
We sequenced the polyadenylated transcriptomes of the SON, thus focussing, in the main, on transcripts (mRNAs) that code for proteins.
Most non-coding RNAs will have been excluded from our analysis. In the present study, we have focussed on the transition from euhydration to chronic dehydration. It would be of great interest to determine how the transcriptome changes when animals are allowed to recover during a period of rehydration.

| Cellular heterogeneity
Although every care was taken to accurately punch the SON from frozen sections, it is inevitable that some surrounding tissue will be included. This will increase the noise of the analysis and may introduce signal that is not pertinent to the SON. Furthermore, the bulk SON sequencing carried includes transcriptome information from every cell type in this nucleus, not only the oxytocinergic and vasopressinergic MCNs, but also the surrounding glia, microglia, some interneurones, and vessels and the blood therein. We await with great interest the inevitable single cell RNA-seq analysis of the euhydrated and dehydrated SON, which will be highly informative regarding the transcriptomic responses of these different cell types. Furthermore, it is to be expected that the MCNs themselves will exhibit an intrinsic diversity with respect to basal gene expression patterns and responses to physiological cues that will have functional implications.

| GO analysis
GO and pathway analysis is limited by the fact that most genes are not annotated with functional GO and pathway information. The GO, as the name implies, is an unordered and unstructured set of genes associated with a specific biological process or mechanism.
Although inherently sensitive at detecting any type of enrichment, gene-set (ontology) analysis results are sometimes difficult to be put into biological context. By contrast, the KEGG and Reactome databases contain information on the interaction of genes, gene products or metabolites associated with a specific mechanisms or pathways. Structured into pre-defined and directional pathways, and based on prior known gene interactions, pathway analysis may struggle at identifying enrichment if the genes affected by the treatment are not previously known to interact.
Enrichment analysis is also dependent on the probability based (ORA) and functional class scoring (GSEA) classes of methods. 80 Although sensitive and widely used, ORA is biased and depends heavily on the criteria used to select the differential expression

| Regulon analysis
Prediction of transcriptional regulatory networks requires a defined list of transcription factors that are screened against the gene-expression profile of a given sample. In our analysis, we identified expressed transcription factors using GO annotations. However, it should be noted that the identification of transcription factors is based on binding site predictions and not always followed by functional characterisation.
Furthermore, regulons analysis was carried out on a limited number of samples used in the experiment (n = 5). Although the sample size was suitable for the differential expression analysis, experimental power of the regulon analysis could be further improved by a larger sample size.
Lastly, all identified transcription factors representing a transcriptional network had only positive interactions with their regulatory genes and were relatively small in size. This represents a limitation for the twoway GSEA analysis that was able to assess only unidirectional enrichment associated with the TF expression.

CO N FLI C T O F I NTE R E S T S
The authors declare that they have no conflicts of interest.

PE E R R E V I E W
The peer review history for this article is available at https://publo ns.com/publo n/10.1111/jne.13007.

DATA AVA I L A B I L I T Y S TAT E M E N T
In addition to the processed data reported in the manuscript, all raw data are available from the corresponding author upon reasonable request.