Set2‐mediated H3K36 methylation states redundantly repress the production of antisense transcripts: role in transcription regulation

Methyltransferase Set2-mediated methylation of histone H3 lysine 36 (H3K36), which involves the addition of up to three methyl groups at this site, has been shown to function in many chromatin-coupled events. H3K36 methylation recruits different chromatin effector proteins, affecting transcription, mRNA splicing and DNA repair. In this study, we engineered two yeast set2 mutants that lack H3K36 mono/dimethylation (H3K36me1/2) and trimethylation (H3K36me3), respectively, and characterized their roles in the production of antisense transcripts under nutrient-rich conditions. Using our new bioinformatics pipeline, we identified a larger number of antisense transcripts in set2Δ cells than previously published. We further show that H3K36me1/2 and H3K36me3 act redundantly to repress the production of antisense transcripts. Moreover, gene ontology (GO) analysis implies that H3K36me3-mediated antisense transcription might play a role in DNA replication and DNA damage repair, independently of regulation of the corresponding sense gene expression. Overall, our results validate a coregulatory mechanism of different H3K36 methylation states, particularly in the repression of antisense transcription.

Antisense transcripts are a class of long noncoding RNAs that originate from the strand opposite to the sense transcripts of protein-coding or non-protein-coding genes. Antisense transcripts have been discovered in both bacteria [1] and eukaryotes [2]. In the human genome, more than 30% of annotated mRNAs have antisense transcripts [3]. However, antisense transcripts are much less abundant than sense transcripts because of extensive RNA degradation pathways, which makes them hard to detect. Genomic methods developed over the years have allowed researchers to identify numerous novel antisense transcripts and to show that they are widespread throughout the genomes of various species [4][5][6].
In Saccharomyces cerevisiae, antisense transcripts are usually associated with cryptic transcription from cryptic promoters [7]. Many cryptic transcription events arise when the transcription machinery makes errors during transcription [8][9][10][11]. Set2-mediated histone H3K36 methylation has been shown to play an important role in this process, through mechanisms such as suppressing histone hyperacetylation and preventing histone mislocalization, thereby inhibiting the production of antisense transcripts [12]. Recent studies have shown that mutations in core proteins involved in transcription or chromatin remodelling change chromatin structure and thereby allow cryptic transcription. For example, loss of ISW1 or IOC4 causes incorrect nucleosome positioning, which prevents the Rpd3S complex from binding nucleosomes normally and leads to increased histone acetylation and cryptic transcription [12][13][14]. These results imply that Set2-mediated H3K36 methylation plays an important role in cryptic transcription. Indeed, a novel group of Set2-repressed antisense transcripts has been identified upon deletion of SET2 [15]. Recently, DiFiore et al. further revealed roles of different H3K36 methylation states in antisense transcription upon nutrient deprivation [16]. Whether different H3K36 methylation states affect the production of antisense transcripts genome-wide under nutrient-rich conditions remains unclear.
Here, we established two set2 mutants that specifically lack H3K36me1/2 or H3K36me3. These mutants provide a useful tool to analyse the effect of different methylation states on the production of antisense transcripts, and led to the identification of a large number of antisense transcripts repressed by Set2. Gene expression and GO analyses showed that the expression of these antisense transcripts generally did not affect the expression of the corresponding sense genes. Moreover, loss of H3K36me1/2 or of H3K36me3 had highly consistent effects on antisense transcript expression, indicating a redundant regulatory mechanism. Our study provides further evidence that H3K36 methylation states redundantly repress the production of antisense transcripts under nutrient-rich conditions.

Materials and methods
Yeast strains and plasmids
BY4741 background yeast strains used in this study include wild-type, set2Δ expressing the pRS415 empty vector (LEU2 selectable marker), set2Δ expressing set2-Y149F (pRS415 vector) and set2Δ expressing set2-Y236F (pRS415 vector). Yeast cells were grown at 30°C in YPD medium (1% yeast extract, 2% peptone and 2% dextrose) or SD medium (0.67% yeast nitrogen base without amino acids, supplemented with appropriate amino acids and 2% glucose). Set2 point mutations were generated by site-directed mutagenesis (Invitrogen).

mRNA extraction and strand-specific RNA sequencing
Yeast cells were grown overnight in YPD or SD medium at 30°C, diluted to an OD600 of 0.2 and collected when the OD600 reached 0.8. Cells were harvested by centrifugation at 3000 r.p.m. for 5 min, and the cell pellets were washed twice with 10 mL sterile water. Total RNA was extracted with the HiPure Yeast RNA Kit (Magen, Guangzhou, China, R4182-02). RNA purity and yield were examined with a NanoDrop One spectrophotometer (Gene Company), and RNA integrity was examined by agarose gel electrophoresis.
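Diluting to OD600 0.2 and harvesting at 0.8 corresponds to exactly two population doublings. The minimal Python sketch below shows that arithmetic together with the C1V1 = C2V2 dilution step. The overnight OD600 of 4.0, the 50-mL culture volume and the 90-min doubling time are illustrative assumptions, not values from the text, and the calculation assumes exponential growth and a linear relationship between OD600 and cell density.

```python
import math


def dilution_volume_ml(od_overnight: float, od_target: float, final_volume_ml: float) -> float:
    """Volume of overnight culture needed to reach od_target in final_volume_ml (C1*V1 = C2*V2)."""
    return od_target * final_volume_ml / od_overnight


def doublings(od_start: float, od_end: float) -> float:
    """Population doublings between two OD600 readings, assuming exponential growth."""
    return math.log2(od_end / od_start)


def expected_hours(od_start: float, od_end: float, doubling_time_min: float) -> float:
    """Approximate time from od_start to od_end at a constant (assumed) doubling time."""
    return doublings(od_start, od_end) * doubling_time_min / 60.0


if __name__ == "__main__":
    # Illustrative values only: overnight OD600 = 4.0, 50 mL culture, assumed 90-min doubling time
    volume = dilution_volume_ml(4.0, 0.2, 50.0)
    n = doublings(0.2, 0.8)
    hours = expected_hours(0.2, 0.8, 90.0)
    print(f"dilute {volume:.1f} mL; {n:.1f} doublings; about {hours:.1f} h")
    # prints "dilute 2.5 mL; 2.0 doublings; about 3.0 h"
```

Because the protocol harvests by measured OD600 rather than elapsed time, any strain whose doubling time differs from the assumed value still reaches the same growth stage.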
For mRNA preparation, 20 µg of total RNA was diluted in 50 µL of DEPC-treated water and incubated with 50 µL of mRNA capture beads as described in the product manual (Vazyme, Nanjing, China, N401-01). The enriched mRNA was then used for strand-specific library construction with the KAPA Stranded RNA-Seq Library Preparation Kit (KAPA, KK8401), which uses the dUTP method. The libraries were sequenced on a NextSeq CN500 instrument, generating a total of ~6 GB of 75-bp single-end reads.
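In dUTP libraries the dUTP-marked second cDNA strand is not amplified, so a single-end read corresponds to the first-strand cDNA and maps to the strand opposite the RNA it came from. The sketch below encodes that rule with the standard SAM FLAG bits 0x4 (unmapped) and 0x10 (reverse complemented). It is a minimal illustration assuming single-end dUTP reads, not the authors' code, and it ignores secondary and supplementary alignments.

```python
from typing import Optional

FLAG_UNMAPPED = 0x4  # SAM FLAG bit: segment unmapped
FLAG_REVERSE = 0x10  # SAM FLAG bit: SEQ is reverse complemented


def rna_strand_dutp_se(flag: int) -> Optional[str]:
    """Return '+' or '-' for the RNA strand of a single-end dUTP read, or None if unmapped.

    Assumption: single-end reads from a dUTP stranded library.
    """
    if flag & FLAG_UNMAPPED:
        return None
    read_strand = "-" if flag & FLAG_REVERSE else "+"
    # The read is the first-strand cDNA, i.e. complementary to the RNA
    return "+" if read_strand == "-" else "-"


if __name__ == "__main__":
    for flag in (0, 16, 4):
        print(flag, rna_strand_dutp_se(flag))
    # 0 -
    # 16 +
    # 4 None
```

On the command line, `samtools view -f 16` keeps only reverse-strand alignments and `samtools view -F 16` drops them, which is one way to split a BAM file by read strand in the spirit of the SAMtools filtering step of this pipeline.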

Identification of antisense transcripts
The reads obtained from each sample were mapped to the yeast genome sacCer3 (https://www.yeastgenome.org/) with TopHat (version 2.1.1), and the resulting BAM file for each sample was used for subsequent processing. For SET2 deletion mutants, the BAM file was divided into forward- and reverse-strand reads with BEDTools (version 2.29.2); each strand file contained both reads from existing annotated transcripts and antisense reads in the opposite orientation. SAMtools (version 1.10) was used to extract antisense reads from both the forward and reverse strands on the basis of SAM flags, and BEDTools was used to obtain reads from intergenic regions. Finally, the antisense reads from the forward strand, the reverse strand and the intergenic regions were merged into an integrated BAM file. Antisense reads were assembled without reference genome annotation in Cufflinks (version 2.2.1). Expression levels of antisense transcripts were then analysed with Cuffdiff using the antisense transcript annotation file, with a significance threshold of P < 0.05.
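The read-partitioning logic of this pipeline can be sketched in a few lines of Python. The sketch assumes each read has already been assigned the strand of its source RNA, uses 0-based half-open coordinates, and applies a simplified, hypothetical rule: a read overlapping a same-strand annotated gene is sense, one overlapping only opposite-strand genes is antisense, and one overlapping no gene is intergenic. The gene and read coordinates are invented for illustration.

```python
from collections import Counter
from typing import List, Tuple

# (chromosome, start, end, strand); 0-based, half-open intervals
Interval = Tuple[str, int, int, str]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two intervals lie on the same chromosome and share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def classify_read(read: Interval, genes: List[Interval]) -> str:
    """Label a read 'sense', 'antisense' or 'intergenic' relative to annotated genes (simplified rule)."""
    hits = [g for g in genes if overlaps(read, g)]
    if not hits:
        return "intergenic"
    if any(g[3] == read[3] for g in hits):
        return "sense"
    return "antisense"


if __name__ == "__main__":
    genes = [("chrIV", 100, 500, "+"), ("chrIV", 800, 1200, "-")]  # hypothetical genes
    reads = [
        ("chrIV", 150, 225, "+"),    # same strand as gene 1
        ("chrIV", 150, 225, "-"),    # opposite strand of gene 1
        ("chrIV", 600, 675, "+"),    # between the two genes
        ("chrIV", 1100, 1175, "+"),  # opposite strand of gene 2
    ]
    labels = [classify_read(r, genes) for r in reads]
    print(labels)  # ['sense', 'antisense', 'intergenic', 'antisense']
    # Antisense and intergenic reads are the ones pooled for de novo assembly
    print(Counter(labels))
```

A genome-scale pipeline would use an interval index instead of this linear scan, for example `bedtools intersect` with its `-s`/`-S` strandedness options.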

Differentially expressed genes and gene ontology analysis
Data were analysed in R (version 3.5.0). Differential expression analysis was performed with edgeR (version 3.30.3), and gene ontology analysis with clusterProfiler (version 3.16.0). An adjusted P-value < 0.05 was used for term ranking and selection. All plots were created with ggplot2 in R.
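GO over-representation analysis of the kind run by clusterProfiler scores each term with a one-sided hypergeometric test and then corrects for multiple testing. The Python sketch below shows both steps using Benjamini-Hochberg adjustment, which is clusterProfiler's default `pAdjustMethod`; the text does not state which adjustment was used, and all gene sets in the example are hypothetical.

```python
from math import comb
from typing import List, Set


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) when drawing n genes from N, of which K carry the term."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)


def term_pvalue(study: Set[str], term: Set[str], background: Set[str]) -> float:
    """One-sided over-representation p-value of one GO term in a study gene list."""
    study = study & background
    term = term & background
    return hypergeom_sf(len(study & term), len(background), len(term), len(study))


def benjamini_hochberg(pvals: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    for rank in range(m, 0, -1):  # walk from the largest p-value down
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    background = {f"G{i}" for i in range(100)}   # hypothetical gene universe
    term_a = {f"G{i}" for i in range(10)}        # hypothetical GO term
    term_b = {f"G{i}" for i in range(50, 80)}    # another hypothetical term
    study = {f"G{i}" for i in range(5)} | {"G50", "G60", "G90"}
    raw = [term_pvalue(study, t, background) for t in (term_a, term_b)]
    print(raw, benjamini_hochberg(raw))
```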

Northern blotting
Northern blotting was carried out using strand-specific DIG-labelled probes as described previously [20]. Briefly, pairs of DNA primers containing a T7 promoter on either the sense strand or the antisense strand were used for PCR amplification to produce ~500-bp DNA templates. 200 ng of DNA template was used to generate DIG-labelled RNA probe in an in vitro transcription reaction with the HiScribe T7 High Yield RNA Synthesis Kit (NEB, #E2040S) and digoxigenin-11-UTP (Roche, Cat. No. 12039672910). The primers used to prepare the riboprobes are listed below: PSP2-F:
Strand-specific DIG-labelled northern blotting was carried out with 30 µg total RNA. Briefly, the RNA was mixed with an equal volume of 2× RNA loading buffer, denatured at 65°C for 10 min and placed on ice for 1 min. It was then separated on a 1.2% agarose/formaldehyde (2.2 M) gel run in 1× MOPS buffer (20 mM MOPS, pH 7.0, 5 mM sodium acetate and 2 mM EDTA in DEPC-treated water). After electrophoresis, the gel was visualized under ultraviolet light and washed three times with distilled water to remove formaldehyde. The gel was then rinsed twice in 20× SSC (3 M NaCl and 300 mM sodium citrate, pH 7.0) for 15 min and transferred to a positively charged nylon membrane. The blot was UV cross-linked for 5 min and prehybridized with 2 mL DIG Easy Hyb Granules (Roche, Switzerland, Cat. No. 11 796 895 001) at 68°C for 1 h. The DIG-labelled RNA probe was heated at 100°C for 5 min, added to the buffer and incubated with the blot at 68°C overnight. The blots were then washed twice in 2× SSC and 0.1× SSC for 15 min. After hybridization and stringency washes, the membrane was rinsed briefly for 1-5 min in washing buffer (0.1 M maleic acid, 0.15 M NaCl, pH 7.5, and 0.3% Tween-20). The blot was then incubated for 30 min in 10 mL blocking solution and for 30 min in 10 mL antibody solution (anti-digoxigenin-AP, 1:10,000; Roche, Cat. No. 12039672910). After two 15-min washes in washing buffer, the blot was equilibrated in detection buffer (0.1 M Tris/HCl and 0.1 M NaCl, pH 9.5), developed with CDP-Star and imaged for 1000 s.
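Strand specificity of the riboprobes depends on which primer carries the T7 promoter. T7 RNA polymerase transcribes away from the promoter, so a promoter on the forward (sense-strand) primer yields sense-strand RNA, which hybridizes to antisense transcripts, whereas a promoter on the reverse primer yields antisense RNA, which detects the mRNA. The sketch below encodes that rule. The promoter tail `TAATACGACTCACTATAGGG` is a commonly used T7 sequence assumed here because the primer list is not reproduced in the text, and the amplicon and primer sequences are hypothetical.

```python
from typing import Tuple

# Commonly used T7 promoter tail (assumption: the actual primer sequences are not given in the text)
T7_PROMOTER = "TAATACGACTCACTATAGGG"
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def riboprobe_primers(fwd: str, rev: str, detect: str) -> Tuple[str, str]:
    """Attach the T7 promoter to the primer that yields a probe for the requested RNA.

    fwd is written 5'->3' on the sense strand, rev 5'->3' on the antisense strand.
    detect='sense' probes the mRNA; detect='antisense' probes the antisense transcript.
    """
    if detect == "sense":
        return fwd, T7_PROMOTER + rev   # runoff RNA is antisense, complementary to the mRNA
    if detect == "antisense":
        return T7_PROMOTER + fwd, rev   # runoff RNA has the sense-strand sequence
    raise ValueError("detect must be 'sense' or 'antisense'")


def probe_rna(amplicon_sense: str, detect: str) -> str:
    """Gene-specific part of the T7 runoff transcript, written as RNA."""
    body = amplicon_sense if detect == "antisense" else reverse_complement(amplicon_sense)
    return body.replace("T", "U")


if __name__ == "__main__":
    amplicon = "ATGGCTAGCAAGGTTCCA"              # hypothetical sense-strand amplicon (real ones ~500 bp)
    fwd = amplicon[:9]                           # hypothetical forward primer
    rev = reverse_complement(amplicon)[:9]       # hypothetical reverse primer
    print(riboprobe_primers(fwd, rev, "antisense"))
    print(probe_rna(amplicon, "antisense"))
```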

Results
Phe/Tyr switch mutations in Set2 generate a tool to distinguish different H3K36me states
In 2003, Xiao et al. demonstrated that mutating tyrosine 245 of SET7/9 to alanine or phenylalanine converts the enzyme from an H3K4 monomethyltransferase into one with dimethylation or trimethylation activity [21]. Protein structural analyses showed that Phe/Tyr switches at certain positions in some methyltransferases can affect their enzymatic activities [22,23]. Recently, DiFiore et al. found that a Phe/Tyr switch in the SET domain of Set2 separates H3K36me states both in vivo and in vitro [16]. The SET domain, approximately 130 amino acids in length, is named after three Drosophila proteins, Su(var)3-9, Enhancer of zeste and Trithorax, and is evolutionarily conserved in all eukaryotes [24]. Notably, the SET domain is responsible for the catalytic activity of SET domain-containing proteins [24].
Our group independently discovered that engineered Set2 mutants can produce different methylation states of H3K36, which allows us to explore their unique or shared functions. The primary sequences of the SET domains of yeast Set2 and human SETD2 are relatively conserved (Fig. S1A), and structural modelling indicated that the SET domain of yeast Set2 is very similar to that of human SETD2 (Fig. S1B). We therefore used the SET domain structure of yeast Set2 for structural prediction. Through structural analysis as described previously [23], two tyrosine residues in the SET domain (Y149 and Y236) emerged as key residues affecting H3K36 methylation by altering the lysine-binding pocket (Fig. 1A-B). We mutated Y149 and Y236 to phenylalanine, generating set2-Y149F and set2-Y236F, respectively, and examined their catalytic activities in vivo. Consistent with previous results [16], the set2-Y149F mutation abolished H3K36 dimethylation and reduced H3K36 monomethylation but maintained H3K36me3, whereas the set2-Y236F mutation abolished H3K36me3 but maintained H3K36me1/2 (Fig. 1C). Thus, these two Tyr/Phe switch mutations provide a useful tool to study the roles of different H3K36 methylation states in antisense transcription.

Identification of antisense transcripts repressed by Set2
To identify the antisense transcripts produced in set2Δ, set2-Y149F and set2-Y236F cells, total RNA was extracted from the wild-type (WT) and mutant strains grown in nutrient-rich medium, and mRNA enriched for polyadenylated (polyA) tails was used to prepare strand-specific libraries for strand-specific RNA-seq. Using our new pipeline, we identified more antisense transcripts than a previous study [15]: in our dataset (GEO: GSE167338), 6670 antisense transcripts arising from a total of 3663 corresponding sense genes were detected upon deletion of SET2, whereas 1179 antisense transcripts from a total of 1001 corresponding sense genes were reported previously (SRA: SRP089706) [15] (Fig. 2A). The antisense transcript-enriched (ATE) genes identified in our experiment covered 85.9% of the Set2-repressed candidate antisense transcript (SRAT) genes found previously [15] (Fig. 2A). When we reanalysed the previously published dataset (SRP089706) [15] with our pipeline, we identified 2476 SRAT-associated sense genes in set2Δ cells, about 2.5 times the 1001 SRAT-associated sense genes identified with the original pipeline (Fig. 2B). Because the previous study used a twofold change and P < 0.05 as thresholds to define polyA SRAT genes, we applied the same thresholds to the ATE genes in our dataset. Based on these criteria, we identified 777 upregulated and 582 downregulated antisense transcripts (Fig. 2C). In contrast, the previous study reported 501 polyA SRAT-enriched sense genes, all associated with upregulated antisense transcripts upon loss of Set2 [15]. Among these selected antisense transcripts, the sense genes corresponding to our downregulated antisense transcripts overlapped by only 2% with the polyA SRATs reported previously [15] (Fig. 2D), whereas approximately 58% of the sense genes associated with upregulated polyA SRATs were also found in our data, consistent with a repressive role of Set2 (Fig. 2E). Intriguingly, reanalysis of the published dataset with our pipeline identified 457 downregulated antisense transcripts, of which only 27% overlapped with the 582 downregulated antisense transcripts in our data (Fig. 2F). The divergence between these two datasets needs to be further explored. Based on our results and previous studies [15,16], we consider it unlikely that the downregulation of antisense transcripts is a direct effect of Set2 loss.
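The twofold-change and P < 0.05 criteria, and the overlap percentages used to compare datasets, can be expressed compactly. The Python sketch below assumes the differential-expression output is available as a log2 fold change and P-value per transcript (the IDs and values are hypothetical) and reports overlap as the percentage of a reference set found in a query set; its last line checks the ratio of 2476 to 1001 SRAT-associated genes quoted above.

```python
import math
from typing import Dict, Set, Tuple


def split_by_threshold(
    results: Dict[str, Tuple[float, float]], min_fold: float = 2.0, max_p: float = 0.05
) -> Tuple[Set[str], Set[str]]:
    """Return (up, down) IDs with |fold change| >= min_fold and P < max_p.

    results maps an ID to (log2 fold change mutant vs WT, P-value).
    """
    cut = math.log2(min_fold)
    up = {k for k, (lfc, p) in results.items() if p < max_p and lfc >= cut}
    down = {k for k, (lfc, p) in results.items() if p < max_p and lfc <= -cut}
    return up, down


def percent_found(reference: Set[str], query: Set[str]) -> float:
    """Percentage of reference IDs that also appear in query."""
    return 100.0 * len(reference & query) / len(reference) if reference else 0.0


if __name__ == "__main__":
    results = {  # hypothetical differential-expression output
        "AS_001": (3.1, 0.001),
        "AS_002": (1.2, 0.20),   # large change but not significant
        "AS_003": (-1.5, 0.01),
        "AS_004": (0.4, 0.001),  # significant but below twofold
    }
    up, down = split_by_threshold(results)
    print(sorted(up), sorted(down))                 # ['AS_001'] ['AS_003']
    print(percent_found({"AS_001", "AS_009"}, up))  # 50.0
    print(round(2476 / 1001, 2))                    # 2.47
```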

Antisense transcripts do not affect the expression of the corresponding sense transcripts
Previous studies have shown that certain antisense transcripts affect transcription of the corresponding sense genes [25]. We therefore asked whether SRATs affect the expression of their corresponding sense genes. We identified 290 upregulated and 150 downregulated genes in set2Δ cells relative to WT (Fig. 3A). The gene density map showed that the expression patterns of sense transcripts were very similar among the indicated strains (Fig. 3B). However, these changes in sense gene expression were not due to the production of the corresponding antisense transcripts: among the 687 genes with upregulated SRATs, very few showed changes in sense gene expression in the set2Δ strain (8 genes upregulated and 6 genes downregulated) (Fig. 3C). These data indicate that SRAT expression from the antisense strand generally does not affect sense transcription under nutrient-rich conditions. We did observe a few genes whose expression may be affected by the production of antisense transcripts: six sense genes were upregulated together with their antisense transcripts (Table 1). Interestingly, the proteins encoded by these genes are related to cell division, cell flocculation and energy metabolism. For example, Fig2 is responsible for maintaining cell wall integrity during mating to ensure high mating efficiency [26]. As a cell wall protein, Flo10 directly participates in adhesive cell-cell interactions during flocculation [27]. Sfl1 is involved in cell surface assembly and the regulation of flocculation [28]. In addition, 8 sense genes were downregulated while their antisense transcripts were upregulated (Table 2). The proteins encoded by these genes are related to the mitotic cell cycle and energy metabolism. For example, Clb6 interacts with Cdc28 to regulate the G1/S transition [29], and the glucose transporter encoded by HXT2 plays an important role in energy metabolism [30]. These results suggest that antisense transcripts might cross-talk with the sense strand to regulate gene expression through an as yet unknown mechanism, which is of particular interest for future study.

Different H3K36 methylation states showed shared roles in repressing antisense transcription
Next, we examined the expression of antisense transcripts in the set2-Y149F and set2-Y236F mutants. Applying the same cut-off as described above, we obtained 553 upregulated antisense transcripts in set2-Y149F, which retains H3K36me3, and 517 upregulated antisense transcripts in set2-Y236F, which retains H3K36me1/2 (Fig. 4A). A total of 476 antisense transcripts were upregulated in both mutant strains, accounting for over 90% of those in set2-Y236F (Fig. 4A). Genome browser profiles showed that antisense transcripts at the YDR452W and YML017W loci were robustly enriched in the set2Δ strain and moderately abundant in the set2-Y149F and set2-Y236F mutants, whereas antisense expression in these coding regions was very low in the WT strain (Fig. 4B). Northern blotting validated the strand-specific RNA-seq results, underlining the reproducible production of SRATs in the different mutants (Fig. 4C). Pearson's correlation analysis of antisense transcript expression in replicate samples showed that the two mutants correlated more strongly with each other than with the set2Δ or WT strain (Fig. S2). Moreover, individual upregulated antisense transcripts identified in the two mutant strains displayed comparable expression levels, which were much lower than those in the set2Δ mutant (Fig. 4D). Together, these data suggest that different H3K36 methylation states redundantly contribute to the repression of antisense transcripts. Previous studies have identified different classes of cryptic transcripts in Saccharomyces cerevisiae, such as cryptic unstable transcripts (CUTs) [4], Xrn1-sensitive unstable transcripts (XUTs) [6] and stable unannotated transcripts (SUTs) [6], by NET-seq. We further investigated whether different H3K36 methylation states uniquely regulate such cryptic transcripts. However, neither deletion of SET2 nor the H3K36 methylation-deficient mutations affected the production of sense transcripts or of these cryptic transcripts (Fig. 4E). Once again, loss of a specific methylation state increased antisense transcript expression, but deletion of SET2 resulted in an even higher level of antisense transcription, reinforcing the coordinated roles of H3K36me1/2 and H3K36me3 in the repression of antisense transcripts (Fig. 4E).
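Sample-to-sample Pearson correlation, as used for the replicate comparison (Fig. S2), can be computed as in the Python sketch below. It assumes expression values such as FPKM for the same antisense transcripts in each sample and applies a log2(x + 1) transform before correlation; the text does not state whether a transform was used, and the sample values are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length, non-constant vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def log2p1(values: Sequence[float]) -> List[float]:
    """Assumed log2(x + 1) transform to damp the influence of highly expressed transcripts."""
    return [math.log2(v + 1.0) for v in values]


def pairwise_correlations(samples: Dict[str, Sequence[float]]) -> Dict[Tuple[str, str], float]:
    """Pearson r for every pair of samples after the log2(x + 1) transform."""
    logged = {name: log2p1(vals) for name, vals in samples.items()}
    return {(a, b): pearson(logged[a], logged[b]) for a, b in combinations(logged, 2)}


if __name__ == "__main__":
    samples = {  # hypothetical FPKM values for the same five antisense transcripts
        "WT": [0.1, 0.2, 0.1, 0.3, 0.2],
        "set2-Y149F": [2.0, 4.1, 1.5, 3.0, 0.9],
        "set2-Y236F": [2.2, 3.8, 1.4, 3.3, 1.0],
        "set2-delta": [9.0, 15.0, 6.5, 12.0, 4.0],
    }
    for pair, r in pairwise_correlations(samples).items():
        print(pair, round(r, 3))
```

A constant vector makes the denominator zero, so transcripts or samples without variation would need to be filtered before this step.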
To explore the potential regulatory roles of antisense transcripts upregulated upon loss of Set2 or of H3K36 dimethylation or trimethylation, we performed GO analysis of their corresponding sense genes. The sense genes with antisense transcripts upregulated upon loss of Set2 were mainly associated with the vacuole, cell division and cell budding (Fig. S3). Interestingly, the top 10 sense genes in both the cellular component and biological process enrichments were consistent (Fig. S3). However, the results suggested that H3K36me3 might mediate unique biological roles. The 77 upregulated antisense transcripts uniquely identified in the set2-Y149F strain were not enriched in any biological process or cellular component, whereas the sense genes of the 41 upregulated antisense transcripts uniquely identified in the set2-Y236F strain, which lacks H3K36me3, were predominantly associated with various kinase activities required for DNA replication and DNA damage repair (Fig. 4F). Altogether, these results imply that such antisense transcripts may directly or indirectly participate in these biological processes, and their actual functions need to be verified in future studies.

Discussion
In this study, we took advantage of a different bioinformatics method to explore a class of antisense transcripts suppressed by different H3K36 methylation states. Using this method to filter antisense reads across the whole genome, we identified more antisense transcripts in the Set2 null mutant than a previous study did [15]. Moreover, we showed that neither H3K36me1/2, retained in the set2-Y236F mutant, nor H3K36me3, retained in the set2-Y149F mutant, is uniquely required for regulating the expression of antisense transcripts, which illustrates a [16]. Using published data from the same laboratory, in which 439 genes had bidirectional cryptic transcripts in set2Δ cells upon nutrient deprivation, they re-evaluated the locations of cryptic initiation sites (CISs) in set2 mutants bearing H3K36me1/2 or H3K36me3 alone [31]. Whereas they validated the CIS positions in these mutants with a yeast reporter system [31], our genome-wide analysis identified a much larger number of antisense transcripts (3663 ATE genes) than their data (Fig. 2A). Moreover, we focused on the final production of antisense transcripts and directly compared their expression levels across different H3K36 methylation states by strand-specific northern blot analysis (Fig. 4C). We therefore argue that our analysis provides an incremental contribution to understanding antisense transcription. We also paid more attention to the functions of the sense genes corresponding to these antisense transcripts under nutrient-rich conditions. Set2-mediated H3K36 methylation has long been reported to be involved in numerous DNA damage repair processes, development and ageing. Interestingly, we found that the sense genes whose antisense transcripts are regulated by H3K36 methylation were also enriched in DNA replication, mitosis or cell budding (Tables 1 and 2).
GO enrichment analysis showed that the sense genes with upregulated antisense transcripts in cells lacking H3K36me3 were mainly enriched for endodeoxyribonuclease or exonuclease activities. These enzymes are often involved in DNA replication or DNA damage repair pathways [32]. Given that upregulated antisense transcripts generally do not affect the expression of the corresponding sense transcripts (Fig. 3), how these antisense transcripts exert biological functions is an intriguing question. One line of evidence suggests that antisense transcripts might regulate sense genes post-transcriptionally [33]. For example, antisense expression controls translational efficiency by affecting which transcript isoform is produced from the zinc-finger E-box-binding homeobox 2 gene (ZEB2), which encodes a transcriptional repressor of E-cadherin [34]. In addition, an antisense transcript regulates the translational efficiency of the ubiquitin carboxy-terminal hydrolase L1 gene (Uchl1) [35]. In bacteria, the antisense transcript SymR can bind directly to the 5' end of the SymE transcript, thereby inhibiting SymE translation [36]. Alternatively, antisense transcription may affect sense gene expression at the transcriptional and cotranscriptional levels under certain stress conditions [33]. For instance, antisense transcripts accumulate at a subset of genes in old yeast cells and are detrimental to lifespan, suggesting that ageing-related genes might be affected under stress [37]. It would be interesting to examine the biological relevance of antisense transcription to the corresponding sense expression using these set2 mutants under various stimulus conditions.
Although the information obtained here is only the tip of the iceberg regarding the mechanism by which H3K36 methylation inhibits antisense transcription, we believe that this work may provide a new tool for characterizing other unique biological functions of different H3K36 methylation states.

Supporting information
Additional supporting information may be found online in the Supporting Information section at the end of the article. Fig. S1. Comparison of the SET domains between yeast Set2 protein and human SETD2 protein. A, Sequence alignment of the SET domains of yeast Set2 protein (a.a. 1-300) and human SETD2 protein (a.a. 1447-1701). The secondary structures were displayed using ESPript 3.0 software. α1, α2: α-helices; η1-η3: 3₁₀-helices; β1-β8: β-sheet; TT: β-turns; TTT: α-turns. B, Structural comparison of the SET domains between human SETD2 (green) and Set2 protein (red).