Broad H3K4me3 domains: Maintaining cellular identity and their implication in super‐enhancer hijacking

The human and mouse genomes are highly complex. Each cell has the same genomic sequence, yet a wide array of cell types exists due to the presence of a plethora of regulatory elements in the non‐coding genome. Recent advances in epigenomic profiling have uncovered non‐coding gene proximal promoters and distal enhancers of transcription genome‐wide. Extension of the promoter‐associated H3K4me3 histone mark across the gene body, known as a broad H3K4me3 domain (H3K4me3‐BD), is a signature of constitutive expression of cell‐type‐specific genes and of tumour suppressor genes in healthy cells. Recently, it has been discovered that the presence of H3K4me3‐BDs over oncogenes is a cancer‐specific feature associated with their dysregulated gene expression and tumourigenesis. Moreover, it has been shown that the hijacking of clusters of enhancers, known as super‐enhancers (SEs), by proto‐oncogenes results in the presence of H3K4me3‐BDs over the gene body. Crosstalk between H3K4me3‐BDs and SEs in healthy and cancer cells therefore represents an important mechanism to consider when identifying future treatments for patients with SE‐driven cancers.

but studies into the covalent addition of acetyl and methyl groups began to unravel a host of histone modifications that positively and negatively impact gene expression. [6,8,9] The annotation of N-terminal histone modifications in the human, mouse and other metazoan genomes has further accelerated the discovery of regulatory regions.
Written into the non-coding genome are distinct promoters upstream of a gene's transcriptional start site (TSS). Gene promoters can be identified by the presence of di- and tri-methylation marks on lysine 4 of histone H3 (H3K4), denoted as H3K4me2 and H3K4me3 respectively, generally spanning 1-2 kb. [3,10] In contrast, mono-methylated H3K4 (H3K4me1) is associated with transcriptional enhancers. [11] The activity of both regulatory elements is associated with high levels of acetylation of lysine 27 on histone H3 (H3K27ac). Some genes are covered by stretched histone regions extending beyond the usual 1-2 kb promoter region. Extended regions of H3K4me2/3 (broad domains) are found to stretch across the gene body upstream and downstream of promoters and are associated with large open DNase I accessible regions, which can span up to 60 kb. [12] The term 'enhancer' was first used to describe the activating effect on gene expression of the polyomavirus simian virus 40 (SV40) DNA. [13] Later, similar properties were also identified in the immunoglobulin heavy chain locus (IGH). [14] Since then, enhancers have been characterised in eukaryotes with genome-wide techniques, including DNase-seq to identify open chromatin linked to activation, and ChIP-seq targeting transcription factors (TFs) and histone marks H3K4me1 and H3K27ac [15] that are associated with enhancer activity. [16] Based on these histone modification signatures, the latest update from ENCODE reported the existence of 668,000 candidate enhancer-like sequences, encompassing roughly 5.6% of the human genome. [17] However, it is important to remember that these chromatin features do not necessarily correlate with bona fide enhancer functions in living cells and in vivo validation is required. [18] Routinely used validation approaches include reporter assays, which link a candidate enhancer sequence to a minimal promoter that drives the expression of a reporter gene of choice in the presence of a functional enhancer. 
[19] This classic method is not high throughput: by validating one sequence at a time, it is time-consuming and not scalable for testing the putative enhancers in the human genome, now estimated to number more than 1 million. [20] Massively parallel reporter assays (MPRAs) provided the solution by allowing thousands of predicted sequences to be tested in a multiplex manner. [21] The transcriptional activity of each sequence is measured through RNA-seq by linking the sequence of interest to a transcribed sequence-based barcode and normalising the RNA-seq data to the corresponding DNA barcode counts obtained by DNA sequencing. Because of these advanced technologies, the term enhancer evolved to define any DNA region with a chromatin profile linked to distal gene regulatory function. In addition, it became apparent that enhancer elements contribute to cell-type-specific variability in gene expression. [22,23] With the rapid characterisation of promoters and enhancers genome-wide, [16,24] researchers identified clusters of enhancers (also named super-enhancers, or SEs) stretching beyond the typical 100-300 bp enhancer and spanning up to 50 kb, [26] defined by open chromatin and extensive binding of TFs. [27,28] These regions were observed to be linked to tissue-specific gene expression patterns. [29] Despite the functional redundancy that clusters of enhancers can offer, allowing one enhancer to be inactivated without severe consequences, [30] different cases of enhanceropathies have been identified [31] (we will cite other reviews in this special issue too), where dysfunction [25] or dysregulation of one enhancer [32] leads to the development of human disease. One such event, found frequently in cancer cells, [33] occurs when an enhancer or a SE is 'hijacked' from its original genomic position by chromosomal translocation [34] or is the target of focal amplification. [35] This subsequently contributes to cancer development and maintenance. 
[35] In this review, we summarise the role of broad activating H3K4 methylation domains in healthy cells and their link with SE hijacking in cancer cells, and postulate on the future direction for the identification of broad H3K4me3 domains, which is critical for understanding oncogene activation and tumorigenic processes.

THE CROSSTALK BETWEEN HISTONE MODIFICATIONS AND TRANSCRIPTION
Histone modifications are in constant flux to tightly regulate chromatin structure and are 'written' by evolutionarily conserved lysine methyltransferases, such as the SET domain-containing proteins to which the human KMT2 (MLL) family belongs [36,37] (Table 1). Lysine acetyltransferases (KATs) such as p300/CBP are responsible for 'writing' acetylation on histones. [38] The functional consequence of H3K4me3, for example, is that it is recognised by 'readers' of chromatin, such as TFIID, which promotes p53-dependent transcription via the preinitiation complex. [39] Other 'readers' of H3K4 include CHD1, [40] NURF, [41] GATAD1, SIN3B and KDM5A (JARID1A). [42] Also contributing to the fine balance of chromatin structure are the 'erasers'.
Aptly named, these proteins remove modifications from histones, and each protein belonging to this group acts on specific residues. For example, KDM1A (LSD1) specifically demethylates H3K4 to transcriptionally repress genes. [43] The KDM5 family of H3K4 demethylases (reviewed in ref. [44]) has been shown to repress transcription but also to activate transcription in a demethylase-dependent and independent manner. [44] KDM5A/B/C and D have all been implicated in cancer; for example, KDM5B was first identified in breast cancer, where tumour suppressor genes (TSGs) are repressed by its activity, thus promoting proliferation. [45] Removal of acetylation is performed by histone deacetylases (HDACs) such as HDAC1/2/3, which disrupts core regulatory TF binding, thus silencing promoters or enhancers. [46] It remains unclear whether histone modifications are a marker for gene transcriptional initiation or are deposited after gene transcription.
The subunit CFP1 of the KMT2F/G (SET1A/B) complex lays down H3K4me3. [47] Research by Thomson et al. looking into the relationship between CpG islands (often found at promoters), DNA methylation and H3K4me3 found that inhibition of the gene cfp1 resulted in significant loss of H3K4me3 at promoters. [48] However, despite the loss of H3K4me3, only minor changes to gene expression levels were observed in mouse embryonic stem cells (mESCs). [48] This suggests that H3K4me3 at promoters might act as a functional memory of transcription but may not instruct it as a blanket rule. This does not discount the strong evidence that H3K4me2/3 found at promoters is a proxy for poised promoters and gene expression, and is likely evidence of context-dependency. [49] Another important factor that complicates the interpretation of these results is the varying specificity and cross-reactivity of antibodies detecting histone marks, as well as normalisation and calibration approaches. [50,51] In addition, other studies have shown that H3K4me3 acts as an inheritable mark that can be passed down through generations of cells. [52] Most of the work to address this question has been carried out in yeast; although this model organism shares SET domain-containing proteins with other eukaryotes, budding yeast have different protein subunits and therefore more studies in different organisms need to be carried out in this area (reviewed in ref. [53]). Recent works used targeted degradation of core subunits of KMT2 H3K4 methyltransferase complexes in mESCs, and their results suggest that H3K4me3 is not required for transcriptional initiation. [54,55] Instead, the data show that H3K4me3 is essential for RNA polymerase II (Pol II) transcriptional pause-release, contributing to transcriptional elongation. This newly proposed function of H3K4me3 could explain the previously reported crosstalk between H3K4me3 and mRNA splicing (reviewed in ref. 
[56]). Therefore, H3K4me3 might not contribute to initiation but still be instructive for transcription. [53] However, acetylation may act differently from H3K4me3. In work directly addressing whether acetylation is a cause or a consequence of transcription, Martin et al. discovered that inhibition of transcription results in rapid histone deacetylation in both yeast and mESCs. [57] In addition, they demonstrated that promoter-bound KATs are unable to acetylate histones independently of transcription, suggesting that histone acetylation is indeed a consequence of transcription.

THE EMERGENCE OF THE SUPER-ENHANCER CONCEPT
The new concept for the existence of SEs suggests that a small number of clustered enhancers with high levels of activity are required to ensure high and consistent expression of genes linked to cell identity and fate. [26] SEs were first defined in mESCs as regulatory elements with enriched binding of master regulators, such as Nanog, Oct4 and Sox2, and co-activators, like Mediator. Enhancer elements within 12.5 kb of one another were considered to belong to the same epigenomic entity; this stitching, combined with the ranking of enhancers according to levels of Med1 binding, provided multiple markers to identify the location of SEs. [26] Later in the same year, the Young lab sought a surrogate marker that would make it easier to identify SEs genome-wide.
They found that stitching and ranking enhancers using the presence of H3K27ac identified many Mediator-bound SEs. [58] Further characterisation expanded the list of SEs in mESCs but also paved the way to define cell-specific SEs and their mutation in human diseases, including Alzheimer's and cancer. [58] Interestingly, in many cancers, SEs were found to be associated with key oncogenes that contribute to tumour initiation and progression. [59] Therefore, targeting the epigenetic machinery that maintains active SEs to modulate or silence them has been considered a therapeutic option. [60] Further studies investigating the effects of individual enhancers that belong to an SE showed that a transcriptional hierarchy can be present, where one enhancer can be more powerful than the others in regulating the activity of the SE. This was suggested after experimental evidence showed reduced signs of activity, acetylation and TF binding following the deletion of a specific individual enhancer element. [61,62] Moreover, recent studies showed that the interactions within SEs were substantially stronger than the ones observed between conventional promoters and enhancers, and that three-dimensional organisation played a key role in the process. [63] Because of this, proteins involved in the 3D organisation of the genome can affect SE activity in a context-dependent fashion: an example of this is provided by the differential effect of cohesin depletion on SE activity in normal [64] and cancer cells, with stronger interactions in the latter scenario. [65] Different mechanisms have been described to explain how new oncogenic SEs are acquired in the context of cancer development, including focal amplification, [66] DNA mutations or insertion/deletion events, [67] structural changes to the chromatin, [68] activation by viral oncogenes [69] and genomic element rearrangements. [70]
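The stitching-and-ranking idea described in this section can be illustrated with a short Python sketch. It is a simplified illustration only and not the published implementation: the function names (stitch_enhancers, rank_stitched), the tuple layout and the toy signal values are assumptions made here, the signal simply stands in for Med1 or H3K27ac ChIP-seq signal, and the cut-off that separates SEs from typical enhancers is deliberately left out.

```python
from typing import List, Tuple

# (chromosome, start, end, signal); all values used below are hypothetical.
Enhancer = Tuple[str, int, int, float]


def stitch_enhancers(enhancers: List[Enhancer], max_gap: int = 12_500) -> List[Enhancer]:
    """Merge enhancers on the same chromosome separated by at most max_gap bp.

    The signal of a stitched region is the sum of the signals of its members.
    """
    stitched: List[Enhancer] = []
    for chrom, start, end, signal in sorted(enhancers):
        if stitched:
            p_chrom, p_start, p_end, p_signal = stitched[-1]
            if chrom == p_chrom and start - p_end <= max_gap:
                # Close enough to the previous region: extend it.
                stitched[-1] = (chrom, p_start, max(p_end, end), p_signal + signal)
                continue
        stitched.append((chrom, start, end, signal))
    return stitched


def rank_stitched(stitched: List[Enhancer]) -> List[Enhancer]:
    """Order stitched regions from highest to lowest total signal."""
    return sorted(stitched, key=lambda region: region[3], reverse=True)


toy_enhancers = [
    ("chr1", 1_000, 1_500, 10.0),
    ("chr1", 5_000, 5_400, 20.0),    # 3.5 kb from the previous one: stitched
    ("chr1", 40_000, 40_300, 5.0),   # 34.6 kb away: kept separate
    ("chr2", 100, 400, 8.0),
]
ranked = rank_stitched(stitch_enhancers(toy_enhancers))
```

In this toy input the first two chr1 enhancers are stitched into a single region that ranks first; in practice, candidate SEs would be taken from the top of such a ranking.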

EXTENSION OF PROMOTER SIGNATURES DEMARCATES BROAD H3K4 DOMAINS
FIGURE 1 The historical evolution in defining broad H3K4 methylation domains. From left to right: broad peaks of H3K4me2 were found to identify a unique set of cell-identity genes (yellow ring) following K-means clustering analysis. [71] H3K4me3 was then used, which identified that the top 5% of all peaks (yellow wedge after dotted line) possessed additional features, leading to accelerated discovery of H3K4me3-BDs. [75] Height and size ranking methods have also been used with differing final cut-off points (arrows to dotted lines). [77,79] Most recently, the use of two histone marks, H3K4me3 and H3K27ac, has further developed our understanding: together they identify active promoter regions, and those over 2.5 kb (right of dotted line) identify H3K4me3-BDs (Figure produced using biorender.com).

In 2010, the group of Salvatore Spicuglia observed that a distinct cluster of promoters in CD4+ T-cells had an H3K4me2 signature that extended from the promoter and TSS across the gene body [71] (Figure 1). Using Gene Ontology (GO) term analysis, [72,73] they showed this signature uniquely marked T-cell-specific genes, with the remaining narrow H3K4me2 clusters enriched for metabolic processes. Strikingly, these broad H3K4me2 domains covered genes that were: The authors hypothesised that the extension of the histone signature into the gene body represented the presence of intergenic cis-regulatory elements due to the co-binding of H3K4me1 and H3K27ac. Indeed, the subgroup of genes with broad H3K4me2 had notable enrichment of DNase I hypersensitive sites preferentially located 2 kb upstream from the TSS. A key finding in this study was the association of broad H3K4me2 and H3K4me3 domains. In 2011, Pekowska et al. followed up their initial findings by showing that a subset of H3K4me3 peaks also had a broad distribution across the gene body of mouse T-cell-specific genes in CD3+ thymocytes. [74] Interestingly, these regions of H3K4me3 binding overlapped with enhancer elements. To assess the deposition of H3K4me3 within enhancer regions during pre-TCR activation, they examined the Cd8 gene locus. Enhancer elements in the gene locus and gene body acquired H3K4me3. Repeating similar ChIP-seq experiments in pre-pro-B and pro-B cells, which are at different stages of B-cell development, they found that B-cell developmental cis-regulatory elements acquired H3K4me2/3, yet a B-cell inactivated Cd3d enhancer acquired H3K4me1/2 but not H3K4me3. This suggests that trimethylation of H3K4 is an important step in the activation of some enhancers. Indeed, in 2014 Benayoun et al. [75] described similar characteristics to Pekowska et al. 
[71,74] They described broad H3K4me3 domains (H3K4me3-BDs) that mark cell-type-specific genes with consistent expression using the broadest 5% of H3K4me3 peaks, a method also used by others [76] (Figure 1). Expression consistency of genes covered by H3K4me3-BDs was demonstrated using single-cell RNA-seq data, whereby they found that these genes were not simply highly expressed by a subpopulation of cells, but by most of the population of cells of a given cell type. Given that genes with H3K4me3-BDs have functions related to cell-type identity, it makes sense that they are expressed in all cells.
By integrating different ChIP-seq data sets from mouse myotubes and mESCs, worm embryos, fly S2 cells and human H1 and H9 embryonic stem cells (hESCs), A549, GM12878, HeLa S3, HepG2 and K562 cells, they also implemented Random Forest machine learning models to predict which additional histone modifications or DNA binding proteins are important in identifying H3K4me3-BDs. [75] For example, in the widely used and profiled human lymphoblastoid GM12878 cell line, they identified H3K4me2 as the second most important feature after H3K4me3 for correct classification of H3K4me3-BDs.
The defining features of H3K4me3-BDs underwent further evolution with the addition of stringent peak merging criteria in addition to size and ranked height of H3K4me3 peaks, [77] with others further stratifying into proximal and distal peaks. [78] More recently, the inflection point, the point at which the size of H3K4me3 peaks increases exponentially, has been used to identify intermediate and large regions [79] (Figure 1). We have developed methods which consider multiple histone modifications when defining H3K4me3-BDs. [80] We used statistical modelling with ChromHMM [81] to build a map of H3K4me3, H3K4me1, H3K27ac, H3K27me3, H3K9me3, and H3K36me3 binding across the genome. This chromatin state histone code was used to define domains of H3K4me3 > 2 kb (Figure 1). However, despite the different approaches used, all methods identify regions enriched for tissue or cell-type-specific biological processes.
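Two of the width-based definitions discussed in this section, the broadest 5% of peaks and a fixed width cut-off, can be sketched as follows. This is a minimal illustration under stated assumptions rather than any of the cited pipelines: the function names (broadest_fraction, wider_than), the peak tuple layout and the toy peaks are hypothetical, and peak merging, height ranking, inflection-point detection and chromatin-state modelling are not attempted.

```python
from typing import List, Tuple

# (chromosome, start, end); the peaks used below are hypothetical.
Peak = Tuple[str, int, int]


def peak_width(peak: Peak) -> int:
    """Width of a peak in base pairs."""
    return peak[2] - peak[1]


def broadest_fraction(peaks: List[Peak], fraction: float = 0.05) -> List[Peak]:
    """Return the broadest `fraction` of peaks (at least one), widest first."""
    if not peaks:
        return []
    n_keep = max(1, int(len(peaks) * fraction))
    return sorted(peaks, key=peak_width, reverse=True)[:n_keep]


def wider_than(peaks: List[Peak], min_width: int = 2_000) -> List[Peak]:
    """Return peaks strictly wider than `min_width` bp, in input order."""
    return [peak for peak in peaks if peak_width(peak) > min_width]


# Twenty toy peaks with widths of 100, 200, ..., 2000 bp.
toy_peaks = [("chr1", 0, 100 * i) for i in range(1, 21)]
top_five_percent = broadest_fraction(toy_peaks)
over_1500_bp = wider_than(toy_peaks, min_width=1_500)
```

The two selections answer different questions: a percentile cut-off always returns a fixed share of the peaks in a sample, whereas a fixed width cut-off can return very different numbers of domains from one sample to another.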

Highly expressed genes
Regulating the expression levels of genes is vital in many genomic pathways; for example, it is important to control the expression of genes through the cell cycle to enable correct division of healthy cells. An increase in aberrant gene expression of oncogenes is a hallmark of cancer and the role of epigenetics in this process is not new. However, upon discovery of H3K4me-BDs, a novel defining feature was that they were strongly associated with higher and more consistent [82] gene expression compared to regions of narrow H3K4me2/3 [71,82] (Figure 2). Human CD4+ T-cells [71,77] as well as multiple cell lines, hESCs, mESCs and neuronal progenitor cells [82] all show this epigenetic phenotype. H3K4me3-BDs have also been shown to be implicated in gene dysregulation associated with the autoimmune disease systemic lupus erythematosus. [83] Whilst investigating systemic lupus erythematosus, Zhang et al. quantitatively correlated the width of H3K4me3-BDs in primary monocytes with gene expression. [83] They found that every 1% increase in H3K4me3-BD width downstream of the TSS corresponded to a 1.5% increase in expression.

Cell-type-specific genes
H3K4me-BD genes are also highly cell-type and tissue-specific (Figure 2). A study comparing H3K4me3-BDs and non-broad H3K4me3 regions in pig, mouse and human found that H3K4me3-BDs marked cell-type-specific genes in the brain and adipose tissue for all three species. [84] Further investigation into pig H3K4me3-BDs identified between 99 and 309 H3K4me3-BDs that shared orthologous regions in human and mouse, indicating that they might be functionally conserved. Using ChIP-seq data from H1 hESCs and mESCs, Kurum et al. used computational inference to identify connections between H3K4me3-BDs and pluripotency within stem cells. [85] Using modelling methods, the authors predicted which features were most important for the identification of pluripotency genes. They identified that OCT4 binding patterns were one of the most important predictors, in agreement with OCT4 enrichment at SEs in ESCs. [26] Interestingly, they showed that the greatest predictor in mESCs was in fact H3K4me3-BDs, which also ranked third in hESCs. In addition to ESCs, embryos represent an interesting model to investigate these domains due to the rapid differentiation of cell types. To this end, it has been shown that H3K4me3-BD-marked genes increase in number through early development of mouse embryos and then reduce in number again at the blastocyst stage. [86] H3K4me3 regions between 1 and 5 kb were highly dynamic during this process, broadening and narrowing. Strikingly, the dynamic change was not observed as frequently from H3K4me3-BDs to H3K4me3 < 1 kb, suggesting that H3K4me3-BDs are more resistant to rapid alteration of H3K4 modification. H3K4me3-BDs were also more resistant to the co-occupation of repressive H3K27me3, and many H3K4me3-BDs disappeared during differentiation, in agreement with their cell-type-specificity.
To better understand the impact of H3K4me3-BDs in the brain, researchers performed ChIP-seq and RNA-seq on cells from healthy mouse, non-human primate and human brain tissue. [76] Using prefrontal cortex (PFC) neuronal cells and non-neuronal cells, they compared H3K4me3-BDs and found that cell-type-specific genes were 1.9-fold more enriched within H3K4me3-BDs than within the remaining 95% of H3K4me3 peaks. Increasing the threshold stringency of H3K4me3-BDs from the top 5% to the top 1% of H3K4me3 peaks increased this enrichment to 2.4-fold.
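A fold enrichment of this kind is a ratio of two proportions, which the following sketch makes explicit. The function name (fold_enrichment) and the counts are hypothetical and chosen only to illustrate the arithmetic; they are not taken from the cited study, whose exact statistical procedure is not described here.

```python
def fold_enrichment(hits_in_set: int, set_size: int,
                    hits_in_rest: int, rest_size: int) -> float:
    """Ratio of the hit rate inside a set to the hit rate in the remainder.

    Here a "hit" would be a peak that covers a cell-type-specific gene.
    """
    if set_size <= 0 or rest_size <= 0:
        raise ValueError("both groups must contain at least one peak")
    rest_rate = hits_in_rest / rest_size
    if rest_rate == 0:
        raise ValueError("enrichment is undefined when the remainder has no hits")
    return (hits_in_set / set_size) / rest_rate


# Hypothetical counts: 38 of 100 broad domains versus 380 of 1900 other peaks
# cover cell-type-specific genes, i.e. 38% versus 20%.
example = fold_enrichment(38, 100, 380, 1900)
```

With these made-up counts the ratio is roughly 1.9, the same order as the enrichment quoted above; a value of 1 would indicate no enrichment.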
Cell-type and tissue-specificity raise the question of whether H3K4me-BDs are inherited through cell division. Integrating H3K4me3 ChIP-seq data from cells across mouse oocyte development, Zhang et al. analysed the pattern of H3K4me3-BDs and narrow regions. [87] They identified a similar profile of H3K4me3-BDs across the maternal alleles of zygotes and early two-cell embryos. The authors, along with others, [88] suggest this confirms the inheritance of H3K4me3-BDs in mouse pre-implantation embryos. Inheritance of H3K4me3-BDs represents a very interesting mechanism that could maintain the expression of genes that are responsible for defining cell identity.

Consistent expression
FIGURE 2 … Chromatin accessibility is limited, as indicated by reduced space between the nucleosomes. Transcription at promoters is executed as bursts, indicated by the red mRNA strands of uneven product. Sharp H3K4me3-bound promoters mark non-cell-identity genes in healthy cells, but tumour suppressor gene (TSG) promoters are often dysregulated in malignant cells. Right: Broad H3K4me3 marks. H3K4me3-BDs have more interactions within the 3D genome. [106] Chromatin accessibility is extensive and stretches across the gene body. [75] Transcription of genes covered by H3K4me3-BDs is more consistent and generally at higher levels, as indicated by a broader black arrow. [79,82] Transcription is also more consistent within cell types, as indicated by identical mRNA transcripts. [75] H3K4me3-BDs mark cell-identity genes [71,75,76,79,80] and TSGs [77,90] in healthy cells; however, in malignancy, they mark oncogenes [79] as well as cell-identity genes (Figure produced using biorender.com).

Sharp H3K4me3 peak promoters are dynamic, as previously discussed. Dynamism in different cells of the same cell type leads to bursting of Pol II and inconsistent mRNA levels, which is referred to as transcriptional noise. [89] Transcriptional noise, however, can be buffered. H3K4me3-BDs are enriched at cell-identity genes and their location appears to provide these genes with the ability to reduce noise. Benayoun et al. … and mouse C2C12 and MEFs. Calculating the variance in expression relative to expression level for each gene, they found that H3K4me3-BD genes had low transcriptional noise [75] (Figure 2). Furthermore, disrupting H3K4me3-BD breadth led to increased noise and a marked reduction in transcriptional consistency. [75]
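One common way to express variance in expression relative to expression level is the squared coefficient of variation across single cells. The sketch below uses that measure as an assumption; the exact metric used in the cited study is not specified here, and the function name (expression_noise) and the expression values are hypothetical.

```python
from statistics import mean, pvariance
from typing import Sequence


def expression_noise(values: Sequence[float]) -> float:
    """Squared coefficient of variation: population variance / mean squared.

    `values` holds the expression of one gene across individual cells.
    """
    average = mean(values)
    if average == 0:
        raise ValueError("noise is undefined for a gene with zero mean expression")
    return pvariance(values) / (average * average)


# Two hypothetical genes with the same mean expression (10) across four cells.
consistent_gene = [10, 11, 9, 10]   # expressed at a similar level in every cell
bursty_gene = [0, 40, 0, 0]         # expressed strongly in a single cell only

consistent_noise = expression_noise(consistent_gene)
bursty_noise = expression_noise(bursty_gene)
```

Both toy genes have the same average expression, so a bulk measurement could not tell them apart; the noise measure separates the consistently expressed gene from the bursty one.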

NARROWING AND BROADENING OF H3K4ME3 DOMAINS MARK DIFFERENT ONCOGENIC PROCESSES

Tumour suppressor genes
In addition to their association with vital healthy processes, the appearance and disappearance of H3K4me-BDs are linked to disease. We will focus first on the oncogenic connection with H3K4me-BDs and TSGs and their role in suppressing cancer development.
TSG expression is required for the appropriate regulation of healthy cells. Having established that H3K4me3-BDs overlap with consistently expressed genes (Figure 2), the analysis of multiple different data sets confirmed that TSGs are enriched for the presence of H3K4me3-BDs, specifically the broadest H3K4me3 peaks. [77,79] In a study by Dhar et al., Cre-Lox recombination was used to knock out Kmt2d, an important member of the KMT2 family of lysine methyltransferases responsible for methylation of H3K4, in the mouse brain. The knockout of Kmt2d is sufficient to reduce H3K4me1, H3K27ac and H3K4me3, [90] with the most dramatic reduction seen at TSGs with H3K4me3-BDs. Kmt2d knockout mice developed spontaneous medulloblastoma, demonstrating the importance of H3K4me3-BDs in maintaining expression of TSGs.

Oncogenes
The oncogenic process is multifaceted and complex (reviewed in ref. [91]). Commonly, an early step in oncogenesis is the activation of a gene, for example via mutation, which can provide fitness to a given cell in a population. These genes, oncogenes, are catalogued in databases such as COSMIC and include transcription factors and cytoplasmic tyrosine kinases. [92] To understand the role that H3K4me3-BDs play in activating oncogenes, a study by Belhocine et al. compared the locations of H3K4me3-BDs in healthy and malignant T-cells. They observed a higher number of H3K4me3-BDs in the cells derived from patients with T-cell acute lymphoblastic leukaemia (T-ALL) compared to the healthy T-cell precursors [79] (Figure 2). This increase in H3K4me3-BDs may be because of aberrant gene expression, and indeed many overlap with driver oncogenes that provide the cell with fitness. The Kyoto Encyclopedia of Genes and Genomes (KEGG) pathway analysis tool is a powerful method for attributing biological processes to large sets of genes. The term 'pathways in cancer' is used to describe a protein interaction network that has been described as oncogenic. [93] Chen et al. identified H3K4me3-BDs in CD4+ T-cells and found them to be enriched at genes associated with this term. [77] In K562, a cell line derived from a patient with chronic myelogenous leukaemia, H3K4me3-BDs were also enriched for leukaemia-associated genes in another study. [78] In B-cells, double-strand DNA breaks are initiated by the recombination-activating gene (RAG) complex upon binding to recombination signal sequences (RSS) that subsequently assemble gene segments in the immunoglobulin loci, a process called V(D)J recombination. [94] The RAG complex has been shown to bind these sequences through direct interaction with H3K4me3 [95] at regions of extended chromatin accessibility. [96] Through investigations into genomic instability at RSS motifs, the sites of V(D)J recombination, Heinäniemi et al. 
found that genes with RSS motifs had frequent breakpoints. [97] These genes also had a significantly broader TSS, extended open chromatin and broad H3K4me3 signal. H3K4me3-BDs have previously been shown to have increased Pol II pausing, [75] and therefore Heinäniemi et al. suggested that overlap with Pol II stalling sites could be the cause of translocation events at vulnerable RSS of broad H3K4me3 marked DNA. [97] However, Pol II pausing at H3K4me3-BDs remains to be fully explored, with others reporting the inverse relationship. [77,79] Heinäniemi et al. also observed that breakpoints found within the PAX5 gene had transcription from both strands (convergent transcription), which overlapped a significantly stretched region of H3K27ac, uncovering the presence of an intragenic enhancer. However, the chromatin signature of PAX5 is an H3K4me3-BD overlapping a SE, a sub-group of stretched histone regions which remains poorly understood.
We, together with Belhocine et al., linked for the first time the presence of H3K4me3-BDs to activated oncogenes. [79,80] Moreover, we showed that the location of H3K4me3-BDs is strongly associated with SEs and that, upon genomic rearrangement in cancer, H3K4me3-BDs are generated over genes, leading to constitutive activation. [80]

BROAD H3K4ME3 DOMAINS FORM CHROMATIN INTERACTION HUBS
Growing evidence suggests that regions of chromatin, including enhancers, form condensates which partition into droplets containing high levels of transcriptional activation proteins such as BRD4 and MED1. [98] KMT2D is a lysine methyltransferase specific to H3K4 and has been implicated in shaping the enhancer landscape through recruitment of CBP/p300 [99] and deposition of H3K4me1. [11] Dhar et al. describe that KMT2C/D knockout leads to downregulation of expression at H3K4me1-covered genes which were bound by KMT2C/D in the wild-type setting. [100] Whilst investigating the balance of activating and repressive histone marks in Kabuki syndrome, Fasciani et al. revealed that mutant KMT2D mesenchymal stem cells displayed a reduced number of KMT2D/BRD4/MED1 condensates, further strengthening the suggestion that enhancers form condensates containing KMT2D. [101] Furthermore, the knockdown of Wdr5, a core catalytic subunit of KMT2A/B (MLL1/2), KMT2C/D and KMT2F/G complexes, [102] is sufficient to reduce not only H3K4me1 but also genome-wide levels of H3K27ac, [75] a key histone modification at active enhancers, SEs, active promoters and H3K4me3-BDs. This effect is likely caused by the inhibition of KMT2C/D, which is more strongly associated with H3K4me1 at enhancers, but provides evidence of interaction for KMT2D and enhancers forming condensates. [103,104] Interestingly, the knockdown of Wdr5 reduced the width of H3K4me3 regions, likely due to reduced KMT2A/B activity. [104] Strikingly, however, the global H3K4me3 level was not significantly altered, in agreement with others. [105] Dhar et al. 
also interrogated the 3D interactions of the TSGs Dnmt3a and Bcl6 after Kmt2d knockout. They discovered that, in addition to a reduced promoter signature retreating from the gene body, these TSGs had significantly reduced interactions with their SEs. Taken together, the evidence points to the KMT2 family as being a key regulator of H3K4me3-BDs and SEs. The effect of reduced H3K4me3-BD width because of KMT2D loss is likely due to reduced interactions with the nearby SE, given that global H3K4me3 is unaffected.
Using a genome-wide approach, Thibodeau et al. analysed 3D interaction networks (derived from ChIA-PET data), and identified that H3K4me3-BDs and SEs are the most connected elements in the 3D genome (Figure 3). They and others [77] also observed a significant enrichment of interaction frequency within and between H3K4me3-BDs and SEs. [78,106] Using higher resolution 3D interaction data (Hi-C data), it is possible to capture hubs of open and interacting chromatin (HOCIs). [107] Li et al. identified that SEs and H3K4me3-BDs had the greatest enrichment within HOCIs and a shorter distance of interaction than that of other regulatory elements within hubs, such as promoters and enhancers. Taken together with the work of Thibodeau et al., this suggests that active H3K4me3-BDs are impacted by the absence of KMT2C/D, potentially in condensates where they may share transcriptional machinery, which has significant impact on the ability to maintain chromatin interactions with SEs.
As SEs and H3K4me3-BDs have been better characterised genome-wide, we now understand that they are present within highly connected chromatin environments. However, their consideration as independent signatures has been called into question. As discussed above, KMT2D knockdown can affect H3K27ac levels, which may suggest an alternative mechanism of action or the presence of an interaction pathway involving TFs. [108] For example, H3K27ac regions in the vicinity of the TSS of zygote activation genes were also shown to be located within or close to H3K4me3-BDs. [88] Further studies into the genomic location of previously defined SEs and H3K4me3-BDs in CD4+ T-cells, [77] MCF-7, K562 and GM12878 cell lines [106] found that SEs and H3K4me3-BDs very often overlapped in the genome, with the main exception to the rule being H3K4me3-BDs covering TSGs, which exist independently of SEs. With H3K4me3-BDs and SEs appearing independently or co-existing within the same region of the genome, it is important that we better understand the contribution these regions play in gene regulation.

SUPER-ENHANCER HIJACKING AS A MECHANISM OF TUMOURIGENESIS
Chromosomal rearrangements are a hallmark of cancer, and they comprise translocations, inversions, duplications and deletions. [109] In the case of SE involvement, the movement of genomic material often leads to the dysregulation of the physiological activity of the SE, juxtaposing it next to a proto-oncogene, activating the oncogene and ensuring high levels of expression, which drives the development of the disease. [110] This phenomenon is known as 'super-enhancer hijacking' and it has been characterised in a variety of human cancers, including lymphoma, [111] neuroblastoma [112] and colorectal cancer. [113] Important examples of this mechanism of tumourigenesis are provided by translocations involving the proto-oncogene MYC, resulting in its overexpression, cancer progression and poor outcome. [114,115] Interestingly, a SE that naturally regulates MYC (MYC-SE, located 1.7 Mb downstream of MYC) in healthy cells is also involved in genomic rearrangements leading to tumourigenesis; for example, the translocation t(3;8)(q26;q24), found in a subgroup of acute myeloid leukaemia (AML) cases, results in the overexpression of EVI1 [116] and is associated with a poor outcome for this subgroup of AML patients. [117] Further investigations by the Delwel group identified that the dysregulated expression of EVI1 in AML was caused by the hijacking of the MYC-SE, which was then responsible for driving high levels of EVI1 transcription. [118] In the same study, they also sought to dissect the mechanism by which this MYC-SE was able to interact with the EVI1 gene. Using chromatin capture and accessibility techniques such as Chromosome Conformation Capture sequencing (4C-seq) and Assay for Transposase-Accessible Chromatin followed by sequencing (ATAC-seq), they were able to show interactions between the EVI1 promoter and different modules forming the MYC-SE. Intriguingly, they also highlighted how important the chromatin structure is for this type of interaction by investigating the effect of disrupting nearby CTCF binding sites and the enhancer hub within the SE responsible for the interaction with CTCF. In both cases, EVI1 expression was significantly reduced, highlighting the important cooperation between SEs and chromatin structure. [118] Since the structural organisation of the genome is clearly important for SE-promoter interactions, it is essential to characterise the epigenomic and 3D organisation of rearranged genomes of interest, to better understand the molecular mechanisms underlying the interaction between the hijacked SE and the proto-oncogene promoter, with the potential to highlight new targets for future therapies.
FIGURE 3 H3K4me3-BDs form chromatin interaction hubs with SEs. Left: A simplified depiction of a promoter (P) and an enhancer (E) interaction network. Right: H3K4me3-BDs interact more frequently with one another (thicker dashed line) involving more frequent interactions with SEs and additional genetic elements such as genes and enhancers, represented as nodes (Figure produced using biorender.com).
To begin to understand how SE hijacking leads to the overexpression of oncogenes, we used publicly available data from healthy and malignant lymphoid cells, analysing the chromatin states and DNA accessibility in regions of the genome that frequently translocate in patients with haematological malignancies. [80] This revealed that active SEs are associated with H3K4me3-BDs in their wild-type location and, when cut and pasted next to proto-oncogenes, retain the ability to lay down H3K4me3-BDs in their new location.
Genomic translocation events are coupled with an epigenomic translocation resulting in an H3K4me3-BD covering the proto-oncogene gene body, increasing both chromatin accessibility and gene expression (Figure 4).
Having characterised the epigenomic changes associated with oncogenic translocation, we next wanted to understand the 3D genome consequences of such an event. However, despite recent advances in reducing the primary material required to perform most common chromatin capture and characterisation techniques, it is challenging to characterise the 3D landscape of genomic rearrangements using primary patient material. [119,120] To address this problem, we proposed an in silico approach to predict the consequences of a common immunoglobulin translocation, IGH-CCND1, found in mantle cell lymphoma (MCL) and multiple myeloma (MM). [121,122] By combining publicly available datasets on DNA accessibility, CTCF protein binding and histone modifications, the 'highly-predictive heteromorphic polymer' (also called HiP-HoP) [122] model successfully predicted the wild-type genome organisation and interactions within the CCND1 locus in the healthy lymphoblastoid cell line, GM12878, when compared to high-resolution Hi-C data. [123] Further simulations in cell lines harbouring IGH-CCND1 rearrangements involving the IGH SEs, Eα1 and Eμ in U266 and Z138, respectively, confirmed the requirement of an epigenomic translocation event to enable proto-oncogene activation. These simulations also provided novel insight into additional interacting regions within the oncogene gene body and new interactions that were generated within the local 3D genome. [80,123] More experimental data is required to validate these predictions and to fully support the notion of super-enhancer hijacking in more types of cancer. However, these recent studies open the possibility to target the 'genomic roots' of the translocation, rather than trying to ameliorate the phenotype generated by it. This would be particularly useful considering personalised medicine and therapies tailored to the patient, especially for those subgroups that associate with poor outcome after treatments with common approaches.
FIGURE 4 SE hijacking and H3K4me3-BD generation. In healthy cells (top panel) super-enhancers (SE) and broad H3K4me3 domains (H3K4me3-BD) interact to regulate cell-identity genes (top left) and proto-oncogenes are tightly regulated; in this image, the proto-oncogene is not expressed (top right). However, SEs and H3K4me3-BDs can translocate, leading to a malignant cell phenotype (lower panel). The juxtaposition of SEs to proto-oncogenes generates H3K4me3-BDs with new interactions between proto-oncogenes and SEs, resulting in deregulated expression and conversion to oncogenes. [123] Adapted from ref. [80] (Figure produced using biorender.com).

FUTURE CONSIDERATIONS
H3K4me3-BDs are comparatively understudied versus SEs, with which they are commonly associated. It is vital that we uncover how H3K4me3-BDs are generated. They appear in healthy and malignant cells, and therefore it is likely that more than one mechanism exists to generate them. It has previously been shown that interactions are enriched between H3K4me3-BDs and SEs, [78,106] and the juxtaposition of a SE near to a proto-oncogene is enough to generate an H3K4me3-BD. [80] However, it is unclear whether all H3K4me3-BDs connect with a SE via chromatin interactions. These connections could be mediated by the strength of insulation of SEs, which has previously been shown to differ across the genome. [124] The disruption of insulation and topologically associated domains (TADs) has also been suggested to affect SE interactions in a disease setting, whereby the genomic aberrations alter TAD boundaries and create new SE connections. [80,125] H3K4me3-BDs are a feature of highly expressed, highly connected genes, but some H3K4me3-BDs do not fall near a known gene. [78] It is therefore likely that subgroups of H3K4me3-BDs exist and alternative mechanisms for their generation require further study. Indeed, some H3K4me3-BDs overlap SEs and others do not. [78] We hypothesise that overlapping and independent H3K4me3-BDs and SEs will bind distinct subsets of TFs, which remains unexplored. As discussed, H3K4me3-BDs and SEs share sensitivity to KMT2D knockdown; [90] it would be interesting to uncover the effect of overlapping H3K4me3-BDs and SEs in this context.
In mouse cerebella, 64.07% of KMT2D ChIP-seq peaks were found to fall in genic regions; of these, 39.11% fell within the gene body. [90] Increased gene body binding of KMT2D could induce the generation of H3K4me3-BDs. Dhar et al. showed that KMT2D density within H3K4me3-BDs was greatest ±4 kb from the TSS, with a comparatively reduced density at the TSS. [90] Another study found that KMT2B was bound to 70% of promoters and only 14% at gene bodies in mESC, [126] suggesting a difference in binding of KMT2D and KMT2B. As discussed, the disagreement in the literature regarding the role of H3K4me3-BD in transcription elongation requires further investigation to understand the broad nature of these domains.
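Because the 39.11% figure is a fraction of the genic peaks rather than of all peaks, it can help to express both on the same scale. The short calculation below uses only the two percentages quoted above; the variable names are our own, and the result is a derived figure, not one reported in ref. [90].

```python
# Percentages quoted in the text for KMT2D ChIP-seq peaks in mouse cerebella.
genic_fraction = 0.6407         # share of all peaks that fall in genic regions
gene_body_given_genic = 0.3911  # share of the genic peaks that fall in gene bodies

# Share of ALL peaks that fall within gene bodies (product of the two fractions).
gene_body_of_all = genic_fraction * gene_body_given_genic

print(f"Gene-body peaks as a share of all KMT2D peaks: {gene_body_of_all:.2%}")
# -> 25.06%
```

On this reading, roughly a quarter of all KMT2D peaks lie within gene bodies.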
Targeted treatments for cancer subtypes are becoming increasingly available to patients in the clinic, and the sensitivity of H3K4me3-BDs and their associated proteins to such treatments is beginning to be explored, which could reveal further treatment options (Figure 5C).
2-(4-methylphenyl)-1,2-benzisothiazol-3(2H)-one (PBIT), an inhibitor of the lysine-specific demethylase 5 (KDM5) family, has recently shown promise in targeting cells driven by H3K4me3-BDs. [79] Jurkat cells, an immortalised T-cell lymphocyte line, [127] treated with PBIT showed an increase in H3K4me3. However, closer inspection showed that the location of increased H3K4me3 was at genes involved in T-cell differentiation and haematopoiesis. In addition, the authors assessed the gene expression changes in the T-acute lymphoblastic leukaemia cell line, Loucy, [128] before and after treatment with PBIT. [79] They found that genes covered by an H3K4me3-BD showed a greater up- or downregulation in expression levels, compared to genes with sharp H3K4me3 peaks, which demonstrates a greater sensitivity of H3K4me3-BDs to epigenetic modulation. Further review of H3K4me3-BD drugs can be found in ref. [12]
FIGURE 5 Experimental approaches to abrogate super-enhancer hijacking related gene deregulation. Cas9 (A) or dCas9 conjugated with inhibitory proteins (B) can specifically target core sequences within SEs involved in H3K4me3-BD interactions. Via removal of DNA sequence (A), epigenetic silencing through H3K9me3 (B), or the use of selective inhibitors to proteins involved in transcriptional activity of SEs, we hypothesise that the interactions between SEs and H3K4me3-BDs will weaken, and the H3K4me3-BD will narrow, leading to reduced gene expression. This will selectively target malignant cells with SE translocated genes (Figure produced using biorender.com).
Gene editing techniques like CRISPR offer exciting experimental approaches to better understand how H3K4me3-BDs are generated, maintained, and interact with SEs in both the healthy and malignant setting. The most common CRISPR technology is the active CRISPR-associated protein 9 (Cas9) which, through the addition of custom single guide RNA molecules (sgRNA), can cut DNA strands to create small insertion or deletion events in specific regions of the genome. [129] We hypothesise that the use of Cas9 can remove core, essential regions that interact with H3K4me3-BDs (Figure 5A).
A challenge in designing sgRNAs to unknown SE regions of interaction can also be overcome by modelling interactions utilising tools such as HiP-HoP polymer modelling. [123] By removing sequences of DNA predicted to be vital for SE-H3K4me3-BD interaction, we hypothesise that this will abolish TF binding motifs. An alternative approach to the removal of DNA sequence is to 'silence' the epigenome of essential regions (Figure 5B). Mutations in the catalytic domain of Cas9 create dCas9. Fusion of inhibitory proteins, such as KRAB, to dCas9 can be used as a targeted gene silencing method. [130] In combination with sgRNAs, the deposition of inhibitory histone marks such as H3K9me3 can close chromatin to inhibit TF binding and reduce the expression of genes and non-coding elements. [131] We propose that silencing a region maintains the structural integrity of DNA and insulating TADs. We hypothesise that both approaches discussed will inhibit interactions between SEs and H3K4me3-BDs in malignancies with chromosomal translocations. In addition, the H3K4me3-BD may retreat from the gene body, but we expect a marked reduction in oncogene expression, reducing proliferation, enabling differentiation or programmed cell death.
In summary, accumulating evidence shows that physical, functional, and regulatory connections exist between H3K4me3-BDs and SEs. We believe that efforts should be focused on new experimental approaches to better understand this relationship in healthy and disease settings.
Research into the mechanism of sharp to broad H3K4me3 domain transition is needed before we learn how to target these regions therapeutically.

1. more highly expressed in T-cells compared to other tissues, 2. had the highest gene expression levels in T-cells, 3. had the greatest Pol II binding in T-cells, and 4. clustered tightly with other activating histone marks across the gene body such as H3K4me1, H3K4me3 and H3K27ac.
et al. assessed single-cell RNA-seq data from human HCT116, ESC, LNCaP
FIGURE 2 Comparison of sharp and broad H3K4me3 domain characteristics and location. Left: Sharp peaks of H3K4me3 mark gene promoters. Black arcs indicate interactions within the three-dimensional (3D) genome.