Comprehensive epigenetic analyses reveal master regulators driving lung metastasis of breast cancer

Abstract The lung metastasis of breast cancer involves complicated regulatory changes driven by chromatin remodelling. However, the epigenetic reprogramming and regulatory mechanisms in lung metastasis of breast cancer remain unclear. Here, we generated and analysed genome‐wide profiles of multiple histone modifications (H3K4me3, H3K27ac, H3K27me3, H3K4me1 and H3K9me3), as well as transcriptome data in lung‐metastatic and non‐lung‐metastatic breast cancer cells. Our results showed that the expression changes were correlated with the enrichment of specific histone modifications in promoters and enhancers. Promoter and enhancer reprogramming regulated gene expression in a synergetic way, and involved in multiple important biological processes and pathways. In addition, lots of gained super‐enhancers were identified in lung‐metastatic cells. We also identified master regulators driving differential gene expression during lung metastasis of breast cancer. We found that the cooperations between regulators were much closer in lung‐metastatic cells. Moreover, regulators such as TFAP2C, GTF2I and LMO4 were found to have potential prognostic value for lung metastasis free (LMF) survival of breast cancer. Functional studies motivated by our data analyses uncovered an important role of LMO4 in regulating metastasis. This study provided comprehensive insights into regulatory mechanisms, as well as potential prognostic markers for lung metastasis of breast cancer.

Lines of evidence have suggested that abnormal epigenetic alterations could perturb the transcription regulatory program during cancer development and metastasis. 3 A major component of epigenetic regulation is histone modification that affects the accessibility of cis-elements, thus influences the recruitment of transcriptional regulators. 4 For example, histone methylation induced by histone methyltransferase SMYD3 was required for the MRTF-A-mediated transactivation of MYL9 via promoter binding, and promoted migration of breast cancer cells. 5 In addition, enhancers defined by H3K27ac and H3K4me1 reprogramming were also found to have effects on promoting cancer metastasis. 3 Moreover, computational analysis of global histone modification profiles could provide a complete picture of chromatin structure in specific cells, and facilitate the prediction of active cis-elements and transcription regulatory network. For instance, specific networks of transcription factors (TFs) in different human monocyte subsets were identified by the integration of genome-wide histone modification data and gene expression data. 6 Also, using global epigenetic data, tissue-specific regulatory circuits were predicted by computationally linking TFs to promoters and enhancers. 7 In addition, novel drivers of hepatocellular carcinoma were recently identified by integrating epigenetic marks with transcription data. 8 Although many previous studies had explored the whole-genome histone modification profiles of non-metastatic breast cancer subtype, 9,10 13 However, the changes of chromatin structure of whole genome and the specific regulatory network during lung metastasis of breast cancer were still poorly understood. In addition, given the fact that drugs targeting epigenetic factors hold vast potential in therapy of metastatic cancer, 14,15 the genome-scale epigenetic analysis will provide data and theoretical support for these therapeutic strategies.
In this study, we analysed the chromatin remodelling and transcriptional changes during lung metastasis of breast cancer by integrating ChIP-Seq data of multiple histone modifications and RNA-Seq data. Genome-scale cis-elements and master regulators were identified in lung-metastatic cells. We found that multiple biological processes and pathways were reprogrammed by chromatin remodelling in lung metastasis of breast cancer. Our study provided a comprehensive insight into the whole cistrome in the lung-metastatic breast cancer cells, as well as data resource for the development of therapeutic strategies based on epigenetics.

| Cell culture
Both MDA-MB-231 and LM2-4175 cell lines were obtained from ATCC and cultured in DMEM (Thermo Fisher Scientific) supplemented with 10% FBS (Gibco) at 37°C and 5% CO 2 in a humidified incubator.

| ChIP-Seq
For chromatin immunoprecipitation, MDA-MB-231 and LM2-4175 cells were harvested and performed by ChIP-IT High Sensitivity kit (Active Motif) according to manufacturer's instructions. Briefly, the cross-linked chromatin was sonicated into a size of 200-500 bp fragments. The sheared chromatin was immunoprecipitated using antibodies ( Table S1). All of the ChIP-Seq reads were mapped to the unmasked human reference genome (hg19) using Bowtie 2.0 16 with default parameters. Only uniquely mapped reads were retained.
ChIP-Seq peak calling was performed using MACS v2.0.10 software, 17 with"-broad option". Regions with q < 0.01 were identified as peaks. For each cell line, the inputs were used as control data. The nearest RefSeq gene was assigned to each peak. and q < 0.05 were kept.

| Average density profile of histone marks
The average tag density of histone modifications around transcription start site (TSS) ±3 kb of genes with different expression levels was calculated and showed. Briefly, in each cell line, all genes were categorized into 10 groups by ranking their expression values. Genes in group 1 had a top 10% expression level of the whole transcriptome, and so on. The TSS ±3 kb region of each gene was split into 200 bins, and tag density (tags per Kilobase per Million) in each bin was calculated. We averaged the tag density of each group and plotted the profile using R scripts.

| Identification and analysis of promoter state
In each cell line, we defined TSS ±2 kb as promoters, and identified the state of each promoter according to the dominant histone modification on it. Promoters dominantly modified by H3K4me3 and H3K27ac were identified as active promoters. Repressive promoters were defined by enrichment of H3K27me3. In addition, promoters enriched by both active markers (H3K4me3 or H3K27ac) and repressive marker (H3K27me3) were considered to be poised. Promoters without any histone modification enrichment were classed as 'None' state. The detailed thresholds were listed in Figure S3A.

| Identification and analysis of enhancer and super enhancer
The active distal enhancers of MDA-MB-231 and LM2-4175 were identified by H3K27ac peaks located at least 2000 bp away from TSS. The gained and lost enhancers in LM2-4175 were identified using the 'getDifferentialPeaks' script in HOMER software. 22 Enhancers showing at least fourfold tag count differences between two cell types and P < 0.0001 were considered to be differential. In addition, we identified super-enhancers, which were regions comprising multiple enhancers and collectively bound by an array of transcription factors. Super-enhancers were identified using Rank Ordering of Super-enhancers algorithm (ROSE). 23 Briefly, H3K27ac peaks within 12.5 kb were stitched together as candidate super-regions. Then, we ranked all the stitched regions by increasing read counts. Super-enhancers were defined as the sites whose signals were higher than the inflection point of curve.

| Analysis of clinical data
We combined clinical data of 404 samples from three independent public datasets, including GSE2034, 29 GSE2603 13 and GSE5327. 30 Both ER+ (240 samples) and ER-(164 samples) patients were included. There were 68 patients with lung metastasis among them.
Others patients were without any metastasis. Using nonnegative matrix factorization (NMF) method, these samples were unsuper- mator was applied to estimate the LMF survival for the two groups, and the differences were analysed using the log rank test. Survival analysis was conducted by R package 'Survival'.

| Motif enrichment
We collected the position weight matrix (PWM) of 662 TFs from previous study, 7 and scanned these known motifs in cell-line-specific active promoters and enhancers. The P-value of motif scanning was calculated by 'findMotifsGenome' script in HOMER software. 22 Using a relatively strict threshold, motifs with P-value less than 10 −10 in at least one dataset were presented. Only TFs which were differentially expressed were shown.

| Network analysis
Genes associated with promoters/enhancers which contained significant motifs of TFs were identified as potential targets. Then cell-specific TF-target networks were constructed using cytoscape 3.0. 31 The network nodes represented TFs or target genes, and edges represented proximal or distal regulation. We disassembled the network into modules using MCODE tool. 32 Jaccard index (JI) score was used to measure the co-localizations of pairwise TFs.

| Analysis of enriched hallmarks of cancer
The GO terms and genes that associated with hallmarks of cancer were obtained in a previous study. 33 In each hallmark, we measured the percentage of genes with the differential promoter, enhancer or expression in lung-metastatic cells and showed it in a pie plot.

| Functional validation of LMO4
Molecular experiments were performed to determine the function of LMO4. Details of quantitative real-time PCR, Western analysis, RNA-mediated interference and cell migration assay were described in Supplementary Methods and Table S2.

| MDA-MB-231 and LM2-4175 cell lines are suitable models for analyzing lung metastasis of breast cancer
In the attempt to assess the recapitulation of real process in lung

| Perturbation of chromatin landscape drives differential gene expression in lung metastatic cells
To investigate the global chromatin remodelling during lung metastasis of breast cancer, multiple histone modifications including H3K4me3, H3K27ac, H3K4me1, H3K27me3, H3K9me3 and Pol-II were profiled using ChIP-Seq assay in MDA-MB-231 and LM2-4175 (Table S4).
We explored the correlation between gene expression and dynamic changes of chromatin at gene promoters. As shown in Figure 1C, it was evident that genes with higher expression value had more enrichment of H3K4me3, H3K27ac and H3K4me1, but less enrichment of H3K27me3 and H3K9me3 on their promoters; whereas genes with lower expression value had more enrichment   (Table S5).  Figure 2C and Figure S3C). CD70, as a member of tumour necrosis factor (TNF) ligand family, had been repeatedly reported to involve in tumour proliferation, invasion, metastasis and T cell immunity. 34 Importantly, CD70 was considered as an emerging target in cancer immunotherapy. 35 Our results showed that CD70 had an accessible promoter and actively expressed in lung-metastatic breast cells, implying the importance of CD70 and providing a potential diagnosis and therapy biomarker.

| Enhancer reprogramming contributes to expression changes in lung metastasis
Lines of evidence showed that not only the promoter states could  Figure S4B).

| Gained super-enhancers promote lung metastasis of breast cancer
Moreover, we revealed that the super-enhancers were differentially distributed between MDA-MB-231 and LM2-4175. For example, many genes such as MEF2A, FOXP1, JUN and TGFBR2 gained new super-enhancer in LM2-4175 ( Figure 3A,B). Significantly, up to 970 super-enhancers were newly formed in lung-metastatic cells, indicating that the chromatin structure was turned to be more accessible in metastatic cells ( Figure 3C). As shown in Figure 3D, KHDRBS3 gained a super-enhancer on its downstream region of TSS, and significantly up-regulated in LM2-4175. KHDRBS3 was previously reported to enhance stemness and metastasis in basal-like breast cancer. 36 What is more, MEF2A gained a contiguous super-enhancer on its gene-body, and was also up-regulated in LM2-4175. MEF2A was previously found to promote epithelial-mesenchymal transition (EMT) and invasiveness of hepatocellular carcinoma. 37 Importantly, we found that some genes associated with gained super-enhancers were differentially expressed between non-lungmetastatic and lung-metastatic patients, and related to clinical outcome. As shown in Figure 3E, 18 genes that located near gained super-enhancers were found to be significantly up-regulated in lung-metastatic patients. Furthermore, 14 of these genes had obvious prognostic significance for LMF survival, as the patients with high expression showed more probability of lung metastasis. The survival analysis of KHDRBS3 and MEF2A were shown in Figure 3F.
Therefore, the accessible chromatin structure resulted from superenhancer reprogramming enables the activation of multiple genes for promoting lung metastasis.

| Promoter and enhancer remodelling disrupt multiple functions and pathways in lungmetastatic process
We hypothesized that genes influenced by chromatin changes of both promoter and distal enhancer might play important roles in lung metastasis of breast cancer. Function enrichment analysis suggested that these genes were mainly involved in five classes of biological function, including cell migration, vascular system development, mesenchymal cell proliferation, regulation of muscle cell differentiation and neurogenesis ( Figure 4A). As angiogenesis, EMT, mesenchymal cell proliferation and migration are indispensable processes which lead to metastasis, targeting the involved genes through epigenetic intervention will possibly inhibit these important pathways of metastasis. In addition, genes involved in nervous system development were also found to be epigenetically reprogrammed.
Interestingly, the influences of the nervous system in non-nervous system cancers were paid little attention. A recent review highlighted the relationship between neurogenesis and tumour microenvironment of prostate, pancreas, stomach and skin cancer. 38 Our epigenetic analysis implied that nervous system development might have potential importance in the microenvironment changes of lung metastasis of breast cancer.
Furthermore, multiple signalling pathways were discovered to be influenced by chromatin reprogramming ( Figure S5).

| Identification of regulators driving differential gene expression in lung metastasis
To identify regulators that are most important for describing lung metastasis of breast cancer, we analysed the core transcription regu-  Table S6. The node border was used to present the expression changes, where the red border indicates up-regulated genes and the blue border indicates down-regulated genes. Red edges represent the proximal (promoter) regulation, blue edges represent the distal (enhancer) regulation and green edges represent that both proximal and distal regulations exist. Solid edges represent the LM2-4175-specific regulation, and dashed edges represent the MDA-MB-231-specific regulation LMO4, and most of these factors were up-regulated in LM2-4175.
Both POU2F2 and TFAP2C are proved critical regulators of tumorigenicity, EMT and metastasis, [39][40][41][42] suggesting the reliability of our epigenetic analysis for identifying master regulators. However, the function of LMO4 in lung metastasis was rarely reported, and still needed further evaluation.
In an attempt to predict the regulation relationship between TFs and target genes associated with active promoters and enhancers, a bioinformatic framework was designed to analyse the regulatory network that driving differential expression during lung metastasis and explore potential co-occupancy or cooperation between regulators ( Figure 5B). Briefly, active promoters and enhancers were identified according to the enrichment of multiple histone modifications as mentioned above. We scanned the active promoters and enhancers using available PWMs of motifs. Genes associated with promoters/enhancers which contained motifs of TFs were identified as target genes. And then cell-specific TF-target networks were constructed. The pairwise co-localizations between factors were quantified to analyse the changes of interaction among regulators during lung metastasis ( Figure 5B). To visualize different features, we combined the MDA-MB-231 and LM2-4175 specific network, and illustrated multiple different data types within a single network.
Both proximal (promoter) and distal (enhancer) regulatory were presented, and expression changes of TFs were also annotated. The whole network was split into modules based on the network topology structure ( Figure 5C). The regulation relationships between TFs and target genes were listed in Table S6.
We predicted the interactions between TFs based on their  Figure 6C and Figure S6).

| LMO4 plays an important role in the regulation of EMT and migration
According to our above results, TF LMO4 was found to gain active promoter and super-enhancer, resulting in activated expression in LM2-4175( Figure 1D). Moreover, our regulatory network analysis also indicated that LMO4 might play an important role in driving differential expression of downstream target genes and actively involving in TF-TF interaction in LM2-4175 ( Figures 5,6A,B).
Importantly, high expression of LMO4 was proved to be associated with poor outcome of breast cancer patients ( Figure 6C). Thus, we speculated that LMO4 might play an important role in regulating lung metastasis of breast cancer. And molecular experiments were performed to validate its biological functions.
We knocked down LMO4 in LM2-4175 cells with siRNA transfection. Both the protein and mRNA levels of LMO4 were significantly decreased in transfected cells compared with siNC ( Figure 7A,B).
Furthermore, expression levels of predicted target genes of LMO4 were decreased after knock-down of LMO4 ( Figure 7C). Importantly, genes involved in EMT were also found to be down-regulated in LMO4 decreased LM2-4175 cells, suggesting that LMO4 may regulate the EMT process in breast cancer lung metastasis ( Figure 7D). In addition, cell migration ability after LMO4 knocking down was also confirmed by transwell assay. It was shown that the migration ability was strikingly inhibited in LMO4 decreased cells ( Figure 7E). Overall, these results suggested that LMO4 played an essential role in regulating cell migration and EMT in lung metastasis of breast cancer.

| D ISCUSS I ON
The comprehensive epigenetic study reported here identifies the whole cistrome in the lung metastasis process of breast cancer cells, and elucidates how the interplay between TFs and chromatin cis- These analyses provided comprehensive regulatory network and potential regulators that might be involved in regulating lung metastasis of breast cancer.
In this study, we applied ChIP-Seq and RNA-Seq assays to anal-  (PRO-seq) and ChIP-exonuclease (ChIP-exo). We analysed the overlap between our data and the HCC1806 cell data (Table S7). Results showed that a great number of histone modifications peaks were overlapped between HCC1806 and MDA-MB-231/LM2-4175 cell lines. However, a relatively small number of overlapped top-expressed genes between them were found, possibly because that the According to our results, many biological functions and pathways, including cell migration, angiogenesis, immune response and mesenchymal cell proliferation, were epigenetically reprogrammed in lung-metastatic breast cancer cells. Therefore, therapies targeting epigenetic factors are likely to improve many aspects and be effec- Apart from providing a reference resource, the integrated analysis identified potential biomarkers for therapy and prognosis of lung metastasis of breast cancer. For example, lung-metastatic breast cancer cells showed an increased global level of H3K4me3 and decreased level of H3K27ac. Some corresponding enzymes that regulate histone methylation and acetylation were also found to be differentially expressed, and could possibly to become indicators for predicting lung metastasis. In addition, lots of gained super-enhancers were identified in lung-metastatic breast cancer cells. We found that genes associated with gained super-enhancers were observed to have potential prognostic value for lung metastasis of breast cancer. Accumulating evidence point to the critical role of super-enhancers play in cancer progression. 49 Besides, there have been many attempts to use super-enhancer profiles for prognosis and therapy of cancer. 49 Our data resource and results provided directions for further exploring the clinical implications of super-enhancers in breast cancer metastasis. Especially, LMO4 was found to gain active promoter and super-enhancer in LM2-4175, and patients with highly expressed LMO4 showed increased probability of lung metastasis. A series of experiments also proved the functions of LMO4 in promoting EMT and invasion. Furthermore, in lung-metastatic cells, the cooperative relationship of TFs were far closer than in non-lung-metastatic cells, indicating that there was a subtle regulatory mechanism controlling lung metastasis of breast cancer.
Besides, the regulators that frequently interacted with other factors were identified as important factors for lung metastasis and showed prognostic power. This study not only confirmed the role of known factors (such as TFAP2C) but also identified some potential regulators (such as LMO4) which played pivot roles in lung metastasis.
In summary, based on integrated epigenetic and transcriptional analysis, our study provided comprehensive insights into the regulatory mechanism, as well as potential prognostic markers for lung metastasis of breast cancer. Besides, our data resource will enable numerous further functional and computational studies to examine the role of regulators and advance our understanding of lung metastasis of breast cancer.

CO N FLI C T O F I NTE R E S T
The authors declare no conflict of interest.
F I G U R E 7 Function validation of transcription factor LMO4 in lung metastasis of breast cancer. (A) The efficiency of LMO4 knockdown was confirmed by Western blotting assay in LM2-4175 cells transfected with siLMO4 and vector siRNA. β-actin was used as a control.
(B) The efficiency of LMO4 knockdown was confirmed by qRT-PCR in LM2-4175 cells transfected with siLMO4 and vector siRNA. (C) Expression changes of predicted target genes of LMO4 after LMO4 knockdown. Expression levels were analysed by qRT-PCR. Student's t test was used for statistical comparison (*P < 0.05; **P < 0.01). (D) Expression changes of EMT associated genes after LMO4 knockdown. Expression levels were analysed by qRT-PCR. Student's t test was used for statistical comparison (*P < 0.05; **P < 0.01; ***P < 0.001).
(E) The migration abilities were examined by the Transwell assay in LM2-4175 cells. Student's t test was used for statistical comparison (***P < 0.001)

AUTH O R CO NTR I B UTI O N S
KNL conducted the bioinformatic analyses and wrote the paper. CLX conducted the ChIP-Seq and RNA-Seq experiments. YXD conducted the functional validation experiments and revised the manuscript.
MJ and ACK revised and edited the language. DQW designed and supervised the whole study.

DATA AVA I L A B I L I T Y S TAT E M E N T
The histone landscape by ChIP-Seq and the gene expression profile by RNA-Seq in this paper have been deposited in NCBI GEO: GSE124379 and GSE124380.