Analysis of differentially expressed genes in oral epithelial cells infected with Fusobacterium nucleatum for revealing genes associated with oral cancer

Accumulating evidence links Fusobacterium nucleatum with tumorigenesis. Our previous study demonstrated that F. nucleatum infection can induce epithelial‐mesenchymal transition (EMT) in oral epithelial cells and elaborated a probable signal pathway involved in the induction of EMT. However, the comprehensive profiling and pathways of other candidate genes involved in F. nucleatum promoting malignant transformation remain largely elusive. Here, we analysed the transcriptome profile of HIOECs exposed to F. nucleatum infection. Totally, 3307 mRNAs (ǀLog2FCǀ >1.5) and 522 lncRNAs (ǀLog2FCǀ >1) were identified to be differentially expressed in F. nucleatum‐infected HIOECs compared with non‐infected HIOECs. GO and KEGG pathway analyses were performed to investigate the potential functions of the dysregulated genes. Tumour‐associated genes were integrated, and top 10 hub genes (FYN, RAF1, ATM, FOS, CREB, NCOA3, VEGFA, JAK2, CREM and ATF3) were identified by protein‐protein interaction (PPI) network, and Oncomine was used to validate hub genes' expression. LncRNA‐hub genes co‐expression network comprising 67 dysregulated lncRNAs were generated. Together, our study revealed the alteration of lncRNA and potential hub genes in oral epithelial cells in response to F. nucleatum infection, which may provide new insights into the shift of normal to malignant transformation initiated by oral bacterial infection.

signalling pathways. 3 Based on a latest meta-analysis, a consistent increase in the abundance of F. nucleatum in CRC tissues was observed and high abundance of F. nucleatum was associated with poorer overall survival. 4 F. nucleatum was also correlated with oral cancers, 5 and F. nucleatum levels were significantly elevated in oral squamous cell carcinoma (OSCC) tissues. 6,7 To date, several studies have shown that F. nucleatum infection affected cell proliferation, apoptosis, migration and invasion in gingival epithelial cells or OSCC cells. [8][9][10] In our previous study, we established a cellular model of human immortalized oral epithelial cell (HIOECs) infected with F. nucleatum, and reported that F. nucleatum facilitated cell migration, functional loss of E-cadherin, and up-regulation of SNAI1 in both non-cancerous and cancerous oral epithelial cells, which were considered as characteristics of epithelial-mesenchymal transition (EMT). 11 Furthermore, we uncovered a probable signal pathway involved in the induction of EMT by F. nucleatum infection. 11 Nevertheless, EMT is a complicated process associated with tumorigenesis, 12  Long non-coding RNAs (lncRNAs) are classically defined as RNA transcripts larger than 200 nucleotides but lack of protein coding capacity. 13 LncRNAs have been unravelled to exert their functions via modulating chromatin remodelling, transcription regulation, post-transcriptional modifications and signal transduction. 14 It has been shown that lncRNAs play crucial regulatory roles in various biological functions, including cell growth, differentiation, migration, invasion and EMT programs. [15][16][17] Pioneering studies have demonstrated lncRNA dysregulation exert functions on EMT and tumour progression, such as lncRNA SNHG6, 18 lncRNA HCP5, 19 lncRNA ZEB1-AS1 20 and lncRNA TTN-AS1. 21 Our previous study suggested that lncRNA MIR4435-2HG played a role in F. nucleatum inducing EMT. 11 Moreover, many other lncRNAs and oncogenes may also participate in the regulation of cellular response to F. nucleatum infection. In this study, we aimed to comprehensively analyse the differentially expressed genes (DEGs) including lncRNAs and mRNAs from previous high-throughput sequencing data for further investigation. To the best of our knowledge, this work is the first to evaluate the signatures of lncRNAs and tumour-associated genes (TAGs) in the context of F. nucleatum contributing to EMT in oral epithelial cells, which may shed new light on the mechanism of F. nucleatum contributing to malignant transformation.

| Cell culture and F. nucleatum infection
HIOECs were provided by Prof. Wantao Chen from Key Laboratory of Shanghai Oral Medicine, Shanghai Jiao Tong University, and grown in defined keratinocyte-SFM (Gibco) with growth supplement at 37°C and 5% CO2. F. nucleatum ATCC25586 was routinely cultured on tryptic soy broth agar plates containing 5% defibrinated sheep blood, 5 µg/mL hemin and 1 µg/mL menadione under anaerobic condition. The cellular model of HIOECs infected with F. nucleatum at a MOI of 100:1 was established as described previously. 11 The non-infected HIOECs were served as the control.

| High-throughput sequencing and data preprocessing
We established the cellular model of HIOECs infected with F. nucleatum at MOI of 100:1, and collected the total RNA from F. nucleatum-infected HIOECs and the non-infected HIOECs, and each had 3 repetitions, and thus, 6 RNA samples were submitted to analyse the global lncRNA and mRNA profiling at Genery Biotech Ltd. by high-throughput sequencing on an Illumina Hiseq3000 platform (Illumina). Briefly, the raw data were filtered to remove low-quality reads, and the resulting high-quality data were aligned to the human reference genome (University of California at Santa Cruz hg19). Four online analytic tools, including CNCI (https://github.com/www-bioin fo-org/CNCI), CPAT (http://lilab. resea rch.bcm.edu/cpat/index.php), CPC (http://cpc.cbi.pku.edu. cn/) and PLEK (https://sourc eforge.net/proje cts/plek/), were used to predict the coding potential. Transcripts without coding potential were the candidate set of lncRNAs. DESeq2 software was used to obtain gene level fragments per kilo-base of exon per million (FPKM) reads as lncRNA and mRNA expression profiles. Fold change and P value were calculated based on the FPKM data. LncRNAs and mRNAs with a significant fold change of ≧2 and P value < .05 were deemed as differentially expressed genes (DEGs). Furthermore, the P value was adjusted by multiple testing using the Benjamini-Hochberg false discovery rate (FDR) procedure, and a standard threshold of 5% was selected for declaring significance.
Volcano plot filtering was performed to generate an overview of the differentially expressed (DE) lncRNAs and mRNAs with statistical significance between F. nucleatum-infected and the control HIOECs. Target genes in cis and trans-acting of DElncRNAs were predicted for further functional analyses. The cis role of lncRNAs indicated their actions on neighbouring target genes and the coding genes close to 100k upstream and downstream regions of lncRNAs were considered as the cis role target genes. On the contrary, the trans target genes of lncRNAs are located anywhere outside that range and identified by expression levels, according to Pearson's correlation coefficient (PCC ≧0.95).

| Functional analysis of DEmRNAs and DElncRNAs altered by F. nucleatum infection
To gain insight into the functions and potential roles of the acquired

| Protein-protein interaction (PPI) network construction and identification of hub genes
The Search Tool for the Retrieval of Interacting Genes and Proteins (STRING: http://strin g-db.org) was applied to evaluate known and predicted protein-protein interactions. The TAGs identified from the previous step were mapped to STRING database to construct the PPI network with a high confidence of 0.70. Cytoscape software was used to visualize the network, and CytoHubba plugin was utilized to screen out the top 10 hub genes with the highest degree of connectivity in the network. 22 P < .05 was considered to have statistical significance.

| Construction of the interaction networks of DElncRNAs and hub genes
Interaction networks of DElncRNAs-hub genes were constructed to explore the potential regulatory function of lncRNAs with aberrant expression (|log2FC| ≧1 and P value < .05). Pearson's correlation coefficient (PCC) between DElncRNAs and the identified hub genes was calculated, and those pairs with PCC >0.990 were selected. The co-expression network of DElncRNAs-hub genes interactions was visualized using Cytoscape software.

| Expression analysis of the hub genes and correlated lncRNAs from the co-expression network
The expression analysis of the identified hub genes such as FOS, CREB1, JAK2, was conducted using Oncomine database (https:// www.oncom ine.org/), which provided genome-wide expression analyses in various types of tumours and normal tissues. The hub genes were submitted to the Oncomine database, and seven data sets of OSCC were selected to explore their expressions in clinical samples.
A P value < .05 and a fold change ≧2 were considered as the cut-off with gene ranking in the top 10%. For each hub gene, we compared the results of cancerous with those of normal tissues. Additionally, the expressions of some lncRNAs identified in the co-expression network were obtained from the GEPIA database (http://gepia.cance r-pku.cn/), which integrated the data from The Cancer Genome Atlas (TCGA) and the Genotype-Tissue Expression project, containing sequencing data of 9736 tumours and 8587 normal samples. The data set of HNSC was selected for analysis, and the cut-off was set as |Log2FC| >1, P value < .05 and Jitter size 0.4.

F I G U R E 1 Volcano plots showing expression profiles of lncRNAs (A) and mRNAs (B) in Fusobacterium nucleatum-infected HIOECs.
Up-regulated genes are marked in blue, down-regulated genes are marked in purple. Differentially expressed genes were selected with thresholds of fold change >2 and P value < .05

| DEGs in HIOECs in response to F. nucleatum infection
Volcano plots were used to illustrate the general scattering of the differentially expressed lncRNAs and mRNAs between F. nucleatum-infected and the non-infected HIOECs ( Figure 1). A total of 3307 mRNAs were obtained (|Log2FC| ≧1.5, P value < .05), of which 1261 genes were upregulated and 2046 genes were down-regulated. Totally, 522 known lncR-NAs with significant differential expression were identified (|Log2FC| ≧1, P value < .05), including 282 up-regulated lncRNAs and 240 down-regulated lncRNAs. The top 20 DEmRNAs and DElncRNAs were listed in Table S1. In terms of KEGG pathway analysis, the majority of the genes were also enriched in HTLV-I infection pathway ( Figure 2H). According to the GO and KEGG enrichment analysis of the trans targeted genes of lncRNA ( Figure 2I-L), the most significantly enriched BP, CC and MF terms were cell division and cell-cell adhesion, cytoplasm and protein binding, respectively, and the metabolic pathways were the most remarkable pathway for enrichment. Therefore, we can speculate that the DElncRNAs have similar functions as the dysregulated mRNAs, and the lncRNAs may exert their functions via regulation of mRNAs in some way. Of note, F. nucleatum infection caused a variety of genetic changes involved in regulating various biological functions, and our previous study showed that F. nucleatum significantly promoted cell migration. 11 In addition to MMP2/9, other genes like EGFR, PDGFA, which might participate in regulating cell migration based on the functional analysis, are listed in Table 2.

| Co-expression network of DElncRNAs and hub genes
To  Table 2.

F I G U R E 4 PPI analysis of the TAGs and co-expression of the DElncRNAs and hub genes. (A) A protein-protein interaction (PPI) network
constructed by STRING was visualized with Cytoscape. 83 nodes and 106 edges were displayed. Up-regulated genes are shown in red, whereas down-regulated genes are shown in green. The hub genes were labelled with octagon in yellow. Larger node sizes correspond to higher fold changes of DEGs. Edges were shown in grey from light to dark according to the combined score (from low to high). (B) Coexpression network of DElncRNAs and hub genes. The circle represents lncRNA and the octagon in yellow represents the hub genes. Red represents positive correlation and green represent negative correlation. Edges between two nodes represent interaction between lncRNAs and hub genes F I G U R E 5 Seven OSCC data sets were selected and the expression of the hub genes was compared across the seven data sets. Normal tissues were selected as the control. Values above the average were considered overexpressed hub genes (red) Notably, CREB1, CREM and NCOA3 were overexpressed in more than 3 data sets, whereas RAF1 was significantly under-expressed in 5 data sets ( Figure 5). In addition, the expressions of several DElncRNAs in the co-expression network were assayed by analysing the data set from the GEPIA database, which contains research data both from TCGA and GTEx. As shown in Figure 6, infection. 23,24 For instance, LINC00702 were identified to be under-expressed in both COAD and READ, MNX1-AS1 were found to be overexpressed in COAD and READ , and LINC00511 were overexpressed in COAD, ESCA, PAAD and READ compared with their respective normal tissues ( Figure S1). these bacteria with immune system and stromal cells, immune evasion and immune suppression. 23 And a portion of genes dysregulated by F. nucleatum infection in cancers confirmed in previous studies were summarized in Table 4, such as CCL20, 25  ATM (ataxia telangiectasia mutated) has been reported to trigger a cascade of signalling reactions mediating phosphorylation of Akt, GSK3ß and Snail to promote EMT in breast cancer cells. 41  cytosolic serine/threonine protein kinase and plays an important role in mitogen and stress-induced signalling responses, proliferation and cell survival. 47 Feng et al found that knockdown of RAF1 decreased the cell migration ability and dampended EMT of HK2 cells. 48 As a member of the Ras family, RAF1 has been reported to activate the MEK/ERK signalling cascade and regulate the EMT process. 49 Except for the differentially expressed coding genes, the aberrantly expressed lncRNAs were also identified in HIOECs infected with F. nucleatum in our present study. Given that a number of the differentially expressed lncRNAs were unknown or rarely studied, we therefore mainly focused on the known ln-cRNAs. For the lncRNAs in the co-expression network that might regulate the expression of the hub genes ( Figure 4B), pioneering studies have demonstrated their dysregulation might result in the progression of EMT or tumorigenesis, such as LINC01006, 50 LINC00511 51 and LINC01133. 52 Of note, LINC00460 has been unravelled to play a positive role in EMT progress in HNSCC, 53 and LINC00460 was in a significantly higher expression level in OSCC tissues compared with the normal tissues validated by the GEPIA database ( Figure 6). As shown in Figure 4B, LINC00460 was predicted to be in a co-expression pattern with VEGFA, which has been proved to be overexpressed in aggressive OSCC. 54 We can speculate that LINC00460 may contribute to oral malignancy through regulating VEGFA, which needs further study.

| D ISCUSS I ON
MNX1-AS1 is a natural antisense transcript of MNX1 and has been reported to act as an oncogene in a variety of cancers. [55][56][57] Cheng et al demonstrated that overexpression of MNX1-AS1 could induce EMT and activate Akt/mTOR pathway in breast cancer. 57 Wu et al reported that MNX1-AS1 mediated EMT of the osteosarcoma cells via activating MNX1, thereafter accelerating the progression of the osteosarcoma. 58 However, its role in regulating oral cancer-associated biological processes remains unclear. By comprehensively analysing the data shown in Figure 4, lncRNAs including LINC00460, LINCO00702, LINC01160 and MNX1-AS1, and hub genes ATM, NCOA3, and VEGFA were con-

ACK N OWLED G EM ENTS
This study was supported by the National Natural Science Foundation of China (NO.81670997).

CO N FLI C T O F I NTE R E S T
The authors confirm that there are no conflicts of interest.

DATA AVA I L A B I L I T Y S TAT E M E N T
The data sets used and/or analysed during the current study are available from the corresponding author on reasonable request.