Secondary analysis of existing microarray data reveals potential gene drivers of cutaneous squamous cell carcinoma

Cutaneous squamous‐cell carcinoma (cSCC) is the second most common skin cancer, with an increasing incidence in recent years. To define the molecular basis that drive cSCC development and progression, this study aimed at identifying potential novel molecular targets for the diagnosis and therapy of patients with cSCC. Two data sets with the accession number GSE45164 and GSE66359 were downloaded from Gene Expression Omnibus (GEO) database. After the identification of differentially expressed genes (DEGs) from these two data sets, respectively, between cSCC samples and controls, a combination of DEGs from these two data sets were subjected to the following analyses, including functional annotation, protein–protein interaction (PPI) network and module construction, transcription factor (TF)‐target regulation prediction, and drug–gene interaction predictive analysis. A total of 204 upregulated genes and 213 downregulated genes were found in two data sets which were used for the follow‐up analysis. Upregulated and downregulated genes were mainly involved in the functions such as cell division, mitotic nuclear division, cell cycle, and p53 signaling pathway. Interferon induced protein family members and proteasome subunit members were involved in the TF‐target regulatory network, such as PSMB8, CXCL10, and IFIT3. Eight upregulated genes ( TOP2A, CXCL8, RRM2, PSMB8, PSMB9, PBK, CXCL10, and ISG15) that were hub genes in the PPI network and significant modules were identified in the predicted drug–gene interaction. In conclusion, TOP2A, CXCL8, RRM2, PSMB8, PSMB9, PBK, CXCL10, and ISG15 may be potential targets for the diagnosis and therapy of patients with cSCC.

. Thus, defining the critical molecular basis that drive cSCC is an urgent research need for the improvements of treatment in the management of cSCC both in the general and highrisk population groups.
Gene expression analysis such as DNA microarray analysis provides an efficient method and has been a global approach to clarify gene expression changes in many cancer types, including skin cancers (Padilla, Sebastian, Jiang, Nindl, & Larson, 2010). Recently, though antitumor functions of IL-24 established in other cancer types, significant upregulation of IL-24 was found in the invading front of cSCC via enhancing focal expression of MMP7 (Mitsui et al., 2014). L. Zhang, Qin, Wu, Chen, & Zhang (2018) identified several genes as potential pathogenic genes contributing to the progression of actinic keratosis to cSCC. HOXB7 was identified as an upregulated gene in cSCC and silencing of the HOXB7 suppressed cSCC cell migration and invasion while induced cell apoptosis (Gao, Chen, & Yang, 2018). A recent study reported the expression of chitinase-3-like protein 1 (YKL-40) in cSCC (Salomon, Piotrowska, Matusiak, Dziegiel, & Szepietowski, 2018). In addition, Notch-effector CSL was shown to promote the tumorigenesis of cSCC by repressing histone demethylase KDM6B (Al Labban et al., 2018). However, our current knowledge about the gene driving cSCC is still limited and the mechanism of the pathogenesis of this cancer as well as the development of biomarkers for tumor detection, diagnosis, and prognosis remain underexplored.
Secondary analysis of existing data is a cost-effectiveness way to enhance the overall efficiency of the primary results from the original study (Cheng & Phillips, 2014). In this study, we thus downloaded microarray data sets from a public database and performed data reanalysis. To identify more potentially genes associated with cSCC, differentially expressed genes (DEGs) were identified in cSCC samples in comparison with control samples and several significant genes were found as suggested by subsequent bioinformatics analyses. Our study may provide novel intervention targets for cSCC therapy.

| Microarray data
The microarray data of human SCC for the past 5 years, which contained both the cancer samples and control samples, were searched from the public archive and resource NCBI GEO (Barrett et al., 2012), accessible at http://www.ncbi.nlm.nih.gov/geo/). Finally, two data sets with the accession number GSE45164 (Brooks et al., 2014) and GSE66359 (Farshchian et al., 2015) were selected. As shown in the original studies, the gene expression profile GSE45164 contained 10 surgically excised cSCC samples and a set of three normal human epidermis controls (Brooks et al., 2014). The platform used for this data set is GPL571 [HG-U133A_2] Affymetrix Human Genome U133A 2.0 Array. The gene expression profile GSE66359 included normal human epidermal keratinocytes (n = 5) purchased from PromoCell (Heidelberg, Germany) or from normal skin of patients in surgery for mammoplasty and cSCC cell lines (n = 8) from surgically excised cSCCs (Farshchian et al., 2015). The platform used for GSE66359 is GPL570 [HG-U133_Plus_2] Affymetrix Human Genome U133 Plus 2.0 Array.

| Data preprocessing and DEG screening
The original raw data (CEL files) were downloaded from the GEO database. We applied the affy package (Gautier, Cope, Bolstad, & Irizarry, 2004; version 1.58.0, http://bioconductor.org/help/search/ index.html?q=affy/) to perform the background correction and normalization of the data, including conversion of raw data formats, imputation of missing values, background correction using Affymetrix MicroArray Suite (MAS) method (Hochreiter, Clevert, & Obermayer, 2006), and quantile normalization. The median in the box plot lay in a straight line, indicating that the data normalization was reasonable (Supporting Information Figure 1). The probe was annotated with a platform annotation file to remove probes that did not match the gene symbol; for different probes mapped to the same gene, we took the mean of the different probes as the final expression of the gene.
The samples were subjected to differential expression analysis using the classical Bayesian method provided by the Limma package (Ritchie et al., 2015;version 3.10.3, http://www.bioconductor.org/packages/2. 9/bioc/html/limma.html). The genes with a p < 0.05 and an absolute value of log 2 fold change (FC) ≥ 2 as a DEG.

| Functional analysis of DEGs
The Kyoto Encyclopedia of Genes and Genomes (KEGG; Ogata et al., 1999) pathway and Gene Ontology (GO) Biological Process (BP) enrichment analysis of DEGs was analyzed using the commonly used enrichment analysis tool DAVID (Huang, Sherman, & Lempicki, 2008 genes count ≥2 and the significance threshold p < 0.05 were considered as the threshold for a significant enrichment result.

| PPI network and module analysis of DEGs
The interaction between DEGs-encoded proteins was analyzed by STRING (Szklarczyk et al., 2014;version: 10.0, http://www.string-db. org/) database. The input gene set was all DEGs, and the species was set as human. The parameter PPI score was set to 0.7 (indicating high confidence org/apps/MCODE) was applied to analyze the most significant clustering modules in the PPI network. The threshold score was selected as ≥10. In addition, the genes in the significant modules were subjected to KEGG pathway and GO BP enrichment analyses using DAVID (Huang et al., 2008).

| Drug-gene interaction predictive analysis
The Drug-Gene Interaction database (DGIdb) mines existing resources and generates assumptions about how genes are therapeutically targeted or prioritized for drug development (Griffith et al., 2013). In

| Ethical statement
The data of cSCC patients we used in this study are all from the publicly accessible GEO database. Separate ethics committee approval is not required.

| DEG screening
According to the screening criteria, a total of 116 DEGs were obtained from GSE45164, of which 63 were upregulated and 53 were downregulated. From GSE66359, we obtained 335 DEGs of which 160 were upregulated and 175 were downregulated. The clustering heat maps of DEGs is shown in Figure 1. To ensure that the next analysis does not lose the disease-associated DEGs, we took a collection of DEGs from two data sets, removing genes with opposite expression trends. A total of 204 upregulated genes and 213 downregulated genes were included in the next analysis.

| Functional enrichment analysis of DEGs
As shown in Table 1

| PPI network and module analysis
The PPI network is shown in Figure 2

| DISCUSSION
In this study, a total of 204 upregulated genes and 213 downregulated genes were identified from two microarray data sets, which were then subjected to the comprehensive analyses. Several hub genes were found from the PPI network and modules as well as the regulatory network. Interestingly, they were also found in the predicted drug-gene interaction, including TOP2A, CXCL8, RRM2, However, there is no direct evidence showing the correlation between this gene and cSCC. CXCL8 affects multiple processes involved in the progression of SCC (Christofakis, Miyazaki, Rubink, & Yeudall, 2008). CXCL10 is highly expressed in a variety of human diseases and may act as a novel therapeutic target in cancer (Liu, Guo, & Stiles, 2011). Overexpression of RRM2 significantly enhances cellular invasiveness (Duxbury & Whang, 2007), playing a critical role in determining tumor malignancy (K. Zhang et al., 2009). We suggest potentially important roles of these genes in the development and occurrence of cSCC.
PSMB8 and PSMB9 encode the proteasome subunit (Agarwal et al., 2010). Inhibition of proteasome has been a novel therapeutic approach in human cancers (Orlowski & Kuhn, 2008). A recent study provide evidence that proteasome inhibitors prime cSCC cells for rapid BAK-dependent death (McHugh et al., 2018). Few studies have investigated the roles of PSMB8 and PSMB9 in the development of cSCC. PBK, also known as PDZ Binding Kinase, is regulated by cell cycle-specific transcription factors CREB/ATF and E2F and its expression has been found in human urinary bladder transitional cell carcinoma and colon cancer (Singh et al., 2014;Yang et al., 2016).
Studies has shown that PBK promote cancer cell proliferation by regulation of the DNA damage and interacting with tumor suppressor p53 (Ayllon & O'connor, 2007;Hu et al., 2010). The ubiquitin-like protein ISG15 play an important role in antiviral immunity and defense (Skaug & Chen, 2010). In addition, evidence indicates the association of human papillomavirus infection and cSCC (Masini et al., 2003). Altogether, these data indicate that these genes may play critical roles in the pathogenesis of cSCC and further investigation towards their functions is needed.
However, it should be noted that the main limitation of this study is the lack of experimental verifications which will be the focus in our future study.
In conclusion, the present study identified several candidate genes in cSCC. It would be of interest to further identify the downstream mechanisms of these genes involved in this cancer. This would aid in the development of novel targets for cSCC therapy.