Signal transducer and activator of transcription family is a prognostic marker associated with immune infiltration in endometrial cancer

Abstract Background Signal transducer and activator of transcription (STAT) is a unique protein family that binds to DNA and plays a vital role in regulating major physiological cellular processes. Seven STAT genes have been identified in the human genome. Several studies suggest STAT family members to be involved in cancer development, progression, and metastasis. However, the predictive relationship between STAT family expression and immune cell infiltration in endometrial cancer remains unknown. Methods We explored STAT family expression and prognosis in endometrial cancer using various databases. The STRING, GeneMANIA, and DAVID databases, along with GO and KEGG analyses, were used to construct a protein interaction network of related genes. Finally, the TIMER database and ssGSEA immune infiltration algorithm were used to investigate the correlation of STAT family expression with the immune infiltration level in uterine corpus endometrial carcinoma (UCEC). Results Our study showed that different STAT family members are differentially expressed in UCEC. STAT1 and STAT2 expression increased at various stages of UCEC, and STAT5A, STAT5B, and STAT6 levels were decreased. STAT3 and STAT4 expression was not significantly different between UCEC and normal tissues. High STAT1 expression may be a prognostic disadvantage of UCEC, and high STAT6 expression may improve UCEC patient prognosis. The STAT family‐associated genes were significantly enriched in signal transduction, protein binding, DNA binding, and ATP binding upon GO analysis. Related genes in the KEGG analysis were mainly enriched in pathways in cancer, viral carcinogenesis, chemokine signaling pathway, JAK/STAT signaling pathway, and regulation of the actin cytoskeleton. In terms of immune infiltration, STAT1 and STAT2 were positively correlated with B, CD8+ T, CD4+ T, and dendritic cells, and neutrophils (p < 0.05). 
All STAT family members were positively correlated with neutrophils and dendritic cells (p < 0.05). STAT1 and STAT2 showed similar correlations with all immune cell types, whereas STAT1 and STAT6 showed opposite correlations. Conclusion These findings suggest that the STAT family is a prognostic marker, and the immune infiltration level, a therapeutic target, for endometrial cancer.


| INTRODUCTION
Endometrial cancer is a common reproductive tract tumor in women, and the number of diagnosed cases increases annually. With a tendency to develop at a younger age and a high risk of recurrence and death, it gravely endangers women's health, particularly in advanced uterine corpus endometrial carcinoma (UCEC). [1][2][3] The development of UCEC is a complex process that involves many dysregulated genes. 4 Despite significant advances in UCEC treatment, including radiotherapy, chemotherapy, and surgical interventions, adjuvant treatment options for patients with endometrial cancer are limited, and 5-year survival rates remain low owing to the extensive metastasis of advanced UCEC. Therefore, it is necessary to explore proteins associated with the pathogenesis of endometrial cancer at the gene expression level and to identify markers associated with its prognosis and immune infiltration, thus providing new therapeutic targets.
Members of the signal transducer and activator of transcription (STAT) protein family are key proteins in cytokine signaling and interferon-related antiviral activity. 5,6 These factors transmit signals from the cell membrane to the nucleus, thereby activating gene transcription. The main STAT family members identified to date are STAT1, STAT2, STAT3, STAT4, STAT5A, STAT5B, and STAT6. 7 Their signaling is involved in multiple normal physiological cellular processes, including proliferation, differentiation, apoptosis, angiogenesis, and immune system regulation. 8 Numerous studies have shown that different STAT family members play essential roles in the development of several cancers, mainly through the JAK/STAT signaling pathway. Flavopereirine inhibits oral cancer progression by inactivating the JAK/STAT signaling pathway via LASP1 upregulation. 9 IGF2BP3 promotes STAT proteins, which play an important role in cervical cancer development, and JAK/STAT pathway inhibition may be integral in promoting tumor cell death. 11 Colorectal cancer progression is caused by dysregulation of cytoplasmic transcription factors, including STAT proteins involved in the JAK/STAT signaling pathway. 12 However, the expression and prognostic value of different STAT family members in endometrial cancer and their relationship with the level of immune infiltration in UCEC remain unknown. No studies have reported a bioinformatic analysis of the STAT family in endometrial cancer. Therefore, we comprehensively explored the relationship between the STAT family and endometrial cancer using multiple public databases.
To our knowledge, this is the first study to investigate the relationship between STAT family expression and immune infiltration in endometrial cancer. Our study helps to identify markers associated with prognosis and immune infiltration in endometrial cancer and is expected to help optimize the treatment of patients with this disease.

| STAT family expression in pan-cancer and UCEC
The Cancer Genome Atlas (TCGA) is a landmark cancer genomics project that has molecularly characterized over 20,000 primary cancer and matched normal samples spanning 33 cancer types.
We obtained level 3 HTSeq RNA-seq data in FPKM format for the ALL (pan-cancer) project and the UCEC project from the TCGA database (https://www.cancer.gov/about-nci/organization/ccg/research/structural-genomics/tcga). The data were statistically analyzed and visualized using the R package ggplot2 [version 3.3.3].
We used the UALCAN database (http://ualcan.path.uab.edu/index.html) to analyze STAT family expression in UCEC. 13 UALCAN is an online analysis and data-mining site built on cancer data from the TCGA database; it supports subgroup analyses of STAT family expression by sample type, tumor stage, and patient race.

| STAT family expression in immunohistochemistry
The HPA database (https://www.proteinatlas.org/) provides information on the tissue and cellular distribution of 26,000 human proteins, using specific antibodies to examine in detail the distribution and expression of each protein within 64 cell lines, 48 normal human tissues, 20 tumor tissues, and 12 blood cells. 14 We used the "pathology panel" of the HPA database to examine the antibody-based expression of each STAT family member in UCEC tissues and compared it with the expression of the same member in normal endometrial tissues in the "tissue panel."

KEYWORDS
bioinformatic analysis, endometrial cancer, immune infiltration, prognostic markers, STAT family

| Survival analysis of the STAT family in UCEC patients
We used the Kaplan-Meier Plotter database (http://kmplot.com/analysis/) to study the association of STAT family members with the prognosis of UCEC patients. 15 The Kaplan-Meier Plotter draws on gene chip and RNA-seq data from public databases such as GEO, EGA, and TCGA to assess the impact of 54,675 genes on survival in 21 cancers. When analyzing the predictive value of a specific gene, it divides patients into two cohorts based on expression quartiles of that gene, and the 95% CI and log-rank p value are calculated.
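The grouping and testing logic described above can be illustrated with a minimal, standard-library Python sketch. It is not the Kaplan-Meier Plotter implementation: the function names, the toy data, and the median cutoff are assumptions chosen for illustration, and the log-rank p value uses the chi-square approximation with one degree of freedom.

```python
from math import erfc, sqrt
from statistics import median


def split_by_cutoff(expression, times, events, cutoff):
    """Split patients into (low, high) groups; each group is (times, events)."""
    low, high = ([], []), ([], [])
    for x, t, e in zip(expression, times, events):
        group = high if x > cutoff else low
        group[0].append(t)
        group[1].append(e)
    return low, high


def kaplan_meier(times, events):
    """Kaplan-Meier estimate: list of (time, survival) at each event time."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv, curve, i = 1.0, [], 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]   # event flag: 1 = death, 0 = censored
            removed += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed
    return curve


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-group log-rank test; returns (chi-square statistic, p value)."""
    times_a, times_b = list(times_a), list(times_b)
    events_a, events_b = list(events_a), list(events_b)
    event_times = sorted(
        {t for t, e in zip(times_a + times_b, events_a + events_b) if e}
    )
    obs_minus_exp = var = 0.0
    for t in event_times:
        n_a = sum(1 for x in times_a if x >= t)   # at risk in group A
        n_b = sum(1 for x in times_b if x >= t)   # at risk in group B
        d_a = sum(1 for x, e in zip(times_a, events_a) if x == t and e)
        d_b = sum(1 for x, e in zip(times_b, events_b) if x == t and e)
        n, d = n_a + n_b, d_a + d_b
        obs_minus_exp += d_a - d * n_a / n
        if n > 1:
            var += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    chi2 = obs_minus_exp ** 2 / var if var > 0 else 0.0
    return chi2, erfc(sqrt(chi2 / 2.0))  # chi-square tail, 1 degree of freedom


# Toy data only (hypothetical): expression, follow-up in months, event flag.
expr = [2.1, 8.4, 3.3, 9.0, 1.7, 7.5]
months = [60, 12, 48, 9, 72, 20]
died = [0, 1, 0, 1, 0, 1]
low, high = split_by_cutoff(expr, months, died, median(expr))
chi2, p = logrank_test(low[0], low[1], high[0], high[1])
print(kaplan_meier(*high), chi2, p)
```

A different cutoff (for example, an upper or lower quartile) can be passed in place of the median without changing the rest of the sketch.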

| Mutations in the STAT family, the relationship between genes, and protein-protein interaction (PPI) network construction
The cBioPortal database (http://www.cbioportal.org/) is a visual tool for studying and analyzing cancer gene data, which allows analysis of mutations, copy number, and expression of STAT family members in all UCEC samples. 16 We used UCEC patient data from TCGA to assess pairwise correlations among the seven members of the STAT family using Spearman's rank correlation. The data were statistically analyzed and visualized using the R package ggplot2 [version 3.3.3].
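As an illustration of the pairwise Spearman analysis, the following Python sketch computes Spearman's coefficient as the Pearson correlation of average ranks. The study itself used R; the function names and the toy expression values here are hypothetical.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..n, assigning tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0          # mean of positions i..j, 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy expression values (hypothetical), one list per gene across five samples.
expression = {
    "STAT1": [5.1, 7.2, 6.3, 9.8, 4.0],
    "STAT2": [4.9, 6.8, 6.9, 9.1, 3.7],
    "STAT6": [8.2, 5.1, 6.0, 3.3, 9.4],
}
genes = sorted(expression)
for i, g1 in enumerate(genes):
    for g2 in genes[i + 1:]:
        print(g1, g2, round(spearman(expression[g1], expression[g2]), 3))
```

Because the coefficient depends only on ranks, it is unaffected by monotonic transformations of the expression values, such as a log transform of FPKM.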
The STRING database (https://string-db.org/) is commonly used to construct protein-protein interaction networks between target proteins; it provides a list of proteins that interact with the query proteins based on information from text mining, experimental validation, and computational prediction. 17 We used the STRING database for PPI network construction for the STAT family.

| Gene Ontology (GO) and Kyoto Encyclopedia of Genes and Genomes (KEGG) analyses of STAT family-related genes
The GEPIA database (http://gepia.cancer-pku.cn/) is another website that allows dynamic analysis and visualization of TCGA gene expression profile data. 18 Using Pearson's correlation analysis, we used the GEPIA database to screen the top 350 genes associated with the STAT family (including the STAT family). The DAVID database (https://david.ncifcrf.gov/) provides systematic and comprehensive functional annotation for large-scale gene or protein lists, mainly for functional and pathway enrichment analysis of differential genes. 19,20 We used the DAVID database to perform GO and KEGG analyses on the 350 STAT family-related genes.
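The idea behind functional enrichment of a gene list can be sketched as a one-sided hypergeometric over-representation test. This is a generic illustration in Python, not DAVID's implementation (DAVID reports its own modified Fisher statistic), and the background size, term size, and hit count below are hypothetical numbers.

```python
from math import comb


def enrichment_p(hits, list_size, term_size, background_size):
    """P(X >= hits) for X ~ Hypergeometric: the chance of seeing at least
    `hits` term-annotated genes in a list of `list_size` genes drawn from a
    background of `background_size` genes, `term_size` of which carry the term.
    """
    upper = min(list_size, term_size)
    total = comb(background_size, list_size)
    tail = sum(
        comb(term_size, i) * comb(background_size - term_size, list_size - i)
        for i in range(hits, upper + 1)
    )
    return tail / total


# Hypothetical example: 350 related genes, a term annotated to 200 of
# 20,000 background genes, and 15 of the 350 genes carrying that term.
p = enrichment_p(hits=15, list_size=350, term_size=200, background_size=20000)
print(p)
```

When many GO terms and KEGG pathways are tested at once, the resulting p values would normally be adjusted for multiple testing before interpretation.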

| Correlation of STAT family gene expression with immune infiltration
We used UCEC patient data in TCGA to statistically analyze and visualize the data using the GSVA package [version 1.

| Statistical analysis
All statistical analyses were performed on the respective database sites, and the data were calculated using R software (v.3.6.3). The chi-squared, Fisher's exact, and Wilcoxon rank-sum tests were used to analyze clinical information. Statistical significance was set at p < 0.05.
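For readers who want to see how a two-group comparison such as tumor versus normal expression is tested, the following Python sketch implements a two-sided Wilcoxon rank-sum test with the normal approximation and a tie correction. It is an illustrative stand-in for the R routines used in the study; the function name and the toy values are assumptions, and no continuity correction is applied.

```python
from math import erfc, sqrt


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation).

    Returns (U statistic for x, p value).
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both samples must be non-empty")
    pooled = list(x) + list(y)
    n = n1 + n2
    order = sorted(range(n), key=lambda i: pooled[i])
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[order[j + 1]] == pooled[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0          # average rank for a tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        t = j - i + 1
        tie_term += t ** 3 - t             # contributes to the tie correction
        i = j + 1
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u1, 1.0                     # all values identical
    z = (u1 - mean_u) / sqrt(var_u)
    return u1, erfc(abs(z) / sqrt(2.0))    # two-sided normal tail


# Toy FPKM-like values (hypothetical): tumor versus normal samples.
tumor = [12.4, 15.1, 9.8, 14.2, 13.3, 16.0]
normal = [7.2, 8.9, 6.5, 9.1, 7.7]
u, p = rank_sum_test(tumor, normal)
print(u, p, "significant" if p < 0.05 else "ns")
```

For very small samples an exact test is preferable to the normal approximation used here.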

| STAT family expression in pan-cancer and UCEC
We analyzed STAT family expression in pan-cancer using pan-cancer tumor data from The Cancer Genome Atlas (TCGA) database (https://www.cancer.gov/about-nci/organization/ccg/research/structural-genomics/tcga) (Figure 1). The differences in the expression of different STAT family members in UCEC and normal tissues are shown in Figure 1. STAT1 and STAT2 expression was significantly higher in UCEC. STAT5A, STAT5B, and STAT6 expression was significantly lower in UCEC, while the difference in STAT3 and STAT4 expression between UCEC and normal tissues was not statistically significant. To further analyze the expression of STAT family members in UCEC, we analyzed the differences in STAT family expression at different stages of UCEC using the UALCAN database (http://ualcan.path.uab.edu/index.html; Figure 2). STAT1 and STAT2 were highly expressed at all stages of UCEC (particularly stages 1 and 3 for STAT2). STAT5A, STAT5B, and STAT6 were expressed at low levels in all stages of UCEC. In contrast, STAT3 and STAT4 expression in all stages of UCEC was not significantly different from that in normal tissues.

| Association between the STAT family and the clinical characteristics of UCEC and univariate and multifactor regression analyses
UCEC data from TCGA were used to describe the relationship between STAT family members and the clinical characteristics of UCEC using Excel. STAT1, STAT2, and STAT6 expression was closely related to the clinical characteristics of UCEC (Table 1). High STAT1 expression was significantly associated with clinical stage, age, histological type, residual tumor, histological grade, and median survival age of UCEC. Similar to STAT1, high STAT2 expression was closely associated with clinical stage, age, histological type, and histological grade of UCEC. Low STAT6 expression was associated with histological type, residual tumor of UCEC, histological grade, and median

FIGURE 1 STAT family expression in pan-cancers (ns, p ≥ 0.05; *p < 0.05; **p < 0.01; and ***p < 0.001)

FIGURE 2 Expression of STAT family in different stages of endometrial cancer (p < 0.05 for different stages of STAT1 compared with normal tissue; p < 0.05 for STAT2 in stage 1 and stage 3 of UCEC compared with normal tissue; and no statistically significant difference in expression for different stages of STAT3 and STAT4 compared with normal tissue. STAT5A, STAT5B, and STAT6 were expressed differently at different stages compared with normal tissues, p < 0.05)

| STAT family expression in immunohistochemistry
We used the Human Protein Atlas (HPA) database (https://www.proteinatlas.org/) to investigate the differences in STAT family expression between UCEC and normal endometrial tissues (Figure 3). STAT1, STAT2, STAT3, and STAT4 were more highly expressed in UCEC tissues. STAT5A, STAT5B, and STAT6 were differentially expressed in the normal and UCEC tissues. These results are slightly different from those obtained using the UALCAN database.

| STAT family is strongly associated with the prognosis of patients with UCEC
We used the Kaplan-Meier Plotter database (http://kmplot.com/analysis/) to investigate the relationship between STAT family members and the prognosis of patients with UCEC (Figure 4

| Mutations in the STAT family, the relationship between genes, and PPI network construction
The cBioPortal database (http://www.cbioportal.org/) was used to study STAT family mutations in patients with UCEC (Figure 5). The STRING (https://string-db.org/) and GeneMANIA (http://genemania.org/) databases were used for PPI network construction of the STAT family (Figure 5). The ten genes most closely related to STAT family molecules in the STRING database were EPOR, IRF1, PIAS1, CREBBP, EP300, IFNAR1, EGFR, HSP90AA1, ERBB4, and JAK1. In addition, the GeneMANIA database identified 20 target genes interacting with the STAT family, including a number of SH2 signaling protein family genes, suggesting that the STAT family is primarily involved in signal transduction.

| GO and KEGG analyses of STAT family-related genes
First, the top 350 genes associated with the STAT family (including the STAT family) were screened using the GEPIA database (http://gepia.cancer-pku.cn/) and Pearson's correlation analysis.
These 350 related genes were subjected to GO and KEGG analyses using the DAVID database (https://david.ncifcrf.gov/; Figure 7).
The first three items of biological processes in the GO analysis

| Correlation of STAT family gene expression with immune infiltration
The UCEC patient data from TCGA were used to analyze the correlation between STAT family genes and various immune cell types using

| DISCUSSION
Numerous studies have shown that different STAT family members play essential roles in the development of several cancers, including oral, 9 bladder, 10 cervical, 11 and colorectal cancers. 12 It has been proposed that the JAK/STAT signaling pathway is involved in the development of UCEC, and targeted inhibition of the IL-6 receptor and its downstream effectors JAK1 and STAT3 significantly reduces UCEC tumor cell growth. 23   Therefore, the use of the STAT family as a diagnostic and prognostic basis requires careful consideration of each specific cancer type and the characteristics of each patient. 8,29 To further investigate the mechanism of STAT family development in UCEC, we screened the STRING and GeneMANIA  T cells. 36 Ectopic expression of STAT5A enables the expansion of tumor-specific CD4+ T cells and triggers antitumor CD8+ T-cell responses. 39 The results from the TIMER database showed that all STAT family members were positively correlated with neutrophils and dendritic cells. The estimated level of immune cell infiltration differed depending on the immune infiltration algorithm used.
The present study had certain limitations. We illustrated the expression level and prognostic value of the STAT family in endometrial cancer and its relationship with the UCEC immune infiltration level using multiple databases. However, these findings have not been validated through basic experiments. Bioinformatic analysis suggests that STAT family members, particularly STAT1 and STAT6, are potential prognostic markers and therapeutic targets for UCEC. However, definitive conclusions cannot be drawn. This study initially explored the possible mo-

| CONCLUSIONS
The STAT family is associated with the prognosis and level of immune infiltration in endometrial cancer. High STAT1 expression and low STAT6 expression may be detrimental factors in UCEC prognosis. STAT-related genes play a critical role in signal transduction and transcriptional activation and are involved in tumor development.
The STAT family is expected to be a prognostic marker, and the level of immune infiltration, a therapeutic target, for endometrial cancer.

This work was supported by the Discipline Construction Promoting Project of Shanghai Pudong Hospital (Project no. Tszb2020-07 and Project no. Zdzk2020-16).

CONFLICT OF INTERESTS
The authors declare that they have no conflict of interest related to this study.

AUTHOR CONTRIBUTIONS
Zhou Xinying made significant contributions to this work. Dai Haiyan and Zhang Hu are corresponding authors of this article. All authors listed have made a substantial, direct, and intellectual contribution to the work, and approved it for publication.

ETHICS APPROVAL
The data used in this study are publicly available and allow unre-

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are openly available in TCGA database at https://portal.gdc.cancer.gov/.