Prognostic and predictive value of an mRNA signature in peripheral T‐cell lymphomas: An mRNA expression analysis

The current international prognostic index is widely questioned for risk stratification of peripheral T‐cell lymphomas (PTCLs) and does not accurately predict patient outcomes. We postulated that multiple mRNAs could be combined into a model to improve risk stratification and help clinicians make treatment decisions. In this study, gene expression profiles were downloaded from the Gene Expression Omnibus (GEO) database. Weighted gene co‐expression network analysis (WGCNA) was used to screen genes in the module most closely related to PTCLs; an mRNA signature was then built using a LASSO Cox regression model, and its prognostic accuracy was validated. Finally, a nomogram was constructed and its performance was assessed. A total of 799 WGCNA‐selected mRNAs in the black module were identified, and an mRNA signature based on DOCK2, GSTM1, H2AFY, KCNAB2, LAPTM5 and SYK was developed for PTCLs. Overall survival differed significantly between the low‐risk and high‐risk groups (training set: hazard ratio [HR] 4.3, 95% CI 2.4‐7.4, P < .0001; internal testing set: HR 2.4, 95% CI 1.2‐4.8, P < .01; external testing set: HR 2.3, 95% CI 1.10‐4.7, P = .02). Furthermore, multivariate regression demonstrated that the signature was an independent prognostic factor. Moreover, for the nomogram combining the mRNA signature with multiple clinical factors, the predicted survival probability agreed well with the actual survival probability. The signature is a reliable prognostic tool for patients with PTCLs and has the potential to help clinicians implement personalized therapeutic regimens.


| INTRODUCTION
Non-Hodgkin lymphomas (NHLs) are clonal neoplasms that arise from lymphocytes at various stages of maturation. 1 An estimated 77 240 new cases of non-Hodgkin lymphoma are expected in the United States in 2020, and 19 940 patients will die of this disease. 2 Peripheral T-cell lymphomas (PTCLs) are a subgroup of NHLs characterized as an infrequent and heterogeneous group of aggressive diseases associated with a very dismal prognosis, representing 10%-15% of NHLs in Western countries but up to 35% in some Asian countries. 3 PTCLs comprise multiple subtypes in the 2017 World Health Organization (WHO) classification system. 4 Numerous attempts have been made to optimize the treatment approach, but no definitive standard therapy has been established. 5 Traditional combination regimens such as CHOP or CHOP-like regimens, which were initially established for aggressive B-cell lymphomas, are the most widely used in patients with PTCLs. 6 However, outcomes for most patients treated with CHOP are still poor, with only 33%-43% of patients with PTCLs achieving a complete response (CR) and 5-year overall survival (OS) barely exceeding 38.5%. 7 Given the poor outcomes in PTCLs, several novel drugs such as pralatrexate, mogamulizumab, chidamide, romidepsin, brentuximab vedotin and forodesine have recently been approved by the FDA for the treatment of relapsed and refractory PTCLs, 8 but none of these new drugs has led to an improvement in survival. 9,10 Moreover, the role of stem-cell transplantation for PTCLs remains controversial in front-line settings. 11 There may be a role for prognostic biomarkers in the risk classification of patients with PTCLs. High-risk patients could receive more intensive treatment to avoid undertreatment, whereas low-risk patients could receive a low-intensity treatment regimen to avoid excessive drug toxicity. Therefore, it is urgent to identify robust biomarkers that predict the prognosis of PTCLs and discriminate patients who might benefit from therapy.
To date, the most widely used model for evaluating the prognosis of peripheral T-cell lymphoma is the international prognostic index (IPI), which is based on performance status, lactate dehydrogenase, extranodal involvement, stage and age, and was initially established for diffuse large B-cell lymphoma (DLBCL). However, given the marked heterogeneity among patients diagnosed with PTCLs, the IPI score is far less satisfactory for distinguishing recurrence risk in PTCLs than in aggressive B-cell lymphoma. 12 For example, even patients categorized in the best risk group (IPI 0) still experience an extremely unfavourable outcome. This is attributed to the fact that the IPI score focuses only on clinical characteristics and carries very little genomic information reflecting the molecular mechanisms underlying PTCL biology. In addition, the lack of information on risk stratification limits the ability of clinicians to devise individualized treatment strategies. Recently, several biomarker signatures based on gene expression profiling (GEP) and whole-genome methylation profiling have been built and used to predict the prognosis of human cancers, [13][14][15][16] but no mRNA signature has been applied to patients with PTCLs.
Weighted gene co-expression network analysis (WGCNA) is a powerful screening approach that has been increasingly valued for the discovery of novel biomarkers or therapeutic targets through the construction of scale-free gene co-expression networks. 17 In this study, we explored the correlation between PTCLs and gene sets by WGCNA. Furthermore, univariate proportional hazards analysis and LASSO Cox regression were carried out to identify an mRNA signature that goes beyond clinical parameters and is significantly associated with PTCL prognosis. Finally, a prognostic nomogram was established based on the combination of the signature and clinical characteristics.

| Data sources and data processing
The raw data of GSE59307, GSE58445, GSE19069, GSE90597 and GSE53798 were downloaded from the Gene Expression Omnibus (GEO; http://www.ncbi.nlm.nih.gov/geo/) database, and all datasets except GSE90597 were based on the GPL570 platform [HG-U133_Plus_2]. GSE59307 contains 14 samples of cutaneous T-cell lymphoma (CTCL) and 8 healthy control specimens (Figure 1A), whereas GSE58445 and GSE19069 comprise 193 and 137 samples of PTCLs, respectively. In addition, GSE90597 includes 66 cases of ENKTL, which is a subtype of PTCLs, and GSE53798 comprises 26 cases of diffuse large B-cell lymphoma. According to the current WHO classification, CTCL is a subtype of PTCLs; therefore, GSE59307 was chosen to construct the co-expression network. The packages 'simpleaffy', 18 'affyPLM' and 'arrayQualityMetrics' were used to perform quality assessment (QA), quality control (QC), background correction and normalization. Probe IDs in the datasets based on the GPL570 platform were annotated with the 'hgu133Plus2' package, and probe IDs of GSE90597 were annotated with the GPL10739 files.

| Co-expression network construction
The 5000 most variable expression profiles in GSE59307 were used to construct a co-expression network with the WGCNA package, and the network topology was analysed with soft-threshold powers from 1 to 30. After the optimal β value for the soft-threshold parameter had been determined, the correlation matrix was converted into an adjacency matrix, which was then transformed into a topological overlap matrix (TOM). Finally, average linkage hierarchical clustering was conducted to group highly correlated genes into modules according to the TOM-based dissimilarity measure.
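The conversion from correlation to adjacency to topological overlap described above can be illustrated with a small sketch. It is not the WGCNA package itself: it assumes an unsigned network (adjacency equal to the absolute Pearson correlation raised to the power β), which the text does not specify, and the function names, the toy expression values and the power of 6 are illustrative assumptions only.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def adjacency(expr, beta):
    """Unsigned soft-threshold adjacency: a_ij = |cor(gene_i, gene_j)| ** beta.

    expr is a list of genes, each a list of expression values across samples.
    The diagonal is set to 0 so that connectivity excludes self-connections.
    """
    g = len(expr)
    adj = [[0.0] * g for _ in range(g)]
    for i in range(g):
        for j in range(i + 1, g):
            adj[i][j] = adj[j][i] = abs(pearson(expr[i], expr[j])) ** beta
    return adj


def topological_overlap(adj):
    """TOM_ij = (sum_u a_iu * a_uj + a_ij) / (min(k_i, k_j) + 1 - a_ij)."""
    g = len(adj)
    k = [sum(row) for row in adj]  # connectivity of each gene
    tom = [[1.0] * g for _ in range(g)]  # diagonal is 1 by convention
    for i in range(g):
        for j in range(i + 1, g):
            # Terms with u == i or u == j vanish because the diagonal is 0.
            shared = sum(adj[i][u] * adj[u][j] for u in range(g))
            value = (shared + adj[i][j]) / (min(k[i], k[j]) + 1.0 - adj[i][j])
            tom[i][j] = tom[j][i] = value
    return tom


def tom_dissimilarity(tom):
    """Dissimilarity (1 - TOM) used as the distance for hierarchical clustering."""
    return [[1.0 - v for v in row] for row in tom]


# Toy data (hypothetical): 4 genes measured in 5 samples.
toy_expr = [
    [1, 2, 3, 4, 5],
    [2, 4, 6, 8, 10],
    [5, 3, 4, 1, 2],
    [3, 1, 4, 1, 5],
]
toy_adj = adjacency(toy_expr, beta=6)
toy_tom = topological_overlap(toy_adj)
toy_dist = tom_dissimilarity(toy_tom)
```

The resulting dissimilarity matrix is what an average linkage clustering routine would take as input; a higher power pushes weak correlations towards zero, which is why the power is tuned against the scale-free fit.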

| Visualization of clinically significant modules and identification of hub genes
To identify the modules significantly related to PTCLs, the module eigengene (ME) was used to summarize the expression profile of each module and to assess its correlation with PTCLs. The relationship of each gene with PTCLs was measured by gene significance

| LASSO Cox regression and identification of an mRNA signature
Univariate Cox proportional hazards regression analysis was applied to assess the relationship between the expression of WGCNA-selected genes and the overall survival (OS) of patients with PTCLs. Genes with P < .05 were retained and entered into LASSO Cox regression analysis, performed with the R package 'glmnet', to screen for the most valuable predictive mRNAs. The optimal value of the penalty parameter λ was estimated through 10-fold cross-validation. The risk score of the mRNA signature for each patient was calculated from the coefficients of the LASSO regression analysis and the expression level of each mRNA.
The risk score was constructed as follows:

$$\text{Risk score} = \sum_{a=1}^{n} \beta_a \times \mathrm{exp}_a$$

where n is the number of prognostic genes, $\mathrm{exp}_a$ is the expression value of gene a, and $\beta_a$ is the regression coefficient of gene a. All patients with PTCLs were separated into high- and low-risk groups using the median risk score as the cut-off value. The Kaplan-Meier estimator was used to assess the prognostic value of the mRNA signature. Survival prediction based on the risk score was illustrated with the 'survivalROC' package. The Wilcoxon signed-rank test was applied to compare expression between the high-risk and low-risk groups of PTCLs. In addition, the protein expression levels of the six genes in the mRNA signature were validated by immunohistochemistry using the Human Protein Atlas.

Only samples with a P value less than .05 were retained for subsequent analysis. Considering the small sample size of GSE90597, we combined the GSE19069 and GSE90597 datasets and used the ComBat function of the R package sva to remove batch effects before calculating the distribution of immune cells. 20 Then, we analysed the immune differences between the high-risk and low-risk groups in the combined dataset and in the GSE58445 dataset.
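The risk score described in this section (each gene's regression coefficient multiplied by its expression value, summed over the signature genes) and the median split can be sketched as follows. This is a minimal illustration, not the authors' code: the gene names, coefficients and expression values are hypothetical placeholders rather than the fitted signature, and assigning patients whose score equals the median to the low-risk group is an assumption, since the tie rule is not stated.

```python
from statistics import median


def risk_score(expression, coefficients):
    """Linear risk score: sum over genes of coefficient * expression value."""
    return sum(beta * expression[gene] for gene, beta in coefficients.items())


def split_by_median(scores):
    """Label each patient 'high' or 'low' relative to the median risk score.

    Assumption: scores strictly above the median are 'high'; ties go to 'low'.
    """
    cutoff = median(scores.values())
    groups = {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}
    return groups, cutoff


# Hypothetical coefficients and expression values, for illustration only.
example_coefficients = {"geneA": 0.5, "geneB": -0.25, "geneC": 1.0}
example_patients = {
    "p1": {"geneA": 2.0, "geneB": 4.0, "geneC": 1.0},
    "p2": {"geneA": 4.0, "geneB": 0.0, "geneC": 2.0},
    "p3": {"geneA": 0.0, "geneB": 8.0, "geneC": 1.0},
    "p4": {"geneA": 6.0, "geneB": 4.0, "geneC": 0.0},
}

example_scores = {
    pid: risk_score(expr, example_coefficients)
    for pid, expr in example_patients.items()
}
example_groups, example_cutoff = split_by_median(example_scores)
```

A negative coefficient lowers the score as expression rises, which is how a protective gene would enter such a model; the two groups produced here are what a Kaplan-Meier comparison would then be run on.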

| Integrated analysis by combining the clinical factors and mRNA signature
To investigate the effect of the risk signature on the prognosis of patients with PTCLs, univariate and multivariate Cox regression analyses were conducted. The risk score of the six-mRNA signature and other clinical characteristics, including gender and age, were used as covariates. Moreover, the six mRNAs selected by LASSO Cox regression were each examined by Kaplan-Meier survival analysis to explore the difference in survival between high- and low-expression groups. Furthermore, the correlation between the risk score and currently available clinical characteristics was analysed.

| Association of the mRNA signature with response to chemotherapy
The GSE53798 dataset was originally established to predict sensitivity to the chemotherapy drugs of the CHOP (cyclophosphamide, doxorubicin hydrochloride, vincristine, prednisone) regimen in diffuse large B-cell lymphoma (DLBCL) cells. 21 Because the degree of malignancy of DLBCL is comparable to that of some subtypes of peripheral T-cell lymphoma, and the current treatment of peripheral T-cell lymphoma is similar to that used for DLBCL, we investigated in this part whether the mRNA signature could predict patients' responses to vincristine chemotherapy.

| Nomogram development and validation
The Cox regression model was used to perform the multivariable survival analysis and to build the nomograms. Calibration curves were used to assess the consistency between the actual survival and the survival predicted by the nomogram. Nomograms and calibration curves were generated with the rms package. The C-index was used to measure the discrimination of the nomogram.
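For readers unfamiliar with the C-index, the sketch below shows how Harrell's concordance index can be computed from survival times, event indicators and predicted risk. It is a simplified illustration rather than the rms implementation: pairs with tied survival times are skipped, tied risk predictions count as half concordant, and the function name and example values are assumptions.

```python
def harrell_c_index(times, events, risks):
    """Harrell's C: fraction of comparable pairs ordered correctly by risk.

    A pair (i, j) is comparable when subject i had an observed event and a
    shorter follow-up time than subject j. The pair is concordant when the
    subject with the shorter survival has the higher predicted risk.
    Simplification: pairs with equal times are not counted.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored subject cannot anchor a comparable pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0
                elif risks[i] == risks[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical follow-up data: time, event (1 = death, 0 = censored), risk.
example_times = [5, 10, 15, 20]
example_events = [1, 1, 0, 1]
example_risks = [0.9, 0.4, 0.6, 0.1]
example_c = harrell_c_index(example_times, example_events, example_risks)
```

In this toy example four of the five comparable pairs are ordered correctly, giving a C-index of 0.8; a value of 0.5 corresponds to random ordering and 1.0 to perfect discrimination.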

| Pre-processing of the datasets
All microarray data were converted into expression matrices after processing. The 31 cases in GSE58445 and 11 cases in GSE90597 that lacked survival data were excluded from this study. After exclusion of unqualified samples, the 162 patients in GSE58445 were randomly divided into a training set (n = 98) and a testing set (n = 64) at a ratio of 6:4.

| Construction of weighted co-expression network and identification of key modules
To ensure a scale-free network, a power of β = 28 (scale-free R² = 0.84) was selected as the optimal soft-thresholding parameter (Figure 1B). Next, co-expression modules were produced by dynamic tree cutting, with at least 30 genes required in each module (Figure 1C,D).

| Identification of the six-mRNA signature in training group patients
All 799 WGCNA-selected hub genes were screened for survival-related mRNAs by univariable Cox survival analysis in the training set; 15 genes passed the pre-filter of P < .05 and were then entered into LASSO Cox regression analysis in the GSE58445 cohort (Figure S1). The risk score for predicting the outcome of patients was calculated with the following formula: … were associated with poor prognosis; LAPTM5 and SYK were significantly overexpressed in low-risk patients compared with high-risk patients and were related to a better prognosis (Figure 2C).

| Validation of prognostic and predictive accuracy of the six-mRNA signature in the internal and external testing group
The prognostic value of the six-mRNA signature was further evaluated in the internal and external testing sets. In the internal testing cohort, 32 (50%) of 64 patients with PTCLs were categorized into the low-risk group and 32 (50%) into the high-risk group; 5-year OS was 9.37% for the high-risk group and 25% for the low-risk group, a significant difference in overall survival (HR 2.4, 95% CI 1.2-4.8, P < .01; Figure 2D). In the external testing cohort of ENKTL, which is a subtype of PTCLs, 27 (49.09%) of 55 patients were categorized into the low-risk group and 28 (50.91%) into the high-risk group; 5-year OS was 7.4% for the high-risk group and 28.57% for the low-risk group (HR 2.3, 95% CI 1.10-4.7, P = .02; Figure 2G). We noted similar results in the total GSE58445 set, in which 5-year OS was 11.1% for the high-risk group and 29.6% for the low-risk group (HR 3.3, 95% CI 2.2-5.0, P < .0001; Figure S2). The prognostic accuracy of the six-mRNA-based signature was also assessed by time-dependent ROC analysis. … and SYK is a protective factor for the prognosis of PTCLs, whereas high expression of DOCK2, GSTM1, H2AFY and KCNAB2 is a risk factor for prognosis (Figure 3). Furthermore, we analysed the correlation between the expression of the six genes and survival in GSE90597, using X-tile software to find the best cut-off value. High expression of GSTM1, H2AFY and KCNAB2 was negatively correlated with prognosis, and high expression of LAPTM5 and SYK was positively correlated with prognosis, in line with the results in the GSE58445 dataset (Figure S3).

| Distribution of immune cells in different risk groups
After the CIBERSORT immune analysis, we found that the two cohorts of patients with PTCLs generally had similar immune cell distributions; additionally, the proportion of naive B cells differed significantly between the high-risk and low-risk groups in both cohorts. In the GSE58445 cohort, with 81 PTCL cases in each of the low-risk and high-risk groups, the two groups differed significantly in 6 immune cell types (naive B cells, memory B cells, resting natural killer cells, M1 macrophages, resting mast cells and eosinophils) (Figure 5A). In the combined cohort of the GSE19069 and GSE90597 datasets, 99 PTCL cases in the low-risk group and 92 PTCL cases in the high-risk group differed significantly in the distribution of 4 immune cell types (naive B cells, activated CD4+ T … Figure 5B).

| Independent prognostic role of the mRNA signature
To confirm the value of mRNA signature in assessing PTCLs patients'

FIGURE 6
Distribution of the mRNA risk score across clinical characteristics and the role of risk stratification in response to chemotherapy. A,D, Differences in risk score among PTCL subtypes. B,E, Risk score grouped by age. C,F, Risk score grouped by gender. G, Risk stratification and response to vincristine chemotherapy

3.9 | Establishment of the nomogram and assessment of predictive value of mRNA signature
To develop a convenient clinical tool that could help clinicians predict the overall survival (OS) probability of each patient, a nomogram that included the mRNA signature, age and gender was constructed to predict the 1-, 3- and 5-year OS of patients with PTCLs (Figure 7A,C). The calibration curves illustrate high consistency between predicted and observed survival for the probabilities of 3- and 5-year OS in the PTCL cohorts. In the GSE58445 dataset, Harrell's concordance index for OS was 0.722 (Figure 7B). In the GSE90597 dataset, Harrell's concordance index for OS was 0.684 (Figure 7D). Together, these results indicate that the 3- and 5-year OS rates were estimated well across the PTCL patients.

| DISCUSSION
Peripheral T-cell lymphoma is an aggressive lymphoproliferative disease that seriously threatens human health, and most patients with PTCLs have a poor prognosis owing to the combination of a lack of specific treatment and an aggressive clinical course. 22 However, molecular risk stratification based on gene expression profiling (GEP) in some types of human cancer has opened an avenue towards personalized medicine for clinicians and has encouraged researchers to apply it to other cancer types. 23 Unfortunately, until recently PTCLs have lagged behind in terms of risk classification. In the present research, we developed a prognostic signature … that is found to be highly expressed in most PTCLs. 50 Moreover, an inhibitor of SYK was shown not only to inhibit the proliferation of T-cell lymphoma cell lines but also to induce apoptosis. 51 In our study, the prognosis of the SYK high-expression group was better than that of the low-expression group, which may be attributed to the absent expression of SYK in some lymphomas with a worse prognosis. 52 However, it cannot be ruled out that SYK has a protective effect in some subtypes of PTCLs, because SYK has been reported to have a protective effect in some solid tumours. [53][54][55] Genomic changes have been shown to cause carcinogenesis and tumour progression, but recent analyses suggest that changes in the tumour microenvironment (TME) are also closely related to cancer prognosis and influence the response to immunotherapy. 56 High infiltration of B cells in tumours has been demonstrated to be associated with prolonged patient survival, 57 and a unique role for B cells in antitumour immunity may be responsible for this phenomenon. 58 To explore the composition of the immune microenvironment of PTCLs, the proportions of immune cells in the high- and low-risk groups were calculated and analysed. The proportion of naive B cells was significantly higher in the low-risk group than in the high-risk group, which is in line with the research of Javeed Iqbal showing that B-cell signatures predicted a favourable outcome in PTCLs. 59 In addition, the presence of B cells in tumours could promote the response to immunotherapy, 60 which suggests that immunotherapy may be more effective in low-risk PTCLs.
Limitations of the present study should be acknowledged. Firstly, the sample size might not be adequate, which may lead to selection bias. Secondly, complete clinical characteristics were lacking, which precluded a comprehensive analysis of the signature together with clinical features. In addition, further genetic and experimental studies are required to elucidate the mechanisms and functions of the signature genes in the carcinogenesis and progression of PTCLs. Finally, our results need further validation in larger samples or in additional independent external datasets.

| CONCLUSION
In conclusion, this is the first study to investigate the ability of an mRNA risk signature to serve as a novel prognostic biomarker for PTCLs. In the present research, we identified a six-mRNA-based signature for predicting the OS of PTCLs, and the signature showed strong performance in stratifying patients with PTCLs into low- and high-risk groups. Moreover, a nomogram that integrates the mRNA signature and clinical characteristics potentially offers good value for clinicians implementing personalized therapeutic regimens for patients with PTCLs.

CONFLICT OF INTEREST
The authors declare that they have no competing interests.

DATA AVAILABILITY STATEMENT
All data generated or analysed during this study are included in this article.