Development of a risk scoring system for patients with papillary thyroid cancer

Abstract As the importance of personalized therapeutics in aggressive papillary thyroid cancer (PTC) increases, accurate risk stratification is required. To develop a novel prognostic scoring system for patients with PTC (n = 455), we used mRNA expression and clinical data from The Cancer Genome Atlas. We performed variable selection using Network‐Regularized high‐dimensional Cox‐regression with gene network from pathway databases. The risk score was calculated using a linear combination of regression coefficients and mRNA expressions. The risk score and clinical variables were assessed by several survival analyses. The risk score showed high discriminatory power for the prediction of event‐free survival as well as the presence of metastasis. In multivariate analysis, the risk score and presence of metastasis were significant risk factors among the clinical variables that were examined together. In the current study, we developed a risk scoring system that will help to identify suitable therapeutic options for PTC.


| INTRODUC TI ON
Thyroid cancer is the most commonly diagnosed cancer in Korea, 1 and its incidence continues to rise worldwide. 2 Current treatment options for papillary thyroid cancer (PTC) include surgery, radioactive iodine ablation, and thyroid hormone replacement. 3 Although thyroid cancer is associated with a generally favorable prognosis, a minority of patients with thyroid cancer experience recurrence or distant metastasis. 3 Therefore, the challenge remains to distinguish between patients with indolent or aggressive thyroid cancer. An accurate risk stratification of thyroid cancer is essential in order to select the most suitable treatment options. In 2015, the American Thyroid Association proposed a model including low-, intermediate-, high-risk groups for differentiated thyroid cancer (DTC) mostly based on histologic report after surgery. 3 However, understanding of the molecular basis of pathogenesis and progression in thyroid cancer has progressed. 4 Genetic mutations in genes such as BRAF, and RAS are associated with both the pathogenesis of DTC and prognosis of thyroid cancer. 5 In the era of precision medicine, personalized treatment according to potential prognosis for individuals with thyroid cancer is critically important.
Big data have been mass produced for customized diagnosis; however, many medical scientists still use traditional statistical methods such as univariate Cox analysis (1972), the least absolute shrinkage selection operator (Lasso, 1997) and Elastic Net (2005) regression to predict survival. 6,7 Although these methods have been widely used in survival analysis, they do not incorporate the most up-to-date information regarding the complex interplay between biological pathways. A novel variable selection method, so-called Network-Regularized high-dimensional Cox-regression, has been developed that takes into account signalling pathways and gene networks with the addition of an optional gene-gene correlation matrix. 7,13 A new strategy that uses individual information is crucial to accurately stratify patients with PTC. Therefore, we aimed to develop a novel risk scoring system for PTC based on gene networks using The Cancer Genome Atlas (TCGA).

| Data acquisition and characteristics
The primary and processed data were downloaded from the Genomic Data Commons Data Portal (https://gdc-portal.nci.nih. gov/) in January 2017. All TCGA data were available without restrictions in publications or presentations according to TCGA publication guidelines. We downloaded mRNA expression data, and clinical information, which was last updated lastly in May 2016. Of 509 cases, the following samples were excluded; metastatic tissues (n = 8); history of other malignancies (n = 33); history of neoadjuvant therapy (n = 4); missing data (n = 9). In total, 455 patients were included in this study.

| Selection of genes and risk score
We performed Network-regularized high-dimensional Cox regression (Net) using the R package coxnet (version 0.2) to evaluate the association between event-free survival (EFS) and mRNA expression. The terms 'events' was used to refer to recurrence and/or progression. To obtain more significant results, optional parameters were required. We made a gene-gene pathway matrix using six large databases (Biocarta, HumanCyc, KEGG, NCI, Panther, and Reactome) as a regularized parameter 'Ω' using the R package graphite. The mixing parameter α, which decides the balance between Lasso and Ridge, was determined with minimal cross-validation error. The gene set was selected using Net and the 'leave-one-out cross-validation (LOOCV)' method. LOOCV is the most exhaustive cross-validation methods which train and test on all possible ways to divide the observation into a training and a validation set. Risk score was calculated as the level of expression of each gene, multiplied by the corresponding regression coefficients, consisting of 35 genes in total ( Table 1). The cutoff (−5.769287) was determined with maximal Uno's c-index. 14 Lower risk scores indicated lower risk for recurrence/progression. The study protocol is presented in Figure 1.

| Statistical analysis
Survival analysis was performed to predict EFS. Variables such as age, sex, histologic subtype, extrathyroidal extension (ETE), lymph node metastasis, distant metastasis, and risk score were assessed using

| RE SULTS
In total, 455 patients with PTC were included in this study (120 men, 335 women). The mean age was 45.8 years. Of 455 patients with PTC, 43 (9.5%) experienced recurrence/progression during followup (37.0 ± 30.6 months). Patients' characteristics are summarized in Table 2.

| Risk scoring system
We developed a risk scoring system to predict recurrence/progression of PTC. The risk score ranged between −8.841 and −3.425 for all patients. Time-dependent receiver operating characteristic analysis showed an acceptable predictive AUC of between 0.902 and 0.943 ( Figure 2). A c-index for the whole course of time was excellent with a value of 0.910. As shown in Figure 3, the risk score was statistically significance (P < 0.0001) for EFS as well as the presence of distant metastasis (P < 0.0001).

| D ISCUSS I ON
In this study, a risk scoring system derived from mRNA expression values and the presence of distant metastasis were strong predictors for recurrence/progression in patients with PTC. The incidence of thyroid cancer continues to rise worldwide, including in Korea. 2,15 However, the survival rates for thyroid cancer are relatively good with a 5-year rate of 97.3%. 16 Conventional treatment of PTC involves a three-tiered approach including surgery, radioactive iodine ablation, and replacement of exogenous thyroid hormone, which has remained unchanged since the 1950s. 3,17 In addition, several reports suggested only active surveillance of low-risk PTC without surgery. 18 In this regard, there is a need for new prognostic factors that predict recurrence/progression in thyroid cancer.
The study of cancer genomics have accelerated the convergence of discovery science and clinical medicine. 19 The molecular characterization of thyroid cancer has begun to influence diagnosis and the overall treatment landscape. Genetic mutations such as BRAF, RAS, and RET are known to be prognostic markers in thyroid cancer. 5 PTCs with BRAF mutations show a higher risk of recurrence and a higher risk of death. 5 TERT mutation is also associated with aggressive clinicopathological characteristics and poorer prognosis in PTC. 20  Therefore, genes with significant prognostic value as identified by in traditional approaches may have the weakness of overfitting, however, fit in their own dataset. In this study, to overcome these limitations, we developed a novel risk scoring system using Network-Regularized high-dimensional Cox regression analysis that incorporated biological pathways as a regularized parameter. To obtain more biological information, we constructed a gene-gene pathway matrix using six largest pathway databases (Biocarta, HumanCyc, KEGG, NCI, Panther, and Reactome).
In this study, a total of 35 genes were included in risk score system. As lower risk scores indicate the favorable prognosis, the higher The mixing parameter α, which decides the balance between Lasso and Ridge, was determined with minimal cross-validation error. The gene set was selected using Net and the 'leave-one-out' method for cross-validation. Risk score was calculated as the level of expression of each gene, multiplied by the corresponding regression coefficients, consisting of 35 genes in total ( Table 1). The cutoff (−5.769287) was determined with maximal Uno's c-index. 14 Lower risk scores indicated lower risk for recurrence/progression. The study protocol is presented in Figure 1.
In addition to a risk scoring system derived from mRNA, the presence of distant metastasis was an independent predictor of recurrence/progression in this study, which is consistent with a previous meta-analysis. 28  required. In the current study, we developed a novel risk scoring system for PTC derived from mRNA expression values, which is an independent predictor of prognosis in PTC. Although some limitations exist, our results provide insight into the prognostic prediction of patients with PTC.