Machine learning for the prediction of bone metastasis in patients with newly diagnosed thyroid cancer

Abstract Objectives This study aimed to establish a machine learning prediction model for bone metastasis (BM) in patients with newly diagnosed thyroid cancer (TC). Methods Demographic and clinicopathologic variables of TC patients in the Surveillance, Epidemiology, and End Results database from 2010 to 2016 were retrospectively analyzed. On this basis, we developed a machine learning model based on the random forest (RF) algorithm. The area under the receiver operating characteristic curve (AUC), accuracy score, recall rate, and specificity were used to evaluate and compare the predictive performance of the RF model and the other models. Results A total of 17,138 patients were included in the study, of whom 166 (0.97%) had developed bone metastases. Grade, T stage, histology, race, sex, age, and N stage were the important predictive features of BM. The RF model had better predictive performance than the other models (AUC: 0.917, accuracy: 0.904, recall rate: 0.833, and specificity: 0.905). Conclusions The RF model constructed in this study could accurately predict bone metastases in TC patients, which may provide clinicians with more personalized clinical decision-making recommendations. Machine learning technology has the potential to improve the development of BM prediction models in TC patients.

Because of the low incidence and asymptomatic nature of BM, testing for BM is often overlooked during the initial diagnosis of a patient with TC. The current detection method is mainly bone scanning; however, because of its high cost, radiation exposure, and low sensitivity to micrometastatic foci, 10 bone scanning is recommended only in the presence of suspicious skeletal-related events (SRE), and it has been reported that the median time to develop an SRE is 5 months after BM. 6 By then, many TC patients may have missed the best treatment opportunities because they may have developed advanced disease or multiple metastases. Machine learning (ML) technology makes it possible to infer important connections between data items from disparate data sets that would otherwise be difficult to correlate. 11,12 Today, the sheer volume and complexity of medical data make the use of ML in diagnosing disease and predicting clinical outcomes promising. ML has been used in clinical settings and has demonstrated greater accuracy than conventional methods. 13,14 Therefore, we aimed to establish an ML-based model for predicting the occurrence of BM in patients with TC. This study may help clinicians make more personalized clinical decisions and allocate health resources more appropriately.
FIGURE 1 Flow diagram of the study population selected from the Surveillance, Epidemiology, and End Results (SEER) database. Based on the inclusion and exclusion criteria, 17,138 patients were included in this study.
2804 | LIU et al.

| Study population
Data for this study were derived from the Surveillance, Epidemiology, and End Results (SEER) database. Patient data were downloaded from the "SEER 18 Regs Research Data + Hurricane Katrina Impacted Louisiana Cases (1973-2016)" by using SEER*Stat 8.3.8 software.
The study was limited to the period between 2010 and 2016, as information on metastasis at the site of interest was only available from 2010 onward. The exclusion criteria were as follows: (1) unknown T stage, N stage, race, grade, insurance status, marital status, or bone metastatic status; (2) TC was not the first tumor. The patient selection procedure is displayed in Figure 1. The seventh edition of the AJCC TNM staging system was used as the basis for staging the cases included in the study.
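The selection rules above can be illustrated with a minimal Python sketch. It assumes each patient is represented as a dictionary; every field name below (for example `year_of_diagnosis` and `is_first_primary`) is hypothetical and does not correspond to the actual column labels of a SEER*Stat export.

```python
# Hypothetical field names; a real SEER*Stat export uses its own column labels.
REQUIRED_FIELDS = ("t_stage", "n_stage", "race", "grade",
                   "insurance", "marital_status", "bone_metastasis")

def is_eligible(record):
    """Return True if a patient record (a dict) passes the stated criteria."""
    # Study window: diagnosed between 2010 and 2016.
    if not 2010 <= record.get("year_of_diagnosis", 0) <= 2016:
        return False
    # Criterion 1: exclude records with unknown values in any required field.
    for field in REQUIRED_FIELDS:
        value = record.get(field)
        if value is None or str(value).strip().lower() in ("", "unknown"):
            return False
    # Criterion 2: exclude patients for whom TC is not the first tumor.
    return record.get("is_first_primary") is True

# Three made-up records: one complete, one with unknown grade, one second primary.
complete = {
    "year_of_diagnosis": 2012, "t_stage": "T2", "n_stage": "N0",
    "race": "White", "grade": "II", "insurance": "Insured",
    "marital_status": "Married", "bone_metastasis": "No",
    "is_first_primary": True,
}
unknown_grade = dict(complete, grade="Unknown")
second_primary = dict(complete, is_first_primary=False)

cohort = [r for r in (complete, unknown_grade, second_primary) if is_eligible(r)]
print(len(cohort))  # 1
```

Only the complete record survives, because the other two each violate one exclusion criterion.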

| Model establishment
All statistical analyses in the study were performed with R (version 3.6.8, R Foundation for Statistical Computing) and Python (version 3.7, Python Software Foundation). All variables were tested for pairwise Pearson correlation, and the results are presented as a heat map (Figure 3). All patients were randomly divided into a training set and a test set at a ratio of 7:3 (Table 1). The chi-square test was used to analyze the differences between the training and test sets. The training set was used to establish a random forest (RF) model and a multivariate logistic regression (LR) model, and the test set was used to evaluate them. RF builds a bagging ensemble of decision trees (DTs) and further introduces random attribute selection into the training of each DT. Figuratively speaking, it builds many DTs to form a "forest" and makes decisions through the voting of multiple trees. This method can effectively improve the classification accuracy on new samples. 15 The randomness of the RF is reflected in two ways: the training samples for each tree are drawn at random, and the candidate splitting attributes at each node are randomly selected. With these two sources of randomness, the RF is resistant to over-fitting even if no pruning is performed on each DT. Initially, we built the RF model with 500 trees (ntree = 500). For the multivariate LR, we used the enter variable selection method to establish the model. The area under the receiver operating characteristic curve (AUC), accuracy score, recall rate, and specificity were used to compare the predictive power of the two models.
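The two sources of randomness and the voting step can be sketched in a few lines of standard-library Python. This is a deliberately simplified illustration, not the implementation used in the study: each "tree" is a depth-one stump, a single random attribute is drawn per tree rather than a random subset per node, the tree count (25) is arbitrary, and the two-feature data set is made up.

```python
import random
from collections import Counter

def fit_stump(rows, labels, feature):
    """Find the single-threshold split on `feature` with the fewest training errors."""
    best = None
    for threshold in sorted({row[feature] for row in rows}):
        for low_label, high_label in ((0, 1), (1, 0)):
            errors = sum(
                (low_label if row[feature] <= threshold else high_label) != label
                for row, label in zip(rows, labels)
            )
            if best is None or errors < best[0]:
                best = (errors, feature, threshold, low_label, high_label)
    return best[1:]

def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train stumps on bootstrap samples, each using one randomly chosen attribute."""
    rng = random.Random(seed)
    n, n_features = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]   # random training sample (bagging)
        feature = rng.randrange(n_features)          # random attribute selection
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], feature))
    return forest

def predict(forest, row):
    """Classify one sample by majority vote over all stumps."""
    votes = Counter(
        low_label if row[feature] <= threshold else high_label
        for feature, threshold, low_label, high_label in forest
    )
    return votes.most_common(1)[0][0]

# Made-up, well-separated toy data: class 0 has low values, class 1 high values.
rows = [[0.04 * i, 0.40 - 0.04 * i] for i in range(10)]
rows += [[0.60 + 0.04 * i, 1.00 - 0.04 * i] for i in range(10)]
labels = [0] * 10 + [1] * 10

forest = fit_forest(rows, labels)
print(predict(forest, [0.0, 0.0]), predict(forest, [1.0, 1.0]))
```

A production model would grow full trees and draw a fresh attribute subset at every node, as library implementations of random forests do.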

| Model improvement
After the first round of model building was completed, we extracted the important features from the first-round model. In addition, other machine learning algorithms, namely an AdaBoost classifier (Ada), DT, naive Bayes classification (NBC), and support vector machine (SVM), were introduced for comparison. 16-20

3 | RESULTS

| Demographic and pathological characteristics
A total of 17,138 patients with TC were enrolled in this study. Of these patients, 166 (0.97%) had bone metastases and 16,972 (99.03%) did not at primary diagnosis. All patients were randomly assigned at a ratio of 7:3 to a training set (n = 11,997) and a test set (n = 5141). Demographic and clinicopathological variables are detailed in Table 1.
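The reported prevalence and set sizes follow directly from the cohort counts, as the short check below shows. The shuffle-based split is only an assumed illustration of a simple random 7:3 partition; the seed is arbitrary, and the text does not state whether the actual split was stratified by BM status.

```python
import random

N_TOTAL, N_BM = 17_138, 166

prevalence = N_BM / N_TOTAL
print(f"BM prevalence: {prevalence:.2%}")   # BM prevalence: 0.97%

n_train = round(N_TOTAL * 0.7)   # 11,996.6 rounds to 11,997
n_test = N_TOTAL - n_train       # 5,141
print(n_train, n_test)           # 11997 5141

# Simple random partition of patient indices (seed 42 is an arbitrary choice).
indices = list(range(N_TOTAL))
random.Random(42).shuffle(indices)
train_idx, test_idx = indices[:n_train], indices[n_train:]
```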

| Model analysis and variable influence on prediction
All variables were tested for pairwise Pearson correlation, and the correlation heat map showed no significant correlation between them (Figure 3), suggesting that the variables are not strongly linearly related to one another. For the multivariate LR model with the enter variable selection method, seven characteristics were identified as independent risk factors: sex (p = 0.015), age (p = 0.011), race (p < 0.001), grade (p = 0.029), histology (p = 0.043), T stage (p < 0.001), and N stage (p = 0.005) (Table 2). For the RF model, variable importance was evaluated in terms of the out-of-bag (OOB) error rate, which reflects the contribution of each variable to distinguishing BM from no BM (Figure 4). Grade, followed by T stage and histology, were the three most important variables. Interestingly, the top seven most important variables in the RF model were consistent with the risk factors identified by the LR model.
FIGURE 3 Results of Pearson correlation analysis between all variables. The heat map shows the correlation between the variables.
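The pairwise Pearson analysis behind such a heat map can be sketched as follows. The variable names and numerically encoded values are made up for illustration, and how the categorical variables were actually encoded is not stated in the text.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

def correlation_matrix(columns):
    """Return a nested dict of pairwise correlations, i.e. the heat map values."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

# Hypothetical, numerically encoded variables for six patients.
toy = {
    "age_group": [0, 1, 2, 1, 0, 2],
    "t_stage":   [1, 3, 2, 4, 1, 2],
    "sex":       [0, 1, 0, 1, 1, 0],
}
matrix = correlation_matrix(toy)
for name, row in matrix.items():
    print(name, {other: round(r, 2) for other, r in row.items()})
```

Coefficients near zero indicate a weak linear relationship between two variables; they do not by themselves establish statistical independence.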

| Model performance
The test set was used to test and compare the predictive performance of all models. The AUC, accuracy score, recall rate, and specificity were used to evaluate and compare model performance (Figure 5A). We then adjusted the parameters of the RF model, iterating over ntree values from 1 to 500 to choose the value that gave the best predictive performance (ntree = 7, Figure 2A). The improved random forest (RF2) model, which used the top seven significant features, had the best predictive performance among all machine learning models (AUC: 0.917, accuracy: 0.904, recall rate (sensitivity): 0.833, specificity: 0.905; Table 3; Figure 5B). It also performed well in the 10-fold cross-validation of the training set (average AUC = 0.916, Figure 2B). The prediction results of the improved RF model are shown in Table 4, which illustrates its predictive power.

| DISCUSSION
In the present study, the prevalence of BM in patients with TC was lower than previously reported, at only 0.97%. This may be because the SEER database records BM only at the time of primary diagnosis, whereas the BM data in other studies were accumulated over different follow-up periods. It can therefore be seen that in patients with TC, the probability of having BM at primary diagnosis is low, and most BMs develop during clinical follow-up after the initial diagnosis of TC. After the initial diagnosis, further follow-up examination of patients with a high probability of developing bone metastases is thus important for receiving appropriate treatment and improving prognosis. Bone scintigraphy is usually used to identify possible bone metastases in patients newly diagnosed with TC. However, because bone scintigraphy is expensive and involves radiation exposure, it may not be suitable for repeated follow-up examinations. Pathological diagnosis is considered the gold standard. However, studies have shown that biopsy is not only difficult and painful, but also increases the risk of tumor cell proliferation, which means it may not be safe for routine diagnosis. 22 To better address this problem, we used advanced machine learning algorithms and constructed an RF model to identify TC patients at high risk of BM.
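Because BM is rare at diagnosis, accuracy alone says little about a classifier, which is one reason recall and specificity are reported alongside it. The sketch below computes the evaluation metrics used in this study from a confusion matrix and computes AUC as a pairwise ranking probability. All counts and scores are hypothetical and are not taken from Table 4.

```python
def confusion_metrics(tp, fn, fp, tn):
    """Return accuracy, recall (sensitivity), and specificity from raw counts."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,
        "recall": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }

def auc_by_ranking(pos_scores, neg_scores):
    """AUC as the chance that a random positive outscores a random negative.

    Ties count as half a win.
    """
    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical test set: 1,000 patients, 10 of them with BM (1% prevalence).
model = confusion_metrics(tp=8, fn=2, fp=95, tn=895)
all_negative = confusion_metrics(tp=0, fn=10, fp=0, tn=990)

print(model)         # accuracy 0.903, recall 0.8, specificity about 0.904
print(all_negative)  # accuracy 0.99, recall 0.0, specificity 1.0

# Hypothetical predicted probabilities for two BM and three non-BM patients.
toy_auc = auc_by_ranking([0.9, 0.4], [0.1, 0.35, 0.8])
print(round(toy_auc, 3))  # 0.833
```

In this toy example, a rule that labels every patient as BM-free reaches higher accuracy than the model yet detects no metastases, so recall and specificity carry the clinically relevant information.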
Random forest seems to be the machine learning algorithm of choice in most clinical studies. 23,24 Studies have shown that it is one of the most accurate machine learning models: it is superior to other techniques in handling large numbers of features and highly nonlinear data, copes well with data noise, and is easier to tune and to integrate with other learning algorithms. 25 In this research, we found that advanced machine learning techniques such as RF modeling can improve the utilization of information in analytical databases and enable the development and validation of predictive models with better performance. The RF model has stronger predictive performance, probably because it uses more advanced classification decisions and different weighting ratios than the other models. The model has shown excellent performance in predicting BM in TC patients, which can provide clinicians with more accurate and personalized health-care decisions. The potential use of this model is to estimate the likelihood of bone metastases in patients with TC and to flag patients at high risk of BM for further investigation, which may help improve their prognosis.
FIGURE 4 Feature importance derived from the random forest model. The plot shows the relative importance of the variables in the random forest model.
In this study, we found that the top seven most important features in the RF model were precisely the risk factors identified by the LR model: grade, T stage, histology, race, sex, age, and N stage. Although SRE has long been recognized as a sign of BM, it is not reasonable to consider targeted screening for BM in TC patients only when they have symptoms of bone involvement, as this would delay their treatment. Therefore, models are needed to identify patients with TC at high risk of bone metastases so that they can receive early attention and screening. In previous studies, 26-29 age has been shown to affect the prognosis of TC patients, and it has been reported that the risk of distant metastasis (DM) was significantly lower in younger TC patients than in older patients. 30 We found that age was also an important feature influencing BM in our study. Zhao et al. 31 found that sex was a risk factor for lateral lymph node metastasis and skip metastasis in TC. In this research, we also found that sex is an important characteristic that affects BM, with men being more likely to develop BM than women.
Many studies now suggest that tumor biology plays an important role in disease development and may be closely related to the occurrence and development of BM. A meta-analysis found significant correlations between DM and tumor multifocality, size, vascular infiltration, extrathyroidal extension, and lymph node metastasis. 32 In the present study, we found that T and N stage were important features predicting the development of bone metastases in patients with TC. This study also found that patients with poorly differentiated or undifferentiated tumors were more likely to develop BM, possibly because cancer cells invade surrounding tissues, capillaries, and lymphatic vessels, and these poorly differentiated or undifferentiated tissues have a greater potential to grow and undergo early metastasis. These findings are consistent with those of Sugino et al. 33 Thyroid cancer is highly heterogeneous in terms of clinical and molecular characteristics and consists of four major subtypes associated with different  34 This may be because vascular invasion in FTC is more common and reasonable than vascular invasion in PTC.
This study applied machine learning-based RF methods to SEER data to predict BM in TC patients. It extends the LR-based nomogram models that other researchers have frequently used in recent years. However, this study still has several limitations. First, the model is based on machine learning algorithms, so the important features screened out by the model may be difficult to interpret clinically. Second, this study is based on a North American population, so its applicability to other populations may be limited, and future studies should include a broader population. Third, the SEER database records information at the time of initial diagnosis, which means that subsequent treatment data are missing, and we were unable to include them in the BM prediction analysis of TC patients.

| CONCLUSION
In conclusion, we developed an RF prediction model for bone metastases in TC patients that outperformed the traditional LR model. This model may facilitate personalized diagnosis and refined clinical decision making for BM in TC patients.

ACKNOWLEDGMENTS
This work is supported by the Department of Science and Technology Program of Jiangxi Province, China (No. 20192ACBL21041, 20202BBGL73015) and the project of Jiangxi Provincial Health Commission (No. 20161024). We are thankful for the contribution of the SEER database and the 18 registries supplying cancer research information, and we thank Mr. Wenxing Qian of the Department of Computer Science, Beijing Jiaotong University for his assistance in computer science.

CONFLICT OF INTEREST
No benefits in any form have been or will be received from any commercial party related to the subject of this manuscript.

ETHICS APPROVAL AND CONSENT TO PARTICIPATE
We received permission to access the research data file in the SEER program from the National Cancer Institute, US. Approval was waived by the local ethics committee, as SEER data is publicly available and de-identified.

DATA AVAILABILITY STATEMENT
The data sets generated and/or analyzed during the current study are available in the SEER database (https://seer.cancer.gov/).