Artificial neural networks for simultaneously predicting the risk of multiple co‐occurring symptoms among patients with cancer

Abstract Patients with cancer often exhibit multiple co‐occurring symptoms which can impact the type of treatment received, recovery, and long‐term health. We aim to simultaneously predict the risk of three symptoms: severe pain, moderate‐severe depression, and poor well‐being in order to flag patients who may benefit from pre‐emptive early symptom management. This was a retrospective population‐based cohort study of adults diagnosed with cancer between 2008 and 2015. We developed and tested an Artificial Neural Network (ANN) model to predict the risk of multiple co‐occurring symptoms within 6 months after diagnosis. The ANN model derived from a training cohort was assessed on an independent test cohort for model performance based on sensitivity, specificity, accuracy, AUC, and calibration. The mutually exclusive training and test cohorts consisted of 35,606 and 10,498 patients, respectively. The area under the curve for the risk of experiencing severe pain, moderate‐severe depression, and poor well‐being were 71%, 73%, and 70%, respectively. Patient characteristics at highest risk of simultaneously experiencing these three symptoms included: those with lung cancer, late stage cancer, existing chronic conditions such as osteoarthritis, mood disorder, hypertension, diabetes, and coronary disease. Patients with over a 40% risk of severe pain also had over a 70% risk of depression, and over a 55% risk of poor well‐being. Our ANN model was able to simultaneously predict the risk of pain, depression, and lack of well‐being. Accurate prediction of future symptom burden can serve as an early indicator tool so that providers can implement timely interventions for symptom management, ultimately improving cancer care and quality of life.

have been shown to be associated with an increased likelihood of experiencing severe symptoms. 5 Severe symptoms can have an impact on the type of treatment received, on recovery and on long-term health outcomes. 2 It is, therefore, important for patients to receive timely and appropriate symptom management throughout their cancer trajectory. Being able to simultaneously predict the risk of experiencing multiple co-occurring symptoms based on a patient's profile and characteristics can assist cancer care providers in developing a plan for timely symptom management. Such risk prediction tools can offer opportunities for intervention so that patients experience less symptom distress or are better prepared for symptom exacerbations.

| Methodological background
The majority of studies examining predictors and risk of symptom burden among patients with cancer utilize traditional statistical techniques, such as logistic or modified Poisson regression. [5][6][7] When several symptoms for each patient are being considered, each symptom is often examined separately meaning that an independent regression model is developed for each symptom. Unfortunately, this approach fails to account for the possibly strong correlation in symptom burden measures taken from the same patient. 8 Moreover, the relationship between patient-level predictors and symptom burden may be nonlinear and complex, making it difficult to explicitly capture using traditional regression techniques.
As machine learning techniques are gaining attention as prediction tools in health research, it is of interest to determine if they can be used to simultaneously predict multiple outcomes of symptom burden among patients with cancer. An Artificial Neural Network (ANN) offers a convenient way to use large volumes of individual-level data to predict multiple co-occurring outcomes. The basic ANN structure consists of three layers: an input layer, a hidden layer, and an output layer. The patient-level predictors are represented as nodes in the input layer, and the patient-level outcomes are represented as nodes in the output layer. The nodes in the hidden layer are intermediate unobserved values that allow the ANN to model complex nonlinear relationships between the input nodes and the output nodes. 9,10 To address gaps in prior approaches that predict symptom burden risk, we sought to develop and validate an Artificial Neural Network to simultaneously predict the risk of experiencing multiple co-occurring symptoms among patients with cancer.
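To make the three-layer structure concrete, the following is a minimal sketch of a forward pass through a 39-3-3 network. It assumes logistic (sigmoid) activations and one bias term per node, neither of which is stated in the text above; the weights are random placeholders rather than fitted values, and all function names are hypothetical.

```python
import math
import random

def sigmoid(z):
    """Logistic activation, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    """One fully connected layer: a weighted sum per node, then a sigmoid."""
    return [
        sigmoid(sum(w * x for w, x in zip(node_weights, inputs)) + b)
        for node_weights, b in zip(weights, biases)
    ]

def forward(x, hidden_w, hidden_b, output_w, output_b):
    """Input nodes -> hidden nodes -> output nodes (one risk per symptom)."""
    hidden = layer(x, hidden_w, hidden_b)
    return layer(hidden, output_w, output_b)

def random_network(n_inputs, n_hidden, n_outputs, seed=0):
    """Placeholder weights (assumption: small uniform values, zero biases)."""
    rng = random.Random(seed)
    def init(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]
    return (init(n_hidden, n_inputs), [0.0] * n_hidden,
            init(n_outputs, n_hidden), [0.0] * n_outputs)

hidden_w, hidden_b, output_w, output_b = random_network(39, 3, 3)
patient = [0.0] * 39  # hypothetical, already-normalized covariate vector
risks = forward(patient, hidden_w, hidden_b, output_w, output_b)
# risks holds three values in (0, 1): pain, depression, poor well-being
```

Because all three output nodes read from the same hidden nodes, a single set of weights produces all three risks at once, which is how correlation between the outcomes can be absorbed by the network.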

| Study design and population
This was a population-based retrospective cohort study among all adults diagnosed with cancer in Ontario, Canada, during 2008-2015. Individuals without a valid health card and those who did not participate in symptom screening were excluded from the cohort.

| Data sources
In Ontario, Canada's most populous province, the Edmonton Symptom Assessment System (ESAS) is a validated tool implemented across cancer centers to screen for nine common symptoms. The severity of each symptom at the time of assessment is rated from 0 to 10 on a numerical scale, with 0 meaning that the symptom is absent and 10 that it is of the worst possible severity. 11,12 For cancer patients in Ontario receiving home care or residing in a long-term care facility, the interRAI Assessment System is another validated set of tools that also screens for symptoms such as pain and depression, and can provide valuable information to support person-specific care planning across the continuum of care. 13 We

| Outcomes (ANN output nodes)
A priori, we were interested in predicting the presence of three specific symptoms over a 6-month window (within 3-9 months after a cancer diagnosis): severe pain, moderate to severe depression, and poor well-being, representing physical, psychosocial, and global symptom measures, respectively. For every patient, information on the presence/absence of each of these three symptoms was retrieved in a hierarchical manner from the Symptom Management Reporting Database and the interRAI databases, similar to our prior work 7 :
1. Severe pain: Defined as a score of 7-10 (severe) for pain on ESAS; or a score of 3 (severe or excruciating) for pain intensity from the interRAI
2. Moderate-severe depression: Defined as a score of 4-10 (moderate to severe) for depression on ESAS; or a score of 3 or more on the Depression Rating Scale from the interRAI
3. Poor well-being: Defined as a score of 7-10 (poor) for well-being on ESAS; or Yes for "client feels he/she has poor health when asked" under health status indicators from the interRAI.
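The three outcome definitions can be expressed as simple coding rules. The sketch below is illustrative only: the function and argument names are hypothetical, and the order of the hierarchy (ESAS consulted first, interRAI used only when no ESAS score exists) is an assumption, since the text does not spell out which source takes precedence.

```python
def severe_pain(esas_pain=None, interrai_pain=None):
    """Severe pain: ESAS pain score of 7-10, or interRAI pain intensity of 3."""
    if esas_pain is not None:        # assumed first source in the hierarchy
        return 7 <= esas_pain <= 10
    if interrai_pain is not None:    # fallback source
        return interrai_pain == 3
    return None                      # symptom not assessed in either source

def moderate_severe_depression(esas_depression=None, interrai_drs=None):
    """Depression: ESAS score of 4-10, or Depression Rating Scale of 3 or more."""
    if esas_depression is not None:
        return 4 <= esas_depression <= 10
    if interrai_drs is not None:
        return interrai_drs >= 3
    return None

def poor_wellbeing(esas_wellbeing=None, interrai_poor_health=None):
    """Poor well-being: ESAS score of 7-10, or 'Yes' on the interRAI item."""
    if esas_wellbeing is not None:
        return 7 <= esas_wellbeing <= 10
    if interrai_poor_health is not None:
        return bool(interrai_poor_health)
    return None

# One hypothetical patient: the three binary values for the output nodes
outcomes = (
    severe_pain(esas_pain=8),
    moderate_severe_depression(interrai_drs=2),
    poor_wellbeing(esas_wellbeing=3, interrai_poor_health=True),
)
```

Returning `None` when neither source has a score keeps "not assessed" distinct from "symptom absent"; how such patients were actually handled is not described here.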

| Covariates (ANN input nodes)
The following 39 unique covariates were measured on each patient (within the first 3 months after diagnosis): It should be noted that the exposure measurement window (within 3 months after diagnosis) was distinct from the outcome measurement window (3-9 months after diagnosis).

| Descriptive analyses
Prior to initiating any modeling, we randomly divided our population into two mutually exclusive cohorts: 75% of patients comprised the training cohort and the remaining 25% of patients comprised the test cohort. The distributions of characteristics for both training and test cohorts were explored; continuous measures were described with medians and interquartile ranges, and categorical measures were described using frequencies and percentages.

| Artificial Neural Network model for simultaneously predicting pain, depression, and well-being
We developed both 3-layer and 4-layer perceptron models. The 3-layer network consisted of an input layer, 1 hidden layer, and an output layer; the 4-layer network consisted of an input layer, 2 hidden layers, and an output layer. All 39 covariates described above were normalized and encoded as required and represented as nodes in the input layer. 14 The output layer consisted of three nodes representing each of our binary symptom outcomes (severe pain yes/no, moderate to severe depression yes/no, poor well-being yes/no). We began with a 3-layer model with two nodes in the hidden layer and then added one node at a time until we reached 10 nodes in that layer. This process was repeated with a 4-layer model that included two nodes in its second hidden layer. The weights of the neural network were estimated using the training cohort. This was done using backpropagation with a weight backtracking algorithm, where a cross entropy error function was minimized. 15 The AUC (area under the ROC curve) value was then calculated for the training cohort to understand the degree of discrimination under each ANN model. We found that values for sensitivity, specificity, and AUC in the training cohort were optimal when we had three nodes in the first hidden layer and no second hidden layer. As a result, the final ANN model included an input layer, three nodes in its single hidden layer, and three nodes in the output layer (Figure 1). This is in line with prior recommendations stating that one layer of hidden neurons is generally sufficient for classifying noncomplex data. 16
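The set-up described above can be sketched in a few helper functions: a random 75/25 split, the grid of candidate architectures, and a cross entropy error summed over the three output nodes. This is an illustration under stated assumptions, not the authors' implementation: the seed, the function names, and the presence of one bias term per node are all hypothetical, and the weight-fitting step itself (backpropagation with weight backtracking) is not reproduced.

```python
import math
import random

def split_cohort(patient_ids, train_fraction=0.75, seed=2020):
    """Randomly split patients into mutually exclusive training/test cohorts."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)  # seed is arbitrary, for reproducibility
    cut = round(train_fraction * len(ids))
    return ids[:cut], ids[cut:]

def candidate_architectures(n_inputs=39, n_outputs=3):
    """3-layer nets with 2..10 hidden nodes, plus 4-layer nets that add a
    second hidden layer of 2 nodes (one reading of the search described)."""
    candidates = []
    for h1 in range(2, 11):
        candidates.append((n_inputs, h1, n_outputs))
        candidates.append((n_inputs, h1, 2, n_outputs))
    return candidates

def n_weights(layers):
    """Weights to estimate, assuming one bias term per non-input node."""
    return sum((a + 1) * b for a, b in zip(layers, layers[1:]))

def cross_entropy(y_true, y_pred, eps=1e-12):
    """Binary cross entropy summed over the output nodes of one patient."""
    total = 0.0
    for y, p in zip(y_true, y_pred):
        p = min(max(p, eps), 1 - eps)  # guard against log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total

train_ids, test_ids = split_cohort(range(1000))
final_model_weights = n_weights((39, 3, 3))  # (39 + 1) * 3 + (3 + 1) * 3
```

Under the bias assumption, the final 39-3-3 network has 132 weights, which is small relative to a training cohort of tens of thousands of patients.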

| Assessing predictive ability
The ANN model was used to predict the 6-month risk of severe pain, moderate-severe depression, and poor well-being for each patient in our test cohort. The single set of estimated weights from the ANN model was able to simultaneously predict each of the three symptoms. For each symptom, the predicted number of outcomes was compared to the actual number of outcomes in the test cohort by composing a confusion matrix. In the test cohort under the ANN model, we calculated sensitivity (true positive fraction), specificity (true negative fraction), accuracy (true positive or negative fraction), and discrimination (measured using the AUC value). Additionally, calibration plots for each symptom were constructed under the ANN model using the test cohort. This was done by grouping patients into deciles (10 groups) based on their predicted risk, and then plotting the observed symptom risk within a decile against the corresponding mean predicted risk within that decile. 17,18 Points closer to the 45 degree line indicate better calibration. Individuals predicted to be in the highest decile of risk for all three symptoms were identified from the calibration plots. The distribution of baseline characteristics for these individuals was examined to gain a better understanding of highest risk profiles.
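The performance measures above can be computed for one symptom at a time from the observed outcomes and the predicted risks. The sketch below is a plain illustration with hypothetical function names; the 0.5 classification threshold is an assumption (the threshold used to build the confusion matrix is not stated), and the AUC is computed by the pairwise (Mann-Whitney) definition rather than by tracing the ROC curve.

```python
def confusion_counts(y_true, y_risk, threshold=0.5):
    """Cross-tabulate predicted vs observed outcomes (threshold is assumed)."""
    tp = fp = tn = fn = 0
    for y, r in zip(y_true, y_risk):
        predicted = r >= threshold
        if predicted and y:
            tp += 1
        elif predicted and not y:
            fp += 1
        elif y:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def sensitivity_specificity_accuracy(tp, fp, tn, fn):
    """True positive fraction, true negative fraction, overall accuracy."""
    return tp / (tp + fn), tn / (tn + fp), (tp + tn) / (tp + fp + tn + fn)

def auc(y_true, y_risk):
    """Chance that a random case has a higher risk than a random non-case
    (ties count one half); quadratic in cohort size, fine for a sketch."""
    pos = [r for y, r in zip(y_true, y_risk) if y]
    neg = [r for y, r in zip(y_true, y_risk) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def calibration_points(y_true, y_risk, n_groups=10):
    """(mean predicted risk, observed risk) per group of ranked patients."""
    pairs = sorted(zip(y_risk, y_true))
    n = len(pairs)
    points = []
    for g in range(n_groups):
        group = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if group:
            mean_predicted = sum(r for r, _ in group) / len(group)
            observed = sum(y for _, y in group) / len(group)
            points.append((mean_predicted, observed))
    return points

# Tiny hypothetical example: four patients, one symptom
y_true = [0, 0, 1, 1]
y_risk = [0.10, 0.40, 0.35, 0.80]
sens, spec, acc = sensitivity_specificity_accuracy(*confusion_counts(y_true, y_risk))
discrimination = auc(y_true, y_risk)
```

Plotting the pairs returned by `calibration_points` against the 45 degree line gives the calibration plot described above; with the default `n_groups=10` each point corresponds to one decile of predicted risk.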

| 3-Dimensional visualization
After assessing our ANN model's ability to simultaneously predict all three outcomes of symptom severity, the marginal 6-month predicted risks for each symptom for every patient were illustrated with a 3-dimensional scatter plot. The x-, y-, and z-axes represent the risk of experiencing severe pain, moderate-severe depression, and poor well-being, respectively. As risk is a probability, the range of each axis goes from 0.0 to 1.0. Each point on the plot represents a patient from the test cohort, and the x-, y-, and z-coordinates of the point represent the corresponding risk estimates for each symptom. All analyses and graphs were completed using statistical software R version 3.6.1. 19

| RESULTS
The study population consisted of 46,104 unique patients, of which 35,606 patients comprised the training cohort and the remaining 10,498 patients comprised the test cohort. Due to the random selection process, the distributions of baseline characteristics were well-balanced between the training and test cohorts (Table 1, Table A). The most common diagnoses were breast, lung, and colorectal cancers. More than half of the training cohort suffered from hypertension at the time of diagnosis; 41.8% had osteoarthritis, and diabetes was present

FIGURE 1 Visualization of a 10-3-3 neural network: 1 input layer consisting of 10 nodes (x1-x10), 1 hidden layer consisting of 3 nodes (h1-h3), and 1 output layer consisting of 3 nodes (o1-o3). The grey lines represent the connections/weights that need to be estimated. Note the network described herein was 39-3-3.

As mentioned above, several ANN structures with 1 and 2 hidden layers were developed and assessed for simultaneously predicting the 6-month risk of three symptoms: severe pain, moderate-severe depression, and poor well-being. The final ANN model included an input layer, three nodes in its single hidden layer, and three nodes in the output layer (Figure 1). Although this ANN framework jointly models all three symptoms, the marginal risk probabilities of each symptom can still be extracted. Table 2 provides the prediction performance of the ANN model (which was derived from the training cohort) on the test cohort. The marginal estimates of sensitivity, specificity, accuracy, and discrimination are given for each symptom. The areas under the curve for the risk of experiencing severe pain, moderate-severe depression, and poor well-being were 71%, 73%, and 70%, respectively. The mean marginal predicted risk and the marginal observed risk for each symptom were computed on the test cohort to illustrate calibration (Figure 2). 
Overall, for each symptom, the dots (representing each decile of predicted risk) lie close to the 45 degree line. A greater discrepancy can be seen among patients in the lowest deciles of predicted risk (dots to the left of each plot). The ANN model appears to overestimate the risk, as the mean predicted risk is larger than the observed risk; this finding is consistent for all three symptoms. The ANN model performs particularly well for patients in the highest deciles of predicted risk (dots to the right of the plot). Table 3 provides the distribution of characteristics among all unique patients whose predicted risks lie in the highest deciles for all three symptoms; that is, individuals with the highest risk of jointly experiencing severe pain, moderate to severe depression, and poor well-being. The majority of these patients had stage 4 disease, and lung and gastrointestinal cancers were the most common diagnosis types. Nearly 60% of patients with the highest risk of symptom burden suffered from osteoarthritis, and 54% were living with hypertension. Mood disorder and diabetes were also common among this highest risk group. Figure 3 provides an illustration of the marginal predicted risks for each symptom for every patient using a 3-dimensional scatter plot. The plot consists of 10,498 points, one for each patient in the test cohort. There is a clear relationship between pain, depression, and well-being. As the risk of severe pain and moderate-severe depression increases, so does the risk of experiencing poor well-being. On average, the 6-month risk of experiencing severe pain was 5.4%, moderate-severe depression was 8.2%, and lack of well-being was 7.2%. Patients with over a 40% risk of severe pain also had over a 70% risk of depression, and over a 55% risk of poor well-being.
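Identifying the patients whose predicted risks fall in the highest decile for all three symptoms amounts to intersecting three per-symptom selections. The following is a minimal sketch of that step, assuming the predictions are held as a mapping from a patient identifier to a (pain, depression, well-being) tuple; the data layout, the function names, and the exact handling of ties at the decile boundary are assumptions.

```python
def top_decile_cutoff(risks):
    """Smallest predicted risk that still falls in the highest decile."""
    ordered = sorted(risks)
    return ordered[len(ordered) * 9 // 10]  # integer arithmetic avoids rounding

def highest_risk_for_all(predicted):
    """IDs of patients in the top decile of risk for all three symptoms.

    predicted maps patient id -> (pain, depression, well-being) risks.
    """
    cutoffs = [
        top_decile_cutoff([risks[k] for risks in predicted.values()])
        for k in range(3)
    ]
    return [
        pid for pid, risks in predicted.items()
        if all(risks[k] >= cutoffs[k] for k in range(3))
    ]

# Hypothetical cohort of 20 patients with identical risks across symptoms
predicted = {i: (i / 20, i / 20, i / 20) for i in range(20)}
flagged = highest_risk_for_all(predicted)
```

The baseline characteristics of the flagged patients can then be tabulated in the same way as for the full cohort.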

| DISCUSSION
The ability to predict the future occurrence of multiple symptoms can be a powerful tool for the cancer care team. Such prediction tools can assist providers in risk-profiling patients, identifying those at higher risk of symptom burden, and improving the timing of pre-emptive and personalized symptom management interventions. 20 The ANN model developed in this paper was able to predict the risk of experiencing three co-occurring symptom outcomes (pain, depression, and lack of well-being) among patients diagnosed with cancer. The model also identified the characteristics of patients at highest risk of simultaneously experiencing these three symptoms. Profiles including lung cancer, late stage cancer, and existing chronic conditions such as osteoarthritis, mood disorder, hypertension, diabetes, and coronary disease can be used to flag patients who may benefit from early symptom management.
ANNs play an important role in risk prediction and can be particularly appealing when aiming to jointly predict multiple outcomes. ANNs are predominantly a distribution-free, data-driven approach. These models do not require a priori knowledge on the extent of correlation arising from outcomes within the same individual. In our case, the correlations in the severity of pain, depression, and well-being were intrinsically captured through the connections in the hidden layer of the network. ANNs also do not require a priori knowledge on the relationships between the predictors and the outcomes, nor do interactions between the predictors need to be prespecified. 17 Although ANNs are referred to as black-box models, as estimates of the weights in an ANN cannot be easily interpreted, these models are able to provide individual-level risk prediction based on a patient's covariate profile and can be employed as decision support tools once they are integrated into clinical practice. 9
There has been extensive research conducted on determining factors associated with symptom burden among patients with cancer, but very limited work has been done on predicting symptom burden. 5,6,[21][22][23][24] A recent study developed a tool using logistic regression for predicting the risk of symptom burden among cancer patients. 7 Although that model's level of discrimination was similar to that of our ANN, symptom correlation was ignored and each symptom was predicted independently using separate regressions. Prior work has used machine learning techniques such as support vector regression and nonlinear canonical correlation analysis to predict the severity of multiple co-occurring symptoms during a cycle of chemotherapy; however, these models were developed and tested with a relatively small sample of cancer patients. 20
This paper has numerous strengths. To our knowledge, it is the first study to simultaneously predict severity of multiple symptoms under an ANN framework using a population-based cohort of patients with cancer. With over 46,000 individuals, we were able to build and validate our ANN risk prediction model on large training and test sets, respectively. The model was developed using an extensive list of covariates, including demographic, clinical, and treatment characteristics, information on baseline patient-reported measures of functional status and symptom burden, and measures on various types of healthcare utilization. Due to the limited exclusion criteria when creating the province-wide cohort, our ANN framework, network weight estimates, and findings can likely be applied to other populations of cancer patients receiving universal healthcare.

FIGURE 2 Calibration plot for each symptom under the ANN risk prediction model (on the test cohort)
This study also has several limitations. Information on medications for dealing with symptoms such as pain or depression may improve prediction model performance; however, these data were not available. Symptom screening with ESAS was initiated in Ontario, Canada in 2007, which is the first year of accrual in our cohort. Since uptake of ESAS was gradual across cancer centers, the recent years of data are more representative of the current population of patients participating in symptom screening. We also did not directly compare the risk prediction performance of our ANN model against other commonly used approaches such as logistic regression, or against more complex approaches that account for correlation of multiple outcomes such as joint mixed models. Both techniques require making distributional assumptions, and interactions between predictors and correlation structures for multiple outcomes often need to be explicitly specified. These comparisons remain an important part of our future work with these data.
This study demonstrates the use of ANN models to simultaneously predict the risk of experiencing multiple co-occurring symptoms among cancer patients. With the growing availability of vast amounts of data on large population-based cohorts, researchers should consider machine learning techniques, particularly when interest lies in predicting several, possibly correlated, outcomes.

| ETHICAL STANDARDS
This study involved secondary data analyses only and was thus exempt from requiring REB approval because ICES is a designated "45.1 entity" under the Personal Health Information Protection Act (PHIPA) enabling the use of personal health information.

ACKNOWLEDGMENTS
This study was supported by ICES, which is funded by an annual grant from the Ontario Ministry of Health and Long-Term Care (MOHLTC). Parts of this material are based on data and information compiled and provided by: MOHLTC, Cancer Care Ontario (CCO), and the Canadian Institute for Health Information (CIHI). The analyses, conclusions,

TABLE 3 Distribution of (selected) baseline characteristics for individuals predicted to be in the highest decile of risk for all three symptoms (from the test cohort)
Columns: Variable; Value (highest risk decile for all three symptoms)
Note: Mean and standard deviation (SD) provided for continuous covariates; frequencies and percentages provided for binary or categorical covariates; (*) cell frequency suppressed due to small size