Prediction of the output factor using machine and deep learning approach in uniform scanning proton therapy

Abstract Purpose The purpose of this work is to develop machine and deep learning-based models to predict the output factor and MU based on measured patient quality assurance (QA) data in uniform scanning proton therapy (USPT). Methods This study involves 4,231 patient QA measurements conducted over the last 6 years. In the current approach, output and MU are predicted by an empirical model (EM) based on patient treatment plan parameters. In this study, two MATLAB-based machine and deep learning algorithms, Gaussian process regression (GPR) and shallow neural network (SNN), were developed. The four parameters from patient QA (range, modulation, field size, and measured output factor) were used to train these algorithms. The data were randomized into a training set containing 90% and a testing set containing the remaining 10% of the data. The model performance during training was assessed using root mean square error (RMSE) and R-squared values. The trained model was used to predict output based on three input parameters: range, modulation, and field size. The percent difference was calculated between the predicted and measured output factors. The number of data sets required for the prediction accuracy of the GPR and SNN models to become invariant was also evaluated. Results The prediction accuracy of the machine and deep learning algorithms is higher than that of the EM. The output predictions with [GPR, SNN, and EM] within ±2% and ±3% difference were [97.16%, 97.64%, and 92.95%] and [99.76%, 99.29%, and 97.18%], respectively. The GPR model outperformed the SNN when trained with a smaller number of data sets. Conclusion The GPR and SNN models outperformed the EM in terms of prediction accuracy. Machine and deep learning algorithms predicted the output factor and MU for USPT with higher predictive accuracy than the EM. In our clinic, these models have been adopted as a secondary check of MU or output factors.

Machine and deep learning techniques combine the power of computer science and statistics, and have the potential to revolutionize cancer treatment technologies in radiation oncology. 1 Proton beam therapy, with desirable properties such as the sharp dose falloff beyond the Bragg peak, has been shown to reduce dose to critical organs while achieving target dose comparable to conventional radiation therapy. 2,3 Utilizing machine and deep learning techniques in proton therapy will lead to the development of new tools that can benefit patient care. 4,5 In the uniform scanning proton therapy (USPT) delivery technique, the beam is delivered uniformly layer by layer by varying the beam energy, with scanning magnets moving the beam in two dimensions along the transverse plane. In radiation therapy, monitor units (MU) and the output factor (OF), the ratio of the measured dose for a given field to that of a reference field, are required for the accurate and reliable delivery of the treatment dose to the patient. Due to the complexity of USPT beam delivery, no commercial treatment planning system is available to calculate MU.
The MU and OF are calculated in both USPT and passively double-scattered proton therapy (DSPT) from measurements, empirical models (EMs), or Monte Carlo simulations of each patient-specific treatment field. Several publications report OF calculations for the DSPT and USPT techniques.
For DSPT, Kooy et al. 6,7 reported the analytical method for the calculation of OF by combining the parameters of range, modulation, and source shift change. The OF calculation procedure for DSPT at M.D. Anderson Cancer Center was reported by Sahoo et al. 8 Ferguson et al. 9 compared the three output prediction models named, Sahoo

2.A | Patient and proton system
A total of 4,231 patient-specific field QA measurements conducted over a span of 6 years (2013-2018) were used in this study. All measurements were performed using an IBA proton therapy system (Louvain-la-Neuve, Belgium). The description of the beam line has been detailed elsewhere. 11,14 The proton system can deliver beam modulation widths (proximal 95% to distal 95% isodose point) from 2.0 to 13.0 g/cm² and 13.0 to 25.0 g/cm² in increments of 0.5 and 1.0 g/cm², respectively, and beam ranges (depth of the distal 90% isodose point) from 4.0 to 31.5 g/cm² in increments of 0.1 g/cm².
The patient QA data include proton ranges from 4.0 to 31.5 g/cm², modulation widths from 2.0 to 25.0 g/cm², and field sizes from 2 × 2 to 26 × 26 cm². The field size was estimated as the square root of the product of the x-axis and y-axis extents at the opening of the aperture. The patient QA measurements were performed using four snout sizes: 10 cm diameter, 18 cm diameter, 25 cm diameter, and 30 × 40 cm².
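The equivalent-square estimate described above can be sketched as follows (a minimal Python illustration of the arithmetic only; the function name is ours, not from the clinical code):

```python
import math

def effective_field_size(x_cm, y_cm):
    # Equivalent-square field size: the square root of the product of the
    # aperture-opening extents along the x and y axes, in cm.
    return math.sqrt(x_cm * y_cm)
```

For example, a 4 × 9 cm² aperture opening maps to an effective field size of 6 cm.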

2.B | Output factor measurements and empirical model
The empirical model (EM) used for the calculation of the patient-specific output factor and MU is described by Zheng et al. 11 The EM was developed for various treatment conditions using combinations of ranges and modulations with a 10 cm aperture.
The measured OF was then modified with several empirically determined multiplicative factors, such as field size, inverse square, and scanning field size corrections. The OF and MU from the EM for each patient-specific field were verified by measurement in a water tank (32 × 21 × 21 cm³, with a 7 cm air gap): the charge (C_pt) was collected for each patient-specific field delivered with the same range, modulation, and MU_pt as prescribed in the treatment plan and EM, using field-specific apertures but excluding the compensator. The measured output factor is then calculated as:

OF_meas = D_pt / MU_pt,

where D_pt is the dose determined from the collected charge C_pt and MU_pt is the delivered monitor units.

2.C | Machine and deep learning models
Since our problem involved nonlinear fitting of multidimensional data, two MATLAB (The MathWorks, Natick, MA; version 2018b)-based machine and deep learning algorithms, Gaussian process regression (GPR) and shallow neural network (SNN), were developed using a supervised learning method. The supervised learning model approach for GPR is shown in Fig. 1. In this study, we distinguish between "machine learning (GPR)" and "deep learning (SNN)" following MATLAB's terminology, even though deep learning can be viewed as an extension or a subset of machine learning techniques. The four parameters from patient QA (range, modulation, field size, and measured output factor) were used to train these algorithms. The data were randomized, and the models were developed with a training set containing 90% of the data and a testing set containing the remaining 10%.
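The randomized 90/10 split can be sketched as follows (a Python illustration of the bookkeeping only; the study itself used MATLAB, and the seed and function name are ours):

```python
import random

def split_train_test(records, train_frac=0.9, seed=0):
    # Shuffle the QA records reproducibly, then split into
    # training and testing sets.
    rng = random.Random(seed)
    shuffled = records[:]          # copy so the original order is preserved
    rng.shuffle(shuffled)
    n_train = int(round(train_frac * len(shuffled)))
    return shuffled[:n_train], shuffled[n_train:]
```

Each record would hold one field's range, modulation, field size, and measured output factor.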

2.C.1 | Gaussian process regression
The GPR model, a form of Bayesian nonlinear regression, predicts a response variable from input variables using an approach that offers certain advantages. 15,16 Given a set of training data, classical algorithms fit a single model to the data by optimizing several parameters; the model is then used to predict responses for future inputs. Such an approach, characterized by a fixed set of parameters, is known as parametric. In contrast, GPR is nonparametric, meaning that it is not bound to any single function but instead computes a probability distribution over every possible function that may fit the input data. Naturally, this also yields the added benefit of uncertainty information about the algorithm's predictions.
The Gaussian process is defined by two functions: the mean function, which gives the expected value of the output, and the covariance function, which defines how the output changes as the input changes (smoothness), given that the variables included in the model come from a joint, multivariate Gaussian distribution. The mean function in this model was set to a constant value. The covariance function is defined by a kernel, selected to be exponential in this model, and is parameterized by a set of kernel parameters commonly known as hyperparameters. The nonisotropic exponential kernel used in the model assigns each predictor variable its own correlation length scale. The exponential kernel is defined as:

k(x_i, x_j) = σ_f² exp(−r/σ_l), with r = √((x_i − x_j)ᵀ(x_i − x_j)),

where x_i and x_j are points in the input space, σ_f is the signal standard deviation, σ_l is the characteristic length scale, and r is the Euclidean distance between x_i and x_j.
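For concreteness, the kernel above can be evaluated as in this short Python sketch (the study used MATLAB; the function name is ours):

```python
import math

def exponential_kernel(x_i, x_j, sigma_f, sigma_l):
    # k(x_i, x_j) = sigma_f^2 * exp(-r / sigma_l), where r is the
    # Euclidean distance between the two input points.
    r = math.sqrt(sum((a - b) ** 2 for a, b in zip(x_i, x_j)))
    return sigma_f ** 2 * math.exp(-r / sigma_l)
```

Identical inputs give the maximum covariance σ_f², and the covariance decays exponentially with the distance between inputs.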
The GPR model implemented in MATLAB can be trained via the fitrgp function. 17 Observation values are projected into a feature space using basis functions. Given the basis function, the covariance function, and the initial parameter values, the fitrgp function estimates the basis function coefficients, the noise variance, and the hyperparameters (signal variance and characteristic length scale) of the covariance kernel.
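fitrgp itself is a MATLAB routine; the prediction step it enables can be sketched in pure Python for a tiny one-dimensional example (zero prior mean and fixed hyperparameters rather than fitted ones; all names are ours):

```python
import math

def expo_kernel(xi, xj, sf=1.0, sl=1.0):
    # Exponential kernel for scalar inputs: sf^2 * exp(-|xi - xj| / sl).
    return sf ** 2 * math.exp(-abs(xi - xj) / sl)

def solve(A, b):
    # Solve A a = b by Gaussian elimination with partial pivoting.
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            M[r] = [mr - f * mc for mr, mc in zip(M[r], M[c])]
    a = [0.0] * n
    for r in range(n - 1, -1, -1):
        a[r] = (M[r][n] - sum(M[r][c] * a[c] for c in range(r + 1, n))) / M[r][r]
    return a

def gpr_predict(x_train, y_train, x_star, noise=1e-6):
    # Posterior mean at x_star: k_*^T (K + noise*I)^-1 y, zero prior mean.
    K = [[expo_kernel(a, b) + (noise if i == j else 0.0)
          for j, b in enumerate(x_train)] for i, a in enumerate(x_train)]
    alpha = solve(K, y_train)
    return sum(expo_kernel(x_star, xi) * ai for xi, ai in zip(x_train, alpha))
```

With a small noise term, the posterior mean essentially interpolates the training points, which is the behavior exploited when predicting output factors near previously measured fields.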

2.C.2 | Shallow neural network
The shallow neural network, as the name suggests, uses a small number of hidden layers, typically a single hidden layer followed by an output layer, as shown in Fig. 2. The number of neurons in the hidden layer may vary. Neural networks, computing models that mimic the neural structure of the brain, can be trained for pattern recognition, classification, and regression problems that describe the relationship between an output and one or more input variables.
A two-layer feed-forward SNN was constructed for this fitting problem. 18 The hidden layer used 100 neurons with the tan-sigmoid transfer function, chosen due to the inherent nonlinearity of our data. The sum of the weighted inputs and the bias is used as the input to the tan-sigmoid transfer function. The output layer uses a linear transfer function.
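A single forward pass through such a network (tan-sigmoid hidden layer, linear output) can be sketched in Python as follows (weights and names are illustrative, not the trained 100-neuron network):

```python
import math

def snn_forward(x, W_hidden, b_hidden, w_out, b_out):
    # Two-layer feed-forward pass: each hidden neuron applies tanh
    # (MATLAB's tan-sigmoid) to its weighted inputs plus bias, and the
    # output layer is a linear combination of the hidden activations.
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W_hidden, b_hidden)]
    return sum(w * h for w, h in zip(w_out, hidden)) + b_out
```

Here x would carry the three inputs (range, modulation, field size) and the scalar return value is the predicted output factor.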
For optimal neural network training, the 90% training data set was further divided into three subsets: training (80%), validation (15%), and testing (5%). The training subset was used to compute the gradient and update the network's weights and biases. During the training process, the validation set error was monitored; to avoid overfitting, which is characterized by a rise in the validation set error, the weights and biases were saved at the minimum of the validation error. To optimize the performance function for training the multilayer feed-forward network, a Levenberg-Marquardt backpropagation algorithm 19,20 was used.

FIG. 1. Supervised learning algorithm approach for Gaussian process regression.
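The save-at-minimum-validation-error logic can be sketched as follows (a Python illustration with hypothetical train_step and val_error callables; the study used MATLAB's Levenberg-Marquardt training):

```python
def train_with_early_stopping(steps, train_step, val_error):
    # Keep the weights seen at the minimum validation error.
    # train_step(i) returns the weights after update i (hypothetical);
    # val_error(w) scores weights w on the held-out validation subset.
    best_w, best_err = None, float("inf")
    for i in range(steps):
        w = train_step(i)
        err = val_error(w)
        if err < best_err:
            best_w, best_err = w, err
    return best_w, best_err
```

Once the validation error starts rising, later updates no longer replace the stored weights, which is the overfitting guard described above.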

2.D | Dependency of the size of the data set
In order to determine how many data points were needed to train the GPR and SNN models to the point where their output factor prediction accuracy became invariant, the output prediction by the models was tested with 1%, 5%, 10%, 20%, 40%, 80%, and 100% of the training data set.
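The subsetting itself can be sketched as follows (a Python illustration; the seeding and names are ours):

```python
import random

def subsample_fractions(train_set, fractions, seed=0):
    # Yield a random subset of the training data at each tested fraction,
    # e.g. fractions = [0.01, 0.05, 0.1, 0.2, 0.4, 0.8, 1.0].
    rng = random.Random(seed)
    for frac in fractions:
        n = max(1, int(round(frac * len(train_set))))
        yield frac, rng.sample(train_set, n)
```

Each yielded subset would then be used to retrain the model before scoring it on the fixed testing set.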

2.E | Algorithm evaluation
The algorithm performance during training was assessed using root mean square error (RMSE) and R-squared values. The trained model was used to make output predictions based on three input parameters: range, modulation, and field size. The testing data set is shown in Fig. 3. The percent difference was calculated between the predicted and measured output factors from the testing data set. A two-tailed t-test was performed using a significance level of 0.05; the difference between the EM and each model (GPR or SNN) was statistically significant (P < 0.001). Figure 4 shows the histogram and normal distribution fit of the percent differences between modeled and measured output factors for all patient QA data for the EM, GPR, and SNN.
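The evaluation metrics named above (RMSE, R-squared, and per-field percent difference) can be computed as in this Python sketch (the function name is ours):

```python
import math

def evaluate(predicted, measured):
    # RMSE, R-squared, and per-field percent differences between
    # predicted and measured output factors.
    n = len(measured)
    ss_res = sum((p - m) ** 2 for p, m in zip(predicted, measured))
    rmse = math.sqrt(ss_res / n)
    mean_m = sum(measured) / n
    ss_tot = sum((m - mean_m) ** 2 for m in measured)
    r2 = 1.0 - ss_res / ss_tot
    pct_diff = [100.0 * (p - m) / m for p, m in zip(predicted, measured)]
    return rmse, r2, pct_diff
```

The fraction of pct_diff values within ±2% or ±3% corresponds to the accuracy figures reported in the Results.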

3.B | Dependency of the size of the data set
The GPR model outperformed the SNN model for the sets containing a smaller number of training data points. As the training set varied from 1% to 100% of all data points, the percentage of OF predictions within ±2% of the measurements varied from 77% to 97% of fields for GPR and from 11% to 97% for SNN. Both models saturated in output factor prediction accuracy at 40% of the training data set, as shown in

4 | DISCUSSION
In radiation therapy, the success of the treatment is primarily dependent upon accurate dose calculation. In USPT, the conversion of dose to MUs through the OF is presently accomplished by empirical models. Based on the patient treatment plan QA data, two machine and deep learning models (GPR and SNN) were employed for OF prediction. Our results indicate that both models outperformed the EM in terms of mean error, mean absolute error, maximum absolute error, and the fraction of fields with modeled OF within ±2% and ±3% of measurement.
These encouraging results led to the development of a MATLAB compiled executable application for use in our clinic as a secondary check of MUs.
As implemented clinically, the current EM quantifies each of three features (range, modulation, and field size) by its relative contribution to the final OF. In this interpolative approach, each feature must be measured while holding the others constant; as such, incorporating more treatment field data is cumbersome, requiring multiple measurement setups to quantify each feature individually. The machine learning approach, by contrast, allows new data to be incorporated quickly, without the need for interpolation across multiple features.
Both the GPR and SNN models predicted the OF within a 2% difference of measurement for over 97% of fields.
The Cubist machine learning model developed by Sun et al. 4 for their DSPT system reported 97.5% prediction accuracy for a difference within 2% between predicted and measured OFs. The field size factor is interdependent with a number of factors, including range, modulation, snout position, and calibration depth.
Zheng et al. 21 reported that by varying the field size from 10 to

TABLE 1. Percent difference, mean and maximum error, and standard deviation (SD) between measured and predicted output factors.

for each beam, while the OF of a PBS system depends on many input parameters (spot position, spot energy, spot spacing, energy layer spacing, spots per MU, field optimization technique (single-field or multiple-field), and so on). Generalization over these input parameters is the key to building an acceptable OF prediction model for PBS.

5 | CONCLUSION
The accurate conversion of the planned dose to machine-readable monitor units is one of the most important parts of treatment planning in uniform scanning proton therapy. We developed two output prediction methods based on machine and deep learning algorithms, GPR and SNN. Both algorithms outperformed the empirical model. In our clinic, these models have been used as a secondary check of the MU or output factor.

CONFLICT OF INTEREST
There are no conflicts of interest.