Predictive quality assurance of a linear accelerator based on the machine performance check application using statistical process control and ARIMA forecast modeling

Abstract
Purpose A predictive linac quality assurance system based on the output of the Machine Performance Check (MPC) application was developed using statistical process control and autoregressive integrated moving average forecast modeling. The aim of this study is to demonstrate the feasibility of predictive quality assurance based on MPC tests that allow proactive preventative maintenance procedures to be carried out to better ensure optimal linac performance and minimize downtime.
Method and Materials Daily MPC data were acquired for a total of 490 measurements. The initial 85% of data were used in prediction model learning with the autoregressive integrated moving average technique and in calculating upper and lower control limits for statistical process control analysis. The remaining 15% of data were used in testing the accuracy of the predictions of the proposed system. Two types of prediction were studied, namely, one‐step‐ahead values for predicting the next day's quality assurance results and six‐step‐ahead values for predicting up to a week ahead. Results that fall within the upper and lower control limits indicate a normal stage of machine performance, while the tolerance, determined from AAPM TG‐142, is the clinically required performance. The gap between the control limits and the clinical tolerances (as the warning stage) provides a window of opportunity for rectifying linac performance issues before they become clinically significant. The accuracy of the predictive model was tested using the root‐mean‐square error, absolute error, and average accuracy rate for all MPC test parameters.
Results The accuracy of the predictive model is considered high (average root‐mean‐square error and absolute error for all parameters of less than 0.05). The average accuracy rate for indicating the normal/warning stages was higher than 85.00%.
Conclusion Predictive quality assurance with the MPC will allow preventative maintenance, which could lead to improved linac performance and a reduction in unscheduled linac downtime.

in results is also embedded in the MPC module and data can be exported in .csv format. The MPC application has now been evaluated by multiple authors as a daily linac QA tool. [4][5][6][7][8] Statistical process control (SPC) is a statistical method for detecting defects in a process and was first presented by Shewhart. 9 SPC has become a standard method of quality control. In SPC, continual observations are used to construct a control chart, which includes an upper control limit and a lower control limit that together define a quality level. The control chart is often used to monitor a process and to detect failure states at the point of measurement.
Binny et al. applied SPC to analyze the variation in helical and static QA output over periods of up to 4 years in an effort to improve helical tomotherapy QA. 10 Meanwhile, López-Tarjuelo et al. adopted SPC in the daily quality control of linac electron beams. 11 Fuangrod et al. applied SPC in constructing a clinically significant threshold of a real-time treatment verification system. 12 Recently, SPC has been used with MPC data in another study by Binny et al. 13 In that study, SPC analysis was conducted for MPC data across six TrueBeam linacs for 12 months in an attempt to determine MPC tolerances.
However, the study did not attempt to forecast MPC results to be used for predictive QA.
The concept of predictive QA that allows preemptive maintenance based on SPC analysis has been studied for radiotherapy linacs. [14][15][16][17] Predictive QA testing would allow radiotherapy departments to be proactive in their maintenance by remedying faults before clinical tolerances are breached. In theory, this should provide for improved linac performance consistency and reduced unscheduled linac downtime that can be disruptive to departmental workflow. Previous predictive QA studies have either been based on readouts from the linac itself 15-17 and hence lack independence or have been based on film measurements, 14 which are impractical on a daily basis.
The present study presents a framework of predictive linac QA based on MPC data. Control limits were determined from the SPC analysis of MPC data over the long term and used to define the standard linac performance. When a result falls outside these control limits but remains within clinical tolerances, there is a window of opportunity to rectify the problem before it becomes clinically significant. Such remedial action can be scheduled outside standard treatment hours to avert disruption to the clinical workflow. The forecasting of QA results provides a measure of how long this window of opportunity is. Figure 1 presents the proposed predictive QA system, which can be divided into the four steps of prediction, display, evaluation, and relearning. Daily MPC test data are prepared and loaded into the system. The system predicts both one-step-ahead and six-step-ahead MPC test results using a predictive model, which has been learnt from historical MPC data. In the display step, the predictive MPC data for each parameter are displayed on the constructed control chart which shows the upper control limit (UCL), lower control limit

2.C | Control chart construction
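The control chart 9 consists of a center line (CL), a UCL, and an LCL, which are constructed according to

$$\mathrm{UCL} = \bar{x} + E\,\overline{MR}, \qquad \mathrm{CL} = \bar{x}, \qquad \mathrm{LCL} = \bar{x} - E\,\overline{MR},$$

where $\bar{x}$ is the average of $m$ data observations, $\overline{MR}$ is the average of the moving ranges $|x_i - x_{i-1}|$ between consecutive observations, and $E$ is 2.66. 18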
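As a minimal sketch of how such limits could be computed, the Python snippet below builds an individuals control chart from the mean of the observations and their average moving range, with E = 2.66. It assumes that MR denotes the moving range between consecutive observations. The function name `control_limits` and the sample values are hypothetical and are not taken from the study's data or software.

```python
from statistics import mean

E = 2.66  # constant for individuals charts (3 / d2, with d2 = 1.128)


def control_limits(values):
    """Return (LCL, CL, UCL) for an individuals control chart."""
    if len(values) < 2:
        raise ValueError("need at least two observations to form a moving range")
    center = mean(values)
    # Moving range: absolute difference between consecutive observations.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    mr_bar = mean(moving_ranges)
    return center - E * mr_bar, center, center + E * mr_bar


# Hypothetical daily beam-output changes (%), for illustration only.
output_change = [0.10, 0.20, 0.10, 0.30, 0.20]
lcl, cl, ucl = control_limits(output_change)
print(f"LCL={lcl:.4f}  CL={cl:.4f}  UCL={ucl:.4f}")
```

In a relearning step, the same function could simply be called again on the extended learning set so that the limits follow the most recent stable behaviour of the machine.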

2.D | Predictive model calculation: autoregressive integrated moving average
The autoregressive integrated moving average (ARIMA) model is a statistical predictive model whose fitting allows future prediction of the time series. 9 The ARIMA methodology includes an autoregressive (AR) term, which is the weighted sum of recent differenced values; a moving average (MA) term, which is the weighted sum of past forecasting errors; and an integrated term, which is the degree of differencing applied to remove nonstationarity, if necessary. The general form of ARIMA($p$, $d$, $q$) is as follows:

$$\hat{z}_t = \mu + \sum_{i=1}^{p} \phi_i z_{t-i} - \sum_{j=1}^{q} \theta_j a_{t-j},$$

where $\hat{z}_t$ is the predicted value at $t$, $z_t$ denotes the differenced values in the time series at $t$, $\mu$ is a constant, $\phi_i$ denotes the weights of the differenced values in the AR term ($i = 1, 2, \ldots, p$), $a_t$ denotes the prediction error at $t$, $\theta_j$ denotes the weights of the prediction errors in the MA term ($j = 1, 2, \ldots, q$), $p$ is the order of the AR term, $q$ is the order of the MA term, and $d$ is the order of differencing used to remove nonstationarity.
Maximum likelihood estimation is adopted to estimate the parameter and error terms of the ARIMA model. The Akaike information criterion 20 is used to select the optimal model, which is the model that best fits the recorded time-series data. The ARIMA model is also used to predict the n-step-ahead value, which can be used to represent the future trend of the time-series data.
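To illustrate how one-step-ahead and n-step-ahead values follow from a fitted model, the sketch below forecasts with a fixed ARIMA(1,1,1). It is not the study's implementation: the coefficients are taken as given rather than estimated by maximum likelihood, no model selection by the Akaike information criterion is performed, and the MA term is assumed to enter with a minus sign (the Box-Jenkins convention). The function name `arima_111_forecast` and all numerical values are hypothetical.

```python
def arima_111_forecast(series, mu, phi, theta, steps):
    """Forecast `steps` values ahead with a fixed ARIMA(1,1,1) model.

    Assumed convention: z_hat[t] = mu + phi * z[t-1] - theta * a[t-1],
    where z is the first-differenced series and a is the one-step error.
    """
    if len(series) < 2:
        raise ValueError("need at least two observations to difference")
    # Integrated part (d = 1): first differences of the raw series.
    z = [b - a for a, b in zip(series, series[1:])]

    # Replay the history to recover the most recent one-step error.
    prev_z, prev_a = z[0], 0.0  # conditional start: first error taken as zero
    for z_t in z[1:]:
        z_hat = mu + phi * prev_z - theta * prev_a
        prev_a = z_t - z_hat
        prev_z = z_t

    forecasts, level = [], series[-1]
    for _ in range(steps):
        z_hat = mu + phi * prev_z - theta * prev_a
        level += z_hat  # undo the differencing to return to the raw scale
        forecasts.append(level)
        # Beyond the last observation, future errors are replaced by
        # their expectation (zero) and forecasts feed back as inputs.
        prev_z, prev_a = z_hat, 0.0
    return forecasts


# Hypothetical history and coefficients, for illustration only.
history = [0.10, 0.12, 0.15, 0.14, 0.18, 0.21]
next_day = arima_111_forecast(history, mu=0.0, phi=0.5, theta=0.2, steps=1)
week_ahead = arima_111_forecast(history, mu=0.0, phi=0.5, theta=0.2, steps=6)
```

Because each multi-step forecast is built on earlier forecasts rather than on measurements, the six-step-ahead values would generally be expected to carry more uncertainty than the one-step-ahead value.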

2.E | Predictive model and system evaluation
The accuracy of the predictive model was evaluated by comparing the ARIMA predicted results with the actual results in the final 15% of the MPC data. Model accuracy was assessed using the root-mean-square error (RMSE) and mean absolute error (MAE):

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(\hat{y}_i - y_i\right)^2}, \qquad \mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|\hat{y}_i - y_i\right|,$$

where $\hat{y}_i$ is the predicted value, $y_i$ is the actual value of state $i$, and $n$ is the number of observations. RMSE and MAE are standard error measures for models that predict quantitative data. In addition, the model can be evaluated by comparing the predicted results against the control chart (pass, fail, or warning status). The accuracy of the model in this sense can be calculated using the average accuracy: The average accuracy rate represents the average overall effec-
the UCL to LCL) and a grey area is a warning stage level (ranging from the UCL/LCL to the tolerance).
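The evaluation described above can be sketched in a few lines of Python. The RMSE and MAE functions follow their standard definitions. The stage comparison is only a stand-in: it reports the percentage of predictions that fall in the same stage as the measured value, which may differ from the average accuracy rate used in the study, and it assumes a symmetric clinical tolerance about zero. All function names, limits, and values are hypothetical.

```python
from math import sqrt


def rmse(predicted, actual):
    """Root-mean-square error between predicted and actual values."""
    return sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def mae(predicted, actual):
    """Mean absolute error between predicted and actual values."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def stage(value, lcl, ucl, tolerance):
    """Classify a result as normal, warning, or fail.

    Assumes a symmetric clinical tolerance of +/- `tolerance` about zero.
    """
    if abs(value) > tolerance:
        return "fail"
    if lcl <= value <= ucl:
        return "normal"
    return "warning"


def stage_agreement(predicted, actual, lcl, ucl, tolerance):
    """Percentage of predictions whose stage matches the measured stage."""
    matches = sum(
        stage(p, lcl, ucl, tolerance) == stage(a, lcl, ucl, tolerance)
        for p, a in zip(predicted, actual)
    )
    return 100.0 * matches / len(actual)


# Hypothetical test-set values, control limits, and tolerance.
predicted = [0.2, 0.6, 0.9, 1.2]
actual = [0.3, 0.4, 0.8, 1.1]
print(rmse(predicted, actual), mae(predicted, actual))
print(stage_agreement(predicted, actual, lcl=-0.5, ucl=0.5, tolerance=1.0))
```

In this toy data the second prediction lands in the warning stage while the measurement is still normal, so the agreement rate drops even though the numerical error is small, which is why both kinds of metric are worth reporting.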

3.C | Predictive model performance
The performance of the ARIMA model in predicting daily MPC results is presented in
Figure 5 presents the results for individual MLC leaves. The results show that the high-numbered leaves are predicted better than the low-numbered leaves for bank A, while the prediction is more consistent across the leaves for bank B. The reason for this difference is unclear.
The average accuracy rate was applied to assess the ability of the models to accurately flag the warning stage. The fourth column in
FIG. 6. Example of a trend line used to detect an output change exceeding the UCL, demonstrating that the system is able to flag the warning stage before it occurs.
expected so that they can assign urgency to the problem being addressed and organize remediation. For a TG-142 tolerance fail, the decision may well be to remove the linac from clinical use until the failure has been investigated, but for results within the warning stage between control limits and clinical tolerance, the investigation would likely be delayed until outside of normal treatment hours to avoid disruption to the clinical schedule. Such investigation will likely take place either at the end of the clinical day or on the weekend, so one-step-ahead and six-step-ahead predictions are appropriate.
The testing in the present study found that the majority of the one-step-ahead predictions were accurate (RMSE and MAE << 0.10) and the majority of average accuracy results were greater than 85.00%. The worst predictive performance was for the beam output
An additional weakness of the ARIMA model is that the input data require cleaning to remove poor-quality measurements, such as those caused by user error.
The ARIMA model is sensitive to all values in the learning process and learning from poor-quality data will lead to poor prediction.
Moreover, the learning should continually progress by taking new measurements into account.
A possible source of error in the prediction model is EPID detector drift distorting the MPC results. To mitigate this source of uncertainty, throughout the study the EPID dark-field calibration was updated monthly and the pixel defect map was updated annually or whenever dead pixels were identified. It is noted that MPC is independent of the flood field calibration. EPID response constancy has also been extensively studied in the literature. The short- and long-term dose-response reproducibility of amorphous silicon EPIDs has been found to be consistently within 1.0% and 0.5%, respectively, for all models once linac output variation from nominal had been accounted for. 23
29 These events were not accounted for in this study.
Such systematic changes will influence the prediction model; hence, at such events the prediction model input data should be reassessed and model training data collection potentially restarted. Because prediction models are based on large learning-phase datasets, they are not designed to detect large, sudden one-off jumps in data, such as might be expected with a linac component failure. Linac interlocks, as well as routine retrospective QA, are still required to mitigate treatment delivery errors from such events. Predictive QA is better suited to detecting and forecasting gradual drifts and failures that repeat at regular intervals.
The results of the present study suggest that the approach of predictive QA based on MPC data is feasible, but additional data on more linacs are required and the method needs to be tested further in terms of sensitivity for the system to be clinically useable. Such study is proposed as future work.

| CONCLUSION
The concept of linac predictive QA with the MPC using SPC and the ARIMA forecast model was demonstrated and its accuracy and performance evaluated. A window of opportunity between SPC control limits and clinical tolerances based on TG-142 was demonstrated, suggesting that the MPC is an appropriate tool for predictive QA.
The concept can be developed further with a greater number of linacs, sensitivity testing, and the evaluation of other predictive model techniques. Predictive QA of this kind has the potential to reduce unscheduled linac downtime and allow linac performance parameters to be controlled within tolerances tighter than those typically applied in the clinic.

CONFLICT OF INTEREST
The authors have no conflict of interest to disclose.