Experimental comparison of linear regression and LSTM motion prediction models for MLC-tracking on an MRI-linac

Background: Magnetic resonance imaging (MRI)-guided radiotherapy with multileaf collimator (MLC)-tracking is a promising technique for intra-fractional motion management,achieving high dose conformality without prolonging treat-ment times. To improve beam-target alignment, the geometric error due to system latency should be reduced by using temporal prediction. Purpose: To experimentally compare linear regression (LR) and long-short-term memory (LSTM) motion prediction models for MLC-tracking on an MRI-linac using multiple patient-derived traces with different complexities. Methods: Experiments were

of the online LR and the (4.4 ± 2.4) mm when using the no-predictor.According to statistical tests, differences were significant (p-value < 0.05) among all models in a pair-wise comparison, but for the offline LSTM and online LR pair.The offline+online LSTM was found to be more reproducible than the offline LSTM and the online LR with a maximum deviation in RMSE between two measurements of 10%.Conclusions: This study represents the first experimental comparison of different prediction models for MRI-guided MLC-tracking using several patientderived respiratory motion traces.We have shown that among the investigated models, continuously re-optimized LSTM networks are the most promising to account for the end-to-end system latency in MRI-guided radiotherapy with MLC-tracking.

K E Y W O R D S
linear regression, long short-term memory, MLC-tracking, motion prediction, MRI-linac, respiratory motion

INTRODUCTION
Magnetic resonance imaging (MRI) guided radiotherapy (MRIgRT) offers high soft-tissue contrast visualization and the opportunity to adapt to changes in patient anatomy prior and during irradiation. 1,2MRI-linac systems, which are linear accelerators with an embedded MRI unit, are increasingly being used clinically over the past years. 3To adapt to intra-fractional changes, for instance due to respiratory motion, current clinical systems rely on motion monitoring 4 with gated beam delivery. 5In this type of treatment, the irradiation target is visualized in real-time using cine MRI and the beam is automatically stopped if the target exits a predefined area, thus recovering the conformality of static treatments and avoiding an increase of dose to healthy tissues surrounding the tumor.A disadvantage of this approach are the increased treatment times, with duty cycle efficiencies of 20% or 50% having being reported in clinics, depending on patient compliance. 6,7An alternative approach with comparable dose conformality but increased treatment efficiency is multileaf -collimator (MLC)-tracking, during which the MLC aperture is continuously shifted to follow the target motion. 8However, it has been shown that a factor which is critical to the accuracy of MRIgRT with MLC-tracking is the system latency. 9he system latency is defined as the time lag between the physical motion of the target and the execution of beam adaptation, which in the case of MLC-tracking is the time when the MLC leaves reach their desired positions.According to the AAPM Task Group 264, a latency ≤ 500 ms is necessary to meet the definition of realtime motion compensation. 10For MRI-linacs capable of MLC-tracking it has been experimentally measured to range from 205 ms to 411 ms, 9,11 mainly depending on the acquisition frequency of the cine MRIs.To overcome the system latency, temporal prediction models can be used.6][17] In a comparative computational study, Jöhl et al. used 93 respiratory motion traces to show that among 18 motion predictors ranging from Kalman filters to artificial neural networks, linear regression (LR) models were the best candidates for respiratory motion prediction for various time horizons and noise levels. 17When computationally possible, models were retrained at every time step (i.e., online) as this approach had been shown to improve performance.Recently, a class of machine learning algorithms called long short-term memory (LSTM) networks, which is ideally suited to deal with sequential input data, has been shown to be very promising for motion prediction both in RT 18,19 and in MRIgRT. 20,21Specifically, LSTMs were shown to outperform LR models for the prediction of superior-inferior (SI) target centroid positions based on patient data acquired on a 0.35 T MRI -linac. 20hile all aforementioned studies were in-silico, to the best of our knowledge there are four studies which experimentally investigated motion prediction for MLCtracking during MRIgRT.In an early phantom study by Yun et al. 22 , it was shown that motion prediction using artificial neural networks and sinusoidal motion led to MLC-tracking with similar dosimetric accuracy as in the static scenario.Uijtewaal et al. 23 showed that an online LR can compensate the latency for the delivery of intensity modulated radiotherapy (IMRT) plans with MLC-tracking to a phantom moving with Lujan motion (cos 4 ).In two follow up studies with the same motion predictor, they used Lujan motion and additionally one patient-derived motion trace to investigate MLC-tracking with VMAT plans 24 and with a hybrid 2D/4D-MRI methodology. 25A limitation of all four studies is that either sinusoidal or a single patient-derived trace was used.
This study aimed at experimentally validating the in-silico comparison of LSTM and LR motion prediction models 20 for MLC-tracking with a prototype MRI-linac.Compared to previous MRIgRT studies with MLC-tracking and motion prediction, [22][23][24][25] this was the first time multiple motion traces with different complexities were used and conventional and machine learning based predictors were compared.Specifically, we compared models trained on retrospective data (i.e., offline) with online models updated in real-time during the experiments.The performance of the motion predictors with the different motion traces was evaluated in terms of geometric accuracy of the MLC-tracking.

Experimental setup
All experiments were performed on a prototype MRIlinac system featuring a 1.0 T open bore magnet (Agilent, UK) with a control system based on a Magnetom Avanto spectrometer (Siemens, Erlangen, Germany) and a 6 MV industrial linac (Linatron, Varex Imaging, Utah,USA).The radiation beam generated by the linac is aligned with the B 0 field (fixed beam line, no gantry) and shaped by a MLC with 120 leaves (Millennium, Varian, Palo Alto, California, USA). 26The motion for the experiments was executed with the MRI-compatible Quasar phantom (Modus Medical Devices, Ontario, Canada), positioned at a source-to-surface distance of 2.4 m.The phantom (same as in Liu et al. 9 ) contained a single MRI-visible target and was placed inside the bore such that the target was located at the isocenter.During the experiments, the target was moved by a motor in SI direction according to the provided motion traces (see Section 2.2).During irradiation, the target was imaged and localized as described in Section 2.3.The extracted target positions were given as input to one of the motion prediction models (Section 2.4).Prior to the motion prediction experiments,a sinusoidal trace was tracked three times to characterize the end-to-end latency of the system, analogously to Liu et al. 9 : A sinusoidal was fitted to the centroid positions of the targets and apertures (moved by the no-predictor, see Section 2.4) and the latency was calculated as the time difference between the two fits.We then performed motion prediction with four models on eight motion traces.Each experiment was repeated twice to test the stability of the models, for a total of 64 experiments.MLC-tracking driven by the model predictions was used to compensate for the observed motion (Section 2.5) and an electronic portal imaging device (EPID) was used to quantify the geometric accuracy of tracking when using the different models (Section 2.6).The overall experimental setup is shown in Figure 1.

Motion traces
In this study, eight publicly available motion traces previously exploited in a multi-institutional markerless lung target tracking study were used. 27The traces were obtained from seven different lung tumor patients and feature different motion amplitude, complexity and frequency, all factors known to influence the accuracy of tracking.Four traces were taken from a clinical study using measurements of the centroid position of implanted Calypso beacons. 28hese 3D motion traces were originally acquired at 10 Hz and represent motion with high complexity, high motion amplitude, mean complexity and mean motion amplitude (https://github.com/MarcoMueller-MCT/AAPM_GrandChallenge_MATCH/tree/master).
The other four traces were acquired during treatments with a Cyberknife Synchrony system (Accuray Incorporated, Sunnyvale, California, USA) in 3D at a sampling rate of 25 Hz. 29These traces include evident baseline shifts, right-left (RL) dominant, high frequency and typical lung target motion (https://cloudstor.aarnet.edu.au/plus/index.php/s/iHz0aoTGBho3yu2?path=%2FLung% 2F4DLungTrajectories).Additionally to the patient traces, a sinusoidal trace with an amplitude of 20 mm and a period of 7.5 s was used to characterize the end-to-end latency of the system.
During the experiments, the SI component of the traces was used to rigidly move the target but for the "baseline shift" and the "dominant RL" traces, for which the lateral component was used.Independently of the component used, the target was always moved in SI direction.For each trace, motion was executed for about 2 min, however, during analysis, we did not use the first and last 30 s of data to exclude for instance buffering of the motion prediction models (i.e., the time needed until enough input positions are accumulated to start prediction) or the time for starting the radiation/EPID.The motion characteristics of the remaining 1 min of each trace, which was effectively used to assess the tracking accuracy, are shown in Table 1.While the name of the traces was the same as in Mueller et al., 27 the period is slightly different as a different subset of each trace has been used.Also the displacement of each trace is different as in our experiments we rescaled each trace to have a peak-to-peak amplitude over the entire trace of 30 mm to avoid very small MLC motion (arising from the large source-to-surface distance) and limitations of the EPID (spatial and temporal resolution).

Imaging and localization
The moving target was visualized in real-time using cine MRI.Sagittal 2D slices were acquired using a balanced   9 a modified MRI reconstruction pipeline was used to stream raw image frames from the reconstruction computer to the target localization computer, where the images were analyzed in real-time.Using in-house software, target centroid positions in SI direction were obtained for each cine MRI frame using a cross-correlation based template matching algorithm (Figure 2, left panel).The template target for the matching process was defined on an MRI acquired prior to the motion experiments.Using the User Datagram Protocol (UDP), the target centroid positions were sent to the motion prediction computer.To avoid data loss inherent to the UDP, redundancy was introduced by sending the positions at 100 Hz.

Motion prediction
The SI target centroid positions were pre-processed and used by a prediction model to obtain the future target centroid position in real-time, as described in the next sections.These tasks were implemented as different co-routines, such that they could run in an overlapping manner without blocking the main execution.

Data pre-processing
First, the data redundancy introduced by the UDP sending process was taken out by checking if the received target position differed from the previous one, thus recovering the original imaging frequency of 4 Hz.Then, target positions were accumulated until an input sequence of 8 s was formed (first buffering time).The input sequence had fixed length,that is,every time a new target position was available, the oldest target position in the input sequence was dropped and the new position was added.The input sequence was then smoothed using a moving average filter, to decrease the impact of noise arising from imaging/target localization on the prediction models.The moving average filter acted on a sliding window of three data points.To avoid boundary effects for the last two points of each sequence (the most recent and relevant for the prediction), we left the last point unchanged and set the penultimate point to the average between the unchanged penultimate point and the point one obtain with the moving average filter.Finally, each input sequence was normalized in the range from −1 to +1.The scaling factor for the normalization was temporarily saved for each sequence to later re-normalize the predictions.

Prediction models
Three respiratory motion prediction models previously compared in-silico 20 and a baseline no-predictor were implemented in this study and applied to the eight unseen motion traces: 1. Offline LSTM: this model had been previously trained and validated using motion traces extracted from 4 Hz cine MRIs of 70 patients treated with a 0.35 T MRI-linac (13.1 h of data). 20It was applied without any changes to hyper-parameters or weights to the unseen experiment traces and predicted the future target centroid position in 250 and 500 ms.Linear interpolation between these two points was used to obtain a prediction matching the end-to-end latency measured for the system.The interpolated prediction was then used for MLC-tracking.2. Offline+online LSTM: this model was based on the offline LSTM described above.However, in this case we loaded the weights obtained from the optimization with the cohort of 70 patients and additionally re-optimized based on recent motion during the experiments.This worked by accumulating 20 s of target positions (second buffering time), which were subdivided into sets of input and output sequences.These pairs of input and output were used to iteratively train the LSTM using the mean-square-error loss between the output and predicted sequences.Every time a new target position was available (i.e., every 250 ms), the set of input/output sequences was updated and a new training was started for 250 ms, which allowed the completion of about 10 epochs.
A more detailed explanation of the online optimization can be found in Lombardo et al. 20 As for the offline LSTM, the interpolated prediction was used for MLC-tracking.3. Online LR: similarly to the offline+online LSTM, this model was continuously updated during the experiments based on recent motion.In contrast to the LSTM, the LR does not require iterative optimization, as an analytical solution exists. 30For this reason, the online LR was solely based on the last 20 s of data and was solved from scratch on the updated set of sequences every 250 ms, that is, every time a new target position was available, as in the previous insilico study. 20Also for the LR, linear interpolation between the 250 and 500 ms predictions provided the target position which was used for MLC-tracking.4. No-predictor: to compare the three motion prediction models with a baseline without any prediction, we utilized the last available target centroid position for the subsequent MLC-tracking.
The LSTM models were run (and optimized) on an A5000 GPU with 24 GB of VRAM while the LR was run and solved using an Intel Xeon W-1250 CPU with 6 cores and 64 GB of RAM.A table showing the hyperparameters taken over from the previous in-silico study can be found in the supplementary Table S1.

MLC-tracking
In this study, a rigid single-target MLC-tracking software based on previous work was used. 31Prior to irradiation,a rectangular MLC aperture was loaded and aligned with the target center.During irradiation, the predicted target position was used to calculate a displacement vector with respect to the target's original position.This displacement vector was then used to calculate an ideal aperture update which in turn was used by a leaf fitting algorithm to calculate the closest matching deliverable MLC aperture, taking into account physical limitations of the MLC such as finite leaf speed.The updated leaf positions were sent at 20 Hz to the MLC controller which shifted the aperture to compensate for the observed motion in real-time, as in previous MLC-tracking studies. 8,32

Accuracy evaluation
To evaluate the geometric accuracy of MLC-tracking with different motion predictors, EPID images were acquired at 3.5 Hz during irradiation and then analyzed after the experiments following Liu et al. 9 Using in-house software, we first applied a low-pass filter to the EPID frames to reduce the noise introduced by the magnetic field of the MRI-linac and then leveraged template matching to extract the target centroid positions and the MLC aperture centroid positions for each EPID frame automatically (see Figure 2, right panel).The template for the aperture and target were defined on a selected EPID frame once.All positions were scaled taking demagnification from the EPID plane to the isocenter into account.The root-mean-square error (RMSE) between the target and the MLC aperture positions (representing the motion predictions) was used to assess the performance of the different motion prediction models.As mentioned in Section 2.2, the RMSE was computed for all prediction models on the same 1 min of each trace to enable a fair comparison.To find out whether there was a significant difference between the RMSEs obtained by the four models for the different motion traces, a non-parametric Friedman test was used. 33If the Friedman test was significant (p-value < 0.05), a post-hoc Nemenyi test 34 was used to infer which model performed significantly better than another in a pair-wise fashion.

Latency measurements
When repeating the MLC-tracking experiments using no-predictor with the sinusoidal trace three times, an average end-to-end latency of (389 ± 15) ms was obtained.We then computationally shifted the acquired aperture centroid curve by the calculated latency and obtained a baseline RMSE between aperture and target of (1.1 ± 0.1) mm.

Patient traces
Table 2 shows the RMSEs obtained for all prediction models and traces for each of the two measurements.MLC-tracking using the offline+online LSTM as motion predictor resulted in the best accuracy for all investigated motion traces.When calculating the mean and standard deviation of the RMSE over all motion traces, the offline LSTM led to (3.3 ± 1.0) mm, the offline+online LSTM to (2.8 ± 0.7) mm, the online LR to (3.3 ± 0.7) mm and the no-predictor to (4.5 ± 1.4) mm.The mean RMSE over all traces and for each trace (two measurements combined) for the different models are displayed as a bar plot in Figure 3.
Comparing the RMSE obtained from a measurement with its repetition revealed that the offline+online LSTM was also the most reproducible model with a deviation of up to 10%, compared to the offline LSTM with up to 14% and the online LR with up to 18%.Repeating the same trace using the no-predictor led to a maximum deviation of up to 6%, which can be considered the baseline.Differences between all models were significant according to the Friedman test (p-value = 2e-9).The post-hoc Nemenyi test showed that there was a significant difference between all models in a pair-wise comparison but for the offline LSTM and the online LR pair, as shown in Table 3.
Figure 4 shows the centroid positions of the MLC aperture and the target obtained with the analysis of the EPID frames for four different models and a selected motion trace.Qualitatively, it can be noticed that the offline LSTM was more robust to the irregularity present at the end of the shown trace while the  online LR overshot less during regular breathing.The offline+online LSTM seemed to combine the advantages of the two models while for the no-predictor the system latency was clearly visible.For the same trace, we show in the online supplementary materials an EPID video displaying the target and the MLC aperture driven by the no-predictor ('pass_through_ high_complexity_1.avi') or the offline+online LSTM ('online_lstm_smooth_high_complexity_1.avi').

DISCUSSION
The experiments performed in this study showed that accurate MLC-tracking using motion prediction is possible for a variety of different breathing patterns.We were able to successfully compensate for a measured end-toend system latency of (389 ± 15) ms, which is slightly increased compared to the latency of (328 ± 44) ms F I G U R E 4 MLC aperture centroid (blue) and target centroid (black) obtained from the EPID for a selected part of the "high complexity" motion trace for one of the two measurements.The MLC is moved according to one of the four motion prediction models.The same four models but for the other seven traces are shown in the online supplementary (mp16770-sup-0001-SuppMat.pdf ).EPID, electronic portal imaging device; MLC, multileaf collimator.
measured by Liu et al. on the same system, 9 but still within what is expected for MLC-tracking on MRI-linacs in other studies. 11,22The latency corrected baseline RMSE of (1.1 ± 0.1) mm is comparable with the RMSE of (1.2 ± 0.1) found by Liu et al. 9 and includes experimental limitations such as imaging resolution, MLC performance and EPID accuracy/analysis.The offline+online LSTM model was found to be the best motion predictor for all eight investigated motion traces, being significantly better than both its offline version, that is without continuous re-optimization, and an online LR.The offline LSTM and online LR performed similarly, with one model being better than the other depending on the motion trace.The no-predictor was significantly worse than all other models, confirming the value of motion prediction during MRIgRT with MLC-tracking.In general, absolute performance differences were also dependent on the motion trace.
For the "typical lung" trace, which is the one with the smallest IQR of motion (Table 1), we found a maximum RMSE difference between any two experiments with the offline+online LSTM and the offline LSTM/online LR/nopredictor of 0.3 mm/0.5 mm/1.3 mm (Table 2).On the other hand, for the "high frequency" trace, which is very regular but presents high amplitude and frequency, we found an RMSE difference between the offline+online LSTM and the offline LSTM/online LR/no-predictor of 1.3 mm/0.6 mm/3.4 mm.Based on this trace, we hypothesize that the online LR, which is solely based on the last 20 s of motion, performs particularly well if the motion is regular, compared to the offline LSTM which has been trained on data which presented on average a different frequency.The offline+online LSTM might have outperformed all other models because it combined training on a large set of different breathing patterns with being able to adapt to for example a patient-specific breathing frequency.We also investigated reproducibility of the models by repeating each experiment twice and found the online LR to be the least reproducible with a deviation of 18%.We hypothesize that this originates from the fact that this is the only model which solely relies on current data (no prior training on large datasets) and is therefore the most sensible to variations in the input centroids due to for example acquisition noise or imperfections of the template matching.
While this work investigated MLC-tracking with motion prediction using MRI-guidance, the proposed methods could also be used with x-ray guidance.The reduced soft-tissue contrast might require more advanced target localization algorithms than template matching in a markerless setting. 35As the motion prediction models are based on centroid positions, only small modifications would be needed such as training on other temporal resolutions to reflect the different imaging frequency and adjustment of the prediction horizon to smaller latencies.
The current study presents a few limitations.The motion phantom allows for 1D rigid shifts only,neglecting the fact that motion occurs in all three directions and that deformation or rotations can be observed. 36,37Assuming the cine MRI is acquired in a single 2D slice as in this study, MLC-tracking to compensate for in-plane displacement/deformation/rotation could be implemented in future studies by leveraging a deformable target localization algorithm, 38 a 2D contour prediction algorithm 21 and a deformable MLC-tracking algorithm. 39Beam's eye view 2D cine MRI with tumor-volume projection might be used to ensure better beam conformality. 40,41To fully compensate motion in all three directions, time-resolved volumetric MRIs would be needed, which are currently being investigated by several groups [42][43][44] and would lead to an increment in latency, which in turn increases the relevance of motion prediction.Another limitation consists in the fact that we had to re-scale the motion traces to 30 mm peak-to-peak amplitude due to limitations in the experimental setting, the original mean peak-to-peak amplitude over all traces being 16.4 mm (range 9.4-23.8mm).This means that the obtained RMSEs represent in absolute terms an overestimation of the error which would have been obtained with the original traces while all relative comparisons between the models hold true.Finally, the fact that the target in the phantom is visible with high contrast on the cine MRI facilitated its localization.However, imaging a real tumor would have affected the target localization and therefore all models in the same manner, so the results of our comparison should hold true.

CONCLUSIONS
In this study, we experimentally compared conventional and machine learning motion prediction models for MLC-tracking in SI direction based on 4 Hz cine MRI.We showed for eight patient-derived respiratory motion traces with different complexity that all models significantly improved the MLC-tracking performance compared to a baseline no-predictor.A continuously reoptimized LSTM model was found to perform the best for all motion traces, confirming the in-silico result that this model is an ideal candidate to mitigate the latency and therefore improve the accuracy of MLC-tracking during MRIgRT.

AC K N OW L E D G E M E N T S
The

C O N F L I C T O F I N T E R E S T S TAT E M E N T
The Department of Radiation Oncology of the University Hospital of LMU Munich has research agreements with Brainlab, Elekta and ViewRay.PJK is an inventor on US patents 7,469,035 and 8,971,489 that are related to MLC-tracking.Patent 7,469,035 is unlicensed; patent 8,971,489 is exclusively licensed to Asto CT.

F I G U R E 1
Experimental setup for MRI-guided MLC-tracking with motion prediction.A target inside a 1D motion phantom (photo on the top left) is moved following one of the eight patient respiratory motion traces.The moving target is imaged at 4 Hz using cine MRI.The target centroid position is then extracted with template matching and sent to the motion prediction model.The model outputs the future centroid position which is used to shift the MLC aperture in real-time.Finally, an EPID is used to characterize the geometric accuracy of MLC-tracking.EPID, electronic portal imaging device; MLC, multileaf collimator; MRI, Magnetic resonance imaging.TA B L E 1 Motion characteristics of the eight patient traces used for the experiments.

F I G U R E 2 (
Left) Cine MRI frame acquired during irradiation with real-time target localization (yellow) using template matching.(Right) EPID frame acquired during irradiation with post-irradiation template matching analysis to obtain the target centroid positions (black) and the MLC aperture centroid positions (blue) needed for the evaluation.EPID, electronic portal imaging device; MLC, multileaf collimator; MRI, Magnetic resonance imaging.

24734209, 2023, 11 , 3
Downloaded from https://aapm.onlinelibrary.wiley.com/doi/10.1002/mp.16770by Cochrane Germany, Wiley Online Library on [30/01/2024].See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions)on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License TA B L E 2 MLC-tracking RMSE obtained when using different motion prediction models.Plot with the mean (bar) and standard deviation (error bar) of the RMSE obtained from the two measurements for the four models and eight motion traces used in this study.Additionally, the mean and standard deviation of the RMSE over all traces for each model is shown.RMSE, root-mean-square error.TA B L E 3 p-values obtained from the post-hoc Nemenyi test for all possible pairwise model comparisons.
27 described in the main text, both the inter-quartile-range (IQR) of displacement and the period can differ from Mueller et al.27due to re-scaling and usage of a subset of each trace.
Note:•.The slice thickness was 7 mm, the in-plane resolution 128 × 128 pixels and the field of view 300 mm, resulting in a voxel size of 2.34 × 2.34 × 7 mm 3 .Using these sequence parameters, an imaging frequency of 4 Hz was obtained.As inLiu et al., 24734209, 2023, 11, Downloaded from https://aapm.onlinelibrary.wiley.com/doi/10.1002/mp.16770by Cochrane Germany, Wiley Online Library on [30/01/2024].See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions)on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License authors thank Magdalena Bazalova-Carter for her help during the measurements and staff at Image X Institute and Ingham Institute for their support.EL was funded by the German Research Foundation (DFG) within the Research Training Group GRK 2274.PZYL and DEJW acknowledge funding from Cancer Institute NSW fellowships.Open access funding enabled and organized by Projekt DEAL.