A convolutional neural network model for EPID‐based non‐transit dosimetry

Abstract
Purpose: To develop an alternative computational approach for EPID-based non-transit dosimetry using a convolutional neural network model.
Method: A U-net followed by a non-trainable layer named True Dose Modulation, which recovers the spatialized information, was developed. The model was trained on 186 Intensity-Modulated Radiation Therapy Step & Shoot beams from 36 treatment plans of different tumor locations to convert grayscale portal images into planar absolute dose distributions. Input data were acquired from an amorphous-Silicon Electronic Portal Imaging Device and a 6 MV X-ray beam. Ground truths were computed from a conventional kernel-based dose algorithm. The model was trained by a two-step learning process and validated through a five-fold cross-validation procedure with sets of training and validation of 80% and 20%, respectively. A study regarding the dependence on the amount of training data was conducted. The performance of the model was evaluated from a quantitative analysis based on the γ-index, absolute and relative errors computed between the inferred dose distributions and ground truths for six square and 29 clinical beams from seven treatment plans. These results were also compared to those of an existing portal image-to-dose conversion algorithm.
Results: For the clinical beams, averages of γ-index and γ-passing rate (2%-2 mm > 10% Dmax) of 0.24 (±0.04) and 99.29 (±0.70)% were obtained. For the same metrics and criteria, averages of 0.31 (±0.16) and 98.83 (±2.40)% were obtained with the six square beams. Overall, the developed model performed better than the existing analytical method. The study also showed that sufficient model accuracy can be achieved with the amount of training samples used.
Conclusion: A deep learning-based model was developed to convert portal images into absolute dose distributions. The accuracy obtained shows that this method has great potential for EPID-based non-transit dosimetry.

Volumetric Modulated Arc Therapy adds a degree of modulation through dose rate variation and continuous gantry rotation. To meet regulatory requirements for quality and safety in clinical routine, these techniques require additional attention. 1 As part of patient-specific Quality Assurance (QA), pre-treatment verification guarantees that the beam fluence initially planned by the Treatment Planning System (TPS) can be correctly delivered by the linear accelerator (LINAC). More specifically, the objective of this check is to detect, prior to each first patient treatment fraction, possible data integrity issues, beam output variations or mechanical errors of collimation elements. 2 The amorphous-Silicon (a-Si) Electronic Portal Imaging Device (EPID) is a planar digital detector taking into account the beam fluence. The EPID has good reproducibility and a high signal-to-noise ratio, provides high-resolution 2D digital images, and its response is considered linear with the dose. 3,4 These different qualities make it an ideal tool for QA purposes such as non-transit dosimetry, that is, acquisitions without attenuator between the radiation source and the imager. However, its use for pre-treatment verification requires dosimetric bias correction and absolute dose calibration. 5 As a result, two approaches for non-transit dosimetry exploiting the assets offered by EPID have emerged. The first is the back-projection method, which involves extracting the primary fluence from the raw EPID signal to perform a TPS-like dose calculation in the patient's planned computed tomography. 6,7 The resulting dose distribution is compared to the planned dose distribution using the gamma-index (γ-index) metric. 8 The second is the direct method displayed in Figure 1, which is based on the comparison between the measured Portal Dose distribution (mPD) and the predicted Portal Dose distribution (pPD). 9 The pPD is simulated in a virtual water phantom at a given depth from the data of the DICOM RT plan file, while the mPD is computed from the measured Portal Image (mPI) with an EPID grayscale to dose-to-water conversion algorithm. For this purpose, models based on measurements 10,11 or kernels [12][13][14][15] have been proposed. The latter offer sufficient accuracy for delivery error detection 9,16 despite the approximation of physics modeling required for their commissioning.
Recently, the rise of Machine Learning (ML) has enabled the development of many applications for which it was previously difficult to find reliable solutions with current parametric models. 17 In the field of radiotherapy, the contribution of ML to QA is major. For instance, ML has been investigated for the automation of QA processes [18][19][20][21] and as an alternative to current analytical algorithms for computing dose distributions for EPID-based non-transit dosimetry. 22 Regarding the latter, three studies experimented with the use of artificial neural networks (ANNs) to convert mPIs into planar mPDs for pre-treatment verification of IMRT beams. Khalantzis et al. 23 proposed the extraction of two clusters in the fluence domain using a K-means algorithm to train two MultiLayer Perceptrons (MLPs) dedicated to high and low dose regions, respectively. Mahdavi et al. 24 developed a single MLP to compute mPDs from the raw EPID signal. Chatrie et al. 25 extended a similar approach to other EPID models and tumor sites. The results obtained by these previous studies were encouraging. However, the use of MLPs for image regression tasks has drawbacks. For instance, if entire images are used as input, each input neuron is dedicated to a single pixel, resulting in excessive memory usage. Patches can be used, but an assumption about the appropriate size must then be made. In addition, this inherently restricts the receptive field of the model. Finally, it was shown that the connections of densely connected layers are redundant, which complicates learning for image processing tasks. 26
With significant success in the field of Deep Learning (DL), Convolutional Neural Networks (CNNs) are a type of ANN specifically designed to process data in the form of sequences or images. 26 In particular, the U-net, originally proposed by Ronneberger et al. in 2015, 27 has become the reference CNN architecture for image regression applications. More specifically, this model is a fully convolutional network composed of an encoder, a bottleneck and a decoder forming a single structure. In the field of medical physics, the U-net has been tested for dose and dose rate computation, [28][29][30][31][32] denoising of CT images, 33 correction of mPIs acquired from MR-LINAC 34 and conversion of MR signal to density matrix. 35 Compared to MLPs, the U-net should be more suitable for the mPI-to-mPD conversion. Indeed, the use of convolutional layers and dimensionality reduction provides more efficient feature extraction while optimizing the number of trainable parameters. 26 This reduces memory usage and increases learning efficiency. In addition, unlike MLP neurons, the convolution kernels slide through feature maps, enabling CNNs to process an entire image of any size in a single forward pass. However, the use of CNNs can result in insufficient encoding of spatialized information. 36 This can be an obstacle for mPI-to-mPD conversion where pixel-wise transformations, such as beam profile restoration, 37 are required.
In this study, we investigate the use of CNNs for the conversion of grayscale mPIs to absolute mPDs for pre-treatment verification of IMRT Step & Shoot (S&S) beams of various tumor locations. The proposed model is based on an adapted U-net architecture. A set of preprocessed mPIs was directly used as input data and the reference pPDs computed by a conventional kernel-based dose algorithm were used as output data. A non-trainable layer called True Dose Modulation (TDM), combined with a two-step learning process, was also introduced to efficiently recover the spatialized information.
In the first section of this paper, the equipment used, the data acquisition, and the database distribution are described. The True Dose Modulation layer, the proposed U-net architecture, the training process as well as the model evaluation are also presented. In the second section, results on elementary and clinical cases are described, compared to existing methods, and discussed.

Database
All mPIs were acquired with the Synergy LINAC (Elekta, Stockholm, Sweden) equipped with an Agility MLC of 80 leaf pairs, with a 6 MV X-ray beam and a nominal dose rate of 400 MU/min. The a-Si flat panel EPID iView GT (Perkin Elmer Optoelectronics, Wiesbaden, Germany) was used for mPI acquisitions in integrated mode. This EPID model is positioned at a source-imager distance of 1600 mm and is provided with a 1024 × 1024 pixel array equivalent to a 24.5 × 24.5 cm² active area at source-axis distance (SAD), yielding a pixel pitch of 0.24 mm. In this study, the EPID was centered on the beam axis. All acquisitions were performed within a 1-month interval and without additional build-up material. Acquired mPIs are 16-bit grayscale encoded and stored with DarkField and FloodField corrections. The reference pPDs correspond to the 2D absolute dose distributions located at SAD, at 50 mm depth in a virtual water phantom. The computation of pPDs was performed by the prediction algorithm of the EPIbeam system (version 1.05, DOSIsoft, Cachan, France). This software uses data from the DICOM RT plan file, and a kernel-based dose engine parametrized from dose distributions obtained by Collapsed Cone Convolution with the RayStation TPS (version 1.08, RaySearch Laboratories, Stockholm, Sweden). To compare the developed model with existing methods, the EPIbeam system was also used to compute mPDs from its EPID grayscale-to-dose conversion algorithm. The latter is parametrized from the prediction algorithm used for the pPD computations.
The data reliability was ensured by a preliminary verification. For this purpose, each sample was visually validated by a medical physicist and quantitatively assessed through the computation of the global γ-index between pPDs and mPDs with 2%-2 mm criteria and a minimum threshold at 10% of the maximum reference dose (2%-2 mm > 10% Dmax). In this manner, the absence of beam output variations or mechanical errors of collimation elements was guaranteed. For the entire database, obtained γ-passing rates were greater than 95%.
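To make the verification criterion concrete, the following is a minimal sketch of a global γ-index evaluation with the 2%-2 mm > 10% Dmax criteria. It is not the implementation used in this study: it works on a 1D profile rather than a 2D distribution, performs an exhaustive search without interpolation, and the function names (`gamma_index_1d`, `passing_rate`) are illustrative assumptions.

```python
import math


def gamma_index_1d(ref, ev, spacing_mm, dose_tol=0.02, dta_mm=2.0, threshold=0.10):
    """Global gamma for each reference point above the low-dose threshold.

    ref, ev    : reference and evaluated dose profiles on the same grid
    spacing_mm : grid spacing in millimetres
    """
    d_max = max(ref)
    dose_crit = dose_tol * d_max  # global normalization: 2% of the reference maximum
    gammas = []
    for i, d_ref in enumerate(ref):
        if d_ref < threshold * d_max:
            continue  # points below 10% of Dmax are excluded from the analysis
        best = math.inf
        for j, d_ev in enumerate(ev):
            dist = (j - i) * spacing_mm
            dose_diff = d_ev - d_ref
            g2 = (dist / dta_mm) ** 2 + (dose_diff / dose_crit) ** 2
            best = min(best, g2)
        gammas.append(math.sqrt(best))
    return gammas


def passing_rate(gammas):
    """Percentage of evaluated points with gamma <= 1."""
    return 100.0 * sum(g <= 1.0 for g in gammas) / len(gammas)
```

With these definitions, a sample would be accepted when `passing_rate` exceeds the chosen tolerance (95% for the database verification described above).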
In this study, a training database consisting of 186 samples (mPI-pPD pairs) from 36 IMRT S&S treatment plans was used for a cross-validation procedure (see Section 2.4). Twenty-nine IMRT S&S beams from seven treatment plans, excluded from the training set, and six square beams exposed to 100 MU with sides of 15 mm, 50 mm, 80 mm, 100 mm, 150 mm, and 200 mm were used for the evaluation of the selected model. More details about the data used in this study are available in Tables 2 and 3 in Appendix A.
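As an illustration of how the 186 samples can be partitioned for a five-fold procedure with an 80%/20% training/validation proportion (the set sizes of 119, 30 and 37 samples are those quoted in the Training section), a minimal sketch is given below. The exact sampling scheme of the study is not specified here; the function `make_fold`, the seeds, and the choice of equal test blocks of 37 samples (which leaves one sample outside every test block, since 186 is not divisible by 5) are assumptions.

```python
import random


def make_fold(n_samples, fold, n_folds=5, train_fraction=0.8, seed=0):
    """Return (train, validation, test) index lists for one cross-validation fold."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # identical base shuffle for every fold

    test_size = n_samples // n_folds  # 186 // 5 = 37
    start = fold * test_size
    test = indices[start:start + test_size]

    # Everything outside the test block is shared between training and validation.
    rest = indices[:start] + indices[start + test_size:]
    random.Random(seed + 1 + fold).shuffle(rest)
    n_train = round(train_fraction * len(rest))  # round(0.8 * 149) = 119
    return rest[:n_train], rest[n_train:], test
```

Calling `make_fold(186, k)` for `k` in `range(5)` yields five splits of 119, 30 and 37 samples with mutually disjoint test blocks.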
Prior to the learning phase, preprocessing of the training data was performed. First, to ensure consistency between input and output data, the mPIs and pPDs were cropped by 32 pixels on each side and extrapolated by nearest neighbors. This edge correction is applied because an erosion is made by the analytical prediction model on pPDs for beams protruding from the EPID surface (see Figure 2). All data were also rescaled to ensure model convergence and better performance. The input and output normalization factors were, respectively, defined as the central pixel value of the mPI and pPD of the 80 mm square beam. During inferences, the output normalization factor (Gy) was also used for absolute dose calibration of inferred mPDs.
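The preprocessing described above can be sketched as follows. This is one plausible reading of the text, not the study's code: the crop followed by nearest-neighbor extrapolation is interpreted as replicating the nearest retained pixel so that the original image size is restored, and the helper names (`crop_and_edge_pad`, `central_pixel`, `normalize`) as well as the choice of index `n // 2` for the central pixel are assumptions.

```python
def crop_and_edge_pad(image, margin):
    """Discard `margin` pixels on every side, then restore the original size by
    replicating the nearest remaining pixel (requires each side > 2 * margin)."""
    rows, cols = len(image), len(image[0])

    def clamp(k, n):
        # Map any index onto the retained central region [margin, n - margin - 1].
        return min(max(k, margin), n - margin - 1)

    return [[image[clamp(r, rows)][clamp(c, cols)] for c in range(cols)]
            for r in range(rows)]


def central_pixel(image):
    """Value at the image center (index n // 2 along each axis, an assumption)."""
    return image[len(image) // 2][len(image[0]) // 2]


def normalize(image, factor):
    """Rescale an image by a normalization factor, e.g. the central pixel value
    of the 80 mm square beam."""
    return [[value / factor for value in row] for row in image]
```

For the 1024 × 1024 images of this study, `margin` would be 32; multiplying an inferred, normalized mPD by the output factor (in Gy) would then give the absolute dose.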

True dose modulation layer
Although CNNs are known for their spatial invariance, 38 it was shown that convolutional layers are able to encode spatialized information by exploiting image boundaries, in particular using large receptive fields and zero-padding. 39,40 However, when tasks explicitly depend on the absolute position of pixels, the recovery of spatialized information may not be sufficiently accurate. 36 For mPI-to-mPD conversion of flattened beams, the spatialized information relates to the off-axis dose modulation due to both the incident fluence modulation, mainly caused by the flattening filter, and the difference in spectral energy response between water and the a-Si of the EPID. 37 In addition, the Flood Field correction applied on the mPIs flattens the EPID signal. The intrinsic off-axis modulation of the incident fluence is therefore completely erased from the input data.
To address this in the present deep-learning approach, different learning-based solutions were explored, such as densely connected layers, 2D locally connected layers and the 2D CoordConv layer proposed by Liu et al. 36 However, these layers must be optimized in a time-consuming learning process, and they did not provide sufficient model convergence. A simpler and more modular solution was therefore chosen. The method involves introducing a 2D non-trainable layer in the model, called True Dose Modulation (TDM). This layer is computed outside the main learning phase from results of the primary trained CNN and specific reference data (see Section 2). The TDM is then added to the model as the last layer (see Figure 3). Thus, it contributes to the calculation of the Mean Squared Error (MSE) cost function to adjust CNN parameters through a fine-tuning process.
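A minimal sketch of the TDM principle is given below, assuming (as stated in the Training section) that the TDM is the pixel-wise ratio of a reference pPD to the mPD inferred by the primary U-net for the same beam, and that it is applied to the U-net output through an element-wise (Hadamard) product. The function names and the small `eps` guard against division by zero are illustrative assumptions, not part of the described method.

```python
def compute_tdm(reference_ppd, inferred_mpd, eps=1e-6):
    """Pixel-wise ratio of the reference dose to the dose inferred by the
    primary U-net. `eps` is an assumed guard against division by zero."""
    return [[ref / max(inf, eps) for ref, inf in zip(ref_row, inf_row)]
            for ref_row, inf_row in zip(reference_ppd, inferred_mpd)]


def apply_tdm(unet_output, tdm):
    """Hadamard (element-wise) product of the U-net output with the fixed TDM map."""
    return [[u * t for u, t in zip(u_row, t_row)]
            for u_row, t_row in zip(unet_output, tdm)]
```

Because the map is fixed once computed, it adds no trainable parameters; only the U-net weights are adjusted during the subsequent fine-tuning.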

Deep learning model
The proposed model is a U-net followed by the TDM layer presented in the previous section. The overall model architecture is shown in Figure 3. In this work, hyperparameters such as the number of kernels and convolutional layers as well as the model depth and loss function were manually tuned with a grid search procedure. The proposed U-net has a depth of four max-pooling layers with a stride of 2 × 2, where each block consists of two convolutional layers (Conv) with an identical number of kernels followed by a Rectified Linear Unit (ReLU) activation function. After passing through the bottleneck, the signal is reconstructed by four upsampling layers with a stride of 2 × 2. The last decoder block consists of three convolutional layers where the last one has a single kernel of shape 1 × 1 and no activation function. The purpose of this subsidiary layer is to perform a point-wise convolution to recover the number of output channels, which is equal to one for the dose values. Skip-connections were also added to facilitate signal reconstruction in the decoding-path. 41 These connections consist of concatenating the feature maps of equal spatial dimensions coming out of encoder blocks to those going into decoder blocks. The kernel number of convolutional layers begins at eight and is doubled in each of the new blocks constituting the encoding-path. In the same manner, this number is successively divided by two in the decoding-path. Finally, the TDM layer is connected to the U-net output through a Hadamard product to characterize the contribution of each pixel. The TDM array has a resolution of 1024 × 1024 pixels and a single channel to keep the dimensions of the output data. In this study, no dropout, batch normalization layer, or cost function regularization term was used.
FIGURE 3 Model architecture containing the U-net followed by the True Dose Modulation layer.
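The layout described in this section can be summarized by a short bookkeeping sketch that lists, for each block, the feature-map side length and the number of kernels. It is only an illustration of the stated rules (four 2 × 2 poolings, eight kernels in the first block, doubling along the encoder and halving along the decoder); the function name `unet_layout` is hypothetical, and the assumption that the doubling continues into the bottleneck is ours.

```python
def unet_layout(input_size=1024, base_kernels=8, depth=4):
    """Return (encoder, bottleneck, decoder) as (side length, kernels) pairs."""
    encoder = []
    size, kernels = input_size, base_kernels
    for _ in range(depth):
        encoder.append((size, kernels))
        size //= 2    # each 2 x 2 max-pooling halves both spatial dimensions
        kernels *= 2  # the kernel count doubles in each new encoder block

    # Assumption: the doubling rule also applies to the bottleneck block.
    bottleneck = (size, kernels)

    # Each decoder block mirrors the encoder block whose feature maps it
    # receives through the skip-connection (same side length, same kernels).
    decoder = list(reversed(encoder))
    return encoder, bottleneck, decoder
```

For a 1024 × 1024 input, this gives encoder blocks at 1024, 512, 256 and 128 pixels with 8, 16, 32 and 64 kernels, and a bottleneck at 64 × 64 pixels; the final 1 × 1 convolution then maps the last eight-channel block to the single dose channel.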

Training
The model was developed via a cross-validation procedure. Five U-nets with identical hyperparameters were randomly initialized and successively optimized on different combinations of training, validation, and test sets (see Figure 4). A proportion of 80% training set (119 samples) and 20% validation set (30 samples) was chosen. Each cross-validation learning process was conducted in two stages. First, primary U-nets were trained without the TDM layer. Then, each TDM was computed as the ratio of the pPD of the 260 mm square beam to the corresponding inferred mPD and added to the U-net output as described in Figure 3. Finally, a fine-tuning of all model parameters was performed on the same dataset. Once the five models were trained, the one that provided the lowest average γ-index (2%-2 mm > 10% Dmax) on its test dataset (37 samples) was selected for the final evaluation with the 29 clinical and six square control beams. The Adam algorithm was used as the optimizer to minimize the MSE. Its parameters were set to 0.9, 0.999 and 10⁻⁶ for β1, β2 and ε, respectively. The Adam optimizer is based on the stochastic gradient descent algorithm; it combines adaptive learning rates with momentum, using estimates of the first and second moments of the gradients. 42 For the first and second training phases, the maximum learning rate was initially set to 10⁻³ and 10⁻⁴, respectively. If the validation loss did not decrease during four epochs, the learning rate was reduced by a factor of 0.8. Furthermore, to avoid overfitting and interrupt the training at the most appropriate time, an early stopping callback based on the validation loss with a patience of five epochs was used. Each trained model was then saved with its best state, that is, the one providing the best performance in terms of loss on the validation data. In addition, a custom callback was implemented to compute the average γ-index and the average γ-passing rate every five epochs. Due to technical limitations, a batch size of four samples was chosen for training and validation. Weights and biases were initialized with Glorot uniform 43 and zeros, respectively.
FIGURE 4 Data splitting steps.
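The interplay between the learning-rate reduction, the early stopping and the retained best state can be illustrated with a small framework-independent simulation. It is a sketch under stated assumptions rather than the callbacks actually used: the learning rate is multiplied by 0.8 after four consecutive epochs without improvement of the validation loss, training stops after five epochs without improvement, and the function name `simulate_training` and its counters are hypothetical.

```python
def simulate_training(val_losses, lr0=1e-3, factor=0.8, lr_patience=4, stop_patience=5):
    """Replay a sequence of validation losses and return
    (best_epoch, final_learning_rate, learning_rate_per_epoch)."""
    lr = lr0
    best_loss = float("inf")
    best_epoch = None
    since_best = 0  # epochs since the best validation loss (early stopping)
    since_drop = 0  # epochs without improvement since the last LR reduction
    history = []

    for epoch, loss in enumerate(val_losses, start=1):
        history.append(lr)  # learning rate in effect during this epoch
        if loss < best_loss:
            best_loss, best_epoch = loss, epoch  # this state would be saved
            since_best = since_drop = 0
        else:
            since_best += 1
            since_drop += 1
            if since_drop >= lr_patience:
                lr *= factor
                since_drop = 0
        if since_best >= stop_patience:
            break  # early stopping
    return best_epoch, lr, history
```

In the first training phase `lr0` would be 10⁻³ and in the fine-tuning phase 10⁻⁴; the epoch returned as `best_epoch` corresponds to the model state that is kept.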
All processes including preprocessing, model architecture, training, and quantitative analysis were implemented in Python (version 3.8.10) with TensorFlow (version 2.0.1) and Keras (version 2.1.0) as backend. Each training was performed using a single job of the Vertex-AI API from Google Cloud with an NVIDIA Tesla P4 GPU and 16 GB of CPU memory. The inferences were performed without a GPU, on an Intel Xeon CPU with 12 cores clocked at 3.5 GHz.

Model evaluation

Amount of training data
A study was conducted to assess the performance of the proposed model based on the amount of training data provided. For this purpose, the training method described in the previous section was repeated ten times while keeping the model architecture (see Sections 3 and 4) but increasing the amount of training data from 18 to 186 samples (18, 37, 56, 74, 93, 112, 130, 149, 168 and 186 samples). Note that hyperparameters were preliminarily optimized for the maximum amount of data available, that is, 186 training samples. At each fold of the cross-validation of each dataset, the samples were randomly selected and the proportion of 80%/20% between training and validation data was kept. Then, a U-net was trained to convert mPIs into mPDs with the two-step training method. The γ-index statistics (2%-2 mm > 10% Dmax) were computed between the mPDs and pPDs with the 29 clinical control beams (see Section 2.1). The statistics obtained for each dataset aggregate the results from the five U-nets trained on that same set.

Training
For the set of 186 training samples, the cross-validation box plots of the average γ-index (2%-2 mm > 10% Dmax) and γ-passing rate were plotted to determine which model performed best. Each box plot aggregates results of a single cross-validation model with its test set.
The convergence of the retained model over epochs was assessed by analyzing its learning phase. The records contained the training and validation losses, the learning rate, and the averages of the γ-index and γ-passing rate computed between mPDs and pPDs of all clinical control beams.

TDM and square beams
To qualitatively assess the contribution of the TDM layer and the performance of the selected model on elementary cases, an analysis of dose profiles was performed on three square beams of 15 mm, 100 mm, and 200 mm side. For each beam, the left-right profiles centered on the beam axis of the absolute mPDs computed by the models with (U-net TDM) and without TDM layer (U-net) were compared to those of the corresponding reference pPDs. For didactic purposes, the left-right profiles of the grayscale mPIs and that of the TDM layer were also plotted.

Model performance
The overall performance of the model was assessed through a quantitative analysis based on the six square beams and the 29 IMRT S&S control beams (see Section 2.1). The statistics of γ-index (2%-2 mm > 10% Dmax), γ-passing rate, and local absolute and global relative errors were computed between the pPDs and the mPDs of the U-net, U-net TDM and the analytical conversion algorithm (Analytic). To facilitate this analysis, a box plot of each model and metric was plotted for the clinical control beams.
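The exact definitions of the error metrics are not detailed in this section. As a hedged illustration only, the sketch below assumes that the absolute error is the mean pixel-wise difference |mPD − pPD| in Gy and that the global relative error is this mean difference expressed as a percentage of the maximum reference dose; any low-dose threshold is ignored. The function name `dose_errors` is hypothetical.

```python
def dose_errors(ppd, mpd):
    """Return (mean absolute error in Gy, mean global relative error in %)
    between a reference pPD and an inferred mPD given as 2D lists."""
    d_max = max(max(row) for row in ppd)  # maximum reference dose
    abs_errors = [abs(m - p)
                  for p_row, m_row in zip(ppd, mpd)
                  for p, m in zip(p_row, m_row)]
    mean_abs = sum(abs_errors) / len(abs_errors)
    # Assumed definition: "global" means normalization by the reference maximum.
    mean_rel_global = 100.0 * mean_abs / d_max
    return mean_abs, mean_rel_global
```

These two values, together with the γ-index statistics, would be computed per beam and then averaged over each control dataset.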
Using the IMRT S&S control beams set, a visual analysis of the results obtained with six different tumor locations was performed. For this purpose, the pPDs and mPDs inferred by the U-net TDM as well as their respective left-right profiles centered on the beam axis were plotted. The γ-index map and the averages of γ-index and γ-passing rate were also computed.

Amount of training data
The Figure

Training
The results of the five cross-validation U-nets trained from the set of 186 samples are shown in Figure 6. As a reminder, the training runs are independent of each other. The training and test data sets per fold were randomly constructed while keeping the same number of samples to cover the entire database. For all cases, the average γ-index was less than 0.29 (±0.08). The minimum average γ-index was 0.24 (±0.04) and the maximum average γ-passing rate was 99.28 (±0.84)%. These results were obtained by the model associated with the 5th fold; it was thus retained for the continuation of the study. Note that the interquartile ranges of the γ-passing rate were between 0.47% (fold 5) and 4.35% (fold 1). These dispersions highlight the importance of using the cross-validation procedure to determine which model performs best. The learning rate, training and validation losses, γ-index, and γ-passing rate over epochs of the selected model are shown in Figure 7. Based on the validation loss, the first learning phase was interrupted at the 40th epoch. At this point, the model provided results of 0.30 (±0.17) and 98.99 (±0.82)% for the averages of the γ-index and the γ-passing rate, respectively. Due to the addition of the TDM layer to the U-net output, a slight increase in losses was observed. This performance degradation was subsequently reduced by the fine-tuning process.
Since the lowest validation loss was obtained by the model state of the 59th epoch, this state was selected for the remainder of this study. This choice was also reinforced by the degradation of all metrics starting at the 60th epoch. This model state provided averages of 0.28 (±0.19) and 99.09 (±0.75)% for the γ-index and the γ-passing rate, respectively. Given the technical specifications described in Section 1.D, the entire training lasted approximately 35 min and the fine-tuning was completed in less than 15 min.

TDM and square beams
The Figure

Model performance
Table 1 gathers the statistics obtained by the U-net, U-net TDM and analytical conversion model (Analytic) on the whole control database. Globally, for the 29 IMRT S&S beams, a very good agreement between pPDs and mPDs of all models is observed. Indeed, with the restrictive criterion used (2%-2 mm > 10% Dmax), a maximum average γ-index less than 0.34 (±0.06) and a minimum average γ-passing rate greater than 98.02 (±1.23)% were obtained. Maximum averages of 0.43 × 10⁻² (±0.13 × 10⁻²) Gy and 0.58 (±0.13)% were obtained for the absolute and relative dose errors, respectively. These results were obtained by the analytical model and those of the two U-nets were systematically better. For instance, differences of 0.10 (p < 0.001) and 1.27% (p < 0.001) are observed in favor of the U-net TDM for the average γ-index and average γ-passing rate, respectively. For the average absolute and relative dose errors, the differences are approximately 0.16 × 10⁻² Gy (p < 0.001) and 0.2% (p < 0.001), respectively. This trend is also visible on the box plots of the models obtained with the clinical beams in Figure 9. For all metrics, the averages and medians are in favor of the U-net TDM and the interquartile ranges are lower except for the relative error. Regarding the square beams, the performance of the U-net TDM remains superior to that of the analytical model. Overall, a decrease in the performance of the three models is observed with the square beams compared to the clinical beams.
TABLE 1 Statistical results between pPDs and mPDs of U-net, U-net TDM and analytical model for each control dataset type.
Regarding the contribution of the TDM layer, we note that adding it to the U-net structure provides better results. Indeed, for all metrics and data types, results of the U-net TDM were better compared to the U-net alone. With clinical control beams, differences of 0.04 (p < 0.001) and 0.6% (p < 0.001) were obtained in favor of the U-net TDM for the average γ-index and average γ-passing rate, respectively. This is corroborated on all box plots in Figure 9 where medians and interquartile ranges are in favor of the U-net TDM. The benefit of the TDM layer was also reinforced with the six square beams, with an increase of approximately 6% (p-value non-significant) in the average γ-passing rate and a decrease of 0.05 in the γ-index (p-value non-significant) for the U-net TDM. These discrepancies between both U-nets are even more significant on the minimum averages of the different metrics. For instance, a difference of approximately 19% was obtained between the minimum averages of the γ-passing rate of both U-nets. Figure 10 gives an overview of results obtained by the U-net TDM with six beams from the clinical control dataset. On all profiles, very good agreement between mPDs and pPDs was obtained. This is corroborated by the analysis of the γ-index maps, where most pixels obtained a value below 0.75. For all cases, the average γ-index was less than 0.28 (±0.20) and the γ-passing rate was greater than 99.25%.

DISCUSSION
In previous studies, MLPs were trained to convert mPI patches into mPD pixels from a set of predefined input features. [23][24][25] In this study, a U-net was developed to convert, in a single forward pass, entire grayscale mPIs into absolute mPDs of IMRT S&S beams of various tumor sites. The benefit of the TDM layer was directly visible on the analysis of the dose profiles illustrated in Figure 8. Indeed, for all cases, the horn effect was well recovered by the U-net TDM compared to the U-net alone. The loss record illustrated in Figure 7 demonstrated the need to fine-tune the model after the TDM addition to increase its accuracy. The benefit of the TDM layer on the overall performance was also reinforced by the quantitative analysis. For the 29 IMRT S&S control beams, a slight increase in the performance of the U-net TDM was observed. Regarding the square beams, this trend was even more pronounced, with an increase of approximately 6% in the average γ-passing rate. Overall, we note that the use of the TDM layer in combination with the fine-tuning improves the results on all tested data types. An interesting point to note is that the TDM profile in Figure 8 has similar shape and amplitude to those expected for flattened beams. Thus, this layer appears to be equivalent to the Off-Axis-Ratio correction usually encountered in analytical models. 44 As expected, the TDM recovers the spatialized information that could not be encoded by the pretrained U-net.
In clinical routine, tolerance limits of the γ-passing rate are typically set at 95% with the criteria of 3%-2 mm > 10% Dmax. 1 In this study, the restrictive criteria of 2%-2 mm > 10% Dmax were used, and the γ-passing rates obtained by the U-net TDM with the 29 clinical control beams were systematically greater than 97.4%. Specifically, with respect to the penumbra width, very good agreement between the mPDs of U-nets and pPDs in high and low dose regions was observed. It should be noted that slightly worse results were obtained with the square beams compared to the IMRT S&S beams set. This can be explained by the fact that U-nets were exclusively trained on clinical beams. In addition, the distance to agreement criterion of the γ-index tends to improve the results for more modulated dose distributions. 45
In this study, several assumptions were made. First, the pPDs computed by the prediction algorithm were used as reference model outputs. This solution was preferred over the TPS DICOM RT dose files because it facilitates the data collection while having a sufficient accuracy. Second, we assumed that training the U-nets with the computed pPDs rather than the analytical mPDs avoids any additional bias related to the calculations of the analytical conversion algorithm. This choice is reinforced by the fact that the latter is, like the U-net, parametrized from a set of mPIs and pPDs computed by the prediction algorithm. Furthermore, although the mPIs necessarily incorporate the fluctuations inherent in normal equipment operation, an effort was made to rigorously ensure the absence of delivery errors in the treatment plans used. In this way, we hope to eliminate any bias in the dataset and ensure that the accuracy achievable by the U-net enables the detection of clinically relevant errors. It is important to note that since the U-net was trained on TPS-based reference data, it is not intended for TPS modeling validation. Any systematic errors in TPS modeling will necessarily be incorporated into the model learning and, consequently, into the model itself. In this way, its role in the patient QA process is limited to detecting LINAC delivery errors relative to the expected beams. Finally, in contrast to previous studies, [23][24][25] it was assumed that the tumor site DICOM tag should not be used as a specific learning feature. Therefore, no distinction was made in the construction of the datasets.
The final model was trained with a set of 186 samples. This amount of training data may seem small compared to those typically found in DL applications. However, the results of the study described in Section 3. have shown that sufficient model accuracy can be achieved with approximately five times fewer samples than those used for the final model. As presented in Figure 5, U-nets trained with the dataset of 37 samples provided a maximum average γ-index of 0.46 on the clinical control beams, and the results for datasets with a larger number of samples were roughly equivalent. Although these results are satisfying, they were obtained with only five models per dataset. Therefore, it could be interesting to increase the number of trained models. This study could also be extended to larger datasets to determine if higher accuracy can be achieved.
Overall, better agreement was obtained with the mPDs of U-nets than those of the analytical conversion model. This is corroborated by the quantitative analysis and Figure 9, where the U-net TDM provides the best results for all metrics and all tested data types. This gain in accuracy of the U-net TDM over the analytical model shows that it could be more sensitive in detecting delivery errors. A long-term clinical study could be interesting to quantify this sensitivity. Computation time measurements on suitable hardware could also determine whether DL methods are an interesting alternative to existing methods for EPID-based non-transit dosimetry.
However, although the results show a gain in accuracy of U-net, the objective of this study is not to replace the existing methods but to extend previous studies using MLPs. In this work, we investigated the feasibility of using the popular U-net for the mPI-to-mPD conversion and provided a method to better recover the spatialized information. More prospectively, this work was a first step towards the development of a DL model for EPID-based dosimetry in transit conditions. Indeed, the use of EPID for in vivo dosimetry is of major interest for patient-specific QA. 46 As radiotherapy requires increasingly precise and rapid QA control systems, the achievable accuracy and possible computational speed of DL models 47 make them potential candidates for the development of an efficient real-time tool for in vivo dosimetry.
As a reminder, the present study was limited to IMRT S&S beams obtained from one linac model, one a-Si-EPID model and one energy. Since the EPID is energy dependent 48 and there is an inherent variability in response between imagers, it appears that the operating range of the proposed U-net is limited by the energy, fluence mode, and linac and EPID models used for its training. Regarding treatment techniques, it can be assumed that the model should provide acceptable performance since the steps to convert mPIs to mPDs remain the same. In this sense, it might be interesting to analyze the results with other treatment techniques and to extend the study to other beams (energy, fluence mode and equipment).
In this work, the feasibility of using DL models for EPID-based dosimetry in non-transit conditions was shown. The results show an accuracy of the proposed model at least equivalent to existing methods.

CONCLUSIONS
A deep learning-based model was developed to convert portal images into absolute dose distributions of IMRT S&S beams. The method consists of optimizing a U-net followed by the True Dose Modulation layer using a two-step learning process. With this architecture, the model can learn the global features and recover the intrinsic dose modulation of flattened beams. The accuracy obtained shows that this method has great potential for EPID-based non-transit dosimetry.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available on request from the corresponding author. The data are not publicly available due to privacy and ethical restrictions.