Denoising and uncertainty estimation in parameter mapping with approximate Bayesian deep image priors

The purpose of this work is to mitigate the problem of noisy parameter maps with high uncertainties by casting parameter mapping as a denoising task based on Deep Image Priors.


INTRODUCTION
In MRI, the image intensity depends on different tissue parameters of the imaged subject. The degree of dependence, and thus also the image contrast, can be altered to best suit the purpose of the imaging by selecting different scanner parameters such as pulse repetition time (TR) and echo time (TE).
Quantitative MRI (QMRI) is a subset of MRI that involves quantifying these tissue parameters.1 QMRI methods require a series of MRI images from which one isolates the tissue parameters using modeled relationships between the tissue parameters and the MRI signal. This process is often referred to as parameter mapping, since the resulting tissue parameters are often presented as a two-dimensional heat map displaying both the parameters' magnitude and spatial position.
There are several applications where tissue parameters provide helpful information. For instance, in oncology, apparent diffusion coefficient (ADC) estimation is used for prostate cancer staging,2 and T1 mapping in dynamic contrast-enhanced MRI for invasive breast cancer detection.3 In addition to T1 mapping, T2 mapping is also of interest in prostate cancer due to its ability to discriminate between cancer tissue, normal gland tissue, and prostate gland enlargement.4 Parameter map estimation is a postimaging procedure. Conventionally, the tissue parameters are obtained by least squares regression of the MRI magnitude signal at different user-defined scanner settings. Signal noise (e.g., from receiver coil readouts) is especially problematic in this process, as it propagates to produce noisy parameter maps with high uncertainties in the parameter values. Therefore, there is great interest in improving QMRI by incorporating spatial denoising in the parameter estimation process. To this day, several methods have been published to achieve this. A common approach is implementing a Bayesian framework that denoises the parameter reconstruction with a spatial prior distribution. Examples are relaxation time estimation with Markov random fields,5 coherent region smoothness,6,7 and total variation priors,8,9 among other regularized approaches.10-20 More modern data-driven deep learning approaches include mapping relaxation parameters with residual networks,21,22 frameworks with physical model constraints,23-26 supervised27 and unsupervised28 intravoxel incoherent motion estimation, and models that address uncertainty estimation in dynamic contrast-enhanced MRI29 and ADC mapping.30
To some degree, any method that incorporates denoising will rely on specific prior knowledge rather than the data. Developing a framework thus involves both the selection/formulation of the denoising prior and deciding to what extent it should be trusted. The level of trust is usually selected by tuning model hyperparameters or regularization parameters, for example, the size and distance of patches in nonlocal means denoising and the weight selection in total variation denoising.
In contrast to formulating an explicit image prior, Ulyanov et al.31 showed that denoising and other standard inverse problems can be addressed by Deep Image Priors (DIP), that is, by treating the structure of an image generator network as an implicit image prior. DIP has been shown to correspond to a limiting case of a stationary Gaussian process.32 DIP introduces an alternative approach in-between explicitly formulated priors and data-driven machine learning algorithms: low-level image features, for example, noise, are captured by untrained networks without prior training.
DIP for parameter mapping is interesting for several reasons, the most prominent of which are reports of successful results in other medical imaging fields, including PET reconstruction,33 DWI denoising,34 and dynamic image reconstruction.35 However, parameter mapping is, to our knowledge, unexplored. Further, the absence of prior training in DIP makes it directly available in parameter mapping without data collection for network training. Collecting network training data for parameter mapping is tedious since the training data must accurately cover all applications and use cases, including method-specific signal equations, scanner settings, and different anatomical regions, which are rarely available. Also, the absence of prior training prevents training-induced anatomical artifacts from entering the results, a common issue among network approaches. DIP is easy to customize for different applications since most of the framework is application-independent. Application specifics, that is, signal equations in parameter mapping, are easily handled so that a single generator network applies to several applications.
In its original implementation, DIP is sensitive to the values of hyperparameters, mainly the number of optimization steps, to achieve significant noise reduction. Also, the method does not present any information on model uncertainty. Gal et al.36 presented a solution to this by using dropout37 as a Bayesian approximation, later extended for convolutional neural networks as approximate Bernoulli variational inference with dropout network training.38 This approximate Bayesian implementation, commonly called Monte Carlo (MC) dropout, reduces the Bayesian implementation to simply performing dropout after every convolution layer when training a deterministic neural network. MC dropout is highly interesting in parameter mapping, mainly due to its inherent robustness to overfitting and the added possibility of investigating model uncertainty. The combination of DIP and MC dropout has been addressed in magnitude image denoising39 but has, to our knowledge, not been applied in parameter mapping. This work addresses noisy parameter maps with high uncertainties in QMRI with the combination of DIP denoising and approximate Bernoulli variational inference with MC dropout. More formally, the purpose of this work was to develop and evaluate such a method in terms of its denoising properties, accuracy of the uncertainty estimate, computational requirements, and ease of use.

Statistical model
Let β ∈ R^{P×V} denote the collection of P tissue parameter maps with V voxels each. Parameter mapping, that is, the estimation of β, is based on the collection of an MRI dataset y ∈ R^{M×V} with M different scanner settings in the acquisition. The elements of y, denoted y_{m,v}, are modeled as

y_{m,v} = s_m(β_{*,v}) + ε_{m,v}, ∀m ∈ {1, …, M} and ∀v ∈ {1, …, V}, (1)

where s_m(β_{*,v}) ∈ R is the MRI signal equation, β_{*,v} ∈ R^P is the collection of all (denoted by index *)¹ tissue parameters in the vth voxel, and ε_{m,v} ∈ R describes noise in the measured signal. The noise in Equation (1) follows the Rice distribution for magnitude images but can be approximated as Gaussian at high signal-to-noise ratio.40 By assuming that the model targets are independent and follow a Gaussian distribution, the data likelihood becomes

p(y | β, σ²) = ∏_{m=1}^{M} ∏_{v=1}^{V} N( y_{m,v} | s_m(β_{*,v}), σ² ), (2)

¹ Throughout this paper, we denote slices of high-dimensional objects with index *. For example, for the tissue parameters in β ∈ R^{P×V}, we denote all tissue parameters in the vth voxel β_{*,v} ∈ R^P, and the entire map of the pth tissue parameter as β_{p,*} ∈ R^V. Voxels are, for simplicity, indexed with a linear index v ∈ {1, 2, …, V} in all objects.
where σ² is the noise variance.
Maximizing the likelihood in Equation (2) coincides with solving a nonlinear least squares (NLLS) regression due to the assumption of Gaussian distributed noise. The signal equation is application-specific, for example, spoiled gradient echo (SPGR) for variable flip-angle (VFA) T1 mapping41 and multi-echo-time spin-echo (SE) for T2 mapping.42
In this work, we developed an improved parameter mapping method based on DIP image denoising.31 DIP performs image denoising by interpreting the output of a randomly initialized image generator as a parametrization of an image. The image generator is a neural network, defined as a function, f_θ, that maps a randomized input, z, to an image, f_θ(z), of the same spatial shape. This method achieves denoising with the implicit bias of the generator instead of with an explicit prior defined in the space of images. We extend this concept into the domain of QMRI by instead interpreting the network output as a parametrization of the tissue parameters in parameter mapping applications. Given a QMRI magnitude signal dataset, y, acquired with MRI signal, s, we define f_θ to be a parameter map generator and obtain denoised tissue parameter maps β ∈ R^{P×V} by solving the optimization

θ̂ = argmin_θ E( s(f_θ(z)), y ), (3a)
β = f_θ̂(z), (3b)

where E is the mean squared error (MSE) loss function that compares the network output, f_θ(z) ∈ R^{P×V}, mapped through the signal equation, with the signal dataset, y. This optimization problem is a two-step process in which the minimizing weights, θ̂, are found by minimizing the loss function, and the network outputs the final parameter map estimate with θ̂ applied. Our main contribution is reinterpreting the neural network output for parameter mapping tasks, which involves mapping the output through the signal equation to estimate the network weights. Maximizing the likelihood in Equation (2) is equivalent to minimizing its negative log-likelihood; this gives the following MSE loss for DIP-enhanced parameter mapping:

E( s(f_θ(z)), y ) = (1/(M·V)) Σ_{m=1}^{M} Σ_{v=1}^{V} ( s_m( f_θ(z)_{*,v} ) − y_{m,v} )². (4)

Note that to achieve denoising, the nonlinear optimization in Equation (3a) needs to be terminated after some number of optimization steps to avoid the collapse of β to the maximum likelihood estimate.
We complement this denoising process with model uncertainty estimation. The image generator, f_θ, is reinterpreted as a Bayesian neural network with Bernoulli approximate variational inference, implemented using dropout after every convolution layer of f_θ.38 The dropout layers randomly set elements of their inputs to zero with probability p, that is, they add Bernoulli distributed multiplicative noise to the convolution layer outputs. The posterior in this model is approximated, and the posterior prediction is approximated by MC integration of N_s concurrent forward passes with dropout. With this approximation applied, the parameter estimation in Equation (3a) is changed to

θ̂ = argmin_θ E( s(f_θ(z, Ω)), y ), (5a)
β = (1/N_s) Σ_{n=1}^{N_s} f_θ̂(z, Ω_n), (5b)

where f_θ is redefined with the additional input Ω, which denotes the collection of Bernoulli random variables required to implement dropout. The estimated parameter maps are now complemented with maps of estimated model uncertainty, σ_est ∈ R^{P×V}, with elements

σ_est,p,v = std_n( f_θ̂(z, Ω_n)_{p,v} ), ∀p ∈ {1, …, P} and ∀v ∈ {1, …, V}, (6)

that is, the standard deviation over the N_s concurrent forward passes with dropout.
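The MC-dropout estimate and its uncertainty reduce to the mean and standard deviation over the N_s stochastic forward passes. A minimal numpy sketch with a toy one-layer "generator" (all names, sizes, and weights are illustrative, not the actual network):

```python
import numpy as np

rng = np.random.default_rng(42)

def generator(z, W, drop_p=0.2):
    """Toy one-layer 'generator' with multiplicative Bernoulli (dropout) noise."""
    omega = (rng.random(z.shape) >= drop_p).astype(float)  # Bernoulli mask, Omega
    return W @ (z * omega / (1.0 - drop_p))                # inverted-dropout scaling

P, V, N_s = 2, 6, 50                 # parameters, voxels, forward passes
z = rng.standard_normal(V)           # fixed random generator input
W = rng.standard_normal((P, V))      # fixed ("trained") weights, theta

# N_s concurrent forward passes with dropout kept active at inference time
passes = np.stack([generator(z, W) for _ in range(N_s)])  # shape (N_s, P)

beta = passes.mean(axis=0)       # MC-dropout parameter estimate
sigma_est = passes.std(axis=0)   # model uncertainty (per output element)
```

In the actual method the same averaging is applied per voxel over the convolutional generator's outputs; the sketch only illustrates the mean/std bookkeeping.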

Implementation
The proposed framework was implemented in Python (Python Software Foundation, https://www.python.org/) with the deep learning framework PyTorch.2 Our code3 is based on the original DIP example code and Laves et al.'s implementation of MC dropout.39 The process (Algorithm 1) outputs denoised tissue parameter maps, β, with estimated model uncertainty, σ_est, from a parameter mapping dataset y. The process starts by optimizing the generator with the Adam stochastic gradient algorithm.43

Signal equation layers
We implement the signal equation as application-specific layers in the generator framework. Each application requires the network output to be mapped through the signal equation to compute the MSE loss in each optimization step; see Section 2.2. We have implemented parameter mapping for T1, T2, and ADC, based on the VFA SPGR, multi-echo spin-echo, and multiple b-value DWI acquisitions.1 In all implementations, the generator outputs two tissue parameter maps, f_θ(z, Ω) ∈ R^{2×V}, where the first channel contains the signal magnitude M, proportional to proton density, and the second channel contains the parameter of interest, that is, either T1, T2, or ADC. Strictly, the second channel equals the negative logarithm of the parameter of interest, for example, f_θ(z, Ω)_{2,*} = −log T1 (omitted in the table for simplicity), due to better computational stability. The relevant scanner settings and tissue parameters are summarized in Table 1, together with the associated signal equations.
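The three signal equations in Table 1 are standard closed-form expressions. A numpy sketch of how such application-specific layers might map a two-channel output (M and the negative log of the parameter) to predicted signals (function and variable names are illustrative, not taken from the published code):

```python
import numpy as np

def spgr_vfa(M, T1, alpha, TR=15.0):
    """SPGR signal for VFA T1 mapping; alpha in radians, TR and T1 in ms."""
    E1 = np.exp(-TR / T1)
    return M * np.sin(alpha) * (1.0 - E1) / (1.0 - np.cos(alpha) * E1)

def multi_echo_se(M, T2, TE):
    """Multi-echo spin-echo signal for T2 mapping; TE and T2 in ms."""
    return M * np.exp(-TE / T2)

def dwi(M, ADC, b):
    """Diffusion-weighted signal for ADC mapping; b in s/mm^2."""
    return M * np.exp(-b * ADC)

def signal_layer(output, settings, equation):
    """Map a generator output of shape (2, V) through a signal equation for
    each scanner setting, returning predicted signals of shape (M_settings, V)."""
    M_map = output[0]
    param = np.exp(-output[1])  # second channel stores -log(parameter)
    return np.stack([equation(M_map, param, s) for s in settings])

# Example: predicted multi-echo SE signals for two voxels
output = np.stack([np.array([0.8, 0.6]),               # channel 1: magnitude M
                   -np.log(np.array([100.0, 80.0]))])  # channel 2: -log(T2)
TE = np.array([50.0, 150.0, 300.0])
pred = signal_layer(output, TE, multi_echo_se)         # shape (3, 2)
```

For SPGR, where TR is fixed and the flip angle varies, the `equation` argument can be wrapped, for example `lambda M, T1, a: spgr_vfa(M, T1, a, TR)`.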

Generator architecture
The implemented generator (Figure 1) utilizes a skip-connection encoder-decoder type CNN with MC dropout, which combines the architecture provided by Ulyanov et al.31 for DIP denoising with the MC dropout extension provided by Laves et al.39 for Bayesian DIP denoising of magnitude images. The generator input, z, is passed through a five-level deep encoding-decoding process with added skip connections. Lastly, a single convolution layer creates an output f_θ(z, Ω) ∈ R^{P×V}, that is, P parameter maps with V voxels each. These are then mapped through the application-specific signal equation layer, s, to update the network with the MSE loss calculation. See Table 1 for the specifics of the different applications.

Data
We used three separate data sources to evaluate the proposed method: one synthetic dataset with a known ground-truth reference and two in vivo datasets from volunteering patients undergoing cancer treatment.
• The PD and T2 maps were used to compute multi-echo spin-echo data with TE = 50, 150, and 300 ms.
When creating the synthetic datasets, the PD reference tissue map was scaled to span the range [0, 1], that is, magnitude normalized, to decrease the need for application-specific hyperparameter tuning. Complex circular Gaussian noise was added to both datasets to simulate the impact of noise perturbations from Rice distributed noise. The noise SD was 0.01 (multi-echo data) and 0.02 (SPGR data) to simulate two different noise environments.

T A B L E 1
Specifics of the three investigated applications, with MRI acquisition method, variable scanner setting, generator output interpretation, and associated signal equation. The network outputs two tissue parameter maps, f_θ(z, Ω)_{1,*} and f_θ(z, Ω)_{2,*}, where the first contains the signal magnitude M, proportional to proton density, and the second contains the parameter of interest, that is, either T1, T2, or ADC.

F I G U R E 1
Illustration of the generator architecture. The generator input passes a five-level deep encoder-decoder process with added skip connections at each forward pass through the network. The encoding section consists of encoding blocks (green blocks with 128 output filters), which involve a factor-2 downsampling with strided convolutions. The decoding section consists of convolution blocks (red blocks with 128 output filters) and bilinear spatial upsampling (blue blocks). At each level, the decoding block is concatenated (black ball labelled ||) with a skip block (yellow blocks with four output filters). Lastly, a 1 × 1 convolution block (pink) transforms the output from 128 filters to 2, that is, to the two separate parameter maps, which are then mapped through the application-specific signal equation s (red) to enable the MSE loss calculation. See Section 2.2.1 for details.

2.3.2
In vivo data
This study used two in vivo datasets: one SPGR VFA study for T1 mapping and one DWI study for ADC mapping. Both datasets were collected with a PET/MRI scanner (SIGNA PET/MR 3T, GE Healthcare) after informed consent was obtained from the patients. This study was conducted in accordance with the principles embodied in the Declaration of Helsinki and was ethically approved by the national research ethics committee (nr 2019-02666).
• The SPGR dataset consists of eight patients scanned with a 3D SPGR sequence (seven male, one female, age span 39-75 years, mean age 52 years).Voxel size, FAs, TR, and TE were set identical to the synthetic SPGR data (see Section 2.3.1 for details) and the voxel bandwidth was set to 488 Hz∕voxel.For each patient, one single axial slice was selected for T1 estimation.
• The DWI dataset consists of eight patients from the PAMP research project 4 (PSMA, Acetate and Multiparametric MRI for Prostate cancer).Each patient was scanned with a diagnostic multiparametric pelvic protocol that included diffusion-weighted EPI acquisitions with b-value 0, 200, and 1000 s∕mm 2 with 16 axial slices of 160 × 80 voxels.The voxel bandwidth was set to 1953 Hz∕voxel.For each patient, the DWI data from a single axial slice was selected from the mid-prostate region for ADC estimation.
The in vivo datasets were magnitude normalized by dividing each voxel value by the maximum value, that is, y_{m,v} → y_{m,v} / max(y), so that the signal data spanned the range [0, 1]. In all datasets, voxels with insufficient signal magnitude, for example, outside of the patient volume, were excluded by applying a binary mask.
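The normalization and masking steps can be sketched as follows. The threshold is an illustrative choice; the study specifies the mask criterion only as "insufficient signal magnitude":

```python
import numpy as np

def normalize_and_mask(y, threshold=0.05):
    """Magnitude-normalize a signal dataset y of shape (M, V) to [0, 1] and
    build a binary mask excluding voxels with insufficient signal magnitude."""
    y_norm = y / y.max()                    # y_{m,v} -> y_{m,v} / max(y)
    mask = y_norm.max(axis=0) >= threshold  # keep voxels with signal at any setting
    return y_norm, mask

# Two scanner settings, three voxels; the first voxel is background
y = np.array([[0.0, 2.0, 0.1],
              [0.0, 4.0, 0.2]])
y_norm, mask = normalize_and_mask(y)
# y_norm spans [0, 1]; the background voxel is masked out
```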

Parameter mapping
We analyzed the proposed method with synthetic and in vivo T1 mapping, synthetic T2 mapping, and in vivo ADC mapping. For each task, we estimated the tissue parameter map, β, and its associated model uncertainty map, σ_est, according to Algorithm 1. For reference, we conducted the same parameter mapping tasks with a conventional NLLS estimator, calculated with the SciPy least-squares function5 using the Trust Region Reflective algorithm, which in this case returns the solution to

β̂_{*,v} = argmin_{β ∈ Θ} Σ_{m=1}^{M} ( s_m(β) − y_{m,v} )², (7)

where Θ ⊂ R² is the region limited by the bounds of the least-squares algorithm. We denote this method NONE, that is, with no implemented denoising. With these estimates, we computed the Cramér-Rao bound (CRB) variance, σ²_CRB, obtained from the inverse of the Fisher information matrix, I. In this case of vector signal parameters estimated in approximate Gaussian noise,48 the Fisher information elements in the vth voxel equal

I_{i,j,v} = (1/σ²) Σ_{m=1}^{M} ( ∂s_m(β_{*,v})/∂β_i )( ∂s_m(β_{*,v})/∂β_j ). (8)

We complement this with more detailed experiments with synthetic T1 and T2 mapping with available maps of ground-truth reference, β^(ref). We apply several denoising methods to the input signal prior to computing the NLLS with Equation (7). From the restoration module in scikit-image6, we implement wavelet denoising (WAVE), total variation (TV), and nonlocal means (NLM). We also apply block-matching and 3D filtering (BM3D)49,50 with the Python package BM3D7 and deep decoder (DD) denoising8, the latter of which is an under-parameterized nonconvolutional alternative to DIP.

5 https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.least_squares.html
6 https://scikit-image.org/docs/stable/api/skimage.restoration.html#skimage.restoration; hyperparameters: sigma=0.05 (WAVE), weight=0.02 (TV), patch_size=7, patch_distance=11, and h=0.02 (NLM)
7 https://pypi.org/project/bm3d/; implemented with hyperparameters noise_type=gw, noise_std=0.02/0.01 (T1/T2)
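As a concrete illustration of the NONE baseline, a voxelwise T2 fit with `scipy.optimize.least_squares` and the Trust Region Reflective method might look as follows (starting values and bounds are illustrative, not the study's settings):

```python
import numpy as np
from scipy.optimize import least_squares

def t2_signal(params, TE):
    """Multi-echo spin-echo model: M * exp(-TE / T2)."""
    M, T2 = params
    return M * np.exp(-TE / T2)

def nlls_voxel(y_v, TE):
    """NLLS estimate of (M, T2) for one voxel with the TRF algorithm."""
    res = least_squares(
        lambda p: t2_signal(p, TE) - y_v,    # residuals s_m(beta) - y_{m,v}
        x0=[0.5, 80.0],                      # illustrative starting values
        bounds=([0.0, 1.0], [2.0, 3000.0]),  # region Theta limited by bounds
        method="trf",
    )
    return res.x

TE = np.array([50.0, 150.0, 300.0])
y_v = t2_signal([0.8, 100.0], TE)  # noiseless synthetic voxel
M_hat, T2_hat = nlls_voxel(y_v, TE)
# recovers M close to 0.8 and T2 close to 100 on noiseless data
```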
Hyperparameters were selected to achieve significant noise reduction in white matter (WM) with minimal blurring of spatial features.
With all of these methods, including the proposed method (labeled OURS), we simulated the actual uncertainty by repeating the same parameter mapping task N_r = 100 times with identical synthetic data but a unique noise realization for each run. This enabled the calculation of bias and reference uncertainty maps with elements

bias_{p,v} = (1/N_r) Σ_{r=1}^{N_r} β̂^(r)_{p,v} − β^(ref)_{p,v}, (9)
σ_ref,p,v = std_r( β̂^(r)_{p,v} ), (10)

as well as a frequency mean of the model uncertainty with OURS,

σ̄_est,p,v = (1/N_r) Σ_{r=1}^{N_r} σ^(r)_est,p,v. (11)

Equation (9) gives a direct measure of the accuracy of the parameter estimate, β, while the difference between the outcomes of Equations (10) and (11) quantifies the precision of the model uncertainty estimate, σ_est. We present the results of Equations (9)-(11) as heatmaps and ROI analyses.
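The repeated-realization procedure can be sketched with a simple log-linear T2 estimator standing in for the full mapping pipeline (all values and the estimator itself are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def loglinear_t2(y, TE):
    """Log-linear T2 estimate: fit log(y) = log(M) - TE/T2 by least squares."""
    A = np.stack([np.ones_like(TE), -TE], axis=1)
    coef, *_ = np.linalg.lstsq(A, np.log(y), rcond=None)
    return 1.0 / coef[1]

TE = np.array([50.0, 150.0, 300.0])
M_ref, T2_ref = 0.8, 100.0
N_r = 100

# N_r runs with identical synthetic data but unique noise realizations
estimates = np.empty(N_r)
for r in range(N_r):
    y = M_ref * np.exp(-TE / T2_ref) + 0.002 * rng.standard_normal(TE.size)
    estimates[r] = loglinear_t2(y, TE)

bias = estimates.mean() - T2_ref   # analogue of the bias map elements
sigma_ref = estimates.std(ddof=1)  # analogue of the reference uncertainty
```

In the study the same bookkeeping is done per voxel and per parameter, producing the bias and reference-uncertainty maps.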

Uncertainty calibration
The model uncertainty with MC dropout is not inherently calibrated and can contain errors in magnitude or scale differently for different datasets.51 We investigated calibrating σ_est to decrease the difference between the model uncertainty and the frequency outcome in Equation (10). For this, let A ∈ R^{P×V} be data magnitude maps of the estimated tissue parameters, with elements

A_{p,v} = β_{p,v} / max_v( β_{p,*} ), (12)

such that A is normalized to the range [0, 1] for each tissue parameter p. Since the generator outputs the tissue parameter of interest on a logarithmic scale (see Section 2.2.1), we apply the correction in Equation (12) to the model uncertainty as

σ_cal,p,v = A_{p,v} σ_est,p,v, (13)

for all voxels v ∈ {1, …, V} of the model uncertainty.
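The calibration itself is a single elementwise multiplication. A minimal numpy sketch, assuming positive-valued parameter maps (shapes and values illustrative):

```python
import numpy as np

def calibrate_uncertainty(beta, sigma_est):
    """Scale MC-dropout uncertainty by the normalized parameter magnitude:
    sigma_cal[p, v] = A[p, v] * sigma_est[p, v], with each A[p, :] in [0, 1]."""
    A = beta / beta.max(axis=1, keepdims=True)  # per-parameter magnitude map
    return A * sigma_est

beta = np.array([[1.0, 2.0, 4.0],          # e.g., magnitude map
                 [300.0, 600.0, 1200.0]])  # e.g., T1 map (ms)
sigma_est = np.ones_like(beta)
sigma_cal = calibrate_uncertainty(beta, sigma_est)
# each row of sigma_cal is scaled by [0.25, 0.5, 1.0]
```

Since this runs after training, it adds essentially no computational cost.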

MC dropout in DIP
For completeness, we also include DIP-enhanced parameter mapping without the Bayesian approximation by disabling MC dropout. For this, we conduct T1 mapping with several different numbers of training iterations, N_b, both with and without MC dropout applied. We disable MC dropout by setting the dropout probability to zero so that concurrent forward passes return identical outputs. This comparison highlights the difference between Equations (3) and (5) in terms of the trade-off between convergence rate, computational cost, and the ability to avoid overfitting the noise in the final result.

RESULTS
The resulting tissue parameter maps and associated calibrated model uncertainty in all tested applications are presented in Figure 2. Comparing rows A and B, we conclude that the proposed method successfully incorporates denoising for all tested applications. Further, the model uncertainty produces uncertainty estimates similar to the CRB but smoother and with lower magnitude (Figure 2C,D). A detailed zoom of the T1 results with synthetic data is presented in Figure 3, which highlights the performance of the proposed method compared to other denoising methods. Results associated with the rest of the patients included in this study are presented in Figures S1 and S2. The bias and uncertainty analyses are presented as heatmaps in Figures 4 and 5, complemented with an ROI analysis for different tissues in Table 2. All denoising methods introduce some bias compared to NLLS without denoising (NONE). In T1 mapping, OURS introduces the least RMS bias in all ROIs. Equivalently, for T2 mapping, OURS is the second lowest, with a maximum deviation of 1.1% from the best performer. Note the increased bias with OURS in the border region of the imaged volume in T2 mapping (Figure 4G2). The simulated reference uncertainty (Equation 10) is presented in Figure 5.
OURS seems visually most similar to TV, with significant uncertainty reduction in all tissues. Regarding ROI mean values, OURS deviates less than 1% from the best performer in all ROIs and applications; see Table 2. The model uncertainty from OURS, both with and without calibration, is appended to the same plot for comparison. These metrics are also presented for different ROIs in Table 2.

F I G U R E 2
Parameter mapping results with the proposed method (row A) with associated calibrated model uncertainty (row C), compared to a conventional nonlinear least squares estimator without applied denoising (NONE, row B) with associated Cramér-Rao bound standard deviation (row D). Each column shows a separate implementation: brain T1 mapping with synthetic (col 1) and in vivo (col 2) variable flip-angle data, brain T2 mapping with synthetic multi-echo spin-echo data (col 3), and prostate apparent diffusion coefficient (ADC) mapping with in vivo DWI data (col 4). The ADC maps in column 4 have been clipped to 70 × 70 voxels to highlight the region encompassing the prostate gland.
The effect of disabling the Bayesian approximation is presented in Figure 6. The proposed MC dropout implementation is compared to the same model without MC dropout (labeled non-Bayesian). OURS is significantly less prone to reconstruct noise in the input data (i.e., overfitting), which facilitates the selection of appropriate hyperparameters. However, this comes at the price of a decreased convergence rate, which increases the computational cost.

F I G U R E 3
A detailed zoomed-in view of T1 mapping results with synthetic data. The proposed method (OURS, same as A1 in Figure 2) is compared with the ground-truth reference (GT), and nonlinear least squares estimates with different methods of signal denoising: no denoising (NONE, same as B1 in Figure 2), wavelet (WAVE), total variation (TV), nonlocal means (NLM), block-matching and three-dimensional filtering (BM3D), and deep decoder (DD). Each zoom-in region displays a rectangular patch of 70 × 70 voxels.
All computations were conducted with Nvidia's parallel computing platform CUDA on a single RTX 2080 Ti GPU. The time consumption for single-slice estimation with this setup is presented in Table 3, together with the application- and data-specific parameters on the number of voxels and the selected number of optimization steps.

DISCUSSION
We have reformulated parameter mapping as a DIP denoising task with associated model uncertainty estimation.
Our main findings are that DIP adapts successfully to several applications of parameter mapping with only limited hyperparameter tuning, and that MC dropout mitigates the burden of overfitting noise. However, MC dropout is also the most time-consuming aspect of the proposed method. Our code is generalized to handle several applications with a single generator by only altering the signal equation block. This modular approach is easy to expand to more applications. Also, the associated uncertainty estimate provides valuable information when adequately calibrated.
The main disadvantage of the proposed method is the high computational cost and the need for hyperparameter selection.
The proposed method is computationally heavy, with long estimation times in all applications (OURS in Table 3). T1 mapping was almost six times slower than ADC mapping, which shows that time consumption scales with data size (35k voxels vs. 11k voxels). This is a clear drawback in clinical settings, where large matrix sizes with multiple slices are often required. Non-Bayesian DIP is much faster and requires only 10k iterations (8 min) to output results similar to OURS at 50k steps (Figure 6); that is, most of the computational cost originated from MC dropout. We did not include model backtracking to prevent unsatisfactory gradient updates. Backtracking would, however, make non-Bayesian DIP even faster and further increase the difference between the two methods.
Network approaches are conventionally time-consuming during training but fast in production. DIP is unique since DIP generators only use images from a single imaging occasion instead of large training sets with representative examples. This feature makes DIP easy to implement but slow in production, since the training (or optimization) needs to be repeated for each imaging occasion. Also, this makes DIP methods less likely to do something unpredictable, such as hallucinating when encountering something unusual, for example an artifact. This could be beneficial; however, adequately trained models could possibly also learn that artifacts should be removed, which one cannot anticipate for DIP-based methods.
Although beyond the scope of this work, we emphasize that future studies should prioritize computational cost since it is the most prominent limitation in clinical implementations.

F I G U R E 4
Estimated bias in parameter map estimates with an axial brain slice, calculated with Equation (9) for the case of T1 mapping (col 1) and T2 mapping (col 2). The proposed method (OURS) is compared with nonlinear least squares estimates with different methods of signal denoising: no denoising (NONE), wavelet (WAVE), total variation (TV), nonlocal means (NLM), block-matching and three-dimensional filtering (BM3D), and deep decoder (DD). A region of interest (ROI) analysis of different tissues is presented in the bias columns of Table 2. All plots are normalized to the ground-truth reference parameter map and displayed as percentages.
Insufficient convergence in the optimization step (Equation 3 or 5) is concerning in applications where the image data has a meaningful interpretation, such as in QMRI parameter mapping. Stopping the optimization too early results in absent low-level image features (not necessarily noise). This can be seen in Figure 6 by comparing the initial output with a later stage.

F I G U R E 5
(caption fragment) … Table 2. All plots are normalized to the GT parameter map and displayed as percentages.

T A B L E 2
Region of interest (ROI) analysis of the bias and uncertainty results in Figures 4 and 5, separated into three ROIs encompassing either white matter (WM), grey matter (GM), or cerebrospinal fluid (CSF).

Bias (RMS %)
Notes: The value in the bias column is presented as the root mean square (RMS) error of the bias metric in Equation (9), and the values in the uncertainty column are presented as the mean value of Equation (10). For OURS, we also append the mean model uncertainty, both uncalibrated (σ_est,raw, Equation 11) and calibrated (σ_est,cal, Equation 13), for region-specific analysis of the proposed calibration procedure.
For example, see the isolated region of low T1 values in the ventricle region (Figure 3B), which is visible with the proposed method. This spatial feature is invisible when producing the same T1 map at N_b = 10k steps (Figure 6).
To address this, we set the proposed hyperparameter levels to ensure proper convergence of all spatial image features, that is, optimizing until some noise appears in the final output.
For all tested applications and datasets, the results were obtained with identical network hyperparameters except for the number of optimization steps, N_b, which was selected specifically for each application, mainly due to differences in the total number of voxels in the magnitude signal data, which affects the convergence rate in the optimization process. The results suggest that satisfactory denoising can be achieved in several applications with limited hyperparameter tuning.
The proposed method performs well when compared to other denoising methods, both standard methods (TV, WAVE, NLM), a more modern alternative (BM3D), and DD, an under-parametrized nonconvolutional deep learning alternative to DIP.OURS seems visually most similar to TV, with significantly reduced noise and uncertainty in parameter mapping.
For model uncertainty calibration, we utilized a metric for ground-truth uncertainty: the actual spread in estimated parameter maps for different noise realizations (Equation 10). Comparing this to our uncertainty (Figure 5G,H) shows the extent of this issue, mainly errors in magnitude, since all spatial features are still captured regardless. The suggested calibration procedure mitigates this issue by incorporating a magnitude correction from the estimated tissue parameter map (Figure 5G,I, and the uncertainty column in Table 2). The proposed calibration approach mitigates issues with incorrect uncertainty in regions of high data magnitude, for example, in CSF for T1 and T2. In T2 mapping, the calibration seems to fail at the border region and also, to some degree, in WM (Figure 5I2), possibly due to high estimation bias in those regions (Figure 4G2). Bias in those regions is also visible with NONE, that is, un-denoised NLLS, possibly relating this finding to insufficient TE selection. The computational cost is essentially unaffected since the calibration is merely a hyperparameter-free multiplication after training. We conclude that more extensive testing is required

F I G U R E 6
Deep image prior (DIP)-enhanced parameter mapping with synthetic T1 variable flip-angle data (labeled y).T1 maps with nonlinear least squares without denoising (labeled NONE), the proposed method (labeled OURS), and non-Bayesian quantitative MRI DIP are presented for comparison.In both cases, our method successfully adapts DIP to parameter mapping.The two DIP implementations (Equations 3a and 5a) are presented at an array of different numbers of training iterations (N b ), which confirms that our method is less prone to overfitting noise due to the implementation of Monte Carlo dropout.
for use cases where uncertainty estimation is highly prioritized.
In this work, we aimed to investigate the possibility of addressing noise and uncertainties in parameter mapping by casting it as a DIP denoising task with MC dropout-produced uncertainty estimates. Based on our findings, we have identified the following possible extensions to improve the proposed method: automation of hyperparameter selection and reduction of the computational cost. Automation of hyperparameter tuning is important to suppress the influence of subjective biases in the quantitative task of tissue parameter mapping. In particular, the number of optimization steps affects the denoising level in the estimated parameter map. Although not automated, the proposed method facilitates the selection of optimization steps due to the MC dropout implementation. MC dropout lowers the convergence rate, making it easier to find a suitable number of steps since a larger range of optimization steps produces an acceptable outcome. A further step would be to remove the manual selection altogether, possibly by tracking convergence during the optimization step with some information criterion.
Addressing the computational cost and hyperparameter selection is an intertwined problem. For example, the generator architecture (Figure 1) depends on several parameters, such as the U-Net depth and the number of output filters in each convolution block. The selection of these parameters drastically affects the network size and computational load. We suggest investigating a data-driven generator selection and implementing a more effective pre-initialization of the network weights to reduce computational costs.
The experiments utilized a fixed standard architecture independent of the spatial size of the MRI data. This means that the spatial size of the most downsampled patch (i.e., in the bottleneck of the U-Net) differs for different image sizes. Since this generator still produces satisfactory results for different image sizes, the number of parameters of the generator can likely be lowered to decrease computational cost.
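For intuition, the bottleneck size of a fixed architecture scales directly with the input size. A trivial sketch, assuming stride-2 downsampling stages and a hypothetical `depth` parameter (the actual U-Net depth of this study is not restated here):

```python
def bottleneck_size(height, width, depth=4):
    """Spatial size of the most downsampled patch after `depth`
    stride-2 downsampling stages (assuming even divisibility)."""
    return height // 2 ** depth, width // 2 ** depth

# The same network yields different bottleneck patch sizes per image size:
small = bottleneck_size(128, 128)   # (8, 8)
large = bottleneck_size(256, 192)   # (16, 12)
```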

T A B L E 3
Computational time for the different parameter mapping estimators in this study. Notes: V is the number of active voxels in the selected slice (rounded to the nearest thousand), that is, the sum of all voxels minus background voxels with nonrelevant MRI signal values. N_b is the number of optimization steps selected for DIP-network training. The non-Bayesian DIP approach severely overfitted at 50 000 (50k) iterations. Therefore, we present the estimation time when stopping the process at 10k iterations, where it outputs a more reasonable level of noise reduction; see Figure 6.

One promising approach would be to investigate the deep decoder (DD) architecture thoroughly. In contrast to the proposed over-parametrized DIP generator, DD is under-parametrized and, therefore, inherently efficient at suppressing overfitting when implemented with correct hyperparameters. Although we address overfitting with MC dropout, DD remains interesting, especially for reducing computational cost: the experiments show that DD signal denoising was significantly faster than both the proposed method and the non-Bayesian DIP (see Table 3).
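The efficiency of DD stems from its use of 1x1 convolutions, so its parameter count is independent of image size. A rough sketch of this count, assuming the standard deep decoder layout (per-layer 1x1 convolution plus channel-wise affine normalization; the channel width `k` and layer count below are illustrative, not the values used in this study):

```python
def deep_decoder_params(k=64, layers=5, out_channels=1):
    """Approximate parameter count of a deep decoder: each layer is a
    1x1 convolution (k*k weights) followed by a channel-wise affine
    normalization (2*k parameters), plus a final 1x1 convolution to
    the output channels. The count does not depend on image size."""
    return layers * (k * k + 2 * k) + k * out_channels
```

With k = 64 and five layers this is on the order of 2 x 10^4 parameters, far below a typical over-parametrized U-Net, which is consistent with the faster DD runtimes reported in Table 3.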
Regarding generator weight initialization, the experiments show that a significant portion of the optimization time is spent while the generator tries to find the correct output range (Figure 6, N_b up to 10k iterations). This fraction can likely be decreased by task-specific pre-initialization of the network weights.
Furthermore, the estimation model used in this study assumed Gaussian noise, whereas all experiments were conducted with either in vivo data or synthetic data with simulated Rician noise. The Gaussian approximation has very low bias at SNR > 2, 40 which was the case in the experiments. However, for very low signal-to-noise-ratio applications, one should implement Rician modeling to avoid biased results.
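The SNR dependence of this magnitude bias can be illustrated numerically. A self-contained simulation (not this study's experiment): the magnitude of a complex Gaussian measurement is Rician distributed, and its mean overshoots the true amplitude mainly at low SNR.

```python
import numpy as np

rng = np.random.default_rng(0)

def rician_bias(signal, sigma=1.0, n=200_000):
    """Mean bias of the magnitude of a complex Gaussian measurement
    (i.e., a Rician sample) relative to the noise-free amplitude."""
    real = signal + rng.normal(0.0, sigma, n)
    imag = rng.normal(0.0, sigma, n)
    return float(np.hypot(real, imag).mean() - signal)

low_snr_bias = rician_bias(0.5)    # SNR = 0.5: strongly biased
high_snr_bias = rician_bias(10.0)  # SNR = 10: nearly unbiased
```

At high SNR the bias shrinks toward sigma^2 / (2 * signal), which is why the Gaussian approximation is adequate above SNR = 2 but not below it.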

CONCLUSIONS
We have implemented a DIP-based framework that incorporates denoising and uncertainty estimation into the parameter mapping process. The implementation handles several applications with limited hyperparameter tuning. Although time-consuming, the uncertainty information from MC dropout makes the method more robust and provides useful information when properly calibrated. We conclude that DIP successfully denoises the parameter mapping process and is easy to use, since DIP methods do not require any network training data.

F I G U R E 5
Simulated ground-truth (GT) reference uncertainty, σ_ref, in parameter map estimates with an axial brain slice, calculated with Equation (10) for the case of T1 mapping (col 1, rows A-G) and T2 mapping (col 2, rows A-G). The proposed method (OURS) is compared with nonlinear least squares estimates with different methods of signal denoising: no denoising (NONE), wavelet (WAVE), total variation (TV), nonlocal means (NLM), block-matching and three-dimensional filtering (BM3D), and deep decoder (DD). Additionally, the frequency mean of the uncertainty estimate from OURS, Equation (11), is presented both uncalibrated (row H) and calibrated according to Equation (13) (row I). The magnitude errors from the MC dropout implementation are examined by comparing G to H, and the precision of the proposed calibration is equivalently examined by comparing G to I. A region of interest (ROI) analysis of these metrics in different tissues is presented in the uncertainty column of Table 2.

F I G U R E 1
The proposed parameter mapping algorithm outputs denoised tissue parameter maps, β, with associated model uncertainty, σ_est, from a parameter mapping dataset y. The generator weights are updated until reaching their final state after N_b optimization steps. The generator input, z ∈ R^(N_c × V), consists of N_c channels with V voxels of random uniform noise with standard deviation σ_input. z is perturbed with noise-based regularization with SD σ_reg, just as in the original DIP implementation. After optimization, maps of tissue parameters and estimated model uncertainty are calculated from N_s MC dropout samples with fixed weights. The Bayesian approximation is enabled by drawing a new sample of the Bernoulli-distributed random variables before passing the input z through the network; the size of these variables depends on the generator architecture and is omitted for simplicity in Algorithm 1. The hyperparameters of this method are the number of training iterations, N_b, the number of input channels, N_c, the number of MC dropout samples, N_s, the dropout probability, p, the input noise standard deviation, σ_input, and the noise regularization SD, σ_reg, together with the learning rate and weight decay required by the Adam algorithm. Identity matrices with shape N_c × V are denoted I_(N_c × V). The application-specific signal equation s and the generator f are described in detail in Sections 2.2.1 and 2.2.2, respectively.
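The MC dropout sampling stage described above can be sketched as follows. This is a toy numpy illustration: a hypothetical linear `forward` stands in for the trained U-Net generator f, and the masking mimics drawing a fresh Bernoulli dropout realization per sample, which is not the paper's actual implementation:

```python
import numpy as np

rng = np.random.default_rng(1)

def mc_dropout_estimates(forward, z, weights, p=0.1, n_samples=32):
    """After training, draw n_samples stochastic forward passes with
    fresh Bernoulli dropout masks over the (fixed) trained weights,
    then summarize them as a parameter map (sample mean) and a model
    uncertainty map (sample standard deviation)."""
    samples = []
    for _ in range(n_samples):
        mask = (rng.random(weights.shape) > p).astype(float)  # fresh Bernoulli draw
        samples.append(forward(z, weights * mask / (1.0 - p)))
    samples = np.stack(samples)
    return samples.mean(axis=0), samples.std(axis=0)

# Toy stand-in for the generator: a single linear map from channels to voxels.
forward = lambda z, w: z @ w
z = rng.uniform(size=(1, 8))             # N_c = 8 input "channels"
weights = rng.normal(size=(8, 100))      # maps to V = 100 "voxels"
beta, sigma_est = mc_dropout_estimates(forward, z, weights)
```

The key point the sketch captures is that the weights stay fixed after optimization; only the dropout masks are redrawn, so the spread over samples approximates the model uncertainty.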