Fast deep learning reconstruction techniques for preclinical magnetic resonance fingerprinting

We propose a deep learning (DL) model and a hyperparameter optimization strategy to reconstruct T1 and T2 maps acquired with the magnetic resonance fingerprinting (MRF) methodology. We applied two different MRF sequence routines to acquire images of ex vivo rat brain phantoms using a 7‐T preclinical scanner. Subsequently, the DL model was trained using experimental data, completely excluding the use of any theoretical MRI signal simulator. The best combination of the DL parameters was implemented by an automatic hyperparameter optimization strategy, whose key aspect is to include all the parameters in the fit, allowing the simultaneous optimization of the neural network architecture, the structure of the DL model, and the supervised learning algorithm. By comparing the reconstruction performance of the DL technique with that achieved from the traditional dictionary‐based method on an independent dataset, the DL approach was shown to reduce the mean percentage relative error by a factor of 3 for T1 and by a factor of 2 for T2, and to improve the computational time by at least a factor of 37. Furthermore, the proposed DL method maintains comparable reconstruction performance even with a lower number of MRF images and a reduced k‐space sampling percentage, with respect to the dictionary‐based method. Our results suggest that the proposed DL methodology may offer an improvement in reconstruction accuracy, as well as speeding up MRF for preclinical and, prospectively, clinical investigations.


| INTRODUCTION
In recent years, there has been growing interest in quantitative multiparametric mapping (MPM) by means of magnetic resonance imaging (MRI) as a valuable tool for measuring, down to the millimeter scale of resolution, physical properties such as the nuclear spin-lattice relaxation time (T 1 ), the nuclear spin-spin relaxation time (T 2 ), and the proton density (PD). 1-8 Quantitative MPM, however, requires multiple sequences to measure quantitative parameters, and so it may result in a very time-consuming approach. 9,10 Recently, MR fingerprinting (MRF) has emerged as an accurate and time-efficient MPM technique delivering multiparametric maps in a one-shot measurement. 11 The key idea of MRF is to apply a train of radiofrequency (RF) pulses while varying some MR acquisition parameters in a pseudo-random way, for example, the flip angle (FA), repetition time (TR), echo time (TE), or readout trajectory, to eventually provide unique signal evolutions, called "fingerprints", for combinations of the desired tissue physical properties, such as T 1 , T 2 , and PD. 12 In this way, the sample properties and composition can be traced back from the analysis of the fingerprints collected in each voxel. 13 The traditional MRF map reconstruction method is known as pattern matching. 14 Such a method compares the acquired voxel-wise fingerprints with a precomputed dictionary of synthetic MR signal evolutions generated using the Bloch equations or the extended phase graph (EPG) signal simulation model.
15 For each voxel, the dictionary entry with the best matching correlation with the fingerprint is selected, and the associated parameters (e.g., T 1 , T 2 , and PD) are assigned to the voxel to generate the quantitative maps associated with the tissue properties. One of the main drawbacks of this methodology is that the reconstruction dictionary must be sufficiently large to cover the entire range of possible tissue values and to avoid errors in the reconstructed tissue maps. Furthermore, the dictionary size grows exponentially with the number of tissue parameters encoded in the simulated fingerprints. 16 This requires significant storage and computational resources, which are limiting factors for the clinical adoption of MRF techniques. Some dictionary-compression methods have been developed to reduce the dictionary size and to speed up the matching process. 17 However, these methods make additional approximations that can affect the accuracy of the reconstructed maps.
To overcome these limitations, deep learning (DL) techniques have recently been proposed for the reconstruction of MRF maps. 18-20 Among these methods, neural networks (NNs) are particularly able to approximate a nonlinear function between the inputs and the outputs that may be difficult to represent via analytical functions. Using these techniques, it is possible to define a regressive DL model that is capable of predicting, after a supervised training procedure, the parameters associated with the experimental fingerprint given as input. The three main advantages of this approach are (i) the fast computation of the quantitative parameter maps, (ii) the possibility of predicting MR parameters of unknown signal evolutions (while the matching process restricts the parameter prediction to those present in the dictionary), and (iii) the capability of addressing the scalability problem of the dictionary without increasing the calculation time. Recent studies 18,19,21 demonstrated the feasibility of a DL-based approach for MRF. This was achieved by training the DL model using ground truth data generated via pattern matching, then applying the results to experimental data. The NN is trained to learn the tissue properties from the generated dataset, eliminating the need to memorize the whole set of fingerprints after the training procedure. These works show that, compared with the traditional matching procedure, the application of the DL model to MRF data is 300-5000 times faster. 21 Because of these advantages, many applications of the MRF framework have been reported in the clinical field. For example, in brain studies, the MRF-generated relaxometry maps were shown to be a promising tool to help the diagnosis of suspicious hippocampal sclerosis in patients with mesial temporal lobe epilepsy, 22 to identify inconspicuous epileptogenic lesions from patients with negative conventional MRI diagnosis, 23 to study focal cerebral alterations and identify patients with frontotemporal lobe degeneration, 24 and to classify Parkinson's disease subjects and their disease severity.
25 Moreover, the fast MRF map quantification makes the technique highly applicable to abdominal and cardiac imaging. 26,27 However, as far as we know, the applications of the MRF methodology in the preclinical field are still limited, and an optimized MRF protocol for both the acquisition and reconstruction phases that exploits DL techniques has not been proposed yet.
In this context, the aim of this preclinical study is to provide an optimized, DL-based MRF framework to measure T 1 and T 2 maps of the ex vivo rat brain. To pursue this goal: (i) we implemented a preclinical MRF pulse sequence with two different FA and TR pseudo-random profiles 28,29 to generate unique T 1 and T 2 MRF signal timecourses and to investigate which pseudo-random profile delivers the most accurate T 1 and T 2 map reconstruction via the pattern-matching method; (ii) we trained, validated, and tested two supervised DL models with all the acquired MRF images to assess which network predicts the T 1 and T 2 maps most accurately; and (iii) we optimized both the MRF acquisition sequence and the DL models to deliver the least time-consuming framework to acquire MRF images and to predict accurate and reliable T 1 and T 2 maps. In particular, this study aims to extend current DL-based methods for the analysis of MRF data in three main directions. First, we will perform a supervised training procedure on experimental data, excluding the use of MRI signal simulators. This approach ensures that the DL models are trained on real experimental inputs, which should enhance their reliability and generalizability. Second, we will present an automatic procedure to optimize simultaneously the NN architecture, the structure of the DL model, and the supervised learning algorithm. Lastly, we will compare different MRF acquisition settings and DL architectures to optimize both the acquisition and reconstruction phases. By combining these approaches, we aim to develop an advanced DL-based MRF framework that improves both the accuracy and efficiency of T 1 and T 2 map measurements in preclinical research settings.

| Ex vivo rat brain phantoms
Two phantoms of ex vivo rat brains were analyzed with the preclinical MRF-optimized framework. Sprague Dawley rats obtained from Taconic were reared under standard conditions (25 ± 1 °C, 60% relative humidity, 12-h light/dark cycle), 3-4 per cage, with ad libitum access to food and water.
Rats were anesthetized with isoflurane and perfused transcardially with 125 mL of filtered phosphate-buffered saline (1× PBS) followed by 50 mL of 4% paraformaldehyde (PFA) solution in PBS. The skull was dissected and dipped in 4% phosphate-buffered PFA overnight at 4 °C. Subsequently, the brains were transferred to PBS for 2 days and finally embedded in 2% agarose in 1× PBS to be analyzed by ex vivo MRF techniques.
Procedures involving animals and their care were conducted in conformity with the institutional guidelines according to the international laws and policies (EEC Council Directive 86/609, OJ L 358, 1, December 12, 1987; NIH Guide for the Care and Use of Laboratory Animals, U.S. National Research Council, 1996). The specific protocols covering the studies described in this paper were approved by the Italian Ministry of Health (protocol nr. 5CEED.97) and by the University of Pavia Institutional Animal Care and Use Committee.

| Magnetic resonance imaging acquisition
MRI acquisitions were performed with a 7-T Pharmascan scanner (Bruker, Billerica, MA, USA) equipped with a circularly polarized 1H transmit/receive volume coil. Before performing gold standard mapping and running the MRF protocols, a B 0 map was acquired and automatic shimming by means of the MAPSHIM utility was executed using all the available shims up to the Z3 order. All the acquisition protocols were based on Cartesian sampling with a linear phase-encoding order, starting from the lower edge of the k-space. No phase-encoding acceleration was applied.
The gold standard T 1 and T 2 mapping protocol was applied to one axial slice (slice thickness = 1.0 mm; field of view [FOV] = 33 × 33 mm²; matrix size = 128 × 128; with read, phase, and slice offset = 0) and included an inversion recovery-spin echo (IR-SE) sequence with 10 inversion time points (TR/TE = 5000/4.59 ms; TI = 100, 200, 400, 700, 1100, 1600, 2200, 2900, 3700, and 4600 ms; scan time = 106 min), and a SE sequence with 10 echo time points (TR = 5000 ms; TE = 6, 10, 20, 40, 60, 80, 100, 150, 200, and 300 ms; scan time = 106 min). The IR-SE and SE data were analyzed with in-house fitting software developed in Matlab (MathWorks, MA, USA). In particular, T 1 maps were computed from the three-parameter model S(TI) = |a + b·exp(−TI/T 1 )|, while T 2 maps were computed from the exponential decay function S(TE) = S 0 ·exp(−TE/T 2 ). The MRF framework was implemented according to the MRF-steady-state free precession (FISP) sequence proposed by Gao et al., 28 keeping the same slice geometry set for the gold standard mapping protocol (slice thickness = 1.0 mm, FOV = 33 × 33 mm², matrix size = 128 × 128, with read, phase, and slice offset = 0), with TE = 3.2 ms and two different FA/TR profiles. Specifically, we implemented both the 600-point FA/TR profile proposed by Gao et al. 28 and the optimized 400-point FA/TR profile proposed by Zhao et al., 29 with scan times of 30 and 20 min, respectively. For a detailed visualization of these profiles, please refer to Figure 1. As regards the MRF dictionary-based reconstruction method, T 1 and T 2 maps were computed by using the traditional dictionary simulation and the pattern-matching method, with a dictionary generated for each of the two FA/TR profiles.
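The gold-standard fits described above (performed in Matlab in this work) can be sketched in Python as follows; the model amplitudes and the synthetic voxel signals are illustrative stand-ins for measured data:

```python
import numpy as np
from scipy.optimize import curve_fit

def ir_se_model(ti, a, b, t1):
    # Three-parameter inversion-recovery model: S(TI) = |a + b*exp(-TI/T1)|
    return np.abs(a + b * np.exp(-ti / t1))

def se_model(te, s0, t2):
    # Mono-exponential spin-echo decay: S(TE) = S0*exp(-TE/T2)
    return s0 * np.exp(-te / t2)

# Inversion and echo times from the gold standard protocol (ms).
ti = np.array([100, 200, 400, 700, 1100, 1600, 2200, 2900, 3700, 4600], float)
te = np.array([6, 10, 20, 40, 60, 80, 100, 150, 200, 300], float)

# Synthetic voxel signals (T1 = 900 ms, T2 = 60 ms) stand in for acquired data.
s_ir = ir_se_model(ti, 1.0, -2.0, 900.0)
s_se = se_model(te, 1.0, 60.0)

# Voxel-wise least-squares fits returning the relaxation times.
(_, _, t1_fit), _ = curve_fit(ir_se_model, ti, s_ir, p0=(1.0, -2.0, 500.0))
(_, t2_fit), _ = curve_fit(se_model, te, s_se, p0=(1.0, 100.0))
print(t1_fit, t2_fit)  # fitted T1 and T2 in ms
```

In practice the fit is repeated for every voxel of the slice to build the T 1 and T 2 maps.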

| Input data
The dataset consisted of six slices of two ex vivo rat brain phantoms, five slices from the first phantom and one slice from the second one.For each slice of the dataset, ground truth T 1 and T 2 maps (through the IR-SE and SE sequences, respectively) and the two MRF sequences were acquired.
Before passing the MRF images to the DL model, the following preliminary steps were performed. First, we normalized the pixel intensities of the MRF images and of the ground truth T 1 and T 2 maps to the [0,1] range. Then the MRF images and the ground truth T 1 and T 2 maps were transformed into collections of one-dimensional trajectories that compose the input and the expected output datasets, respectively. We excluded from the training dataset the MRF signals that belong to the background and to the skull regions.
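These preprocessing steps can be sketched as follows; the array shapes and the brain mask are hypothetical stand-ins for the actual data:

```python
import numpy as np

# Hypothetical inputs: 600 MRF time points on a 128x128 slice.
mrf_stack = np.random.rand(600, 128, 128)   # stands in for the acquired MRF images
t1_map = np.random.rand(128, 128) * 3000.0  # stands in for the ground truth maps (ms)
t2_map = np.random.rand(128, 128) * 300.0
brain_mask = np.ones((128, 128), bool)      # would exclude background and skull voxels

# Normalize pixel intensities to the [0, 1] range.
mrf_norm = (mrf_stack - mrf_stack.min()) / (mrf_stack.max() - mrf_stack.min())
t1_norm = t1_map / t1_map.max()
t2_norm = t2_map / t2_map.max()

# Flatten each masked voxel into a one-dimensional fingerprint trajectory.
x = mrf_norm[:, brain_mask].T                                      # (n_voxels, 600)
y = np.stack([t1_norm[brain_mask], t2_norm[brain_mask]], axis=1)   # (n_voxels, 2)
print(x.shape, y.shape)
```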

| DL reconstruction
We defined two different architectures of the DL model: a multilayer perceptron (MLP) and a recurrent neural network (RNN). Both models were used to perform a voxel-wise regression of the T 1 and T 2 parameters, given a MRF signal evolution trajectory as input. This means that the DL model receives, as input, the MRF signal evolution trajectory from an individual voxel and produces, as output, the estimated pair of T 1 and T 2 values corresponding to that specific voxel. A schematic representation of both architectures is provided in Figure 2. The MLP was composed of fully connected hidden layers followed by an output layer with two nodes for the T 1 and T 2 values. The RNN was composed of a long short-term memory (LSTM) block with a hyperbolic tangent activation function and a sigmoid activation for the recurrent step, followed by a fully connected layer with two nodes for the output predictions. Because the LSTM block works better with short input sequences, 30 we reshaped the one-dimensional MRF sequence into multiple parallel time series, each 20 time points long. The appropriate activation function of the output layer of each model was found using the hyperparameter optimization algorithm.
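A minimal Keras sketch of the two architectures could look like the following; the layer counts, widths, and activations here are illustrative placeholders (in this work they are selected by the hyperparameter tuning), and the exact reshape orientation of the parallel 20-point series is an assumption:

```python
import tensorflow as tf

N_POINTS = 600  # MRF sequence length (Gao et al. profile); 400 for Zhao et al.

# MLP: fully connected hidden layers ending in two output nodes (T1, T2).
mlp = tf.keras.Sequential([
    tf.keras.Input(shape=(N_POINTS,)),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dense(2, activation="sigmoid"),
])

# RNN: the trajectory is reshaped into parallel series of 20 time points,
# fed to an LSTM block (tanh activation, sigmoid recurrent activation),
# then to a two-node dense output.
rnn = tf.keras.Sequential([
    tf.keras.Input(shape=(N_POINTS,)),
    tf.keras.layers.Reshape((20, N_POINTS // 20)),
    tf.keras.layers.LSTM(64, activation="tanh", recurrent_activation="sigmoid"),
    tf.keras.layers.Dense(2, activation="sigmoid"),
])
print(mlp.output_shape, rnn.output_shape)
```

Both models map a single-voxel fingerprint to the pair of normalized (T 1 , T 2 ) values.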
The supervised training procedure of both models was performed by minimizing the mean squared error (MSE) loss function and using the mean absolute error (MAE) as an additional metric to monitor the process. Early stopping was performed by monitoring the validation error: the training procedure was stopped when the validation error did not improve within a maximum number of epochs. This maximum number of steps is called the early stopping patience, and it was set equal to a percentage of the total number of training epochs. To train, test, and evaluate the generalization ability of the proposed method, we performed a cross-validation procedure among different slices of the two phantoms. Specifically, we trained the model on five slices of the first phantom, randomly partitioning the input dataset into two separate subsets (80%/20%) for the training and validation phases, respectively. We used the slice of the second phantom, excluded from the training process and from the hyperparameter tuning, as an independent test set.
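The training setup (MSE loss, MAE monitoring metric, 80%/20% split, and patience set as a fraction of the epoch budget) can be sketched as follows on synthetic data; the model, epoch budget, and patience fraction are placeholders:

```python
import numpy as np
import tensorflow as tf

# Synthetic stand-ins for the masked voxel trajectories and (T1, T2) targets.
x = np.random.rand(1000, 600).astype("float32")
y = np.random.rand(1000, 2).astype("float32")

model = tf.keras.Sequential([
    tf.keras.Input(shape=(600,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(2, activation="sigmoid"),
])

EPOCHS = 20
# Early-stopping patience as a percentage (here 10%) of the epoch budget.
stopper = tf.keras.callbacks.EarlyStopping(monitor="val_loss",
                                           patience=max(1, EPOCHS // 10),
                                           restore_best_weights=True)

model.compile(optimizer="adam", loss="mse", metrics=["mae"])
history = model.fit(x, y, validation_split=0.2, epochs=EPOCHS,
                    callbacks=[stopper], verbose=0)
```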
The code was written in Python 3.8, leveraging the TensorFlow 2.8 machine learning platform. 31

| Hyperparameter optimization
To find the best configuration of the model for the MRF problem and for the available data, we performed the hyperparameter optimization through the Hyperopt library. 32 Using the Hyperopt framework, many combinations among a selection of parameters are scanned in a semiautomatic way using the tree-structured Parzen estimator (TPE) algorithm to find the best configuration of the model. In particular, the TPE algorithm performs a Bayesian scan of the search space by sampling more densely the most promising regions of the space and, as a consequence, learning from the tuning history. To optimize the hyperparameter configuration, we applied the k-folding method (k = 5) to five slices of the first phantom. The model was trained and validated on k − 1 slices (80% training, 20% validation) and was tested on the k-th slice.
For each hyperparameter combination, the k-folding technique was repeated five times, giving each fold the opportunity to be the held-out test set, and the results were averaged to obtain a global evaluation of the hyperparameter combination. The whole process was repeated for 1000 hyperparameter combinations and returned the configuration of the search space that minimizes the tuning loss. A schematic representation of the k-fold cross-validation strategy is provided in Figure 3. Specifically, we used the average MSE to evaluate the tuning procedure, defined as the mean of the held-out fold errors, MSE_avg = (1/k) Σ_{i=1}^{k} MSE_i, where MSE_i is the mean squared error on the i-th held-out fold.

F I G U R E 2 Schematic representation of the two architectures used: a multilayer perceptron (MLP) on the left and a recurrent neural network (RNN) composed of a long short-term memory (LSTM) block on the right. Both models receive the magnetic resonance fingerprinting trajectory of a single voxel as input and output the corresponding T 1 and T 2 values for that voxel. The activation functions, number of hidden dense layers, and number of dense units for the MLP, as well as the number of LSTM units for the RNN, were optimized through the automatic hyperparameter-tuning procedure.
The hyperparameters that were tuned include the architecture of the model (i.e., MLP/RNN), some NN parameters (i.e., the number of hidden layers, the size of each layer, the initialization function, and the activation function of the final layer), as well as some parameters of the supervised learning algorithm (i.e., the optimizer, the initial learning rate, the maximum number of epochs, and the early stopping patience).

| Evaluation strategy
Both the dictionary-based reconstruction method and the DL method were evaluated using the percentage relative error (PRE) and the structural similarity index (SSIM).
The PRE quantifies the agreement between the reconstructed map y and the reference one ŷ by computing |ŷ i − y i |/ŷ i · 100%, where y i and ŷ i are the i-th voxels of y and of ŷ, respectively. To assess the overall agreement of the reconstructed maps, we studied the distribution of the PRE over the entire maps, excluding the areas associated with the background and the skull, and over some anatomical districts of interest. Specifically, we selected the cortex and the corpus callosum to provide examples of gray and white matter regions, respectively. We used the SSIM to measure the goodness of the reconstruction by comparing the statistical distributions of the two arrays of pixels. 33 Unlike the PRE, the SSIM measures the difference between two images by assessing perceptual characteristics. The SSIM can assume values between 0 and 1: if it is equal to 1, there is perfect structural similarity between the image and the reference image, while a value of 0 indicates that the images are completely different from each other.
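The two metrics can be computed, for example, as follows (using scikit-image for the SSIM); the reference map, the reconstruction, and the mask here are synthetic stand-ins:

```python
import numpy as np
from skimage.metrics import structural_similarity

def percentage_relative_error(recon, reference, mask):
    # PRE_i = |ref_i - recon_i| / ref_i * 100%, over masked voxels only.
    r, g = recon[mask], reference[mask]
    return np.abs(g - r) / g * 100.0

# Synthetic T1 reference (ms) with a spatial gradient, and a reconstruction
# carrying a uniform 5% bias; the mask would exclude background and skull.
reference = np.tile(np.linspace(500.0, 1500.0, 64), (64, 1))
recon = reference * 1.05
mask = np.ones_like(reference, bool)

pre = percentage_relative_error(recon, reference, mask)
ssim = structural_similarity(reference, recon,
                             data_range=reference.max() - reference.min())
print(pre.mean(), ssim)
```

With a uniform 5% bias, the mean PRE evaluates to 5%, while the SSIM stays high because the spatial structure is preserved.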

| Sequence compression methods
The aim of this phase of the work was to determine the optimal design of the MRF sequence that allows for a good estimation of the parametric maps with a shorter acquisition time. We examined the following aspects of the scanning scheme: the minimum number of sequence images necessary to obtain accurate quantitative maps, and the minimum sampling percentage of the Cartesian k-space that guarantees good reconstruction performance.
F I G U R E 3 Schematic representation of k-fold cross-validation for hyperparameter optimization. The cross-validation was performed on the five slices of the first phantom. The sixth slice, from the second phantom, was not included in the optimization process and was used as an independent test set. MSE, mean squared error.
As regards the first point, we evaluated the reconstruction performance of the algorithm with different acquisition sequence lengths, obtained by selecting several subsets of images, starting from the beginning of the MRF acquisition. In particular, we tested different sequence lengths, ranging from the full-length experiments to 20-time-point sequences. For the second aspect, to undersample the k-space, we proceeded in the following way. First, we transformed all the MRF images into the frequency domain by using the 2D Fourier transform. Then we undersampled the k-space data by keeping only the central phase-encoding lines of the k-space and applying zero filling to the peripheral lines. We tested 10 different subsampling percentages of the k-space, ranging from 100% to 10% of the total number of phase-encoding lines. By applying the inverse Fourier transform to the undersampled k-space data, we obtained the MRI images. To determine both the minimum number of images of the MRF sequence and the minimum k-space sampling percentage to be used, we applied the elbow method proposed by Satopaa et al. 34 Specifically, we plotted the SSIM and the mean PRE as a function of the two parameters and identified the optimal point of the MRF sequence as the point of maximum curvature on the graph, using the kneed Python library.
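The central-line undersampling procedure can be sketched with NumPy as follows; the input image is a synthetic stand-in for one MRF frame:

```python
import numpy as np

def undersample_kspace(image, keep_fraction):
    # Retain only the central phase-encoding lines of the Cartesian k-space,
    # zero-filling the peripheral lines, then return to image space.
    k = np.fft.fftshift(np.fft.fft2(image))        # centered 2D k-space
    n_lines = image.shape[0]
    keep = int(round(n_lines * keep_fraction))
    lo = (n_lines - keep) // 2
    k_sub = np.zeros_like(k)
    k_sub[lo:lo + keep, :] = k[lo:lo + keep, :]    # central lines only
    return np.abs(np.fft.ifft2(np.fft.ifftshift(k_sub)))

image = np.random.rand(128, 128)
recon_30 = undersample_kspace(image, 0.30)   # keep 30% of phase-encoding lines
recon_full = undersample_kspace(image, 1.00) # full sampling recovers the image
print(np.allclose(recon_full, image))
```

Repeating this for each tested percentage, the SSIM and mean PRE curves can then be fed to the elbow-point detection (e.g., via the kneed library).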

| Hyperparameter optimization results
The optimization procedure was performed separately on each of the two MRF datasets (Gao et al. and Zhao et al.).

| Phantom results
We trained the best configuration of the model on both the Gao et al. and Zhao et al. datasets separately, following the cross-validation strategy described in Section 2.4, and we evaluated the predictive performance of the DL model by applying it to the independent test set. The gold standard T 1 and T 2 maps, the dictionary-based reconstructed maps, and the DL-reconstructed maps are presented in Figure 5 for both the Gao et al. and Zhao et al. datasets. First, we can qualitatively see that the estimated parametric maps of both methods have correctly reconstructed the anatomical structures of the brain, preserving the visual contrast between different tissues. We can also observe that the range of the T 1 and T 2 values is consistent between the reconstructed and gold standard maps.

T A B L E 1 Best configuration of the parameters obtained using the optimization process.

Figure 5 also shows the error map associated with each parameter, computed voxel-wise as the PRE. We observe good agreement between the reference and each of the reconstructed maps for most of the anatomical regions of the phantom. In particular, good results were achieved in the regions of the brain. Larger discrepancies are visible in the volume surrounding the phantom and near the skull, probably because of imperfections in the RF excitation field, as already discussed in Gao et al. 28 These errors are more evident in the T 2 maps compared with the T 1 maps. We summarized the distributions of the error maps in the boxplots of Figure 6. We can observe that the agreement of the reconstructed maps with the ground truth ones is better for the DL method compared with the traditional dictionary-based one, both in terms of the mean value and the dispersion of the errors.

| Length of the MRF sequence results
We evaluated the reconstruction performance of the DL algorithm and of the dictionary-based one with different acquisition sequence lengths for both datasets (Figure 8). As can be observed from Figure 8, good agreement between the DL-reconstructed and true maps was obtained for a sequence length of at least 100 time points for both T 1 and T 2 . We repeated the same simulation for the dictionary-based method and found that a MRF sequence of at least 480 time points is necessary for a reliable reconstruction of the T 1 maps, and of 490 time points for the T 2 maps. In terms of acquisition time, reducing the number of MRF time points from the 600 of the complete Gao et al. sequence to the 100 of the compressed sequence means speeding up the acquisition phase by an acceleration factor of about 2.2.
Regarding the Zhao et al. MRF sequence, the results show that, for the DL system, a good reconstruction of the T 1 maps is achieved with at least 100 time points, and of the T 2 maps with at least 60 time points. For the dictionary-based method, 290 and 330 images are necessary for a good reconstruction of the T 1 and T 2 maps, respectively. Using 100 of the initial 400 MRF images means accelerating the acquisition by a factor of about 1.7.

| k-space undersampling results
We tested the reconstruction accuracy of the DL model and of the traditional dictionary-based method for different Cartesian k-space sampling percentages, considering the Gao et al. and Zhao et al. full-length experiments (Figure 9). We performed different scans for the two datasets by varying the k-space sampling percentage from 100% to 10%.
The results for the Gao et al. and Zhao et al. sequences are presented in Figure 9 for the DL reconstruction method. We can observe that, below the sampling thresholds identified for each sequence, the loss of spatial resolution because of k-space subsampling becomes evident. Because the acquisition time is directly proportional to the number of k-space phase-encoding lines sampled, reducing the k-space coverage directly decreases the scanning time.
For the Gao et al. sequence, by reducing the k-space sampling percentage from 100% to 30%, we reduced the acquisition time by an acceleration factor equal to 3.3. Considering the Zhao et al. sequence, by sampling 40% of the k-space lines, the acquisition time was reduced by a factor equal to 2.5.

| Comparison with other DL methods
The proposed method was compared with other DL techniques to evaluate the reconstruction performance using the same dataset of preclinical MRF images. We considered the following DL architectures: a 1D convolutional neural network (CNN 1D), a 2D convolutional neural network (CNN 2D), a convolutional encoder-decoder neural network (CED), a 2D U-net, and a 3D U-net. These five models are some of the state-of-the-art methods in the field of DL techniques for image analysis. The specific architecture of each of these NNs is described in Data S1 of this document. Table 2 shows the results of the compared techniques in terms of the PRE and SSIM of the reconstructed maps of the independent test set. We can see that our method reached the lowest reconstruction error considering both metrics and outperformed all the other considered models.
T A B L E 2 Comparison with some of the state-of-the-art models for image analysis.

| DISCUSSION

In the current study, we propose a DL framework for the reconstruction of T 1 and T 2 maps acquired with two different MRF schemes. The image-analysis system is composed of a DL NN, which receives the MRF signal evolution trajectory voxel-wise as input and outputs the estimated values of the two T 1 and T 2 relaxation maps, and of a hyperparameter optimization algorithm to automatically choose the best structure of the DL model.
The NN was trained using two sets of MRF data of ex vivo rat brain phantoms acquired with a 7-T preclinical scanner. Both datasets were randomly split into two separate subsets for the training and validation phases; the models were then tested on an independent dataset. We compared the reconstruction performance of the DL technique with that achieved from the traditional dictionary-based method, evaluating the PRE and the SSIM. The DL reconstruction method achieved better results compared with the dictionary-based one for both the Gao et al. and the Zhao et al. datasets. In particular, we proved that our DL method decreased the mean PRE by a factor of 3 for T 1 and by a factor of 2 for T 2 . In terms of SSIM, both reconstruction methods achieved an SSIM equal to 0.91 for the T 1 maps, while for the T 2 maps, the DL method increased the SSIM from 0.77 to 0.89. Besides this improvement of the estimation performance, we also demonstrated that the proposed method significantly accelerates the computational time of the reconstruction process.
We extended current DL-based methods for the analysis of MRF data in two main directions: (i) we performed a supervised training procedure on experimental data, excluding the use of theoretical MRI signal simulators; and (ii) we proposed an automatic procedure to optimize the structure of the NN and the training process. As regards aspect (i), by excluding the MRI signal simulator from the reconstruction process, we exploit the flexibility of the NN, which is also able to recognize data stochasticities related to the acquisition process. This phenomenon is evident in Figure 5 if we compare the T 2 maps of both acquisition schemes reconstructed with the dictionary-based method with those generated by the DL method. In the first case, the T 2 maps present artifacts in the area surrounding the phantom, probably because of imperfections in the RF excitation field, while in the second case, these artifacts are significantly reduced, because the network has learned to recognize and discard them. Furthermore, this approach is particularly convenient for quantitative parameters whose MRI signal cannot be described with well-studied models, and for which, therefore, the dictionary cannot be generated by any theoretical simulator.
As regards point (ii), we implemented a hyperparameter optimization strategy to select the best combination of the DL parameters. The key aspect of this optimization process is to include all the parameters in the fit, allowing the simultaneous optimization of the NN architecture, the structure of the DL model, and the supervised learning algorithm. Because the behavior of a DL model depends on configuration settings that are often interconnected with each other, for a correct hyperparameter tuning it is essential to evaluate the combined effect of all the parameters at the same time. This methodology is useful because it allows choosing the optimal hyperparameter setup in a semiautomatic way with a systematic pipeline, avoiding performing such searches by hand. In particular, we used the TPE optimization algorithm, which performs a Bayesian scan of the search space.

The MRF acquisition scheme was analyzed to determine the best design of the MRF sequence necessary to obtain accurate quantitative maps with a reduced acquisition time. We evaluated the performance of both reconstruction approaches with different numbers of MRF images and with different k-space sampling percentages. The results of Figure 8 demonstrated that, for the Gao et al. dataset, the number of MRF images can be reduced to 100 for a good reconstruction of both the T 1 and T 2 maps, compared with the 300 images necessary for the dictionary-based reconstruction. As regards the Zhao et al.
dataset, Figure 8 shows that at least 100 time points are required for a good reconstruction of both the T 1 and T 2 maps, in contrast to the dictionary-based method, where 400 time points are required. Considering the k-space sampling, we showed that, for both MRF sequences, 40% of the k-space sampling is sufficient for a reliable estimation of the T 1 and T 2 maps with the DL method, and 50% for the dictionary-based method. Therefore, the DL approach not only speeds up the reconstruction phase by avoiding the pattern-matching process, but also reduces the duration of the MRF acquisition, enabling the use of a smaller number of time points as well as a lower sampling percentage of the k-space. These results are particularly significant for simplifying in vivo applications of MRF sequences in the field of preclinical imaging.
We extensively compared two different MRF acquisition schemes, considering the sequence designs proposed in the work of Gao et al. 28 and Zhao et al. 29 Regarding the DL reconstruction, the graphs in Figure 8 show that the model achieves similar performance on both datasets, demonstrating the stability of the methodology to changes in the acquisition parameter scheme. On the other hand, if we consider the dictionary-based reconstruction method, some differences in the maps calculated from the two MRF sequences are evident. As already discussed in the work of Gao et al., 28 T 2 maps present some inhomogeneities, probably because of imperfections in the RF excitation field that may lead to errors in the matching process between the acquired voxel fingerprints and the dictionary ones. By comparing the T 2 error maps of Figure 5, we can observe that these inhomogeneities are significantly reduced by the Zhao et al. acquisition scheme compared with the Gao et al. scheme. These results suggest that the TR and FA profiles used in the Zhao et al. scheme generate MRF signal evolution trajectories that are less susceptible to hardware imperfections and that therefore make the pattern-matching algorithm more robust. Furthermore, these inhomogeneities are further reduced with the DL-based reconstruction method. Thus, our study demonstrates that the combination of the FA/TR profile proposed by Zhao et al. and our DL-based method for map reconstruction effectively minimizes RF excitation field inhomogeneities, overcoming the limitations highlighted in the work of Gao et al. 28
28 Although satisfactory results have been achieved, this work can be improved in different ways.First, our method is focused on estimating T 1 and T 2 maps of ex vivo brains of healthy rats.To enhance the robustness and generalizability of our findings, it would be relevant to increase the sample size of our study and extend the proposed methodology to other anatomical regions and pathological tissues.A good strategy would be to include samples with heterogeneous relaxometric properties in the training dataset so that the model is able to predict T 1 and T 2 over a wider range of values with the same accuracy.Furthermore, the results obtained in the current work suggest that the two proposed MRF sequences combined with the DL reconstruction method could be reasonably applied to in vivo imaging of rodents.In fact, the acquisition time of both the MRF sequences makes quantitative mapping compatible with the duration of an in vivo acquisition protocol.However, it is worth emphasizing that, for a proper translation of the experimental approach in vivo, it is necessary to make some adjustments to the entire MRI acquisition protocol, because animal health also becomes a priority when setting the experimental conditions and parameters.In particular, a careful optimization is required to shorten the typical duration of several hours of SE and IR-SE acquisitions of relaxation times maps, which serve as the gold standard necessary to validate the MRF ones.Once these adjustments and optimizations are made, the proposed method used ex vivo can be implemented in vivo, allowing for quantitative mapping of T 1 and T 2 relaxation times in future animal studies.9][40] EPI sequences employ a more efficient k-space sampling strategy, enabling a significant reduction in acquisition time.This approach could be explored to achieve more time-efficient MRF acquisitions, also in the preclinical setting.Moreover, additional MRI parameters can be included in the quantified maps, such as 
diffusion, 41 perfusion, 42 or magnetization transfer maps. 43 For example, in the work of Yu et al., 41 the authors formulated an MRF framework to simultaneously measure quantitative maps of T1 and T2, as well as the apparent diffusion coefficient, demonstrating the potential of MRF to also quantify diffusion maps.

Figure 2. While MLPs are general-purpose models for handling nonspecific input data, RNNs are designed to deal with time-series signals and solve temporal problems.

Specifically, the MLP was composed of a sequence of fully connected layers. The input layer consists of a number of nodes equal to the MRF sequence length with a sigmoid activation function, while the output layer consists of two nodes for the prediction of T1 and T2. The Glorot 35 initialization function was selected to set the initial values of the layers' weights (specifically, the Glorot uniform for the Gao et al. dataset and the Glorot normal for the Zhao et al. dataset). Regarding the hyperparameters of the training phase, the Adam optimizer 36 achieved the best result, with an initial learning rate of the order of 10⁻⁵. Finally, concerning the maximum number of epochs and the early stopping patience, no large differences were observed in the behavior of the loss function as a function of the scanned values. Figure 4 presents the results of the hyperparameter tuning process for the (A) Gao et al. and (B) Zhao et al. datasets.
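A forward pass of the MLP just described can be sketched in NumPy. The layer widths below (a 400-point sequence, 15 hidden units) are illustrative values drawn from the configurations discussed in this work; this is a minimal sketch, not the trained model.

```python
import numpy as np

rng = np.random.default_rng(0)

def glorot_uniform(n_in, n_out):
    # Glorot (Xavier) uniform initialization of a dense layer's weights
    limit = np.sqrt(6.0 / (n_in + n_out))
    return rng.uniform(-limit, limit, size=(n_in, n_out))

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def build_mlp(seq_len=400, n_hidden=15):
    """Input width = MRF sequence length; one hidden layer; two output
    nodes for the normalized T1 and T2 predictions."""
    return [(glorot_uniform(seq_len, n_hidden), np.zeros(n_hidden)),
            (glorot_uniform(n_hidden, 2), np.zeros(2))]

def forward(layers, fingerprint):
    x = fingerprint
    for w, b in layers:
        x = sigmoid(x @ w + b)  # sigmoid activation on every layer
    return x

mlp = build_mlp()
t1t2 = forward(mlp, rng.standard_normal(400))  # two values in (0, 1)
```

In an actual framework (e.g., Keras), the same structure would be expressed as stacked dense layers trained with the Adam optimizer and early stopping, as reported above.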
The T1 and T2 maps estimated with all the MRF techniques are comparable with the reference T1 and T2 maps, validating the MRF acquisition designs proposed by Gao et al. 28 and Zhao et al. 29

Figure 7 presents the distributions of the PREs of the two regions of interest (ROIs) marked in the upper left corner. We can see that the PRE distributions for the voxels of the two ROIs reflect the PRE distributions obtained over the entire image (Figure 6). However, the mean values and the standard deviations of the two ROIs are smaller than those obtained for the complete image. Overall, T1 and T2 estimates for white matter are slightly better than those for gray matter. Considering the SSIM of the T1 map, the DL method reaches a value of 0.91 for the Gao et al. dataset and 0.92 for the Zhao et al. dataset, while for the T2 map the SSIM is equal to 0.88 for the Gao et al. dataset and 0.89 for the Zhao et al. dataset. Concerning the dictionary-based reconstruction method, the T1 map achieves an SSIM of 0.91 for both the Gao et al. and the Zhao et al. datasets; considering the T2 map, the model reaches an SSIM equal to 0.73 for the Gao et al. dataset and 0.81 for the Zhao et al. dataset.

Figure 4. Results of the optimization process for the (A) Gao et al. dataset and (B) Zhao et al. dataset as a function of each tuned hyperparameter. We use violin plots and scatter plots to graphically represent the distribution of the loss (mean squared error) values. Red dots highlight the best configuration. LSTM, long short-term memory; MLP, multilayer perceptron; NN, neural network; RNN, recurrent neural network.

Figure 5. Comparison between the true T1 and T2 maps (first column), those reconstructed with the traditional dictionary-based method (Dict), and those generated with the deep learning (DL) method. The second and third columns represent the results for the Gao et al. dataset, while the fourth and fifth columns represent the results for the Zhao et al. dataset. Regions associated with the background and the skull are set to the minimum value of each map. Error maps are computed voxel-wise as the percentage relative error (PRE).

Figure 6. Boxplots of the percentage relative errors of the T1 and T2 maps reconstructed with the deep learning (DL) methodology and with the dictionary-based (Dict) method for the Gao et al. and Zhao et al. datasets. Outlier values, as well as background and skull PREs, were excluded from the plot. Red dots represent the mean values of the distributions.

Figure 7. Boxplots of the percentage relative errors of T1 and T2 values computed on the two regions of interest (ROIs) highlighted in the upper left image. Red represents the ROI denoted as white matter, corresponding to the corpus callosum, while blue represents gray matter, corresponding to the cortical area. The graph shows the PRE distributions obtained with the deep learning (DL) methodology and with the dictionary-based (Dict) method for the Gao et al. and Zhao et al. datasets. Outlier values were excluded from the plot. Red dots represent the mean values of the distributions.

Figure 8. Boxplots of the percentage relative errors of T1 and T2 test maps as a function of different sequence lengths of 20, 40, 60, 100, 200, 400, and 600 time points for the Gao et al. acquisition schedule (first row) and of 20, 40, 60, 100, 200, and 400 time points for the Zhao et al. acquisition schedule (second row). Outlier values were excluded from the plot. Red dots represent the mean values of the distributions.
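The PRE and SSIM metrics used throughout these comparisons can be computed from a pair of maps in a few lines of NumPy. Note that `global_ssim` below is a simplified single-window variant: the standard index is normally computed over local sliding windows (e.g., as in scikit-image) and then averaged, so this is only a sketch.

```python
import numpy as np

def percentage_relative_error(estimated, reference):
    """Voxel-wise percentage relative error (PRE) between an estimated
    parameter map and its reference."""
    return 100.0 * np.abs(estimated - reference) / np.abs(reference)

def global_ssim(x, y, data_range=1.0):
    """Single-window SSIM over the whole image: a simplified stand-in
    for the windowed structural similarity index."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

ref = np.linspace(0.1, 1.0, 64 * 64).reshape(64, 64)  # toy T1 map
est = ref * 1.02                                      # 2% overestimate
pre = percentage_relative_error(est, ref)             # ~2% everywhere
```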

T1 and T2 maps are consistent with a minimum k-space sampling percentage of 40% for both the Gao et al. and Zhao et al. sequences. Considering the traditional reconstruction method, we found that a minimum sampling percentage of 50% is required to estimate accurate maps for both the Gao et al. (40% for T1 and 50% for T2) and the Zhao et al. (40% for T1 and 40% for T2) datasets. For the two acquisition schemes and reconstruction methods, we

Figure 9. Boxplots of the percentage relative errors of T1 and T2 test maps as a function of the k-space sampling percentage. For both datasets, we tested 10 sampling percentages ranging from 10% to 100% (100% corresponds to full k-space sampling). Outlier values were excluded from the plot. Red dots represent the mean values of the distributions.

All simulations were performed on a 2.60 GHz Intel Core i7 central processing unit with 12 cores and 16 GB of RAM. The DL model training process, realized on the dataset of 128 × 128 × 5 examples, took around 5 min 38.4 s for the Gao et al. sequence and 8 min 39.6 s for the Zhao et al. sequence. T1 and T2 map predictions, each consisting of 128 × 128 pixels, required about 0.23 s for the Gao et al. dataset and 0.67 s for the Zhao et al. dataset. Considering the dictionary-based reconstruction method, the dictionary generation took around 57 and 34 min for the Gao et al. and Zhao et al. datasets, respectively. The map estimation process took 26.26 s for the Gao et al. dataset and 24.58 s for the Zhao et al. dataset. The DL method thus speeds up the reconstruction process by a factor of 114 for the Gao et al. sequence and by a factor of 37 for the Zhao et al. sequence.
scan of the search space by sampling more densely the most promising regions, to find the best configuration that can optimally solve the MRF problem. Using this strategy, we compared two different NN architectures, testing the MLP and a recurrent NN composed of an LSTM block. The results showed that for the Gao et al. dataset the MLP model outperformed the RNN, both in terms of lower reconstruction errors and stability of the performance, while for the Zhao et al. dataset the RNN was the best NN architecture. Architectures with convolutional layers achieved lower accuracy. This can be explained by the fact that the values of FA and TR defined in Gao et al. and Zhao et al. are not ordered. In future studies, it would be valuable to explore the potential of different architectures, such as the self-attention model proposed by Hong et al. 37

The hyperparameter optimization was run separately on the Gao et al. and Zhao et al. datasets to obtain the best configuration for both acquisition schemes. We performed 1000 iterations of the optimization algorithm and selected the configuration that minimized the tuning loss, L. The best configurations for the Gao et al. and Zhao et al. datasets given by the optimization scan are listed in Table 1. The best NN architecture for the Gao et al. dataset is an MLP with a single hidden layer composed of 15 dense units, while the optimal NN architecture for the Zhao et al. dataset consists of an RNN composed of an LSTM block with 64 units. The sigmoid activation function is preferred for the output layer, as well as the Glorot initialization function.

All metrics were evaluated on the independent test set for the Gao et al. and Zhao et al. datasets. Abbreviations: CED, convolutional encoder-decoder neural network; CNN, convolutional neural network; PRE, percentage relative error; SD, standard deviation; SSIM, structural similarity index. The best result in each column is highlighted in bold text.
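The adaptive scan can be illustrated with a toy loop that, after a uniform warm-up, draws new candidates preferentially around the best configuration found so far. The objective below is a hypothetical stand-in for the tuning loss L (no network is actually trained), and the two tuned quantities (hidden units, log learning rate) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

def tuning_loss(cfg):
    # Hypothetical stand-in for training the NN with configuration
    # `cfg` and returning the validation mean squared error.
    units, log_lr = cfg
    return (units - 15) ** 2 / 100.0 + (log_lr + 5.0) ** 2

def adaptive_scan(n_iter=1000, n_warmup=100):
    """Toy adaptive search: uniform warm-up, then sample new
    candidates more densely around the most promising region."""
    best_cfg, best_loss = None, np.inf
    for i in range(n_iter):
        if i < n_warmup or best_cfg is None:
            cfg = (int(rng.integers(1, 129)), rng.uniform(-6.0, -3.0))
        else:  # perturb the current best configuration
            units = int(np.clip(best_cfg[0] + rng.integers(-8, 9), 1, 128))
            log_lr = float(np.clip(best_cfg[1] + rng.normal(0, 0.3), -6, -3))
            cfg = (units, log_lr)
        loss = tuning_loss(cfg)
        if loss < best_loss:
            best_cfg, best_loss = cfg, loss
    return best_cfg, best_loss

best_cfg, best_loss = adaptive_scan()
```

In practice, dedicated libraries (e.g., Bayesian or tree-structured Parzen estimators) implement this exploitation-versus-exploration trade-off far more carefully; the loop above only conveys the idea of densifying the scan around promising regions.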