A comparison of quantitative methods for clinical imaging with hyperpolarized 13C‐pyruvate

Dissolution dynamic nuclear polarization (DNP) enables the metabolism of hyperpolarized 13C‐labelled molecules, such as the conversion of [1‐13C]pyruvate to [1‐13C]lactate, to be dynamically and non‐invasively imaged in tissue. Imaging of this exchange reaction in animal models has been shown to detect early treatment response and correlate with tumour grade. The first human DNP study has recently been completed, and, for widespread clinical translation, simple and reliable methods are necessary to accurately probe the reaction in patients. However, there is currently no consensus on the most appropriate method to quantify this exchange reaction. In this study, an in vitro system was used to compare several kinetic models, as well as simple model‐free methods. Experiments were performed using a clinical hyperpolarizer, a human 3 T MR system, and spectroscopic imaging sequences. The quantitative methods were compared in vivo by using subcutaneous breast tumours in rats to examine the effect of pyruvate inflow. The two‐way kinetic model was the most accurate method for characterizing the exchange reaction in vitro, and the incorporation of a Heaviside step inflow profile was best able to describe the in vivo data. The lactate time‐to‐peak and the lactate‐to‐pyruvate area under the curve ratio were simple model‐free approaches that accurately represented the full reaction, with the time‐to‐peak method performing indistinguishably from the best kinetic model. Finally, extracting data from a single pixel was a robust and reliable surrogate of the whole region of interest. This work has identified appropriate quantitative methods for future work in the analysis of human hyperpolarized 13C data. © 2016 The Authors. NMR in Biomedicine published by John Wiley & Sons Ltd.


INTRODUCTION
Functional and molecular imaging is increasingly used as a routine clinical tool in many areas of oncology. The advent of molecularly targeted drugs, combinational therapies, and personalized medicine has resulted in an increasing requirement for specific imaging methods to monitor drug efficacy; consequently, new imaging methods to probe tumour biology are required that demonstrate both intra-and inter-patient repeatability.
Dissolution dynamic nuclear polarization (DNP) is a new imaging test, which has the potential to image tissue biology. The technique increases the signal-to-noise ratio (SNR) of molecules containing one or more 13 C nuclei by more than 10 000-fold above thermal levels; this process is undertaken outside of the animal or patient, and the hyperpolarized molecule is subsequently injected intravenously (1). When combined with 13 C-MRSI, this increase in SNR allows real-time in vivo metabolism of the molecules to be imaged non-invasively (2).
There are now many 13 C-labelled probes that have been successfully hyperpolarized using DNP, and these have been used to interrogate many aspects of tissue biology and metabolism that occur in a wide range of disease processes (3)(4)(5)(6)(7)(8). [1-13 C]Pyruvate is the most extensively studied of these probes; the major reaction of pyruvate is its conversion into [1-13 C]lactate, which is catalysed by the enzyme lactate dehydrogenase (LDH) and requires the reduced form of nicotinamide adenine dinucleotide (NADH) as a cofactor. Hyperpolarized [1-13 C]pyruvate may also form [1-13 C]alanine, [1-13 C]bicarbonate, and [1-13 C]pyruvate-hydrate in a pH-dependent reaction (9). These reactions have been well characterized in vivo in pre-clinical animal models of disease (10)(11)(12)(13)(14) and there are now a number of sites worldwide that are developing the technique for human use (15).
The success of hyperpolarized 13 C-labelled pyruvate as a cancer imaging biomarker is largely dependent on the phenomenon of aerobic glycolysis within tumours known as the Warburg effect (16). There is a high lactate concentration within most cancers, even in normoxic conditions, and the hyperpolarized 13 C signal may rapidly exchange between the injected [1-13 C]pyruvate and the endogenous lactate pool. Imaging of this exchange reaction in animal models has been shown to detect early treatment response (17,18) and correlate with tumour grade (19). The results from the only clinical study performed to date have shown that labelled lactate may be present in small tumours that are not visible with standard proton imaging techniques (15).
The sensitivity of 13 C-MRSI to small changes in metabolic rate offers the potential to non-invasively monitor metabolic alterations in patients. However, for any novel imaging technique to be widely adopted, it must be repeatable and reproducible, and will ideally utilize simple and robust quantitative methods for analysis. Current clinical approaches for the analysis of dynamic contrast-enhanced (DCE) MRI and positron emission tomography (PET) can offer insight into quantitative imaging with DNP. For example, the maximum standardized uptake value (SUV max ) is a very simple and powerful routine clinical tool to quantify metabolism in PET imaging with 18 F-fluorodeoxyglucose (20). For each voxel of a PET image, the SUV is defined as the tissue radioactivity concentration divided by the injected radiation per unit body weight, correcting for decay from the time of injection. The SUV max is the voxel of highest SUV within a region of interest (ROI), while the SUV mean is the average over that region. A number of methods have been employed to quantify the pyruvate-lactate exchange reaction measured using DNP, many of which involve fitting kinetic models of varying complexity to imaging or spectroscopy data to produce the forward reaction rate constant k P as a quantitative marker (17,19,21,22). Simple methods for estimating k P from the time course of pyruvate and lactate have been suggested (23), as well as methods using model-free parameters such as the area under the metabolite curve (AUC) (24), or ratios of the metabolite signal peaks (25,26). However, there is currently no consensus on the best method to characterize the pyruvate-lactate reaction, either for research purposes or for more routine clinical use.
In this study, we used both in vitro and in vivo dynamic hyperpolarized data to comprehensively compare a range of kinetic models and simple analysis parameters. We used a clinical hyperpolarizer and imaging sequences with a 3 T MR system, as well as a pyruvate concentration similar to the blood pyruvate concentration we anticipate in future patient studies. The aim was to determine which quantitative parameters are most appropriate to describe the dynamic time-course data acquired, testing for accuracy, simplicity and robustness in each case. Each analysis was also applied in vivo to rats with subcutaneous tumours, which were imaged with the same spectroscopic imaging sequence to examine the effect of pyruvate inflow.

In vitro experiments
Research grade fluid paths (GE Healthcare, Milwaukee, WI, USA) were filled with 100 μl of [1-13 C]pyruvic acid doped with 15 mM of an electron paramagnetic agent (trityl radical AH111501, GE Healthcare, Milwaukee, WI, USA) and 30 ml of dissolution fluid containing 1 g l À1 ethylenediaminetetraacetic acid (EDTA, GE Healthcare). Samples were polarized using a clinical hyperpolarizer (SPINlab, GE Healthcare) at approximately 0.9 K and 5 T for an average of 101 min (range 83-149 min) to an average polarization of 21% (range 10-38%) at the time of measurement using a benchtop NMR polarimeter (Oxford Instruments, Abingdon, UK). Following rapid dissolution, 1.4 ml of neutralization medium containing 0.72 M NaOH, 0.4 M Tris buffer and 0.1 g l À1 EDTA (GE Healthcare) was added to the solution. The final pH ranged from 6.7 to 7.4 with an average of pH 7.2.
Imaging phantoms consisted of 15 ml Falcon tubes filled to 14 ml with fivefold concentrated phosphate buffered saline at pH 7.2 and containing the coenzyme NADH at 4.4 mM (Sigma-Aldrich, Gillingham, UK). L-LDH from rabbit muscle (Sigma-Aldrich, Gillingham, UK) was added in quantities varying from 0 to 120 U, a range chosen to incorporate the expected range of k P in human blood based on the only published study to date (mean ± standard deviation (SD) of 0.045 ± 0.025 s À1 ) (15). 1 ml of the above 60 mM hyperpolarized solution was added simultaneously to the three tubes making up each set immediately before imaging, giving a final pyruvate concentration of about 4 mM. This is similar to the final blood concentration in the rats (~3 mM) and that expected in patients (~1.5 mM). If uniform distribution is achieved in the body in the timescale of the half-life of hyperpolarization, then the tissue concentrations would be lower. Neither pyruvate nor NADH were rate limiting. The time between dissolution and imaging was approximately 35 s.

In vivo experiments
Four adult female Fischer 344 rats (Charles River, Sulzfeld, Germany; 165 ± 6 g body weight) bearing subcutaneous mammary adenocarcinomas were imaged (27,28). Tumours were induced by implanting 1 × 10 6 MAT B III cells (syngenic breast cancer cell line), and imaging was performed 12-16 days after cell implantation. Animals were anesthetized with 1-3% isoflurane, monitored for ECG, breathing, and temperature, and kept warm on a heating pad with circulating warm water. The time delay between dissolution and injection was 15-20 s. The animal study was approved by the local governmental committee for animal protection and welfare (Tierschutzbehörde, Regierung von Oberbayern).
[1-13 C]Pyruvic acid doped with 15 mM of the trityl radical OX063 and 1 mM gadoteric acid (Guerbet, Paris, France) was polarized in a HyperSense DNP polarizer (Oxford Instruments) for approximately 45 min at 1.4 K and 3.35 T to a polarization of approximately 25%. Dissolution fluid, containing 80 mM NaOH, 80 mM Tris buffer, and 0.1 g l À1 EDTA dissolved in water, was heated to 185°C and used to rapidly dissolve the polarized sample. The final solution contained 80 mM [1-13 C]pyruvate at pH 7.6 C. J. DANIELS ET AL.
wileyonlinelibrary.com/journal/nbm and physiological temperature and osmolarity. This was injected into a tail vein inside the MRI scanner at a rate of approximately 0.2 ml s À1 and at a dose of 2.5 ml kg À1 . Dynamic imaging was performed from the time of injection.

Spectroscopic imaging
All imaging was carried out on a clinical 3 T MRI system (Signa HDx, GE Healthcare) using imaging sequences from the Multinuclear Spectroscopy (MNS) research pack Version 2.0 (GE Global Research, Munich, Germany). Enzyme phantoms were imaged in sets of three using a 13 C-1 H multi-nuclear receive/transmit coil (GE Coils, Aurora, OH, USA). First, four 90°non-localized spectralspatial pulses with T R = 1 s centred on the lactate Larmor frequency were applied (27); this was found to be the optimum number for removing signal from any lactate labelling prior to commencement of the experiment whilst retaining maximum pyruvate polarization. These were immediately followed by a 3 min IDEAL (iterative decomposition with echo asymmetry and least-squares estimation) spiral chemical shift imaging (CSI) acquisition (29) acquired using a single 20 mm axial slice. Each excitation is followed by a single-shot spiral image encoding module, with echo-time shifting of 1.12 ms between excitations. Seven time-shifted echoes plus a single free induction decay (FID) spectrum are acquired in total for each time step, with the chemical-shift information from the FID providing prior knowledge for the reconstruction. Other parameters were T R 500 ms, flip angle 5°, field of view (FOV) 80 mm, nominal matrix resolution 32 × 32 and 4 s time resolution.
Animals were imaged on a similar clinical 3 T MRI system (Signa HDx, GE Healthcare) using a rat-sized 13 C-1 H multinuclear birdcage coil. The same IDEAL spiral CSI acquisition was performed through four axial 10 mm slices over 1 min with a temporal resolution of 4 s. 13 C-pyruvate is replenished by inflowing blood in vivo, so a larger flip angle of 10°was used. All other parameters were consistent with the in vitro experiments. Single metabolite k-space data was first reconstructed by matrix inversion with off-resonance correction, followed by Cartesian regridding for spatial reconstruction (29). Gaussian k-space filtering was applied during post-processing on both data sets, resulting in an effective image resolution of 5 × 5 mm 2 . For anatomical reference, standard gradient echo proton images were acquired from the same slice geometry and FOV (resolution 256 × 256, slice thickness 3 mm, spacing 7 mm, T E 10 ms, T R 500 ms).

Data analysis
Imaging data was exported into MATLAB (MathWorks, Natick, MA, USA) for analysis using custom built software. Images were partially noise-corrected by subtracting the average squared background noise from the power images (30). Dynamic timecourse data for pyruvate and lactate were extracted from the images, using two methods for comparison. In the first, an ROI was defined by thresholding the t = 0 pyruvate image at 40% of the maximum pyruvate signal within each phantom for the in vitro data, or by outlining the tumour boundary based on the anatomical proton image for the in vivo data. This ROI was applied to all subsequent images in the series and the pixels within the ROI were averaged to produce the relative signal strengths of each metabolite at each time point. The second method defined a pixel of interest (POI) as the pixel within each phantom, or within the rat tumour, that demonstrated the highest lactate signal after averaging over all time points. Dynamic data from these two extraction methods was separately analysed using each of the methods detailed below.

Kinetic modelling
The exchange reaction between pyruvate and lactate, along with the irreversible hyperpolarized signal loss due to spin-lattice relaxation and applied RF excitation, can be characterized by the following coupled differential equations: where P and L are the relative pyruvate and lactate signal intensities, k P and k L are the forward and backward reaction rate constants respectively and ρ is the effective relaxivity given by where T 1 is the relaxation time of pyruvate in the medium or tissue, t R is the known repetition time of the applied RF pulses and θ is the flip angle. To reduce the number of parameters to be fitted, we assume throughout that the pyruvate and lactate relaxivities are equal, which in the limit of fast exchange between metabolites can be shown to be a good approximation (31), and use the above expression to correct for RF flip angle. Each model variant was applied to yield the forward reaction rate constant k P , the T 1 decay constant, their standard errors and other fitting parameters where appropriate. Constrained fits were carried out in MATLAB using the fmincon function unless otherwise stated. Solutions were found by minimizing the negative logarithm of the maximum likelihood function, which for N data points is given by Here K is the vector of free parameters to be fitted, σ 2 is the lactate variance, L i and P i are the model estimates of the metabolite signal strengths and L i data and P i data the measured values for the ith data point. This likelihood function is derived from a Bayesian extension to least squares minimization that addresses the disproportionately large early pyruvate signal, as described by Hill et al (24). It incorporates a lower-bound prior on the pyruvate noise, which acts to reduce the impact of any large early pyruvate residuals which can otherwise dominate the fit. Due to the sensitivity of the optimization algorithm to the initial values of P and L, we used a Monte Carlo method to randomly vary these inputs over 1000 runs for each model fit. Inputs were allowed to vary over a Gaussian distribution centred, with an SD of 5%, on the measured value of P or L at t = 0.

The two-site exchange model
For the two-way differential model, the above set of differential equations were solved simultaneously to fit the pyruvate and lactate time-course data with k P , k L and ρ as free parameters. The equations may also be solved analytically to give the following solutions: These equations themselves may then be fitted directly to the data in the two-way integral model, which is computationally faster than applying the full differential fit but may be more sensitive to the initial values of the parameters. For the in vitro data, the latter two models are virtually identical apart from the fitting algorithms used; however, they require different approximations for metabolite inflow when used in vivo. A popular simplification of the model is to set the backward reaction rate constant k L to zero (19,21,22,32,33). This is argued to be a reasonable assumption, since k L is often around 1/10 of the forward rate constant k P and, depending on the model used, may not be a mathematically distinct parameter. To investigate the validity of the one-way model, the above integral solutions with k L = 0 were also applied to the data.

Ratiometric model
A ratiometric model has been suggested by Li et al (31). Briefly, when the ratio is taken of the above integral solutions, ρ can be eliminated as a parameter; this new model can be fitted to the ratio of the lactate-to-pyruvate data to solve for the rate constants k P and k L . Prior to fitting with the nlinfit function in MATLAB, ratio data was smoothed by averaging over every three data points. The lactate data was used to weight the fit towards the region in which the rate constants are stable and ρ was subsequently derived as a single unknown by inputting the k P and k L obtained from the ratiometric fit into the differential kinetic model.

Model-free methods
Fall minus rise at height α approach Pagès and Kuchel suggest a method for estimating k P and ρ from graphical features of the time course, which they call the FmR α (fall minus rise at height α) approach (23). They suggest parameter β to be the ratio of the lactate-to-pyruvate signals for the time at which the lactate signal is maximal, then show this to be mathematically equal to the product of k P and T 1 eff . Furthermore, they show that an estimate for T 1 eff can be obtained from the width of the lactate curve at a specific height α. Although they suggest it is sufficient to use a consistent value of α = 0.8, we calculated α explicitly in each case to avoid incurring a nonphysical correlation between k P and T 1 .

Lactate-pyruvate peak ratio
The in vitro T 1 eff is only affected by physical factors such as magnetic field and temperature. Since these factors are controlled for, T 1 eff is not expected to show a large variation. The above parameter β, which we re-name the L-P peak ratio, could therefore be expected to correlate with LDH concentration, and as such was investigated as a potential parameter of interest.

AUC ratio
The ratio of the lactate-to-pyruvate AUC has been shown by Hill et al. to be independent of the shape of the pyruvate inflow (24), making it an excellent candidate for analysis of in vivo data. If it is assumed that only the pyruvate has an explicit input function and that the concentration of both metabolites is zero at t = 0, then using the two-way, two-site model the AUC ratio can be shown to be proportional to k P . For in vitro data this ratio was calculated in two ways: first taking the AUC ratio of the available data (AUC data) and second fitting bi-exponential solutions extrapolated back to t = 0 (AUC fit).

Time-to-peak measurements
Finally, we suggest a new approach that examines the time difference from the start of the lactate build up to its peak, as has been used in DCE-MRI. Starting from the two-way model, the rate equations can be solved analytically to find the time at which the lactate reaches its maximum by differentiating the lactate solution, and solving for t when this is zero. The time-topeak (TTP) is then given by The TTP should show a roughly inverse correlation with LDH concentration. A bi-exponential function was fitted to the timedomain lactate data and extrapolated back to find the predicted start time of the increase in lactate signal from zero. Schematic diagrams of each of the models and model-free methods described above are presented in Figure 1a-f.

In vivo imaging
In vivo analysis of pyruvate metabolism is complicated by the flow of injected metabolites in the bloodstream, tissue diffusion and cellular transport. For full modelling of this data, a pyruvate inflow function (PIF) must be included to describe the large initial pyruvate peak as the injected bolus reaches the tissue of interest. In the interest of simplicity, each of the four model variants was fitted only to data from the pyruvate peak onwards, which in each case occurred at the third data point. Each of the model-free methods was implemented on the entire dataset. A common way to approximate the PIF is to use a box-car or trapezoidal function, and an alternative is to fit a piecewise integral solution. The latter involves splitting the pyruvate profile into two segments described by separate equations: the first for constant pyruvate inflow, and the second describing signal decay and conversion to lactate (19,(34)(35)(36). Here we have used a Heaviside step function to incorporate a continuous PIF into the pyruvate two-way differential equation with the same characteristics as a box-car function: wileyonlinelibrary.com/journal/nbm where k i is the pyruvate flow rate and t e is the time at which the inflow ceases. We compared this to the piecewise one-way integral model described by Zierhut et al. (19) in two ways: fixing t e at the peak of the pyruvate curve or fitting for it as an extra parameter.

Statistical analysis
The quantitative parameters derived using each method were tested for their correlation with the known in vitro LDH concentration. These correlations were generally linear. Both Pearson (linear) and Spearman (rank) correlation coefficients and their two-tailed p-values were calculated, so that it was possible to compare all methods directly whilst making no assumptions about the nature of the dependences. In order to determine whether one analysis approach was significantly better than another, Steiger's z-test for dependent correlations was implemented pairwise on the Pearson then Spearman coefficients produced by each method, and the two-tailed p-values calculated (significance p < 0.05) (37). In addition, a simple linear regression model was applied in each case to calculate the adjusted R 2 values both with and without a robust bi-square weighting function applied to the data. The difference in R 2 with and without the weighting, denoted the 'instability' factor, provides a crude indicator of how much the fit is dominated by outliers and therefore how robust a particular method may be. The mean, range and SD of T 1 values obtained from all phantoms taken together were also analysed for each quantitative method to assess goodness of fit. Since T 1 is expected to be highly consistent in vitro, a large range and high SD suggests that a model is not fully describing the data or is underparameterized.
To assess how well each model was able to fit the in vivo data, the corrected Akaike information criteria (AICc) were calculated (38). The AICc uses the minimized likelihood function values to examine how well each model is able to describe the data, whilst imposing a penalty for each extra free parameter to guard against overfitting; a low AICc denotes higher model accuracy. The relative likelihood that each model is correct as compared to the model with the lowest AICc was then calculated. To investigate model-free parameters in vivo, correlation coefficients and R 2 values were again calculated; however because the enzyme activity was an unknown, they were compared against k P values derived using the most likely model with ROI or POI data as appropriate. The ROI and POI results were combined when calculating correlation coefficients due to the small size of the dataset.

In vitro modelling
Thirty-three 15 ml phantoms containing 0 U (n = 2), 20 U (n = 5), 40 U (n = 6), 60 U (n = 6), 80 U (n = 6), 100 U (n = 5) and 120 U (n = 3) of LDH enzyme were imaged in sets of three, for which representative images can be seen in Figure 2a. Ringing artefacts, noise spikes and other interference were visible in some images; however, since it was of particular interest to test the quantitative analysis methods for robustness against such common artefacts, none of the data was excluded. The quantitative parameters obtained with each analysis method were tested for their correlation with the known scale of phantom LDH concentrations. The two-tailed p-values calculated from the Pearson and Spearman correlation coefficients were all highly significant (p < 0.001).
The results of the analyses are summarized in Table 1. For the model-based approaches, application of Steiger's z-test to the Pearson coefficients for the ROI data demonstrated the ratiometric method to be significantly poorer than the other three models, which were not found to be significantly different. However, the POI data showed the one-way model to be worse than the other methods tested. In both cases, the differential  lines) extracted from the 40 U phantom using the ROI method and fitted using the differential two-way kinetic model (dashed lines). (c) Correlation with LDH enzyme concentration of the forward reaction rate constants k P derived from four model variants (differential, integral, ratiometric and one-way models). Note that the scale for k P varies between plots. and integral models are better able to constrain T 1 compared to the other three methods, which suggests that they are more closely modelling the data. A fit with the differential model is shown in Figure 2b. The variation in k P between the integral and differential versions of the two-way model is a function of the algorithm used for fitting the data, but this difference was shown to be insignificant (p > 0.2). Despite the simplicity of the FmR α approach, k P values obtained from the FmR α analysis were not significantly different from those obtained by modelling methods, other than the ROI integral model.
In contrast, the model-free methods showed greater variation, with the TTP proving to be significantly better, and the L-P ratio significantly worse, than other methods tested. The AUC from the extrapolated fits also performed well, being indistinguishable from the modelling approaches. When p-values were calculated from the Spearman coefficients, the results largely supported those from the Pearson coefficients with the exception of the TTP, which performed least well. Table 2 shows full significance matrices for both ROI and POI data from the pairwise application of Steiger's z-test.
The sensitivity of k P may be examined in terms of the offsets of the fit lines at the origin. The cause of these offsets is twofold: first there is a positive skew caused by the divergent k P values produced at high LDH concentrations, and second sensitivity limitations at very low concentrations cause an initial monotonic but non-linear rise in k P with increasing LDH. The relative sensitivity of each model to these features is characterized by the difference in the two correlation coefficients; they are accounted for well by the Spearman coefficients, which allow regions of monotonic non-linear increase, but will incur a penalty with the Pearson coefficients, which enforce linearity.

In vivo modelling
Each of the four models tested in vitro were fitted to both ROI and POI derived time-course data from four rats with subcutaneously implanted tumours. An additional three models were tested which incorporated a PIF. The AICc scores and the relative likelihood of each model correctly describing the observed data were then calculated; the results of this, along with the T 1 mean, range and SD, are shown in Table 3. The T 1 values obtained were slightly higher than those previously estimated in healthy rats of about 15 s, but consistent with other rodent tumour T 1 values (19,39). The in vivo results were similar to those found in vitro, with the differential and integral models statistically indistinguishable from each other, but significantly more likely than the one-way or ratiometric models to accurately describe the data. Models incorporating a PIF were applied to a larger number of data points than the remaining models to include data prior to the pyruvate peak. For this reason, the relative likelihood values for the PIF models were calculated separately to compare just these three, as shown in Table 3.
The differential model with the Heaviside step PIF provided the lowest AICc of the three, fitting the data significantly better than the piecewise models. Interestingly, the penalty for allowing the end of inflow time t e to vary as an extra free parameter within the piecewise model was greater than the improvement this made to the fit. It has been stated by previous users of the model that t e should correspond to the injection length starting from t 0 (19,35), which in the case of the rats was around 2 s. The fitted t e calculated here was 8-10 s, which corresponded to the pyruvate peak time instead. For this reason, t e was then fixed at the pyruvate peak time, as was the centre of the downslope for the Heaviside step function used in the differential model. Figure 3b shows fits of the differential PIF model as compared with the fixed t e piecewise model to ROI time courses from one animal (Rat 1).
In the absence of a gold standard as a comparator, it was difficult to assess the model-free parameters in vivo. However the differential model with a Heaviside step PIF was used as a surrogate standard, given its low AICc. Based on comparison with these calculated k P values, the TTP significantly out-performed the other approaches and the L-P ratio performed least well, being the only method to produce no significant correlation coefficients. The AUC also correlated very well with k P , with both correlation coefficients having p < 0.005. Parameter maps for k P were generated for each of the four rats and are shown in Figure 4: in vivo parameter mappings derived from the model-based methods were largely similar, and two representative examples are shown using the one-way and two-way non-PIF integral models. Unfortunately, many of the model-free parameter mapping Table 2. Significance matrix for in vitro data from ROI data (top) and POI data (bottom). Matrix shows p-values from applying Steiger's z-test for comparing the relative strength of correlations pairwise to Pearson coefficients; the Pearson coefficients test the correlation between quantitative parameters produced by each method, and known enzyme concentration. * indicates that the better ranking (high 1-9 low) analysis method of the two is significantly so (significance level p < 0.05) ROI analysis Pearson Raw rank k P integral k P one-way k P ratiometric k P FmR α L-P ratio AUC data AUC fit TTP wileyonlinelibrary.com/journal/nbm methods were very sensitive to noise in regions of low lactate, rendering them difficult to interpret, which is an inherent limitation of some of these more simplified approaches. A visual inspection of the two sets shows that both models were able to distinguish the implanted tumours from normal tissue; however, the two-way maps appeared to better highlight the full tumour and offered higher contrast-to-noise ratio, showing increased sensitivity over the one-way model. k P values were capped at 0.25 s À1 , a value chosen to be suitably above those previously reported in rat tumours (~0.1 s À1 ) (28); only one rat (Rat 4) had an area of very high activity greater than this. The mean tumour k P and range were both higher with the two-way than the one-way model, and in general the k P maps demonstrated significant intratumour heterogeneity.

DISCUSSION
Metabolic imaging with hyperpolarized 13 C-labelled molecules is an emerging clinical tool to non-invasively detect real-time metabolism. In order to utilize this dynamic dataeither as a research technique or ultimately as a clinical toolmathematical analysis methods are required that are sensitive to the small changes in metabolism that occur during tumour growth and following treatment, whilst being insensitive to noise and artefacts. Many of the approaches used to analyse the metabolism of hyperpolarized pyruvate are analogous to the methods used in other areas of MRI research (such as DCE-MRI), as well as those used with PET. We have performed a comprehensive analysis of the main quantitative techniques used to analyse hyperpolarized data, as well as some novel approaches, with the aim of determining which are the most appropriate as research tools and which may have potential for clinical application in the future. Here we have used imaging data, rather than spectroscopic data, as this will be more applicable to patient studies, and have compared two methods of extracting the dynamic time-course data from these images. The approaches we studied can be divided into two groups: model-based and simpler model-free analyses. Of the four kinetic model variants, the differential and integral two-way models performed best, showing strong correlation with in vitro LDH enzyme concentration and producing the most accurate modelling in vivo (as determined by a low AICc). Furthermore, the T 1 values produced by these models, which we expect to be constant in vitro, had the smallest ranges and SDs of those tested, implying good parameterization. The one-way model is a very popular approach because it allows for the use of Michaelis-Menten kinetics in order to solve for real, as opposed to apparent, reaction rate constants. However, this requires an assumption that is not fulfilled by the hyperpolarized exchange reaction: that the reaction has reached chemical equilibrium (40). The one-way model was found to be unstable at higher enzyme concentrations within the physiological range, producing divergent k P values almost three times higher than those given by the two-way models (Fig. 2c). This divergence was also seen in some of the in vivo data sets (Fig. 4, Rat 4), and is likely to be a consequence of the approximation k L = 0, which becomes less valid with increasing enzyme activity or in the presence of large pools of lactate that are freely exchanging with the pyruvate. A previous comparative study by Harrison et al. (41) demonstrated the shortcomings of the one-way model by showing that results from fitting to hyperpolarized data were incompatible with those from mass spectrometry data. This suggests an inconsistency between the model and the underlying biology; the breakdown of the model at high enzyme activity that we have observed here provides further evidence against the use of the one-way model. We also examined the effect of explicitly modelling the pyruvate inflow within both the differential and integral (piecewise) models. Kazan et al. suggest modelling the PIF with a gamma-variate function, which, although an excellent descriptor for the flow profile, requires accurate measurement of the arterial input function either with an invasive arterial line, or by image acquisition from a large vessel, which may be difficult to obtain from human hyperpolarized data (36). We therefore looked at incorporating simpler functions. From calculations of the AICc, the differential model with a Heaviside step PIF was significantly more likely to correctly describe the data than either variant of the piecewise model. The fitted pyruvate inflow end-time t e in this model correlated poorly with the measured injection length; this mismatch is indicative of the difficulty in fitting theoretical arterial input functions to hyperpolarized data with a low SNR and temporal resolution. Complicated inflow functions introduce several additional free parameters, which overfit noisy data and increase the error of the derived rate constants. With the current resolution limitations for clinical hyperpolarized imaging, simple approximations for pyruvate inflow, such as those used in this study, are therefore required.
The above methodologies are desirable when accurate fitting is required for research purposes; however, kinetic modelling can be computationally intensive and time consuming. We tested four model-free methods as candidates for providing simple, robust parameters that are fully representative of the metabolic exchange reaction. Such approaches may allow easy intra-patient and inter-patient comparison across clinical sites. The FmR α method described by Pagès and Kuchel (23) provided a better correlation with LDH concentration and tighter constraints on T 1 than two of the modelling methods. However, in our study it was necessary to calculate α explicitly in each case, rather than using a constant value as suggested by the authors, to produce realistic results. Simpler still, and producing similar statistical results, is the AUC ratio described by Hill et al. (24), which is independent of the PIF. The lactate-to-pyruvate ratio at the time of maximum lactate signal was consistently the poorest method for describing the data. The final approach using TTP showed the strongest correlation with enzyme concentration in vitro of any analysis and the strongest correlation with k P values derived using the differential PIF model in vivo of any model-free method.
The small size of the in vivo dataset means it is not possible to claim significance of the TTP over the AUC, and so further testing of these two methods on larger in vivo data sets is required. Nonetheless, these preliminary results along with the in vitro results suggest that the TTP may prove to be a very robust quantitative marker that is able to provide an equivalent assessment of metabolism to fitting a full PIF kinetic model. Finally, we looked at two methods for extracting dynamic timecourse data from the images and assessed them for their effect on quantification. Data extracted using the ROI method provided universally stronger correlations with the in vitro enzyme concentrations than the POI method, although this was not significant in the majority of cases. Averaging results over a tumour ROI may provide a more sensitive measure of metabolic change in that region which is robust to experimental noise; however, it tells nothing of the local heterogeneity nor does it allow for tumours outside the region to be detected in the way that pixel-by-pixel analysis could. Furthermore, determining the anatomical limits of a tumour is subjective, and therefore difficult to reproduce accurately, even if performed by the same operator using the same dataset (42,43). Our findings suggest that extracting data from a single POI is sufficiently robust to provide a metric that reports on the whole area, and may provide a simple and reproducible method for analysing the data objectively over time. This approach is analogous to the SUV max used routinely in the analysis of clinical PET data. Under normal conditions, the POI (representing the highest lactate concentration over time) will lie within the boundaries of the tumour. Although unlikely, theoretically the POI could lie outside the tumour margin if it was so poorly vascularized that the time required to perfuse the tumour was long compared with the half-life of the polarized pyruvate signal, or if the systemic lactate level was very markedly elevated. Since single-pixel analysis was shown to be sufficiently robust, parameter mapping of k P was performed in the rats. This showed marked intratumoral heterogeneity, with some areas of low exchange within the tumours and regions of high activity extending beyond the tumour boundary. Due to the spatial resolution of the 13 C images, it is not possible to determine whether these findings are due to artefacts from partial voluming or whether there is underlying biological heterogeneity due to changes in metabolism or vascularity, although a highly heterogeneous vasculature is a known feature of the subcutaneous tumour type studied here (44).
The rate constant k P is generally considered to be the gold standard for quantification of hyperpolarized data, but several of the simpler non-model-based approaches used here appeared to provide a good approximation to the reaction. A good probe for tumour biology should be repeatable, reproducible, quantitative, highly sensitive and highly specific. The results of this study show that k P is a robust biomarker of LDH activity. However, at low LDH concentrations the modelling is more susceptible to noise, and at high LDH concentrations greater divergence was demonstrated, which may relate to undersampling. Importantly, this work has shown that these modelling methods performed well in the physiological range, and further research is required from human studies to validate this. Variation in k P may occur secondary to other factors such as SNR, which is dependent on polarization (45,46), pyruvate concentration (35) and imaging protocol (47). Therefore the optimization and standardization of each of these is an important step for multi-centre comparison, or if k P is to be considered as a clinical tool.
In conclusion, for accurate kinetic analysis of the hyperpolarized pyruvate-lactate exchange reaction in vivo, the two-way differential model with a Heaviside step PIF centred on the pyruvate peak was best able to characterize the data with the fewest free parameters. If the data prior to the pyruvate peak is not included in the analysis, the two-way differential and integral models performed equally well and were significantly better than the one-way model in the presence of high enzyme activity or when applied to the pixel-by-pixel analysis. As a simple parameter for clinical quantification of hyperpolarized imaging data, the TTP performed best both in vitro and in vivo, providing excellent correlation with model-derived k P values. Extracting data from an averaged ROI may provide the most sensitivity to small changes in metabolism; however, the POI approach is sufficiently robust to be applied pixel-by-pixel, allowing tumour heterogeneity to be probed. This work provides a basis for analysing data from future human trials in hyperpolarized 13 C imaging.