For phytoplankton and other microorganisms, the Michaelis-Menten (MM) equation is most commonly applied to describe the dependence of uptake rate on nutrient concentration [Dugdale, 1967; Harrison et al., 1996]:
where VMM is the uptake rate, S is the nutrient concentration, and Ks is the MM half-saturation constant for nutrient S. This equation is often combined with Arrhenius-type [Goldman and Carpenter, 1974] or similar exponential [Eppley, 1972] temperature dependence for Vmax.
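As a concrete reference for the notation above (equation (1) itself is not reproduced in this excerpt), the standard MM dependence of uptake rate on concentration can be sketched as follows; the function name is illustrative:

```python
def uptake_mm(S, Vmax, Ks):
    """Michaelis-Menten uptake rate: V = Vmax * S / (Ks + S)."""
    return Vmax * S / (Ks + S)
```

At S = Ks the uptake rate is Vmax/2, which is the defining property of the half-saturation constant.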
 Compared to equation (1), affinity-based kinetics provides a more natural and theoretically well-founded representation of uptake [Aksnes and Egge, 1991]. Affinity, A, is defined as the initial slope of uptake rate versus concentration at low nutrient concentrations [Healey, 1980], so that:
Aksnes and Egge [1991] showed that MM kinetics is equivalent to affinity-based kinetics under the assumption of fixed physiology (no acclimation in response to changing nutrient concentrations); i.e., for constant Vmax and A, equation (2) is mathematically equivalent to equation (1). Furthermore, equation (2) with temperature dependent Vmax and A is equivalent to equation (1) with temperature dependent Vmax and Ks. If Vmax and A share identical temperature dependence, Ks must be independent of temperature.
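The stated equivalence can be checked numerically. The affinity-based form below is a sketch of equation (2) assuming the standard Aksnes-Egge formulation (an assumption, since the equation is not reproduced in this excerpt); it reproduces MM kinetics exactly when A = Vmax/Ks:

```python
def uptake_mm(S, Vmax, Ks):
    # Michaelis-Menten form, equation (1)
    return Vmax * S / (Ks + S)

def uptake_affinity(S, Vmax, A):
    # Affinity-based form (assumed shape of equation (2));
    # equivalent to MM kinetics with Ks = Vmax / A
    return Vmax * A * S / (Vmax + A * S)
```

Substituting A = Vmax/Ks into the second expression and simplifying recovers the first, which is the equivalence for fixed physiology.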
 However, experiments with various single-species cultures have found temperature dependent Ks for uptake of nitrogen, phosphorus and silicon [Eppley et al., 1969; Dauta, 1982]. Therefore, I examine the possibility of distinct temperature sensitivities for Vmax and A, by defining energies of activation, Ea,V and Ea,A, respectively, such that:
where T is temperature in K, R is the gas constant, and Vmax,r and Ar are the values of Vmax and A, respectively, at reference temperature Tr. If Ea,A = Ea,V, one set of Arrhenius terms cancels out after substitution into equation (2), leaving only one Arrhenius term in the numerator, which is equivalent to the widely applied assumption of temperature dependence only for Vmax [Goldman and Carpenter, 1974].
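A minimal sketch of the Arrhenius scaling in equations (3a) and (3b); the function and argument names are illustrative, not from the original code:

```python
import math

def arrhenius(value_at_Tr, Ea_over_R, T, Tr=293.0):
    """Arrhenius scaling relative to a reference temperature Tr (K):
    value(T) = value(Tr) * exp(-(Ea/R) * (1/T - 1/Tr))."""
    return value_at_Tr * math.exp(-Ea_over_R * (1.0 / T - 1.0 / Tr))
```

Because Ks = Vmax/A, applying this scaling with Ea,A = Ea,V to both Vmax and A cancels the exponential factors, so Ks is then independent of temperature, as stated above.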
 Optimal Uptake (OU) kinetics extends equation (2) to include a physiological trade-off, whereby phytoplankton allocate internal resources to increase either Vmax or A, at the expense of reducing the other [Pahlow, 2005; Smith et al., 2009]. V0 and A0 are defined as the potential maximum values of Vmax and A, respectively, and their actual values are determined by physiological acclimation to the ambient concentration of growth-limiting nutrient. The data examined here [Harrison et al., 1996] were from typical short-term uptake experiments [Harrison et al., 1989], in which a series of incubations with graded nutrient additions is conducted for each sample taken from water having ambient nutrient concentration, Sa. Assuming that phytoplankton were pre-acclimated to Sa, OU kinetics gives the following equations for the dependence on Sa of affinity-based parameters, as measured by short-term experiments during which the phytoplankton do not have time to acclimate [Smith et al., 2009; Smith, 2010]:
Note that Sa, the ambient concentration in the ocean, is not the concentration S in the short-term incubation experiments using graded nutrient additions. This predicts that such short-term experiments will measure values of Vmax that increase with Sa and values of A that decrease with increasing Sa. For temperature dependence with OU kinetics I apply Arrhenius terms for V0 and A0, respectively, by defining Ea,V, Ea,A, V0,r and A0,r, exactly analogous to the above treatment for Vmax and A with MM kinetics equations (3a) and (3b).
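As an illustrative sketch only: equations (4) and (5) are not reproduced in this excerpt, so the short-term OU expressions below follow the forms given by Smith et al. [2009] and should be treated as an assumption as written. They do reproduce the qualitative behavior stated above (measured Vmax increasing and measured A decreasing with Sa):

```python
import math

# Assumed short-term OU forms (after Smith et al. [2009]); check the
# exact expressions for equations (4) and (5) against the original paper.
def vmax_short_term(Sa, V0, A0):
    return V0 / (1.0 + math.sqrt(V0 / (A0 * Sa)))

def affinity_short_term(Sa, V0, A0):
    return A0 / (1.0 + math.sqrt(A0 * Sa / V0))
```

Both expressions approach their potential maxima (V0 and A0, respectively) only in opposite limits of Sa, which expresses the trade-off in resource allocation.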
 Data were those of Harrison et al. [1996] as rearranged by Smith [2010] to match observed ambient temperatures and nitrate concentrations to the reported values of Vmax (n = 60) and Ks (n = 48) obtained from their short-term (∼3 h) incubation experiments. At each of the locations, which spanned the North Atlantic Ocean, graded nutrient additions were made to separate bottles containing sampled seawater, which were then incubated shipboard at ambient temperature in order to measure nutrient uptake rates. They calculated parameters of the MM equation for nutrient uptake by fitting to the data obtained locally, i.e., to the set of experiments conducted at each location. I calculate values of affinity as A = Vmax/Ks. For the set of short-term experiments conducted with each ambient water sample, the MM equation described the shape of the uptake response well, albeit with different values of Vmax and Ks for different water samples [Harrison et al., 1996]. Either equation (1) or equation (2) describes this same shape, the only difference being whether Ks or A is employed.
2.3.1. General Approach
 The Adaptive Metropolis (AM) algorithm [Haario et al., 2001; Laine, 2008] yields a consistent Bayesian statistical interpretation of the data set as a whole, providing a way to disentangle the combined effects of temperature and nutrient concentration. I chose to fit the affinity-based equation (2) to the data, because this equation allows a concise representation of both MM and OU kinetics, whereas expressing OU kinetics in terms of Ks using equation (1), although possible, is cumbersome and counter-intuitive.
 Two cases are examined: (1) the ‘Affinity model’ assuming no physiological acclimation (equivalent to MM kinetics) and (2) the ‘OU model’ assuming physiological acclimation according to OU kinetics. Arrhenius-type temperature dependence was assumed for maximum uptake rate and affinity, respectively.
 For the Affinity model, the temperature dependent expressions for Vmax and A were fitted to the respective data values, using the corresponding values of ambient temperature as independent variables. For OU kinetics, the temperature dependent expressions for V0 and A0 were substituted into the short-term approximations for their dependence on ambient nutrient concentration, equations (4) and (5), respectively. The resulting equations were fitted to the data in the same way as for the Affinity model.
2.3.2. Adaptive Metropolis Algorithm
 The Adaptive Metropolis (AM) algorithm [Haario et al., 2001; Laine, 2008], including Gibbs sampling to estimate the distribution of the standard error (variance) of each observation type [Laine, 2008], was used to fit each set of equations to the data. This algorithm is for the most part automatic and non-parametric; i.e., there are few arbitrary constants to be adjusted by the user. This provides a consistent comparison of each model, respectively, with the data set as a whole.
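The core of the AM algorithm can be sketched as follows. This is a minimal illustration of the adaptation rule of Haario et al. [2001], not the FORTRAN90 implementation used here; the constants t0 and eps, and the function names, are illustrative:

```python
import numpy as np

def adaptive_metropolis(log_post, theta0, n_steps=5000, t0=100, eps=1e-8, rng=None):
    """Minimal Adaptive Metropolis sketch (Haario et al., 2001).

    After an initial period t0, the Gaussian proposal covariance is
    adapted from the accumulated chain history, scaled by the standard
    factor s_d = 2.4**2 / d; a small eps * I term keeps it nonsingular.
    """
    rng = np.random.default_rng() if rng is None else rng
    d = len(theta0)
    s_d = 2.4 ** 2 / d
    chain = np.empty((n_steps, d))
    theta, lp = np.asarray(theta0, float), log_post(theta0)
    cov = np.eye(d)  # initial proposal covariance
    for t in range(n_steps):
        if t > t0:  # adapt from the chain history so far
            cov = s_d * np.cov(chain[:t].T) + s_d * eps * np.eye(d)
        prop = rng.multivariate_normal(theta, cov)
        lp_prop = log_post(prop)
        if np.log(rng.random()) < lp_prop - lp:  # Metropolis accept/reject
            theta, lp = prop, lp_prop
        chain[t] = theta
    return chain
```

The few user-set constants (t0, eps, initial covariance) are what the text means by the algorithm being "for the most part automatic."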
 In this application, Gibbs sampling provides weights for each data type, based on the mismatch between model and data, so that the ensemble of the fitted model output (posterior distribution) matches the distribution of the data. Specifically, if the model-data mismatch (residuals) for each data type o is a normally (Gaussian) distributed random variable with mean zero and variance σo2, and the prior estimate of 1/σo2 is assumed to have a Gamma distribution, then the conditional distribution of each 1/σo2 (given the data and model) is also a Gamma distributed random variable [Carlin and Louis, 1996; Gelman et al., 2004]. Here Gibbs sampling exploits this property, called conjugacy of the prior and conditional distributions, to sample the posterior distribution of 1/σo2 based on its prior estimate together with information about the distribution of model-data mismatch, which comes from the unweighted sum of squared residuals [Carlin and Louis, 1996, chapter 5; Gelman et al., 2004, chapter 14].
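A sketch of the conjugate Gibbs update just described. The parameterization in terms of n0 and S0 (the Gibbs-sampling parameters introduced in section 2.3.6) follows Laine [2008] and is an assumption as written:

```python
import numpy as np

def sample_inv_variance(ss, n_obs, n0=1.0, S0=0.01, rng=None):
    """Conjugate Gibbs draw for 1/sigma^2 of one observation type,
    given its unweighted sum of squared residuals ss over n_obs points.

    Assumed prior: 1/sigma^2 ~ Gamma(n0/2, rate = n0 * S0**2 / 2);
    by conjugacy the conditional posterior is also Gamma.
    """
    rng = np.random.default_rng() if rng is None else rng
    shape = 0.5 * (n0 + n_obs)
    rate = 0.5 * (n0 * S0 ** 2 + ss)
    return rng.gamma(shape, scale=1.0 / rate)
```

With many data points the draw concentrates near n_obs/ss, i.e., near the empirical precision, so poorly fit data types receive large σo and thus low weight.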
 Output includes the sampled distribution of values for each parameter value fitted and distributions of the standard errors for each data type. Combining these gives the predicted range within which observations should lie, assuming the model is correct [Gelman et al., 2004; Laine, 2008].
2.3.3. Likelihood Function
 The likelihood is calculated as in work by Laine [2008], based on the probability density function of the Gaussian distribution. It includes a prior component (for deviations of parameter values from their prior expected values) and a term based on the sum of squared differences between the model and data. The log likelihood is thus:
where Cp is the prior covariance matrix (uncertainty in the prior estimates), np is the number of parameters fitted, δp = θ − η is the vector of deviations of parameter values θ from their prior values η, No is the number of data points for each observation type o, and SSo is the unweighted sum of squared errors for data type o. Previous studies found that for fits to these data a log transformation is required to make the distribution of residuals approximately Gaussian (Normal) [Smith et al., 2009; Smith, 2010]. Therefore, SSo is calculated as:
where yi is the ith observed value (here, of either Vmax or A) and f(xi, θ) is the modeled value as a function of the corresponding independent variables xi (in this case, ambient temperature and nitrate concentration) and the parameter values θ.
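The computation described by equations (6) and (7) can be sketched as follows (illustrative Python, not the original FORTRAN90). The additive constants are included for completeness, although only differences in log likelihood matter in the Metropolis acceptance ratio:

```python
import numpy as np

def log_likelihood(theta, eta, Cp_inv, data, model, sigma):
    """Log likelihood with a Gaussian prior on parameters and
    log-transformed residuals (sketch of equations (6) and (7)).

    data, model and sigma are dicts keyed by observation type o:
    observed values, modeled values f(x_i, theta), and standard error.
    """
    dp = np.asarray(theta) - np.asarray(eta)  # deviations from prior values
    ll = -0.5 * dp @ Cp_inv @ dp              # prior component
    for o, y in data.items():
        resid = np.log(y) - np.log(model[o])  # residuals in log space
        sso = np.sum(resid ** 2)              # unweighted SS, equation (7)
        n_o = len(y)
        ll += -0.5 * n_o * np.log(2.0 * np.pi * sigma[o] ** 2) \
              - sso / (2.0 * sigma[o] ** 2)
    return ll
```

Because SSo is formed in log space, the standard errors σo sampled by the Gibbs step are likewise in log space, consistent with section 2.3.6.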
2.3.4. Tests of the Algorithm and Robustness of Results
 Identical twin tests confirmed that the algorithm functions correctly and is able to constrain the fitted parameters based on this data set. Fits to reduced data sets confirmed that the complete data set is more than sufficient to draw the conclusions herein. Methods and results are described in the auxiliary material.
2.3.5. Model Selection
 As in other Bayesian methods, the likelihood provides a relative measure of goodness of fit, and the Akaike Information Criterion (AIC) [Akaike, 1974] further accounts for the trade-off between bias and variance (roughly interpretable as accuracy versus complexity) when comparing models having different numbers of parameters. Thus the application of AIC to compare models in a Bayesian context is analogous to the application of ANOVA in a frequentist context to compare linear regression models having different numbers of parameters. Here I calculate AICc, which is the AIC corrected for the effects of sample size [Burnham and Anderson, 1998; Anderson et al., 2000], using the ensemble mean log likelihood (log L) for each fitted model, respectively:
where p is the number of parameters fitted for each model, respectively, and N is the total number of observations.
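The AICc computation, using the standard small-sample correction term 2p(p + 1)/(N − p − 1) (the equation itself is not reproduced in this excerpt, so this standard form is an assumption):

```python
def aicc(mean_log_likelihood, p, N):
    """AICc = -2 log L + 2 p + 2 p (p + 1) / (N - p - 1),
    i.e., the AIC plus the small-sample correction term."""
    return -2.0 * mean_log_likelihood + 2.0 * p \
        + 2.0 * p * (p + 1) / (N - p - 1)
```

As N grows the correction term vanishes and AICc converges to the plain AIC.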
 AIC provides only a relative comparison of models; i.e., its absolute value for any particular model is not meaningful, but only differences between models. The model with the lowest AIC is best, and differences in AIC for each model, m, are calculated as:
where ‘min’ denotes the minimum value over the total number of models considered (M). Although ΔAIC,m alone can be taken as an approximate measure of the relative support for model m compared to the best model, a quantitative measure in terms of relative probability (of each model, respectively, given the observations) is provided by the Akaike weights [Burnham and Anderson, 1998; Anderson et al., 2000]:
Here I adopt the criterion wm > 0.95 for accepting model m, and conversely wm < 0.05 for rejection. This is not a hypothesis test as widely applied in frequentist statistics, but instead a relative ranking of the probabilities of all models considered given the observations [Anderson et al., 2000]. As an alternative, the Bayesian Information Criterion (BIC) could instead be used to calculate these weights, which would tend to favor more parsimonious models more strongly than does the AIC [Schwarz, 1978; Congdon, 2001].
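The ΔAIC,m values and Akaike weights can be sketched as (the weight formula is the standard one of Burnham and Anderson [1998]):

```python
import math

def akaike_weights(aic_values):
    """Akaike weights: w_m = exp(-0.5 * Delta_m) / sum_m' exp(-0.5 * Delta_m'),
    where Delta_m = AIC_m - min(AIC)."""
    amin = min(aic_values)
    rel = [math.exp(-0.5 * (a - amin)) for a in aic_values]
    total = sum(rel)
    return [r / total for r in rel]
```

The weights sum to one over the set of models considered, so each wm is interpretable as the relative probability of model m given the observations.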
2.3.6. Parameters for the Algorithm
 The prior covariance matrix, Cp, for parameters was chosen not to be very restrictive, by assuming a coefficient of variation of 2.0 for each parameter estimate, respectively. Thus, the diagonal element (variance) corresponding to each parameter was assumed to be the square of double the prior estimate of that parameter. Non-diagonal terms were taken to be zero (no cross-correlations). Prior estimates for temperature sensitivity, Ea/R, were taken as 5.7 × 103 K, which for a 10°C increase in temperature from 283 K to 293 K corresponds to a doubling of rate. Prior estimates for Vmax and A at the reference temperature, Tr = 293 K, were taken as the maximum observed values of each parameter, respectively.
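The construction of Cp described above reduces to a simple diagonal matrix (a sketch; the function name is illustrative):

```python
import numpy as np

def prior_covariance(prior_estimates):
    """Diagonal prior covariance with coefficient of variation 2.0:
    each variance is (2 * prior estimate)**2; no cross-correlations."""
    p = np.asarray(prior_estimates, float)
    return np.diag((2.0 * p) ** 2)
```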
 For the Gibbs sampling [Laine, 2008], parameter n0, which represents the prior uncertainty of observations, was taken as unity, and parameter S0, the prior mean for each σ, was taken as 0.01 based on trial fits. The results of fits were not sensitive to the specific choice of these parameters for any S0 ≲ 1. (For larger values of S0 the fits were not constrained because the resulting large values of σ gave very low weights to the data.) Because the sum of squares is calculated in log space as per equation (7) above, the algorithm samples the distribution of standard errors in log space as well.
2.3.7. Numerical Calculations
 Uniform (0, 1) pseudo-random numbers were generated using the Mersenne Twister algorithm, as originally coded by Takuji Nishimura in 1997 and later translated to FORTRAN90 by Richard Woloshyn in 1999. From these, the Gaussian (Normal), multivariate Gaussian, and Gamma-distributed pseudo-random numbers needed for the AM algorithm were generated using the algorithms of Gentle, which I coded into FORTRAN90. Cholesky decompositions and matrix inversions were calculated using the CHOLESKY [Healy, 1968b] and SYMINV [Healy, 1968a] routines, respectively, as coded into FORTRAN90 by John Burkardt in 2008.
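For illustration, the role of the Cholesky decomposition here is to turn independent Gaussian deviates into correlated multivariate Gaussian proposals. A generic sketch (not the cited FORTRAN90 routines):

```python
import numpy as np

def mvn_draw(mean, cov, rng):
    """Multivariate Gaussian draw via Cholesky factorization:
    with C = L L^T and z ~ N(0, I), x = mean + L z has covariance C."""
    L = np.linalg.cholesky(cov)           # lower-triangular factor of cov
    z = rng.standard_normal(len(mean))    # independent standard normals
    return np.asarray(mean) + L @ z
```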