Radiative transfer is sufficiently well understood that its parameterization in atmospheric models is primarily an effort to balance computational cost and accuracy. The most common approach is to compute radiative transfer with the highest practical spectral accuracy but infrequently in time and/or space, though errors introduced by this approximation are difficult to quantify. An alternative is to perform spectrally sparse calculations frequently in time using randomly chosen spectral quadrature points. Here we show that purely random quadrature points, though effective in some large-eddy simulations, are not a good choice for models in which the land surface responds to radiative fluxes, because surface temperature perturbations can be large enough, and persist long enough, to affect model evolution. These errors may be mitigated by choosing teams of spectral points designed to limit the maximum surface flux error; teams, rather than individual quadrature points, are then chosen randomly. The approach is implemented in the ECHAM6 global model and the results are examined using “perfect-model” experiments on time scales ranging from a day to a month. In this application the approach introduces errors commensurate with those from infrequent broadband calculations at the same computational cost. But because team size need not grow with the number of quadrature points, and teams may indeed become better balanced with increased spectral density, improvements in radiative transfer need not be traded off against spatiotemporal sampling.
1. What Does It Mean for a Parameterization to Be Accurate?
 Unlike many closure problems faced in models of the atmosphere, the environmental factors that control the distribution of radiation in the atmosphere are very well understood, so the solution to fully specified problems is known to great accuracy. Radiation parameterizations therefore seek primarily to find an acceptable compromise between accuracy and computational cost. The accuracy of radiative transfer calculations may be measured via comparison to benchmark models [Oreopoulos et al., 2012] which are themselves known to be in excellent agreement with observations [Mlawer et al., 2000; Turner et al., 2004]. Comparisons are normally made for clear-sky conditions, consistent with the way the parameterizations of absorption by gases are developed.
 State-of-the-art radiation parameterizations can reproduce benchmark calculations to within 1% for shortwave fluxes and fractions of a percent for longwave fluxes [Oreopoulos et al., 2012] but this accuracy is so computationally expensive that radiation parameterizations cannot be applied at every time step of the model. Instead, radiative heating and cooling rates are normally updated less frequently than are model dynamics and, in most cases, other physical parameterizations. The choice to update radiative heating rates less frequently than other fields is an approximation made, not in the radiation parameterization, but in the coupling to the rest of the model. The simulation errors caused by this approximation may range from modest changes in temperature fields [Xu and Randall, 1995; Morcrette, 2000] to the introduction of more dramatic instabilities [Pauluis and Emanuel, 2004] but are generally difficult to quantify. To minimize simulation errors prudence dictates that the radiation time step be as close to the dynamical time step as can be afforded, although precisely how close is a subjective choice.
 Several approaches have been developed to accelerate the calculation of radiative fluxes to allow for more frequent calculation. One is to use physically based radiative transfer models to train fast statistical models (normally artificial neural networks) to emulate fluxes based on the state of the atmosphere [e.g., Chevallier et al., 1998; Krasnopolsky et al., 2008]. An intermediate tactic is to apply physical models sparsely in space and/or time, use simple statistical models (e.g., regression) to predict changes since the last radiation time step, and selectively update calculations based on the expected error [Venema et al., 2007]. A third alternative exploits two facts: that cloud properties vary much more quickly in the atmosphere than does the concentration of gases, and that variations in clouds and gases affect fluxes in different, roughly disjoint spectral regions. Together these motivate updating only the cloud-affected portions of the spectrum at high frequency [Manners et al., 2009]. (Räisänen and Barker and Hill et al. take a similar approach to the related problem of representing cloud variability.) The calculation of cloud-affected fluxes can be further accelerated by reducing the spectral detail used to treat absorption by gases [Manners et al., 2009].
 Each of these methods, including infrequent radiation calculations, represents an approximation that introduces errors in radiative heating rates. These errors depend on many factors, including how quickly the optical properties of the atmosphere are changing. But the error characteristics of an approximation can be crucially important in determining whether the approximation affects model evolution. Since radiative fluxes at the top of the atmosphere are essentially in balance (after accounting for ocean heat storage), for example, even small (<1 W/m²) biases in radiative fluxes affect multidecadal simulations and must be “tuned” away [Mauritsen et al., 2012] and/or balanced by compensating errors. Random, uncorrelated noise, on the other hand, does not affect the statistical evolution of most models, whether that noise comes from parameterizations of gravity wave drag [Eckermann, 2012; Lott et al., 2012] or radiation [Pincus et al., 2003] or is externally applied in an effort to diversify ensembles of medium-range forecasts [Buizza et al., 1999]. For the purposes of parameterization development this implies that unbiased algorithms, even if they introduce quite substantial noise in heating rates, can be more accurate, in the sense of introducing smaller changes in model evolution, than other approximations, including detailed algorithms used infrequently.
 Here we describe an approach to radiative transfer parameterization that emphasizes the accuracy relevant for hydrodynamic models, considering both the radiation calculations themselves and the ways those calculations are coupled to the rest of the model. The approach takes advantage of the local homogenization of heating rates arising from small-scale fluid dynamical processes. We have implemented these ideas in a new radiation package, PSrad (so named because it is a postscript to the RRTMG package from which it descends), which we have initially implemented in the ECHAM climate model. PSrad is unique in that it allows a small sample of the full broadband spectral integration to be performed, with the idea that these calculations should be performed at every time step. This spectral sampling introduces grid-scale noise in radiative fluxes, as does the more common use of stochastic samples to represent the subgrid-scale distribution of cloud properties [Pincus et al., 2003]. Experiments show that ECHAM is insensitive to even large grid-scale perturbations to radiative heating rates within the atmosphere, but that significant perturbations in surface fluxes can introduce systematic biases in the model trajectory. Simulation bias can be limited by bounding errors in surface fluxes using carefully selected subsets of the broadband calculation. The approach is applicable to dynamical models at all scales even as significant noise is introduced into individual calculations.
2. Strategies for Spectral Integration
 Models of the atmosphere require broadband radiation calculations, i.e., those that account for all wavelengths of radiation emitted by the sun or by the earth and its atmosphere. In parameterizations of radiative transfer this spectral integration is accomplished using weighted sums

F = Σ_{g=1}^G w_g F_g,    (1)
where the individual fluxes Fg are computed using optical properties and boundary conditions appropriate to each pseudospectral interval (quadrature point). These quadrature points are frequently determined using k-distributions [Fu and Liou, 1992; Lacis and Oinas, 1991]; following this nomenclature we refer to these intervals as “g-points”. In the shortwave the weights wg are determined by the spectral distribution of incoming solar energy while in the longwave they depend on local temperature through the Stefan-Boltzmann relation.
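As an illustration, the weighted sum over quadrature points amounts to a dot product; the weights and fluxes below are hypothetical stand-ins, not values from any actual k-distribution.

```python
import numpy as np

def broadband_flux(flux_by_gpoint, weights):
    """Broadband flux as the weighted sum over all G g-points.

    flux_by_gpoint: pseudospectral fluxes F_g, shape (G,)
    weights:        quadrature weights w_g, shape (G,)
    """
    return np.dot(weights, flux_by_gpoint)

# Toy example with G = 4 g-points (hypothetical values)
F_g = np.array([100.0, 250.0, 400.0, 50.0])
w_g = np.array([0.4, 0.3, 0.2, 0.1])
F = broadband_flux(F_g, w_g)  # 40 + 75 + 80 + 5 = 200.0 W/m^2
```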
 In most models of the atmosphere radiative fields are updated less frequently than other variables, i.e.,

F(t) ≈ c(t) Σ_{g=1}^G w_g F_g(t_r),    (2)

where F_g(t_r) are the pseudospectral fluxes from the most recent radiation time step t_r and c represents correction factors that may be applied to account for time-varying solar zenith angles, surface temperatures, etc. In some implementations [e.g., Morcrette, 2000] spatial resolution may also be reduced.
2.1. Monte Carlo Spectral Integration
 Infrequent broadband calculations (e.g., equation (2)) can be described as a “spectrally dense, temporally sparse” approach to computing radiative transfer. Monte Carlo Spectral Integration [MCSI; see Pincus and Stevens, 2009] reverses these densities:

F ≈ (G/G′) Σ_{i=1}^{G′} w_{g_i} F_{g_i},    (3)

where each of the G′ samples g_i is chosen randomly with replacement at each location and time step.
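A minimal sketch of this kind of estimator, assuming uniform sampling with replacement over made-up weights and fluxes, illustrates why spectral sampling is unbiased even though individual estimates are noisy:

```python
import numpy as np

rng = np.random.default_rng(0)
G, G_prime = 16, 2                      # 16 g-points, G' = 2 sampled per step
w_g = rng.random(G); w_g /= w_g.sum()   # hypothetical quadrature weights
F_g = 400.0 * rng.random(G)             # hypothetical per-g-point fluxes
F_true = np.dot(w_g, F_g)               # broadband flux, equation (1)

# Draw G' g-points uniformly with replacement for many independent
# "time steps"; the factor G/G' makes each estimate unbiased.
idx = rng.integers(0, G, size=(200_000, G_prime))
estimates = (G / G_prime) * np.sum(w_g[idx] * F_g[idx], axis=1)

# Individual estimates are noisy, but their mean converges to equation (1)
mean_est = estimates.mean()
```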
 MCSI was initially introduced for large-eddy simulation, where it has the advantages of being consistent (i.e., converging with increasing temporal and/or spatial resolution, as do the other approximations used in LES) and of explicitly sampling temporal variability, especially in cloud optical properties. It takes advantage of the fact that local fluid instabilities homogenize sampling noise on small scales while, on the larger spatial and temporal scales where heating rates can affect the overall evolution of the flow, sampling noise is small. In several applications, including purely radiatively driven flow, LES using MCSI is statistically indistinguishable from LES using benchmark radiation calculations (e.g., equation (1)) [Pincus and Stevens, 2009].
 MCSI implemented similarly in a global model introduces much larger and more systematic errors. The green line in Figure 1 shows the global root-mean-square (RMS) difference in 2 m air temperature as a function of forecast lead time between a reference calculation that computes broadband radiation at every time step and every grid point and one using the same radiation code (described in section 'PSrad/RRTMG: A New Radiation Code for Climate Models') but applying equation (3) with G′ = 1. The figure shows the average over 29 independent forecasts (see section 'Assessing Approximation Impacts in a Global Model'). RMS differences with respect to the reference forecast exceed 1.5 K after the first day and grow over time. A ∼ 0.5 K diurnal cycle tracks the diurnal variation in global mean 2 m temperature and occurs because the land surface is not homogeneously distributed over Earth's surface.
 Why is the MCSI approximation accurate (in the sense of not disturbing the flow) in large-eddy simulations but not in a global model? There are at least two significant distinctions. First, the parameterizations used in global models, especially those for deep and shallow convection, depend more nonlinearly on the atmospheric state than do the simple subgrid-scale models for turbulence and microphysics used in LES, so even unbiased random sampling noise can bias the flow through nonlinearities. More importantly, surface properties are frequently fixed in large-eddy simulations, while global models almost always include land surfaces whose temperature changes in response to surface fluxes. Perturbations to the surface temperature caused by sampling errors are not homogenized by mixing with neighboring columns, while errors within the atmosphere are mixed by the fluid flow. The impact of surface temperature perturbations dominates: on an aquaplanet with globally specified sea surface temperatures (Figure 1, purple line), differences between simulations using equation (3) and simulations using equation (1) are small.
2.2. Bounding Errors in Surface Fluxes Using Teams of Spectral Points
 Figure 1 implies that the magnitude of instantaneous surface flux errors must be bounded if a radiation parameterization is to be useful in models including land surfaces. One approach would be simply to increase the number of samples, G′, chosen at each time step, but this is painfully slow: like all Monte Carlo estimates, the RMS error decreases only as 1/√G′ (see, e.g., section 4.2.5 of Evans and Marshak [2005]). A more efficient strategy is to generate sets of g-point “teams” constructed to minimize some measure of sampling error, and to sample these teams randomly.
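The slow Monte Carlo convergence can be checked numerically; in this sketch (again with hypothetical weights and fluxes), quadrupling the number of samples only halves the RMS error:

```python
import numpy as np

rng = np.random.default_rng(0)
G = 16
w_g = rng.random(G); w_g /= w_g.sum()   # hypothetical quadrature weights
F_g = 400.0 * rng.random(G)             # hypothetical per-g-point fluxes
F_true = np.dot(w_g, F_g)

def rms_error(G_prime, n_trials=20_000):
    """RMS error of a G'-sample uniform Monte Carlo spectral estimate."""
    idx = rng.integers(0, G, size=(n_trials, G_prime))
    est = (G / G_prime) * np.sum(w_g[idx] * F_g[idx], axis=1)
    return np.sqrt(np.mean((est - F_true) ** 2))

# RMS error scales as 1/sqrt(G'): four times the samples, half the error
ratio = rms_error(2) / rms_error(8)  # close to sqrt(8/2) = 2
```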
 Assume a set of A representative atmospheres and define a cost function C as some measure of the error accumulated over L possible estimates of the true flux Fa in each atmosphere. The number of teams, M, may be chosen to be a small divisor of the total number of g-points G so that each team has the same number m = G/M of quadrature points. The members of these teams may then be chosen to minimize C.
 We have computed teams for several values of M using the radiative transfer code described in section 'PSrad/RRTMG: A New Radiation Code for Climate Models'. Our set of representative atmospheres is obtained from four snapshots taken over the course of a single day from a free run of ECHAM6 (A ≈ 73000). We optimize over L = MA clear-sky fluxes, consistent with the way k-distributions are normally constructed. Since our goal is to minimize the possibility of very large errors in surface fluxes we use the 95th-percentile error in surface fluxes as our cost function C. Our minimization of C is informal: we compute fluxes for each g-point individually and choose the M g-points that make the worst proxies as the first members of the teams. For each remaining team member we process teams in random order and choose the remaining g-point that minimizes the cost function for the provisional team. It is likely that the balance of teams could be modestly improved through further optimization (by simulated annealing following Kirkpatrick et al., for example).
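The informal minimization described above can be sketched as a greedy algorithm. The function and variable names here are hypothetical, the inputs are toy data rather than ECHAM6 snapshots, and the cost function is the 95th-percentile absolute flux error:

```python
import numpy as np

def build_teams(F, w, M, rng):
    """Greedily build M equal-size teams of g-points.

    F: (A, G) per-g-point fluxes for A sample atmospheres
    w: (G,) quadrature weights
    Cost: 95th percentile of |M * sum_{g in team} w_g F_g - true flux|.
    """
    A, G = F.shape
    m = G // M                                   # members per team
    F_true = F @ w                               # broadband flux per atmosphere
    # Rank g-points by how badly each performs as a lone proxy (G * w_g * F_g)
    solo_err = np.percentile(np.abs(G * w * F - F_true[:, None]), 95, axis=0)
    teams = [[int(g)] for g in np.argsort(solo_err)[-M:]]  # worst proxies seed teams
    unused = set(range(G)) - {t[0] for t in teams}
    for _ in range(m - 1):                       # fill the remaining slots
        for t in rng.permutation(M):             # process teams in random order
            best, best_cost = None, np.inf
            for g in unused:
                members = teams[t] + [g]
                est = M * (F[:, members] @ w[members])
                cost = np.percentile(np.abs(est - F_true), 95)
                if cost < best_cost:
                    best, best_cost = g, cost
            teams[t].append(best)
            unused.remove(best)
    return teams

# Tiny demonstration: G = 8 g-points split into M = 4 teams of m = 2
rng = np.random.default_rng(1)
F = 400.0 * rng.random((50, 8))                  # 50 toy "atmospheres"
w = rng.random(8); w /= w.sum()
teams = build_teams(F, w, M=4, rng=rng)          # partitions {0, ..., 7}
```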
 Fluxes can then be computed by choosing one of the M teams, τ, at random:

F ≈ M Σ_{g∈τ} w_g F_g.    (4)
 Because each g-point is included in exactly one team, equation (4), like equation (3), is an unbiased estimate of the true flux given by equation (1).
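The unbiasedness follows directly from the teams partitioning the g-points; a small sketch (with hypothetical weights and fluxes) makes this concrete:

```python
import numpy as np

rng = np.random.default_rng(2)
G, M = 12, 3                              # 12 g-points in 3 teams of m = 4
w_g = rng.random(G); w_g /= w_g.sum()     # hypothetical quadrature weights
F_g = 400.0 * rng.random(G)               # hypothetical per-g-point fluxes
teams = rng.permutation(G).reshape(M, -1) # each g-point in exactly one team

# A randomly chosen team is scaled by M; averaging over all M equally
# likely teams recovers the broadband flux exactly, because the factor M
# cancels the 1/M probability of selecting each team.
F_true = np.dot(w_g, F_g)
F_avg = np.mean([M * np.sum(w_g[t] * F_g[t]) for t in teams])
```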
 Using teams of g-points is effective in limiting the error for a given computational cost (Figure 2). Though the teams are chosen to minimize errors in clear-sky surface fluxes the errors for all skies are commensurate, slightly lower in the shortwave where the presence of clouds simply reduces downwelling flux and slightly larger in the longwave where clouds both increase and change the spectral distribution of downwelling flux.
 Teams constructed in this way are more efficient in reducing error as team size increases. Figure 3 shows the ratio of errors obtained using teams of a given size (equation (4)) to those using Monte Carlo samples (equation (3)). The ratio is small in both the shortwave and longwave for m = 2 but increases for four or more samples, such that random sampling using equation (3) achieves commensurate accuracy only by increasing sample sizes by a factor of 10 or more; in other words, by increasing computational cost to nearly that of broadband integration using equation (1).
3. PSrad/RRTMG: A New Radiation Code for Climate Models
 We have developed a new radiation package, PSrad/RRTMG (so named because it is a postscript to the RRTMG package), designed for use in models of the atmosphere. The longwave and shortwave components are organized along parallel lines: driver modules call routines to compute the optical properties of gases, aerosols, and clouds, then combine these to compute fluxes at the boundaries and the interfaces between model layers. The codes are modeled after the RRTMG package [Mlawer et al., 1997; Iacono et al., 2008] and use the k-distributions from this package to determine gas optical thickness from concentrations, temperature, and humidity. (These k-distributions are well validated and among the most accurate available; see Oreopoulos et al.) We follow the original RRTMG codes in using the two-stream approximation [after Meador and Weaver, 1980] to compute layer reflectance and transmittance and the adding method [after Oreopoulos and Barker, 1999] to compute flux profiles in the shortwave; we use the linear-in-tau approximation for the thermal source function [Mlawer et al., 1997] and consider only emission and absorption in the longwave. Cloud and aerosol optical properties are determined from custom-built lookup tables (S. Kinne et al., A new global aerosol climatology for climate studies, submitted to Journal of Advances in Modeling Earth Systems, 2013).
 Subgrid-scale variability is treated using “subcolumns” [see, e.g., Räisänen et al., 2004; Pincus et al., 2006]: discrete random samples, each treated as internally homogeneous, that are consistent with the distributions of possible cloud states within each column, including fractional cloudiness in each layer and assumptions about the vertical correlations between layers (“cloud overlap”). This treatment is a generalization of the Monte Carlo Independent Column Approximation [McICA, see Pincus et al., 2003] and may be further generalized to include other kinds of variability, including the distribution of cloud liquid or ice water content as implied by, for example, the Tompkins cloud scheme.
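A stripped-down subcolumn generator, assuming random overlap between layers for simplicity (operational McICA generators typically impose maximum-random overlap through vertically correlated random numbers), might look like:

```python
import numpy as np

def make_subcolumns(cloud_frac, n_sub, rng):
    """Draw binary subcolumns consistent with layer cloud fractions.

    cloud_frac: (n_layers,) fractional cloudiness in each layer
    Returns an (n_sub, n_layers) array of 0/1 cloud indicators, each
    subcolumn internally homogeneous. Assumes random overlap between
    layers; real generators typically use maximum-random overlap.
    """
    u = rng.random((n_sub, cloud_frac.size))
    return (u < cloud_frac).astype(int)

rng = np.random.default_rng(3)
cf = np.array([0.0, 0.3, 0.7, 0.2])            # hypothetical cloud fractions
sub = make_subcolumns(cf, n_sub=100_000, rng=rng)
# The mean over subcolumns reproduces the prescribed fraction in each layer
layer_means = sub.mean(axis=0)
```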
 PSrad supports a range of choices for spectral sampling, including broadband integration (all G quadrature points in order), arbitrary numbers of randomly chosen quadrature points for application of equation (3), and a number of predetermined “leagues” of g-point teams as described in section 'Strategies for Spectral Integration'.
 Though PSrad is currently intended as a drop-in replacement for RRTMG it was implemented almost entirely from scratch (of the original code, only the subroutine that computes longwave gas optical properties remains). The most important technical difference lies in the organization: each subroutine is vectorized over model columns, which increases computational efficiency on a wide range of platforms even when relatively few spectral intervals are used. Operational centers such as the European Centre for Medium-Range Weather Forecasts have often modified RRTMG in this way [Morcrette et al., 2008].
4. Assessing Approximation Impacts in a Global Model
 We have implemented PSrad in ECHAM6 [Stevens et al., 2013], a state-of-the-art atmospheric model used for climate simulations. We perform simulations with a version of the model differing modestly from the version used to produce data for the fifth phase of the Coupled Model Intercomparison Project [Taylor et al., 2012]. The model is run at a horizontal resolution of T63 using 47 levels that extend to 1 hPa. These experiments use a 7.5 min time step.
 We consider an ensemble of 29 month-long integrations starting from initial conditions valid at 0Z on 1 April of the years 1976–2004 as simulated by the model in a long integration using specified, time-varying sea-surface temperatures. The benchmark is an integration in which broadband radiation fields are calculated just as frequently as the tendencies from other physical parameterizations (i.e., the radiation and physics time steps are the same). We use this reference forecast to assess the error introduced by increasing the interval between broadband radiation calculations, on the one hand, and by limiting the number of spectral quadrature points used at each time, on the other.
 RMS differences with the reference forecast grow with time (Figure 4), but can be divided into roughly three regimes: slow but accelerating error growth in the first 10 days, rapid error growth over the next 10 days, and roughly saturated errors in the last 10 days. This may be very loosely described as the transition from weather (where individual trajectories are followed) to climate (where the statistics of trajectories are of interest), which leads us to examine errors in the first 10 days as one might evaluate forecasts, but to assess errors in the last 10 days statistically, as one might assess climatologies.
 The Monte Carlo sampling of fractional cloudiness and overlap introduces noise into the fluxes and causes even a second, independent reference forecast to diverge from the benchmark over time: RMS differences between two sets of reference forecasts rise from about 0.05 K during the first day to almost 1.5 K after 10 days (Figure 5, top, purple lines), which we take as the rough limit of deterministic forecasts. Increasing the sparsity of radiation calculations in either time (purple lines) or spectral quadrature points (green lines) increases the error by modest amounts in an absolute sense. No evidence has been found that such approximations trigger dramatic changes in the simulation [cf. Pauluis and Emanuel, 2004], though even hourly radiation calls increase the RMS difference by 50% of the difference introduced by sampling noise (Figure 5, bottom). The mean RMS difference over the first 10 days (Figure 6) is quite tightly related to computational cost (here expressed as the number of shortwave radiation computations per day).
 On time scales where the climatology of the model dominates the solution it is more informative to assess the degree to which each approximation produces a statistical distribution of temperatures consistent with the reference forecast. For each approximation, at every grid point and every time step during days 21–31, we apply the two-sided Student's t-test to compute the p-value for the null hypothesis that the distribution of 2 m temperatures across the 29 ensemble members is the same in the reference forecast and in the forecast using the approximation. Because we perform so many t-tests (∼18,500 per time step), roughly 5% have p-values corresponding to “significant” differences (at the 95% level) even if the underlying distribution of 2 m temperatures is the same in both experiments. The large number of false positives can be reduced using false discovery rate estimation [e.g., Wilks, 2006], which exploits the known distribution of p-values expected under the null hypothesis to estimate η0, the proportion of uninteresting (or truly insignificant) p-values [Strimmer, 2008a], at every time step.
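The idea behind the η0 estimate can be sketched with a simple Storey-type estimator; the fdrtool package cited in the text uses a more refined procedure, and the p-values below are synthetic:

```python
import numpy as np

def eta0_estimate(p_values, lam=0.5):
    """Storey-type estimate of the fraction of truly null tests.

    Under the null hypothesis p-values are uniform on [0, 1], so the
    density of p-values above a threshold lam reflects (mostly) the
    null tests; rescaling by 1/(1 - lam) estimates their proportion.
    """
    p = np.asarray(p_values)
    return min(1.0, np.mean(p > lam) / (1.0 - lam))

rng = np.random.default_rng(4)
# ~18,500 synthetic tests: 95% genuinely null, 5% with a real signal
p_null = rng.random(17_575)               # uniform p-values (null)
p_alt = rng.beta(0.1, 10.0, size=925)     # concentrated near zero (signal)
eta0 = eta0_estimate(np.concatenate([p_null, p_alt]))  # close to 0.95
```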
 Independent realizations of the reference run (i.e., two runs making broadband radiation calculations at every time step, but using different random number sequences to sample cloud states with McICA) are statistically indistinguishable: the time-mean value of η0 computed from this pair of experiments is 1. This is almost but not quite true when comparing the reference run to any of the possible approximations (see Table 1). One interpretation of η0 is as the average fraction of the planet over which a given approximation does not change the simulation significantly. This fraction is greater than 94% for all approximations except MCSI (Table 1, line 1), indicating that changes are detectable but modest. The test statistic is not entirely robust: according to this measure, longer radiation time steps can score as slightly more consistent with reference calculations than shorter ones, which is physically implausible. Given this caveat we note that, in this relatively coarse-resolution model, infrequent broadband radiation calculations introduce slightly smaller changes than frequent calls using spectral teams with the same computational cost (cf. lines 2 and 6 of Table 1).
Table 1. Time-Mean Fraction η0 of Statistically Insignificant Differences in 2 m Air Temperature Between a Reference Calculation and Various Approximations for Coupling Radiation to a Global Model
The first three approximations are temporally dense, spectrally sparse calculations using equation (4) (the first is the special case using equation (3)). The latter four make spectrally dense broadband calculations at specified time intervals. All approximations change the simulation of 2 m air temperature by detectable amounts; for a given computational effort, frequent use of spectral teams introduces slightly larger changes than less frequent broadband calculations.
Time intervals for the broadband calculations: T = 15 min, 1 h, 2 h, and 3 h.
5. Conclusions: Parameterization Error, Simulation Error, and the Coupling of Radiative Transfer to Atmospheric Models
 Radiative fluxes respond nearly instantaneously to changes in the optical properties of the atmosphere, so the parameterization of radiation is normally considered a “one-way” problem in which the model provides the state of the atmosphere and the parameterization computes the heating rates and boundary fluxes. In the absence of coupling between radiation and model dynamics one naturally seeks instantaneous radiation calculations that are as accurate as is computationally feasible. Focusing on the accuracy of the overall simulation, including the way radiation calculations are coupled to the model, may allow for other kinds of optimization. As one example, the k-distribution developed for RRTMG was designed, as are most parameterizations, to minimize broadband flux errors. RRTMG aims to balance accuracy with computational cost primarily by minimizing the number of g-points. It would be possible, however, to construct k-distributions using different cost functions, and distributions constructed to minimize errors both across and among teams might be able to achieve greater aggregate accuracy by using more g-points while still limiting noise in surface fluxes. This would open the door to resolving the tension between overall accuracy, limited by the number of g-points, and computational cost.
 Similarly, using spectrally sparse, temporally dense calculations provides a richer set of choices in how radiation may be coupled to dynamical models to minimize biases. Both classes of approximations examined here appear to slightly modify the distribution of temperatures, relative to a reference calculation, at quasi-climatological time scales, in contrast to the introduction of sampling noise in cloud properties [Pincus et al., 2006; Barker et al., 2008]. The modification is small but undesirable, and we are seeking ways to reduce the impact of noise. One promising approach is to sample teams in equation (4) without replacement in time, rather than entirely independently at each location and time step.
 The results of section 'Assessing Approximation Impacts in a Global Model', especially Figure 6, suggest that computational effort is the primary determinant of accuracy in coupling these two radiative approximations to short-term forecasting models. This comparison is limited, however, since it does not account for true model errors or how such errors might depend on sampling strategy. Some real-life forecast errors, such as the damping of surface heating caused by convective clouds forming in response to initial surface heating, may well be due to undersampled temporal variability. On the other hand, even modest noise in surface fluxes may lead to forecast errors when the coupling between radiation and atmospheric state is strong, as in nocturnal stable boundary layers. Thus it remains to be seen whether the statistical robustness of equation (4) will translate into improved weather forecasts.
 Spectrally sparse, temporally dense radiation calculations, at least as implemented here, disturb simulations with ECHAM6 at least as much as infrequent broadband calculations of the same computational cost. Time steps in ECHAM are relatively short so the comparison may be worse in models with longer time steps. Other considerations may make equation (4) desirable, however, especially the convergence of spectral teams with increasing resolution, more uniform distribution of computation time, and the book-keeping simplifications that arise when some estimate of radiation is computed every time step. In ECHAM, for example, shortwave fluxes are computed for all points (using a very small minimum solar zenith angle for nighttime points) so that temporal interpolation across the sunrise boundary is smooth. Replacing infrequent broadband calculations with frequent spectral samples makes this transition smooth (in aggregate), so that shortwave fluxes are not required at nighttime points, which alone represents a substantial computational savings.
 Though radiation ultimately determines earth's climate, the coupling between radiative fluxes and the rest of the atmosphere is loose, such that radiation strongly influences the flow only where its effects can accumulate over time, as occurs in descending branches of the general circulation or at the tops of stratocumulus clouds. The approach to radiation calculations proposed here exploits this loose coupling, trading instantaneous accuracy at infrequent intervals for statistical accuracy with more complete sampling of time variability. By traditional measures of error (e.g., the comparison of instantaneous fluxes to benchmark calculations, as in Oreopoulos et al. [2012]), large instantaneous errors make spectral sampling a poor idea. We argue that a more appropriate benchmark is the accumulated effect of approximation errors on the solution as a whole. By this more holistic measure of accuracy, frequent but sparse sampling becomes much more attractive because the loose coupling of radiation to the flow means that unbiased solutions with large local errors are more desirable than solutions with small biases, even if their local errors are also small. The present work demonstrates a new path toward accuracy that, in some situations, may converge more quickly to the desired solution.
 This work was supported by the Max Planck Society for the Advancement of Science, the National Science Foundation Science and Technology Center for Multi-Scale Modeling of Atmospheric Processes, managed by Colorado State University under cooperative agreement No. ATM-0425247, and by the Office of Naval Research under grant N00014-11-1-0441. Generous computing facilities were provided by the German Climate Computing Center (Deutsches Klimarechenzentrum, DKRZ). We are grateful to the developers of the fdrtool software [Strimmer, 2008b] for making our field significance calculations easy. RP appreciates warm hospitality during summer visits to the MPI and practical advice from Thorsten Mauritsen on the care and feeding of ECHAM.