There are significant uncertainties inherent in precipitation forecasts, and these uncertainties can be communicated to users via large ensembles that are generated using stochastic models of forecast error. The Short Term Ensemble Prediction System (STEPS) was developed by the Met Office and the Australian Bureau of Meteorology to address these user requirements and has been operational for a number of years. The initial formulation of Bowler et al. (2006) has been revised and extended to improve the performance over large domains, to include radar observation errors, and to facilitate the combination of forecasts from a number of sources. This paper reviews the formulation of STEPS, discusses those aspects of the formulation that have proved most problematic and presents some solutions. The performance of STEPS nowcasts is evaluated using a combination of case study examples and statistical verification from the UK. Routine forecast verification demonstrates that STEPS is capable of producing near optimal blends of a rainfall nowcast and high resolution NWP forecast. It also shows that the spread of STEPS nowcast ensembles is a good predictor of the error in the control member (unperturbed) nowcast.
 In recent years, the Met Office in the UK and the Bureau of Meteorology in Australia have invested significant effort in quantifying the uncertainties inherent in hydro-meteorological nowcasts and forecasts and communicating these to users via large ensembles that are generated using stochastic models of forecast error. This strategy recognizes the impact of nonlinear error growth on the accuracy of high resolution forecasts and the needs of customers in relation to the management of risks associated with severe weather.
 In support of this strategy, the Short Term Ensemble Prediction System (STEPS) was implemented as an operational rainfall nowcaster in 2008 in both the UK and Australia [Bowler et al., 2006, hereafter BPS]. It was developed to generate ensembles of rainfall nowcasts using observations from weather radar and forecasts from a mesoscale Numerical Weather Prediction (NWP) model. Over the intervening years, STEPS has been revised and extended to account for the effects of radar observation errors, improve certain aspects of model design and performance, notably the noise generation, and extend the modeling framework to facilitate the combination of rainfall fields from multiple sources.
 STEPS now comprises a collection of nowcasting and NWP postprocessing algorithms formulated to produce seamless, composite rainfall forecasts for use in pluvial and fluvial flood forecasting. Its capabilities include the downscaling of coarse resolution NWP forecasts and the generation of large ensembles (with between tens and hundreds of members) incorporating either deterministic or ensemble NWP rainfall forecasts.
 In the Met Office, a STEPS-based short range (24 h), 24 member ensemble rainfall forecast incorporating an extrapolation nowcast and high resolution (∼1.5 km) NWP forecast from the Met Office's Unified Model [Davies et al., 2005] is now employed to drive an operational, distributed hydrological forecast model [Price et al., 2012] for the Flood Forecasting Centre. In the Australian Bureau of Meteorology, STEPS is exploited for a variety of applications including nowcasting, medium range forecasting, and design storm modeling. Products include a 1 km, 30 member rainfall nowcast ensemble with a range of 90 min and a 2 km, 50 member ensemble rainfall forecast with a range of 10 days. The latter is generated from a 50 km, global NWP forecast.
 Nowcasting techniques involving the extrapolation (advection) of current observations of rainfall from weather radar and meteorological satellites remain superior to NWP-based rainfall forecasts over at least the first 2 h. This is partly because NWP models are too costly and time consuming to run on an update cycle shorter than several hours, and partly because techniques for assimilating high resolution observations do not reliably replicate the distribution of rainfall in these observations. Consequently, an optimal, very short range forecast must evolve from an extrapolation-based solution to one dominated by a recent NWP forecast [Browning, 1980].
 The Met Office's first fully automated nowcasting system, Nimrod [Golding, 1998], used a simple weighted averaging technique to blend a 5 km resolution (grid length), radar and satellite-based extrapolation nowcast with a deterministic, 12 km resolution NWP forecast. The nowcast contribution decayed exponentially to zero over a 6 h period. Although more skilful on average than either component between 2 and 5 h ahead, the value of the resulting deterministic predictions was limited due to the hardwiring of the nowcast contribution, loss of spatial resolution beyond T + 3 h, and rapid error growth close to the grid scale [Werner and Cranston, 2009].
 During the late 1990s, the Australian Bureau of Meteorology implemented the Spectral Prognosis (S-PROG) nowcasting system [Seed, 2003] as part of the Sydney 2000 Forecast Demonstration Project [Fox et al., 2001]. S-PROG modeled the observed scaling behavior of rainfall with the aim of minimizing the root-mean-squared forecast error. This was achieved by smoothing the nowcast rainfall fields to remove small scale features at a rate consistent with their measured longevity. The resulting nowcasts were equivalent to an ensemble mean and were of limited value on their own due to the loss of detail (variance) with increasing forecast range. It was recognized that the S-PROG modeling framework could be extended to generate conditional simulations suitable for probabilistic hydro-meteorological forecasting. This work laid the foundations for STEPS, developed jointly by the Met Office and the Bureau shortly after the Sydney 2000 Olympic Games.
 With continuing improvements in the horizontal resolution of operational radar and satellite-based estimates of surface rainfall and the introduction of operational, convection resolving NWP models [Lean et al., 2008], the impact of nowcast and forecast errors on the utility of rainfall forecasts has now come to the fore because errors grow most rapidly at the smallest scales. This is not to dismiss the incremental benefits of resolution increases but to emphasize that the additional information content can only be fully exploited in a probabilistic context that conveys the associated uncertainties [Pierce et al., 2005]. High resolution NWP-based ensembles afford only a partial solution to this problem because they do not capture the very short range uncertainty adequately and their ensemble sizes are much too small to resolve the forecast uncertainties at the convective scales properly.
 STEPS provides a cost effective means of addressing the two issues outlined above: namely, the generation of skilful, very short range rainfall forecasts, and the more effective quantification of forecast uncertainty. The former challenge is addressed using a scale decomposition technique that allows multiple, time synchronous forecasts to be combined scale by scale, and weighted in proportion to estimates of their predictive skill at each scale. The latter issue is dealt with by generating time series of noise fields (pseudorandom numbers with the space-time characteristics of rainfall) with which to perturb the resulting combination of forecast components at each scale. This provides a cheap means of generating large ensembles.
 There were a number of weaknesses in the formulation of BPS. The most significant limitations were the use of a multiplicative cascade-based decomposition to combine and perturb rainfall fields incorporating wet and dry areas (i.e., a failure to treat the raining fraction of the field and dry areas as separate processes) and the use of a technique reliant on approximations of the spatial power spectra of rainfall fields to generate noise fields with which to perturb the nowcast and NWP forecast components. Other weaknesses included ignoring the impact of radar errors on the performance of the nowcast in the first hour and not accounting for the covariance between the nowcast and NWP forecast components when deriving weights subsequently employed to combine them. The focus of this paper is on the improvements made to STEPS to address some of the performance issues that have been outlined here.
 Section 2 of this paper provides a brief overview of the initial version of STEPS as described by BPS. A review of the space-time properties of rainfall fields in section 3 sets the scene for a description of STEPS's formulation enhancements in section 4. Section 5 presents summary performance statistics for both the STEPS control member and ensemble nowcasts. Conclusions are drawn and further work proposed in section 6.
2. A Description of STEPS
 STEPS exploits a multiplicative cascade-based scale decomposition framework and several noise generators to produce ensembles of high resolution, composite rainfall nowcasts whose space-time evolutions tend from those of extrapolated observations to those of an NWP forecast solution. A schematic of the production process is presented in Figure 1. Scale decomposition allows the skill of the nowcast and NWP forecast to be estimated on a hierarchy of spatial scales. The relative skill of these components is then used to determine their contribution to the blend on each level of the cascade (Figure 2).
 The noise generators model the space-time statistics of the decomposed nowcast and NWP forecast to generate time series of spatially disaggregated synthetic rainfall fields with which to perturb the disaggregated composite nowcast. These synthetic fields mimic the spatial power spectra of the radar observed and NWP forecast distributions of instantaneous rain rate and are designed to evolve in time according to the contributions made by these components to the composite, perturbed nowcast. Temporal auto-correlations in time series of these synthetic rainfall fields are modeled on those of the radar observations using a hierarchy of second-order auto-regressive (AR-2) processes [Box and Jenkins, 1976], one for each cascade level (BPS).
 Each blended, perturbed nowcast solution is generated by summing a time series of normalized, blended, perturbed cascades using cascade level field means and variances that are weighted combinations of those of the radar observations and the NWP forecast.
2.2. Scale Decomposition Framework
 Numerous authors have explored the use of cascade or scale decomposition-based models to simulate the temporal (dynamic) and spatial scaling properties of rainfall [Lovejoy and Schertzer, 2006]. STEPS employs one such scale decomposition of the form:
$$\mathrm{dBR}(t) = \sum_{k=1}^{N} X_k(t) \tag{1}$$
where dBR is the rainfall field in decibels of rain rate, N is the number of component levels in the cascade, and the kth field in the cascade, Xk(t), represents the variability in the original field at time t over the band of frequencies, ωk, associated with level k. q is the ratio of the scales at level k and k + 1. Note that the decomposition is multiplicative when rewritten in terms of R. This form of decomposition is justified because rainfall exhibits fractal or scaling behavior consistent with its representation as a hierarchy of independent component processes.
 The component levels, Xk(t) for k = 1, …, N, are fields derived by applying an inverse Fast Fourier Transform (FFT) to a filtered FFT of the original field. The use of an approximation to a Gaussian window to perform the filtering offers a compromise between exact localization in spatial scale and avoiding ringing due to the Gibbs effect.
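The filtering step can be illustrated with a minimal sketch. It assumes Gaussian windows in log-frequency, one per level, normalized so that they sum to one at every frequency; the function name `gaussian_cascade`, the `width` parameter, and the choice of centre frequencies are hypothetical and are not taken from the STEPS implementation.

```python
import numpy as np

def gaussian_cascade(field, n_levels=6, q=2.0, width=0.5):
    """Split a 2-D field into n_levels band-pass components that sum to the field.

    Assumption: Gaussian windows in log_q(frequency), centred one scale ratio q apart,
    starting at the lowest resolvable frequency of the domain.
    """
    ny, nx = field.shape
    fy = np.fft.fftfreq(ny)[:, None]
    fx = np.fft.fftfreq(nx)[None, :]
    freq = np.hypot(fy, fx)                      # radial frequency, cycles per pixel

    lowest = 1.0 / max(ny, nx)
    centres = np.log(lowest) / np.log(q) + np.arange(n_levels)
    log_f = np.log(np.maximum(freq, lowest)) / np.log(q)
    # Frequencies beyond the outermost centres are treated like those centres,
    # so the top level keeps everything up to the grid scale.
    log_f = np.clip(log_f, centres[0], centres[-1])

    windows = np.exp(-0.5 * ((log_f[None] - centres[:, None, None]) / width) ** 2)
    windows /= windows.sum(axis=0, keepdims=True)  # windows sum to one everywhere

    spectrum = np.fft.fft2(field)
    # Inverse FFT of each filtered spectrum gives one cascade level.
    return np.real(np.fft.ifft2(spectrum[None] * windows))
```

Because the windows sum to one, adding the levels back together recovers the input field, which is the property the additive decomposition in dBR relies on.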
 The decision to work in units of 10 log10 (R + c) or decibels of rain rate (dBR) is justified on the grounds that the log transformation produces a distribution closer to a Normal distribution, and the addition of c (an arbitrary small, positive number) simplifies the treatment of no-rain areas. Working with at least an approximation of a Normal distribution is a requirement because this underpins key aspects of the STEPS formulation, including for example, the derivation of the weights used to combine the nowcast and NWP forecast components with noise (see below for details). However, the dBR transformation is imperfect and tends, amongst other things, to result in the under representation of extremes. For this reason, some authors have favored using alternative, stable distributions, for example, the Levy distribution [Lovejoy and Schertzer, 2006].
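As a small illustration of the transformation, the sketch below converts rain rate to dBR and back. The offset value `c=0.01` and the function names are assumptions chosen for the example; the text only states that c is a small positive number.

```python
import numpy as np

def rain_to_dbr(rain_rate, c=0.01):
    """Convert rain rate R to decibels of rain rate, 10 log10(R + c)."""
    return 10.0 * np.log10(np.asarray(rain_rate, dtype=float) + c)

def dbr_to_rain(dbr, c=0.01):
    """Invert the transform; values that fall below zero are treated as dry."""
    return np.maximum(10.0 ** (np.asarray(dbr, dtype=float) / 10.0) - c, 0.0)
```

With this choice of c, a dry pixel maps to a finite value of about -20 dBR rather than to minus infinity, which is what makes the no-rain areas tractable.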
2.3. The Advection Nowcast
 Details of the optical flow advection scheme employed to produce the extrapolation nowcast can be found in Bowler et al. The motion is calculated by using least squares regression to solve the optical flow constraint equation over a series of blocks. A smoothness constraint is then applied to the resulting block velocities. This has similarities with the Variational Echo Tracking method described by Germann and Zawadzki, although the latter uses a conjugate gradient method to minimize residuals from two constraints simultaneously.
 The Lagrangian temporal evolution of the nowcast cascade and the skill of the nowcast component are determined from the auto-correlation coefficients of a hierarchy of second-order auto-regressive (AR-2) processes [Box and Jenkins, 1976], one for each cascade level, k
$$X_k(t) = \phi_{k,1}\, X_k(t-1) + \phi_{k,2}\, X_k(t-2) \tag{2}$$
where Xk(t − 1) and Xk(t − 2) have been advected forward in time by one and two steps, respectively. The parameters, ϕk,1 and ϕk,2, control the Lagrangian rate of evolution at each scale. These are derived from the lag 1 and lag 2 auto-correlation coefficients as detailed by BPS. Current estimates of the latter (valid at time, t0) are relaxed to climatological averages over the duration of the nowcast. Relaxation rates are consistent with the measured temporal auto-correlations in time series of these current estimates. In the very short range, this approach ensures that the contribution made by the extrapolation nowcast in a blended, perturbed forecast reflects changes in recent estimates of its skill. A similar approach is applied to the NWP forecast component in the blend: recent estimates of forecast skill on each level of the cascade at the start of the nowcast are relaxed to climatological skill estimates.
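The link between the auto-correlation coefficients and the AR-2 parameters can be sketched with the standard Yule-Walker relations for a second-order process. This is the textbook result rather than a transcription of the BPS code, and the function name is hypothetical.

```python
def ar2_parameters(rho1, rho2):
    """Yule-Walker estimates of (phi1, phi2) from lag 1 and lag 2 auto-correlations.

    Works for scalars or NumPy arrays holding one value per cascade level.
    """
    phi1 = rho1 * (1.0 - rho2) / (1.0 - rho1 ** 2)
    phi2 = (rho2 - rho1 ** 2) / (1.0 - rho1 ** 2)
    return phi1, phi2
```

When rho2 equals rho1 squared the second parameter vanishes and the process reduces to a first-order (AR-1) model, which is a convenient sanity check on measured coefficients.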
2.4. Ensemble Generation
 As well as replicating the spatial statistics of a given radar or NWP forecast rainfall field, the hierarchy of AR-2 processes described in equation (2) is employed to maintain a time series of synthetic rainfall fields (noise) with which to perturb the extrapolation nowcast cascades and/or NWP forecast cascades
$$Y_{p,k}(t) = \phi_{k,1}\, Y_{p,k}(t-1) + \phi_{k,2}\, Y_{p,k}(t-2) + \varepsilon_{p,k}(t) \tag{3}$$
where Yp,k(t) is the noise cascade valid for ensemble p, level k, and time, t, and εp,k(t) is a cascade of temporally independent but spatially correlated noise.
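A minimal sketch of how one cascade level of noise might be stepped forward in time is given below. Two simplifications are assumptions of the example: the innovations are spatially white rather than spatially correlated, and the innovation variance is chosen so that the process keeps unit variance. The function name and the `spin_up` parameter are hypothetical.

```python
import numpy as np

def evolve_ar2_noise(phi1, phi2, shape, n_steps, rng, spin_up=50):
    """Generate n_steps fields from a stationary, unit-variance AR-2 process."""
    # Innovation standard deviation that gives a unit-variance AR-2 process.
    innovation_std = np.sqrt(
        (1.0 + phi2) * ((1.0 - phi2) ** 2 - phi1 ** 2) / (1.0 - phi2)
    )
    prev2 = rng.standard_normal(shape)
    prev1 = rng.standard_normal(shape)
    fields = []
    for step in range(spin_up + n_steps):
        current = (phi1 * prev1 + phi2 * prev2
                   + innovation_std * rng.standard_normal(shape))
        if step >= spin_up:          # discard the spin-up period
            fields.append(current)
        prev2, prev1 = prev1, current
    return np.stack(fields)
```

In an ensemble setting each member would use its own random stream, so that members share the temporal statistics but not the realizations.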
3. The Space-Time Properties of Rainfall
 The main objective of STEPS is to generate ensembles of rainfall forecasts that can be used to force hydrological prediction systems. Thus, it is imperative that these ensembles exhibit similar space-time structures to those of observed rainfall over a range of space and time scales. Rainfall has a rich and subtle texture that is beyond our ability to simulate precisely, but there are certain fundamental properties that must be reproduced if STEPS's synthetic rainfall fields are to behave with adequate verisimilitude over a useful range of space and time scales.
3.2. Scaling Behavior
 A rainfall field has scaling properties if the spatial spectral density function, S, follows a power law function of frequency, ω, that is, S(ω) ∝ ω^(−β) [De Michele and Bernardara, 2005]. Figure 3 shows the mean, normalized spectral density function derived from a year of radar data from Brisbane, Australia. The figure shows that the spectral density function of fields of radar reflectivity (dBZ) in Brisbane has a scale break at 20 km, where the slopes of the power spectra for the scales above and below the break were found to be β1 = 2.13 and β2 = 2.74, respectively. The flat tail at the Nyquist frequency is due to white noise contamination, possibly as a result of some ground and sea clutter that has not been removed by the quality control algorithms. The good fit of a power law to the power spectra leads one to the conclusion that, on average, spatial rainfall follows a scaling behavior with a break in the scaling at about 20 km. Similar behavior has been demonstrated in the UK.
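One way to estimate the two slopes from a radially averaged spectrum is a pair of straight-line fits in log-log space, one on each side of the break. The sketch below assumes the break frequency is known in advance and that the spectrum has already been radially averaged; the function name is hypothetical.

```python
import numpy as np

def fit_two_slopes(freq, psd, break_freq):
    """Return (beta1, beta2): spectral slopes below and above break_freq.

    Slopes are reported as positive numbers for a decaying spectrum.
    """
    freq = np.asarray(freq, dtype=float)
    psd = np.asarray(psd, dtype=float)
    log_f, log_s = np.log10(freq), np.log10(psd)
    low = freq <= break_freq
    high = ~low
    beta1 = -np.polyfit(log_f[low], log_s[low], 1)[0]
    beta2 = -np.polyfit(log_f[high], log_s[high], 1)[0]
    return beta1, beta2
```

With frequency expressed in cycles per kilometre, a 20 km scale break corresponds to `break_freq = 0.05`.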
3.3. Space-Time Heterogeneities
 Ceresetti et al., Badas et al., Purdy et al., and Germann and Joss found that the statistical structure of rainfall varied with location in mountainous areas depending on the physical processes that were contributing to the orographic enhancement. Therefore, the parameters of the space-time stochastic model should vary in both space and time, depending on the physical processes governing rainfall generation. Chumchean et al. compared the probability distributions for hourly rainfall after classification into “convective” and “stratiform” categories and found that convective rainfall had a probability distribution that was significantly skewed toward the higher intensities. Figure 4 shows a set of spectral density functions for widespread (frontal) and convective rainfall over the UK. It is clear that β1 for widespread rain is generally greater than β1 for convective rain. This is because convection is less organized on the large scale than widespread rain.
 Figure 5 shows a time series of β1 and β2 for an extreme storm that produced flooding in Brisbane, Australia. This demonstrates that the scaling nature of rainfall can evolve during a storm, suggesting that a single event may be composed of episodes of intense convective rain separated by periods of less intense rainfall with different spatial organization. Evidently, the statistical structure of rainfall can vary significantly in both space and time and this makes it difficult for stochastic simulations to reproduce the observed space-time structures.
 Rainfall is often anisotropic in space, being organized into bands rather than isotropic features. Examples of anisotropic 2-D spatial correlograms for hourly accumulations can be found in Velasco et al. Furthermore, the spatial structure of rainfall accumulation fields depends on both the structure of the distribution of instantaneous rain rate and the motion of the field during the accumulation period, particularly for accumulations of an hour or so.
4. Formulation Enhancements to STEPS-2
4.1. Noise Adaptations to Account for Spatial and Temporal Inhomogeneities in Rainfall Fields
4.1.1. Enhancements to BPS Noise Generators
 The BPS version of STEPS assumed rainfall fields were homogeneous over the modeling domain and had an isotropic spatial structure. It became apparent that these two assumptions were a major limitation when running STEPS over large (∼1000 km) domains in which frontal rain bands can coexist with convective showers and the banded structure of rainfall is more evident.
 Now, STEPS provides a choice of two noise generators: one is an enhanced version of the original power law model, referred to as the parametric model; the other does not rely on any parameter fitting and for this reason is referred to as the nonparametric noise generator. This latter approach is capable of replicating the precise 2-D power spectrum of a rainfall field including the predominant anisotropic structures. Both noise generators are described in more detail below.
4.1.1.1. A Parametric Generator
 A parametric modeling approach can be applied to approximate the Power Spectral Density (PSD) of a given rainfall field. This is achieved by filtering a field of white noise before the scale decomposition takes place. The filter takes the form
where the target field has horizontal dimensions of L by L pixels and ω0 = 1/L. ωb is the frequency at which the slope of the power law changes. This is sometimes referred to as a scaling break. β1 and β2 are the slopes of the power spectrum at frequencies that are, respectively, less than or greater than ωb = 1/20 km−1. The form of parametric model described here represents the PSD by two such straight lines with different slopes and a break between (see Figure 3).
 While this approach approximates the typical behavior of rainfall fields containing organized convection and is able to generate synthetic fields without reference to a radar-based analysis or a high resolution NWP forecast of surface rainfall rate, its validity is limited because significant departures from isotropic power law scaling relationships are quite common. Consequently, this technique is only employed in STEPS when the observed raining fraction is <5%, or when an alternative, nonparametric noise generator cannot be used.
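The parametric approach can be sketched as follows. The example assumes an isotropic, two-slope power-law PSD that is continuous at the break, and filters white noise with the square root of that PSD. The function name, the square domain, and the final normalization to zero mean and unit variance are assumptions of the sketch, not details confirmed by the text.

```python
import numpy as np

def parametric_noise(n, pixel_km, beta1, beta2, break_km=20.0, rng=None):
    """Filter white noise so its PSD follows two power laws joined at 1/break_km."""
    rng = np.random.default_rng() if rng is None else rng
    f = np.fft.fftfreq(n, d=pixel_km)
    freq = np.hypot(f[:, None], f[None, :])       # radial frequency, cycles per km
    fb = 1.0 / break_km

    psd = np.zeros_like(freq)                     # zero-frequency power left at zero
    low = (freq > 0) & (freq <= fb)
    high = freq > fb
    psd[low] = freq[low] ** (-beta1)
    psd[high] = fb ** (beta2 - beta1) * freq[high] ** (-beta2)  # continuous at fb

    white = rng.standard_normal((n, n))
    noise = np.real(np.fft.ifft2(np.fft.fft2(white) * np.sqrt(psd)))
    return (noise - noise.mean()) / noise.std()
```

Because the filter depends only on the radial frequency, the resulting fields are isotropic by construction, which is the limitation noted above.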
4.1.1.2. A Nonparametric Generator
 For the majority of occasions when the precipitating area exceeds 5% of the modeling domain, a nonparametric noise generator is employed. This determines the exact power spectrum of a field. To obtain this power spectrum, the FFT of a given rainfall field is used as a filter. The rainfall field is first normalized to have a mean of zero and a variance of 1 and the filter, fx,y is then calculated as the modulus of the FFT at a particular pair of wave numbers.
 The advantage of this approach is that it is no longer necessary to fit a power law to the power spectrum and any anisotropy that may be present in the field is respected.
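A minimal sketch of this idea: the modulus of the FFT of the normalized rainfall field is used directly as the filter applied to white noise. The function name and the final renormalization are assumptions of the example, and a field with zero variance (no rain at all) is not handled.

```python
import numpy as np

def nonparametric_noise(field, rng=None):
    """Generate noise that inherits the 2-D power spectrum of `field`."""
    rng = np.random.default_rng() if rng is None else rng
    normalised = (field - field.mean()) / field.std()   # zero mean, unit variance
    amplitude = np.abs(np.fft.fft2(normalised))         # filter: modulus of the FFT
    white = rng.standard_normal(field.shape)
    noise = np.real(np.fft.ifft2(np.fft.fft2(white) * amplitude))
    return (noise - noise.mean()) / noise.std()
```

Only the phases are randomized, so any banding or preferred orientation in the input spectrum carries over to the noise. For instance, an input that varies along one axis only yields noise that also varies along that axis only.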
 Figure 6 demonstrates how this nonparametric noise generator can be used to produce a time series of synthetic rainfall fields whose spatial statistics evolve from those of a radar-based analysis for a showery rain event, to a UKV forecast distribution for an unrelated widespread rain event. To achieve this evolution, a weighted combination of the FFTs of the analysis and UKV forecast was used to filter white noise so that its spatial power spectrum tended seamlessly from one distribution to the other. The weights employed as a function of wave number followed the cascade level weights applied to the nowcast and NWP forecast components. Figure 7 shows the evolution of the power spectra of the stochastic fields.
 This artificial case study illustrates both the strengths and the weaknesses of the nonparametric generator. The weaknesses of using spatially homogeneous noise are evident. After one time step, the showery features in the stochastic field are more or less uniform in character and cannot reflect the geographical variability apparent in the radar observations. Nonetheless, the predominant characteristics of the field have been replicated, namely, the heavy showers and their north-west to south-east orientation. This is confirmed by the power spectrum plot labeled T + ΔT in Figure 7.
 With each successive time step (15 min), the noise fields are seen to evolve in character toward the UKV forecast shown bottom-right in Figure 6. This is confirmed by the evolution of the power spectra of the time series in Figure 7. The rainfall features become more extensive with reduced variability close to the grid scale. This is reflected in a reduction in the power contributed by the convective scales. A change of orientation toward that of the rain band in the UKV forecast is apparent by the end of the forecast sequence.
 In Figure 7, the power spectra of the noise fields have some interesting features at around the 400 km scale. These are not present in the power spectra of the radar-based analysis and forecast fields, but are artifacts of the way the FFT is performed. They arise because a rectangular rainfall field is embedded within an array of zeroes prior to decomposition.
4.1.1.3. Limitations of Parametric and Nonparametric Noise Generators
 The use of the spatial power spectrum of a radar-based analysis of rain rate and/or an NWP rainfall forecast to impose spatial structure on the noise is the most significant limiting factor on BPS performance because it imposes spatial homogeneity on the noise perturbations. In reality, rainfall fields are spatially heterogeneous. One way to ameliorate this problem involves constraining the perturbed, composite forecasts more closely to the nowcast and NWP forecast solutions by computing the cascade level means and variances on a regular grid, rather than over the model domain as a whole. However, this adaptation may degrade the spread-skill of the resulting ensemble forecasts unless the grid box size is carefully chosen to ensure that large scale errors can be adequately accounted for.
 In practice, this approach will not produce a well calibrated ensemble from a single NWP forecast at forecast ranges much beyond the nowcast time frame because synoptic scale phase errors become increasingly important beyond T + 6 h, and it is not possible to capture these in the noise if geographical heterogeneities in the means and variances of the component forecasts are to be simultaneously respected.
 Consequently, the above enhancement is not appropriate for ensemble generation from a single NWP forecast solution beyond T + 6 h, but may be applied effectively to a conventional, multimodel or time-lagged [Mittermaier, 2007] NWP ensemble to increase ensemble size. In this case, synoptic scale perturbations are provided by the NWP ensemble members, whilst the noise serves to represent smaller scale errors.
4.1.1.4. Impact of the Heterogeneous Distribution of Variance Within a Rainfall Field
 The form of the cascade decomposition described in equation (1) assumes that the cascade level fields, Xk(t), are independent of each other. Figure 8a shows a field of radar reflectivity. This contains a rain band with embedded convection. It is clear that the power on each cascade level, shown in Figures 8b–8f, is not uniformly distributed through the field or the raining areas, but has a subtle dependence on the larger scales. The variance is concentrated in areas where the rain/no rain perimeter is complex and is low inside the raining areas, for example, on the eastern flank of the rainband in Figures 8e and 8f. When perturbing a forecast cascade with noise as described in the previous sections, a consequence of ignoring this dependency is that too much power is injected into raining areas.
 To account for this behavior and reduce the biases introduced into large raining areas by the noise, an empirical correction was derived for these areas on cascade levels. This correction exploits the fact that there is a strong relationship between the power at the top of the cascade and the degree of spatial organization of the rain. A multiplicative adjustment of the following form is applied to the normalized noise fields where the top level of the perturbed, blended nowcast cascade exceeds a value of 0.5
 This threshold has been calibrated to demarcate frontal rain bands. The magnitude of the adjustment increases down the cascade and is linearly relaxed to 1 at
 Experimentation has shown that this effectively reduces the cascade level biases inside large raining areas. An example of the impact of the adjustment is shown in Figure 9c. Nonparametric noise based upon the UKV forecast distribution of rain rate in Figure 9a was used to generate perturbed UKV forecasts for the same time without (Figure 9b) and with (Figure 9c) the multiplicative adjustment described in equations (5a) and (5b). Note that a different random seed was used by the noise generator in each case.
 Ignoring geographical differences in the distribution of rain rate due to a different distribution of noise perturbations, it is apparent that the character of the rainfall in the banded features over England and Wales and in the north-west corner of the domain in Figure 9c is more in keeping with the UKV forecast shown in Figure 9a. On closer scrutiny, this is because the variability in the perturbed forecast field is reduced, particularly close to the grid scale inside the larger areas of rain.
4.2. Improvements to the Method for Blending Nowcast and NWP Forecast Components
 The scale decomposition described in equation (1) can be applied to any observed or forecast rainfall field. Once decomposed, a rainfall nowcast and one or more, longer range NWP forecasts can be combined with a time series of synthetic rainfall fields (noise) using equation (6) below. This produces ensembles of seamless, composite forecasts, reflecting the error growth in its components over time
 In equation (6), the blended, normalized (zero mean, unit variance) rainfall field for cascade level, k, forecast step, t, and ensemble member, p, is a weighted sum of M cascade level fields comprising a nowcast and one or more NWP forecasts, c = 1,2,…,M, and a cascade of noise Yn that is correlated in both space and time.
 The method of calculating the weights was changed from BPS to account for the covariance between the forecasts and enable the blending of more than two forecasts. The weights are determined from estimates of the skill of the forecast components, derived on each cascade level (scale) by cross correlating a recent, decomposed nowcast or NWP forecast of surface rainfall rate with a time synchronous, decomposed radar-based analysis of the same quantity. The vector of weights for level k and time lag t, w_{k,t}, is given by
$$\mathbf{w}_{k,t} = \mathbf{C}_{k,t}^{-1}\, \mathbf{c}_{k,t} \tag{7}$$
where C_{k,t} is the covariance matrix of the forecasts and c_{k,t} is the covariance vector of the forecasts with the “true” rainfall for level k, lag t. If the model forecast errors are assumed to be independent of each other, and if the correlation between radar rainfall and observed rainfall is known then for two models (M = 2) we get
 The skill of the NWP forecast is estimated by calculating the correlation coefficient between the current (t = 0) radar-based analysis of surface rainfall rate and a time synchronous forecast from model, c, level k, lag t. ρe is the (estimated) correlation between the radar rainfall estimate and the “truth.” The weight (the standard deviation) of the remaining noise term is
 Current estimates of the skill of the NWP forecast component, c, on each cascade level, k, are relaxed to climatologies following a similar approach to that described for the nowcast. A skill climatology for the Met Office's highest resolution (∼1.5 km), convection resolving configuration of the Unified Model (UKV) is shown in Figure 10. This relates to an 8 level, q = 2 decomposition, and is based upon a comparison with Met Office radar-based analyses of surface rainfall rate [Harrison et al., 2012].
 It is interesting to note that whilst, on average, the forecast skill is very good at the top of the cascade (256 km–512 km–1024 km) and declines only gradually over the 36 h range of the forecast, skill decreases ever more rapidly down the cascade and with increasing forecast lead time. At around the 128 km scale, a very short range UKV forecast typically explains about a quarter of the variance in the quality controlled observations.
 From equation (7) it is evident that the weight assigned to the nowcast and each NWP forecast component on each cascade level (scale) in a composite forecast is a function of the fraction of the variance each explains in the observations at that scale. Since high resolution NWP models in the Met Office now directly assimilate radar reflectivity, these weights must incorporate covariance terms to account for correlations between the nowcast and NWP forecasts.
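The role of the covariance terms can be illustrated with a short sketch that solves for the weights on one cascade level from a forecast covariance matrix and a vector of covariances with the truth. It assumes all fields are normalized to unit variance, and it takes the noise weight to be whatever is needed to restore unit variance in the blend. The function name and that treatment of the noise weight are assumptions of the example.

```python
import numpy as np

def blend_weights(cov, skill):
    """Weights for M forecast components on one cascade level, plus a noise weight.

    cov   : (M, M) covariance matrix between the normalized forecast components
    skill : (M,) covariance of each component with the (normalized) truth
    """
    cov = np.asarray(cov, dtype=float)
    skill = np.asarray(skill, dtype=float)
    weights = np.linalg.solve(cov, skill)        # solves cov @ weights = skill
    explained = float(weights @ skill)           # variance explained by the blend
    noise_weight = np.sqrt(max(1.0 - explained, 0.0))
    return weights, noise_weight
```

For two equally skilful components (0.6 each) that are correlated at 0.5, the weights fall from 0.6 each in the uncorrelated case to 0.4 each, since the two components partly carry the same information.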
4.3. Treatment of Radar Errors
 BPS did not account for errors in the radar-based analyses of surface rainfall rate that are used as the starting point for extrapolation nowcasts. Failure to properly account for these observation errors will limit the predictive skill of STEPS ensemble nowcasts in the first hour or so. Studies involving collocated radar and disdrometer measurements show that radar errors exhibit measurable coherence over time scales of the order of 1 h and spatial scales of the order of tens of kilometers [Lee and Zawadzki, 2005a, 2005b, 2006; Lee et al., 2007].
 Radar errors fall into three broad categories: measurement biases, physical biases, and random sampling errors [Austin, 1987]. Historically, most effort has been invested in correcting physical (e.g., ground clutter and beam blockage) and measurement (e.g., Z-R conversion) biases due to the emphasis on improving operational, deterministic estimates of surface rainfall rate and accumulation from weather radar. More recently, however, the treatment of random sampling errors has been given more attention. This shift of emphasis coincides with significant growth in the operational deployment of ensemble NWP models and a concomitant recognition of the importance of accounting for observation and forecast uncertainties when using meteorological predictions in decision making.
 Two broad approaches to the modeling of random sampling errors in weather radar observations have been described in the literature. One entails modeling the characteristics of individual sources of error [Jordan et al., 2003; Lee and Zawadzki, 2005a, 2005b, 2006; Lee et al., 2007] and the other relies upon a statistical description of the difference between the radar estimates and a reference [Ciach et al., 2007; Llort et al., 2008; Germann et al., 2009]. The difficulty with the former approach is that the true error structure of radar observations can vary significantly depending on the meteorological conditions and is therefore very difficult to ascertain. The challenge with the latter approach is the requirement for a reference field: this is usually a dense network of point observations from rain gauges.
Norman et al. evaluated two radar-based ensemble generators following methodologies proposed by Germann et al., hereafter GM09, and by Lee et al. and Jordan et al., hereafter Seed10. The Seed10 scheme models radar errors from two specific sources: those arising from the use of a fixed Z-R relationship, and those originating from assumptions made about the vertical profile of reflectivity (VPR). By contrast, GM09 is a statistical model of the errors in radar-inferred estimates of subhourly (15 min) rain accumulations, constructed using a large historical rain gauge data set. In their case study oriented investigation, Norman et al. demonstrated similar levels of performance by the two approaches, despite the fact that the VPR error model in Seed10 had not been rigorously calibrated using UK-based data sets prior to evaluation.
 In 2011, STEPS was extended to incorporate GM09, following a calibration exercise using four years' worth of radar and rain gauge data. A 15 min accumulation period was considered an adequate proxy for instantaneous rainfall rate. Four attributes of the radar errors were derived: a mean error vector (the systematic error), the variance of the errors (magnitude of the random error), and the spatial and temporal correlations in the error fields. A covariance matrix was used to store information about the standard deviation and spatial correlation of the errors. Since the covariance and mean error can only be measured at gauge locations, their point values were interpolated in space. A second-order auto-regressive (AR-2) process was used to impose temporal correlations. Figure 11 shows the bias and variance of the radar errors derived using the GM09 methodology.
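 A minimal sketch of how the four error attributes might be combined into a perturbation time series is given below. It is an illustration under stated assumptions, not the operational GM09 or STEPS code: the covariance matrix, mean error vector, and lag-1 and lag-2 autocorrelations are hypothetical values, the perturbations are generated at three gauge locations only (the spatial interpolation step is omitted), and the function names are invented for this example. A Cholesky factor of the covariance matrix imposes the variance and spatial correlation, and Yule-Walker coefficients define the AR-2 process.

```python
import numpy as np

def ar2_coefficients(r1, r2):
    """Yule-Walker AR(2) coefficients from lag-1 and lag-2 autocorrelations."""
    a1 = r1 * (1.0 - r2) / (1.0 - r1**2)
    a2 = (r2 - r1**2) / (1.0 - r1**2)
    return a1, a2

def perturbation_series(mean_err, cov, r1, r2, n_steps, rng):
    """Time series of error perturbations at the gauge locations."""
    a1, a2 = ar2_coefficients(r1, r2)
    # Innovation standard deviation that keeps the AR(2) process at unit variance
    sigma_e = np.sqrt(1.0 - a1 * r1 - a2 * r2)
    chol = np.linalg.cholesky(cov)  # imposes variance and spatial correlation
    n = len(mean_err)
    # Two starting states with unit variance and lag-1 correlation r1
    y_prev2 = rng.standard_normal(n)
    y_prev1 = r1 * y_prev2 + np.sqrt(1.0 - r1**2) * rng.standard_normal(n)
    out = np.empty((n_steps, n))
    for t in range(n_steps):
        y = a1 * y_prev1 + a2 * y_prev2 + sigma_e * rng.standard_normal(n)
        out[t] = mean_err + chol @ y  # add the systematic error
        y_prev2, y_prev1 = y_prev1, y
    return out

rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.6, 0.2],
                [0.6, 1.0, 0.6],
                [0.2, 0.6, 1.0]])      # hypothetical error covariance
mean_err = np.array([0.1, -0.2, 0.0])  # hypothetical mean error vector
a1, a2 = ar2_coefficients(0.8, 0.6)
series = perturbation_series(mean_err, cov, r1=0.8, r2=0.6, n_steps=500, rng=rng)
```

 Each row of `series` is one time step of correlated perturbations; a different random seed would give the perturbations for a different ensemble member.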
 These four attributes of the radar errors enable STEPS to produce time series of perturbations representing the observation error with which to generate ensembles of analyses of surface rainfall from a single, best estimate of the distribution of surface rainfall rate [Harrison et al., 2012]. This capability is demonstrated in Figure 12. In keeping with STEPS' cascade modeling framework, the perturbation fields are scale decomposed for combination with the scale decomposed analysis of surface rain rate.
Pierce et al. evaluated the performance of ensembles of rainfall nowcasts produced by two versions of STEPS, one incorporating the observation error models, the other not. With a limited case study data set dominated by widespread rainfall events, they demonstrated a small improvement in the spread-skill relationship of T + 1 h nowcasts of 60 min rainfall accumulation in both cases.
5. Performance Evaluation of STEPS-2 Nowcasts
5.1. A Case Study-Based Appraisal
 In this section, the aim is to demonstrate the capabilities and performance of STEPS with some examples of nowcast products.
 An example of a STEPS control member nowcast of surface rainfall rate, blending an extrapolation nowcast and UKV forecast, is shown in Figure 13. The starting point (T + 0) is a radar-based analysis of rainfall rate (top-left) valid at 1500 GMT on 30 September 2012. Nowcasts for T + 30 min, T + 1 h, T + 3 h, and T + 6 h are shown in subsequent frames. Since the end point of the nowcast evolution is the UKV forecast valid at T + 6 h, this is shown in the bottom right-hand corner.
 The radar-based analysis is in fact constructed from a combination of quality controlled radar data, satellite estimates of surface rain rate and a short range UKV forecast. Outside the area of radar coverage, the latter two components are combined using a 2-D variational assimilation scheme [Wright, 1994]. In these areas, the NWP forecast tends to dominate and this is apparent in the subtle change of character of the field over the North Sea and the south-west corner.
 After four time steps (1 h), the nowcast is still recognizably an extrapolation of the analysis, although the convective scale detail in the raining areas has evolved. By T + 3 h, the handover to the UKV forecast is in evidence over north-western areas in the form of new areas of showery activity. The increasing influence of the UKV can also be seen in the subtle changes in character of the main rainfall areas over England and Wales and in the north-west corner of the domain. By T + 6 h, the transition to the UKV forecast solution is clearly complete.
 This example demonstrates the ability of STEPS to produce a seamless, coherent nowcast evolution from the extrapolation nowcast to NWP (UKV) forecast. The relative contributions of these components to the final product are a function of their relative predictive skill, estimated on each level of the cascade. Since the extrapolation nowcast loses skill most rapidly at the bottom of the cascade, the handover to the UKV solution proceeds from the smallest scales upward with each successive forecast step. The statistical performance of this scale decomposition-based blending is examined in the following section.
 Figure 14 compares three members of a T + 3 h STEPS ensemble nowcast of rain rate from 1500 GMT on 30 September 2012 with the equivalent control member nowcast. The ensemble members were created by perturbing the blended extrapolation nowcast and UKV forecast shown in Figure 13 with nonparametric noise. A multiplicative correction of the form described in equations (5a) and (5b) was applied to the noise cascades.
 Although the three ensemble members are broadly similar to the control member, as would be expected, there are noticeable differences in the character of the perturbed solutions when compared to the unperturbed nowcast. For example, the small scale, linear features visible in the main frontal rain band (originating from the UKV forecast) are lost in the ensemble members. This is because the noise perturbations are homogeneous in form at the convective scales. Furthermore, the UKV derived showers in the north-west corner are more extensive and less clearly defined in the ensemble than in the control, although the orientation of the showers into lines is well captured (the direction of the predominant anisotropy).
 The positions of the two most significant bands of rain are similar in all three members. These are determined by the distribution of power on the top cascade level (256 km–512 km–1024 km). Since the skill of both the nowcast and UKV forecast is high at these scales, the contributions from the noise cascade are small and this is why the positional displacements are small also. The evolutionary differences between ensemble members are much more apparent at the convective scales because this is where the nowcast and UKV forecast are least skilful. Variations in the distribution of showers between members are pronounced over Scotland and Ireland, as are differences in the locations of the most intense rain rates within the two rain bands described above.
 An evaluation of spread-skill of STEPS' nowcast ensembles is presented in the following section.
 Figure 15 demonstrates the performance of STEPS' parametric noise generator. In this example, it has been used to downscale a single member of a MOGREPS-R (the regional configuration of the Met Office Global and Regional Ensemble Prediction System) ensemble forecast of rain rate. The dominant effects are a reduction in the raining fraction and a significant increase in the variance of the field. In some places, for example, in the north-eastern quarter of the domain, this has generated rain rates greater than 8 mm/h when the highest rates in the MOGREPS-R forecast are between 4 and 8 mm/h. It is apparent that the introduced variability is isotropic in form.
 No formal evaluation of this downscaling capability has been undertaken to date, but work is scheduled during 2013.
5.2. Performance of STEPS's Control Member Nowcasts
 STEPS's use of a scale decomposition framework has been justified on the grounds that it allows the dynamic scaling properties of rainfall fields to be modeled effectively. The use of such a framework should permit the generation of near optimal nowcast solutions, provided that estimates of the skill of the nowcast and NWP forecast components on each level of the cascade are a reasonable approximation of the true skill estimates and that geographical variations in skill at a given scale are not significant.
 To demonstrate this, the performance of the STEPS control member nowcasts has been compared with that of two STEPS variants in which the contribution of the extrapolation nowcast was fixed using weights loosely based upon the climatological performance of the extrapolation nowcast. The aim of this comparison was to clarify the benefits of the adaptive calibration achieved by recalculating an estimate of nowcast skill for each STEPS run.
 Figure 16 provides good evidence to support the case that the STEPS control member nowcasts are near optimal in terms of performance. One of the two hardwired variants is of comparable skill although it remains consistently inferior by a small margin. The main benefit of the dynamic calibration is confined to lead times between T + 2 h and T + 6 h. The performance of the 4 km configuration of the Met Office's Unified Model (UK4) is shown for comparison because this was used in the nowcast blend.
5.3. Performance of STEPS's Ensemble Member Nowcasts
 Ensemble forecasting systems are formulated in such a way that the forecast evolution predicted by each member should be equally likely to occur. In principle therefore, the likelihood of a given threshold exceedance is simply the fraction of ensemble members exceeding that threshold. However, this assumes that the ensemble forecast is perfectly calibrated: in other words, the observed frequency of a given event is always perfectly predicted by the ensemble. In reality, this level of skill is unattainable, because, a priori, it is not possible to model all sources of forecast error perfectly.
 BPS explored the performance of 100 member ensembles of STEPS nowcasts using a decomposition of the Brier Skill Score [Brier, 1950], the Relative Operating Characteristic [Mason and Graham, 2002] and ROC area. It was shown that STEPS ensembles are generally reliable with no significant loss of spread-skill for rain rates up to 1 mm/h. The aim of the summary verification statistics presented here is to better elucidate the relationship between ensemble spread and error using a larger data set. To this end, verification statistics have been compiled from a month's worth of operational, 18 member ensemble nowcasts and quality controlled rain gauge measurements from approximately 400 rain gauges.
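 The fraction-of-members estimate of exceedance probability can be sketched as follows. The array layout (members first), the function name and the tiny four member ensemble are assumptions made for illustration; no calibration is applied.

```python
import numpy as np

def exceedance_probability(ensemble, threshold):
    """Fraction of members exceeding the threshold at each grid point.

    ensemble : array of shape (n_members, ny, nx)
    """
    return np.mean(ensemble > threshold, axis=0)

# Hypothetical ensemble: 4 members on a 1 x 2 grid, rain rate in mm/h
ens = np.array([[[0.2, 3.0]],
                [[1.5, 2.0]],
                [[0.0, 0.5]],
                [[2.5, 4.0]]])
prob = exceedance_probability(ens, threshold=1.0)
```

 Here two of the four members exceed 1 mm/h at the first grid point and three of the four at the second, so `prob` holds 0.5 and 0.75.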
 One of the primary aims of ensemble forecasting is to predict the skill of the control (unperturbed) forecast. Ideally, when the spread of an ensemble forecast is small then the control member forecast should be more trustworthy than when the spread is large. Two aspects of ensemble spread-skill are of interest. First, on average, the spread of the ensemble about the ensemble mean should match the error in the ensemble mean forecast. This can be assessed by comparing average ensemble spread and ensemble mean error for a particular lead time. A second aspect concerns the relationship between ensemble spread and error in the ensemble mean at any given instant. This is normally established by examining the correlation between spread and error. Verification statistics pertinent to both of these attributes of spread-skill are presented below.
 Figure 17 shows a plot of STEPS average ensemble spread versus average RMS error in the ensemble mean as a function of nowcast lead time. For a skilful ensemble nowcast, the average spread about the ensemble mean should follow the average error in the ensemble mean nowcast. Furthermore, the spread about the ensemble mean should be less than the spread about the control. Similarly, the error in the ensemble mean (in this case, measured relative to rain gauge observations) should be less than the error in the control nowcast. In these respects one can conclude that STEPS ensemble nowcasts are skilful.
 In comparing ensemble spread and ensemble mean error it is important to consider the effects of observation error. The root-mean-squared error of an ensemble forecast against the truth can be approximated using
 Thus, accounting for observation error will tend to reduce the estimated forecast error. Since gauge errors are not insignificant, particularly for larger accumulations, the RMS error in the ensemble mean shown in Figure 17 will be an overestimate of the true error. Therefore, STEPS nowcast ensembles are somewhat more overspread than the figure suggests.
 Figure 18 shows a plot of the correlation between ensemble spread and ensemble mean error for 18 member STEPS T + 3 h ensemble nowcasts of 1 h rainfall accumulation. Pairs of ensemble spread and mean error statistics have been grouped together according to the magnitude of the spread. A series of bins is defined whose ranges are such that an equal number of pairs falls into each bin. This avoids a scenario in which the bin associated with the largest values of ensemble spread contains the smallest number of samples. The mean value of the ensemble spread in each bin is then compared with the mean value of the ensemble mean error in the same bin. A straight line has been fitted to the average values of spread and error in each bin using linear least squares regression.
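 The effect of observation error on the estimated forecast error can be illustrated numerically. A commonly used form of the approximation, assuming that the observation errors are unbiased and uncorrelated with the forecast errors, subtracts the observation error variance from the mean squared error measured against the observations. This form, the function name and the synthetic data below are assumptions for illustration and may differ from the expression used by the authors.

```python
import numpy as np

def rmse_against_truth(forecast, observed, obs_error_std):
    """Approximate RMSE against the truth by removing observation error variance."""
    mse_obs = np.mean((forecast - observed) ** 2)
    # Clip at zero in case sampling noise makes the difference negative
    return np.sqrt(max(mse_obs - obs_error_std**2, 0.0))

rng = np.random.default_rng(1)
truth = rng.gamma(shape=2.0, scale=1.0, size=100_000)
forecast = truth + rng.normal(0.0, 1.0, truth.size)  # forecast error std = 1.0
observed = truth + rng.normal(0.0, 0.5, truth.size)  # gauge error std = 0.5

rmse_obs = np.sqrt(np.mean((forecast - observed) ** 2))
rmse_true = rmse_against_truth(forecast, observed, obs_error_std=0.5)
```

 With these synthetic inputs the error measured against the noisy observations is larger than the adjusted value, which should lie close to the forecast error standard deviation of 1.0 that was imposed.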
 This line is compared with two reference lines, fitted to similar data derived from several reference ensembles, one representing an equivalent ensemble with no-skill (near horizontal line), the other an ensemble with perfect spread (the diagonal line with a slope of 1). The former is derived by associating each measurement (data sample) of ensemble spread with a randomly selected measurement of the ensemble mean error. The latter is derived by replacing the observation used in deriving the ensemble mean error with a nowcast value from a randomly selected ensemble member.
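 The binning, line fitting and no-skill reference can be sketched as follows. This is a simplified illustration with synthetic spread and error samples, not the verification code used for Figure 18; the function names are hypothetical, the no-skill pairing is implemented with a random permutation, and the perfect spread reference is omitted because it requires the individual ensemble members.

```python
import numpy as np

def binned_spread_error(spread, error, n_bins):
    """Mean spread and mean error in bins holding equal numbers of pairs."""
    order = np.argsort(spread)
    spread_bins = np.array_split(spread[order], n_bins)
    error_bins = np.array_split(error[order], n_bins)
    mean_spread = np.array([b.mean() for b in spread_bins])
    mean_error = np.array([b.mean() for b in error_bins])
    return mean_spread, mean_error

def fitted_slope(spread, error, n_bins=10):
    """Slope of the least-squares line through the binned means."""
    mean_spread, mean_error = binned_spread_error(spread, error, n_bins)
    slope, _intercept = np.polyfit(mean_spread, mean_error, 1)
    return slope

rng = np.random.default_rng(2)
spread = rng.uniform(0.1, 2.0, 5000)
# Synthetic absolute errors whose magnitude scales with the spread
error = np.abs(rng.normal(0.0, spread))

slope_actual = fitted_slope(spread, error)
# No-skill reference: pair each spread value with a randomly selected error
slope_no_skill = fitted_slope(spread, rng.permutation(error))
```

 Sorting by spread before splitting is what gives every bin the same number of pairs; the shuffled pairing destroys any relationship between spread and error, so its fitted line is close to horizontal.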
 It is evident that STEPS ensembles are under spread for the smallest errors and over spread for the largest errors. These findings are consistent with BPS in showing that STEPS ensembles tend to be overconfident.
 There is a growing demand for high resolution nowcasts and forecasts of rain that can be used to drive hydrological models and end-user impact models. On-going improvements in the availability of high resolution radar and satellite observations and convection resolving NWP forecasts dictate the need to quantify very short range forecast uncertainties because forecast errors grow most rapidly at the smallest scales. This implies the need to model the space-time characteristics of rainfall and associated forecast errors so that well calibrated ensembles of nowcast solutions can be delivered to the user. The provision of ensembles will allow the propagation of forecast errors through a range of downstream models including those used for hydrological forecasting and flood warning. A multiplicative cascade model is a very convenient framework for modeling errors in rainfall since it is able to generate fields that scale in both space and time, and allows for a scale-dependent treatment of the forecast errors.
 Time is of the essence in operational nowcasting because the forecast information that is to be found within the observations is highly perishable. This favors solutions where the complexity is kept to a minimum and simple but elegant approaches should be preferred over complex or highly engineered solutions. Operational experience with STEPS has highlighted a number of areas where the initial formulation was too simple. In particular, the impact of radar observation errors is significant in the first hour of a nowcast and this needed consideration. Two approaches to the modeling of errors in radar estimates of surface rainfall rate have been developed. A gauge-based approach is now used in the Met Office's version of STEPS whilst a stochastic space-time model of radar estimation errors is used in Australia.
 Rainfall is frequently, perhaps usually, arranged in anisotropic bands, and reproducing these anisotropic structures becomes more important as the scale of the modeling domain increases. The simplest way to reproduce anisotropic fields is to use the FFT of an observed and/or NWP forecast field to construct the filter. This can then be used to impose the precise spatial correlations on a field of white noise. Such an approach is particularly effective, perhaps too effective, because small-scale artifacts from poor quality radar data are also faithfully reproduced. An isotropic, parametric filter with a scale break at 20 km was developed for situations when a reference rainfall field is not available.
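 The FFT-based construction of an anisotropic filter can be sketched as follows. It is a minimal illustration that assumes a periodic domain and uses a synthetic elongated field in place of an observed or NWP rainfall field; the function names are hypothetical and the windowing and scale decomposition that an operational scheme would require are omitted.

```python
import numpy as np

def nonparametric_noise(reference, rng):
    """Correlated noise sharing the Fourier amplitude spectrum of `reference`."""
    amplitude = np.abs(np.fft.fft2(reference))  # filter built from the reference field
    white = rng.standard_normal(reference.shape)
    # Multiply the spectra: amplitudes come from the reference, phases from the noise
    noise = np.fft.ifft2(np.fft.fft2(white) * amplitude).real
    return (noise - noise.mean()) / noise.std()  # zero mean, unit variance

def lag1_corr(field, axis):
    """Lag-1 autocorrelation of a standardized field along one axis (periodic)."""
    return np.mean(field * np.roll(field, 1, axis=axis))

rng = np.random.default_rng(3)
# Hypothetical anisotropic "rain band": elongated along x, narrow along y
y, x = np.mgrid[0:64, 0:64]
reference = np.exp(-((x - 32) / 20.0) ** 2 - ((y - 32) / 3.0) ** 2)

noise = nonparametric_noise(reference, rng)
corr_x = lag1_corr(noise, axis=1)
corr_y = lag1_corr(noise, axis=0)
```

 Because the filter is taken directly from the reference field, the noise inherits its anisotropy (here, stronger correlation along x than along y), and by the same mechanism it would inherit any small-scale artifacts present in a real radar field.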
 The initial version of STEPS assumed that the statistical properties of rainfall fields were homogeneous over the forecast domain. This is a reasonable assumption if the horizontal dimensions of the domain are small, say 200 km, but is a limitation for domains that are of the order of 1000 km. Inhomogeneities arise due to geographical variations in the dynamics and physics of the meteorology and often give rise to coexisting frontal rain bands and convective showers. The space-time structure of rainfall is therefore not stationary over a large domain and it is not altogether clear how to account for this and still meet the stringent time constraints of an operational nowcasting environment.
 Forecast errors, both from the advection nowcast and NWP forecast, are currently assumed to be homogeneous over the forecast domain. This is unlikely to be the case, particularly in areas where topography plays an important role in the distribution and intensity of rainfall. The space-time structure of both the nowcast and NWP forecast errors, and how the errors depend on the meteorology of the day, is not well understood and requires further investigation if we are to make the spread of the ensemble represent the magnitude of the actual forecast error at any given location and time on any given day.
 The authors would like to thank the anonymous reviewers for their helpful contributions.