Ash clouds emanating from volcanic eruption columns often form trails of ash extending thousands of kilometers through the Earth's atmosphere, disrupting air traffic and posing a significant hazard to air travel. To mitigate such hazards, the community charged with reducing flight risk must accurately assess the risk of ash ingestion for any flight path and provide robust forecasts of volcanic ash dispersal. In response to this need, a number of different transport models have been developed and applied to recent eruptions, providing a means to assess uncertainty in forecasts. Here we provide a framework for optimal forecasts and their uncertainties given any model and any observational data. This involves random sampling of the probability distributions of input (source) parameters to a transport model and iteratively running the model with different inputs, each time assessing the predictions that the model makes about ash dispersal by direct comparison with satellite data. The results of these comparisons are embodied in a likelihood function whose maximum corresponds to the minimum misfit between model output and observations. Bayes theorem is then used to determine a normalized posterior probability distribution and from that a forecast of future uncertainty in ash dispersal. The nature of ash clouds in heterogeneous wind fields creates a strong maximum likelihood estimate in which most of the probability is localized to narrow ranges of model source parameters. This property is used here to accelerate probability assessment, producing a method to rapidly generate a prediction of future ash concentrations and their distribution based upon assimilation of satellite data as well as model and data uncertainties. Applying this method to the recent eruption of Eyjafjallajökull in Iceland, we show that the 3 and 6 h forecasts of ash cloud location probability encompassed the location of observed satellite-determined ash cloud loads, providing an efficient means to assess all of the hazards associated with these ash clouds.
 Volcanic eruptions often push robust eruption columns high into the atmosphere, forcibly injecting large quantities of fine ash that are carried downwind for thousands of kilometers, disrupting air travel and posing a hazard for human life and property. The frequency and density of air travel worldwide has increased the likelihood that an eruption anywhere in the world will disrupt air travel and place flights at risk. Quantitative prediction of the path of ash clouds in the atmosphere is essential to effective hazards mitigation, and quantifying uncertainty associated with the position of these clouds is essential to assessing that risk.
 The April 14–17, 2010 eruption of Eyjafjallajökull disrupted European airspace for weeks, and focused scientific effort on tracking and characterizing volcanic ash clouds. This high impact event led to an examination of dispersion and transport models at a World Meteorological Organization sponsored conference in Geneva in September 2010, where most existing models and practices were reviewed. The results of this review are available at http://www.unige.ch/sciences/terre/mineral/CERG/Workshop/results.html. Chief among the technical aspects discussed was the operational use of dispersion model forecasts for volcanic ash clouds, and the sensitivity of these forecasts to uncertainties in wind, eruption column dynamics, and the time-varying nature of the delivery of ash to the atmosphere.
 A number of recent papers have compared transport models to observational data to determine optimal source parameters for the columns feeding ash clouds. Comparisons of highly developed transport models to a variety of different measurements of the 2010 Eyjafjallajökull ash clouds are presented by Dacre et al., Kristiansen et al. [2012], and Stohl et al. [2011]. Stohl et al. [2011] and Kristiansen et al. [2012] incorporate uncertainties as they invert for eruption source parameters for April and May, 2010, minimizing a cost function that closely follows Eckhardt et al. for clouds of volcanic gas. Similar results are found by Dacre et al., fitting a different transport model (NAME) to a wide range of data to track the April 14–17 ash cloud across Europe. These papers are representative of studies that show that ash cloud transport models using complex wind fields, obtained either from global forecast systems or from re-analyzed wind data, can be used to infer source parameters for ash injected into the wind by a volcanic eruption. These models, which passively transport ash, can generate useful maximum likelihood estimates (MLE) for the most probable combination of model input parameters with only a few hours of satellite data [Kristiansen et al., 2012; Stohl et al., 2011], and generate a posterior probability distribution for model input parameters. The results presented in these papers show that the sources of ash in the eruption column are characterized by a strong maximum with altitude or in the variation of emission rate with altitude. This characteristic may be used to constrain predictions of future ash transport.
 In this paper we exploit the nature of observed ash clouds in heterogeneous wind fields (e.g., in the presence of horizontal and vertical wind shear) to develop a method to forecast the uncertainty of future ash distributions by measuring the uncertainty in the source parameters. Two of the most important source parameters, plume height and ash emission rate with time and altitude within the plume, are chosen, though other parameters could be included as could uncertainty in atmospheric conditions. Observations of ash clouds are provided by imagery from the geosynchronous Spinning Enhanced Visible/Infrared Imager (SEVIRI) operated by Eumetsat (http://www.eumetsat.int/publications). From these data, ash is identified [Pavolonis et al., 2006; Pavolonis, 2010] and the total column loading (mass per unit area, in g/m² or tonnes/km²) of volcanic ash is determined [Pavolonis and Sieglaff, 2010; M. J. Pavolonis et al., Automated retrievals of volcanic ash and dust cloud properties from upwelling infrared measurements, manuscript in preparation, 2012]. SEVIRI is capable of detecting volcanic ash when not obstructed (from above) by liquid water or ice clouds and when the ash concentration filtering radiation observed by the satellite is at least 0.02 mg/m³. The time period of study was chosen to correspond with conditions favorable for satellite retrievals of ash loading. Comparing these satellite-derived data with model output for cloud load, we determine a preliminary MLE for model input parameters. Using Bayes theorem, the MLE is combined with prior constraints on model input to determine the posterior probability of model input parameters. In addition, we show that applying a saddle point approximation to posterior distributions is equivalent to minimizing the cost function used by Kristiansen et al. [2012], Stohl et al. [2011], and Eckhardt et al. to determine the posterior distribution of model input parameters. The posterior distribution is then integrated to forecast the uncertainty of future ash distributions 3 and 6 h in advance, and these forecasts are converted to cloud loads and compared with observed satellite-derived cloud loads at those times. For these forecast intervals, the probability of ash occurrence along any flight path can be integrated to determine the probability of ingesting ash along that path, given the uncertainties in model and satellite data, and can be used for hazards estimation.
2. Method of Analysis
 The fit of a model Hi with an ensemble of parameter inputs w to satellite observations D provides an estimate of the likelihood P(D|w, Hi) of observing D with that particular set of model inputs. The best fit (or minimum residual misfit) gives the most probable set of values for the input parameters, wMP. Using Bayes theorem [Mackay, 2003], the posterior probability of the parameter input ensemble w is
$$P(w \mid D, H_i) = \frac{P(D \mid w, H_i)\, P(w \mid H_i)}{P(D \mid H_i)} \qquad (1)$$
where the prior probability P(w|Hi) is found from the initial values the parameters for model Hi are expected to take, P(D|w, Hi) is defined above, and
$$P(D \mid H_i) = \int P(D \mid w, H_i)\, P(w \mid H_i)\, dw \qquad (2)$$
is the integral over all possible combinations of likelihood estimates multiplied by the prior estimates for the parameters w. The probable range and distribution for each parameter in w can be estimated from other measurements, or may be unknown.
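When the parameters are sampled on a discrete grid, the normalization in equation (2) reduces to a sum. The following is a minimal sketch of that discrete case, not the implementation used in this paper; the function name `grid_posterior` and the sample prior and likelihood values are assumptions made for illustration.

```python
def grid_posterior(prior, likelihood):
    """Return the normalized posterior on a discrete parameter grid.

    prior[i] and likelihood[i] hold P(w|H) and P(D|w,H) at grid point i.
    The evidence P(D|H) is approximated by the sum of their products.
    """
    unnormalized = [p * l for p, l in zip(prior, likelihood)]
    evidence = sum(unnormalized)
    if evidence <= 0.0:
        raise ValueError("prior and likelihood have no overlapping support")
    return [u / evidence for u in unnormalized]


# Hypothetical example: flat prior over five candidate plume heights (km).
heights_km = [2, 3, 4, 5, 6]
prior = [0.2] * 5
likelihood = [0.01, 0.10, 0.60, 0.25, 0.04]  # illustrative values only
posterior = grid_posterior(prior, likelihood)
most_probable_height = heights_km[posterior.index(max(posterior))]
```

With a flat prior the posterior simply follows the likelihood, so the most probable height in this toy case is the grid point with the largest likelihood.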
 Consider any given transport model Hi with a set of input parameters w that is used to calculate an ash cloud density and location. We describe the probability that the model result, when vertically integrated, will produce a model ash cloud load distribution Hw that overlaps a satellite-observed ash cloud load distribution y. A forward problem would hold the model parameters fixed and calculate the probability of ash distribution within a three dimensional model domain that is then integrated to get cloud load. The likelihood is determined by holding an observed data set D within that domain fixed and varying model input parameters w to determine which combination produces the best estimate for the observed cloud load, the maximum likelihood estimation. To obtain this maximum likelihood estimate, we maximize the L1 norm (http://mathworld.wolfram.com/L1-Norm.html) defined by
where σy² + σw² is the sum of the variances in the observations and the model parameters, respectively, and y_{m,n} is the observed cloud load at the top of a column of cells at horizontal position (m, n) in the model domain, averaged over the top cell surface area. Since satellite data are measured as cloud load over an incremental, plan view surface in Earth's atmosphere, the inner sum (over k) adds model results vertically over a column of cells in the three-dimensional model domain to produce a value at (m, n). The spatial complexity of the wind fields associated with ash transport and the thin ribbons of ash that characterize ash clouds usually result in a single, dominant maximum in the residual between model output and satellite-determined ash concentrations. This characteristic toward a centralized distribution may be exploited to rapidly determine the normalized posterior distribution given in equation (1), which is then used to estimate the uncertainty in the distribution of future ash clouds.
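A minimal sketch of a variance-scaled L1 misfit between model and observed cloud loads is given below. It is not a transcription of equation (3): the scaling by the square root of σy² + σw², the function name `scaled_l1_misfit`, and the sample loads are assumptions chosen only to illustrate the comparison over horizontal positions (m, n).

```python
import math


def scaled_l1_misfit(model_load, observed_load, sigma_y, sigma_w):
    """Sum |model - observed| over all (m, n), scaled by the combined
    standard deviation sqrt(sigma_y**2 + sigma_w**2) (assumed scaling)."""
    scale = math.sqrt(sigma_y ** 2 + sigma_w ** 2)
    if scale <= 0.0:
        raise ValueError("combined standard deviation must be positive")
    total = 0.0
    for model_row, observed_row in zip(model_load, observed_load):
        for h_mn, y_mn in zip(model_row, observed_row):
            total += abs(h_mn - y_mn)
    return total / scale


# Hypothetical 2 x 2 grids of cloud load (g/m^2), already vertically summed.
model = [[1.0, 2.0], [0.0, 3.0]]
observed = [[1.5, 2.0], [0.5, 2.0]]
misfit = scaled_l1_misfit(model, observed, sigma_y=0.3, sigma_w=0.4)
```

A smaller value indicates closer agreement, so under this convention the best-fitting parameter set is the one that minimizes the misfit.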
 Ribbons of ash typically disperse from a volcanic eruption column into a narrow range of elevations, and usually within a single elevation (rather than multiple elevations), as shown in Figure 1 for the May 5–8, 2010 eruption of Eyjafjallajökull in Iceland (http://en.vedur.is/earthquakes-and-volcanism/articles/nr/1884). Often the ash cloud trail and the eruption column that feeds it are visible for miles if the sky is clear around the column. A plan view of the May 7, 2010 plume from Eyjafjallajökull volcano is shown in a NASA satellite photo in Figure 2. The altitude of this cloud is constrained by satellite data, as well as by visual observation and radar measurements near the volcano as shown in Figure 3 [Arason et al., 2011]. Here the maximum height of dispersal of ash from the eruption column remained between 5 and 6 km above sea level (asl) for the entire eruptive episode, providing a good test of a forecast model since the uncertainty in cloud height near the volcano is sharply reduced.
 To forecast ash dispersal, we estimate the posterior probability of model input data, using equations (1)–(3) and the program Ash3d [Schwaiger et al., 2012]. If the posterior probability for peak height at the maximum likelihood mass concentration is characterized by a prominent peak enclosing most of the probability, as illustrated in Kristiansen et al. [2012], then a saddle point approximation is appropriate [Butler and Wood, 2004]. Applying this approximation, the maximum is approximated as a Gaussian. Expanding the logarithm of the Gaussian distribution that approximates the peak in P(w|D, Hi) in a Taylor series around the peak value (at w = wMP),
 The probability P(w|D, Hi) in equation (4) is the un-normalized Gaussian centered on w = wMP, or
 The normalizing constant in this distribution, normally obtained from equation (2), is instead obtained from the integral over the Gaussian approximation to P(w|D, H) at w = wMP, and is
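For reference, the standard one-dimensional form of this approximation, as treated in Mackay [2003], can be sketched as follows. The notation is assumed here and may differ from that of equations (4)–(6): P* denotes the un-normalized posterior, c the curvature of its logarithm at the peak, and Z the normalizing constant.

```latex
% Second-order Taylor expansion of the log posterior about the peak
\ln P^{*}(w \mid D, H_i) \approx \ln P^{*}(w_{MP} \mid D, H_i)
  - \frac{c}{2}\,(w - w_{MP})^{2},
\qquad
c = -\left.\frac{\partial^{2}}{\partial w^{2}}
      \ln P^{*}(w \mid D, H_i)\right|_{w = w_{MP}}

% Un-normalized Gaussian centered on the peak
P^{*}(w \mid D, H_i) \approx P^{*}(w_{MP} \mid D, H_i)\,
  \exp\!\left[-\frac{c}{2}\,(w - w_{MP})^{2}\right]

% Normalizing constant from the Gaussian integral
Z \approx P^{*}(w_{MP} \mid D, H_i)\,\sqrt{\frac{2\pi}{c}}
```

The first-order term vanishes because the expansion is taken at the maximum, which is why only the curvature c controls the width of the approximating Gaussian.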
 Generalizing this approach for a vector of variable input parameters w, each with its own probability distribution, the single curvature in equation (6) becomes the Hessian matrix (equivalent to error bars in w, a measure of the quality of fit)
where the gradient operators denote derivatives with respect to the parameters w. Thus if the un-normalized posterior probability has a dominant peak, then the vector of most probable input parameters wMP for that model is approximated by
 If for a single model Hi the posterior probability consists of two independent sets of data that may be described by two Gaussian distributions, one for the prior distribution of parameters wP and one for the misfit between individual observations y contained in D and the maximum likelihood predictions HwMP, then the un-normalized posterior probability is the product
 Normalizing this posterior probability distribution, using the properties of a Gaussian distribution (equation (7)), gives
where K is the dimension of parameter space and the square root sign encloses the product of the determinants of the covariance matrices [Mackay, 2003]. This approximation avoids the costly integral in equation (2), replacing it with the simpler equation (7).
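The Gaussian normalization described here can be checked numerically for K = 2. The sketch below is an illustration under stated assumptions rather than the procedure used with Ash3d: the Hessian is estimated by central differences instead of a quasi-Newton update, and the function names, step size, and test distribution are hypothetical.

```python
import math


def hessian_2d(neg_log_p, w, h=1e-3):
    """Central-difference Hessian of neg_log_p at w = (w0, w1)."""
    def f(a, b):
        return neg_log_p((w[0] + a, w[1] + b))
    f0 = f(0.0, 0.0)
    a00 = (f(h, 0.0) - 2.0 * f0 + f(-h, 0.0)) / h ** 2
    a11 = (f(0.0, h) - 2.0 * f0 + f(0.0, -h)) / h ** 2
    a01 = (f(h, h) - f(h, -h) - f(-h, h) + f(-h, -h)) / (4.0 * h ** 2)
    return [[a00, a01], [a01, a11]]


def laplace_normalizer(neg_log_p, w_mp):
    """Approximate the integral of exp(-neg_log_p) by the Gaussian fitted
    at the peak w_mp: peak * sqrt((2*pi)**K / det(Hessian)) with K = 2."""
    a = hessian_2d(neg_log_p, w_mp)
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    if a[0][0] <= 0.0 or det <= 0.0:
        raise ValueError("Hessian is not positive definite at w_mp")
    peak = math.exp(-neg_log_p(w_mp))
    return peak * math.sqrt((2.0 * math.pi) ** 2 / det)


def example_neg_log_p(w):
    """Hypothetical un-normalized Gaussian in (height km, mass tonnes)."""
    return 0.5 * ((w[0] - 4.0) / 0.5) ** 2 + 0.5 * ((w[1] - 1600.0) / 200.0) ** 2


z = laplace_normalizer(example_neg_log_p, (4.0, 1600.0))
exact = 2.0 * math.pi * 0.5 * 200.0  # analytic integral of the test Gaussian
```

For a distribution that is exactly Gaussian the approximation reproduces the analytic integral; for a real posterior it is only as good as the assumption of a single dominant peak.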
 Taking the log of this normalized posterior probability distribution
 The solution is determined from the maximum in the function ln P(w|D, Hi)|MP. The first term on the right is the log of the product of the determinants of the covariance matrices for prior distributions and model misfits. The last two terms on the right have the same form as the cost functions used by Eckhardt et al., Stohl et al. [2011], and Kristiansen et al. [2012] to fit atmospheric transport models to observational data to determine a maximum likelihood solution. In their papers, the covariance matrices which correspond to the C matrices in this paper are assumed diagonal. In this paper these Hessians are obtained from equations (13) and (14) and may or may not be diagonal, depending upon the shape of the likelihood and prior distributions with respect to the Δy and Δw variables. If the shape is symmetrical, or is asymmetrical but elongated along one coordinate direction (as it often is), the Hessian is diagonal, which indicates that the assumption of a diagonal covariance is appropriate. Uncertainty in defining Gaussians to fit the peaks in prior and likelihood distributions is dependent upon measurement uncertainty, which is included in the diagonal elements of the Hessians.
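For the diagonal case discussed above, the two quadratic terms can be evaluated directly. The sketch below assumes diagonal covariances given as lists of variances and a factor of one half on each term; the function name `gaussian_cost` and all sample values are hypothetical, and the cited papers may weight or arrange their cost functions differently.

```python
def gaussian_cost(w, w_prior, var_w, model_load, observed_load, var_y):
    """Half the sum of squared, variance-weighted departures of the
    parameters from their prior values and of the model cloud loads
    from the observations (diagonal covariances assumed)."""
    prior_term = sum((wi - pi) ** 2 / vi
                     for wi, pi, vi in zip(w, w_prior, var_w))
    misfit_term = sum((hi - yi) ** 2 / vi
                      for hi, yi, vi in zip(model_load, observed_load, var_y))
    return 0.5 * (prior_term + misfit_term)


# Hypothetical values: parameters are (plume height km, total mass tonnes).
cost = gaussian_cost(
    w=[4.0, 1600.0],
    w_prior=[5.5, 1000.0],
    var_w=[4.0, 250000.0],
    model_load=[1.0, 2.0, 0.5],
    observed_load=[1.5, 2.0, 0.0],
    var_y=[0.25, 0.25, 0.25],
)
```

Minimizing this quantity over w is equivalent to maximizing the product of a Gaussian prior and a Gaussian likelihood, which is the sense in which the cost function and the Bayesian posterior agree.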
 As in the studies of Kristiansen et al. [2012] and Stohl et al. [2011], inclusion of multiple observational data sets narrows the posterior distribution and strengthens the solution. If, for example, three different types of satellite data sets are used to compare with model results, then each observation – model data comparison has its own peak and Hessian CYj and
 Predictions of future ash concentrations are based upon this model performance. The distribution function for a future observation of ash concentration c derived from applying model Hi to observed data D, or the conditional probability P(c|D, H), is given by the posterior probability densities (9), (15), or (17). The distribution function for an ash concentration c in excess of some threshold concentration c0 is
where the integration is over
 The probability (or uncertainty) that any future ash concentration c(w) will exceed any specified bound c0 is provided by equations (18) and (19).
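When the posterior is represented by a finite set of model runs, the exceedance probability in a grid cell can be approximated by summing the posterior weights of the runs whose concentration in that cell exceeds c0. This discrete form is an assumption made for illustration, not the integral of equations (18) and (19) itself; the function name and sample values are hypothetical.

```python
def exceedance_probability(posterior, concentrations, c0):
    """Approximate P(c > c0) in one grid cell.

    posterior[i] is the normalized posterior weight of model run i and
    concentrations[i] is the concentration that run predicts in the cell.
    """
    return sum(p for p, c in zip(posterior, concentrations) if c > c0)


# Hypothetical posterior weights for three runs and concentrations (mg/m^3)
# predicted by each run in three grid cells.
posterior = [0.1, 0.6, 0.3]
cells = {
    "A": [0.5, 3.0, 2.5],
    "B": [0.0, 0.0, 2.5],
    "C": [0.0, 0.0, 0.0],
}
threshold = 2.0
probability_map = {name: exceedance_probability(posterior, c, threshold)
                   for name, c in cells.items()}
```

Cells that no run reaches receive zero probability, while cells reached by the high-weight runs approach one.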
3. Application to Ash Clouds From Eyjafjallajökull Volcano, Iceland
 During the April eruptive events, meteorological clouds frequently obscured the ash plume near the volcano and increased uncertainty in the source of errors in modeled ash clouds over Europe. However, this was not the case for another eruptive sequence beginning on May 5th at 1800 h UTC and lasting for 3 days (http://www.earthice.hi.is/page/ies_Eyjafjallajokull_eruption). Unlike the April 14–17 plume, the tracking of the maximum height of this plume near the volcano was uninterrupted throughout the May 5–8 eruptive episode. This latter sequence produced a cloud extending into clear skies south of the volcano (Figure 1), so that most of the cloud development was visible and mapped by SEVIRI as well as observers on the ground. For this event the maximum height of the ash cloud near the volcano (Figure 3) was tracked visually and measured by radar [Arason et al., 2011]. Since in a heterogeneous wind field, plume height above the volcano is one of the eruption source parameters having the most effect on transport, this continuous maximum height constraint combined with the clear view of the ash cloud by the SEVIRI satellite provide strong constraints on ash transport during this eruptive episode and make the May 5–8 eruptive event an excellent candidate to test a forecast method. We have chosen plume altitude and ash emission rate versus altitude above the volcano as the two primary source parameters. However, our method is not limited to two parameters, and may be used to measure the constraints placed by data on any number of model parameters.
 To test our method, satellite data were first projected onto a computational grid, which in this case is in a polar coordinate system with three component directions: longitude, latitude, and elevation. The satellite data were processed using the methods described in Pavolonis et al. [2006, 2010], Heidinger and Pavolonis, and Pavolonis and Sieglaff [2010], and the data were converted into the same reference frame and global coordinate system as the model grid. Since the satellite data are cloud loads (in tonnes/km²), the ash concentrations output from Ash3d are converted to cloud load by integrating vertically the three-dimensional distribution of ash concentration within each column on the computational grid. This produces a separate cloud load value for each longitude-latitude column of cells on the grid that can then be directly differenced with average satellite cloud loads at the top of those columns using equation (3). The total cloud load measured by SEVIRI for May 6th and 7th was found by integrating the satellite measurements over the surface of the Earth, and is shown in Figure 4. Also shown in the figure is the hour (hour 42 since eruption onset, noon on May 7th) when the maximum likelihood estimate was made relative to the start of the eruption (May 5th at 1800 h UTC). The dispersal of the ash cloud at about noon on May 7th is shown in Figure 2.
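The vertical integration and the unit conversion it implies can be sketched as follows. This is an illustration, not Ash3d code: the function name, layer thicknesses, and concentrations are assumed values. A load of 1 g/m² is numerically equal to 1 tonne/km², since 1 km² contains 10⁶ m² and 1 tonne contains 10⁶ g.

```python
def cloud_load(concentration_mg_m3, layer_thickness_m):
    """Vertically integrate one column of ash concentrations (mg/m^3)
    over layers of given thickness (m) and return the load in g/m^2,
    which is numerically equal to tonnes/km^2."""
    load_mg_m2 = sum(c * dz
                     for c, dz in zip(concentration_mg_m3, layer_thickness_m))
    return load_mg_m2 / 1000.0  # mg/m^2 -> g/m^2


# Hypothetical column: a single 1 km thick ash layer at 2 mg/m^3.
column = [0.0, 2.0, 0.0]
thickness = [1000.0, 1000.0, 1000.0]
load = cloud_load(column, thickness)
```

In this toy column a 1 km thick layer at 2 mg/m³ corresponds to a cloud load of 2 g/m², or 2 tonnes/km².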
 The prior conditions for input to the transport model were made from these preliminary observations. The top of the plume above the volcano is assumed to be the same as the eruption column height shown in Figure 3. The ‘total’ mass is for the ash cloud emanating from the volcano, and is typically about 5% of the mass flux in the eruption column [Devenish et al., 2012]. The dispersal of ash from the column downwind into the cloud is typically in the size range of 2 to 25 microns and therefore is measurable by satellites [Devenish et al., 2012; Schumann et al., 2011]. Here a constant emission rate is assumed, so that after 42 h of continuous eruption the rate is estimated as the total mass divided by the elapsed time. For these conditions and winds obtained from the Global Forecast System (GFS) [Hamill et al., 2006], the prior distribution P(w|H) obtained is plotted as black line contours in Figure 6. The ranges investigated for model input are the distinct altitudes at which the eruption column disperses ash downwind into the cloud. To make a rapid assessment, this altitude is approximated coarsely, and ranges from 2 to 12 km in 1 km increments. For each altitude, the total mass emitted ranges from 1 to 2300 tonnes, emitted at a constant rate over a period of 42 h.
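The sampled parameter ranges can be laid out as a simple grid, as in the sketch below. The altitude range and the 42 h duration follow the text; the 100 tonne spacing of the mass samples is an assumption, since the sampling interval for mass is not stated, and the names are hypothetical.

```python
DURATION_H = 42.0  # elapsed time of continuous eruption (h)

# Altitudes of ash delivery: 2 to 12 km in 1 km increments.
altitudes_km = list(range(2, 13))

# Total mass samples from 1 to 2300 tonnes (assumed 100 tonne spacing).
masses_tonnes = [1.0] + [100.0 * i for i in range(1, 24)]


def emission_rate(total_mass_tonnes, hours=DURATION_H):
    """Constant emission rate (tonnes/h) for a given total mass."""
    return total_mass_tonnes / hours


# Every (altitude, mass) pair is one candidate model run.
grid = [(z, m) for z in altitudes_km for m in masses_tonnes]
max_rate = emission_rate(masses_tonnes[-1])
```

With these assumed spacings the grid holds 264 candidate runs, and the largest mass sample corresponds to a rate of roughly 55 tonnes per hour.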
 Downwind, the position of the ash plume is most strongly dependent on the altitude and the time at which each part of the eruption column feeds ash into an unsteady wind field. We sum these contributions individually, breaking down the altitudes and mass flux as discussed above. Shown in Figure 5 are the results of 6 model runs with the same mass emission rate but with different altitudes of ash delivery to the atmosphere, labeled as plume height in the figure. The model ash cloud loads overlap the satellite-determined cloud loads, given in tonnes/km², by varying amounts. The maximum likelihood estimation is based on this overlap and specifically the value of cloud load predicted by the model versus that observed by the satellite. The distribution of likelihood estimates is shown in Figure 6, along with the much broader and flatter distribution obtained from the initial model parameters, the prior distribution. The maximum likelihood estimate is at the peak value, and may be determined by starting at the prior values and using a quasi-Newton method such as Broyden-Fletcher-Goldfarb-Shanno [Press et al., 1992] to determine the maximum. Any quasi-Newton or steepest-descent procedure can be used to find the maximum and provide an estimate of the Hessian at this location. The maximum likelihood estimate obtained in this way indicates that a total mass of about 1600 tonnes was emitted over a period of 42 h at an altitude of about 4 km asl to produce the ash plume observed at noon on May 7, 2010. The likelihood distribution exhibits a strong localized maximum incorporating most of the probability, justifying the use of the saddle point approximation to normalize it [Butler and Wood, 2004]. Similar singular, localized maxima in posterior distributions are common in the a posteriori results presented by Kristiansen et al. [2012] for the transport models they tested as applied to the entire eruptive sequence of Eyjafjallajökull volcano in April and May, 2010. A saddle point approximation is appropriate for eruption column mass distributions when ash mass emission rate is concentrated over a narrow altitude range.
 Applying the saddle point approximation here, the normalized posterior distribution takes the form of Figure 7, where it is normalized to its peak value and compared to a similarly normalized prior distribution. Unlike the prior probability distribution, most of the posterior probability is concentrated beneath a peak at around 4 km height and total ash cloud mass of about 1600 tonnes, the lower mass reflecting incomplete overlap between the model results and the satellite measurements as shown in Figure 5. This normalized posterior probability may be used to predict the uncertainty of determining future ash distributions from this eruptive plume using equation (18).
 For an assumed cloud depth (vertical dimension) of 1 km and a threshold concentration of 2 mg/m³, forecasts made using equation (18) may be compared in plan view to the ash cloud loads measured at forecast times with SEVIRI (measurements by Schumann et al. [2011] show that typical ash cloud thicknesses are less than 1 km). The results of making this forecast test three hours in advance and six hours in advance are shown in Figures 8a and 8b, respectively. In both cases, the probability is integrated over the peak in the posterior probability surface in Figure 7 using equation (18), and the results of the individual model inputs with their associated probabilities are summed for each cell in the computational grid. The satellite measurements of cloud load are plotted on top of these results as black line contours. In plan view, the uncertainty estimates for these ash forecasts form an envelope around the contours of the clouds that actually occurred, and mimic the shape of the actual cloud loads.
 Satellite measurements of the ash plume emanating from the volcano during the May 5–8th, 2010 eruptive episode provide strong constraints on the altitude at which ash is fed into the plume and the ash emission rate that can be determined from the satellite data for that altitude. The three dimensional nature of the wind field displaces the model ash clouds away from the satellite-observed ash cloud for small changes in altitude of ash dispersal (Figure 5), and this results in a strong localized maximum in the maximum likelihood estimate (Figure 6) and correspondingly in the posterior estimate of optimal model parameters (Figure 7). If meteorological clouds obscure the ash cloud, then there is a loss of overlap and the strength of the source estimate is reduced. The MLE is the best estimate of the source for any atmospheric condition, some of which will be more favorable for source estimation than others.
 Here we have shown that if the prior distribution of model input data and the maximum likelihood estimation of model parameters are assumed to be Gaussian, then using the cost function developed by Eckhardt et al. to fit a model to data and determine a posterior probability is functionally equivalent to assuming Gaussian distributions for the prior and likelihood distributions and using Bayes theorem to determine a posterior probability. The degree to which the posterior peak in Figure 7 is more focused than the prior peak is a measure of the Occam factor, which is a measure of how much the variation in the parameter space is reduced once a data fit is made [Mackay, 2003]. The Occam factor penalizes models that have many parameters unconstrained by data, that have a wide, unconstrained parameter range, or that have to be finely tuned to fit the model to data y.
 The maximum likelihood fit determined here for May 7th at noon is consistent with the results of other studies. Both Stohl et al. [2011] and Kristiansen et al. [2012] modeled this eruptive sequence using the cost function mentioned earlier. Their results show that the model FLEXPART (http://transport.nilu.no/flexpart/model-information) produces a posteriori plume heights around 6 km whereas NAME (http://en.wikipedia.org/wiki/NAME_%28dispersion_model%29) produces heights of about 4 km in accord with this paper. In both cases the mass emission rates, when integrated over time and plume height, give total mass estimates comparable to those reported here. The reduced altitude (4 km versus ∼6 km in Figure 3) indicates that though the measured top of the plume was about 5.5 km, the calculated interaction of the plume with the wind field indicates most mass was dispersed downwind from within the column at an altitude of 4 km. Settling of the particles is included in this model estimate, though the size range usually dispersed as a cloud (2–25 microns) settles very slowly.
 Data may be assimilated into a forecast model by using Bayes theorem to objectively combine data and model uncertainty. If quasi-Newton procedures are used to determine the maximum likelihood fit, then this also determines the Hessians in equations (15) or (16). Once this information is obtained (a few hours of satellite data may be sufficient), predictions can be made from the posterior distributions obtained from equations (14), (15), or (16).
 Predictions of ash concentrations 3 h and 6 h in advance, made by integrating the posterior distribution in Figure 7 as described in equation (18), are shown in Figures 8a and 8b, respectively, where the flooded contours are for a threshold concentration of 2 mg/m³ and zero probability regions have no color. Plotted on top of these uncertainty estimates is the cloud load measured by SEVIRI corresponding to the same times. The coarseness of the altitude sampling in the model resulted in some striping of the final distribution of uncertainties, but in general the predicted uncertainty in ash distribution encompasses the observed ash distribution, showing that the method described above provides a reasonable estimate of future ash distribution and provides a good basis for hazards estimation.
 The threshold for these uncertainties may be used in a number of different ways. For example, any flight path through this space would integrate these uncertainties along the flight path (with uncertainties varying with time) to determine the probability of ingesting a given mass of ash into each engine, providing an estimate of flight hazards for any given flight path.
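One way to carry the posterior through to a flight path is sketched below. It estimates the probability that the path meets ash above a threshold anywhere along its length by summing the posterior weights of the model runs in which that happens. This is a simplified reading made for illustration: it treats the path as a fixed list of grid cells, ignores the time dependence of the forecast, and does not compute ingested mass per engine. The function name and sample values are hypothetical.

```python
def path_exceedance_probability(posterior, path_concentrations, c0):
    """Probability that at least one cell along a path exceeds c0.

    posterior[i] is the normalized posterior weight of model run i and
    path_concentrations[i] lists the concentrations that run predicts
    in the cells crossed by the path.
    """
    return sum(p for p, cells in zip(posterior, path_concentrations)
               if any(c > c0 for c in cells))


# Hypothetical posterior weights for three runs and the concentrations
# (mg/m^3) each run predicts in three cells along one flight path.
posterior = [0.1, 0.6, 0.3]
along_path = [
    [0.0, 0.0, 0.0],   # run 0: path stays clear
    [0.0, 2.5, 0.0],   # run 1: path crosses ash above the threshold
    [0.0, 0.0, 1.0],   # run 2: ash present but below the threshold
]
p_encounter = path_exceedance_probability(posterior, along_path, c0=2.0)
```

Working run by run keeps the correlation between neighboring cells intact, which would be lost if per-cell probabilities were combined as if they were independent.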
 The three dimensional heterogeneity of the wind in the Earth's troposphere and stratosphere may be exploited to rapidly and stably estimate the altitude and rate of ash emission from a volcanic eruption column into the atmosphere. The position of the ash cloud is particularly sensitive to the altitudes at which it enters the wind field at the volcano, and the rate of ash dilution as the cloud disperses downwind depends upon the ash emission rate at each altitude above the volcano. Infrared satellite techniques may underestimate the total mass of ash near the vent where the cloud is optically thick, but that is not a significant factor in these calculations. What matters is the mass effusion rate that feeds into an ash cloud and the altitude(s) where this rate is a maximum.
 Given these conditions, transport models may be fit to ash clouds using Bayes theorem, in which existing satellite and surface measurements of an ash plume are used to determine the initial (or prior) conditions and uncertainties. Comparisons of model output to data to determine the maximum likelihood estimate can efficiently be done using quasi-Newton procedures, starting from the prior parameter values, and this method provides a rapid determination of optimal model parameters. The centralized nature of the maximum likelihood and maximum posterior allows a saddle point approximation to be used to normalize the posterior distribution. The result of this procedure is a robust estimate of posterior probability, from which equally robust estimates of the uncertainty in future ash concentrations may be made.
 This paper uses the program Ash3d, which the first author initially wrote with Larry Mastin in January 2010. Since then the program was improved and made truly operational through the efforts of Hans Schwaiger and Larry Mastin, and without the work they did the scripts we wrote to perform the Bayesian testing in this paper, using Ash3d, would not have been efficient. We are also grateful for three insightful journal reviews that greatly improved the paper.