Application of stochastic simulation to CO2 flux from soil: Mapping and quantification of gas release



[1] Conditional sequential Gaussian simulations (sGs) have been applied for the first time to the study of soil diffuse degassing from different volcanic and nonvolcanic systems. The study concerns five data sets of soil CO2 fluxes measured with the accumulation chamber method at the volcanic areas of Solfatara of Pozzuoli (Italy), Vesuvio cone (Italy), Nisyros (Greece), and Horseshoe Lake (California), and at the nonvolcanic degassing area of Poggio dell'Olivo (Italy). The sGs algorithm was used to generate 100 realizations of CO2 flux for each area. Probabilistic summaries of these simulations, together with the information given by probability plots, were used (1) to draw maps of the probability that CO2 fluxes exceed thresholds specific to each area's background flux, i.e., to define the probable extent of the degassing structures, (2) to calculate the total CO2 output, and (3) to quantify the uncertainty of this estimate. The results show that sGs is a suitable tool for modeling soil diffuse degassing: it produces realistic images of the distribution of CO2 fluxes that honor the histogram and variogram of the original data. Moreover, the relation between sampling design and estimation uncertainty was investigated, leading to an empirical relation between uncertainty and sampling density that can be useful for planning future CO2 flux surveys.

1. Introduction

[2] Interest in the study of CO2 degassing from the Earth has grown considerably in recent years. Such studies pursue many objectives, for example, defining the relations between flux and tectonic structures [Etiope, 1999; Etiope et al., 1999; Lewicki and Brantley, 2000], quantifying the release of deeply derived CO2 to the atmosphere within the framework of the global carbon budget [Allard et al., 1991; Brantley and Koepenick, 1995; Seward and Kerrick, 1996; Kerrick et al., 1995; Marty and Tolstikhin, 1998; Williams et al., 1992], and studying volcanic degassing. In particular, numerous studies have focused on CO2 soil diffuse degassing from quiescent active volcanoes [Brombach et al., 2001; Chiodini et al., 2001, 1996, 1998; Hernandez et al., 1998; Gerlach et al., 2001; Salazar et al., 2001; Farrar et al., 1995]. Most of these studies showed that gas is not released uniformly from the whole volcanic edifice, but rather from relatively restricted regions, which were named diffuse degassing structures (DDS) [Chiodini et al., 2001]. Moreover, quantitative estimates of the hydrothermal-volcanic gas released from DDS highlighted the importance of the gas and thermal energy released by DDS in the mass and energy balance of quiescent volcanic systems [Chiodini et al., 2001]. Regardless of the specific aims of each investigation, mapping DDS and quantifying the amount of released CO2 can be considered objectives common to all these studies.

[3] Mapping of soil CO2 fluxes has mainly been performed with interpolation algorithms, generally kriging [e.g., Bergfeld et al., 2001; Chiodini et al., 1996, 1998, 2001; Rogie et al., 2001; Gerlach et al., 2001]. Kriging focuses on providing the “best” local estimate of a variable, defined in a least squares sense (minimum error variance) and hence unique, without specific regard to the spatial statistics of all the estimates taken together [Deutsch and Journel, 1998]; as a result, the set of values estimated at unsampled locations has a variogram that does not match that of the original data. Moreover, kriging smooths out extreme values, overestimating small values and underestimating large ones, and thereby hides the pattern of high values that is important in our applications for defining degassing structures.

[4] Total CO2 release has usually been calculated either by multiplying the arithmetic mean of the CO2 fluxes by the surveyed area, by applying volume and area integration algorithms to the grids produced to contour the CO2 flux, or by a graphical statistical approach (GSA), independent of any interpolation technique, described by Chiodini et al. [1998]. Together with the quantification of total CO2 release, the definition of the related uncertainty is essential, especially in volcanic surveillance for the recognition of anomalous states. Kriging provides only an incomplete measure of local accuracy, unless a Gaussian model for errors is assumed, and no indication of accuracy when several points are considered together [Deutsch and Journel, 1998]. Conversely, the GSA approach allows the definition of a confidence interval for the estimate, but this calculation does not take into account the spatial correlation between the data and therefore generally overestimates the uncertainty. This overestimation is particularly unsuitable for monitoring purposes, because a smaller uncertainty makes it possible to detect smaller variations.

[5] The aim of this work is to apply stochastic simulation algorithms to soil gas flux data. This approach is becoming common, and is preferred to traditional interpolation algorithms in soil science wherever the spatial variability of the measured attributes has to be preserved [Goovaerts, 2000], for example, in the definition and characterization of contaminated soils and groundwater [Goovaerts, 1997, 1999b, 2001, and references therein; Lin and Chang, 2000; Lin et al., 2001; Istok and Rautman, 1996] and in the assessment of corn yield risks related to soil strength/compaction [Lapen et al., 2001].

[6] The basic idea of stochastic simulation is to generate, instead of a single representation that yields the minimum error variance at each location, a set of equiprobable representations (realizations) of the spatial distribution of the attribute, all of which reasonably reproduce the global statistics and spatial features of the data (i.e., the sample histogram and semivariogram model). The ensemble of these realizations is thus an explicit representation of the uncertainty associated with our conceptual understanding of the single, but unknown, reality [Rautman and Istok, 1996]. According to Goovaerts [2001], differences among multiple simulated maps have been used as a measure of uncertainty.

[7] In this paper a stochastic simulation algorithm is applied to five data sets of soil CO2 fluxes measured in different volcanic and nonvolcanic degassing areas, with the objectives of mapping the degassing areas, i.e., defining the diffuse degassing structures, and of evaluating the total emitted CO2 with its associated uncertainty. Moreover, we try to define a reasonable criterion for choosing an appropriate sampling design.

2. Materials and Methods

2.1. Experimental Data Set

[8] The CO2 flux data used in this paper were collected during the last decade at Solfatara of Pozzuoli crater (2000, Naples, Italy), Vesuvio cone (2000, Naples, Italy), Poggio dell'Olivo gas manifestation (1999, Viterbo, Italy), Nisyros caldera (1999–2001, Nisyros Island, Greece), and Horseshoe Lake (1997, Long Valley Caldera, California). Except for the Solfatara of Pozzuoli data and most of the Nisyros data, which are unpublished, the data have already been treated and interpreted with different statistical and geostatistical tools [Brombach et al., 2001; Chiodini et al., 1999, 2001; Rogie et al., 2001; F. Frondini et al., Diffuse degassing at Vesuvio, Italy, submitted to Bulletin of Volcanology, 2003]. In all the study areas, diffuse soil CO2 flux was surveyed with the accumulation chamber method. This method allows quick, direct measurement of soil CO2 flux over a wide range of fluxes without drastically altering the natural flux [Chiodini et al., 1996, 1998; Evans et al., 2001; Welles et al., 2001]. Each measurement is made over a surface of about 0.03 m2 and can be considered a “point support” measurement.

[9] The data sets used for the application of stochastic simulation were chosen because they differ in size, in the range of measured CO2 flux values, and in the sampling design used in each study area (Table 1 and Figure 1). In the case of Horseshoe Lake, a subset of the previously published data [Rogie et al., 2001] was selected to obtain a regularly shaped study area without large data-free border zones.

Figure 1.

Location map of study areas and measurement point locations: (a) Solfatara of Pozzuoli, (b) Poggio dell'Olivo, (c) Vesuvio cone, (d) Nisyros caldera, and (e) Horseshoe Lake.

Table 1. Summary of Sampling Design and CO2 Flux Statistics
Study Area | Surveyed Surface, m2 | Number of Measurements | Sampling Design | Minimum–Maximum CO2 Flux, g m−2 d−1 | Mean CO2 Flux, g m−2 d−1
Solfatara of Pozzuoli | 1.4 × 10^6 | 414 | random | 3.0–30987 | 1300.0
Vesuvio cone | 3.4 × 10^4 | 110 | 20 m spaced grid | 0.1–252 | 20.0
Poggio dell'Olivo | 8.2 × 10^5 | 196 | random | 0.5–8797 | 489.0
Nisyros caldera | 2.0 × 10^6 | 2883 | 20–25 m spaced grid | 0.01–6175 | 39.7
Horseshoe Lake | 1.3 × 10^5 | 313 | 20 m spaced grid | 5.7–8670 | 800.0

2.2. GSA Method

[10] In DDS, soil CO2 flux is fed by multiple gas sources, such as biological and volcanic ones. This dual origin of the gas often results in a bimodal distribution of CO2 flux values, which plots as a curve with an inflection point on a logarithmic probability plot (see, e.g., Figures 3a, 3b, 3c, and 3e). On a logarithmic probability plot, a curve with one inflection point is in fact the theoretical distribution of two overlapping lognormal populations, whereas a single lognormal population plots as a straight line and n overlapping lognormal populations result in a curve with n − 1 inflection points.
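The mixture model behind these plots can be written as F(x) = Σ fi Φ((log x − μi)/σi), where Φ is the standard normal cumulative distribution function and fi, μi, σi are the proportion, log mean, and log standard deviation of population i. The Python sketch below evaluates it; it assumes base-10 logarithms, and the function names and example parameters are hypothetical, for illustration only.

```python
import math
from statistics import NormalDist


def mixture_cdf(x, fractions, log_means, log_sds):
    """Cumulative probability of flux x under a mixture of lognormal populations.

    fractions: proportions f_i of the populations (must sum to 1)
    log_means, log_sds: mean and standard deviation of log10(flux) per population
    """
    if not math.isclose(sum(fractions), 1.0):
        raise ValueError("population proportions must sum to 1")
    log_x = math.log10(x)
    return sum(f * NormalDist(m, s).cdf(log_x)
               for f, m, s in zip(fractions, log_means, log_sds))


def probability_axis(p):
    """Position of cumulative probability p on a normal probability-plot axis."""
    return NormalDist().inv_cdf(p)


# Hypothetical two-population example: 70% background around 10 g m-2 d-1,
# 30% deep-sourced flux around 1000 g m-2 d-1.
f, mu, sd = [0.7, 0.3], [1.0, 3.0], [0.3, 0.3]
for flux in (1, 10, 100, 1000, 10000):
    p = mixture_cdf(flux, f, mu, sd)
    print(flux, round(p, 4), round(probability_axis(p), 3))
```

With these made-up parameters the curve flattens near a cumulative probability of 0.7, between the two populations, which is what produces the inflection point on the probability plot.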

[11] The GSA method [Chiodini et al., 1998] consists of partitioning these complex distributions into different lognormal populations and estimating the proportion (fi), the mean (Mi), and the standard deviation of each population, following the graphical procedure of Sinclair [1974]. Since the computed statistical parameters (i.e., mean, standard deviation, and proportion) refer to the logarithms of the values, the mean CO2 flux and the central 90% confidence interval of the mean are estimated by means of Sichel's t estimator [David, 1977]. The estimated mean flux values are then used to compute the total CO2 output associated with each population. The area covered by each population (Si) is estimated by multiplying the study area (S) by the corresponding proportion of the population (i.e., Si = fiS), and the total CO2 output of each population is obtained by multiplying Si by Mi. The total CO2 release from the entire study area is the sum of the contributions of all populations (i.e., ∑SfiMi). In the same way, the central 90% confidence interval of the mean is used to calculate the uncertainty of the total CO2 output estimated for each population.
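The bookkeeping in the last steps of the GSA method (Si = fiS, population output SiMi, total ∑SfiMi) can be sketched directly in Python. The sketch is illustrative only: it assumes the mean flux Mi and its 90% confidence limits have already been obtained (for example with Sichel's t estimator, which is not reproduced here), and the function name and population values are hypothetical.

```python
G_PER_TONNE = 1e6  # grams per metric tonne


def gsa_outputs(area_m2, populations):
    """Per-population and total CO2 output (t d-1) following the GSA bookkeeping.

    populations: dicts with the proportion 'f' and the mean flux 'M' with its
    lower/upper 90% confidence limits 'M_lo'/'M_hi', all fluxes in g m-2 d-1.
    """
    rows = []
    for p in populations:
        s_i = p["f"] * area_m2  # S_i = f_i * S, area attributed to population i
        rows.append({
            "name": p["name"],
            "output": s_i * p["M"] / G_PER_TONNE,     # S_i * M_i
            "lower": s_i * p["M_lo"] / G_PER_TONNE,
            "upper": s_i * p["M_hi"] / G_PER_TONNE,
        })
    total = sum(r["output"] for r in rows)          # sum of S * f_i * M_i
    return rows, total


# Hypothetical populations on a 1 km2 study area
pops = [
    {"name": "background", "f": 0.7, "M": 10.0, "M_lo": 8.0, "M_hi": 13.0},
    {"name": "hydrothermal", "f": 0.3, "M": 1000.0, "M_lo": 800.0, "M_hi": 1300.0},
]
rows, total = gsa_outputs(1.0e6, pops)
```

With these made-up numbers the background population contributes about 7 t d−1 and the hydrothermal population about 300 t d−1, for a total of about 307 t d−1.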

[12] Although the GSA approach has proved a useful tool for interpreting the diffuse degassing process [e.g., Chiodini et al., 1998, 2001], its results can be affected by some arbitrary choices: (1) the polymodal lognormal distribution of CO2 flux values is a model convenient for the subsequent decomposition, but it is not a fact (the natural distribution of CO2 flux can be more complex than lognormal), and (2) the partitioning procedure does not yield a unique solution. To overcome these limitations, an alternative approach based on stochastic simulation has been investigated.

2.3. Stochastic Simulation

[13] The simulations were performed with the sequential Gaussian simulation (sGs) algorithm implemented in the program sgsim [Deutsch and Journel, 1998]. The sGs algorithm treats an attribute (CO2 flux from soil in this study) as a realization of a stationary multivariate Gaussian random function. The attribute values are simulated at the nodes of a grid covering the area of interest. The simulation is conditional and sequential, i.e., the variable is simulated at each unsampled location by randomly sampling a Gaussian conditional cumulative distribution function defined on the basis of the original data and of the previously simulated values within its neighborhood.

[14] Because the sGs procedure requires a multigaussian distribution, which implies first of all that the one-point distribution of the data (i.e., the histogram) is normal, CO2 flux data, which are generally positively skewed, have to be transformed into a normal distribution (normal scores of the data). The transformation consists of replacing the original values with the corresponding quantiles of a standard normal distribution. Moreover, the normality of the two-point cumulative distribution function of the normal scores has to be checked before the multigaussian model is applied.
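A minimal Python sketch of the normal score transform follows. It assigns the k-th smallest of n values the cumulative probability (k − 0.5)/n and replaces it with the standard normal quantile at that probability. This plotting-position convention, the equal weighting of the data (GSLIB's nscore can also use declustering weights), and tie-breaking by input order are assumptions of the sketch, not a description of the nscore program.

```python
from statistics import NormalDist


def normal_scores(values):
    """Normal score transform: replace each value by a standard normal quantile.

    The k-th smallest of n values (k = 1..n) gets cumulative probability
    (k - 0.5) / n; ties are broken by input order. Scores are returned in the
    same order as the input values.
    """
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    std_normal = NormalDist()
    scores = [0.0] * n
    for rank, i in enumerate(order, start=1):
        scores[i] = std_normal.inv_cdf((rank - 0.5) / n)
    return scores


fluxes = [5.0, 1.0, 300.0, 20.0]   # positively skewed, hypothetical values
print([round(s, 3) for s in normal_scores(fluxes)])
```

Because only ranks matter, the strongly skewed flux values map onto symmetric scores centered on zero.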

[15] The transformed data are then used in the simulation procedure. In practice, because of the multigaussian assumption, the mean and variance of the Gaussian conditional cumulative distribution function at each location are given by the simple kriging estimate and variance, respectively [Goovaerts, 1997; Deutsch and Journel, 1998], which are computed from the semivariogram model of the normal scores. Defining the cumulative distribution function at a location means knowing the probability that any possible value characterizes that location. A random value is drawn from this conditional cumulative distribution as one “reasonable” simulated value for that location [Rautman and Istok, 1996; Castrignanò et al., 2002]. Once a value is simulated, it is added to the data set and used, together with the original data, to estimate the variable at the subsequent grid locations. The simulation proceeds to the next grid location and loops until all nodes are simulated. Afterward, the simulated normal scores are back-transformed into values in the original data units by applying the inverse of the normal score transform. This back transformation requires setting an upper tail and a lower tail, which represent the maximum and minimum values allowed for the simulated values, and defining the extrapolation model between the limits of the original data and the upper and lower tails. By changing the random path along which grid nodes are visited (i.e., the random-number seed that starts the simulation), N alternative simulations can be performed, yielding N equiprobable realizations, each honoring the sampled data at their locations and reproducing the univariate statistics of the data (histogram) and their bivariate properties (experimental variogram of the normal scores), within reasonable ergodic fluctuations [Deutsch and Journel, 1998].
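The sequence of steps above can be condensed into a short Python/NumPy sketch of one conditional sGs realization of normal scores. It is a simplified illustration, not the sgsim program: it assumes a standardized isotropic spherical model (zero nugget by default), uses simple kriging with a zero mean on the nearest n_neigh original or previously simulated values, and omits sgsim features such as separate search limits, anisotropy, and assignment of data to grid nodes. All function and parameter names are hypothetical.

```python
import numpy as np


def spherical_cov(h, a, c0=0.0, c=1.0):
    """Covariance C(h) = (c0 + c) - gamma(h) of a spherical model with nugget c0."""
    h = np.asarray(h, dtype=float)
    gamma = np.where(h <= a, c0 + c * (1.5 * h / a - 0.5 * (h / a) ** 3), c0 + c)
    gamma = np.where(h == 0.0, 0.0, gamma)        # gamma(0) = 0 by definition
    return (c0 + c) - gamma


def sgs_realization(data_xy, data_ns, grid_xy, a, c0=0.0, c=1.0, n_neigh=12, seed=0):
    """Simulate normal scores at grid nodes visited along a random path."""
    rng = np.random.default_rng(seed)               # the seed sets the random path
    known_xy = [tuple(p) for p in np.asarray(data_xy, dtype=float)]
    known_v = [float(v) for v in data_ns]
    grid_xy = np.asarray(grid_xy, dtype=float)
    sim = np.empty(len(grid_xy))
    for node in rng.permutation(len(grid_xy)):
        x0 = grid_xy[node]
        pts, vals = np.array(known_xy), np.array(known_v)
        d0 = np.hypot(pts[:, 0] - x0[0], pts[:, 1] - x0[1])
        idx = np.argsort(d0)[:n_neigh]              # nearest data and simulated nodes
        if d0[idx[0]] == 0.0:                       # node coincides with a datum:
            sim[node] = vals[idx[0]]                # honor it, nothing new to add
            continue
        p = pts[idx]
        dij = np.hypot(p[:, None, 0] - p[None, :, 0], p[:, None, 1] - p[None, :, 1])
        k = spherical_cov(d0[idx], a, c0, c)
        w = np.linalg.solve(spherical_cov(dij, a, c0, c), k)  # simple kriging weights
        mean = w @ vals[idx]                        # SK estimate = conditional mean
        var = max((c0 + c) - w @ k, 0.0)            # SK variance = conditional variance
        sim[node] = rng.normal(mean, np.sqrt(var))  # draw from the conditional cdf
        known_xy.append(tuple(x0))                  # simulated value joins the data
        known_v.append(sim[node])
    return sim


# Hypothetical example: two normal-score data and a 1 m grid along a transect
data_xy = [[0.0, 0.0], [10.0, 0.0]]
data_ns = [1.0, -1.0]
grid = [[float(x), 0.0] for x in range(11)]
realizations = [sgs_realization(data_xy, data_ns, grid, a=20.0, seed=s) for s in range(3)]
```

Each seed gives a different random path and different random draws, hence a different but equally probable realization; the simulated scores would still need back-transforming to flux units.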

3. Applications and Results

3.1. Treatment of the Data With the GSA Method

[16] The data histograms (Figure 2) and logarithmic probability plots (Figure 3) show, respectively, two main maxima and curves with one inflection point for the areas of Solfatara of Pozzuoli, Poggio dell'Olivo, Horseshoe Lake, and Vesuvio. These distributions can be interpreted as combinations of two lognormally distributed populations of CO2 flux. The proportion, the mean, the total CO2 output, and the related 90% confidence interval of each population (Table 2) were computed following the GSA method described above. The validity of the two-population model was checked by computing ideal combinations of the partitioned populations until satisfactory agreement between the ideal mixtures and the real data was obtained. The observed bimodal distributions reflect the coexistence of (1) low CO2 fluxes, generally related to biological activity in the soil (background population, Table 2), and (2) high CO2 fluxes related to the degassing of deeply derived CO2 (hydrothermal population, Table 2). The mean CO2 flux of the background populations ranges from 1 g m−2 d−1 at Vesuvio cone, where biological activity is very low because of the absence of vegetation and of soil rich in organic matter, to 47 g m−2 d−1 at Horseshoe Lake, where the high background value is likely due to a contribution from diffusive degassing of deep sources. The “hydrothermal” populations are generally characterized by mean CO2 fluxes (1373–3621 g m−2 d−1) 2 orders of magnitude higher than the background. At Vesuvio, however, the mean CO2 flux of the “hydrothermal” population is very low (29.5 g m−2 d−1); in this case the anomalous population is recognizable only because of the very low background value.

Figure 2.

Histogram plot of CO2 flux and summary of data statistics: (a) Solfatara of Pozzuoli, (b) Vesuvio cone, (c) Poggio dell'Olivo, (d) Nisyros caldera, and (e) Horseshoe Lake.

Figure 3.

Probability plot of CO2 flux: (a) Solfatara of Pozzuoli, (b) Vesuvio cone, (c) Poggio dell'Olivo, (d) Nisyros caldera, and (e) Horseshoe Lake. Figure 3 shows the original samples (dots), the theoretical partitioned populations obtained with the procedure of Sinclair [1974] (black line), and the probability plots of simulated values for 50 randomly selected realizations of each area (shaded lines).

Table 2. Estimated Parameters of Partitioned Populations and Derived Total CO2 Output
Study Area | CO2 Flux Population | Mean CO2 Flux, g m−2 d−1 | Proportion, % | Total CO2 Output, t d−1 | 90% Confidence Interval, t d−1
Solfatara of Pozzuoli | background | 23.92 | 61 | 23.4 | 20.7–27.3
Vesuvio cone | background | 1.03 | 30 | 0.0094 | 0.0071–0.0174
Poggio dell'Olivo | background | 8.43 | 29 | 5.4 | 4.6–6.4
Horseshoe Lake | background | 46.85 | 62 | 2.3 | 1.6–4.2

[17] At Nisyros caldera, CO2 fluxes span a wide range, from 0.01 to about 6000 g m−2 d−1, which also suggests a contribution from different sources. However, on the probability plot the log values do not fit a clear polymodal curve, and the background population cannot be defined by partitioning statistical populations. The absence of a clear polymodal distribution is most probably due to widespread low-level degassing of deeply derived CO2 linked to the several degassing structures active in the area, i.e., phreatic craters and faults. Such features are absent in the southeastern sector of the area, where the log CO2 flux values plot along a straight line on the probability plot (not shown here). This suggests a single population, with a mean of 8 g m−2 d−1, which has been assumed to be the biological background for the entire Nisyros caldera.

3.2. Applications of Sequential Gaussian Simulation

[18] Because of their nonnormality, the data sets were converted into distributions with a mean of 0 and unit variance (normal score transform) using the program nscore [Deutsch and Journel, 1998]. For each study area the variogram (γ) of the normal scores (Figure 4) was computed and fitted with a standardized spherical model, described by the following equations:

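$$\gamma(h) = c_0 + c\left[\frac{3}{2}\,\frac{h}{a} - \frac{1}{2}\left(\frac{h}{a}\right)^{3}\right] \quad \text{for } 0 < h \le a \tag{1}$$
$$\gamma(h) = c_0 + c \quad \text{for } h > a \tag{2}$$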

except for Nisyros, where the best fit was obtained with a standardized exponential model, described by the equation

$$\gamma(h) = c_0 + c\left[1 - \exp\left(-\frac{3h}{a}\right)\right] \quad \text{for } h > 0 \tag{3}$$

where c0 is the nugget effect, c is the sill, a is the range, and h is the separation distance between samples [e.g., Isaaks and Srivastava, 1989]. The variogram model parameters used for each study area are reported in Figure 4.
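The variogram models can be coded directly. The Python sketch below assumes, as is common in geostatistical software, that the exponential model is written with the practical range a, i.e., the distance at which γ reaches about 95% of c0 + c (hence the factor 3); the function names and example parameters are illustrative.

```python
import math


def spherical(h, c0, c, a):
    """Spherical model: c0 + c*(1.5 h/a - 0.5 (h/a)^3) for 0 < h <= a, c0 + c beyond."""
    if h == 0:
        return 0.0                      # gamma(0) = 0; the nugget applies for h > 0
    if h <= a:
        r = h / a
        return c0 + c * (1.5 * r - 0.5 * r ** 3)
    return c0 + c


def exponential(h, c0, c, a):
    """Exponential model with practical range a: c0 + c*(1 - exp(-3h/a)) for h > 0."""
    if h == 0:
        return 0.0
    return c0 + c * (1.0 - math.exp(-3.0 * h / a))


# Hypothetical standardized models (c0 + c = 1) evaluated at a few lags
for h in (0, 25, 50, 100, 200):
    print(h, round(spherical(h, 0.2, 0.8, 100.0), 3), round(exponential(h, 0.2, 0.8, 100.0), 3))
```

The spherical model reaches its sill exactly at h = a, whereas the exponential model approaches it only asymptotically.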

Figure 4.

Omnidirectional experimental variograms (γ) of CO2 flux normal scores: (a) Solfatara of Pozzuoli, (b) Vesuvio cone, (c) Poggio dell'Olivo, (d) Nisyros caldera, and (e) Horseshoe Lake. Lines represent the isotropic variogram models used in the simulation procedure. The parameters c0 (nugget effect), c (sill), and a (range) refer to the variogram models (equations (1), (2), and (3)).

[19] One hundred simulations were performed for each data set on simulation grids with different node spacings: 10 m for Solfatara of Pozzuoli and Nisyros, 5 m for Horseshoe Lake and Poggio dell'Olivo, and 2 m for Vesuvio. Although the values are simulated on a “point support,” they were considered representative of square cells centered on the grid nodes, with side length equal to the grid spacing. The cell size was chosen by taking into account the spacing between the data and the size of the surveyed area, in order to obtain good mapping resolution while avoiding a redundant number of cells, thus limiting CPU and memory requirements.

[20] The basic statistics, the cumulative distribution, and the variogram of the simulated values were compared with those of the original data to assess how well the simulation procedure reproduces the features of the original data. For Solfatara of Pozzuoli and Poggio dell'Olivo, the statistics and the cumulative distribution of the original data were computed with declustering weights, which were also used in the simulation.
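Declustering weights can be obtained in several ways; a common choice, assumed here, is cell declustering, sketched below in Python. Each sample receives a weight inversely proportional to the number of samples in its cell of a square grid, so clustered samples count less. The cell size is a user choice in this sketch; GSLIB's declus, by contrast, scans several cell sizes and grid origins, which is omitted. Function names and example values are hypothetical.

```python
from collections import Counter


def cell_declustering_weights(xy, cell_size):
    """Weight each sample by 1 / (number of samples sharing its cell).

    Weights are rescaled to sum to the number of samples, so a perfectly
    regular sample set gets weight 1 everywhere.
    """
    cells = [(int(x // cell_size), int(y // cell_size)) for x, y in xy]
    counts = Counter(cells)
    n, n_cells = len(xy), len(counts)
    return [n / (n_cells * counts[cell]) for cell in cells]


def weighted_mean(values, weights):
    """Declustered (weighted) mean of the sample values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


# Hypothetical: three clustered low-flux samples and one isolated high-flux sample
xy = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (15.0, 15.0)]
w = cell_declustering_weights(xy, cell_size=10.0)
print(w, weighted_mean([10.0, 10.0, 10.0, 100.0], w))
```

Here the three clustered samples share one cell and together count as much as the isolated sample, so the declustered mean (55) differs from the naive mean (32.5).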

[21] The basic statistics (mean and standard deviation) of the data samples are well reproduced by the simulated values (Table 3). This agreement is not a foregone conclusion, because there is no guarantee that the original sample statistics are exactly those of the broader population of all possible samples [Rautman and Istok, 1996]. A good match is also found between the cumulative distribution of the original data and that of the simulated CO2 fluxes from 50 randomly selected realizations; in particular, the bimodal distributions of CO2 flux values are well reproduced (Figure 3). Furthermore, the undesired smoothing of extreme values produced by many interpolation algorithms (e.g., kriging) is avoided.

Table 3. Summary Statistics of Measured CO2 Flux (g m−2 d−1) and Simulated CO2 Flux
Study Area | Original Data: Mean | Original Data: Standard Deviation | Simulated Values (50 Realizations): Mean | Simulated Values: Standard Deviation | Simulated Values: Maximum | Simulated Values: Minimum
Vesuvio cone | 20 | 37.6 | 22.12–35.72 | 36.2–89.0 | 556–993 | 0.009–0.11
Poggio dell'Olivo | 268^a | 985.5^a | 218–315 | 859–1248 | 12836–14996 | 0.10–0.16
Nisyros caldera | 39.7 | 165.9 | 38.4–47.1 | 129.7–222.4 | 6175–9715 | 0.0003–0.01
Horseshoe Lake | 800 | 1407.7 | 694–895 | 1186–1494 | 8670–17672 | 1.2–5.7
^a Computed from declustered data.

[22] The simulation results are influenced by the choice of the upper and lower tails and by the extrapolation model between the limits of the original data and the upper and lower tails. For each application the lower tail was set to 0, which is the natural lower limit of soil CO2 fluxes. The upper tail was estimated from the probability plot of each area, by roughly extrapolating the curve to the quantile corresponding to the maximum number of simulated values. A linear model was used for extrapolation between the minimum measured value and the lower tail, while a hyperbolic model with ω set to 1.5 (ω = utpar parameter in the sgsim code) was used for extrapolation between the maximum measured value and the upper tail. Practice has shown that, for positively skewed distributions, the hyperbolic upper tail model with ω = 1.5 is a general-purpose model that yields acceptable results in a wide variety of applications [Goovaerts, 1997; Deutsch and Journel, 1998]. As a consequence of these tail choices, the maximum and minimum simulated values are respectively higher and lower than the measured values, as expected, because the true maximum and minimum flux values are very unlikely to be measured in a survey (Tables 1 and 3 and Figure 2).
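A minimal Python sketch of a back transformation with these tail options follows. It assumes the same (k − 0.5)/n cumulative probabilities for the sorted data as a plain normal score transform, linear interpolation between data quantiles, a linear lower tail down to zmin = 0, and a hyperbolic upper tail F(z) = 1 − λ/z^ω, with λ chosen so that F is continuous at the maximum datum. The optional cap at zmax (the upper tail value) is an addition of this sketch; it approximates, but does not reproduce, the GSLIB back-transform code.

```python
import bisect
from statistics import NormalDist


def back_transform(y, data_sorted, zmin=0.0, omega=1.5, zmax=None):
    """Map a simulated normal score y back to flux units.

    data_sorted: original data in ascending order; the k-th value (k = 1..n)
    is given cumulative probability p_k = (k - 0.5) / n.
    """
    n = len(data_sorted)
    probs = [(k - 0.5) / n for k in range(1, n + 1)]
    p = NormalDist().cdf(y)
    if p <= probs[0]:
        # Linear lower tail between zmin (p = 0) and the minimum datum
        return zmin + (data_sorted[0] - zmin) * p / probs[0]
    if p >= probs[-1]:
        if p >= 1.0:                                  # beyond float resolution
            return float("inf") if zmax is None else zmax
        # Hyperbolic upper tail: F(z) = 1 - lam / z**omega, continuous at the max datum
        lam = data_sorted[-1] ** omega * (1.0 - probs[-1])
        z = (lam / (1.0 - p)) ** (1.0 / omega)
        return z if zmax is None else min(z, zmax)
    # Linear interpolation between the two data quantiles that bracket p
    j = bisect.bisect_right(probs, p)
    p0, p1 = probs[j - 1], probs[j]
    z0, z1 = data_sorted[j - 1], data_sorted[j]
    return z0 + (z1 - z0) * (p - p0) / (p1 - p0)


data = [1.0, 10.0, 100.0, 1000.0]                     # hypothetical sorted fluxes
for y in (-4.0, -0.5, 0.0, 1.5, 3.0):
    print(y, round(back_transform(y, data), 3))
```

As in the text, scores below the lowest data quantile map to fluxes between 0 and the measured minimum, and scores above the highest quantile map to fluxes larger than the measured maximum.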

[23] Figure 5 compares the omnidirectional variograms of the normal scores of the original data with those computed for 15 randomly selected realizations of each study area. The spatial continuity shown by the sample variograms and by the realization variograms agrees well. The variograms of different realizations show some fluctuations (Figure 5); such fluctuations are common and are referred to as “ergodic fluctuations” [Goovaerts, 1997, 1999a; Deutsch and Journel, 1998]. Ergodic fluctuations are caused by the discrepancy between the realization statistics and the corresponding model parameters [Deutsch and Journel, 1998], and their magnitude depends on the size of the simulated domain relative to the range of the model variogram [Goovaerts, 1999a; Rautman and Istok, 1996], i.e., they tend to decrease as the ratio between the size of the modeled domain and the range of the variogram increases. Deutsch and Journel [1998] suggest that ergodic fluctuations are generally minor when the size of the modeled domain exceeds 10 times the range of correlation.
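Comparisons of this kind rely on the omnidirectional experimental semivariogram, γ(h) = (1/2N(h)) Σ (z(xi) − z(xj))², averaged over the N(h) pairs whose separation falls in each lag class. A minimal Python/NumPy sketch follows; the lag width and number of lags are user choices (assumed, not taken from the text), and the same function could be applied to the normal scores of the data and of each realization. It builds all pairs, so very large grids would need subsampling.

```python
import numpy as np


def omnidirectional_variogram(xy, values, lag, n_lags):
    """Experimental semivariogram over lag classes [k*lag, (k+1)*lag).

    Returns the mean pair distance, gamma, and number of pairs for each
    non-empty lag class.
    """
    xy = np.asarray(xy, dtype=float)
    z = np.asarray(values, dtype=float)
    i, j = np.triu_indices(len(z), k=1)               # each pair counted once
    d = np.hypot(xy[i, 0] - xy[j, 0], xy[i, 1] - xy[j, 1])
    half_sq = 0.5 * (z[i] - z[j]) ** 2
    bins = (d // lag).astype(int)
    centers, gammas, counts = [], [], []
    for k in range(n_lags):
        mask = bins == k
        if mask.any():
            centers.append(d[mask].mean())
            gammas.append(half_sq[mask].mean())
            counts.append(int(mask.sum()))
    return np.array(centers), np.array(gammas), np.array(counts)


# Hypothetical alternating values along a transect
h, g, n = omnidirectional_variogram([[0, 0], [1, 0], [2, 0], [3, 0]], [0, 1, 0, 1], 1.5, 3)
print(h, g, n)
```

Overlaying the curves obtained from several realizations on the curve from the data gives a plot in the spirit of Figure 5.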

Figure 5.

Variograms of normal scores of CO2 flux: (a) Solfatara of Pozzuoli, (b) Vesuvio cone, (c) Poggio dell'Olivo, (d) Nisyros caldera, and (e) Horseshoe Lake. The variograms (γ) of the normal scores of the original values (dots) are compared with the variograms of the normal scores of the simulated values (shaded lines) for 15 randomly selected realizations of each area.

3.2.1. Probabilistic Summary of a Set of Simulations: Mapping of the Diffuse Degassing Structures

[24] The differing degree to which the realizations reproduce the sample statistics might tempt one to choose, for mapping the CO2 flux, the realization that best fits the imposed statistics. This choice would be correct only if the original sample statistics were exact and unquestionable, which is not the case; hence all realizations should be considered equiprobable [Deutsch and Journel, 1998].

[25] One possible way of post-processing the set of realizations to define the extent of the diffuse degassing structures (DDS) is to draw maps of the probability that the CO2 flux at any location exceeds a cutoff value. At each location, this probability is computed as the proportion of simulated values that exceed the cutoff.
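In array terms this post-processing is a single reduction over the realization axis. The Python/NumPy sketch below assumes the realizations are stored as an array of shape (n_realizations, ny, nx) in g m−2 d−1; the array layout, the function name, and the example values are illustrative assumptions.

```python
import numpy as np


def exceedance_probability(realizations, cutoff):
    """Fraction of realizations in which each cell's simulated flux exceeds cutoff.

    realizations: array of shape (n_realizations, ny, nx), fluxes in g m-2 d-1.
    Returns an (ny, nx) array of probabilities in [0, 1].
    """
    sims = np.asarray(realizations, dtype=float)
    return (sims > cutoff).mean(axis=0)


# Hypothetical 4 realizations of a 1 x 2 grid; cutoff of 50 g m-2 d-1
sims = np.array([[[10.0, 1.0]], [[30.0, 2.0]], [[60.0, 3.0]], [[80.0, 4.0]]])
prob = exceedance_probability(sims, 50.0)
print(prob)
```

With 100 realizations per area, each cell's probability is resolved in steps of 0.01.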

[26] Figure 6 shows maps of the probability of exceeding a CO2 flux value assumed to be a reasonable upper limit for the background CO2 flux. Different cutoff values were chosen for each area from inspection of the probability plots (Figure 3), so as to discriminate values belonging to the background populations from those of the anomalous populations. Specifically, cutoff values of 2, 50, 20, and 100 g m−2 d−1 were selected for Vesuvio cone, Solfatara of Pozzuoli, Poggio dell'Olivo, and Horseshoe Lake, respectively. Only 10% of the background population, and 90% to 98% of the anomalous populations, fall above these cutoff values. For Nisyros caldera, where the absence of a clear bimodal distribution did not permit a more sophisticated analysis (see section 3.1), a cutoff value of 20 g m−2 d−1 was chosen, i.e., 2.5 times the mean estimated for the background CO2 flux (8 g m−2 d−1).

Figure 6.

Probability maps of CO2 flux: (a) Vesuvio cone, (b) Solfatara of Pozzuoli, (c) Nisyros caldera, (d) Poggio dell'Olivo, and (e) Horseshoe Lake. The color scale shows the probability that CO2 flux exceeds the area-specific threshold selected to discriminate background from anomalous CO2 degassing (see the text).

[27] The maps of Figure 6 represent the probability that each location belongs to the DDS. In general the DDS are controlled by tectonic and volcanic structures: At Solfatara of Pozzuoli, the DDS is connected to the presence of NW-SE regional trending faults and to ENE-WSW fractures generated by pressure variations within the underlying hydrothermal system [Chiodini et al., 2001], at Nisyros the DDS match the NE-SW and NNE-SSW regional faults and the ancient hydrothermal craters, and at Poggio dell'Olivo the DDS develops along a NW-SE fault [Chiodini et al., 1999]. The straight boundaries of the Horseshoe Lake DDS suggest that in this case the degassing process is controlled by tectonics, while the small Vesuvio anomaly is coincident with an ancient crater rim of the volcano.

3.2.2. Quantification of Total CO2 Output

[28] The quantitative estimation of the total amount of CO2 released from a DDS is a primary objective of our study, and in particular of research devoted to the surveillance of active volcanoes. In this framework, it is important to quantify the uncertainty of the total CO2 output for a correct interpretation of its temporal variations. For each realization, the total CO2 output is computed by summing, over all grid cells, the product of the simulated flux and the cell area. The mean and standard deviation of the 100 total CO2 output values, one per realization, are taken as the characteristic values of the CO2 release and of its uncertainty for each area (Table 4). Total CO2 releases range from 0.95 t d−1 for the Vesuvio DDS to about 1500 t d−1 for the Solfatara DDS, and the uncertainties are always lower than ±12% of the total CO2 output.
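A minimal Python/NumPy sketch of this computation follows, assuming realizations stored as an array of shape (n_realizations, ny, nx) in g m−2 d−1 and square cells of side equal to the grid spacing. The use of the sample standard deviation (ddof=1) and NumPy's default linear percentile interpolation are assumptions, since the text does not specify them; the example data are synthetic.

```python
import numpy as np


def total_output_stats(realizations, cell_size_m):
    """Total CO2 output per realization and its summary statistics, in t d-1.

    realizations: array (n_realizations, ny, nx) of simulated flux, g m-2 d-1.
    """
    sims = np.asarray(realizations, dtype=float)
    cell_area = cell_size_m ** 2                          # m2 per square cell
    totals = sims.sum(axis=(1, 2)) * cell_area * 1e-6    # g d-1 -> t d-1
    return {
        "mean": totals.mean(),
        "std": totals.std(ddof=1),
        "p05": np.percentile(totals, 5),
        "p95": np.percentile(totals, 95),
    }


# Synthetic stand-in for 100 realizations on a 20 x 20 grid with 10 m cells
rng = np.random.default_rng(0)
example = rng.lognormal(mean=3.0, sigma=1.0, size=(100, 20, 20))
print(total_output_stats(example, cell_size_m=10.0))
```

The 5th and 95th percentiles returned here correspond to the central 90% of simulated totals reported alongside the mean and standard deviation.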

Table 4. Total CO2 Output and Uncertainties Derived From 100 sGs Realizations
Study Area | Mean, t d−1 | Standard Deviation, t d−1 | 5th–95th Percentiles, t d−1
Solfatara of Pozzuoli | 1513.00 | 136.00 | 1266.4–1729.7
Vesuvio cone | 0.95 | 0.11 | 0.77–1.14
Poggio dell'Olivo | 233.50 | 27.90 | 191.9–274.6
Nisyros caldera | 84.00 | 3.82 | 79.80–89.47
Horseshoe Lake | 104.30 | 5.00 | 96.20–112.99

[29] The variability of the total CO2 output can be appreciated visually in Figure 7, which shows for each area a map of the simulated “expected” value at each cell (E-type estimates, obtained through a pointwise linear average of all the realizations [Deutsch and Journel, 1998]), together with the simulated maps of the two realizations giving the highest and the lowest total CO2 output.
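Assuming the realizations are stored in an array of shape (n_realizations, ny, nx) with a constant cell area, the three maps of each panel can be extracted as in the Python/NumPy sketch below; with a constant cell area, ranking realizations by the sum of their cell values is equivalent to ranking them by total output. The function name and example values are hypothetical.

```python
import numpy as np


def etype_and_extremes(realizations):
    """Return the E-type map and the realizations with the highest and lowest totals."""
    sims = np.asarray(realizations, dtype=float)
    etype = sims.mean(axis=0)          # pointwise linear average of all realizations
    totals = sims.sum(axis=(1, 2))     # proportional to total output (constant cell area)
    return etype, sims[totals.argmax()], sims[totals.argmin()]


# Hypothetical 3 realizations of a 1 x 2 grid
etype, highest, lowest = etype_and_extremes([[[1.0, 3.0]], [[5.0, 7.0]], [[3.0, 3.0]]])
print(etype, highest, lowest)
```

Unlike the two extreme realizations, the E-type map is an average and is therefore smoother than any single realization.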

Figure 7.

CO2 flux maps. For each area, from left to right: the map of mean CO2 flux obtained by pointwise linear averaging of all 100 realizations (E-type estimates), the map of the realization giving the maximum estimate of total CO2 output, and the map of the realization giving the minimum estimate of total CO2 output.

[30] The total CO2 outputs estimated by the sGs method do not differ significantly from those estimated by GSA; however, the uncertainties estimated by the two methods differ markedly. To compare the uncertainties, the 5th and 95th percentiles of the 100 simulated total CO2 output values are reported in Table 4. These two percentiles bound the central 90% of the simulated total CO2 output values and thus permit an easy comparison with the central 90% confidence interval estimated by GSA. In all cases, the uncertainties estimated by the GSA method are much higher than those estimated by the sGs method: the 5th–95th percentile range computed by sGs is much narrower than the 90% confidence interval estimated through the GSA approach.

3.2.3. Total CO2 Output Uncertainty and Sampling Design

[31] Many factors influence the magnitude of the error of an estimate, e.g., the number of samples, their spatial arrangement, and the nature of the phenomenon under study [Isaaks and Srivastava, 1989]. In this section we address this problem and try to derive empirical criteria for defining a reasonable sampling design that yields an estimate of the total CO2 output with an acceptable uncertainty. When the entire sample data set is used, the uncertainty of the computed total CO2 output ranges from ±5% to ±12% in the different areas (Table 4).
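This range can be checked against Table 4 by expressing each standard deviation as a percentage of the corresponding mean total output. The Python lines below simply restate the tabulated values; the dictionary name is illustrative.

```python
# Mean and standard deviation of total CO2 output (t d-1), from Table 4
table4 = {
    "Solfatara of Pozzuoli": (1513.00, 136.00),
    "Vesuvio cone": (0.95, 0.11),
    "Poggio dell'Olivo": (233.50, 27.90),
    "Nisyros caldera": (84.00, 3.82),
    "Horseshoe Lake": (104.30, 5.00),
}

# Relative uncertainty = standard deviation as a percentage of the mean
relative = {area: 100.0 * sd / mean for area, (mean, sd) in table4.items()}
for area, pct in relative.items():
    print(f"{area}: ±{pct:.1f}%")
```

This gives about 4.5% for Nisyros caldera and 4.8% for Horseshoe Lake, 9.0% for Solfatara of Pozzuoli, and 11.6% and 11.9% for Vesuvio cone and Poggio dell'Olivo.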

[32] To investigate the relation between the uncertainty and the number of conditioning data, several subsets with different numbers of samples were randomly drawn from the original data, regardless of whether the sample statistics of the subsets matched those of the complete data set. For each subset, 100 simulations were performed, and the mean total CO2 output and its standard deviation were computed as described in the previous section.

[33] In all cases, the mean total CO2 output appears relatively insensitive to the number of conditioning data as long as that number remains above a threshold which, in each area, is lower than 50% of the available samples (Figure 8). Below this limit, which varies from area to area, the computed total CO2 output starts to oscillate strongly. This finding (1) confirms that our experimental data sets can be considered exhaustive and (2) suggests that in each area there is a minimum number of samples below which the conditioning data are inadequate to describe the study field and to provide a reliable quantitative estimate of the total CO2 output.

Figure 8.

Variation of the total CO2 output estimate as a function of the number of conditioning data, expressed as a percentage of the total number of samples. The total CO2 outputs computed from each subset of data are normalized by the total CO2 output computed from the entire data set of the corresponding area (CO2OUT(subset)/CO2OUT(100%)).
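The normalization used in Figure 8, and one possible way to locate the threshold below which the estimate starts to oscillate, can be sketched as below. The text gives no numerical criterion for that threshold, so the tolerance `tol` and both function names are hypothetical choices made for illustration.

```python
def standardize(outputs_by_fraction, full_output):
    """Return CO2OUT(subset)/CO2OUT(100%) for each subset size (percent of samples)."""
    return {f: out / full_output for f, out in outputs_by_fraction.items()}


def stability_threshold(ratios, tol=0.1):
    """Smallest subset size such that it and all larger sizes stay within 1 +/- tol.

    `tol` is a hypothetical tolerance; the text does not specify a
    numerical criterion for the threshold.
    """
    threshold = None
    for f in sorted(ratios, reverse=True):
        if abs(ratios[f] - 1.0) > tol:
            break
        threshold = f
    return threshold
```

Scanning from the largest subset downwards stops at the first ratio that leaves the tolerance band, so isolated good values at very small subset sizes do not lower the threshold.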

[34] Figure 9a shows that the standard deviation of the mean total CO2 output is related to the number of measurements by a hyperbolic function. However, this relation between the number of measurements and the uncertainty is specific to each survey and cannot be used as a general law for designing the sampling of new areas.
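The text does not state the exact form of the hyperbolic function. As an assumption, the sketch below uses sigma = a/n + b, which becomes a straight line in u = 1/n and can therefore be fitted by ordinary least squares; the function names are hypothetical.

```python
def fit_hyperbola(n_values, sigma_values):
    """Least-squares fit of sigma = a / n + b via linear regression on u = 1/n.

    The functional form is an assumption; returns (a, b).
    """
    if len(n_values) != len(sigma_values) or len(n_values) < 2:
        raise ValueError("need at least two paired observations")
    u = [1.0 / n for n in n_values]
    m = len(u)
    mean_u = sum(u) / m
    mean_s = sum(sigma_values) / m
    sxx = sum((ui - mean_u) ** 2 for ui in u)
    sxy = sum((ui - mean_u) * (si - mean_s) for ui, si in zip(u, sigma_values))
    a = sxy / sxx
    b = mean_s - a * mean_u
    return a, b


def predict_sigma(a, b, n):
    """Uncertainty predicted by the fitted hyperbola for n samples."""
    return a / n + b
```

Other hyperbolic forms (for example a power law a * n**(-c)) would need a nonlinear or log-space fit instead.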

Figure 9.

(a) Uncertainty of total CO2 output versus number of conditioning data. (b) Uncertainty of total CO2 output versus number of conditioning data falling in a circle with radius equal to the range of the CO2 flux variogram (CRA), with the best-fitting function.

[35] A more general relation can be derived by considering the number of samples falling within a circle with radius equal to the range of the CO2 flux variogram (circle range area (CRA)) instead of the total number of samples (Figure 9b). In this case a single hyperbolic function adequately fits the data of the five different surveys, suggesting that a suitable sampling density can be derived from the correlation range of the parameter under study. Figure 9b, for example, suggests that uncertainties lower than 10% are obtained with a sampling density that guarantees at least 90 samples in the CRA.
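A minimal sketch of how the number of samples in the CRA might be estimated for an existing survey, assuming the samples are spread roughly uniformly over the surveyed area so that the count equals the mean sampling density times the circle area. The function names and the demo survey (400 points over 1 km², range 100 m) are hypothetical; the 90-sample guideline is the value read from Figure 9b.

```python
import math


def samples_in_cra(n_samples, survey_area_m2, variogram_range_m):
    """Expected number of samples in a circle of radius equal to the variogram
    range, assuming samples are spread uniformly over the surveyed area."""
    density = n_samples / survey_area_m2          # samples per m^2
    cra_area = math.pi * variogram_range_m ** 2   # m^2
    return density * cra_area


def meets_guideline(n_cra, min_samples=90):
    """Check against the ~90 samples in the CRA suggested by Figure 9b."""
    return n_cra >= min_samples


if __name__ == "__main__":
    # Hypothetical survey: 400 points over 1 km^2, variogram range 100 m.
    n = samples_in_cra(400, 1.0e6, 100.0)
    print(round(n, 1), meets_guideline(n))
```

For strongly clustered sampling, counting neighbours within the range around each sample would be a more faithful alternative to the uniform-density assumption.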

[36] The empirical relation derived from Figure 9b can be a useful tool for designing a soil CO2 flux survey. For example, a sampling design suitable for a reliable estimate of the total CO2 output can be defined after a quick preliminary survey aimed at roughly determining the correlation range of soil CO2 fluxes. In the absence of any information on soil CO2 flux values, the CRA, and consequently a suitable sampling density, can be approximately estimated by considering that (1) the CRA extent is close to the spatial extent of the degassing structures (Figure 10) and (2) in many cases that extent can be evaluated a priori on the basis of macroscopic field evidence (e.g., bare soil, focused gas emissions, dead vegetation).
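Turning a preliminary range estimate into a sampling plan can be sketched as follows, assuming a target of 90 samples in the CRA and, for the grid spacing, a regular square grid with one sample per cell. The function name `plan_survey` and the example figures are hypothetical.

```python
import math


def plan_survey(variogram_range_m, survey_area_m2, target_in_cra=90):
    """Sampling density, total samples and square-grid spacing needed to place
    `target_in_cra` samples inside a circle of radius equal to the (preliminary)
    variogram range."""
    cra_area = math.pi * variogram_range_m ** 2
    density = target_in_cra / cra_area               # samples per m^2
    n_total = math.ceil(density * survey_area_m2)
    spacing = math.sqrt(cra_area / target_in_cra)    # m, one sample per grid cell
    return density, n_total, spacing
```

For instance, a range of 100 m (estimated from a preliminary survey or from the visible size of a degassing structure) over 1 km² gives about 2865 samples, i.e., a grid spacing of roughly 18.7 m. Because the required number of samples scales with 1/r², a short correlation range quickly makes a survey expensive.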

Figure 10.

Circles with radius equal to the range of the CO2 flux variograms (CRA) compared with the probability maps of CO2 flux. The grey scale shows the probability that the CO2 flux exceeds a specific threshold (the same used in Figure 6) and highlights the diffuse degassing structures (DDS). The extent of the CRAs approximately matches the dimensions of the DDSs.

4. Conclusions

[37] The application of a stochastic simulation methodology, sequential Gaussian simulation, to diffuse CO2 emission from soils can be a valid approach to the definition and characterization of the diffuse degassing structures (DDS) affecting active volcanoes and geothermal areas. The approach, based on the production of several equiprobable images of the CO2 flux, permits both the mapping of DDS and the quantification of the total CO2 output from the surveyed areas. Moreover, producing a set of equiprobable realizations of the same phenomenon allows us to quantify the uncertainty of our representation of reality.

[38] The method has been applied to four volcanic areas (Solfatara of Pozzuoli, Nisyros caldera, Horseshoe Lake, and Vesuvio) and to one nonvolcanic area (Poggio dell'Olivo), all characterized by active CO2 degassing. For each data set, 100 simulations were produced. The simulated values at unsampled locations reproduce well the univariate and bivariate statistics of the original samples, suggesting that the sGs method produces more realistic representations than the traditional estimation technique, kriging, which reproduces neither the univariate (histogram) nor the bivariate (variogram) statistics of soil diffuse CO2 fluxes.

[39] On the basis of sGs results, different types of maps can be chosen to highlight the DDS. In this work we show the maps of the realizations with the maximum and minimum total CO2 output and the map of the mean values (E-type). However, the DDS are better visualized through probability maps. Soil CO2 fluxes are typically characterized by polymodal distributions, generally consisting of a combination of a low-flux population related to biological activity in the soil (background population) and high-flux populations generated by the degassing of deeply derived CO2. Using suitable threshold values chosen from probability plots of the sample data, the probability that the flux exceeds the threshold can be mapped for each area. Because the threshold values reasonably separate background from deeply derived CO2 fluxes, these maps represent the probability that each location belongs to a DDS. The soil CO2 flux maps of the studied areas both define the extent of the DDS and emphasize the strong control exerted by tectonic-volcanic structures on the degassing process.
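Both the E-type map and the probability map are per-cell summaries across realizations. A minimal sketch, assuming each realization is stored as a flat list of simulated flux values (one per grid cell, back-transformed to flux units) and that the threshold has already been chosen from the probability plot; the function name is hypothetical.

```python
def etype_and_probability(realizations, threshold):
    """Per-cell E-type (mean) and probability of exceeding `threshold`.

    `realizations` is a list of equally long lists, one simulated flux value per
    grid cell; `threshold` is in the same flux units.
    """
    n_real = len(realizations)
    n_cells = len(realizations[0])
    if any(len(r) != n_cells for r in realizations):
        raise ValueError("all realizations must cover the same grid")
    # E-type: average of the simulated values at each cell.
    etype = [sum(r[i] for r in realizations) / n_real for i in range(n_cells)]
    # Probability: fraction of realizations whose value exceeds the threshold.
    prob = [sum(1 for r in realizations if r[i] > threshold) / n_real
            for i in range(n_cells)]
    return etype, prob
```

With 100 realizations, each cell's probability is resolved in steps of 0.01, which is ample for outlining a DDS.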

[40] The total amount of CO2 released through diffuse degassing has been estimated from each simulated realization by assigning to the surface of each grid cell the corresponding simulated flux value, yielding a set of equiprobable values of the total CO2 output. The mean total CO2 output of the 100 realizations is taken as the characteristic value for each study area, and the uncertainty associated with this estimate is taken as the standard deviation of the 100 simulated values. Through diffuse degassing, Solfatara of Pozzuoli releases 1513 ± 136 t d−1 of CO2, Poggio dell'Olivo 233 ± 28 t d−1, Nisyros caldera 84 ± 3.82 t d−1, Horseshoe Lake 104 ± 5 t d−1, and the small degassing area on the flank of the Vesuvio cone 0.95 ± 0.11 t d−1. The characteristic values of simulated total CO2 output from the different areas do not differ significantly from those previously estimated with the GSA method; the most valuable contribution of the sGs method is the assessment of the uncertainty.
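The integration step can be sketched as below, assuming fluxes in g m⁻² d⁻¹ on a regular grid with equal cell areas, so that the sum of flux times cell area gives g d⁻¹, which is divided by 10⁶ to obtain t d⁻¹. The function names are hypothetical, and the sample standard deviation (n − 1 denominator) is an assumption, since the text does not specify which form was used.

```python
import statistics


def total_output_t_per_day(realization, cell_area_m2):
    """Total CO2 output of one realization in t/d.

    Assumes fluxes in g m^-2 d^-1 on a regular grid with equal cell areas:
    sum(flux * cell area) gives g/d, divided by 1e6 to get t/d.
    """
    return sum(realization) * cell_area_m2 / 1.0e6


def characteristic_output(realizations, cell_area_m2):
    """Mean of the per-realization totals and their sample standard deviation."""
    totals = [total_output_t_per_day(r, cell_area_m2) for r in realizations]
    return statistics.mean(totals), statistics.stdev(totals)
```

Applied to 100 realizations, the first value returned corresponds to the characteristic output and the second to its uncertainty as defined in the paragraph above.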

[41] Many factors influence the magnitude of the uncertainty of an estimate, and most of them cannot be controlled. However, to obtain a reliable estimate of the total CO2 output through sGs, we can at least adopt an appropriate sampling design for the CO2 flux survey. The results of this study suggest that a suitable sampling density is related to the correlation range of the parameter under study. On the basis of the results from the five areas, an empirical relation has been derived between the expected uncertainty and the number of samples falling in the CRA (the area of a circle with radius equal to the correlation range of the CO2 fluxes defined by the variogram). This relation may be a useful tool for planning future CO2 flux surveys. More generally, this paper draws the attention of CO2 flux surveyors to the need to choose tools that allow some assessment of estimation uncertainty, so that soil diffuse degassing measurements can be used efficiently in volcanic surveillance.


[42] This work was financially supported by the EU (GEOWARN Project IST 1999-12310), by GNV-INGV (Volcanology National Group of Italy), and by MIUR (GEOCO2 Project).