New Insights Into the Relationship Between Mass Eruption Rate and Volcanic Column Height Based On the IVESPA Data Set

Rapid and simple estimation of the mass eruption rate (MER) from column height is essential for real‐time volcanic hazard management and reconstruction of past explosive eruptions. Using 134 eruptive events from the new Independent Volcanic Eruption Source Parameter Archive (IVESPA, v1.0), we explore empirical MER‐height relationships for four measures of column height: spreading level, sulfur dioxide height, and top height from direct observations and as reconstructed from deposits. These relationships show significant differences and highlight limitations of empirical models currently used in operational and research applications. The roles of atmospheric stratification, wind, and humidity remain challenging to detect across the wide range of eruptive conditions spanned in IVESPA, ultimately resulting in empirical relationships outperforming analytical models that account for atmospheric conditions. This finding highlights challenges in constraining the MER‐height relation using heterogeneous observations and empirical models, which reinforces the need for improved eruption source parameter data sets and physics‐based models.

real-time (e.g., Bear-Crozier et al., 2020;Freret-Lorgeril et al., 2021;Mereu et al., 2022Mereu et al., , 2023, but these pioneering applications are either not operational or limited to a few of the world's best-monitored volcanoes (e.g., Etna volcano, Italy). Therefore, computationally inexpensive empirical scaling relationships and one-dimensional (1D) eruptive column models remain the most common tools to estimate MER based on observed height. The scaling models are particularly widely applied owing to their simplicity. The canonical scaling model is an empirical power law relationship between MER and column height (Mastin et al., 2009;Morton et al., 1956;Sparks, 1986;Wilson et al., 1978). Development of these empirical relationshipsand eruptive column models in general (1D, 3D)-is limited by data sets with a narrow range of eruptive and atmospheric parameters, absent or sparse information on uncertainty, and the use of dependent data, for example, when MER is derived from height. To address these issues, the Commission on Tephra Hazard Modeling of the International Association of Volcanology and Chemistry of the Earth's Interior (IAVCEI) developed the Independent Volcanic Eruption Source Parameter Archive (IVESPA, Aubry et al., 2021). Here, we use IVESPA to explore new empirical relationships between MER and the height of both tephra and SO 2 eruption columns and compare these results with analytical scaling models that account for atmospheric conditions.

Overview of IVESPA
We use version 1.0 of IVESPA (http://www.ivespa.co.uk/, Aubry et al., 2021). It contains 134 eruptive events, that is, eruption or eruption phases for which we have estimates of tephra fall deposit mass, duration, atmospheric conditions, and column height. Using the classification of Bonadonna and Costa (2013), 111 events are small-moderate, 18 are sub-Plinian and 5 are Plinian. IVESPA uses the following height metrics (see Figure 2 in Aubry et al., 2021): • H top , the height of the top of the tephra column, available for 130 events • H spr , the spreading height of the tephra cloud, available for 41 events • so 2 , the height of SO 2 injection, available for 28 events.
Height values provided in km above sea level (a.s.l.) in IVESPA have been converted to km above vent level (a.v.l.) for this study. The measurement techniques used to estimate column heights (e.g., ground based radar, satellite or visual observations) are reported although a single best estimate based on all available measurements is provided. Estimates of column heights, tephra mass, and duration are independent, for example, tephra mass was not estimated by inverting information from column height. We define the MER as the mass of tephra fallout (excluding pyroclastic density currents) derived from mapping the deposits and empirical fitting of the thinning trends (e.g., Bonadonna & Costa, 2012), divided by the eruptive event duration. MER is thus a time-averaged value, and we denote it MER . For consistency, IVESPA provides observed column height from the published record that are also representative of a time-averaged value, denoted by top , spr and SO 2 . Most eruptive events do not have continuous time series of height, resulting in challenges in estimating the true time-averaged value (Aubry et al., 2021).
IVESPA parameters are assigned uncertainties representative of a 95% confidence level (Aubry et al., 2021). Both the best estimates and uncertainties are assigned an interpretation flag value between 0 (no interpretation) and 2 (heavy interpretation of the data source(s)). Events where parameters could not be estimated based on information in the literature were not included in the data set. Atmospheric profiles from two climate reanalyzes are provided and are time-averaged over each event duration. IVESPA also contains vertically averaged (between the vent and top ) values of the horizontal wind speed ( ) and stratification (Brunt-Väisälä frequency, ). The mean value from both atmospheric reanalyzes is used as the best estimate, and their difference (halved) as the uncertainty. Table S1 contains all parameters used in this study and their calculation is detailed in Supporting Information S1 unless directly provided in IVESPA.

Methodology
New empirical power law models were defined between top , spr , SO 2 and H iso,top and mass eruption rate MER in MATLAB™ using a non-linear least-squares fit procedure. We provide each model's: (a) confidence interval, reflecting the uncertainty on the fitted parameters and used to test if two models are significantly different; and (b) prediction intervals, reflecting the uncertainty on predictions based on both the uncertainty on the model parameters and the model error. All uncertainties and intervals are at the 95% confidence level. Information on all new empirical models is provided in Table S3.
Unlike empirical relationships, several analytical (derived from buoyant plume theory) scaling models explicitly account for atmospheric conditions (e.g., and ). We use IVESPA to evaluate five of these models (Aubry et al., 2017;Degruyter & Bonadonna, 2012;Hewett et al., 1971;Morton et al., 1956;Woodhouse et al., 2013; see Table 1 and Text S3 in Supporting Information S1), as well as the empirical relationship from Mastin et al. (2009). We use the adjusted coefficient of determination ( 2 adj , penalizing models with more independent variables) to compare model performance. Our key results (Table 1) are unchanged when using the bias-corrected Akaike Information Criterion instead (Table S4 in Supporting Information S1), which is generally more appropriate than 2 adj for non-linear models (Spiess & Neumeyer, 2010). To account for eruption source parameter uncertainties and IVESPA biases, we repeated model evaluation with different set of weights applied to IVESPA events (Section 4.2, Table 1; weight expressions in Text S4 in Supporting Information S1): • "Eruption" (column 4, Table 1): The same weight is given to each eruption in IVESPA reducing the influence of eruptions with numerous events (e.g., 18 events for the [1989][1990] Redoubt eruption) • "Uncertainty" (column 5, Table 1): Weights are inversely proportional to the uncertainty on the observed and predicted top values for each event. • "Interpretation Flag" (column 6, Table 1): Less weight is given to events that required heavy interpretation of the literature to attribute top and MER values. • "All" (column 7, Table 1): The weight for each event is proportional to the product of the weights described above in columns 4-6 to account for all three factors.

Empirical Scaling Relationships Specific to Different Column Height Metrics
SO 2 (c) and H iso,top (d) relate to MER , and corresponding empirical power law relationships.
MER values in IVESPA range from 2 × 10 1 -2 × 10 8 kg s −1 (median: 1.6 × 10 6 kg s −1 ), which is a larger range with a higher proportion of low-intensity events compared to previous studies (e.g., Mastin et al., 2009: MER range of 6 × 10 3 -2 × 10 8 kg s −1 with median of 10 7 kg s −1 ). Defining MER using the total mass of tephra (i.e., including pyroclastic density current contributions instead of fallout only) results in lower R 2 ( Figure S1 in Supporting Information S1). For the MER − top fit constrained by 130 events, we find best-fit relationships between the MER in kg s −1 and top in km above vent level (a.v.l.) of: with the MER as independent variable, and with top as the independent variable and using a log-linear fit. Parameters in Equation 1 are most sensitive to events with high MER values ( Figure S2 in Supporting Information S1). Best fits for all other types of height are provided in Figures 1e and 1f and Table S3, which aims to facilitate use of the new empirical fits, in particular by Volcanic Ash Advisory Centers (VAACs) and Volcano Observatories (VOs). Log-linear fits obtained using any   Figure S3 in Supporting Information S1). This is consistent with the expectation that isopleth-based height reflects an upper bound of the top height, whereas IVESPA top column heights aim to reflect a time-averaged value. In addition, the method of Carey and Sparks (1986) tends to overestimate plume height for eruptions affected by strong wind (Rossi et al., 2019). Unsurprisingly, predicted spr tends to be lower than predicted top , with the average spr ∕ top ratio of 0.76 in IVESPA matching exactly that predicted by Morton et al. (1956) for buoyant plumes rising in quiescent environments ( Figure S3 in Supporting Information S1). Predicted top and SO 2 are generally not significantly different (average SO 2 / top ratio is 0.97 in IVESPA, Figure S3 in Supporting Information S1) and in the absence of other information, top is an adequate metric for SO 2 injection height in gas dispersion and climate modeling studies.
The widely used empirical scaling of Mastin et al. (2009) compares best with our top fit, although it is closer to our H iso,top fit at high MERs for the power law fit (Figure 1e). This finding is unsurprising as although the plume SO 2 (c) and H iso,top (d) as a function of MER . Panels (e) (power law fit, MER as independent variable) and (f) (log linear fit, height as independent variable) compare the new empirical relationships for the four heights considered, along with relationships from Mastin et al. (2009) and Wilson and Walker (1987). Select events on panels (a)-(d) are labeled using their Independent Volcanic Eruption Source Parameter Archive (IVESPA) identifiers (see Table S1 for full details), for example, PIN1991_02 is phase 2 of the 1991 eruption of Mount Pinatubo.
10.1029/2022GL102633 6 of 12 height type is unspecified in Mastin et al. (2009), most heights in the literature reflect top height values, and Mastin et al. (2009) included isopleth-based heights in their compilation (unlike IVESPA). There are statistically significant differences between Mastin et al. (2009) and our new top height fits (up to 15% for predicted top and up to 0.6 for predicted log(MER ), that is, a factor of 4 for MER ). The relative root mean squared error (RMSE) on top (predicted from MER ) is 53% for Equation 1, 57% for Equation 2 and 60% for Mastin et al. (2009) (Figure S4a in Supporting Information S1). When using these relationships and observed top to invert for MER , duration or tephra fallout mass ( Figures S4b-S4d in Supporting Information S1), the RMSE on the predicted log(MER ) is 0.81 for Equation 1, 0.76 for Equation 2 and 0.80 for Mastin et al. (2009). The new empirical relationships for top (Equations 1 and 2) are thus broadly consistent with Mastin et al. (2009). However, we show that the optimal parameter values of empirical scaling relationships and corresponding predictions differ greatly depending on the height metric (i.e., top , spr , SO 2 or H iso,top ).  (Morton et al., 1956). The analytical scaling models have 2 adj values between 0.32 and 0.52, much smaller than the empirical power law.

Accounting for Atmospheric Conditions Using Analytical Scaling Models
This result is not explained by the use of the full IVESPA data set to both calibrate and test models (including the empirical power-law). If we split IVESPA into distinct calibration and evaluation data sets, we find a 99.6% probability that the 2 adj of the empirical power law exceeds that of any analytical scalings (Table S5 in Supporting Information S1).
The poor performance of analytical scaling relationships could be explained by poorly constrained parameter values in IVESPA, or the fact that multiple events from one eruption dominate the database. We find that weighting the data (Table 1, column 4-7) to account for these factors does not change the main results: (a) the empirical power law fit between top and MER outperforms (higher 2 adj ) the analytical models; and (b) the best-performing model is the empirical power law that includes and terms, with a positive (unphysical) exponent for . When weighting the eruptive events by parameter uncertainty, the performance of all scaling models improves, with greater improvement among the analytical models accounting for atmospheric conditions. For example, the difference in 2 adj values between the power-law fit and the best analytical scaling (Degruyter & Bonadonna, 2012) when applying all weighting procedures is 0.06, whereas it is 0.19 unweighted. For the power law fit, the MER exponent varies between 0.21 and 0.25 depending on the weighting procedures applied and is thus robust. However, fit parameters of analytical models are very sensitive to the weighting. For example, the calibrated value of entrainment coefficient ratio β/α in the Aubry et al. (2017) scaling model ranges between −0.43 (an unphysical value) and 4.4. Laboratory studies suggest that the ratio of β/α should be 0.6-20 (Aubry & Jellinek, 2018, and references therein).

Influence of Atmospheric Conditions
Using 25 eruptive events, Mastin (2014) demonstrated that a 1D plume model accounting for atmospheric conditions was not as good as an empirical power-law in predicting MER from column height. Despite having improved data compilation methodologies and over five times more events in IVESPA, we reach similar conclusions as the simple MER -top empirical power law outperforms analytical scaling models accounting for atmospheric conditions (Table 1) Table 1 even suggests that top increases with . These results contradict theoretical and experimental evidence that top should decrease in a more strongly stratified atmosphere (e.g., Morton et al., 1956;Woods, 1988), and cause the poor performance of analytical models in which top is proportional to −0.75 . One potential explanation is that generally increases with altitude ( Figure  S5a in Supporting Information S1) and in turn with top and MER . If is normalized for each event by the value obtained from the average atmospheric profile across IVESPA (which removes the dependence of on vent and column altitude), it becomes negatively although insignificantly correlated with std top ( Figure S6a in Supporting Information S1). Figure 2b shows that std top decreases with stronger horizontal wind speed , as expected from laboratory experiments (e.g., Carazzo et al., 2014;Hewett et al., 1971) and a few well-observed eruptions (e.g., Dürig et al., 2022;Poulidis et al., 2019), but that the two variables are not significantly correlated. We also do not detect any influence of relative humidity (Figure 3c), despite model predictions that the atmospheric water vapor entrained into a volcanic plume and the associated latent heat flux should boost top by over 5 km for small-moderate eruptions in a wet tropical atmosphere (e.g., Glaze et al., 1997;Herzog et al., 1998;Woods, 1993).
Although several studies have noted that tropical volcanic plumes commonly reach the tropopause (e.g., Carboni et al., 2016;Tupper & Wunderman, 2009), without any constraint on MER as in this study, the role of humidity can only be speculated. Removing the influence of altitude on and relative humidity ( Figure S5 in Supporting Information S1) only marginally increases their apparent influence on std top ( Figure S6 in Supporting Information S1).
Last, we tested the influence of volcanic plume morphology (i.e., weak, bent-over and spreading downwind only, vs. strong, spreading both upwind and downwind). This parameter is constrained using direct observations (e.g., pictures, infrared cameras) for only 44 events in IVESPA, so we complement it by calculating  (Table 1). for each event. Π is a non-dimensional parameter defined by the ratio of the wind entrainment and plume rise timescales (Degruyter & Bonadonna, 2012) and has been shown to relate to the plume morphology for a handful of eruptions (e.g., Dürig et al., 2023;Scollo et al., 2019). We use α = 0.1 and β = 0.5 in Equation 3, consistent with latest entrainment coefficient estimates (e.g., Aubry & Jellinek, 2018;Michaud-Dubuy et al., 2020). Values of Π in IVESPA range from 0.02 to 1.1 with weak plumes associated with lower values. Both types of plumes are found for 0.03 < Π < 0.35 (Figure 2d and Figure S3 in Supporting Information S1), suggesting a transition from weak to strong plumes at a critical value of Π ≈ 0.1. This value is considerably lower than that originally proposed in  for the 2011 Cordon Caulle eruption, Chile (Π ≈ 10) but is in agreement with Π = 0.5 as found in recent studies for Etna (Scollo et al., 2019) and Eyjafjallajökull (Dürig et al., 2023). Despite the absence of any clear relationship between std top and Π in Figure 2d, the statistically significant correlation hints to a small but discernible influence of the plume morphology on the top -MER relationship. Figure 3a shows the distribution of std top for 10 geographical regions. Across these regions, the median std top varies between 0.72 and 1.24, that is, the median observed top differs from the median value predicted using Equation 1 by −28% (Redoubt) to +24% (Central America). The distributions of std top for these two regions significantly differ compared to all other regions. Regional differences might reflect a range of factors including atmospheric conditions, the prevalence of certain magma or edifice types, or the prevalence of island volcanoes with limited deposition on land and low bias on the tephra fallout mass and MER . Even when subdivided into 10 geographical areas, most still contain 10-24 events. We can thus calibrate region or volcano-specific top -MER relationships and show select examples in Figure 3c. std top estimated from satellite-only measurements or a combination of satellite and ground-based instrumental measurements (e.g., radar) tend to be higher than for other measurement techniques (p-value < 0.1), consistent with Tupper and Wunderman (2009). In contrast, when visual measurements were used alone or in combination with satellite imagery, std top tends to be lower (p-value < 0.15). Figure 3d shows that bespoke top -MER relationships for these two categories (satellite vs. visual) differ at most MER values. The dependence of std top on other parameters was explored with examples for duration, median grain size and tropopause height shown in Figures S7 and S8 in Supporting Information S1. The 17 events with a duration smaller than 10 times the plume rise timescale tend to have smaller std top ( Figure S7a in Supporting Information S1) but giving these short-duration events less weights does not change Table 1 results (not shown).

Influence of Location and Column Height Measurement Technique
Last, among all sub-categories shown in Figure 3, the correlation coefficient between log( std top ) and log( ) is only significant for the subgroup of satellite and ground-based top measurement (r = −0.87). This emphasizes the difficulty of detecting atmospheric influence on the top -MER relationship in IVESPA v1.0.

Future Eruption Data Requirements and IVESPA Developments
The challenging detection of atmospheric influence on the MER-height relationship in IVESPA v1.0 may be due to the use of simple scaling (0D) models, and future studies could investigate application of more sophisticated eruptive column models (1D, 3D) or data analysis techniques (e.g., machine learning) to IVESPA. However, our study hints at developments of IVESPA, and eruptive data more generally, that will help build a better understanding of the relationship between MER, column height and atmospheric conditions. First, Figure 3b shows that future versions of IVESPA should explicitly separate column heights according to measurement type. Second, Figure 3a and other studies suggest that compiling information such as magma composition or type (e.g., Trancoso et al., 2022) and conduit information (e.g., Gouhier et al., 2019) would help constrain other factors modulating the relationship between height and MER. Whether interpolation of large-scale reanalysis data sets to infer local atmospheric profile is adequate for plume modeling could be further tested. Last, the use of time-averaged eruption source parameters might prevent the detection of atmospheric influence on plume dynamics in a database with such a variety of eruptions. Advances in near real-time measurements of MER (Bear-Crozier et al., 2020;Caudron et al., 2015;Freret-Lorgeril et al., 2018Mereu et al., 2022Mereu et al., , 2023 are critical to build time-dependencies in global databases like IVESPA. These database development should aid understanding of the dynamics of volcanic plumes, and in particular detect and model the influence of atmospheric conditions (Section 5.1).

Conclusions
We used the new Independent Volcanic Eruption Source Parameter Archive (IVESPA, Aubry et al., 2021) to explore the empirical power law relationship linking column height to MER. A key improvement over previous work is that our new relationships are specific to the type of column height considered, that is, the height of the SO 2 cloud ( SO 2 ), the spreading height of the tephra cloud ( spr ), and the top height of the ash cloud directly measured ( top ) or derived from the distribution of the largest clasts (H iso,top ) with significant differences among these four metrics (Figure 1 and Figure S3 in Supporting Information S1). We provide a summary of all newly constrained empirical relationships, their uncertainties and look-up tables to facilitate application by a wide range of users including VAACs or VOs (Table S3). The newly calibrated power law relationship between top and MER (Equation 1) still results in discrepancies of 50% for predicted top , and a factor of ∼6 for predicted MER . However, it still outperforms analytical scaling models accounting for atmospheric wind and stratification.
It is important to note that empirical relationships will always provide information within the range of eruptions considered and will evolve depending on the data set considered, while analytical models are designed to be of wider application. Discrepancies in the analytical scaling shown in our analysis might be related to both uncertainty in the data and uncertainty in the existing models. More work is required to better interpret these results and provide more accurate models. Further improvements to IVESPA might be needed to detect atmospheric influences on plume dynamics, but we may simply be identifying an inherent limitation in the accuracy with which secondary controls on plume dynamics can be captured in global databases using time-averaged plume heights or erupted mass by deposit mapping.

Data Availability Statement
All data and MATLAB™ scripts used in this study is available from a publicly available repository at https:// doi.org/10.5281/zenodo.8085934, with scripts also available from GitHub at https://github.com/thomasaubry/ IVESPA_GRL2023_scripts. The core data used is from the Independent Volcanic Eruption Source Parameter Archive (IVESPA) Version 1.0 at http://ivespa.co.uk/data.html. IVESPA is curated by the IAVCEI Commission on Tephra Hazard Modeling and supported by the British Geological Survey.