Enriching flood risk analyses with distributions of soil mechanical parameters through the statistical analysis of classification experiments

The distributions of soil mechanical parameters required for a comprehensive flood risk assessment are often taken from the scarce literature available. This article therefore presents a method to obtain these distributions indirectly from the results of frequently conducted classification tests. Empirical correlation terms are used to transform the classification data into stability‐relevant parameters, in particular the void ratio, the soil unit weight, the friction angle and the saturated permeability. The method is applied, by way of example, to a data set collected throughout Germany in the immediate vicinity of water bodies, and plausible distributions are obtained for about two thirds of the 13 soil classes considered. Because the current basis for validation is poor, extending (inter)national databases with samples of the considered soil mechanical parameters is recommended.


| INTRODUCTION
The availability of stochastic distributions of design-relevant parameters is a prerequisite for the quantification of flood risk according to the EU Flood Directive (2007) (Drews et al., 2019; Schwiersch & Stamm, 2020). These distributions allow a probabilistic evaluation of the limit state equations and thereby an assessment of the reliability, or equivalently the failure probability, of flood protection structures (Ang & Tang, 2006; Baecher & Christian, 2003; Schweckendiek & Slomp, 2018) and of the conditional flooding probability.
This article presents a methodology to indirectly determine distributions for selected parameters relevant to slope stability from the results of classification experiments for 13 soil classes. So far, such distributions have often been taken from scarce literature (Bachmann et al., 2013; Möllmann, 2009; Weißmann, 2014). The data set used to apply the method mainly includes samples obtained in the vicinity of hydraulic engineering infrastructure, because this is the relevant environment for flood protection structures. However, this also means that the results are only partly representative of soils far from water bodies.
The objective of this study is to provide distributions for a probabilistic stability assessment of flood protection structures, especially river levees.

| METHODOLOGY
The methodology presented serves the purpose of improving the determination of flood risk through the probabilistic description of soil mechanical parameters. The ideas are based on the application to newly built river levees, and therefore a known relative density (here: $I_D = 0.5$) is assumed. Additionally, influences such as the spatial variability, natural water content and loading history are not considered. This assumption and the outlined focus considerably limit the transferability of the method to natural conditions.

| Classification of soils for civil engineering purposes
The classification of soils is generally used to assess their usability for construction purposes. For this study, the soils were classified by the authors based on their grain size distributions and water contents according to the German standard DIN 18196 (DIN = German Institute for Standardisation). A reference to additional classification systems, such as the international (ASTM D 2487) or US-System (USCS), is provided in Table 1. Soils of one class are similar with regard to the soil composition (e.g., grain size distribution, water content), but may contain components from adjacent main soil classes. This circumstance is described by Kuzmanić and Mikoš (2020) using the example of (quartz) sand. The grain size distribution is determined by sieve and/or hydrometer analysis, recording which mass fraction $M$ of a sample is smaller than a reference diameter $d_i$. The index $i$ denotes the corresponding mass percentage.
Furthermore, soils of one class should have approximately the same structural properties within their class limits. In the context of this article, the mechanical properties of a soil class are considered as random variables and are expressed by a distribution type, an expected value and a standard deviation.
Since transformation models are usually developed by regression based on a specific soil data set, the models exhibit both bias and scatter when applied to other data (ISSMGE-TC304, 2021a). This so-called transformation uncertainty is not explicitly quantified in this study, but is only taken into account through the plausibility check.

| Void ratio
The void ratio $e$ is a spatial quantity that defines the ratio of pore volume to grain volume. The scattering of soil particles causes the pores to be filled up by grains from the immediate vicinity. This results in a dependence of the void ratio $e$ on the grain size distribution. In general, the void ratio $e$ of narrow-graded soils tends to be larger than that of wide-graded soils. In the case of a single-grain material with spherical grains (most likely applicable to coarse-grained soil in practice), the void ratio $e$ has a minimum value for the densest state $e_{\min}$ and a maximum value for the loosest state $e_{\max}$: $e_{\min} = 0.35 \le e \le 0.92 = e_{\max}$ (Lang et al., 2011). This results from the different packing arrangements. In order to get estimates for both states, Jänke (2000) suggests two approximations (Equations 1 and 2) that depend on the grain roughness $r$, the uniformity coefficient $C_u = d_{60}/d_{10}$ and the mean diameter $d_{50}$.
The grain roughness $r$ ranges from 1.0 for sharp-edged material to zero for round material.
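The grain-size inputs named above, $d_{10}$, $d_{50}$, $d_{60}$ and $C_u$, can be read off a measured grading curve. The following is a minimal sketch of how this might be done, not the authors' procedure: the function name `characteristic_diameter`, the log-linear interpolation rule and the sieve data are all assumptions made for illustration.

```python
import math


def characteristic_diameter(diameters_mm, passing_pct, i):
    """Grain diameter d_i at which i percent of the sample mass is finer.

    Assumption: linear interpolation in log10(diameter) between the two
    grading-curve points that bracket the requested mass percentage.
    """
    points = sorted(zip(diameters_mm, passing_pct))
    for (d_lo, p_lo), (d_hi, p_hi) in zip(points, points[1:]):
        if p_lo <= i <= p_hi and p_hi > p_lo:
            t = (i - p_lo) / (p_hi - p_lo)
            log_d = math.log10(d_lo) + t * (math.log10(d_hi) - math.log10(d_lo))
            return 10 ** log_d
    raise ValueError("i lies outside the measured grading curve")


# Hypothetical sieve result: sieve openings (mm) and mass percent passing.
sieves_mm = [0.063, 0.125, 0.25, 0.5, 1.0, 2.0]
passing = [2.0, 10.0, 35.0, 70.0, 95.0, 100.0]

d10 = characteristic_diameter(sieves_mm, passing, 10)
d50 = characteristic_diameter(sieves_mm, passing, 50)
d60 = characteristic_diameter(sieves_mm, passing, 60)
c_u = d60 / d10  # uniformity coefficient C_u = d_60 / d_10

print(f"d10 = {d10:.3f} mm, d50 = {d50:.3f} mm, d60 = {d60:.3f} mm, C_u = {c_u:.2f}")
```

Other interpolation rules (for example linear in the diameter itself) give slightly different values between sieve sizes, so the rule should be stated whenever such values are reported.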
If the relative density of a soil, $I_D = (e_{\max} - e)/(e_{\max} - e_{\min})$, is known, the existing void ratio $e$ can be determined from Equations (1) and (2) according to Equation (3):

$$e = e_{\max} - I_D \cdot (e_{\max} - e_{\min}) \qquad (3)$$
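As a small worked example of Equation (3), the sketch below evaluates the void ratio for the limiting values quoted above for single-grain material with spherical grains ($e_{\min} = 0.35$, $e_{\max} = 0.92$) and the relative density assumed in this study ($I_D = 0.5$). The function name `void_ratio` is chosen here for illustration only.

```python
def void_ratio(e_min, e_max, relative_density):
    """Equation (3): e = e_max - I_D * (e_max - e_min)."""
    if not 0.0 <= relative_density <= 1.0:
        raise ValueError("relative density I_D must lie in [0, 1]")
    if e_min > e_max:
        raise ValueError("e_min must not exceed e_max")
    return e_max - relative_density * (e_max - e_min)


# Limits for single-grain material with spherical grains, I_D = 0.5.
e_mid = void_ratio(0.35, 0.92, 0.5)
print(f"e = {e_mid:.3f}")
```

For $I_D = 0$ the function returns $e_{\max}$ (loosest state) and for $I_D = 1$ it returns $e_{\min}$ (densest state).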

| Soil unit weight
The mean density of a soil $\rho$ results from the grain density $\rho_s$, the mean density of the pore phase $\rho_p$ and the void ratio $e$. Multiplied by the gravitational acceleration $g$, this gives the unit weight $\gamma$.
In the absence of pore water ($\rho_p = 0$ g/cm³), the dry unit weight $\gamma_d$ results; in the case of fully saturated soil ($\rho_p = 1$ g/cm³), the saturated unit weight $\gamma_{sat}$ results. Taking into account Equations (3) and (4), a correlation term corresponding to Equation (5) emerges for the calculation of the unit weight. Using the correlations from Equations (1) and (2), the unit weight in Equation (5) can thus be estimated from information on the grain size distribution. For a single-grain material with spherical grains, a dry unit weight in the range of 12.7-18.1 kN/m³ follows from the limit values of the void ratio (Zilch et al., 2013). For the corresponding saturated unit weight, the expected interval ranges from 17.5 to 20.7 kN/m³. These values also depend on the grain density $\rho_s$, which plays a subordinate role due to its comparatively small variability.
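The relationship between void ratio and unit weight described above can be illustrated with the standard phase relation $\rho = (\rho_s + e\,\rho_p)/(1 + e)$ and $\gamma = \rho\,g$. This is a sketch under stated assumptions: Equations (4) and (5) are not reproduced here, so this form is the textbook relation rather than a quotation of them, and the grain density of 2.5 g/cm³ is an assumed value chosen because it approximately reproduces the quoted ranges.

```python
G = 9.81       # gravitational acceleration in m/s^2
RHO_S = 2.5    # assumed grain density in g/cm^3 (illustrative value)


def unit_weight(e, rho_s, rho_p):
    """Unit weight in kN/m^3 from void ratio e and densities in g/cm^3.

    Mixture density of grains and pore phase; 1 g/cm^3 times 9.81 m/s^2
    equals 9.81 kN/m^3, so no further unit conversion is needed.
    """
    rho = (rho_s + e * rho_p) / (1.0 + e)
    return rho * G


for label, e in (("loosest, e_max", 0.92), ("densest, e_min", 0.35)):
    gamma_d = unit_weight(e, RHO_S, 0.0)    # dry: empty pores
    gamma_sat = unit_weight(e, RHO_S, 1.0)  # saturated: water-filled pores
    print(f"{label}: gamma_d = {gamma_d:.1f}, gamma_sat = {gamma_sat:.1f} kN/m^3")
```

With a more common quartz grain density of about 2.65 g/cm³ the same relation gives somewhat higher unit weights, which illustrates the dependence on $\rho_s$ mentioned in the text.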

| Friction angle
The friction angle of soils depends on the grain size distribution, grain shape and grain roughness as well as the relative density (Caquot & Kérisel, 1967; Kempfert & Raithel, 2007; Kézdi, 1969; Kolymbas, 2016; Witt, 2017). Since the grain shape is usually not measured in classification tests, only the other three parameters can be considered in the correlation terms (Equations 6 and 13). For sand, gravel and crushed material, Mogami and Yoshikoshi (1968) suggest an approximation (Equation 6) for the determination of the friction angle $\varphi'_f$ (Engel, 2002). The coefficient $k$ in this approximation is a function of the grain size distribution and the grain shape. To calculate it, Moroto and Shimobe (1993) state relationships as functions of the void ratio in the densest state $e_{\min}$ (Engel, 2002): Equation (7.1) can be used for soils with a uniformity coefficient $C_u$ of approximately two and variable grain shape $R$, and Equation (7.2) for soils with greater uniformity and spherical grains.
Additionally, the following approach is applicable to natural soils and crushed material (Witt, 2017).
In very fine-grained soils, cohesion also plays an important role for the soil strength properties (Kempfert & Raithel, 2007; Lang et al., 2011; von Soos & Engel, 2017; Witt, 2017; Zilch et al., 2013). It is not considered in this study because the available correlation terms between the classification data and the cohesion are insufficient.
To estimate the friction angle of cohesive soils, the unit water index $w_0$ and the pulp water index $w_1$ are determined using the empirical correlations (Equations 10 and 11) developed by Ohde (1950), which take the liquid limit $w_L$ and the plasticity index $I_P$ into account.
Using Equations (10) and (11) and the stiffness coefficient $\nu_{nc}$ (von Soos & Engel, 2017), the friction angle can be determined for the normally consolidated state, $\varphi'_{nc}$, and for the overconsolidated state, $\varphi'_{oc} = \varphi'$. For further information on critical state theory, see, for example, Kolymbas (2016).

| Distribution fitting and plausibility check
Based on experience from preliminary studies, normal (N), lognormal (LN) and Weibull (WB) distributions were fitted to the classification data set and to the soil mechanical properties using the maximum likelihood method (Hedderich & Sachs, 2016). Other extreme value distributions were not included in the study because of physical inconsistencies (e.g., no zero crossing) or insufficient fit. The goodness of fit was evaluated with the Anderson-Darling test (Hedderich & Sachs, 2016), chosen for its distribution-free applicability, at a significance level of 5%. If none of the distributions represents the data with sufficient accuracy (rejection of the null hypothesis), the data are evaluated using empirical frequency distributions (emp). In that case, the Results section gives a distribution type recommended in the literature in brackets, in addition to the statistical parameters (expected value E(X) and coefficient of variation COV(X)). Taking into account the method's own uncertainties (see Section 4) and the uncertainties associated with documented experience, a distribution is considered plausible if E(X) lies within the bounds of the experience interval taken from Engel (2014) and if COV(X) is consistent with observations documented in the literature. The results considered plausible are shown in the tables as italicized values.
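The fitting step can be sketched as follows, using only the Python standard library. This is an illustration under stated assumptions, not the authors' implementation: the sample is synthetic, all function names are hypothetical, and the Weibull shape is found by a simple bisection. The sketch computes the Anderson-Darling statistic $A^2$ for each candidate; the accept/reject decision at the 5% level additionally requires critical values that depend on the distribution family and on the parameters being estimated, and these are deliberately omitted here.

```python
import math
import random


def fit_normal(data):
    """Closed-form maximum likelihood estimates (mean, standard deviation)."""
    n = len(data)
    mu = sum(data) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in data) / n)
    return mu, sigma


def fit_weibull(data, k_lo=0.01, k_hi=1000.0):
    """Maximum likelihood shape k and scale lam, found by bisection on k."""
    n = len(data)
    logs = [math.log(x) for x in data]
    mean_log = sum(logs) / n

    def weights(k):
        # x**k rescaled by its largest value to avoid overflow for large k.
        top = k * max(logs)
        return top, [math.exp(k * l - top) for l in logs]

    def score(k):
        # Likelihood equation for the shape; the expression increases with k
        # and its root is the maximum likelihood estimate.
        _, w = weights(k)
        return sum(wi * li for wi, li in zip(w, logs)) / sum(w) - 1.0 / k - mean_log

    for _ in range(100):
        k_mid = 0.5 * (k_lo + k_hi)
        if score(k_mid) > 0.0:
            k_hi = k_mid
        else:
            k_lo = k_mid
    k = 0.5 * (k_lo + k_hi)
    top, w = weights(k)
    lam = math.exp((top + math.log(sum(w) / n)) / k)
    return k, lam


def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


def weibull_cdf(x, k, lam):
    return -math.expm1(-((x / lam) ** k))


def anderson_darling(data, cdf):
    """Anderson-Darling statistic A^2 of the sample against a given cdf."""
    n = len(data)
    eps = 1e-12  # keep the logarithms finite in the extreme tails
    u = sorted(min(max(cdf(x), eps), 1.0 - eps) for x in data)
    total = sum((2 * i + 1) * (math.log(u[i]) + math.log(1.0 - u[n - 1 - i]))
                for i in range(n))
    return -n - total / n


# Synthetic, hypothetical sample standing in for one parameter of one soil class.
random.seed(1)
sample = [random.lognormvariate(-1.0, 0.5) for _ in range(500)]

mu, sigma = fit_normal(sample)
mu_ln, sigma_ln = fit_normal([math.log(x) for x in sample])  # lognormal MLE
k, lam = fit_weibull(sample)

a2 = {
    "N": anderson_darling(sample, lambda x: normal_cdf(x, mu, sigma)),
    "LN": anderson_darling(sample, lambda x: normal_cdf(math.log(x), mu_ln, sigma_ln)),
    "WB": anderson_darling(sample, lambda x: weibull_cdf(x, k, lam)),
}
best = min(a2, key=a2.get)
print({name: round(value, 3) for name, value in a2.items()}, "smallest A^2:", best)
```

Ranking candidates by $A^2$ alone only identifies the closest of the three fits. Whether that fit is acceptable still has to be decided with the test, which is why the study falls back on empirical distributions when all three are rejected.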

| Trans-regional data set on soil classification experiments
For the study, a data set was collected that compiles the results of soil classification experiments trans-regionally (Figure 1). The aim is to smooth out local trends within a soil class and thus gain a picture of the soil class as a whole. In order to specifically include soil properties of relevance to structures near rivers (such as river levees), most of the data (78.27%) were taken from soil investigations for waterway infrastructure. These data are supplemented by results of soil investigations from the Dresden/Saxony region (21.78%). A total of 19,584 records (grain size distributions) were classified into 13 soil classes. The largest share of the total data set, 88.0% (17,234 data records), refers to non-cohesive soils. Cohesive soils account for the remaining 12.0% (2350 data records). For 1804 data sets of the cohesive soils, information on the water content is also available. At least 250 data sets are available for nine of the 13 soil classes. The smallest data volumes are available for slightly plastic silt (UL), with 103 grain size distributions, and for widely and intermittently graded sands (SW, SI), with 106.
The box-whisker plots shown in Figure 2 provide an overview of the raw data evaluated. For each soil class, the variability of the grain diameters is shown on the basis of four or, in the case of more comprehensive data availability, five selected reference grain fractions ($d_{10}$, $d_{30}$, $d_{50}$, $d_{60}$ and $d_{85}$). The median values (highlighted by a circle) show a mean, empirical grain size distribution of the respective soil class. Based on these empirical grain size distributions, it can be seen that the wider a soil class is graded, the more strongly it is characterised by adjacent main soil groups (clay, silt, sand and gravel). As can be seen in the case of SE and GE, the median values of all fractions lie within their own main soil group, while for SU*, ST* and GU*, GT* silt fractions of up to 10 mass percent are found. For all soils, the interquartile range (highlighted by the boxes) is narrow, which indicates peaked frequencies. The observed grain sizes of cohesive soils (UL, UM/UA, TL, TM and TA) do not display any significant differences; rather, they show similar characteristics. While the grain sizes of classified gravel and sand can be found within the respective main soil groups, silt and clay cannot be distinguished in grain size due to the consistency-based classification of cohesive soils using Atterberg's test.

Figure 1. Sample size for classification experiments and location in Germany. Green points: subsoil investigations for waterway infrastructure; yellow point: subsoil investigations from Dresden/Saxony. Source of the map: Esri Germany.
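The quantities displayed in such a box-whisker plot can be computed as sketched below, with whiskers limited to 1.5 times the interquartile range as in the figures of this study. The sample values and the function name `box_whisker` are hypothetical, and the quartile convention (`method="inclusive"`) is an assumption, since plotting tools differ in how they interpolate quartiles.

```python
import statistics


def box_whisker(values, whisker=1.5):
    """Quartiles, whisker ends and outliers of a sample."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - whisker * iqr
    high_fence = q3 + whisker * iqr
    inside = [v for v in values if low_fence <= v <= high_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": min(inside),    # most extreme values within the fences
        "whisker_high": max(inside),
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }


# Hypothetical d_50 values in mm for one soil class.
d50_mm = [0.21, 0.24, 0.25, 0.26, 0.27, 0.28, 0.30, 0.31, 0.33, 0.90]
summary = box_whisker(d50_mm)
print(summary)
```

A narrow box relative to the whisker span, as in this example, corresponds to the peaked frequencies noted in the text.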
Occasional deviations from the rules of classification (e.g., in the proportion of silt grains in SU, ST) also show that scattering up to 30% (e.g., GU, GT) in the soil composition (and thus also in the design-relevant properties) is to be expected within a soil class.
For the data sets of cohesive soils containing information on sample water contents, the plasticity index $I_P = w_L - w_P$ (liquid limit minus plastic limit) can be determined (DIN Deutsches Institut für Normung e. V., 2020). Small values of $I_P$ indicate a narrow water content interval for plastic soil behaviour, and large values indicate a wide one. In the data sets, the plastic behaviour of UL and TL shows the greatest sensitivity to water (Figure 3). On average, an interval of 5% (UL) and 11% (TL) is observed. In contrast, medium to highly plastic soils (UM/UA, TM and TA) are more robust. With an average of 36%, the interval for plastic soil behaviour is greatest for TA.

Figure 2. Visualisation of the raw data for the considered soil classes using box-whisker plots (whisker multiplier equals 1.5 times the interquartile range) of the fractions $d_{10}$, $d_{30}$, $d_{50}$, $d_{60}$ and $d_{85}$. The sets shown correspond to the number of data sets evaluated for each soil class.
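The evaluation of the plasticity index per soil class can be sketched as follows. The water contents are hypothetical values, not taken from the data set, and the use of the sample standard deviation for the coefficient of variation is an assumption.

```python
import statistics


def plasticity_index(w_liquid, w_plastic):
    """I_P = w_L - w_P, with both limits given as water content in percent."""
    if w_liquid < w_plastic:
        raise ValueError("liquid limit must not be below the plastic limit")
    return w_liquid - w_plastic


# Hypothetical (w_L, w_P) pairs in percent for a slightly plastic soil.
limits = [(28.0, 22.0), (31.0, 24.5), (26.5, 22.5)]
ip_values = [plasticity_index(w_l, w_p) for w_l, w_p in limits]

mean_ip = statistics.fmean(ip_values)
cov_ip = statistics.stdev(ip_values) / mean_ip  # coefficient of variation
print(f"mean I_P = {mean_ip:.1f} %, COV = {cov_ip:.2f}")
```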
To model the raw data, distributions were fitted to the grain fractions $d_{10}$, $d_{30}$, $d_{50}$, $d_{60}$ and $d_{85}$ for each soil class (Figure 4). The resulting probability model for the grain size enables the transformation of grain sizes into the design-relevant parameters.
In general, the distributions of each soil class lie within its own main soil group and show typical patterns corresponding to the classification. The distributions of the clays (purple) and silts (green) additionally show that the scatter of grain sizes within a fraction increases from small to large fractions. This is due to the consistency-based classification, in which the admixtures of larger grains are disregarded due to their subordinate influence on the soil properties. For example, the distributions for the particle fraction $d_{10}$ are predominantly in the range of silt grain and have a steep gradient. For the particle fraction $d_{85}$, the visibly flatter distributions are largely located in the range of sand grain. In contrast, in the case of sands (yellow) and gravels (orange), the smaller particle fractions ($d_{10}$, $d_{30}$) are increasingly mixed with adjacent, finer main soil groups. The distributions of the $d_{10}$ fraction for sand and gravel with a lot of fines (SU*, ST* and GU*, GT*) are almost entirely in the silt grain range. For gravels, even the distributions of the $d_{30}$ fraction are still found in the range of sand grain.

| Void ratio
As shown in Table 2, with the exception of the widely graded gravels, the expected values of the void ratios range from 0.3086 (SU*, ST*) to 0.6352 (SE). The widely graded gravels have void ratios between 0.1108 (GU*, GT*) and 0.2492 (GW, GI). In general, the expected decrease in void ratio with decreasing uniformity can be observed for the non-cohesive soils. Only the results for SW, SI break this trend, which the authors attribute to the small sample size and the associated low validity. Against the background of the limiting void ratios of ideally smooth spheres named by de Marsily (1993) as a function of possible packings (loose, dense), the observed values appear plausible. This can be seen in the soil classes GE (e = 0.4984) and SE (e = 0.6352), which are closest to ideal spheres. The plausibility of the lower result values for widely graded gravel can be supported by the high fine grain contents described above. In addition, in a matrix of infinitely wide-graded material, a smaller grain can always fill the pores between the adjacent, larger grains.
In the case of cohesive soils, it can be seen that the void ratio increases with increasing plasticity. This is consistent with the idea that their typical plastic behaviour is related to a partially saturated state, where the escape of air contained in the pore space becomes more and more difficult as the pore space decreases. Furthermore, it is noticeable that on average the COV of the cohesive soils is greater than that of the non-cohesive soils.
The observed coefficients of variation are at the lower limit of documented experience intervals of 7%-30% (Uzielli et al., 2006) and 15%-30% (Lumb, 1974), but this is attributed to the initial assumption of $I_D = 0.5$. If additional relative densities are analysed, an increase of the coefficients of variation is expected.

| Dry and saturated unit weight
The soil unit weight is significantly affected by the volume of the lower-density phases (air, water). Therefore, the analyses of the dry and saturated unit weight represent lower and upper limits for a constant void ratio. Both show a strong dependence on the void ratio, so that the trends depicted in the results above are propagated in the results for the dry and saturated unit weight.

Figure 3. Box-whisker plot (whisker multiplier equals 1.5 times the interquartile range) of the plasticity index $I_P$; μ is the expected value and COV is the coefficient of variation of $I_P$ (number of data sets: UL: 91, UM/UA: 109, TL: 506, TM: 512, TA: 586).
The results of the dry unit weight distributions (see Table 3) provide expected values between 15.9 kN/m³ (GE) and 23.4 kN/m³ (GU*, GT*). Overall, the expected values in 76.9% of the cases are around the lower empirical value of the moisture density described in the literature (tolerance ±0.5 kN/m³). The expected values that do not reach the empirical value are partly above it (GU*, GT*) and partly below it (SU, ST, TL). The observed coefficients of variation for non-cohesive soils are in the range of 0.35% (SW, SI) to 1.87% (GE, SE), and those for cohesive soils are higher (3.05% for UL to 6.09% for TA). When analysing the saturated unit weight, 46.2% of the cases show an expected value that lies within the plausible interval of empirical values ±0.5 kN/m³ (shown as italicized values in Table 3). The results deviating from the empirical values underestimate the saturated unit weight in three cases and overestimate it in four cases. No typical pattern can be detected. The COV is smaller for both non-cohesive and cohesive soils than for the dry unit weight. This indicates that there is a similar absolute deviation from the expected value in both cases, but that it is less significant for the saturated unit weight due to the generally higher values.
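The last argument can be made concrete with a short calculation: for the same absolute scatter, the coefficient of variation falls as the expected value rises. The numbers below are hypothetical and serve only to illustrate the arithmetic.

```python
def coefficient_of_variation(expected_value, standard_deviation):
    """COV(X) = standard deviation divided by expected value."""
    return standard_deviation / expected_value


# Hypothetical values: identical absolute scatter, different expected values.
sigma = 0.3  # kN/m^3
cov_dry = coefficient_of_variation(16.0, sigma)  # dry unit weight
cov_sat = coefficient_of_variation(20.0, sigma)  # saturated unit weight

print(f"COV dry = {cov_dry:.2%}, COV saturated = {cov_sat:.2%}")
```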
The observed mean values lie predominantly within the experience intervals according to Engel (2014), and the coefficients of variation confirm the observations of Lee et al. (1983), Lumb (1974) and Uzielli et al. (2006). Overall, the results for the dry and saturated unit weight can be regarded as plausible. Since the results of the void ratio are incorporated into the determination of the dry and saturated unit weight, the distribution types of the dry and saturated unit weight agree almost completely (exceptions: TL, TA) with those of the void ratio.

| Friction angle
The results show that the expected values of the friction angle lie within the empirical interval for eight of 13 soil classes. Due to the superposition of the input distributions (fractions $d_{10}$, $d_{50}$ and $d_{60}$ as well as the void ratio $e$), the results show different distribution types. Thus, normal, lognormal and Weibull distributions as well as empirical distributions are used for the parameter modelling. The results show coefficients of variation between 0.66% and 17.7%. These reflect the expected coefficients of variation of 2%-15% for non-cohesive soils (ISSMGE-TC304, 2021b; Lee et al., 1983; Lumb, 1974; Uzielli et al., 2006) and 12%-56% for cohesive soils (Lee et al., 1983; ISSMGE-TC304, 2021b). Therefore, the majority of the results are considered plausible (shown as italicized values in Table 4).

| Saturated permeability
Based on the expected values of the result distributions (Table 5), a decrease of the saturated permeability can be seen both with decreasing uniformity (non-cohesive soils) and with increasing plasticity (cohesive soils). Only the result for SW, SI deviates from this trend, with an expected value of $3.61 \times 10^{-4}$ m/s that is close to the permeability of SE ($3.27 \times 10^{-4}$ m/s). The coefficients of variation, which are expected at 200%-300% according to Lee et al. (1983) and Lumb (1974), show values between 73% (SW, SI) and 1275% (TM). There is no discernible trend within the individual soil classes. When comparing the expected values with the experience interval used for the validation (Engel, 2014), the results of 69% of the cases are within the interval limits. The results of SU, ST and of the clays do not agree with the experience according to Engel (2014). Furthermore, the saturated permeability for all soil classes can be described using lognormal and Weibull distributions. Modelling the saturated permeability by means of a lognormal distribution is recommended by Baecher and Christian (2003), Benson (1993), de Marsily (1993) and Uzielli et al. (2006).

Table 4. Soil-group-related evaluation of the friction angle in degrees (assuming $I_D = 0.5$).

Table 6 provides an overview of which of the result distributions are considered plausible (1) or implausible (−). It can be seen that for more than half of the soil classes, distributions can be confirmed by comparison with experience intervals and literature recommendations for at least four of the five parameters considered. At the same time, plausible distributions can be obtained for the void ratio, dry unit weight and saturated permeability for at least 10 soil types.
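When a lognormal model of the saturated permeability is to be used in a reliability analysis, its two parameters can be obtained from the expected value and the coefficient of variation via the standard moment relations $\sigma_{\ln}^2 = \ln(1 + \mathrm{COV}^2)$ and $\mu_{\ln} = \ln E(X) - \sigma_{\ln}^2/2$. The sketch below uses the expected value quoted for SE together with an assumed COV of 200% (the lower end of the literature range cited above, not the study's result for SE). The function names are hypothetical.

```python
import math


def lognormal_parameters(expected_value, cov):
    """Parameters (mu_ln, sigma_ln) of ln(X) from E(X) and COV(X)."""
    sigma_ln = math.sqrt(math.log(1.0 + cov ** 2))
    mu_ln = math.log(expected_value) - 0.5 * sigma_ln ** 2
    return mu_ln, sigma_ln


def lognormal_moments(mu_ln, sigma_ln):
    """Inverse relation: E(X) and COV(X) from the parameters of ln(X)."""
    expected_value = math.exp(mu_ln + 0.5 * sigma_ln ** 2)
    cov = math.sqrt(math.exp(sigma_ln ** 2) - 1.0)
    return expected_value, cov


# E(X) for SE from the text; COV = 2.0 (200%) is an illustrative assumption.
mu_ln, sigma_ln = lognormal_parameters(3.27e-4, 2.0)
median_k = math.exp(mu_ln)  # median of a lognormal variable

print(f"mu_ln = {mu_ln:.3f}, sigma_ln = {sigma_ln:.3f}, median = {median_k:.2e} m/s")
```

For coefficients of variation of this size, the median lies well below the expected value, which matters when deciding which of the two is used as a characteristic permeability.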

| DISCUSSION
Since the result distributions of this study are intended to improve the probabilistic stability assessment of river levees, and thus ultimately flood risk management, the evaluation deliberately used a data basis of classification test results collected close to water bodies. All grain size information used is influenced by water-dependent geomorphological processes, so that characteristics of soils far from water bodies are not taken into account.
The statistical analysis of the classification results allows the description of the trans-regional natural variability of grain sizes and water contents for 13 soil classes. It was found that silts and clays can hardly, or not at all, be distinguished in grain size. As a consequence, the results of both soil groups are similar, which, however, is not necessarily an accurate representation of reality. Due to the large sample size, the aleatory uncertainty (natural variability) is represented in the results. Since the subsequent quantification of epistemic uncertainty (impossibility of exact description) is not possible on the basis of the original data set, it is neglected. The use of transformation models in the form of correlation terms adds a further source of epistemic uncertainty to the analysis. Its quantification can be conducted according to ISSMGE-TC304 (2021b), but requires a calibration data set, which is not available within this study.

Table 5. Soil-group-related evaluation of the saturated permeability (assuming $I_D = 0.5$).

Although there is some information in the literature on the statistical properties of soils (Baecher & Christian, 2003; Benson, 1993; Lumb, 1974), it is mostly vague and limited to intervals for the expected value or the coefficient of variation. The mostly small values for the COV result from the investigation of only one relative density ($I_D = 0.5$). Since databases such as those compiled in the 304dB database compilation (ISSMGE-TC304, 2021b) contain little or no information on the parameters considered in this study, the results can only be considered plausible, while validation is still pending.

| CONCLUSION
In this article, a method is presented to determine soil mechanical parameter distributions by indirect statistical deduction, as needed for reliability analyses in flood risk management. The method is based on classification test results and requires an initial assumption on the relative density $I_D$.
The plausible parameter distributions presented for about two thirds of the soil classes enable the probabilistic stability assessment of river levees (e.g., slope stability) and thus improve the basis for flood risk assessment. In the context of further research, it is now necessary, on the one hand, to create a validation basis by supplementing (inter)national databases with information on void ratio, soil weight, friction angle and saturated permeability and, on the other hand, to quantify the epistemic uncertainty associated with transformation models.