Less disagreement, better forecasts: Adjusted risk measures in the energy futures market

This paper develops a generic adjustment framework to improve the market risk forecasts of diverse risk forecasting models, which indicates the degree to which risk is under- and overestimated. In the context of the energy commodity market, a market in which tail risk management is of crucial importance, the empirical analysis shows that after this adjustment framework is applied, the forecasting performance of various risk models generally improves, as verified by a battery of backtesting methods. Additionally, our method lessens the risk model disagreement among post-adjusted risk forecasts.


| INTRODUCTION
Interest in understanding the risks related to large price movements in the energy futures market has increased following the large influx of financial investors in this market after the financialization of the commodity market (Čech & Baruník, 2019; Qiao & Han, 2023; Xu & Lien, 2020). Recently, Ding et al. (2021) utilize artificial intelligence techniques to extend traditional financial econometric models for volatility risk forecasting allowing for liquidity effects. However, the frequent sharp drops in prices and highly leveraged margin trading in the energy futures market have made it challenging for energy producers, index investors, speculators, exchanges, and commodity market regulators to accurately evaluate the risks associated with potential extreme price movements. The accuracy of such risk forecasts constitutes a major concern in effective risk management practices. Risk measurement procedures often involve specification error in the underlying dynamic processes and estimation error in the model parameters, a concern invoked by numerous studies on risk estimation and forecast evaluation (for a comprehensive overview, see Christoffersen, 2012). In practice, then, when assessing market risk, relying on a single risk forecasting model across different market conditions and for various assets is implausible. Lazar and Zhang (2019) show that commodity risk measures carry a higher model misspecification risk than those of other assets, because of the stylized facts of commodity returns (heavy-tailedness and negative skewness). This difficulty has made risk managers and regulators more aware of the forecasting ability of particular risk models.
Another challenge in accurately evaluating risk is that financial institutions are required to allocate risk capital to shield themselves from unexpected negative shocks. Here, the amount of risk capital is determined by the risk model being used. However, the use of diverse risk models results in risk disagreement, leading to varying degrees of capital buffers. As the risk numbers given by diverse risk models are typically different, as shown in Danielsson et al. (2016), a regulatory arbitrage opportunity arises; that is, financial institutions can report less risk to regulators by using inadequate risk models to reduce their capital requirements (Liu & Stentoft, 2021). Hence, the model disagreement between numerous risk models makes risk management inefficient in allocating risk capital to cover potential losses quantified by risk measures. This inefficiency is even more pronounced under Basel III (Basel Committee on Banking Supervision, 2019), as pointed out by Liu and Stentoft (2021). As such, regulators are incentivized to reduce the model disagreement. For example, the Basel III reforms (Basel Committee on Banking Supervision, 2021), also known as Basel IV, published revisions to the market risk capital requirements, showing more tendency towards the standardized approach, away from the internal model approach. Though the standardized approach is praised for its simplicity and consistency, it fails to allow for specific characteristics of individual institutions. In contrast, the internal model approach can overcome this issue. Motivated by these findings, our proposed adjustment methodology is aimed at not only providing adequate risk forecasts but also mitigating the market risk model disagreement while embracing the diversity in risk models.
To determine extreme risk exposures of an investment position in times of stress, standard risk measures are typically used. These measures are value-at-risk (VaR) and expected shortfall (ES) (Basel Committee on Banking Supervision, 2019): VaR measures the worst possible loss of an investment within a given probability level, and ES quantifies the average loss exceeding the VaR threshold. Our proposed methodology, an internal model approach, is built on the seminal work of Fissler and Ziegel (2016), which identifies the joint elicitability of VaR and ES measures. This desirable elicitability property enables the optimal risk estimates to be uniquely obtained by minimizing the average loss for a given loss function within the family of loss functions proposed by Fissler and Ziegel (2016) (hereafter, FZ loss function for simplicity). By exploiting this property, we are able to improve on the ex ante risk forecasts of risk models and reduce the model disagreement towards a recognized objective (i.e., the minimization of average loss over a time period). Specifically, given the time series of pairs of ex ante VaR and ES forecasts of a risk model at a predefined probability level and the associated return data, within a rolling-window scheme we obtain the time-varying adjustments for VaR and ES, respectively, via the minimization of average loss. If the value of the adjustment multiplier is smaller (resp. larger) than 1, this suggests that the risk forecast is overestimated (resp. underestimated). This adjustment methodology is applicable to different risk forecasting models, unlike the studies of Farkas et al. (2020) and Patra (2021), which modify risk estimates restricted to a certain model. In addition, it gives a clear indication of the inaccuracies of risk forecasts, rather than relying on the relative model performance within the model set (Barrieu & Scandolo, 2015; Danielsson et al., 2016).
Empirically, we test this proposed methodology by applying it to a battery of commonly used risk forecasting models for several futures in the energy futures market. Our proposed adjustment methodology identifies the under- or overestimated risk forecasts generated by these risk models. After making adjustments to the VaR and ES estimates, the forecasting ability of risk models generally improves, as verified via various VaR and ES backtesting methods. Additionally, we investigate eight energy crisis periods and the margin-level change date that signals a high-volatility market regime and find an abatement in risk disagreement among post-adjusted risk forecasts.
Our paper mainly relates to three strands of existing literature. First, this study focuses on the accuracy of risk measures. In general, the adequacy of VaR and ES forecasts is affected by different sources: (1) the model misspecification involving inappropriate assumptions about true data-generating processes that may lead to the choice of inadequate risk models (see Danielsson et al., 2016; Liu & Stentoft, 2021; Patra, 2021); and (2) the estimation errors in model parameters, as shown in Pitera and Schmidt (2018) and Farkas et al. (2020). This inevitably leads to varying risk predictions depending on which risk estimation methodology is used. In practice, divergence in the risk forecasts of different risk models can induce speculative institutional investors to use models that produce inferior forecasts, with the aim of mitigating the risk capital required by regulators. Further, without knowing the statistical form of the underlying asset behavior, capturing model misspecification and parameter estimation errors is problematic. To address these issues, using the newly proposed methodology, we provide adjustments to quantify the extent to which one risk model underestimates or overestimates risk. Thus, we use these adjustments to build better risk estimates, which we subsequently evaluate with various backtesting methods. Our proposed methodology is a practical tool for making adjustments to the ex ante VaR and ES forecasts of a given model, and it facilitates the refinement of internal risk models for risk managers. It also helps to align the model performance across various models, thus disincentivizing financial institutions from reporting inadequate risk numbers to the regulatory authority.
Second, this paper relates to the abundant literature on market risk forecasting models. The VaR and ES forecasting methodologies can be classified into three main categories (Engle & Manganelli, 2004; Taylor, 2008): nonparametric, parametric, and semiparametric. Nonparametric approaches (see, e.g., Chen, 2007) rarely rely on assumptions about the conditional distribution of asset returns. Within this framework, VaR and ES are treated as quantiles of a selected sample of returns over a specific window at a given significance level. Nonparametric methods are model-free and easy to implement, but they are often criticized because of their sensitivity to window size selection. Conversely, before applying parametric models for VaR and ES forecasting (e.g., Escanciano & Olmo, 2010), we make a specific assumption on the asset returns distribution. The selection of the distribution function can cause differences in the forecasting performance of VaR and ES. Semiparametric models impose a parametric structure on the dynamics of VaR and ES but require no assumptions on the conditional distributions of financial returns (Engle & Manganelli, 2004; Patton et al., 2019). The parameters of semiparametric models are estimated by minimizing a specified loss (scoring) function instead of maximizing a distributional likelihood function described in the parametric techniques. The forecasting accuracy of these risk models depends heavily on the data. Parsimonious models can easily outperform other benchmarks in a stable market, but they are often less efficient than highly sophisticated models during a turbulent period. While a myriad of studies propose VaR and ES models to fit the market data thoroughly, practitioners and academics prefer a limited number of them. Thus, in this study, we consider a group of commonly used VaR and ES models covering the three categories mentioned above.
Besides the risk models considered in this paper, our proposed methodology can also be applied to several sophisticated machine learning models incorporating liquidity and microstructure information, for example, the LIQ-GARCH model in a genetic programming framework, proposed by Ding et al. (2019).
Last, our study also relates to the line of research on statistical backtesting methods, which are necessary to validate risk forecasting models. To thoroughly evaluate whether the forecasting accuracy of post-adjusted VaR and ES forecasts based on our methodology is improved, we consider traditional and comparative backtesting; the former is designed to directly test the forecasting ability of a given model, whereas the latter focuses on model performance comparisons among models. With respect to traditional backtesting, we adopt a series of commonly used and recently proposed traditional backtesting methods tailored to different desirable criteria. Specifically, the unconditional coverage (UC) test (Kupiec, 1995), the conditional coverage (CC) test (Christoffersen, 1998), and the dynamic quantile (DQ) test (Engle & Manganelli, 2004) are used to backtest VaR; the exceedance residual (ER) test (McNeil & Frey, 2000), the conditional calibration (CCA) test (Nolde & Ziegel, 2017), and the regression-based expected shortfall (ESR) test (Bayer & Dimitriadis, 2022) are used to backtest ES. With respect to comparative backtesting, we use the well-known Diebold-Mariano (Diebold & Mariano, 2002) and model confidence set tests (Hansen et al., 2011) to make model comparisons based on the joint FZ loss of VaR and ES.
The paper is structured as follows: Section 2 briefly introduces the proposed methodology to adjust risk forecasts based on the FZ loss function; Section 3 discusses the forecasting approaches for VaR and ES; the empirical data are presented in Section 4; Section 5 compares the risk model disagreement of pre- and post-adjusted VaR and ES forecasts and validates the efficiency of this adjustment methodology via various backtesting methods; and Section 6 concludes.

| ADJUSTMENT METHODOLOGY FOR RISK FORECASTS
The theoretical property of the FZ loss function that optimal VaR and ES forecasts can result in the lowest average loss has motivated the applications of FZ loss functions in risk estimation and evaluation (Patton et al., 2019; Nolde & Ziegel, 2017). In this section, we propose a generic and empirical approach to jointly improve the accuracy of VaR and ES forecasts built by any given risk forecasting model via minimizing the FZ loss function.
Here, we would like to adjust risk forecasts generated by a risk model using past realized observations and 1-day-ahead risk forecasts produced by the same model. A general setting is constructed to illustrate our methodology. Let $r_t$ denote the asset return on day $t$, and correspondingly let $(v_{t+1}^{\alpha}, e_{t+1}^{\alpha})$, in which $\alpha$ will be suppressed for brevity, represent the pair of 1-day-ahead VaR and ES forecasts for day $t+1$ based on the information available until $t$ at the $\alpha$ level. Throughout this paper, the signs of VaR and ES are considered negative, which indicates the potential loss occurring in the left tail of the return distributions, following Nolde & Ziegel (2017). The pair of risk forecasts can be adjusted by

$$(\tilde{v}_{t+1}, \tilde{e}_{t+1}) = (a_{1,t}\, v_{t+1},\; a_{2,t}\, e_{t+1}), \tag{1}$$

where $(a_{1,t}, a_{2,t})$ are a set of arbitrary adjustment multipliers for VaR and ES forecasts formed at $t$, which satisfy $a_{1,t}, a_{2,t} > 0$ to ensure the negativity of VaR and ES forecasts. The proposed form of adjustments, that is, the pair of multipliers, can provide direct evidence of under- and overestimation risk and thus allow for model comparisons among diverse models.
The derived consistency of FZ loss functions implies that the true values of (VaR, ES) forecasts can minimize the expected loss (Fissler & Ziegel, 2016). When we use a consistent loss function, the optimal adjustment multipliers $(a_{1,t}^{*}, a_{2,t}^{*})$ consequently minimize the expectation of the FZ loss function:

$$(a_{1,t}^{*}, a_{2,t}^{*}) = \arg\min_{(a_{1,t},\, a_{2,t}) \in \mathcal{A}_t} \mathbb{E}\left[ L_{FZ0}\left(r_{t+1},\; a_{1,t}\, v_{t+1},\; a_{2,t}\, e_{t+1};\; \alpha\right) \mid \mathcal{F}_t \right], \tag{2}$$

where $\mathcal{F}_t$ denotes the information set up to time $t$, and the admissible set $\mathcal{A}_t$ is set to ensure that for a pair of risk forecasts, ES values are always smaller than VaR values, showing a more extreme level of risk than VaR. $L_{FZ0}$ is the FZ0 loss function considered in this paper, formulated as follows:

$$L_{FZ0}(r, v, e; \alpha) = -\frac{1}{\alpha e}\, \mathbb{1}\{r \le v\}\,(v - r) + \frac{v}{e} + \log(-e) - 1, \tag{3}$$

where $\mathbb{1}\{\cdot\}$ refers to the indicator function, which is 1 if $r \le v$, that is, a VaR exceedance occurs, and 0 otherwise. Of the broad FZ family, the aforementioned loss function FZ0 is the primary option in terms of risk estimation and backtesting; see related papers, Nolde and Ziegel (2017), Patton et al. (2019), and Merlo et al. (2021).
Then, we can obtain the estimators $(\hat{a}_{1,t}, \hat{a}_{2,t})$ by minimizing the sample average of the loss over the estimation window:

$$(\hat{a}_{1,t}, \hat{a}_{2,t}) = \arg\min_{(a_{1,t},\, a_{2,t}) \in \mathcal{A}_t} \frac{1}{M} \sum_{i=t-M+1}^{t} L_{FZ0}\left(r_{i},\; a_{1,t}\, v_{i},\; a_{2,t}\, e_{i};\; \alpha\right). \tag{4}$$

We use a rolling window scheme with the window length of $M$ to obtain the time series of the pair of optimized multipliers for further analysis in this paper. The detailed procedure can be found in Algorithm 1.
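The estimation step can be illustrated with a minimal sketch. It assumes constant toy forecasts, simulated Gaussian returns, and a coarse grid search in place of a numerical optimizer; the function names (`fz0_loss`, `average_fz0`, `fit_multipliers`), the grid, and the toy data are hypothetical choices made for illustration and are not taken from Algorithm 1.

```python
import math
import random


def fz0_loss(r, v, e, alpha):
    """FZ0 loss for one observation; v (VaR) and e (ES) must be negative."""
    hit = 1.0 if r <= v else 0.0
    return -hit * (v - r) / (alpha * e) + v / e + math.log(-e) - 1.0


def average_fz0(returns, var_fc, es_fc, a1, a2, alpha):
    """Average FZ0 loss of the adjusted forecasts (a1 * v, a2 * e) over a window."""
    total = 0.0
    for r, v, e in zip(returns, var_fc, es_fc):
        total += fz0_loss(r, a1 * v, a2 * e, alpha)
    return total / len(returns)


def fit_multipliers(returns, var_fc, es_fc, alpha, grid):
    """Grid search for (a1, a2); a pair is admissible only if the adjusted ES
    is never above the adjusted VaR on any day of the window."""
    best = None
    for a1 in grid:
        for a2 in grid:
            if any(a2 * e > a1 * v for v, e in zip(var_fc, es_fc)):
                continue  # violates ES <= VaR
            loss = average_fz0(returns, var_fc, es_fc, a1, a2, alpha)
            if best is None or loss < best[0]:
                best = (loss, a1, a2)
    if best is None:
        raise ValueError("no admissible pair of multipliers on this grid")
    return best[1], best[2]


# Toy window: returns have standard deviation 2, but the forecasts were
# built as if it were 1, so both forecasts understate risk.
random.seed(0)
alpha = 0.05
returns = [random.gauss(0.0, 2.0) for _ in range(1000)]
var_fc = [-1.645] * len(returns)  # 5% Gaussian quantile for sigma = 1
es_fc = [-2.063] * len(returns)   # matching Gaussian ES for sigma = 1
grid = [0.5 + 0.1 * k for k in range(26)]  # candidate multipliers 0.5 .. 3.0

a1_hat, a2_hat = fit_multipliers(returns, var_fc, es_fc, alpha, grid)
print(a1_hat, a2_hat)
```

In this toy setting both fitted multipliers exceed one, which is the signal of underestimated risk described above. In a rolling scheme the same fit would be repeated on each window of length M.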

| RISK FORECASTING MODELS
A large number of risk forecasting models are used in academia, industry, and regulatory practice. The model choice depends heavily on the preferences and demands of end users. In this paper, we broadly select 10 commonly used risk forecasting models as our candidates, including nonparametric, parametric, and semiparametric models.

| Historical simulations (HS)
As a simple nonparametric approach, the standard HS is selected, which is implemented within an estimation window of size n = 1000. The HS predicts the 1-day-ahead VaR by taking the empirical α quantile of past returns. Additionally, we calculate the average of the left-tail returns beyond VaR as a proxy for ES.
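A minimal sketch of the HS calculation follows. The quantile convention (the k-th smallest return with k = ⌊αn⌋, and an ES that averages the k smallest returns, VaR observation included) is an assumption made for illustration; other empirical quantile conventions are equally common. The function name `historical_simulation` is hypothetical.

```python
def historical_simulation(returns, alpha, window=1000):
    """1-day-ahead HS forecasts of (VaR, ES) from the last `window` returns."""
    sample = sorted(returns[-window:])
    # Number of tail observations; the small epsilon guards against
    # floating-point products such as 4.999999999 being floored to 4.
    k = max(1, int(alpha * len(sample) + 1e-9))
    var = sample[k - 1]        # empirical alpha-quantile (k-th smallest return)
    es = sum(sample[:k]) / k   # average of the k smallest returns
    return var, es


# Toy usage with 100 evenly spaced returns between -0.50 and 0.49.
toy_returns = [i / 100 for i in range(-50, 50)]
hs_var, hs_es = historical_simulation(toy_returns, alpha=0.05)
print(hs_var, hs_es)
```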

| Weighted historical simulations (WHS)
We consider the WHS method, which is based on a geometrically declining weighting scheme (see Boudoukh et al., 1998). Within this method, more recent observations are assigned higher weights for forecasting. The weight of the $i$-th most recent day in a window of $n$ observations is $\eta^{\,i-1}(1-\eta)/(1-\eta^{n})$, where η = 0.99.
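The weighting scheme can be sketched as below, assuming the standard geometric weights of Boudoukh et al. (1998) normalized to sum to one. The VaR is read off as the first sorted return at which the cumulative weight reaches α, without the interpolation step some implementations add; that simplification and the function names are assumptions for illustration.

```python
def whs_weights(n, eta=0.99):
    """Geometric weights; index 0 belongs to the most recent observation."""
    norm = (1.0 - eta) / (1.0 - eta ** n)
    return [norm * eta ** i for i in range(n)]


def weighted_historical_simulation(returns, alpha, eta=0.99):
    """WHS forecasts of (VaR, ES); `returns` is ordered oldest to newest."""
    weights = whs_weights(len(returns), eta)
    # Pair each return with its weight (newest first), then sort by return.
    pairs = sorted(zip(reversed(returns), weights))
    cumulative, tail = 0.0, []
    for r, w in pairs:
        cumulative += w
        tail.append((r, w))
        if cumulative >= alpha:
            break
    var = tail[-1][0]
    # Weighted average of the returns at or below VaR.
    es = sum(r * w for r, w in tail) / sum(w for _, w in tail)
    return var, es


# Toy usage: 99 calm days followed by one large loss on the latest day.
toy = [0.01] * 99 + [-0.10]
whs_var, whs_es = weighted_historical_simulation(toy, alpha=0.01)
print(whs_var, whs_es)
```

Because the latest observation carries the largest weight, a single recent loss is enough to move the 1% forecasts in this toy case, whereas equally weighted HS would react more slowly.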

| Cornish-Fisher (CF) approximation
Another nonparametric method for VaR and ES estimation is the CF approximation method, which extends the normal quantile technique by considering the in-sample estimated skewness $\hat{\delta}_1$ and kurtosis $\hat{\delta}_2$. Here, $\Phi_{\alpha}^{-1}$ denotes the inverse of the Gaussian cumulative distribution function, and $\hat{\mu}_t$ and $\hat{\sigma}_t$ denote the in-sample mean and standard deviation, respectively. Boudt et al. (2008) develop the CF approximation technique to estimate ES, where $\phi(\cdot)$ denotes the Gaussian probability density function.
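For orientation, the textbook fourth-moment CF quantile expansion is sketched below. It assumes that $\hat{\delta}_2$ is measured as excess kurtosis, and the symbols $z_\alpha$ and $z_\alpha^{CF}$ are notation introduced here for illustration only. The ES counterpart of Boudt et al. (2008) is considerably longer and is not reproduced.

```latex
\begin{aligned}
z_\alpha &= \Phi^{-1}(\alpha), \\
z_\alpha^{CF} &= z_\alpha
  + \frac{z_\alpha^{2} - 1}{6}\,\hat{\delta}_1
  + \frac{z_\alpha^{3} - 3 z_\alpha}{24}\,\hat{\delta}_2
  - \frac{2 z_\alpha^{3} - 5 z_\alpha}{36}\,\hat{\delta}_1^{2}, \\
\mathrm{VaR}_{t+1} &= \hat{\mu}_t + \hat{\sigma}_t\, z_\alpha^{CF}.
\end{aligned}
```

With zero skewness and zero excess kurtosis the correction terms vanish and the expression reduces to the Gaussian quantile.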

| Parametric methods
The generalized autoregressive conditional heteroskedasticity (GARCH) framework is proposed by Bollerslev (1986) to model conditional volatility with a specified distribution of innovations. In this paper, we consider the GARCH(1,1) model with the Student's t (GARCH-t) and skewed t (GARCH-skt) distributional innovations as our candidates. They are more accurate in terms of describing heavy tails and negative skewness (Patton et al., 2019). The (VaR, ES) forecasts are obtained from the fitted GARCH-t specification, where $\nu$ denotes the estimated degree of freedom (DoF) parameter of the t distribution, and $t(\cdot)$ is the Student's t distribution probability density function.

3.2.2 | GARCH(1,1)-skewed t model (G-skt)
Next, we suppose that the innovations follow a skewed-t distribution and forecast the risk measures accordingly, where $f_{skt}(\cdot)$ denotes the skewed-t distribution probability density function.

| Extreme value theory with peaks-over-threshold (EVT-POT)
The EVT provides a tool for estimating the tails based on available observations in the left tail of the distribution. McNeil and Frey (2000) propose a semiparametric model applying EVT to describe the tail of the conditional distribution, which is further developed by Samuel (2008). Following this, we use the POT method to consider the exceedances of past observations over a typically high threshold, where a generalized Pareto distribution (GPD) is employed to fit negative returns over this specified threshold. The quantile $F(\alpha)$ is then estimated with both the scale parameter $\hat{s}$ and the shape parameter $\hat{\xi}$ obtained from fitting the GPD. The term $n_u$ denotes the number of observations exceeding the selected threshold $u$. Consequently, the predicted ES can be calculated from these estimates.

| One-factor GAS model (GAS-1F)
Patton et al. (2019) propose a group of dynamic semiparametric models for VaR and ES forecasting. We select the one-factor generalized autoregressive score model (GAS-1F), where we use the scaled score of the FZ0 loss function to drive the time variation in the target parameter. In the GAS-1F model, $s_t$ denotes the score variable, and the Hessian factor $H_t$ is set to one in our paper for simplicity. To estimate parameters in the framework, we minimize the proposed FZ0 loss function.

| Filtered historical simulations (FHS)
To estimate VaR and ES in a GARCH(1,1) framework without assuming the underlying conditional distribution, Barone-Adesi et al. (1999) propose a semiparametric model called FHS. In this framework, we randomly draw the historical standardized innovations (with replacement) $B$ times to form a bootstrapped set. The bootstrapped returns are then computed by rescaling the drawn innovations with $\hat{\sigma}_t$, which is estimated by the GARCH(1,1) model. Next, we apply the HS method to the bootstrapped returns to obtain the estimated VaR and ES at the α significance level.

| CAViaR model with symmetric absolute value (CAViaR-SAV)
Taylor (2019) extends the conditional autoregressive value-at-risk (CAViaR) models introduced by Engle and Manganelli (2004) to ES estimation. In this study, we select CAViaR-SAV, which considers the symmetric absolute values of historical observations. Here, the parameters of this model are estimated by maximizing the sum of the logarithm of the asymmetric Laplace likelihood function proposed by Koenker and Machado (1999), instead of minimizing the loss function.

3.3.5 | CARE model with symmetric absolute value (CARE-SAV)
Newey and Powell (1987) define the expectile of a distribution as the tail expectation if values above it were more likely to occur than they actually are. Efron (1991) estimates VaR at the α level by mapping it to the expectile at the τ level, denoted by $q(\tau)$. Taylor (2008) proposes a set of conditional autoregressive expectile (CARE) models to forecast expectiles and ES. We consider the CARE-SAV, whose parameters are estimated by minimizing the loss function proposed by Newey and Powell (1987).

We calculate daily excess returns of long positions based on the assumption of fully collateralized futures positions (e.g., Bakshi et al., 2019; Boons & Prado, 2019; Gorton et al., 2013; Koijen et al., 2018). In this calculation, $P_t^1$ is the time-$t$ closing price of the 1st front-month contract (the contract with the shortest maturity at time $t$ among all available contracts). Focusing on the 1st front-month contract ensures sufficient liquidity. The front-month contract is rolled on the 7th day of the month or the next closest business day if the 7th day of the month is not a business day (Kang et al., 2020).
In Table 1, we present the summary statistics of their daily excess returns, open interest, and trading volume.Over the time period studied, the means of all return series are close to zero, and the NG market is found to be the most volatile one.Non-zero skewness and high kurtosis reflect that all return series (especially of WTI CL and HO) are slightly skewed and display leptokurtic distributions.This indicates that the return distributions of energy commodities are not normally distributed and often have fat tails.It is also notable that WTI CL, as an important component of the downstream refined oil products and highly financialized commodity, has the highest open interest and daily trading volume in the energy sector.

| EMPIRICAL INVESTIGATION
Considering the widely used risk models in the current literature, we employ the proposed methodology to improve VaR and ES forecasts for the energy commodities. We evaluate 1-day-ahead VaR and ES forecasts for the four commodity futures returns and for the following three significance levels: 1%, 2.5%, and 5%, of most interest to financial regulators and institutions (Basel Committee on Banking Supervision, 2019). One-day-ahead VaR and ES forecasts are made with parameter values estimated within a fixed window of 1000 observations starting from April 4, 1990, for each model (except for nonparametric ones) and each probability level. Then, the adjustment parameters are estimated using a rolling window of 2000 returns and risk forecasts, which moves forward by 1 day at a time. The out-of-sample period for each asset contains 4971 days, between March 27, 2002, and December 31, 2021.

FIGURE 1 Daily energy commodity futures excess returns.

8 The sample period of the COVID-19 crisis starts on 2020-01-23, the day of the Wuhan lockdown in China, marking the beginning of the pandemic. We use 2021-02-17 as the end of the COVID-19 crisis, which is the first day when the CL price was higher than its closing price in 2019, following the impact of COVID-19.

9 The fixed window scheme has been widely accepted for parameter estimation of parametric and semiparametric models among practitioners but has been criticized for its lack of model flexibility in adapting to market conditions. To illustrate the effectiveness of our methodology, risk forecasting models considered in this paper are mainly estimated using this fixed window strategy.

| Model disagreement of risk forecasts
Different characteristics of risk models lead to disagreement among their risk forecasts. One of the benefits of our risk adjustment method is to reduce these disagreements among diverse risk models; this method is not restricted to any model or model set used. To investigate the disagreement level of pre- and post-adjusted risk forecasts, we follow Danielsson et al. (2016) to calculate the ratio of the maximum to the minimum tail risk forecasts, referred to as the risk ratio, by utilizing all candidate models. At time $t+1$, the risk ratio for one asset is defined as

$$\text{Risk ratio}_{t+1} = \frac{\max_{n \in \{1,\dots,N\}} \left|\mathrm{Risk}^{(n)}_{t+1}\right|}{\min_{n \in \{1,\dots,N\}} \left|\mathrm{Risk}^{(n)}_{t+1}\right|},$$

where $\mathrm{Risk}^{(n)}_{t+1}$ is the tail risk forecast (VaR or ES) of model $n$ and $N$ is the number of candidate models, which is equal to 10 in this paper. By definition, the risk ratio is at least one. A risk ratio approaching one indicates less model disagreement.
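The risk ratio is simple to compute once the forecasts of all models for a given day are collected. The sketch below assumes forecasts are stored as negative numbers, as in this paper, and therefore compares magnitudes; the function names and the toy numbers are hypothetical.

```python
def risk_ratio(forecasts):
    """Ratio of the largest to the smallest forecast magnitude across models."""
    magnitudes = [abs(f) for f in forecasts]
    return max(magnitudes) / min(magnitudes)


def average_risk_ratio(forecasts_by_day):
    """Average of the daily risk ratios over a period (one list per day)."""
    ratios = [risk_ratio(day) for day in forecasts_by_day]
    return sum(ratios) / len(ratios)


# Toy usage: three models, two days of 1% VaR forecasts.
day_1 = [-2.0, -3.0, -4.0]   # ratio 4 / 2 = 2.0
day_2 = [-2.0, -2.5, -3.0]   # ratio 3 / 2 = 1.5
mean_ratio = average_risk_ratio([day_1, day_2])
print(risk_ratio(day_1), risk_ratio(day_2), mean_ratio)
```

Dropping one model from each daily list and recomputing the average gives the leave-one-model-out sensitivity reported in the tables.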
Panel A in Table 2 displays average risk ratios from pre-adjusted 1% daily VaR forecasts by using 10 candidate models and by excluding one particular model over the full out-of-sample period and during seven energy crisis periods. Overall, risk model disagreement for oil-related products, including WTI CL, HO, and RBOB/Unleaded Gasoline, differs from that for NG. First, when all candidate models are considered, a larger disagreement between risk forecasts can be observed in oil-related products (risk ratio is more than 2.28) relative to NG products (risk ratio is 1.79). Second, for oil-related products, the risk ratio is relatively low when the CARE-SAV model is excluded, but the lowest risk ratio is observed after excluding the EVT-POT model in the NG market. It is interesting to note that the CARE-SAV (EVT-POT) model produces the highest tail risk in the oil products (NG) market, since a selection of threshold is required when implementing these procedures. In a crisis period, a biased selection of threshold may result in significant estimation errors. Last, during the 2008 global financial crisis and the 2020 global COVID-19 crisis, risk disagreement for oil-related products is higher than for NG, especially for WTI CL. This is because CL is a highly financialized energy commodity and can be significantly influenced by shocks.
Market conditions potentially have impacts on risk ratios, for example, a change in political environment (e.g., wars), supply/demand disruptions (e.g., hurricanes), and so on. Inspired by the key geopolitical and economic events that affect oil prices, we consider eight events as energy crises, covering wars, natural disasters, a financial crisis, OPEC supply disruptions, and the COVID-19 global pandemic. The mean of risk ratios during the crisis period in Panel A in Table 2 shows that risk disagreement is relatively high during the crisis period. For oil-related products, risk disagreement is extremely high during the 2003 Iraq War, the 2008 Global Financial Crisis, the 2008 OPEC cuts in production, the 2014-2016 oil price collapse, and the 2020 COVID-19 crisis. However, a high risk ratio for NG is observed during Hurricane Ivan and Hurricane Dennis (in 2004 and 2005, respectively), which resulted in a reduction in the production of NG.
Panel B in Table 2 shows risk ratios obtained from post-adjusted VaR forecasts. First, compared to Panel A in Table 2, the risk ratios of VaR forecasts calculated from all candidate models drop after the adjustment method is applied to the 1% daily VaR forecasts over the out-of-sample period. For example, the risk ratio decreases from 3.06 and 1.79 to 2.54 and 1.56 in the WTI CL market and NG market, respectively. Second, the decrease in risk disagreement after adjusting VaR forecasts can also be observed in crisis periods. In the 2008 financial crisis period, VaR forecasts from different risk models are highly divergent in the WTI CL market (average risk ratio in this period is 3.08). This disagreement reduces to 2.24 after applying the risk adjustment method. Last, after the VaR adjustment, the risk ratios are more stable even if one risk model is excluded from the candidate models. This means that the risk adjustment method leads to the VaR level becoming less sensitive to one particular model.
Panels A and B in Table 3 show the risk ratios calculated from pre- and post-adjusted ES, respectively. Consistent with the VaR disagreement, the ES disagreement also lessens after applying the adjustment method in the full sample and most of the crisis periods. Thus, the risk adjustment method helps reduce not only the VaR disagreement but also the ES disagreement.
In addition, the change in margin requirements often happens in the energy commodity futures market, as regulators intend to adjust the margin level to stabilize the market in the face of highly fluctuating commodity futures prices. For example, CME Clearing uses a variety of VaR-based models to determine their benchmark margin levels. As an additional exercise, we would like to explore the efficiency of this adjustment methodology around the margin level change date that characterizes the market state as highly volatile (Hedegaard, 2014; Park & Abruzzo, 2016).

TABLE 2 Risk ratio sensitivity: 1% VaR.
Figure 4 shows the maintenance margin of the front-month contract for WTI CL, NG, HO, and RBOB/Unleaded Gasoline between 2009 and 2021. Large margin variation could be observed over time for all four energy commodities. Oil products share common factors that have impacts on their supply/demand side, which results in high risk in the market. Regulators could change the margin level across oil products to stabilize the market. Thus, the changes in the margin level for WTI CL, HO, and RBOB/Unleaded Gasoline have a similar trend. As WTI CL is the most traded product, its margin level changes more frequently than that of the other oil products.
When market uncertainty, particularly the level of extreme losses, is high, regulators can raise the margin levels to calm down the market. In Figures 2 and 3, we plot the range of pre- and post-adjusted 2.5% daily VaR and ES forecasts calculated from 10 candidate models around the days when maintenance margins are increased for WTI CL, NG, HO, and RBOB/Unleaded Gasoline. First, before CME Group raises the maintenance margins, the median of both pre- and post-adjusted 2.5% daily VaR forecasts falls significantly. This implies that CME Group steps into the market with a margin increase policy only when it observes the tail risk measurements reaching an extremely high level. After the increase in margin levels, the downward trend in both pre- and post-adjusted VaR and ES stops; thus, a margin increase policy is an effective way to reduce tail risk and calm down the market. Hence, the risk model adjustment methodology could successfully capture the trend of tail risk movements. Second, the range of post-adjusted VaR (ES) forecasts around the time of a margin level increase is, in general, significantly narrower than that of the pre-adjusted VaR (ES) forecasts. This suggests that applying our adjustment methodology to the individual risk models is effective at reducing tail risk disagreements between different models, even in extremely turbulent periods.

FIGURE 2 The range of average pre- and post-adjusted 2.5% VaR 7 days before and after the increase in margin requirements for CL, HO, NG, and XB. This boxplot represents the range of average VaR forecasts from 10 risk models around the days of margin requirements increase. We first take the average of VaR forecasts from each risk model on (and specific days before and after) the event day. Then we plot the range of these average VaR forecasts across 10 different models. The black box is for pre-adjusted 2.5% VaR, and the red box is for post-adjusted 2.5% VaR. The dot in the middle of each box is the median of the sample. The bottom and top of each box are the 25th and 75th percentiles of the sample. On the x-axis, zero indicates the day that margin requirements increase, and a negative (positive) number means the number of days before (after) the day of that margin increase.

13 CME Group usually sets the maintenance margin at first and then sets initial margins by adjusting the maintenance margin with a specific ratio, which almost always remains the same for a contract.

| VaR and ES backtesting
To justify the efficiency of adjustments made to risk estimates, it is necessary to investigate the forecasting ability of adjusted VaR and ES forecasts. At first glance, Table 4 compares the actual VaR exceedance rates before (resp. after) applying optimized adjustment multipliers to improve on the out-of-sample raw risk estimates at α = 1%, 2.5%, and 5% with the expected VaR exceedance rates (i.e., three α levels), for various risk models and several energy futures. This highlights that the improved VaR forecasts can achieve a desirable level of actual VaR exceedances, which generally matches the expected level. Tables 5 and 6 compare the average FZ0 loss of pre- and post-adjusted risk forecasts estimated at 1%, 2.5%, and 5% for 10 candidate models and four energy futures over the full out-of-sample period and during energy crisis periods, respectively. The lower the average loss, the better the forecasting performance of the pair of risk forecasts. The results show that our adjustment method indeed improves the forecasting ability of raw risk estimates.

FIGURE 3 The range of pre- and post-adjusted 2.5% ES 7 days before and after the increase in margin requirements for CL, HO, NG, and XB. This boxplot represents the range of average ES forecasts from 10 risk models. We first take the average of ES forecasts from each risk model on (and specific days before and after) the event day. Then we plot the range of these average ES forecasts across 10 different models. The black box is for pre-adjusted 2.5% ES, and the red box is for post-adjusted 2.5% ES. The dot in the middle of each box is the median of the sample. The bottom and top of each box are the 25th and 75th percentiles of the sample. On the x-axis, zero indicates the day that margin requirements increase, and a negative (positive) number means the number of days before (after) the day of that margin increase.
In the following, we adopt formal statistical backtesting procedures, including traditional tests and comparative tests. The traditional backtesting methods are designed to test whether the forecasting model is correctly specified and provides appropriate forecasts by comparing realized returns with the VaR and ES forecasts, whereas the comparative backtesting methods focus on making model comparisons. By examining the backtesting performance of pre- and post-adjusted risk forecasts via these two types of backtesting methodologies in an exhaustive manner, we check whether the risk model performance is improved after applying the proposed adjustment method.
Regarding the traditional backtesting, we organize well-known and recently developed tests into two broad categories: (1) VaR backtests and (2) ES backtests.¹⁶ We backtest VaR individually via the unconditional coverage (UC) test of Kupiec (1995), the conditional coverage (CC) test of Christoffersen (1998), and the dynamic quantile (DQ) test of Engle and Manganelli (2004). For the evaluation of the VaR forecasting accuracy, the UC test simply considers the exceedance frequency and is deficient in detecting clustered exceedances, a shortcoming that the CC and DQ tests address. Both the CC and DQ tests are designed to jointly test for the frequency and independence of exceedances.
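As an illustration of the simplest of these tests, the following sketch computes the UC likelihood ratio statistic of Kupiec (1995) from a series of exceedance indicators. The function name and input layout are assumptions for illustration, and the p-value uses the closed-form survival function of a chi-square distribution with one degree of freedom. The paper's own implementation is described in its Appendix A and may differ in detail.

```python
import math


def kupiec_uc(hits, alpha):
    """Unconditional coverage LR test.

    hits  : sequence of 0/1 indicators (1 = return fell below the VaR forecast)
    alpha : nominal VaR level, e.g. 0.01
    Returns (LR statistic, p-value); LR is referred to a chi-square(1) law.
    """
    n = len(hits)
    x = sum(hits)
    pi_hat = x / n  # observed exceedance rate

    def loglik(p):
        # Bernoulli log-likelihood; terms with a zero count are skipped
        ll = 0.0
        if x > 0:
            ll += x * math.log(p)
        if n - x > 0:
            ll += (n - x) * math.log(1.0 - p)
        return ll

    lr = max(-2.0 * (loglik(alpha) - loglik(pi_hat)), 0.0)
    # For chi-square with 1 degree of freedom: P(X > lr) = erfc(sqrt(lr / 2))
    p_value = math.erfc(math.sqrt(lr / 2.0))
    return lr, p_value
```

Because the statistic depends only on the number of exceedances, two hit series with the same count but very different clustering receive the same p-value, which is the deficiency that the CC and DQ tests are meant to cover.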
Compared with VaR backtesting, ES backtesting is less straightforward, as it often needs more information than the ES itself (such as the VaR estimates at multiple levels required in Emmer et al., 2015). To backtest ES, we employ the exceedance residual (ER) test of McNeil and Frey (2000), the conditional calibration (CCA) test based on moment conditions (Nolde & Ziegel, 2017), and several specifications of regression-based expected shortfall (ESR) tests, including the strict ESR (ESR Strict) test, the auxiliary ESR (ESR AUX) test, and the intercept ESR (ESR INT) test, introduced by Bayer and Dimitriadis (2022).

In the format of color maps, Figures 5, C1, and C2 display the p-values of backtests for VaR and ES forecasts at the 1%, 2.5%, and 5% levels, respectively, over the out-of-sample period (27-Mar-2002 to 31-Dec-2021) for various models and four futures, before (left column) and after (right column) adjustments are made based on our proposed methodology. For brevity and clarity, the backtests are labeled 1-11, representing the following tests: UC, CC, DQ, two-sided ER, one-sided ER, two-sided CCA, one-sided CCA, ESR Strict, ESR AUX, two-sided ESR INT, and one-sided ESR INT, respectively; the risk models are numbered 1-10, denoting HS, WHS, CF, G-t, G-skt, EVT-POT, GAS-1F, FHS, CAViaR-SAV, and CARE-SAV, respectively. If the p-value is smaller than 0.05, the cell is colored red; if it falls between 0.05 and 0.1, the cell is yellow; otherwise, it is green. The red and yellow colors suggest that the model fails the test at the 5% and 10% significance levels, respectively. Across all the aforementioned backtests, the forecasting performance of the risk models considered in this paper improves after our adjustment methodology is applied to the raw risk forecasts. Notably, the UC test (backtest 1), which concerns the frequency of VaR exceedances, and the ER test (backtests 4-5), which focuses on the magnitude of ES forecasts, benefit most from the adjustments made to VaR and ES forecasts. The UC backtesting results indicate that the actual VaR exceedance rate becomes closer to the nominal α level after making the optimized adjustments. The ER backtesting results show that the magnitude of ES forecasts becomes reasonable (neither overestimating nor underestimating risks), which highlights that our proposed adjustment methodology helps market risk measures capture the appropriate size of tail losses. As expected, in terms of the CC and DQ tests (backtests 2-3), the backtesting performance of risk models is only slightly affected by our adjustment methodology, which is limited in removing the endogenous issue of volatility clustering. Additionally, we find that G-t, G-skt, EVT-POT, GAS-1F, CAViaR-SAV, and CARE-SAV are unable to produce adequate risk forecasts at different levels for the various energy futures considered, as shown by the high rejection rates of backtests. After making the refinements to these models based on our methodology, the rejection rates decline sharply.

As seen from Figure 1, which shows the dynamics of daily returns of futures contracts, the market regimes clearly change during crisis periods, except for NG futures, which remain in a consistently high-volatility regime. To further test the efficiency of our adjustment methodology, we present in similar fashion the backtesting results of risk models over the global financial crisis period, ranging from 01-Dec-2007 to 30-Jun-2009, in Figures 6, C3, and C4. As expected, the simple models, HS and CF, are less responsive to large market fluctuations and fail most of the backtests, whereas the more complex models such as G-t, G-skt, CAViaR-SAV, and CARE-SAV perform better over the crisis period.

Note: This table presents the actual VaR exceedance rates of pre- and post-adjusted risk forecasts based on the FZ0 loss function across four energy futures and three significance levels. Results based on pre- and post-adjusted VaR forecasts are labeled as columns "before" and "after," respectively.

Note: Each panel presents average FZ0 loss values of 10 candidate models before and after optimized adjustments are made to VaR and ES estimated at 1%, 2.5%, and 5%, for a given asset. Results based on pre- and post-adjusted VaR and ES forecasts are labeled as columns "before" and "after," respectively. The average loss values for which the post-adjusted forecasts outperform the pre-adjusted forecasts are highlighted in bold. # imp. denotes the number of models that experience improvements after adjustment.
Generally, the risk models can generate better risk forecasts after the adjustment methodology is applied, as evidenced by the better backtesting performance in Figures 6, C3, and C4. This adjustment methodology not only improves on ex ante risk forecasts but also gives an indication of time-varying over- or underestimation of risks. Figure 7 presents a series of quantiles of the optimal adjustments obtained in Equation (5), made to VaR (in the left column) and ES (in the right column) estimated at α = 2.5% for several energy futures contracts and across various models, in which each line represents a quantile ranging from 5% to 95% with an increment of 10%. An optimal adjustment equal to 1 indicates that no correction is made to the VaR and ES forecasts. As shown in this figure, the underestimation or overestimation of risks is noticeable, thus calling for appropriate adjustments. Specifically, if the value of the adjustment is above (below) 1, the current risk forecast is underestimated (overestimated), and the corresponding capital buffer should be increased (decreased). Among the model set considered, the G-t applied to various energy futures tends to overestimate risks, whereas the HS, WHS, FHS, and GAS-1F methods tend to underestimate risks.

Note: Each panel presents average FZ0 loss values of 10 candidate models before and after optimized adjustments are made to VaR and ES estimated at 1%, 2.5%, and 5%, for a given asset. Results based on pre- and post-adjusted VaR and ES forecasts are labeled as columns "before" and "after," respectively. The average loss values for which the post-adjusted forecasts outperform the pre-adjusted forecasts are highlighted in bold. # imp. denotes the number of models that experience improvements after adjustment.
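To make the reading of the adjustment values concrete, the sketch below searches a grid of multiplicative adjustments for VaR and ES and keeps the pair with the lowest average FZ0 loss over a window of past observations. This is only an illustration of the idea of loss-minimizing multipliers: Equation (5) is not reproduced in this section, so the grid search, the use of separate multipliers for VaR and ES, the FZ0 form, and all names are assumptions rather than the paper's procedure.

```python
import math


def fz0_loss(y, var, es, alpha):
    """FZ0 loss for one day, assuming es <= var < 0 (return-based signs)."""
    hit = 1.0 if y <= var else 0.0
    return -hit * (var - y) / (alpha * es) + var / es + math.log(-es) - 1.0


def best_multipliers(returns, var_fc, es_fc, alpha, grid):
    """Grid search for (m_var, m_es) minimizing the average FZ0 loss.

    A multiplier above 1 scales the forecast away from zero, i.e. the raw
    forecast understated risk; a multiplier below 1 means it overstated risk.
    """
    best, best_loss = (1.0, 1.0), math.inf
    for m_var in grid:
        for m_es in grid:
            adjusted = [(m_var * v, m_es * e) for v, e in zip(var_fc, es_fc)]
            # Skip pairs that violate the ordering ES <= VaR < 0
            if any(not (e <= v < 0) for v, e in adjusted):
                continue
            total = sum(fz0_loss(y, v, e, alpha)
                        for y, (v, e) in zip(returns, adjusted))
            loss = total / len(returns)
            if loss < best_loss:
                best, best_loss = (m_var, m_es), loss
    return best, best_loss
```

On a toy window in which losses repeatedly exceed a flat VaR forecast, the selected multipliers come out above 1, which matches the interpretation in the text that the raw forecast is underestimated and the capital buffer should be increased.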
Complementary to traditional backtesting for model performance, we add comparative tests based on the FZ0 losses, namely, Diebold-Mariano (DM) (Diebold & Mariano, 2002) and model confidence set (MCS) tests (Hansen et al., 2011), to make model comparisons to check whether our adjustment methodology improves on existing risk forecasts.
In the DM test, a negative t-statistic indicates that the row forecast outperforms the column forecast, that is, it has a lower average loss. An absolute t-statistic greater than 2.575 (1.96 or 1.64) indicates that the average loss difference is significantly different from zero at the 1% (5% or 10%) significance level. Figures 8 and 9 present the results under the null hypothesis that the row forecast and the column forecast have equal average losses for 1% VaR and ES over the full sample and the crisis period, respectively. Blue blocks mean that the row forecast has a lower average loss than the column forecast at different significance levels (the darkest shade means that we reject the null hypothesis at the 1% significance level, and so on). White blocks mean that there is no significant difference between the row forecast and the column forecast. Red blocks mean that the row forecast has a higher average loss than the column forecast (the darkest shade means that we reject the null hypothesis at the 1% significance level, and so on). To evaluate the efficiency of the adjusted forecasts compared with the original forecasts, we focus only on the blocks on the diagonal of the color maps. Diagonal blocks shaded blue indicate that our methodology significantly improves the forecasts; otherwise, there is no significant improvement after adjustment, or the adjusted forecast is even worse. In Figures 8 and 9, most of the diagonal blocks are shaded blue, and red blocks on the diagonals are rare. Overall, the DM test indicates that our methodology significantly improves the 1% VaR and ES forecasts.¹⁷

Note: The number of commodity futures for which each method is within the model confidence set at the 75% and 95% confidence levels based on the FZ0 loss function. Results based on pre- and post-adjusted VaR and ES forecasts are labeled as columns "before" and "after," respectively.
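The decision rule behind each block of the DM color maps can be sketched as follows. The example computes a t-statistic on the daily loss differences between a row forecast and a column forecast and maps it to the significance thresholds quoted in the text. It uses the plain sample variance of the differences, which is a simplification: applications of the DM test usually rely on an autocorrelation-robust variance, and the variance estimator used in the paper is not stated in this section. All names are hypothetical.

```python
import math


def dm_tstat(loss_row, loss_col):
    """t-statistic for the mean loss difference (row minus column).

    Simplified: uses the ordinary sample variance, with no HAC correction.
    Assumes the loss differences are not all identical (non-zero variance).
    """
    d = [a - b for a, b in zip(loss_row, loss_col)]
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((x - mean_d) ** 2 for x in d) / (n - 1)
    return mean_d / math.sqrt(var_d / n)


def block_label(t):
    """Map a t-statistic to the shading logic described for the color maps."""
    size = abs(t)
    if size > 2.575:
        level = "1%"
    elif size > 1.96:
        level = "5%"
    elif size > 1.64:
        level = "10%"
    else:
        return "white (no significant difference)"
    colour = "blue (row better)" if t < 0 else "red (row worse)"
    return f"{colour} at {level}"
```

For a diagonal block, `loss_row` would hold the daily FZ0 losses of an adjusted forecast and `loss_col` those of the corresponding original forecast, so a blue label corresponds to a significant improvement from the adjustment.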
Alternatively, we exploit the MCS test to compare the forecasts based on the losses generated from the FZ0 loss function. In this paper, we adopt two methods: (1) the R method, which uses sums of absolute values to calculate the MCS test statistics, and (2) the SQ method, which uses summed squares.¹⁸ Table 7 shows the backtesting results via the MCS test, in which the block bootstrap is used with a block length of 12 and 10,000 replications over the financial crisis period at the 75% and 95% confidence levels. In general, the total number of post-adjustment models included in the best model set is larger than that of pre-adjustment models, especially when we apply the SQ method. Specifically, after adjustment, the EVT-POT, GAS-1F, and CAViaR-SAV models are contained in the best model set more often than before the adjustment.

| CONCLUSION
To facilitate efficient financial risk management, this paper develops a generic adjustment methodology to improve on VaR and ES forecasts via the minimization of the average loss of the FZ loss function. This methodology is advantageous in explicitly indicating the degree of under- and overestimation of tail risks, a topic of concern to regulators and investors, and it is applicable to any risk model that produces VaR and ES forecasts. Moreover, this adjustment methodology is built on the objective of minimizing a loss function, and as a result, we expect the risk disagreement among post-adjusted risk forecasts to be lessened.
In the empirical analysis, we apply the proposed methodology to a battery of risk forecasting models for several futures contracts in the energy commodity market, in which tail risk management plays a significant role. After making adjustments to the VaR and ES forecasts built using the risk models considered for four energy futures, the risk ratios decline over the full out-of-sample period and during seven energy crisis periods, indicating the abatement of risk disagreement within the model set. In addition, around the dates of margin level changes, which signal high volatility faced by market participants, the shrinking range of post-adjusted VaR and ES forecasts given by the various models shows the reduction in model disagreement. Taken together, this adjustment methodology helps alleviate the model disagreement among diverse risk models. Further, the forecasting accuracy of VaR and ES estimates is generally improved, as verified via various backtests for VaR and ES. Notably, the UC test with respect to the frequency of VaR exceedances and the ER test focused on the magnitude of ES forecasts benefit most from adjustments made to VaR and ES forecasts.

14 See https://www.cmegroup.com/clearing/risk-management/historical-margins.html for more details.
15 We use the 2.5% significance level since the Basel III regulations propose a shift from a 1% VaR framework to a 2.5% ES.
16 For more details of VaR and ES backtesting methods, see Appendices A and B, respectively.
The ESR tests include the strict ESR (ESR Strict) test, the auxiliary ESR (ESR AUX) test, and the intercept ESR (ESR INT) test, introduced by Bayer and Dimitriadis (2022).
17 The backtesting results for VaR and ES at other significance levels are presented in Figures C5, C6, C7, and C8 in Appendix C. The results are consistent.

Our analysis focuses on four commodities in the energy sector, namely WTI crude oil (CL), heating oil (HO), natural gas (NG), and RBOB/unleaded gasoline (XB), which are listed on the Chicago Mercantile Exchange (CME). We obtain daily prices for each commodity's futures contracts and the corresponding open interest and volume data from Bloomberg. The sample period used in this paper is between April 4, 1990, and December 31, 2021 (7971 days), since NG futures have a relatively short history, which is available starting from April 4, 1990. The XB futures contracts were replaced by the RBOB Gasoline futures contracts in 2006. Thus, we use XB before 2006 and RBOB Gasoline after 2006 to represent RBOB/XB.

TABLE 1 Summary statistics of energy commodity futures excess returns and their corresponding open interests and trading volumes.

TABLE 4 Actual VaR exceedance rates of pre- and post-adjusted risk forecasts over the full sample.

TABLE 5 Average loss values of pre- and post-adjusted risk forecasts over the full sample.

TABLE 6 Average loss values of pre- and post-adjusted risk forecasts over crisis periods.

TABLE 7 The Model Confidence Set (MCS) test with the R and SQ methods over crisis periods. Panel A: The 95% model confidence set (MCS) test; summed absolute values (R method).

FIGURE 4 Historical margin requirements from Chicago Mercantile Exchange (CME) Group for first front contracts of CL, HO, NG, and XB between January 2, 2009, and December 31, 2021.

FIGURE 5 1% VaR and ES backtesting results with respect to p-values before and after adjustments, over the full period. A p-value smaller than 0.05 (between 0.05 and 0.1) is shaded red (yellow); otherwise, it is green. Models 1-10 are HS, WHS, CF, G-t, G-skt, EVT-POT, GAS-1F, FHS, CAViaR-SAV, and CARE-SAV, respectively. Backtests 1-11 are UC, CC, DQ, two-sided ER, one-sided ER, two-sided CCA, one-sided CCA, ESR Strict, ESR AUX, two-sided ESR INT, and one-sided ESR INT.

FIGURE 6 1% VaR and ES backtesting results with respect to p-values before and after adjustments, over the crisis period. A p-value smaller than 0.05 (between 0.05 and 0.1) is shaded red (yellow); otherwise, it is green. Models 1-10 are HS, WHS, CF, G-t, G-skt, EVT-POT, GAS-1F, FHS, CAViaR-SAV, and CARE-SAV, respectively. Backtests 1-11 are UC, CC, DQ, two-sided ER, one-sided ER, two-sided CCA, one-sided CCA, ESR Strict, ESR AUX, two-sided ESR INT, and one-sided ESR INT.

FIGURE 7 Adjustments made to risk forecasts, based on daily data of energy futures contracts. CL, HO, NG, and XB are abbreviations for WTI Crude Oil, Heating Oil, Natural Gas, and RBOB/Unleaded Gasoline, respectively. Each line represents a quantile, from 5% to 95% with an increment of 10%, of adjustments made to 2.5% VaR (in the left column) and ES (in the right column) forecasts across the various risk models considered.

FIGURE 8 Color map based on the Diebold-Mariano (DM) test comparing the average losses using the FZ0 loss function for 1% VaR and ES over the full sample. Forecasts marked with 1* to 10* are the adjusted risk measure forecasts based on the original forecasts marked with 1-10.

FIGURE 9 Color map based on the Diebold-Mariano (DM) test comparing the average losses using the FZ0 loss function for 1% VaR and ES over the crisis sample. Forecasts marked with 1* to 10* are the adjusted risk measure forecasts based on the original forecasts marked with 1-10.

APPENDIX C: ADDITIONAL RESULTS
See Tables C1, C2 and Figures C1-C8.

TABLE C1 Risk ratio sensitivity over full sample-VaR.
Note: The mean of the ratio of the highest to the lowest daily pre- and post-adjusted VaR forecasts (risk ratio) across four commodity futures and three significance levels. Each panel presents average risk ratios of 10 candidate models before and after optimized adjustments are made to VaR estimated at 1%, 2.5%, and 5%, for a given asset, including WTI Crude Oil (CL, Panel A), Heating Oil (HO, Panel B), Natural Gas (NG, Panel C), and Unleaded/RBOB gasoline (XB, Panel D). Results based on pre- and post-adjusted VaR forecasts are labeled as columns "before" and "after," respectively. The sample period is from March 27, 2002 to December 31, 2021. Risk ratios presented in the "None" row are calculated from daily VaR forecasts by using all 10 candidate models. The remaining rows display risk ratios when one specific model is excluded to avoid the effect of outlying forecasts. The excluded model is indicated by the row name.

TABLE C2 Risk ratio sensitivity over full sample-ES.
Note: The mean of the ratio of the highest to the lowest daily pre- and post-adjusted ES forecasts (risk ratio) across four commodity futures and three significance levels. Each panel presents average risk ratios of 10 candidate models before and after optimized adjustments are made to ES estimated at 1%, 2.5%, and 5%, for a given asset, including WTI Crude Oil (CL, Panel A), Heating Oil (HO, Panel B), Natural Gas (NG, Panel C), and Unleaded/RBOB gasoline (XB, Panel D). Results based on pre- and post-adjusted ES forecasts are labeled as columns "before" and "after," respectively. The sample period is from March 27, 2002 to December 31, 2021. Risk ratios presented in the "None" row are calculated from daily VaR forecasts by using all 10 candidate models. The remaining rows display risk ratios when one specific model is excluded to avoid the effect of outlying forecasts. The excluded model is indicated by the row name.

FIGURE C1 2.5% VaR and ES backtesting results with respect to p-values before and after adjustments, over the full period. A p-value smaller than 0.05 (between 0.05 and 0.1) is shaded red (yellow); otherwise, it is green. CL, HO, NG, and XB are abbreviations for WTI Crude Oil, Heating Oil, Natural Gas, and RBOB/Unleaded Gasoline, respectively. Models 1-10 are HS, WHS, CF, G-t, G-skt, EVT-POT, GAS-1F, FHS, CAViaR-SAV, and CARE-SAV, respectively. Backtests 1-11 are UC, CC, DQ, two-sided ER, one-sided ER, two-sided CCA, one-sided CCA, ESR Strict, ESR AUX, two-sided ESR INT, and one-sided ESR INT.

FIGURE C2 5% VaR and ES backtesting results with respect to p-values before and after adjustments, over the full period. A p-value smaller than 0.05 (between 0.05 and 0.1) is shaded red (yellow); otherwise, it is green. CL, HO, NG, and XB are abbreviations for WTI Crude Oil, Heating Oil, Natural Gas, and RBOB/Unleaded Gasoline, respectively. Models 1-10 are HS, WHS, CF, G-t, G-skt, EVT-POT, GAS-1F, FHS, CAViaR-SAV, and CARE-SAV, respectively. Backtests 1-11 are UC, CC, DQ, two-sided ER, one-sided ER, two-sided CCA, one-sided CCA, ESR Strict, ESR AUX, two-sided ESR INT, and one-sided ESR INT.

FIGURE C3 2.5% VaR and ES backtesting results with respect to p-values before and after adjustments, over the crisis period. A p-value smaller than 0.05 (between 0.05 and 0.1) is shaded red (yellow); otherwise, it is green. CL, HO, NG, and XB are abbreviations for WTI Crude Oil, Heating Oil, Natural Gas, and RBOB/Unleaded Gasoline, respectively. Models 1-10 are HS, WHS, CF, G-t, G-skt, EVT-POT, GAS-1F, FHS, CAViaR-SAV, and CARE-SAV, respectively. Backtests 1-11 are UC, CC, DQ, two-sided ER, one-sided ER, two-sided CCA, one-sided CCA, ESR Strict, ESR AUX, two-sided ESR INT, and one-sided ESR INT.

FIGURE C4 5% VaR and ES backtesting results with respect to p-values before and after adjustments, over the crisis period. A p-value smaller than 0.05 (between 0.05 and 0.1) is shaded red (yellow); otherwise, it is green. CL, HO, NG, and XB are abbreviations for WTI Crude Oil, Heating Oil, Natural Gas, and RBOB/Unleaded Gasoline, respectively. Models 1-10 are HS, WHS, CF, G-t, G-skt, EVT-POT, GAS-1F, FHS, CAViaR-SAV, and CARE-SAV, respectively. Backtests 1-11 are UC, CC, DQ, two-sided ER, one-sided ER, two-sided CCA, one-sided CCA, ESR Strict, ESR AUX, two-sided ESR INT, and one-sided ESR INT.

FIGURE C5 Color map based on the Diebold-Mariano (DM) test comparing the average losses using the FZ0 loss function for 2.5% VaR and ES over the full sample. Forecasts marked with 1* to 10* are the adjusted risk measure forecasts based on the original forecasts marked with 1-10. Blue blocks mean that the row forecast has a lower average loss than the column forecast at different significance levels (the darkest shade means that we reject the null hypothesis at the 1% significance level, and so on). White blocks mean that there is no significant difference between the row forecast and the column forecast. Red blocks mean that the row forecast has a higher average loss than the column forecast (the darkest shade means that we reject the null hypothesis at the 1% significance level, and so on). CL, HO, NG, and XB are abbreviations for WTI Crude Oil, Heating Oil, Natural Gas, and RBOB/Unleaded Gasoline, respectively. Forecasts 1-10 are generated from HS, WHS, CF, G-t, G-skt, EVT-POT, GAS-1F, FHS, CAViaR-SAV, and CARE-SAV, respectively.

FIGURE C7 Color map based on the Diebold-Mariano (DM) test comparing the average losses using the FZ0 loss function for 5% VaR and ES over the full sample. Forecasts marked with 1* to 10* are the adjusted risk measure forecasts based on the original forecasts marked with 1-10. Blue blocks mean that the row forecast has a lower average loss than the column forecast at different significance levels (the darkest shade means that we reject the null hypothesis at the 1% significance level, and so on). White blocks mean that there is no significant difference between the row forecast and the column forecast. Red blocks mean that the row forecast has a higher average loss than the column forecast (the darkest shade means that we reject the null hypothesis at the 1% significance level, and so on). CL, HO, NG, and XB are abbreviations for WTI Crude Oil, Heating Oil, Natural Gas, and RBOB/Unleaded Gasoline, respectively. Forecasts 1-10 are generated from HS, WHS, CF, G-t, G-skt, EVT-POT, GAS-1F, FHS, CAViaR-SAV, and CARE-SAV, respectively.

FIGURE C8 Color map based on the Diebold-Mariano (DM) test comparing the average losses using the FZ0 loss function for 5% VaR and ES over the crisis period. Forecasts marked with 1* to 10* are the adjusted risk measure forecasts based on the original forecasts marked with 1-10. Blue blocks mean that the row forecast has a lower average loss than the column forecast at different significance levels (the darkest shade means that we reject the null hypothesis at the 1% significance level, and so on). White blocks mean that there is no significant difference between the row forecast and the column forecast. Red blocks mean that the row forecast has a higher average loss than the column forecast (the darkest shade means that we reject the null hypothesis at the 1% significance level, and so on). CL, HO, NG, and XB are abbreviations for WTI Crude Oil, Heating Oil, Natural Gas, and RBOB/Unleaded Gasoline, respectively. Forecasts 1-10 are generated from HS, WHS, CF, G-t, G-skt, EVT-POT, GAS-1F, FHS, CAViaR-SAV, and CARE-SAV, respectively.
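The risk ratio described in the notes to Tables C1 and C2, the mean over days of the ratio of the highest to the lowest forecast across models, can be sketched as follows. The sketch assumes that forecasts are stored as negative returns and that "highest" and "lowest" refer to forecast magnitude; the data layout and function name are hypothetical. The `exclude` argument mimics the rows in which one model is left out.

```python
def mean_risk_ratio(forecasts, exclude=None):
    """Average daily ratio of the largest to the smallest forecast magnitude.

    forecasts : dict mapping a model name to its list of daily VaR or ES
                forecasts (assumed non-zero, all lists of equal length)
    exclude   : optional model name to leave out (None uses all models)
    """
    names = [m for m in forecasts if m != exclude]
    n_days = len(forecasts[names[0]])
    ratios = []
    for t in range(n_days):
        sizes = [abs(forecasts[m][t]) for m in names]
        ratios.append(max(sizes) / min(sizes))
    return sum(ratios) / n_days
```

A ratio of 1 would mean that all models agree on a given day, so a lower mean ratio after adjustment corresponds to less model disagreement. Comparing the value for `exclude=None` with the values obtained when each model is dropped in turn reproduces the logic of the sensitivity rows.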