Estimating expected shortfall using a quantile function model

Distribution of financial returns defined by the existing GARCH models usually focus on the overall features such as the location, scale, skewness and kurtosis of the distribution. When using such GARCH models for expected shortfall (ES) estimation, it is difficult to consider specific information about the tails (such as the shape of the tails of the distribution), resulting in possible bias in ES estimation. We propose a quantile function threshold GARCH model to overcome some of the limitations of existing models. The model allows us to use the information including the skewness and tail shape of the distribution and the structure changes in the volatility of financial returns to obtain ES estimates. Our results show that the proposed model outperforms the benchmark models, confirming that tail shape of the distribution also plays an important role in ES estimation.


| INTRODUCTION
There are two commonly used risk measures in finance: Value-at-Risk (VaR) and expected shortfall (ES). By definition, VaR measures the conditional quantile of the distribution of financial returns, while ES measures the conditional expectation of loss given that the loss is beyond the VaR level. Both measures have some limitations. For example, VaR ignores any loss beyond the VaR level, it is not a coherence risk measure; it may provide mislead information to rational investors who wish to maximize expected utility, and it is hard to use when investors want to optimize their portfolios, etc.; while ES needs very large history to see extreme events and it is hard to be back-tested by using actual breaches etc. See, for example, Artzner et al. (1997Artzner et al. ( ,1999, Acerbi and Tasche (2002) and Yamai and Yoshiba (2002) and references therein.
Since the work of Artzner et al. (1997Artzner et al. ( ,1999, ES has become a more and more popular financial risk measure and, like VaR, it has also been used widely nowadays in practice. In this paper, we focus on ES. Various estimation methods for conditional ES can be found in the literature. Recently, Nadarajah et al. (2014) gave an excellent review on the estimation methods for ES, in which more than 150 references were discussed.
The majority of the estimation methods for ES are parametric in the sense that they are estimated by using a parametric model that defines the conditional distribution of returns. For example, Standard GARCH (denoted by S-GARCH) models of Engle (1982) and Bollerslev (1986) are not only popular for estimating volatility but also often used for studying VaR of returns, see, for example, Poon and Granger (2003). ES can then be calculated easily from the distributions defined by the S-GARCH models.
Recently, in addition to the normal distribution, other standard distributions have also been used for the innovation term of the S-GARCH models in order to capture various important features of financial data, which include the skewed t-distribution, normal-inverse Gaussian distribution (NIG), skewed generalized error distribution (GED), generalized hyperbolic distribution (GHYP) and Johnson's SU distribution (JSU), see, for example, Socgnia and Wilcox (2014), Necula (2009), Önalan (2010), Theodossiou (2015), Corlu and Corlu (2015) and references therein. For convenience, we call the skewed t-, NIG, skewed GED, GHYP and JSU distributions as standard distributions in order to distinguish them from the distribution that we will introduce later in the paper.
S-GARCH models have some limitations. For example, they assume that only the magnitude and not the positivity or negativity of unanticipated excess returns affect the volatilities of financial returns. To overcome the limitations of S-GARCH models, alternative models such as threshold GARCH (T-GARCH) models have been developed in the literature. For example, Glosten et al. (1993) proposed a T-GARCH model, the so-called GJR-GARCH model, Zakoian (1994) also proposed a threshold GARCH model, denoted by T-GARCH, and Nelson (1991) developed the exponential GARCH model, denoted by E-GARCH. More recently, Yang and Chang (2008) considered a double-threshold GARCH model and Yu et al. (2010) extended the CAV-iaR idea (Engle and Manganelli, 2004) to T-GARCH and mixture-GARCH models in order to take into account possible nonlinearity and structural changes in the VaR process.
These T-GARCH models can also be defined by using the standard distributions. An excellent review on threshold time series models in finance was given by Chen et al. (2011).
It is worth noting that GARCH models (including S-GARCH and T-GARCH) were originally developed for the conditional volatility of financial returns, which measures the stability of financial returns with respect to the average returns. The distribution of returns defined by GARCH models often mainly focuses on the overall features such as location, scale, skewness and kurtosis of the distribution. When using GARCH models for ES estimations, specific information about the tails such as tail shape of the distribution cannot be taken into account, leading to possible bias in ES estimation. Hence, we may ask: is it safe to use such GARCH models for the estimation of conditional ES? An answer to this question is obviously very important for financial risk management, but to our best knowledge, there exists a gap in the literature because not much work can be found in the literature to address this research question.
Our main contribution to the literature is that we propose a novel quantile function model in order to address this research question. We illustrate our approach by focusing on the T-GARCH model of Yu et al. (2010) because it gives a more general formulation for T-GARCH models, it includes the S-GARCH models as a special case, and it also allows us to take account of structural changes in the volatility of financial returns. One of the advantages of the proposed model is that it allows us to obtain ES estimates by using the information not only on the overall features but also on the tail shape of the distribution, which provides a means for filling the gap in the literature.
On the other hand, for model estimations, apart from the maximum likelihood estimation (MLE) method, Bayesian approach for GARCH models has often been used recently in the literature. For example, Bauwens and Lubrano (2002) studied option pricing with asymmetric GARCH models by using a Bayesian method; Ausín and Galeano (2007) considered a Bayesian estimation of the Gaussian mixture-GARCH models; a Bayesian method for estimating volatility asymmetries with a class of tree structured multivariate GARCH models was proposed by Dellaportas and Vrontos (2007); and Jensen and Maheu (2013) proposed a Bayesian approach to semiparametric multivariate GARCH modelling. More recently, Virbickaite et al. (2015) gave an excellent review on Bayesian inference methods for GARCH models. It is worth noting that the above-mentioned Bayesian estimation methods are generally based on the likelihood functions defined by density functions. For our proposed model, two main difficulties are involved in the model estimation. One is that the delay parameter and multiple thresholds also need to be estimated, and the other is that the likelihood function of the model parameters is defined by quantile functions. It is worth noting that estimating a T-GARCH model with multiple thresholds and a delay parameter is difficult if the MLE method is used. Hence, in this paper, we use a Bayesian approach and propose a MCMC estimation method for parameter estimation in order to facilitate the use of the proposed model for ES estimation. As the likelihood function of the model parameters is defined by quantile functions, our MCMC algorithm also illustrates how a quantile function model could be estimated by using a Bayesian approach, which could be viewed as our another contribution to the literature.
We further conducted an empirical study on six financial return series in this paper, which allows us to gain some insight into the performance of our model and other models regarding the estimation of ES. In fact, our results show that the proposed model outperforms all benchmark models considered in this paper, which confirm that the information about the tail shape of the conditional distribution of returns also plays a crucial role in the estimation of ES. Hence, such information should not be ignored when estimating ES of financial returns.
The paper is organized as follows. Section 2 introduces our quantile function T-GARCH model and Section 3 discusses the estimation method. Section 4 provides some details about the benchmark models and evaluation methods for ES estimates. Results on the empirical study are given in Section 5. Finally, some concluding discussions are given in Section 6. Moreover, proofs of Propositions are given in Appendices A-D, G, and H, the prior density functions and detailed MCMC algorithm are given in Appendices 5 and 6, and some tables about the BIC values of the estimated models, the results about the coverage probabilities of the VaR estimates and the evaluation results about the ES estimates are given in Appendix I.

| QUANTILE FUNCTION T-GARCH MODEL
Consider the general -regime T-GARCH model where I(Á) is the indicator function, the regime number J and the delay parameter d are positive integers, the γ j 's are real numbers (thresholds) such that − ∞ = γ 0 < γ 1 < Á Á Á < γ J − 1 < γ J = ∞, and p j ≥ 0 and q j ≥ 0 define the order of the model. Moreover, α 0j > 0, α j ð Þ i ≥0, β ℓj ≥ 0. The ε t 's are assumed to be i.i.d. with mean 0 and variance 1. Liu et al. (1997) proved that, under some regularity conditions, there exist stationary and ergodic solutions satisfying model (1). It is worth noting that in this paper we assume the delay parameter d D = {1, …, d 0 }, where, by following Yu et al. (2010), we let d 0 = 3.
As discussed in Section 1, if the innovation term ε t follows a standard distribution, then the conditional ES estimates obtained from Model (1) may be biased. Hence, our approach here is to identify a new distribution for the innovation term so that not only the overall shape of the distribution but also specific features about the tails can be taken into account. Freimer et al. (1988) and Fournier et al. (2007) discussed the generalized Lambda distributions, denoted by GLD, that can be a good candidate for our model because GLD can accurately approximate many commonly used distributions such as normal, log-normal, Weibull, t-, F-and skewed t-distributions as well as many others. Moreover, it contains two parameters that control the skewness of the distribution and that also determine the relative weights of the tails. In other words, the skewness of the distribution is modelled as a result of tail shape and not as an independent feature of the distribution (Gilchrist, 2000). Therefore, a model defined by the GLD is more robust to model specification errors and it is certainly worth considering this distribution for the estimation of ES of financial returns.
It is worth noting that the GLD has been used in the finance literature. For example, Corrado (2001) used the GLD for option pricing, Lee (2003) showed that the GLD provides a good model for spot exchange rates, Tarsitano (2004) used the GLD to model income data, and Corlu and Corlu (2015) studied the performance of GLD in capturing the leptokurtic and skewed behaviour of exchange rate returns. To our best knowledge, the GLD has not been used to define a T-GARCH model for the estimation of ES in the finance literature.
Hence, to make use of the GLD for our model, we first let ε t = {e t − μ(θ)}/σ(θ), where e t follows the GLD with mean μ(θ) and variance σ(θ) 2 , and θ is a vector of parameters of e t . It is seen that ε t is a standardized GLD. Due to the general popularity of order (1, 1) for GARCH models in finance, we let p j = q j = 1 in the rest of this paper. Then in this case, model (1) becomes Model (2) says that the conditional distribution of y t is a scaled standardized GLD and the conditional volatility of y t is again defined by ffiffiffiffi h t p . It is worth noting that the GLD is defined by a quantile function, which can be expressed by Q gld (τ, θ) = μ gld + s gld Q(τ, η), where τ (0, 1), and θ = (μ gld , s gld , η) is the parameter vector of the GLD, in which η = (η 1 , η 2 ), μ gld is the location of the GLD and s gld is the scale of the GLD. Hence, function (3) defines the GLD with location 0 and scale 1. Moreover, η controls the skewness of the distribution: if η 1 = η 2 then the distribution is symmetric, otherwise it is skewed. Furthermore, 1/(−η 1 ) and 1/(−η 2 ) also determine the relative weights of left and right tails respectively. Hence, both skewness and tail shape of the distribution are controlled by η.
As the GLD is used for the innovation term of our model, it is reasonable to consider the GLD with location μ gld = 0 and scale s gld = 1 for e t . Hence, we let e t follows the GLD defined by Q(τ, η) given by (3). For model (2) to be well defined, we need to show that e t has finite mean and variance, which can be seen from the following result.
Proposition 1 Suppose e t follows the GLD defined by (3). Then where M is a fixed very large positive number and m 0 is a fixed very small positive number, then both μ(η) and σ(η) are finite on See Appendix A for a proof. Note that we let M = 10 20 and m 0 = 10 −20 in this paper to make the parameter space as large as possible and that B(η 1 , η 2 ) can be calculated numerically. As both μ(η) and σ(η) are finite on (2) is well defined. Furthermore, we have the following result.
See Appendix B for a proof. Hence, it follows from Proposition 2 that Model (5) can be expressed by which is well defined on the parameter space (6) says that conditional distribution of y t is a scaled standardized GLD with mean 0 and variance h t , and the skewness and tail shape of the conditional distribution of y t are jointly controlled by η 1 and η 2 . Moreover, given model (6), the ES can be estimated by using Proposition 3 below.
Proposition 3 Suppose y t follows Model (6). Then the conditional expected shortfall of y t at level τ, denoted by ES tτ , is given by: See Appendix C for a proof. Clearly, to obtain the conditional ES estimates, we first need to know how to estimate the model parameters, which we discuss in the next section.

| Likelihood function
Various methods have been developed in the literature in order to estimate GLD models. For example, the method of moment was proposed by Karian et al. (1996), the method of least squares was discussed by Ozturk and Dale (1985), the method of percentiles was developed by Karian and Dudewicz (1999), a starship estimation method was proposed by King and MacGillivray (1999), the MLE method was discussed by Gilchrist (2000) and a two-step procedure that combines the method of moment or percentile and the MLE to fit the GLD to data was developed by Su (2007a). Su (2007b) also pointed out that the MLE method not only is more efficient but also tends to produce GLD that has closer first four moments to the data analysed.
For our model, to write out the likelihood function of the model parameters, we first need to note that the relationship between the density function f(y) of a random variable Y and its quantile function y = Q(τ) is given by f (y) = (d(Q(τ))/dτ) −1 . Moreover, it follows from model (6) that, for each y t there exists τ t $ U(0, 1) such that . Then, the likelihood of the observed returns y T , conditional on the initial observations y d 0 , is given by where σ(η) is given by (4). However, it is very difficult to use the above estimation methods for the proposed model because of the thresholds and the delay parameter involved in the volatility process. Hence, we propose a MCMC estimation method for the parameters which requires us to derive the posterior density function of β, γ, η and d.

| Posterior distribution
It follows from the Bayesian theorem that the posterior density function of β, γ, η and d is proportional to the product of the likelihood of observed returns and the prior density function of the parameters. Let π(β, η, γ, d| y T ) and π(β, η, γ, d) be the posterior and prior density functions respectively. Then, conditional on y d 0 , we have If π(β, η, γ, d) is a well-defined density function on Ω, i.e., the integral of π(β, η, γ, d) on Ω is finite, then the posterior density function (8) is also well defined on Ω.
See Appendix D for a proof. It is seen that the difference between Ω and Ω is in the range of values that α 0j can take. As m 0 is very small, this difference can be safely ignored from a practical point of view, but the proof in Appendix D shows that it guarantees that the posterior distribution is well defined on Ω, which is important for the parameter estimations using a MCMC method.

| Prior distribution
In practice, it is reasonable to assume that the thresholds γ j (y min , y max ) for all possible j, where y min = min{y 1 , …, y T } and y max = max{y 1 , …, y T }. To simplify the calculations, we let the prior density function π(β, γ, η, d) = π(β)π(γ)π(η)π(d), where γ j is uniformly distributed on (y min , y max ), d is uniformly distributed on D, and α ij , β 1j and −η v follow a log-normal distribution truncated on the parameter space Ω. Clearly, the prior distribution is well defined on the parameter space of the posterior distribution.
The detailed formulas for the prior density function are given in Appendix E, where the σ ij 's, s 1j 's and λ v 's are the corresponding scale parameters, which measure the strength of the prior information: large (small) values represent weak (strong) prior information used in the parameter estimation.
In this paper, we deliberately let σ ij = s 1j = λ v = 2. This means that, for example, the standard deviation (SD) of α ij is given by ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi ffi e σ 2 ij e σ 2 ij −1 r = 54:1, which is very large. Hence, the results obtained in this paper are robust to the choice of π(β, γ, η, d).

| MCMC algorithm
One of the basic MCMC methods is the Metropolis-Hastings algorithm (see for example, Gamerman andLopes, 2002, andGeyer, 2011). In this method, we first generate a candidate parameter value from a chosen distribution and then we accept this proposed value as the next parameter value with a known probability. By repeating the two steps multiple times, we produce a sequence of parameter values that form a Markov chain in the parameter space Ω, whose equilibrium distribution is the posterior distribution of the parameters; see for example, Brooks (1998) andO'Hagan andForster (2004).
For our method, we let β, γ, η and d be the current parameter values, and β ' , γ ' , η ' and d ' the proposed values. Appendix F shows the detailed steps of our MCMC algorithm. Hence, after a burn-in period, we may collect posterior samples of parameters from the Markov chain generated by our MCMC algorithm, based on which the model parameters can be estimated, and hence, conditional ES estimates at a given level can be obtained.
It is seen from Appendix F that our estimation method can handle multiple thresholds and the delay parameter easily and has no difficulties in handling the complex structure of the parameter space. However, it is worth noting that, in the MLE method, a grid search procedure for the threshold and delay parameter usually needs to be used, which becomes impractical when the number of thresholds is greater than one.

| Benchmark models and ES estimations
There are many models in the literature that have been used to estimate ES, see Nadarajah et al. (2014). For the purpose of this study, we selected our benchmark models as follows.
First, note that our model is a parametric model that is related to a GARCH model. Hence, our main benchmark models are also chosen to be parametric. Specifically, we use S-GARCH, GJR-GARCH, E-GARCH and T-GARCH models as our benchmark models, for which the models of order (1, 1) are defined by y t = σ t ε t , where σ t is given below: Moreover, the innovation term of each GARCH model is assumed to follow any one of the following standard distributions, that is, the skewed t-, NIG, skewed GED, GHYP and JSU distributions, resulting in 20 benchmark GARCH models in total. These distributions were chosen because they are able to capture much more complicated data structures of financial returns and have become more and more popular distributions in the finance literature. It is seen that, apart from the S-GARCH models, all other models are able to deal with the asymmetric impact of positive and negative returns.
As these benchmark GARCH models are parametric, a detailed formula for ES estimation can be derived for each model. However, rather than doing this for each model, we give a general formula for ES in a quantile function form in Proposition 5 for all benchmark GARCH models.
Proposition 5 For the above benchmark GARCH models, the conditional expected shortfall at the τ 0 th level is given by where Q ε t v ð Þ is the quantile function of the innovation term of the models.
See Appendix G for a proof. Note that ES tτ 0 given in Proposition 5 can now be estimated by using the statistical software R.
It is worth noting that our proposed model is also a quantile-based model. Recently, Taylor (2008) proposed an exponentially weighted quantile regression method (denoted by E-QR) for VaR and ES. This method is also quantile-based but it is semi-parametric because it does not assume a distribution when estimating VaR. Hence we choose the E-QR model as our another benchmark model. For the E-QR model, the conditional VaR can also be estimated by using the statistical software R and the conditional ES estimates can then be obtained by using the formulas (13) and (14) in the paper of Taylor (2008).
Finally, we compare our ES estimates with those obtained from the empirical estimation method. This is the simplest non-parametric method, according to which, the ES can be estimated as the average of the observed returns that are smaller (or larger) than the empirical τ 0 th quantile of the returns if τ 0 ≤ 0.5 (or τ 0 > 0.5).

| Evaluation of ES estimates
We follow the suggestions of McNeil and Frey (2000) to back-test our ES estimates. First, we define the standardized residuals, denoted by r t . Specifically, for our model, , and for the benchmark GARCH models, we let r t = y t −ÊS tτ 0 À Á =σ t , whereÊS tτ 0 is the estimated conditional ES of y t at level τ 0 and ffiffiffif h t p and σ t are the estimated conditional volatilities from our model and the benchmark GARCH models respectively. Then, we have the following result. Proposition 6 For model (6), let R t = y t −ES tτ 0 ð Þ = ffiffiffiffi h t p and for the benchmark GARCH models, let R t = y t −ES tτ 0 ð Þ =σ t . Then under these models, the corresponding R t are i.i.d. random variables and See Appendix H for a proof. We further define the exceedance residuals by r 1t = r t ; y t ≤V aR tτ 0 , otherwise. Then, following the suggestions of McNeil and Frey (2000), we used the Ljung-Box test for zero autocorrelation in {r t }, and the Bootstrap method (see, for example, Efron and Tibshirani, 1993) for the zero mean of {r jt }, where j = 1, 2. A rejection on the null of either of the tests suggests that the models used for ES estimation may not be satisfactory, and hence the ES estimates could be biased.
For the E-QR model, we do not have the estimates for the volatility process. Hence, we follow the work of Taylor (2008) and calculate the standardized residuals by r t = y t −ÊS tτ 0 À Á =V aR tτ 0 , whereV aR tτ 0 is the estimated τ 0 th conditional quantile of y t . Then we again use the Bootstrap test for the exceedance residuals and the Ljung-Box test for the standardized residuals in order to evaluate the dynamic properties of the ES estimates.
For the empirical method, we define r t as we did for the E-QR model because the sample SD can be seriously affected by extreme returns. Similarly, the Bootstrap and Ljung-Box tests are also used for zero mean and iid of the residuals respectively.

| EMPIRICAL STUDY
We conducted an empirical study on six daily stock returns, denoted by   Table 1 provides some summary statistics about the data. It is seen that the average returns in the observation periods are slightly negative and standard deviations of the returns are similar. Moreover, all returns are only slightly skewed as all skewness statistics are between −0.5 and 0.5 or very close to them. However, the excess kurtosis for all returns is very high. Table 1 suggests that the overall shape of the distribution of returns is slightly skewed with heavy tails. We adopted a moving window approach, suggested by Kuester et al. (2006), for model estimations. Specifically, we defined a moving window W t = {x t , …, x t + 1000 − 1 }, where t = 1, …, 1000. Hence, each moving window contains 1,000 observations. The first part of the data (before the lighter vertical line) in each plot of Figure 1 corresponds to the first moving window W 1 of 1,000 days. Note that we did not estimate the conditional mean of the returns by following the common practice (see Poon and Granger, 2003). Hence, we let y ℓ = x ℓ − x , where x ℓ W t and x is the mean of the 1,000 in-sample returns in W t , and the parameter estimation methods were applied to y ℓ .
Due to the popularity of T-GARCH models with two regimes in the finance literature, in this empirical study, we let J = 2 in model (6). Note that the benchmark threshold GARCH models also have two regimes. We checked BIC values of the benchmark GARCH models with various orders. We found that the minimum value of BIC always corresponds to a model of order (1, 1). See Tables 4-6 in Appendix I. Hence, all relevant models used in this empirical study have order (1, 1) for comparison purposes.
In this study, we consider the conditional ES estimates at the following six levels: 1%, 2.5%, 5%, 95%, 97.5% and 99%. Hence, for each return series and on each moving window, 20 benchmark GARCH models, 1 empirical model and 6 E-QR models (one for each level) were estimated, resulting in 162 estimated benchmark models on each moving window as we considered six return series in total. For illustration purposes, the estimated parameters of our models corresponding to W 1 and W 350 are given in Table 2, where W 350 contains observations in the 2008-2009 financial crisis period. It is seen that the volatility persistence of all six return series is much stronger during the financial crisis period. It is worth noting that the estimates of η 1 and η 2 are very similar, which suggests that the conditional distribution of the returns is only slightly skewed. Moreover, the absolute value of η 1 and η 2 are small, suggesting that the conditional density function of the y t has heavy tails on both sides and that both tails have similar shapes. For each moving window W t , the conditional ES estimates at a given level can be calculated by using Proposition 3 for our model and those discussed in Section 4 for the benchmark models. Hence, when t takes values from 1 to 1,000, we obtain one-day-ahead postsample ES estimates for the next 1,000 days at a given level. Figure 2 shows the one-step-ahead conditional ES estimates at levels 1%, 5%, 95% and 99% for the last 1,000 days of the six return series by using our model. Clearly, the dynamics of the ES processes are similar to those of the returns.
Before checking the ES estimates, we check the coverage probability ofV aR tτ 0 at six quantile levels, that is, let τ 0 equal to 1%, 2.5%, 5%, 95%, 97.5% and 99% respectively. The MSE value between the true levels of τ 0 and the corresponding estimated coverage probabilities ofV aR tτ 0 provides a simple measure about the performance of a model in studying VaR: A good model corresponds to a smaller value of MSE. The MSE values for all models are given in Table 7 in Appendix I. Clearly, all models performed similarly well in studying VaR, which suggests that the proposed model is compatible with the benchmark models on this aspect and which also provides     Table 3 gives the total number of rejections for each back-test among the six return series. So the results for our model are very good because there is only one Bootstrap rejection that occurred for CAC40 at a single level only.
Several findings can be seen from Table 3: (i) Our model outperforms all benchmark models regarding the post-sample ES estimation at all levels considered in this paper, which provides further evidence on the robustness of the proposed model as it performed very well throughout the observation period including the financial crisis period. (ii) Among the benchmark GARCH models, the overall performance of the skewed t-distribution is better than all other standard distributions, which provides some support to the used of skewed t-distribution in practice. (iii) The empirical method did not reject the null of the Bootstrap test but the standardized residuals could be correlated. (iv) The ES estimates obtained from most benchmark GARCH models and the E-QR models rejected the null hypothesis of the Bootstrap test, which suggests that the ES estimates obtained from these models could be biased.
Some possible reasons for the satisfactory performance of our model are discussed below. One is that the ES estimates from our model can take account of more information about the tails of the conditional distribution of y t , but such information is not considered by benchmark models. Another possible reason could be that our T A B L E 7 MSEs between the true level τ 0 and the estimated coverage probability ofV aR tτ0 for all models and all data sets model estimates the threshold, while the benchmark T-GARCH models fix the threshold at value 0, representing a zero return. However, Yu et al. (2010) pointed out that the threshold value could be slightly greater or less than 0, which suggests that a small amount of variation around 0 in financial returns may not be significant enough to change their volatility dynamics. Hence, estimating rather than fixing the threshold value would be more reasonable. Our results provide some further support to this argument.

| CONCLUDING DISCUSSIONS
This paper proposed a novel quantile function T-GARCH model for estimating ES of financial returns. Different from the GARCH models defined by standard distributions, the proposed model defines the conditional distribution of financial returns by using the GLD. We found that the ES estimates obtained from our model are significantly better than those obtained from the benchmark models considered in this paper. Our results show that, apart from the overall shape of the distribution of returns, it is also important to take account of specific features about the tails of the distribution when estimating the ES of returns. It is also worth reemphasizing that the GLD is defined via a quantile function as its other equivalent functions do not have an explicit mathematical expression. There are many other distributions that are also defined by quantile functions. It is certainly worth further investigations on the use of quantile functions to build up useful models for financial analysis.

ACKNOWLEDGEMENT
We are grateful to the editor and two anonymous referees for their valuable comments and constructive suggestions that have led to substantial improvements in this paper.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are openly available from the following Yahoo Finance websites, which have also been cited in Section 5 (see page 16) and included in the Reference section of this paper. Yahoo Finance (2019a).
APPENDIX C.
Step 5. Obtain the proposed thresholds γ ' as follows: • Let a 1 = y min , b = y max .
• For j = 1, …, J − 1, simulate γ Step 7. Accept the proposed value with probability min{AB, 1}, where A and B are given below.
Similarly, for the benchmark GARCH models, it can also be shown that R t are iid with zero mean. This completes the proof.

Tables
Tables 4-6 give the BIC values for all benchmark GARCH models. These BIC values show that the minimum value of BIC always corresponds to a model of order (1, 1) in each case. Table 7 gives the MSE values between the true level τ 0 and the estimated coverage probability ofV aR tτ 0 for all models and all data sets.