Spatial and Spatiotemporal Volatility Models: A Review

Spatial and spatiotemporal volatility models are a class of models designed to capture spatial dependence in the volatility of spatial and spatiotemporal data. Spatial dependence in the volatility may arise due to spatial spillovers among locations; that is, if two locations are in close proximity, they can exhibit similar volatilities. In this paper, we aim to provide a comprehensive review of the recent literature on spatial and spatiotemporal volatility models. We first briefly review time series volatility models and their multivariate extensions to motivate their spatial and spatiotemporal counterparts. We then review various spatial and spatiotemporal volatility specifications proposed in the literature along with their underlying motivations and estimation strategies. Through this analysis, we effectively compare all models and provide practical recommendations for their appropriate usage. We highlight possible extensions and conclude by outlining directions for future research.


Introduction and motivation
When observations of a random process have a natural ordering, such as the temporal ordering for time series or geographical locations for spatial processes, they are typically dependent. For time series, observations close together in time will be more closely related than observations further apart. Similarly, for spatial processes, Fisher (1935) stated that "patches in close proximity are commonly more alike, as judged by the yield of crops, than those which are further apart", or more commonly known as Tobler's first law of Geography: "Everything is related to everything else, but near things are more related than distant things" (Tobler, 1970). The similarity may be reflected in the mean or trend behaviour, which motivates autoregressive processes and integrated processes, but also in the process variation or volatility, motivating autoregressive heteroscedasticity models or stochastic volatility models. Spatial and spatiotemporal data models must fulfil this key property: instant and direct dependence due to spatial or temporal proximity.
Various types of models have been introduced in time series analysis and spatial analysis to describe the dependence over time and space. The most popular approaches are based on linear structures where, e.g., the value of the time series (spatial process) at a certain time point (point in space) is a linear combination of preceding or neighbouring observations and possible regressors. Such types of processes, e.g. SARIMA models, have been analysed in time series analysis in detail (see, e.g., Davis, 2009, 2002). Additionally, over the last few decades, the field of econometrics has seen the development of numerous linear spatial models, which have proven to be highly practical and useful (see, e.g., Anselin, 1988;Lee, 2004;LeSage and Pace, 2009;Kelejian and Prucha, 2010;Elhorst, 2010Elhorst, , 2014. Unfortunately, these linear approaches are only of limited use to describe a dependence behaviour in the process' volatility. This point was discussed in detail by Engle (1982); Bollerslev (1986) for temporal processes. They introduced a new type of nonlinear temporal process, which has turned out to be extremely useful for modelling returns of financial data. They are based on a multiplicative decomposition and are, thus, nonlinear in the white noise process.
Their main advantage is that they can describe a time-varying behaviour of the conditional variance, the so-called volatility. In general, there are different definitions of volatility (see Andersen et al., 2010, for a discussion of different volatility measures). For instance, for ARCH and GARCH models, the volatility term coincides with the conditional variance of the response variable. For other volatility time series models, such as stochastic volatility models, the conditional variance is non-explicit (see Francq and Zakoian, 2019). The time-varying volatility makes these nonlinear models quite attractive in practice since the risk behaviour of the process may change over time. Linear time series, e.g., ARMA processes, do not possess this property; they have a constant volatility. For an overview of nonlinear time series models, we refer the interested reader to Turkman et al. (2016) or Fan and Yao (2003).
In time series analysis, generalised autoregressive conditional heteroscedasticity (GARCH) and stochastic volatility models are widely used to model time-varying volatility (see also Francq andZakoian, 2011, 2019, for a thorough overview). However, many real-world phenomena are observed at multiple geo-referenced locations and, thus, exhibit spatial or spatiotemporal dependence, which means that the volatility of neighbouring series may influence the volatility of a series. Spatial and spatiotemporal volatility models have been recently developed to account for this spatial dependence. After first being mentioned "as a byproduct" in Bera and Simlai (2005), Otto et al. (2016) introduced spatial ARCH models (published in Otto et al. 2018, and Sato and Matsuda (2017) introduced spatial log-ARCH models. Interestingly, both models were developed simultaneously and independently of each other to be able to represent the ARCH-like dependencies in the variance of spatial processes. Moreover, Robinson (2009) and Taşpınar et al. (2021) introduced spatial stochastic volatility models that include an additional stochastic term in the volatility process.
Since then, various nonlinear spatial and spatiotemporal volatility models with spatial and spatiotemporal dependence in the (conditional) variance have been proposed. However, there has been little comparison among these models (e.g.,  compared some models in a unified framework). In this paper, we aim to provide a comprehensive review of the recent literature on spatial and spatiotemporal volatility models. As these models resemble their time series counterparts, we begin by briefly reviewing important aspects of discrete time series volatility models. Subsequently, we describe spatial and spatiotemporal volatility models considered in the literature. Besides motivating each model, we describe important estimation strategies that are applicable in the spatiotemporal context, e.g., quasi-maximum-likelihood (QML) approach, generalised methods-of-moments (GMM) approach, and the Bayesian MCMC sampling schemes. We describe possible extensions and provide directions for future research.
The remainder of the paper is structured as follows. In Section 2, we start with discrete time series volatility models and their multivariate extensions, which are also applicable to spatiotemporal data. In Section 3, we discuss spatial GARCH properties (i.e., instantaneous GARCH-type interactions across the cross-sectional or spatial domain) and compare different specifications that have been proposed in the literature. In Section 4, our focus is on spatiotemporal volatility models allowing for these instantaneous spatial GARCH-type interactions in addition to the temporal GARCH structure. Further extensions and related models are discussed in the ensuing Section 5. Finally, in Section 6, we conclude the review with an outlook on future research and a discussion of open problems. Some of the technical details are collected in an appendix.

Time series volatility models
This section will briefly describe two classes of time series volatility models: (i) ARCH and GARCH-type models and (ii) stochastic volatility models. These models provide the basis for all spatial and spatiotemporal ARCH/GARCH and stochastic volatility models developed in the literature. It is well known that the volatility of many financial time series, such as asset returns and exchange rates, changes over time. For example, Figure 1  Both classes of models aim to describe the dependence in the conditional variance, i.e., volatility, which should be separated from the mean behaviour of the random process, such as temporal trends, seasonality, or autoregressive dependence. For ARCH and GARCH models, the volatility is a function of the squares of previous observations and past conditional variances (i.e., volatilities). In the case of stochastic volatility models, the volatility is modelled through a latent stochastic process, also depending on the temporally lagged squared observations. In Sections 2.1 and 2.2, we briefly describe important univariate and multivariate versions suggested in the literature for both classes of models. Throughout this section, we consider the discrete time series {Y t ∈ R r : t ∈ Z}, i.e., an r-dimensional random process Y t with equidistantly ordered temporal observations. 2.1 ARCH and GARCH models 2.1.1 Univariate ARCH and GARCH models In the univariate case, r = 1, the random process Y t is given by where h 1/2 t is a scaling factor and {ε t } is a sequence of independent and identically distributed (i.i.d.) variables with mean 0 and variance 1. Moreover, h t depends on the past realisations of Y t , and it coincides with the conditional variance of Y t . Thus, it can be interpreted as the volatility of the series. This idea traces back to the seminal work of Engle (1982). To be precise, the volatility process is defined as for an ARCH(p) process with unknown parameters α 0 , . . . , α p , which have to be estimated.
To obtain a positive conditional variance, the constant term α 0 is assumed to be positive and α i ≥ 0 for all i = 1, . . . , p. In practice, large lag orders p are often needed to capture the volatility dynamics, e.g., those of inflation indices (Engle, 1983). A GARCH process extends this model by allowing for autoregression and moving average components in the conditional variance equation, reducing the required lag orders and, thus, the computational burdens. The basic GARCH(p,q) model is given by with β i ≥ 0, i = 1, . . . , q, being additional parameters to be estimated (see Bollerslev, 1986).
A GARCH process is weakly stationary if p i=1 α i + q j=1 β j < 1. For a detailed discussion of the statistical properties of ARCH and GARCH models and the parameter estimation, we refer the interested reader to the textbook of Francq and Zakoian (2019).
ARCH and GARCH models are also particularly useful as error processes of other regression and time series models to describe time-dependent and dynamic model uncertainties. Since they have an expectation of zero, they can be directly applied to any other mean model, as Engle (1982) already demonstrated for an ARCH regression model with a linear regression component. Later, Weiss (1984Weiss ( , 1986b considered ARCH models for the errors of autoregressive moving average processes. These models exhibit an autoregressive dependence in both the mean and the conditional variance.
Since then, several extensions and adaptations of GARCH models have been proposed.
Each of these models has its own strengths and weaknesses, and choosing the best model for a particular data set depends on the nature of the data, the research question, and other factors.
First, to avoid the non-negative constraints of the model coefficients, Geweke (1986);Pantula (1986); Milhøj (1987) proposed a logarithmic expression of the (log-)volatility equation, i.e., Consequently, the process exhibits multiplicative dynamics in the volatility, while the standard ARCH and GARCH models have additive volatility dynamics. Moreover, this logarithmic GARCH (log-GARCH) model allows for a direct transformation of the log-squared observations to an autoregressive moving average process of orders p and q. Interestingly, the log-GARCH model is invariant to power log-GARCH models where log h t is replaced by log h δ t with the power δ > 0, which acts as a scaling factor of the logarithmic terms (Sucarrat, 2019). Since , the conditional variance is always positive for real-valued coefficients {α i : i = 1, . . . , p} and {β i : i = 1, . . . , q}. This is particularly interesting when exogenous covariates influence volatility. That is, with δ j being the corresponding regression coefficients of the j-th covariate x j,t (Sucarrat, 2019).
In the GARCH counterpart, the regressors are assumed to be almost surely positive with nonnegative coefficients .
Second, to replicate the so-called leverage effect, which is often observed in financial return data, the model has been extended to Exponential GARCH models (EGARCH, Nelson 1991) that allow for asymmetric dependence by including the sign of the residuals in the log-volatility equation: Third, Higgins and Bera (1992) proposed a non-linear ARCH (NARCH) model, which nests the linear ARCH model of Engle (1982) as a special case and converges to the log-ARCH model in the limiting case. More precisely, the volatility equation of the NARCH model is given with ϕ i ≥ 0 for i = 0, 1, . . . , p and δ > 0. This model converges to a log-ARCH model for δ approaching zero.
Apart from these models, several other extensions have been proposed, which should only briefly be mentioned, e.g., threshold GARCH models (TGARCH, Zakoian 1994;Glosten et al. 1993, adding a threshold term to the GARCH equation for different volatility dynamics below and above the threshold), Glosten, Jaganathan and Runkle GARCH models (GJR-GARCH, asymmetric volatility by including both positive and negative residuals in the GARCH equation), or fractionally integrated GARCH models (FIGARCH, long memory in the volatility process by using fractional integration in the conditional variance equation). For a more detailed overview of univariate time-series ARCH and GARCH models, we refer the interested reader to the survey paper of Bera and Higgins (1993) and the textbook of Francq and Zakoian (2019).

Multivariate ARCH and GARCH models
In general, financial asset returns tend to move together over time. Figure 2  Recall that Y t is an r-dimensional vector of returns at time point t. It is assumed that where {ε t } is a sequence of i.i.d. r-dimensional random variables with E(ε t ) = 0 and Var(ε t ) = I r . The square root has to be understood in the sense of the Cholesky factorization, i.e. H 1/2 t is the unique symmetric and positive symmetric matrix with H 1/2 The matrix H t is allowed to depend on certain parameters, and it is assumed to be a measurable function with respect to the σ-algebra generated by Y v , v < t. Thus, i.e., H t is the volatility matrix of Y t .
Most of the published papers on this topic appeared at the end of the 80s and the 90s of the 20th century. In principle, all these models only differ in modelling the matrix H t . Here we want to sketch some of the most relevant approaches briefly.
The first paper on MGARCH models is due to Bollerslev et al. (1988). They introduced the so-called vector GARCH model, briefly VEC model. It is based on the idea of transforming the matrix H t to a vector by using the vech-operator (cf. Harville, 1998). For a square r × r matrix B, the operator vech(B) is defined as the r(r + 1)/2-dimensional vector obtained by stacking the columns of the lower triangular part of B. Let h t = vech(H t ). Then, it holds for a VEC-GARCH(1,1) process that where η t−1 = vech(ε t−1 ε ′ t−1 ), A and B are assumed to be r(r + 1)/2 × r(r + 1)/2 parameter matrices, and ω is a r(r + 1)/2 parameter vector. The total number of parameters of this model is r(r + 1)(r(r + 1) + 1)/2. For r = 2 it is 21, for r = 3 it is 78, and for r = 4 it is already 210.
Thus, the number of parameters increases very fast as r increases. This is the reason why in practice, the model is only used for small values of r, e.g., r = 2 or r = 3.
Another possibility to reduce the number of parameters is to assume that A and B are diagonal matrices (Bollerslev et al., 1988). This model is denoted as the diagonal vector GARCH model, briefly DVEC-GARCH(1,1) model. Using the Hadamard product (cf. Harville, 1998), it can be written as where A # , B # , and Ω # are r×r-matrices implied by A = diag(vech(A # )), B = diag(vech(B # )), and Ω = diag(vech(Ω # )). Consequently, h ij,t depends only on its own lag and on ε i,t−1 ε j,t−1 .
This restriction dramatically simplifies the parameter estimation; however, it is still not suitable for large-scale systems.
A further simplification of this model was discussed by Ding and Engle (2001). They choose A # and B # as matrices of rank one or as multiples of the matrix consisting purely on the element 1. Further, Riskmetrics (1996) uses the exponentially weighted moving average model, which can be written as which corresponds to a scalar VEC-GARCH(1,1) model. Riskmetrics recommends choosing the factor λ equal to 0.94 for daily data and 0.97 for monthly data (critically discussed by Bollen, 2015;González-Rivera et al., 2007, based on empirical evidence).
Strong restrictions on the parameters are necessary to ensure the matrix H t to be positive definite. For that reason, Engle and Kroner (1995) proposed another approach, the BEKK (Baba, Engle, Kraft, and Kroner) model. The BEKK-GARCH(1,1,K) process is given by where A i , B i , and Ω are r × r matrices and Ω is positive definite. The BEKK-GARCH model is a special case of the VEC-GARCH approach, but the converse is not true (Stelzer, 2008).
The number of parameters is again high. Similar to the VEC-GARCH and DVEC-GARCH, the application of the BEKK-GARCH model reduces to cases where r is small. Bollerslev (1990) introduced a type of MGARCH model where the conditional correlations are constant (CCC model). For the CCC-GARCH(1,1) process, it holds that where R = (ρ ij ) and D t = diag(h 1/2 11,t , ..., h 1/2 rr,t ) and Here, the number of parameters is r(r + 5)/2.
A model with dynamic conditional correlations (DCC model) was proposed by Engle (2002) and Tse and Tsui (2002). For the DCC-GARCH(1,1) model of Engle (2002) it holds that with D t as above and and u t = (u i,t ) with u i,t = ε i,t / h ii,t .Q denotes the unconditional covariance matrix of u t and α and β are non-negative numbers satisfying that α + β < 1.
Besides these models, many further proposals have been made. The above models seem to be the most applied ones. There are also other attempts to overcome the problem of dimensionality. Factor GARCH models are one step in that direction. Here the idea is that some factors drive the behaviour of the stock returns. Such an approach was discussed by, e.g., Engle et al. (1990), Lin (1992) and Bollerslev and Engle (1993).
We refer to the overview papers by Bauwens et al. (2006) and Silvennoinen and Teräsvirta (2009), as well as the book by Francq and Zakoian (2019), where the presented multivariate models and many further ones are discussed in more detail.

Stochastic volatility models 2.2.1 Univariate stochastic volatility models
In stochastic volatility models, the volatility process is modelled through a latent stochastic process. Although it is difficult to determine the exact origin of these models as they arose from various research efforts addressing different issues, Taylor (1982Taylor ( , 1986 seem to be the first to consider a univariate discrete version that can be considered as an alternative to the ARCH process. To learn more about the origin and development of stochastic volatility models, refer to Ghysels et al. (1996) and Shephard (2005). A standard discrete time stochastic volatility model is specified in the following way: where Y t is the observed response variable, {h t } is the sequence of the unobserved log-volatility, assuming an AR(1) process with a mean parameter µ h and an autoregressive parameter |ϕ| < 1.
The model includes two independent disturbance terms denoted by ε t and u t . The sequence {ε t } includes the independent random variables with an identical distribution with mean 0 and variance 1. The disturbance term {u t } in the log-volatility equation are independent and have identical distribution with mean 0 and variance σ 2 h . In this specification, the sign of Y t is determined by that of ε t , and the volatility clustering and fat tail properties observed in the marginal distribution of Y t are delivered by the time-varying log-volatility (see Ghysels et al. (1996) on the statistical properties of stochastic volatility models). This standard model can also be obtained as a discrete-time approximation to various diffusion processes in the continuous-time asset pricing literature (Hull and White, 1987;Wiggins, 1987;Melino and Turnbull, 1990;Chesney and Scott, 1989).
The outcome and log-volatility equations can be modified to formulate alternative specifications. The log-volatility equation is specified as an AR(1) process and can be generalised to any ARMA process. For example, if h t follows an AR(p) process, then it will take the following form: where ϕ 1 , . . . , ϕ p are unknown autoregressive parameters.
An alternative specification can be obtained by assuming the presence of infrequent jumps in the outcome equation. Adding a jump component to the outcome equation can improve the fit of the observed time series of returns because the jump component may capture outliers as well as asymmetry in the return distribution (Andersen et al., 2002;Chib et al., 2002). The stochastic volatility model with a jump component can be specified as where q t is the jump random variable, and k t is the jump size random variable. The jump random variable is a Bernoulli random variable with success probability P (q t = 1) = κ, and the jump size is modelled as log(1 + k t ) ∼ N (−0.5δ 2 , δ 2 ). In this model, κ and δ are additional unknown parameters that we need to estimate along with µ h , ϕ, and σ 2 u . Another variant can be defined by allowing the volatility feedback in the outcome equation (Koopman and Hol Uspensky, 2002): where the scalar unknown parameter α gives the effect of volatility on the outcome variable.
Chan (2017) extended this model by considering time-varying parameters: where x t is the k × 1 vector of covariates with matching time-varying parameter vector β t .
This model generalises the model suggested in Koopman and Hol Uspensky (2002) by allowing time-varying parameters β t and α t in the outcome equation.
vector of time-varying parameters. Chan (2017) assumes an a random walk process for γ t such that γ t = γ t−1 + ν t , where ν t ∼ N (0, Γ), and Γ is the (k + 1) × (k + 1) covariance matrix. Harvey and Shephard (1996) consider a variant that allows for the so-called "leverage effect" via introducing correlation in the disturbance terms of the outcome and the log-volatility equations. See also Jacquier et al. (2004), Yu (2005) and Omori et al. (2007) on the different versions of this model. Shephard (2005) notes that Hull and White (1987) were the first to propose a continuous-time stochastic volatility model incorporating the leverage effect. This work, in turn, inspired the development of the EGARCH model proposed by Nelson (1991) (Shephard, 2005). The variant proposed by Harvey and Shephard (1996) can be specified as where ϱ is the correlation parameter. In this specification, Yu (2005) defines the leverage effect as the negative relationship between E(h t |Y t ) and Y t , and derived the following equation: This result suggests that this specification will exhibit the leverage effect whenever ϱ < 0.
Another variant can be obtained by assuming a scale mixture distribution for the outcome variable (Jacquier et al., 2004;Chib et al., 2002). This version takes the following form: where {ω t } is the sequence of latent variables that are independent and have identical distribution. Under the assumptions that ε t ∼ N (0, 1) and ω t ∼ IG(ν/2, ν/2), where IG denotes the inverse gamma distribution, it can be shown that the marginal distribution of ω 1/2 t ε t (unconditional on ω t ) is the standard t distribution with ν degrees of freedom (Geweke, 1993). Omori et al. (2007) consider the same model under the assumption that log(ω t ) ∼ N (−0.5τ 2 , τ 2 ), where τ is a scalar unknown parameter with τ 2 ∼ Gamma(1, 1). Chan (2013) introduces a class of models that includes both the moving average and stochastic volatility components that can nest a variety of specifications as special cases. This specification takes the following form: where µ t is the time-varying conditional mean process, ψ 1 , . . . , ψ q are the unknown moving average parameters, and the disturbance terms v t and u t are independent of each other for all leads and lags. Let which indicates that the conditional variance of Y t is a moving average of q + 1 most recent variances e ht , e h t−1 , . . . , e h t−q . Moreover, unlike the standard stochastic volatility model in (19) and (20), Y t is serially correlated even after conditioning on h. Chan (2013) shows that where ψ 0 = 1. Thus, the conditional covariances are also time-varying because of the presence of the log-volatility h t . As stated in Chan (2013), popular specifications can be obtained from this model by choosing a suitable conditional mean process µ t . For example, some of these specifications are (i) an AR(p) model: The likelihood function of a stochastic volatility model is a mixture over the distribution of h and therefore requires evaluation of a high-dimensional integral. For example, the likelihood function of the model in (19) This feature indicates that the estimation based on the exact maximum likelihood is not readily available and poses difficulties for likelihood-based estimation procedures. In the literature, various estimation methods exist, including the generalised method of moments (GMM), the quasi-maximum likelihood method, spectral GMM based on the characteristic function, indirect inference methods, Monte Carlo maximum likelihood methods, and Markov chain Monte Carlo (MCMC) based methods. According to , the choice of an estimation method can be based on the following properties: (i) efficiency, (ii) estimation of volatility, (iii) optimal filtering, smoothing, and forecasting methods, (iv) computational efficiency, and (v) applicability to flexible models. Among others, see Jacquier et al. (1994), Kim et al. (1998), and Broto and Ruiz (2004) on the estimation methods suggested in the literature for the univariate stochastic volatility models.

Multivariate stochastic volatility models
As in the case of multivariate ARCH and GARCH models described in Section 2.1.2, the cross-dependence among the volatility of different financial assets leads to modelling volatility in a multivariate framework, which can lead to efficient estimation. Let h t = (h 1t , . . . , h rt ) ′ be the r × 1 vector of log-volatility terms, and H 1/2 t = diag e h 1t /2 , . . . , e hrt/2 be the r × r diagonal matrix with the ith diagonal element e h it /2 . Harvey et al. (1994) consider the following multivariate version: 1 Note that f (Y |θ) is also called the observed-data likelihood function or the integrated likelihood function.
Two other functions that can be defined are (i) the conditional likelihood function f (Y|h, θ), and (ii) the complete-data likelihood function given by f (Y , h|θ) = f (Y |h) × f (h|θ). The conditional and complete-data likelihood functions are readily available for all stochastic volatility models.
where µ and ϕ are the r × 1 vectors of unknown coefficients, • denotes the Hadamard product, P ε = (ρ ij ) is a positive definite correlation matrix with ρ ii = 1 and |ρ ij | < 1 for i, j = 1, . . . , r, and Σ u = (σ u,ij ) is the r × r positive definite covariance matrix. Harvey et al. (1994) also consider the multivariate t distribution for ε t . The volatility process can be generalised to a VARMA structure in the following way: Note that if the off-diagonal elements of P ε and Σ u are zeros, then this multivariate model simply specifies a univariate standard stochastic volatility model for each component of Y t . Although the non-zero off-diagonal elements of Σ u introduce correlation across the volatility terms, the model does not allow for the leverage effect. To introduce the leverage effect,  consider the following model: where L = diag (λ 1 σ u,11 , . . . , λ r σ u,rr ). Thus, the model exhibits the leverage effect when λ i < 0 for i = 1, . . . , r.
Some other alternative versions that may not lead to the leverage effect as defined by Yu (2005) are also considered by Daníelsson (1998) and Chan et al. (2006). See  for further details on the models that can deliver the leverage and asymmetric effects. There are also alternative specifications in the literature, including parsimonious specifications based on additive and multiplicative factors structures, time-varying correlation matrix models, matrix exponential models, Cholesky decomposition-based models, Wishart models, and range-based models. The details of these models and the estimation approaches considered in the literature are surveyed in , Yu and Meyer (2006), and Chib et al. (2009).

Spatial volatility models
Now, suppose that the process is observed across space. In contrast to time series, where the index is a scalar value, the index of each observation is now at least two-dimensional. Consider typically a subset of R d with d > 1 and positive volume. For instance, if D ⊆ Z d , the spatial domain would be called lattice (e.g., satellite image sequences for d = 2, or CT images for d = 3), while a continuous spatial process is present if D ⊆ R d (e.g., soil samples, or species distributions). Another typical example is the case when D is a discrete set {s 1 , . . . , s n } of locations or polygons (e.g., air quality measurement stations, economic country/county-level data). Furthermore, D could be considered to be a spherical space, with the radius c, which is particularly useful for modelling global data on the Earth. Figure 3 presents an example of a purely spatial process where the monthly log returns of condominium prices in Berlin are shown on the map. Each location, postcode region, is represented by a polygon precisely defined by geographical coordinates.

Spatial ARCH and GARCH models
Spatial ARCH models were first mentioned "as a byproduct" in Bera and Simlai (2005, World econometrics conference proceedings). Later, Otto et al. (2018) and  introduced the first purely spatial ARCH model (jointly published on arXiv in Otto et al. 2016), which has an ARCH-like dependence structure in the conditional variances. The first spatial ARCH models were introduced for univariate processes and purely spatial domains, i.e., the process is observed only once for all locations in D. In other words, there is only one time point t, and we do not repeatedly observe the process over time. For the univariate case with r = 1 and n spatial locations, a spatial ARCH model is defined by where h(s i ) is a scaling factor of the error ε(s i ) of the i-th location, analogously to the timeseries case. The errors are supposed to be i.i.d. across all spatial locations and have a zero mean and a constant variance of 1. Further, h(s i ) depends on all adjacent realisations of the response variable Y , where the adjacency is defined by a so-called spatial weights matrix Typically, the definition of the weights matrix depends on the geographical space and the coordinates of all locations. For example, w ij can be chosen as the inverse-distance between s i and s j , or w ij could be equal to 1/k for all k nearest neighbours. Then, Due to this temporal simultaneity, the volatility h(s) at locations s given all other locations is difficult to interpret because all other locations simultaneously depend on Y (s). Thus, the interpretation of h(s) is slightly different compared to time-series ARCH models, and h(s i ) does not coincide with the conditional variance at location s i given all neighbouring observations. Thus, we refer to {h(s i )} as the volatility process. For parameter estimation, Otto et al. (2018) proposed a quasi-maximum-likelihood estimator, which is computationally implemented in the R-package spGARCH (Otto, 2019).
Moreover, higher-order spatial dependence can be considered by where {W k = (w k,ij ) i,j=1,...,n : k = 1, . . . , p} is a set of suitable weight matrices, e.g., separating the influence for different directions (i.e., k = 1 corresponds to the northward direction, k = 2 to the eastward direction, and so on). In contrast to time-series models, higher-order spatial dependence is often directly included in the first spatial lag by choosing w ij to be positive also for larger lag-orders or distances between s i and s j . For instance, typical choices of w ij are inverse-distance-based weights, i.e., w ij = d(s i , s j ) −k where k controls weight decay across space and d(s i , s j ) is a suitable metric to measure the distance between s i and s j .
Alternatively, a spatial autoregressive term of the volatilities can also be included. In this way, a spatial GARCH model of order (1, 1) can be defined as which was introduced by Otto and Schmid (2022). This model was first discussed within a unified framework proposed by Otto and Schmid (2019, arXiv). By allowing for higher-order spatial lag terms, we obtain the following volatility equation of spatial GARCH models In contrast to time-series models that often employ the natural one-way ordering (i.e., past observations can only influence future observations, but not vice versa), spatial models must allow for two-sided influence. Thus, there is typically no causal order between the observations and further assumptions are needed for the existence of a real-valued process. Furthermore, if the spatial locations s are one-dimensional (i.e., d = 1), the spatial ARCH models coincide with the time-series ARCH model by Engle (1982), where W acts like a backward-shift operator.
More precisely, if the locations are ordered and equidistantly spaced, W would be a sparse matrix with ones on the first subdiagonal for an ARCH(1) process. Similarly, an ARCH(p) process can obtained if W k has ones on the k-th subdiagonal for k = 1, . . . , p. Note that such models with a triangular weight matrix lead to directional spatial processes (e.g., Merk and Otto, 2021). In these cases, a causal ordering of the locations exists.
Below, we again focus on the special case of spatial ARCH(1) and GARCH(1,1) processes with two possibly different weight matrices W 1 and W 2 for the ARCH and GARCH term, respectively. Like for time-series GARCH models, also spatial GARCH models require additional assumptions on the parameters and/or weight matrix to ensure the non-negativity of h(s i ) for all i = 1, . . . , n. To analyse this in more detail, let h = (h(s i )) i=1,...,n and Y (2) = (Y (s i ) 2 ) i=1,...,n be the n-dimensional vectors of all h(s i ) and the squares of Y (s i ), respectively. Then, (43) can be written in a matrix notation as For a spatial GARCH model, the volatility is specified in a matrix notation as follows or in the reduced form, i.e., For the existence of a real-valued process, i.e., h(s i ) ≥ 0 for all i = 1, . . . , n, we need to ensure that (i) the inverse S(β 1 ) = (I − β 1 W 2 ) −1 exists, and (ii) the inverse ofS(α 1 ) = (I − α 1 A 2 ) −1 exists and all elements are non-negative.
The first condition is a typical assumption in spatial econometrics, and several choices of the weights matrix and the corresponding parameter space have been discussed in the literature, e.g., if W 2 is a row standardised weights matrix and β 1 ∈ [0, 1] the inverse exists. Generally, the spectral radius of β 1 W 2 must be smaller than one. The second condition is more complicated to check in practice. On the one side, it can be guaranteed by limiting the squared error terms {ε(s i ) 2 }, e.g. by assuming a truncated error distribution (see also . On the other hand, it can be ensured by considering the characteristics of underlying spatial dynamics implied by W 1 . In several specific cases, the condition is always fulfilled, e.g., in the case of directional processes in which W 1 can be expressed as a strictly triangular matrix (cf. Merk and Otto, 2021).
By contrast, in the logarithmic setting, weaker restrictions on the parameter space are needed for the non-negativity of h, but they also imply a different volatility structure. Sato and Matsuda (2017) introduced a spatial log-ARCH process, for which the (log-)volatility equation is given by With log h = (log h(s 1 ), . . . , log h(s n )) ′ and log Y (2) = (log Y (s 1 ) 2 , . . . , log Y (s n ) 2 ) ′ being the vectors of the element-wise logarithms of h and the squared observation Y (2) , the model can be expressed in a matrix notation as Using the log-squared transformation (cf. Robinson, 2009), the model can be transformed into a spatial autoregressive model of the log-squared observations. The transformed errors {log ε(s i ) 2 } of this spatial autoregressive representation follow a log χ 2 1 distribution. Thus, they no longer have zero means and are heavily left-skewed. For the existence of the process, the regular conditions of spatial autoregressive models apply, i.e., the inverse (I − α 1 W) −1 must exist.
As an extension of the spatial log-ARCH model, Su et al. (2023) proposed the following expression of the log volatility with F (α) = I n − A(α) and The coefficients a l (α) are real-valued deterministic functions with A(0) = I n . In this general form, different kinds of spatial dependence structures can be considered, e.g., spatial autore- Furthermore, the log-ARCH model can be generalised to a spatial log-GARCH model.
For that reason, Sato and Matsuda (2021) defined the log-volatility process as follows Moreover, Sato and Matsuda (2021) also allowed for regressive effects on the volatility, which can easily be included by adding of exogenous variables at location s i . Dogan and Taşpınar (2023) consider a higher-order version of the spatial log-GARCH model suggested in Sato and Matsuda (2021), where the scaling factors in (42) follow for i = 1, 2, . . . , n. Here, p and q are two finite positive integers, and (w 1r,ij ) p r=1 and (w 2l,ij ) q l=1 are non-stochastic weights matrices with zero diagonal elements. The corresponding {α 1r } p r=1 and {β 1l } q l=1 are unknown scalar parameters. For the estimation, Dogan and Taşpınar (2023) transform (42) by taking the square of both sides and then taking its natural logarithm (i.e., the log-squared transformation). Then, the transformed outcome equation can be written as with E(ε * (s i )) ≈ −1.2704 and Var(ε * (s i )) = π 2 /2 ≈ 4.9348. This density function is highly skewed with a long left tail, as visualised in Figure 4.
Then, the higher-order spatial GARCH model in Dogan and Taşpınar (2023) can be written as where W 1r = (w 1r,ij ) and W 2l = (w 2l,ij ) are the n×n weights matrices and Z = (z(s 1 ), . . . , z(s n )) ′ is the n × k matrix of exogenous variables. Let S(β 1 ) = (I n − q l=1 β 1l W 2l ), where I n is the n × n identity matrix and β 1 = (β 11 , . . . , β 1q ) ′ . Also, let α 1 = (α 11 , . . . , α 1p ) ′ . Under the assumption that 1 l=1 β 1l W 2l < 1, S(β 1 ) is invertible (Horn and Johnson, 2013). Then, the reduced form of equation (60) is given by Substituting this equation into (59) and rearranging yield Let G(θ) = I n − p r=1 α 1r W 1r − 1 l=1 β 1l W 2l and θ = (α ′ 1 , β ′ 1 ) ′ . Then, under the assumption that ∥ p r=1 α 1r W 1r + q l=1 β 1l W 2l ∥ < 1, we obtain When the weights matrices are row normalised, ∥ p r=1 α 1r W 1r + q l=1 β 1l W 2l ∥ < 1 simplifies to p r=1 |α 1r | + q l=1 |β 1l | < 1 by the triangle inequality. Sato and Matsuda (2021) propose a Gaussian pseudo maximum likelihood estimator for their spatial log-GARCH model by approximating the distribution of the transformed error terms with a normal distribution. They show that the resulting likelihood estimator attains the standard large sample properties. However, it is well known in the time series literature that the Gaussian pseudo maximum likelihood estimator obtained in this way might have poor finite sample properties because the normal approximation to the distribution of the log-squared error terms provides a poor approximation (Jacquier et al., 1994;Shephard, 1994;Kim et al., 1998;Sandmann and Koopman, 1998). Dogan and Taşpınar (2023) instead propose approximating the distribution of ε * (s i ) using a mixture of Gaussian distributions and develop a Bayesian estimation algorithm. They assume that the distribution of ε * (s i ) can be approximated by the following 10-component Gaussian mixture distribution (Omori et al., 2007): where φ(ε * (s i )|µ j , σ 2 j ) denotes the Gaussian density with mean µ j and variance σ 2 j , and c j is the probability of j-th mixture component. The parameters of the 10-component Gaussian mixture distribution are given in Table 1. A comparison between the 10-component Gaussian mixture distribution and the normal distribution in approximating the distribution of ε * (s i ) is illustrated in Figure 4. It is evident that the 10-component Gaussian mixture distribution provides a very accurate approximation, whereas the normal distribution offers a poor approximation.

Spatial stochastic volatility models
In this section, we review extensions of the standard stochastic volatility models in the time series literature to spatial data. Our starting point will be the stochastic volatility model considered in Robinson (2009) and Taşpınar et al. (2021), where the log-volatility terms are modelled through a first-order spatial autoregressive process. The resulting spatial stochastic volatility model shares similar properties with the standard stochastic volatility model in time series, and it is designed to capture volatility clustering in spatial data. Taşpınar et al. (2021) specify the spatial stochastic volatility (SSV) model as is an i.i.d. normal random variable with mean zero and variance σ 2 u . The w ij 's are the nonstochastic spatial weights such that they are zero when i = j. These elements represent the degree of spatial association between the log-volatility terms. The scalar parameter ϕ is the spatial autoregressive parameter and provides a measure of spatial correlations among h i 's. This model can be considered a spatial extension of the stochastic volatility model in (19)-(20).
Let h = (h(s 1 ), . . . , h(s n )) ′ be the n×1 vector of log-volatilities and u = (u(s 1 ), . . . , u(s n )) ′ be the n × 1 vector of error terms. Also, let W = (w ij ) i,j=1,...,n be the n × n non-stochastic matrix for the spatial weights. Then, the state equation can be written in vector form as correlated. Let k i (ϕ) denote the ith row vector of B(ϕ) placed in a column vector, and let r ∈ N be an even number. Then, it follows that This covariance is generally not zero unless ϕ = 0. Thus, the higher moments of Y (s i )'s are correlated, implying that y(s i )'s are spatially dependent.
For the estimation, it is more convenient to turn the spatial stochastic volatility model into a linear state-space model, and to this end, both Robinson (2009) as for spatial GARCH models. Robinson (2009) approximates the distribution of ε * (s i ) with the normal distribution and proposes a Gaussian pseudo maximum likelihood estimator for the estimation. The pseudo maximum likelihood estimators obtained in this manner may attain the standard large sample properties, but they tend to have poor finite sample properties, as mentioned above. Alternatively, Taşpınar et al. (2021) propose to approximate the distribution of ε * (s i ) using a mixture of Gaussian distributions as in (64) so that the resulting estimation system turns into a linear Gaussian state space model. They then use the data augmentation technique to facilitate the Bayesian estimation by treating h as an additional parameter vector.
The Bayesian MCMC algorithm is described in detail in Algorithm 2 shown in the Appendix.

Spatiotemporal volatility models
In this section, we discuss spatiotemporal volatility specifications, which may allow for instantaneous spatial effects. In the context of these models, Y t (s) is now repeatedly observed for t = 1, . . . , T and at all locations s ∈ D. In the case of spatial econometrics models, the set of locations is assumed to be constant over time. This means that we can observe the outcome variable's realisation in the same space across multiple time periods. It is important to note that spatiotemporal models are naturally included in the purely spatial models, as time could be considered as one dimension of the points s. In such cases, the weight matrix W has to be chosen accordingly so that future values do not influence past observations, as we will point out in Section 4.1.
Compared to multivariate time series described in Sections 2.1.2 and 2.2.2, spatiotemporal models account for spatial, temporal, and spatiotemporal dependence. Typically, a certain (geographical) structure of the effects is assumed to interpret the parameters in a geographical sense. Moreover, this implied structure makes the models suitable for cases when n is larger than T .
As an example of a spatiotemporal process, we can consider the log-returns of the condominium sales in Berlin across time. Figure  Another example of a spatiotemporal process from finance is the series of returns of all Dow Jones stocks across time. The similarity between the companies leads to interdependence between the series. For instance, Fülle and Otto (2023) showed that the similarity of firms regarding their balance sheet data can be used to model interactions in the log-volatilities across financial networks. To represent the closeness of the stocks, we have plotted the returns in an artificial space reflecting the distances in the volatility behaviour according to Piccolo (1990), so-called Piccolo distances. Figure 6 shows the log returns and squared returns for one selected time point, 30 December 2022, in a spatial representation on a map with Voronoi cells separating the stocks.

Multi-index representation
In general, spatiotemporal models are already included in the spatial models when time is treated as one dimension of the locations s, and the spatial weight matrices are chosen appropriately. All the above-mentioned models assume an arbitrary (finite) dimension of the underlying spatial domain. For instance, consider a spatiotemporal process on the surface of  the Earth with degrees latitude and longitude, (lat, long) ′ . Then, the index of the i-th observation, s i , would be composed of the spatial coordinate (lat i , long i ) ′ and the time point t i of the i-th observation, i.e., s i = (t i , lat i , long i ) ′ . The key difference between the time index and the spatial coordinates is that future observation cannot influence past observations. That is, the weight matrix W must account for the causal ordering in time, and w ij must be equal to zero if t i > t j . For example, let Y = (Y (1, lat 1 , long 1 ), . . . , Y (1, lat n , long n ), Y (2, lat 1 , long 1 ), . . . , Y (T, lat n , long n )) ′ be an nT dimensional vector of a spatiotemporal random process at n locations and T time points. With two spatial weight matrices where L 1 is a T -dimensional shift matrix (i.e., first subdiagonal equalling one), ⊗ is the Kronecker product, and W is a regular n-dimensional spatial weights matrix (e.g., contiguity matrix), we obtain a spatiotemporal GARCH model with a first-order temporal lag implied by W time and a first-order spatial lag with a constant weight matrix W directly from spatial GARCH models as proposed by Otto and Schmid (2022). Similarly, the log-GARCH models of Sato and Matsuda (2021) can be constructed for spatiotemporal data. In such a way, spatiotemporal processes can be modelled using spatial volatility models, and all theoretical results can be directly applied in the spatiotemporal setup.
Notice that the above-defined matrices are sparse. Current computational algorithms for sparse matrices are highly efficient from a time and memory perspective, as they only store the indices and values of the non-zero entries in the weight matrices. Thus, spatiotemporal models can often be estimated in this multi-index representation. However, when not using sparseelement objects and operations, the computational requirements can quickly explode with an increasing dimension n and T . Thus, it is generally meaningful if the dimension of the spatial weight matrices is as small as possible, and we will return back to the representation with an index t below, i.e., Y t (s) is observed for t = 1, . . . , T at all locations s ∈ D.

Spatiotemporal ARCH and GARCH models
The total number of coefficients in multivariate ARCH and GARCH models can increase faster than the cross-sectional dimension of these models. For example, in the BEKK specification considered by Engle and Kroner (1995), the number of parameters has an order of O(n 2 ), where n is the cross-sectional dimension of the model. Hence, multivariate models are often not applicable in realistic spatiotemporal settings. Caporin and Paruolo (2015) consider parsimonious structured specifications using spatial econometrics tools. Consider the following BEKK specification: where Y t = (Y t (s 1 , . . . , Y t (s n )) ′ is the n × 1 vector of the outcome variable, Σ t is the n × n matrix of covariances, ε t is the n × 1 vector of i.i.d. random variables that have 0 mean and unit variance, and A, B and C are n × n matrices of unrestricted parameters. Caporin and Paruolo (2015) re-parametrise this model such that the number of parameters has an order of O(n) by setting: for j = 0, 1, where W is an n × n spatial weights matrix, and s (1) , v, α (j) and β (j) are n × 1 vectors of unknown parameters. Note that a homogeneous parameters version can be obtained by assuming that the parameter vectors do not vary over the cross-sectional dimension. In this specification, the spillover effects arising through AY t−1 Y ′ t−1 A ′ and BΣ t−1 B ′ depend on the specification adopted for W. Caporin and Paruolo (2015) discuss alternative specifications for W and consider a quasi-likelihood estimation approach for the model. Billio et al. (2021) suggest an extended version of this model by assuming that W has a time-varying structure.
However, notice that these models do not allow for instantaneous spatial spillovers in a GARCH sense. All spatial interactions enter the model at the first temporal lag, i.e., it must always take one time period for information to spill over to neighbouring locations.

Otto et al. (2022a) consider a spatiotemporal ARCH model with a logarithmic represen-
tation for the volatility equation. This model takes the following form: for i = 1, 2, . . . , n and t = 1, . . . T . Here, log h t (s i ) is considered as the log volatility term in the region s i at time t, and ε t (s i )'s are i.i.d. random variables that have mean zero and unit variance. In (74), {m l,ij } p l=1 , for i, j = 1, . . . , n, are the non-stochastic spatial weights, where p is a finite positive integer, and {m l,ii } p l=1 are zero for i = 1, . . . , n. The spatial, temporal, and spatiotemporal effects of the log-squared outcome variable on the log-volatility term are measured by the unknown parameters γ 0 , {ρ l0 } p l=1 , and {δ l0 } p l=1 , respectively. In (74), x t (s i ) is a k × 1 vector of exogenous variables with the associated parameter vector β 0 , and the regional and time fixed effects are denoted by µ 0 = (µ 0 (s 1 ), . . . , µ 0 (s n )) ′ and α 0 = (α 10 , . . . , α T 0 ) ′ . Both µ 0 and α 0 can be correlated with the exogenous variables in an arbitrary manner.
To motivate the presence of spatial, temporal, and spatiotemporal effects in the log-  Applying the log-squared transformation to the outcome equation in (73) yields where Y * t (s i ) = log Y 2 t (s i ), h * t (s i ) = log h t (s i ) and ε * t (s i ) = log ε 2 t (s i ). Then, the model can be expressed in vector form as where M l = (m l,ij ) is the n × n spatial weights matrices, s 1 ), . . . , x t (s n )) ′ , and 1 n is the n × 1 vector of ones. The process in (77) indicates that h * t depends on the higher-order spatial lags  (Otto et al., 2022a) of Y * t and Y * t−1 . Substituting (77) into (76), we obtain for t = 1, . . . , T . The elements of ε * t in (78) are i.i.d across i and t but their mean may not be zero. Otto et al. (2022a) suggest adding and subtracting E (ε * t ) to obtain the following estimation equation: where u t = (u t (s 1 ), . . . , u t (s n )) ′ = ε * t − E (ε * t ), and µ ε = E (ε * t (s i )). Let σ 2 0 = E(u 2 t (s i )). Then, it follows that the elements of u t are i.i.d across i and t with mean zero and variance σ 2 0 . To eliminate µ 0 and µ ε 1 n from the model, they consider an orthonormal transformations. Otto et al. (2022a) suggest a GMM approach based on a set of linear and quadratic moment functions (Lee and Yu, 2014). The Monte Carlo simulation results indicate that the suggested GMM estimator has good finite sample properties. For details, we refer the interested reader to Otto et al. (2022a) directly.
When considering the same temporal ARCH parameter across spatial locations, one can see an additional benefit of spatiotemporal ARCH and GARCH models. Whereas their timeseries counterparts often require long time series (without structural breaks) to reliably estimate the parameters, spatiotemporal models can also be estimated for shorter time frames because they can borrow information from all locations to estimate the temporal ARCH parameter γ 0 . Simultaneously, spatial interactions between the series are included to account for the cross-sectional dependence.
We conclude this section by considering the circular spatiotemporal GARCH model considered by Hølleland and Karlsen (2020). This model is a circular version of the following spatiotemporal GARCH model: where {V t (s)} is a sequence of i.i.d. random variables that have zero mean and unit variance, , µ is an unknown scalar parameter, . . . , p} and β = {β i (v), v ∈ ∆ 1i , i = 1, . . . , p} are unknown parameters. Let R = Z d /(mZ d ) be the quotient group of order m ∈ Z d + . The circular model considered in Hølleland and Karlsen (2020) is obtained by replacing Z d with R. In this way, {Y t (s)} becomes a process indexed on Z × R, and the difference (v − s) and the sum (v + s) are points in R with respect to modulus m. Thus, the circular model takes the following form: Hølleland and Karlsen (2020) present the statistical properties of this circular model and suggest a quasi-maximum likelihood estimator for estimating the parameters.

Multivariate spatiotemporal ARCH models
When we observe vectors of several features across space and time, we can model such data as a multivariate spatiotemporal process. Otto (2022) extended the spatiotemporal log-GARCH models for multivariate response variables. For that reason, they follow the idea of VEC-GARCH models, where the matrix of volatilities is transformed using the vech-operator (Bollerslev et al., 1988;Engle and Kroner, 1995). More precisely, the multivariate spatiotemporal log-ARCH model is given by where Y t is an n × r-dimensional matrix of all r features of the response variable at all n locations. Moreover, Ξ t is an n × r-dimensional matrix of i.i.d random errors with mean zero and unit variance, and H t is the matrix of log-volatilities, which are defined as follows The operations (log) and (log, 2) should be understood as element-wise operations, i.e., H ..,n,j=1,...,r denotes the n×r-dimensional matrix of log-squared observations. The model has three coefficient matrices: (i) the locationspecific intercepts A (n × r-dimensional), (ii) own and cross-variable spatial effects in Ψ (r × rdimensional), and (iii) own and cross-variable temporal effects in Π (r × r-dimensional). Using the log-square transformation, the model can be transformed into a multivariate spatiotempo-ral autoregressive model. Otto (2022) propose a QML estimator based on the normal approximation of the transformed errors. The QML estimator is based on the estimation approach proposed by Yang and Lee (2017) for a multivariate spatial autoregressive model and Yu et al. (2008) for the univariate spatiotemporal autoregressive model. Alternatively, the Gaussian mixture approximation could also be considered.

Spatiotemporal stochastic volatility models
We can utilise two approaches to define a spatiotemporal stochastic volatility model. The first approach, known as the geostatistical approach (Cressie and Wikle, 2015), involves using known parametric covariance functions to model the correlation over time and space. This approach usually includes models obtained by extending a stationary and isotropic Gaussian process (GP), which can be defined in the following way: where σ > 0 is a scalar unknown parameter and V t (s) is a Gaussian process with mean 0, variance 1, and stationary covariance function Cov (V t 1 (s 1 ), V t 2 (s 2 )) = C(d, λ), where d = s 1 − s 2 and λ = t 1 − t 2 . In this approach, C(d, λ) takes a known parametric function that satisfies certain properties such as stationarity, separability, and full symmetry. Gneiting et al. (2007) review main covariance functions suggested in the literature. Craigmile and Guttorp (2011) consider an extension of (86) for temperature series, which takes the form of Y t (s) = σ t (s)V t (s), where σ t (s) is modelled such that it can capture the spatially varying seasonality in the variance of Y t (s). Huang et al. (2011) suggest another extended version taking the following form: where σ > 0 and α > 0 are scalar unknown parameters, h t (s) is a Gaussian process with mean 0, variance 1, and correlation function ρ h , and V t (s) is another independent Gaussian process with mean 0, variance 1 and covariance function ρ V . Huang et al. (2011) assume that the correlation functions ρ h and ρ V are isotropic and stationary in time in the following sense: This model can be considered as a spatiotemporal extension of the non-Gaussian geostatistical model proposed by Palacios and Steel (2006). For the estimation of the model, Huang et al. (2011) consider an approximation to the likelihood function of the model and show how an importance sampling approach and Monte Carlo integration can be used to evaluate and maximise the approximation.
In the second approach, tools from spatial econometrics are used to specify spatial, temporal, and spatiotemporal effects in the log-volatility equation. These effects are incorporated into the model specification through spatial weights matrices that specify the degree of spatial dependence in the outcome variables. Otto et al. (2022b) suggested the following specification: where µ = (µ(s 1 ), . . . , µ(s n )) ′ is the n × 1 vector of the time-invariant site-specific effects, and U t = (u t (s 1 ), . . . , u t (s n )) ′ is the n × 1 vector of i.i.d. disturbance terms with u t (s i ) ∼ N (0, σ 2 ) for all i and t, where σ 2 is a scalar unknown parameter. The n × n spatial weights matrix W specifies the degree of linkages among the elements of h t . The scalar parameter ρ 1 captures contemporaneous spatial correlation, ρ 2 measures the temporal effect, i.e., the time dynamic effect, and ρ 3 represents the spatiotemporal effect, i.e., the spatial diffusion effect. The reduced form of the volatility equation can be expressed as where S(ρ 1 ) = (I n − ρ 1 W) and A(ρ 2 , ρ 3 ) = (ρ 2 I n + ρ 3 W). When the cross-sectional dimension is fixed, the process for the log-volatility is stable if all eigenvalues of S −1 (ρ 1 )A(ρ 2 , ρ 3 ) lie inside the unit ball. Otto et al. (2022b) show that an easy-to-check sufficient condition for the stability of the volatility equation is |ρ 1 | + |ρ 2 | + |ρ 3 | < 1 when W is row normalised. The statistical properties of this model are described in Section of Appendix. Otto et al. (2022b) introduce a Bayesian estimation approach by assuming the following independent prior distributions: ρ 1 ∼ Uniform(−1, 1), ρ 2 ∼ Uniform(−1, 1), ρ 3 ∼ Uniform(−1, 1), , and σ 2 |a, b ∼ IG(a, b). The estimation approach is based on the logsquared transformation and the finite Gaussian mixture approach described in Section 3 for the SSV model. The log-squared transformation gives where Y * introduced in Section 3.2. In both spatial and spatiotemporal volatility models covered in Sections 3 and 4, we usually consider spatial lag terms formulated with W to introduce spatial and spatiotemporal effects in the volatility equations. In Sections 5.2 and 5.3, we consider alternative methods based on the matrix exponential and conditional autoregressive approaches to specify spatial dependence in the volatility equations.

Extensions of spatial stochastic volatility models
Following the time series literature on the stochastic volatility models described in Section 2.2.1, the SSV model described above can be extended in several directions. To this end, we consider several different spatial stochastic volatility models depending on the spatial specification adopted for the outcome and the log-volatility equations. Let x(s i ) = (x 1 (s i ), . . . , x k (s i )) ′ be the k × 1 vector of explanatory variables for i = 1, 2, . . . , n, X = (x(s 1 ), . . . , x(s n )) ′ be the n × k matrix of explanatory variables, and W = (w ij ) i,j=1,...,n and M = (m ij ) i,j=1,...,n be two non-stochastic n × n spatial weights matrices that have zero diagonal elements.
The first extension we consider allows for a spatial lag of the outcome variable as well as some exogenous explanatory variables in the outcome equation: for i = 1, . . . , n. The first-order spatial autoregressive process for Y (s i )'s introduces spatial correlations in the outcome variable. As before, ε(s i )'s are i.i.d. standard normal random variables, and u(s i )'s are an i.i.d. normal random variables with mean zero and variance σ 2 u . The scalar spatial autoregressive parameters are ρ and ϕ, and µ h is the constant mean parameter. We will refer to this model as SAR-SSV.
The spatial autoregressive process allows for the global transmission of a shock. On the other hand, the spatial moving process transmits a shock locally (Anselin, 1988;Fingleton, 2008a,b;Dogan and Taşpınar, 2013;Dogan, 2015). An alternative specification to the SSV model is to use a spatial moving average process for the log-volatility terms: where ϕ is the scalar spatial moving average parameter and u(s i )'s are an i.i.d normal random variables with mean zero and variance σ 2 u . We refer to this model as the SMA-SSV model. One can also allow for both a spatial autoregressive process and a spatial moving average process in the log-volatility equation to define the following extension of the SSV model: where W 1 = (w 1,ij ) i,j=1,...,n and W 2 = (w 2,ij ) i,j=1,...,n are two non-stochastic n × n spatial weights matrices that have zero diagonal elements. We refer to this model as the SARMA-SSV model.
Another variant can be defined by allowing the volatility feedback in the outcome (observation) equation, which can be considered as an analogous version of the model suggested by Koopman and Hol Uspensky (2002). This model is specified as where the scalar parameter α indicates the effect of the stochastic volatility on Y (s i ). We refer to this model as the SARM-SSV model.
An analogous version of the leverage effect model described in Section 2.2.1 can be defined by introducing correlation in the error terms of the outcome and log-volatility equations. This analogous takes the following form: where (ε(s i ), u(s i )) ′ follows the following bivariate normal distribution and ϱ is the unknown correlation parameter. We refer to this model as the SAR-SSVL model.
The next extension considers a scale mixture of normal distribution representation for the outcome variable. This is the analogous version described in Section 2.2.1 and takes the following form: where the latent scale variables ω(s i )'s are i.i.d random variables having distribution IG(ν/2, ν/2).
It is well-known that this representation implies that the marginal distribution of ω(s i ) 1/2 ε(s i ) (unconditional on ω(s i )) is the t distribution with ν degrees of freedom (Geweke, 1993). This specification can be considered as the spatial extension of the stochastic models considered in Harvey et al. (1994), Ruiz (1994) and Jacquier et al. (2004). We refer to this model as the SAR-SSVt model.
The Bayesian estimation approach described in Algorithm 2 can also be considered for some of these alternative specifications. The main requirement of the estimation approach is that the model obtained through the log-squared transformation and the Gaussian mixture approximation should be in the form of a linear Gaussian state space model. Therefore, a similar approach can be adopted to estimate the SAR-SV, SMA-SSV, SARMA-SSV, SAR-SSVL, and SAR-SSVt models. However, the approach described for estimating the SSV model may not be extended to the SARM-SSV model.

Matrix-exponential dependence structure
An alternative way to define spatial lag terms is through a matrix exponential term defined as where α is a scalar spatial parameter. The matrix exponential terms were first considered by LeSage and Pace (2007) to specify spatial dependence in an outcome variable as an alternative to the commonly used spatial autoregressive process. This specification type introduces an exponential decay rate for the cross-sectional dependence and can provide some computational advantages. Su et al. (2023) consider a matrix exponential term for spatial log-ARCH models, which is a special case of their logarithmic spatial heteroscedasticity model (log-SHE model). For further information, refer to Debarsy et al. (2015) and Yang et al. (2021Yang et al. ( , 2022Yang et al. ( , 2023 for the properties of spatial models defined in terms of matrix exponential terms.
Using matrix exponential terms, we can alternatively define the spatial ARCH and GARCH processes mentioned in Section 3.1, respectively, in the following way: where α 1 and β 1 are scalar parameters. When α 1 = β 1 = 0, both processes reduce to h = α 0 1 n .
Since a matrix exponential term is always invertible, the spatial GARCH process in (111) always has the reduced form given by where e −β 1 W 2 is the inverse of e β 1 W 2 . Importantly, this reduced form does not invoke any restriction on the parameter space of β 1 , which was not the case for the model considered in Section 3.1. Similarly, we can also consider the matrix exponential terms to define alternative versions of the spatial stochastic volatility models introduced in Section 3.2. For example, the log-volatility equation in the SSV model can be specified in the following way: where ϕ is the scalar spatial parameter. The reduced form of this process is h = µ h e −ϕW l n + e −ϕW u, which does not require any restrictions for ϕ. Similarly, we can also introduce spatial and spatiotemporal effects in the models considered in Sections 4.2 and 4.4. For example, the matrix exponential version of the spatiotemporal ARCH model considered by Otto et al. (2022a) can be formulated as where e p l=1 ρ l0 M l and e p l=1 δ l0 M l are higher-order terms formulated by the sequence of weights matrices {M l }. Similarly, the log-volatility equation in the spatiotemporal model suggested by Otto et al. (2022b) can alternatively be expressed as where ρ 1 and ρ 3 are spatial and spatiotemporal parameters.

Conditional autoregressive dependence structure
Another alternative way that can be used to introduce spatial and spatiotemporal dependence in a volatility process is to use the conditional autoregressive (CAR) specification. We start with the following model considered by Besag et al. (1991): where µ is a scalar unknown overall mean parameter, ε(s i ) is a disturbance term that has N (0, σ 2 ε ) distribution with the unknown variance parameter σ 2 ε , ϕ(s i ) is a spatially structured random effect term that has the CAR specification in (117). In the CAR process, we assume that b ij 's are known constants with b ij = b ji and b ii = 0, and σ 2 ϕ is an unknown scalar variance parameter. The constants b ij can be considered as spatial weights that determine the relationship between regions s i and sj. For example, b ij may be set to 1 if s i and s j are neighbours, and 0 otherwise. In this specification, the conditional variance of ϕ(s i ) is spatially varying and depends on j̸ =i b ij . Besag (1974) shows that the joint distribution of ϕ = (ϕ(s 1 ), . . . , ϕ(s n )) ′ can be determined as where B is the n × n matrix with the (i, j)th element B ij = b ij n k=1 b ik and D ϕ is the n × n diagonal matrix with the ith diagonal element D ii = 1 n k=1 b ik . Note that this result requires that (I n − B) −1 D ϕ is a positive definite matrix. Yan (2007) extends this model by assuming that ε(s i ) has a stochastic volatility term as specified below: where µ h is a scalar mean parameter and h(s i ) is the log-volatility term assumed to follow the CAR process specified in (119). In the CAR process, the weights c ij play the same role as b ij in the CAR process assumed for ϕ(s i ), and σ 2 h is a scalar variance parameter. As in the case of ϕ, the joint distribution of h is h|σ 2 h ∼ N (0, σ 2 h (I n − C) −1 D h ), where we assume that (I n − C) −1 D h is a positive definite matrix. In order to achieve identification for µ and µ h , Yan (2007) respectively requires that n i=1 ϕ(s i ) = 0 and n i=1 h(s i ) = 0. The posterior distribution of the model can be expressed as Yan (2007) (116) and (117) that involves an alternative CAR process for the random effect term ϕ(s i ). In vector form, their version can be specified as where ε = (ε(s 1 ), . . . , ε(s n )) ′ is the vector of disturbance terms, ϕ = (ϕ(s 1 ), . . . , ϕ(s n )) ′ is the vector of random effects, σ 2 ϕ is a scalar variance parameter and ρ is a scalar spatial parameter. Parent and LeSage (2008) consider this model for the knowledge spillovers arising from patent activity between European regions and specify the elements of the diagonal matrix M as the output gap and the elements of the symmetric matrix W as either based on a technological proximity index or based on an index of transport infrastructure. To ensure that (I n −ρW) −1 M is positive definite, we should require that ρ ∈ (1/ψ min , 1/ψ max ), where ψ min and ψ max are the minimum and maximum eigenvalues of M −1/2 WM 1/2 , respectively. In order to allow for outliers, Parent and LeSage (2008) assume that the elements of ε have a scale mixture of normal distribution, i.e., ε(s i )|ω(s i ), σ 2 ε ∼ N (0, σ 2 ε ω(s i )). The scale mixture components ω(s i )'s are independent with ω(s i )|ν ∼ IG(ν/2, ν/2) and ν ∼ Exp(λ 0 ), where Exp(λ 0 ) is the exponential distribution with the rate parameter λ 0 . For the remaining parameters, Parent and LeSage (2008) assume the following independent prior distributions: (i) σ 2 ϕ ∼ IG(a ϕ , b ϕ ), (ii) σ 2 ε ∼ IG(a ε , b ε ), (iii) µ ∼ N (µ 0 , V 0 ) and (iv) ρ ∼ Beta(a 0 , a 0 ), where Beta(a 0 , a 0 ) is the beta distribution defined as (2008) show that setting a 0 = 1.01 gives a relatively uninformative prior for ρ over the interval (1/ψ min , 1/ψ max ). Under these priors, the conditional posterior distributions of ϕ, ω, µ, σ 2 ϕ and σ 2 ε are in standard forms while those of ρ and ν take unknown forms. In the case of ρ, Parent and LeSage (2008) use the univariate numerical integration over the interval (1/ψ min , 1/ψ max ) to produce the conditional posterior distribution. As for ν, Parent and LeSage (2008) utilised the random walk Metropolis-Hastings algorithm described in LeSage and Pace (2009) to generate posterior draws.

Conclusion and outlook
Spatial and spatiotemporal volatility models constitute a promising new class of models for modelling dependence among the volatility of neighbouring sites in spatial and spatiotemporal data. These models have been recently developed to account for the spatial and spatiotemporal effects in the volatility of an outcome variable. This paper has provided a comprehensive review of the recent literature on spatial and spatiotemporal volatility models. Compared to multivariate time-series GARCH models, which are typically not applicable in spatial settings because of the large number of cross-sectional locations, spatial and spatiotemporal volatility models incorporate a geographical structure to specify dependence in the volatilities. This structure allows modelling spatial and spatiotemporal spillovers across neighbouring locations in a GARCH-like sense. High volatilities may instantaneously spill over to neighbouring regions, thus forming spatial volatility clusters.
Besides motivating different alternative specifications and summarising estimation strategies, we discussed possible extensions and indicated future research directions. Notably, the strand of spatial and spatiotemporal volatility models can be extended in the following ways: 1. Asymmetric and anisotropic spatial and spatiotemporal dependence in volatilities: The study of asymmetric dependence in volatility is crucial for comprehending the dynamics of financial markets and the impact of shocks on volatility. Volatility exhibits a typical asymmetry known as the leverage effect. In time series analysis, exponential GARCH models have proven to be valuable in such scenarios. While exponential GARCH models for spatial data were briefly mentioned in Otto and Schmid (2019), they have not been thoroughly investigated. Considering housing prices, exploring asymmetric spatial spillovers presents an intriguing strand for future research. For instance, determining whether negative shocks on real-estate prices exert a stronger influence on local prices than positive shocks would be a compelling area to investigate.
2. Matrix-exponential specification and conditional autoregressive heteroscedasticity models, and further extensions: The effectiveness of spatial and spatiotemporal models heavily relies on the underlying dependence structure, which is typically unknown in practical applications. Therefore, exploring different dependence structures in future research would be of great interest. For example, matrix-exponential specifications for the spatial dependence in GARCH models and stochastic volatility models present intriguing possibilities, given their computational advantages.
3. Spatial GARCH models for continuous spatial fields: Presently, all spatial and spatiotemporal GARCH models have been designed for discrete spatial domains. Consequently, they cannot be directly utilised for predicting volatility at unknown locations, also known as kriging. This field would also relate to continuous-time GARCH models (see, e.g., Klüppelberg et al., 2004). As a notable exception, Huang et al. (2011) introduced a spatial stochastic volatility model within the geostatistical framework. However, in general, considering spatial (autoregressive) dependence in the volatility of a process has received relatively little attention and holds promise as a compelling direction for future research.
4. Spatiotemporal GARCH model for financial networks: Spatial weights matrices can be interpreted as adjacency matrices in networks, making spatiotemporal GARCH models akin to GARCH models for nodal attributes on networks. In this context, the spatial interactions describe the dependence across a (non-random) network, with W representing the network structure. Consequently, the connections in W should not be strictly understood in a geographical sense, rendering these models appealing for financial network data. For instance, Mattera and Otto (2023) constructed financial networks based on Piccolo distances and utilised spatiotemporal log-ARCH models for volatility forecast-ing. Given the prevalence of GARCH and stochastic volatility models in finance, the application of spatiotemporal volatility models holds promise as a compelling pathway in finance, particularly when dealing with large financial networks.
In summary, we believe that the class of spatial and spatiotemporal models offers intriguing opportunities for new theoretical developments and practical applications. Bearing in mind that a process' volatility is often interpreted as risk, identifying and predicting local areas of high volatilities -risks -is important in various fields, such as finance, economics, or environmental science. In this section, we describe the estimation approach suggested by Dogan and Taşpınar (2023) for the high-order spatial GARCH model in (56). Note that the 10-component Gaussian mixture distribution can be represented using an auxiliary random variable a i ∈ {1, 2, . . . , 10} that serves as the mixture component indicator. In other words, ε * (s i )|(a i = j) ∼ N (µ j , σ 2 j ) and P (a i = j) = c j for j = 1, 2, . . . , 10 and i = 1, 2, . . . , n. Let a = (a 1 , . . . , a n ) ′ , d a = (µ a 1 , . . . , µ an ) ′ , and Σ a = diag(σ 2 a 1 , . . . , σ 2 an ). Then, we have ε * |a ∼ N (d a , Σ a ). Then, from equation (63),

Glossary
Furthermore, let us define Y * * = S −1 (β 1 )G(θ)Y * − S −1 (β 1 )Zδ. Then, it follows that Y * * |a ∼ N (d a , Σ a ). Then, the joint posterior distribution f (θ, δ, a|Y * ) can be stated as where f (Y * |θ, δ, a) denotes the conditional likelihood function of the transformed model. Let b (g) be the draw generated at the gth iteration, where b ∈ {θ, δ, a}. Then, the following Gibbs sampler describes the steps for estimating the higher-order spatial GARCH model.
Algorithm 1 (Estimation of the spatial GARCH model).

Estimation of the spatial stochastic volatility models
In this section, we describe the estimation approach suggested by Taşpınar et al. (2021) for the estimation of the SSV model. Taşpınar et al. (2021) assume that the distribution of ε * (s i ) can be approximated by the 10-component Gaussian mixture distribution stated in (64). The parameters of the 10-component Gaussian mixture distribution were provided in Table 1. Note again that the 10-component Gaussian mixture distribution approximates the distribution of ε(s i ) closely, as shown in Figure 4.
We will again represent the 10-component Gaussian mixture distribution using an auxiliary random variable a i ∈ {1, 2, . . . , 10} that serves as the mixture component indicator.
Algorithm 2 (Estimation of the SSV model).
2. Sampling step for h: where H h = Σ −1 a + 1 3. Sampling step for σ 2 u : 4. Sampling step for µ h : where V µ = V −1 µ + 1 5. Sampling step for ϕ: which does not correspond to any known density function. A random-walk Metropolis-Hastings algorithm can be used to sample from this distribution (LeSage and Pace, 2009).
The simulation results presented in Otto et al. (2022b) demonstrate that this Gibbs sampler performs satisfactorily.