Similarity‐based clustering for patterns of extreme values

Statistical modelling of the magnitude and the frequency of extreme observations is fundamental for a variety of sciences. In this paper, we develop statistical methods of similarity‐based clustering for heteroscedastic extremes, which allow us to group time series of independent observations according to their extreme‐value index and scedasis function (i.e., the magnitude and frequency of extreme values, respectively). Clustering scedasis functions and extreme‐value indices involves the challenge of grouping objects composed of both a function (scedasis) and a scalar (extreme‐value index), and thus the need to partition a product‐space. Our analysis reveals an interesting mismatch between the magnitude and frequency of extreme losses on the London Stock Exchange and the corresponding economic sectors of the affected stocks. The analysis further suggests that the dynamics governing the comovement of extreme losses in the exchange contains information on the business cycle.

Statistical modelling of the magnitude and the frequency of extreme observations is fundamental for a variety of sciences.In this paper, we develop statistical methods of similarity-based clustering for heteroscedastic extremes, which allow us to group time series of independent observations according to their extreme-value index and scedasis function (i.e., the magnitude and frequency of extreme values, respectively).Clustering scedasis functions and extreme-value indices involves the challenge of grouping objects composed of both a function (scedasis) and a scalar (extreme-value index), and thus the need to partition a product-space.Our analysis reveals an interesting mismatch between the magnitude and frequency of extreme losses on the London Stock Exchange and the corresponding economic sectors of the affected stocks.The analysis further suggests that the dynamics governing the comovement of extreme losses in the exchange contains information on the business cycle.
cluster analysis, cluster functions and scalars, extreme values, risk diversification, statistics of extremes

| INTRODUCTION
Small-probability events-such as a stock market crash-often lead to devastating economic and financial aftershocks.The need to assess the likelihood of such rare events is now widely understood, and statistics of extremes (Balkema & Embrechts, 2007;Beirlant et al., 2004;Coles, 2001;Davison & Huser, 2015;de Haan & Ferreira, 2006;Embrechts et al., 1997;Resnick, 2007) provides an appropriate probabilistic framework to address this issue in a mathematically rigorous manner.An overarching principle of statistics of extremes is that any sensible assessment of risk requires the application of resilient methods that are able to extrapolate into the tails of a distribution-often beyond observed extremes in a dataset.For random samples, a key result in statistics of extremes is that if there is a nontrivial limiting distribution for the normalized sample maxima, then it must be a generalized extreme-value (GEV) distribution (McNeil et al., 2015, Theorem 7.3).The shape parameter of the GEV distribution governs the rate of tail decay, and is also known as extreme-value index, or tail index.
The main goal of this paper is to develop cluster analysis methods for highly volatile time series of extremes.Applying methods from statistics of extremes to finance has a long history, including, Danielsson and de Vries (1997), Longin and Solnik (2001), Poon et al. (2003), Herrera and Schipp (2013), Hilal et al. (2014), andChavez-Demoulin et al. (2014).The monograph recently edited by Longin (2016) contains an up-to-date survey of methods and applications for statistical modelling of extreme values in finance.To model the dynamics of extremes over time, parametric methods to handle nonstationary data have been proposed by Davison and Smith (1990), Chavez-Demoulin and Davison (2005), and Eastoe and Tawn (2009), among others.More recently, Einmahl et al. (2016) have developed a semiparametric modelling and inference framework for heteroscedastic extremes; their two main objects of interest being the scedasis function, which describes the dynamics of extremes over time, and the extreme-value index, which describes the magnitude of extremes.
A recent paper by Ando and Bai (2017), which relates to ours in terms of applied motivation, considers the issue of clustering financial time series based on observable and unobservable factors.In contrast, our approach clusters financial time series based on their resemblance in terms of risk.More precisely, we build on Einmahl et al. (2016) and develop a clustering algorithm that allows us to group stocks that share similarities in their univariate distribution in terms of the overall magnitude and the frequency of extreme losses.Since the extreme-value index carries information on the rate of tail decay while the scedasis tracks the evolution of extreme losses over time, these parameters form a natural basis for clustering stocks to address our motivating question.
Clustering is an unsupervised learning problem, in the sense that the 'true' cluster labels are unknown and need to be estimated from the data.Introductions to the subject of cluster analysis include Hastie et al. (2009), Everitt et al. (2011), andKing (2014).The huge literature on data clustering is difficult to survey in a few paragraphs, but a concise description of mainstream approaches is offered by Hennig and Liao (2013).The most popular clustering approaches may be classified among similarity-based, model-based, and/or hierarchical clustering techniques.Similaritybased methods include K-means (MacQueen, 1967) and K-medoids (Kaufman & Rousseeuw, 1987).Model-based clustering is typically based on mixture models (Fraley & Raftery, 2002), whereas hierarchical clustering builds hierarchies of clusters, often represented in so-called dendograms (Hastie et al., 2009).Hennig and Liao (2013) recently highlighted the lack of clear guidance on choosing an appropriate clustering algorithm for a given problem, despite the many approaches that have been proposed in the literature.The literature discussing extremes is much sparser, although much has been written about the clustering of extreme events within time series induced by short-term temporal dependence (see, e.g., Leadbetter et al., 1983).Clustering algorithms for parallel time series, tailored to specific extreme-value applications, are often intrinsically related to the concept of dimension reduction.For example, Bernard et al. (2013) developed a clustering method for spatial climate extremes, and Chautru (2015) and Vettori et al. (2020) proposed clustering and dimension reduction techniques for multivariate extreme events based on tree mixtures.
Clustering stocks with a similar scedasis function and extreme-value index entails clustering objects defined by the combination of a function (scedasis) and a scalar (extreme-value index); recently, there has been much interest in clustering complex objects such as functions, and the development of methods for clustering functional data is still an area of ongoing research, both from applied and theoretical viewpoints (Delaigle et al., 2012;Peng & Müller, 2008;Wang et al., 2015).
To our knowledge, our paper is pioneer on tackling the challenging problem of performing a cluster analysis of both functions and scalars; to tackle this problem, we develop a clustering approach that can be seen as a K-means method, but where clustering is executed in a product-space defined in terms of the space of all scedasis functions and the positive real line.A weight parameter is then applied as defined by the analyst to control whether the clustering algorithm should prioritize the frequency or the magnitude of extreme losses, which are respectively controlled by the scedasis function and the extreme-value index.The proposed approach also allows for clustering stocks with similar risk loss patterns, by identifying affinities in time-varying value-at-risk functions.To examine the periods of highest market uncertainty and how these affect different stocks, we also explore how the maxima of the stocks' scedasis functions are spread over time.Intuitively, if the maxima of all scedasis functions coincided at a single time point, this would imply a perfect synchronization among all the examined stocks, and thus the largest market fluctuations would reflect similarly on all stocks.To summarize this information, we define the mode mass function, which can be used to assess the degree of synchronization of the maximum frequency of extreme losses between stocks.The latter approach has some connections with multivariate extreme value modelling, which has been widely used by practitioners after the seminal papers of Longin and Solnik (2001) and Poon et al. (2004).Note, however, that in the current paper we focus on univariate, but time-varying, characteristics of extremes, and as such we do not estimate extremal dependence.
In Section 2, we discuss a K-cluster configuration for statistics of heteroscedastic extremes, and we introduce our similarity-based clustering algorithm, as well as the mode mass function.Numerical experiments are reported in Section 3, and the analysis of the London Stock Exchange data is documented in Section 4. Section 5 concludes with a discussion.

| Cluster-based proportional tails model
Our starting point for modelling relies on an extended version of Einmahl et al. (2016) formulated for K time series clusters.We assume that the true data generating process consists of N time series partitioned into blocks with homogeneous scedasis functions and extreme-value indices.
Specifically, the N time series are partitioned into K ⩽ N blocks Y ½1 , …, Y ½K , where the kth block consists of N ½k time series of length T, that is, In Section 2.2, we assume that we observe all time series, but we ignore their underlying cluster labels (i.e., the first element in the superscripts in (1)), which need to be estimated.Here, we assume that the jth time series in the kth cluster, that is, the jth row in group Y ½k , is formed by independent random variables with infinite upper endpoints and with respective continuous distributions F ½k,j 1 , …,F ½k,j T .Following Einmahl et al. (2016), we assume that there exists a non-negative limit function c ½k , called the scedasis function, such that for each time series j in the kth cluster, for some unknown continuous baseline distribution function F ½k,j with an infinite upper endpoint.Notice that in each cluster (k) the different time series may possess various baseline distributions, but they share the same scedasis function (c ½k ), as the left-hand side of (2) does not depend on j.For identifiability reasons, we assume that Ð 1 0 c ½k ðwÞdw ¼ 1, such that c ½k is a valid density on the unit interval ½0,1.Generally speaking, the scedasis density carries information on the time dynamics of extremes; for instance, a uniform scedasis function (for which c ½k ðwÞ ¼ 1, w ½0,1) corresponds to a constant frequency of extremes over time.We further assume that for each cluster k, time series j, and time for some constant γ ½k > 0, where U ½k,j t Þg and ' ' is the left continuous inverse function.In other words, the data in each cluster k are heavy-tailed with rate of tail decay controlled by the extreme-value index γ ½k .Thus, despite the fact that all variables in cluster k have their own distributions, we assume that their tail behaviour is characterized by the pair ðc ½k , γ ½k Þ.Therefore, the extremal behaviour of these N ½k time series can be summarized by a single function c ½k , describing the time variation in the frequency of extremes and a scalar γ ½k > 0, controlling the overall amplitude of the largest extremes.
For example, (2) and (3) are satisfied when the data have a Pareto-like tail, that is, where a ½k,j > 0 is an overall scale parameter, in which case the baseline distribution may be chosen as the Pareto distribution , with y > a ½k,j .From this example, it is clear that the extreme-value index γ ½k controls the tail weight, while c ½k ðt=TÞ is a time-varying relative scale component, both specific to the kth cluster.The extra time-invariant overall scale parameter a ½k,j > 0 accounts for possible differences in scales between the different time series grouped together in the same cluster.

| Estimation and inference
As mentioned above, in practice the cluster labels are unknown, so instead of having data partitioned as in (1), we observe N time series written as Y i 1 , …, Y i T , which consist of independent observations respectively distributed as F i 1 , …, F i T , for all i, but we ignore the data structure and their tail properties.Instead, the first step of our approach targets estimations for a family of scedasis densities and extreme-value indices denoted by Under the K-cluster proportional tails model described in Section 2.1, some of these scedasis functions and extreme-value indices are the same, but we estimate them separately.
be the order statistics from the ith time series, Y i 1 , …, Y i T , and k k T be an intermediate sequence, that is, k T !∞ and k T =T !0, as T !∞.To estimate each element in the family of scedasis functions, we use the nonparametric kernel-based estimator of Einmahl et al. ( 2016) for all i, that is where It is well known that the choice of kernel does not affect the inference much, whereas bandwidth selection is crucial.Following Marron (2001, p. 533), we advocate conducting inference over a wide range of bandwidths as a way to assess the sensitivity and reliability of the inference to the bandwidth.
To estimate the family of extreme-value indices, we follow Einmahl et al. (2016) and use the Hill estimator defined as for all i.As shown in Einmahl et al. (2016), the estimator ( 6) is strongly consistent and asymptotically normal under the heteroscedastic proportional tail model described in Section 2.1 and mild additional regularity conditions.Furthermore, it does not depend on the associated scedasis Asymptotic normality, and thus weak consistency, for the kernel-based estimator ( 5) is also established in Einmahl et al. (2016, p. 49), but under rather stringent conditions.These appealing large sample properties suggest that our clustering algorithm developed in Section 2.3 should perform well in practice, which is confirmed by our numerical experiments in Section 3, especially for a large sample size T. The outcome of this estimation stage yields N pairs of scedasis and extreme-value estimates, fðĉ i , γi Þg N i¼1 , and we next describe how such pairs can be partitioned into K clusters.

| Similarity-based clustering for heteroscedastic extremes
Clustering scedasis functions and extreme-value indices requires overcoming the challenge of clustering objects composed of both a function (scedasis) and a scalar (extreme-value index).To approach this problem, we consider a similarity-based clustering algorithm.Similarity-based clustering methods usually require the true number of clusters K ⩽ N to be either known or specified a priori.For these methods, a data point T should be assigned to a certain cluster if its distance to the cluster centre T is minimal.Different variants lead to different concepts of a cluster centre, the most well-known being K-means (MacQueen, 1967) and K-medoids (Kaufman & Rousseeuw, 1987).Whereas in the case of the K-means algorithm a cluster centre m ½k is defined by a statistic (e.g., a cluster mean), in the K-medoids algorithm a cluster centre is defined by a data point itself.To measure the dissimilarity of a data point x i from a cluster centre m ½k , it is common to resort to a dissimilarity function D : ℝ 2d !½0,∞Þ, such that for all possible data points and cluster centres the following properties are satisfied: The most widely used dissimilarity function is the squared Euclidean distance As mentioned above, here we are faced with the need of extending the standard similarity-based clustering framework so to be able to cluster both a function (scedasis) and a scalar (extreme value index).We mostly focus on the case where each cluster Y ½k in (1) is characterized by both the same scedasis function and extreme-value index, which we refer to as partitioning in the product-space Π ¼ C Â ð0, ∞Þ, where C denotes the space of all scedasis functions (i.e., all densities defined on the unit interval).However, it might also be interesting in practice to cluster time series according to their scedasis function-resulting in K c clusters-or extreme-value index-resulting in K γ clusters.We refer to these cases as partitioning in the profile-spacesC and ð0,∞Þ.Clearly, K does not necessarily coincide with K c and K γ .This is the case when there are no two time series that have the same scedasis function but a different extreme-value index, or vice versa.In general, one has Clustering is performed here with the aid of an encoder, that is, a map L : f1,…, Ng !f1,…, Kg assigning each time series i ¼ 1,…,N to a cluster label k f1,…,Kg.Standard encoders assign single observations to a cluster (Hastie et al., 2009), while here whole sets of observations are assigned simultaneously to a cluster.
In the time series context of heteroscedastic extremes, suppose that we are given a family of estimated scedasis functions and extreme-value indices, fðĉ i , γi Þg N i¼1 , obtained using the procedure discussed in Section 2.2.We measure dissimilarity by relying on a family of functions being dissimilarity measures specific to the scedasis function and to the extreme-value index, respectively.Equation ( 7) requires dissimilarity in the product-space to be compatible with the dissimilarities in the corresponding profile-spaces, when α !0 and α ! 1. Roughly speaking, α ð0,1Þ can also be interpreted as a tuning parameter, controlling whether the dissimilarity function should apply more weight to the dissimilarity in terms of the scedasis density or the extreme-value index.Examples 1 and 2 below are two possible valid families of dissimilarity functions obeying ( 7), although many others may be designed.Example 2. (Max-linear dissimilarity in product-space).Let D max Algorithm 1 describes the K-means type of algorithm underlying our clustering procedure, detailing the three main steps: Initialization, cluster (re)assignment and cluster-centre updating, of which the last two need to be recursively iterated until convergence.Below, we focus on although there are many other variants, for example, replacing the ℓ 2 dissimilarity with the ℓ 1 dissimilarity, or focusing on specific features of the scedasis function (e.g., its maximum) rather than its overall behaviour.
Unreported results show that the choice of dissimilarity measure (e.g., ℓ 1 versus ℓ 2 ) does not affect the results much.With our choice of DE CARVALHO ET AL.
dissimilarity measure, the algorithm outlined in Algorithm 1 becomes a genuine K-means algorithm in the sense that cluster centres are indeed found using the actual means; more precisely, Step 2 can be replaced with μ ðnewÞ ¼ fðc ½k , γ ½k Þg K k¼1 , where with N½k ¼ P N i¼1 IðLðiÞ ¼ kÞ, for all k; see Appendix A for a proof.Note that c ½k ðwÞ and γ ½k are compatible with the assumptions made in Section 2.1, namely, Ð 1 0 c ½k ðwÞdw ¼ 1 and γ ½k > 0. This implies that the cluster centres are formed by valid scedasis functions and extreme-value indices, corresponding to a distribution in the maximum domain of attraction of the Fréchet distribution.
A K-medoids type of algorithm can also be applied, in which case the cluster centres correspond to the scedasis and extreme-value index of the observed time series.In practice, this implies that Steps 0 and 1 are the same as for the K-means algorithm but that the cluster-centre updating rule in Step 2 needs to be modified as follows: For each cluster k, search for the time series in this cluster that minimizes the total dissimilarity to other time series in the same cluster , γi 0 ÞÞ, and set c ½k ĉi * k , γ ½k γi * k , for all k, and The K-medoids algorithm requires however a larger computational investment than the K-means approach.
The cluster centres of the K-medoids algorithm, c ½k ¼ ĉi * k and γ ½k ¼ γi * k , are obviously compatible with the assumptions made in Section 2.1; that is, c ½k is a valid scedasis function and γ ½k > 0, for all k.Other variants of the K-means and K-medoids algorithms for clustering heteroscedastic extremes can also be constructed similarly, for example, using Partition Around Medoid (PAM) and Clustering Large Applications (CLARA) (Kaufman & Rousseeuw, 2009, Ch. 2-3).

| Clustering risk loss patterns
This section capitalizes on the setup introduced so far to offer another natural way to cluster financial time series.Time-varying value-at-risk (VaR), at confidence level p ð0,1Þ, can be defined as VaR t ðpÞ ¼ inffy : F t ðyÞ ⩾ pg, where p is typically taken as a high probability such as p ¼ 0:95 or p ¼ 0:99.More specifically, using the scedasis function and extreme-value index estimators (5) and ( 6), respectively, we can obtain an estimator for the (functional) value-at-risk for each time series i ¼ 1, …, N, by where with N½k ¼ P N i¼1 IðLðiÞ ¼ kÞ; the proof is tantamount to that in Appendix A.

| Mode mass function
To summarize the periods of highest market uncertainty and to see how these periods reflect on each specific stock, we suggest studying how the modes of the various scedasis functions for different stocks are spread over time through the mode mass function.Assume that each estimated scedasis function ĉi has a unique global maximum and define its unique mode m i ½0,1 as Note that (11) can also be written as , under the Pareto-like tail specification in (4); recall Equation ( 9).A smoothed version of the mode mass function may be computed using a beta kernel density as where βðw; a, bÞ denotes the Beta density function with parameters a, b > 0. In the parametrization (12), the beta kernels have mean m i and ν > 0 is a concentration parameter.From an analytical perspective, the smoothed mode mass function in ( 12) is related to the smooth Euclidean likelihood spectral density estimator proposed by de Carvalho et al. (2013, p. 16).Conceptually, it operates similarly to standard kernel smoothing, in the sense that it 'centres' a beta kernel at each scedasis mode, but here the kernels are asymmetric for m i ≠ 0:5 and they are tailored for observations defined in the unit interval.
If the modes of all scedasis functions coincide at a single point in time, that is, MðwÞ concentrate around a certain w 0 ½0,1, then the periods at which the frequencies of extreme losses are the largest for the different stocks are perfectly synchronized; on the other hand, if the modes of the scedasis functions are uniformly spread over time, that is, MðwÞ ≈ 1, w ½0,1, then this suggests that the different stocks' maximum frequencies of extremes are not related at all.

| Simulation setting and preliminary experiments
We start by describing the data-generating processes used to assess the performance of our clustering algorithm.We simulate independent observations according to the structure (1) using a generalized extreme-value distribution defined as , and time series j ¼ 1,…,N ½k , where a þ ¼ maxð0,aÞ.We consider three simulation scenarios.
In Scenario A, the true scedasis densities are defined as c ½k ðwÞ ¼ τðw; p k Þ, p k ½0,1, where, yielding triangular-shaped scedasis functions with peaks at the centre point w ¼ 0:5, and the true extreme-value indices are set to γ ½k ¼ 0:7 or 1.In Scenario B, the true scedasis functions c ½k ðwÞ ¼ βðw; α k , β k Þ are Betaðα k , β k Þ densities and the extreme-value indices are set to γ ½k ¼ 0:7 or 1.In Scenario C, the scedasis densities are either τðw; p k Þ or βðw; α k , β k Þ, and the extreme-value index is set to γ ½k ¼ 0:7.
The simulation settings and our choice of parameter values are reported in Table 1 and illustrated in Figure 1; for Scenarios A and B, we consider K ¼ 4 true clusters in the product-space, but K c ¼ K γ ¼ 2 clusters in the respective profile-spaces, comprising a total of N ¼ 30 time series.
We simulate five independent time series with parameters ðc 1 , γ 1 Þ, five with ðc 2 , γ 1 Þ, 10 with ðc 1 , γ 2 Þ, and 10 with ðc 2 , γ 2 Þ; the scedasis densities c 1 , c 2 with their parameters and the extreme-value indices γ 1 , γ 2 are specified in Table 1.Scenario C differs from this setting, as it is characterized by three different scedasis functions and a single extreme-value index, which yields K ¼ K c ¼ 3 and K γ ¼ 1.For each cluster, we simulate N ½k ¼ 10 time series independently for a total of N ¼ 30 time series.
For Scenarios A, B, and C, a single-run experiment was performed using our methods to obtain the estimated cluster centres and scedasis functions.These are compared with the true scedasis functions in Figure 1.To estimate each scedasis function, we consider the sample sizes T ¼ 500,1000,2000,5000 with the number of upper order statistics k ¼ 34,62,112,250, chosen according to the intermediate sequence k T / bT=logTc, which corresponds to the increasing threshold probabilities 93.2%, 93.8%, 94.4%, and 95.0%, respectively.We use the kernel-based estimator of Einmahl et al. (2016) with K b ðwÞ ¼ 15f1 À ðw=bÞ 2 g 2 =ð16bÞ, for w ½À1,1.We select the bandwidth values based on b ¼ Ak À1=5 , with the constant A > 0 set by visual inspection in order to improve estimation in a sensitivity analysis (not shown) considering a grid of possible values for A; using this approach, the values of the bandwidth b that we consider here are 0.2, 0.15, 0.1, and 0.1 for each sample size, respectively.The extreme-value index is estimated using the Hill estimator.
Here and below, we use the convex linear combination dissimilarity from Example 1, that is, recall that in such a case the cluster centres can be obtained as in (8).In Figure 1, we set α ¼ 0:5.While Figure 1 suggests that our clustering method performs rather well, firm conclusions can only be drawn from a rigorous Monte Carlo study.This is investigated in the next section.

| Monte Carlo simulations
Here, we conduct an extensive simulation study to assess the performance of our clustering algorithm in grouping different time series into separate clusters defined in terms of magnitude and dynamics of the extremes.We calculate the Rand (Rand, 1971) and silhouette indices (Rousseeuw, 1987), whose mathematical definition are given in the Supporting Information, based on 1000 runs of our algorithm under the settings described in Section 3.1.Notice that the silhouette index can be computed even if the true data partition is unknown.By contrast, the Rand index is used to measure the similarity between two different partitions, one of which is considered the true partition in our simulation study.
Table 2 reports the Monte Carlo means of the Rand and silhouette indices for the simulation settings described in Section 3.1, as a function of the sample size T and the weight parameter α.We can observe that both validity measures are almost always maximized when α ¼ 0:5 for all scenarios (i.e., 'weighting' equally the scedasis density and the extreme-value index).However, as T !∞, the clustering performance in Scenario C should improve as α ! 1 (i.e., more weight on the scedasis density than on the extreme-value index), because K γ ¼ 1 and therefore all information for identifying distinct clusters is contained in the scedasis profile space.At subasymptotic levels (in this case, T ¼ 5000 and k ¼ 250), values of α close to 0.5 seem to work better.Furthermore, as expected, performance improves as T increases; so with larger sample sizes, the estimation of the scedasis functions and extreme-value indices in the first step of our procedure is more accurate, which then in the second step reflects on the estimated clusters.
When the number of clusters K is unknown or uncertain, as is typically the case in practice, it is useful to perform clustering over a range of values for K, and then compare the resulting diagnostics.We now explore the use of the elbow method, proposed by Tibshirani et al. (2001) and adapt it to clustering heteroscedastic extremes.The method is presented here specifically for the K-means algorithm and convex combination D α of ℓ 2 dissimilarity measures (recall Example 1), but it can also be designed more generically.The elbow method is based on the statistic D ½k , defined as T A B L E 1 Data-generating scenarios.For Scenarios A, B, and C, each row reports the number of clusters in the product-space (K), the number of clusters in the respective profile-spaces (K c and K γ ), the different scedasis functions and extreme-value indices involved, and the cluster sizes.
where I ½k is the index set corresponding to the kth cluster C ½k , and μ ½k ¼ ðc ½k , γ ½k Þ is the cluster centre obtained using the K-means algorithm; D ½k represents the sum of intracluster dissimilarities between 'points' in a given cluster C ½k , and the corresponding cluster centres.The sum of normalized intracluster sums of squares measures the overall 'compactness' of our clustering procedure: The dashed blue lines depict the cluster centre estimates produced by our method, plotted against the true scedasis functions (solid black) defined in Table 1; the scedasis function estimates are depicted in grey.Scenarios A, B, and C (left to right) are presented for sample sizes T ¼ 500,1000,2000,5000 (top to bottom).
The elbow method runs the K-means algorithm for heteroscedastic extremes on the dataset for a range of values of K and then plots the sum of squared errors W K as a function of K.The plot should, in principle, be monotonically decreasing with K until the true value of K is reached, and then stabilize for larger values of K.The optimal number of clusters is therefore found at the 'elbow' of the plotted curve.To illustrate this approach, we consider the three aforementioned simulation Scenarios A, B, and C with a sample size T ¼ 5000 and weight parameter α ¼ 0:5.The results are displayed in Figure 2. As expected, the 'elbow' is found at K ¼ 4 for Scenarios A and B, but at K ¼ 3 for Scenario C, which all correspond to the true number of clusters.

| London Stock Exchange: Data and main modelling goals
We gathered the data from Yahoo Finance.After some preliminary analysis detailed in Section 4.2, we retained 139 stocks from the London Stock Exchange, and Figure 3 presents the daily negative log-returns for a selection of 26 stocks, including GlaxoSmithKline, Royal Dutch Shell B, Tesco, among others.The stocks displayed in Figure 3 cover nine economic sectors of the exchange, namely, oil and gas, basic materials, industrials, healthcare, consumer goods, consumer services, financials, utilities, and technology; about 40% of these selected stocks are FTSE 100 companies.
The period under analysis ranges from December 1989 to May 2016, and thus it includes stock market crashes clearly visible from the time series plots, such as black Wednesday (1992), the dot-com bubble (2000), and the most recent financial crisis (2008)(2009); it also takes into account the turbulence in the eminence of the Brexit referendum, which took place on 23 June 2016.The preparations for post-Brexit have actually been creating a wealth of challenges and opportunities for institutional investors, professional money managers, and traders; for example, there are plans for the exchange to take on board further companies from the Middle East in a near future (Reuters, 2017).One of our applied goals is to develop and apply statistical methods that can be used to partition stocks such as those presented in Figure 3   viewpoint of risk; the latter might, or might not, reflect the nine economic sectors.More precisely, an overall objective of our analysis is to group stocks according to the magnitude and frequency of univariate extreme losses.As negative log-returns can be regarded as a proxy for losses, we focus on the right tail of the data shown in Figure 3.Our quantitative analysis is here intended to complement the overall picture provided by exploratory, qualitative or expert-based diagnostics.For example, a quick visual inspection of Figure 3 could perhaps suggest that Stocks 1 (Royal Dutch Shell B) and 5 (Johnson Matthey) look reasonably similar in terms of magnitude of extreme losses, but whether the dynamics of their extreme losses over time is similar is far from clear.The methods presented in this paper will offer answers to these and related questions.

| Preliminary analysis and selection of stocks
We now apply our model to the London Stock Exchange data briefly presented in Section 4.1.To use our model, we need to test whether the extreme-value index γ is constant over time for each stock.For this, we consider Test 4 of Einmahl et al. (2016, p. 37), based on the comparison of γ estimates from four subperiods.For each stock, we assume that the negative log-returns on each day follow possibly different heavy-tailed distributions as in (1), and we test whether the extreme-value index of the negative log-returns is constant between 1989 and 2016.More concretely, first, we identify the stocks for which we had data available in the period between 1989 and 2016, obtaining 240 companies; second, we consider all stocks for which we do not reject the null hypothesis of constant extreme-value index using the same criteria as in Einmahl et al. (2016, Test 4, p. 37).The main findings are similar for the analysis for the 139 stocks retained according to these two step procedure and for the selection of 26 stocks displayed in Figure 3; therefore, we focus below on the latter for presentation clarity; in the Supporting Information, we also present the extended analysis based on the full dataset with 139 stocks.The 26 selected time series, plotted in Figure 3, are of length T ¼ 6893 days.In the Supporting Information, we show that extremes of these stocks are weakly dependent, and that thus that the methodology proposed in Section 2 can be applied.Table 3 presents a list of the 26 selected firms, their economic sectors, their estimated scedasis functions (recall ( 5)) and the Hill estimates (recall ( 6)) corresponding to the negative log-returns of each of these stocks, setting k ¼ 332 (corresponding approximately to taking the threshold as the 95%-percentile) and b ¼ 0:1.At first glance, we can see that the estimated scedasis functions and extreme-value indices do not appear to be homogeneous within each economic sector, but rather there is a remarkable match between each stock's estimated scedasis function and its corresponding cluster centre as determined by our new clustering procedure; more details about our results are provided in Section 4.3.

| Do clusters mirror economic sectors?
In this section, we examine whether the clusters obtained through our approach have any connection with the stocks' economic sectors.Ex-ante, an investor could be tempted to believe that stocks belonging to the same economic sector might be characterized by similar extreme-value properties, but whether this actually holds true is another matter.
To conduct the proposed inquiry, we run the proposed K-means algorithm for heteroscedastic extremes from Section 2.3 to group the selected companies into new clusters.To run our clustering procedure, we first need to choose the number of clusters K; sum of squared errors W K defined in (13) as a function of K ¼ 1,…, 20 for various weight parameters α ¼ 0,0:1, …,0:9,1.Following the elbow method described in Section 3.2, the number of clusters K should be only about two or three when α is small (placing heavier weight on the extreme-value index), perhaps due to the high variability in the estimation of the extreme-value index γ.For larger values of α (placing heavier weight on the scedasis function), the elbow method suggests that K > 3.However, the optimal value of K is not obvious as the curve does not completely stabilize.Here, we fix K ¼ 9 for the rest of the analysis to contrast our results with the nine economic sectors.To reduce the possibility that the K-means algorithm for heteroscedastic extremes generates a suboptimal clustering, we set the initial cluster centres using the approach by Arthur and Vassilvitskii (2007).In addition, for each value of α, we run the algorithm 500 times and select the solution L * with the smallest value of the objective function IðL * ðiÞ ¼ kÞD α ððĉ i , γi Þ, ðc ½k , γ ½k ÞÞ: Figure 5(i) and Table 4 show the scedasis function estimates of the selected companies and the cluster centre estimates produced by our method, when α ¼ 0:5.The choice of the latter value is motivated by the numerical experiments reported in Section 3, which suggest that the validity measures are almost always maximized when α ¼ 0:5; a sensitivity analysis with respect to the choice of α is reported in the Supporting Information.We can see a significant difference between the grouping of companies in Table 3 and the new classifications given in Table 5, suggesting that there is no direct connection between the stocks' economic sectors and their classification in terms of magnitude and frequency of extreme losses.
F I G U R E 4 Sum of squared errors W K in (13) for the data described in Section 4.2, plotted as a function of the number of clusters for various values of the weight parameter α ¼ 0,0:1, …,0:9,1 (bottom to top).
F I G U R E 5 (i) The dashed blue lines correspond to cluster centres estimates obtained by our method, plotted against the scedasis function estimates (grey lines) for the stocks under analysis.(ii) The mode mass function using ν ¼ 75 (solid), ν ¼ 100 (dashed), and ν ¼ 120 (dotted); the grey rectangles correspond to contraction periods in the US economy as dated by the NBER, and the rug corresponds to the modes of the estimated scedasis functions.

| DISCUSSION
This paper develops and applies statistical similarity-based clustering techniques for heteroscedastic extremes, which can be used to group time series of independent observations that are alike in terms of magnitude and dynamics of extreme observations.Our method yields a data-driven partition of stocks into clusters of risk, and can be used to uncover which stocks resemble each other the most in terms of probabilistic features that are critical from a risk management perspective.
Clustering scedasis functions and extreme-value indices involves the challenge of grouping objects composed of both a function (scedasis) and a scalar (extreme-value index), and thus the need to partition a product-space.To address this issue, we have proposed valid dissimilarity measures able to transition from one profile-space to the other, characterized by a tuning parameter to be set by the analyst, deciding where the clustering algorithm should place more weight.To our knowledge, this is the first paper tackling the challenging problem of conducting a cluster analysis of both functions and scalars.Similarly to Einmahl et al. (2016), our approach cannot treat stocks for which the extreme-value index varies in time.
Our clustering algorithm is designed to group stocks whose marginal distributions of extremes over time have similar characteristics.In particular, considering the scedasis function and extreme-value index as objects of interest, our clustering method puts strong focus on the dynamics of univariate extremes over time (through the scedasis function) and the overall magnitude of the most extreme losses (through the extremal index); in addition to the scedasis function ('relative' scale parameter) and extreme-value index (tail shape parameter), it is possible to use a Pareto-like tail specification to estimate the 'overall' scale parameter and deduce the time-varying value-at-risk, which provides a more complete description of extreme losses.As described in our paper, this can be used to cluster univariate risk loss patterns.In future work, it would also be interesting to devise a clustering algorithm that takes extremal dependence into account as a way of tracing and outlining portfolios exhibiting joint (i.e., simultaneous) extreme losses.In particular, the characterization of extremal dependence among stocks is crucial for assessing the risk of an index, defined for example as a weighted sum of the prices of the individual stocks.We leave such an important question for future research.
Some final remarks on our financial illustration are in order.In practice, return series may exhibit serial dependence and volatility clustering.
While an option to come closer to the independence assumption is to work with monthly returns, that also leads to a substantial reduction in the amount of data, hence weakening the case for learning about limiting objects-such as the scedasis and the extreme value index.In addition, it be noted that the allocations devised via our method are not static as the scedasis evolves over time.Thus, recent events such as the COVID-19 pandemic, the Black Monday 2020 crashes, rising inflation and the war in Ukraine, can naturally impact the allocations reported in our illustration-and so can any other future event that influences stock markets.Finally, we acknowledge survivorship bias in our empirical findings, as the analysis focuses on companies that have succeeded in the market-while ignoring those that have failed.

DATA AVAILABILITY STATEMENT
The data are publicly available from the R package extremis.
The data that support the findings of this study are available in Yahoo Finance at https://finance.yahoo.com.These data were derived from the following resources available in the domain: -R package extremis, https://cran.r-project.org/web/packages/extremis/index.html.
with D c : C ! C and D γ : ð0,∞Þ !½0, ∞Þ being dissimilarity measures.It can be shown that if D c and D γ are distance functions, then so are D conv α and D max α in Examples 1 and 2, respectively, for every value of α ð0,1Þ.
into meaningful clusters from the T A B L E 2 Validity measures for Scenarios A, B, and C as defined in Section 3.1 obtained from 1000 Monte Carlo simulations.The Rand and silhouette indices evaluate the clustering of the K-means algorithm for heteroscedastic extremes as a function of the sample size (T) and the weight parameter (α).
Figure 4 displays the F I G U R E 2 Sum of squared errors W K in (13) for Scenarios A, B (left), and C (right) described in Section 3.1 with T ¼ 5000 and α ¼ 0:5, as a function of the number of clusters and averaged over 1000 Monte Carlo simulations.The blue and red curves correspond to Scenarios A and B respectively, while the magenta curve corresponds to Scenario C. F I G U R E 3 Negative log-returns for 26 selected stocks from the London Stock Exchange.
the estimated scale parameter âi ¼ Y ðTÀkÞ k K-geometric means algorithm in the sense that cluster centres can be shown to be geometric means; more precisely, Step 2 can be replaced with μ ðnewÞ ¼ fVaR Values closer to one correspond to better performance.
T A B L E 3 Estimated scale parameters (â), extreme-value indices (γ), and scedasis functions (ĉ) (solid black) plotted over time, for the stocks under analysis and corresponding economic sectors.The dashed blue curves correspond to the scedasis' cluster centres estimated by our algorithm.The daggers ( † ) represent FTSE 100 companies.