In this work we present, for the first time, an application of the Pareto approach to modelling the excesses of galaxy clusters over high-mass thresholds. The distribution of these excesses can be described by the generalized Pareto distribution (GPD), which is closely related to the generalized extreme value (GEV) distribution. After introducing the formalism, we study the impact of different thresholds and redshift ranges on the distributions, as well as the influence of the survey area on the mean excess above a given mass threshold. We also show that the GPD and GEV approaches lead to identical results for rare, and thus high-mass and high-redshift, clusters. As an example, we apply the Pareto approach to ACT-CL J0102−4915 and SPT-CL J2106−5844 and derive the respective cumulative distribution functions of the exceedance over different mass thresholds. We also study the possibility of using the GPD as a cosmological probe. Since the maximum likelihood estimation of the distribution parameters uses all the information from clusters above the mass threshold, the GPD might offer an interesting alternative to GEV-based methods, which use only the maxima in patches. However, when comparing the accuracy with which the parameters can be estimated, it turns out that the patch-based modelling of maxima is superior to the Pareto approach. In an ideal case, the GEV approach is capable of estimating the location parameter with per cent level precision for fewer than ∼100 patches. This result makes the GEV-based approach potentially interesting also for cluster surveys with a smaller area.
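To make the central quantities concrete, the following is a minimal sketch of the standard GPD for an excess x ≥ 0 over a threshold: its cumulative distribution function, its mean excess σ/(1 − ξ) (finite only for ξ < 1), and inverse-transform sampling. The function names and the shape and scale values (`xi`, `sigma`) are illustrative assumptions; they are not taken from this work and are not fitted to any cluster mass function.

```python
import math
import random


def gpd_cdf(x, xi, sigma):
    """CDF of the generalized Pareto distribution for an excess x over the threshold."""
    if x < 0:
        return 0.0
    if abs(xi) < 1e-12:
        # The limit xi -> 0 is the exponential distribution.
        return 1.0 - math.exp(-x / sigma)
    base = 1.0 + xi * x / sigma
    if base <= 0.0:
        # For xi < 0 the support ends at x = -sigma / xi.
        return 1.0
    return 1.0 - base ** (-1.0 / xi)


def gpd_mean_excess(xi, sigma):
    """Mean excess over the threshold; it exists only for xi < 1."""
    if xi >= 1.0:
        raise ValueError("the mean excess is infinite for xi >= 1")
    return sigma / (1.0 - xi)


def gpd_sample(n, xi, sigma, rng):
    """Draw n excesses by inverting the CDF (inverse-transform sampling)."""
    samples = []
    for _ in range(n):
        u = rng.random()  # uniform in [0, 1)
        if abs(xi) < 1e-12:
            samples.append(-sigma * math.log(1.0 - u))
        else:
            samples.append(sigma / xi * ((1.0 - u) ** (-xi) - 1.0))
    return samples


if __name__ == "__main__":
    # Illustrative (assumed) parameters, in arbitrary mass units.
    xi, sigma = -0.2, 1.0
    rng = random.Random(0)
    excesses = gpd_sample(20000, xi, sigma, rng)
    empirical_mean = sum(excesses) / len(excesses)
    print("analytic mean excess:", gpd_mean_excess(xi, sigma))
    print("empirical mean excess:", empirical_mean)
    print("P(excess <= 1):", gpd_cdf(1.0, xi, sigma))
```

Comparing the empirical mean of the sampled excesses with the analytic value is a quick consistency check on the parametrization. A maximum likelihood fit of ξ and σ to excesses above a chosen mass threshold would build on the same density, but it is not shown here.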