On a new class of tests for the Pareto distribution using Fourier methods

We propose new classes of tests for the Pareto type I distribution using the empirical characteristic function. These tests are $U$ and $V$ statistics based on a characterisation of the Pareto distribution involving the distribution of the sample minimum. In addition to deriving simple computational forms for the proposed test statistics, we prove consistency against a wide range of fixed alternatives. A Monte Carlo study is included in which the newly proposed tests are shown to produce high powers. These powers include results relating to fixed alternatives as well as local powers against mixture distributions. The use of the proposed tests is illustrated using an observed data set.


Introduction
The Pareto distribution, nowadays commonly known as the Pareto type I distribution, was originally introduced by Pareto (1897). Since then, several extensions of this distribution have been proposed. These extensions are achieved via the inclusion of a location, scale and inequality parameter, corresponding to the Pareto type II, III and IV distributions, respectively.
Additionally, a so-called generalised Pareto distribution has been introduced. For an in-depth discussion of the various types of Pareto distributions, as well as the relationships between them, the interested reader is referred to Arnold (2015).
The Pareto distribution is a popular model in engineering, economics, finance and actuarial science, especially where phenomena characterised by heavy tails are studied; see, e.g., Fisk (1961), Ismaïl (2004) and Nofal & El Gebaly (2017). Concrete examples of the use of the Pareto distribution include the modelling of failure times of mechanical components, see Bourguignon et al. (2016), as well as the modelling of excess losses in insurance claims, see Rytgaard (1990). Due to its heavy tail, this distribution also plays a pivotal role in extreme value theory; see Beirlant et al. (2004). Further examples of the use of the Pareto distribution can be found in Amin (2007) and Soliman (2000). A number of characterisations of the Pareto distribution have been developed in the literature; see, e.g., Gupta (1973) as well as Pareto (1897).
Due to the popularity of this distribution, as well as its wide range of applications, goodness-of-fit tests have been developed to test the hypothesis that an observed data set is compatible with having been realised from this distribution. For recent overview papers and discussions of some of these tests, see Chu et al. (2019) and Ndwandwe et al. (2022), as well as the references therein. We propose new classes of goodness-of-fit tests for the Pareto distribution based on a characterisation involving the distribution of the sample minimum. In order to proceed, we introduce some notation. Let $X_1, \dots, X_n$ be independent and identically distributed (i.i.d.) realisations of a continuous random variable $X$ with unknown distribution function $F$.
Let $X_{1:n} < \dots < X_{n:n}$ denote the order statistics of $X_1, \dots, X_n$. $X$ is said to follow the Pareto distribution with shape parameter $\beta$, denoted by $X \sim P(\beta)$, if it has distribution function
$$F(x) = 1 - x^{-\beta}, \quad x \ge 1,$$
for some $\beta > 0$. The composite null hypothesis to be tested is
$$H_0: X \sim P(\beta) \text{ for some unspecified } \beta > 0. \tag{1.1}$$
This hypothesis is tested against general alternatives. Throughout this paper, the value of $\beta$ is estimated by its method of moments estimator
$$\widehat{\beta}_n = \frac{\bar{X}_n}{\bar{X}_n - 1},$$
where $\bar{X}_n$ is the sample mean (the mean of $P(\beta)$ is $\beta/(\beta - 1)$, which is finite only for $\beta > 1$).
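A minimal Python sketch of the model and estimator just described, assuming the scale-one parameterisation $F(x) = 1 - x^{-\beta}$, $x \ge 1$: draws are generated by inverting $F$, and $\widehat{\beta}_n$ solves $\bar{X}_n = \beta/(\beta - 1)$. The helper names `rpareto` and `beta_mom` are our own and not part of any package; the estimator is only meaningful when $\bar{X}_n > 1$.

```python
import numpy as np


def rpareto(n, beta, rng):
    """Draw n variates from P(beta), F(x) = 1 - x**(-beta), x >= 1, by inversion."""
    u = rng.random(n)                      # U ~ Uniform[0, 1)
    return (1.0 - u) ** (-1.0 / beta)      # 1 - U lies in (0, 1], so X >= 1


def beta_mom(x):
    """Method of moments estimate: solve mean(x) = beta / (beta - 1) for beta."""
    xbar = np.mean(x)
    return xbar / (xbar - 1.0)


# Usage: the estimate should be close to the true shape for a large sample.
rng = np.random.default_rng(1)
sample = rpareto(100_000, 3.0, rng)
print(beta_mom(sample))
```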
The remainder of this paper is structured as follows. In Section 2, new classes of tests are proposed for the Pareto distribution, whereas Section 3 contains theoretical results pertaining to the asymptotic behaviour of the tests. A Monte Carlo study is presented in Section 4, while Section 5 contains an example involving observed data. The paper concludes in Section 6.

A new class of tests for the Pareto distribution
The proposed tests are based on a characterisation of the Pareto distribution via the distribution of the sample minimum. This characterisation is discussed in Allison et al. (2022) and is as follows:
Theorem 2.1. Let $X, X_1, \dots, X_n$ be i.i.d. random variables from a continuous distribution with distribution function $F$. Let $m$ be an integer such that $2 \le m \le n$. Then $X^{1/m}$ and $\min(X_1, \dots, X_m)$ have the same distribution if, and only if, $F(x) = 1 - x^{-\beta}$, $x \ge 1$, for some $\beta > 0$.
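For the direction showing that the Pareto distribution satisfies the characterisation, a one-line comparison of survival functions suffices. The worked equation below assumes $F(x) = 1 - x^{-\beta}$ on $x \ge 1$; it does not cover the converse, which is the substantive part of the result in Allison et al. (2022). Both sides equal the survival function of a $P(m\beta)$ distribution.

```latex
\begin{aligned}
P\left(X^{1/m} > x\right) &= P\left(X > x^{m}\right) = \left(x^{m}\right)^{-\beta} = x^{-m\beta}, \\
P\left(\min(X_1, \dots, X_m) > x\right) &= \prod_{k=1}^{m} P(X_k > x) = \left(x^{-\beta}\right)^{m} = x^{-m\beta},
\end{aligned}
\qquad x \ge 1.
```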
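Since the distribution of a random variable is uniquely determined by its characteristic function (its Fourier transform), we can base a test on the $V$ and $U$ empirical characteristic functions of $X^{1/m}$ and $\min(X_1, \dots, X_m)$. Tests based on such quantities have been shown not only to possess desirable asymptotic properties, but also to produce high powers in finite-sample settings; the interested reader is referred to Meintanis (2016). Let
$$\varphi_m(t) = E\left[e^{itX^{1/m}}\right] \quad \text{and} \quad \xi_m(t) = E\left[e^{it\min(X_1, \dots, X_m)}\right]$$
be the characteristic functions of $X^{1/m}$ and $\min(X_1, \dots, X_m)$, respectively. Denote the empirical versions of $\varphi_m$ and $\xi_m$ by
$$\varphi_{n,m}(t) = \frac{1}{n}\sum_{j=1}^{n} e^{itX_j^{1/m}} \quad \text{and} \quad \xi_{n,m}(t) = \frac{1}{n^m}\sum_{k_1=1}^{n}\cdots\sum_{k_m=1}^{n} e^{it\min(X_{k_1}, \dots, X_{k_m})}.$$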
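To make these definitions concrete, the following sketch evaluates $\varphi_{n,m}(t)$ directly and $\xi_{n,m}(t)$ by brute force over all $n^m$ index tuples $(k_1, \dots, k_m)$, repetitions allowed. It is illustrative only, since the cost grows like $n^m$, and the function names are hypothetical.

```python
import itertools

import numpy as np


def phi_nm(t, x, m):
    """Empirical c.f. of X^(1/m): (1/n) * sum_j exp(i t X_j^(1/m))."""
    x = np.asarray(x, dtype=float)
    return np.mean(np.exp(1j * t * x ** (1.0 / m)))


def xi_nm_bruteforce(t, x, m):
    """V empirical c.f. of min(X_1, ..., X_m): average of exp(i t min(...))
    over all n**m index tuples (k_1, ..., k_m). Cost O(n**m), tiny n only."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    total = 0.0 + 0.0j
    for ks in itertools.product(range(n), repeat=m):
        total += np.exp(1j * t * min(x[k] for k in ks))
    return total / n ** m


# Usage on a tiny sample: the two values should be close for Pareto data
# only when n is large, so here we merely evaluate them.
x = np.array([1.2, 1.7, 2.5, 4.0])
print(phi_nm(0.5, x, 2), xi_nm_bruteforce(0.5, x, 2))
```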
Theorem 2.1 implies that $\varphi_m(t) = \xi_m(t)$ for all $t \in \mathbb{R}$ if, and only if, $X \sim P(\beta)$ for some $\beta > 0$. We propose a class of tests for the hypothesis in (1.1) based on a weighted $L^2$ distance between $\varphi_{n,m}$ and $\xi_{n,m}$:
$$S_{n,m,a} = n\int_{-\infty}^{\infty} \left|\varphi_{n,m}(t) - \xi_{n,m}(t)\right|^2 w_a(t)\,dt, \tag{2.1}$$
where $w_a(t)$ is an appropriate weight function depending on a user-defined parameter $a$. This weight function is included in order to ensure the existence of the integral; we choose $w_a$ such that $\int_{-\infty}^{\infty} w_a(t)\,dt < \infty$. Popular choices of $w_a$ include the Laplace and Gaussian kernels. Expanding the squared modulus in (2.1) shows that $S_{n,m,a}$ is a $V$ statistic of order $2m$ with kernel $h$. The form of $S_{n,m,a}$ specified in (2.1) is computationally expensive (e.g., if $m = 4$, then computing $S_{n,m,a}$ requires the evaluation of an eightfold summation). However, after some combinatorics and algebraic manipulation, $\xi_{n,m}(t)$ can be expressed as a single sum in terms of the order statistics:
$$\xi_{n,m}(t) = \frac{1}{n^m}\sum_{j=1}^{n}\left[(n-j+1)^m - (n-j)^m\right] e^{itX_{j:n}},$$
since exactly $(n-j+1)^m - (n-j)^m$ of the $n^m$ index tuples $(k_1, \dots, k_m)$ have minimum $X_{j:n}$. When using the Laplace kernel $w_a(t) = e^{-a|t|}$ as weight function, we denote the resulting test statistic by $S^{(1)}_{n,m,a}$; upon setting the weight function equal to the Gaussian kernel $w_a(t) = e^{-at^2}$, we obtain $S^{(2)}_{n,m,a}$.
Above, we consider $S_{n,m,a}$, which is based on $V$ statistics. We now turn our attention to the situation where the empirical characteristic functions are estimated using $U$ statistics. Denote the difference between the $U$ empirical characteristic functions of $X^{1/m}$ and $\min(X_1, \dots, X_m)$ by
$$\psi_{n,m}(t) = \binom{n}{m}^{-1}\sum_{1 \le k_1 < \dots < k_m \le n}\left[\frac{1}{m}\sum_{i=1}^{m} e^{itX_{k_i}^{1/m}} - e^{it\min(X_{k_1}, \dots, X_{k_m})}\right].$$
After some algebra, it follows that $\psi_{n,m}(t)$ can be expressed as a single summation:
$$\psi_{n,m}(t) = \frac{1}{n}\sum_{j=1}^{n} e^{itX_j^{1/m}} - \binom{n}{m}^{-1}\sum_{j=1}^{n-m+1}\binom{n-j}{m-1} e^{itX_{j:n}}.$$
From Theorem 2.1 it follows that, if $X_1, \dots, X_n$ is a random sample from the Pareto distribution, then $\psi_{n,m}(t)$ should be close to zero for all $t$. We thus suggest the test statistic
$$T_{n,m,a} = n\int_{-\infty}^{\infty} \left|\psi_{n,m}(t)\right|^2 w_a(t)\,dt. \tag{2.2}$$
After some algebra, we obtain easily calculable expressions for the test statistics $T^{(1)}_{n,m,a}$ and $T^{(2)}_{n,m,a}$, based on the choices $w_a(t) = e^{-a|t|}$ and $w_a(t) = e^{-at^2}$, respectively.
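The single-sum representations make the statistics cheap to compute. As a sketch (not the authors' R code), write $\varphi_{n,m}(t) - \xi_{n,m}(t)$, or $\psi_{n,m}(t)$, as $\sum_l c_l e^{itz_l}$ over the $2n$ real points $z_l \in \{X_j^{1/m}\} \cup \{X_{j:n}\}$. Because $w_a$ is even, each statistic becomes $n\sum_{l,k} c_l c_k K_a(z_l - z_k)$, with $K_a(u) = 2a/(a^2 + u^2)$ for the Laplace kernel and $K_a(u) = \sqrt{\pi/a}\,e^{-u^2/(4a)}$ for the Gaussian kernel. The leading factor $n$ is an assumption about the scaling in (2.1) and (2.2), the defaults $m = 3$, $a = 2$ mirror the simulation settings, and all function names are our own.

```python
import math

import numpy as np


def v_weights(n, m):
    """Weight of X_{j:n} in xi_{n,m}: ((n-j+1)^m - (n-j)^m) / n^m, j = 1..n."""
    j = np.arange(1, n + 1)
    return ((n - j + 1.0) ** m - (n - j + 0.0) ** m) / float(n) ** m


def u_weights(n, m):
    """Weight of X_{j:n} in the U version: C(n-j, m-1) / C(n, m).
    math.comb returns 0 once n - j < m - 1, so those terms vanish."""
    w = [math.comb(n - j, m - 1) for j in range(1, n + 1)]
    return np.array(w, dtype=float) / math.comb(n, m)


def laplace_kernel(u, a):
    # Integral over R of cos(t u) * exp(-a |t|) dt
    return 2.0 * a / (a ** 2 + u ** 2)


def gauss_kernel(u, a):
    # Integral over R of cos(t u) * exp(-a t^2) dt
    return math.sqrt(math.pi / a) * np.exp(-u ** 2 / (4.0 * a))


def l2_statistic(x, m, a, kernel, weights):
    """Closed form of n * int |sum_l c_l exp(i t z_l)|^2 w_a(t) dt."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    z = np.concatenate([x ** (1.0 / m), x])                   # the 2n points z_l
    c = np.concatenate([np.full(n, 1.0 / n), -weights(n, m)])  # their coefficients
    # |sum c_l e^{i t z_l}|^2 = sum_{l,k} c_l c_k cos(t (z_l - z_k)); integrate termwise.
    return n * float(c @ kernel(z[:, None] - z[None, :], a) @ c)


def S1(x, m=3, a=2.0):
    return l2_statistic(x, m, a, laplace_kernel, v_weights)


def S2(x, m=3, a=2.0):
    return l2_statistic(x, m, a, gauss_kernel, v_weights)


def T1(x, m=3, a=2.0):
    return l2_statistic(x, m, a, laplace_kernel, u_weights)


def T2(x, m=3, a=2.0):
    return l2_statistic(x, m, a, gauss_kernel, u_weights)
```

The quadratic form costs $O(n^2)$ operations instead of the $O(n^{2m})$ needed for the raw $V$ statistic.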

Consistency of the tests
In this section, we only present the results pertaining to $T_{n,m,a}$; the derivations relating to $S_{n,m,a}$ follow from analogous arguments and are therefore omitted for the sake of brevity.
Before proceeding to prove the consistency of $T_{n,m,a}$, some comments on the asymptotic null distribution of the test statistic are in order.
$T_{n,m,a}$ is formulated as a weighted $L^2$-type statistic involving empirical characteristic functions. The asymptotic null distributions of statistics of this kind are studied in, amongst others, Feuerverger & Mureika (1977), Baringhaus & Henze (1988), Klar & Meintanis (2005) as well as Baringhaus et al. (2017). The asymptotic null distribution of $T_{n,m,a}$ will typically correspond to that of
$$T_{m,a} = \int_{-\infty}^{\infty} |V(t)|^2 w_a(t)\,dt,$$
where $V(\cdot)$ is a zero-mean Gaussian process. $T_{m,a}$ has the same distribution as $\sum_{j=1}^{\infty}\lambda_j\chi^2_j$, where the $\chi^2_j$ are i.i.d. random variables following a chi-squared distribution with one degree of freedom. However, the covariance function of $V(\cdot)$, as well as the eigenvalues $\lambda_j$, depend on the unknown underlying distribution $F$, usually in a complicated way. We therefore make use of a parametric bootstrap procedure in order to estimate the critical values of these tests (see Section 4.1).
The following theorem is concerned with the asymptotic behaviour of T n,m,a under fixed alternative distributions.
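Theorem 3.1. Let $X_1, \dots, X_n$ be independent copies of a continuous random variable $X$ with finite mean. Then, as $n \to \infty$,
$$\frac{T_{n,m,a}}{n} \longrightarrow \Delta_{m,w} = \int_{-\infty}^{\infty}\left|\varphi_m(t) - \xi_m(t)\right|^2 w_a(t)\,dt \quad \text{almost surely}, \tag{3.1}$$
with $\Delta_{m,w} = 0$ if, and only if, $X \sim P(\beta)$ for some $\beta > 0$.
Proof. Recall from (2.2) that $T_{n,m,a}/n = \int_{-\infty}^{\infty} |\psi_{n,m}(t)|^2 w_a(t)\,dt$, where
$$\psi_{n,m}(t) = \varphi_{n,m}(t) - \binom{n}{m}^{-1}\sum_{j=1}^{n-m+1}\binom{n-j}{m-1} e^{itX_{j:n}}.$$
By the strong law of large numbers, $\varphi_{n,m}(t) \to \varphi_m(t)$ almost surely and, by the strong law of large numbers for $U$ statistics, the second term converges almost surely to $\xi_m(t)$, for each $t$. By the continuous mapping theorem, it follows that $|\psi_{n,m}(t)|^2 \to |\varphi_m(t) - \xi_m(t)|^2$ almost surely for each $t$.
• A test, proposed by Zhang (2002), based on the likelihood ratio. The computational form of the test statistic is
$$ZA_n = -\sum_{j=1}^{n}\left[\frac{\log F_{\widehat{\beta}_n}(X_{j:n})}{n - j + 1/2} + \frac{\log\left\{1 - F_{\widehat{\beta}_n}(X_{j:n})\right\}}{j - 1/2}\right],$$
where $F_{\widehat{\beta}_n}$ denotes the distribution function of the fitted $P(\widehat{\beta}_n)$ distribution.
• A test based on the Mellin transform, proposed by Meintanis (2009), with weight function $w(x) = e^{-ax}$. The value of the tuning parameter $a$ is set to 1 in order to obtain the numerical results presented.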
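A sketch of Zhang's (2002) $Z_A$ statistic with the fitted $P(\widehat{\beta}_n)$ distribution function plugged in, using the computational form $-\sum_j[\log F(X_{j:n})/(n - j + 1/2) + \log\{1 - F(X_{j:n})\}/(j - 1/2)]$. The logarithms are computed via $\log\{1 - F(x)\} = -\widehat{\beta}_n \log x$ for numerical stability. The code assumes every observation exceeds 1, and the function name is hypothetical.

```python
import numpy as np


def zhang_za(x):
    """Zhang's Z_A statistic against the fitted P(beta_hat); assumes all x > 1."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    xbar = x.mean()
    beta_hat = xbar / (xbar - 1.0)          # method of moments estimate
    log_sf = -beta_hat * np.log(x)          # log(1 - F(x)) = -beta * log(x)
    log_cdf = np.log(-np.expm1(log_sf))     # log F(x) = log(1 - exp(log_sf))
    j = np.arange(1, n + 1)
    return -np.sum(log_cdf / (n - j + 0.5) + log_sf / (j - 0.5))


# Usage with a simulated Pareto sample; large values indicate a poor fit.
rng = np.random.default_rng(5)
print(zhang_za((1.0 - rng.random(50)) ** (-1.0 / 2.5)))
```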

Simulation setting
Power (and size) estimates are calculated at a significance level of 5% for sample sizes $n = 20$ and $n = 30$, using 50 000 independent Monte Carlo replications. Since the null distributions of the test statistics depend on the value of the unknown shape parameter $\beta$, we use a parametric bootstrap procedure to calculate numerical critical values. For computational efficiency, we employ the warp-speed bootstrap methodology proposed by Giacomini et al. (2013). This methodology is outlined in the following algorithm:
1. Draw a sample of size $n$, say $X_1, \dots, X_n$, from an alternative distribution and estimate the parameter $\beta$ by $\widehat{\beta}_n = \bar{X}_n/(\bar{X}_n - 1)$.
2. Calculate the value of the test statistic, say $S = S_n(X_1, \dots, X_n)$.
3. Generate a bootstrap sample $X^*_1, \dots, X^*_n$ by independently sampling from a $P(\widehat{\beta}_n)$ distribution. Calculate the value of the test statistic using the bootstrap sample, $S^* = S_n(X^*_1, \dots, X^*_n)$.
The simulation study considers two sets of power results. The first concerns powers against the fixed alternative distributions specified in Table 1; the resulting empirical powers for sample sizes of $n = 20$ and $n = 30$ can be found in Tables 2 and 3, respectively. Second, we consider local power estimates, where we simulate data from two families of mixture distributions. In the first family, we simulate from an LN(1) distribution with probability $p$, and from a Pareto distribution (with the same mean as the LN(1) distribution) with probability $1 - p$; the empirical powers obtained are reported in Table 4. The second family is obtained upon replacing the LN(1) distribution by the exponential distribution with mean 0.5; the calculated powers can be found in Table 5.
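The lognormal mixture can be simulated as follows. This sketch rests on assumptions that the text does not pin down: we read LN(1) as the lognormal distribution with log-mean 0 and log-standard deviation 1 (mean $e^{1/2}$), and we choose the Pareto shape by solving $\beta/(\beta - 1) = e^{1/2}$ so that both components share the same mean. The function name is hypothetical, and the exponential mixture family is not covered.

```python
import numpy as np


def rmix_lognormal_pareto(n, p, rng, sigma=1.0):
    """Hypothetical sampler: with probability p draw from a lognormal with
    log-mean 0 and log-sd sigma (ASSUMED meaning of LN(sigma)); otherwise draw
    from P(beta), with beta chosen so both components have the same mean."""
    mean_ln = np.exp(sigma ** 2 / 2.0)            # mean of the lognormal component
    beta = mean_ln / (mean_ln - 1.0)              # solves beta / (beta - 1) = mean_ln
    from_ln = rng.random(n) < p                   # component indicators
    ln_draws = rng.lognormal(mean=0.0, sigma=sigma, size=n)
    par_draws = (1.0 - rng.random(n)) ** (-1.0 / beta)   # Pareto by inversion
    return np.where(from_ln, ln_draws, par_draws)


# Usage: p = 0 recovers the Pareto null; larger p moves away from it.
rng = np.random.default_rng(3)
print(rmix_lognormal_pareto(20, 0.3, rng))
```

Because both components share the same mean, the mixture mean does not depend on $p$, so a departure from the null shows up only in the shape of the distribution.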

The results in Tables 4 and 5 include two powers for each listed distribution: the first is associated with a sample size of 20, while the second is the estimated power based on a sample of size 30.
The reported empirical powers of the new tests were obtained by setting $m = 3$ and $a = 2$ in all instances. Several other values of these parameters were considered when performing the Monte Carlo study; however, the specified choices generally resulted in high powers. For the sake of brevity, we omit the results pertaining to other parameter configurations. All calculations were performed in R; see R Core Team (2020).

Simulation results
The power estimates in Tables 2 to 5 are the percentages (rounded to the nearest integer) of samples that resulted in a rejection of the null hypothesis. For ease of comparison, the two highest powers in each row (including ties) are printed in bold.
The results in Tables 2 and 3 indicate that all the tests maintain the nominal significance level of 0.05. Furthermore, the results demonstrate that the proposed test $S^{(2)}_{n,3,2}$ outperforms all the other tests for the majority of alternatives considered, closely followed by $S^{(1)}_{n,3,2}$ and $G_{n,2}$. The powers of $T^{(2)}_{n,3,2}$ are lower, but remain competitive with those of the traditional tests, i.e., $KS_n$, $CV_n$ and $AD_n$.

Practical application
Below, we apply each of the tests considered to an observed data set. The data concern the lifetime tournament earnings, up to 1980, of all professional golfers whose earnings exceeded 700 000 US dollars, as reported in the 1981 yearbook of Golf magazine. This data set was also discussed and analysed by Arnold (2015). The reported earnings, in thousands of dollars, can be found in Table 6.
The support of the data in Table 6 is bounded below by 700 (in thousands of dollars), since only earnings exceeding this amount are included.
We now turn our attention to the results obtained using the goodness-of-fit tests discussed in Section 4. The outcome of these tests is in accordance with the findings of Arnold (2015). Based on the numerical performance of the tests, we recommend using the test based on $V$ statistics with a Gaussian kernel and tuning parameters $m = 3$ and $a = 2$ (denoted $S^{(2)}_{n,3,2}$ in the text).
Chu et al. (2019) review tests for the generalised Pareto distribution as well as the Pareto types I and II, whereas Ndwandwe et al. (2022) investigate several existing tests specifically for the Pareto type I distribution. Although tests exist for the Pareto type I distribution, they are few in number when compared to those for other distributions such as the normal or exponential distributions. In the remainder of this paper, we refer to the Pareto type I distribution simply as the Pareto distribution.

Since $|\psi_{n,m}(t)| \le 2$, we have $|\psi_{n,m}(t)|^2 w_a(t) \le 4w_a(t)$, where $w_a$ is integrable; hence an application of Lebesgue's dominated convergence theorem yields (3.1). In view of the characterisation given in Theorem 2.1, it follows that $\Delta_{m,w}$ is zero if, and only if, $X \sim P(\beta)$ for some $\beta > 0$.

Monte Carlo study
In this section, Monte Carlo simulations are used to compare the finite-sample performance of the newly proposed tests with that of the following five existing goodness-of-fit tests for the Pareto distribution:
• The traditional Kolmogorov-Smirnov ($KS_n$), Cramér-von Mises ($CV_n$) and Anderson-Darling ($AD_n$) tests.
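For reference, a sketch of the three classical EDF statistics computed against the fitted $P(\widehat{\beta}_n)$ distribution, using the standard computing formulas with $U_{(j)} = F_{\widehat{\beta}_n}(X_{j:n})$. The function name is hypothetical, and the code assumes all observations exceed 1 so that the logarithms in $AD_n$ are finite.

```python
import numpy as np


def edf_stats(x):
    """KS, Cramer-von Mises and Anderson-Darling statistics against the
    fitted P(beta_hat), with beta_hat the method of moments estimate."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    xbar = x.mean()
    beta_hat = xbar / (xbar - 1.0)
    u = 1.0 - x ** (-beta_hat)                  # fitted CDF at the order statistics
    i = np.arange(1, n + 1)
    ks = max(np.max(i / n - u), np.max(u - (i - 1) / n))           # max(D+, D-)
    cv = 1.0 / (12 * n) + np.sum((u - (2 * i - 1) / (2.0 * n)) ** 2)
    # A^2 = -n - (1/n) sum (2i - 1) [log u_(i) + log(1 - u_(n+1-i))]
    ad = -n - np.mean((2 * i - 1) * (np.log(u) + np.log1p(-u[::-1])))
    return ks, cv, ad


rng = np.random.default_rng(7)
print(edf_stats((1.0 - rng.random(30)) ** (-1.0 / 2.0)))
```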
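4. Repeat Steps 1-3 $MC$ times to obtain $S_1, \dots, S_{MC}$ and $S^*_1, \dots, S^*_{MC}$, where $S_j$ and $S^*_j$ denote the values of the test statistic for the $j$th sample generated in Steps 2 and 3, respectively.
5. Reject the null hypothesis for the $j$th sample whenever $S_j > S^*_{\lfloor MC(1-\alpha)\rfloor:MC}$, for $j = 1, \dots, MC$, where $S^*_{j:MC}$ denotes the $j$th order statistic of $S^*_1, \dots, S^*_{MC}$ and $\lfloor\cdot\rfloor$ denotes the floor function. The power is estimated by $MC^{-1}\sum_{j=1}^{MC} I\left(S_j > S^*_{\lfloor MC(1-\alpha)\rfloor:MC}\right)$, where $I(\cdot)$ denotes the indicator function.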
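The warp-speed algorithm in Steps 1 to 5 can be sketched as follows. Only one bootstrap sample is drawn per Monte Carlo replication, and the critical value is the $\lfloor MC(1-\alpha)\rfloor$-th order statistic of the pooled bootstrap statistics. As an assumption for illustration, a Kolmogorov-Smirnov distance to the fitted Pareto distribution stands in for the test statistic; any function mapping a sample to a statistic could be passed instead. The code also assumes that every sample has mean greater than 1, so that $\widehat{\beta}_n$ is a valid Pareto shape. All names are hypothetical.

```python
import numpy as np


def rpareto(n, beta, rng):
    # Inversion sampling from P(beta): F(x) = 1 - x^(-beta), x >= 1.
    return (1.0 - rng.random(n)) ** (-1.0 / beta)


def beta_mom(x):
    xbar = np.mean(x)
    return xbar / (xbar - 1.0)


def ks_fitted(x):
    # Stand-in statistic (an assumption): KS distance to the fitted P(beta_hat).
    x = np.sort(x)
    n = len(x)
    u = 1.0 - x ** (-beta_mom(x))
    i = np.arange(1, n + 1)
    return max(np.max(i / n - u), np.max(u - (i - 1) / n))


def warp_speed_power(sample_alt, stat, n, mc, alpha, rng):
    """Warp-speed bootstrap estimate of the rejection rate at level alpha."""
    s = np.empty(mc)
    s_star = np.empty(mc)
    for j in range(mc):                           # Step 4: MC replications
        x = sample_alt(n, rng)                    # Step 1: sample from the alternative
        s[j] = stat(x)                            # Step 2: observed statistic
        x_star = rpareto(n, beta_mom(x), rng)     # Step 3: ONE bootstrap sample
        s_star[j] = stat(x_star)
    k = int(np.floor(mc * (1.0 - alpha)))         # 1-based order-statistic index
    crit = np.sort(s_star)[k - 1]                 # S*_{k:MC}
    return float(np.mean(s > crit))               # Step 5: rejection proportion
```

For example, `warp_speed_power(lambda n, r: rpareto(n, 3.0, r), ks_fitted, 20, 2000, 0.05, np.random.default_rng(11))` estimates the size of the stand-in test, which should be near 0.05.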
We also note that the tests $T^{(1)}_{n,3,2}$ and $T^{(2)}_{n,3,2}$ produce the highest powers against the LN(2.5) alternative by a substantial margin. The results obtained for the mixture distributions are in accordance with those associated with the fixed alternatives.
We test whether the resulting data are realised from a Pareto distribution. When fitting a Pareto distribution to the data using the method of moments, we obtain $\widehat{\beta}_n = 2.495$. Before proceeding to the results pertaining to the formal testing procedures, we consider visual tests of fit for the Pareto distribution. Figure 1 shows the empirical distribution function of the rescaled data, together with the fitted Pareto distribution function. Figure 2 shows a quantile-quantile plot comparing the empirical quantiles to those of the fitted distribution. Both figures indicate a close correspondence between the empirical properties of the data and those expected under the null hypothesis of the Pareto distribution.
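The quantile-quantile plot pairs the ordered data with quantiles of the fitted distribution. A sketch of how such coordinates can be computed, assuming plotting positions $(j - 1/2)/n$ (the text does not state which positions were used) and the Pareto quantile function $F^{-1}(p) = (1 - p)^{-1/\widehat{\beta}_n}$. The golfer data are not reproduced here, so the usage line uses a simulated placeholder sample; the function name is hypothetical.

```python
import numpy as np


def pareto_qq(x):
    """Return (theoretical quantiles, ordered data, beta_hat) for a fitted P(beta_hat)."""
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    xbar = x.mean()
    beta_hat = xbar / (xbar - 1.0)                    # method of moments fit
    p = (np.arange(1, n + 1) - 0.5) / n               # plotting positions (assumed)
    theoretical = (1.0 - p) ** (-1.0 / beta_hat)      # Pareto quantile function
    return theoretical, x, beta_hat


# Placeholder data (simulated, NOT the golfer data of Table 6).
rng = np.random.default_rng(2)
theo, emp, b = pareto_qq((1.0 - rng.random(50)) ** (-1.0 / 2.5))
```

Plotting `emp` against `theo` gives points close to the line $y = x$ when the Pareto model fits well.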
Figure 1: Empirical and fitted distribution functions.

Table 1: Alternative distributions used

Table 2: Empirical powers against fixed alternatives for n = 20

Table 3: Empirical powers against fixed alternatives for n = 30

Table 4: Empirical powers against lognormal mixtures

Table 5: Empirical powers against exponential mixtures

Table 6: The golfer data set.
Table 7 contains the test statistic values and the corresponding p-values of the tests. These p-values were calculated based on 10 000 samples of size 50 simulated from a $P(\widehat{\beta}_n)$ distribution, with $\widehat{\beta}_n = 2.495$. It is clear from the reported p-values in Table 7 that none of the tests considered rejects the assumption that the data are realised from a Pareto distribution at a 5% level of significance.
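The simulated p-values can be sketched as follows: draw samples of the same size from $P(\widehat{\beta}_n)$, recompute the statistic on each, and report the proportion of simulated values at least as large as the observed one. This assumes that large values of the statistic indicate a departure from the null hypothesis; the function name and its `stat` argument are hypothetical.

```python
import numpy as np


def mc_pvalue(x, stat, n_sim, rng):
    """Monte Carlo p-value: share of P(beta_hat) samples (same size as x)
    whose statistic is at least the observed value (large values reject)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xbar = x.mean()
    beta_hat = xbar / (xbar - 1.0)              # method of moments fit to the data
    observed = stat(x)
    sims = np.array([stat((1.0 - rng.random(n)) ** (-1.0 / beta_hat))
                     for _ in range(n_sim)])
    return observed, float(np.mean(sims >= observed))
```

With `n_sim = 10_000` and a sample of size 50, this mirrors the setting described above; `stat` can be any function mapping a sample to a statistic.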