Generalised Income Inequality Index

This paper proposes a broad generalisation of income inequality indices. A generalised income inequality index is proposed that depends on two parameters and brings a large set of income inequality indices into a single framework. The two parameters control the sensitivity of the generalised index to different levels of the income distribution. A thorough investigation of the generalised index clarifies the influence of low, middle and high incomes on various income inequality indices and thereby facilitates the simultaneous use of multiple indices for a better analysis of inequality, as advocated by several recent studies. Moreover, two methods for estimating the generalised index in the case of finite populations are shown, and a new method for estimating the inequality indices is proposed.


Introduction
Since the end of the 19th century, the degree of inequality in the distribution of income and wealth has been a matter of great interest. Income inequality is considered an important measure for forecasting the wealth of a country (Stiglitz, 2012). Its increase is considered a brake on economic growth and a danger to the political and economic stability of a country, whereas its decrease is a symptom of well-being and of better prospects for the future. Therefore, it is crucial to have an index for keeping track of the level and trend of income inequality and for monitoring economic policies or forecasting their effects on it.
The Gini index (Gini, 1914) is the most famous and widespread inequality measure. It is available for almost all the countries in the world from the datasets of various international organisations, such as the World Bank 'Inequality around the world' dataset, the UNU-WIDER World Income Inequality Database, the EurLIFE database and the UNDP annual report (Decancq & Lugo, 2012). Furthermore, in the most developed countries, National Statistical Institutes carry out yearly surveys and measure income inequality and its evolution over time (see Osier, 2009).
However, recent studies suggest that a single inequality index is not sufficient to give a complete picture of income inequality (Piketty, 2015; Osberg, 2017). In fact, each inequality measure proposed in the literature has its own sensitivity to different parts of the income distribution: it pays more attention to a certain part of the distribution while downplaying the others. Therefore, more than one inequality measure is needed to overcome this drawback and to obtain a global view of inequality.
From this perspective, we propose a generalised income inequality index that can be seen as a continuation of the work by Nygård and Sandström (1981, 1985, 1989) and Sandström et al. (1985, 1988). Indeed, Nygård and Sandström (1981, 1985) consider a generalisation of several inequality measures.
The generalised index we propose depends on two parameters, which control the sensitivity of the inequality index to different levels of the income distribution. Several well-known inequality indices (such as the Bonferroni index, the Gini index, the Mehran index, the Piesch index, the De Vergottini index and the Pietra index, also known as the Robin Hood index) descend from or are related to the expression of the generalised index, and therefore their sensitivity to different parts of the income distribution can be established once and for all. In fact, the study of the generalised income inequality index makes it possible to analyse and compare the influence of low, middle and high incomes on various inequality indices. Furthermore, several new inequality measures can be derived simply by tuning the two parameters according to the required sensitivity.
The paper is organised as follows. In Section 2, the notation is introduced and some propaedeutic results are covered. In Section 3, the expression of the generalised index is presented. Section 4 demonstrates how the well-known inequality indices descend from the generalised index, while their sensitivity to different parts of the income distribution is discussed in Section 5. Furthermore, two methods for approximating the generalised index in the case of finite populations and the related sampling estimators are shown in Sections 6 and 7. Finally, after a simulation study on real data (Section 8), conclusions are drawn in Section 9.

Notation
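Consider Y a non-negative continuous random variable with cumulative distribution function $F(y)$ and probability density function (PDF) $f(y) = dF(y)/dy$. Its quantile function is defined by $Q(p) = \inf\{y \mid F(y) \ge p\}$, $p \in [0,1]$. Its mean, if it exists, is defined by
$$\mu = \int_0^\infty y\,dF(y),$$
and its partial mean by
$$\mu(y) = \frac{\int_0^y t\,dF(t)}{F(y)}.$$
The Lorenz (1905) curve is
$$L(p) = \frac{1}{\mu}\int_0^p Q(u)\,du, \quad p \in [0,1].$$
The Lorenz curve can also be defined using the partial mean:
$$L\{F(y)\} = \frac{F(y)\,\mu(y)}{\mu}.$$
The Bonferroni curve is
$$\mathrm{Bon}(p) = \frac{L(p)}{p}, \quad p \in (0,1],$$
and the complementary Bonferroni curve is
$$\overline{\mathrm{Bon}}(p) = 1 - \mathrm{Bon}(p) = 1 - \frac{L(p)}{p}.$$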
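For a finite set of incomes, the curves above can be evaluated at the points $p_k = k/N$. The following is a minimal sketch, assuming that every unit carries mass 1/N; the function name is hypothetical and the code is an illustration, not the authors' implementation.

```python
from typing import List, Tuple


def lorenz_and_bonferroni(incomes: List[float]) -> List[Tuple[float, float, float]]:
    """Return (p_k, L(p_k), 1 - L(p_k)/p_k) for k = 1, ..., N.

    Hypothetical helper: each unit has mass 1/N, so p_k = k/N is the empirical
    CDF at the k-th smallest income and L(p_k) is the share of total income
    held by the k poorest units.
    """
    y = sorted(incomes)  # ascending order, as assumed in the text
    n, total = len(y), sum(y)
    if n == 0 or total <= 0:
        raise ValueError("need at least one unit and a positive total income")
    points, cumulative = [], 0.0
    for k, value in enumerate(y, start=1):
        cumulative += value
        p = k / n
        lorenz = cumulative / total
        points.append((p, lorenz, 1.0 - lorenz / p))
    return points


if __name__ == "__main__":
    for p, lor, comp_bon in lorenz_and_bonferroni([4, 1, 3, 2]):
        print(f"p={p:.2f}  L={lor:.3f}  complementary Bonferroni={comp_bon:.3f}")
```

For the incomes 1, 2, 3 and 4, the complementary Bonferroni values at p = 0.25, 0.5, 0.75 and 1 are 0.6, 0.4, 0.2 and 0, which illustrates that the curve is non-increasing.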

Generalisation
To unify the existing inequality indices in a single framework and to understand them comprehensively, a generalised income inequality index is proposed. The generalised index is built from the complementary Bonferroni curve and the beta distribution.
Consider the beta distribution whose PDF is
$$g(p \mid a, b) = \frac{p^{a-1}(1-p)^{b-1}}{B(a,b)}, \quad p \in (0,1),\ a, b > 0,$$
where $B(a,b)$ is the beta function:
$$B(a,b) = \int_0^1 t^{a-1}(1-t)^{b-1}\,dt.$$
The generalised index can be defined as follows:
$$GI(a,b) = \int_0^1 \overline{\mathrm{Bon}}(p)\,g(p \mid a, b)\,dp. \tag{1}$$
It is bounded between 0 and 1 because $\overline{\mathrm{Bon}}(p)$ is bounded between 0 and 1 and $g(\cdot \mid a, b)$ is a density. Some particular cases of the generalised index are given in Table 1, which includes the Bonferroni index, the Gini index, the Mehran index and the Piesch index. The Pietra index (also known as the Robin Hood index) and the De Vergottini index are also related to the generalised index. All these indices are briefly presented in the next section.
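To make Expression (1) concrete, here is a minimal numerical sketch (our own, not the authors' code) that approximates GI(a,b) for any Lorenz curve supplied as a Python function, using the midpoint rule; the names `beta_pdf` and `generalised_index` are hypothetical. For the uniform distribution on [0, 1], $L(p) = p^2$, so the complementary Bonferroni curve is $1 - p$ and $GI(a,b) = b/(a+b)$; for example, the Gini index GI(2,1) of the uniform distribution is 1/3.

```python
import math
from typing import Callable


def beta_pdf(p: float, a: float, b: float) -> float:
    """Beta(a, b) density g(p | a, b) for p in (0, 1), via log-gamma for B(a, b)."""
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(p) + (b - 1) * math.log1p(-p) - log_beta)


def generalised_index(lorenz: Callable[[float], float], a: float, b: float, n: int = 20_000) -> float:
    """Midpoint-rule approximation of GI(a, b) = int_0^1 (1 - L(p)/p) g(p | a, b) dp.

    Midpoints avoid evaluating L(p)/p at p = 0 and the beta density at the
    endpoints; convergence is slow when a < 1 or b < 1 (unbounded density).
    """
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        p = (k + 0.5) * h
        total += (1.0 - lorenz(p) / p) * beta_pdf(p, a, b)
    return total * h


if __name__ == "__main__":
    uniform_lorenz = lambda p: p * p  # Lorenz curve of the uniform distribution on [0, 1]
    print("Bonferroni GI(1,1):", round(generalised_index(uniform_lorenz, 1, 1), 6))
    print("Gini GI(2,1):", round(generalised_index(uniform_lorenz, 2, 1), 6))
```

The midpoint rule is chosen only because it never touches p = 0 or p = 1; any quadrature that handles the endpoint behaviour of the beta density would do.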
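Substituting F(y) for p in Expression (1) gives
$$GI(a,b) = \frac{1}{B(a,b)}\int_0^\infty \left\{1 - \frac{\mu(y)}{\mu}\right\} F(y)^{a-1}\{1 - F(y)\}^{b-1}\,dF(y). \tag{2}$$
This integral does not always converge, because its convergence depends on $F(\cdot)$ and on the parameters a and b.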
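Consider the incomplete beta function:
$$B(y; a, b) = \int_0^y t^{a-1}(1-t)^{b-1}\,dt, \quad y \in [0,1].$$
The regularised incomplete beta function is defined as $I_y(a,b) = B(y; a, b)/B(a,b)$.
Result 1. If $a > 1$, the generalised index GI(a,b) can also be written as
$$GI(a,b) = \frac{a+b-1}{(a-1)\,\mu}\int_0^\infty y\, I_{F(y)}(a-1, b)\,dF(y) - \frac{b}{a-1},$$
where $I_{F(y)}(a-1, b)$ is the regularised incomplete beta function.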
The proof of Result 1 is given in the supporting information.
Corollary 1. If the expectation of the random variable Y exists and if $a > 1$, the integral given in (2) converges.
Proof. According to Result 1, since $I_{F(y)}(a-1, b) \le 1$, the integrand $y\,I_{F(y)}(a-1, b)$ is bounded by y; since the expectation of the random variable Y exists, the integral converges. □
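For readers who want the step behind Result 1, the following is our own outline of an integration-by-parts argument, assuming $a > 1$ and a finite mean μ (the authors' proof is in the supporting information). It uses $L'(p) = Q(p)/\mu$ and $B(p; a-1, b) = \int_0^p t^{a-2}(1-t)^{b-1}\,dt$.

```latex
\begin{aligned}
GI(a,b) &= 1 - \frac{1}{B(a,b)}\int_0^1 L(p)\,p^{a-2}(1-p)^{b-1}\,dp \\
&= 1 - \frac{1}{B(a,b)}\Big[L(p)\,B(p; a-1, b)\Big]_0^1
   + \frac{1}{\mu\,B(a,b)}\int_0^1 Q(p)\,B(p; a-1, b)\,dp \\
&= 1 - \frac{B(a-1,b)}{B(a,b)}
   + \frac{B(a-1,b)}{\mu\,B(a,b)}\int_0^\infty y\,I_{F(y)}(a-1,b)\,dF(y) \\
&= \frac{a+b-1}{(a-1)\,\mu}\int_0^\infty y\,I_{F(y)}(a-1,b)\,dF(y) - \frac{b}{a-1},
\end{aligned}
```

where the last line uses $B(a-1,b)/B(a,b) = (a+b-1)/(a-1)$. With $a = 2$ and $b = 1$, $I_x(1,1) = x$ and the formula reduces to the familiar $G = 2E\{Y F(Y)\}/\mu - 1$.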

Family of GI(a,b) and Counterexamples
The income inequality indices associated with the generalised index are presented in this section, together with their historical background and development, and it is shown mathematically how these indices are linked with the generalised index. Moreover, indices that cannot be expressed as special cases of the generalised index are added to clarify which inequality indices the family GI(a,b) encompasses.

Bonferroni Index
The Bonferroni index was proposed by Carlo Emilio Bonferroni in 1930. At the beginning, it was ostracised by Corrado Gini and his followers (see Giorgi, 1998, for more details). However, it was rediscovered 40 years later by Piesch (1975) and Nygård & Sandström (1981). The Bonferroni index shares many properties with the Gini index, but it is more sensitive to the left tail of the income distribution than the Gini index (Pizzetti, 1951; Dong et al., 2021). Furthermore, several extensions and interpretations proposed for the Gini index also hold for the Bonferroni index (see Tarsitano, 1990, for a comprehensive review).
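The Bonferroni index is defined as
$$\mathcal{B} = \int_0^1 \overline{\mathrm{Bon}}(p)\,dp = \int_0^1 \left\{1 - \frac{L(p)}{p}\right\}dp = GI(1,1). \tag{3}$$
Substituting F(y) for p in Expression (3):
$$\mathcal{B} = \int_0^\infty \left\{1 - \frac{\mu(y)}{\mu}\right\}dF(y). \tag{4}$$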

Gini Index
The Gini (1914) index is the most famous and widespread inequality measure, also known in the literature as the Gini coefficient or the Gini ratio. It was first proposed by its namesake in 1914 under the name Concentration Ratio.
The success and spread of the Gini index are mainly due to its simplicity and ease of interpretation, thanks to its intuitive graphical relation with the Lorenz curve (see Giorgi, 1992, 2020). It satisfies anonymity, scale independence, population independence and the Pigou-Dalton transfer principle (De & Chattopadhyay, 2017). Furthermore, it can be decomposed by sources and by groups in several different ways. It has good inferential properties (see Langel & Tillé, 2013; Graf & Tillé, 2014; Giorgi & Gigliarano, 2017) as well as original interpretations, extensions and applications in different fields.
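The Gini index is defined as
$$G = 2\int_0^1 \{p - L(p)\}\,dp = \int_0^1 \overline{\mathrm{Bon}}(p)\,2p\,dp = GI(2,1). \tag{5}$$
Substituting F(t) for p in Expression (5):
$$G = 2\int_0^\infty F(t)\left\{1 - \frac{\mu(t)}{\mu}\right\}dF(t). \tag{6}$$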

Mehran Index
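The Mehran (1976) index belongs to the class of linear measures proposed by Piesch (1975). Mehran derived the expression of his index by examining the conditions under which linear measures satisfy the Pigou-Dalton transfer principle. The Mehran index satisfies this principle even when stronger transfer principles are stipulated, and it is defined as
$$M = 6\int_0^1 (1-p)\{p - L(p)\}\,dp = \int_0^1 \overline{\mathrm{Bon}}(p)\,6p(1-p)\,dp = GI(2,2). \tag{7}$$
Substituting F(y) for p in Expression (7):
$$M = 6\int_0^\infty F(y)\{1 - F(y)\}\left\{1 - \frac{\mu(y)}{\mu}\right\}dF(y). \tag{8}$$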

Piesch Index
The Piesch index was proposed in 1975 by its namesake in a pioneering volume on income inequality measures. Like the Mehran index, it belongs to the class of linear measures. Both indices can be seen as special cases of a general algorithm proposed by Giaccardi (1950) and by Benedetti (1980) (see Giorgi & Pallini, 1990, for further details). The Piesch index is defined as
$$P = 3\int_0^1 p\{p - L(p)\}\,dp = \int_0^1 \overline{\mathrm{Bon}}(p)\,3p^2\,dp = GI(3,1). \tag{9}$$
Substituting F(t) for p in Expression (9):
$$P = 3\int_0^\infty F(t)^2\left\{1 - \frac{\mu(t)}{\mu}\right\}dF(t). \tag{10}$$

1st New Index
A new index is proposed here. The generalised index with a = 1 and b = 2 is developed to fill a gap among the indices defined in the previous subsections, adding a further combination of the parameters a and b among the integers 1, 2 and 3 and introducing an index with this level of sensitivity in addition to those already known in the literature. The 1st new index of income inequality is defined as
$$GI(1,2) = 2\int_0^1 (1-p)\,\overline{\mathrm{Bon}}(p)\,dp. \tag{11}$$
Substituting F(y) for p in Expression (11):
$$GI(1,2) = 2\int_0^\infty \{1 - F(y)\}\left\{1 - \frac{\mu(y)}{\mu}\right\}dF(y). \tag{12}$$

2nd New Index
As done for the 1st new index, the two parameters of the generalised index are tuned to obtain a second new index with a different level of sensitivity, completing the combinations of the parameters a and b among the integers 1, 2 and 3. The 2nd new index of income inequality is obtained by setting a = 1 and b = 3 in the expression of the generalised index, and it is defined as
$$GI(1,3) = 3\int_0^1 (1-p)^2\,\overline{\mathrm{Bon}}(p)\,dp. \tag{13}$$
Substituting F(y) for p in Expression (13):
$$GI(1,3) = 3\int_0^\infty \{1 - F(y)\}^2\left\{1 - \frac{\mu(y)}{\mu}\right\}dF(y). \tag{14}$$

De Vergottini Index
Mario De Vergottini (1950) proposed another measure of inequality, the De Vergottini index, which is more sensitive to the right tail of the income distribution than the Gini index. Furthermore, he defined a class of inequality indices that includes the Gini and Bonferroni indices as well as the De Vergottini index itself.
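The De Vergottini index is not a particular case of the generalised index but is related to it. The De Vergottini index is defined as
$$V = \int_0^1 \left\{\frac{1 - L(p)}{1 - p} - 1\right\}dp. \tag{15}$$
Substituting F(y) for p in Expression (15):
$$V = \int_0^\infty \left\{\frac{\int_y^\infty t\,dF(t)}{\mu\,\{1 - F(y)\}} - 1\right\}dF(y). \tag{16}$$
Since $\{1 - L(p)\}/(1-p) - 1 = p\,\overline{\mathrm{Bon}}(p)/(1-p)$, the De Vergottini index weights the complementary Bonferroni curve by $p(1-p)^{-1}$, the kernel of a beta density with a = 2 and b = 0, which cannot be normalised. The De Vergottini index does not always exist, since the integral does not converge for every cumulative distribution function.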

Pietra Index
The Pietra index is also related to the generalised index. A few months after the publication of the Gini Concentration Ratio, Gaetano Pietra proposed a simple geometrical interpretation of it. He provided the first formulation of the Lorenz curve in the continuous case and derived the expression of the longest vertical distance between the Lorenz curve and the line of perfect equality, which later came to be called the Pietra index. The same result was proposed by Hoover (1936, 1984) and Schutz (1951). Like the Gini index, it satisfies anonymity, scale independence, population independence and the Pigou-Dalton transfer principle. The Pietra index is defined as
$$R = \max_{p \in [0,1]}\{p - L(p)\}.$$
This index is also known as the Robin Hood index because of its simple economic interpretation: it is the share of total income that should be taken from the units with income above the mean and given to those with income below the mean in order to reach perfect equality.
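Result 2. The Pietra index R can also be written as
$$R = F(\mu) - L\{F(\mu)\} = F(\mu)\,\overline{\mathrm{Bon}}\{F(\mu)\}.$$
Proof. The first derivative of $p - L(p)$ with respect to p is
$$\frac{d}{dp}\{p - L(p)\} = 1 - \frac{Q(p)}{\mu}.$$
The second derivative of $p - L(p)$ with respect to p is negative:
$$\frac{d^2}{dp^2}\{p - L(p)\} = -\frac{1}{\mu\,f\{Q(p)\}} < 0.$$
Thus, the maximum distance between p and L(p) is attained where $Q(p) = \mu$, that is,
$$\arg\max_p \{p - L(p)\} = F(\mu). \qquad \square$$
The Pietra index is related to the generalised index as
$$R = F(\mu)\,\lim GI(a,b), \quad a, b \to \infty,\ \frac{a}{a+b} \to F(\mu),$$
since the beta density then concentrates at $p = F(\mu)$.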
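Result 2 has a direct discrete analogue that is easy to check. In the sketch below (our own illustration, with hypothetical function names and equal unit masses 1/N), the largest gap $p_k - L(p_k)$ on the empirical Lorenz curve coincides with $F(\mu) - L\{F(\mu)\}$ evaluated with the empirical CDF, and with the mean-deviation form $\sum_i |y_i - \bar{y}| / (2\sum_i y_i)$. The equality holds because the gap changes by $(\bar{y} - y_{(k)})/\sum_i y_i$ at step k, which is positive exactly while $y_{(k)} < \bar{y}$.

```python
from typing import List


def pietra_max_gap(incomes: List[float]) -> float:
    """Largest vertical gap p_k - L(p_k) between the equality line and the empirical Lorenz curve."""
    y = sorted(incomes)
    n, total = len(y), sum(y)
    gap, cumulative = 0.0, 0.0
    for k, value in enumerate(y, start=1):
        cumulative += value
        gap = max(gap, k / n - cumulative / total)
    return gap


def pietra_at_mean(incomes: List[float]) -> float:
    """Result 2 with the empirical CDF: F(mean) - L(F(mean))."""
    y = sorted(incomes)
    n, total = len(y), sum(y)
    mean = total / n
    below = [v for v in y if v <= mean]  # units whose income does not exceed the mean
    return len(below) / n - sum(below) / total


def pietra_mean_deviation(incomes: List[float]) -> float:
    """Mean-deviation form: sum |y_i - mean| / (2 * total income)."""
    n, total = len(incomes), sum(incomes)
    mean = total / n
    return sum(abs(v - mean) for v in incomes) / (2.0 * total)


if __name__ == "__main__":
    data = [1, 2, 3, 10]
    print(pietra_max_gap(data), pietra_at_mean(data), pietra_mean_deviation(data))
```

For the incomes 1, 2, 3 and 10 (mean 4), all three forms give 0.375: a quarter of the units, the one with income 10, would have to give up 6 of the 16 income units.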

Counterexamples
Counterexamples are inequality indices that cannot be expressed as special cases of GI(a,b); they include the Zenga concentration index (Zenga, 1984; Tarsitano, 1990; Zenga, 2007; Greselin et al., 2010; Langel & Tillé, 2012). The Zenga index can be expressed using the (complementary) Bonferroni curve:
$$Z = \int_0^1 Z(p)\,dp,$$
where the Zenga function Z(p) is defined by
$$Z(p) = 1 - \frac{L(p)/p}{\{1 - L(p)\}/(1-p)} = \frac{\overline{\mathrm{Bon}}(p)}{1 - L(p)}, \quad p \in (0,1).$$
One argument in favour of the Zenga index is described by Greselin et al. (2010, p. 3): '[…] the Zenga index detects, with the same sensibility, all deviations from equality in any part of the distribution'. However, the Zenga index is not a particular case of the generalised index.
Additionally, the Theil (1967) index is based on the divergence of Kullback & Leibler (1951), which follows a completely different principle from that of the Lorenz curve. The Theil index is a particular case of the generalised entropy index (Shorrocks, 1980), and the whole family of Atkinson (1970) indices can be presented as a monotone transformation of the generalised entropy index. All these indices are unrelated to the generalised index proposed in this paper.

Comparisons of Inequality Indices
From Expression (1), the generalised index can be seen as a functional in which the complementary Bonferroni curve is weighted by the beta density function. In view of this, the parameters a and b of the generalised index control its sensitivity to different levels of the income distribution. The beta density function takes a wide variety of shapes depending on the values of a and b. Specifically, it is positively skewed for a < b, negatively skewed for a > b and symmetric for a = b. When a = b = 1, GI(1,1) equals the Bonferroni index and the beta distribution is the standard uniform distribution. Thus, for GI(1,1), the complementary Bonferroni curve is weighted equally at all income levels, and GI(1,1) is consequently used as a baseline for discussion. When a < 1 and b < 1, the beta density function is U-shaped, and therefore, compared with the Bonferroni index, both the highest and the lowest incomes have relatively large impacts on the generalised index. On the contrary, when a > 1 and b > 1, the beta density function is unimodal with mode $(a-1)/(a+b-2)$, and thus, compared with the Bonferroni index, the generalised index takes more into account low incomes for a < b, high incomes for a > b and middle incomes for a = b. Furthermore, when a < 1, b ≥ 1 or a = 1, b > 1, the beta density function is strictly decreasing, which implies that the generalised index is relatively more sensitive to the lowest incomes than the Bonferroni index. On the other hand, when a ≥ 1, b < 1 or a > 1, b = 1, the beta density function is strictly increasing, so the generalised index is more sensitive to the highest incomes than the Bonferroni index.
Consequently, the Gini index plays down lower incomes and emphasises higher incomes in contrast to the Bonferroni index, because the Gini index has a (= 2) and b (= 1), so the weight applied to the complementary Bonferroni curve is the straight line $2p$, with slope +2. The Piesch index puts more weight on the highest incomes than the Gini index, since for the Piesch index a (= 3) > 2 and b (= 1) = 1, so the beta density function $3p^2$ is convex and strictly increasing. These results complement the recent findings of Gastwirth (2017). The 1st new index instead takes lower incomes more into account and downplays higher incomes compared with the Bonferroni index, because for the 1st new index a = 1 and b = 2, so the weight function is the straight line $2(1-p)$, with slope $-2$. In turn, the 2nd new index places more importance on the lowest incomes than the 1st new index, since for the 2nd new index a (= 1) = 1 and b (= 3) > 2, so the beta density function $3(1-p)^2$ is convex and strictly decreasing. The Mehran index focuses more on middle incomes than the Bonferroni index, since for the Mehran index a = b (= 2) > 1, for which the beta density function $6p(1-p)$ is symmetric and unimodal with mode 1/2.
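The weighting schemes described above can be inspected directly. The sketch below is illustrative only: the dictionary simply collects the (a, b) pairs of the special cases discussed in this section, and the helper names are hypothetical. It evaluates the beta weight $g(p \mid a, b)$ at a low (p = 0.1), middle (p = 0.5) and high (p = 0.9) position of the income ranking, together with the interior mode $(a-1)/(a+b-2)$ when it exists.

```python
import math
from typing import Dict, Optional, Tuple

# (a, b) pairs of the indices that are special cases of GI(a, b)
NAMED_INDICES: Dict[str, Tuple[int, int]] = {
    "Bonferroni": (1, 1),
    "Gini": (2, 1),
    "Mehran": (2, 2),
    "Piesch": (3, 1),
    "1st new": (1, 2),
    "2nd new": (1, 3),
}


def beta_pdf(p: float, a: float, b: float) -> float:
    """Beta(a, b) density at p in (0, 1)."""
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(p) + (b - 1) * math.log1p(-p) - log_beta)


def beta_mode(a: float, b: float) -> Optional[float]:
    """Interior mode (a - 1)/(a + b - 2); it exists only when a > 1 and b > 1."""
    return (a - 1) / (a + b - 2) if a > 1 and b > 1 else None


if __name__ == "__main__":
    grid = (0.1, 0.5, 0.9)  # low, middle and high positions in the income ranking
    for name, (a, b) in NAMED_INDICES.items():
        weights = ", ".join(f"{beta_pdf(p, a, b):.2f}" for p in grid)
        print(f"{name:>10}: g(0.1), g(0.5), g(0.9) = {weights}; mode = {beta_mode(a, b)}")
```

For instance, the Gini weight $2p$ equals 0.2 at p = 0.1 and 1.8 at p = 0.9, while the weight $2(1-p)$ of the 1st new index reverses this pattern.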
The Pareto (1897) distribution marks the starting point of statistical investigations on personal incomes. Another commonly used distribution for modelling incomes is the log-normal distribution. There is scientific evidence that a log-normal distribution with a power-law tail is the universal structure of the personal income distribution (Souma, 2001). Out of respect for history, the Pareto distribution is chosen as a first example for calculating the indices.
Suppose Y follows the Pareto distribution. Its PDF is
$$f(y) = \frac{\alpha\, y_m^{\alpha}}{y^{\alpha+1}}, \quad y \ge y_m,$$
where $y_m > 0$ is the minimum possible value of Y and $\alpha > 0$ is its shape parameter. The Pareto distribution has an infinite expected value when $0 < \alpha \le 1$. Therefore, it is sensible to restrict the shape parameter to $\alpha > 1$ henceforth for a reasonable model of income distribution.
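The Pareto quantities behind the closed forms that follow can be derived in a few lines. The outline below is our own derivation (for $\alpha > 1$); it shows in particular that the Lorenz curve, and hence every GI(a,b), does not depend on the scale $y_m$, and it recovers the well-known Gini index of the Pareto distribution as a check.

```latex
\begin{aligned}
F(y) &= 1 - \left(\frac{y_m}{y}\right)^{\alpha}, \qquad y \ge y_m, \\
Q(p) &= y_m (1-p)^{-1/\alpha}, \qquad \mu = \frac{\alpha\, y_m}{\alpha - 1}, \\
L(p) &= \frac{1}{\mu}\int_0^p y_m (1-u)^{-1/\alpha}\,du = 1 - (1-p)^{1-1/\alpha}, \\
\overline{\mathrm{Bon}}(p) &= 1 - \frac{1 - (1-p)^{1-1/\alpha}}{p}, \\
GI(2,1) &= 2\int_0^1 \left\{(1-p)^{1-1/\alpha} - (1-p)\right\}dp = \frac{1}{2\alpha - 1}.
\end{aligned}
```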
The various indices, including the new indices derived from the generalised income inequality index, are listed in Table 1.

Result 4
The generalised income inequality index under the Pareto distribution is expressed in closed form in terms of the digamma function ψ(x):
$$\psi(x) = \frac{d}{dx}\ln\Gamma(x) = \frac{\Gamma'(x)}{\Gamma(x)}.$$
The proof of Result 4 is given in the supporting information.
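A closed form can also be derived independently from the Pareto Lorenz curve $L(p) = 1 - (1-p)^{1-1/\alpha}$. Writing $c = b + 1 - 1/\alpha$ and integrating term by term gives, in our own derivation, $GI(a,b) = \{B(a-1, c) - B(a-1, b+1)\}/B(a,b)$ for $a > 1$, and $GI(1,b) = b\{\psi(b+1) - \psi(c)\}$ for $a = 1$, which is where the digamma function enters (for integer b, $\psi(b+1) = H_b - \gamma$, linking it to harmonic numbers). The sketch below implements these two cases; the function names are hypothetical, the digamma routine is a standard recurrence plus asymptotic series rather than a library call, and its form may differ from the paper's Result 4.

```python
import math


def log_beta(x: float, y: float) -> float:
    """log B(x, y) computed with log-gamma; requires x, y > 0."""
    return math.lgamma(x) + math.lgamma(y) - math.lgamma(x + y)


def digamma(x: float) -> float:
    """Digamma psi(x) for x > 0: shift with psi(x) = psi(x + 1) - 1/x, then use the asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    result += math.log(x) - 0.5 / x - inv2 * (1.0 / 12 - inv2 * (1.0 / 120 - inv2 / 252))
    return result


def gi_pareto(a: float, b: float, alpha: float) -> float:
    """GI(a, b) for a Pareto law with shape alpha > 1 (the scale y_m cancels out).

    Our own derivation from L(p) = 1 - (1 - p)**(1 - 1/alpha); covers a >= 1, b > 0.
    """
    if alpha <= 1 or a < 1 or b <= 0:
        raise ValueError("sketch assumes alpha > 1, a >= 1 and b > 0")
    c = b + 1.0 - 1.0 / alpha
    if a == 1:
        # limiting case a = 1: the difference of beta functions becomes a digamma difference
        return b * (digamma(b + 1.0) - digamma(c))
    scale = log_beta(a, b)
    return math.exp(log_beta(a - 1.0, c) - scale) - math.exp(log_beta(a - 1.0, b + 1.0) - scale)


if __name__ == "__main__":
    pairs = ((1, 1), (2, 1), (2, 2), (3, 1), (1, 2), (1, 3))
    for alpha in (1.5, 2.0, 3.0):
        print(alpha, [round(gi_pareto(a, b, alpha), 4) for a, b in pairs])
```

With α = 2, the sketch gives 1/3 for the Gini index and $2\ln 2 - 1 \approx 0.386$ for the Bonferroni index, and every index decreases as α grows, in line with the behaviour described for Figure 1a.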
Result 5. For the Pareto distribution, the De Vergottini index takes the form
$$V = \frac{1}{\alpha - 1}.$$
Result 6. For the Pareto distribution,
$$F(\mu) = 1 - \left(\frac{\alpha - 1}{\alpha}\right)^{\alpha},$$
and thus, the Pietra index takes the form
$$R = \frac{(\alpha - 1)^{\alpha - 1}}{\alpha^{\alpha}}.$$
In Figure 1a, the values of the indices with parameters $a, b \in \{1, 2, 3, \ldots, 500\}$ under the Pareto distribution are plotted for different values of the shape parameter α. When α increases, the value of the index decreases for any given a and b. This is because, as α increases, the income data become less dispersed, and therefore the inequality level shrinks.
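The Pietra closed form for the Pareto distribution can be checked numerically. The sketch below (hypothetical helper names, using the Pareto Lorenz curve $L(p) = 1 - (1-p)^{1-1/\alpha}$) compares $(\alpha-1)^{\alpha-1}/\alpha^{\alpha}$, obtained by evaluating $F(\mu) - L\{F(\mu)\}$, with a grid search for the maximum of $p - L(p)$; it also returns $F(\mu)$, the point where Result 2 locates the maximum.

```python
def pareto_lorenz(p: float, alpha: float) -> float:
    """Pareto Lorenz curve L(p) = 1 - (1 - p)**(1 - 1/alpha), alpha > 1."""
    return 1.0 - (1.0 - p) ** (1.0 - 1.0 / alpha)


def pietra_pareto(alpha: float) -> float:
    """Closed form (alpha - 1)**(alpha - 1) / alpha**alpha."""
    return (alpha - 1) ** (alpha - 1) / alpha ** alpha


def pietra_pareto_numeric(alpha: float, n: int = 100_000) -> float:
    """Grid search for max_p {p - L(p)} on p = k/n, k = 0, ..., n."""
    return max(k / n - pareto_lorenz(k / n, alpha) for k in range(n + 1))


def f_at_mean_pareto(alpha: float) -> float:
    """F(mu) = 1 - ((alpha - 1)/alpha)**alpha: share of units with income below the mean."""
    return 1.0 - ((alpha - 1) / alpha) ** alpha


if __name__ == "__main__":
    for alpha in (1.5, 2.0, 3.0):
        print(alpha, f_at_mean_pareto(alpha), pietra_pareto(alpha), pietra_pareto_numeric(alpha))
```

With α = 2, $F(\mu) = 0.75$ and the Pietra index equals 0.25.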
In Figure 1b, only the values of the indices under the Pareto distribution with shape parameter α = 2.0 are presented, so the aforementioned results can be appreciated by simple visual inspection. As explained, the parameters a and b of the beta distribution define a system of weights applied to the complementary Bonferroni curve, which is non-increasing in p. As observed, when a, b ≥ 1, fixing a and enlarging b makes the generalised index gradually more (resp. less) sensitive to the low (resp. high) incomes, so that the left side of the complementary Bonferroni curve is weighted increasingly, and thus the value of the index rises. On the other hand, when a, b ≥ 1, fixing b and increasing a makes the generalised index gradually more (resp. less) sensitive to the high (resp. low) incomes, so that the right side of the complementary Bonferroni curve is weighted increasingly, and therefore the value of the index falls. When a = b → ∞, so that a/(a + b) = 1/2, the index tends to $\overline{\mathrm{Bon}}(1/2)$. Furthermore, the generalised index appears to be sensitive to changes in a and b when both parameters take very small values. For the Pareto distribution, in particular, the index also changes considerably when a is fixed at a large value and b is increased or decreased while relatively small. This is because the Pareto distribution is heavy-tailed and indices of such forms are sensitive to the highest incomes.

Finite-Population Representations
Since real-world applications involve finite populations, the expression of the generalised index is further developed for finite populations. Three rules for defining the income inequality indices in the finite-population case are presented.

Rectangular Rule
Consider a finite population $U := \{1, \ldots, i, \ldots, N\}$. The variable of interest takes the value $y_i$ on unit $i \in U$. Without loss of generality, assume that the $y_i$ are sorted in ascending order.
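The rectangular version of the generalised income inequality index, denoted $GI(a,b)_r$, replaces the integral in Expression (1) with a rectangular sum over the ordered units and keeps the normalising constant $1/B(a,b)$. Particular cases applying the rectangular rule are obtained by setting a and b as in Table 1.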
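The rectangular rule can be illustrated with a sketch under our own assumptions: unit masses 1/N, right endpoints $p_k = k/N$, the complementary Bonferroni value $1 - L(p_k)/p_k$ at each endpoint, and $a, b \ge 1$ so that the beta weight is finite on [0, 1]. It is not the paper's formula, which is not reproduced here, and the function names are hypothetical; with a = 2 and b = 1 it reproduces the familiar discrete Gini index $2\sum_k k\,y_{(k)}/(N\sum_i y_i) - (N+1)/N$.

```python
import math
from typing import List


def beta_weight(p: float, a: float, b: float) -> float:
    """Beta(a, b) density on [0, 1]; finite at both endpoints because a, b >= 1."""
    return p ** (a - 1) * (1.0 - p) ** (b - 1) / math.exp(
        math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    )


def gi_rectangular(incomes: List[float], a: float, b: float) -> float:
    """Right-endpoint rectangular sum of Expression (1) at p_k = k/N (illustrative sketch)."""
    if a < 1 or b < 1:
        raise ValueError("this sketch assumes a >= 1 and b >= 1")
    y = sorted(incomes)
    n, total = len(y), sum(y)
    acc, cumulative = 0.0, 0.0
    for k, value in enumerate(y, start=1):
        cumulative += value
        p = k / n
        comp_bon = 1.0 - (cumulative / total) / p  # 1 - L(p_k)/p_k
        acc += comp_bon * beta_weight(p, a, b)
    return acc / n


if __name__ == "__main__":
    print(round(gi_rectangular([3, 1, 4, 1, 5, 9, 2, 6], 2, 1), 4))
```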

Trapezoidal Rule
The trapezoidal rule for defining the income inequality indices is introduced here. Following the insight of Pietra (1915), who used this rule to compute the concentration area between the Lorenz curve and the equidistribution line for the Gini index, Giorgi & Guandalini (2013) adopt it for estimating the Bonferroni index. This technique extends naturally to the generalised index and to all the associated income inequality indices.
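The trapezoidal version of the generalised income inequality index, denoted $GI(a,b)_t$, replaces the integral in Expression (1) with a trapezoidal sum over the ordered units. Particular cases applying the trapezoidal rule are obtained by setting a and b as in Table 1.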
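A trapezoidal counterpart, again a sketch under our own assumptions (unit masses 1/N, $a, b \ge 1$, hypothetical function names), averages the integrand of Expression (1) at the two ends of each interval $[(k-1)/N, k/N]$. At p = 0 it uses the limit of L(p)/p along the first segment of the empirical Lorenz curve, $y_{(1)}/\bar{y}$. With a = 2 and b = 1 it reduces to $1 - (1/N)\sum_k \{L((k-1)/N) + L(k/N)\}$, that is, one minus twice the trapezoidal area under the Lorenz curve, in the spirit of Pietra (1915).

```python
import math
from typing import List


def beta_weight(p: float, a: float, b: float) -> float:
    """Beta(a, b) density on [0, 1]; finite at both endpoints because a, b >= 1."""
    return p ** (a - 1) * (1.0 - p) ** (b - 1) / math.exp(
        math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    )


def gi_trapezoidal(incomes: List[float], a: float, b: float) -> float:
    """Trapezoidal sum of Expression (1) on the grid 0, 1/N, ..., 1 (illustrative sketch)."""
    if a < 1 or b < 1:
        raise ValueError("this sketch assumes a >= 1 and b >= 1")
    y = sorted(incomes)
    n, total = len(y), sum(y)
    mean = total / n
    # at p = 0, L(p)/p tends to y_(1)/mean, the slope of the first Lorenz segment
    previous = (1.0 - y[0] / mean) * beta_weight(0.0, a, b)
    acc, cumulative = 0.0, 0.0
    for k, value in enumerate(y, start=1):
        cumulative += value
        p = k / n
        current = (1.0 - (cumulative / total) / p) * beta_weight(p, a, b)
        acc += 0.5 * (previous + current)  # trapezoid on [(k - 1)/N, k/N]
        previous = current
    return acc / n


if __name__ == "__main__":
    print(round(gi_trapezoidal([3, 1, 4, 1, 5, 9, 2, 6], 2, 1), 4))
```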

Reformulation Version
By discretising the integrals in Expressions (4), (6), (12), (8), (10), (14) and (16) in Section 4 and replacing F(t) with $(k - 1/2)/N$, we propose a new method of defining the income inequality indices. It is called the reformulation version, as the method comes from reformulating the formulas of the income inequality indices.
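The reformulation version can be sketched as follows, assuming that the partial mean $\mu(y_{(k)})$ is replaced by the mean of the k smallest incomes and that F(t) is replaced by $(k - 1/2)/N$, so that the beta weight is evaluated strictly inside (0, 1) and any a, b > 0 can be used. The helper names are hypothetical and the code is an illustration, not the authors' implementation.

```python
import math
from typing import List


def beta_pdf(p: float, a: float, b: float) -> float:
    """Beta(a, b) density at p in (0, 1)."""
    log_beta = math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    return math.exp((a - 1) * math.log(p) + (b - 1) * math.log1p(-p) - log_beta)


def gi_reformulation(incomes: List[float], a: float, b: float) -> float:
    """(1/N) * sum_k {1 - mu(y_(k))/mu} * g((k - 1/2)/N | a, b), incomes sorted ascending."""
    y = sorted(incomes)
    n, total = len(y), sum(y)
    mean = total / n
    acc, cumulative = 0.0, 0.0
    for k, value in enumerate(y, start=1):
        cumulative += value
        partial_mean = cumulative / k  # mean of the k smallest incomes
        p_mid = (k - 0.5) / n          # F(t) replaced by (k - 1/2)/N
        acc += (1.0 - partial_mean / mean) * beta_pdf(p_mid, a, b)
    return acc / n


if __name__ == "__main__":
    alpha, n = 3.0, 20_000
    # synthetic population: Pareto(y_m = 1, alpha) quantiles at (k - 1/2)/N
    population = [(1.0 - (k - 0.5) / n) ** (-1.0 / alpha) for k in range(1, n + 1)]
    print("Gini (reformulation):", round(gi_reformulation(population, 2, 1), 4))
```

On a large synthetic population built from Pareto quantiles with α = 3, the Gini index from this sketch is close to the theoretical value $1/(2\alpha - 1) = 0.2$.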

Estimation from a Sample
Income data are often collected by means of sample surveys; therefore, estimation and its properties for the inequality indices should not be overlooked. Consider a random sample S selected from population U. The values taken by the variable of interest are still denoted $y_i$ but are known only for the units selected in the sample. Each unit $i \in S$ is assigned a weight $w_i$. The weight $w_i$ could be the inverse of the inclusion probability, $w_i = 1/\pi_i$, where $\pi_i$ is the inclusion probability of unit i (Horvitz & Thompson, 1952). The weights may also be subjected to a calibration procedure (Deville & Särndal, 1992; Särndal, 2007) and could be adjusted to compensate for questionnaire non-response (Särndal & Lundström, 2005).
One plausible and intuitive approach to estimating the indices is the plug-in estimator, obtained by replacing the population quantities with their weighted sample counterparts. The plug-in estimator of the generalised index under the rectangular rule is denoted $\widehat{GI}(a,b)_r$ and the plug-in estimator following the trapezoidal rule is denoted $\widehat{GI}(a,b)_t$; both carry the normalising constant $1/B(a,b)$. Particular cases of the plug-in estimators of the income inequality indices are obtained by setting a and b as in Table 1. The plug-in estimators of the associated income inequality indices following the trapezoidal rule and the reformulation version are derived similarly to the rectangular rule. In fact, the inequality indices estimated using the three approaches converge very quickly to the same value as the number of observations increases.
As the estimators are non-linear, estimating the standard errors of the sample estimates can be difficult. Variance estimation is not the focus of this paper; nevertheless, a practical method of estimating the sampling variances is outlined as follows. For the sampling variance of a non-linear statistic, one approach is the linearisation method, for example, the Graf (2011) method. By computing the derivatives of the sample estimator with respect to the indicator variables of the presence of the units in the sample, the linearised variables can be derived. The linearised variables are then used in the expression of the variance estimator of the total estimator to estimate the sampling variance (see, e.g., Dong et al., 2021). The Graf method can be applied to almost all sampling designs, as long as the expression of the variance estimator of the total estimator under the sampling design is known.
In the simulation study, the income considered is the household gross income including imputed rents but excluding social contributions. The income distribution has been reconstructed by duplicating each household income according to its sampling weight. The reconstructed income distribution consists of N = 25,775,872 households in Italy in 2015. The income distribution is heavily right-skewed: the mean (42,223 €) is much larger than the median (34,196 €).
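With sampling weights, the plug-in idea gives each sampled unit mass $w_i/\sum_j w_j$, so that the estimated CDF and Lorenz curve replace their population counterparts. The sketch below is our own weighted plug-in version of a rectangular sum (right endpoints, $a, b \ge 1$ assumed, hypothetical function names), not the authors' estimator; with equal weights it reduces to the unweighted rectangular sum, and rescaling all weights leaves it unchanged.

```python
import math
from typing import List


def beta_weight(p: float, a: float, b: float) -> float:
    """Beta(a, b) density on [0, 1]; finite at both endpoints because a, b >= 1."""
    return p ** (a - 1) * (1.0 - p) ** (b - 1) / math.exp(
        math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)
    )


def gi_plugin_rectangular(incomes: List[float], weights: List[float], a: float, b: float) -> float:
    """Weighted plug-in rectangular sum: unit i carries mass w_i / sum(w)."""
    if a < 1 or b < 1:
        raise ValueError("this sketch assumes a >= 1 and b >= 1")
    pairs = sorted(zip(incomes, weights))  # sampled units ordered by income
    n_hat = sum(w for _, w in pairs)       # estimated population size
    t_hat = sum(v * w for v, w in pairs)   # estimated total income
    acc = cum_w = cum_t = 0.0
    for value, w in pairs:
        cum_w += w
        cum_t += value * w
        p = cum_w / n_hat                  # estimated CDF at this income
        comp_bon = 1.0 - (cum_t / t_hat) / p
        acc += (w / n_hat) * comp_bon * beta_weight(p, a, b)
    return acc


if __name__ == "__main__":
    print(gi_plugin_rectangular([1, 2, 3], [1, 2, 1], 2, 1))
```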

Simulation
The estimators of the aforementioned income inequality indices (viz. Bonferroni, Gini, 1st new index, Mehran, Piesch, 2nd new index and Pietra) are investigated. The indices are first computed on the whole population using the expressions in Section 6. They are then estimated from R = 500 replicated samples of size n = 10,000 selected under stratified simple random sampling with proportional allocation (Str-SRS)¹, in order to study the inferential properties of the estimators proposed in Section 7.
To begin with, the relative bias (RB) and the normalised root-mean-square error (NRMSE) of each estimator are computed.
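The two accuracy measures can be computed from the R replicate estimates as follows. The definitions below, $RB = (\bar{\hat\theta} - \theta)/\theta$ and $NRMSE = \sqrt{\mathrm{mean}\{(\hat\theta_r - \theta)^2\}}/\theta$, with θ the value of the index on the whole population, are the usual ones and are assumed here rather than quoted from the paper; the function names are hypothetical.

```python
import math
from typing import Sequence


def relative_bias(estimates: Sequence[float], true_value: float) -> float:
    """RB = (mean of the R replicate estimates - true value) / true value."""
    return (sum(estimates) / len(estimates) - true_value) / true_value


def nrmse(estimates: Sequence[float], true_value: float) -> float:
    """NRMSE = sqrt(mean squared error over the R replicates) / true value."""
    mse = sum((e - true_value) ** 2 for e in estimates) / len(estimates)
    return math.sqrt(mse) / true_value


if __name__ == "__main__":
    replicates = [0.9, 1.1]  # toy replicate estimates of an index whose true value is 1
    print(relative_bias(replicates, 1.0), nrmse(replicates, 1.0))
```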
In each selected sample, the inequality indices are estimated using the expressions defined under the rectangular rule, the trapezoidal rule and the reformulation version, and the empirical RB and NRMSE of each index estimator are computed. The results are presented in Table 2, which reports the empirical RB and NRMSE of the plug-in estimators of the Bonferroni, Gini, 1st new, Mehran, Piesch and 2nd new indices defined using the rectangular rule, the trapezoidal rule and the reformulation method for samples selected under Str-SRS with sample size n = 10,000. The indices show different magnitudes of inequality for the same population. The values obtained under the rectangular rule, the trapezoidal rule and the reformulation version for each index are identical when computed on the whole population, which validates the use of the three definitions in the finite-population case. Besides the well-investigated relation between the Bonferroni index and the Gini index, further information on the order of magnitude of the indices can be drawn by examining the values taken by the parameters of the generalised index. It is well known that the Bonferroni index, GI(1,1), puts more weight on the lower incomes than the Gini index, GI(2,1), so the Bonferroni index takes larger values than the Gini index except in the extreme cases of minimum and maximum concentration (De Vergottini, 1940, 1950; Pizzetti, 1951). Furthermore, the numerical computation at the population level confirms that, when a, b ≥ 1, the indices with a < b take larger values, which increase as b increases, while the indices with a > b take smaller values, which decrease as a increases, and the values of the indices with a = b tend to $\overline{\mathrm{Bon}}(1/2)$, the complementary Bonferroni curve evaluated at p = 1/2, as a and b increase.
Plug-in estimators of the indices built following the trapezoidal rule and the reformulation method give similar results, and both perform better than those built with the rectangular rule. According to the simulation results, the trapezoidal rule is less biased than the rectangular rule and shows a smaller NRMSE for all indices. The reformulation method has some advantages over the trapezoidal rule for some indices but is outperformed by it for others.

Conclusion
In the present paper, a broad generalisation of income inequality indices within a single framework is presented. The two parameters of the generalised index control its sensitivity to different parts of the income distribution. It is demonstrated that the family of the generalised index encompasses, or is closely related to, some of the most famous inequality indices, such as the Gini, Bonferroni, Mehran, Piesch, De Vergottini and Pietra (also known as the Robin Hood index) indices. Two new indices are developed to fill a gap in the literature, and many more indices could easily be defined by tuning the two parameters of the generalised index.
By considering the generalised index as a functional in which the complementary Bonferroni curve is weighted by the beta density function, it is possible to analyse its sensitivity to the income distribution for all values of a and b. Three definitions of the generalised index in finite populations have been presented, following the rectangular rule, the trapezoidal rule and the reformulation version. These finite-population representations are practical because they can be applied directly to real data. The three definitions of the same index converge quickly to the same value as the number of observations increases.
A numerical computation and a simulation study have been performed on the 2015 IT-SILC data. The simulation study confirms the theoretical analyses of the generalised index. Furthermore, it has been shown that the trapezoidal rule and the reformulation method have certain advantages over the rectangular rule.
All in all, the generalised index unifies the inequality indices in a single framework. To capture inequality comprehensively, an exhaustive study of income inequality requires a broad class of inequality measures that are sensitive to different parts of the distribution. For this reason, the generalised index offers a non-trivial approach to understanding the sensitivity of inequality indices of the same family to different levels of the distribution.

Note
plus Province of Bolzano and Province of Trento from the Trentino-Alto Adige/Südtirol region).
Real data from the 2015 Italian component of the European Statistics on Income and Living Conditions (IT-SILC) (Istat, 2015) are used for the simulation study. The IT-SILC belongs to the framework of surveys carried out yearly by European countries under European Regulation No. 1177/2003 to provide data on income, poverty, social exclusion and living conditions. The 2015 IT-SILC sample is a two-stage sample: municipalities, stratified by size, are selected first, and households are then selected within them. The sample consists of 17,985 households and 49,987 individuals.

Table 1.
Particular cases of the generalised index ($H_n$ is the nth harmonic number).

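Table 2.
Empirical RB and NRMSE of the plug-in estimators of the Bonferroni, Gini, 1st new, Mehran, Piesch and 2nd new indices defined using the rectangular rule, the trapezoidal rule and the reformulation method for samples selected under Str-SRS with sample size n = 10,000.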