Making inferences about the polarization, welfare and poverty of nations: a study of 101 countries 1970–1995



Stochastic dominance techniques are adapted and employed to study the extent and progress of polarization, welfare and poverty across 101 nations over the period 1970–1995. The adaptations provide methods of comparing mass relocation by evaluating various degrees of right and left separation between distributions. The results reveal that, whilst welfare increased and then diminished and poverty diminished and then increased, polarization between rich and poor countries continued unabated throughout the period, emphasizing the distinction between polarization and inequality. Copyright © 2004 John Wiley & Sons, Ltd.


The growing gap between rich and poor nations (Jones, 1997a,b; Kremer et al., 2000; Pritchett, 1997; Quah, 1997, 2001) has been a focus of attention in many spheres of economics. For growth theorists it constitutes the challenge to early neoclassical theories of national production that prompted much study of the issue of convergence in Per Capita Gross Domestic Product (Barro, 1998).1 For those concerned with the plight of the poor it conveys a sense of increasing relative global poverty in terms of Per Capita Gross National Product (Wade, 2001). For those interested in welfare and inequality issues it is of major import since inter-nation differences in income measures account for the largest part of global inequality (Berry et al., 1983; Schultz, 1998; Bourguignon and Morrison, 1999). Underlying concern about the gap is an implicit ‘World Welfare Function’ relating to the distribution of incomes across nations that ceteris paribus weights the gap as a negative characteristic. Its widening does not imply, and is not implied by, lower global economic welfare, greater inequality or greater absolute poverty but has more to do with polarization, a relocation of mass from the centre towards the tails of the distribution (Esteban and Ray, 1994; Foster and Wolfson, 1992; Wolfson, 1994).

Polarization has an inter-temporal dimension: it implies a tendency towards the emergence, and the increasing intensity and/or separateness, of multiple bumps in a distribution over time. As such, its investigation has been approached in several different ways. Polarization can occur before bumps appear, so divergent bimodality (Bianchi, 1997) is not necessary for its existence. Interpreting the global distribution of national per capita incomes as a mixture of rich and poor nation group distributions, polarization is concerned with how changes in their respective masses are reflected in changes in the global distribution. The issue transcends consideration of simple location and scale movements. For example, left-skewing the poor club distribution and/or right-skewing the rich club distribution without changing their respective means or variances can engender a widening gap in the global distribution without any sub-group location or scale change. It follows that identifying divergent sub-population means and diminishing sub-population variances (Paap and van Dijk, 1998) is not necessary for polarization.2 The phenomenon has also been examined by representing intra-distributional dynamics as a Markov chain (Quah, 1997). In this context period-by-period mass relocation is viewed as part of an ongoing, structurally constant dynamic process in which polarization is characterized by the relative magnitudes of the constant parameters (the transition matrix) of the process. However, the pace of polarization may quicken or slow over time, so there are advantages to examining the phenomenon period by period in an unconstrained fashion. Finally, Esteban and Ray (1994), Wolfson (1994) and Beach and Slotsve (1996) provide indices for ranking the extent of polarization without having to identify the existence or otherwise of multiple modes. Unfortunately, without a distributional theory for the indices, it is not possible to determine whether different index values are due to sampling variation or to underlying distributional differences; and, like the Gini inequality coefficient, whilst each index yields a complete ranking, that ranking may be ambiguous in that different indices can rank the same polarized states differently.

Here stochastic dominance techniques3 are adapted and employed to study the extent of polarization and convergence. While offering only a partial ordering of distributions, they do provide methods for inferentially comparing mass location unambiguously, typically by evaluating various degrees of right separation or left separation between distributions. They can be applied in a period by period fashion without imposing any structure on the nature of the polarization process either in the form of the parameters of the underlying mixture distributions or the parameters of the Markov chain transition matrix. Furthermore they may be modified to examine and identify various types of mass relocation from the middle of a distribution towards its tails regardless of whether or not polarization has resulted in the development of multiple modes. Here these techniques are implemented on the per capita GNP of a sample of 101 countries over the years 1970 to 1995. For comparison a corresponding analysis of global welfare and poverty is presented together with values of various polarization and inequality indices.

A rationale for the use of a per capita GNP measure, as opposed to the per capita GDP measure most popular in the study of convergence in the empirical growth literature, is appropriate. The difference (GNP = GDP plus investment income received from foreign sources less investment income paid to foreign sources) can be substantial4 within a particular country. GDP measures the output, and hence the income, produced within a country and is most appropriately employed when the development of the productive capacities of nations is of interest, as it is in the empirical growth literature. GNP measures the incomes received within a country and more adequately reflects a country's consumption capacity and hence welfare. Here interest is focused on the welfare of the collective societies in the sample, and hence the GNP measure will be used. Section 1 discusses the relationship between mass relocation and the concept of convergence employed in the economic growth literature. Welfare and polarization issues as they relate to stochastic dominance and mass relocation are considered in Section 2. The statistical tests are outlined in Section 3, the results are reported in Section 4 and conclusions are drawn in Section 5. The evidence strongly suggests that polarization, global welfare and poverty took quite different paths.


As Bernard and Durlauf (1996) observe, the study of convergence in the empirical growth literature has followed two routes. One, interpreting convergence as catching up, thinks of two economies i and j as converging if $|E(y_{i,t+T} - y_{j,t+T} \mid \xi_t)| < |y_{it} - y_{jt}|$ for some $T > 0$, where $Y_{it}$ is per capita income in country i in period t, $y_{it} = \ln Y_{it}$ and $\xi_t$ is the information available at time t. The other, interpreting convergence as equality of long-term forecasts, takes $\lim_{T \to \infty} E(y_{i,t+T} - y_{j,t+T} \mid \xi_t) = 0$ as the convergence condition. When i and j have stationary and structurally identical growth processes, both definitions of convergence are satisfied. However, one can configure stationary growth processes for i and j with different structures that violate both definitions; further, if i and j have identical non-stationary structures that are not co-integrated, the second definition will be violated.
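The difference between the two definitions can be seen in a minimal sketch. It assumes, purely for illustration, that log per capita income follows either a stationary AR(1) process with a common mean and coefficient ρ, or a random walk with drift, and it evaluates the conditional expected gap E(y_{i,t+T} − y_{j,t+T} | ξ_t), the quantity in Bernard and Durlauf's formulation, in closed form. The function names are hypothetical.

```python
def ar1_forecast_gap(gap_now: float, rho: float, horizon: int) -> float:
    """E(y_i,t+T - y_j,t+T | xi_t) for two AR(1) log-income processes sharing
    the same mean and coefficient rho (|rho| < 1): the common mean cancels and
    the current gap decays geometrically."""
    return rho ** horizon * gap_now


def drift_forecast_gap(gap_now: float, mu_i: float, mu_j: float, horizon: int) -> float:
    """E(y_i,t+T - y_j,t+T | xi_t) for two random walks with drifts mu_i and mu_j."""
    return gap_now + (mu_i - mu_j) * horizon


def catches_up(expected_gap: float, gap_now: float) -> bool:
    """Catching-up condition: |E(gap at t+T | xi_t)| < |gap at t|."""
    return abs(expected_gap) < abs(gap_now)


if __name__ == "__main__":
    gap = 2.0  # current log-income gap, i.e. an income ratio of about 7.4
    for T in (1, 10, 50):
        g = ar1_forecast_gap(gap, rho=0.9, horizon=T)
        print(f"AR(1), T={T}: expected gap {g:.4f}, catches up: {catches_up(g, gap)}")
    same_drift = drift_forecast_gap(gap, mu_i=0.02, mu_j=0.02, horizon=50)
    print(f"random walks, same drift, T=50: expected gap {same_drift:.4f}")
```

With ρ = 0.9 the expected gap shrinks geometrically, so catching up holds at every horizon and the long-run forecasts coincide; with two random walks sharing the same drift the expected gap never changes, so neither condition holds.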

The stationary/non-stationary distinction has antecedents. Gibrat (1930), whose work provided a theoretical foundation for using log-normality in income distribution analysis, formulated a growth process for $Y_{it}$ starting from the premise that the initial value of the variate, $Y_{i0}$, is subject to a sequence of mutually independent proportionate changes $e_{ik}$, $k = 1, \ldots, t$, so that after the passage of time t, $Y_{it} = Y_{i0}(1 + e_{i1})(1 + e_{i2})\cdots(1 + e_{it})$. Assume $|e_{ik}|$ to be small relative to 1 and let $u_{ik} = \ln(1 + e_{ik})$ and $y_{it} = \ln Y_{it}$, where $u_{ik}$ is an i.i.d. process with $E(u_{ik}) = \mu_i$ and variance $\sigma^2$; then:

$$y_{it} = y_{i,t-1} + u_{it} = y_{i0} + \sum_{k=1}^{t} u_{ik} \qquad (1)$$

with $\mu_i$ (again small relative to 1) corresponding to the incremental drift or growth in the process and $u_{it}$ corresponding to the increment of a drifting Wiener process. Gibrat demonstrated that, after a sufficient passage of time t, the distribution of $y_{it}$ is $N(\ln Y_{i0} + (\mu_i - \sigma^2/2)t,\ \sigma^2 t)$. Generally this process would not satisfy either of the Bernard and Durlauf conditions for i and j, even if they had the same parametric structure, unless the processes were co-integrated. Kalecki (1945) proposed an alternative process which, in the present context, replaces (1) with:

equation image(1a)

where $0 < \lambda_i < 1$. Kalecki establishes that, after a sufficient passage of time, the distribution of $y_{it}$ will be $N((\eta_i + \mu_i t)/\lambda_i,\ \sigma^2/\lambda_i^2)$. Again $\mu_i$ may be construed as the incremental growth component, but in this case the logarithm of the proportionate change in Y is negatively related to ln Y via −λ; note also that the variance of this process is constant through time and the distribution is independent of the initial starting value. Clearly this process would satisfy the Bernard and Durlauf conditions for i and j as long as they had the same parametric structure. Unlike Gibrat's model, (1a) may be rewritten as a partial adjustment model with $(\eta + \mu t)/\lambda$ as the target and λ as the adjustment rate. In case (1) the distribution of y diverges through time, whereas in case (1a) it may be thought of as convergent, or at least non-divergent, in the sense that whilst both models predict increasing means, (1) predicts increasing dispersion and (1a) does not.
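A small simulation makes the contrast between the two processes concrete. It is a sketch under stated assumptions: the Gibrat increments are drawn as normal with mean µ and standard deviation σ, and because (1a) itself is not spelled out in the text above, the Kalecki step uses a hypothetical partial-adjustment form, y_t = y_{t−1} + η + µt − λy_{t−1} + ε_t, which moves y toward the target (η + µt)/λ at rate λ. All parameter values are illustrative. Under this particular form the limiting variance is σ²/(1 − (1 − λ)²); the point of the sketch is only that the Gibrat cross-sectional variance grows roughly like σ²t, while the Kalecki-type variance settles down and forgets the starting value.

```python
import random
import statistics


def simulate(step, y0, periods, n_paths, sigma, seed=0, **params):
    """Return cross-sections [y_1, ..., y_periods], each a list over n_paths
    independent paths started at y0; step(y_prev, t, shock, **params) -> y_t."""
    rng = random.Random(seed)
    ys = [y0] * n_paths
    sections = []
    for t in range(1, periods + 1):
        ys = [step(y, t, rng.gauss(0.0, sigma), **params) for y in ys]
        sections.append(ys)
    return sections


def gibrat(y_prev, t, shock, mu):
    """Equation (1): y_t = y_{t-1} + u_t, with u_t = mu + shock (normal shocks assumed)."""
    return y_prev + mu + shock


def kalecki(y_prev, t, shock, eta, mu, lam):
    """Hypothetical form of (1a): partial adjustment toward (eta + mu*t)/lam at rate lam."""
    return y_prev + eta + mu * t - lam * y_prev + shock


if __name__ == "__main__":
    g = simulate(gibrat, 7.0, 40, 4000, 0.1, seed=1, mu=0.02)
    k = simulate(kalecki, 7.0, 200, 2000, 0.1, seed=1, eta=1.4, mu=0.004, lam=0.2)
    for t in (10, 40):
        print(f"Gibrat  t={t:3d}: var {statistics.pvariance(g[t - 1]):.4f} (sigma^2 t = {0.01 * t:.2f})")
    for t in (100, 200):
        print(f"Kalecki t={t:3d}: var {statistics.pvariance(k[t - 1]):.4f}")
```

With these settings the Gibrat variance at t = 40 should be close to σ²t = 0.4, while the Kalecki-type variance at t = 100 and t = 200 should be close to 0.01/0.36 ≈ 0.028.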

When the processes i and j have different parametric structures the distribution of yt becomes a mixture in both cases. Differing parametric structures may arise when different processes are associated with groups with different characteristics. In the present context economic models of the separation of rich and poor groups abound (the threshold model of Azariadis and Drazen, 1990 employed in Durlauf and Johnson, 1995; the club model of Galor and Zeira, 1993 employed in Quah, 1997). As Durlauf and Quah (2002) point out, regression models essentially consider conditional averages and are uninformative as to whether the poor–rich gap is closing, which requires study of the complete distribution over time. This is even more pertinent when the progress of welfare, poverty and polarization is at issue since it ultimately depends upon the nature of mass relocation which is not completely reflected in movements of conditional averages.


Atkinson (1970), Kolm (1976) and Foster and Shorrocks (1988) highlight the importance of the nature of mass relocation in income distributions for empirical welfare comparisons. They provide specific definitions of the distributional change necessary and sufficient to engender a welfare improvement for welfare functions in particular classes. The change is defined in terms of stochastic dominance orderings, which emerge from considering the average utility gained in moving from one income distribution to another. Consider δ, the change in the expected value of societal utility u(x)5, which has the properties $(-1)^{j-1}\partial^j u/\partial x^j \ge 0$, $j = 1, \ldots, i$ for some $i > 0$, based upon moving from density function g(x) to f(x), both defined on the interval [a, b]. It may be written as:

$$\delta = \int_a^b u(x)\,[f(x) - g(x)]\,dx$$

A necessary and sufficient condition for δ > 0 for a given i is:

$$G_i(x) - F_i(x) \ge 0 \quad \text{for all } x \in [a, b], \text{ with strict inequality for some } x \qquad (2)$$

where, letting f(x) = F0(x), Fi(x) is defined recursively as:

$$F_i(x) = \int_a^x F_{i-1}(t)\,dt, \qquad i = 1, 2, \ldots$$

and $G_i(x)$ is defined similarly. When (2) is satisfied, f(x) is said to stochastically dominate g(x) at order i. In the following, $f(x) \ge_j g(x)$ denotes dominance of g(x) by f(x) of at least order j. For convenience, $f(x) >_j g(x)$ denotes strict order-j dominance, where strict inequality in (2) obtains over the relevant range. Note that for i < j, $f(x) \ge_i g(x)$ implies $f(x) \ge_j g(x)$; furthermore, the relationship is transitive in that if $f(x) \ge_j g(x)$ and $g(x) \ge_j h(x)$ then $f(x) \ge_j h(x)$. Though the ordering is not complete, it is unambiguous and, given the properties of u(·), facilitates orderings of unobservable distributions of u(x) in terms of observable distributions of x.

Here it is convenient to interpret ith-order dominance as the degree of ‘ith-order right separation’ of the two distributions. When $f(x) \ge_i g(x)$, $F_i(x_1) = G_i(x_2)$ implies $x_2 \le x_1$, so that $F_i$ is everywhere not to the left of $G_i$ and somewhere to the right of it, implying a sense of right separation of f(x) from g(x) at the ith level of integration. As limiting examples, let x be a transformation of y, where y has distribution g(y) and x has distribution f(x): a positive location shift transformation6 implies $f(x) \ge_1 g(x)$; if the transformation is a location-preserving, scale-reducing shift then $f(x) \ge_2 g(x)$; and if the transformation is a location- and scale-preserving, positive-skewing shift then $f(x) \ge_3 g(x)$.
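The recursive integrals can be estimated directly from a sample, since F_i(x) = E[(x − X)_+^{i−1}]/(i − 1)! for i ≥ 1 (the incomplete-moment form also used by Davidson and Duclos, 2000). The sketch below compares point estimates on a grid, with no sampling inference, and reproduces the limiting examples just given: a location shift gives first-order right separation, and a location-preserving contraction gives second-order (hence third-order) but not first-order right separation. The function names, grid and tolerance are illustrative choices.

```python
import math


def integrated_cdf(sample, x, order):
    """Empirical F_i(x): the distribution function integrated (order - 1) times
    from the lower end of the support, F_i(x) = mean of (x - X)_+^(i-1) / (i-1)!."""
    if order < 1:
        raise ValueError("order must be at least 1")
    # For order 1 each term is (x - v) ** 0 == 1, so this is the empirical CDF.
    total = sum((x - v) ** (order - 1) for v in sample if v <= x)
    return total / (len(sample) * math.factorial(order - 1))


def dominates(f_sample, g_sample, order, grid, tol=1e-12):
    """Point check of f >=_order g: F_i(x) <= G_i(x) at every grid point and
    strictly below somewhere (estimates only, no sampling inference)."""
    diffs = [integrated_cdf(f_sample, x, order) - integrated_cdf(g_sample, x, order)
             for x in grid]
    return all(d <= tol for d in diffs) and any(d < -tol for d in diffs)


if __name__ == "__main__":
    g = [1.0, 2.0, 3.0, 4.0, 5.0]
    shifted = [v + 0.5 for v in g]            # positive location shift
    contracted = [2.0, 2.5, 3.0, 3.5, 4.0]    # same mean, smaller spread
    grid = [0.25 * k for k in range(29)]      # covers the support [0, 7]
    for name, f in (("shifted", shifted), ("contracted", contracted)):
        print(name, [dominates(f, g, i, grid) for i in (1, 2, 3)])
```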

Of equal interest is the idea of ‘ith-order left separation’ characterized by a condition of the form:

equation image(2a)

Defining w = −x, and f(w) and g(w) as the appropriately transformed distributions on [−b, −a], this condition is equivalent to $f(w) \ge_i g(w)$ (with $F_i(w)$ and $G_i(w)$ defined as before) and has the analogous ‘ith-order left separation of f(x) from g(x)’ interpretation.7 In this context the relationship $f(w) \ge_1 g(w)$ may be thought of as arising from a negative location shift transformation; if the transformation is a location-preserving, scale-reducing shift then $f(w) \ge_2 g(w)$; and if the transformation is a location- and scale-preserving, negative-skewing shift then $f(w) \ge_3 g(w)$.

Assuming relative club sizes remain constant, polarization between rich and poor countries may now be thought of in terms of the rich club distribution right separating and the poor club distribution left separating at some order (of course, one club separating in the appropriate direction whilst the other remains unchanged would also constitute polarization). When the club distributions are separately identified, polarization can be examined statistically by performing the relevant stochastic dominance tests jointly on successive realizations of the club distributions. Letting $f_j(x)$ and $g_j(x)$ be the period-j rich club and poor club distributions respectively, three conditions need to hold simultaneously for ith-order polarization:

  • 1. $f_1(x) \ge_1 g_1(x)$ (establishing that the rich club is first-order right separated from the poor club).
  • 2. $f_2(x) \ge_i f_1(x)$ (establishing that the rich club at least ith-order right separates in period 2).
  • 3. $g_2(w) \ge_i g_1(w)$ (establishing that the poor club at least ith-order left separates in period 2).

Thus, as limiting cases, first-order polarization is engendered by the respective club means moving further apart, second-order polarization arises when the clubs become more concentrated around their respective unchanged means, and third-order polarization occurs when the poor club skews left and the rich club skews right with their means and variances unchanged.
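When the two clubs are observed separately, the three conditions can be checked jointly. The following sketch uses point estimates only (no test statistics), obtains left separation by reflecting the poor-club samples through w = −x, and uses made-up club samples that mimic the first- and second-order limiting cases just described; all names and numbers are hypothetical.

```python
import math


def integrated_cdf(sample, x, order):
    """Empirical F_i(x) = mean of (x - X)_+^(i-1) / (i-1)!, for order i >= 1."""
    total = sum((x - v) ** (order - 1) for v in sample if v <= x)
    return total / (len(sample) * math.factorial(order - 1))


def dominates(f_sample, g_sample, order, grid, tol=1e-12):
    """Point check of f >=_order g on a grid (no sampling inference)."""
    d = [integrated_cdf(f_sample, x, order) - integrated_cdf(g_sample, x, order) for x in grid]
    return all(v <= tol for v in d) and any(v < -tol for v in d)


def reflect(sample):
    """w = -x turns left separation of x into right separation of w."""
    return [-v for v in sample]


def club_polarization(rich1, poor1, rich2, poor2, order, grid):
    """Joint check of the three conditions for order-i polarization."""
    wgrid = reflect(grid)
    return (dominates(rich1, poor1, 1, grid)                               # condition 1
            and dominates(rich2, rich1, order, grid)                       # condition 2
            and dominates(reflect(poor2), reflect(poor1), order, wgrid))   # condition 3


if __name__ == "__main__":
    poor1, rich1 = [1.0, 2.0, 3.0], [6.0, 7.0, 8.0]
    grid = [0.25 * k for k in range(41)]
    # Club means move apart: first-order polarization.
    print(club_polarization(rich1, poor1, [7.0, 8.0, 9.0], [0.0, 1.0, 2.0], 1, grid))
    # Clubs concentrate around unchanged means: second-order, not first-order.
    print(club_polarization(rich1, poor1, [6.5, 7.0, 7.5], [1.5, 2.0, 2.5], 2, grid),
          club_polarization(rich1, poor1, [6.5, 7.0, 7.5], [1.5, 2.0, 2.5], 1, grid))
```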

When, as in the present case, the observed distribution is an unknown mixture of unobserved rich and poor country distributions, the problem is to analyse the consequences of polarization within the observed mixtures. Inferences can be made by associating the lower and upper tails of the observed mixtures with the poor and rich clubs respectively. Thus, partitioning the distributions at some common defining point $x^*$ (in the present case, the median of the pooled sample for each comparison) and considering the relative progress of the distributions $f_1(x \mid x < x^*)$, $f_2(x \mid x < x^*)$, $f_1(x \mid x > x^*)$ and $f_2(x \mid x > x^*)$, two conditions need to hold simultaneously:

  • 1. $f_2(w \mid x < x^*) \ge_i f_1(w \mid x < x^*)$ (the left tail at least ith-order left separates in period 2).
  • 2. $f_2(x \mid x > x^*) \ge_i f_1(x \mid x > x^*)$ (the right tail at least ith-order right separates in period 2).

Clearly $f_1(x \mid x > x^*) \ge_1 f_1(x \mid x < x^*)$ always holds in this case and does not need to be established, so an analogue of condition 1 of the case in which the rich and poor clubs are separately observed is no longer required.
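For an unobserved mixture, the data step is to split both periods at the common point x* and to reflect the lower tails so that left separation can be tested as right separation. A minimal sketch, assuming x* is the median of the two periods pooled together and, as an arbitrary illustrative choice, discarding observations exactly equal to x*; the function names are hypothetical.

```python
import statistics


def split_at_pooled_median(period1, period2):
    """Partition two periods' samples at x*, the median of the pooled values.

    Returns (x_star, lower1, lower2, upper1, upper2). Observations equal to x*
    are dropped, an arbitrary choice made here for illustration."""
    x_star = statistics.median(list(period1) + list(period2))
    lower1 = [v for v in period1 if v < x_star]
    lower2 = [v for v in period2 if v < x_star]
    upper1 = [v for v in period1 if v > x_star]
    upper2 = [v for v in period2 if v > x_star]
    return x_star, lower1, lower2, upper1, upper2


def reflect(sample):
    """w = -x, so that the left-tail condition becomes a right-separation test."""
    return [-v for v in sample]


if __name__ == "__main__":
    x_star, lo1, lo2, up1, up2 = split_at_pooled_median([1.0, 2.0, 4.0, 5.0],
                                                        [0.0, 1.5, 4.5, 6.0])
    print(x_star, reflect(lo1), reflect(lo2), up1, up2)
    # The period-1 upper tail lies wholly above its lower tail, so first-order
    # right separation between them holds by construction.
    print(min(up1) > max(lo1))
```

The reflected lower tails and the upper tails can then be passed to any dominance check, such as a grid comparison of integrated distribution functions, to evaluate the two conditions.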


Tests for mass relocation (stochastic dominance) conditions have proliferated in the literature in recent years: Anderson (1996) employs the distribution of integral approximations, Davidson and Duclos (2000) employ the distribution of incomplete moments, and McFadden (1989) and Barrett and Donald (1999) employ distributions of functions of the empirical distribution function. The first two families of tests are attractive because they are easily adapted to situations where samples are not i.i.d. (Anderson, 1998, 2003; Davidson and Duclos, 2000). Essentially they are a sequence of joint inequality tests for examining $v_i(f, g)$, a vector of asymptotically normally distributed estimates of $F_i(x) - G_i(x)$ at a selection of pre-specified values of x. The latter approach is attractive because, unlike the first two, it is a consistent test focusing on the maximum distance $F_i(x) - G_i(x)$ over the whole range of x; however, it can be shown that, under smoothness assumptions, the inconsistency problem is not substantive (Anderson, 2000). Anderson (2001) provides a taxonomy of tests appropriate for examining both within- and between-population polarization.8 Here the tests formulated in Anderson (2001) are used, modified to account for the between-sample dependence engendered by the ‘panel’ type nature of the data as outlined in Anderson (2003).
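The vector $v_i(f, g)$ and its covariance matrix can be estimated from incomplete moments. The sketch below assumes two independent i.i.d. samples; it does not implement the adjustment for between-sample dependence (Anderson, 2003) that the paper uses for the panel, nor population weighting. It also returns the standardized elements that a maximum-modulus comparison would use. Function names are hypothetical.

```python
import math


def _kernel(value, x, order):
    """(x - X)_+^(i-1) / (i-1)!: its sample mean estimates F_i(x)."""
    return (x - value) ** (order - 1) / math.factorial(order - 1) if value <= x else 0.0


def _mean_and_cov(sample, points, order):
    """Sample means of the kernel at each point and the covariance matrix of those means."""
    n, k = len(sample), len(points)
    rows = [[_kernel(v, x, order) for x in points] for v in sample]
    means = [sum(r[a] for r in rows) / n for a in range(k)]
    cov = [[sum((r[a] - means[a]) * (r[b] - means[b]) for r in rows) / (n * n)
            for b in range(k)] for a in range(k)]
    return means, cov


def dominance_vector(f_sample, g_sample, points, order):
    """Estimate v_i(f, g) = [F_i(x_k) - G_i(x_k)] and its covariance matrix,
    assuming f_sample and g_sample are independent i.i.d. samples."""
    mf, cf = _mean_and_cov(f_sample, points, order)
    mg, cg = _mean_and_cov(g_sample, points, order)
    v = [a - b for a, b in zip(mf, mg)]
    cov = [[cf[a][b] + cg[a][b] for b in range(len(points))] for a in range(len(points))]
    return v, cov


def z_statistics(v, cov):
    """Standardize each element; zero variance with a non-zero difference gives +/-inf."""
    out = []
    for a, va in enumerate(v):
        sd = math.sqrt(cov[a][a])
        out.append(va / sd if sd > 0 else (math.copysign(math.inf, va) if va != 0 else 0.0))
    return out


if __name__ == "__main__":
    v, cov = dominance_vector([1.0, 2.0, 3.0, 4.0], [2.0, 3.0, 4.0, 5.0], [1.5, 2.5, 3.5], order=1)
    print(v, z_statistics(v, cov))
```

Large positive z-statistics are evidence against $F_i(x) \le G_i(x)$ at that point; a joint decision still requires either maximum-modulus critical values or a joint Wald-type procedure.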

The practice has been either to apply the Maximum Modulus Distribution (tables for which are provided in Stoline and Ury, 1979) to a collection of asymptotically standard normal statistics (a conservative test) based upon the K individual elements of the vector $v_i(f, g)$ and their corresponding standard deviations, or to employ the joint testing procedures advocated in Kodde and Palm (1986) and Wolak (1989). The advantage of the former is that the inequality relations may be studied in detail; the advantage of the latter is that it is not a conservative test. For the joint test let $v_i^w(f, g)$ be the inequality-constrained estimate of the vector $v_i(f, g)$ and let $(\Omega_{v_i})^{+}$ be a generalized inverse of the covariance matrix of $v_i$; then for

equation image

the distribution of W is such that:

equation image

where $w(k, k_i, \Omega)$ is a weight function corresponding to the probability that v, with covariance matrix Ω, has $k_i - 1$ of its $k - 1$ independent elements positive. Closed-form expressions for the weight function exist only for k − 1 up to 4; however, following the suggestion in Wolak (1989), they can readily be approximated by simulation using pseudo-random normal draws.
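Wolak's simulation suggestion can be sketched as follows. The code estimates the standard chi-bar-squared weights, i.e. the probabilities that the projection of Z ~ N(0, Ω) onto the nonnegative orthant (in the metric defined by Ω⁻¹) has exactly j positive elements, j = 0, …, k. The projection is computed as a linear complementarity problem by projected Gauss–Seidel, an implementation choice of this sketch rather than the paper's. How these weights are paired with χ² degrees of freedom in the distribution of W depends on how the null hypothesis is posed, and that mapping is not reproduced here. For k = 2 the result can be checked against the known closed form w_2 = 1/4 + arcsin(ρ)/(2π).

```python
import math
import random


def cholesky(cov):
    """Lower-triangular L with L L' = cov (cov must be symmetric positive definite)."""
    k = len(cov)
    L = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            s = cov[i][j] - sum(L[i][m] * L[j][m] for m in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def project_nonnegative(z, cov, max_sweeps=1000, tol=1e-13):
    """theta minimizing (theta - z)' cov^{-1} (theta - z) subject to theta >= 0.

    The KKT conditions give theta = z + cov @ lam with lam >= 0 and lam' theta = 0,
    a linear complementarity problem solved here by projected Gauss-Seidel."""
    k = len(z)
    lam = [0.0] * k
    for _ in range(max_sweeps):
        change = 0.0
        for i in range(k):
            theta_i = z[i] + sum(cov[i][j] * lam[j] for j in range(k))
            new = max(0.0, lam[i] - theta_i / cov[i][i])
            change = max(change, abs(new - lam[i]))
            lam[i] = new
        if change < tol:
            break
    return [z[i] + sum(cov[i][j] * lam[j] for j in range(k)) for i in range(k)]


def chibar_weights(cov, draws=20000, seed=0, zero_tol=1e-9):
    """Monte Carlo estimate of w_j = P(projection has exactly j positive elements), j = 0..k."""
    rng = random.Random(seed)
    L = cholesky(cov)
    k = len(cov)
    counts = [0] * (k + 1)
    for _ in range(draws):
        e = [rng.gauss(0.0, 1.0) for _ in range(k)]
        z = [sum(L[i][m] * e[m] for m in range(i + 1)) for i in range(k)]  # z ~ N(0, cov)
        theta = project_nonnegative(z, cov)
        counts[sum(1 for t in theta if t > zero_tol)] += 1
    return [c / draws for c in counts]


if __name__ == "__main__":
    rho = 0.5
    w = chibar_weights([[1.0, rho], [rho, 1.0]], draws=20000, seed=1)
    print("simulated:", [round(x, 3) for x in w])
    w2 = 0.25 + math.asin(rho) / (2 * math.pi)
    print("closed form:", [round(0.5 - w2, 3), 0.5, round(w2, 3)])
```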

For the polarization tests the vector vi(f, g) is redefined as:

equation image

with the covariance matrix redefined accordingly.


Data from the World Bank World Development Indicators series on per capita GNP in constant 1987 $US and on population size were collected for 101 countries for the years 1970, 1978, 1987 and 1995. The countries, listed in Appendix A, were selected on the basis of having complete series for the period, the most notable omissions being the USSR, Hungary, Poland, China, East and West Germany and Austria. The years were selected to give roughly equally spaced intervals over the observation period. Population size was collected for the purpose of sample weighting, which is an issue for two reasons. Firstly, when using these techniques to study individual welfare with household-based data, household size re-weighting is crucial for theoretical consistency (though in practice it rarely seems to affect the qualitative nature of the results because of the limited variation in household size). Here the nation is the household and, with much greater variation in its size, re-weighting sample observations by relative population size is potentially more important from an individual welfare perspective. Secondly, the testing and kernel estimation techniques employed assume within-year i.i.d. sampling, and re-weighting is necessary to undo the stratified sampling inherent in the data. Results on both weighted and unweighted samples are presented for comparison. Similarly, the use of a panel violates the i.i.d. assumptions usually invoked in this work; again for comparison, results under an assumed i.i.d. scheme and under a panel-based dependent sampling scheme are reported.
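Population re-weighting amounts to replacing the equal sample weights 1/n by population shares when estimating the distribution functions. A minimal sketch for the integrated distribution functions, with made-up log-GNP values and populations chosen only to show how a populous poor country changes the estimate; the function name is hypothetical.

```python
import math


def weighted_integrated_cdf(values, weights, x, order=1):
    """Population-weighted F_i(x): weighted mean of (x - X)_+^(i-1) / (i-1)!.

    With equal weights this reduces to the unweighted estimate."""
    total_weight = sum(weights)
    acc = sum(w * (x - v) ** (order - 1) for v, w in zip(values, weights) if v <= x)
    return acc / (total_weight * math.factorial(order - 1))


if __name__ == "__main__":
    log_gnp = [6.0, 7.0, 9.5]            # made-up log per capita GNP values
    population = [900.0, 50.0, 100.0]    # made-up populations (millions)
    for label, w in (("unweighted", [1.0] * 3), ("weighted", population)):
        print(label, round(weighted_integrated_cdf(log_gnp, w, 6.5), 3))
```

In this made-up example the poor, populous country lifts the estimated F_1 at 6.5 from 1/3 to 900/1050 ≈ 0.857.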

Table I presents the summary statistics for log per capita GNP. Notable from the table is the upward shift in location of the underlying distribution, which is somewhat less obvious in the weighted sample. The substantially increased weight that poor countries have in the weighted sample suggests that mean incomes were increasing more in the rich country club than in the poor country club. The spread, on the other hand, has steadily increased throughout the period, though the rate is again greater in the unweighted sample, suggesting relatively greater divergence in the rich as opposed to the poor country club. In terms of location, 1970–1978 constituted a big leap forward, whilst there was a fall back in the 1978–1987 period and some slight recovery in the 1987–1995 period. Both weighted and unweighted distributions continued to spread throughout the period, which is also reflected in a continuing increase in the range of the distribution. To put this into perspective, the ratio of the richest to the poorest country per capita incomes was approximately 83 in 1970 and 343 in 1996!9 What is not obvious from the table is the multimodal nature of the distributions reflected in the plots of the kernel density estimates10 in Figures 1 and 2, corresponding to the unweighted and weighted samples respectively. The distributions are essentially bimodal (alleviating aforementioned concerns regarding sub-population polarization engendering an increased central mass of the mixture), and more obviously so in the weighted sample case.

Figure 1.

Distribution of per capita GNP

Figure 2.

Population-weighted per capita GNP distributions

Table I. Statistics for the natural logarithm of per capita gross national product (constant $US) weighted and unweighted by population share

Statistic                     1970        1978        1987        1995
Weighted mean                 7.0276138   7.2673949   7.1216338   7.0803898
Weighted median               6.5238583   6.7694766   6.3801225   6.5366635
Standard deviation            1.1792535   1.3092340   1.4519676   1.6822738
Weighted standard deviation   1.4123980   1.5145033   1.5626208   1.7494431

The extent to which this bimodality feature is reflected in various polarization indices is reported in Table II. Note that while the weighted and unweighted Gini coefficients indicate a monotonic increase in inequality throughout the sample period, the Esteban–Ray11 and inter-quartile range/standard deviation polarization indices record an increase, decrease and then an increase in polarization whilst the inter-quartile range/range polarization index indicates a monotonic increase in polarization.

Table II. Inequality and polarization indices for the natural logarithm of per capita gross national product (constant $US)

Index                                          1970     1978     1987     1995
Unweighted Gini of ln GNP (inequality; a)      0.0932   0.0985   0.1118   0.1296
Weighted Gini of ln GNP (inequality; a, b)     0.1096   0.1128   0.1152   0.1302
Inter-quartile range/σ (polarization)          1.4043   1.6496   1.5952   1.7145
Inter-quartile range/range (polarization)      0.3744   0.4345   0.4663   0.4936
Esteban–Ray index, α = 1 (polarization; c)     0.0138   0.0156   0.0148   0.0172

  • a

    Strictly speaking this is not a Gini coefficient since it relates to the logarithm of income rather than its level.

  • b

    A sample-weighted Gini of the logarithms of incomes is equivalent to the Esteban–Ray polarization index with α = 0.

  • c

    Each of a range of values for α (0.5, 1, 1.5) yielded the same qualitative direction of the polarization index.
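The indices in Table II can be computed along the following lines. This is a sketch: the normalizing constant of the Esteban–Ray index is taken to be 1/(2 × mean), which makes the α = 0 case coincide with the Gini of the same data as in note b, and quartiles use Python's default `statistics.quantiles` convention. The paper does not state either choice, so the sketch is not expected to reproduce the table's values; the function names are hypothetical.

```python
import statistics


def _normalized(shares, n):
    """Population shares summing to one; equal shares if none are given."""
    p = shares if shares is not None else [1.0] * n
    total = sum(p)
    return [w / total for w in p]


def gini(values, shares=None):
    """Gini coefficient: sum_i sum_j p_i p_j |y_i - y_j| / (2 * mean)."""
    p = _normalized(shares, len(values))
    mean = sum(pi * y for pi, y in zip(p, values))
    spread = sum(pi * pj * abs(yi - yj)
                 for pi, yi in zip(p, values) for pj, yj in zip(p, values))
    return spread / (2.0 * mean)


def esteban_ray(values, shares=None, alpha=1.0):
    """Esteban-Ray index K * sum_i sum_j p_i^(1+alpha) p_j |y_i - y_j|.

    K = 1 / (2 * mean) is an assumption of this sketch; with alpha = 0 the
    index then equals gini(values, shares)."""
    p = _normalized(shares, len(values))
    mean = sum(pi * y for pi, y in zip(p, values))
    total = sum(pi ** (1.0 + alpha) * pj * abs(yi - yj)
                for pi, yi in zip(p, values) for pj, yj in zip(p, values))
    return total / (2.0 * mean)


def iqr_ratios(values):
    """(inter-quartile range / standard deviation, inter-quartile range / range), unweighted."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return iqr / statistics.pstdev(values), iqr / (max(values) - min(values))


if __name__ == "__main__":
    vals = [1.0, 2.0, 3.0, 4.0]
    print(gini(vals), esteban_ray(vals, alpha=0.0), esteban_ray(vals, alpha=1.0), iqr_ratios(vals))
```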

Table III reports social welfare comparisons based upon stochastic dominance criteria. The results clearly indicate that, regardless of the assumed sampling model or population weighting scheme, welfare improved and then diminished over the period. Furthermore, in the population-weighted i.i.d. and dependent sample versions it did so to the extent that a welfare loss could be determined over the whole period. 1970–1978 was unambiguously a period of improvement and 1978–1995 was unambiguously a period of deterioration, given the transitivity of stochastic dominance relationships. Neither the weighting scheme nor the statistical model specification appears to have had a substantive effect on the qualitative nature of these results, which also have profound implications for a commentary on poverty. Following Atkinson (1987), whatever fixed poverty line is chosen, any measure of absolute poverty based upon a monotonic function of per capita GNP outcomes below that line would have recorded a reduction in poverty over the 1970–1978 period. The increase in poverty over the 1978–1995 period is equally unambiguous. Note that if poverty or the plight of the poor is viewed as a relative concept, then the degree of polarization is perhaps a more relevant criterion.
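The poverty argument can be illustrated directly: a lower headcount at every poverty line is first-order dominance, and a lower average shortfall below every line is second-order dominance (Atkinson, 1987). A small sketch with hypothetical log-income samples; the function names are illustrative.

```python
def headcount(sample, z):
    """Share of observations at or below the poverty line z (a first-order measure)."""
    return sum(1 for x in sample if x <= z) / len(sample)


def average_shortfall(sample, z):
    """Mean shortfall below z, counting the non-poor as zero (a second-order measure)."""
    return sum(z - x for x in sample if x < z) / len(sample)


def poverty_falls_at_all_lines(before, after, lines, measure):
    """True if `measure` is no higher after than before at every line, and lower at some."""
    pairs = [(measure(after, z), measure(before, z)) for z in lines]
    return all(a <= b for a, b in pairs) and any(a < b for a, b in pairs)


if __name__ == "__main__":
    before = [5.0, 6.0, 7.0, 9.0]               # hypothetical log incomes
    lines = [4.0 + 0.25 * k for k in range(25)]  # poverty lines from 4 to 10
    shifted = [x + 0.5 for x in before]          # everyone gains
    spread_less = [5.5, 6.0, 7.0, 8.5]           # richest gives to poorest
    for name, after in (("shifted", shifted), ("spread_less", spread_less)):
        print(name,
              poverty_falls_at_all_lines(before, after, lines, headcount),
              poverty_falls_at_all_lines(before, after, lines, average_shortfall))
```

In the second comparison the richest observation falls and the poorest rises, so the headcount rises at some high poverty lines while the average shortfall rises at none: a second-order but not first-order improvement.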

Table III. Stochastic dominance rankings of per capita gross national product distributions (notes a–c)

Comparison   i.i.d.          i.i.d.          Dependent sample  Dependent sample
years        unweighted      weighted        unweighted        weighted
1970–1978    (↑, 1)          (↑, 1)          (↑, 1)            (↑, 1)
             [1.000, 0.020]  [0.302, 0.000]  [0.998, 0.000]    [0.969, 0.000]
1978–1987    (↓, 2)          no decision     (↓, 1)            (↓, 1)
             [0.186, 0.968]  [0.941, 0.696]  [0.000, 0.365]    [0.023, 0.908]
             {0.010, 0.535}  {0.536, 0.130}
1987–1995    (↓, 2)          (↓, 1)          (↓, 1)            (↓, 2)
             [0.308, 0.753]  [0.000, 0.972]  [0.021, 0.075]    [0.180, 0.533]
             {0.012, 0.541}                                    {0.005, 0.611}
1970–1995    no decision     (↓, 1)          no decision       (↓, 2)
             [0.001, 0.014]  [0.000, 0.393]  [0.015, 0.000]    [0.217, 0.014]
                                                               {0.000, 0.627}

  • a

    (↑, i) indicates a social welfare improvement of order i and (↓, i) a social welfare decline of order i, based upon a P(null) < 0.05 decision criterion.

  • b

    [p1, p2] correspond to the upper tail probabilities of the Wald criteria for the first-order dominance comparisons ‘year B dominates year A’ and ‘year A dominates year B’ respectively.

  • c

    {p1, p2} correspond to the upper tail probabilities of the Wald criteria for the second-order dominance comparisons ‘year B dominates year A’ and ‘year A dominates year B’ respectively.

From Table IV it is evident that the combination of inappropriately assuming the data to be drawn from an i.i.d. scheme and not weighting the data by population size yields results at odds with all other combinations of assumptions, which themselves yield a reasonably homogeneous body of evidence. This is perhaps not surprising given the impact of re-weighting apparent in Figures 1 and 2 and the obvious dependence of successive samples. Polarization between rich and poor countries continued unabated throughout the period and, with the exception of the i.i.d. unweighted results, the evidence is that the trend was steady in each of the sub-periods. The only polarization index in Table II consistent with this is the inter-quartile range/range ratio. The sustained polarization may be observed in Figures 1 and 2 in the ever-increasing lateral distance between the two primary modes of successive distributions. However, a qualification is in order here: the rich club seems to have become relatively smaller in the 1978–1987 transition. The rich and poor clubs continue to separate, but their relative sizes appear to have changed, which violates the constant relative club size assumption invoked in developing the polarization tests and may well be the source of the contradiction between the polarization tests and the polarization indices reported in Table II. Interestingly, the club polarization process continues through periods of welfare improvement and diminishing poverty as well as through periods of welfare deterioration and increasing poverty, clearly indicating the possibility of simultaneous polarization and welfare improvement (or absolute poverty reduction). The gap between rich and poor countries grew unequivocally throughout the sample period.

Table IV. Polarization rankings of per capita gross national product distributions (notes a–d)

Comparison   i.i.d.          i.i.d.          Dependent sample  Dependent sample
years        unweighted      weighted        unweighted        weighted
1970–1978    no decision     (↑, 1)          (↑, 1)            (↑, 1)
             [0.836, 0.473]  [0.797, 0.000]  [1.000, 0.000]    [1.000, 0.000]
             {0.000, 0.011}
1978–1987    (↑, 1)          (↑, 1)          (↑, 1)            (↑, 1)
             [1.000, 0.006]  [0.999, 0.031]  [0.149, 0.049]    [0.888, 0.000]
1987–1995    no decision     (↑, 2)          (↑, 1)            (↑, 1)
             [0.266, 0.971]  [0.959, 0.079]  [0.999, 0.000]    [1.000, 0.000]
             {0.174, 0.909}  {0.906, 0.019}
1970–1995    (↑, 1)          (↑, 1)          (↑, 1)            (↑, 1)
             [0.999, 0.001]  [0.896, 0.008]  [1.000, 0.000]    [1.000, 0.000]

  • a

    (↑, i) indicates polarization of order i and (↓, i) depolarization of order i, based upon a P(null) < 0.05 decision criterion.

  • b

    [p1, p2] correspond to the upper tail probabilities of the Wald criteria for the first-order polarization comparisons of year B relative to year A and of year A relative to year B respectively.

  • c

    {p1, p2} correspond to the upper tail probabilities of the Wald criteria for the second-order polarization comparisons of year B relative to year A and of year A relative to year B respectively.

  • d

    For the purposes of the polarization test the distributions in each of the comparison years were partitioned at the median of the pooled sample.


Interpreting convergence (depolarization) and welfare improvement as having to do with the relocation of the mass within a distribution in a particular fashion requires empirical techniques which facilitate assessment of the manner in which mass has relocated. Such techniques for identifying polarization in a collection or mixture of rich and poor countries have been outlined which draw on, and provide companions to, extant stochastic dominance techniques for analysing the progress of global economic well-being and poverty. They do not rely upon the existence of bimodality in the distribution and do not impose any structure on the polarization process. Employing these techniques in an analysis of the distribution of per capita GNP (representing the consumption capacity of a country) over the period 1970–1995 for a broad sample of countries has revealed that, whilst welfare increased and then diminished and poverty diminished and then increased, polarization between rich and poor countries has continued unabated throughout the period.

Re-weighting the data by the relative population size of the country was entertained for both theoretical (attaching the same weight to individuals in different countries) and statistical (undoing the stratified sampling characteristic of the data) reasons. Using unweighted data may be construed as employing a social welfare function over nation states, whereas employing weighted data may be thought of as employing a social welfare function over the individuals in those nation states.12 The question is: how badly is one likely to be misled by using one approach rather than the other? Qualitatively there appears to be little difference between the two sets of results, in that there were no ranking reversals, though occasionally one approach yielded a ‘no decision’ whereas the other was decisive. Re-weighting appeared to sharpen the polarization and welfare conclusions, as did accommodating between-observation-period dependencies due to the panel type nature of the data. Perhaps the most significant effect was on the kernel estimates of the distribution of per capita GNP. The profound differences between the weighted and unweighted distributions highlight the polarization phenomenon confirmed by the statistical tests.

In particular the results emphasize the distinction between polarization and inequality and the notion that one does not imply the other. Thus in a period (1970–1978) when the plight of poor countries improved in terms of their per capita GNP the gap between them and the wealthier nations widened, sustaining the view that the position of the poor worsened in a relative sense. The findings are robust to whether the sample is weighted by population size or not and to whether due allowance is made for the ‘panel type’ nature of the data.


Many thanks are due to two referees and the editor, together with seminar participants at the Universities of Toronto, Bristol, Simon Fraser and Alberta and at the Institute for Fiscal Studies. This work has been carried out under SSHRC grant number 4100000732.


Algeria, Argentina, Australia, Bahamas, Bangladesh, Barbados, Belgium, Benin, Bolivia, Botswana, Brazil, Burkina Faso, Burundi, Cameroon, Canada, Central African Republic, Chad, Chile, Colombia, Congo, Costa Rica, Côte d'Ivoire, Denmark, Dominica, Dominican Republic, Ecuador, Egypt, El Salvador, Fiji, Finland, France, Gabon, Gambia, Ghana, Greece, Guatemala, Guyana, Haiti, Honduras, Hong Kong, Iceland, India, Indonesia, Ireland, Israel, Italy, Jamaica, Japan, Kenya, Korea, Kuwait, Lesotho, Luxembourg, Madagascar, Malawi, Malaysia, Mali, Mauritania, Mexico, Morocco, Nepal, Netherlands, New Zealand, Nicaragua, Niger, Nigeria, Norway, Oman, Pakistan, Panama, Papua New Guinea, Paraguay, Peru, Philippines, Portugal, Rwanda, Senegal, Seychelles, Sierra Leone, Singapore, South Africa, Spain, Sri Lanka, St. Vincent and the Grenadines, Suriname, Swaziland, Sweden, Switzerland, Syria, Thailand, Togo, Trinidad and Tobago, Tunisia, Turkey, United Kingdom, United States, Uruguay, Venezuela, Zaire, Zambia, Zimbabwe.

  • 1

    For economies starting from different initial disequilibrium points with identical characteristics defining equilibrium national income, neoclassical growth theory predicts convergence, or less inequality, in the distribution of national incomes over time. However, the recent empirical literature investigating the sources of the progress of national per capita incomes is at pains to assert that nations are not identical (Lee et al., 1997, 1998), that the conditional convergence hypothesis underlying the models does not imply convergence (or more equality) in the distribution of national incomes per se (Quah, 1993; Hart, 1995) and that empirical analysis is better served by models characterized by rich and poor clubs (Quah, 1997).

  • 2

    Bianchi (1997) and Paap and van Dijk (1998) follow a statistics literature which argues that multimodality is more easily studied in the context of mixtures of unimodal distributions (there are relatively few parametrically specified multimodal distributions) and proposes tests for spotting multiple modes or dips (Cox, 1966; Good and Gaskins, 1980; Silverman, 1981; Hartigan and Hartigan, 1985).

  • 3

    See Atkinson (1970, 1987), Kolm (1976) and Foster and Shorrocks (1988) for the underlying theory and Anderson (1996), Davidson and Duclos (2000) and Barrett and Donald (1999) for the statistical implementation.

  • 4

    Indeed a companion study currently in progress for a similar collection of countries over a similar time period using per capita GDP data from the Penn World Tables yields the opposite inferences with respect to polarization to those reported here. Whether this is largely a result of the poorer nations being the debtor nations in the sample and thus having their consumption rather than their productive capacities constrained accordingly, or whether it is a function of purchasing power parity as opposed to standard exchange rate-based comparisons, is under ongoing investigation.

  • 5

    Generally E(u(x)) is thought of as the welfare function, but if u(x) = −P(x), where P(x) is a poverty index based upon incomes, the same dominance criteria can be used to evaluate poverty states measured by poverty indices in a given class (Atkinson, 1987). In terms of social welfare, first order dominance corresponds to an ordering of social preferences based upon monotonic utilitarian social welfare functions, second order to a social preference for mean-preserving progressive transfers and third order to a social preference for mean-preserving progressive transfers at lower income levels. In the context of poverty indices, different levels of dominance ensure, for any poverty line, the same direction of change for all indices in the class defined by the order of dominance. Hence, with u(x) = −P(x), first order dominance implies coherence between all poverty measures for which u is continuous and non-decreasing in X (e.g. poverty counts), second order between all those for which u is also weakly concave in X (e.g. the average deviation of the poor from the poverty line) and third order between all those for which u is strictly concave in X (e.g. the average squared deviation of the poor from the poverty line). Atkinson (1987) presents visual representations of measures from these various classes.

  • 6

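    This is easily demonstrated for distributions confined to the positive orthant: first order dominance of g by f, f(x) ≥₁ g(x) (i.e. F(x) ≤ G(x) for all x, with strict inequality for some x, where F and G are the cumulative distribution functions corresponding to f and g), is sufficient for E(x|f) > E(x|g) since, integrating by parts,

    $$E(x\mid f) - E(x\mid g) = \int_0^\infty x\,[f(x) - g(x)]\,dx = \Big[x\,\big(F(x) - G(x)\big)\Big]_0^\infty + \int_0^\infty [G(x) - F(x)]\,dx = \int_0^\infty [G(x) - F(x)]\,dx > 0,$$

    where the boundary term vanishes for distributions with finite means.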
  • 7

    This type of dominance is used in the finance literature and relates to the analysis of risk-loving behaviour (see Levy and Weiner, 1998).

  • 8

    A cautionary note is in order. These tests are designed to detect the ‘hollowing out’ of the centre of the distribution (Beach et al., 1998) identified with within-distribution polarization. Unfortunately it can be shown that when the distribution at hand is a mixture of two closely located sub-distributions and polarization takes the form of limited reductions of sub-population variances (increased concentration around the respective poles), polarization will manifest itself in the observed mixture as an increase in central mass. Fortunately this phenomenon appears to occur only when the sub-population distributions are located fairly close together, long before bimodality occurs in the mixture (which, for example, in a 50/50 mixture of normals with a common variance occurs when the means are more than two standard deviations apart). Thus we can be sure that, if the mixture is bimodal, sub-population polarization will manifest itself in a loss of mass at the centre of the distribution.

  • 9

    Over the four sample years, the United States, Kuwait, Switzerland and Luxembourg were the richest and Malawi, Bangladesh, Malawi and Zambia the poorest nations respectively.

  • 10

    An Epanechnikov kernel (Silverman, 1986) was employed in each case.

  • 11

    The Esteban and Ray (1994) polarization index is defined for a discretely distributed random variable y which takes on any one of n values y_i with probabilities π_i, i = 1, …, n. For constants K and α, it is of the form:

    $$P(\pi, y) = K \sum_{i=1}^{n} \sum_{j=1}^{n} \pi_i^{1+\alpha}\, \pi_j\, \lvert y_i - y_j \rvert$$

    where K is a multiplicative constant which does not affect the ordering and α is a parameter reflecting the polarization sensitivity of the index, with 0 < α ≤ 1.6: the larger its value, the further the index departs from an inequality measure. Here π_i corresponds to the ith country's population share in the overall sample. Note that this index is designed to pick up clustering around many modes, whereas the much simpler range-based indices are focused upon bimodal structures and are thus potentially more powerful in the present context.

  • 12

    I am grateful to an anonymous referee for suggesting this interpretation.
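Footnote 5 links the orders of dominance to classes of welfare and poverty measures. The following is a minimal sketch, assuming incomes are held in plain Python lists and using the standard dominance curve D^s(z) = E[(z − X)^(s−1) 1(X ≤ z)] / (s − 1)! (the form used by Davidson and Duclos, 2000, cited in footnote 3): for s = 1 it is the headcount ratio at poverty line z, for s = 2 the mean shortfall below z, and for s = 3 half the mean squared shortfall. The names `dominance_curve` and `dominates` are hypothetical. The optional weights (e.g. population sizes) mirror the re-weighting discussed in the conclusion. The sketch compares point estimates on a grid of poverty lines; it is not the statistical test employed in the paper.

```python
import math
from typing import Optional, Sequence


def dominance_curve(x: Sequence[float], z: float, s: int,
                    weights: Optional[Sequence[float]] = None) -> float:
    """Empirical order-s dominance curve D^s(z) = E[(z - X)^(s-1) 1(X <= z)] / (s-1)!.

    s = 1: share of (weighted) observations at or below z (poverty headcount).
    s = 2: mean shortfall below z; s = 3: half the mean squared shortfall.
    """
    if s < 1:
        raise ValueError("order s must be at least 1")
    if weights is None:
        weights = [1.0] * len(x)
    total = float(sum(weights))
    # For s = 1 the power is 0 and Python evaluates 0.0 ** 0 as 1.0, as required.
    acc = sum(w * (z - xi) ** (s - 1) for xi, w in zip(x, weights) if xi <= z)
    return acc / (total * math.factorial(s - 1))


def dominates(a: Sequence[float], b: Sequence[float], s: int, grid: Sequence[float],
              weights_a: Optional[Sequence[float]] = None,
              weights_b: Optional[Sequence[float]] = None) -> bool:
    """True if sample a dominates sample b at order s on the grid of poverty lines:
    D_a^s(z) <= D_b^s(z) at every grid point, with strict inequality somewhere."""
    da = [dominance_curve(a, z, s, weights_a) for z in grid]
    db = [dominance_curve(b, z, s, weights_b) for z in grid]
    return all(p <= q for p, q in zip(da, db)) and any(p < q for p, q in zip(da, db))


richer = [2.0, 3.0, 4.0]   # hypothetical per capita incomes
poorer = [1.0, 2.0, 3.0]
grid = [0.5 * k for k in range(11)]   # poverty lines 0, 0.5, ..., 5
print(dominates(richer, poorer, 1, grid))  # True
print(dominates(poorer, richer, 1, grid))  # False
```

Because the curves are nested (order-s dominance implies dominance at every higher order), a sample that wins at s = 1 also wins at s = 2 and s = 3 in this example.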
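The integration-by-parts identity behind footnote 6, E(x|f) − E(x|g) = ∫₀^∞ [G(x) − F(x)] dx for non-negative variables with finite means, can be checked numerically. This is a minimal sketch, assuming two samples of non-negative values with empirical CDFs F and G. The integral reproduces the difference in sample means whether or not one distribution dominates the other; first order dominance only guarantees that the integrand, and hence the difference, is non-negative. The helper names `ecdf` and `integrated_cdf_gap` are hypothetical.

```python
from typing import Sequence


def ecdf(sample: Sequence[float], t: float) -> float:
    """Empirical CDF of the sample at t (share of observations <= t)."""
    return sum(1 for v in sample if v <= t) / len(sample)


def integrated_cdf_gap(f_sample: Sequence[float], g_sample: Sequence[float]) -> float:
    """Exact integral over [0, inf) of G(x) - F(x), where F and G are the
    empirical CDFs of two samples of non-negative values."""
    pooled = list(f_sample) + list(g_sample)
    if min(pooled) < 0:
        raise ValueError("samples must be non-negative")
    points = sorted({0.0, *pooled})
    total = 0.0
    for lo, hi in zip(points[:-1], points[1:]):
        # Both empirical CDFs are constant on [lo, hi), equal to their value at lo.
        total += (ecdf(g_sample, lo) - ecdf(f_sample, lo)) * (hi - lo)
    # Beyond the largest observation both CDFs equal 1, so nothing is added there.
    return total


f = [0.5, 4.0]   # hypothetical incomes under f
g = [1.0, 1.5]   # hypothetical incomes under g (neither dominates the other)
print(integrated_cdf_gap(f, g))            # 1.0
print(sum(f) / len(f) - sum(g) / len(g))   # 1.0
```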
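Footnote 8 turns on when a 50/50 mixture of two normals becomes bimodal. For a common standard deviation σ, such a mixture is bimodal exactly when the means are more than 2σ apart. The sketch below checks this numerically by counting strict local maxima of the mixture density on a fine grid; it is a grid-based illustration rather than an analytic test, and the function names are hypothetical.

```python
import math


def mixture_density(x: float, mu1: float, mu2: float, sigma: float) -> float:
    """Density at x of a 50/50 mixture of N(mu1, sigma^2) and N(mu2, sigma^2)."""
    c = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return 0.5 * c * (math.exp(-0.5 * ((x - mu1) / sigma) ** 2)
                      + math.exp(-0.5 * ((x - mu2) / sigma) ** 2))


def count_modes(mu1: float, mu2: float, sigma: float, n: int = 20001) -> int:
    """Count strict local maxima of the mixture density on an evenly spaced grid
    running from 5 sigma below the lower mean to 5 sigma above the upper mean."""
    lo = min(mu1, mu2) - 5.0 * sigma
    hi = max(mu1, mu2) + 5.0 * sigma
    xs = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    d = [mixture_density(x, mu1, mu2, sigma) for x in xs]
    return sum(1 for i in range(1, n - 1) if d[i] > d[i - 1] and d[i] > d[i + 1])


# Separations just below and just above 2 sigma (sigma = 1)
print(count_modes(0.0, 1.9, 1.0))  # 1
print(count_modes(0.0, 2.1, 1.0))  # 2
```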
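Footnote 10 records that an Epanechnikov kernel was used for the density estimates. The following is a minimal sketch of a kernel density estimator with that kernel, K(u) = 0.75(1 − u²) for |u| ≤ 1 and zero otherwise. It assumes a user-supplied bandwidth (the bandwidth used in the paper is not stated in this section) and optional observation weights such as population sizes. The function names and the example data are hypothetical.

```python
from typing import List, Optional, Sequence


def epanechnikov(u: float) -> float:
    """Epanechnikov kernel: 0.75 * (1 - u^2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0


def kde(grid: Sequence[float], data: Sequence[float], bandwidth: float,
        weights: Optional[Sequence[float]] = None) -> List[float]:
    """Kernel density estimate at each grid point; observation weights
    (e.g. population sizes) are normalised to sum to one."""
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    if weights is None:
        weights = [1.0] * len(data)
    total = float(sum(weights))
    return [sum(w * epanechnikov((x - xi) / bandwidth) for xi, w in zip(data, weights))
            / (total * bandwidth) for x in grid]


incomes = [1.0, 2.0, 5.0]          # hypothetical per capita incomes
populations = [100.0, 10.0, 1.0]   # hypothetical population sizes
grid = [0.01 * k for k in range(-200, 801)]
unweighted = kde(grid, incomes, 1.0)
weighted = kde(grid, incomes, 1.0, populations)
print(round(sum(unweighted) * 0.01, 3))  # 1.0 (Riemann sum of the density)
```

With the population weights, almost all of the estimated mass sits around the first (most populous) observation, which illustrates how re-weighting can reshape the estimated distribution.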
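The Esteban and Ray (1994) index of footnote 11 can be computed directly from its definition, P = K Σ_i Σ_j π_i^(1+α) π_j |y_i − y_j|. This is a minimal sketch, assuming the π_i are shares that sum to one and enforcing the range 0 < α ≤ 1.6 stated in the footnote. The function name `esteban_ray` is hypothetical.

```python
from typing import Sequence


def esteban_ray(y: Sequence[float], pi: Sequence[float], alpha: float,
                K: float = 1.0) -> float:
    """Esteban-Ray polarization index K * sum_i sum_j pi_i^(1+alpha) * pi_j * |y_i - y_j|."""
    if not 0.0 < alpha <= 1.6:
        raise ValueError("alpha must satisfy 0 < alpha <= 1.6")
    if len(y) != len(pi):
        raise ValueError("y and pi must have the same length")
    return K * sum(p_i ** (1.0 + alpha) * p_j * abs(y_i - y_j)
                   for y_i, p_i in zip(y, pi)
                   for y_j, p_j in zip(y, pi))


# Two equal-sized groups one unit apart versus the same groups two units apart
print(esteban_ray([0.0, 1.0], [0.5, 0.5], alpha=1.0))  # 0.25
print(esteban_ray([0.0, 2.0], [0.5, 0.5], alpha=1.0))  # 0.5
```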