Inequality — Measurement, trends, impacts, and policies

This paper introduces a concept of inequality comparisons with ordinal bivariate categorical data. In our model, one population is more unequal than another when they have common arithmetic median outcomes and the first can be obtained from the second by correlation-increasing switches and/or median-preserving spreads. For the canonical 2x2 case (with two binary indicators), we derive a simple operational procedure for checking ordinal inequality relations in practice. As an illustration, we apply the model to childhood deprivation in Mozambique.


Introduction
It is widely agreed in the literature that a multi-dimensional view of individual well-being along the lines suggested by Sen (1985, 1993) is needed when poverty, social welfare, and inequality comparisons are made. Alkire (2002), Alkire and Foster (2011), and Alkire and Santos (2011) pursue this point and helped form the Multidimensional Poverty Index (MPI), which the United Nations Development Programme (UNDP) introduced in its 2010 "Human Development Report". These are welcome innovations in a challenging area of research and policy application.[1]

A persistent methodological challenge in the analysis of (multidimensional) inequality is that outcomes are often ordinal in nature; i.e. the outcomes are (partially) ranked in terms of better or worse, but there is no natural measure for the distances between them. In recent years considerable progress has been made in the development of methods based on stochastic dominance theory for comparisons of multidimensional inequality that are robust over broad classes of "utility" indices and aggregation rules across dimensions of well-being.[2] Gravel and Moyes (2006, 2011) characterize the elementary redistributive operations that reduce inequality in a two-dimensional model where one of the attributes is cardinally measurable. Their framework provides a method for making inequality comparisons between distributions with common mean for the cardinal attribute and identical marginal distributions for the ordinal attribute. Decancq (2012) and Meyer and Strulovici (2012) [...] others) where the latter two papers apply an ordinal multidimensional first order dominance approach. However, ordinal inequality concepts for multidimensional distributions (not necessarily having the same marginal means or distributions) are yet to be developed.

When data are ordinal in nature, use of a conventional income inequality measure, such as the Gini index, is not meaningful since it requires that outcomes are measured on a cardinal scale that reflects the relative desirability of outcomes. Measures of dispersion for one-dimensional ordinal categorical data have been developed since at least the early 1990s; see, e.g., Lacy (1996, 2000). Allison and Foster (2004) put forward a simple but illuminating and intuitive model for robust comparisons of inequalities when outcomes are categorical and ordinally ranked. The Allison-Foster framework is a median-based dominance approach in which distribution g is more unequal than distribution f whenever the two distributions have a common median and g is more spread out relative to the median than f. As discussed in Allison and Foster (2004), the median, rather than for instance the mean, is chosen as the reference point since the median is the natural ordinally invariant center of a distribution. For empirical illustration, they provided both first order dominance comparisons and ordinal inequality comparisons of distributions of self-assessed health across states and regions of the United States, and showed that the inequality comparison concept was both meaningful and operational. A number of recent contributions have developed related inequality measures based on dispersions from the median and provided further applications of these methods, e.g. Apouey (2007), Abul Naga and Yalcin (2008), Madden (2010), Kobus and Miłoś (2012), and Dutta and Foster (2013).

[1] For general discussion of the case for multidimensional understanding of inequality and poverty, see Grusky and Kanbur (2006) and Sen (2006).
[2] For surveys of traditional (cardinal) multidimensional inequality measures, such as the various multidimensional generalizations of the Gini index and the Atkinson-Kolm-Sen approach, we refer to Maasoumi (1999) and Weymark (2006). See also Savaglio (2006) and Trannoy (2006) for broader discussions.
The aim of this paper is to introduce a median-based notion of inequality for ordinal two-dimensional categorical data, with emphasis on the case of binary indicators. Our concept is relevant for situations where well-being is measured along two dimensions (attributes), and where only ordinal information about the desirability of outcomes is available. This means that along each dimension outcomes can be ranked according to their desirability, but nothing is assumed about the relative importance of attributes, complementarity/substitutability relationships, and the relative importance of levels within each attribute. Our concept extends the Allison-Foster framework for assessing inequality of one-dimensional categorical distributions to a two-dimensional one. Roughly speaking, in our model, distribution g is more unequal than distribution f if the two distributions have a common arithmetic median (i.e. they have a common ordinally invariant reference point) and g can be obtained from f by certain "inequality-increasing elementary transformations" in population mass relative to the reference point. Note that the arithmetic median is the vector of marginal medians (e.g. Hayford 1902; Haldane 1948; Barnett 1976). It has been described as the only reasonable multi-attribute generalization of the median concept when the attributes are different in kind, e.g. Haldane (1948) and Barnett (1976).
As in the Allison-Foster model, when the (arithmetic) medians differ for two distributions they are incomparable inequality-wise. This tends to limit applicability in cases with many dimensions and levels, where it happens less frequently that two given distributions have a shared median. Another obstacle for empirical implementation is that it is in general difficult to check if a given distribution is more unequal than another. Therefore, we focus in this paper on the 2x2 case, where common arithmetic median outcomes tend to be the rule rather than the exception, and where an easily implementable procedure for detecting inequality relations between empirical subpopulation distributions can be derived.
The rest of the paper is organized as follows. In Section 2 we motivate, illustrate and provide intuition. Section 3 contains general definitions and a comparison of our approach to that of Allison and Foster (2004). Section 4 addresses the 2x2 case (i.e., the case of two binary outcome variables), and we develop a procedure for detecting inequality relations in practice. Briefly, testing that one distribution is more unequal than another consists of a comparison of medians, and, if a common median exists, verification of a system of inequalities which depends on the location of the median. The test requires straightforward calculations and can be carried out in a spreadsheet. In Section 5 we apply our model to two-dimensional indicators of childhood deprivation in Mozambique. Section 6 concludes.

An ordinal approach to bivariate inequality: illustration and intuition
Suppose a person's well-being can be measured using two 0-1 binary variables, so there are four possible outcomes. Let (0, 0) denote the outcome where both variables take the value 0, (1, 0) the outcome where the first variable takes the value 1 and the second the value 0, and so on. In the figure below arrows point to better outcomes.
[Figure: the four outcomes, with arrows pointing from (0, 0) to (0, 1) and (1, 0), and from these to (1, 1).]
Outcome (0 0) is the worst and (1 1) is the best outcome. We assume it is unknown which of the two intermediate outcomes (0 1) and (1 0) is better. A population is characterized by how people are distributed among the four outcomes. This can be illustrated as follows: where 2 16 of the population has (0 0), 4 16 has (0 1) and (1 0) respectively, and 6 16 has (1 1). Call this distribution  , and compare with distribution : Here  can be obtained from  by moving mass amounting to 1 8 from outcome (0 1) to outcome (0 0) and by moving a similar amount from (1 0) to (1 1) In other words,  can be obtained from  by a correlation-increasing switch (Hamada 1974;Epstein and Tanny 1980;Tchen 1980;Boland and Proschan 1988). As argued by Atkinson and Bourguignon (1982), Tsui (1999), Atkinson (2003), Bourguignon and Chakravarty (2003), Decancq (2011) and others, such a correlation-increasing switch intuitively increases inequality. It provides a balanced movement of mass from two intermediate outcomes to the two extremes that does not change the marginal distributions but increases interdependence. If a person experiences a bad outcome in one of the dimensions of , the conditional probability that the other outcome is also bad is higher for  than for  , so indeed it seems reasonable to say that  is more unequal than  . 7 For one population distribution to be obtained from another by a correlationincreasing switch, it is required that the difference in mass between the two distributions for the outcome (0 0) is exactly equal to the corresponding difference for the outcome (1 1) Unless the populations (or number of observations) underlying the two distributions are very small this is only going to happen in exceptional cases. Obviously,  cannot be obtained from  or  by a correlation-increasing switch. But  ,  and  all have the same arithmetic median in (1 1), i.e. a median value of 1 in each of the two dimensions. 
If we regard the arithmetic median as the natural center of the distributions then intuitively h is more unequal than g. Indeed, distribution h can be obtained from g by moving population mass amounting to 1/16 from the median outcome (1, 1) to (1, 0): that is, h can be obtained from g by a median-preserving spread (Allison and Foster, 2004).
Accordingly, we will say that a distribution is ordinally more unequal than another if it is possible to obtain the distribution from the other through a sequence of correlation-increasing switches and/or median-preserving spreads. In our example, h is ordinally more unequal than f since there exists a distribution g such that g can be obtained from f through a correlation-increasing switch and h can be obtained from g through a median-preserving spread.
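The example of this section can be retraced numerically. The following is a minimal sketch, not the authors' implementation: distributions are represented as Python dictionaries over the four outcomes, exact fractions avoid rounding, and the helper names `transfer` and `correlation_increasing_switch` are hypothetical names introduced here for illustration.

```python
from fractions import Fraction as F

# Outcomes are pairs (x1, x2) with 0 = worse and 1 = better in each dimension.
# Starting distribution of the example: 2/16, 4/16, 4/16 and 6/16.
f = {(0, 0): F(2, 16), (0, 1): F(4, 16), (1, 0): F(4, 16), (1, 1): F(6, 16)}

def transfer(dist, source, target, amount):
    """Move `amount` of population mass from `source` to `target`."""
    new = dict(dist)
    new[source] -= amount
    new[target] += amount
    if new[source] < 0:
        raise ValueError("not enough mass at the source outcome")
    return new

def correlation_increasing_switch(dist, amount):
    """Move `amount` from (0, 1) to (0, 0) and the same amount from (1, 0) to (1, 1)."""
    step = transfer(dist, (0, 1), (0, 0), amount)
    return transfer(step, (1, 0), (1, 1), amount)

# A switch of 1/8, followed by a spread of 1/16 away from the median outcome (1, 1).
g = correlation_increasing_switch(f, F(1, 8))
h = transfer(g, (1, 1), (1, 0), F(1, 16))

for name, dist in (("f", f), ("g", g), ("h", h)):
    print(name, {outcome: str(mass) for outcome, mass in dist.items()})
```

The switch leaves both marginal distributions unchanged, whereas the spread changes the marginal of the second attribute while leaving the median at (1, 1).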

General formulation
Suppose that there are 2 attributes (dimensions). An outcome is a two-dimensional vector [...] The statement x ≤ y will mean that x_i ≤ y_i for all i, and x < y will [...] The (arithmetic) median of f is the vector m(f) = (m_1(f_1), m_2(f_2)) of coordinate-wise medians.[9]
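As an illustration of the coordinate-wise construction, the arithmetic median can be computed as follows. This is a sketch under an explicit assumption: each marginal median is taken to be the smallest level whose cumulative mass reaches one half, which is one common way of making the median unique. The function names are hypothetical.

```python
from fractions import Fraction as F

def marginal(dist, i):
    """Marginal distribution of attribute i (0 or 1) of a bivariate distribution."""
    out = {}
    for outcome, mass in dist.items():
        out[outcome[i]] = out.get(outcome[i], 0) + mass
    return out

def lower_median(marg):
    """Smallest level whose cumulative mass reaches one half (assumed convention)."""
    cumulative = 0
    for level in sorted(marg):
        cumulative += marg[level]
        if cumulative >= F(1, 2):
            return level
    raise ValueError("masses do not sum to at least one half")

def arithmetic_median(dist):
    """Vector of coordinate-wise (marginal) medians."""
    return tuple(lower_median(marginal(dist, i)) for i in (0, 1))

# Starting distribution from the example in Section 2.
f = {(0, 0): F(2, 16), (0, 1): F(4, 16), (1, 0): F(4, 16), (1, 1): F(6, 16)}
print(arithmetic_median(f))  # (1, 1)
```

In each dimension only 6/16 of the mass sits at level 0, so both marginal medians equal 1.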
We say that distribution g can be derived from distribution f by a bilateral transfer (of mass between two outcomes) if there are outcomes x, y and a non-negative scalar ε such that g(x) = f(x) − ε, g(y) = f(y) + ε and g(z) = f(z) otherwise. If y < x the bilateral transfer is diminishing (i.e. moves mass from a better to a worse outcome); if for some outcome z we have y < x ≤ z or z ≤ x < y it is directed away from z; and if m(f) = m(g) it is median-preserving. A median-preserving bilateral transfer directed away from the median is a median-preserving spread (see Section 2 for an example).
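The two order conditions on a bilateral transfer can be checked mechanically. The sketch below assumes that the strict order means "coordinate-wise smaller or equal, and not identical", and that a transfer from `source` to `target` is directed away from z when the source lies between z and the target. Both readings, and all function names, are assumptions made for illustration.

```python
def leq(x, y):
    """Coordinate-wise order: every coordinate of x is <= the matching one of y."""
    return all(a <= b for a, b in zip(x, y))

def less(x, y):
    """Strict order, assumed to mean x <= y and x != y."""
    return leq(x, y) and x != y

def is_diminishing(source, target):
    """Mass moves from a better outcome to a worse one."""
    return less(target, source)

def is_directed_away_from(source, target, z):
    """target < source <= z, or z <= source < target."""
    below = less(target, source) and leq(source, z)
    above = leq(z, source) and less(source, target)
    return below or above

# The spread in Section 2 moves mass from the median (1, 1) to (1, 0).
print(is_diminishing((1, 1), (1, 0)))                 # True
print(is_directed_away_from((1, 1), (1, 0), (1, 1)))  # True
# A transfer from (0, 1) to (1, 1) moves mass towards (1, 1), not away from it.
print(is_directed_away_from((0, 1), (1, 1), (1, 1)))  # False
```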
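We say that g is derived from f by a correlation-increasing switch if we can choose outcomes x, y, v, w and a non-negative scalar ε such that v = (min{x_1, y_1}, min{x_2, y_2}), w = (max{x_1, y_1}, max{x_2, y_2}), g(x) = f(x) − ε, g(y) = f(y) − ε, g(v) = f(v) + ε, g(w) = f(w) + ε, and g(z) = f(z) otherwise (again, see Section 2 for an example).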
In the following, we define an inequality-increasing elementary transformation to be a correlation-increasing switch or a median-preserving spread.
If g can be derived from f by a finite sequence of inequality-increasing elementary transformations, we say that g is ordinally more unequal than f, or, as an equivalent statement, f is ordinally more equal than g. Formally, g is ordinally more unequal than f [...] Note that the relation 'ordinally more unequal' is a partial order (i.e. reflexive, antisymmetric, and transitive).[10]

As illustrated by Allison and Foster (2004), it is often of interest to complement (ordinal) comparisons of inequality with (ordinal) comparisons of social welfare; see also Zheng (2008). In our ordinal framework, the natural criterion for comparison of social welfare is first order dominance.[11] A population distribution f first order dominates population distribution g whenever g can be obtained from f by iteratively moving population mass from better to worse outcomes, i.e., if there are distributions f = [...] Equivalently, any additive non-decreasing social welfare function would give at least as much social welfare to f as to g.[12]

Before proceeding, we compare these definitions and concepts with the one-dimensional case put forward by Allison and Foster (2004) [...] in a similar way. Allison and Foster (2004) say that g has a greater spread than f whenever [...] For the one-dimensional case, g has greater spread than f precisely if g is ordinally more unequal than f (as defined here). Also, the general definition of first order dominance given here is equivalent to the standard definition in the one-dimensional case.
Thus, the definitions presented here generalize those of Allison and Foster's one-dimensional case.

[9] To ensure a unique median value, we will define m_i(f_i) as the smallest element [...]
[10] A distribution with 50% mass at one extreme outcome and 50% mass at the other extreme outcome is the unique maximal element with respect to this relation (i.e. no other distribution is more unequal). A distribution having all mass concentrated on one outcome is clearly a minimal element (i.e. no other distribution is more equal). However, for [...] ≥ 2 it is possible to find minimal elements with mass at more than one outcome.
[11] First order dominance is also known as the usual (stochastic) order. For general references on stochastic ordering theory, see Müller and Stoyan (2002) or Shaked and Shanthikumar (2007).
[12] It is worth mentioning that in the multidimensional context the term "first order dominance" has been used with other meanings in the economics literature. In particular, Atkinson and Bourguignon (1982) and subsequent literature have used this term for a less restrictive stochastic dominance concept which corresponds to additional restrictions on the social welfare function (also known as an orthant stochastic order in the stochastic orderings literature).

Implementation of the 2x2 case
A central question is how to test if one distribution is ordinally more unequal than another (i.e. has greater spread in an ordinally meaningful sense). For two one-dimensional distributions f and g such testing is a straightforward matter of checking whether a finite set of inequalities holds. For the multidimensional case (even the two-dimensional), checking if one distribution is more unequal than another is more complicated. We focus in our empirical implementation on the 2x2 case, which can be dealt with in a tractable manner.
In this section, we assume that an outcome is a vector x = (x_1, x_2) in X = {0, 1} × {0, 1}.

Checking first order dominance relations
Let f and g denote distributions on X. By application of Strassen's Theorem (Strassen 1965), it follows that f first order dominates g if and only if the cumulative probability mass under f is smaller than or equal to that under g for every lower comprehensive subset of outcomes. A lower comprehensive subset A ⊆ X has the property that if an outcome is in the subset, then all smaller outcomes are also included in that subset. That is, if x ∈ A, y ∈ X and y ≤ x then y ∈ A. Thus, in the 2x2 case, f first order dominates g if and only if the following four inequalities are satisfied:
f(0, 0) ≤ g(0, 0)
f(0, 0) + f(0, 1) ≤ g(0, 0) + g(0, 1)
f(0, 0) + f(1, 0) ≤ g(0, 0) + g(1, 0)
f(0, 0) + f(0, 1) + f(1, 0) ≤ g(0, 0) + g(0, 1) + g(1, 0)
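The check over lower comprehensive subsets is easy to automate. The following is a minimal sketch assuming that distributions are dictionaries over the four outcomes, and that the relevant subsets are the four non-trivial lower comprehensive subsets of {0, 1} × {0, 1}; the empty set and the full set give no restriction. The function name is hypothetical.

```python
from fractions import Fraction as F

# The four non-trivial lower comprehensive subsets of {0, 1} x {0, 1}.
LOWER_SETS = [
    [(0, 0)],
    [(0, 0), (0, 1)],
    [(0, 0), (1, 0)],
    [(0, 0), (0, 1), (1, 0)],
]

def first_order_dominates(f, g):
    """True if f places no more mass than g on every lower comprehensive subset."""
    return all(
        sum(f[x] for x in subset) <= sum(g[x] for x in subset)
        for subset in LOWER_SETS
    )

# Distributions from the example in Section 2.
f = {(0, 0): F(2, 16), (0, 1): F(4, 16), (1, 0): F(4, 16), (1, 1): F(6, 16)}
g = {(0, 0): F(4, 16), (0, 1): F(2, 16), (1, 0): F(2, 16), (1, 1): F(8, 16)}
h = {(0, 0): F(4, 16), (0, 1): F(2, 16), (1, 0): F(3, 16), (1, 1): F(7, 16)}

print(first_order_dominates(f, g), first_order_dominates(g, f))  # False False
print(first_order_dominates(g, h))                               # True
```

Neither f nor g dominates the other, as expected after a correlation-increasing switch, whereas g dominates h because h is obtained from g by a diminishing transfer. With survey shares stored as floating-point numbers, a small tolerance in the comparison may be advisable.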

Checking ordinal inequality relations
We proceed next to present necessary and sufficient conditions for f being ordinally more equal than g as defined in Section 2.
Correlation-increasing switches are median-preserving, so a necessary condition for g to be ordinally more unequal than f is that the two distributions have a common median. We can therefore rely on considering in turn each of four possible cases of common median, and proceed as described below.
Proposition 1 (Ordinal inequality check for the 2x2 case) Let X = {0, 1} × {0, 1} and let f and g be two distributions on X. Then g is ordinally more unequal than f if and only if one of the following six cases holds:
A1. f and g have common median (1, 1) and f first order dominates g.
A2. f and g have common median (0, 0) and g first order dominates f.
[...]
The proof of Proposition 1 is given in Appendix A. The intuition behind the conditions is discussed below.
The cases A1 and A2 are symmetric so we will only discuss A1. As mentioned in Section 4.1, f first order dominates g if and only if it is possible to go from f to g by a finite sequence of diminishing bilateral transfers. Each such bilateral transfer is a median-preserving spread (as shown formally in the Appendix) and thereby an inequality-increasing elementary transformation.
The cases B1 and B2 are symmetric so we will only discuss B1. To provide some intuition for the inequalities in B1, suppose that f does not first order dominate g and g does not first order dominate f. Then, if g is ordinally more unequal than f it is impossible to go from f to g via a finite sequence of inequality-increasing elementary transformations without making use of at least one correlation-increasing switch (because we would then have first order dominance since the median is an extreme outcome (1, 1)). Thus, if g is ordinally more unequal than f then f(1, 0) > g(1, 0) and f(0, 1) > g(0, 1), since otherwise it would be possible to go from f to g without any correlation-increasing switches (because if a correlation-increasing switch is involved, one of the intermediate outcomes would receive at least as much probability mass from (1, 1) as is moved to (1, 1) and hence only diminishing bilateral transfers are needed, a contradiction). However, these two conditions are not sufficient for g being ordinally more unequal than f. Roughly speaking, we need a condition ensuring that all mass transferred to (1, 1) in the process of moving from f to g can be transferred from the intermediate outcomes (0, 1) or (1, 0) in connection with a correlation-increasing switch. This is precisely what the remaining condition in B1 ensures.

The cases C1 and C2 are symmetric so we will only discuss C1. The first inequalities ensure that f has at least as much mass at the intermediate outcomes, and no more mass at the extreme outcomes, than g. The last condition ensures that the difference in mass at the median outcome (1, 0) is no less than the difference in mass at the other intermediate outcome (0, 1). As verified formally in the Appendix, the conditions imply (and are implied by) that g can be obtained from f by a correlation-increasing switch and bilateral transfers of mass from (1, 0) to (0, 0) and (1, 1) respectively.
The following illustrates how a concrete data set can be analyzed in the present framework. For illustrative purposes, we highlight examples of all the basic types of ordinal inequality relations that can occur in the 2x2 case (see Section 5.3).

Empirical illustration
In Mozambique, investment in schooling, health, and sanitation has increased the level of human capital and indices of human development. While this development has influenced living standards of both adults and children, its impact on children is of particular interest. The acquisition of human capital in early childhood is imperative for future learning, earnings and health status (UNICEF 2006). Large gaps in basic welfare goods during childhood tend to perpetuate, if not widen, the variation in human capital, productivity and living standards throughout adulthood; see Strauss and Thomas (1995), and Orazem and King (2007).
To address the above challenges, voucher or cash transfer programmes targeted at disadvantaged children have in recent years become more common.[17] A general problem with such government transfer programmes is to make sure that transfers are directed at the most disadvantaged children. Efficient targeting of government resources requires that administrators can detect the most vulnerable groups. We illustrate how our model can be used for examining inequalities within and between groups of Mozambican children, concentrating on three key characteristics: rural-urban area of residence, gender of head of household, and gender of the child.[18] This results in a total of eight categories of children that we compare with each other.

[17] The most famous of these initiatives is probably Mexico's PROGRESA/Oportunidades programme, which aims at increasing children's school attendance among poor families by awarding grants to mothers conditional on school enrolment. See Parker, Rubalcava and Teruel (2007) for further discussion and examples.
[18] Urban-rural area of residence is likely to have a significant impact on living standards, mainly due to the low population density of rural areas, which makes supply of high quality public services more costly. Children living in female headed households are more likely than other children to fall below the poverty line, primarily because women's wages and education tend to be lower than men's. Buvinić and Gupta (1997) review literature relating female headship and poverty. However, as Handa (1996) observes, female headed households are also likely to spend a larger share of their income on improving children's human capital. Finally, households may discriminate based on gender of the child. For example, in Mozambique it is not uncommon, especially for rural families, to invest more in the education of boys than of girls (UNICEF, 2006).

Data and summary statistics
We apply the model to the Mozambican Demographic and Health Survey from 2003 (DHS 2003).[19] This is a nationally representative data set that includes detailed information on childhood poverty. We focus on three indicators for severe deprivation in sanitation, health, and education respectively (cf. Gordon et al., 2003). Sanitation deprivation indicates lack of access to a toilet of any kind, including communal toilets or latrines. Health deprivation is an indicator for pre-school-aged children (under five years) who have never been immunized or who have recently been ill with diarrhea but did not receive medical attention. Education deprivation is an indicator for school-aged children (between seven and eighteen years) who have never been to school. We combine these into two 2x2 indicators of childhood poverty for school-aged and pre-school-aged children respectively. A detailed description of the survey is given in UNICEF (2006).[20]

Table 1 summarizes how indicators of childhood poverty are distributed among the four possible outcomes. The top panel lists the distribution of sanitation and education (for school-aged children), and the lower panel lists the distribution of sanitation and health (for pre-school-aged children), each by area of residence, gender of head of household, and gender of the child. For example, the first row of the lower panel shows that among pre-school-aged girls in rural, male-headed households 18.8% live with poor sanitation and under poor health conditions, 44.4% have poor sanitation but adequate health, 4.8% have good sanitation but poor health, and the rest, 32%, have both good sanitation and good health conditions.[21]

[19] Recently, Arndt et al. (2012) provide an alternative implementation of the multidimensional first order dominance approach with an application to comparisons of child poverty in Vietnam and Mozambique between groups and over time.
[20] Lindelow (2006) studies socioeconomic health inequalities in Mozambique using the concentration index. His study is based on income and health data from the 1996-1997 household survey.
[21] We have weighted these shares by survey sample weights.

Type B inequalities are those with extreme common medians, in (0, 0) or (1, 1), but where neither of the distributions first order dominates the other. For illustration, compare the distribution for urban boys in female-headed households (last row in upper panel of Table 1) with the distribution for urban boys in male-headed households (third to last row in upper panel of Table 1). Neither first order dominates the other, but the latter is more equal. To see this, starting with the distribution for urban boys in male-headed households, use a correlation-increasing switch of 1.1 and then a median-preserving spread of 0.3 and 0.8 from (1, 1) and (0, 1) to (0, 0). This results in the distribution for urban boys in female-headed households.

Results from pairwise comparisons
Type C inequalities are those where the median is non-extreme and where there is no first order dominance. An illustration of type C ordinal inequality can be seen from comparing the distribution for girls in rural female-headed households (third row in lower panel of Table 1) to the distribution of boys in similar households (fourth row in lower panel of Table 1). Here, the girls are more equally distributed than the boys. To see this, starting with the distribution for the girls, apply first a correlation-increasing switch of 0.9 and then a median-preserving spread of 0.9 and 1.1 from (0, 1) to (0, 0) and (1, 1), which gives the distribution for boys. Note that because of rounding, the numbers do not match exactly.

From Table 2 it emerges that urban groups are better off than rural groups. This is not so surprising. However, it also emerges that there are more first order dominances between rural groups than between urban groups, indicating more between-group inequality in the rural areas than in the urban areas. In particular, school-age boys in rural male headed households are better off than any other rural group. Moreover, there is more within-group inequality for school-age children in urban female headed households than in the corresponding male headed households. These findings deserve attention in policy debates.

Conclusion
In this paper we have developed an ordinal concept of multi-dimensional inequality, building on the Allison and Foster (2004) framework for comparing inequalities with one-dimensional categorical data. To illustrate how our model can be applied in the 2x2 case we compared poverty distributions of pre-school- and school-aged children from the DHS data in Mozambique. Such data are available for a large number of countries across the developing world, meaning that potentially interesting comparisons are possible.
For these indicators, we find that first order dominance occurs relatively frequently while ordinal bivariate inequality relations are less frequent. Whether this is because ordinal inequality relations generally are "rare" empirically or whether it is due to the chosen indicators of child poverty cannot be established with the data in hand. However, the example shows that while instances of ordinal bivariate inequality relations may be relatively uncommon, they do exist empirically. Moreover, our indicators of sanitation, health and education by area of residence, gender of the household head, and gender of the child provide insights into how targeting of, for example, cash transfer programmes presently under consideration by the Mozambican government should be pursued.
In sum, we have shown that it is possible to develop a meaningful and intuitive concept of ordinal bivariate inequality. We have also demonstrated how it can be applied in the 2x2 case. Future research will be required to explore how to deal with variations of the concept and more general cases. In particular, an important generalization would be to provide an ordinal inequality check procedure that applies to general bivariate problems. Providing such a general procedure will however not be straightforward, since the many possibilities of combining correlation-increasing switches and median-preserving spreads in various sequences will be too complex to analyze directly as in the proof of Proposition 1, and thus a deeper understanding of what can be obtained from these inequality-increasing elementary operations is needed. It is also possible to generalize the definitions and concepts to an arbitrary finite number of dimensions, although the correlation-increasing switch concept does not generalize in a straightforward manner to more than two dimensions (Decancq 2012), and checking inequality would be even more challenging. Finally, the restriction that f and g have common medians for ordinal inequality relations to be viable could possibly be relaxed. See Abul Naga and Yalcin (2010) for an exploration along these lines for the one-dimensional case.

A Appendix
A.1 Proof of Proposition 1
We will make use of the following lemma which applies to the general case.
Lemma A Suppose that g is obtained from f by a sequence of bilateral transfers directed away from m(f) and m(f) = m(g). Then each of these bilateral transfers is median-preserving (i.e. is a median-preserving spread).
If bilateral transfer  is within (( )) we have   ( ) since each bilateral transfers in (( )) is diminishing. In particular, for the median e  resulting after the last bilateral transfer within (( )) we have e  ≤   ( ). However, the remaining bilateral transfers cannot move the median back to ( ) since they are all within  (( )), contradicting ( ) = ().
If bilateral transfer  is within  (( )) for the new median  we have ( )  , since each bilateral transfer in  (( )) is the reverse of a diminishing transfer (i.e. moving mass from worse to better outcomes). Hence, for the median e  resulting after the last bilateral transfer within  (( )) we have ( )   ≤ e , contradicting m(f) = m(g). □

We are now ready to prove Proposition 1. As mentioned prior to the statement of Proposition 1, a shared median is a necessary condition for one distribution to be ordinally more unequal than another. We proceed by showing that for each case of common median, the relevant sets of inequalities stated in Proposition 1 are indeed necessary and sufficient for an ordinal inequality relation to hold. We focus on the case m(f) = m(g) = (1, 1) (Case 1) and the case m(f) = m(g) = (1, 0) (Case 2). The case m(f) = m(g) = (0, 0) is symmetric to Case 1 and the case m(f) = m(g) = (0, 1) is symmetric to Case 2.
In the proof of Proposition 1, a pseudo-distribution is defined as a real-valued function p on X with Σ_{x ∈ X} p(x) = 1.

Case 1: m(f) = m(g) = (1, 1)
For the inequalities in case A1 in Proposition 1 recall that f first order dominates g if and only if it is possible to go from f to g by a finite sequence of diminishing bilateral transfers. By Lemma A, each such bilateral transfer is a median-preserving spread and thereby an inequality-increasing elementary transformation. Thus, A1 implies that g is ordinally more unequal than f. Conversely, if g is ordinally more unequal than f and we can go from f to g by a finite sequence of diminishing bilateral transfers using no correlation-increasing switches then A1 is satisfied. Now, we claim that if g is ordinally more unequal than f and if g is not first order dominated by f then it is possible to obtain g from f by a sequence of inequality-increasing elementary transformations that involves only a single correlation-increasing switch and no bilateral transfers from the outcome (1, 1) to other outcomes.
We first verify the last part of the claim, i.e. we show that no bilateral transfers from (1, 1) to other outcomes are required. For this, consider a given sequence of inequality-increasing elementary transformations (leading from f to g) that contains a bilateral transfer of the amount ε from (1, 1) to another outcome x. We assume, without loss of generality, that x = (0, 1). (If x = (0, 0) then we can split the bilateral transfer up into two nested bilateral transfers, one from (1, 1) to (0, 1) and one from (0, 1) to (0, 0); the case x = (1, 0) is symmetric to the one treated here and hence can be omitted.)
As noted earlier, we know that the sequence contains at least one correlation-increasing switch (since otherwise  would first order dominate ). Now, pick an arbitrary correlation-increasing switch from the sequence, and let  denote the amount of mass moved from each of the outcomes (0, 1) and (1, 0) to (0, 0) and (1, 1), respectively. We can then decompose this correlation-increasing switch into two bilateral transfers: a bilateral transfer of the amount  from (0, 1) to (1, 1) and a bilateral transfer of the amount  from (1, 0) to (0, 0). We consider two cases: (a)  ≥ , and (b)   < .
(a) Replace the bilateral transfer from (1, 1) to (0, 1) of the amount  with a bilateral transfer of the amount  from (1, 0) to (0, 0), and reduce the amount of mass transferred between each pair of outcomes from  to  − . Note that the amount of mass eventually allocated to each outcome remains the same.
(b) Replace the correlation-increasing switch (which moves the amount  between each pair of outcomes) with a bilateral transfer of the amount  from (1, 0) to (0, 0), and reduce the size of the bilateral transfer from (1, 1) to (0, 1) to  − . Again, note that the amount of mass eventually allocated to each outcome remains the same.
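The claim that both replacements leave the final allocation unchanged can be checked mechanically. The sketch below is only an illustration: the names `eps` (amount moved by the correlation-increasing switch) and `delta` (amount of the bilateral transfer out of (1, 1)) are hypothetical labels, and the switch is represented as the two moves (0, 1) to (0, 0) and (1, 0) to (1, 1). It compares the net change in mass per outcome before and after the rearrangement.

```python
from fractions import Fraction

OUTCOMES = [(0, 0), (0, 1), (1, 0), (1, 1)]

def switch(amount):
    """Correlation-increasing switch written as two moves (src, dst, amount)."""
    return [((0, 1), (0, 0), amount), ((1, 0), (1, 1), amount)]

def net_change(moves):
    """Net change in mass at each outcome after applying all moves."""
    net = {x: Fraction(0) for x in OUTCOMES}
    for src, dst, amount in moves:
        net[src] -= amount
        net[dst] += amount
    return net

def original(eps, delta):
    """One switch of size eps plus a transfer of size delta from (1,1) to (0,1)."""
    return switch(eps) + [((1, 1), (0, 1), delta)]

def rearranged(eps, delta):
    """Replacement described in cases (a) and (b)."""
    if eps >= delta:
        # Case (a): smaller switch, transfer rerouted from (1,0) to (0,0).
        return switch(eps - delta) + [((1, 0), (0, 0), delta)]
    # Case (b): switch replaced by a transfer, remaining transfer shrunk.
    return [((1, 0), (0, 0), eps), ((1, 1), (0, 1), delta - eps)]

# Illustrative amounts only (exact fractions avoid rounding noise).
for eps, delta in [(Fraction(3, 10), Fraction(1, 10)),   # case (a)
                   (Fraction(1, 10), Fraction(3, 10))]:  # case (b)
    assert net_change(original(eps, delta)) == net_change(rearranged(eps, delta))
```

In case (a) the rearranged sequence contains no move out of (1, 1); in case (b) it contains one switch fewer and a smaller move out of (1, 1), which is what allows the elimination to proceed step by step.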
Proceeding in this way until no bilateral transfers from (1, 1) to other outcomes remain, we can eliminate all bilateral transfers from (1, 1) to other outcomes. Note that we have not shown (and it is not needed for our argument) that after each elimination of some bilateral transfer from (1, 1) to another outcome, the resulting sequence of pseudo-distributions consists entirely of distributions. It is sufficient to observe that when all bilateral transfers from (1, 1) to other outcomes have been eliminated, what remains is a sequence of correlation-increasing switches and/or bilateral transfers from (0, 1) and/or (1, 0) to (0, 0). For this sequence, it is clear that each intermediate pseudo-distribution is a distribution. Moreover, the transformations (i.e. correlation-increasing switches and/or bilateral transfers from (0, 1) to (0, 0) and from (1, 0) to (0, 0)) can be arranged in an arbitrary order, and we can obtain  from  by a single operation of each type. This proves our claim.
From these observations we get the following: Suppose that  does not first order dominate . Then  is ordinally more unequal than  if and only if the following three inequalities are satisfied:

Note that, in conjunction with the assumption that  does not first order dominate , the three inequalities imply that (0, 0)  (0, 0) (i.e. with strict inequality). From this observation it follows that the three inequalities are both necessary and sufficient. The three inequalities are necessary, since if one of them were violated, clearly we could not get  from  by a single correlation-increasing switch and/or bilateral transfers from (0, 1) and/or (1, 0) to (0, 0). To verify that the conditions are sufficient, we give the following constructive argument: Suppose that the conditions are satisfied. Let  = (1, 1) −  (1, 1). Given  , let b  be the distribution obtained from a correlation-increasing switch of the amount  (where  is transferred from (0, 1) to (0, 0) and  is transferred from (1, 0) to (1, 1)). Thus, b  This means that  can be obtained from b  by diminishing bilateral transfers from (0, 1) and/or (1, 0) to (0, 0), and we are done.

Note that if ( ) = () = (1, 0) and if  is ordinally more unequal than , then  can be obtained from  by a finite number of correlation-increasing switches (from (1, 0) and (0, 1) to (1, 1) and (0, 0)) and bilateral transfers from (1, 0) to the extreme outcomes (1, 1) and (0, 0). Regardless of how these correlation-increasing switches and bilateral transfers are ordered, each intermediate pseudo-distribution is a distribution. Thus, a single correlation-increasing switch is enough (since all correlation-increasing switches can be amalgamated into a single correlation-increasing switch and still each intermediate pseudo-distribution is a distribution).
In particular, we can obtain  from  in three steps, ordered as follows: (1) a correlation-increasing switch, (2) a bilateral transfer from (1, 0) to (0, 0), and (3) a bilateral transfer from (1, 0) to (1, 1).
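The three-step construction for the case of common median (1, 0) can be illustrated numerically. The following is a minimal sketch under stated assumptions: `f` is the starting distribution, `g` the target, and `e`, `a`, `b` are hypothetical names for the sizes of steps (1), (2), and (3), solved from the net change in mass at each outcome. The numbers are made up, and the common-median requirement is not checked here.

```python
from fractions import Fraction as F

def three_step_amounts(f, g):
    """Sizes (e, a, b) of the switch and the two transfers, or None if any is negative."""
    e = f[(0, 1)] - g[(0, 1)]        # only the switch changes the mass at (0,1)
    a = g[(0, 0)] - f[(0, 0)] - e    # extra mass at (0,0) beyond the switch
    b = g[(1, 1)] - f[(1, 1)] - e    # extra mass at (1,1) beyond the switch
    return None if min(e, a, b) < 0 else (e, a, b)

def apply_steps(f, e, a, b):
    """Apply the three steps in order and return each intermediate allocation."""
    cur, path = dict(f), []
    # (1) correlation-increasing switch of size e
    cur[(0, 1)] -= e; cur[(0, 0)] += e
    cur[(1, 0)] -= e; cur[(1, 1)] += e
    path.append(dict(cur))
    # (2) bilateral transfer of size a from (1,0) to (0,0)
    cur[(1, 0)] -= a; cur[(0, 0)] += a
    path.append(dict(cur))
    # (3) bilateral transfer of size b from (1,0) to (1,1)
    cur[(1, 0)] -= b; cur[(1, 1)] += b
    path.append(dict(cur))
    return path

# Made-up example distributions on the 2x2 grid.
f = {(0, 0): F(1, 10), (0, 1): F(2, 10), (1, 0): F(6, 10), (1, 1): F(1, 10)}
g = {(0, 0): F(3, 10), (0, 1): F(1, 10), (1, 0): F(3, 10), (1, 1): F(3, 10)}

amounts = three_step_amounts(f, g)
path = apply_steps(f, *amounts)
assert path[-1] == g                                   # the steps reach the target
assert all(v >= 0 for d in path for v in d.values())   # every intermediate is a distribution
```

Because the mass at (1, 0) only decreases along the way and ends at a nonnegative value, no intermediate allocation can have negative mass, which mirrors the remark that each intermediate pseudo-distribution is a distribution.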

A.2 Bootstrapping
The data are a sample from a larger population, so it is of interest to test whether two sample groups are genuinely distinct once sampling uncertainty is taken into account. We employ a bootstrap procedure, which can be interpreted as testing a null-hypothesis of equality of two distributions against an alternative hypothesis of arbitrary distributions.
Define ( ) = min{(0 0) −  (0 0) (0 0) + (0 1) −  (0 0) −  (0 1) (0 0)+(1 0)− (0 0)− (1 0) (0 0)+(1 0)+(0 1)− (0 0)− (1 0)−  (0 1)}. Then, ( ) ≥ 0 if and only if  first order dominates . The function  is useful for testing statistical significance of first order dominance relations. Following common convention, the null-distribution is generated by merging the observations from the two groups. From the null-distribution, two new samples are generated (drawing randomly with replacement) corresponding in size to the original two samples, and the test statistic  is calculated. Repeating this procedure 1000 times, we obtain a distribution over the test statistic consistent with the null-hypothesis, which we then compare with the test statistic of the original sample. 22 Asterisks in Table  2 indicate significance at the five percent level, meaning that the observed value of  is larger than the 95th percentile of its bootstrapped distribution (indicating that the two groups are genuinely distinct). 23 In the case of ordinal inequality (without the presence of first order dominance) as the bootstrapped test statistic we use the minimum function over the appropriate differences induced by the inequalities as specified in type A, B, or C in Proposition 1. An alternative null-hypothesis, discussed by Howes (1993), Kaur et al. (1994) and Dardanoni and Forcina (1999) and more recently by Davidson and Duclos (2006) in the context of one-dimensional dominance of first and higher order, is non-dominance (including exact equality of distributions) against the alternative of strict dominance. This means that first order dominance is rejected unless there is strong evidence in its favor. A similar kind of test could be envisioned for the ordinal inequality relations. 
In order to perform such tests in a multidimensional framework, we would have to determine a "least favorable case"; that is, a null-distribution consistent with the null-hypothesis that makes the observed distributions as plausible as possible. We conjecture that the least favorable case in this situation is, in fact, a case of equal distributions, and that it is valid to interpret our bootstrap procedure along this line of reasoning. A detailed exploration of these subtle econometric issues is beyond the scope of the present paper.

22 We refer to Efron and Tibshirani (1993, ch. 16) for a general discussion of the bootstrap approach to hypothesis testing.
23 Robertson, Wright and Dykstra (1988) and Bhattacharya and Dykstra (1994) develop a test for equality of multivariate distributions against an alternative of first order dominance. We do not discuss this approach here. For continuous-variable models, methods for testing multidimensional first and higher order dominance have been developed by Crawford (2005).
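The pooled bootstrap for first order dominance described in this appendix can be sketched as follows. This is an illustration under stated assumptions, not the code used for the paper: the function names, the argument order, and the data are hypothetical. The statistic is taken to be the minimum, over the four lower sets of the 2x2 grid, of the mass under the second sample minus the mass under the first, so that a nonnegative value corresponds to the first sample first order dominating the second.

```python
import random
import statistics
from collections import Counter

OUTCOMES = [(0, 0), (0, 1), (1, 0), (1, 1)]
# The four lower sets of the 2x2 grid, one per term of the min.
DOWN_SETS = [
    [(0, 0)],
    [(0, 0), (0, 1)],
    [(0, 0), (1, 0)],
    [(0, 0), (0, 1), (1, 0)],
]

def shares(sample):
    """Empirical distribution of a list of (indicator1, indicator2) pairs."""
    counts = Counter(sample)
    return {x: counts[x] / len(sample) for x in OUTCOMES}

def fod_statistic(sample_f, sample_g):
    """Min over lower sets of (mass under g) - (mass under f)."""
    f, g = shares(sample_f), shares(sample_g)
    return min(sum(g[x] - f[x] for x in d) for d in DOWN_SETS)

def pooled_bootstrap(sample_f, sample_g, reps=1000, seed=0):
    """Compare the observed statistic with the 95th percentile under the pooled null."""
    rng = random.Random(seed)
    observed = fod_statistic(sample_f, sample_g)
    pooled = list(sample_f) + list(sample_g)   # null-distribution: merged groups
    draws = []
    for _ in range(reps):
        new_f = rng.choices(pooled, k=len(sample_f))  # resample with replacement
        new_g = rng.choices(pooled, k=len(sample_g))
        draws.append(fod_statistic(new_f, new_g))
    cutoff = statistics.quantiles(draws, n=20)[-1]    # last 5% cut point = 95th percentile
    return observed, cutoff, observed > cutoff

# Made-up data: one group entirely at (1,1), the other entirely at (0,0).
group_f = [(1, 1)] * 200
group_g = [(0, 0)] * 200
observed, cutoff, significant = pooled_bootstrap(group_f, group_g)
```

For the ordinal inequality relations, the same resampling loop could be reused with `fod_statistic` swapped for the minimum over the differences induced by the relevant inequalities of Proposition 1.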