Measuring Inequality by Asset Indices: A General Approach with Application to South Africa

Asset indices are widely used, particularly in the analysis of Demographic and Health Surveys, where they have been routinely constructed as "wealth indices." Such indices have been externally validated in a number of contexts. Nevertheless, we show that they often fail an internal validity test, that is, they rank individuals with "rural" assets below individuals with no assets at all. We consider from first principles what sort of indices might make sense, given the predominantly dummy variable nature of asset schedules. We show that there is, in fact, a way to construct an asset index which does not violate some basic principles and which also has the virtue that it can be used to construct "asset inequality" measures. However, there is a need to pay careful attention to the components of the index. We show this with South African data.


Introduction
Asset indices have become widely used since Filmer and Pritchett (2001) described a simple way to calculate them. Their use really took off once the Demographic and Health Surveys incorporated the calculation of a "wealth index" with the release of each dataset (Rutstein and Johnson, 2004). A Google Scholar search (April 18, 2014) came up with 13,900 "hits" on "DHS wealth index," 2,434 citations of the article by Filmer and Pritchett (2001), and 591 citations of the paper by Rutstein and Johnson (2004) documenting the creation of the DHS index. The main use of the indices in this vast literature is in creating wealth rankings, separating the "rich" from the "poor" as ingredients for more substantive analyses.
Several articles, including the original piece by Filmer and Pritchett (2001), have tried to validate these indices against external criteria, for example, incomes or expenditures. A recent review (Filmer and Scott, 2012) concludes that "the use of an asset index can clearly provide useful guidance to the order of magnitude of rich-poor differentials" (p. 389), although the asset indices measure a different concept than per capita consumption. Indeed, the paper devotes attention to the question of under which circumstances the two measures will provide the most similar rankings, arguing that this will occur when per capita expenditures are well explained by observed household and community characteristics and when "public goods" are more important in household expenditures than "private ones" such as food. In other work, we have ourselves argued that asset indices do a good job of proxying for income differences (Wittenberg, 2009, 2011). None of this literature has examined whether the asset indices calculated in the traditional way make sense internally, that is, according to a number of simple criteria such as that individuals that have more (of anything) should be ranked higher than individuals that have less. In particular, little attention has been paid to the problems created by the predominantly dummy variable nature of asset schedules. We show that this is not just a theoretical issue but that, in a number of cases, DHS wealth indices exhibit anomalous rankings.

Note: We have benefited from useful comments from David Lam and seminar participants at the University of Michigan, as well as from audience members at the UNU-WIDER conference on Inequality: Measurement, Trends, Impacts, and Policies, Helsinki, September 2014. We would also like to thank Conchita D'Ambrosio and two anonymous referees for feedback which improved the paper markedly. Of course, we remain responsible for all remaining errors.
*Correspondence to: Martin Wittenberg, 3.48.2, School of Economics Building, Middle Campus, University of Cape Town, Lovers Walk, Rondebosch 7701, South Africa (Martin.Wittenberg@uct.ac.za).
© 2017 UNU-WIDER
This is an open access article distributed under the terms of the Creative Commons Attribution IGO License https://creativecommons.org/licenses/by/3.0/igo/legalcode which permits unrestricted use, distribution, and reproduction in any medium, provided that the original work is properly cited. In any reproduction of this article there should not be any suggestion that UNU or the article endorse any specific organization or products. The use of the UNU logo is not permitted. This notice should be preserved along with the article's URL.

One additional issue that has been lamented in some contexts is that the way in which these indices are typically calculated precludes the use of traditional inequality measures. One might think that if it makes sense to talk about inequality in incomes or wealth, it would certainly make sense to think about inequality in asset holdings (McKenzie, 2005; Bhorat and van der Westhuizen, 2013). Nevertheless, the manipulation of traditional indices is not a viable strategy (Wittenberg, 2013): a different approach is needed. As we show below, it is when we consider the particular problems of calculating inequality measures with dummy variables that many problems with the creation of asset indices crystallize. However, these problems are not insuperable: an approach due to Banerjee (2010) for dealing with multidimensional inequality can be used to create such asset indices.
We show that this approach is easy to implement and we apply it to South African data. This provides a new perspective on the evolution of South African inequality which is somewhat at odds with the literature measuring inequality with money-metric approaches. We think it is likely that the asset approach reveals genuine improvements over time, although the reduction in inequality is unlikely to be as dramatic as the Gini coefficients calculated on the asset indices suggest. We think that more detailed asset inventories would moderate some of the conclusions. Indeed, one of our key points is that asset indices need to be approached with some caution: churning out "wealth indices" in semiautomated ways, without considering in detail what the individual scores suggest, is likely to be problematic.
The plan of the paper is as follows. In Section 2, we provide a very brief overview of the theoretical literature dealing with asset indices. We follow on by enunciating several principles for the creation of such indices in Section 3. We refer to these as "principles" since our approach is not fully axiomatic. Our approach is more heuristic: we investigate what happens when we apply different approaches to simple data and consider whether the answers make sense. We do this in Sections 4-6, where we consider first the case of a single binary variable and then we progressively consider more complicated cases. In each case, we consider both the index itself and what it might mean to estimate inequality with it. Having set out what we consider to be a defensible approach, we turn to applying it to DHS data in Section 7. Finally, we consider what assets may tell us about the evolution of inequality in South Africa from 1993 to 2008.
Review of Income and Wealth, Series 63, Number 4, December 2017
The chief contributions of our paper to the literature are both negative and positive. On the negative side, we show that there are anomalies embedded deep in the predominant approaches for creating asset indices, which users should be aware of before blithely adopting them. On the positive side, this paper: (1) describes how to construct an asset index that is internally coherent; (2) shows that inequality measures on this index are well defined and have reasonable interpretations; (3) provides some perspective on the "art" of index construction; and (4) provides a fresh perspective on South African inequality.

Literature Review
McKenzie (2005, p. 232) suggests that the idea of using the first principal component of a set of asset variables as an index for "wealth" has been around in the social science literature for a long time. Its use, however, has become common only after the publication of Filmer and Pritchett (2001) and the subsequent adoption of the method in the release of the DHS "wealth indices" (Rutstein and Johnson, 2004). The basic idea of principal components is to find the linear combination of the asset variables that maximizes the variance of this combination. More formally, if we have $k$ random variables $a_1, \ldots, a_k$, each standardized to be of mean zero and variance one, the objective is to rewrite these as
$$\begin{aligned} a_1 &= v_{11} A_1 + v_{12} A_2 + \cdots + v_{1k} A_k \\ &\;\;\vdots \\ a_k &= v_{k1} A_1 + v_{k2} A_2 + \cdots + v_{kk} A_k \end{aligned} \tag{1}$$
where the $A_i$ are unobserved components, created so as to be orthogonal to each other. Writing this in vector notation as $a = VA$, it follows that the covariance matrix (here equal to the correlation matrix $R$) is given by
$$R = E(aa') = V \Phi V'$$
where $\Phi = E(AA')$. Note that $\Phi$ is diagonal since the unobserved components are assumed to be orthogonal to each other. We need to impose some normalization in order to get a determinate solution. Let $\Phi$ be the matrix of eigenvalues and $V$ the orthonormal matrix of eigenvectors, and assume that $V$ is ordered so that the eigenvector associated with the largest eigenvalue is listed first. We can then solve for $A$, to obtain $A = V'a$. In particular, $A_1 = v_{11} a_1 + v_{21} a_2 + \cdots + v_{k1} a_k$. We will refer to this as the PCA index. By assumption, $\mathrm{var}(A_1) = \lambda_1$, the first eigenvalue, and we can show that no other linear combination of the $a_i$ variables will achieve a greater variance (Wittenberg, 2009, pp. 5-6).
If the asset variables $a_i$ do not have unit variance and zero mean, they are first standardized, so that the equation for the first principal component will be given by
$$A_1 = v_{11} \frac{a_1 - \bar{a}_1}{s_1} + \cdots + v_{k1} \frac{a_k - \bar{a}_k}{s_k} = \frac{v_{11}}{s_1} a_1 + \cdots + \frac{v_{k1}}{s_k} a_k - c \tag{2}$$
where $\bar{a}_i$ and $s_i$ are the mean and standard deviation of $a_i$, and the coefficients $v_{i1}$ are the elements of the eigenvector $v_1$ associated with the largest eigenvalue $\lambda_1$ of the correlation matrix $R$ of the $a_i$ variables. The constant $c$ is the weighted sum of the means, which ensures that $A_1$ has a zero mean. The use of the first principal component was defended by Filmer and Pritchett (2001) on a "latent variable" interpretation of equations (1): $A_1$ is whatever explains most of what is common to $a_1, a_2, \ldots, a_k$ and it makes most sense to think of this as "wealth." Other authors have taken this formulation more seriously and have suggested that other procedures, such as factor analysis, be used to retrieve the common latent variable (Sahn and Stifel, 2003).¹ Although the procedure produces a different index than the PCA one, in practice the indices calculated by both approaches are highly correlated, particularly since authors using this approach seem to restrict themselves to extracting only one factor and eschew the "orthogonal rotations" that produce arbitrarily many solutions.
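As a rough illustration of the procedure just described, the following Python sketch standardizes a set of asset dummies, forms their correlation matrix, and approximates the first eigenvector by power iteration. Power iteration is a stand-in chosen here to keep the example free of external libraries; it is not claimed to be the routine used in the DHS releases or by the authors. The toy data and all function names are hypothetical.

```python
import math

def standardize(column):
    """Rescale a column to mean zero and (population) variance one."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in column) / n)
    return [(x - mean) / sd for x in column]

def correlation_matrix(columns):
    """Correlation matrix R of columns that are already standardized."""
    n = len(columns[0])
    return [[sum(x * y for x, y in zip(ci, cj)) / n for cj in columns]
            for ci in columns]

def first_eigenvector(R, iterations=1000):
    """Power iteration: approximates the eigenvector of the largest eigenvalue."""
    k = len(R)
    v = [1.0 + 0.1 * i for i in range(k)]  # arbitrary positive starting vector
    for _ in range(iterations):
        w = [sum(R[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

def pca_index(columns):
    """Scores A_1 = sum_i v_i1 * a_i, computed on the standardized a_i."""
    z = [standardize(c) for c in columns]
    v1 = first_eigenvector(correlation_matrix(z))
    n = len(z[0])
    scores = [sum(v * col[h] for v, col in zip(v1, z)) for h in range(n)]
    return v1, scores

# Hypothetical toy data: six households, three asset dummies.
tv     = [0, 1, 1, 1, 1, 0]
fridge = [0, 0, 1, 1, 1, 0]
car    = [0, 0, 0, 1, 1, 0]
weights, scores = pca_index([tv, fridge, car])
```

In this toy example all three assets are positively correlated, so every weight is positive and households with more assets score higher. The scores have mean zero by construction, which is why they cannot be fed directly into share-based inequality measures.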
Reviews of the procedure have focused on several issues. First, if the assets are measured mainly through categorical variables, then the index defined through equation (2) is intrinsically discrete. The more assets and the more integer-valued variables (e.g. number of rooms) that are included in the index, the smoother the resulting index will be and the better will be its potential to differentiate finer gradations of poverty (McKenzie, 2005). Second, if categorical variables with multiple categories are included (e.g. water access), then the resulting group of dummy variables will be internally correlated with each other in ways that will influence the construction of the index. The more categories, the more dummy variables and the more this group influences the overall index. As a result, some authors have used multiple correspondence analysis instead (Booysen et al., 2008). Unfortunately, it cannot accommodate continuous variables. In practice, the PCA index is also highly correlated with the MCA index. An additional point is that some of the categories will inevitably feature as "bads" and so should definitely receive a negative weight (Sahn and Stifel, 2003). This is, however, different from the cases that we consider later, where "goods" get assigned negative scores.
1 For a more detailed discussion of the factor analysis approach, see Wittenberg (2009).
A third issue which has received some attention is whether or not the index should include infrastructure variables (such as access to water and sanitation). Houweling et al. (2003) tested the PCA index rankings for sensitivity to the assets included. They were concerned about the fact that the infrastructure assets might have independent effects on the outcome of interest, in particular child mortality. They show that the rankings change somewhat as some of the "assets" are stripped out. Thus there are important judgments to be made in deciding which assets to include or exclude in an asset index.
Several authors have tried to validate asset indices against external benchmarks. We have already referred to the review article by Filmer and Scott (2012). They found that different techniques for constructing asset indices tended to get results that were highly correlated with each other, but in some cases differing from the rankings implied by per capita consumption. This is not thought to be a problem in principle, since it is possible that assets may be a more reliable indicator of long-run economic well-being. They may also be measured with less error (Filmer and Pritchett, 2001; Sahn and Stifel, 2003).
One noteworthy finding in Filmer and Scott (2012) is that urban-rural differences tend to be more marked when using asset indices than when using per capita expenditure. Consumption/expenditure is felt to be a better indication of longer-run money-metric well-being than income, and thus the high aggregate correlation between asset indices and consumption is not that surprising. But this makes the sharp urban-rural divergence between these two measures noteworthy. It could be due to the fact that wealth is more concentrated than consumption, but perhaps it is also due to the fact that many of the household durable goods that make up asset schedules (e.g. televisions and refrigerators) require electricity, which tends to be more accessible in urban areas. Indeed, we have argued that both principal components and factor analysis will tend to extract an index which is a hybrid of "wealth" and "urbanness" (Wittenberg, 2009). We will show below that the asset index values rural assets (in particular, livestock) negatively, thus making rural asset holders look poorer than they should. We will suggest that the urban-rural differences are actually exaggerated by the indices.

Principles for the Creation of Asset Indices
Intuitively, all the justifications for the creation of an asset index rely on the idea that higher asset holdings should convert into a higher index number and, conversely, a higher index number should imply greater wealth. This is a simple, yet obvious, internal consistency requirement. We shall refer to this as the monotonicity principle. In order to outline this more rigorously, we first define what we mean by an asset and an asset index. We define assets as goods that provide (potentially) a stream of benefits. An asset variable $A_j$ will be a random variable such that $a_j$ is either the quantity or the value or the presence/absence of the asset. This excludes "bads." We also therefore do not allow $a_j$ to be negative.
Definition 1. Let $(a_1, a_2, \ldots, a_k) \in \mathbb{R}^k$ be a vector of asset holdings. The function $A : \mathbb{R}^k \to \mathbb{R}$ defined for all possible asset holdings is called an asset index.
Typically, we will restrict attention to linear asset indices, that is, indices that can be written in the form $A(a_1, a_2, \ldots, a_k) = v_1 a_1 + v_2 a_2 + \cdots + v_k a_k$.
Principle 2. Let $A(a_1, a_2, \ldots, a_k)$ be an asset index. The asset index is monotonic if and only if $A(a_1, \ldots, a_k) \geq A(b_1, \ldots, b_k)$ whenever $a_i \geq b_i$ for all $i$.
Note that this is a fairly weak condition. It does not, for instance, rule out "inferior" assets. For instance, if we had an asset schedule that listed different types of stoves (for example, electric, paraffin, coal, or gas), the corresponding "ownership" vectors might be recorded as $(1, 0, 0, 0)$, $(0, 1, 0, 0)$, $(0, 0, 1, 0)$, and $(0, 0, 0, 1)$, respectively. Since none of these vectors is numerically bigger than the other, there is no restriction on how the asset index should rank them either. However, if these are not recorded as mutually exclusive categories, then an individual that owned both an electric stove and a gas stove should receive a higher asset index than one that owns only an electric stove.
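A brute-force check of monotonicity for a linear index over dummy variables can be sketched as follows. The sketch reads the principle as "a bundle that is componentwise at least as large must not receive a lower score", which is an assumption about the exact form of the condition; the weights and function names are hypothetical.

```python
from itertools import product

def linear_index(weights, holdings):
    """Linear asset index: the weighted sum of the holdings."""
    return sum(w * a for w, a in zip(weights, holdings))

def is_monotonic(weights):
    """Check every pair of 0/1 bundles: if a >= b componentwise,
    then the index of a must not fall below the index of b."""
    bundles = list(product([0, 1], repeat=len(weights)))
    for a in bundles:
        for b in bundles:
            dominates = all(x >= y for x, y in zip(a, b))
            if dominates and linear_index(weights, a) < linear_index(weights, b):
                return False
    return True

# Hypothetical weights: the second set gives one asset a negative weight.
print(is_monotonic([0.5, 0.3, 0.2]))
print(is_monotonic([0.5, -0.3, 0.2]))
```

Under this reading, a linear index on dummy variables is monotonic exactly when no asset carries a negative weight, which is the property that the later sections show the PCA index can violate.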
The second principle that we require is that the index must be ratio-scale, that is, it must have an absolute zero. This is indispensable if we want to calculate inequality measures on the index, since it is not valid to calculate "shares" (required to construct, for instance, the Lorenz curve) if the variable is not ratio-scale. It implies in particular that the index must be able to recognize individuals or households that own nothing. Obviously, this principle is violated by all of the current asset indices, except those that simply sum up the number of assets. Nevertheless, it is sensible to maintain that if the notion of "asset holdings" is to have any meaning, it is only in relation to individuals that do not have any. Even for purely ranking exercises, it is conceptually necessary that it makes sense to define the "have-nots" and that they should rank at the bottom. Assuming that the previous two principles hold, it then makes sense to consider inequality measures on the space of asset index measures.
Principle 4. We will say that the asset inequality measure I is robust if it can be applied to asset vectors of dummy variables as well as to continuous ones.
Robustness is not a conceptual requirement, but it is desirable nonetheless, given that the asset information is typically dummy variable based. Theoretically, there is no reason why one should not construct different types of measures for different types of data. It is, however, much simpler if the approach can accommodate these differences. One big advantage of robust measures is that we know what the measures mean when the underlying data are of the continuous type that are treated in standard social welfare accounts. When these measures are applied to dummy variables, however, the interpretation becomes more complex. Robustness in this case means that the "standard" and "non-standard" treatments are part of the same continuum, so that if the measurement of the variable were to improve over time, we would only need to tweak our approach rather than switch completely. It is easier to see what this means by turning directly to the simplest case of all.

One Binary Variable
Consider first the case where we have precisely one binary variable, for example, we know whether or not the respondent owns a television set. Note that in this case the only possible "asset index" is the variable itself. Note also that we cannot analyse these data "from first principles" according to the typical axioms of inequality measurement, since these types of data will not support the "principle of transfers": it is impossible to take away an asset from person j and give it to person i without them changing places in the distribution. Furthermore, such a "trade" (by the principle of anonymity) would leave the distribution precisely unchanged. Ratio-scale independence does not hold either, since rescaling of the variable does not provide a valid asset distribution.

Standard Inequality Measures
Many of the standard inequality measures (e.g. Atkinson indices) will not provide valid answers in the presence of zeroes. Nevertheless, some do, with the Gini coefficient the most common example. It is instructive to consider what the Gini of such a variable would measure. Assume that there are $n_0$ observations with zeroes and $n_1$ ones. Let the proportion of ones be $p$, that is, $p = n_1/N$, where $N = n_0 + n_1$. The Gini coefficient² is simply $1 - p$. This is not an unattractive choice as a measure of inequality: if everyone has the asset, then the Gini is zero; as $p \to 0$ (that is, as the asset becomes concentrated in a smaller and smaller group), the index approaches one. It is obvious that given the paucity of information in the binary variable any "measure" of inequality must be, in some sense, a function of $p$.
There are some alternatives. For instance, the coefficient of variation applied to the binary variable would yield $\sqrt{(1-p)/p}$. This again yields a measure of zero when $p = 1$, but in this case the index of inequality approaches $+\infty$ as $p \to 0$.
2 Wagstaff (2005) provides a discussion of "concentration indices" for the case where the dependent variable is binary. This value for the Gini coefficient is a special case of his more general result.
Obviously, both measures break down at $p = 0$. Indeed, in a world in which nobody has the "asset," it seems hard to define what inequality in the possession of that asset would mean. It is also worth noting that both measures give meaningful results only if the variable records the possession of a "good." If the variable measures a deprivation, it should be recoded first.
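The two closed forms for a binary variable can be checked numerically. The sketch below assumes the Gini coefficient is computed from the mean absolute difference over all pairs (with no small-sample correction) and the coefficient of variation from the population standard deviation; the sample is hypothetical.

```python
import math

def gini(values):
    """Gini coefficient via the mean absolute difference over all ordered pairs."""
    n = len(values)
    mean = sum(values) / n
    total_abs_diff = sum(abs(x - y) for x in values for y in values)
    return total_abs_diff / (2 * n * n * mean)

def coefficient_of_variation(values):
    """Population standard deviation divided by the mean."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((x - mean) ** 2 for x in values) / n
    return math.sqrt(variance) / mean

# Hypothetical sample: 3 of 10 respondents own a television, so p = 0.3.
owns_tv = [1] * 3 + [0] * 7
p = sum(owns_tv) / len(owns_tv)

print(gini(owns_tv), 1 - p)
print(coefficient_of_variation(owns_tv), math.sqrt((1 - p) / p))
```

Both functions divide by the mean, so they fail with a division by zero when nobody owns the asset, mirroring the breakdown at p = 0 noted in the text.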

The Cowell-Flachaire Measures
An alternative to the cardinally based measures is the approach for ordinal variables proposed by Cowell and Flachaire (2012). These require us to measure the status of everyone in the distribution, where this is simply the count of everyone of equal rank or lower ("downward" measure) or, alternatively, everyone of equal rank or higher ("upward" measure). Both are expressed as proportions of the population. The vector of status measures $s = (s_1, s_2, \ldots, s_N)$ is then used to calculate an inequality measure, relative to a "reference" status, which Cowell and Flachaire suggest should be set to 1. The inequality measures then become:
$$I_\alpha(s) = \begin{cases} \dfrac{1}{\alpha(\alpha - 1)} \left[ \dfrac{1}{N} \sum_{i=1}^{N} s_i^{\alpha} - 1 \right] & \text{if } \alpha \neq 0 \\[2ex] -\dfrac{1}{N} \sum_{i=1}^{N} \log s_i & \text{if } \alpha = 0 \end{cases}$$
A virtue of this set of measures is that it is invariant to the way in which the ordinal variable is "cardinalized," since the cardinalization will not affect the ranking of individuals in the distribution.
In the case of our binary variable, we obtain the following status values: those without the asset have a downward status of $1 - p$ and an upward status of 1, while those with the asset have a downward status of 1 and an upward status of $p$. Consequently, the "downward" measure would be
$$I_\alpha = \begin{cases} \dfrac{(1-p)^{\alpha+1} - (1-p)}{\alpha(\alpha - 1)} & \text{if } \alpha \neq 0 \\[2ex] -(1-p) \log (1-p) & \text{if } \alpha = 0 \end{cases}$$
while the "upward" measure would be
$$I_\alpha = \begin{cases} \dfrac{p^{\alpha+1} - p}{\alpha(\alpha - 1)} & \text{if } \alpha \neq 0 \\[2ex] -p \log p & \text{if } \alpha = 0 \end{cases}$$
Low values of $\alpha$ emphasize inequality at the bottom of the distribution (the deprivation of those without the assets is felt more), while for $\alpha$ values close to one what happens at the top is more accentuated. Note that when $p = 0$ or $p = 1$, inequality is zero. Indeed, by considering the second derivative of $I_\alpha$, it is clear that this measure of inequality has an inverse "U"-shaped curve, as shown in Figure 1 for the case $\alpha = 0$. The variance of the distribution (which is also sometimes used as a measure of inequality) also exhibits this sort of pattern, with a low index of inequality near $p = 0$ and $p = 1$.
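The status-based calculation can be sketched in a few lines of Python. The functional form used below (reference status 1, a logarithmic form at alpha = 0, and a power form otherwise) is an assumption about the Cowell-Flachaire family as recalled here, not a transcription from their paper; the data and function names are hypothetical.

```python
import math

def status(values, direction="down"):
    """Share of the population of equal rank or lower ("down"),
    or of equal rank or higher ("up"), for each observation."""
    n = len(values)
    if direction == "down":
        return [sum(1 for y in values if y <= x) / n for x in values]
    return [sum(1 for y in values if y >= x) / n for x in values]

def cowell_flachaire(values, alpha=0.0, direction="down"):
    """Assumed index form with reference status 1 (requires alpha < 1)."""
    if alpha >= 1:
        raise ValueError("alpha must be below 1 in this sketch")
    s = status(values, direction)
    n = len(s)
    if alpha == 0:
        return -sum(math.log(si) for si in s) / n
    return (sum(si ** alpha for si in s) / n - 1.0) / (alpha * (alpha - 1.0))

# Hypothetical sample: 3 of 10 respondents own the asset, so p = 0.3.
owns_asset = [1] * 3 + [0] * 7
print(cowell_flachaire(owns_asset, 0.0, "up"))    # compare with -p log p
print(cowell_flachaire(owns_asset, 0.0, "down"))  # compare with -(1-p) log(1-p)
```

Because only ranks enter the status vector, recoding the ones as any larger number leaves both measures unchanged.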

The Meaning of Asset Inequality
The difference in the behavior of the two groups of inequality "measures," namely a monotonic decrease in inequality as p goes from near zero to one versus inverse "U" shaped, raises fundamental questions about how we interpret the contrast between the "haves" and the "have-nots." In the Gini and coefficient of variation interpretation, that gulf is the central feature of the distribution-so if 99 percent of the population are lacking the asset but 1 percent have it, that is the most salient fact about the distribution. In the Cowell-Flachaire view, if most of the population shares the deprivation, then most outcomes are very similar to each other, that is, there is not a lot of inequality.
Which of these perspectives is right? Consider a "satisfaction with life" variable that has been measured on a Likert scale ranging from 1 (very dissatisfied) to 5 (very satisfied). Let 99 percent of the population record a "3" (i.e. neutral) but let 1 percent rate above that. This variable could be dichotomized as a 0/1 binary variable, with the "satisfied" responses scored as 1 while those below are recorded as zero. This distribution probably should not rate as very unequal, so in this context the Cowell-Flachaire measure seems more reasonable than the equivalent Gini. Note, however, that the Cowell-Flachaire measure is invariant to linear translation; that is, we would get the same measure whether 99 percent responded "3" and 1 percent "4" or whether 99 percent answered "4" and 1 percent "5," or even 99 percent "1" and 1 percent "5." Indeed, the reason why these all give the same distributional measure is that the conversion of the underlying phenomenon into a cardinal measure is arbitrary.
The central question is therefore to what extent the binary variable is an arbitrary coding of the underlying distribution. The crucial difference is not so much what the "1" codes for (since that could stand for almost any value), but whether the "0" can be thought of as absolute. Indeed, as we noted in the previous section, the Gini coefficient is sensible only if the variable is ratio-scale, that is, if the zero is absolute. The reason why the Gini scores inequality so highly when p is low is that the gulf between having nothing and having something is enormous. This is true, however, only if the "0" is really nothing and "1" signals the real possession of an asset (e.g. a car). Some of the variables typically used in the construction of asset indices need to be thought about very carefully in this context. For instance, a dummy variable for "tiled roof" obviously really measures the presence or absence of a "tiled roof." Nevertheless, the absence of a tiled roof does not imply the absence of all roofs; whether or not the gap between owning a thatched roof and a tiled roof is as vast as the gap between having nothing and having something is debatable.
Nonetheless, many of the assets do measure material gaps: ownership of a car or of a television are examples. Some infrastructure variables arguably also satisfy this criterion. The presence or absence of water in the house may be such a salient difference that the "0" really denotes a key absence. For variables such as these, the Gini measure seems closer to our intuition of how we would think about "asset inequality." We take two points away from this discussion. First, one needs to think quite carefully about what variables one wants to include in one's measure of "asset inequality." If the variables in question are, at best, ordinal quality-of-life measures (e.g. "tiled roof"), then the appropriate "inequality measure" needs to be an ordinal one, like the Cowell-Flachaire approach. Second, if the binary variable really captures the presence or absence of a real asset, then the behavior of the Gini coefficient accords more closely with our intuition of "asset inequality." Nevertheless, we accept that this is a judgment issue and that different analysts might come to different conclusions.

Two Binary Variables
We now turn to consider the case in which we have two binary variables. We could obviously analyse both variables separately, but we might want to combine the information to arrive at some overall measure of "asset inequality." There are several potential ways of doing this. First, we could combine the two variables into one scale (an "asset index") and then apply some inequality measure to that scale. Depending on whether we think of the scale as giving us cardinal or ordinal values, we could use either a standard inequality measure or the Cowell-Flachaire ordinal measures. Second, we could utilize some of the approaches in the "multidimensional inequality" literature. First, however, we will rehearse some of the issues that make the two variable case more complicated. To make the discussion more precise, let us presume that the empirical information on the two binary variables $X_1$ and $X_2$ is contained in the following matrix:
$$\begin{bmatrix} X_1 & X_2 \end{bmatrix} = \begin{bmatrix} 0_{n_1} & 0_{n_1} \\ 1_{n_2} & 0_{n_2} \\ 0_{n_3} & 1_{n_3} \\ 1_{n_4} & 1_{n_4} \end{bmatrix}$$
where $0_{n_j}$ is the $n_j$ null vector $[0, 0, \ldots, 0]'$ and $1_{n_k}$ is the $n_k$ vector of ones $[1, 1, \ldots, 1]'$. Let $N = \sum_j n_j$, and let $p_1 = (n_2 + n_4)/N$, $p_2 = (n_3 + n_4)/N$, and $p_{12} = n_4/N$; that is, $p_1$ is the proportion of the population that owns asset 1, $p_2$ is the proportion that owns asset 2, and $p_{12}$ is the proportion owning both. Without loss of generality, let us assume that $p_1 \geq p_2$, that is, the second asset is rarer in the population than the first.
Two polar cases illustrate the difficulty: in the first, some people have nothing and some have everything; in the second, everyone has an asset. What distinguishes the cases is that the correlation between the two variables is positive in the former, while it is negative in the latter. The literature on multidimensional inequality measurement speaks about a "correlation increasing majorization" (e.g. Tsui, 1999, p. 150). Intuitively, the second case should be less unequal than the first. In general, we would like a measure of inequality that is true to that intuition. We now turn to the first method, that of combining the two variables into one scale.
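The contrast between a positively and a negatively correlated pair of assets can be made concrete with a small sketch. It builds the stacked data matrix from the four group sizes (no assets, asset 1 only, asset 2 only, both) and computes the ownership shares and the correlation between the two dummies. The two example populations are hypothetical and are chosen only to have identical ownership shares.

```python
import math

def build_matrix(n1, n2, n3, n4):
    """Rows (x1, x2): n1 own nothing, n2 only asset 1, n3 only asset 2, n4 both."""
    return [(0, 0)] * n1 + [(1, 0)] * n2 + [(0, 1)] * n3 + [(1, 1)] * n4

def proportions(rows):
    """Return (p1, p2, p12): share owning asset 1, asset 2, and both."""
    n = len(rows)
    p1 = sum(x1 for x1, _ in rows) / n
    p2 = sum(x2 for _, x2 in rows) / n
    p12 = sum(x1 * x2 for x1, x2 in rows) / n
    return p1, p2, p12

def correlation(rows):
    """Correlation between two dummies: (p12 - p1 p2) over the product of std devs."""
    p1, p2, p12 = proportions(rows)
    return (p12 - p1 * p2) / math.sqrt(p1 * (1 - p1) * p2 * (1 - p2))

concentrated = build_matrix(5, 0, 0, 5)  # half own both assets, half own nothing
spread_out = build_matrix(0, 5, 5, 0)    # everyone owns exactly one asset

print(proportions(concentrated), correlation(concentrated))
print(proportions(spread_out), correlation(spread_out))
```

Both populations have half of the people owning each asset, so any measure that looks only at the two marginal shares cannot tell them apart; only the joint share, and hence the sign of the correlation, differs.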

Creating an Asset Index
As noted above, one of the most common ways of creating an asset index is by means of principal components. Applying the PCA formula mechanically, we can derive the values of the asset index in terms of $p_1$, $p_2$, and $p_{12}$ (see Appendix A.1 in the online supporting information; in particular, the table). Several insights follow from an examination of those formulae. Trivially, since the mean of the variables (by construction) is zero and they include positive and negative values, we cannot use traditional inequality measures on these values. Second, however, the range of the index is a function of the ranges of the standardized variables $\tilde{x}_1$ and $\tilde{x}_2$. Those are of the form $1/\sqrt{p_1(1-p_1)}$, which grow without bound as $p_1$ approaches zero or one and follow a "U" shape, with a minimum at $p_1 = \frac{1}{2}$. As an "inequality statistic," the range (and hence dispersion) of the asset index therefore works inversely to the Cowell-Flachaire statistic for the univariate case. It is unlikely to communicate useful information about real inequality in the distribution of assets. This is contrary to the intuition, expressed for instance by McKenzie (2005), that the dispersion of the index could be a measure of inequality.
A third point emerges from the fact that the "weight" assigned to asset 1 is the sign of the correlation coefficient between the two standardized assets, which is negative if $p_{12} < p_1 p_2$. Indeed, whenever $p_{12} < p_1 p_2$, the asset scores give the following ranking: $(1, 0) \prec (0, 0)$ and $(1, 1) \prec (0, 1)$; that is, a person who has the more common asset is always ranked below a person who does not have the asset (see the second column of the table in Appendix A.1, in the online supporting information). How can this possibly make sense? The problem arises from the fact that the principal components analysis correctly isolates the negative correlation between the two assets. But the PCA procedure is intended to isolate what is common to both; this quandary is resolved by interpreting $x_1$ as a "bad" instead of a genuine asset. Given the philosophy of the PCA approach this is understandable, but it is problematic in this context nonetheless.
Indeed, it is not difficult to construct examples where the first asset becomes such an intense "bad" that a person having no assets gets a higher score than an individual with both assets. We show one example in Table 1. Indeed, whenever $1 - p_1 < p_2 < p_1$, the PCA rankings will produce such a perverse outcome. Table 1 also shows that the principal components method is not unique in this regard: the most popular alternatives, namely factor analysis (with one factor) and multiple correspondence analysis, produce precisely the same perverse ranking.
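The perverse ranking can be reproduced with a short sketch. For two standardized variables the correlation matrix has eigenvectors proportional to (1, 1) and (1, -1), so the first principal component can be written down directly; the sketch fixes the sign so that the rarer asset 2 carries a positive weight, which follows the description in the text. The ownership shares are hypothetical and are not the values in Table 1.

```python
import math

def pca_scores_two_dummies(p1, p2, p12):
    """First-principal-component score for each bundle (x1, x2) of two dummies.

    The weight on asset 1 is the sign of the correlation r; asset 2 is
    normalized to have a positive weight. Both weights are scaled by 1/sqrt(2).
    """
    s1 = math.sqrt(p1 * (1 - p1))
    s2 = math.sqrt(p2 * (1 - p2))
    r = (p12 - p1 * p2) / (s1 * s2)
    w1 = 1.0 if r >= 0 else -1.0
    scores = {}
    for x1 in (0, 1):
        for x2 in (0, 1):
            z1 = (x1 - p1) / s1  # standardized asset 1
            z2 = (x2 - p2) / s2  # standardized asset 2
            scores[(x1, x2)] = (w1 * z1 + z2) / math.sqrt(2)
    return r, scores

# Hypothetical shares: asset 1 is common (80%), asset 2 rarer (50%), 35% own both.
r, scores = pca_scores_two_dummies(p1=0.8, p2=0.5, p12=0.35)
ranking = sorted(scores, key=scores.get)  # bundles from lowest to highest score
print(r)
print(ranking)
```

With these shares the correlation is negative and the condition 1 - p1 < p2 < p1 holds, so the bundle (0, 0) outranks (1, 1): a household with nothing scores above a household with both assets.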
Is this case relevant for empirical analyses? There are, in fact, many practical examples where "assets" acquire negative weights in principal components procedures. In the South African case (as shown below), ownership of cattle is frequently negatively correlated with the ownership of other asset types, mainly because cattle are a typically "rural" asset while the other assets require a connection to the electricity grid.
Given these issues, it would be wise to restrict the construction of these types of asset indices to situations where the assets are positively correlated, although that would be a non-trivial limitation. Nevertheless, even in these cases, the question of how to deal with the negative values created by the estimation process remains. One possible response would be to add, to all index values, a positive constant that is big enough to ensure that only non-negative values remain. Linear translations of this sort have been used in some cases (Sahn and Stifel, 2000; Bhorat and van der Westhuizen, 2013). The problem is that while these shifts maintain the rankings, the Gini coefficients are not invariant to such transformations. Indeed, Lorenz dominance on subgroups can be reversed (see Wittenberg, 2013).
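The sensitivity of the Gini coefficient to the choice of constant is easy to see in a small example. The following sketch uses a hypothetical mean-zero index of four values and two arbitrary shift constants; the Gini is computed from the mean absolute difference.

```python
import numpy as np


def gini(x):
    """Gini coefficient of non-negative values via the mean absolute difference."""
    x = np.asarray(x, dtype=float)
    n = x.size
    total_abs_diff = np.abs(x[:, None] - x[None, :]).sum()
    return total_abs_diff / (2.0 * n * n * x.mean())


# Hypothetical mean-zero index values, as a PCA-type procedure would produce.
index = np.array([-1.5, -0.5, 0.5, 1.5])

# Two different positive shifts: the ranking is unchanged, the Gini is not.
for shift in (2.0, 5.0):
    shifted = index + shift
    print(f"shift = {shift}: Gini = {gini(shifted):.4f}")
```

Because the shift enters the mean in the denominator but cancels from the pairwise differences in the numerator, any larger constant mechanically lowers measured inequality.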

Creating an Ordinal Scale
With two binary variables, there are four possible outcomes, that is, four possible values for any combined scale. If we can rank the value of the two assets, for example, if having the $x_1$ asset is preferred to having the $x_2$ asset, then we know that $(1,0) \succ (0,1)$. But then we can rank all outcomes in the obvious way, that is, $(1,1) \succ (1,0) \succ (0,1) \succ (0,0)$. Even if we cannot cardinalize these bundles, we could use the Cowell-Flachaire approach to create inequality measures. Indeed, one of their examples is precisely of this form.
One problem with this approach is that it does not take into account the correlation between the two assets. Consider, for instance, two assets that can be ranked (e.g. ownership of a car versus ownership of a television) and assume that precisely half of the population have cars and the other half have televisions. Assume now that we "redistribute" the televisions to those who have cars, that is, we now have half of the population that have cars and televisions and half that have nothing. The Cowell-Flachaire measures will report the same inequality measures before and after, despite the fact that the distribution has become a lot more unequal.
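The car and television example can be traced through in code. This is a sketch under stated assumptions: it takes a person's "status" to be the share of the population in the same or a lower category (a peer-inclusive, downward-looking definition), and it uses minus the mean log status as the inequality index. Both choices, and the population of 100, are assumptions made for illustration and may differ from the exact variant the authors have in mind.

```python
import math


def downward_status(counts):
    """Share of the population in the same or a lower category, per category.

    counts are ordered from the lowest-ranked to the highest-ranked bundle.
    """
    n = sum(counts)
    cumulative, statuses = 0, []
    for c in counts:
        cumulative += c
        statuses.append(cumulative / n)
    return statuses


def status_inequality(counts):
    """Assumed index: minus the population-mean log status."""
    n = sum(counts)
    statuses = downward_status(counts)
    # Empty categories are skipped, so log(0) is never evaluated.
    return -sum(c * math.log(s) for c, s in zip(counts, statuses) if c > 0) / n


# Bundles ordered (0,0), (0,1), (1,0), (1,1), with asset 1 = car, asset 2 = television.
before = [0, 50, 50, 0]   # half own only a television, half own only a car
after = [50, 0, 0, 50]    # televisions "redistributed" to the car owners

print(status_inequality(before), status_inequality(after))
```

In both distributions half of the population has status 0.5 and half has status 1, so any index built only on these statuses cannot tell the two situations apart.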
Additional problems arise if there are more than two binary variables, because we would then need to know how owning two lower-ranking assets (e.g. a television and a cell phone) ranks relative to owning only the most desirable asset (e.g. a car). There are $2^k$ possible bundles with $k$ binary variables and we would need to be able to rank all of them. In practice, with more than two variables this approach is likely to be intractable. Consequently, we will not pursue these Cowell-Flachaire measures further.
6. The Multidimensional Case

The Multidimensional Gini
The literature dealing with multidimensional indices tends to approach the issue axiomatically. However, currently this theoretical work does not extrapolate well to contexts in which multidimensionality is defined by vectors of binary variables. Given that such situations are very much the context of this paper, we use the multidimensional Gini proposed by Banerjee (2010). As the approach is based on the Gini coefficient, it can cope better with zeroes than approaches based on generalized entropy measures (e.g. Tsui, 1999).
The procedure harks back to the first approach, that is, creating a linear combination of the variables on which the Gini coefficient is then estimated. Again, the weights on the components are given by the elements of an eigenvector of a cross-product matrix, but in this case the variables are not demeaned, so that the moments, as it were, are calculated around zero rather than the mean. As a result, the weights are compelled to be positive. Banerjee proves that when applied to standard continuous non-negative variables, this approach provides an inequality index that satisfies all of the key axioms, but also shows increasing inequality if a "correlation increasing transfer" occurs.
Review of Income and Wealth, Series 63, Number 4, December 2017. © 2017 UNU-WIDER
More concretely, Banerjee suggests that variables should first be divided by their mean. In our case, transforming the original data matrix $X$ (given in equation (4)) in this way, a (non-normalized) eigenvector associated with the maximal eigenvalue of the resulting cross-product matrix is
$$\left[ \frac{p_2 - p_1 + \sqrt{(p_2 - p_1)^2 + 4p_{12}^2}}{2p_{12}}, \; 1 \right]'$$
provided that $p_{12} \neq 0$. If $p_{12} = 0$, then the maximal eigenvalue is $1/p_2$ with associated eigenvector $[0 \;\; 1]'$. The "index values" (up to a multiplicative constant) for this case are given in Table 2. If $p_{12} \neq 0$ (and still assuming that $p_2 \le p_1$), we can show that the index will order the asset bundles as $(0,0) \prec (1,0) \prec (0,1) \prec (1,1)$. However, when $p_{12} = 0$, that is, when the vectors of asset holdings are completely orthogonal, the first asset gets a weight of zero, that is, $y(0,0) = y(1,0) = 0$. This does not create perverse rankings, but it does mean that asset one is completely ignored. This illustrates why Banerjee in his proofs requires that there be at least one individual that owns positive quantities of all assets. In this context, this would require $p_{12} > 0$. This is undoubtedly a limitation, although one not nearly as severe as the requirement that the assets be positively correlated.
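The closed-form eigenvector can be checked numerically. The sketch below assumes that the cross-product matrix in question is the second-moment matrix (around zero) of the two dummies after each has been divided by its mean; for binary variables its entries are $1/p_1$, $1/p_2$, and $p_{12}/(p_1 p_2)$. The shares used are hypothetical.

```python
import numpy as np


def uc_pca_two_assets(p1, p2, p12):
    """Leading eigenvector of the uncentered moment matrix for two binary assets.

    Returns the first eigenvector element (second element scaled to 1),
    the closed-form value, and the implied scores y1, y2 on the raw dummies.
    """
    # E[(x1/p1)^2] = 1/p1, E[(x2/p2)^2] = 1/p2, E[(x1/p1)(x2/p2)] = p12/(p1*p2)
    moment = np.array([[1.0 / p1, p12 / (p1 * p2)],
                       [p12 / (p1 * p2), 1.0 / p2]])
    _, eigvecs = np.linalg.eigh(moment)      # ascending eigenvalues
    v = eigvecs[:, -1]
    v = v / v[1]                             # scale so the second element equals 1
    closed_form = (p2 - p1 + np.sqrt((p2 - p1) ** 2 + 4 * p12 ** 2)) / (2 * p12)
    y1, y2 = v[0] / p1, v[1] / p2            # scores on the untransformed dummies
    return float(v[0]), float(closed_form), float(y1), float(y2)


# Hypothetical shares: negatively correlated assets (p12 < p1 * p2), p2 <= p1.
numeric, closed, y1, y2 = uc_pca_two_assets(p1=0.8, p2=0.5, p12=0.35)
print(numeric, closed, y1, y2)
```

Even though these two assets are negatively correlated, both scores are positive and the more common asset receives the smaller score, so the bundle ordering stated in the text holds.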
In equation (6), $u_1 = \dfrac{y_1}{p_1 y_1 + p_2 y_2}$. The equation is valid for any asset index which scores the assets as $y(0,0) = 0$, $y(1,0) = y_1$, $y(0,1) = y_2$, $y(1,1) = y_1 + y_2$. The formula is interesting, because it shows that $1 - p_2$ is an upper bound for the Gini, as the expressions in brackets in the third and fourth terms both have to be non-negative. So the proportion of the less common asset is the key determinant for inequality overall. In the extreme case where $p_1 = p_2 = p_{12}$, that is, where the society splits into two groups, one which owns nothing and one which owns both assets, the upper bound is reached. Indeed, it is also reached in the case we have ruled out, where $p_{12} = 0$, because then asset 1 is scored as having value zero, that is, $u_1 = 0$. It turns out that the behavior of the asset index and the associated Gini coefficient depends critically on $p_{12}$. The rarer joint ownership is (the smaller $p_{12}$), the more the procedure down-values the first asset. This is accentuated by the size of the gap between $p_1$ and $p_2$.
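For readers who want to see where the bound comes from, the following is a reconstruction, not a quotation, of a Gini expression with the properties described in the text. It assumes the scoring $0 \le y_1 \le y_2$ and uses the bundle shares $\pi_{ab}$ (notation introduced here); the arrangement of terms may differ from the paper's own equation.

```latex
\begin{align*}
\pi_{00} &= 1 - p_1 - p_2 + p_{12}, \quad
\pi_{10} = p_1 - p_{12}, \quad
\pi_{01} = p_2 - p_{12}, \quad
\pi_{11} = p_{12}, \\
\mu &= p_1 y_1 + p_2 y_2, \qquad
u_1 = \frac{y_1}{\mu}, \qquad
u_2 = \frac{y_2}{\mu} = \frac{1 - p_1 u_1}{p_2}, \\
G &= 1 - \sum_{k} \pi_k \left( S_{k-1} + S_k \right)
\quad \text{($S_k$: cumulative index share, bundles ordered by score)} \\
  &= 1 - p_2 - u_1 \, p_1 \left[ p_1 - p_2 \right]
     - 2 u_1 \left[ (p_1 - p_{12})(p_2 - p_{12}) \right].
\end{align*}
```

Both bracketed expressions are non-negative whenever $p_{12} \le p_2 \le p_1$, so $G \le 1 - p_2$, with equality when $p_1 = p_2 = p_{12}$ or when $u_1 = 0$.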
One interesting special case is if $p_1 = p_2$. Then the Gini approaches $1 - 2p_2$ as $p_{12} \to 0$, that is, it treats the two assets equally and inequality gets measured according to who has any assets versus who has none. This is an attractive property, although the probability of finding such a balanced relationship in any "real world" application is zero. Nevertheless, the limiting value of $1 - 2p_2$ serves as a lower bound to the Gini coefficients that can be achieved.
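The limiting behavior in the equal-shares case can be verified by brute force. The sketch below computes the Gini directly from the four bundle shares and the scores implied by the closed-form eigenvector given in the text; the share of 0.3 is a hypothetical value and the function names are illustrative.

```python
import numpy as np


def gini_discrete(values, shares):
    """Gini of a discrete distribution from pairwise absolute differences."""
    values = np.asarray(values, dtype=float)
    shares = np.asarray(shares, dtype=float)
    mean = float(values @ shares)
    diffs = np.abs(values[:, None] - values[None, :])
    return float(shares @ diffs @ shares) / (2.0 * mean)


def uc_pca_gini(p1, p2, p12):
    """Gini of the two-asset uncentered PCA index (requires p12 > 0)."""
    ratio = (p2 - p1 + np.sqrt((p2 - p1) ** 2 + 4 * p12 ** 2)) / (2 * p12)
    y1, y2 = ratio / p1, 1.0 / p2                 # scores on the raw dummies
    values = [0.0, y1, y2, y1 + y2]               # bundles (0,0), (1,0), (0,1), (1,1)
    shares = [1 - p1 - p2 + p12, p1 - p12, p2 - p12, p12]
    return gini_discrete(values, shares)


p = 0.3  # hypothetical common ownership share, p1 = p2 = p
for p12 in (0.3, 0.15, 0.01, 1e-6):
    print(f"p12 = {p12:g}: Gini = {uc_pca_gini(p, p, p12):.4f}")
```

As joint ownership shrinks, the Gini falls from the upper bound $1 - p_2$ toward $1 - 2p_2$.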

Some Provisional Lessons
The key lesson is that the process of deriving weights for the asset index needs to be handled with care in any analysis of asset well-being and, in particular, in the analysis of asset inequality. The conventional PCA, FA, or MCA procedures can yield negative weights. Simply dropping these variables from the analysis (if they are genuine assets) is likely to skew the results in other ways. The uncentered PCA (UC PCA) of Banerjee can handle these cases, provided that ownership of these assets is not completely orthogonal to that of the other assets. Nevertheless, in situations where the overlap of asset holdings is relatively small, these unconventional assets may be down-valued. It seems important to inspect both the asset scores and the resulting rankings before doing any substantive analysis. These are not mere theoretical niggles. We have focused at length on the measurement of asset inequality, and have shown that these same limitations of PCA, FA, and MCA procedures lie behind their inability to provide useful applied measures of asset inequality. In contrast, UC PCA can provide such measures.
In the next two sections, we further explore these lessons using two South African case studies. In the next section, we use South African DHS data to interrogate the asset scores that result from all of the latent variable approaches including UC PCA. We find examples of negative asset values for each of PCA, FA, and MCA. The UC PCA approach does not produce these negative values and we are able to compare the vectors of values across these techniques and go on to show the potential costs of simply dropping such negative values in analysis using PCA, FA, or MCA.
This analysis affirms that while the UC PCA approach has some limitations, it has much to recommend it. In the context of this paper, its greatest strength is that it results in an asset index that contains only non-negative values. It therefore satisfies the standard axioms of inequality analysis and can be used for inequality analysis. We show this briefly in Section 7, and in Section 8 we implement a fuller example of this by exploring changes in asset inequality in post-apartheid South Africa.

Application to the DHS Wealth Indices
A 1998 Demographic and Health Survey (MEASURE DHS, 1998) allows us to apply the above discussion in the South African context in a way that explores the similarities and differences between the UC PCA approach on the one hand, and the PCA, FA, or MCA approaches on the other. The general approach to the creation of the DHS wealth indices is outlined in the paper by Rutstein and Johnson (2004). As many assets as possible are used, including country-specific ones. The actual coefficients underlying the index can be accessed on the DHS website. 3 For the South African case, the coefficients on several key variables are shown in the first column of Table 3. Note that these are the coefficients on the untransformed variables, that is, $v_i / s_i$ (see equation (3)). The most important point for our purposes is the fact that the coefficients on the two livestock variables (possession of a donkey or horse, and possession of sheep or cattle) are both negative. It follows that individuals that have no assets will rank above individuals that have only donkeys and/or cattle. Indeed, if we search for the poorest individuals (according to the wealth index), they invariably own livestock.
In order to investigate this further, we categorize individuals in terms of their possession (or otherwise) of "real" assets. We exclude building materials from the list and include only water piped inside the house and access to electricity. The list, with the corresponding summary statistics, is shown in Table 4. The minimal possible asset holding corresponds to one room with nothing else. Households in the DHS with such minimal assets could have a large range of "wealth index" numbers, depending on the building material of which their accommodation was made. Interestingly, however, 13 percent of individuals who had a higher asset holding (typically owning livestock as well as having more rooms), nevertheless had a lower wealth index than the mean score among those with no moveable possessions. Indeed, the richest person among those with no water in the house, no electricity, one room, and no durables was better off (according to the wealth index) than 47 percent of individuals that had at least something on top of one room.
Notes: *The DHS wealth index uses occupants per room rather than number of rooms. DHS WI, DHS wealth index; UC PCA, uncentered principal components analysis; PCA, principal components analysis; MCA, multiple correspondence analysis; FA, factor analysis.
Notes: Estimates are weighted to the population using the sample weights. Standard errors adjusted for clustering. All variables are binary except for "Rooms."
In order to explore the relationship between livestock ownership and other forms of assets further, we constructed a series of asset indices using our more restrictive list of assets. Besides the uncentered principal components index (labelled UC PCA in Table 3), we also constructed indices using PCA, MCA, and FA. The first point to note is that the negative weighting on livestock ownership is a feature of each of the latter three approaches. The coefficients shown in Table 3 are those on the untransformed variables, that is, before any standardization.
The second point to note is that the UC PCA also has its bizarre feature: in this case, it is the extremely large implied coefficient on ownership of a motorcycle. The reason for this is that the coefficient is $v_i / \mu_i$, where $v_i$ is the score from the principal components calculation and $\mu_i$ is the mean of the variable. We divide by $\mu_i$ due to the standardization suggested by Banerjee. As Table 4 shows, motorcycles are owned by very few South Africans, and consequently the score becomes inflated in ways which are unlikely to reflect their real asset status. Consequently, we decided to drop this variable and recalculate the index (the results are shown in column 3). Ownership of a personal computer now gets the highest score, although its magnitude is not as outlandish as that for the motorcycle.
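The inflation of the coefficient on a rare asset can be illustrated with a small synthetic example. The sketch below is an assumption-laden outline of an uncentered PCA on a household-by-asset matrix (divide each column by its mean, take the leading eigenvector of the cross-product matrix around zero, then divide each element by the column mean). The three assets and the household counts are invented for illustration and are not the DHS data.

```python
import numpy as np


def uc_pca_coefficients(X):
    """Implied coefficients on the untransformed variables from an uncentered PCA."""
    X = np.asarray(X, dtype=float)
    means = X.mean(axis=0)
    Z = X / means                          # mean-normalize each column
    moment = Z.T @ Z / X.shape[0]          # cross-product matrix, moments around zero
    _, eigvecs = np.linalg.eigh(moment)    # ascending eigenvalues
    v = eigvecs[:, -1]                     # leading eigenvector
    if v.sum() < 0:                        # eigenvector sign is arbitrary; make it positive
        v = -v
    return v / means                       # coefficient on the raw variable


# Synthetic households: columns are (electricity, radio, motorcycle).
bundles = [((0, 0, 0), 200), ((0, 1, 0), 300), ((1, 0, 0), 100),
           ((1, 1, 0), 390), ((1, 1, 1), 10)]
X = np.array([bundle for bundle, count in bundles for _ in range(count)])

coefficients = uc_pca_coefficients(X)
print(dict(zip(["electricity", "radio", "motorcycle"], coefficients.round(3))))
```

In this synthetic data, the asset owned by 1 percent of households receives a coefficient orders of magnitude larger than the others, which is the pattern that motivates inspecting (and possibly dropping) very rare assets.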
Similarly, we also recalculated the PCA index without the livestock variables, to provide the fairest comparison between the two techniques. This, however, did not have much of an impact on the remaining coefficients, as can be seen by comparing columns 4 and 5 in Table 3. It will, of course, remove the anomalies noted earlier. Individuals owning livestock will now appear indistinguishable from individuals owning nothing. What is the impact of this for the identification of deprivation?
One simple check is to divide the population up into quintiles according to the two indices and see how well they compare. Table 5 performs that analysis. We see that there are some key differences. The starkest contrast is provided by the 175 households which are rated in the bottom quintile according to the PCA index but are rated at the top of the UC PCA. Looking at the means of the asset variables, it emerges that all of them owned horses/donkeys, 76 percent of them also owned sheep or cattle, and 75 percent of them also owned a radio. Ownership of horses and/or donkeys is a significant asset according to the uncentered PCA. Perhaps the coefficient is on the large side, but it is unlikely that households that own both types of livestock should truly be ranked among the poorest of the poor (the bottom 20 percent). Of course, the original PCA index would have ranked many of these households below the "poorest of the poor" (given the negative value on those assets).
In Table 6, we present the correlation matrix between the different asset indices. Although we have used fewer assets in our version of the principal components scores, they are still highly correlated with the wealth index released with the DHS. The PCA, FA, and MCA approaches end up highly correlated. The two uncentered PCA indices show much lower correlations. The first of these has very low correlations with all the indices, since motorcycle owners receive such high scores that the entire distribution is highly skewed (95 percent of all scores are below 8, whereas motorcycle owners score above 50). The second shows correlations of 0.75 with the PCA index that does not weight livestock negatively-but correspondingly lower correlations with the others that maintained that negative weighting.
The obvious implication of all of this is that the standard asset indices will tend to find higher urban-rural contrasts in poverty than the uncentered PCA. This is shown clearly in Table 7. In each case, we have classified the bottom 40 percent of individuals as "poor" according to the DHS wealth index, the PCA 2 index and the second uncentered PCA index. It is clear that there is a strong urban-rural poverty gradient. Nevertheless, the DHS wealth index accentuates this contrast, while the uncentered PCA index finds more urban poverty and less rural poverty. This should not be surprising given the negative valuation of rural assets in the DHS wealth index and the strong positive valuations of urban infrastructure.
Interestingly, calculating the Gini coefficient on the asset scores of the UC PCA we find (in Table 8) strong asset inequality in South Africa in 1998, not dissimilar to the magnitude of income inequality (Leibbrandt et al., 2010). Furthermore, as this table also suggests, there were strong inequalities within rural areas, a finding that many South Africans will find plausible.
We now turn to consider the evolution of asset inequality in South Africa using two nationally representative surveys conducted under the auspices of SALDRU at the University of Cape Town. The first of these is the Project for Statistics on Living Standards and Development (PSLSD), conducted in 1993, and the second is the first wave of the National Income Dynamics Study (NIDS). These studies have already been used to investigate changes in money-metric income inequality over the period (Leibbrandt et al., 2010). It has been found that over this period, money-metric inequality started at very high levels and remained at those high levels.
Both of these surveys are nationally representative general living standards instruments that gathered detailed information on incomes, expenditures, and assets, as well as education, health, and other dimensions of well-being. The literature on money-metric inequality has been useful in giving detailed attention to the comparability of the incomes and expenditure in these two surveys over time (Leibbrandt et al., 2010).
The two datasets provide good coverage of household assets. However, they differ in asset registries. In total, 31 asset categories exist, of which the NIDS contains 29 and the PSLSD contains 19. The NIDS does not include an electrical kettle or the presence of a geyser in its asset register. Some of the assets not included in the PSLSD are due to technological progress. Assets such as computers and cell phones were not as prominent in 1993 as they are now and thus were not included. Furthermore, the NIDS includes greater detail with regard to transportation assets (such as motorcycles, boats, and donkey carts) as well as agricultural assets (such as tractors, ploughs, and grinding mills), which are not included in the PSLSD. However, the PSLSD has the advantage of not only including ownership of assets but also the quantity of each asset owned. In order to look at asset inequality over time, we need to calculate a pooled index for the two periods first, so that we are using the same scores for the assets in each period. This limits us to assets that were asked for in both periods. The descriptive statistics presented by Bhorat and van der Westhuizen (2013) suggest that there has been considerable progress over the period. Table 9 presents the statistics as calculated on our data.
One immediately evident issue is that the prevalence of landlines has gone down as the availability of cell phones has become ubiquitous. If this measurement issue is not addressed, it will result in a spurious decrease in assets over time. Indeed, given the relative rarity of landlines in the later period, these would become erroneously marked as valuable assets instead of as assets whose utility is actually in decline. Consequently, we collapse landline and cell phone ownership into an omnibus "any phone" variable. The coefficients on the assets implied by our uncentered PCA asset index are given in Table 10. When we use this asset index to construct Lorenz curves, we obtain the result shown in Figure 2. The Lorenz curves show clear evidence that asset inequality fell considerably, and this is confirmed by the Gini coefficients, which fell markedly, from 0.47 in 1993 to 0.29 in 2008. As a reflection of the fact that these Lorenz curves and Gini coefficients were estimated from the pooled UC PCA measure, the pooled or Population Lorenz curve is plotted in the figure too.
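A Lorenz curve and its Gini coefficient can be computed from any non-negative index as follows. This is a generic sketch with optional sample weights, assuming index scores that have already been calculated; it is not the authors' code, and the example scores are hypothetical.

```python
import numpy as np


def lorenz_curve(scores, weights=None):
    """Cumulative population share and cumulative index share, ordered by score."""
    scores = np.asarray(scores, dtype=float)
    weights = np.ones_like(scores) if weights is None else np.asarray(weights, dtype=float)
    order = np.argsort(scores, kind="stable")
    scores, weights = scores[order], weights[order]
    cum_pop = np.concatenate([[0.0], np.cumsum(weights) / weights.sum()])
    totals = scores * weights
    cum_share = np.concatenate([[0.0], np.cumsum(totals) / totals.sum()])
    return cum_pop, cum_share


def gini_from_lorenz(cum_pop, cum_share):
    """Gini as one minus twice the area under the Lorenz curve (trapezoid rule)."""
    area = np.sum(np.diff(cum_pop) * (cum_share[1:] + cum_share[:-1]) / 2.0)
    return 1.0 - 2.0 * area


# Hypothetical index scores for one survey year.
pop, share = lorenz_curve([0.0, 0.0, 1.0, 1.0])
print(gini_from_lorenz(pop, share))
```

To compare years on a common footing, the same pooled scores would be passed to `lorenz_curve` separately for each survey's households, together with that survey's weights.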
The fact that asset inequality should have declined is not surprising given that the statistics shown in Table 9 show strong increases in access to assets between 1993 and 2008. This is not universally true: motor cars, for instance, remain relatively rare. Nevertheless, the penetration of television, cell phones, refrigerators, and electricity suggests that asset holdings have certainly increased. By contrast, the money-metric measures suggest very little change. Part of the problem, of course, is that if the whole distribution shifts upward by an equiproportionate amount of money, the measured money-metric inequality will remain static. Note, however, that dummy variables cannot be rescaled in this way. Under the way in which we measure asset inequality, a general spread of assets makes ownership more common at the bottom, leaves it unchanged at the top, and thus reduces inequality.
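The contrast between rescaling incomes and spreading a binary asset can be made concrete. In the sketch below, the income vector and the ownership shares are hypothetical; the Gini is computed from the mean absolute difference.

```python
import numpy as np


def gini(x):
    """Gini coefficient of non-negative values via the mean absolute difference."""
    x = np.asarray(x, dtype=float)
    n = x.size
    return np.abs(x[:, None] - x[None, :]).sum() / (2.0 * n * n * x.mean())


# Money-metric case: an equiproportionate rise leaves the Gini unchanged.
incomes = np.array([1.0, 2.0, 4.0, 13.0])
print(gini(incomes), gini(1.5 * incomes))

# Single binary asset: ownership spreads from 30 percent to 80 percent of households.
before = np.array([0] * 7 + [1] * 3)
after = np.array([0] * 2 + [1] * 8)
print(gini(before), gini(after))
```

For a single dummy the Gini equals one minus the ownership share, so any spread of ownership toward previously excluded households lowers it, whereas proportional income growth has no such effect.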
It is also true, of course, that all measurements are contingent on the schedules that are employed. One note of caution in this regard is appropriate. The asset inequality measure for 1998 that we calculated for the DHS is significantly higher than either the 1993 or 2008 measures that we have just considered. The main reason for this is that the asset schedule for 1998 included assets such as "personal computer," which allowed a better contrast to be drawn between high earners and the rest. 4 While we are sure that access to assets has spread and that in this sense asset inequality has decreased, the magnitude of the initial level of inequality and the size of the decrease are probably not as dramatic as suggested by the Gini coefficients that we have reported.

Conclusion
In this paper, we have argued that asset indices can be interesting and powerful tools for analysing social trends. However, doing so in an unreflective and automatic way is unlikely to provide useful insights. We have drawn particular attention to the fact that the standard approaches often value access to a good such as livestock negatively, implying that the household would be better off without access to that good. Proceeding in this way, for instance, has obscured real asset holdings in rural areas in the South African case. We go on to show how this has led to an exaggerated sense of rural deprivation and a lack of appreciation for deprivation in urban areas. This is not just about South Africa. DHS weights are available for a range of other African countries and these show that it is common for there to be negative weights on rural assets such as landholding and cattle. 5 There is a large literature showing that it is these very assets that are stores of value or wealth in many rural African contexts, which seems to provide strong support to the salience and importance of this point.
4 Of course, the case of the motorcycle should remind us that some of these contrasts can be overdrawn.
A related focus of this paper has been to link these problematic properties of widely used asset indices to the limitations of these indices in measuring asset inequality. So, the standard application of these indices may also have obscured real inequality within rural areas. But this has been hard to ascertain up to this point, as these indices do not allow for the measurement of asset inequality.
Our analysis has gone on to suggest that it is possible to create asset indices in ways that allow the calculation of Gini coefficients. To that end, we have used the method suggested by Banerjee for the calculation of "multidimensional Gini coefficients" using continuous data. Our application suggests that the technique can work well, provided that care is taken in ensuring that some rare assets do not distort the index. In general, then, whether or not researchers are using our proposed approach to deriving an asset index and measuring inequality, we have shown clearly in this paper that such indices should not be used without scrutinizing the implied coefficients.
We have used nationally representative survey data from 1993 and 2008 to derive an uncentered principal components analysis asset index for South Africa spanning the initial 15 years of post-apartheid South Africa. We have used this index to analyse how asset inequality has changed in South Africa between these two years. We have plotted comparable Lorenz curves and derived comparable Gini coefficients. The Lorenz curves show unambiguously that asset inequality has declined sharply over time. The Gini coefficients give a sense of the extent of this decline. This picture of falling asset inequality contrasts sharply with the money-metric analysis of inequality over the same period. The latter narrative is one of very high inequality in 1993 that does not fall over the post-apartheid years. Substantively, our empirical work suggests that the money-metric approach to inequality measurement in South Africa may have obscured the real progress in large portions of the population and in important dimensions of inequality.
Still, this stark difference does prompt some reflections on the limitations of the scope of our analysis in this paper. We have focused on the derivation of asset indices and asset inequality from a binary view of assets: whether households do or do not have access to them. Such are the data that we have in many developing countries and, as a consequence, such asset indices are very common in the international literature, which justifies the focus of the paper.
Nonetheless, this binary view of the world misses the complexity arising from a more continuous approach. We cannot differentiate between households that have many instances of an asset (such as TVs) and those that only have one. Nor does it take account of the differing quality or values of the assets or the real returns that they bestow on the household. In short, our "asset indices" fall considerably short of true wealth indices.
Finally, it is worthwhile pointing out that "asset indices" are used toward many different ends: trying to identify the poor and deprived; measuring the gap between the rich and the poor; and ranking households in terms of their quality of life. Measures such as the Cowell-Flachaire one are aimed at only one of these objectives and perform well in that context. The fact that Cowell-Flachaire does not perform as well in the situation that we analyse is not to deny that utility. Indeed, it is too much to expect that one type of index could address all of the issues listed and do so equally well. The same is true of the UC PCA index. It is not a tool that will work in all contexts. Our point that asset indices should not be used in an unreflective way also extends here: it is vital to think about what goes into the index as well as how it is assembled.