
How many types are there?


  • Corresponding author: Ian Crawford, Department of Economics, University of Oxford, Manor Road, Oxford, OX1 3UQ, UK. Email: ian.crawford@economics.ox.ac.uk.

  • Pendakur acknowledges the financial support of the SSHRC of Canada; Crawford is grateful for funding through the Centres for Microdata Methods and Practice and Microeconomic Analysis of Fiscal Policy at the IFS. We are very grateful to Soren Arnberg, Richard Blundell, Martin Browning, Andrew Chesher, Simon Sokbae Lee, Arthur Lewbel and J. Peter Neary for helpful comments.


We consider a revealed preference-based method that will bound the minimal partition of consumer microdata into a set of preference types such that the data are perfectly rationalisable by standard utility theory. This provides a simple, non-parametric and theory-driven way of investigating unobserved preference heterogeneity in empirical data, and easily extends to any choice model which has a revealed preference characterisation. We illustrate the approach using survey data and find that the number of types is remarkably few relative to the sample size – only four or five types are necessary to fully rationalise all observed choices in a data set with 500 observations of choice vectors.

1. Unobserved Heterogeneity in Microdata

One of the most striking features of consumer microdata is the great heterogeneity in choice behaviour which is evident, even amongst economic agents which are otherwise similar in observable respects. This presents researchers with a difficult problem – how to model behaviour in a way which accommodates this heterogeneity and yet preserves theoretical consistency and tractability.

One rather robust response is to demand that everything should be explainable by the theory in terms of observables alone. This view is typified by Stigler and Becker (1977):

Tastes neither change capriciously nor differ importantly between people.

The research agenda which follows from this view is one which tries to explain differences in observed behaviour without recourse to unobserved heterogeneity in tastes, but instead purely in terms of the theory and observable differences in constraints, characteristics of market goods and characteristics of agents. From this point of view, resorting to unobserved preference heterogeneity in order to rationalise behaviour is a cop-out; it is an admission of failure on the part of the theory.

From this perspective, it is therefore a matter of some regret that measures of fit in applied work on microdata are typically very low – that is, the theory performs poorly (Banks et al., 1997; Lewbel and Pendakur, 2009, who report R2 values as low as 20% in consumer demand microdata). As a result, the belief that unobserved heterogeneity is an inescapable and essential part of the modelling problem has become the dominant view in the profession. This approach was summarised by the joint 2000 Nobel laureates as follows:

In the 1960s, rapidly increasing availability of survey data on individual behavior … focused attention on the variations in demand across individuals. It became important to explain these variations as part of consumer theory, rather than as ad hoc disturbances. McFadden (2000), Nobel Lecture

Research in microeconometrics demonstrated that it was necessary to be careful in accounting for the sources of manifest differences among apparently similar individuals. … This heterogeneity has profound consequences for economic theory and for econometric practice. Heckman (2000), Nobel Lecture

In applied microeconometrics, the standard approach has been to pool data across agents and to model the behaviour of individuals as a combination of a common component and an idiosyncratic component which reflects unobserved heterogeneity. In its least sophisticated form, this amounts to interpreting additive error terms as unobserved preference heterogeneity parameters. Recently, it has become clear that such an approach typically requires a combination of assumptions on the functional form of the statistical model and the distribution of unobserved heterogeneity. Contributions here include McElroy (1987), Brown and Walker (1989), Lewbel (2001) and Lewbel and Pendakur (2009). Broadly, the current consensus on unobserved heterogeneity is that: it is a fundamental feature of consumer microdata; if neglected it makes econometric estimation and identification difficult; and it is rather hard to deal with convincingly, especially in non-linear models and where heterogeneity is not additively separable.

Whilst the dominant empirical methods, by and large, proceed by pooling agents, the approach which we develop here is based on partitioning. The spirit of pooling agents is to account for heterogeneity with a small number of extra parameters (e.g. one) per type or characteristic, as in fixed-effects models with lots of covariates. Here, many parameters (e.g. those relating to covariates) are shared across the agents in the pooled model, and each agent has one or more agent-specific parameters. Some pooling models have a continuum of types (Lewbel and Pendakur, 2009) and some have a small number of discrete types (Heckman and Singer, 1984) but pooling models share the feature that unobserved heterogeneity is an ‘add-on’ to the model shared by all agents.

In contrast, the spirit of partitioning is to allow each type to be arbitrarily different from every other, for example, by giving each type a completely different set of parameters governing the effects of covariates. We use restrictions resulting from the assumption that all agents are utility maximising to partition agents into groups which maximise different utility functions.

We work from the basis of revealed preference (RP) restrictions (Afriat, 1967; Diewert, 1973; Varian, 1982). At heart, RP restrictions are inequality restrictions on observables (prices, budgets and demands), which provide necessary and sufficient conditions for the existence of an unobservable (a well-behaved utility function representing the consumer's preferences which rationalises the data). RP restrictions are usually applied to longitudinal data on individual consumers and are used to check for the existence and stability of well-behaved preferences. In this article, we apply this kind of test to cross-sectional data on many different consumers (though, as we describe below, our idea applies to many contexts with optimising agents). In this context, RP restrictions are interpretable as a check for the commonality of well-behaved preferences.1

Of course, this is a rather simplistic idea. The very notion that such a check might pass and that the choices of all of the consumers in a large microeconomic data set could be explained perfectly by a single common utility function is, as Lewbel (2001) points out, ‘implausibly restrictive’. The real problem is what to do if (or more likely when) the data do not satisfy the RP restrictions. It is important to recognise that there are many reasons that a model which assumes homogeneous preferences might fit the data poorly, including mistakes by the data collector (measurement error), mistakes by the individuals themselves (optimisation error) and mistakes by the theorist (specification error, which is to say applying the wrong model). The truth is doubtless a mixture of all three. This article focuses primarily on the last of these and in particular on the issue of preference heterogeneity: we ask how far we can get by assuming that unobserved preference heterogeneity is the sole cause of poor fit.2 Dean and Martin (2010) provide one type of solution along these lines: they show how to find the largest subset of the data that do satisfy (some of) the RP restrictions. However, their approach leaves some of the data unexplained by the optimising model.

The contribution of this article is to provide a different (and complementary) set of strategies for the case where the pooled data violate the RP restrictions. Here, some amount of preference heterogeneity is necessary to model those data – we need more than just one utility function. The question is how many do we need? Is it very many (perhaps as many as there are observations), or just a few? This article shows how to find out the minimum number of types (utility functions) necessary to fully explain all observed choices in a data set. In seeking the minimum number of utility functions necessary to rationalise behaviour, we keep with Friedman's (1953) assertion that we do not want the true model, which may be unfathomably complex; rather, we want the simplest model that is not rejected by the data. Occam's Razor applies here: we know that we can fully explain behaviour with a model in which every agent is arbitrarily different from every other but that approach is not useful for modelling or predicting behaviour. Instead, our aim is to group agents into types to the maximum possible degree that is consistent with common preferences. If the minimum number of types (utility functions) is very large relative to the number of observations, then modelling strategies with a continuum of types, or with one type for each agent (such as fixed effects models), might be appropriate. In contrast, if the minimum number of types is small relative to the number of observations, then modelling strategies with a small number of discrete types, such as those found in macro-labour, education choice and empirical marketing models, might be better.

We argue that our approach offers two main benefits which may complement the standard approaches to unobserved heterogeneity in empirical work. First, it provides a framework for dealing with heterogeneity which is driven by an economic model of interest and it thereby provides a practical method of partitioning data so that the observations in each group are fully theory consistent. This contrasts with approaches wherein only part of the model (the part which excludes the unobserved heterogeneity) satisfies the theory.3 Second, it is elementary: our approach does not require statements about the empirical distributions of objects we cannot observe or functional structures about which economic theory is silent. This contrasts with the standard approach of specifying a priori both the distribution of unobserved preference heterogeneity parameters and its functional relationship with observed variables.

We implement our strategy with a cross-sectional data set of consumer microdata. These data happen to record milk purchases but, importantly, they have individual-level price, quantity and product characteristics information, and so are ideal for the application of RP methods. We find that the number of types needed to completely explain all of the observed variation in consumption behaviour is quite small relative to the number of observations in our data. For our main application, with a cross-section data set of 500 observations of quantity vectors, we find that four or five types are enough. Furthermore, it seems that two-thirds of the data are consistent with a single type and two types are sufficient to model 85% of the observations.

This article is organised as follows. We begin with a description of the cross-sectional data on household expenditures and demographics which we use in this study. We then investigate whether these data might be rationalised by partitioning on observed variables which form the standard controls in microeconometric models of spending patterns. We then set out a simple method for partitioning on revealed preferences, and consider whether the results from these partitioning exercises can be a useful input to econometric modelling of the data. We then consider the problem of inferring the number of types in the population from which our sample is drawn. The final Section draws some conclusions.

2. The Data

In this article, we focus on the issue of rationalising cross-sectional household-level data on spending patterns with the standard static utility maximisation model of rational consumer choice. This approach can readily be extended to other more exotic economic models which have a non-parametric/revealed preference characterisation (examples are given in the discussion and in the online Appendix). The data we use are on Danish households and their purchases of milk. These households comprise all types ranging from young singles to couples with children to elderly couples. The sample is from a survey of households which is representative of the Danish population. Each household keeps a strict record of the price paid and the quantity purchased as well as the characteristics of the product. We aggregate the milk records to a monthly level, partly to ease the computational burden and partly to allow us to treat milk as a non-durable, non-storable good, so that the intertemporally separable model which we are invoking is appropriate. Six different types of milk are recorded in the data: skimmed, semi-skimmed or full-fat versions of either organic or conventionally produced milk. Quantity indices are computed by simply adding the volume of each variety purchased and a corresponding unit price (total expenditure on a given variety divided by the total volume of that variety purchased) is used as the price index. That we can differentiate varieties is a particularly attractive feature of these data because it means that variation in these prices in the cross-section is principally due to supply and demand variation across markets (defined by time and location) and not due to unobserved differences in product qualities and characteristics.4 Our full data set has information on 1,917 households. As some of the following calculations are computationally quite expensive, we begin by drawing a smaller random sample of 500 households from our data. 
In Section 6, we return, gradually, to the full sample size.

Descriptive statistics are given in Table 1. In what follows let I = {i:i = 1,…,500} denote the index set for these observations and let {pi,qi}i ∈ I denote the price–quantity data. We will also make use of a list of observed characteristics (these are standard demographic controls used in demand analysis) of each household and these are represented by the vectors {zi}i ∈ I.

Table 1. Descriptive Statistics

                                      Mean      Min      Max       Std. Dev
{wi}i = 1,…,500   Budget shares
Conventional full fat                 0.1688    0        1         0.3158
Conventional semi-skimmed             0.4255    0        1         0.4102
Conventional skimmed                  0.1521    0        1         0.2932
Organic full fat                      0.0374    0        1         0.1394
Organic semi-skimmed                  0.0977    0        1         0.2237
Organic skimmed                       0.1185    0        0.9951    0.2669
                  Total expenditure (DK)
Total expenditure                     66.1986   4.8222   345.1279  58.5765
{pi}i = 1,…,500   Prices (DK/litre)
Conventional full fat                 6.1507    3.3068   11.3289   0.4652
Conventional semi-skimmed             5.4104    4.0919   7.9567    0.4304
Conventional skimmed                  5.1524    4.1619   6.2075    0.1814
Organic full fat                      7.3335    6.1188   8.6597    0.1860
Organic semi-skimmed                  6.4968    5.0565   8.5374    0.2187
Organic skimmed                       6.2679    5.5312   7.9684    0.1501
{zi}i = 1,…,500   Demographics
Singles {0,1}                         0.3260    0        1         0.4692
Single parents {0,1}                  0.0420    0        1         0.2008
Couples {0,1}                         0.3500    0        1         0.4774
Couples with children {0,1}           0.2300    0        1         0.4213
Multi-adult {0,1}                     0.0520    0        1         0.2222
Age (years)                           47.8600   18       87        15.5240
Male HoH {0,1}                        0.92      0        1         0.27156

Given these data, the question of interest is whether it is possible to rationalise them with the canonical utility maximisation model. The classic result on this issue is provided by Afriat's Theorem; see especially Afriat (1967); Diewert (1973); Varian (1982, 1983). Afriat's Theorem shows that the generalised axiom of revealed preference (GARP) is a necessary and sufficient condition for the existence of a well-behaved utility function u(q) which exactly rationalises the data. Such rationalisability requires that for every observed choice, qi, the choice made weakly utility-dominates all other affordable choices: u(qi) ≥ u(q) for all q such that pi′q ≤ pi′qi. Let qi R0 qj whenever pi′qi ≥ pi′qj define the direct revealed preference relation R0, and let R be the transitive closure of R0. GARP is defined by the restriction that qi R qj implies pj′qj ≤ pj′qi. If observed demands {pi,qi}i ∈ I satisfy these inequalities, then there is a single utility function (preference map) that can rationalise all observed demands. If not, then there is not.
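Computationally, a GARP test reduces to inequalities on the matrix of cross-valuations pi′qj. The following is a minimal sketch of such a test (a hypothetical helper in NumPy, not the authors' code):

```python
import numpy as np

def satisfies_garp(P, Q):
    """Test GARP for T observations of K goods.

    P, Q : (T, K) arrays of prices and quantities.
    qi is directly revealed preferred to qj (qi R0 qj) when pi'qi >= pi'qj;
    R is the transitive closure of R0.  GARP requires that qi R qj
    never coincides with pj'qj > pj'qi.
    """
    E = P @ Q.T                      # E[i, j] = pi' qj
    own = np.diag(E)                 # pi' qi
    R = own[:, None] >= E            # the direct relation R0
    for k in range(len(P)):          # Warshall's algorithm: closure R
        R = R | (R[:, [k]] & R[[k], :])
    viol = R & (own[None, :] > E.T)  # qi R qj but pj'qj > pj'qi
    np.fill_diagonal(viol, False)
    return not viol.any()
```

On GARP-consistent data the function returns True; any revealed preference cycle containing a strict cost reversal returns False.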

We checked the data for consistency with GARP and the test failed.5 No single utility function exists which can explain the choices of all of these households –Lewbel's (2001) warning seems to be justified. So we now turn to the question: how many well-behaved utility functions are required to rationalise these price–quantity microdata? Obviously 500 utility functions, one rationalising each observation, will be over-sufficient. The next two Sections explore the idea of conditioning on observed demographic variables and revealed preferences to find a minimal necessary partition of these data.

3. Partitioning on Observed Variables

We begin by investigating whether it is possible to achieve a parsimonious partition of the data using ‘standard observables’ – i.e. the sorts of variables which are often used as conditioning variables in microeconometric demand systems. To do this, we used information on the structure of the household (defined according to five groups: single person households, single parents, couples, couples with children, and multi-adult households), the age of the head of household (three roughly equally-sized groups: less than 40 years old, 40–60 years old and over 60), region (there are nine regional indicators observed in the data), the gender of the head of household and the size of the household's budget (deciles). Using these variables to partition observations, there were 341 non-empty cells – the distribution of group sizes was 235, 68, 29, 5, 2 and 2, respectively, for singletons, pairs, triples and groups of 4, 5 and 6 households. Despite the fact that this partitioning of the data was clearly very fine and that by creating groups composed of very small numbers of households we improve the prospect that within-group tests of GARP are satisfied (indeed singletons cannot fail), we found this did not produce a partition which was consistent with within-group commonality of preferences. It seems that even very small groups of households with similar observable characteristics exhibit preference heterogeneity. An interesting implication of this exercise6 is that one can immediately conclude that no combination of two or three of these conditioning variables can produce consistent partitions. This is because combined data from multiple cells will always violate GARP if any of the data in the contributing cells violate. Thus, if a fine partition cannot rationalise the data, then neither can any coarser partition constructed from it.

Instead of using pre-defined cells to partition the data, it is also possible to take a more data-driven, adaptive approach. This is essentially a question of designing a search algorithm which uses the results of a sequence of GARP tests to tell the investigator where to place the partitions. The simplest example of such an approach would be to order the data by some observable like age, then to start with the youngest household and add successively older households until the current group violates GARP. The data are partitioned at this point and the last household to be added is then used to start the next group, and so on. If the investigator wishes to consider other conditioning variables, then the resulting partition is naturally path dependent (the sequence in which one selects the variables with which to order the data affects the final result). As the number of conditioning variables grows, the number of potential paths grows very quickly, as does the computational complexity of finding the best solution. Nonetheless, while it may not be computationally feasible to find a fully efficient solution by checking all paths, such an approach does hold out the possibility of finding a more parsimonious partition than might be available through the use of pre-defined groups. To investigate, we first stratified the data according to the household structure variable described above, next ordered the households of each structure by the age of the head of household and, beginning with the youngest, sequentially tested the RP condition in order to see whether we could rationalise behaviour by a further partition on age into contiguous bands. This proved impossible because there were instances of households with the same structure whose heads of household were the same age but whose behaviour was not mutually rationalisable. Having first split by household structure and then split by age and not yet found a rationalisation for the data, we further split by region. 
This too failed to rationalise the data as there were instances of households with identical structure and age living in the same region who were irreconcilable with a common utility function. We then looked at the gender of the household head. This, finally, produced a rationalisation of the data. In contrast to the exercise which used 341 pre-defined cells and still could not rationalise the data, this adaptive procedure produced a consistent partition with 46 types defined by household structure/age/region/gender.
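The adaptive splitting procedure described above can be sketched as follows (a hypothetical illustration; the function names are ours, not the authors', and `garp_ok` is the standard GARP test characterised by Afriat's Theorem):

```python
import numpy as np

def garp_ok(P, Q):
    # True when the price-quantity data (P, Q) pass GARP
    # (Warshall transitive closure of the direct relation R0).
    E = P @ Q.T
    own = np.diag(E)
    R = own[:, None] >= E
    for k in range(len(P)):
        R = R | (R[:, [k]] & R[[k], :])
    V = R & (own[None, :] > E.T)
    np.fill_diagonal(V, False)
    return not V.any()

def sequential_partition(P, Q, order):
    """Grow the current group along `order` (e.g. households sorted by
    age) until adding the next household breaks GARP; then close the
    group and start a new one with the offending household."""
    groups, current = [], []
    for i in order:
        if garp_ok(P[current + [i]], Q[current + [i]]):
            current = current + [i]
        else:
            groups.append(current)
            current = [i]
    groups.append(current)
    return groups
```

The result depends on the chosen ordering, which is exactly the path dependence noted in the text.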

Panel (a) of Figure 1 shows the distribution of group sizes with the groups ordered largest to smallest. This shows that the largest groups each consist of approximately 5% of the data (there are two such groups) whilst the smallest (the 44th, 45th and 46th on the left of the histogram) consist of singletons. Panel (b) of Figure 1 shows the cumulative proportion of the data explained by rising numbers of types. The first ordinate shows that approximately 5% of the data are rationalisable by one type (the most numerous) and approximately 10% by the two most numerous types. Ten types are needed to rationalise half the data.

Figure 1. Partitioning on Observed Demographics

It appears, therefore, that efforts to find a partition of the data into types which admit common within-type preferences on the basis of the sorts of variables typically observed in microdata on consumer choices do not seem to produce a parsimonious result. Whilst a search algorithm does a great deal better than the simpler fixed-cell type of approach, the results are still not impressive – each type only accounts for around 2% of the data on average.

4. Partitioning on Revealed Preferences

In this Section, we consider partitioning on revealed preferences. As before, we are interested in trying to split the data into as few (and as large) groups as we can such that all of the households within each group can be modelled as having a common well-behaved utility function. However, this time we will not use observables like those used above to guide/constrain us. The simplest ‘brute force’ approach would be to check the RP restrictions within all of the possible subsets of the data and retain those which form the minimal mutually exclusive and exhaustive partition of the data. This is computationally infeasible as there are 2⁵⁰⁰ such subsets. Instead, we have designed two simple algorithms which provide two-sided bounds on the minimal number of types in the data. The details of the algorithms need not detain us here (they are described in the online Appendix).
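The authors' bounding algorithms are described in their online Appendix. Purely as an illustration of how two-sided bounds of this kind can be obtained, here is one simple approach: a greedy first-fit pass for the upper bound, and a greedily grown set of pairwise-incompatible observations for the lower bound. All function names are hypothetical:

```python
import numpy as np

def garp_ok(P, Q):
    # True when the price-quantity data (P, Q) pass GARP.
    E = P @ Q.T
    own = np.diag(E)
    R = own[:, None] >= E
    for k in range(len(P)):
        R = R | (R[:, [k]] & R[[k], :])
    V = R & (own[None, :] > E.T)
    np.fill_diagonal(V, False)
    return not V.any()

def upper_bound_partition(P, Q):
    """First-fit heuristic: place each observation in the first existing
    group that remains GARP-consistent, else open a new group.  The
    number of groups is an UPPER bound on the minimal number of types."""
    groups = []
    for i in range(len(P)):
        for g in groups:
            if garp_ok(P[g + [i]], Q[g + [i]]):
                g.append(i)
                break
        else:
            groups.append([i])
    return groups

def lower_bound(P, Q):
    """Greedily grow a set of pairwise-incompatible observations: no two
    of them can share a utility function, so any rationalising partition
    needs at least this many types (a LOWER bound)."""
    clique = []
    for i in range(len(P)):
        if all(not garp_ok(P[[i, j]], Q[[i, j]]) for j in clique):
            clique.append(i)
    return len(clique)
```

Whenever the two numbers coincide, the minimal number of types is determined exactly; otherwise they bracket it, as in the interval of four to five types reported below.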

We ran the algorithms on our data and found that the minimal number of types was between 4 and 5. That is, one needs at least 4, and not more than 5, utility functions to completely rationalise all the observed variation in choice behaviour observed in these data in terms of income and substitution effects.7 For our upper bound of five types, our algorithm also delivers a partition of the data into groups such that, within groups, a single utility function is sufficient to rationalise all the observed behaviour.8 Table 2 gives the average budget shares for each group delivered by our upper bound algorithm, and Figure 2 shows the distribution of types and gives the same information as Figure 1 on the same scale, for ease of comparison and in order to emphasise how parsimonious this partition is in comparison. In contrast to Figure 1, we can see that a single utility function can rationalise the observed choices of around two-thirds of the sample. And two utility functions are all that is needed to rationalise nearly 85% of the data.

Table 2. Average Budget Shares Across Types

                         Conventional milk             Organic milk
Sample means   Group N   Full-fat  Semi    Skim        Full-fat  Semi    Skim
Type 1         321       0.160     0.496   0.143       0.024     0.075   0.100
Type 2         100       0.155     0.285   0.205       0.070     0.121   0.162
Type 3         53        0.239     0.351   0.074       0.044     0.144   0.147
Type 4         18        0.134     0.256   0.148       0.032     0.258   0.170
Type 5         8         0.292     0.195   0.357       0.128     0.017   0.009
Figure 2. Partitioning by Revealed Preferences

Our expectation was that, even though conditioning on observables did not seem able to produce a parsimonious partition which could perfectly rationalise the data, nonetheless observable characteristics of households would be important, although imperfect, correlates of type membership. However, a multinomial logit model of group membership conditional on demographic characteristics (age and sex of household head, number of members, number of children and geographic location) has a (McFadden unadjusted) pseudo-R2 of only 5.4%.9 That is, observed characteristics of households are essentially uninformative regarding the type to which a household is assigned. The implication here is that, in a framework where we want to find the minimum number of types, unobserved preference heterogeneity is vastly more important than observed demographic heterogeneity.
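For reference, McFadden's unadjusted pseudo-R2 compares the fitted log-likelihood with that of a constants-only model that predicts each type by its sample frequency. A minimal sketch, with a hypothetical function name:

```python
import numpy as np

def mcfadden_pseudo_r2(probs, y):
    """McFadden's (unadjusted) pseudo-R2 = 1 - lnL_model / lnL_null.

    probs : (N, J) fitted type-membership probabilities from the logit.
    y     : (N,) observed type labels in {0, ..., J-1}.
    The null model assigns each type its sample frequency.
    """
    n = len(y)
    ll_model = np.log(probs[np.arange(n), y]).sum()
    shares = np.bincount(y, minlength=probs.shape[1]) / n
    ll_null = np.log(shares[y]).sum()
    return 1.0 - ll_model / ll_null
```

A value near zero, such as the 5.4% reported above, means the covariates add almost nothing over the null frequencies.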

5. Estimation of Preferences

The incorporation of unobserved preference heterogeneity into demand estimation is a theoretically and econometrically tricky affair. Matzkin (2003, 2007) proposes a variety of models and estimators for this application, all of which involve non-linearly restricted quantile estimators, and most of which allow for unobserved heterogeneity which has arbitrary (but monotonic) effects on demand. These models are difficult to implement and, as yet, only Matzkin (2003, 2007) has implemented them. Lewbel and Pendakur (2009) offer an empirical framework that incorporates unobserved preference heterogeneity into demand estimation that is easy to implement but which requires that unobserved preference parameters act like fixed effects, pushing the entire compensated budget-share function up or down by a fixed factor.

Given the difficulty of incorporating unobserved preference heterogeneity beyond a fixed effect, it is instructive to evaluate how our five utility functions differ from one another. As group 5 has only 8 observations assigned to it, we leave it out of this part of the analysis. For the remaining groups, we estimate group-specific demand systems. As we know that, within each type, there exists a single preference map which rationalises all of the data, we need not worry about unobserved heterogeneity in our estimation. We know that there is a single integrable demand system which exactly fits the data for each group. However, we do not know the specification of that demand system so our main econometric problem is finding the right specification. We take the simplest possible route here and estimate a demand system with a flexible functional form – the quadratic almost ideal (QAI) demand system (Banks et al., 1997). The idea is that such a model should be flexible enough to fit the conditional mean well and that the interpretation of the errors is solely specification error.10

The QAI demand system has budget shares, wij, for each good j = 1,…,K and each household i = 1,…,N, given by

wij = aj + ∑k Ajk ln pik + bj ln[xi/a(pi)] + qj {ln[xi/a(pi)]}²/b(pi) + εij,

where ln a(pi) = ∑k ak ln pik + (1/2)∑k∑l Akl ln pik ln pil and b(pi) = ∏k (pik)^bk; the pik are prices, xi is total expenditure on all (milk) goods and the εij are error terms. The rationality restrictions of homogeneity and symmetry require that ∑kak = 1, ∑kbk = ∑kqk = 0, ∑kAkl = 0 for all l, and Akl = Alk for all k, l. We impose these restrictions and report the coefficients ak and bk in Table 3 below. Here, Engel curves (defined as budget-share functions over expenditure holding price constant) are roughly quadratic in the log of total expenditure. Blundell and Robin (1999) show that this budget-share system may be estimated by iterated seemingly unrelated regression (SUR), and we use that method. For estimation, we normalise each price to its median value and normalise expenditure to its median value, so that at median prices and expenditure, ln pk = ln x = 0. In practice, the estimates from this iterated model are ‘close’ to estimated coefficients from OLS regression of budget shares wj on a constant (aj), log-prices (Ajk), log-expenditure (bj) and its square (qj).
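The OLS approximation mentioned above, regressing each budget share on a constant, log-prices, log-expenditure and its square, can be sketched as follows (a hypothetical NumPy illustration, not the authors' code; the function name `qai_ols` is assumed):

```python
import numpy as np

def qai_ols(W, logP, logx):
    """OLS approximation to the QAI share equations.

    W    : (N, K) budget shares.
    logP : (N, K) log-prices, normalised to 0 at the median constraint.
    logx : (N,) log-expenditure, also median-normalised.
    Regresses each share on a constant, log-prices, log-expenditure
    and its square, jointly via least squares.
    """
    N, K = W.shape
    X = np.column_stack([np.ones(N), logP, logx, logx ** 2])
    coef, *_ = np.linalg.lstsq(X, W, rcond=None)
    a = coef[0]              # levels a_j
    A = coef[1:K + 1].T      # price responses A_jk
    b = coef[K + 1]          # expenditure semi-elasticities b_j
    q = coef[K + 2]          # quadratic terms q_j
    return a, A, b, q
```

This sketch ignores the cross-equation homogeneity and symmetry restrictions, which the iterated SUR procedure of Blundell and Robin (1999) imposes.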

Table 3. Predicted Budget Shares and Semi-elasticities, QAI Estimation

                         Conventional milk             Organic milk
Group          Group N   Full-fat  Semi    Skim        Full-fat  Semi    Skim
Levels, aj
Group 1        321       0.155     0.434   0.173       0.020     0.085   0.133
Group 2        100       0.153     0.287   0.194       0.089     0.092   0.184
Group 3        53        0.195     0.330   0.091       0.070     0.130   0.184
Group 4        18        0.084     0.171   0.295       0.052     0.209   0.190
Semi-elasticities wrt expenditure, bj
Group 1        321       −0.040    0.004   0.021       −0.003    0.002   0.016
Group 2        100       −0.044    0.066   −0.030      −0.033    −0.002  0.043
Group 3        53        −0.013    0.010   −0.044      0.016     −0.019  0.049
Group 4        18        −0.099    0.034   −0.175      0.028     0.268   −0.056

By estimating budget-share equations for each of our four largest groups, we characterise what their Engel curves look like and test whether including group dummies in budget-share equations (as in Lewbel and Pendakur, 2009) is sufficient to absorb the differences across these utility functions.

The top panel of Table 3 gives predicted budget shares for each group, evaluated at a common constraint defined by the vector of median prices and the median milk expenditure level. (These are the level coefficients in the QAI regressions for each group, where prices and expenditures are normalised to 1 at the median constraint.) The point estimates differ quite substantially across groups and a glance at the estimated standard errors shown in parentheses shows that the hypothesis that these point estimates are the same value is heartily rejected.

The bottom panel of Table 3 gives estimated slopes of budget shares with respect to the log of expenditure (expenditure semi-elasticities) at a common constraint defined by the vector of median prices and the median milk expenditure level. These are the slope coefficients in the QAI regressions for each group and they differ somewhat across groups. We can weakly reject the hypothesis that the slopes are the same across all four groups: the sample value of the Wald test statistic for the hypothesis is 26 and under the Null it is distributed as a χ2 with a p-value of 3.7%. In fact, the restriction that we can bring in heterogeneity via group dummies implies that all these groups have the same slope and curvature terms. This hypothesis is also weakly rejected – the sample value of the test statistic is 45.4, and under the Null it is distributed as a χ2 with a p-value of 3.5%. Individually, only groups 2 and 4 show evidence that they differ from group 1 in terms of the total expenditure responses of budget shares (they test out with p-values of 8% and 1%, respectively).

While expenditure effects differ only modestly across groups, the estimated price responses of budget shares differ greatly across groups. We do not present coefficient estimates here because there are 15 of them for each group but we can assess their difference across groups via testing. The test that all four groups share the same price responses has a sample value of 382 and is distributed under the Null as a χ2 with a p-value of less than 0.1%. Further, any pairwise test of the hypothesis that two groups share the same price responses rejects at conventional levels of significance.

One can also test the hypothesis that the heterogeneity across the types can be absorbed into level effects. Not surprisingly, given that we reject both the hypothesis that total expenditure effects are identical and the hypothesis that price effects are identical, this test is massively rejected. The test statistic has a sample value of 405 and is distributed under the Null as a χ2 with a p-value of less than 0.1%.

One problem with using the QAI demand system to evaluate the differences across groups is that there is no reason to think that the functional structure it imposes is true. An alternative is to use non-parametric methods, which have the advantage of not imposing a particular functional form on the shape of demand. They have the disadvantage of suffering from a severe curse of dimensionality: in essence, one needs to estimate the level of the function at every point in the support of possible budget constraints, and this support grows quickly with the number of goods in the demand system. A non-parametric approach that does not suffer from the curse of dimensionality is to estimate averages across the support of budget constraints.

In the top panel of Table 4, we present the average over all observed budget constraints of the non-parametric estimate of budget shares for each group. For the non-parametric analysis, we study only the three largest groups, totalling 474 observations. For each group, we non-parametrically estimate the budget-share function evaluated at each of the 474 budget constraints and report its average over the 474 values. Non-parametric estimates of budget-shares given prices and expenditures are computed following Haag et al. (2009) and the averages of these estimates are presented in the Table. The non-parametric estimate of the budget-share vector at a particular expenditure level and price vector is the locally weighted average of budget-shares, with weights declining for observations with ‘distant’ prices or expenditures. Haag et al. (2009) show how to estimate such a locally weighted model while maintaining the restrictions of Slutsky symmetry and homogeneity, and we use that approach. Simulated standard errors are in parentheses.11
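The locally weighted average underlying these estimates can be illustrated with a minimal unconstrained Nadaraya–Watson sketch. This is not the Haag et al. (2009) estimator, which additionally imposes Slutsky symmetry and homogeneity; that constrained step is omitted, and all data and names below are synthetic and hypothetical.

```python
import numpy as np

def local_average_shares(P, x, W, p0, x0, h=0.25):
    """Locally weighted average of budget shares at the budget (p0, x0).

    The estimate at a budget constraint is an average of observed
    budget-share vectors, with weights declining for observations with
    'distant' prices or expenditures (distance measured in logs).

    P : (n, k) observed price vectors
    x : (n,)   observed total expenditures
    W : (n, k) observed budget-share vectors
    h : kernel bandwidth
    """
    # distance in log prices and log expenditure
    d = np.hstack([np.log(P) - np.log(p0),
                   (np.log(x) - np.log(x0))[:, None]])
    w = np.exp(-0.5 * np.sum((d / h) ** 2, axis=1))  # Gaussian kernel weights
    w /= w.sum()
    return w @ W                                     # weighted mean share vector

# Synthetic data: 200 observations, 3 goods.
rng = np.random.default_rng(1)
P = np.exp(0.1 * rng.normal(size=(200, 3)) + 1.7)   # positive prices
x = np.exp(0.5 * rng.normal(size=200) + 4.0)        # positive expenditures
W = rng.dirichlet(np.ones(3), size=200)             # shares summing to one
w_hat = local_average_shares(P, x, W, p0=P[0], x0=x[0])
```

Averaging `w_hat` over all observed budget constraints, as in the Table, then delivers the group's average predicted budget shares at common constraints.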

Table 4. Predicted Budget Shares and Semi-elasticities, Non-parametric Estimation
                                     Mean      Min      Max        Std. Dev
{wi}i = 1,…,500   Budget shares
  Conventional full fat              0.1688    0        1          0.3158
  Conventional semi-skimmed          0.4255    0        1          0.4102
  Conventional skimmed               0.1521    0        1          0.2932
  Organic full fat                   0.0374    0        1          0.1394
  Organic semi-skimmed               0.0977    0        1          0.2237
  Organic skimmed                    0.1185    0        0.9951     0.2669
                  Total expenditure (DK)
  Total expenditure                  66.1986   4.8222   345.1279   58.5765
{pi}i = 1,…,500   Prices (DK/litre)
  Conventional full fat              6.1507    3.3068   11.3289    0.4652
  Conventional semi-skimmed          5.4104    4.0919   7.9567     0.4304
  Conventional skimmed               5.1524    4.1619   6.2075     0.1814
  Organic full fat                   7.3335    6.1188   8.6597     0.1860
  Organic semi-skimmed               6.4968    5.0565   8.5374     0.2187
  Organic skimmed                    6.2679    5.5312   7.9684     0.1501
{zi}i = 1,…,500   Demographics
  Singles {0,1}                      0.3260    0        1          0.4692
  Single parents {0,1}               0.0420    0        1          0.2008
  Couples {0,1}                      0.3500    0        1          0.4774
  Couples with children {0,1}        0.2300    0        1          0.4213
  Multi-adult {0,1}                  0.0520    0        1          0.2222
  Age (years)                        47.8600   18       87         15.5240
  Male HoH {0,1}                     0.92      0        1          0.27156

The top panel of Table 4 shows average levels that are broadly similar to the sample averages reported in Table 2. However, those reported in Table 4 differ in one important respect: whereas those shown in Table 2 are averages across the budget constraints in each group, those reported in Table 4 are averages across the budget constraints of all groups. That is, whereas the sample averages in Table 2 mix the effects of preferences and constraints, the non-parametric estimates in Table 4 hold the budget constraints constant. These numbers suggest that there is quite a lot of preference heterogeneity. For example, Group 1 and Group 2 have statistically significantly different average budget shares for most types of milk.

Given that unobserved heterogeneity which can be absorbed through level effects can fit into recently proposed models of demand (Lewbel and Pendakur, 2009), it is more important to figure out whether the slopes of demand functions differ across groups. The bottom panel of Table 4 presents average derivatives with respect to the log of expenditure (i.e. the expenditure semi-elasticities of budget-share functions), again averaged over the 474 observed budget constraints, with simulated standard errors shown in parentheses.

Clearly, the estimated average derivatives are much less precisely estimated than the average levels. But one can still distinguish groups 1 and 2: the skimmed conventional milk budget-share function of group 2 has a statistically significantly lower (and negative) expenditure response than that of group 1. No other pairwise comparison is statistically significant. However, the restriction that the average derivatives are the same across groups combines 10 z-tests like this: two restrictions for each of the five independent equations. One can construct a non-parametric analogue of the joint Wald test of whether the three groups share the same expenditure responses in each of the five independent equations. This test statistic has a sample value of 24.3 and has a simulated p-value of 0.7%.12

The picture we have of the heterogeneity in the consumer microdata is as follows. First, we can completely explain all the variation in observed behaviour with variation in budget constraints and four or five preference maps (i.e. ordinal utility functions). Second, the groupings are not strongly related to observed characteristics of households; that is, the primary heterogeneity here is unobserved. Third, the groups found by our upper bound algorithm are very different from each other, mainly in terms of how budget shares respond to prices but also in their expenditure responses. That the budget-share equations of the groups differ by more than just level effects suggests that unobserved preference heterogeneity may not act like 'error terms' (or fixed effects) in regression equations and thus may not fit into models recently proposed to accommodate preference heterogeneity in consumer demand modelling.

6. How Many Types in the Population?

Up to now, we have concerned ourselves with the question of how many types are needed to characterise preferences in a sample of micro-economic choice data. This raises the question of how many types are needed to characterise preferences in the population from which the sample is drawn. This is similar in some ways to the famous coupon collector's problem (Erdős and Rényi, 1961) and to other classical problems in probability theory, such as estimating how many words Shakespeare knew on the basis of the Complete Works (Efron and Thisted, 1976). It is also a difficult question to answer credibly – especially when the unseen types in the population are not abundant and there is consequently a high probability of missing them in any given sample.

Biologists have long concerned themselves with a question which is closely analogous to ours: how many species exist in a population of animals. Biostatisticians have developed a variety of estimators for this object, most of which are based on the 'frequency of frequencies' of species in a sample of animals; see, for example, the surveys by Bunge and Fitzpatrick (1993) and Colwell and Coddington (1994). The frequency of frequencies records the number of singletons, defined as species observed only once in a sample, the number of doubletons, defined as species observed twice, and so on.

Perhaps the simplest of these estimators is that of Chao (1984), who proposes a lower bound estimator of the number of species equal to sobs + s1²/(2s2), where sobs is the number of species observed in the sample, s1 is the number of singletons and s2 is the number of doubletons. This estimator has the property that it equals sobs when there are no singletons (s1 = 0). A variety of other (non-parametric) estimators have been proposed since Chao (1984) but, as far as we are aware, all share this same dependence on the number of singletons.
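Given observed abundances, one per type, the Chao (1984) lower bound takes only a few lines; the function name and the toy counts below are illustrative.

```python
def chao1_lower_bound(type_counts):
    """Chao (1984) lower-bound estimate of the number of types.

    type_counts : observed abundances, one entry per observed type,
                  e.g. [120, 77, 2, 1] means four observed types, of
                  which one is a singleton and one a doubleton.
    Implements s_obs + s1**2 / (2 * s2); when there are no singletons
    (s1 == 0) the estimate collapses to s_obs, as noted in the text.
    """
    s_obs = len(type_counts)
    s1 = sum(1 for c in type_counts if c == 1)   # singletons
    s2 = sum(1 for c in type_counts if c == 2)   # doubletons
    if s1 == 0:
        return float(s_obs)
    # Chao's bias-corrected variant s1*(s1-1)/(2*(s2+1)) avoids the
    # division by zero when s2 == 0; the classic form is used otherwise.
    if s2 == 0:
        return s_obs + s1 * (s1 - 1) / 2
    return s_obs + s1 ** 2 / (2 * s2)

estimate = chao1_lower_bound([120, 77, 2, 1])   # 4 + 1/(2*1) = 4.5
```

With no singletons in the sample, as in the article's data, the estimator simply returns the number of observed types.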

Another approach is to characterise the number of species via extrapolation of the number of species observed in increasingly large samples from a finite population (Colwell and Coddington (1994) survey this literature). The idea is intuitively appealing: if the graph of sobs(N ), the number of observed species as a function of sampling effort measured by the sample size N, asymptotes to a fixed number, then this may be taken as an estimate of the number of species in the population.

The analogy between animal species and preference types is worth considering for a moment. Whether two individuals could have the same utility function, and thus could be of the same type, is verifiable (via RP tests). However, when revealed preference restrictions are used to identify types, it is often possible to fit individuals into more than one type. That is to say, the definition of a type is not 'crisp' and the allocation of individuals to types is not unique. It may be that persons A and C violate an RP test when pooled together, and so have different preferences, but that B passes an RP test when combined with either – where then should we put B? In assessments of biodiversity which apply the statistical methods described above, the literature proceeds as if there is no such uncertainty as to which species an observation should be assigned. It is worth pointing out that biologists know that this is not entirely true. There exist 'ring species' (the Ensatina salamanders which live around the Central Valley in California are the famous example) where (sub)species A and C cannot breed successfully, but species B can breed with either A or C – where then should B lie in the taxonomy? The biostatistics literature treats this as an ignorable problem. It may or may not be ignorable for economists. Nonetheless, we too will ignore it.

As shown in the previous Section, we did not find any singletons in our data set of 500 observations. Therefore, the frequency of frequencies approach cannot be applied fruitfully to our data – it will simply give an estimate of the number of types in the population equal to the number of types in our sample. So, we adopt the idea of plotting sobs(N) and extrapolating. From the full data set of 1,917 observations, we took random sub-samples of sizes 250, 500, …, 1,750, and the full sample of 1,917 observations, and ran our upper and lower bound algorithms to determine bounds on the minimum number of types necessary to rationalise all the observed choices in each sample. Figure 3 shows results for sobs(N): the upper line traces out the upper bound and the lower line traces out the lower bound. Figure 4 shows the ratio of types to sample size, sobs(N)/N.

Figure 3.

The Bounds on the Number of Types Against Sample Size

Figure 4.

The Bounds on the Ratio of Types to Sample Size

Examination of Figure 3 does not immediately suggest an asymptote for sobs(N). However, it is clear from Figure 4, which shows the ratio of types to sample size, that the number of types rises more slowly than linearly with the number of observations in the sample. Because the number of observations in the sample does not get anywhere near the size of the population (about 2.5 million, the number of households in Denmark), we cannot pick out this asymptote in a non-parametric way. Raaijmakers (1987) suggests the use of the parametric Eadie–Hofstee equation (sobs(N) = spop − B·sobs(N)/N, where spop and B are unknown parameters) to estimate the asymptote and provides a maximum likelihood estimator for spop, which may be taken as the estimated number of species in the population. Implementing this estimator using the upper bound on the number of types yields an estimate of 10.75 with a standard error of 0.6. This suggests that the number of types in the population is at most 12.
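As a sketch of this extrapolation: because the Eadie–Hofstee equation is linear in sobs(N)/N, regressing sobs(N) on sobs(N)/N recovers spop as the intercept and −B as the slope. The code below uses ordinary least squares as a simple stand-in for Raaijmakers' maximum likelihood estimator, on a synthetic accumulation curve with made-up values spop = 11 and B = 250.

```python
import numpy as np

def eadie_hofstee_asymptote(N, s_obs):
    """Extrapolate the number of types via the Eadie-Hofstee equation.

    Fits s_obs(N) = s_pop - B * s_obs(N) / N by OLS: the intercept
    estimates the asymptote s_pop and the slope estimates -B.
    """
    N = np.asarray(N, dtype=float)
    s = np.asarray(s_obs, dtype=float)
    X = np.column_stack([np.ones_like(s), s / N])
    coef, *_ = np.linalg.lstsq(X, s, rcond=None)
    s_pop, B = coef[0], -coef[1]
    return s_pop, B

# Synthetic accumulation curve exactly consistent with the equation:
# s = s_pop * N / (N + B) rearranges to s = s_pop - B * s / N.
N = np.array([250, 500, 750, 1000, 1250, 1500, 1750, 1917], dtype=float)
s = 11.0 * N / (N + 250.0)
s_pop_hat, B_hat = eadie_hofstee_asymptote(N, s)
```

On real accumulation data the fit is not exact, and the maximum likelihood estimator additionally delivers the standard error used in the text.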

7. Conclusions

We consider an elementary method of partitioning data so that it can be explained perfectly by the theory, and in a way which admits the minimal necessary heterogeneity. We argue that our approach offers two benefits which may complement the more established microeconometric treatment of unobserved heterogeneity. First, it provides a framework in which to study heterogeneity which is driven by the economic model of interest. In doing so it provides a practical method of partitioning data so that the observations in each group are precisely theory consistent rather than just approximately so. This allows researchers to estimate group-specific demand models without fear of the complications which arise in the presence of unobserved heterogeneity. Second, it does not require statements about the distributions of objects we cannot observe or functional structures about which economic theory is silent.

Throughout this article, we have focused on consumer data and on the canonical utility maximisation model. This is mainly for expositional reasons and it is important to point out that what we are proposing can easily be applied to the analysis of heterogeneity in any microeconomic model of optimising behaviour which admits an RP-type characterisation. This is an increasingly wide class which includes profit-maximisation and cost-minimisation models of competitive and monopolistic firms, models of intertemporal choice, habits, choice under uncertainty, collective household behaviour, characteristics models and firm investment, as well as special cases of all of these models which embody useful structural restrictions on preferences or technology (e.g. weak separability, homotheticity and latent separability).13 To adapt our approach to any of those models, one simply replaces the GARP check in all the algorithms with the appropriate RP check (see the online Appendix). The point is that our strategy for assessing heterogeneity in the consumer demand framework is, in principle, applicable to any environment where agents are assumed to be optimising something.

In the empirical illustration, we characterise the amount of heterogeneity necessary to completely rationalise the observed variation in our consumer microdata. We find that very few types are sufficient to rationalise observed behaviour completely. Our results suggest that Stigler and Becker (1977) had it wrong: preferences do indeed differ both capriciously and importantly between people. The caprice is that, although in the three decades since Stigler and Becker's assessment we have learned much about how to deal with preference heterogeneity that is correlated with observed variables, it seems that the more important kind of heterogeneity is driven by unobserved variables. Our results also suggest that models which use a small number of heterogeneous types – such as those found in macro-labour models, education choice models and a vast number of empirical marketing models – may in fact deal with unobserved heterogeneity in a sufficient fashion. In contrast, models like Lewbel and Pendakur (2009), in which unobserved preference heterogeneity is captured by a multi-dimensional continuum of unobserved parameters, could well be overkill.


  • 1

     We are not the first to make this observation. Gross (1995) also applies RP tests to cross-sectional consumer data to look at the evidence for and against the assumption of commonality.

  • 2

     In fact, the approach considered here can be augmented to allow for measurement and optimisation errors as well. The methods involved are not original to this article, but we give a brief account of them in the online Appendix.

  • 3

     We note that by having a model in which the data are theory-consistent by construction, one cannot test the theory. Indeed, in our context, testability amounts to precluding unobserved heterogeneity.

  • 4

     See Deaton (1988) for a discussion of the problems which arise when unit prices which combine multiple varieties of goods are used.

  • 5

     We use the method described in Varian (1982) which uses an algorithm due to Warshall (1962) to check for cycles which violate GARP. The time required is proportional to the number of observations cubed. See the Appendix in Varian (1982) for details.
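A minimal sketch of this check, assuming prices and quantities are stored as rows of matrices (the function name and the toy data are illustrative): the direct revealed-preference relation is built from inner products, its transitive closure is computed with Warshall's algorithm, and a violation is a cycle containing a strict revealed preference.

```python
import numpy as np

def violates_garp(P, Q):
    """Return True if the observations (P, Q) contain a GARP violation.

    P, Q : (T, k) arrays of price and quantity vectors.  Builds the
    direct revealed-preference relation, takes its transitive closure
    with Warshall's algorithm (time proportional to T cubed), and
    searches for a cycle in which some step is strict.
    """
    C = P @ Q.T                        # C[t, s] = p_t . q_s
    own = np.diag(C)[:, None]          # p_t . q_t, as a column vector
    R = C <= own                       # q_t directly revealed preferred to q_s
    S = C < own                        # ... strictly so
    for k in range(len(P)):            # Warshall transitive closure of R
        R = R | (R[:, [k]] & R[[k], :])
    # GARP fails iff q_t R q_s (in the closure) while q_s is strictly
    # directly revealed preferred to q_t
    return bool((R & S.T).any())

# A two-observation violation, and a consistent pair of observations:
P_bad = np.array([[2.0, 1.0], [1.0, 0.3]])
Q_bad = np.array([[1.0, 1.0], [1.4, 0.0]])
P_ok = np.array([[1.0, 2.0], [2.0, 1.0]])
Q_ok = np.array([[2.0, 0.0], [0.0, 2.0]])
```

Running the partitioning algorithms then amounts to applying a check like this to pooled sub-samples of observations.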

  • 6

     We are grateful to an anonymous referee for suggesting this.

  • 7

Recalling that our data are a random sample of 500 observations from a larger data set of 1,917 observations, we also investigated the variability of these bounds induced by (re)sampling. We took 25 samples of 500 observations with replacement and calculated the bounds on the number of types in each sample. In all cases, the bounds remained [4,5]. We are very grateful to an anonymous referee for suggesting this exercise and conclude from it that the bounds on the number of types, for a given sample size, are reasonably robust to sampling variation. We investigate the effects of varying the size of the sample below.

  • 8

     Note that the allocation of households to groups is not necessarily unique – it might be feasible to allocate any given household to more than one group. We return to this point below.

  • 9

We note that the low value of the pseudo-R2 is not driven by the large number of classifications (5). If we drop the fifth type (the smallest group), the pseudo-R2 drops to 4.5%, and if we drop the fourth and fifth types (the two smallest groups), it drops to 4.1%. We also note that the mean value of each regressor is not significantly different across groups.

  • 10

     Measurement error is much more cumbersome to consider in a revealed-preference context, so we do not consider it here.

  • 11

It is well known that average derivative estimators suffer from boundary bias. Although the estimates in Table 4 do not trim near the boundaries, estimates which do trim near the boundaries yield the same conclusions. Standard errors are simulated via the wild bootstrap using Rademacher bootstrap errors. Non-parametric estimators suffer from specification error only in small samples; such error disappears as the sample size gets large. Further, unobserved heterogeneity need not cause a deviation from the regression line, because such heterogeneity is not necessary after our grouping exercise. Thus, the wild bootstrap, which bases simulations on resamples from an error distribution, is actually an odd fit for the application at hand. An alternative is to resample from budget constraints (rather than from budget shares) to simulate standard errors. These simulated standard errors are much smaller and make the groups look sharply different from each other in terms of both average levels and average slopes.
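A sketch of the wild bootstrap with Rademacher weights, in a generic OLS setting rather than the constrained non-parametric estimator used in the article (all names and data below are illustrative): residual signs are flipped with probability 1/2, which preserves heteroskedasticity, and the model is re-estimated on each perturbed sample.

```python
import numpy as np

def wild_bootstrap_se(X, y, n_boot=500, seed=0):
    """Wild-bootstrap standard errors for OLS coefficients using
    Rademacher (two-point, +/-1) weights on the residuals."""
    rng = np.random.default_rng(seed)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    resid = y - fitted
    draws = np.empty((n_boot, X.shape[1]))
    for b in range(n_boot):
        signs = rng.choice([-1.0, 1.0], size=len(y))   # Rademacher weights
        y_star = fitted + signs * resid                # perturbed outcomes
        draws[b], *_ = np.linalg.lstsq(X, y_star, rcond=None)
    return beta, draws.std(axis=0, ddof=1)

# Synthetic regression: intercept 1.0, slope 0.5, noise s.d. 0.3.
rng = np.random.default_rng(42)
X = np.column_stack([np.ones(200), rng.normal(size=200)])
y = 1.0 + 0.5 * X[:, 1] + 0.3 * rng.normal(size=200)
beta_hat, se = wild_bootstrap_se(X, y)
```

The alternative mentioned above, resampling budget constraints rather than perturbing an error distribution, would replace the sign-flipping step with draws of whole observations.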

  • 12

     If we use the alternative resampling strategy which provides tighter standard errors (outlined in the previous footnote), then the test that the average derivatives are the same for all three groups is rejected in each of the five independent equations, and, not surprisingly, rejected for all five together.

  • 13

    Afriat (1967), Hanoch and Rothschild (1972), Diewert (1973), Varian (1982, 1983a,b, 1984), Browning (1989), Bar-Shira (1992), Cherchye et al. (2007) and Blow et al. (2008).

  • 14

     The treatment of optimisation errors in RP tests is due to Afriat (1967) and that of measurement error is due to Varian (1985).