
Keywords:

  • Data envelopment analysis;
  • Spottiswoode report;
  • Stochastic frontier analysis;
  • Value judgment

Abstract

  1. Abstract
  2. 1. Introduction
  3. 2. Feasible performances
  4. 3. Geometric insights
  5. 4. Supporting algebra
  6. 4.1. Becoming 100% efficient!
  7. 4.2. Bad outputs
  8. 5. Developments in data envelopment analysis
  9. 5.1. Slack
  10. 5.2. Weight restrictions
  11. 5.3. Environmentals
  12. 6. Stochastic frontier analysis
  13. 7. Police forces and performances
  14. 8. Critical features of the data envelopment analysis technique
  15. 8.1. Missing values?
  16. 8.2. Indiscrimination
  17. 8.3. Local priorities
  18. 8.4. Rank difficulties
  19. 8.5. Variable returns to scale
  20. 8.6. Weight restrictions
  21. 9. Prospects
  22. Acknowledgements
  23. Appendix A
  24. References
  25. Discussion on the paper by Stone
  26. References in the discussion

Summary. The single-input case of the ‘technical efficiency’ theory of M. J. Farrell is reformulated geometrically and algebraically. Its linear programming developments as ‘data envelopment analysis’ are critically reviewed, as are the related techniques of ‘stochastic frontier analysis’. The sense and realism of using data envelopment analysis or stochastic frontier analysis techniques, rather than some value-based method, for the assessment of police force efficiency are questioned with reference to the Spottiswoode report and related studies.

‘There can be no economy where there is no efficiency’… (Disraeli, 1868).

‘Order is, at one and the same time, that which is given in things as their inner law… and also that which has no existence except in the grid created by a glance, an examination, a language; and it is only in the blank spaces of this grid that order manifests itself as though already there, waiting in silence for the moment of its expression’… (Foucault, 1973).

‘Striving to better, oft we mar what's well’… (King Lear, act I, scene iv).

1. Introduction


The seminal concept of technical efficiency was introduced by Farrell (1957). It was the seed for later exploitation, following its rediscovery by Charnes et al. (1978) and subsequent relabelling as ‘CCR-efficiency’ under the broader heading of ‘data envelopment analysis’ (DEA) (Cooper et al., 2000).

What, roughly speaking, is ‘technical efficiency’ as initially defined? Farrell took the theory of Pareto optimality as developed by Koopmans (1951) and Debreu (1951), and made ‘one small step’ for econometrics. Koopmans and Debreu were concerned with the overall efficiency of an economic system broken down into interacting ‘activities’ (Koopmans) or ‘production units’ (Debreu). Farrell's innovation was to see that the technique that gave Debreu's efficiency coefficient for the whole system could be applied to the individual independent production units of an industry, where the data for each unit were the volumes of specified inputs and outputs. Farrell's sole application was to the farming industry in the then 48 states of the USA, with four inputs and a single output (cash receipts). His theory was for any number of inputs and any number of outputs, but this paper will consider only the single-input case, thereby turning the farm production example on its head. The restriction accommodates the only application considered here—to the 43 police forces of England and Wales—provided that we can replace Farrell's single cash output by a single cash (or cash equivalent) input.

For the multiple-outputs case, it is necessary to think of an output profile vector, rather than a production function that gives a single output. For the definition of Farrell's technical efficiency, the output profile of a unit is compared with a sufficiently rich set of (typically all hypothetical) unit performances that have the same output profile but (typically) different values of the input variable. Each of these performances must be taken as feasible even if it has never been observed, even approximately.

The (input minimization) technical efficiency of a unit is simply the ratio f of the smallest input in the comparison set of feasible performances, with the same output profile, to the actual input to the unit. The idea is that, whatever the characteristics of the observed output profile, that profile could have been achieved with input reduced to a fraction f of its actual value, by eliminating the inefficiency represented by the complementary fraction 1− f. An alternative definition of f, involving output maximization for fixed input with fixed ratios of the outputs, will not be considered here.
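The ratio definition above admits a one-line toy calculation. The following sketch uses invented numbers and an illustrative function name (none of it is from the paper): given the inputs of the comparison set of feasible performances sharing unit g's output profile, f is the smallest such input divided by g's actual input.

```python
def technical_efficiency(actual_input, feasible_inputs):
    """Input-minimization technical efficiency for a fixed output profile:
    f = smallest feasible input / actual input."""
    return min(feasible_inputs) / actual_input

# Hypothetical unit g: actual input 100, with feasible same-output
# performances in the comparison set costing 80, 92 and 115
f = technical_efficiency(100.0, [80.0, 92.0, 115.0])
print(f)  # 0.8: the same output profile is deemed achievable with 80% of the input
```

The complementary fraction 1 − f = 0.2 is then read as the eliminable inefficiency.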

The key elements in Farrell's (1957) approach are the procedures by which feasible performances are invented: Section 2 is a simple account that suffices for input minimization in the single-input case. Section 3 aims to convey the richness of the associated geometric operations. Section 4 covers all the algebra that we need, including the important theorem of Nunamaker and an implicit connection to linear programming. Section 5 covers the more interesting of the DEA developments of Farrell's technique, whereas Section 6 lists the widely recognized weaknesses in stochastic frontier analysis (SFA). Section 7 samples the plethora of performance indicators that have been devised for the police and touches on divided views about their aggregation. Section 8 is unreserved in criticism of the DEA technique, whereas Section 9 invites the reader to have a look at Appendix A, which tentatively develops an alternative value-based approach.

2. Feasible performances


The simplest conceivable case is one where n units are independently engaged in the same range of productive activities over the same time period and subject to the same environmental influences. The resulting data are 𝒟 = {(xi, yi): i = 1,…,n}, where xi is the (continuously variable) input and yi = (yi(1),…,yi(s)) is the s-dimensional (continuously variable) output vector for the ith unit. Suppose that each output component is non-negative and also, until further notice, a good output in the sense of ‘the more the better’.

There are three steps in the creation of the two sets (𝒮 CRS and 𝒮 DRS below) of feasible performances that Farrell (1957) considered. The first two of these steps create a smaller set, 𝒮 VRS, which happens to be the set that was recommended by Spottiswoode (2000) for application to the 43 police forces of England and Wales. The abbreviations CRS, DRS and VRS stand for the econometric concepts of constant, decreasing and variable returns to scale respectively. In the following definitions, the term ‘feasible’ applies both to the observed performances in 𝒟 and to the hypothetical performances as they are created.

  • Step 1: worsening — take any feasible performance and assert the feasibility of performances that are ‘worse’, in the sense that, for the same input, one or more outputs are smaller. This relies on the everyday experience that it is not difficult to produce less for the same money—and no sophisticated econometric concept needs to be invoked. (Restriction to input minimization allows us to dispense with ‘worsening’ by increasing input for the same outputs, which simplifies the later algebra.)

  • Step 2: mixing — this is much more demanding. It asserts that it is feasible to obtain any mixture of the output vectors of any two feasible performances by combining the inputs in the same proportions, no matter how far apart the performances are in their (s+1)-dimensional space. (A ‘mixture’, more formally a convex combination, of two vectors y1 and y2 is a vector of the form ay1 + (1−a)y2 where 0 < a < 1.)

  • Step 3: rescaling — this relies on econometric concepts. It invokes either CRS, to assert that by rescaling any feasible performance you can obtain c times as much of every output by using c times the input for any non-negative value of c, whether greater or less than 1, or DRS, to make the same claim only when c is restricted to being less than 1.

When the relevant steps (1 and 2 for VRS, and 1, 2 and 3 for DRS or CRS) are made ad libitum in any order, we obtain the solid bodies or comparison sets 𝒮 VRS, 𝒮 DRS and 𝒮 CRS in which the respective technical efficiencies f VRS, f DRS and f CRS can be defined for a generic unit g with performance (xg, yg). These efficiencies are (typically) determined by comparison with a performance that is a hypothetical construct from other units whose individual inputs and output profiles may all be very different from that of the particular unit, especially when the number of outputs is not small compared with the number of units. Farrell (1957) held that dropping rescaling would exclude mixing (and hence VRS) on the grounds that mixing is really a DRS rescaling of two performances followed by their aggregation: without rescaling, mixing may have to be justified as the feasibility of interpolation over relatively long distances.
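The three generating steps can be written down as a minimal sketch (hypothetical data and function names; single input, two good outputs), with each operation producing one new hypothetical feasible performance (x, y):

```python
import random

# Hypothetical observed performances: (input x, output vector y)
data = [(10.0, (8.0, 2.0)), (20.0, (10.0, 14.0))]

def worsen(x, y):
    """Step 1: same input, each output shrunk by a random factor."""
    return x, tuple(v * random.random() for v in y)

def mix(p1, p2):
    """Step 2: convex combination of two feasible performances."""
    a = random.random()
    (x1, y1), (x2, y2) = p1, p2
    return a * x1 + (1 - a) * x2, tuple(a * u + (1 - a) * v for u, v in zip(y1, y2))

def rescale(x, y, drs=False):
    """Step 3: scale by some c >= 0 for CRS, or by c < 1 for DRS."""
    c = random.random() if drs else random.uniform(0.0, 3.0)
    return c * x, tuple(c * v for v in y)

random.seed(0)
print(mix(data[0], data[1]))  # one hypothetical feasible performance
```

Applying these operations ad libitum to the observed data generates points of the CRS comparison set (the DRS set with drs=True; omitting rescale altogether gives the VRS set).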

3. Geometric insights


The description of the comparison sets (generically 𝒮) as ‘solid bodies’ alludes to the essentially geometric character of Farrell's (1957) technique. From that geometry comes the concept of the efficiency frontier ℱ, which is where any feasible unit would have an efficiency f of 1 and might then be said to be ‘100% efficient’, and which is a frontier in the sense that there is no feasible performance in 𝒮 with the same output profile and lower input.

Geometric accounts of the Farrell technique tend to use bland two-dimensional surrogates for what might happen in many dimensions. Although Fig. 1 with s = 1 and n = 5 is one such representation, it illustrates the preceding concepts and also acts as a precursor of the more realistic deployment of two dimensions in Fig. 4 later. Incidentally, Fig. 1 reveals that if performances (x, 0) (‘money for old rope!’) were taken to be feasible then mixing would make 𝒮 VRS indistinguishable from 𝒮 DRS.

Figure 1. Single-output technical efficiencies fg = AF/Ag and fh = AF/Ah for two of five units, and the associated comparison sets and efficiency frontiers for (a) VRS, (b) DRS and (c) CRS

Figure 4. Geometry for CRS, DRS and VRS efficiencies in the quadrant Qg defined by the plane Oxg: the point ɛ is the (typically unique) frontier performance of Qg that is in ℱ CRS, ℱ DRS and ℱ VRS; the points δ and ζ are the intersections with Qg of (s−1)-dimensional hyperplanes that are themselves the intersections of s-dimensional facets of the ℱs indicated; for the g shown, f CRS = f DRS = αβ/αg and f VRS = αγ/αg

It is arguable that a realistic assessment of performance of police forces calls for an appreciable number of discriminatory outputs—a value of s less than 30 might underestimate what may be required. For the CRS case, let us firstly try to represent such high dimensionality in three (strictly two!) dimensions. Put the single input x on a vertical axis over a horizontal plane that must stand for the s dimensions of the output profile. Next, imagine each output variable separately plotted against x in an n-point scatterplot on a door-like positive quadrant hanging from the vertical axis (as in some picture archive), and then (impossibly) require the s quadrants to be all set at right angles to each other. Each unit is then to be pictured as one of n Cartesian points in s + 1 dimensions—on which the processes of worsening, mixing and rescaling then get to work (in any order) to create the solid body of feasible performance points, 𝒮 CRS. This turns out to be a cone—a convex polyhedron with one vertex at the origin. The extremal rays that define the cone are semi-infinite lines from the origin that lie in the surface of the cone and that are the intersections of pairs of the s-dimensional flat facets that make up that surface. These lines typically number s + 1 (one in each quadrant in addition to the x-axis) plus the number of those lines from the origin through the units that are on the efficiency frontier of the cone. (The latter number can be anything from 1 to n, depending on 𝒟, and the corresponding units are the frontier units that achieve an f-value of 1, i.e. a technical efficiency of 100%.) The efficiency frontier ℱ is that part of the surface of the cone that is formed by the facets that do not include the x-axis: it can be thought of as the underbelly of the cone.

Fig. 2 for n = 3 and s = 2 manages to show two extremal rays, OC and OD, generated by a worsening of the two extremals through the frontier units g′ and g″ respectively. If this is already looking complicated, that is a measure of where things may be going when s is of the order of 30.

Figure 2. Three-dimensional compromise of an (s+1)-dimensional intention, showing how f CRS is determined for unit g by the facet OBE created by rescaling and mixing from the frontier units g′ and g″: the feasibility set 𝒮 is the polyhedric cone with vertex O and extremal rays OB, OC, OD, OE and the x-axis, where OC and OD are the projections of OB and OE on the planes y(2) = 0 and y(1) = 0 respectively by worsening; the efficiency frontier ℱ is the underbelly of the cone determined by the three facets OBC, OED and OBE (all planes and lines extend to ∞), and f CRS = AF/Ag

The picture can be marginally simplified, as in Fig. 3, by reducing the (s+1)-dimensional data to the s partial performance indicators (PPIs)

  p(j) = y(j)/x,   j = 1,…,s.   (3.1)

(The ‘partial’ in PPI is there because only part of the input x is conceivably dedicated to each output. Fig. 3 anticipates the need to accommodate bad outputs as in Section 4.2.) Such a representation of the CRS case is widely employed, even though it may conceal widely differing ‘sizes’ of units represented by x. The fact that CRS must be invoked with conviction to support this picture is easily overlooked. The points in Fig. 3 correspond to rays in the unreduced figure, and this must be kept in mind when we are tempted to think of a unit as a mixture of units that appear close to it in the reduced representation.
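The reduction (3.1) is a one-liner in practice. The sketch below uses invented numbers in the spirit of Fig. 3 (inputs in millions of pounds, outputs counts of detected and undetected violent crimes; none of these figures are Home Office data):

```python
# Hypothetical force-level data: (input x in millions of pounds,
# outputs y = (detected violent crimes, undetected violent crimes))
units = [(50.0, (1200.0, 3400.0)), (80.0, (1500.0, 6100.0))]

# Equation (3.1): the PPIs p(j) = y(j)/x, one per output
ppis = [tuple(yj / x for yj in y) for (x, y) in units]
print(ppis)  # [(24.0, 68.0), (18.75, 76.25)] -- crimes per million pounds
```

Each point of the reduced plot stands for a whole ray in the unreduced (s+1)-dimensional figure, as the text goes on to caution.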

Figure 3. Illustration of the reduced representation with n = 42 and s = 2, using data from the Home Office for 1998–1999: p(1) is the number of violent crimes ‘detected’ per million pounds input; p(2) is the same for ‘undetected’ violent crimes

A preferable simplification of the (s +1)-dimensional picture is to take the slice that is the two-dimensional quadrant Qg, say, defined by the rays Ox and Og. Fig. 4 does this and can illustrate the relationships of the efficiency frontier's intersections with Qg for each of the three levels of rescaling, CRS, DRS and VRS, and the hypothetical complexity of the associated f-values.

4. Supporting algebra


In the following, a ⩾ 0 and, until further notice, w ⩾ 0. The subscript ‘+’ in a single quantity implies summation over the subscript; for example a+y+ stands for a1y1 + … + anyn etc.

Lemma 1. Generically

  𝒮 = {(x, y): x = a+x+, y ⩽ a+y+ for some a ⩾ 0 with Aa}

    ⊆ 𝒮* = {(x, y): x ⩾ wy − u for all w ⩾ 0 and u with Bu such that wyi − u ⩽ xi, i = 1,…,n},

where Aa is the condition that a+ ⩽ 1 for DRS and a+ = 1 for VRS, whereas Bu is that u = 0 for CRS and u ⩾ 0 for DRS.

Proof. The first equality follows from the definition of 𝒮. For the ⊆, x = a+x+ and y ⩽ a+y+ imply that, for all w and u with Bu,

  wy − u ⩽ Σi ai(wyi) − u ⩽ Σi ai(xi + u) − u = a+x+ + (a+ − 1)u ⩽ a+x+ = x,

so that (x, y) ∈ 𝒮*. □

Theorem 1. The CRS, DRS and VRS efficiencies f of a generic performance (xg, yg) are the special cases of the dual formulae

  f = min {a+x+/xg: a+y+ ⩾ yg with Aa} = max {(wyg − u)/xg: (wyi − u)/xi ⩽ 1, i = 1,…,n, w ⩾ 0 with Bu}.   (4.1)

Proof. f = min {x/xg: (x, yg) ∈ 𝒮} = min {a+x+/xg: a+y+ ⩾ yg with Aa} = xf/xg, say. Suppose that (x*, yg) were in 𝒮* but not 𝒮, with x* < xf. Since 𝒮 is convex, there would be a supporting hyperplane at (xf, yg) ∈ 𝒮, given by x = wy − u, with x ⩾ wy − u in 𝒮 and x* < wyg − u, with w ⩾ 0, u = 0 for CRS and u ⩾ 0 for DRS. (If any component of w were negative, worsening of the corresponding output at (xf, yg) would contradict the definition of 𝒮. For (cxf, cyg) in 𝒮 with c ≠ 1, cxf ⩾ cwyg − u, whence u = wyg − xf ⩾ cu, and hence u = 0 for CRS and u ⩾ 0 for DRS, thereby satisfying Bu.) Then (wyg − u)/x* > 1 and (wyi − u)/xi ⩽ 1, i = 1,…,n, which contradicts the supposition that (x*, yg) is in 𝒮*. Hence min {x: (x, yg) ∈ 𝒮*} = xf and

  xf = min {x: x ⩾ wyg − u for all w ⩾ 0 and u with Bu such that (wyi − u)/xi ⩽ 1, i = 1,…,n}.

In this minimization, we are free to consider only x ⩽ xg and to add the condition wyg − u ⩾ 0. (When wyg − u < 0,

  x ⩾ 0 > wyg − u,

and the (w, u) condition on x is automatically satisfied and therefore unrestrictive.) Then

  xf = max {wyg − u: (wyi − u)/xi ⩽ 1, i = 1,…,n, w ⩾ 0, wyg − u ⩾ 0 with Bu}

and f = xf/xg is the ‘maximum’ formula of equation (4.1). □

The first formula for f in equation (4.1) shows that

  • (a) f CRS, f DRS and f VRS are all invariant under a change of scale of input or any output,
  • (b) f VRS, but not f CRS or f DRS, is invariant with respect to a change of origin of any output and
  • (c) none of the three f's is invariant with respect to a change of origin of x.

4.1. Becoming 100% efficient!


If pg(1) = yg(1)/xg were the largest value of p(1) in 𝒟, then g would have f CRS = 1. (Put w1 = 1, wj = 0 for j ≠ 1 and u = 0 in equation (4.1).) If yg(1) were the uniquely largest value of y(1), then g would have both f DRS and f VRS equal to 100%. (Put w1 = max(xg/yg(1), maxi[(xg − xi)/{yg(1) − yi(1)}]), wj = 0 for j ≠ 1 and u = w1yg(1) − xg in equation (4.1).) If xg is the smallest input in 𝒟, VRS gives 100% without any reference to output! (This follows from the first expression in equation (4.1): a+x+/xg ⩾ mini(xi/xg) = 1.)
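In the single-output case the first of these observations has a closed form: with s = 1 and u = 0, the ‘maximum’ formula of equation (4.1) reduces to f CRS = pg/maxi pi, the unit's PPI relative to the best PPI. A sketch on invented (input, output) pairs, with illustrative names throughout:

```python
def f_crs_single_output(g, data):
    """Single-output CRS efficiency: f = p_g / max_i p_i, with p_i = y_i / x_i."""
    p = [y / x for (x, y) in data]
    return p[g] / max(p)

# Hypothetical (input, output) pairs; PPIs are 0.8, 0.5 and 0.4
data = [(10.0, 8.0), (20.0, 10.0), (30.0, 12.0)]
print([round(f_crs_single_output(i, data), 3) for i in range(3)])  # [1.0, 0.625, 0.5]
```

The unit with the largest PPI duly receives f CRS = 1, and rescaling all inputs or all outputs by a common factor leaves every f unchanged, in line with invariance property (a) above.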

Theorem 2 (Nunamaker, 1985). f CRS, f DRS and f VRS cannot decrease when a new output is added to 𝒟 or when an existing output is disaggregated into two additive component outputs.

Proof. By equation (4.1), f = maxw {fs(w1,…,ws)}, where fs(w1,…,ws) is the intermediate maximum with respect to u when w is fixed. If y(s+1) is the new output, fs(w1,…,ws) = fs+1(w1,…,ws, 0) and

  maxw {fs(w1,…,ws)} = maxw {fs+1(w1,…,ws, 0)} ⩽ maxw {fs+1(w1,…,ws+1)}.

The same inequality serves if y(s) is disaggregated as y*(s) + y*(s+1). □

Some idea of the enhancement of f CRS-values when s is increased is given by simple probability. If the PPIs p(1),…,p(s) in equation (3.1) were continuous independent random variables in their assignment to units, the expectation of the proportion of units with a maximum value in at least one component of p is 1 − (1 − 1/n)^s—which is a lower bound for the expectation of the proportion of units that receive 100%. For n = 43 and s = 30, this would be 0.51—a majority of the units. The phenomenon rests on a feature of high dimensionality—that most of the volume measure of a finite solid body is relatively close to its boundaries. Are more precise answers attainable for specified bodies and distributions of p, short of tedious simulation? That this would not be easy is suggested by the limited results of Rényi and Sulanke (1963) for the expectation of the number of sides of the convex hull of n points randomly distributed in polygons (s = 2) as n → ∞. Only asymptotics in which s → ∞ also are likely to be relevant. (See Section 8.2 for an entertaining simulation.)
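The 1 − (1 − 1/n)^s expectation is easily checked by the kind of simulation just mentioned. The crude Monte Carlo sketch below (all names and parameter values illustrative) draws independent uniform PPIs and counts the units holding at least one componentwise maximum:

```python
import random

def mean_share_of_max_holders(n, s, trials=1000, seed=42):
    """Monte Carlo estimate of the expected proportion of n units holding
    the maximum in at least one of s independent continuous PPI components."""
    random.seed(seed)
    total = 0.0
    for _ in range(trials):
        winners = set()
        for _ in range(s):
            p = [random.random() for _ in range(n)]
            winners.add(p.index(max(p)))  # unit with the largest value of this PPI
        total += len(winners) / n
    return total / trials

n, s = 43, 30
print(round(1 - (1 - 1 / n) ** s, 2))  # 0.51, as quoted in the text
est = mean_share_of_max_holders(n, s)
```

Under the independence assumption the formula is exact, so the estimate sits close to 0.51; with dependent or discrete PPIs it is only the stated lower bound for the proportion awarded 100%.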

4.2. Bad outputs


Outputs that are bad in the sense of ‘more means worse’ are best thought of as unwelcome side-effects or outcomes of inactivities in a unit, although they may still be controllable by changes in a unit's activity profile. The Farrell (1957) technique can accommodate bad outputs by simply changing the direction of the inequality in the definition of worsening. The geometry looks different, but the definition of technical efficiency stands. The only changes in lemma 1 and theorem 1 are reversals of the inequality signs in y ⩽ a+y+ and w ⩾ 0 for the bad outputs. Efficiencies of 100% could then be awarded to a unit having a smallest bad PPI for CRS, or a uniquely smallest bad output for DRS and VRS.

5. Developments in data envelopment analysis


The ‘minimum’ formula of equation (4.1) expresses the ‘primal’ linear programming algorithm for Farrell (1957), and the ‘maximum’ formula the corresponding ‘dual’ algorithm. Curiously describing such algorithms as ‘models’, Charnes et al. (1978) used the ‘maximum’ with u =0 as their definition of f for the CRS case, with the motivation that, in addition to looking like a natural index ratio measure of efficiency, it has the advantage (as they saw it) of giving each unit an individually most favourable weighting of outputs. The DEA developments of Farrell's work that stem from Charnes et al. (1978) deserve careful study: Färe et al. (1994) have given an elegant and exhaustive mathematical coverage, based on the ground clearing work of Shephard (1970). The following subsections describe and evaluate particular DEA developments, excluding matters that go beyond the single-input case and input minimization within that case, and ignoring some rather recondite literature.

5.1. Slack


The question of slack for unit g arises when the minimization of x for fixed outputs yg meets ℱ at a point γ of a facet whose points are worsenings of feasible performances obtained by mixing or rescaling only. Since the 100% technically efficient performance γ has been created by the gratuitous ‘free disposal’ of the outputs of at least one 100% performance, γ* say, that has had no worsening in its creation, it is natural to prefer γ* to γ as a comparison performance for g. One development of DEA concerns the linear programming machinery to identify such a γ* by removal of the ‘slack’ in γ, and to list the frontier units (‘peers’) whose mixing or rescaling created γ*. When s>2, this removal can typically be done in an infinity of ways, each depending on some relative weights assigned to the s outputs: equal weights are used in Cooper et al. (2000).

Farrell (1957) had no interest in moving from γ to γ*: ‘slack inefficiency’ could only be quantified by introducing the prices that Farrell had eschewed. He should not be criticized for not doing what he did not want to do, by critics using previously rejected arbitrary weights (Cooper et al. (2000), page 46).

‘Points at infinity’ were used by Farrell to allow f to be calculated for units giving rise to slack: it is a small matter that linear programming can do this more easily. If there is any technical lacuna in Farrell (1957), it is in not distinguishing between points at infinity for inputs and those for outputs which, as Fig. 2 illustrates with OC and OD, are (typically) not on the output axes.

5.2. Weight restrictions


Another development of DEA is an increasing willingness to put prior restrictions on the weights w = (w1,…,ws) in the dual determination of f (Allen et al., 1997; Pedraja-Chaparro et al., 1997; Spottiswoode, 2000)—to prevent a unit that ‘dominates in the production of a relatively unimportant output’ from being assessed as efficient ‘at the expense of a [unit] which is, in fact more efficient at producing a more valuable output’ (Lewis, 1986). Cooper et al. (2000), chapter 6, conceded that there are situations ‘outside the data’ where ‘additional information is available’. Methods that are compatible with linear programming include the use of inequality restrictions such as a < w2/w1 < b (Thanassoulis, 1995), which generalize, and have therefore been viewed as superior to, any method that assigns fixed weights.

5.3. Environmentals


If potential environmental influences are represented by a t-dimensional vector of environmentals z = (z(1),…,z(t)), the enlarged database is now 𝒟 = {(xi, yi, zi): i = 1,…,n}. Continuously variable non-negative environmentals are either favourable (‘more is better’ for the outputs) or unfavourable (‘more is worse’). Banker and Morey (1986) would incorporate their ‘exogenously fixed inputs’ (environmentals under another name) into the VRS case of equation (4.1) by replacing wy by wy − uz with u(k) ⩾ 0, taking all z(k) to be favourable.

This way of handling environmentals is relatable to the treatment of bad outputs in Section 4.2. As far as the minimization of input is concerned, environmentals have the same status as outputs: the method must estimate the least cost of a feasible performance with the same output and environmental profiles as the generic unit g. Now, the feasibility set must be constructed in a space of s + t + 1 dimensions, where a worsening of a favourable environmental (to create a less efficient feasible performance) is given by its increase, like a bad output, and therefore must be assigned a non-negative u(k). Likewise, an unfavourable environmental is like a good output and would receive a non-positive u(k). Norman and Stoker (1991) incorporated favourable environmentals into the CRS case of equation (4.1), in which Cooper et al. (2000) then allowed unfavourable environmentals. Neither justified mixing or rescaling for environmental inputs. (For the CRS case, the construction of the feasibility set rests on the supposition that the feasibility of (x, y, z) implies that of (cx, cy, cz) for non-negative c.)

6. Stochastic frontier analysis

The Farrell–DEA technique treats the data 𝒟 in a purely empirical fashion, but SFA has a theoretically imaginative approach that raises the evergreen question of realism. A concept seeded by Farrell (1957) and developed for the single-output case by Aigner et al. (1977), SFA is concerned that uncontrollable variation in output is confounded with inefficiency by deterministic techniques like DEA. The problem it then faces is how to separate these two contributions to the deviation of each unit from the supposed true efficiency frontier. The delicacy of any method of doing this, and its lack of robustness to assumptions, pose a significant challenge to realism—especially when the method is translated to the single-input–multiple-outputs case.

Some SFA practice and literature (e.g. Bauer (1990)) suggest that the single-input case can be treated as the nearly symmetric dual of the well-developed single-output case (Schmidt, 1985). (The formal symmetry would be exact except that inefficiency increases cost.) Neglecting environmentals and any errors in outputs, the generally adopted model has the logarithmic form

  log x = log F(y) + u + v,    (6.1)

where F(y) represents a true s-dimensional efficiency frontier surface. The model must be completed by specifying the joint distribution over units of the random variables u, representing inefficiency, and v, representing uncontrollable variation or error in x. A commonly assumed form of F is the dual Cobb–Douglas function

  F(y) = b0 y(1)^b1 ⋯ y(s)^bs,    (6.2)

in which CRS would require b1 + … + bs =1. The LIMDEP software (Greene, 1995) gives v a normal distribution with zero mean and offers the choice of an exponential, half-normal or truncated-at-zero normal distribution for u. The parameters in the distributions of u and v and in equation (6.2) are estimated by maximum likelihood. Conditionally on u + v = e and given y = yg, the theoretical expectation of the distribution of u is a function of e and the parameters. The efficiency of g is then estimated as the negative exponential of the maximum likelihood estimate of this function (Jondrow et al., 1982).
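
The fitting procedure just described can be sketched in a few lines. The code below is a hedged illustration, not a reproduction of LIMDEP: it simulates a single-output cost frontier of form (6.1)–(6.2), fits the normal/half-normal composed-error model by maximum likelihood and applies the Jondrow et al. (1982) conditional expectation to obtain unit efficiencies. All data and parameter values are invented for the example.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
n = 400
# One output for simplicity; true frontier log x = b0 + b1 log y.
b0, b1, su, sv = 1.0, 0.6, 0.3, 0.2
logy = rng.uniform(0.0, 3.0, n)
u = np.abs(rng.normal(0.0, su, n))   # half-normal inefficiency (raises cost)
v = rng.normal(0.0, sv, n)           # symmetric noise
logx = b0 + b1 * logy + u + v

def negloglik(theta):
    a0, a1, lsu, lsv = theta
    s_u, s_v = np.exp(lsu), np.exp(lsv)
    sigma = np.hypot(s_u, s_v)
    lam = s_u / s_v
    eps = logx - a0 - a1 * logy      # composed error u + v
    # Cost-side half-normal density: f(e) = (2/sigma) phi(e/sigma) Phi(lam e/sigma)
    ll = (np.log(2.0) - np.log(sigma) + norm.logpdf(eps / sigma)
          + norm.logcdf(lam * eps / sigma))
    return -ll.sum()

res = minimize(negloglik, x0=[0.0, 0.0, np.log(0.5), np.log(0.5)],
               method='Nelder-Mead', options={'maxiter': 5000})
a0, a1, lsu, lsv = res.x
s_u, s_v = np.exp(lsu), np.exp(lsv)
sigma2 = s_u**2 + s_v**2
eps = logx - a0 - a1 * logy
# Jondrow et al. (1982): E[u | u + v = e] for the cost model
mu_star = eps * s_u**2 / sigma2
s_star = s_u * s_v / np.sqrt(sigma2)
z = mu_star / s_star
Eu = mu_star + s_star * norm.pdf(z) / norm.cdf(z)
efficiency = np.exp(-Eu)             # unit-level efficiency estimates
```

The same machinery would carry over to the multiple-output dual, subject to the caveats that follow.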

Anyone adopting this approach must

  • (a)
    ignore errors in outputs (which would introduce the identifiability problems that are associated with the inevitable errors in both the ‘dependent’ variable x and the ‘independent’ variables y: Bauer (1990) suggested that model (6.1) is applicable only if the outputs are exogenously determined),
  • (b)
    make an arbitrary choice of the joint distribution of u and v, on which the method relies heavily, when 𝒟 is (typically) insufficiently informative,
  • (c)
    assume a form for F and
  • (d)
    also think about environmentals.

Moreover, the use of a Cobb–Douglas F(y) violates the convexity condition on the feasibility set in s + 1 dimensions defined by worsening the frontier surface x = F(y): for s = 2, y(2) is a convex function of y(1) for constant x. So, there can be a built-in conflict between the ‘models’ for DEA and SFA. Spottiswoode (2000) recommended SFA as a ‘check’ on DEA, but, since DEA and SFA both aim to define a frontier that fits the same data 𝒟, the to-be-expected correlation between the resulting efficiencies cannot be taken as validation of their logics.

7. Police forces and performances

The 43 police forces of England and Wales (Fig. 5) vary greatly in size and character: the large area of Dyfed–Powys is policed by a 25th of the personnel in the Metropolitan (London) police force and has a very different output profile—not just for outputs related to sheep-stealing or high financial fraud.


Figure 5. The 43 police force areas of England and Wales: 1, Avon and Somerset; 2, Bedfordshire; 3, Cambridgeshire; 4, Cheshire; 5, City (of London); 6, Cleveland; 7, Cumbria; 8, Derbyshire; 9, Devon and Cornwall; 10, Dorset; 11, Durham; 12, Dyfed–Powys; 13, Essex; 14, Gloucestershire; 15, Greater Manchester; 16, Gwent; 17, Hampshire; 18, Hertfordshire; 19, Humberside; 20, Kent; 21, Lancashire; 22, Leicestershire; 23, Lincolnshire; 24, Merseyside; 25, Metropolitan; 26, Norfolk; 27, Northamptonshire; 28, Northumbria; 29, North Wales; 30, North Yorkshire; 31, Nottinghamshire; 32, South Wales; 33, South Yorkshire; 34, Staffordshire; 35, Suffolk; 36, Surrey; 37, Sussex; 38, Thames Valley; 39, Warwickshire; 40, West Mercia; 41, West Midlands; 42, West Yorkshire; 43, Wiltshire


The following inputs, outputs and environmentals have been considered as ingredients of performance indicators (PIs) (the outputs and first-listed environmentals are those selected by Spottiswoode (2000)from a much wider field):

  • (a)
    inputs —staff costs; operating costs; consumption of capital costs;
  • (b)
    outputs —recorded crimes; percentage of recorded crime detected; domestic burglaries; violent crimes; theft of and from motor vehicles; number of offenders dealt with for supplying class A drugs; public disorder incidents; road traffic collisions involving death or serious injury; level of crime; fear of crime; feelings of public safety;
  • (c)
    environmentals —number of young men; stock of goods available to be stolen; changes in consumer expenditure.

The first two of the inputs are readily combined as ‘net revenue expenditure’. If this can be combined with the third to give a reasonably accurate measure of the total cost, we shall have our postulated single input. Most of the outputs appear, with many more, in the listing and discussion of ‘best value performance indicators’ in Department of the Environment, Transport and the Regions (1999). The three environmentals were given as examples in Spottiswoode (2000). The complexity of the environmentals problem is indicated by the fact that the following ones have already been used in the police funding formula (Association of Police Authorities, 1999) to determine the money that is thought to be appropriate for police forces with different environmentals:

  • (d)
    police funding formula environmentals —resident population; daytime population; population in terraced housing; population in class A residential neighbourhoods; ‘striving’ areas; population in one-parent families; households with only one adult; households in rented accommodation; population at a density of more than one per room; population density; sparsity of population; length of built-up roads; length of motorways.

The last two decades have seen a phenomenal expansion in the deployment of batteries of PIs in widely varying public services in many developed countries. Arguments in favour of using PIs have been energetically presented in Osborne and Gaebler (1992), on the basis of American experience. Smith (1993), extending his earlier study (Smith, 1990), developed a more sceptical stance in his study of the maternity services of the British National Health Service. The dysfunctionalities actually or potentially associated with PIs to which Smith drew attention stem from the necessarily multidimensional output or outcome profile of public service, which opens possibilities for ‘tunnel vision’, ‘suboptimization’, ‘myopia’, ‘convergence’, ‘gaming’, ‘ossification’ and ‘misrepresentation’—practices that have less survival value in the private sector, where profit provides the dominant one-dimensional PI.

Her Majesty's Treasury is currently looking for some single measure of efficiency to help in a revision of the present police funding formula (Association of Police Authorities, 1999) for dividing the funding cake—since that formula does not involve any outputs! Smith (1993) boldly asserted that ‘no such aggregation is possible in the public sector’—but, in the police force context, has there yet been any serious attempt to do so? Can we be sure that there is no reasonable proxy for the holy grail, without a search for something that avoids the pitfalls of the DEA–SFA techniques? Even a less ambitious reduction to a handful of thoughtfully aggregated efficiency measures, in conjunction with a few quality assurance indices, might significantly lighten the current darkness. Opinions on this question will be as divided as those to be found in the discussion of Cox et al. (1992), in relation to whether ‘weights’ on different outcomes of clinical intervention could be used to give a single output measure, such as ‘quality-adjusted life years’, that health economists could employ in resource allocation. Hibbert warned that

‘Those seeking to describe complex phenomena, or to take decisions based on them, will inevitably be drawn towards summary measures with an apparently scientific basis…’,

but Torrance held that the appropriateness of such measures for resource allocation is an empirical question of whether they ‘provide useful information to the responsible decision maker’. The question is touched on by Cook and Farewell (1996) and by Senn (1989), who opined that mathematical technique alone can never ‘make unnecessary the search for clinically relevant [aggregate] measures’.

8. Critical features of the data envelopment analysis technique

8.1. Missing values?

The DEA technique derives its putative efficiencies by procedures that do not require value judgments on different outputs or outcomes. Cooper et al. (2000) saw merit in this:

‘In addition to avoiding a need for a priori choices of weights, DEA does not require specifying the form of the relation between inputs and outputs in, perhaps, an arbitrary manner and, even more important, it does not require these relations to be the same for each [unit]’.

Can there really be self-defining efficiency measures—functions of the database alone and determinable without reference to context—derivable by an almost mechanical technique (DEA) or by a statistically inspired method (SFA), neither of which is significantly or transparently influenced by a judgment of the relative values of various disparate outputs or outcomes? For those who would answer ‘no’ to this question, Appendix A offers a tentative formulation of a value-based analysis (VBA).

8.2. Indiscrimination

Nunamaker's theorem (Section 4.1) gives a theoretical underpinning to the experience of DEA practitioners expressed by Thanassoulis et al. (1987) that

‘… the larger the number of inputs and outputs in relation to the number of units being assessed, the less discriminatory the method appears to be.… Thus the number of inputs and outputs included in a DEA assessment should be as small as possible, subject to their reflecting adequately the function performed by the units being assessed.’

In a similar vein, Spottiswoode (2000), page 19, coupled the recommendation of a ‘small set’ of outcome measures with the palliative suggestion (page 17) that, for example, the inclusion of fraud statistics in a ‘level of overall crime’ measure would suffice to ‘capture’ fraud. Whatever that means, the requirement of an adequate ‘reflection’ of some function or activity must be more demanding than its implicit representation within some output or other: the omission of any output or outcome dependent on some irksome activity could be dysfunctionally exploited by police forces in their allocation of resources, if that activity were not explicitly acknowledged.

The seriousness of the problem can be exposed by almost any simulation. Take 43 units made up of 21 ‘goats’ and 22 ‘sheep’, each producing 30 PPIs {p(j)} independently at random. Give all units a uniform(0, 100) distribution on each of p(1),…,p(15), and give the sheep a uniform(50, 100) distribution on each of p(16),…,p(30)—but the goats only zero. The average of the 30 PPIs (giving them equal weight!) decisively separates the sheep from the goats (Student's t =28). The DEAP software of Coelli (1996) awards all units f-values of 100%, except one goat whose f CRS of 94% was boosted to 98% by VRS when inputs were taken to match those of the 43 police forces of England and Wales.
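
The simulation is easy to reproduce in outline. The sketch below (with an arbitrary random seed) generates the 21 goats and 22 sheep and confirms that the equal-weight average of the 30 PPIs separates the two groups decisively; the DEAP runs reported above are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(1)
n_goats, n_sheep, n_ppi = 21, 22, 30

# Goats: uniform(0, 100) on the first 15 PPIs, zero on the rest.
goats = np.zeros((n_goats, n_ppi))
goats[:, :15] = rng.uniform(0, 100, (n_goats, 15))

# Sheep: uniform(0, 100) on the first 15, uniform(50, 100) on the last 15.
sheep = np.empty((n_sheep, n_ppi))
sheep[:, :15] = rng.uniform(0, 100, (n_sheep, 15))
sheep[:, 15:] = rng.uniform(50, 100, (n_sheep, 15))

# Equal-weight average of the 30 PPIs for each unit.
g_avg, s_avg = goats.mean(axis=1), sheep.mean(axis=1)

# Pooled two-sample t statistic for sheep versus goats.
sp2 = (((n_goats - 1) * g_avg.var(ddof=1) + (n_sheep - 1) * s_avg.var(ddof=1))
       / (n_goats + n_sheep - 2))
t = (s_avg.mean() - g_avg.mean()) / np.sqrt(sp2 * (1 / n_goats + 1 / n_sheep))
```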

8.3. Local priorities

The strongest argument for DEA–SFA techniques may be as follows. Ignore any problem of small numbers of units. If units were thick on the ground, the frontier ℱ might represent truly efficient units driven to apportion their activities so that they obtain the particular output-profile shape that responds to locally dictated priorities not to be found in 𝒟. If the output profile shape of any unit were equally driven by considerations of local priority, it would be reasonable to measure its efficiency by Farrell's (1957) f—comparing its cost with a feasible unit on the frontier that, by assumption, would be close in performance to the actual units of which it is a mixture. The question of allocative efficiency would then be excluded by the strong assumption of locally dictated priorities.

It is arguable that these two assumptions would not hold for the police force case with 43 units and 30 or more outputs, where the output profile shape of any inefficient force is unlikely to be determined by local priorities alone. The prioritization case for DEA might then be seen as a cover for anti-judgmental permissiveness, and Farrell's f as a ‘post hoc definition of allocative efficiency’ (Watkins, 2000). What the DEA technique may be usefully doing is positioning the performance of a unit relative to a frontier that may or may not reflect what can be most efficiently achieved, for which objective Paul Hewson (private communication) has likened the technique to projection pursuit.

8.4. Rank difficulties

In favour of their adoption of the CRS case of the maximum formula of equation (4.1) as an efficiency measure for the case of ‘valued’ outputs of unascertainable value or ‘weight’, Charnes et al. (1978) noted that it is an upper bound—‘no other set of common weights will give a more favorable rating’. What does this upper bound justification imply for any practice that ranks the upper bounds, if it is held that there are some true but unascertained weights? To avoid confusion with the common mathematical usage of the term ‘value’, the Italian valuta (plural valuti) will be used for a true weight: valuta vj will be the (possibly negative) value per unit volume of the jth output. The aggregate value index V for unit i is then

  Vi = v1 yi(1) + … + vs yi(s),    (8.1)

and the associated value efficiency φ for unit g may be defined when maxi(Vi) > 0 as

  φg = Vg / maxi(Vi).    (8.2)

As an upper bound, f CRS = maxw≥0{φg(w)} where φg(w) =def Vg(w)/maxi{Vi(w)}. In terms of V, does 10% for unit g indicate a worse performance than 100% for unit g′? For g, we know only that Vg ≤ Vg*/10 where g* is a ‘peer’ for g with

  Vg*(w(g)) = maxi{Vi(w(g))},    (8.3)

where w(g) maximizes φg(w). If g′ were also a peer for g, equation (8.3) would be relationally informative only if {wj(g)} were thought to be reasonable surrogates for the true but unascertained valuti. Worse, if g′ were not a peer for any unit, its 100% could be justifiably described as self-awarded: like a pupil who comes top in an examination in which he or she is the only candidate.
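
The rank difficulty can be made concrete with two invented specialist units. In the sketch below, each unit self-awards 100% under weights that favour its own speciality (the upper-bound property), yet under one common set of valuti their value efficiencies differ by an order of magnitude.

```python
import numpy as np

# Two single-input units (x = 1 each), two outputs; each specializes.
Y = np.array([[10.0, 1.0],    # unit g': strong in output 1
              [1.0, 10.0]])   # unit g : strong in output 2

def phi(g, w):
    """Value efficiency phi_g(w) = V_g(w) / max_i V_i(w), as in equation (8.2)."""
    V = Y @ w
    return V[g] / V.max()

# Each unit attains 100% under weights favouring its own speciality,
# so the upper bound f_CRS is 1 for both.
own0 = phi(0, np.array([1.0, 0.0]))
own1 = phi(1, np.array([0.0, 1.0]))

# Under one common set of valuti the ranking collapses:
valuti = np.array([100.0, 1.0])   # output 1 valued 100 times output 2
phi0, phi1 = phi(0, valuti), phi(1, valuti)
```

Both units are ‘100% efficient’ by the upper-bound criterion, yet under the common valuti the second unit's value efficiency is about 11%.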

8.5. Variable returns to scale

Based as it is on steps 1 and 2 of Section 2, VRS has no obvious connection with any returns-to-scale concept. The connection was invented by Banker et al. (1984) who set aside the index ratio approach of Charnes et al. (1978) and considered the econometric concepts underlying f CRS. They rejected the rescaling that is associated with CRS and DRS, and retained only mixing and worsening. From this, they derived a characterization of f VRS that was equivalent to the ‘maximum’ of Section 4. Their justification for rejecting rescaling was that the associated extrapolation outside the convex hull determined by the n unit performances would violate Farrell's own maxim that ‘it is far better to compare performances with the best actually achieved than with some unattainable ideal’.

For Banker et al. (1984) the efficiency frontier of interest is ℱ VRS, and the variable u, determined by equation (4.1), provides useful econometric information about the orientation of the efficiency frontier: specifically whether at the efficient performance γ, say, that defines f VRS for a generic unit g, the frontier itself manifests DRS with u>0, increasing returns to scale with u<0, or (exceptionally) CRS with u =0. (One of the figures of Banker et al. (1984) corresponds to our Fig. 4 in which u<0 for the g shown, but where u would be positive if g had been above the section ɛζ.) Scale efficiency S is defined as f CRS/f VRS. (In Fig. 4, f VRS = αγ/αg and f CRS = αβ/αg so S = αβ/αγ.) Farrell's (1957) CRS efficiency is then trivially expressible as a product of ‘scale efficiency’ and VRS efficiency, an idea that has been theoretically elaborated by Appa and Yue (1999).
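
The quantities f CRS, f VRS and S can be computed directly from the two input-oriented envelopment linear programs. The following sketch uses invented single-input, single-output data and scipy's linprog; it illustrates the standard programs, not any particular DEA package.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical single-input, single-output data for two units.
x = np.array([2.0, 4.0])   # input costs
y = np.array([1.0, 3.0])   # output volumes
n = len(x)

def dea_input_oriented(g, vrs=False):
    """min theta s.t. sum_i lam_i x_i <= theta x_g, sum_i lam_i y_i >= y_g,
    lam >= 0, and additionally sum_i lam_i = 1 under VRS.
    Decision variables: (theta, lam_1, ..., lam_n)."""
    c = np.r_[1.0, np.zeros(n)]
    A_ub = np.array([np.r_[-x[g], x],     # sum lam_i x_i - theta x_g <= 0
                     np.r_[0.0, -y]])     # -sum lam_i y_i <= -y_g
    b_ub = np.array([0.0, -y[g]])
    A_eq = np.array([np.r_[0.0, np.ones(n)]]) if vrs else None
    b_eq = np.array([1.0]) if vrs else None
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(0, None)] * (n + 1))
    return res.x[0]

f_crs = dea_input_oriented(0)            # CRS efficiency of unit 1
f_vrs = dea_input_oriented(0, vrs=True)  # VRS efficiency of unit 1
S = f_crs / f_vrs                        # scale efficiency
```

For unit 1 of this toy data set, f CRS = 2/3 and f VRS = 1, so S = 2/3: the unit is technically efficient under VRS but scale inefficient.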

The only DEA ‘model’ in Spottiswoode (2000) is the one that gives VRS efficiencies. The general question of whether or not it is ‘quite simple to allow for diseconomies of scale’ (Farrell (1957), section 2.4), as reflected in the orientation of the efficiency frontier ℱ VRS, is subservient to the question of whether we should do so or not. It is far from obvious that, without extra information, any definition of efficiency should be changed to accommodate apparent decreasing or increasing returns to scale on the frontier. When the size of units is under administrative control, such features of the data might be seen as consequences of inefficiency—in large units when the frontier shows evidence of pronounced DRS; in small units when it shows increasing returns to scale.

8.6. Weight restrictions

Value judgments can confine w in equation (4.1) to a convex polyhedral cone, thereby generalizing any use of a single ray through the origin (equivalent to fixed weights or, with CRS, to fixed valuti in equation (8.1)). Whether a generalization is useful or not depends on how it is exploited. Proponents of DEA claim that using the automatic DEA machinery to pick weights from the cone allows each unit to have its ‘local priorities’ acknowledged. However, there is a crucial difference between an externally validated adjustment for explicitly documented environmentals in a standardized efficiency measure and a covert unit-based allowance for typically undocumented environmentals corresponding to local priorities. Appendix A suggests that the appropriate generalization is a straightforward sensitivity analysis of a common choice of valuti in equation (8.1) in which sensitivity may be a desideratum rather than an embarrassment. The two approaches would be reconcilable if the proponents of DEA made their restrictions sufficiently severe!

9. Prospects

What are the prospects for a VBA of the police force efficiency problem using the value index V with a constant set of valuti across all forces? Not ‘prospects’ in the sense of a successful oil strike, but of successful completion of the hard groundwork on which logically and sociologically acceptable efficiencies can be constructed. When it was announced that US President Coolidge had just died, some wit asked ‘How can they tell?’. The same question will be more difficult to answer if and when some spokesman tells us that such-and-such method has successfully produced a string of percentages that can be used to rank the efficiency of police forces. ‘Success’ will mean that the method has been studied closely by all interested parties, that it has survived an analysis of its internal logic by the breakdown of any technical complexity into comprehensible components and that any value judgments on which the method relies have been established by democratic ‘consultation’.

At the time of writing, the DEA–SFA approach has not been successful in this sense. It is neither too early nor too late to throw VBA into the ring for a competitive consideration, which is what Appendix A does.

Acknowledgements

The following individuals and organizations have helped either knowingly or unwittingly: Juanita Roche (ex Her Majesty's Treasury) for initial impetus; RAS for continuing motivation; staff of the Home Office Research, Development and Statistics: Economics and Resource Analysis Unit for healthy argument and generous access to literature; Ina Dau and Richard Chandler for divine help with ‘emacs’ and LaTeX; Vern Farewell and referees for valuable comments.

Appendix A

Cubbin and Tzanidakis (1998) would dismiss VBA as ‘simple ratio analysis’ in favour of ‘sophisticated mathematical and statistical modelling’. Spottiswoode (2000) also rejected the apparent simplicity of V, preferring DEA with weights ‘reflecting local circumstances and local police plans’ and, it seems, the non-linearity that is in ℱ VRS but not ℱ CRS. In fact, the simplicity of V is only skin deep. Its historical precedents in welfare economics appear to have been blighted by technical problems in the marriage of the science of societal values with the aspiringly more difficult science of economics. The reluctance to engage in the thorny issue of multiperson preferences may have been reinforced by the famous ‘impossibility theorem’ of Arrow (1951)—the apparently unattractive finding that, roughly speaking, dictatorship was the only option for social choice satisfying some simple axioms. However, one form of dictatorship is the benevolent exercise of political will and judgment by a democratically elected government. The idea that some sort of exogenous valuation is necessary to resolve matters surfaces embarrassingly throughout the DEA literature, e.g. in the restricted weights idea of Section 5.2.

There would be at least six interactive steps in any VBA:

  • (a)
    the collection and screening of potentially relevant data;
  • (b)
    the specification of input cost and choice of outputs;
  • (c)
    fixing the valuti;
  • (d)
    the calculation of undoctored value indices V;
  • (e)
    the stratification and/or adjustment of V for environmentals;
  • (f)
    a sensitivity analysis.

Fixing the valuti would be particularly difficult, but there is a vast literature dealing with social welfare functions (Ray, 1984; Mishan, 1988; Mitchell and Carson, 1989) that may be found to be helpful. The following stages of a possible VBA are offered only as a starting-point for discussion.

  • Stage 1: establish societal weights Sj, j = 1,…,s, using a good output y(p) as a pivot for which Sp is set equal to 100.

  • Stage 2: obtain from each unit its own best estimate of how its own input cost, generically x, should be notionally divided as x = x[1] + … + x[s] to represent the internal costs of generating the output volumes.

  • Stage 3: for good outputs only, calculate common cost weights Cj as the medians, or more generally as ‘trimmed means’ (Mosteller and Tukey, 1977), of the n ratios yi(j)/xi[j], scaled so that Cp = 100. For bad outputs, set Cj = 0.

  • Stage 4: for good outputs only, calculate vj(W) = WSj + (1 − W)Cj, where W is a negotiable weight in (0, 1). For bad outputs, set vj(W) = Sj.

  • Stage 5: use {vj(W)} to calculate the value indices Vi, i = 1,…,n, for a range of values of W, and adjust them for environmentals by transparent techniques for each W.

  • Stage 6: calculate the associated value efficiencies φi, i = 1,…,n (Section 8.4), and make a sensitivity analysis of the choice of the societal weights in stage 1, using intervals of uncertainty centred on that choice.

  • Stage 7: negotiate the value of W with unit managers.
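The arithmetic of stages 3–6 can be sketched in a few lines. This is only an illustrative toy: all data are invented, the value index is assumed to take the form Vi = Σj vj(W)yi(j)/xi (value per unit input cost), the pivot is taken as j = 0, and with only three units no trimming is actually applied.

```python
# Toy sketch of stages 3-6 of the proposed VBA (all numbers invented).
def trimmed_mean(xs, trim=0.25):
    """Trimmed mean (Mosteller and Tukey): drop a fraction from each tail."""
    xs = sorted(xs)
    k = int(len(xs) * trim)
    kept = xs[k:len(xs) - k] or xs
    return sum(kept) / len(kept)

# n = 3 units, s = 2 good outputs; y[i][j] = output volumes,
# xsplit[i][j] = the unit's notional division of its input cost (stage 2)
y      = [[40.0, 10.0], [35.0, 12.0], [50.0,  8.0]]
xsplit = [[ 8.0,  4.0], [ 7.0,  6.0], [10.0,  3.0]]
x      = [sum(row) for row in xsplit]          # total input cost per unit
S      = [100.0, 60.0]                         # stage 1: societal weights, pivot j = 0

# Stage 3: common cost weights C_j from the n ratios y_i(j)/x_i[j], scaled so C_p = 100
ratios = [[y[i][j] / xsplit[i][j] for i in range(3)] for j in range(2)]
C_raw  = [trimmed_mean(r, trim=0.0) for r in ratios]   # no trimming with only n = 3
C      = [100.0 * c / C_raw[0] for c in C_raw]

# Stage 4: valuti v_j(W) = W*S_j + (1 - W)*C_j for a negotiable W in (0, 1)
W = 0.5
v = [W * S[j] + (1 - W) * C[j] for j in range(2)]

# Stages 5-6: value indices V_i and value efficiencies phi_i = V_i / max_j V_j
V   = [sum(v[j] * y[i][j] for j in range(2)) / x[i] for i in range(3)]
phi = [Vi / max(V) for Vi in V]
print(V, phi)
```

Note that the pivotal valuto is fixed by construction: since Sp = Cp = 100, vp(W) = 100 for every W, so negotiation over W moves only the non-pivotal weights.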

Remark 1. The weights Sj will be negative for bad outputs.

Remark 2. For bad outputs, the corresponding x[j]-values will be 0! An outcome whose reduction is both valued and associated with unit activity, and hence input cost, can have that reduction counted as a good output. An estimation of input costs will not be easy when the same internal costable activity serves to generate more than one output.

Remark 3. The proposal incorporates the view that, to retain the goodwill of workers in the n units being assessed, it would be unreasonable for purely societal valuti v to be imposed that took little or no account of the relative internal costs of generating different outputs. However, weights that simply reflected estimates of internal unit costs would not give an incentive for units to meet the ‘market pressures’ that are represented by the societal values.

Remark 4. A problem for stage 5 is the possible confounding of efficiencies with environmentals. Consider the (s + t + 1)-dimensional space of input, outputs and environmentals, and imagine that there is no shortage of units—that n is effectively infinite. It would then be possible to use the environmentals z to define thin strata, and to consider variations within individual strata and between strata. Within a stratum, there is no question of a correlation between efficiency and environmentals since the latter are essentially constant, and any measure of efficiency will be determined by the variations in (x, y) for the fixed z. Between strata, the variation in the stratum means of (x, y) or their logarithmic transforms must reflect either a necessary ‘technical’ dependence on environmentals at a constant efficiency level or some relationship between efficiency and environmentals. Such insight may clarify logic, but it does not resolve the problem of how to do the calculations when there are only 43 units spread over at least 40 dimensions (with t = 9, 40 is the barely realistic minimum). There are two general ideas that may help out: unit subdivision and cross-validation. Farrell's (1957) units were states of the USA, for which the data may have been available at the much smaller county level. Our police forces occupy, on average, a 43rd of the area of England and Wales, but data can be recorded at the level of the smaller basic command unit. Variations between basic command units within forces may be more informative about the technical dependence of performance on environmentals than about efficiency, given that efficiency may be fairly homogeneous in the centrally organized body that is a police force. For fitting the equations, the technique of cross-validatory choice (Stone, 1974) of an adjustment formula might be used as a model-free device for controlling overadjustment.
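The stratification logic of this remark can be made concrete with a toy sketch. All data are invented; a single environmental z is split into crude halves, whereas genuinely ‘thin’ strata would need far more units. Within a stratum relative performance reflects efficiency alone; between strata, differences in mean performance may reflect z itself rather than efficiency.

```python
# Toy illustration of the within-stratum / between-strata distinction of Remark 4.
# units: (id, environmental z, value index V) -- all invented.
from collections import defaultdict

units = [
    ("A", 0.10, 0.92), ("B", 0.15, 0.80), ("C", 0.90, 0.55),
    ("D", 0.85, 0.70), ("E", 0.12, 0.88), ("F", 0.95, 0.60),
]

# Stratify on z (crude halves here; 'thin strata' presuppose many units)
strata = defaultdict(list)
for uid, z, V in units:
    strata[z > 0.5].append((uid, V))

# Within a stratum z is (near-)constant, so the best V marks the most efficient unit
within = {s: max(members, key=lambda m: m[1])[0] for s, members in strata.items()}
# Between strata, differences in mean V may be 'technical' dependence on z, not efficiency
between = {s: sum(V for _, V in m) / len(m) for s, m in strata.items()}
print(within, between)
```

In this invented example the high-z stratum has a markedly lower mean V, which on its own cannot distinguish a harsh environment from low efficiency — exactly the confounding that the remark describes.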

Remark 5. The posterior adjustment of V by z makes statistical sense because of the intrinsic character of V: each unit has its own z to be considered as influential for that unit alone. The same cannot be said for f, whose interactive character allows a unit's f to be, perhaps strongly, influenced by the environmentals of other units.

References

  • Aigner, D., Lovell, C. A. K. and Schmidt, P. (1977) Formulation and estimation of stochastic frontier production function models. J. Econometr., 6, 21–37.
  • Allen, R., Athanassopoulos, A., Dyson, R. G. and Thanassoulis, E. (1997) Weights restrictions and value judgements in data envelopment analysis: evolution, development and future directions. Ann. Oper. Res., 73, 13–34.
  • Appa, G. and Yue, M. (1999) On setting scale efficient targets in DEA. J. Oper. Res. Soc., 50, 60–69.
  • Arrow, K. J. (1951) Social Choice and Individual Values. New York: Wiley.
  • Association of Police Authorities (1999) Pounding the Beat: a Guide to Police Finance in England and Wales. London: Association of Police Authorities.
  • Banker, R. D., Charnes, A. and Cooper, W. W. (1984) Some models for estimating technical and scale inefficiencies in Data Envelopment Analysis. Mangmnt Sci., 30, 1078–1092.
  • Banker, R. D. and Morey, R. C. (1986) Efficiency analysis for exogenously fixed inputs and outputs. Oper. Res., 34, 513–521.
  • Bauer, P. W. (1990) Recent developments in the econometric estimation of frontiers. J. Econometr., 46, 39–56.
  • Charnes, A., Cooper, W. W. and Rhodes, E. (1978) Measuring the efficiency of decision making units. Eur. J. Oper. Res., 2, 429–444.
  • Coelli, T. (1996) A guide to DEAP version 2.1: a data envelopment analysis (computer) program. Working Paper 96/08. Department of Econometrics, University of New England. (Available from www.une.edu.au/econometrics/cepa.htm.)
  • Cook, R. J. and Farewell, V. T. (1996) Multiplicity considerations in the design and analysis of clinical trials. J. R. Statist. Soc. A, 159, 93–110.
  • Cooper, W. W., Seiford, L. M. and Tone, K. (2000) Data Envelopment Analysis: a Comprehensive Text with Models, Applications, and References. Boston: Kluwer.
  • Cox, D. R., Fitzpatrick, R., Fletcher, A. E., Gore, S. M., Spiegelhalter, D. J. and Jones, D. R. (1992) Quality-of-life assessment: can we keep it simple (with discussion)? J. R. Statist. Soc. A, 155, 353–393.
  • Cubbin, J. and Tzanidakis, G. (1998) Regression versus data envelopment analysis for efficiency measurement: an application to the England and Wales regulated water industry. Util. Poly, 7, 75–85.
  • Debreu, G. (1951) The coefficient of resource utilization. Econometrica, 19, 273–292.
  • Department of the Environment, Transport and the Regions (1999) Performance indicators for 2000/2001. Department of the Environment, Transport and the Regions, London. (Available from www.localregions.detr.gov.uk/bestvalue/bvindex.htm.)
  • Färe, R., Grosskopf, S. and Lovell, C. A. K. (1994) Production Frontiers. Cambridge: Cambridge University Press.
  • Farrell, M. J. (1957) The measurement of productive efficiency (with discussion). J. R. Statist. Soc. A, 120, 253–290.
  • Foucault, M. (1973) The Order of Things: an Archaeology of the Human Sciences. New York: Vintage.
  • Greene, W. H. (1995) LIMDEP7 Reference Manual. New York: Econometric Software.
  • Jondrow, J., Lovell, C. A. K., Materov, I. S. and Schmidt, P. (1982) On the estimation of technical efficiency in the stochastic frontier production function model. J. Econometr., 19, 233–238.
  • Koopmans, T. C. (ed.) (1951) Activity Analysis of Production and Allocation. New York: Wiley.
  • Lewis, S. (1986) Measuring output and performance: Data Envelopment Analysis. Report. Her Majesty's Treasury's Public Expenditure Survey Committee, Development Sub-Committee, London.
  • Mishan, E. J. (1988) Cost–Benefit Analysis: an Informal Introduction. London: Routledge.
  • Mitchell, R. C. and Carson, R. T. (1989) Using Surveys to Value Public Goods: the Contingent Valuation Method. Washington: Resources for the Future.
  • Mosteller, F. and Tukey, J. W. (1977) Data Analysis and Regression. Reading: Addison-Wesley.
  • Norman, M. and Stoker, B. (1991) Data Envelopment Analysis: the Assessment of Performance. Chichester: Wiley.
  • Nunamaker, T. R. (1985) Using Data Envelopment Analysis to measure the efficiency of non-profit organizations: a critical evaluation. Mang. Decsn Econ., 6, 50–58.
  • Osborne, D. and Gaebler, T. (1992) Reinventing Government. Reading: Addison-Wesley.
  • Pedraja-Chaparro, F., Salinas-Jimenez, J. and Smith, P. (1997) On the role of weight restrictions in data envelopment analysis. J. Product. Anal., 8, 215–230.
  • Ray, A. (1984) Cost–Benefit Analysis: Issues and Methodologies. Baltimore: Johns Hopkins University Press.
  • Rényi, A. and Sulanke, R. (1963) Über die konvexe Hülle von n zufällig gewählten Punkten. Z. Wahrsch. Ver. Geb., 2, 75–84.
  • Schmidt, P. (1985) Frontier production functions. Econometr. Rev., 4, 289–355.
  • Senn, S. (1989) Combining outcome measures: statistical power is irrelevant. Biometrics, 45, 1027–1028.
  • Shephard, R. W. (1970) The Theory of Cost and Production Functions. Princeton: Princeton University Press.
  • Smith, P. (1990) The use of performance indicators in the public sector. J. R. Statist. Soc. A, 153, 53–72.
  • — (1993) Outcome-related performance indicators and organizational control in the public sector. Br. J. Mangmnt, 4, 135–151.
  • Spottiswoode, C. (2000) Improving police performance: a new approach to measuring police efficiency. Report. Public Services Productivity Panel, Her Majesty's Treasury, London. (Available from www.hmtreasury.gov.uk/pspp/studies.html.)
  • Stone, M. (1974) Cross-validatory choice and assessment of statistical predictions (with discussion). J. R. Statist. Soc. B, 36, 111–147.
  • Thanassoulis, E. (1995) Assessing police forces in England and Wales using Data Envelopment Analysis. Eur. J. Oper. Res., 87, 641–657.
  • Thanassoulis, E., Dyson, R. G. and Foster, M. J. (1987) Relative efficiency assessments using Data Envelopment Analysis: an application to data on rates departments. J. Oper. Res. Soc., 38, 397–411.
  • Watkins, C. J. C. H. (2000) Report of the July 2000 meeting of the Official Statistics Section of the Royal Statistical Society. RSS News, 28, no. 1, 19.

Discussion on the paper by Stone


Peter C. Smith (University of York)

It is an honour to propose the vote of thanks for Professor Stone's paper. The problem that it addresses can be summarized simply. Using the paper's notation, it is to indicate a measure of efficiency by constructing an index V where, for observation i,

  • Vi = Σj vj yi(j)/xi,

if necessary taking into account the environmental circumstances in which the unit of observation must operate.

The paper rightly alludes to vexed questions concerning the legitimacy and the advisability of constructing such an index, and the intellectual rationale underlying the endeavour. However, once the decision to construct the index has been taken, two key issues must be addressed. Which measures of output y should be chosen? And what weights v should be used? The productivity analysis research industry seeks to provide technical solutions to these problems. However, I wish to argue that these questions are essentially political rather than technical issues.

Spottiswoode's (2000) report, which motivated Professor Stone's paper, emanated from the influential Public Services Productivity Panel and was warmly endorsed by the Chief Secretary to the Treasury. It recommended two specific technical solutions to calculating V : data envelopment analysis (DEA), which allows the weights v to vary freely between units, and stochastic frontier analysis (SFA), which—for each output—uses as a weight a statistical estimate of the sample average cost of securing an extra unit of output. Underlying these beguilingly simple constructs are some profound methodological difficulties, which the paper does a masterful job in summarizing.

DEA is easy to use and requires no specification of functional form. But it offers no guidance on the quality of the model specification. Have the ‘right’ outputs been chosen (Smith, 1997)? And are the weights that are implicit in the analysis acceptable (Pedraja-Chaparro et al., 1997)? Everything is left to the judgment of the analyst. SFA requires a choice of a functional form. It can then offer some guidance on model choice in the form of the usual parametric selection and specification tests, but much is still left to judgment. To demonstrate the scale of the problem, Table 1 reproduces from Jacobs (2001) the rank correlation coefficients between the efficiency rankings for five DEA models and their SFA counterparts for 232 National Health Service hospitals. It is possible to offer plausible justification for adopting any of these models, yet their policy messages differ profoundly. How are we to choose our recommended model?

Table 1.  Pearson correlation coefficients of results for 232 hospitals from five DEA specifications and their SFA counterparts†

         DEA-1   DEA-2   DEA-3   DEA-4   DEA-5   SFA-1   SFA-2   SFA-3   SFA-4   SFA-5
DEA-1   1.0000
DEA-2   0.2298  1.0000
DEA-3   0.3729  0.6340  1.0000
DEA-4   0.7575  0.3513  0.5372  1.0000
DEA-5   0.4722  0.6062  0.8352  0.6149  1.0000
SFA-1   0.4274  0.4667  0.5946  0.5166  0.5756  1.0000
SFA-2   0.0957  0.6209  0.4231  0.1831  0.4038  0.6354  1.0000
SFA-3   0.2154  0.4318  0.5975  0.3165  0.4852  0.8297  0.6917  1.0000
SFA-4   0.4192  0.4835  0.6583  0.5543  0.5998  0.8763  0.6815  0.8065  1.0000
SFA-5   0.3399  0.5195  0.6557  0.4633  0.6343  0.9496  0.6535  0.8731  0.8217  1.0000

†Source: Jacobs (2001). Figures in bold refer to models based on identical data.
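Rank correlations of the kind reported in Table 1 are easy to reproduce for any pair of model outputs. A minimal pure-Python sketch of Spearman's coefficient follows; the efficiency scores are invented and ties are assumed absent.

```python
# Spearman rank correlation between two efficiency scorings (no ties assumed).
def ranks(xs):
    """Rank 1 = smallest value, n = largest."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0] * len(xs)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

def spearman(a, b):
    """1 - 6*sum(d^2) / (n*(n^2 - 1)), the classical no-ties formula."""
    n = len(a)
    ra, rb = ranks(a), ranks(b)
    d2 = sum((x - y) ** 2 for x, y in zip(ra, rb))
    return 1 - 6 * d2 / (n * (n * n - 1))

dea = [0.91, 0.75, 0.60, 0.88, 0.52]   # invented DEA efficiencies for 5 units
sfa = [0.85, 0.70, 0.66, 0.90, 0.50]   # invented SFA efficiencies
print(spearman(dea, sfa))              # -> 0.9
```

A table like Table 1 is then just this function applied to every pair of model score vectors; the point of the table is that plausible specifications can produce coefficients anywhere from near 0.1 to above 0.9.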

Similar results arise when we confront technical choices concerned with the transformation of outputs, treatment of environmental factors and missing data, and the use of panel data. To understand the scope for debate, we have only to examine the response to the ‘World health report 2000’ produced by the World Health Organization (2000). This document used productivity models to rank the health systems of 191 countries and has generated an intense technical and policy exchange that shows no sign of abating (Williams, 2001).

In short, numerous technical judgments must be made in the application of productivity techniques, and these judgments seriously affect policy conclusions. This is not to say that methods such as DEA and SFA cannot offer useful insights into the characteristics of a complex data set. Indeed I believe that their careful deployment can often form an important element in the development of evidence-based policy. However, any particular model specification can be readily challenged, and there is only a remote possibility of being able to come to a definitive judgment on performance by using technical apparatus alone.

So do we throw up our hands and say that it is all impossible, or do we seek out some other way forward? Professor Stone advocates an approach which he characterizes as ‘value-based analysis’. The idea is simple. It retains the formulation of the efficiency index given above but advocates the use of ‘societal weights’ rather than a set of weights emerging from arcane technical analysis. Yet how are these weights to be determined? The objectives and priorities that are attached to public services vary enormously between individuals, and there is no golden rule to say that some societal measure of central tendency is in any sense correct.

In my view the choices of outputs and weights are essentially political rather than technical problems. Political choices in this domain can of course be guided by technical analysis, in particular the use of well-designed population surveys. But in the end it is the job of elected politicians, rather than statisticians, to reconcile the often diverse popular views concerning the objectives and priorities of public services.

The index of efficiency discussed here passes a stark finality of judgment on public service organizations. I believe strongly that—at least in the criminal justice sector—its construction is a legitimate and important undertaking. But if done properly it requires clarity in the definition of political priorities, and—as Benjamin Disraeli noted—‘finality is not the language of politics’. The Spottiswoode report and many similar endeavours betray a regrettable inclination to shy away from tough political choices, and to hide political values behind a technical smoke-screen. Professor Stone has succeeded admirably in exposing the confused thinking underlying this trend in public policy. His paper makes an important contribution to the public debate and it gives me great pleasure to propose the vote of thanks.

Andrew Chesher (University College London)

Professor Stone's masterly exposition of Michael Farrell's work on the measurement of technical efficiency and of the stochastic frontier analysis (SFA) approach to the problem is timely. (There is an interesting account of the history of the development of data envelopment analysis (DEA) in Førsund and Sarafoglou (2000).) The questions that he raises about the suitability of these methods for measuring the efficiency of delivery of public services are most pertinent.

The focus of Professor Stone's paper is the measurement of efficiency in the public sector. Similar issues arise in the regulated private sector. For example to inform recent price control reviews Oftel has commissioned studies employing DEA and SFA to measure the efficiency of production of fixed line telephone services, comparing British telephone service providers with around 50 US fixed line operators.

Here, as in public sector applications, the samples are small, the data are contaminated by measurement error and transitory variation, and the accuracy of estimates may be low and poorly captured by conventional summary statistics developed from asymptotic approximations.

Crucially, the identification of policy relevant magnitudes rests on very fragile foundations, as Professor Stone's critique of SFA makes clear. Identification in the context of DEA is particularly problematic since DEA is not usually cast in the context of a model of behaviour and a model of data generation in which policy relevant magnitudes can be defined and the way in which data are informative about them can be understood.

Farrell (1957) considered only productive efficiency, giving no attention to the issue of efficiency in the choice of amounts to produce. In the context of public service provision this choice is important, because incentive mechanisms aimed at promoting productive efficiency may influence the choices of amounts of outputs to produce.

In the private sector, where prices clear markets, prices serve as signals of consumers’ valuations of outputs at the margin. In the public sector signals do not usually come via prices. They must be provided by the Government. Let w(y) be the Government's valuation of an output vector y. Then, from the Government's point of view, optimal production is

  • y* = arg maxy {w(y) − x(y)}

where x(y) is the minimum cost function. (In practice this optimization is likely to be subject to a budget constraint, C(y) ≤ C*.)

In this context a meaningful measure of the ‘inefficiency’ of a public service provider, i, facing cost function xi(y) and producing an output vector yi is

  • {w(y*) − x(y*)} − {w(yi) − xi(yi)}    (1)

Measured by this yardstick, rankings of service providers facing the same value and minimum cost functions (w(⋅) and x(⋅)) depend only on w(yi)− xi(yi). The minimum cost function is irrelevant unless we wish to know the extent of the loss caused by inefficiency and/or the cause of inefficiency. The latter is addressed by noting that the inefficiency measure can be decomposed as

  • [{w(y*) − x(y*)} − {w(yi) − x(yi)}] + [xi(yi) − x(yi)]    ((2a), (2b))

The two terms (2a) and (2b) are necessarily non-negative, measuring the loss due to respectively

  • (a)
    the suboptimal choice of yi and
  • (b)
    inefficient production of the potentially suboptimal yi .

DEA and SFA aim at measuring term (2b).
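The decomposition into terms (2a) and (2b) can be checked numerically. The quadratic valuation, the linear cost functions and all numbers below are invented purely for illustration:

```python
# Numerical check of the loss decomposition:
#   L = [w(y*) - x(y*)] - [w(y_i) - x_i(y_i)]
#     = {[w(y*) - x(y*)] - [w(y_i) - x(y_i)]}   # (2a): suboptimal choice of y_i
#     + {x_i(y_i) - x(y_i)}                     # (2b): inefficient production of y_i
w  = lambda y: 10 * y - 0.5 * y * y     # government's valuation (invented)
x  = lambda y: 2 * y                    # minimum cost function (invented)
xi = lambda y: 2.5 * y                  # provider i's cost function (inefficient)

y_star = 8.0    # argmax of w(y) - x(y): w'(y) = 10 - y = 2 = x'(y)  =>  y = 8
y_i    = 5.0    # provider i's (suboptimal) output choice

L   = (w(y_star) - x(y_star)) - (w(y_i) - xi(y_i))
t2a = (w(y_star) - x(y_star)) - (w(y_i) - x(y_i))
t2b = xi(y_i) - x(y_i)
print(L, t2a, t2b)    # L = t2a + t2b, both terms non-negative
```

In this toy case the total loss of 7 splits into 4.5 from the suboptimal output choice and 2.5 from inefficient production — and, as the text notes, only the second component is what DEA and SFA try to measure.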

Professor Stone's value-based analysis (VBA) incorporates societal valuations of outputs and should be judged by considering its effectiveness in capturing the inefficiency measure (1). Let v(W) = {vj(W)}, j = 1,…,s, denote the ‘valuti’ that appear in the VBA (recall that they depend on a ‘negotiated’ constant W ∈ (0, 1)) and suppose that the Government's value function is linear in output: w(y) = v(W)y. Then the inefficiency measure in equation (1) takes the form

  • {w(y*) − x(y*)} − {v(W)yi − xi(yi)}

and rankings by this measure of service providers facing the same value and minimum cost functions are identical with rankings by v(W)yi − xi(yi), with the top performing service provider having the largest value of v(W)yi − xi(yi).

In this setting Professor Stone's value efficiency measures are φi = Vi/maxj(Vj) where

  • Vi = v(W)yi/xi

and as they are calculated on a per unit achieved cost basis they will generally yield rankings that differ from those generated by v(W)yi − xi(yi). Ranking becomes problematic when comparing service providers facing different value and minimum cost functions, perhaps because they operate in different geographical regions, but could perhaps be based on the contributions that each v(W)yi − xi(yi) makes to their total across all service providers.
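The claim that per-unit-cost rankings and net-value rankings generally disagree is easy to illustrate with a toy two-provider example (a single output, a linear valuation, and all numbers invented):

```python
# Two providers: ranking by value per unit cost (V_i = v*y_i/x_i)
# versus ranking by net value (v*y_i - x_i).  All numbers invented.
v = 1.0                  # linear valuation of a single output
y = [10.0, 100.0]        # outputs of providers 0 and 1
x = [4.0, 50.0]          # achieved input costs

V   = [v * y[i] / x[i] for i in range(2)]    # per-unit-cost index: [2.5, 2.0]
net = [v * y[i] - x[i] for i in range(2)]    # net value:          [6.0, 50.0]

best_by_V   = max(range(2), key=lambda i: V[i])    # provider 0 wins on V
best_by_net = max(range(2), key=lambda i: net[i])  # provider 1 wins on net value
print(best_by_V, best_by_net)
```

The small provider squeezes more value out of each pound, yet the large provider delivers far more net value in total — the two criteria crown different winners, which is precisely the divergence at issue.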

Both the VBA-based value efficiency measures, Vi and Professor Stone's φi, on their own provide no information about the extent of the loss due to inefficiency or about the cause of inefficiency. To address these issues the minimum cost function x(y) must be estimated. This is what DEA and SFA attempt to do. Research effort should be devoted to improving on these methods.

A careless use of efficiency measures as the basis for rewards or penalties may lead to suboptimal output choices. There will generally be underproduction of outputs that are omitted from the value function w(y). Even if all outputs are included, if service providers do not face the minimum cost function then there will be a tendency to specialize in the production of outputs that providers can produce relatively efficiently and service may be withdrawn from users who are costly to serve. Universal service obligations placed on regulated private utilities are one response to the latter problem.

Briefly, on one statistical issue, there are now bootstrap-based inferential methods for DEA (e.g. Simar and Wilson (2000a,b)). The identification problem which plagues SFA may be eased with repeated observations, either gathered through time, if inefficiency is time invariant within enterprises, or at a point in time within subunits of the units being considered.

Professor Stone's paper raises serious questions regarding the usefulness of DEA and SFA methods in measuring the efficiency of public service provision. His paper will stimulate research in this topical and very important area and I have much pleasure in seconding the vote of thanks.

The vote of thanks was passed by acclamation.

James Foreman-Peck (Her Majesty's Treasury, London)

Professor Stone advocates value weights, ‘valuti’, instead of weights chosen by data envelopment analysis or stochastic frontier analysis for indices of public service efficiency. This position seems more justified in the case that he examines, where there are multiple (public service) outputs, than in Farrell's original example, where there were many inputs and one output. In the first it is more reasonable to suppose that ‘society’ would or should have such weights. The second is a technical matter, where a variety of processes may be appropriate to different circumstances, perhaps best understood by the units under study themselves.

A very simple diagrammatic representation of the value point can be made by considering two outputs of policing—crime prevention and crime detection—and one input—police time (Fig. 6). The frontier XX shows the maximum that can be achieved from given police time. A technically efficient police force would ensure that reallocating police time between the two activities could not increase one without reducing another. The gradient of the frontier indicates how much of one activity must be given up to achieve an increase in the other by reallocating labour. Without output ‘prices’ or values there is no way of judging whether any such shift would be advantageous or not.

Figure 6. Social preferences and police efficiency

If a police force were ‘socially efficient’, the marginal social value of a police officer in crime prevention should be equal to that in criminal detection. To calculate such efficiency some form of value weight is necessary to convert ‘crimes prevented’ into ‘crime-detected equivalents’—otherwise the two indices are incommensurable. These values are Professor Stone's valuti. Social preferences based on these valuti, and higher valued preferences, are represented respectively as YY and Y′′Y′′′ in Fig. 6.

Police force B would be identified as technically efficient by a frontier method. Police force A would be technically inefficient by comparison. But ‘society’ prefers the inefficient police force; A is on a higher valued preference curve.

Much improvement of public services, however, depends on establishing the effectiveness of different input combinations for achieving a given output, e.g. various mixes of police foot and car patrols in the prevention of burglaries. This is a matter of what works in different circumstances, not a question of relative values. In such instances data envelopment analysis and stochastic frontier analysis may be helpful.

S. M. Rizvi (London)

Firstly I congratulate Professor Stone on this intellectually vibrant paper. However, I believe that staff costs, operating costs and capital consumption costs as inputs (Section 7) will not yield very elegant results, especially when we undertake international comparisons. I suggest that total police hours worked would be a better input variable. Moreover, for a more insightful analysis, the total police hours worked could be divided into hours consumed for administration purposes, namely, recording crimes etc., and those relating to the maintenance of law and order, and solving crimes. With regard to the environmental variables, I would have thought that the number of young men out of work would yield more insightful results than simply the number of young men, since those in work would not have much time to indulge in criminal activities, unless crimes are committed in a state of drunken stupor after office hours. Finally, the author should also consider the influence of violence portrayed on television on the actual violence in society.

V. T. Farewell (Medical Research Council Biostatistics Unit, Cambridge)

First let me say how pleased I am that Professor Stone has been able to present his paper. I congratulate him on both the paper and his perseverance in the clarification of issues in the measurement of efficiency for public services.

I chaired the Royal Statistical Society's Official Statistics Section meeting which was prompted by Spottiswoode's (2000) report. At that time, I summarized the situation as being that the report advocated data envelopment analysis and stochastic frontier analysis models with some minimal acknowledgement of ‘dissenting advice’ but some individuals felt that a wider discussion than that enabled by the report would be valuable. We have, perhaps, not moved far from this position. Thus I welcome this meeting.

I should like to suggest that it may be appropriate to take some reasonably large steps backwards before proceeding too much further forwards in this area. For example, there is surely a need for a fairly extensive preliminary data analysis of any variables which are to be used in efficiency measures and this should be in the public domain.

From a more methodological perspective, general considerations of multiplicity are relevant. Data envelopment analysis and stochastic frontier analysis appear to be rather extreme examples of the ‘summary measure’ approach to multiplicity. What consideration has been given to others, in particular to the use of marginal procedures which retain the individuality of some responses? For the reporting of clinical trials in rheumatoid arthritis, the use of five outcome measures is recommended as no single summary measure captures the complexity of response to treatment. The measurement of the efficiency of a police force is surely at least as complicated. Substantial input from police forces should help to direct thinking in this area.

The valuable work by Goldstein and Spiegelhalter (1996) on institutional rankings should be considered in light of the possible end use of efficiency measures. Also, since the measurement of outcomes of primary interest is difficult, the potential pitfalls of a reliance on surrogate measures deserve discussion.

Finally, there should be published comparisons of any suggested procedure for efficiency measurement with a variety of others. Professor Stone's value-based analysis should be pursued. However, even simpler approaches, such as O'Brien's (1984) procedure which simply sums the ranks when a large number of outcome measures are involved, are transparent and could be the basis of informative comparisons.
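O'Brien's (1984) rank-sum procedure mentioned above is indeed transparent: rank the units on each outcome measure separately, then sum each unit's ranks. A minimal sketch follows, with invented outcome data (higher is better, no ties):

```python
# O'Brien-style rank sum: rank units on each outcome separately, then sum the ranks.
def ranks(xs):
    """Rank 1 = worst, n = best (no ties assumed)."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0] * len(xs)
    for rank, i in enumerate(order, start=1):
        r[i] = rank
    return r

outcomes = [             # rows = outcome measures, columns = 3 units (all invented)
    [3.0, 9.0, 5.0],     # e.g. detections per officer
    [1.0, 2.0, 8.0],     # e.g. public satisfaction score
    [4.0, 6.0, 2.0],     # e.g. response-time score
]
rank_sums = [sum(col) for col in zip(*(ranks(row) for row in outcomes))]
print(rank_sums)         # one summary number per unit; larger = better overall
```

The appeal is that every step is auditable by a unit manager, which makes the procedure a natural benchmark against which DEA, SFA or a VBA could be compared.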

D. R. Cox (Nuffield College, Oxford)

Professor Stone has produced a searching and original paper on an important topic.

A small technical point is that if one were to use stochastic frontier analysis, and I totally share Professor Stone's general reservations, then even in the most favourable circumstances considerable care would be needed with the statistical analysis. Barndorff-Nielsen and Cox (1994), page 110, showed that even in a highly simplified version of the problem the likelihood function, although not technically irregular, is unlikely to be well behaved.

On a much broader and more important aspect, it might be argued that data envelopment analysis will give valuable answers provided that the number of performance measures (output variables) is kept very small. This has major disadvantages even within the context of finding a single measure of efficiency but more broadly can be very dangerous. Performance measures clearly have other possible objectives, e.g. as tools for local management of various kinds, including steering an organization towards certain objectives, and the provision of information for the public. All this will tend to point towards multidimensionality combined with much more use of sampling to reduce burden and a careful analysis for rational dimension reduction. This is not least to avoid such situations as the ranking of hospitals on the basis of criteria that do not include an assessment of the success of the care provided or of university teaching departments without considering what is taught and the attractiveness and success of the teaching. If a one-dimensional score is essential then the method of Professor Stone's Appendix A seems appealing with, however, quite frequent updating of the weights.

Juanita Roche (Richmond)

The UK Government recognized its need for more input from independent statistical experts at least by mid-1999, when the Performance and Innovation Unit began a review of ‘quantitative analysis and modelling in central government’. Their report concluded that government needed, above all, to make ‘better use of links to the academic world’ (http://www.cabinetoffice.gov.uk/innovation/2000/adding/coiaddin.pdf, page 70).

More recently, the Civil Service Commissioners published their 2000–2001 annual report, which described an appeal from a civil servant regarding statistics on progress towards a Government target. They concluded that there is

‘an obligation on civil servants to take reasonable measures to ensure that the way that they present data [to Parliament and the public] does not have the effect of being deceptive or misleading’,

and that the measures required are ‘independent scrutiny… [and] appropriate professional advice… on the presentation of statistics’ (http://www.civilservicecommissioners.gov.uk/documents/annual/cscrep00.pdf, pages 26−27).

The experience of Professor Stone, and others, in advising the Treasury on its proposal for measuring the efficiency of police forces suggests that the Government continues to have great difficulty in accepting independent advice and scrutiny regarding statistics. The new framework for National Statistics simply codifies the problem. Ministers will decide what counts as a ‘national statistic’ and is therefore subjected to quality assurance; and it appears that much regarding the performance measurement of public services will be excluded.

There is no kind of national statistics of greater interest to the public and Parliament than statistics on the performance of public services. Indeed, no other kind of statistics is so central to the functioning of Parliamentary democracy. If Parliament and the public cannot be confident of the quality of the statistics that they receive on the performance of Government and public services, then they cannot exercise effective oversight of Government.

It is as inadvisable to allow governments to present accounts of their performance in whatever form they like, with no independent scrutiny, as it would be to allow companies to present accounts of their performance in whatever form they liked, and unaudited. The Statistics Commission must exercise its power to recommend legislation to secure a truly independent framework for National Statistics, on the grounds of Parliament's right to determine what information it and the public require from the Government and with what assurance of validity. In the meantime, the statistics community must do everything that it can to help Parliament and the public to evaluate critically all the statistics presented to them by the Government.

Chris Tofallis (University of Hertfordshire Business School, Hatfield)

Mervyn Stone does a great service in drawing the attention of the statistics community to the area of assessing efficiency. Despite the fact that Farrell's seminal work of the 1950s appeared in the Journal of the Royal Statistical Society, statisticians have done little to further this work. Most research on data envelopment analysis (DEA) appears in the literature of operational research or management science, whereas research on stochastic frontier analysis appears mainly in the econometrics literature. In both cases very little of it is critical. This may be because papers offering critiques are refereed by the people whose work is being criticized, and so they never see the light of day. Perhaps that is where the statistics community may play a useful role in being both critical and constructive, and this paper offers a clear example.

It includes a very clear section on ‘feasible performances’; this is the ‘production possibility set’ of efficiency theory. By emphasizing the three operations of mixing, worsening and rescaling, comprehension of how such sets are defined is greatly assisted.

Professor Stone is rightly concerned by the fact that when using DEA the discrimination between the units being compared may be very low when the number of output variables is high. Yet users of statistical methods face similar problems. There are things that we can do to help to improve discrimination in DEA. An obvious one is to aggregate some of the variables. Secondly, we can increase the number of data points by including data from two or more time periods. This simultaneously allows progress to be measured at individual units as well as more clearly delineating the set of feasible performances.

Allowing a single figure to represent the performance of a unit involved in a variety of outputs is a summary statistic in extremis. Moreover a DEA score of 100% does not mean that a particular unit is outstanding in all areas. Thus a different approach to improving discrimination is to apply DEA to subsets of variables to obtain an ‘efficiency profile’ for each unit (Tofallis, 1997). Each subset may represent a particular aspect of performance—these might be things that appear in the mission statement. Such a profile more easily identifies areas of weakness within each unit.
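A minimal sketch of such a profile: with a single input and constant returns to scale, the DEA efficiency of a unit on a single-output subset reduces to its output-per-unit-spend ratio relative to the best such ratio (larger subsets require a full linear program). The forces and figures below are invented.

```python
# Efficiency profile over singleton output subsets, in the spirit of
# Tofallis (1997).  With one input and constant returns to scale, the
# DEA score on a single output is just the unit's output/input ratio
# divided by the best such ratio observed in the data.

def profile(inputs, outputs):
    """inputs: name -> expenditure; outputs: name -> {aspect: level}.
    Returns name -> {aspect: single-output DEA efficiency in (0, 1]}."""
    aspects = next(iter(outputs.values()))
    prof = {name: {} for name in inputs}
    for aspect in aspects:
        best = max(outputs[n][aspect] / inputs[n] for n in inputs)
        for n in inputs:
            prof[n][aspect] = (outputs[n][aspect] / inputs[n]) / best
    return prof

spend = {"F1": 100.0, "F2": 80.0, "F3": 120.0}   # hypothetical budgets
outs = {
    "F1": {"burglary": 50, "violence": 20},
    "F2": {"burglary": 32, "violence": 24},
    "F3": {"burglary": 66, "violence": 12},
}
for name, p in profile(spend, outs).items():
    print(name, {a: round(e, 2) for a, e in p.items()})
```

Each aspect has at least one 100% force, a force that is weak everywhere shows up immediately, and the full multi-output DEA score of a force is bounded below by the largest entry in its profile.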

Finally, the very generous approach that DEA adopts in attaching weights means that it is more useful in identifying poor performance than good. It is a method that bends over backwards to make each unit appear in a good light. If, despite this effort, the resulting score is still poor then we have strong evidence that the unit in question appears to be a laggard and needs further investigation. However, DEA can underestimate efficiency if there is a convex region in the underlying production function.

Jane Galbraith (London School of Economics and Political Science)

Professor Stone's admirable paper does not suggest that data envelopment analysis would ever be appropriate for measuring the efficiency of public services, but it does indicate clearly that it is particularly inappropriate if the number of output variables is large. Unfortunately the smaller the number of output variables the greater the risk that the units being assessed will manipulate them to increase their score on the performance indicator.

Where this is done by creative accounting (e.g. by changing the ways that reported crimes are categorized) little harm may be done. But where resources are reallocated to improve the specific output variables this can be to the detriment of the overall provision of service. In either case the performance indicator's validity will be undermined.

Therefore it is important in devising a performance indicator that the output variables are chosen so that, when units try to optimize their scores, not too much harm is done to the service and the indicator retains some validity.

Ben Torsney (University of Glasgow)

The illustration considered in the paper focused on seeking an efficiency measure in respect of the provision of police services in England and Wales alone. Of course the Treasury has no direct responsibility for provision in Scotland, so such an exclusion is forgivable, and, indeed, could possibly be exploited.

The issue that I wish to address concerns the fact that all police forces in England and Wales are presumed to be included in the analysis—a population rather than a sample.

Whatever method of analysis is adopted, a fundamental question is what does residual variation represent? It could possibly be variation explained by non-included factors or year-to-year variation.

However, I think that there is scope for a more substantive consideration of the issue. It is one which I face in a study of outreach provision of health services at health centres (from which one or more general practitioner practices can operate) in Scotland. This includes the provision of electrocardiogram or X-ray equipment or specialist consultant clinics; see Milne and Torsney (1992, 1993, 1994, 1997, 2001) and Torsney and Milne (1999).

A model for a binary response is needed here and potential explanatory variables are available, but on what basis can inferences about parameters be judged? I have no great wisdom to offer here. One possibility might be a superpopulation version of whatever model or method is adopted. Does the author have any advice on this?

Of course, if the observed units do represent a sample from a wider grouping, this issue is less pressing. For example, if the English and Welsh police forces could be viewed as representative of the UK as a whole, then inferences or predictions could possibly be made about Scotland and their accuracy assessed. However, such an extrapolation might be unreasonable if, unlike the Scottish Executive, the UK Treasury does not consult experts!

Greg Phillpotts (Department of Health, London)

I would like to return to the point raised by Juanita Roche, that of the trust that the public have in official statistics and statistical methods. My point is that the Government has provided a framework to build trust in National Statistics. This includes the need for a quality review of key national statistics and of the methods used to produce them, incorporating outside expertise in the review process. I do not know from my position at the Department of Health whether the efficiency measures that are produced by the Home Office and are the subject of this paper are part of National Statistics. There is a meeting to be held here at the Society on January 28th, 2002, at 3 p.m. about the draft National Statistics code of practice, and this question of the coverage of National Statistics could be picked up then.

Stephen Senn (University College London)

This paper is, in the best traditions of this Society, bringing the results of skilful investigations of statistical theory to bear on a matter of practical importance. I have one question concerned with whether it has any lessons for us as statisticians.

It is now about 70 years since two statisticians, who were associated with the institution at which I work, in the department which Professor Stone once headed and in which he now has an honorary appointment, were faced with a problem analogous to that considered in this paper. They were considering the choice of statistical tests and characterized such tests by using two properties only: a far simpler case than that of judging police forces considered here. In fact, since they restricted their consideration (perhaps misleadingly for future development) to fixed sample sizes, there is a further simplification, in that to make the analogy we would have to consider police forces operating with the same budget.

The statisticians were, of course, Jerzy Neyman and Egon Pearson and the two outputs from statistical tests that they considered were type I and type II error rates (Neyman and Pearson, 1933). They recognized the impossibility of simultaneously minimizing these and adopted instead the approach of fixing one error rate and minimizing the other. If imitation is the measure of success, this procedure has turned out to be a stupendous success but it seems to involve, if anything, even more squeamishness about combining outputs than data envelopment analysis does.

My question for Professor Stone is this: the lessons of his paper for the Government seem clear; are there any for statisticians?
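The Neyman–Pearson recipe that Professor Senn describes (fix one error rate, then minimize the other) can be made concrete for a one-sided test of a Normal mean with known variance; the numbers below are purely illustrative.

```python
# For a one-sided z-test of H0: mu = 0 vs H1: mu = delta > 0 with
# known sigma, fixing the type I error rate alpha determines the
# critical value; the type II error rate beta then follows.
import math

def Phi(z):
    """Standard Normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def Phi_inv(p):
    """Inverse of Phi by bisection (Phi is increasing)."""
    lo, hi = -10.0, 10.0
    for _ in range(100):
        mid = (lo + hi) / 2.0
        lo, hi = (mid, hi) if Phi(mid) < p else (lo, mid)
    return (lo + hi) / 2.0

alpha, sigma, n, delta = 0.05, 1.0, 25, 0.5
crit = Phi_inv(1.0 - alpha)                    # reject H0 when z > crit
beta = Phi(crit - delta * math.sqrt(n) / sigma)
print(f"alpha fixed at {alpha}; resulting beta = {beta:.3f}")
```

For these numbers beta comes out near 0.2 (power about 0.8); no attempt is made to combine alpha and beta into a single score, which is exactly the squeamishness about combining outputs that Senn notes.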

The following contributions were received in writing after the meeting.

Rolf Färe, Shawna Grosskopf and Valentin Zelenyuk (Oregon State University, Corvallis)

We thank M. Stone for bringing the efficiency literature to the attention of the statistics community, but we take issue with some of the remarks raised in his ‘unreserved criticism of the data envelopment analysis (DEA) technique’ (Section 1). The laundry list of pitfalls in the approach would provide an admirable checklist for any empirical analysis, e.g. avoiding specification errors and paying attention to the sample size. His proposed alternative approach is also perfectly consistent with reasonable DEA studies, although we would substitute ‘Choose an appropriate model specification’ for ‘fix the valuti’.

However, we disagree that the Farrell–DEA technique produces ‘self-defining efficiency measures—functions of the database alone and determinable without reference to context’ (Section 8.1). Over the last decades we have made considerable efforts to point out the connections between the DEA model and axiomatic production and duality theory as pioneered by Shephard (1953, 1970), including the reference cited by the author. These provide the theoretical underpinnings of the DEA model and give an economic context. Exploiting the links to economic theory and the flexibility of the activity analysis (DEA) model allows the analyst to customize the specification to the application. For example, we have long endorsed using Shephard's (1974) cost indirect model for applications in the public sector. In that model the bench-mark objective is to maximize services (or outcomes or activities) subject to the given budget. Here, in contrast with Stone's example, input prices are explicitly included and the solution yields the cost minimizing allocation of inputs consistent with the budget. (Farrell's original model does this as well; in fact, Farrell's major contribution is considered to be the fact that he decomposed the cost inefficiency into the technical component focused on by Stone and a price-related allocative efficiency component.) We also advocate exploiting duality theory to derive the associated shadow prices of the outputs to provide information concerning the ‘values’ that are implicit in the observed mix of outputs. Stone may also be interested in a variation which explicitly includes a utility function as part of the DEA problem; see Färe et al. (2002).

In addition, recent results also demonstrate that DEA has respectable statistical properties as an estimator, and with the aid of bootstrapping techniques it can be readily adapted to undertake statistical inference and hypothesis testing as well (see Simar and Wilson (2000a)). In conclusion, we would argue that Stone is condemning a discipline for what he sees as the shortcomings of a particular application.

Paul Hewson (Devon County Council, Exeter)

I would like to start by asking a naïve question. With the exception of the British Crime Survey data, seven of the variables proposed as police outputs are counts, and one is a percentage. Presumably there is a loss of analytical subtlety when the environmental variables are dealt with as a modified input rather than as an offset to some of these count variables. In a statistical context, one would imagine that the variability in estimates of counts increases at least as fast as the underlying mean value for the count, whereas the variability in the percentage estimates may decrease as we approach either end of the scale. I am therefore intrigued by the possibility that, in treating data of this kind empirically, the set of feasible performances will be dominated by outliers. The author's point about the need for a small set of indicators is well made in Section 8.2, but I question the way in which these items were selected. Given a large range of data, is there any scope for dimension-reducing techniques to allow the construction of either lower-dimension projections or regular reselection of a smaller number of variables which can act as proxies for a wider range of variables?

Turning to the value-based analysis I would firstly like to ask about non-decision-making possibilities, such as whether there is any need to have a single consensus vector of valuti. I feel that there could be explorative potential in studying efficiency with a range of valuti, including valuti that represent organizations’ stated priorities, locally determined valuti and valuti for different subsections of the population. This may not avoid the ‘benevolent dictatorship’ problem alluded to, but it might help to inform the way in which improvements in efficiency and policy generally impacted on different sections of the community.

Finally, given the developments in Bayesian statistics since 1957, I would like to finish by asking what potential may exist for approaching value-based analysis in a modelling context. The obvious first advantage is in dealing more accurately with random variation throughout the data, and presumably the value-based element can be dealt with in terms of loss functions or utilities.

It is noteworthy that this paper was presented a month after the National Audit Office (2001) reported on some unacceptable ways in which hospital waiting list figures may have been adjusted. Some of the risks of naïve use of performance indicators may be demonstrated in this report, and it is clear that any further analysis should seek not to add additional dysfunctionalities.

Gary Koop (University of Glasgow) and Mark F. Steel (University of Kent at Canterbury)

Stochastic frontier analysis (SFA), data envelopment analysis (DEA) and the proposed value-based analysis (VBA) involve different sets of assumptions. In applications, these assumptions may or may not be reasonable. Stone is very critical of SFA and we thought it appropriate to note that these criticisms are, perhaps, not as damaging as he suggests. There is a burgeoning literature on SFA with multiple outputs, including some of our own Bayesian work (Fernández et al., 2000, 2002). In economic applications, prices often provide us with the output weights that Stone desires. If these are not available, we would argue that, with a careful choice of outputs, the data-based SFA and DEA approaches will, in many cases, be less objectionable (and more practical) than somehow choosing ‘societal weights’ for outputs (especially given Stone's desire to ‘retain the goodwill of workers’ and need to ‘obtain from each unit … how its own input costs … should be notionally divided’ and ‘negotiate the value … with unit managers’).

We briefly comment on the ‘widely recognized weaknesses in … SFA’ (Section 6).

Ignore errors in outputs

This feature is shared by all methods, and it seems that formal errors-in-variables methods can relatively easily be used in the statistical context of SFA.

Make an arbitrary choice of the distribution of u and v

The use of longitudinal or panel data (Schmidt and Sickles, 1984; Koop et al., 1997; Fernández et al., 1997) can substantially reduce the sensitivity of our results and Bayesian methods allow formal model comparisons and averaging. The question is whether the assumptions are appropriate for the empirical question at hand. For example, SFA typically assumes that ‘measurement error is Normally distributed, independently of efficiency’, whereas DEA–VBA typically assumes that ‘measurement error is identically zero’.

Assume a form for F

There is a large literature on flexible functional forms which may be sensible in a given application (Stone's use of the Cobb–Douglas form really sets up a straw man) and restrictions of economic theory (e.g. monotonicity and concavity) can trivially be imposed through the prior. Alternatively, nonparametric or semiparametric methods can be used. See Koop et al. (1994).

Environmentals

Environmentals can either be included as explanatory variables in the frontier or the efficiency distribution (Koop et al., 1997) or as bad outputs (Fernández et al., 2002).

Finally, it is important to note that statistical methods allow for probability statements and confidence or credible intervals, which can be of great practical importance (see Kim and Schmidt (2000)).
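For readers new to this exchange, the composed-error structure of SFA can be shown by simulation. Every parameter value below is invented, and the sketch only generates data from the model: estimating the frontier and the two error scales from (x, y) alone requires precisely the distributional assumptions under debate.

```python
# Simulating the SFA composed-error model: log y = b0 + b1*log x + v - u,
# with symmetric Normal noise v and one-sided half-Normal inefficiency
# u >= 0, so that exp(-u) in (0, 1] is the unit's technical efficiency.
import math
import random

random.seed(1)
b0, b1 = 1.0, 0.8                    # invented frontier parameters
sigma_v, sigma_u = 0.1, 0.3          # noise and inefficiency scales

units = []
for _ in range(43):                  # 43 'forces', as in the paper
    x = random.uniform(50.0, 150.0)  # input (expenditure)
    v = random.gauss(0.0, sigma_v)   # measurement error, either sign
    u = abs(random.gauss(0.0, sigma_u))   # inefficiency, >= 0 only
    y = math.exp(b0 + b1 * math.log(x) + v - u)
    units.append((x, y, math.exp(-u)))

mean_eff = sum(e for _, _, e in units) / len(units)
print("mean technical efficiency:", round(mean_eff, 3))
```

Note that DEA and VBA would treat the same (x, y) pairs as if v were identically zero, which is the contrast drawn above.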

Emmanuel Thanassoulis (Aston University, Birmingham)

Data envelopment analysis can work with value judgments

In Fig. 7 the data envelopment analysis (DEA) weights are estimates of the resources that an efficient force would use per burglary cleared and per violent crime cleared (Thanassoulis, 1996, 2001). We refer to them as marginal resource levels (MRLs). We have three properly defined sets of MRLs, one each for the output mixes represented on AB, BC and CD. The MRLs of AB render forces on AB 100% efficient, and so on.

Figure 7. Simple DEA assessment of police forces

Stone asks in Section 8 whether we can really take as a measure of G's efficiency the proportion of its expenditure that the MRLs of BC explain, when they may not necessarily reflect the (unknown) worth of violent or burglary clear-ups. There is no reason to limit ourselves to this technical DEA efficiency measure. If the valuti exist, or ranges of relative valuti can be derived from stakeholders, then why would we not also use DEA models with weight restrictions, or those for allocative efficiency (Thanassoulis (2001), section 4.7.2), to derive overall (value-based) efficiency measures? Surely it is valuable to know the part of the value shortfall of a force that is due to the incompatibility of its output mix with stakeholder worths (allocative inefficiency) and the part that is due to its inability to gain maximum outputs relative to other forces (technical inefficiency). That is to say nothing of DEA's identification of suitable role model forces which an inefficient force can emulate, whether to gain maximum technical or allocative efficiency or both. On another point, in Fig. 7 force A may not be an efficient peer for any inefficient force, whereas B and C can be peers for many forces. Yet the MRLs of AB render both force A and force B 100% (technically) efficient. Is force A necessarily less efficient in value or technical terms than force B? There is a difference between efficient performance and whether or not the output mix of a force is shared by any other forces.

The problem surely is not the technical efficiency and much other managerial information that DEA yields but rather how to solicit from stakeholders information on valuti that would enrich the assessment. DEA is a tool for using such valuti information as can be gathered. It does not preclude its use.

The author replied later, in writing, as follows.

I want to thank the Society's Programme Committee for accepting this paper for ‘reading’, thereby allowing things to be said that were not ‘read’. Things said by discussants, especially the critical bits, will be the first and perhaps only things read in the printed version. In aggregate their good sense will efficiently add useful public service to an otherwise academic occasion.

The contributions from Professor Smith, Professor Chesher, Professor Farewell, Sir David Cox, Dr Roche and Mrs Galbraith will speak volumes without any significant comment from me. But I cannot refrain from lightly questioning the pessimism in the first and the optimism in the second.

Professor Smith points out that there is no ‘golden rule’ with which to fix global societal priorities in the shape of the valuti {v_i}, and he thinks that we must rely on elected politicians to do that job. At present, police forces with their police authorities claim to be exercising local priorities—and, in this, a traditionally trusting society does not demur significantly. So it would be a small step towards transparency to put Chief Constables (rather than politicians) into conclave with a modicum of statisticians—until they emerge with agreed global valuti and a formula for environmental priority adjustment of the index V (not forgetting adjustment for documentable deficiencies in the local criminal justice system).

Professor Chesher wants to see research effort devoted to establishing a ‘minimum [over output profiles] cost function’, the same for all forces, so that we would know how much inefficiency there is above the largest V_i. Until that is done, we have to live (do we not?) with Farrell's preference for a yardstick based on observed performances, albeit one that is less empirical and more normative than Farrell's.

Unless practices change, optimism may also be present in Dr Roche's suggestion that National Statistics should embrace public service statistics. A less ambitious suggestion (Stone, 2002) is that mandarins should be free to relax the constraints of the usual tendering and contracting procedures—to bring into play, informally and at low cost, a wider range of outside judgment, in cases where broad judgment might help at the start of any project and where either convergence or divergence of outside views would be informative.

Dr Foreman-Peck has thrown additional light on the important difference between what motivated Farrell in his agricultural economics example and what now clearly motivates society in its assessment of police force outputs. It is the difference between an arguably irrelevant technical efficiency and an overall efficiency based on valuti that includes a compensatingly irrelevant allocative efficiency.

I see Dr Tofallis's ‘efficiency profiling’ as providing something smoother than the current practice of presenting public service units with a large number of performance targets, as a means of provoking improvements here and there but without making clear where political masters may choose to intervene. Provided that interest can be limited to technical efficiency and it is applied with the perceptiveness of its inventor, the technique will be preferable to such abusive practice.

Mr Hewson has beaten up a number of hares that would require more ground than I have on which to pursue them properly. His suggestion of using different sorts of valuti is interesting, as is the reminder of a particular dysfunctionality associated with the ‘political ownership’ of National Health Service performance indicators.

Professor Thanassoulis simply repeats Farrell in pointing out that, when valuti are fixed, overall value efficiency can be factorized into allocative and technical efficiency—usefully for Farrell. Here, I cannot see the logic in letting technical-efficiency-based data envelopment analysis (DEA) determine force-specific valuti for V, when the valuti are not fixed but specified by ‘weight restrictions’, e.g. intervals (which is what the Spottiswoode report recommends). Clearly the option of somehow using intervals is a generalization of using single values, but the value of a generalization depends on how it is exploited.

Professor Senn has milked a nice metaphor for all that it can give. For the B-word he wants me to use, I refer him to the contribution of Dr Koop and Professor Steel. Their first reference shows how far we have come since the simple kinetics of a billiard ball set the Bayesian ball rolling. If we try to apply the Bayesian stochastic frontier analysis (SFA) model of Fernández et al. (2000) to the case in hand (with T = 1 for 1 year of data and the number of outputs taken to be 30), and reversing the order in which that paper builds the likelihood but without changing the model, we would have the following.

  • (a) The distribution of the shape statistics S of the 30 logged outputs L = log(Y) would be boldly specified as that of the degenerate 30-parameter log-Dirichlet distribution ‘LD(s)’. (S is the maximal invariant under affine transformations a + qL for q > 0. Writing a = Cα where Σ α_j = 1, this specification fixes q and α in the identification of the distributions of Cα + qL and LD(s), but not the distribution of C.)
  • (b) The distribution of C would be fixed (and hence that of Y) by giving θ =def {α_1 Y(1)^q + … + α_30 Y(30)^q}^{1/q} the distribution of exp{β log(x) − γ + σE}, where γ > 0 and E is N(0, 1).

Proper priors would then be given to s, q, α, β, γ and σ of the sort (large variances etc.) that are mistakenly thought to evade all the problems of improper priors. In (b), θ indexes the ‘aggregate’ or ‘technologically equivalent’ output profiles, with respect to which exp(−γ) is taken to represent the ‘efficiency’—whose posterior distribution given an observed performance profile (x, y) would then be calculated by Markov chain Monte Carlo algorithms.
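To make the aggregator in (b) concrete: θ is a CES-type weighted power mean of the outputs, and exp(−γ) plays the part of efficiency. A toy computation, with five outputs instead of 30 and entirely invented numbers:

```python
# theta = {alpha_1*Y(1)^q + ... + alpha_30*Y(30)^q}^(1/q), a weighted
# power mean of the outputs; exp(-gamma) is the implied 'efficiency'.
import math

alpha = [0.3, 0.25, 0.2, 0.15, 0.1]   # invented weights, summing to 1
q = 0.5                                # invented aggregation exponent
y = [120.0, 80.0, 60.0, 40.0, 20.0]   # an invented output profile
theta = sum(a * yi ** q for a, yi in zip(alpha, y)) ** (1.0 / q)

gamma = 0.2                            # invented one-sided inefficiency
efficiency = math.exp(-gamma)
print("aggregate output theta =", round(theta, 2))
print("implied efficiency exp(-gamma) =", round(efficiency, 3))
```

The single number θ stands in for the whole 30-dimensional output profile, which is why the fit of the implied distributional shape to the data is worth testing.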

I fear the introduction of such econometrically motivated models into the practical definition and measurement of public service efficiency: their intimidating complexity, barely understood by their creators, would inhibit the necessary wider understanding of their weak points. Even before thinking about (b) (in which the α and q of (a) mysteriously resurface without justification), the model should be required to pass some non-Bayesian test of the goodness of fit of the sample of 43 realizations of S to the family of distributions prescribed for it. In their banking efficiency study, Fernández et al. (2000) did not venture such a test, which would be in the spirit of George Box's approach to scientific Bayesian modelling (Box, 1980). Is it fanciful to think that, if you tried to implement such a model, you might be charged with ‘obstructing the police’ in their modest effort to improve?

I am sure that Dr Torsney has very good reasons (to do with ‘understanding, intervention and prediction’ rather than ‘evaluation and assessment’) for wanting to model the health service processes that he studies with such care. But having now seen that context-free theoretical modelling may be either misleading or hopelessly ambitious, I prefer here to ‘pass’ on the challenging questions that he has raised.

Professor Färe and his colleagues chide me with lack of respect for powerful theory, as being too blinded by a particular application—one that happens to absorb nearly £8 billion of public expenditure. They do not tell me where I have shown disrespect. What I do not respect is the opinion of a DEA advocate in a hospitable Home Office seminar—that I must be wrong about the Spottiswoode report because there are now at least 5000 doctoral theses on the DEA–SFA techniques that it recommended.

References in the discussion

  • Barndorff-Nielsen, O. E. and Cox, D. R. (1994) Inference and Asymptotics. London: Chapman and Hall.
  • Box, G. E. P. (1980) Sampling and Bayes’ inference in scientific modelling and robustness (with discussion). J. R. Statist. Soc. A, 143, 383430.
  • Färe, R., Grosskopf, S. and Roos, P. (2002) Integrating consumer satisfaction into productivity indexes. In Efficiency in the Public Sector (ed. K.Fox), pp. 201218.Boston: Kluwer.
  • Farrell, M. J. (1957) The measurement of productive efficiency (with discussion). J. R. Statist. Soc. A, 120, 253290.
  • Fernández, C., Koop, G. and Steel, M. F. J. (2000) A Bayesian analysis of multiple-output production frontiers. J. Econometr., 98, 4779.
  • — (2002) Multiple-output production with undesirable outputs: an application to nitrogen surplus in agriculture. J. Am. Statist. Ass., 97, in the press.
  • Fernández, C., Osiewalski, J. and Steel, M. F. J. (1997) On the use of panel data in stochastic frontier models with improper priors. J. Econometr., 79, 169–193.
  • Førsund, F. R. and Sarafoglou, N. (2000) On the origins of data envelopment analysis. Memorandum 24/2000. Department of Economics, University of Oslo, Oslo.
  • Goldstein, H. and Spiegelhalter, D. J. (1996) League tables and their limitations: statistical issues in comparisons of institutional performance (with discussion). J. R. Statist. Soc. A, 159, 385–443.
  • Jacobs, R. (2001) Alternative methods to examine hospital efficiency: data envelopment analysis and stochastic frontier analysis. Hlth Care Mangmnt Sci., 4, no. 2, 103–116.
  • Kim, Y. and Schmidt, P. (2000) A review and empirical comparison of Bayesian and classical approaches to inference on efficiency levels in stochastic frontier models with panel data. J. Product. Anal., 14, 91–118.
  • Koop, G., Osiewalski, J. and Steel, M. F. J. (1994) Bayesian efficiency analysis with a flexible form: the AIM cost function. J. Bus. Econ. Statist., 12, 339–346.
  • — (1997) Bayesian efficiency analysis through individual effects: hospital cost frontiers. J. Econometr., 76, 77–105.
  • Milne, R. G. and Torsney, B. (1992) Non price-allocative procedures: Scottish solutions to a National Health Service problem. In Health Economics Worldwide: Proc. 2nd Wrld Congr. Health Economics, Zurich, Sept. 1990 (eds H. E. Frech III and P. Zweifel), pp. 187–202. Dordrecht: Kluwer.
  • — (1993) Allocating public services in an administrative environment: the Scottish National Health Service's experience. In CREDES-CES Actes du Colloque Européen de l'Analyse Economique aux Politiques de Santé (eds E. Levy and A. Mizrahi), pp. 79–84. Paris: SCRIPTA Diffusion.
  • — (1994) Public choice in the distribution of health services: some evidence of a command economy. In Proc. 43rd Int. Conf. Applied Econometrics Association, Lyons, pp. 37–47.
  • — (1997) The efficiency of administrative governance: the experience of the pre-reform British national health service. J. Compar. Econ., 24, 161–180.
  • Milne, R. G., Torsney, B. et al. (2001) Consultant outreach, 1991 to 1998: an update and extension on its distribution in Scotland. Hlth Bull., 59, 315–331.
  • National Audit Office (2001) Inappropriate Adjustments to NHS Waiting Lists. London: National Audit Office.
  • Neyman, J. and Pearson, E. S. (1933) On the problem of the most efficient tests of statistical hypotheses. Phil. Trans. R. Soc. Lond. A, 231, 289–337.
  • O'Brien, P. C. (1984) Procedures for comparing samples with multiple endpoints. Biometrics, 40, 1079–1087.
  • Pedraja-Chaparro, F., Salinas-Jiménez, J. and Smith, P. (1997) On the role of weight restrictions in data envelopment analysis. J. Product. Anal., 8, 215–230.
  • Schmidt, P. and Sickles, R. (1984) Production frontiers and panel data. J. Bus. Econ. Statist., 2, 367–374.
  • Shephard, R. W. (1953) Cost and Production Functions. Princeton: Princeton University Press.
  • — (1970) Theory of Cost and Production Functions. Princeton: Princeton University Press.
  • — (1974) Indirect Production Functions. Meisenheim am Glan: Hain.
  • Simar, L. and Wilson, P. W. (2000a) A general methodology for bootstrapping in nonparametric frontier models. J. Appl. Statist., 27, 779–802.
  • — (2000b) Statistical inference in nonparametric frontier models: the state of the art. J. Product. Anal., 13, 49–78.
  • Smith, P. (1997) Model misspecification in data envelopment analysis. Ann. Oper. Res., 73, 233–252.
  • Spottiswoode, C. (2000) Improving police performance: a new approach to measuring police efficiency. Report. Public Services Productivity Panel, Her Majesty's Treasury, London. (Available from www.hmtreasury.gov.uk/pspp/studies.html.)
  • Stone, M. (2002) Can public service efficiency measurement be made a useful tool of government?: lessons from the Spottiswoode Report. Publ. Money Mangmnt, 22, 33–40.
  • Thanassoulis, E. (1996) A data envelopment analysis approach to clustering operating units for resource allocation purposes. Int. J. Mangmnt Sci., 24, 463–476.
  • — (2001) Introduction to the Theory and Application of Data Envelopment Analysis: a Foundation Text with Integrated Software. Boston: Kluwer.
  • Tofallis, C. (1997) Input efficiency profiling. Comput. Oper. Res., 24, 253–258.
  • Torsney, B. and Milne, R. (1999) Applications of statistics in health economics. In Aplicaciones Estadisticas un reto para el Nuevo Milenio (eds J. L. Fidalgo and J. M. Rodriguez), pp. 90–96. Hesperides.
  • Williams, A. (2001) Science or marketing at WHO?: a commentary on World Health Report 2000. Hlth Econ., 10, no. 2, 93–100.
  • World Health Organization (2000) The World Health Report 2000. Health Systems: Improving Performance. Geneva: World Health Organization.