# Statistical Approaches for Non-parametric Frontier Models: A Guided Tour

## Summary

A rich theory of production and analysis of productive efficiency has developed since the pioneering work by Tjalling C. Koopmans and Gerard Debreu. Michael J. Farrell published the first empirical study, and it appeared in a statistical journal (Journal of the Royal Statistical Society), even though the article provided no statistical theory. The literature in econometrics, management sciences, operations research and mathematical statistics has since been enriched by hundreds of papers trying to develop or implement new tools for analysing the productivity and efficiency of firms. Both parametric and non-parametric approaches have been proposed. The mathematical challenge is to derive estimators of production, cost, revenue or profit frontiers, which represent, in the case of production frontiers, the optimal loci of combinations of inputs (like labour, energy and capital) and outputs (the products or services produced by the firms). Optimality is defined in terms of various economic considerations. Then the efficiency of a particular unit is measured by its distance to the estimated frontier. The statistical problem can be viewed as the problem of estimating the support of a multivariate random variable, subject to some shape constraints, in multiple dimensions. These techniques are applied in thousands of papers in the economic and business literature. This ‘guided tour’ reviews the development of various non-parametric approaches since the early work of Farrell. Remaining challenges and open issues in this area are also described. © 2014 The Authors. International Statistical Review © 2014 International Statistical Institute

## 1 Introduction

Production theory and efficiency analysis examine how firms or production units in a particular sector of activity transform their inputs (e.g. labour, energy and capital) into quantities of outputs, that is, the goods or services that are produced by the firms. The analysis is not limited to business firms such as manufacturing concerns, electricity plants, banks and for-profit hospitals; it is used to examine schools, universities, provision of public goods and services, non-profit organisations including hospitals and credit unions and so on. The efficient production frontier is defined in the relevant input–output space as the locus of the maximal attainable level of outputs corresponding to given levels of inputs. Alternatively, if prices of inputs are available, one can consider a cost frontier defined by the minimal cost of producing various levels of outputs. Intermediate cases are also possible; for example, one might consider maximisation of quantities of a subset of outputs while minimising quantities of some (perhaps all) inputs, holding other quantities fixed. In all cases, the problem amounts to estimating a boundary surface in the relevant space (e.g. the input–output space in the case of a production frontier or the output–cost space in terms of a cost frontier) under shape constraints induced by some economic assumptions (e.g. monotonicity or concavity). The technical efficiency of a particular production plan (characterised geometrically by a point in the input–output space) is then determined by an appropriate measure of distance between this point and the optimal frontier. The background of the economic theory behind this analysis is due to Koopmans (1951) and Debreu (1951); see Shephard (1970) for a comprehensive presentation of the underlying economic theory.

In empirical studies, the attainable set in the input–output space is unobserved, and hence, the efficiency of a given firm is also unknown. These quantities must be estimated from a sample of observed combinations of input and output quantities obtained from existing production units operating in the activity sector being studied. Many different approaches have been investigated in the literature, including statistical models of varying degrees of sophistication and ranging from fully parametric to fully non-parametric approaches. This literature has developed in a variety of academic fields, including economics, management and management science, operations research, econometrics and statistics; in each field, papers ranging from ‘very theoretical’ to ‘very applied’ can be found.

This ‘guided tour’ focuses on statistical results obtained in the non-parametric branch of the literature, while stressing the inherent difficulty of the problem and the solutions that have been developed. The tour begins in Section 2 by defining the basics of an economic model for production theory. The most popular non-parametric estimators, based on envelopment techniques, are then presented in Section 3. Statistical properties of the estimators and practical aspects of making inference (mostly by bootstrap methods) are discussed in Section 4. Section 5 presents various extensions that have been proposed in the literature to address some of the inherent drawbacks of envelopment estimators (e.g. sensitivity to extreme data points and outliers). Section 6 shows how environmental factors that may influence the production process can be included in the analysis, allowing for heterogeneity. Finally, Section 7 briefly describes additional interesting issues and challenges that remain open questions, including (i) how to use non-parametric methods to improve some parametric estimators; (ii) introduction of noise in the observational process; (iii) testing issues; and (iv) non-parametric frontier models for panel data. First solutions to these problems are described, but more work is needed on these issues.

## 2 The Economic and The Statistical Paradigms

### 2.1 Production Theory and Efficiency Scores

Following Koopmans (1951) and Debreu (1951), the production process can be described as follows. Let x ∈ ℝ₊^p denote a vector of p input quantities, and let y ∈ ℝ₊^q denote a vector of q output quantities. The production set

Ψ = {(x, y) ∈ ℝ₊^p × ℝ₊^q ∣ x can produce y} (2.1)

describes the set of physically attainable points (x, y). For efficiency evaluation, the efficient boundary Ψ∂ of Ψ, that is, the technology, is of interest. This is defined by

Ψ∂ = {(x, y) ∈ Ψ ∣ (γ⁻¹x, γy) ∉ Ψ for all γ > 1}. (2.2)

Some minimal economic assumptions on Ψ are typically made. Most studies assume that inputs and outputs are freely (or strongly) disposable, that is,

(x, y) ∈ Ψ ⟹ (x′, y′) ∈ Ψ for all (x′, y′) such that x′ ≥ x and y′ ≤ y, (2.3)

although this condition can be relaxed to allow for effects of congestion, pollution or perhaps other phenomena (see Färe et al., 1985, for a discussion). The hypothesis in (2.3) means that it is always possible (even if not economically sound) to waste resources, and it implies monotonicity of the technology. Another mild assumption is that all production requires use of some positive input quantities (this is called the ‘no free lunch’ assumption):

(0, y) ∉ Ψ for all y ≥ 0 with y ≠ 0. (2.4)

In addition, it is often assumed that the attainable set Ψ is convex so that if (x1,y1),(x2,y2)∈Ψ, then for all α∈[0,1],

α(x1, y1) + (1 − α)(x2, y2) ∈ Ψ. (2.5)

Other economic assumptions on Ψ (e.g. returns to scale) are sometimes made, but at this stage, only those introduced in the preceding text are needed.

The technical efficiency of a given production plan (x, y) can now be measured along the lines of Debreu (1951) and Farrell (1957). The input measure of technical efficiency θ(x, y) is given by the minimal radial contraction of the inputs that projects the point (x, y) onto the efficient frontier Ψ∂:

θ(x, y) = inf{θ > 0 ∣ (θx, y) ∈ Ψ}, (2.6)

where for all points (x, y) ∈ Ψ, θ(x, y) ≤ 1. A value of 1 indicates an efficient point lying on the boundary of Ψ. Similarly, the output-oriented efficiency score λ(x, y) is the maximal radial expansion of the outputs that projects the point (x, y) onto the efficient frontier, that is,

λ(x, y) = sup{λ > 0 ∣ (x, λy) ∈ Ψ}. (2.7)

Here, for all points (x, y) ∈ Ψ, λ(x, y) ≥ 1. A value of 1 indicates an efficient point lying on the boundary of Ψ.

Other distance measures have been proposed in the economic literature, including hyperbolic distance

γ(x, y) = sup{γ > 0 ∣ (γ⁻¹x, γy) ∈ Ψ}, (2.8)

due to Färe et al. (1985) (see also Färe & Grosskopf, 2004), where input and output quantities are adjusted simultaneously to reach the boundary along a hyperbolic path. Note that γ(x, y) = 1 if and only if (x, y) belongs to the efficient boundary Ψ∂. More recently, a lot of interest has been devoted to directional distances (Chambers et al., 1998; Färe & Grosskopf, 2000) because of their great flexibility. Here, the projection of (x, y) onto the frontier is along a path in a given direction d = (−dx, dy), where dx ∈ ℝ₊^p and dy ∈ ℝ₊^q. The flexibility is due to the fact that some elements of the direction vector can be set to zero. Distance from a point (x, y) to Ψ∂ is then measured by

δ(x, y ∣ dx, dy) = sup{δ ≥ 0 ∣ (x − δdx, y + δdy) ∈ Ψ}, (2.9)

where for all points (x, y) ∈ Ψ, δ(x, y ∣ dx, dy) ≥ 0. A value of 0 indicates an efficient point lying on the boundary of Ψ. Note that as a special case, the Farrell–Debreu radial distances can be recovered; for example, if d = (−x, 0), then δ(x, y ∣ x, 0) = 1 − θ(x, y), while if d = (0, y), then δ(x, y ∣ 0, y) = λ(x, y) − 1. Another interesting feature is that directional distances are additive measures; hence, they permit negative values of x and y (e.g. in finance, an output y may be the return of a fund, which can be, and often is, negative). There have been long debates in the economic literature on how the directions should be chosen, and many choices are possible (e.g. a common direction for all firms or a specific direction for each firm); see Färe et al. (2008) for a discussion.
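To fix ideas, the four distance measures can be computed in closed form (or by simple bisection) for a toy technology. The sketch below assumes the hypothetical one-input, one-output frontier y = √x; the technology, point and function names are illustrative assumptions, not taken from the text.

```python
import math

# Toy technology (an illustrative assumption, not from the text):
# Psi = {(x, y) : 0 <= y <= sqrt(x)}, so the frontier is y = sqrt(x).

def theta(x, y):
    # Input measure (2.6): smallest theta with y <= sqrt(theta * x)
    return y ** 2 / x

def lam(x, y):
    # Output measure (2.7): largest lambda with lambda * y <= sqrt(x)
    return math.sqrt(x) / y

def gamma(x, y):
    # Hyperbolic measure (2.8): largest gamma with gamma*y <= sqrt(x/gamma),
    # i.e. gamma^(3/2) = sqrt(x) / y
    return (math.sqrt(x) / y) ** (2.0 / 3.0)

def delta(x, y, dx, dy):
    # Directional measure (2.9), found by bisection on delta
    lo = 0.0
    hi = x / dx if dx > 0 else 10.0   # keep x - delta*dx >= 0
    for _ in range(60):
        mid = 0.5 * (lo + hi)
        if y + mid * dy <= math.sqrt(x - mid * dx):
            lo = mid
        else:
            hi = mid
    return lo

x0, y0 = 4.0, 1.0                    # an interior (inefficient) point
print(theta(x0, y0))                 # 0.25
print(lam(x0, y0))                   # 2.0
print(delta(x0, y0, x0, 0.0))        # 0.75 = 1 - theta(x0, y0)
```

The last line illustrates the special case d = (−x, 0), where the directional distance reduces to 1 − θ(x, y); likewise δ(x, y ∣ 0, y) reduces to λ(x, y) − 1.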

It should be noticed that all these efficiency measures characterise the efficient boundary by measuring the distance from a known, fixed point (x, y) to the unobserved boundary Ψ∂; the only difference among the measures in (2.6)–(2.9) is in the direction in which distance is measured.

### 2.2 Returns to Scale

Returns to scale is an important property of the technology Ψ and determines what happens as the scale of production is increased. Let C(Ψ) denote the convex cone of Ψ, and define

C∂(Ψ) = {(x, y) ∈ C(Ψ) ∣ (γ⁻¹x, γy) ∉ C(Ψ) for all γ > 1}, (2.10)

analogous to (2.2). If Ψ = C(Ψ), then Ψ∂ = C∂(Ψ), and Ψ exhibits globally constant returns to scale (CRS).

Alternatively, for Ψ convex, if Ψ ≠ C(Ψ), then the different regions of Ψ may display locally either CRS, increasing returns to scale (IRS) or decreasing returns to scale (DRS). In this case, the subset of Ψ given by {(x, y) ∣ (x, y) ∈ Ψ∂ and (x, y) ∈ C∂(Ψ)} exhibits locally CRS (this subset may include a single point or perhaps many points). The subset of Ψ given by

{(x, y) ∣ (x, y) ∈ Ψ∂ ∖ C∂(Ψ) and (αx, αy) ∈ Ψ for some α > 1} (2.11)

exhibits locally IRS, while the subset of Ψ given by

{(x, y) ∣ (x, y) ∈ Ψ∂ ∖ C∂(Ψ) and (αx, αy) ∈ Ψ for some α ∈ (0, 1)} (2.12)

exhibits locally DRS. If Ψ has different regions that display IRS, CRS and DRS, then Ψ is said to exhibit varying returns to scale (VRS).

Along the IRS portion of a technology, a small increase in input usage allows a greater-than-proportionate increase in output quantities produced. By contrast, along the DRS portion of a technology, a small decrease in input usage requires only a less-than-proportionate decrease in output quantities. In this sense, absent other considerations, the CRS portion of Ψ is the optimal part of the technology; that is, the CRS portion of Ψ corresponds to the most productive scale of operation (see Banker, 1984, for a discussion).

### 2.3 Statistical Modelling

The concepts introduced earlier are useful in theory, but in practice, the attainable set Ψ and its efficient boundary Ψ∂ are unknown, as are the efficiency scores. The best an empirical researcher can hope to do is to estimate these from a random sample 𝒳n = {(Xi, Yi) ∣ i = 1, …, n} of observed input–output combinations. Of course, a well-defined statistical model—a description of the data generating process (DGP)—describing how the random sample is generated is needed before anything can be estimated. This involves specifying a probabilistic structure, as well as an input–output mapping.1

There are two main streams of thought in the extant literature: (i) the so-called deterministic frontier models (this wording is unfortunate because nothing is deterministic) and (ii) stochastic frontier models. In deterministic models, it is assumed that all observations in the sample belong to the attainable set:

Prob((Xi, Yi) ∈ Ψ) = 1, i = 1, …, n. (2.13)

This may be seen as a reasonable assumption, but it implies that no noise (e.g. measurement error) is admitted in the DGP. In these models, distance to the frontier is interpreted as pure inefficiency. Alternatively, stochastic frontier models permit some noise in the DGP, so some observations may lie outside Ψ.2 This is appealing for its flexibility, but the result is that distance to the frontier has two components, noise and inefficiency, and hence, identification becomes problematic, requiring additional assumptions that may reduce flexibility.

Another classification of the approaches concerns the chosen level of modelling: either parametric models or non-parametric models may be used. Parametric models are rather restrictive because they rely on a particular functional form both for the frontier and for the two elements of the stochastic parts of the model (i.e. the distribution of the inefficiency and, for stochastic frontier models, the distribution of the noise). Parametric, deterministic models are discussed by Aigner & Chu (1968) and Greene (1980); for parametric, stochastic models, see Aigner et al. (1977), Battese & Corra (1977), Meeusen & van den Broeck (1977), Olson et al. (1980) and Jondrow et al. (1982).

To illustrate, consider the case of one output Y and a vector of inputs X. One of the simplest specifications for the production frontier function is

Yi = α + βᵀXi + Vi − Ui, i = 1, …, n, (2.14)

where Vi is random noise and Ui ≥ 0 represents inefficiency (in a deterministic model, Vi would not appear), with Vi ⟂⟂ Ui, i = 1, …, n. If the variables are in the log scale, (2.14) gives the familiar Cobb–Douglas production function. Clearly, this model rests on some very specific hypotheses. Maximum likelihood or modified ordinary least squares (OLS) can be used to estimate the parameters, but the inefficiency, Ui, is not identified. Once parameters have been estimated, only the convoluted residual εi = Vi − Ui is observed, presenting a deconvolution problem to estimate the inefficiency and the noise parts of εi. Jondrow et al. (1982) suggest estimating individual efficiency by an estimate of E(Ui ∣ εi). Various numerical problems arise in the estimation of these models, and inference about inefficiency presents additional problems. Simar & Wilson (2010) suggest bootstrap and bagging methods for making inference on the parameters and on the individual efficiencies in such models.
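As an illustration of the Jondrow et al. approach, the following sketch simulates the composed error of (2.14) under the common normal/half-normal specification (an assumed parametrisation, not imposed by the text) and evaluates E(Ui ∣ εi) in closed form; for clarity, the true variance parameters are used rather than estimates.

```python
import math
import random

def jlms(eps, su, sv):
    # E(U | eps) for eps = V - U with V ~ N(0, sv^2) and U ~ |N(0, su^2)|:
    # the mean of a normal N(mu_star, s_star^2) truncated to [0, inf)
    s2 = su ** 2 + sv ** 2
    mu_star = -eps * su ** 2 / s2
    s_star = su * sv / math.sqrt(s2)
    z = mu_star / s_star
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return mu_star + s_star * pdf / cdf

random.seed(0)
su, sv = 0.3, 0.2
eps = [random.gauss(0.0, sv) - abs(random.gauss(0.0, su)) for _ in range(2000)]
est = [jlms(e, su, sv) for e in eps]
# by the tower property, the average of E(U | eps) should be near
# E(U) = su * sqrt(2 / pi) ~ 0.239
print(sum(est) / len(est))
```

In practice, β, σu and σv would first be estimated by maximum likelihood or modified OLS, and the residuals would replace the simulated εi.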

Parametric routes are not pursued in this guided tour; the interested reader can refer to Kumbhakar & Lovell (2000) and Greene (2008). Hereafter, the focus is on non-parametric frontier models, which share the very attractive property of relying only on mild assumptions suggested by economic theory, such as those in (2.3), (2.4) and, in some cases, (2.5). In addition, except for some mild regularity conditions, no parametric restrictions are imposed on the distribution of (X, Y) on Ψ. To date, mainly deterministic frontier models have been developed in these non-parametric approaches, because the identification problem that arises when noise is added becomes much more difficult to handle in a non-parametric framework. Later sections will describe how non-parametric deterministic models can be modified to introduce some noise and to be resistant to outliers or extreme data points. In addition, Section 7.2 briefly discusses some new lines of research aimed at allowing for noise in non-parametric frontier models.

## 3 Non-parametric Envelopment Estimators

Historically, Farrell (1957) provided the pioneering empirical work, estimating an attainable set enveloping the cloud of data points and the resulting efficiency scores. The approach was popularised by Charnes et al. (1978) and Banker et al. (1984) using linear programming techniques. These works rely on the convexity assumption (2.5) for Ψ and various returns-to-scale assumptions. Estimators that do not impose convexity on Ψ came later in the works of Afriat (1972) and Deprins et al. (1984). For simplicity, the following presentation starts with the latter. For many years, estimators of efficiency revolved around the Farrell–Debreu radial measures. As seen in the succeeding discussion, these have more recently been extended to hyperbolic and directional distances. Introductory textbooks on these topics include the works of Thanassoulis (2001) and Cooper et al. (2011). Fried et al. (2008) present a more advanced, comprehensive picture of the topic, including parametric approaches.

### 3.1 Free Disposal Hull Estimators

The free disposal hull (FDH) estimator, relying only on the free disposability assumption (2.3), was proposed by Deprins et al. (1984). It is able to handle fully multivariate inputs and outputs and can be used to estimate distance functions in any direction.3 The idea is very simple; an estimator

Ψ̂_FDH(𝒳n) = ⋃_{i=1}^{n} {(x, y) ∈ ℝ₊^p × ℝ₊^q ∣ x ≥ Xi, y ≤ Yi} (3.1)

of Ψ is defined by considering the union of all the orthants (positive in x and negative in y) having their vertex at the observed data points. FDH estimators of efficiency are obtained by plugging the estimator (3.1) into the definitions (2.6)–(2.9). For example, in the case of a particular production plan (x, y), substituting Ψ̂_FDH(𝒳n) for Ψ in (2.6) yields

θ̂_FDH(x, y) = min{θ > 0 ∣ (θx, y) ∈ Ψ̂_FDH(𝒳n)}. (3.2)

Estimators λ̂_FDH(x, y), γ̂_FDH(x, y) and δ̂_FDH(x, y ∣ dx, dy) are obtained by similar substitutions. Note that the notation in (3.1) makes it clear that this estimator depends on the random sample 𝒳n.

Free disposal hull estimators are fast and easy to compute using only sorting algorithms. A point (Xi, Yi) is said to dominate another point (x, y) ∈ Ψ if Xi ≤ x and Yi ≥ y. Let D(x, y) denote the set of indices of points in 𝒳n dominating (x, y); that is, D(x, y) = {i ∣ Xi ≤ x, Yi ≥ y}. Then

θ̂_FDH(x, y) = min_{i ∈ D(x, y)} max_{j = 1, …, p} (X_i^j / x^j), (3.3)

where for a vector a, a^j denotes its j-th component. The estimator λ̂_FDH(x, y) is computed similarly; see Simar & Wilson (2013) for details. Wilson (2011) gives an algorithm for computing γ̂_FDH(x, y), and Simar & Vanhems (2012) provide a simple way for computing δ̂_FDH(x, y ∣ dx, dy) when all elements of the direction vectors are strictly positive; if this is not the case, algorithms given by Daraio & Simar (2014) can be used.
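The min–max formula (3.3) is straightforward to vectorise. The sketch below (function name and data are made up for illustration) computes the input-oriented FDH score under free disposability.

```python
import numpy as np

def fdh_input_efficiency(x, y, X, Y):
    """FDH input score at (x, y) via the min-max formula (3.3):
    minimise, over observations dominating (x, y), the largest input ratio."""
    # D(x, y): observations with X_i <= x (component-wise) and Y_i >= y
    dom = np.all(X <= x, axis=1) & np.all(Y >= y, axis=1)
    if not dom.any():
        return np.inf  # (x, y) is not dominated: it lies outside the FDH estimate
    ratios = X[dom] / x              # shape (n_dominating, p)
    return ratios.max(axis=1).min()

# tiny illustration (hypothetical data): p = 2 inputs, q = 1 output
X = np.array([[2.0, 4.0], [4.0, 2.0], [5.0, 5.0]])
Y = np.array([[1.0], [1.0], [2.0]])
print(fdh_input_efficiency(np.array([6.0, 6.0]), np.array([1.0]), X, Y))  # 2/3
```

Only comparisons and sorting are involved, which is why FDH scores are so cheap to compute even in large samples.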

Jeong & Simar (2006) propose a linearised version of the FDH estimator (LFDH), denoted by θ̂_LFDH(x, y), to avoid the unpleasant step-function nature of the FDH estimator. The idea of the LFDH estimator was initiated in the bivariate case (p = q = 1) by Hall & Park (2002), where neighbouring FDH efficient points are linked by straight lines. In multivariate set-ups, investigated by Jeong & Simar (2006), the neighbouring vertices of the FDH solution are interpolated by a hyperplane. The vertices to be interpolated are identified using Delaunay triangulation methods (or tessellation; e.g. Brown, 1979), and then linear programming methods identify the supporting hyperplane. Hall and Park show, in the bivariate case, that both the asymptotic bias and variance of the LFDH estimator are reduced by a factor of approximately 1/3 when compared with its FDH counterpart. Jeong and Simar derive the asymptotic properties of the LFDH estimator in the more general case, and their Monte Carlo simulations confirm that both the bias and variance of the LFDH estimator are smaller than those of the FDH estimator. Hall and Park also suggest a procedure to reduce the remaining bias of the LFDH in the bivariate case without affecting the variance, but so far, this bias-correction approach has not been extended to more general set-ups.

### 3.2 Data Envelopment Analysis Estimators

The data envelopment analysis (DEA) estimator introduced by Farrell (1957) and popularised by Charnes et al. (1978) is based on the additional assumption that Ψ is convex and allows for VRS. Farrell's estimator of Ψ is the convex hull of the FDH estimator, given by

Ψ̂_VRS(𝒳n) = {(x, y) ∈ ℝ₊^p × ℝ₊^q ∣ y ≤ Σ_{i=1}^n γiYi, x ≥ Σ_{i=1}^n γiXi for (γ1, …, γn) such that Σ_{i=1}^n γi = 1, γi ≥ 0 ∀ i = 1, …, n}. (3.4)

The multipliers γi and the constraint Σ_{i=1}^n γi = 1 serve to ‘convexify’ the FDH of the sample observations (Xi, Yi).

Varying-returns-to-scale DEA estimators of the efficiency measures defined in (2.6)–(2.9) are obtained by substituting Ψ̂_VRS(𝒳n) for Ψ in the definitions; in the case of the input-oriented measure θ(x, y), the resulting VRS-DEA estimator is given by the linear program

θ̂_VRS(x, y) = min{θ > 0 ∣ y ≤ Σ_{i=1}^n γiYi, θx ≥ Σ_{i=1}^n γiXi, Σ_{i=1}^n γi = 1, γi ≥ 0 ∀ i}. (3.5)

The VRS-DEA estimator of λ(x,y) is given by the linear program

λ̂_VRS(x, y) = max{λ > 0 ∣ λy ≤ Σ_{i=1}^n γiYi, x ≥ Σ_{i=1}^n γiXi, Σ_{i=1}^n γi = 1, γi ≥ 0 ∀ i}. (3.6)

The VRS-DEA estimator of δ(x, y ∣ dx, dy) can also be computed as the solution to a linear program; see Simar et al. (2012) for details. Wilson (2011) gives a numerical algorithm for computing the hyperbolic estimator γ̂_VRS(x, y), which cannot be written as a linear program.
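The linear program (3.5) can be written directly for an off-the-shelf LP solver. The sketch below is one possible formulation (the function name and data are hypothetical), with decision variables (θ, γ1, …, γn); dropping the Σγi = 1 constraint gives the CRS variant.

```python
import numpy as np
from scipy.optimize import linprog

def dea_input_efficiency(x, y, X, Y, crs=False):
    """VRS-DEA input score (3.5) via linear programming; with crs=True the
    convexity-weights constraint sum(gamma) = 1 is dropped, giving CRS-DEA."""
    n, p = X.shape
    q = Y.shape[1]
    c = np.r_[1.0, np.zeros(n)]                       # minimise theta
    # inputs: X'gamma - theta*x <= 0 ; outputs: -Y'gamma <= -y
    A_ub = np.vstack([np.c_[-x.reshape(-1, 1), X.T],
                      np.c_[np.zeros((q, 1)), -Y.T]])
    b_ub = np.r_[np.zeros(p), -y]
    A_eq = b_eq = None
    if not crs:
        A_eq = np.r_[0.0, np.ones(n)].reshape(1, -1)  # sum(gamma) = 1
        b_eq = [1.0]
    res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
                  bounds=[(None, None)] + [(0, None)] * n)
    return res.fun

# hypothetical data: one input, one output
X = np.array([[2.0], [4.0], [8.0]])
Y = np.array([[1.0], [3.0], [4.0]])
print(dea_input_efficiency(np.array([4.0]), np.array([2.0]), X, Y))  # 0.75
```

Each evaluation solves one LP with n + 1 variables; in an application, one such LP is solved for every observation of interest.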

If Ψ is globally CRS, then Ψ can be estimated by the convex cone of Ψ̂_VRS(𝒳n), obtained by dropping the constraint Σ_{i=1}^n γi = 1 in (3.4); denote this estimator by Ψ̂_CRS(𝒳n). Substituting Ψ̂_CRS(𝒳n) for Ψ in (2.6)–(2.9) yields CRS-DEA estimators of the efficiency measures defined there. The computation of the corresponding CRS-DEA estimators is similar to the computation of their VRS-DEA counterparts; the input-oriented and output-oriented estimators θ̂_CRS(x, y) and λ̂_CRS(x, y) are computed by dropping the constraint Σ_{i=1}^n γi = 1 in (3.5)–(3.6). The CRS-DEA directional estimator is computed by dropping the same constraint from the linear program that appears in Simar et al. (2012), and γ̂_CRS(x, y) can be computed as either [θ̂_CRS(x, y)]^{−1/2} or [λ̂_CRS(x, y)]^{1/2}.

The various DEA estimators that have been discussed can be adapted to handle cases where Ψ exhibits either IRS and CRS, but not DRS, or CRS and DRS, but not IRS. Details are given in Simar & Wilson 2002, 2013 and Banker et al., 2004. For purposes of this review, only the VRS-DEA and CRS-DEA estimators are needed.

Software for computing the non-parametric estimators—both FDH and DEA—is available in the FEAR library described by Wilson 2008 for use with R. The software is freely available for academic use as described in the licence that accompanies FEAR.

Both DEA and FDH estimators are widely used in empirical studies. In many cases, estimation of efficiency is the end goal, but in a few cases, efficiency estimates have been used as a proxy for management quality in another model of interest. For example, Wheelock & Wilson (1995b, 2000) use VRS-DEA estimates of banks' technical efficiency in duration models to explain time to insolvency. Wheelock & Wilson (2000) use the same estimates to also examine time to acquisition by another bank.

## 4 Statistical Inference Using Data Envelopment Analysis/Free Disposal Hull Estimators

The non-parametric envelopment estimators described in Section 3 have been used in several thousand papers without acknowledgement of the fact that the resulting DEA/FDH efficiency scores computed in these papers are in fact estimates.4 For a given sample 𝒳n, the various FDH and DEA efficiency estimators discussed earlier provide only point estimates of unknown quantities. It seems that until recently, most researchers using these estimators were unaware of, and unconcerned with, the statistical properties of these estimators.

Until recently, much of the DEA/FDH literature was concentrated in the field of operations research; DEA/FDH techniques were considered as non-statistical or non-econometric by those working in statistics or econometrics, who focused primarily on parametric approaches to efficiency analysis. At the conference on parametric and non-parametric approaches to frontier analysis sponsored by the US National Science Foundation, held at the University of North Carolina in Chapel Hill in October 1988 and attended by one of the authors of this paper, researchers from the statistics/econometrics camp and the operations research camp stopped short of coming to blows, but the discussions were sometimes heated, acrimonious and occasionally hostile. By now, however, this has changed; the ‘two worlds’ have been largely unified by recent theoretical results. After the original empirical work of Farrell (1957), Afriat (1972) and Deprins et al. (1984), Banker (1993), Korostelev et al. (1995a, 1995b) and Simar (1992, 1996) were the first to consider the FDH/DEA procedures from a statistical viewpoint (including presentation of a well-defined statistical model). This section summarises most of the results available today. To streamline the discussion, only results for the radial, input-oriented Farrell–Debreu efficiency measure are presented. These results extend trivially to the output orientation with some (perhaps tedious) changes in notation. Extensions to the hyperbolic and directional measures are given by Wilson (2011), Simar & Vanhems (2012) and Simar et al. (2012). A comprehensive and informative survey covering all these cases can be found in Simar & Wilson (2013).

### 4.1 Asymptotic Properties

As noted earlier, the notation introduced in Section 3 makes it clear that Ψ̂_FDH(𝒳n) and Ψ̂_VRS(𝒳n) are functions of the random sample 𝒳n. Consequently, all of the efficiency estimators discussed so far necessarily measure efficiency relative to the boundary of an estimate of the attainable set.

#### 4.1.1 Consistency

The first results concern minimal, but essential, properties: statistical consistency and achieved rates of convergence. For a long time, the only available results were for cases where either inputs or outputs were unidimensional. For p = 1 and q ≥ 1, Banker (1993) establishes the consistency of θ̂_VRS(x, y) for convex sets Ψ but provides no information on the rate of convergence. The first systematic analysis of convergence of envelopment estimators appears in Korostelev et al. (1995a, 1995b).5

For the case p = 1 and q ≥ 1, Korostelev et al. (1995a) prove that under the free disposability assumption and the hypothesis that the joint density of (X, Y) on Ψ is uniform, the FDH estimator of Ψ is the maximum likelihood estimator, is consistent and achieves the optimal rate

d_H(Ψ̂_FDH(𝒳n), Ψ) = O((log n / n)^{1/(q+1)}) almost surely, (4.1)

where d_H(·,·) denotes the Hausdorff metric between two sets. Korostelev et al. also describe a ‘blown-up’ version of the FDH estimator that asymptotically attains the optimal minimax risk.

Korostelev et al. (1995b) relax the uniform distribution assumption and consider both FDH and DEA estimators, again with p = 1 and q ≥ 1. Here, the risk of the estimators is defined in terms of the Lebesgue measure of the symmetric difference between sets. Under the free disposability assumption (but not convexity of Ψ), they obtain

E[d_Δ(Ψ̂_FDH(𝒳n), Ψ)] = O(n^{−1/(q+1)}), (4.2)

where d_Δ(·,·) denotes the Lebesgue measure of the symmetric difference between two sets. Adding the assumption of convexity of Ψ, they obtain

E[d_Δ(Ψ̂_VRS(𝒳n), Ψ)] = O(n^{−2/(q+2)}). (4.3)

Korostelev et al. (1995b) also show that under their respective frameworks, both the FDH and DEA estimators converge at the best possible rates: in the class of monotone boundaries for FDH and in the class of monotone and concave boundaries for DEA.

These results also reveal for the first time that the non-parametric envelopment estimators suffer from the ‘curse of dimensionality’ shared by most non-parametric techniques. This was new, although not a surprise in the statistical world; FDH and DEA are consistent and share the best possible rates in their respective classes, but ever more data are needed as the dimensionality of the problem increases.

A bit later, Kneip et al. (1998) (for the DEA case) and Park et al. (2000) (for the FDH case) derive the rates of convergence for the efficiency estimators in the more general setting of multivariate inputs and outputs. For the FDH estimators, only the free disposability assumption is needed; for the DEA estimators, the convexity assumption is required. Under some regularity assumptions (e.g. smoothness of the frontier, continuity of the density of (X, Y) near the boundary and strictly positive density f(x, y) on the frontier), they obtain, for any fixed point (x, y) ∈ Ψ,

θ̂_∙(x, y) − θ(x, y) = O_p(n^{−κ}), (4.4)

where ‘∙’ represents either FDH or VRS, and κ = 1/(p + q) for the FDH case (only under the free disposability assumption) and κ = 2/(p + q + 1) for the VRS-DEA case (adding the convexity assumption). Adding the assumption that Ψ is globally CRS, Park et al. (2010) prove that the rate of the CRS-DEA estimator improves to κ = 2/(p + q). Interestingly, in this case, whenever p + q ≤ 4, the rate of the non-parametric estimator is at least as fast as the usual root-n parametric rate, but only under globally CRS. Even more recently, Kneip et al. (2013) prove that the VRS-DEA estimator also achieves the rate with κ = 2/(p + q) when Ψ is globally CRS.
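The effect of the rate n^{−κ} can be illustrated with a small Monte Carlo sketch. The DGP below (frontier y = x, p = q = 1, so κ = 1/2 for FDH) is an assumed toy example, not one from the text: quadrupling n should roughly halve the average error of the FDH score.

```python
import random

def fdh_theta(x0, y0, data):
    # FDH input score at (x0, y0): smallest input ratio among dominating points
    dom = [xi / x0 for xi, yi in data if xi <= x0 and yi >= y0]
    return min(dom) if dom else float("inf")

def mc_error(n, reps=200, seed=1):
    # average error of the FDH estimator at a fixed point, over `reps` samples
    random.seed(seed)
    errs = []
    for _ in range(reps):
        data = []
        for _ in range(n):
            x = random.random()
            data.append((x, x * random.random()))  # frontier y = x
        est = fdh_theta(0.8, 0.4, data)
        if est != float("inf"):
            errs.append(est - 0.5)  # true theta(0.8, 0.4) = 0.5
    return sum(errs) / len(errs)

print(mc_error(100), mc_error(400))  # error shrinks roughly like n^(-1/2)
```

With p + q = 2, the FDH rate is n^{−1/2}; in higher dimensions the same experiment would show the curse of dimensionality, with much larger samples needed for a comparable error.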

The rate in (4.4) for the input-oriented Farrell–Debreu radial score also holds for the output-oriented case after straightforward changes in notation. Wilson 2011 establishes the same rates for the hyperbolic case, and Simar & Vanhems 2012 and Simar et al., (2012) establish the same rates for the directional case.

#### 4.1.2 Asymptotic law

Statistical consistency is a fundamental, essential property of any estimator, but for inference, sampling distributions or their approximations are needed. The FDH case is easier because it is linked to the estimation of a maximum or a minimum of the support of some appropriate random variable. The first result is due to Park et al., 2000, who show that under mild regularity conditions,

n^{1/(p+q)} (θ̂_FDH(x, y) − θ(x, y)) ⟶d Weibull(μ^{p+q}, p + q). (4.5)

Park et al. describe the parameter μ of the limiting Weibull and show that μ^{p+q} is linked to the probability of observing a firm dominating the point (x∂ + ζ, y) for small ζ > 0, where (x∂, y) is the reference frontier point of (x, y) in the input direction, such that x∂ = θ(x, y)x. Park et al. suggest a consistent estimator of μ and a way of selecting the ‘smoothing’ parameter ζ in simulated samples, but in practice, this is difficult to implement. Hence, bootstrap techniques are an attractive alternative. Bădin & Simar (2009) propose a simple way to correct the inherent bias of the FDH estimator in finite samples.

In the particular case of one input in the input orientation (or of one output in the output orientation), Daouia et al. (2010) use results from extreme value theory (EVT) to extend the earlier result to cases where the density of (X, Y) tends smoothly to zero at the frontier, for example, as x → x∂ in the univariate input orientation. The result is similar, but as expected, the rate of convergence deteriorates with the speed (the number of derivatives converging to zero) at which this density approaches zero at the frontier point. Daouia et al. (2010) also examine the case where the density of (X, Y) tends to infinity when approaching the frontier, with the reverse effect (i.e. improving the rate of convergence). Recently, Daouia et al. (2014) generalise these results from EVT to the full multivariate setting by considering the behaviour of the joint density of X and Y near the boundary along the appropriate ray (e.g. along x for the input-oriented radial measure) and obtain similar results.

The DEA cases are much more difficult to analyse, because the efficient frontier is determined by a facet of a convex polyhedron (for the VRS case) or of a convex cone (for the CRS case). Results from EVT cannot be directly applied. Gijbels et al. (1999) provided the first derivation of the asymptotic distribution of the DEA estimator, although they worked in the simplest case where p = q = 1. Nonetheless, this was the first successful attempt at the difficult problem of deriving the limiting distribution of the DEA estimator, which had remained unsolved for more than 40 years since Farrell (1957). Later, Jeong & Park (2006) generalised the work of Gijbels et al. to the case where p = 1 and q > 1 in the input-oriented case. The final extension to the full multivariate framework is due to Kneip et al. (2008) for the VRS-DEA case and to Park et al. (2010) for the CRS-DEA case. As usual, for the VRS-DEA case, both free disposability and convexity of Ψ are needed; for the CRS-DEA case, the additional assumption of globally CRS is required. Under analogous mild regularity conditions (in particular, strict positivity of the joint density on the frontier),

n^{κ} (θ̂_∙(x, y) − θ(x, y)) ⟶d Q(· ; η), (4.6)

where ‘∙’ now denotes either VRS or CRS and where Q(· ; η) is some regular, non-degenerate distribution with unknown parameters η depending on unknown quantities characterising the DGP. Kneip et al. (2008) analyse the VRS case, with κ = 2/(p + q + 1), and Park et al. (2010) examine the CRS case, with the faster rate κ = 2/(p + q).

There is no explicit closed form for the limiting distribution in the general multivariate case. Jeong (2004) and Jeong & Park (2006) derive the limiting distribution Q_VRS in (4.6) in the univariate input case, and the distribution can be simulated with an estimate of the parameter η. As noted earlier, in the particular case of one input and one output, Gijbels et al. (1999) provide a closed-form expression of the asymptotic distribution of the DEA estimator in the VRS case. Park et al. (2010) show that in the same univariate case, the CRS estimator has a simple exponential distribution, and they provide a way to simulate the distribution in the multivariate CRS-DEA case by using some smoothing parameters. However, this is difficult to implement in practical applications. See Jeong & Park (2011) for a theoretical survey. Clearly, for the DEA estimators, bootstrap techniques are very useful for making inference.

Interestingly, Kuosmanen (2008) finds that for the univariate output case, the DEA estimator can be obtained by solving a convex non-parametric least-squares problem that has an equivalent representation as a quadratic programming problem, with a quadratic objective and linear constraints. Other numerical methods for computing DEA estimators may be possible; more research is needed.
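
For readers who want to compute a score directly, the input-oriented VRS-DEA estimator solves a small linear programme for each evaluated point. The following is a minimal sketch (our own code, not from any of the papers cited here; the function name and data layout are illustrative) using SciPy's `linprog`:

```python
import numpy as np
from scipy.optimize import linprog

def dea_input_vrs(x, y, X, Y):
    """Input-oriented VRS-DEA score of (x, y) relative to the sample (X, Y):
    minimise theta subject to X'lam <= theta*x, Y'lam >= y, sum(lam) = 1."""
    n, p = X.shape
    q = Y.shape[1]
    c = np.r_[1.0, np.zeros(n)]                    # objective: minimise theta
    A_in = np.c_[-x.reshape(-1, 1), X.T]           # input constraints (p rows)
    A_out = np.c_[np.zeros((q, 1)), -Y.T]          # output constraints (q rows)
    res = linprog(c,
                  A_ub=np.vstack([A_in, A_out]),
                  b_ub=np.r_[np.zeros(p), -y],
                  A_eq=np.r_[0.0, np.ones(n)].reshape(1, -1),  # VRS: sum(lam)=1
                  b_eq=[1.0],
                  bounds=[(None, None)] + [(0, None)] * n)
    return res.x[0]
```

Dropping the equality constraint (while keeping lam >= 0) would yield the CRS-DEA score instead.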

Note that other approaches (e.g. Hall et al., 1998) use estimators of boundaries without imposing the natural economic constraints (like monotonicity or concavity). Consequently, these are less popular than DEA/FDH estimators in the field of productivity and efficiency analysis.

### 4.2 Bootstrap Techniques

As noted earlier, there are several difficulties associated with the practical use of the asymptotic results described in Section 4.1 for making inference. So far, bootstrap methods seem to be the only viable alternative for making inference on θ(x,y).

The first suggestion of using bootstrap methods in the context of production and efficiency analysis appears in Simar (1992) in a panel-data setting, but without any theoretical justification. The first theoretical results for using the bootstrap in a frontier set-up appear in Hall et al. (1995), where consistency of the bootstrap is established for the particular case of a semiparametric panel model; a double bootstrap is recommended to improve the performance of the approximation.

The first study suggesting bootstrap techniques for assessing the sampling variability of the VRS-DEA efficiency estimator in a fully non-parametric frontier model is Simar & Wilson (1998). The procedure is rather simple: the idea is to generate a bootstrap sample $\mathcal{X}_n^*$ from the original sample $\mathcal{X}_n$ in an appropriate way. Then, efficiency for any point (x,y) of interest is evaluated relative to a bootstrap estimate of the attainable set to obtain the corresponding bootstrap value of the efficiency score, say $\hat{\theta}^*(x,y)$ for the (input) radial distance to the boundary of the bootstrap estimate. If the bootstrap is consistent, then as n → ∞,

$$ \left(\hat{\theta}^{*}(x,y)-\hat{\theta}(x,y)\right) \Big|\; \mathcal{X}_n \;\approx\; \left(\hat{\theta}(x,y)-\theta(x,y)\right) \quad \text{in distribution}, \tag{4.7} $$

where Monte Carlo replications of the left-hand side can be used to approximate the unknown right-hand side.

The main problem is how $\mathcal{X}_n^*$, that is, the bootstrap sample of size n, should be generated so that (4.7) holds. It is well known from the statistical literature (e.g. Bickel & Freedman, 1981) that the naive bootstrap (i.e. resampling with replacement from the pairs (X_i, Y_i) in $\mathcal{X}_n$) is not consistent owing to the unknown boundary of Ψ, which is the support of (X,Y) (see Simar & Wilson, 2011a, for a pedagogical explanation of the problem). This fact was not recognised by some in the frontier literature, as indicated by the debate in Simar & Wilson (1999c).

Rather than using the inconsistent naive bootstrap, Simar & Wilson (1998) propose using a smooth bootstrap. In this early study, Simar and Wilson implement the smoothed bootstrap in a simple model under the assumption that the distribution of the inefficiencies along the chosen direction (input rays or output rays) is homogeneous in the input–output space. Hence, the smoothing operates only on the estimation of the univariate density of the efficiencies, making the problem much easier to handle. Simar & Wilson (2000) extend this idea to a more general heterogeneous case where the distribution of efficiency is allowed to vary over Ψ. This is more complicated than the original procedure and involves the estimation of a smoothed density of (X,Y) with unknown support in a (p + q)-dimensional space. No theoretical justification was given for either approach, but results from intensive Monte Carlo experiments described in both papers suggest that these bootstrap procedures give reasonable approximations for correcting the bias of the efficiency estimates and for building individual confidence intervals for the efficiency of any fixed point (x,y).

The full theory on the asymptotic properties of the VRS-DEA estimator and of the bootstrap is established in Kneip et al. (2008). Here, two bootstrap techniques are proven to be consistent: (i) a double-smooth bootstrap where, in addition to smoothing the empirical distribution of the data, the support of Ψ is estimated by a smoothed version of the VRS-DEA estimator; and (ii) a subsampling bootstrap.

The double-smooth bootstrap developed by Kneip et al. (2008) involves numerical difficulties, making it difficult to implement and computationally demanding. Kneip et al. (2011) provide a simplified, consistent and computationally efficient version of the double-smooth bootstrap. The idea is rather simple. It is well known that the naive bootstrap does not work, but the problem is localised to points near the boundary. The idea behind the simplified Kneip et al. (2011) method is to draw naively among observations that are ‘far’ from the frontier and draw the remaining points from a uniform distribution with support ‘near’ the frontier. This neighbourhood of the frontier is tuned by a smoothing parameter that can be selected by a simple ‘rule of thumb’. For consistency, the VRS-DEA frontier estimate must be smoothed, and here, a second bandwidth parameter is selected by cross-validation methods.

The subsampling approach is much simpler to implement, as a bootstrap sample $\mathcal{X}_m^*$, where m = n^γ for some γ ∈ (0,1), is obtained by drawing with (or without) replacement m pairs (X_i, Y_i) from the original sample $\mathcal{X}_n$. Kneip et al. (2008) prove the consistency of subsampling but do not provide suggestions for how a value for m might be selected in practice. Their simulation results indicate that the performance of the subsampling bootstrap, in terms of achieved coverages of estimated confidence intervals, is quite sensitive to the choice of m.

Unless the convexity assumption is imposed on Ψ, the FDH or LFDH estimators must be used, as the DEA estimators are inconsistent without convexity of Ψ. The limiting Weibull distribution is not easy to use because it contains an unknown parameter that is not easy to estimate in practice. Jeong & Simar (2006) prove that subsampling provides a consistent approximation of the sampling distribution of the FDH and LFDH estimators, but here again, no practical advice is offered on how to select an appropriate subsample size. Note that Hall & Park (2002) propose a ‘translation bootstrap’ as a simple way of estimating the bias of the LFDH estimator. This was carried out and justified for the simple univariate case with p = q = 1 and could be applied to other estimators. So far, this approach has not been extended to more general multivariate frameworks.

Simar & Wilson (2011a), using results from Politis et al. (2001) and Bickel & Sakov (2008), provide a data-based algorithm for selecting an appropriate value of the subsample size m, for both the FDH and DEA cases. The idea is to compute the object of interest (e.g. bounds of a confidence interval or a bias estimate) for various values of m on some selected grid. Then the value of m where the results show the smallest volatility is selected. This volatility can be computed for each value of m in the grid by computing, for example, the standard deviation of the three or five values found for adjacent values of m. Simar & Wilson (2011a) investigate the performance of their method (in terms of achieved coverages of individual confidence intervals for efficiency scores) by intensive Monte Carlo experiments, for both FDH and DEA estimators. The results indicate that the method works well for moderate sample sizes similar to those faced in practice, providing reasonable approximations of the sampling distribution of the estimators.6
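
The mechanics of the subsampling bootstrap are easy to sketch in code. The following minimal illustration (our own function names; it takes m as given rather than selecting it by the volatility criterion just described) uses the input-oriented FDH estimator, whose standard min–max formula and n^{1/(p+q)} rate are assumed:

```python
import numpy as np

def fdh_input(x, y, X, Y):
    """Input-oriented FDH score: smallest radial contraction of x that is
    still dominated by an observed firm producing at least y."""
    dom = np.all(Y >= y, axis=1)                 # firms with Y_i >= y
    return np.min(np.max(X[dom] / x, axis=1))

def fdh_subsampling_ci(x, y, X, Y, m, B=1000, level=0.95, seed=None):
    """Two-sided CI (nominal coverage `level`) for theta(x, y), built from B
    subsamples of size m drawn without replacement (assumes each subsample
    contains at least one firm dominating y)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    kappa = 1.0 / (p + Y.shape[1])               # FDH convergence rate
    theta_hat = fdh_input(x, y, X, Y)
    stats = np.empty(B)
    for b in range(B):
        idx = rng.choice(n, size=m, replace=False)
        stats[b] = m**kappa * (fdh_input(x, y, X[idx], Y[idx]) - theta_hat)
    lo, hi = np.quantile(stats, [(1 - level) / 2, (1 + level) / 2])
    # invert: n^kappa (theta_hat - theta) is approximated by the subsampled statistic
    return theta_hat - hi / n**kappa, theta_hat - lo / n**kappa
```

Because a subsample can only lose dominating firms, the subsampled scores sit above the full-sample score, so the resulting interval lies to the left of the estimate, mirroring the pattern seen for the (upward-biased) FDH estimates in Table 1.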

### 4.3 A Numerical Example

In order to provide an illustration of DEA and FDH estimators, with corresponding confidence intervals estimated by subsampling as described by Simar & Wilson (2011a), a sample of 1000 pseudo-random observations was generated using the following method. Let p = 2 and q = 2. To generate a single observation, let u = (u_p, u_q) be a (p + q)-tuple uniformly distributed on a unit sphere centred at the origin in ℝ^{p+q}, where u_p and u_q are vectors of length p and q, respectively. Then set x = (1 − |u_p|) and y = |u_q| λ^{−1}, where λ = exp(0.5|η|) is a random variable with η ∼ N(0,1). The (p + q)-tuples u are generated using the genxy.sphere() function provided in the FEAR (version 2.0) library described by Wilson (2008).

Two sets of DEA and FDH estimates of efficiency for the first 15 observations in the simulated sample are shown in Table 1. In one case, only the first 100 observations were used to estimate efficiency for the first 15 observations; in the other case, all 1000 observations were used. All estimates were computed using either the input-oriented VRS-DEA estimator or the input-oriented FDH estimator. Also shown in Table 1, corresponding to each efficiency estimate, are 95% confidence intervals estimated by subsampling (unbalanced and without replacement) along the lines described by Simar & Wilson (2011a).

Table 1. Input-oriented DEA and FDH estimates for a simulated sample with p = 2, q = 2.

| i | θ(x,y) | DEA, n = 100 | DEA, n = 1000 | FDH, n = 100 | FDH, n = 1000 |
|---|--------|--------------|---------------|--------------|---------------|
| 1 | 0.6084 | 0.7314 | 0.6553 | 0.9721 | 0.8288 |
|   |        | (0.4879, 0.6971) | (0.6058, 0.6518) | (0.6118, 1.0000) | (0.6581, 0.8288) |
| 2 | 0.8489 | 1.0000 | 0.9268 | 1.0000 | 0.9561 |
|   |        | (0.7739, 1.0000) | (0.6906, 0.9253) | (0.7362, 1.0000) | (0.6813, 1.0000) |
| 3 | 0.4742 | 0.6608 | 0.5038 | 1.0000 | 0.7420 |
|   |        | (0.4654, 0.6438) | (0.4411, 0.4994) | (0.6619, 1.0000) | (0.5592, 0.7420) |
| 4 | 0.6600 | 1.0000 | 0.6763 | 1.0000 | 0.9566 |
|   |        | (1.0000, 1.0000) | (0.4546, 0.6680) | (1.0000, 1.0000) | (0.8246, 1.0000) |
| 5 | 0.8214 | 1.0000 | 0.9745 | 1.0000 | 1.0000 |
|   |        | (0.6694, 1.0000) | (0.9154, 0.9725) | (0.4786, 1.0000) | (0.7998, 1.0000) |
| 6 | 0.9251 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
|   |        | (0.6345, 1.0000) | (0.8546, 1.0000) | (0.7124, 1.0000) | (0.3635, 1.0000) |
| 7 | 0.8280 | 1.0000 | 0.9030 | 1.0000 | 1.0000 |
|   |        | (0.9612, 1.0000) | (0.8010, 0.8989) | (0.9351, 1.0000) | (0.6273, 1.0000) |
| 8 | 0.9913 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
|   |        | (0.4498, 1.0000) | (0.7521, 1.0000) | (0.3105, 1.0000) | (0.5465, 1.0000) |
| 9 | 0.6887 | 0.8969 | 0.7689 | 1.0000 | 0.8986 |
|   |        | (0.8580, 0.8669) | (0.6710, 0.7646) | (0.9807, 1.0000) | (0.5566, 1.0000) |
| 10 | 0.7394 | 1.0000 | 0.8213 | 1.0000 | 1.0000 |
|    |        | (1.0000, 1.0000) | (0.6094, 0.8030) | (1.0000, 1.0000) | (0.3332, 1.0000) |
| 11 | 0.8702 | 1.0000 | 0.9544 | 1.0000 | 1.0000 |
|    |        | (0.4420, 1.0000) | (0.8573, 0.9464) | (0.2566, 1.0000) | (0.7151, 1.0000) |
| 12 | 0.5133 | 0.6687 | 0.5477 | 0.9567 | 0.7520 |
|    |        | (0.4582, 0.6447) | (0.4919, 0.5467) | (0.8171, 1.0000) | (0.5534, 0.7520) |
| 13 | 0.9495 | 1.0000 | 1.0000 | 1.0000 | 1.0000 |
|    |        | (0.4018, 1.0000) | (0.7850, 1.0000) | (0.2685, 1.0000) | (0.5378, 1.0000) |
| 14 | 0.7618 | 0.9711 | 0.8252 | 1.0000 | 1.0000 |
|    |        | (0.7423, 0.9647) | (0.7550, 0.8143) | (0.4945, 1.0000) | (0.7589, 1.0000) |
| 15 | 0.6834 | 0.8452 | 0.7298 | 1.0000 | 0.8864 |
|    |        | (0.6483, 0.8360) | (0.6652, 0.7259) | (0.5177, 1.0000) | (0.7310, 0.8864) |

DEA, data envelopment analysis; FDH, free disposal hull; VRS, varying returns to scale. Estimated 95% confidence intervals appear in parentheses below each estimate.

The first column in Table 1 gives the observation number, and the second column gives the ‘true’ value of the simulated (output) inefficiency. The next two columns contain the VRS-DEA efficiency estimates obtained using either 100 or 1000 observations, and the last two columns contain the corresponding FDH efficiency estimates. The results clearly indicate that as the sample size is increased from 100 to 1000, the estimated values tend to move closer to the ‘true’ values. In addition, the estimated confidence intervals become more narrow, reflecting the increase in information when the sample size is increased. In a number of cases, particularly when the VRS-DEA estimator is used, the estimated confidence intervals lie entirely to the left of the corresponding efficiency estimate. This is to be expected, given the bias of both the VRS-DEA and FDH efficiency estimators. It is also apparent from the results in the table that the estimates are biased upward. The bias becomes smaller when the sample size increases from 100 to 1000 and is larger for the FDH estimator than for the VRS-DEA estimator.

## 5 Robust Versions of Envelopment Estimators

### 5.1 Probabilistic Formulation of the Production Process

In an innovative paper, Cazals et al. (2002) focus on the probabilistic structure of the DGP. Doing so permits the efficiency measures defined in (2.6)–(2.9) to be reformulated and suggests alternative features that may be estimated. Clearly, the DGP is completely characterised by the bounded joint density of (X,Y) or any one-to-one transformation of it. It is convenient to characterise the joint probability law of (X,Y) by

$$ H_{XY}(x,y) = \Pr(X \le x,\; Y \ge y), \tag{5.1} $$

which is the probability of observing a firm dominating the production plan (x,y). The attainable set Ψ is the support of HXY. This joint distribution can be decomposed by writing

$$ H_{XY}(x,y) = \Pr(X \le x \mid Y \ge y)\,\Pr(Y \ge y) = F_{X|Y}(x \mid y)\, S_Y(y), \tag{5.2} $$

while noting that the conditional distribution function (DF) F_{X|Y}(x ∣ y) is non-standard because the conditioning on Y is not Y = y or Y ≤ y, but instead Y ≥ y.

Remarkably, as shown by Daraio & Simar (2007b) by extending the work of Cazals et al. (2002) to multivariate settings, under the free disposability assumption, the Farrell–Debreu input-oriented efficiency measure in (2.6) can be defined equivalently as

$$ \theta(x,y) = \inf\{\theta > 0 \mid F_{X|Y}(\theta x \mid y) > 0\}. \tag{5.3} $$

An obvious non-parametric estimator of θ(x,y) is obtained by replacing F_{X|Y}(θx ∣ y) in (5.3) with its empirical analogue

$$ \hat{F}_{X|Y,n}(x \mid y) = \frac{\sum_{i=1}^{n} \mathbb{1}(X_i \le x,\, Y_i \ge y)}{\sum_{i=1}^{n} \mathbb{1}(Y_i \ge y)}. \tag{5.4} $$

Then for any y in the support of Y, simple manipulations reveal that

$$ \hat{\theta}_n(x,y) = \inf\{\theta > 0 \mid \hat{F}_{X|Y,n}(\theta x \mid y) > 0\}, \tag{5.5} $$

which can be shown to be equal to the expression given in (3.3) for the FDH estimator. This gives an additional natural motivation for the FDH estimators. Similar results are obtained for the output orientation by writing H_{XY}(x,y) = S_{Y|X}(y ∣ x) F_X(x), where the conditional survival function S_{Y|X}(y ∣ x) is also non-standard because the conditioning event is X ≤ x instead of X = x or X > x.
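
The equivalence between the plug-in rule and the usual min–max FDH formula is easy to verify numerically. A small sketch (illustrative function names; the min–max expression is the standard multivariate FDH formula):

```python
import numpy as np

def theta_fdh_direct(x, y, X, Y):
    """Usual FDH formula: min over dominating firms of the max input ratio."""
    dom = np.all(Y >= y, axis=1)
    return np.min(np.max(X[dom] / x, axis=1))

def theta_fdh_plugin(x, y, X, Y):
    """FDH score via the empirical conditional DF: the smallest candidate
    theta at which the empirical F(theta * x | y) becomes positive."""
    dom = np.all(Y >= y, axis=1)
    # candidate radial contractions generated by the dominating firms
    thetas = np.max(X[dom] / x, axis=1)
    def F_hat(t):  # empirical analogue of F_{X|Y}(t * x | y)
        return np.mean(np.all(X[dom] <= t * x, axis=1))
    return min(t for t in thetas if F_hat(t) > 0)
```

Each candidate theta makes its own generating firm dominated, so the empirical DF is positive there and the two functions coincide exactly.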

This nice, simple idea opens the door to a number of other developments such as the robust versions of envelopment estimators described later as well as the conditional measures of efficiency introduced in Section 6. Wheelock & Wilson 2008 and Wilson 2011 extend this approach to hyperbolic measures, and Simar & Vanhems 2012 extend the approach to directional measures.

### 5.2 Order-m Partial Frontiers

Both the FDH and DEA estimators fully envelop the sample observations in $\mathcal{X}_n$. Consequently, FDH and DEA estimators are very sensitive to outliers or extreme data points. Several methods exist (e.g. Wilson, 1993, 1995; Simar, 2003; Porembski et al., 2005) for detecting outliers in this setting, but determining what constitutes an ‘outlier’ necessarily involves some subjectivity on the part of the researcher.

Alternatively, Cazals et al. (2002) introduce the concept of a ‘partial’ frontier (as opposed to the ‘full’ frontier of Ψ) that provides a less-extreme benchmark than the support of the random variable (X,Y) and has its own economic interpretation. The concept is presented here in the input orientation, but extension to the output, hyperbolic and directional cases is straightforward.

To begin, consider a single input (or cost) x. So here, the full frontier can be represented by a function φ(y) = inf{x ∣ F_{X|Y}(x ∣ y) > 0}, where the conditional DF is defined in (5.2) with conditioning on Y ≥ y. The order-m frontier, for an integer m ≥ 1, is defined by φ_m(y) = E[min(X^1, …, X^m) ∣ Y ≥ y], where X^1, …, X^m are independent and identically distributed draws from F_{X|Y}(· ∣ y). This provides a less-extreme benchmark than the full frontier.7 As explained by Cazals et al. (2002), the order-m frontier can then be computed as

$$ \varphi_m(y) = E\left[\min(X^1, \ldots, X^m) \mid Y \ge y\right] = \int_0^{\infty} \left[1 - F_{X|Y}(x \mid y)\right]^m dx. \tag{5.6} $$

So, the benchmark for a unit (x,y) producing a level y of outputs is the expected minimum input level among m firms drawn at random from the population of firms producing at least output level y. For finite m, this is clearly less extreme than the full frontier. Cazals et al. (2002) show that φ_m(y) → φ(y) as m → ∞. The order-m efficiency score can be defined as θ_m(x,y) = φ_m(y)/x. Note that for finite m, θ_m(x,y) is not bounded above by 1, in contrast to θ(x,y) defined in (2.6).

Cazals et al. (2002) extend the order-m efficiency score to multivariate settings as follows. Consider m draws of random variables X_i, i = 1, …, m, generated by F_{X|Y}(· ∣ y), and define the random set Ψ_m(y) as the free disposal hull of the m pseudo-firms (X_i, y), i = 1, …, m. Then the Farrell–Debreu input-oriented efficiency score of (x,y) with respect to the attainable set Ψ_m(y) is given by

$$ \tilde{\theta}_m(x,y) = \inf\{\theta > 0 \mid (\theta x, y) \in \Psi_m(y)\} = \min_{1 \le i \le m}\; \max_{1 \le j \le p} \frac{X_i^{(j)}}{x^{(j)}}. \tag{5.7} $$

Because Ψ_m(y) is random, $\tilde{\theta}_m(x,y)$ is a random variable. Cazals et al. define the order-m input efficiency score as the expectation of this random variable, that is,

$$ \theta_m(x,y) = E\left[\tilde{\theta}_m(x,y)\right], \tag{5.8} $$

where the expectation is over the m draws from F_{X|Y}(· ∣ y).

This can be easily computed by a simple Monte Carlo method.
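
A minimal Monte Carlo sketch of the empirical order-m score (our own code; drawing with replacement from the observed firms dominating y is exactly sampling from the empirical version of F_{X|Y}(· ∣ y)):

```python
import numpy as np

def order_m_input(x, y, X, Y, m, B=2000, seed=None):
    """Monte Carlo estimate of the order-m input efficiency of (x, y):
    average over B replications of the FDH score of (x, y) against m firms
    drawn with replacement from the observed firms with Y_i >= y."""
    rng = np.random.default_rng(seed)
    ratios = np.max(X[np.all(Y >= y, axis=1)] / x, axis=1)
    draws = rng.choice(ratios, size=(B, m), replace=True)
    return draws.min(axis=1).mean()
```

As m grows, the minimum over m draws concentrates on the overall minimum, so the estimate approaches the FDH score, consistent with the convergence of the order-m frontier to the full frontier as m → ∞.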

Extension to the output direction is simple; see Cazals et al. (2002) for details. Extension to hyperbolic and directional distances is somewhat more complicated because of the nature of the order-m concept in the multivariate framework and requires some additional work. Results are given by Wilson (2011) for the hyperbolic case and by Simar & Vanhems (2012) for the directional cases.

A non-parametric estimator of the order-m input efficiency score is obtained by plugging the empirical DF into (5.8) to replace the unknown F_{X|Y}(x ∣ y). Cazals et al. derive a remarkable property of the resulting estimator, namely

$$ \sqrt{n}\left(\hat{\theta}_{m,n}(x,y) - \theta_m(x,y)\right) \xrightarrow{\;\mathcal{L}\;} N\!\left(0, \sigma_m^2(x,y)\right), \tag{5.9} $$

where an explicit expression for σ_m²(x,y) is given by Cazals et al. The root-n rate of convergence is rather unusual in non-parametric settings; the asymptotic normality facilitates easy construction of confidence intervals. Of course, similar properties hold in the output orientation. In addition, all the properties of the order-m radial distances and their estimators have been extended to hyperbolic and directional distances by Wilson (2011) and Simar & Vanhems (2012), respectively.

### 5.3 Order-α Quantile Frontiers

An alternative partial frontier concept for defining a less-extreme benchmark than the full frontier is related to the concept of conditional quantiles, although it differs from the usual conditional quantile. Aragon et al. (2005) introduce the idea for the case of a univariate input (for an input-oriented measure) or a univariate output (for the output orientation) by using quantiles of a non-standard, univariate DF. These ideas are extended to the full multivariate setting by Daouia & Simar (2007), who derive quantiles along the radial distances.

Working in the input direction, the central idea is to benchmark the unit operating at (x,y) against the input level not exceeded by (1 − α) × 100% of firms among the population of units producing at least output level y. The resulting efficiency measure is defined by

$$ \theta_\alpha(x,y) = \inf\{\theta > 0 \mid F_{X|Y}(\theta x \mid y) > 1 - \alpha\}, \tag{5.10} $$

where it is important to recall that the conditioning is on Y ≥ y.8 The quantity θ_α(x,y) is called the ‘input efficiency at level α × 100%’. If θ_α(x,y) = 1, then the unit operating at (x,y) is said to be input efficient at the level α × 100% because it is dominated by firms producing at least the level of output y with probability 1 − α. Similar to the order-m measure, it is clear that θ_α(x,y) → θ(x,y) as α → 1; that is, the full frontier efficiency measure is recovered as α → 1.

A non-parametric estimator of θ_α(x,y) is obtained by replacing F_{X|Y}(θx ∣ y) in (5.10) with the empirical DF; a simple computational algorithm is provided by Daouia & Simar (2007), where the corresponding output-oriented measure and its estimator are also presented. Wheelock & Wilson (2008) extend the order-α concept to hyperbolic measures and provide a fast numerical algorithm for computing estimates. Simar & Vanhems (2012) extend the method to directional distances.
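
In the empirical version, the order-α score reduces to an order statistic of the radial ratios of the dominating firms; a minimal sketch (our own code and index convention):

```python
import numpy as np

def order_alpha_input(x, y, X, Y, alpha):
    """Empirical order-alpha input efficiency of (x, y): the smallest theta
    at which the empirical F_{X|Y}(theta * x | y) exceeds 1 - alpha."""
    ratios = np.sort(np.max(X[np.all(Y >= y, axis=1)] / x, axis=1))
    k = int(np.floor((1 - alpha) * ratios.size))   # 0-based order-statistic index
    return ratios[k]
```

For alpha = 1 this returns the smallest ratio, i.e. the ordinary FDH score.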

The properties of the non-parametric order-α estimators are similar to those of the order-m estimators; for example, in the input orientation,

$$ \sqrt{n}\left(\hat{\theta}_{\alpha,n}(x,y) - \theta_\alpha(x,y)\right) \xrightarrow{\;\mathcal{L}\;} N\!\left(0, \sigma_\alpha^2(x,y)\right), \tag{5.11} $$

where again an explicit expression for σ_α²(x,y) is given by Daouia & Simar (2007). Similar results hold in the output, hyperbolic and directional cases.9

Order-α estimators have been used by Wheelock & Wilson (2004, 2008) to examine the efficiency of check-processing operations of the US Federal Reserve System, by Wheelock & Wilson (2009) to examine efficiency and productivity within the US commercial banking industry and by Wheelock & Wilson (2013) to examine changes in efficiency and operating cost among US credit unions.

### 5.4 A Numerical Example

The order-m and order-α estimators can be illustrated using the DGP defined by Wilson (2011, pp. 126–127, Example 6.2.1), where p = q = 1 and the joint density of (X,Y) is uniform over the north-west quarter of a circle centred at (1,0) with a radius of 1. Wilson (2011) gives expressions for the marginal DFs F_X(x) and F_Y(y), the conditional DF F_{X|Y}(x ∣ Y ≥ y) and the joint DF H_{XY}(x,y).

Table 2 shows the true values θ_m(x,y) and θ_α(x,y) for the first 15 observations in a sample of 1000 observations drawn from the DGP described earlier. The true values are computed numerically using (5.8) and (5.10) and the expression for F_{X|Y}(x ∣ y) appearing in Wilson (2011, Equation 6.3.9). In addition, Table 2 shows two sets of estimates—order-m, with m = 75, and order-α, with α = 0.95—for the first 15 observations. As in Section 4.3, each estimator was applied twice, once using only the first 100 observations for estimation and then using all 1000 observations. For each estimate, the corresponding estimated (95%) confidence intervals are shown; these were estimated using a naive bootstrap.

Table 2. Input-oriented order-m and order-α estimates for a simulated sample with p = q = 1, m = 75, α = 0.95.

| i | θm(x,y) | Order-m, n = 100 | Order-m, n = 1000 | θα(x,y) | Order-α, n = 100 | Order-α, n = 1000 |
|---|---------|------------------|-------------------|---------|------------------|-------------------|
| 1 | 0.3403 | 0.3147 | 0.3683 | 0.4387 | 0.3859 | 0.4673 |
|   |        | (0.3094, 0.4899) | (0.3327, 0.4204) |        | (0.3094, 0.5022) | (0.4341, 0.4973) |
| 2 | 0.5190 | 0.5721 | 0.5674 | 0.6622 | 0.6431 | 0.7085 |
|   |        | (0.5677, 0.7281) | (0.5206, 0.6277) |        | (0.5677, 0.7607) | (0.6386, 0.7503) |
| 3 | 0.5135 | 0.5239 | 0.4893 | 0.5798 | 0.5239 | 0.5944 |
|   |        | (0.5239, 0.7366) | (0.4696, 0.5636) |        | (0.5239, 0.7366) | (0.5082, 0.6352) |
| 4 | 0.8486 | 0.7835 | 0.8113 | 0.9633 | 0.7833 | 0.9790 |
|   |        | (0.7833, 1.0166) | (0.7867, 0.9299) |        | (0.7833, 1.0166) | (0.8621, 1.0439) |
| 5 | 0.5676 | 0.6431 | 0.6324 | 0.7292 | 0.7206 | 0.7703 |
|   |        | (0.6361, 0.8160) | (0.5849, 0.7027) |        | (0.6361, 0.8524) | (0.7205, 0.8278) |
| 6 | 0.1085 | 0.1315 | 0.0513 | 0.2848 | 0.2360 | 0.2782 |
|   |        | (0.1202, 0.2502) | (0.0324, 0.0982) |        | (0.1374, 0.4800) | (0.2250, 0.3136) |
| 7 | 0.3111 | 0.3707 | 0.3269 | 0.4428 | 0.4686 | 0.4810 |
|   |        | (0.3622, 0.4977) | (0.3014, 0.3709) |        | (0.3621, 0.6153) | (0.4137, 0.5458) |
| 8 | 0.9555 | 0.9817 | 0.6475 | 2.2500 | 2.6598 | 2.4761 |
|   |        | (0.8499, 2.2447) | (0.4510, 1.1097) |        | (1.0000, 3.6128) | (2.0702, 2.8189) |
| 9 | 0.3213 | 0.4929 | 0.3182 | 0.4996 | 0.6200 | 0.5473 |
|   |        | (0.4795, 0.6622) | (0.2711, 0.4038) |        | (0.4790, 0.8140) | (0.4790, 0.6338) |
| 10 | 0.5090 | 0.7299 | 0.4926 | 0.8217 | 0.8987 | 0.8944 |
|    |        | (0.6923, 1.0245) | (0.4311, 0.6011) |        | (0.6907, 1.2762) | (0.7344, 1.0604) |
| 11 | 2.0750 | 1.2741 | 1.8048 | 3.6940 | 4.3212 | 3.7940 |
|    |        | (1.0162, 4.4313) | (1.4794, 2.4401) |        | (3.5681, 5.7897) | (3.3349, 4.5766) |
| 12 | 0.2848 | 0.3067 | 0.2998 | 0.3933 | 0.3251 | 0.4186 |
|    |        | (0.3026, 0.4145) | (0.2769, 0.3375) |        | (0.3026, 0.5194) | (0.3836, 0.4593) |
| 13 | 0.7844 | 1.0558 | 0.7573 | 1.2360 | 1.3012 | 1.3106 |
|    |        | (1.0022, 1.6013) | (0.6333, 0.9481) |        | (1.0000, 2.0416) | (1.1335, 1.5674) |
| 14 | 0.1309 | 0.1454 | 0.0891 | 0.3023 | 0.4183 | 0.3347 |
|    |        | (0.1138, 0.4055) | (0.0588, 0.1498) |        | (0.2580, 0.5112) | (0.2797, 0.3760) |
| 15 | 0.2977 | 0.4856 | 0.2640 | 0.5098 | 0.6079 | 0.5146 |
|    |        | (0.4682, 0.6210) | (0.2225, 0.3328) |        | (0.4672, 0.8589) | (0.4472, 0.6057) |

Estimated 95% confidence intervals appear in parentheses below each estimate.

Overall, the results in Table 2 confirm that both estimators tend to converge to the true values as sample size increases. The width of the confidence interval estimates also tends to become more narrow with increasing sample size. In contrast to the results in Section 4.3, the estimated confidence intervals in Table 2 typically include the point estimate, as neither the order-m nor order-α estimators are biased. In a few cases, the estimated confidence intervals shown in Table 2 do not cover the corresponding true value, but this is to be expected given that the sample is of finite size and the (asymptotic) rate of type I errors is 5%.

### 5.5 Further Extensions

Cazals et al. (2002, Theorem 2.4) and Daouia & Simar (2007, Proposition 2.5) show that the order-m and order-α partial efficiency scores are monotonic under the assumption of tail monotonicity of the implied non-standard, conditional DF. In other words, under tail monotonicity, θ_m(x,y) and θ_α(x,y) are monotone non-decreasing in y, and similarly, the corresponding output measures λ_m(x,y) and λ_α(x,y) are monotone non-decreasing in x. In the input-oriented case, tail monotonicity amounts to assuming that F_{X|Y}(x ∣ y′) ≤ F_{X|Y}(x ∣ y) for all y′ ≥ y. This is not too restrictive in practice; roughly speaking, it requires that the probability of using less than a fixed input level x decreases as the production level increases.

Unfortunately, even under the tail monotonicity assumption, the partial efficiency estimators do not share this monotonicity property in finite samples. For the univariate case (e.g. with one input in the input orientation), Daouia & Simar propose an easy way to monotonise the estimated frontiers and show that the modified estimators retain the same asymptotic properties as the original estimators. In a more recent study, Daouia et al. (2014) propose an alternative way of defining the order-α efficiency scores in a full multivariate setting. The resulting estimators share the desirable monotonicity property and have superior robustness properties, even if the tail monotonicity assumption does not hold. This new quantile approach is obtained from the directional distance estimator of order-α described by Simar & Vanhems (2012). In the input orientation, it involves a vector of zero directions for the outputs; in the output orientation, a vector of zero directions is used for the inputs. Analogous results for the order-m case should be available soon.

### 5.6 Robust Estimation of the Full Frontier

Both the order-m and order-α partial frontiers can be used to provide robust estimation of the full frontier itself or of the corresponding full-efficiency scores defined in (2.6)–(2.9). Cazals et al. (2002) show that the order-m estimator converges to the FDH estimator as m → ∞. If the convergence is fast enough (i.e. m = O(n)), then the order-m estimator also converges to the full frontier efficiency score θ(x,y), with the same limiting distribution and the same non-parametric rate as the FDH estimator when n → ∞. However, for finite n, m will be finite and the estimated order-m frontier will not envelop all the data points; hence, the order-m estimator remains more robust than the FDH estimator in the presence of outliers.

More recently, Daouia et al. (2012) have shown, for the case of a univariate input in the input orientation, that by letting m converge to infinity slowly enough, the order-m estimator provides an asymptotically normally distributed estimator of distance to the full frontier. The necessary rate for m is roughly m = O(n^{1/3}); see Daouia et al. (2012) for a precise formulation.

Not surprisingly, similar properties hold for the order-α frontiers. Clearly, as α → 1, the order-α estimator converges to the FDH estimator. But, as shown by Daouia & Simar (2007), if α = α(n) → 1 fast enough, that is, if n^{(p+q+1)/(p+q)}(1 − α(n)) → 0 as n → ∞, the order-α(n) estimator can be used to estimate full efficiency, attaining the properties of the FDH estimator (with a non-parametric rate of convergence and limiting Weibull distribution). Of course, in practice, n is finite, so the order-α frontier will not envelop all the data points and will also be more robust with respect to extreme points or outliers than the ordinary FDH estimator.

Using results from EVT, Daouia et al. (2010) show, for the univariate input case in the input orientation, that by choosing α = α(n) converging to 1 slowly enough, an estimator of distance to the full frontier with a normal limiting distribution is obtained.

Order-m and order-α efficiency estimators are compared and investigated from the robustness theory perspective by Daouia & Ruiz-Gazen (2006) and Daouia & Gijbels (2011a). They demonstrate relations between the two concepts and analyse their respective advantages and limitations. Daouia & Gijbels formalise a data-driven procedure to detect outliers and select appropriate values of the orders, providing a theoretical background for the ideas in Simar (2003) for detecting outliers. As noted at the beginning of Section 5.2, detection of outliers is critical when using envelopment estimators, and several methods exist for detecting outliers in the context of efficiency estimation. Careful applied researchers should use several of these, as any one method is unlikely to detect all outliers in every situation.

## 6 Introducing Environmental Factors

The analysis of productive efficiency has in general two components: (i) estimation of a benchmark frontier that serves to evaluate the performance of firms and (ii) investigation of the influence of outside, environmental factors on the production process. These factors, denoted by Z, may reflect differences in ownership, regulatory constraints, business environment and so on. Such factors are neither inputs nor outputs and are typically not under the control of the manager, but nonetheless, they may influence the production process. Conditions described by Z may affect the range of attainable values of the inputs and outputs (X,Y), and hence the shape of the boundary of the attainable set; or Z may affect only the distribution of inefficiencies inside the attainable set; or, in some cases, Z may affect both. Of course, Z might also be completely independent of (X,Y). The effect of Z is unknown and must be estimated appropriately.

There have been dozens of papers in the non-parametric literature suggesting ways to introduce Z into the analysis of the production process. Some proposals are rather simple but quite restrictive (e.g. the one-stage approaches discussed later), whereas others are valid only under very peculiar, restrictive conditions that are rarely tested. Examples of the latter, including the two-stage approaches discussed later, can be found in hundreds of published applied papers. Cazals et al. (2002) show a natural way to introduce environmental variables by extending their probabilistic formulation of the production process. The literature at present remains muddled and confused in many instances, but the conditional efficiency scores defined by Daraio & Simar (2007b) offer a way out of the darkness. The various approaches are briefly reviewed in the following.

### 6.1 One-stage Approaches

The earliest attempts to incorporate environmental variables into the analysis of production were one-stage approaches along the lines of Banker & Morey (1986) and Färe et al. (1989). In this approach, Z is treated as a vector of r freely disposable inputs or outputs that contribute to the definition of an augmented attainable set in the (p + q + r)-dimensional space. The efficiency scores are defined relative to the boundary of this new set; for example, in the input orientation, one might define

$$ \theta(x, y \mid z) = \inf\{\theta > 0 \mid (\theta x, z, y) \in \Psi\}, \tag{6.1} $$

where Ψ here denotes the augmented attainable set.

Then non-parametric DEA or FDH estimators of Ψ treat Z as a freely disposable input if it is favourable to production of output, or as a freely disposable, undesirable output if it is detrimental to output production.

This approach has its own merits and is particularly easy to implement, but it has three important drawbacks. First, the researcher must know a priori whether Z is favourable or detrimental to production. Second, the approach allows only monotone effects of Z on the process (in many cases, effects may be U shaped or inverted-U shaped). Finally, one must assume free disposability of the augmented set Ψ and, if DEA estimators are used, its convexity. For these reasons, this approach has been used less than others in recent years.
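
As an illustration of the one-stage idea, an FDH variant treating a favourable Z as a freely disposable input changes only the dominance condition (a sketch with illustrative names; for a detrimental Z, the inequality on Z would be reversed):

```python
import numpy as np

def fdh_input_one_stage(x, y, z, X, Y, Z):
    """One-stage FDH: Z enters as a freely disposable input, so (x, y, z) is
    benchmarked only against observed firms with Y_i >= y and Z_i <= z."""
    dom = np.all(Y >= y, axis=1) & np.all(Z <= z, axis=1)
    return np.min(np.max(X[dom] / x, axis=1))
```

The monotone-effect drawback is visible directly in the code: raising z can only enlarge the set of admissible benchmarks, so the score can only decrease.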

### 6.2 Two-stage Approaches

A large part of the literature focuses on two-stage approaches for including environmental factors. One can find perhaps hundreds of papers using this approach, but as Simar & Wilson (2007) discuss, in most of these studies, statistical models are ill defined, inappropriate estimators are used and inference is inconsistent. The basic idea is to estimate efficiency scores in a first stage considering only the space of inputs and outputs (X,Y), ignoring Z. Then in a second stage, the estimated efficiencies are regressed on Z. Although DEA and FDH efficiency estimates are truncated at one by construction, many examples exist in the literature where researchers either have ignored this or have confused truncation with censoring. By formalising this procedure, Simar & Wilson (2007, 2011b) have shown that (i) this approach is meaningful only if a ‘separability condition’ between Z and (X,Y) holds and (ii) even if the second-stage regression is meaningful, traditional inference is flawed by problems linked to the fact that the true efficiency scores are not observed but instead must be replaced by biased estimators that are not independent. Simar and Wilson suggest the use of bootstrap methods for addressing issue (ii), but the separability assumption remains a restrictive assumption in need of testing.

The problem can be formalised as follows. Consider the random variables (X,Y,Z) defined on an appropriate probability space with support $\mathcal{P}$. Consider also the conditional distribution of (X,Y), conditional on Z = z, described by

$$
H_{X,Y\mid Z}(x,y\mid z) = \Pr(X \le x,\ Y \ge y \mid Z = z). \tag{6.2}
$$

This gives the probability that a firm facing environmental conditions z will dominate the point (x,y). Given Z = z, the attainable set of combinations of inputs and outputs is

$$
\Psi^z = \left\{(x,y) \in \mathbb{R}_+^{p+q} \mid H_{X,Y\mid Z}(x,y\mid z) > 0\right\}, \tag{6.3}
$$

the support of $H_{X,Y\mid Z}(\cdot,\cdot\mid z)$.

The unconditional probability of (x,y) being dominated is given by $H_{X,Y}(x,y)$ in (5.2); here, it is clear that

$$
H_{X,Y}(x,y) = \int H_{X,Y\mid Z}(x,y\mid z)\, f_Z(z)\, dz, \tag{6.4}
$$

where $f_Z(z)$ is the marginal density of Z. The support of $H_{X,Y}(x,y)$ is as usual denoted by Ψ, the marginal attainable set. These attainable sets are related by

$$
\Psi = \bigcup_{z \in \mathcal{Z}} \Psi^z, \tag{6.5}
$$

where $\mathcal{Z}$ denotes the support of Z.

Of course, $\Psi^z \subseteq \Psi$ for all $z \in \mathcal{Z}$.

Situations where the two-stage approach is meaningful can now be described. In particular, suppose that the shape of the attainable sets $\Psi^z$ changes with z, which is quite natural in many applications. Then no economic meaning for a firm facing environmental conditions z can be given to the marginal measure of efficiency (e.g. $\theta(x,y)$) with respect to the boundary of the marginal set Ψ, because the boundary of this set may not be reachable for the unit facing condition z. The only situation where any meaning can be attached to these marginal measures is the case where the shape of the boundary of the conditional attainable sets is independent of z. This is the 'separability condition' described by Simar & Wilson 2007, requiring

$$
\Psi^z = \Psi \quad \text{for all } z \in \mathcal{Z}. \tag{6.6}
$$

If this condition is verified, then it is reasonable to use an appropriate second-stage regression to investigate whether Z has some impact on the distribution of the efficiencies inside the unique attainable set Ψ, provided one uses methods to make inference consistently.

Additional methodological difficulties are described by Simar & Wilson 2007, 2011b. Even if the second-stage regression model is correctly specified for the true values of θ(x,y), these are latent variables and in practice are replaced by non-parametric estimates that are biased, suffer from the curse of dimensionality and are not independent. The statistical model of Simar & Wilson 2007 suggests using a truncated normal regression, transforming the estimated efficiencies so that they are bounded below by 1 in the case of input efficiency estimates. Results from Monte Carlo experiments presented by Simar and Wilson indicate that bootstrap algorithms provide reasonable approximations for making inference in the second-stage regression. Park et al., 2008 suggest the use of a non-parametric truncated regression model in the second stage, using local likelihood methods.

Kneip et al., 2014 analyse the consequence of replacing the true, unknown efficiency scores by DEA or FDH estimators in a second-stage regression and suggest alternative methods for obtaining valid inference. Banker & Natarajan 2008 propose a different model where the two-stage approach can be applied, but as discussed by Simar & Wilson 2011b, their model is rather restrictive, and the conventional inference they suggest in the second stage (using simple OLS techniques) is incorrect due to the problems enumerated by Kneip et al., 2014.

### 6.3 Conditional Frontiers

Because the separability condition (6.6) may be problematic, the safest approach for introducing environmental variables relies on the conditional model (6.2) along the lines of Cazals et al., 2002. Daraio & Simar 2007b define the conditional Farrell–Debreu input efficiency measure as

$$
\theta(x,y\mid z) = \inf\left\{\theta > 0 \mid (\theta x, y) \in \Psi^z\right\}, \tag{6.7}
$$

which gives the radial distance (in the input space) from (x,y) to the efficient boundary of units facing environmental conditions z. Adaptation to the output orientation is straightforward. Along the lines of the probabilistic formulation of efficiency scores in Section 5.1, it can be shown that

$$
\theta(x,y\mid z) = \inf\left\{\theta > 0 \mid F_{X\mid Y,Z}(\theta x\mid y,z) > 0\right\}, \tag{6.8}
$$

which can be compared with (5.3). Note also that here, in $F_{X\mid Y,Z}(x\mid y,z)$, the conditioning event is $(Y \ge y,\ Z = z)$. This conditional DF is given by $F_{X\mid Y,Z}(x\mid y,z) = H_{X,Y\mid Z}(x,y\mid z)/H_{X,Y\mid Z}(\infty,y\mid z)$.

Non-parametric estimators of $\theta(x,y\mid z)$ are obtained from a sample by plugging a non-parametric estimator of $F_{X\mid Y,Z}(x\mid y,z)$ into (6.8). Such an estimator may be obtained by standard non-parametric kernel smoothing, for example,

$$
\hat F_{X\mid Y,Z}(x\mid y,z) = \frac{\sum_{i=1}^n \mathbb{1}(X_i \le x,\ Y_i \ge y)\, K\!\left((Z_i - z)/h\right)}{\sum_{i=1}^n \mathbb{1}(Y_i \ge y)\, K\!\left((Z_i - z)/h\right)}, \tag{6.9}
$$

where $K\left((Z_i - z)/h\right)$ is a familiar shorthand notation in the case of multivariate Z (using product kernels and a vector of bandwidths $h = (h_1,\dots,h_r)$). As noted by Daraio & Simar 2007b, the kernels must have compact support. Optimal bandwidth selection procedures for standard conditional distributions have been proposed by Hall et al., 2004. The procedure has been adapted to the set-up here by Bădin et al., 2010.
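The estimator in (6.9) can be sketched as follows for univariate X, Y and Z, using a compactly supported Epanechnikov kernel (a minimal illustration under these assumptions; the function names and data are hypothetical, not the cited authors' code):

```python
import numpy as np

def epanechnikov(u):
    """Compactly supported kernel, as required for the conditional estimators."""
    return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)

def cond_df_hat(x, y, z, X, Y, Z, h):
    """Kernel estimator of F_{X|Y,Z}(x|y,z) in the spirit of (6.9):
    a weighted empirical DF of X over units with Y_i >= y, with kernel
    weights K((Z_i - z)/h).  Univariate X, Y, Z for clarity."""
    w = epanechnikov((Z - z) / h) * (Y >= y)   # localisation in z, dominance in y
    denom = w.sum()
    if denom == 0.0:
        return 0.0                             # no dominating units near z
    return np.sum(w * (X <= x)) / denom

rng = np.random.default_rng(2)
n = 500
Z = rng.uniform(0, 1, n)
X = rng.uniform(0, 1 + Z)       # input support shifts with Z
Y = X * rng.uniform(0, 1, n)

F = cond_df_hat(0.8, 0.1, 0.5, X, Y, Z, h=0.2)
```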

The resulting estimator of $\theta(x,y\mid z)$ is called the conditional FDH estimator. Daraio & Simar 2005, 2007a show that the estimator is given by the simple expression

$$
\hat\theta(x,y\mid z) = \min_{\{i \,:\, Y_i \ge y,\ \|Z_i - z\| \le h\}} \ \max_{j=1,\dots,p} \frac{X_i^{(j)}}{x^{(j)}}, \tag{6.10}
$$

where $\|a\|$ denotes the Euclidean norm of a vector a. This can be interpreted as a localised version of the FDH estimator, with localisation to data points such that $Z_i$ lies in an h-neighbourhood of z (compare this with the unconditional FDH estimator given in (3.3)). In fact, the conditional attainable set is estimated by

$$
\hat\Psi^z = \bigcup_{\{i \,:\, \|Z_i - z\| \le h\}} \left\{(x,y) \in \mathbb{R}_+^{p+q} \mid x \ge X_i,\ y \le Y_i\right\}. \tag{6.11}
$$

Daraio & Simar 2007a, 2007b suggest convexifying this set to obtain an estimator

$$
\hat\Psi^z_{\mathrm{DEA}} = \Big\{(x,y) \in \mathbb{R}_+^{p+q} \;\Big|\; x \ge \sum_{i \in \mathcal{I}(z)} \gamma_i X_i,\ y \le \sum_{i \in \mathcal{I}(z)} \gamma_i Y_i,\ \sum_{i \in \mathcal{I}(z)} \gamma_i = 1,\ \gamma_i \ge 0\Big\} \tag{6.12}
$$

of $\Psi^z$ under the additional assumption of local convexity (i.e. assuming that $\Psi^z$ is convex), where $\mathcal{I}(z) = \{i \mid \|Z_i - z\| \le h\}$ denotes the set of observations with $Z_i$ in an h-neighbourhood of z. The corresponding conditional DEA estimator of $\theta(x,y\mid z)$ is

$$
\hat\theta_{\mathrm{DEA}}(x,y\mid z) = \inf\left\{\theta \mid (\theta x, y) \in \hat\Psi^z_{\mathrm{DEA}}\right\}, \tag{6.13}
$$

which can be written as a local version of the linear program in (3.5).
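The conditional FDH estimator (6.10) amounts to running an ordinary FDH computation restricted to units with $Z_i$ in an h-neighbourhood of z. A minimal sketch (illustrative names and toy data):

```python
import numpy as np

def conditional_fdh(x0, y0, z0, X, Y, Z, h):
    """Conditional FDH input efficiency as in (6.10): localised FDH using
    only units with Y_i >= y0 and ||Z_i - z0|| <= h."""
    keep = np.all(Y >= y0, axis=1) & (np.linalg.norm(Z - z0, axis=1) <= h)
    if not keep.any():
        return np.inf      # no comparable units facing conditions near z0
    return np.max(X[keep] / x0, axis=1).min()

# Three units; the third faces different environmental conditions
X = np.array([[1.0], [2.0], [1.5]])
Y = np.array([[1.0], [1.0], [1.0]])
Z = np.array([[0.0], [0.0], [1.0]])

# Conditional on z = 1, only the third unit is comparable: 1.5/2 = 0.75
theta_z = conditional_fdh(np.array([2.0]), np.array([1.0]), np.array([1.0]),
                          X, Y, Z, h=0.5)
```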

The asymptotic properties of these non-parametric conditional efficiency estimators are derived by Jeong et al., 2010. To summarise, the properties are similar to those of the unconditional DEA and FDH estimators, except that the effective number of observations used for building the estimators is now the number of observations for which $\|Z_i - z\| \le h$. When optimal bandwidths are selected as in Bădin et al., 2010, the effective number of observations is of order $n^{4/(4+r)}$ in place of n for the unconditional estimators. The rates of convergence deteriorate accordingly.

The conditional efficiency scores also have their robust versions; see Cazals et al., 2002 and Daraio & Simar 2007b for the order-m version and Daouia & Simar 2007 for the order-α analogue. Conditional measures have also been extended to hyperbolic distances by Wheelock & Wilson 2008 and to directional distances by Simar & Vanhems 2012.

Bădin et al., 2012, 2014 suggest useful tools for analysing the impact of Z on the production process, by exploiting the comparison between the conditional and unconditional measures. These tools (graphical and non-parametric regressions) allow one to disentangle the impact of Z on any potential shift of the frontier from its impact on any potential shift of the inefficiency distributions. Daraio & Simar 2014 also provide a bootstrap test of the significance of environmental factors on the conditional efficiency scores. These tools have been used in macroeconomics to gauge the effect of foreign direct investment and time on 'catching-up' by developing countries; see Mastromarco & Simar 2014.

Recently, Florens et al., 2014 propose an alternative approach for estimating conditional efficiency scores that avoids explicit estimation of a non-standard conditional distribution (e.g. $F_{X\mid Y,Z}(x\mid y,z)$) and is less sensitive to the curse of dimensionality described earlier. It is based on very flexible non-parametric location-scale regression models for pre-whitening the inputs and the outputs to eliminate their dependence on Z. This allows one to define 'pure' inputs and outputs, and hence a 'pure' measure of efficiency. The method permits returning in a second stage to the original units and evaluating the conditional efficiency scores, but without explicitly estimating a conditional DF. The paper also proposes a bootstrap procedure for testing the validity of the location-scale hypothesis. The usefulness of the approach is illustrated using data on commercial banks to analyse the effects of banks' size and diversity of the services offered on the production process and on the resulting efficiency distribution.

## 7 Other Aspects and Open Issues

### 7.1 Robust Parametric Estimation of Frontiers

Non-parametric methods are attractive because they impose few restrictions on either the functional form of the frontier or the stochastic part of the model (i.e. the distribution of the inefficiencies below the frontier). On the other hand, parametric models for the frontier are appealing because they offer a richer economic interpretation of the production process, e.g. the sensitivity of output to particular inputs. For purposes of illustration in the succeeding discussion, consider a single output and a production function $Y_i = \beta_0 + \beta' X_i - U_i$, where $U_i \ge 0$ (alternatively, for a cost function, $U_i \le 0$); Y and X could be measured on a log scale.

The first approach in the parametric world is due to Aigner & Chu (1968), who proposed enveloping the data in a parametric way by solving mathematical programs that minimise $\sum_{i=1}^n U_i$ or $\sum_{i=1}^n U_i^2$ with respect to $\beta_0$ and β, subject to $U_i \ge 0$. This family of estimators has been investigated by Knight 2006, who derives the consistency of the programming estimator. Hall et al., 1998 derive the limiting distribution of the parametric linear frontier estimator, considering even local polynomial versions for the univariate case. Park 2001 generalises the result to the multivariate case. But one drawback of the mathematical programming methods is that observations far from the frontier have excessive weight (in particular, when using a quadratic objective) in determining the shape of the optimal frontier. Finally, because the resulting frontier will envelop all the data points, the estimators will be very sensitive to outliers.
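With a linear objective, the Aigner–Chu idea reduces to a small linear program: choose the coefficients so the fitted hyperplane lies on or above every observation while minimising the total slack. A sketch using scipy (hypothetical names and simulated data; the quadratic variant would be an analogous quadratic program):

```python
import numpy as np
from scipy.optimize import linprog

def aigner_chu_linear(X, y):
    """Linear-programming frontier: minimise sum_i U_i with
    U_i = b0 + b'X_i - y_i >= 0, so the plane envelops the data from above."""
    n, p = X.shape
    A1 = np.column_stack([np.ones(n), X])   # rows [1, X_i]
    c = A1.sum(axis=0)                      # sum_i (b0 + b'X_i), up to a constant
    res = linprog(c, A_ub=-A1, b_ub=-y,     # enforce b0 + b'X_i >= y_i
                  bounds=[(None, None)] * (p + 1))
    return res.x                            # (b0, b)

rng = np.random.default_rng(3)
X = rng.uniform(0, 1, size=(100, 1))
y = 1.0 + 2.0 * X[:, 0] - rng.exponential(0.2, size=100)   # frontier y = 1 + 2x
beta = aigner_chu_linear(X, y)
```

Because the fitted plane must envelop every point, a single outlying observation can tilt the whole frontier, which is the sensitivity noted above.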

The other stream of approaches in the parametric framework follows the lines of Greene (1980) and is based on (shifted) regression ideas. As explained in detail by Florens & Simar 2005, these approaches have several drawbacks. First, in most cases, they require specification of the density of U. Second, independence between the inputs X and the stochastic inefficiency term U (or at least $E(U \mid X) = \mu$ for a constant μ) is required for all the proposed estimators. Most importantly, however, because $E(Y \mid X = x) = \beta_0 - \mu + \beta' x$, the frontier function is, by construction, a shift of the regression of Y on X, that is, of the conditional mean function. Hence, any estimation procedure (by modified OLS or maximum likelihood estimation) will capture the shape of the 'middle' of the cloud of data points. This is not natural for capturing the shape of the frontier function, which may differ from the shape of the conditional mean function.
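The shifted-regression idea can be sketched by a 'corrected OLS' in the spirit of this literature (illustrative only; names and data are hypothetical): fit the conditional mean by OLS and shift the intercept up by the largest residual. The slope of the resulting 'frontier' is then, by construction, that of the conditional mean, which is exactly the criticism above:

```python
import numpy as np

def corrected_ols(X, y):
    """OLS fit of the conditional mean, then shift the intercept upward by
    the largest residual so the fitted line envelops all the data."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    shift = (y - A @ beta).max()   # largest positive residual
    beta = beta.copy()
    beta[0] += shift               # only the intercept moves; the slope is unchanged
    return beta

rng = np.random.default_rng(6)
X = rng.uniform(0, 1, size=(100, 1))
y = 1.0 + 2.0 * X[:, 0] - rng.exponential(0.3, size=100)
beta = corrected_ols(X, y)
```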

Florens & Simar 2005 and Daouia et al., 2008 address all these drawbacks and provide robust versions of the parametric estimator of the frontier. A parametric frontier model is estimated by using a two-step procedure. In the first step, a non-parametric method is used to estimate where the production frontier is located, and all the data points are projected onto this frontier. Then in a second step, these projections are adjusted using simple methods to fit a specified parametric model. Because the production frontier is the locus of optimal production plans, one should expect a better parametric fit if only efficient units are used to estimate it. This idea appears in Simar 1992, but with neither theoretical justification nor results on the statistical properties of the estimators of the β's. When using the FDH estimator in the first stage, Florens & Simar 2005 prove the consistency of the method, and in the order-m case (which is more robust to outliers), they obtain root-n consistency and asymptotic normality, with an explicit expression for the variance. Daouia et al., 2008 obtain similar results when using the robust order-α estimators in the first stage. Daraio & Simar 2007a indicate how to extend these ideas to a full multivariate setting by using a parametric model for a distance function. These approaches are very attractive because they use estimation tools—the FDH estimator and its robust versions and OLS techniques—that are familiar to practitioners of frontier analysis. These methods also suggest that bridges can be built between the parametric and non-parametric worlds.
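A sketch of this two-step idea with an FDH first stage and a linear second stage (illustrative names and data; the robust order-m and order-α variants and the multivariate distance-function version are not shown):

```python
import numpy as np

def fdh_frontier(x, X, Y):
    """Univariate-output FDH frontier value at x: max Y_i over units with X_i <= x."""
    dom = np.all(X <= x, axis=1)
    return Y[dom].max() if dom.any() else -np.inf

def two_step_frontier(X, Y):
    """Step 1: project every unit onto the FDH frontier (output direction).
    Step 2: fit the parametric (here linear) frontier to the projections by OLS."""
    Yproj = np.array([fdh_frontier(X[i], X, Y) for i in range(len(Y))])
    A = np.column_stack([np.ones(len(Y)), X])
    beta, *_ = np.linalg.lstsq(A, Yproj, rcond=None)
    return beta

rng = np.random.default_rng(4)
X = rng.uniform(0, 1, size=(200, 1))
Y = 1.0 + 2.0 * X[:, 0] - rng.exponential(0.3, size=200)
beta = two_step_frontier(X, Y)
```

Only (nearly) efficient units shape the projections, so the second-stage fit tracks the boundary of the cloud rather than its middle.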

### 7.2 Non-parametric Stochastic Frontier

One of the limitations of the non-parametric envelopment estimators described earlier is that, as with all estimators in ‘deterministic’ frontier models, they do not allow for noise in the DGP. It is frequently argued that parametric approaches are superior to the non-parametric approaches because they admit ‘stochastic’ frontier models such as (2.14). Of course, this neglects the fact that in some applications, the parametric restrictions, both on the frontier function and on the stochastic part of the model, may be irrelevant. Moreover, the noise admitted into stochastic frontier models must be unrelated to the right-hand side variables, and it does not allow for measurement error in these variables. A number of recent papers attempt to introduce noise into non-parametric boundary and frontier estimation. The robust estimators of the full frontier described earlier can be viewed as allowing the presence of noise, but still the underlying model is a ‘deterministic’ one.

Hall & Simar 2002 show that even if the noise is symmetric and the inefficiency distribution has a jump at the frontier, a fully non-parametric model with both noise and inefficiency is not identified. They provide a strategy that allows introduction of noise into the model and consistent estimation of the unknown boundary of support of a random variable reflecting inefficiency. Consistency and identification are obtained by letting the variance of the noise converge to zero as $n \to \infty$. Monte Carlo experiments indicate that the procedure works well for a signal-to-noise ratio (measured by the ratio of the respective standard deviations) of 5, and the authors argue that the signal-to-noise ratio is likely to be high in many applications. They apply the procedure to estimate non-parametrically a production function and hence provide a first non-parametric alternative to classical parametric stochastic frontier models such as (2.14). Simar extends these ideas to fully multivariate settings, confirming good behaviour of the resulting modified DEA estimators in the presence of noise of moderate size.

To solve the basic identifiability issue, some structure on the model is required. One approach is to leave the production function unspecified while specifying a fully parametric model for the stochastic part (i.e. specifying a parametric density for the inefficiency term U and for the independent noise V). The simplest approach is to assume homoscedasticity of both components. This is investigated by Fan et al., 1996 and Kuosmanen & Kortelainen 2012. These semiparametric approaches are interesting, but they retain both the homoscedasticity assumption of the stochastic terms and the parametric assumptions for the inefficiency component and hence are likely to introduce misspecification errors into the model, leaving statistical consistency in doubt.

Kumbhakar et al., 2007 propose an alternative based on local maximum likelihood techniques. Their model is

$$
Y_i = m(X_i) - U_i + V_i, \tag{7.1}
$$

where $U \mid X = x \sim \left|N\!\big(0, \sigma_u^2(x)\big)\right|$ and $V \mid X = x \sim N\!\big(0, \sigma_v^2(x)\big)$, with U and V independent conditionally on X. The function $m(\cdot)$ is unspecified, as are the unknown functional parameters $\sigma_u^2(\cdot)$ and $\sigma_v^2(\cdot)$. Estimation uses local maximum likelihood techniques where the unknown functional parameters are approximated by local polynomials (either linear or quadratic). Kumbhakar et al., 2007 provide asymptotic properties, which involve a limiting normal distribution. The procedure requires selection of bandwidths, which is carried out by the usual likelihood cross-validation. Simar & Zelenyuk 2011 extend the procedure to a fully multivariate set-up by using an analogue of the model in (7.1) to characterise the univariate distance to a multivariate boundary surface. By doing so, they provide stochastic versions of DEA and FDH estimators. The resulting estimates show encouraging results, such as adaptation to the size of noise (i.e. the stochastic FDH/DEA estimators collapse to the usual FDH/DEA estimators in the absence of noise), robustness with respect to outliers and other properties. However, the method is computationally intensive. Recent work by Simar et al., 2014 avoids much of the computational burden by proposing a non-parametric version of the modified OLS technique. Asymptotic properties of this non-parametric, modified least-squares estimator are also given.

The latter approaches are appealing and very flexible, but they still require some 'local' parametric assumptions on the distribution of the inefficiency term in order to obtain statistical consistency. Kneip et al., 2012b overcome this limitation, obtaining identification of the model by assuming independent Gaussian noise with unknown variance. The inefficiency distribution is left unspecified and is estimated by simple histograms. Then the model is estimated by the penalised likelihood method, where the penalisation controls the smoothness of the histogram. Kneip et al. first address the estimation of a univariate boundary and then adapt the procedure to estimate a production (or cost) function. Although only $\log(n)$ convergence rates are achieved in theory, the resulting estimates behave well in finite samples of the sizes commonly faced in practice, as revealed by Monte Carlo experiments.

### 7.3 Testing Issues

For the practitioner, it is important to be able to test empirically hypotheses relating to the DGP and having to do with the shape of the frontier (e.g. convexity and returns to scale). This is important not only for economic considerations but also for statistical reasons, because as shown earlier, there is much to be gained in terms of statistical precision by assuming convexity of Ψ or CRS if such assumptions are appropriate. Reducing the dimensionality of the problem by testing the relevance of certain inputs and outputs, or the possibility of aggregating inputs or outputs, is also of interest. Conversely, too-restrictive assumptions (e.g. assuming Ψ is CRS when it is in fact only VRS) lead to inconsistent estimation.

In typical testing problems, a test statistic is defined to discriminate between a null hypothesis and an alternative hypothesis; the primary difficulty is in providing a critical value for a test of a given size. This has been carried out in several studies using bootstrap approximations; for example, Simar & Wilson 2001 use bootstrap methods to test whether inputs or outputs can be aggregated. Similarly, bootstrap methods are used by Simar & Wilson 2002 to test CRS versus VRS, by Simar & Zelenyuk 2006 to test whether efficiency distributions and their means differ across two samples, by Simar & Wilson 2011a to test convexity of Ψ and by Daraio et al., 2010 to test the 'separability' condition discussed in Section 6.2 in the presence of environmental factors. These studies used either the restrictive, homogeneous bootstrap ideas of Simar & Wilson 1998 or the subsampling ideas of Simar & Wilson 2011a but lacked convincing theoretical justification. Intensive Monte Carlo simulations in several of the studies indicate that the suggested procedures achieve reasonable size and power properties in samples of moderate size, but theoretical results are lacking in these studies. This gap is filled by Kneip et al., 2014.

In each case listed in the preceding paragraph, test statistics involving comparisons of FDH, CRS-DEA or VRS-DEA estimates are used. The difficulty in providing theoretical justification for general hypothesis tests is that the properties of FDH/DEA estimators described in Section 4 hold only for a given, fixed point of interest, for example, for $\hat\theta(x,y \mid \mathcal{X}_n)$, an estimator of θ(x,y), where (x,y) is non-stochastic and $\mathcal{X}_n = \{(X_i, Y_i)\}_{i=1}^n$ denotes a random sample of firms' input–output combinations. Note that here, the notation for the VRS-DEA estimator of θ(x,y) has been modified to show explicitly that the estimator depends on this random sample. But when constructing test statistics for tests of returns to scale and other model features, the FDH or DEA estimators are evaluated at the random points $(X_i, Y_i)$.

To illustrate the problem, consider perhaps the simplest statistic, the sample mean of the efficiency scores

$$
\hat\mu_\theta = \frac{1}{n} \sum_{i=1}^n \hat\theta(X_i, Y_i \mid \mathcal{X}_n), \tag{7.2}
$$

where the efficiency estimators under the summation sign could be FDH, CRS-DEA or VRS-DEA estimators. The statistic might be used to make inference about the (population) mean efficiency $\mu_\theta = E[\theta(X,Y)]$. Denote by $\sigma_\theta^2 = \operatorname{Var}[\theta(X,Y)]$ the population variance of the efficiency scores. If the true efficiencies $\theta(X_i, Y_i)$ were observable, one might be happy to use

$$
\tau_n = \sqrt{n}\,\frac{\bar\theta_n - \mu_\theta}{\sigma_\theta}, \qquad \bar\theta_n = \frac{1}{n}\sum_{i=1}^n \theta(X_i, Y_i), \tag{7.3}
$$

to make inference, because under mild regularity conditions, the statistic in (7.3) converges in distribution to N(0,1). But of course, $\theta(X_i, Y_i)$ is a latent variable and must be replaced by $\hat\theta(X_i, Y_i \mid \mathcal{X}_n)$, which is observable.

Unfortunately, DEA and FDH estimators are biased and correlated. The bias is of order $n^{-\kappa}$, the same as the convergence rate, where κ is as defined earlier for the FDH, VRS-DEA and CRS-DEA estimators and depends on the dimension p + q of the input–output space. The bias does not disappear when averaging in (7.2), but the variance tends to zero, as shown by Kneip et al., 2014. In fact, Theorem 4.1 of Kneip et al. reveals that under mild regularity conditions and assumptions appropriate for the given estimator (i.e. free disposability for FDH, plus convexity for VRS-DEA or CRS for the CRS-DEA),

$$
\hat\mu_\theta - \mu_\theta = C\, n^{-\kappa} + R_{n,\kappa} + \frac{1}{n}\sum_{i=1}^n \big(\theta(X_i, Y_i) - \mu_\theta\big), \tag{7.4}
$$

where C is some constant and $R_{n,\kappa} = o_p(n^{-\kappa})$. This result shows clearly that the inherent bias of the envelopment estimators 'kills' the variance whenever $\kappa \le 1/2$. Kneip et al., 2014 solve the problem by estimating the leading terms of the bias by a kind of generalised jackknife. Then by using a simple, consistent estimator of the variance, they provide a way for making inference on $\mu_\theta$ using normal approximations. If the dimension p + q is too large (p + q > 4 for the VRS case, p + q > 5 for the CRS case or p + q > 3 for the FDH case), this bias correction is not enough. In these cases, a solution is provided by computing the estimator of $\mu_\theta$ as an average of $\hat\theta(X_i, Y_i \mid \mathcal{X}_n)$ over a random subsample of $n_\kappa = \lfloor n^{2\kappa} \rfloor \le n$ observations. Kneip et al., 2014 show that the resulting statistic has a corresponding central limit, normal approximation, but with a slower rate.
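The subsampling device can be sketched as follows (illustrative names and data; κ = 1/(p+q) for the FDH case is taken from Section 4, and the generalised-jackknife bias correction of Kneip et al. is omitted):

```python
import numpy as np

def fdh_scores(X, Y):
    """Input-oriented FDH scores of each unit, relative to the full sample."""
    n = len(Y)
    scores = np.empty(n)
    for i in range(n):
        dom = np.all(Y >= Y[i], axis=1)   # units dominating unit i
        scores[i] = np.max(X[dom] / X[i], axis=1).min()
    return scores

def subsample_mean(scores, kappa, rng):
    """Average the full-sample scores over a random subsample of size
    n_kappa = floor(n^(2*kappa)) <= n."""
    n = len(scores)
    n_kappa = max(1, int(np.floor(n ** (2.0 * kappa))))
    idx = rng.choice(n, size=n_kappa, replace=False)
    return scores[idx].mean()

rng = np.random.default_rng(5)
X = rng.uniform(1, 2, size=(300, 2))    # p = 2 inputs
Y = rng.uniform(0.5, 1.5, size=(300, 1))  # q = 1 output
scores = fdh_scores(X, Y)
kappa = 1.0 / 3.0                        # FDH: kappa = 1/(p + q)
mu_sub = subsample_mean(scores, kappa, rng)
```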

Kneip et al., 2013 extend these basic theoretical results to testing problems such as testing whether mean efficiencies differ across two groups of producers, testing CRS versus VRS and testing convexity of Ψ. The proposed tests avoid the need for bootstrap methods, although the bootstrap remains a valid alternative and is perhaps useful in some instances. The tests are used in an empirical setting by Apon et al., 2014 to examine the effect of locally available high-performance computing systems on the efficiency of research output in eight academic disciplines across US research universities. For each discipline, Apon et al. first test whether Ψ is convex and then test CRS versus VRS in cases where convexity of Ψ is not rejected. They then test for differences in mean efficiency across two groups, using results from the first two tests to select the appropriate estimator (i.e. FDH, VRS-DEA or CRS-DEA) to use in the last test.

### 7.4 Non-parametric Models for Panel Data

Many approaches have been proposed for dealing with panel data in the world of parametric productivity analysis; see Kumbhakar & Lovell 2000 and Greene 2008 and the references therein. By contrast, although there is a large literature on using panel data and FDH/DEA estimators to examine changes in production processes over time, the literature is largely astatistical. Most of this non-parametric literature revolves around Malmquist indices defined to measure changes in productivity over time. These changes can be decomposed using various identities to attribute changes in productivity to changes in efficiency, shifts in the technology Ψ and other changes from one period to the next; see Färe et al., 2008 for details and a comprehensive survey.

To illustrate, consider two periods $t_1 < t_2$, with corresponding attainable sets $\Psi^{t_1}$ and $\Psi^{t_2}$. Let $(x^{t_j}, y^{t_j})$ denote the production plan of a firm at time $t_j$, $j = 1,2$, and let $D^{t_k}(x^{t_j}, y^{t_j})$ denote the output-oriented Shephard 1970 distance (i.e. the inverse of the output-oriented Farrell efficiency measure) of the point $(x^{t_j}, y^{t_j})$, relative to the conical hull of $\Psi^{t_k}$. Then the output-oriented Malmquist productivity index is defined by

$$
\mathcal{M}_{t_1,t_2} = \left[\frac{D^{t_1}\!\big(x^{t_2}, y^{t_2}\big)}{D^{t_1}\!\big(x^{t_1}, y^{t_1}\big)} \times \frac{D^{t_2}\!\big(x^{t_2}, y^{t_2}\big)}{D^{t_2}\!\big(x^{t_1}, y^{t_1}\big)}\right]^{1/2}. \tag{7.5}
$$

This gives the geometric mean of the gain in productivity of the firm moving from $(x^{t_1}, y^{t_1})$ in period $t_1$ to $(x^{t_2}, y^{t_2})$ in period $t_2$, measured relative to the conical hulls of the attainable sets in each of the two periods. In practice, the distance terms $D^{t_k}(x^{t_j}, y^{t_j})$ can be estimated by CRS-DEA estimators.
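A sketch of (7.5) with CRS-DEA output distances computed by linear programming (hypothetical helper names; scipy's linprog is used for the DEA programs):

```python
import numpy as np
from scipy.optimize import linprog

def shephard_out_crs(x0, y0, X, Y):
    """Output-oriented Shephard distance under CRS: D = 1/lambda*, where
    lambda* = max{lambda : X'g <= x0, Y'g >= lambda*y0, g >= 0}."""
    n, p = X.shape
    q = Y.shape[1]
    c = np.zeros(n + 1)
    c[-1] = -1.0                                 # maximise lambda
    A_in = np.column_stack([X.T, np.zeros(p)])   # input feasibility rows
    A_out = np.column_stack([-Y.T, y0])          # output expansion rows
    res = linprog(c, A_ub=np.vstack([A_in, A_out]),
                  b_ub=np.concatenate([x0, np.zeros(q)]),
                  bounds=[(0, None)] * (n + 1))
    return 1.0 / res.x[-1]

def malmquist(p1, p2, X1, Y1, X2, Y2):
    """Output-oriented Malmquist index (7.5) for one firm's plans p1, p2 = (x, y)."""
    d11 = shephard_out_crs(*p1, X1, Y1)
    d12 = shephard_out_crs(*p2, X1, Y1)
    d21 = shephard_out_crs(*p1, X2, Y2)
    d22 = shephard_out_crs(*p2, X2, Y2)
    return np.sqrt((d12 / d11) * (d22 / d21))
```

An index above 1 indicates productivity growth between the two periods.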

Simar & Wilson 1999a adapt the smooth, homogeneous bootstrap of Simar & Wilson 1998 to make inference about the measure defined by (7.5), bootstrapping on pairs of observations (from each of the two periods) to preserve the time dependence. Daskovska et al., 2010 extend the idea and introduce some dynamics to provide forecasts of Malmquist indices.

Kneip & Simar 1996 use kernel methods to estimate individual production functions (for each firm) in a panel-data setting and define the production frontier as the envelope of these individual functions. Asymptotic theory is provided, but the method depends on T as well as n, where T and n denote the number of periods and number of firms, respectively. Note that by enveloping the individual production frontiers, Kneip & Simar anticipate the idea of ‘metafrontiers’ presented by O'Donnell et al., 2008.

Several semiparametric approaches are possible in the context of panel data. Park & Simar 1994 suggest linear models for the production function with a non-parametric random firm effect, whose support (i.e. upper boundary) determines the frontier level. Park et al. analyse the case where this random effect is correlated with some regressors (inputs). Park et al., 2003, 2003a, 2007 extend these results to various forms of dynamic models. Kneip et al., 2012a extend these ideas to still more general semiparametric models by analysing a model having a non-parametric long time trend and a firm-specific technical efficiency term varying non-parametrically with time. The latter is estimated by factor models. This approach can be viewed as a compromise between the fully non-parametric approach of Kneip & Simar 1996 and the somewhat restrictive semiparametric model of Park & Simar 1994. Kneip & Sickles 2011 provide a comprehensive survey of the various approaches, including the ones mentioned here.

## 8 Conclusions

The field of frontier estimation is fascinating because it poses non-standard and difficult econometric problems. Both parametric and non-parametric approaches have their own advantages and disadvantages, but clearly, both approaches are statistical in nature. The only real differences lie in the assumptions the researcher is willing to make.

Non-parametric methods for efficiency estimation bring together a wide variety of tools from mathematical statistics, econometrics and operations research. As this guided tour shows, a large number of statistical results are available today, but several challenges remain. These include the non-parametric treatment of endogeneity and latent heterogeneity in frontier settings, as well as relevant, flexible non-parametric models for panel data. These and other unresolved issues are currently being pursued by the authors of this guided tour.

## Notes

1. The choice of inputs and outputs is often made in empirical studies with little discussion and few, if any, checks on robustness with respect to the choice that is made. At least in some cases, the choice of inputs and outputs to include in a model can have a big impact on any results that are obtained; see Wheelock & Wilson 1995a for an example.
2. In particular, the noise is typically assumed to be independent of the right-hand side variables, which are typically assumed to be measured without error.
3. Afriat 1972 used similar ideas earlier to define a left-continuous monotone production function, but only for the case of univariate output and freely disposable inputs. In addition, his approach only allowed efficiency to be measured in the output direction.
4. Seiford 1996 provides a survey of more than 700 published papers using DEA/FDH techniques; Cooper et al., 2000 similarly cite roughly 1500 references; and Gattoufi et al., 2004 provide more than 1800 references. A search on Google Scholar using the keywords 'efficiency', 'production' and either 'dea' or 'fdh' yielded about 103,000 papers (many are unpublished working papers) on 26 February 2014. Almost all of these papers ignore any statistical considerations.
5. Note that Afriat 1972, in the univariate output case, also introduced some statistical methods in his study, by considering a parametric, beta distribution for inefficiency and using maximum likelihood for parameter estimation. However, no statistical properties are established in this paper.
6. Note that Jeong & Park 2011 suggest an alternative way to select a subsample size, for the special case of univariate output when output efficiency is estimated.
7. In the econometric literature, the definition of the order-m frontier sometimes appears, with some abuse of notation, in a form stressing the fact that the random variables $X_1,\dots,X_m$ are generated conditionally on $Y \ge y$.
8. Note that the approach here is quite different from traditional non-parametric quantile regression (e.g. Fan et al.; Li & Racine, 2007), where the conditioning is on the event Y = y. The non-standard conditioning on $Y \ge y$ in (5.10) guarantees monotonicity properties of the resulting frontier estimates, providing a better economic interpretation (as explained in the next subsection). In addition, conditioning on $Y \ge y$ allows an estimation procedure that avoids smoothing techniques, leading to the root-n convergence rate of the resulting non-parametric estimators.
9. Note that Martins-Filho & Yao 2007 suggest, in the case of univariate output in an output orientation, smoothing the estimator of the conditional DF before determining its quantile frontier. It is not clear that this smoothing, which requires a bandwidth, adds any substantial gain over the simpler procedure using the root-n consistent empirical conditional DF as described earlier.

## Acknowledgements

Research support from the IAP Research Network P7/06 of the Belgian State (Belgian Science Policy) and from the US National Science Foundation (grant no. SMA-1243436) is gratefully acknowledged. Any remaining errors are solely our responsibility.