## 1. Introduction

Let *y* and *X* be respectively ℝ-valued and ℝ^{p}-valued random variables. Without prior knowledge about the relationship between *y* and *X*, the regression function *g*(*x*)=*E*(*y*|*X*=*x*) is often modelled in a flexible nonparametric fashion. When the dimension of *X* is high, much recent effort has been devoted to finding the relationship between *y* and *X* efficiently. The final goal is to approximate *g*(*x*) by a function having a simplifying structure which makes estimation and interpretation possible even for moderate sample sizes. There are essentially two approaches: the first is largely concerned with function approximation and the second with dimension reduction. Examples of the former are the additive model approach of Hastie and Tibshirani (1986) and the projection pursuit regression proposed by Friedman and Stuetzle (1981); both assume that the regression function is a sum of univariate smooth functions. Examples of the latter are the dimension reduction of Li (1991) and the regression graphics of Cook (1998).

A regression-type model for dimension reduction can be written as

*y* = *g*(*B*_{0}^{T}*X*) + *ɛ*, (1.1)
where *g* is an unknown smooth link function, *B*_{0}=(*β*_{1},…,*β*_{D}) is a *p*×*D* orthogonal matrix (*B*_{0}^{T}*B*_{0}=*I*_{D×D}) with *D*<*p* and *E*(*ɛ*|*X*)=0 almost surely. The last condition allows *ɛ* to be dependent on *X*. When model (1.1) holds, the projection of the *p*-dimensional covariates *X* onto the *D*-dimensional subspace *B*_{0}^{T}*X* captures all the information that is provided by *X* on *y*. We call the *D*-dimensional subspace *B*_{0}^{T}*X* the effective dimension reduction (EDR) space. Li (1991) introduced the EDR space in a similar but more general context; the difference disappears for the case of additive noise as in model (1.1). See also Carroll and Li (1995), Chen and Li (1989) and Cook (1994). Note that the space spanned by the column vectors of *B*_{0} is uniquely defined under some mild conditions (given in Section 3) and is our focus of interest. For convenience, we shall refer to these column vectors as EDR directions, which are unique up to orthogonal transformations. The estimation of the EDR space includes the estimation of the directions, namely *B*_{0}, and the corresponding dimension of the EDR space. For specific semiparametric models, methods have been introduced to estimate *B*_{0}. Next, we give a brief review of these methods.
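To fix ideas, the following sketch simulates data from model (1.1). The dimensions *p*=5 and *D*=2, the link *g* and the noise level are arbitrary choices of ours, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, D = 500, 5, 2

# A p x D orthogonal matrix B0 with B0^T B0 = I_D, built via QR.
B0, _ = np.linalg.qr(rng.standard_normal((p, D)))

def g(u):
    # An arbitrary smooth link acting on the D-dimensional projection.
    return np.sin(u[:, 0]) + u[:, 1] ** 2

X = rng.standard_normal((n, p))
eps = 0.1 * rng.standard_normal(n)   # additive noise with E(eps | X) = 0
y = g(X @ B0) + eps

# y depends on the p covariates only through the D-dimensional
# projection X @ B0, which spans the EDR space.
```

Any orthogonal rotation of the columns of B0 generates the same subspace, which is why only the span of *B*_{0}, and not *B*_{0} itself, is identifiable.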

One of the important approaches is the projection pursuit regression proposed by Friedman and Stuetzle (1981). Huber (1985) has given a comprehensive discussion. Chen (1991) has investigated a projection pursuit type of regression model. The primary focus of projection pursuit regression is more on the approximation of *g*(*x*) by a sum of ridge functions *g*_{k}(⋅), namely

*g*(*x*) ≈ *g*_{1}(*β*_{1}^{T}*x*) + … + *g*_{D}(*β*_{D}^{T}*x*),
than on looking for the EDR space.

A simple approach that is directly related to the estimation of EDR directions is the average derivative estimation (ADE) proposed by Härdle and Stoker (1989). For the single-index model *y*=*g*_{1}(*β*_{1}^{T}*X*)+*ɛ*, the expectation of the gradient ∇*g*_{1}(*X*) is a scalar multiple of *β*_{1}. A nonparametric estimator of ∇*g*_{1}(*X*) leads to an estimator of *β*_{1}. There are several limitations of ADE.

- (a) To estimate *β*_{1}, the condition *E*{*g*_{1}′(*β*_{1}^{T}*X*)}≠0 is needed. This condition is violated when *g*_{1}(⋅) is an even function and *X* is symmetrically distributed.
- (b) As far as we know, there is no successful extension to the case of more than one EDR direction.
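The ADE recipe can be sketched in a few lines: estimate the gradient at each observation by a local linear fit, average the estimated slopes and normalize. This is a minimal illustration with our own simulated single-index data; the Gaussian kernel, the rule-of-thumb bandwidth and the sample size are arbitrary choices, not prescriptions from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 400, 3
beta1 = np.array([1.0, 2.0, -1.0])
beta1 /= np.linalg.norm(beta1)

X = rng.standard_normal((n, p))
# g1 = sin satisfies E{g1'(beta1^T X)} = E{cos(N(0,1))} != 0, so condition (a) holds.
y = np.sin(X @ beta1) + 0.1 * rng.standard_normal(n)

h = 2.0 * n ** (-1.0 / (p + 4))          # rule-of-thumb bandwidth (our choice)
grads = np.zeros((n, p))
for i in range(n):
    d = X - X[i]                          # local coordinates around X_i
    w = np.exp(-np.sum(d ** 2, axis=1) / (2 * h ** 2))   # Gaussian kernel weights
    Z = np.hstack([np.ones((n, 1)), d])   # local linear design: intercept + slope
    WZ = Z * w[:, None]
    coef = np.linalg.lstsq(WZ.T @ Z, WZ.T @ y, rcond=None)[0]
    grads[i] = coef[1:]                   # estimated gradient of g1 at X_i

ade = grads.mean(axis=0)                  # average derivative
ade /= np.linalg.norm(ade)
# |ade @ beta1| close to 1: the single-index direction is recovered up to sign.
```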

The sliced inverse regression (SIR) method proposed by Li (1991) is perhaps up to now the most powerful method for searching for EDR directions and dimension reduction. However, the SIR method imposes some strong probabilistic structure on *X*. Specifically, the method requires that, for any constant vector *b*^{T}=(*b*_{1},…,*b*_{p}), there are constants *c*_{0} and *c*^{T}=(*c*_{1},…,*c*_{D}) depending on *b* such that, for the directions *B*_{0} in model (1.1),

*E*(*b*^{T}*X*|*B*_{0}^{T}*X*) = *c*_{0} + *c*^{T}*B*_{0}^{T}*X*. (1.2)
As pointed out by Cook and Weisberg in their discussion of Li (1991), the most important family of distributions satisfying condition (1.2) is that of elliptically symmetric distributions. Now, in time series analysis we typically set *X*=(*y*_{t−1},…,*y*_{t−p})^{T}, where {*y*_{t}} is a time series. Then it is easy to prove that elliptical symmetry of *X* for all *p*, together with (second-order) stationarity of {*y*_{t}}, implies that {*y*_{t}} is time reversible, a feature which is the exception rather than the rule in time series analysis. (For a discussion of time reversibility, see, for example, Tong (1990).)
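For comparison, a bare-bones version of SIR can be sketched as follows: standardize *X*, slice the observations by the value of *y*, average the standardized *X* within each slice and extract the leading eigenvectors of the weighted covariance of the slice means. The Gaussian design below satisfies condition (1.2); the number of slices and the data-generating choices are our own.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 600, 4
b = np.array([1.0, -1.0, 0.5, 0.0])
b /= np.linalg.norm(b)

X = rng.standard_normal((n, p))          # elliptically symmetric design
y = (X @ b) ** 3 + 0.2 * rng.standard_normal(n)

H = 10                                   # number of slices (our choice)
mu = X.mean(axis=0)
Lc = np.linalg.cholesky(np.cov(X.T))     # Sigma = Lc @ Lc.T
Xs = np.linalg.solve(Lc, (X - mu).T).T   # standardized: Cov(Xs) ~ identity

order = np.argsort(y)                    # slice by the value of y
M = np.zeros((p, p))
for idx in np.array_split(order, H):
    m = Xs[idx].mean(axis=0)             # slice mean of standardized X
    M += (len(idx) / n) * np.outer(m, m) # weighted covariance of slice means

vals, vecs = np.linalg.eigh(M)
d1 = np.linalg.solve(Lc.T, vecs[:, -1])  # map leading eigenvector back
d1 /= np.linalg.norm(d1)
# |d1 @ b| close to 1: the EDR direction is recovered.
```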

Another aspect of searching for the EDR space is the determination of the corresponding dimension. The method proposed by Li (1991) can be applied to determine the dimension of the EDR space in some cases but for reasons mentioned above it is typically not relevant for time series data.

In this paper, we shall propose a new method to estimate the EDR directions. We call it the (conditional) minimum average variance estimation (MAVE) method. Our approach is inspired by the SIR method, the ADE method and the idea of local linear smoothers (see, for example, Fan and Gijbels (1996)). It is easy to implement and needs no strong assumptions on the probabilistic structure of *X*. Specifically, our methods apply to model (1.1) including its generalization within the additive noise set-up. The joint density function of the covariate *X* is needed if we search for the EDR space globally. However, if we have some prior information about the EDR directions and we look for them locally, then the existence of a density of *X* along directions around the EDR directions will suffice. These cases include those in which some of the covariates are categorical or functionally related. The observations need not be independent, e.g. time series data. On the basis of the properties of the MAVE method, we shall propose a method to estimate the dimension of the EDR space, which again does not require strong assumptions on the design *X* and has wide applicability.

Let *Z* be an ℝ^{q}-valued random variable. A general semiparametric model can be written as

*y* = *G*{*φ*(*B*_{0}^{T}*X*), *Z*, *θ*} + *ɛ*, (1.3)
where *G* is a *known* smooth function up to a parameter vector *θ*∈ℝ^{l}, *φ*(⋅): ℝ^{D}↦ℝ^{D′} is an *unknown* smooth function and *E*(*ɛ*|*X*,*Z*)=0 almost surely. Special cases are the generalized partially linear single-index model of Carroll *et al.* (1997) and the single-index functional coefficient model in Xia and Li (1999). Searching for the EDR space *B*_{0}^{T}*X* in model (1.3) is of theoretical as well as practical interest. However, the existing methods are not always appropriate for this model. An extension of our method to handle this model will be discussed.
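As one concrete special case of model (1.3), take *G*(*v*, *z*, *θ*) = *v* + *θ*^{T}*z*, which gives a partially linear single-index form. The sketch below merely generates data from such a model; the function *φ*, the parameter *θ* and the dimensions are our own arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, q = 300, 4, 2
beta = np.full(4, 0.5)                   # one EDR direction (D = 1), unit norm
theta = np.array([1.0, -2.0])            # finite-dimensional parameter in G

X = rng.standard_normal((n, p))
Z = rng.standard_normal((n, q))

def phi(u):
    # The unknown smooth function phi (here D = D' = 1).
    return np.exp(-u ** 2)

# G(v, z, theta) = v + theta^T z is known up to theta.
y = phi(X @ beta) + Z @ theta + 0.1 * rng.standard_normal(n)
```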

The rest of this paper is organized as follows. Section 2 describes the MAVE procedure and gives some results. Section 3 discusses some comparisons with existing methods and proposes a simple average outer product of gradients (OPG) estimation method and an inverse MAVE method. To check the feasibility of our approach, we have conducted many simulations, typical ones of which are reported in Section 4. In Section 5 we study the circulatory and respiratory data of Hong Kong and the hitters' salary data of the USA using the MAVE methodology. In practice, we standardize our observations. Appendix A establishes the efficiency of the algorithm proposed. Some of our theoretical proofs are very lengthy and not included here. However, they are available on request from the authors. Finally, the programs are available at http://www.blackwellpublishers.co.uk/rss/