Sequels are rarely as good as the originals, so I approach this review of the market efficiency literature with trepidation. The task is thornier than it was 20 years ago, when work on efficiency was rather new. The literature is now so large that a full review is impossible, and is not attempted here. Instead, I discuss the work that I find most interesting, and I offer my views on what we have learned from the research on market efficiency.
I. The Theme
I take the market efficiency hypothesis to be the simple statement that security prices fully reflect all available information. A precondition for this strong version of the hypothesis is that information and trading costs, the costs of getting prices to reflect information, are always 0 (Grossman and Stiglitz (1980)). A weaker and economically more sensible version of the efficiency hypothesis says that prices reflect information to the point where the marginal benefits of acting on information (the profits to be made) do not exceed the marginal costs (Jensen (1978)).
Since there are surely positive information and trading costs, the extreme version of the market efficiency hypothesis is surely false. Its advantage, however, is that it is a clean benchmark that allows me to sidestep the messy problem of deciding what are reasonable information and trading costs. I can focus instead on the more interesting task of laying out the evidence on the adjustment of prices to various kinds of information. Each reader is then free to judge the scenarios where market efficiency is a good approximation (that is, deviations from the extreme version of the efficiency hypothesis are within information and trading costs) and those where some other model is a better simplifying view of the world.
Ambiguity about information and trading costs is not, however, the main obstacle to inferences about market efficiency. The joint-hypothesis problem is more serious. Thus, market efficiency per se is not testable. It must be tested jointly with some model of equilibrium, an asset-pricing model. This point, the theme of the 1970 review (Fama (1970b)), says that we can only test whether information is properly reflected in prices in the context of a pricing model that defines the meaning of “properly.” As a result, when we find anomalous evidence on the behavior of returns, the way it should be split between market inefficiency or a bad model of market equilibrium is ambiguous.
Does the fact that market efficiency must be tested jointly with an equilibrium-pricing model make empirical research on efficiency uninteresting? Does the joint-hypothesis problem make empirical work on asset-pricing models uninteresting? These are, after all, symmetric questions, with the same answer. My answer is an unequivocal no. The empirical literature on efficiency and asset-pricing models passes the acid test of scientific usefulness. It has changed our views about the behavior of returns, across securities and through time. Indeed, academics largely agree on the facts that emerge from the tests, even when they disagree about their implications for efficiency. The empirical work on market efficiency and asset-pricing models has also changed the views and practices of market professionals.
As these summary judgements imply, my view, and the theme of this paper, is that the market efficiency literature should be judged on how it improves our ability to describe the time-series and cross-section behavior of security returns. It is a disappointing fact that, because of the joint-hypothesis problem, precise inferences about the degree of market efficiency are likely to remain impossible. Nevertheless, judged on how it has improved our understanding of the behavior of security returns, the past research on market efficiency is among the most successful in empirical economics, with good prospects to remain so in the future.
II. The Main Areas of Research
The 1970 review divides work on market efficiency into three categories: (1) weak-form tests (How well do past returns predict future returns?), (2) semi-strong-form tests (How quickly do security prices reflect public information announcements?), and (3) strong-form tests (Do any investors have private information that is not fully reflected in market prices?) At the risk of damning a good thing, I change the categories in this paper.
Instead of weak-form tests, which are only concerned with the forecast power of past returns, the first category now covers the more general area of tests for return predictability, which also includes the burgeoning work on forecasting returns with variables like dividend yields and interest rates. Since market efficiency and equilibrium-pricing issues are inseparable, the discussion of predictability also considers the cross-sectional predictability of returns, that is, tests of asset-pricing models and the anomalies (like the size effect) discovered in the tests. Finally, the evidence that there are seasonals in returns (like the January effect), and the claim that security prices are too volatile are also considered, but only briefly, under the rubric of return predictability.
For the second and third categories, I propose changes in title, not coverage. Instead of semi-strong-form tests of the adjustment of prices to public announcements, I use the now common title, event studies. Instead of strong-form tests of whether specific investors have information not in market prices, I suggest the more descriptive title, tests for private information.
Return predictability is considered first, and in the most detail. The detail reflects my interest and the fact that the implications of the evidence on the predictability of returns through time are the most controversial. In brief, the new work says that returns are predictable from past returns, dividend yields, and various term-structure variables. The new tests thus reject the old market efficiency-constant expected returns model that seemed to do well in the early work. This means, however, that the new results run head-on into the joint-hypothesis problem: Does return predictability reflect rational variation through time in expected returns, irrational deviations of price from fundamental value, or some combination of the two? We should also acknowledge that the apparent predictability of returns may be spurious, the result of data-dredging and chance sample-specific conditions.
The evidence discussed below, that the variation through time in expected returns is common to corporate bonds and stocks and is related in plausible ways to business conditions, leans me toward the conclusion that it is real and rational. Rationality is not established by the existing tests, however, and the joint-hypothesis problem likely means that it cannot be established. Still, even if we disagree on the market efficiency implications of the new results on return predictability, I think we can agree that the tests enrich our knowledge of the behavior of returns, across securities and through time.
Event studies are discussed next, but briefly. Detailed reviews of event studies are already available, and the implications of this research for market efficiency are less controversial. Event studies have, however, been a growth industry during the last 20 years. Moreover, I argue that, because they come closest to allowing a break between market efficiency and equilibrium-pricing issues, event studies give the most direct evidence on efficiency. And the evidence is mostly supportive.
Finally, tests for private information are reviewed. The new results clarify earlier evidence that corporate insiders have private information that is not fully reflected in prices. The new evidence on whether professional investment managers (mutual fund and pension fund) have private information is, however, murky, clouded by the joint-hypothesis problem.
III. Return Predictability: Time-Varying Expected Returns
There is a resurgence of research on the time-series predictability of stock returns, that is, the variation (rational or irrational) of expected returns through time. Unlike the pre-1970 work, which focused on forecasting returns from past returns, recent tests also consider the forecast power of variables like dividend yields (D/P), earnings/price ratios (E/P), and term-structure variables. Moreover, the early work concentrated on the predictability of daily, weekly, and monthly returns, but the recent tests also examine the predictability of returns for longer horizons.
Among the more striking new results are estimates that the predictable component of returns is a small part of the variance of daily, weekly, and monthly returns, but it grows to as much as 40% of the variance of 2– to 10-year returns. These results have spurred a continuing debate on whether the predictability of long-horizon returns is the result of irrational bubbles in prices or large rational swings in expected returns.
I first consider the research on predicting returns from past returns. Next comes the evidence that other variables (D/P, E/P, and term-structure variables) forecast returns. The final step is to discuss the implications of this work for market efficiency.
A. Past Returns
A. 1. Short-Horizon Returns
In the pre-1970 literature, the common equilibrium-pricing model in tests of stock market efficiency is the hypothesis that expected returns are constant through time. Market efficiency then implies that returns are unpredictable from past returns or other past variables, and the best forecast of a return is its historical mean.
The early tests often find suggestive evidence that daily, weekly, and monthly returns are predictable from past returns. For example, Fama (1965) finds that the first-order autocorrelations of daily returns are positive for 23 of the 30 Dow Jones Industrials and more than 2 standard errors from 0 for 11 of the 30. Fisher's (1966) results suggest that the autocorrelations of monthly returns on diversified portfolios are positive and larger than those for individual stocks. The evidence for predictability in the early work often lacks statistical power, however, and the portion of the variance of returns explained by variation in expected returns is so small (less than 1% for individual stocks) that the hypothesis of market efficiency and constant expected returns is typically accepted as a good working model.
In recent work, daily data on NYSE and AMEX stocks back to 1962 [from the Center for Research in Security Prices (CRSP)] makes it possible to estimate precisely the autocorrelation in daily and weekly returns. For example, Lo and MacKinlay (1988) find that weekly returns on portfolios of NYSE stocks grouped according to size (stock price times shares outstanding) show reliable positive autocorrelation. The autocorrelation is stronger for portfolios of small stocks. This suggests, however, that the results are due in part to the nonsynchronous trading effect (Fisher (1966)). Fisher emphasizes that spurious positive autocorrelation in portfolio returns, induced by nonsynchronous closing trades for securities in the portfolio, is likely to be more important for portfolios tilted toward small stocks.
To mitigate the nonsynchronous trading problem, Conrad and Kaul (1988) examine the autocorrelation of Wednesday-to-Wednesday returns for size-grouped portfolios of stocks that trade on both Wednesdays. Like Lo and MacKinlay (1988), they find that weekly returns are positively autocorrelated, and more so for portfolios of small stocks. The first-order autocorrelation of weekly returns on the portfolio of the largest decile of NYSE stocks for 1962–1985 is only .09. For the portfolios that include the smallest 40% of NYSE stocks, however, first-order autocorrelations of weekly returns are around .3, and the autocorrelations of weekly returns are reliably positive out to 4 lags.
The results of Lo and MacKinlay (1988) and Conrad and Kaul (1988) show that, because of the variance reduction obtained from diversification, portfolios produce stronger indications of time variation in weekly expected returns than individual stocks. Their results also suggest that returns are more predictable for small-stock portfolios. The evidence is, however, clouded by the fact that the predictability of portfolio returns is in part due to nonsynchronous trading effects that, especially for small stocks, are not completely mitigated by using stocks that trade on successive Wednesdays.
An eye-opener among recent studies of short-horizon returns is French and Roll (1986). They establish an intriguing fact. Stock prices are more variable when the market is open. On an hourly basis, the variance of price changes is 72 times higher during trading hours than during weekend nontrading hours. Likewise, the hourly variance during trading hours is 13 times the overnight nontrading hourly variance during the trading week.
One of the explanations that French and Roll test is a market inefficiency hypothesis popular among academics; specifically, the higher variance of price changes during trading hours is partly transitory, the result of noise trading by uninformed investors (e.g., Black (1986)). Under this hypothesis, pricing errors due to noise trading are eventually reversed, and this induces negative autocorrelation in daily returns. French and Roll find that the first-order autocorrelations of daily returns on the individual stocks of larger (the top three quintiles of) NYSE firms are positive. Otherwise, the autocorrelations of daily returns on individual stocks are indeed negative, to 13 lags. Although reliably negative on a statistical basis, however, the autocorrelations are on average close to 0. Few are below −0.1.
One possibility is that the transitory price variation induced by noise trading only dissipates over longer horizons. To test this hypothesis, French and Roll examine the ratios of variances of N-period returns on individual stocks to the variance of daily returns, for N from 2 days to 6 months. If there is no transitory price variation induced by noise trading (specifically, if price changes are i.i.d.), the N-period variance should grow like N, and the variance ratios (standardized by N) should be close to 1. On the other hand, with transitory price variation, the N-period variance should grow less than in proportion to N, and the variance ratios should be less than 1.
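The variance-ratio logic can be illustrated with a small simulation. This is only a sketch under assumptions of my own choosing, not French and Roll's procedure or data: the helper names, the 5-day horizon, the unit variances, and the form of the transitory component (an i.i.d. pricing error added to a random walk) are all hypothetical.

```python
import random


def sample_variance(xs):
    """Unbiased sample variance."""
    mean = sum(xs) / len(xs)
    return sum((x - mean) ** 2 for x in xs) / (len(xs) - 1)


def variance_ratio(returns, n):
    """Var(n-period return) / (n * Var(1-period return)).

    n-period returns are sums of non-overlapping blocks of n daily returns.
    """
    blocks = [sum(returns[i:i + n]) for i in range(0, len(returns) - n + 1, n)]
    return sample_variance(blocks) / (n * sample_variance(returns))


rng = random.Random(0)  # fixed seed so the illustration is reproducible
T = 20000

# Case 1: i.i.d. daily price changes (no transitory component).
iid_returns = [rng.gauss(0.0, 1.0) for _ in range(T)]

# Case 2 (assumed form): price = random walk + i.i.d. pricing error u_t,
# so the daily return is eps_t + u_t - u_{t-1} and part of it is reversed.
eps = [rng.gauss(0.0, 1.0) for _ in range(T)]
u = [rng.gauss(0.0, 1.0) for _ in range(T + 1)]
noisy_returns = [eps[t] + u[t + 1] - u[t] for t in range(T)]

vr_iid = variance_ratio(iid_returns, 5)
vr_noisy = variance_ratio(noisy_returns, 5)
print("variance ratio, i.i.d. returns:      ", round(vr_iid, 3))
print("variance ratio, transitory component:", round(vr_noisy, 3))
```

With these assumed parameters the population variance ratio at N = 5 is 1 for the i.i.d. case and (5 + 2)/(5 × 3) = 7/15 for the case with a transitory component, so the simulated ratios should land near those values.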
For horizons (N) beyond a week, the variance ratios are more than 2 standard errors below 1, except for the largest quintile of NYSE stocks. But the fractions of daily return variances due to transitory price variation are apparently small. French and Roll estimate that for the average NYSE stock, the upper bound on the transitory portion of the daily variance is 11.7%. Adjusted for the spurious negative autocorrelation of daily returns due to bid-ask effects (Roll (1984)), the estimate of the transitory portion drops to 4.1%. The smallest quintile of NYSE stocks produces the largest estimate of the transitory portion of price variation, an upper bound of 26.9%. After correction for bid-ask effects, however, the estimate drops to 4.7%, hardly a number on which to conclude that noise trading results in substantial market inefficiency. French and Roll (1986, p. 23) conclude, “pricing errors … have a trivial effect on the difference between trading and non-trading variances. We conclude that this difference is caused by differences in the flow of information during trading and non-trading hours.”
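The mechanism behind the bid-ask effect can be sketched as follows. The simulation is an illustration of the Roll (1984) bounce under assumed parameters (a unit spread, a random-walk value with unit-variance innovations, trades equally likely at bid or ask), stated in price changes rather than returns; it is not the adjustment French and Roll apply. In this setup the first-order autocovariance of observed price changes is −s²/4 for spread s, even though changes in the underlying value are uncorrelated.

```python
import random


def autocovariance(xs, lag=1):
    """Sample autocovariance of xs at the given lag."""
    mean = sum(xs) / len(xs)
    n = len(xs) - lag
    return sum((xs[t] - mean) * (xs[t + lag] - mean) for t in range(n)) / n


rng = random.Random(1)  # fixed seed for reproducibility
T = 50000
spread = 1.0            # assumed bid-ask spread, in price units

value = 0.0
observed = []
for _ in range(T + 1):
    value += rng.gauss(0.0, 1.0)        # underlying value follows a random walk
    side = rng.choice((-1, 1))          # trade hits the bid (-1) or the ask (+1)
    observed.append(value + side * spread / 2)

changes = [observed[t + 1] - observed[t] for t in range(T)]
cov1 = autocovariance(changes)

# Invert cov = -s^2 / 4 to back out the spread implied by the autocovariance.
implied_spread = 2 * (-cov1) ** 0.5 if cov1 < 0 else float("nan")
print("first-order autocovariance:", round(cov1, 3))
print("implied spread:", round(implied_spread, 3))
```

The negative autocovariance here is purely an artifact of where in the spread each trade lands, which is why it is removed before interpreting reversals as pricing errors.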
In short, with the CRSP daily data back to 1962, recent research is able to show confidently that daily and weekly returns are predictable from past returns. The work thus rejects the old market efficiency-constant expected returns model on a statistical basis. The new results, however, tend to confirm the conclusion of the early work that, at least for individual stocks, variation in daily and weekly expected returns is a small part of the variance of returns. The more striking, but less powerful, recent evidence on the predictability of returns from past returns comes from long-horizon returns.
A. 2. Long-Horizon Returns
The early literature does not interpret the autocorrelation in daily and weekly returns as important evidence against the joint hypothesis of market efficiency and constant expected returns. The argument is that, even when the autocorrelations deviate reliably from 0 (as they do in the recent tests), they are close to 0 and thus economically insignificant.
The view that autocorrelations of short-horizon returns close to 0 imply economic insignificance is challenged by Shiller (1984) and Summers (1986). They present simple models in which stock prices take large slowly decaying swings away from fundamental values (fads, or irrational bubbles), but short-horizon returns have little autocorrelation. In the Shiller-Summers model, the market is highly inefficient, but in a way that is missed in tests on short-horizon returns.
To illustrate the point, suppose the fundamental value of a stock is constant and the unconditional mean of the stock price is its fundamental value. Suppose daily prices are a first-order autoregression (AR1) with slope less than but close to 1. All variation in the price then results from long mean-reverting swings away from the constant fundamental value. Over short horizons, however, an AR1 slope close to 1 means that the price looks like a random walk and returns have little autocorrelation. Thus in tests on short-horizon returns, all price changes seem to be permanent when fundamental value is in fact constant and all deviations of price from fundamental value are temporary.
In his comment on Summers (1986), Stambaugh (1986) points out that although the Shiller-Summers model can explain autocorrelations of short-horizon returns that are close to 0, the long swings away from fundamental value proposed in the model imply that long-horizon returns have strong negative autocorrelation. (In the example above, where the price is a stationary AR1, the autocorrelations of long-horizon returns approach −0.5.) Intuitively, since the swings away from fundamental value are temporary, over long horizons they tend to be reversed. Another implication of the negative autocorrelation induced by temporary price movements is that the variance of returns should grow less than in proportion to the return horizon.
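The AR1 example can be worked out directly. If the price p is a stationary AR1 with slope φ, its autocovariance at lag k is φ^k Var(p), and the first-order autocorrelation of non-overlapping N-period price changes follows from those autocovariances; it simplifies to −(1 − φ^N)/2. The sketch below computes it both ways. The function names and the value φ = 0.999 are illustrative assumptions, not estimates.

```python
def price_autocov(phi, k):
    """Autocovariance of a stationary AR(1) price at lag k, in units of Var(p)."""
    return phi ** k


def ar1_return_autocorr(phi, n):
    """First-order autocorrelation of non-overlapping n-period price changes.

    Cov(p[t+n] - p[t], p[t] - p[t-n]) = 2*g(n) - g(2n) - g(0)
    Var(p[t+n] - p[t])                = 2*(g(0) - g(n))
    where g(k) is the price autocovariance at lag k.
    """
    cov = 2 * price_autocov(phi, n) - price_autocov(phi, 2 * n) - price_autocov(phi, 0)
    var = 2 * (price_autocov(phi, 0) - price_autocov(phi, n))
    return cov / var


def closed_form(phi, n):
    """Simplified expression for the same autocorrelation."""
    return -(1 - phi ** n) / 2


phi = 0.999  # assumed daily AR(1) slope, close to 1
for horizon in (1, 250, 1250, 20000):
    print(horizon, ar1_return_autocorr(phi, horizon), closed_form(phi, horizon))
```

At a one-day horizon the autocorrelation is about −(1 − φ)/2, which is negligible when φ is close to 1, while at long horizons φ^N goes to 0 and the autocorrelation approaches −0.5.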
The Shiller-Summers challenge spawned a series of papers on the predictability of long-horizon returns from past returns. The evidence at first seemed striking, but the tests turn out to be largely fruitless. Thus, Fama and French (1988a) find that the autocorrelations of returns on diversified portfolios of NYSE stocks for the 1926–1985 period have the pattern predicted by the Shiller-Summers model. The autocorrelations are close to 0 at short horizons, but they become strongly negative, around −0.25 to −0.4, for 3– to 5-year returns. Even with 60 years of data, however, the tests on long-horizon returns imply small sample sizes and low power. More telling, when Fama and French delete the 1926–1940 period from the tests, the evidence of strong negative autocorrelation in 3– to 5-year returns disappears.
Similarly, Poterba and Summers (1988) find that, for N from 2 to 8 years, the variance of N-year returns on diversified portfolios grows much less than in proportion to N. This is consistent with the hypothesis that there is negative autocorrelation in returns induced by temporary price swings. Even with 115 years (1871–1985) of data, however, the variance tests for long-horizon returns provide weak statistical evidence against the hypothesis that returns have no autocorrelation and prices are random walks.
Finally, Fama and French (1988a) emphasize that temporary swings in stock prices do not necessarily imply the irrational bubbles of the Shiller-Summers model. Suppose (1) rational pricing implies an expected return that is highly autocorrelated but mean-reverting, and (2) shocks to expected returns are uncorrelated with shocks to expected dividends. In this situation, expected-return shocks have no permanent effect on expected dividends, discount rates, or prices. A positive shock to expected returns generates a price decline (a discount rate effect) that is eventually erased by the temporarily higher expected returns. In short, a ubiquitous problem in time-series tests of market efficiency, with no clear solution, is that irrational bubbles in stock prices are indistinguishable from rational time-varying expected returns.
A. 3. The Contrarians
DeBondt and Thaler (1985, 1987) mount an aggressive empirical attack on market efficiency, directed at unmasking irrational bubbles. They find that the NYSE stocks identified as the most extreme losers over a 3– to 5-year period tend to have strong returns relative to the market during the following years, especially in January of the following years. Conversely, the stocks identified as extreme winners tend to have weak returns relative to the market in subsequent years. They attribute these results to market overreaction to extreme bad or good news about firms.
Chan (1988) and Ball and Kothari (1989) argue that the winner-loser results are due to failure to risk-adjust returns. (DeBondt and Thaler (1987) disagree.) Zarowin (1989) finds no evidence for the DeBondt-Thaler hypothesis that the winner-loser results are due to overreaction to extreme changes in earnings. He argues that the winner-loser effect is related to the size effect of Banz (1981); that is, small stocks, often losers, have higher expected returns than large stocks. Another explanation, consistent with an efficient market, is that there is a risk factor associated with the relative economic performance of firms (a distressed-firm effect) that is compensated in a rational equilibrium-pricing model (Chan and Chen (1991)).
We may never be able to say which explanation of the return behavior of extreme winners and losers is correct, but the results of DeBondt and Thaler and their critics are nevertheless interesting. (See also Jegadeesh (1990), Lehmann (1990), and Lo and MacKinlay (1990), who find reversal behavior in the weekly and monthly returns of extreme winners and losers. Lehmann's weekly reversals seem to lack economic significance. When he accounts for spurious reversals due to bouncing between bid and ask prices, trading costs of 0.2% per turnaround transaction suffice to make the profits from his reversal trading rules close to 0. It is also worth noting that the short-term reversal evidence of Jegadeesh, Lehmann, and Lo and MacKinlay may to some extent be due to CRSP data errors, which would tend to show up as price reversals.)
B. Other Forecasting Variables
The univariate tests on long-horizon returns of Fama and French (1988a) and Poterba and Summers (1988) are a statistical power failure. Still, they provide suggestive material to spur the search for more powerful tests of the hypothesis that slowly decaying irrational bubbles, or rational time-varying expected returns, are important in the long-term variation of prices.
There is a simple way to see the power problem. An autocorrelation is the slope in a regression of the current return on a past return. Since variation through time in expected returns is only part of the variation in returns, tests based on autocorrelations lack power because past realized returns are noisy measures of expected returns. Power in tests for return predictability can be enhanced if one can identify forecasting variables that are less noisy proxies for expected returns than past returns.
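The power problem can be made concrete with a simulation. The sketch assumes a hypothetical data-generating process of my own choosing: the expected return x follows an AR1 with slope 0.9 and unit variance, and the realized return is x plus independent noise with standard deviation 3, so expected returns account for 10% of return variance. The regression of the return on the lagged return (the autocorrelation test) is then compared with the regression on x itself, standing in for a less noisy forecasting variable.

```python
import random


def ols_slope_r2(x, y):
    """Slope and R^2 from a univariate least-squares regression of y on x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx, sxy * sxy / (sxx * syy)


rng = random.Random(2)  # fixed seed for reproducibility
T = 20000
phi = 0.9               # assumed persistence of the expected return

# Expected return for the coming period: AR(1) with unit unconditional variance.
x = []
level = 0.0
for _ in range(T):
    level = phi * level + rng.gauss(0.0, (1 - phi ** 2) ** 0.5)
    x.append(level)

# Realized return = expected return + unexpected return (assumed independent).
ret = [x[t] + rng.gauss(0.0, 3.0) for t in range(T)]

# Autocorrelation test: regress the return on the previous period's return.
slope_lag, r2_lag = ols_slope_r2(ret[:-1], ret[1:])
# Less noisy proxy: regress the return on the expected return itself.
slope_x, r2_x = ols_slope_r2(x, ret)

print("lagged return:   slope", round(slope_lag, 3), "R^2", round(r2_lag, 4))
print("expected return: slope", round(slope_x, 3), "R^2", round(r2_x, 4))
```

Under these assumptions the population R² is about 0.008 for the lagged return and 0.10 for the expected return, although both regressions pick up the same underlying variation in expected returns.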
B. 1. The Evidence
There is no lack of old evidence that short-horizon returns are predictable from other variables. A puzzle of the 1970's was to explain why monthly stock returns are negatively related to expected inflation (Bodie (1976), Nelson (1976), Jaffe and Mandelker (1976), Fama (1981)) and the level of short-term interest rates (Fama and Schwert (1977)). Like the autocorrelation tests, however, the early work on forecasts of short-horizon returns from expected inflation and interest rates suggests that the implied variation in expected returns is a small part of the variance of returns—less than 3% for monthly returns. The recent tests suggest, however, that for long-horizon returns, predictable variation is a larger part of return variances.
Thus, following evidence (Rozeff (1984), Shiller (1984)) that dividend yields (D/P) forecast short-horizon stock returns, Fama and French (1988b) use D/P to forecast returns on the value-weighted and equally weighted portfolios of NYSE stocks for horizons from 1 month to 5 years. As in earlier work, D/P explains small fractions of monthly and quarterly return variances. Fractions of variance explained grow with the return horizon, however, and are around 25% for 2– to 4-year returns. Campbell and Shiller (1988b) find that E/P ratios, especially when past earnings (E) are averaged over 10–30 years, have reliable forecast power that also increases with the return horizon. Unlike the long-horizon autocorrelations in Fama and French (1988a), the long-horizon forecast power of D/P and E/P is reliable for periods after 1940.
Fama and French (1988b) argue that dividend yields track highly autocorrelated variation in expected stock returns that becomes a larger fraction of return variation for longer return horizons. The increasing fraction of the variance of long-horizon returns explained by D/P is thus due in large part to the slow mean reversion of expected returns. Examining the forecast power of variables like D/P and E/P over a range of return horizons nevertheless gives striking perspective on the implications of slow-moving expected returns for the variation of returns.
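The way slow mean reversion raises the explained fraction of long-horizon return variance can be sketched with a stylized calculation. Assume (hypothetically) that the one-period return is x plus serially uncorrelated noise, where the expected return x is an AR1 with slope φ and the noise is independent of x. The population R² from regressing the K-period return on x then has a closed form. The parameter values below are illustrative, not estimates from the Fama-French regressions.

```python
def horizon_r2(phi, var_x, var_e, k):
    """Population R^2 from regressing the k-period return on x_t.

    k-period return = sum of x_t, ..., x_{t+k-1} plus k noise terms.
    The regression slope on x_t is 1 + phi + ... + phi**(k-1).
    """
    slope = (1 - phi ** k) / (1 - phi)
    # Variance of the sum of k consecutive values of the AR(1) process.
    var_sum_x = var_x * (k + 2 * sum((k - j) * phi ** j for j in range(1, k)))
    total_var = var_sum_x + k * var_e
    return slope ** 2 * var_x / total_var


phi = 0.97     # assumed monthly persistence of the expected return
var_x = 1.0    # variance of the expected return
var_e = 99.0   # variance of the unexpected return, so the 1-month R^2 is 1%

for months in (1, 3, 12, 24, 48):
    print(months, round(horizon_r2(phi, var_x, var_e, months), 4))
```

In this setup the unexpected component of the K-period return grows like K, while the explained component initially grows faster because x is persistent, so R² rises with the horizon over the range shown. At very long horizons the slope stops growing and R² eventually falls back toward 0.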
B. 2. Market Efficiency
The predictability of stock returns from dividend yields (or E/P) is not in itself evidence for or against market efficiency. In an efficient market, the forecast power of D/P says that prices are high relative to dividends when discount rates and expected returns are low, and vice versa. On the other hand, in a world of irrational bubbles, low D/P signals irrationally high stock prices that will move predictably back toward fundamental values. To judge whether the forecast power of dividend yields is the result of rational variation in expected returns or irrational bubbles, other information must be used. As always, even with such information, the issue is ambiguous.
For example, Fama and French (1988b) show that low dividend yields imply low expected returns, but their regressions rarely forecast negative returns for the value and equally weighted portfolios of NYSE stocks. In their data, return forecasts more than 2 standard errors below 0 are never observed, and more than 50% of the forecasts are more than 2 standard errors above 0. Thus there is no evidence that low D/P signals bursting bubbles, that is, negative expected stock returns. A bubbles fan can argue, however, that because the unconditional means of stock returns are high, a bursting bubble may well imply low but not negative expected returns. Conversely, if there were evidence of negative expected returns, an efficient-markets type could argue that asset-pricing models do not say that rational expected returns are always positive.
Fama and French (1989) suggest a different way to judge the implications of return predictability for market efficiency. They argue that if variation in expected returns is common to different securities, then it is probably a rational result of variation in tastes for current versus future consumption or in the investment opportunities of firms. They show that the dividend yield on the NYSE value-weighted portfolio indeed forecasts the returns on corporate bonds as well as common stocks. Moreover, two term-structure variables, (1) the default spread (the difference between the yields on lower-grade and Aaa long-term corporate bonds) and (2) the term spread (the difference between the long-term Aaa yield and the yield on 1-month Treasury bills), forecast returns on the value and equally weighted portfolios of NYSE stocks as well as on portfolios of bonds in different (Moody's) rating groups.
Keim and Stambaugh (1986) and Campbell (1987) also find that stock and bond returns are predictable from a common set of stock market and term-structure variables. Harvey (1991) finds that the dividend yield on the S&P 500 portfolio and U.S. term-structure variables forecast the returns on portfolios of foreign common stocks, as well as the S&P return. Thus the variation in expected returns tracked by the U.S. dividend yield and term-structure variables is apparently international.
Ferson and Harvey (1991) formally test the common expected returns hypothesis. Using the asset-pricing models of Merton (1973) and Ross (1976), they try to link the time-series variation in expected returns, captured by dividend yields and term-structure variables, to the common factors in returns that determine the cross-section of expected returns. They estimate that the common variation in expected returns is about 80% of the predictable time-series variation in the returns on Government bonds, corporate bonds, and common-stock portfolios formed on industry and size. They can't reject the hypothesis that all the time-series variation in expected returns is common.
Fama and French (1989) push the common expected returns argument for market efficiency one step further. They argue that there are systematic patterns in the variation of expected returns through time that suggest that it is rational. They find that the variation in expected returns tracked by D/P and the default spread (the slopes in the regressions of returns on D/P or the default spread) increases from high-grade bonds to low-grade bonds, from bonds to stocks, and from large stocks to small stocks. This ordering corresponds to intuition about the risks of the securities. On the other hand, the variation in expected returns tracked by the term spread is similar for all long-term securities (bonds and stocks), which suggests that it reflects variation in a common premium for maturity risks.
Finally, Fama and French (1989) argue that the variation in the expected returns on bonds and stocks captured by their forecasting variables is consistent with modern intertemporal asset-pricing models (e.g., Lucas (1978), Breeden (1979)), as well as with the original consumption-smoothing stories of Friedman (1957) and Modigliani and Brumberg (1955). The general message of the Fama-French tests (confirmed in detail by Chen (1991)) is that D/P and the default spread are high (expected returns on stocks and bonds are high) when times have been poor (growth rates of output have been persistently low). On the other hand, the term spread and expected returns are high when economic conditions are weak but anticipated to improve (future growth rates of output are high). Persistent poor times may signal low wealth and higher risks in security returns, both of which can increase expected returns. In addition, if poor times (and low incomes) are anticipated to be partly temporary, expected returns can be high because consumers attempt to smooth consumption from the future to the present.
For the diehard bubbles fan, these arguments that return predictability is rational are not convincing. Common variation in expected returns may just mean that irrational bubbles are correlated across assets and markets (domestic and international). The correlation between the common variation in expected returns and business conditions may just mean that the common bubbles in different markets are related to business conditions. On the other hand, if there were evidence of security-specific variation in expected returns, an efficient-markets type could argue that it is consistent with uncorrelated variation through time in the risks of individual securities. All of which shows that deciding whether return predictability is the result of rational variation in expected returns or irrational bubbles is never clearcut.
My view is that we should deepen the search for links between time-varying expected returns and business conditions, as well as for tests of whether the links conform to common sense and the predictions of asset-pricing models. Ideally, we would like to know how variation in expected returns relates to productivity shocks that affect the demand for capital goods, and to shocks to tastes for current versus future consumption that affect the supply of savings. At a minimum, we can surely expand the work in Chen (1991) on the relations between the financial market variables that track expected returns (D/P and the term-structure variables) and the behavior of output, investment, and saving. We can also extend the preliminary attempts of Balvers, Cosimano, and McDonald (1990), Cecchetti, Lam, and Mark (1990), and Kandel and Stambaugh (1990) to explain the variation through time in expected returns in the confines of standard asset-pricing models.
B. 3. A Caveat
The fact that variation in expected returns is common across securities and markets, and is related in plausible ways to business conditions, leans me toward the conclusion that, if it is real, it is rational. But how much of it is real? The standard errors of the slopes for the forecasting variables in the return regressions are typically large and so leave much uncertainty about forecast power (Hodrick (1990), Nelson and Kim (1990)). Inference is also clouded by an industry-level data-dredging problem. With many clever researchers, on both sides of the efficiency fence, rummaging for forecasting variables, we are sure to find instances of “reliable” return predictability that are in fact spurious.
Moreover, the evidence that measured variation in expected returns is common across securities, and related to business conditions, does not necessarily mean that it is real. Suppose there is common randomness in stock and bond returns due to randomness in business conditions. Then measured variation in expected returns that is the spurious result of sample-specific conditions is likely to be common across securities and related to business conditions. In short, variation in expected returns with business conditions is plausible and consistent with asset-pricing theory. But evidence of predictability should always be met with a healthy dose of skepticism, and a diligent search for out-of-sample confirmation.
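The data-dredging point can be illustrated with a small simulation. The sketch below is an illustration only, not a procedure from the studies cited: it draws returns that are pure noise, regresses them on many candidate forecasting variables that are also pure noise, and keeps the largest absolute t-statistic on the slope. The function names, the sample length (120 observations), the number of candidates (200), and the random seed are all assumptions chosen for the example.

```python
import random


def slope_t_stat(x, y):
    """t-statistic of the OLS slope in the simple regression y = a + b*x + e."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = mean_y - b * mean_x
    sse = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    se_b = (sse / (n - 2) / sxx) ** 0.5  # standard error of the slope
    return b / se_b


def best_abs_t(n_obs, n_candidates, rng):
    """Largest |t| found when every candidate predictor is unrelated noise."""
    returns = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
    best = 0.0
    for _ in range(n_candidates):
        predictor = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
        best = max(best, abs(slope_t_stat(predictor, returns)))
    return best


rng = random.Random(0)  # fixed seed so the run is repeatable
one_try = best_abs_t(n_obs=120, n_candidates=1, rng=rng)
many_tries = best_abs_t(n_obs=120, n_candidates=200, rng=rng)
print(f"best |t| with 1 candidate:    {one_try:.2f}")
print(f"best |t| with 200 candidates: {many_tries:.2f}")
```

With enough candidates, the best slope looks "reliable" by conventional standards even though no predictor has any true forecast power, which is why out-of-sample confirmation matters.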
C. Volatility Tests and Seasonals in Returns
C. 1. Volatility Tests
Volatility tests of market efficiency, pioneered by LeRoy and Porter (1981) and Shiller (1979, 1981), have mushroomed into a large literature. Excellent reviews (West (1988), LeRoy (1989), Cochrane (1991)) are available, so here I briefly comment on why I concur with Merton (1987), Kleidon (1988), and Cochrane (1991) that the tests are not informative about market efficiency.
A central assumption in the early volatility tests is that expected returns are constant and the variation in stock prices is driven entirely by shocks to expected dividends. By the end of the 1970's, however, evidence that expected stock and bond returns vary with expected inflation rates, interest rates, and other term-structure variables was becoming commonplace (Bodie (1976), Jaffe and Mandelker (1976), Nelson (1976), Fama (1976a, b), Fama and Schwert (1977)). With all the more recent evidence on return predictability, it now seems clear that volatility tests are another useful way to show that expected returns vary through time.
The volatility tests, however, give no help on the central issue of whether the variation in expected returns is rational. For example, is it related in sensible ways to business conditions? Grossman and Shiller (1981) and Campbell and Shiller (1988a) attempt to move the volatility tests in this direction. Predictably, however, they run head-on into the joint hypothesis problem. They test market efficiency jointly with the hypothesis that their versions of the consumption-based asset-pricing model capture all rational variation in expected returns.
C. 2. Return Seasonality
The recent literature includes a spate of “anomalies” papers that document “seasonals” in stock returns. Monday returns are on average lower than returns on other days (Cross (1973), French (1980), Gibbons and Hess (1981)). Returns are on average higher the day before a holiday (Ariel (1990)), and the last day of the month (Ariel (1987)). There also seems to be a seasonal in intraday returns, with most of the average daily return coming at the beginning and end of the day (Harris (1986)). The most mystifying seasonal is the January effect. Stock returns, especially returns on small stocks, are on average higher in January than in other months. Moreover, much of the higher January return on small stocks comes on the last trading day in December and the first 5 trading days in January (Keim (1983), Roll (1983)).
Keim (1988) reviews this literature. He argues that seasonals in returns are anomalies in the sense that asset-pricing models do not predict them, but they are not necessarily embarrassments for market efficiency. For example, Monday, holiday, and end-of-month returns deviate from normal average daily returns by less than the bid-ask spread of the average stock (Lakonishok and Smidt (1988)). Turn-of-the-year abnormal returns for small stocks are larger, but they are not large relative to the bid-ask spreads of small stocks (Roll (1983)). There is thus some hope that these seasonals can be explained in terms of market microstructure, that is, seasonals in investor trading patterns that imply innocuous seasonals in the probabilities that measured prices are at ask or bid. The evidence in Lakonishok and Maberly (1990) on Monday trading patterns, and in Reinganum (1983), Ritter (1988), and Keim (1989) on turn-of-the-year trading are steps in that direction.
We should also keep in mind that the CRSP data, the common source of evidence on stock returns, are mined on a regular basis by many researchers. Spurious regularities are a sure consequence. Apparent anomalies in returns thus warrant out-of-sample tests before being accepted as regularities that are likely to be present in future returns. Lakonishok and Smidt (1988) find that the January, Monday, holiday, and end-of-month seasonals stand up to replication on data preceding the periods used in the original tests. The intramonth seasonal (most of the average return of any month comes in the first half) of Ariel (1987), however, seems to be specific to his sample period. Connolly (1989) finds that the Monday seasonal in NYSE returns is weaker after 1974.
Recent data on the premier seasonal, the January effect, tell an interesting story. Table I shows that for the 1941–1981 period, the average monthly January return on a value-weighted portfolio of the smallest quintile of CRSP stocks is 8.06% (!), versus 1.34% for the S&P 500. During the 1941–1981 period, there is only 1 year (1952) when the S&P January return is above the CRSP bottom-quintile return. Moreover, for 1941–1981, all of the advantage of the CRSP small-stock portfolio over the S&P comes in January; the February-to-December average monthly returns on the two portfolios differ by only 4 basis points (0.88% for CRSP Small versus 0.92% for the S&P).
For 1982–1991, however, the average January return on the CRSP small-stock portfolio, 5.32%, is closer to the January S&P return, 3.20%. More striking, the average January return on the DFA U.S. Small Company Portfolio, a passive mutual fund meant to roughly mimic the CRSP bottom quintile, is 3.58%, quite close to the January S&P return (3.20%) and much less than the January return for the CRSP small-stock portfolio (5.32%). The CRSP small-stock portfolio has a higher return than the DFA portfolio in every January of 1982–1991. But January is the exception; overall, the DFA portfolio earns about 3% per year more than the CRSP bottom quintile.
Table I (table values are not recoverable from the source text)
The value-weighted CRSP small-stock portfolio (CRSP Small) contains the bottom quintile of NYSE stocks, and the AMEX and NASDAQ stocks that fall below the size (price times shares) breakpoint for the bottom quintile of NYSE stocks. The portfolio is formed at the end of each quarter and held for one quarter. Prior to June 1962, CRSP Small contains only the bottom quintile of NYSE stocks. AMEX stocks are added in July 1962 and NASDAQ stocks in January 1973. The DFA U.S. Small Company Portfolio (DFA Small) is a passive mutual fund meant to roughly mimic CRSP Small. DFA Small returns are only available for the 1982–1991 period.
Panel: Average Monthly Returns for January, February to December, and All Months. Periods: 1941–1981 and 1982–1990 (1991 for January).
Panel: Year-by-Year Comparison of January Returns for 1982–1991. Columns: Year, S&P, CRSP Small, DFA Small, CRSP-S&P, DFA-S&P.
Why these differences between the returns on the CRSP small-stock portfolio and a mimicking passive mutual fund? DFA does not try to mimic exactly the CRSP bottom quintile. Concern with trading costs causes DFA to deviate from strict value weights and to avoid the very smallest stocks (that are, however, a small fraction of a value-weighted portfolio). Moreover, DFA does not sell stocks that do well until they hit the top of the third (smallest) decile. This means that their stocks are on average larger than the stocks in the CRSP bottom quintile (a strategy that paid off during the 1982–1991 period of an inverted size effect).
The important point, however, is that small-stock returns, and the very existence of a January bias in favor of small stocks, are sensitive to small changes (imposed by rational trading) in the way small-stock portfolios are defined. This suggests that, until we know more about the pricing (and economic fundamentals) of small stocks, inferences should be cautious for the many anomalies where small stocks play a large role (e.g., the overreaction evidence of DeBondt and Thaler (1985, 1987) and Lehmann (1990), and (discussed below) the size effect of Banz (1981), the Value Line enigma of Stickel (1985), and the earnings-announcement anomaly of Bernard and Thomas (1989, 1990)).
Finally, given our fascination with anomalies that center on small stocks, it is well to put the relative importance of small stocks in perspective. At the end of 1990, there were 5135 NYSE, AMEX, and NASDAQ (NMS) stocks. Using NYSE stocks to define size breakpoints, the smallest quintile has 2631 stocks, 51.2% of the total. But the bottom quintile is only 1.5% of the combined value of NYSE, AMEX, and NASDAQ stocks. In contrast, the largest quintile has 389 stocks (7.6% of the total), but it is 77.2% of market wealth.
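The proportions quoted above follow from simple arithmetic on the stock counts. The sketch below recomputes them from the figures in the text and adds one derived quantity that is not in the original: the implied ratio of average market value per stock in the largest quintile to that in the smallest. That ratio is only approximate, since it relies on the rounded value shares (1.5% and 77.2%); the variable names are illustrative.

```python
# Counts and value shares as quoted in the text (end of 1990).
total_stocks = 5135
smallest_quintile_count = 2631
largest_quintile_count = 389
smallest_value_share = 1.5   # percent of combined market value (rounded)
largest_value_share = 77.2   # percent of combined market value (rounded)

# Share of the number of stocks in each group, in percent.
share_small = 100 * smallest_quintile_count / total_stocks
share_large = 100 * largest_quintile_count / total_stocks

# Derived, approximate: average value per stock, large group vs. small group.
size_ratio = (largest_value_share / largest_quintile_count) / (
    smallest_value_share / smallest_quintile_count
)

print(f"smallest quintile: {share_small:.1f}% of stocks")
print(f"largest quintile:  {share_large:.1f}% of stocks")
print(f"average large stock is roughly {size_ratio:.0f} times the average small stock")
```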
IV. Cross-Sectional Return Predictability
At the time of the 1970 review, the asset-pricing model of Sharpe (1964), Lintner (1965), and Black (1972) was just starting to take hold. Ross's (1976) arbitrage-pricing model and the intertemporal asset-pricing models of Merton (1973), Rubinstein (1976), Lucas (1978), Breeden (1979), and Cox, Ingersoll, and Ross (1985) did not exist. In the pre-1970 efficient markets literature, the common “models” of market equilibrium were the informal constant expected returns model (random-walk and martingale tests) and the market model (event studies, like Fama, Fisher, Jensen, and Roll (1969)).
This section considers the post-1970 empirical research on asset-pricing models. This work does not place itself in the realm of tests of market efficiency, but this just means that efficiency is a maintained hypothesis. Depending on the emphasis desired, one can say that efficiency must be tested conditional on an asset-pricing model or that asset-pricing models are tested conditional on efficiency. The point is that such tests are always joint evidence on efficiency and an asset-pricing model.
Moreover, many of the front-line empirical anomalies in finance (like the size effect) come out of tests directed at asset-pricing models. Given the joint hypothesis problem, one can't tell whether such anomalies result from misspecified asset-pricing models or market inefficiency. This ambiguity is sufficient justification to review tests of asset-pricing models here.
We first consider tests of the one-factor Sharpe-Lintner-Black (SLB) model. I argue that the SLB model does the job expected of a good model. By rejecting it repeatedly, we enhance our understanding of asset-pricing. Some of the most striking empirical regularities discovered in the last 20 years are “anomalies” from tests of the SLB model. These anomalies are now stylized facts to be explained by other asset-pricing models.
The next step is to review the evidence on the multifactor asset-pricing models of Merton (1973) and Ross (1976). These models are rich and more flexible than their competitors. Based on existing evidence, they show some promise to fill the empirical void left by the rejections of the SLB model.
The final step is to discuss tests of the consumption-based intertemporal asset-pricing model of Rubinstein (1976), Lucas (1978), Breeden (1979), and others. The elegant simplicity of this model gives it strong appeal, and much effort has been devoted to testing it. The effort is bearing fruit. Recent tests add to our understanding of the behavior of asset returns in ways that go beyond tests of other models (e.g., the equity-premium puzzle of Mehra and Prescott (1985)). On the other hand, the tests have not yet taken up the challenges (like the size effect) raised by rejections of the SLB model.
A. The Sharpe-Lintner-Black (SLB) Model
A. 1. Early Success
The early 1970's produce the first extensive tests of the SLB model (Black, Jensen, and Scholes (1972), Blume and Friend (1973), Fama and MacBeth (1973)). These early studies suggest that the special prediction of the Sharpe-Lintner version of the model, that portfolios uncorrelated with the market have expected returns equal to the risk-free rate of interest, does not fare well. (The average returns on such “zero-β” portfolios are higher than the risk-free rate.) Other predictions of the model seem to do better.
The most general implication of the SLB model is that equilibrium pricing implies that the market portfolio of invested wealth is ex ante mean-variance efficient in the sense of Markowitz (1959). Consistent with this hypothesis, the early studies suggest that (1) expected returns are a positive linear function of market β (the covariance of a security's return with the return on the market portfolio divided by the variance of the market return), and (2) β is the only measure of risk needed to explain the cross-section of expected returns. With this early support for the SLB model, there was a brief euphoric period in the 1970's when market efficiency and the SLB model seemed to be a sufficient description of the behavior of security returns.
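The definition of market β given above (the covariance of a security's return with the market return, divided by the variance of the market return) can be written out directly. The sketch below is a minimal illustration; the function name and the return series are hypothetical and are not data from any of the studies cited.

```python
def market_beta(security_returns, market_returns):
    """Sample covariance of security and market returns over market variance."""
    n = len(market_returns)
    if n != len(security_returns) or n < 2:
        raise ValueError("need two return series of equal length (at least 2)")
    mean_s = sum(security_returns) / n
    mean_m = sum(market_returns) / n
    cov = sum(
        (s - mean_s) * (m - mean_m)
        for s, m in zip(security_returns, market_returns)
    ) / (n - 1)
    var = sum((m - mean_m) ** 2 for m in market_returns) / (n - 1)
    return cov / var


# Hypothetical monthly returns, for illustration only.
market = [0.02, -0.01, 0.03, 0.00, -0.02, 0.01]
stock = [0.03, -0.02, 0.05, 0.01, -0.04, 0.01]

print(f"estimated beta: {market_beta(stock, market):.2f}")
```

The same number is the slope in a simple regression of the security's return on the market return, which is how β is described later in the discussion of multifactor models.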
We should have known better. The SLB model is just a model and so surely false. The first head-on attack is Roll's (1977) criticism that the early tests aren't much evidence for the SLB model because the proxies used for the market portfolio (like the equally weighted NYSE portfolio) do not come close to the portfolio of invested wealth called for by the model. Stambaugh's (1982) evidence that tests of the SLB model are not sensitive to the proxy used for the market suggests that Roll's criticism is too strong, but this issue can never be entirely resolved.
A. 2. Anomalies
The telling empirical attacks on the SLB model begin in the late 1970's with studies that identify variables that contradict the model's prediction that market β's suffice to describe the cross-section of expected returns. Basu (1977, 1983) shows that earnings/price ratios (E/P) have marginal explanatory power; controlling for β, expected returns are positively related to E/P. Banz (1981) shows that a stock's size (price times shares) helps explain expected returns; given their market β's, expected returns on small stocks are too high, and expected returns on large stocks are too low. Bhandari (1988) shows that leverage is positively related to expected stock returns in tests that also include market β's. Finally, Chan, Hamao, and Lakonishok (1991) and Fama and French (1991) find that book-to-market equity (the ratio of the book value of a common stock to its market value) has strong explanatory power; controlling for β, higher book-to-market ratios are associated with higher expected returns.
One argument says that the anomalies arise because estimates of market β's are noisy, and the anomalies variables are correlated with true β's. For example, Chan and Chen (1988) find that when portfolios are formed on size, the estimated β's of the portfolios are almost perfectly correlated (−0.988) with the average size of stocks in the portfolios. Thus, distinguishing between the roles of size and β in the expected returns on size portfolios is likely to be difficult. Likewise, theory predicts that, given a firm's business activities, the β of its stock increases with leverage. Thus leverage might proxy for true β's when β estimates are noisy.
Another approach uses the multifactor asset-pricing models of Merton (1973) and Ross (1976) to explain the SLB anomalies. For example, Ball (1978) argues that E/P is a catch-all proxy for omitted factors in asset-pricing tests. Thus, if two stocks have the same current earnings but different risks, the riskier stock has a higher expected return, and it is likely to have a lower price and higher E/P. E/P is then a general proxy for risk and expected returns, and one can expect it to have explanatory power when asset-pricing follows a multifactor model and all relevant factors are not included in asset-pricing tests.
Chan and Chen (1991) argue that the size effect is due to a distressed-firm factor in returns and expected returns. When size is defined by the market value of equity, small stocks include many marginal or depressed firms whose performance (and survival) is sensitive to business conditions. Chan and Chen argue that relative distress is an added risk factor in returns, not captured by market β, that is priced in expected returns. Fama and French (1991) argue that since leverage and book-to-market equity are also largely driven by the market value of equity, they also may proxy for risk factors in returns that are related to relative distress or, more generally, to market judgments about the relative prospects of firms.
Other work shows that there is indeed spillover among the SLB anomalies. Reinganum (1981) and Basu (1983) find that size and E/P are related; small stocks tend to have high E/P. Bhandari (1988) finds that small stocks include many firms that are highly levered, probably as a result of financial distress. Chan, Hamao, and Lakonishok (1991) and Fama and French (1991) find that size and book-to-market equity are related; hard times and lower stock prices cause many stocks to become small, in terms of market equity, and so to have high book-to-market ratios. Fama and French (1991) find that leverage and book-to-market equity are highly correlated. Again, these links among the anomalies are hardly surprising, given that the common driving variable in E/P, leverage, size, and book-to-market equity is a stock's price.
How many of the SLB anomalies have separately distinguishable roles in expected returns? In tests aimed at this question, Fama and French (1991) find that for U.S. stocks, E/P, leverage, and book-to-market equity weaken but do not fully absorb the relation between size and expected returns. On the other hand, when size and book-to-market equity are used together, they leave no measurable role for E/P or leverage in the cross-section of average returns on NYSE, AMEX, and NASDAQ stocks. Chan, Hamao, and Lakonishok (1991) get similar results for Japan. The strong common result of Chan, Hamao, and Lakonishok (1991) and Fama and French (1991) is that for Japanese and U.S. stocks, book-to-market equity is the most powerful explanatory variable in the cross-section of average returns, with a weaker role for size. Thus, book-to-market equity seems to have displaced size as the premier SLB anomaly.
In truth, the premier SLB anomaly is not size or book-to-market equity but the weak role of market β in the cross-section of average returns on U.S. stocks. For example, Fama and French (1991) find that the relation between β and average returns on NYSE, AMEX, and NASDAQ stocks for 1963–1990 is feeble, even when β is the only explanatory variable. Their estimated premium per unit of β is 12 basis points per month (1.44% per year), and less than 0.5 standard errors from 0. Stambaugh (1982) and Lakonishok and Shapiro (1986) get similar results for NYSE stocks for 1953–1976 and 1962–1981.
Chan and Chen (1988) find that when the assets used in tests of the SLB model are common-stock portfolios formed on size, there is a strong relation between average returns and β in the 1954–1983 period. Fama and French (1991) show, however, that this result is due to the strong correlation between the β's of size portfolios and the average size of the stocks in the portfolios (−0.988 in Chan and Chen). Fama and French find that when portfolios are formed on size and β (as in Banz (1981)), there is strong variation in β that is unrelated to size (the range of the β's just about doubles), and it causes the relation between β and average returns to all but disappear after 1950. In short, the rather strong positive relation between β and the average returns on U.S. stocks observed in the early tests of Black, Jensen, and Scholes (1972) and Fama and MacBeth (1973) does not seem to extend to later periods.
Finally, Stambaugh (1982) shows that when the assets in the SLB tests are extended to include bonds as well as stocks, there is a reliable positive relation between average returns and β in the post-1953 period. His results, along with those of Lakonishok and Shapiro (1986) and Fama and French (1991), suggest two conclusions. (1) As predicted by the SLB model, there is a positive relation between expected returns and β across security types (bonds and stocks). (2) On average, however, the relation between expected returns and β for common stocks is weak, even though stocks cover a wide range of β's.
A. 3. Market Efficiency
The relations between expected returns and book-to-market equity, size, E/P, and leverage are usually interpreted as embarrassments for the SLB model, or the way it is tested (faulty estimates of market β's), rather than as evidence of market inefficiency. The reason is that the expected-return effects persist. For example, small stocks have high expected returns long after they are classified as small. In truth, though, the existing tests can't tell whether the anomalies result from a deficient (SLB) asset-pricing model or persistent mispricing of securities.
One can imagine evidence that bears on the matter. If a past anomaly does not appear in future data, it might be a market inefficiency, erased with the knowledge of its existence. (Or, the historical evidence for the anomaly may be a result of the profession's dogged data-dredging.) On the other hand, if the anomaly is explained by other asset-pricing models, one is tempted to conclude that it is a rational asset-pricing phenomenon. (But one should be wary that the apparent explanation may be the result of model-dredging.) In any case, I judge the maturity of the tests of other asset-pricing models in part on how well they explain, or at least address, the anomalies discovered in tests of the SLB model.
A. 4. The Bottom Line
With the deck of existing anomalies in hand, we should not be surprised when new studies show that yet other variables contradict the central prediction of the SLB model, that market β's suffice to describe the cross-section of expected returns. It is important to note, however, that we discover the contradictions because we have the SLB model as a sharp benchmark against which to examine the cross-section of expected returns. Moreover, the SLB model does its job. It points to empirical regularities in expected returns (size, E/P, leverage, and book-to-market effects) that must be explained better by any challenger asset-pricing model.
The SLB model also passes the test of practical usefulness. Before it became a standard part of MBA investments courses, market professionals had only a vague understanding of risk and diversification. Markowitz's (1959) portfolio model did not have much impact on practice because its statistics are relatively complicated. The SLB model, however, gave a summary measure of risk, market β, interpreted as market sensitivity, that rang mental bells. Indeed, in spite of the evidence against the SLB model, market professionals (and academics) still think about risk in terms of market β. And, like academics, practitioners retain the market line (from the risk-free rate through the market portfolio) of the Sharpe-Lintner model as a representation of the tradeoff of expected return for risk available from passive portfolios.
B. Multifactor Models
In the SLB model, the cross-section of expected returns on securities and portfolios is described by their market β's, where β is the slope in the simple regression of a security's return on the market return. The multifactor asset-pricing models of Merton (1973) and Ross (1976) generalize this result. In these models, the return-generating process can involve multiple factors, and the cross-section of expected returns is constrained by the cross-sections of factor loadings (sensitivities). A security's factor loadings are the slopes in a multiple regression of its return on the factors.
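The statement that factor loadings are the slopes in a multiple regression of a security's return on the factors can be sketched in code. The example below is a minimal illustration that estimates the loadings by ordinary least squares through the normal equations, using only the standard library. The function names and the two hypothetical factor series are assumptions for the example, and the security's return is constructed to load exactly 1.2 and −0.5 on the factors so that the regression has known answers to recover.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def factor_loadings(security_returns, factors):
    """OLS of one return series on several factors: (intercept, loadings)."""
    t = len(security_returns)
    # Regressor columns: a constant, then one column per factor.
    columns = [[1.0] * t] + [list(f) for f in factors]
    k = len(columns)
    xtx = [
        [sum(columns[i][s] * columns[j][s] for s in range(t)) for j in range(k)]
        for i in range(k)
    ]
    xty = [
        sum(columns[i][s] * security_returns[s] for s in range(t))
        for i in range(k)
    ]
    coefficients = solve_linear_system(xtx, xty)
    return coefficients[0], coefficients[1:]


# Hypothetical factor realizations, for illustration only.
factor_1 = [0.01, -0.02, 0.03, 0.00, 0.02, -0.01]
factor_2 = [0.00, 0.01, -0.01, 0.02, -0.02, 0.01]
# A return built to load 1.2 on factor_1 and -0.5 on factor_2, with no noise.
security = [0.001 + 1.2 * f1 - 0.5 * f2 for f1, f2 in zip(factor_1, factor_2)]

intercept, loadings = factor_loadings(security, [factor_1, factor_2])
print(f"intercept: {intercept:.4f}")
print("loadings: " + ", ".join(f"{b:.2f}" for b in loadings))
```

With a single factor equal to the market return, the loading reduces to the market β of the SLB model.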
The multifactor models are an empiricist's dream. They are off-the-shelf theories that can accommodate tests for cross-sectional relations between expected returns and the loadings of security returns on any set of factors that are correlated with returns. How have tests of the models fared?
One approach, suggested by Ross's (1976) arbitrage-pricing theory (APT), uses factor analysis to extract the common factors in returns and then tests whether expected returns are explained by the cross-sections of the loadings of security returns on the factors (Roll and Ross (1980), Chen (1983)). Lehmann and Modest (1988) test this approach in detail. Most interesting, using models with up to 15 factors, they test whether the multifactor model explains the size anomaly of the SLB model. They find that the multifactor model leaves an unexplained size effect much like the SLB model; that is, expected returns are too high, relative to the model, for small stocks and too low for large stocks.
The factor analysis approach to tests of the APT leads to unresolvable squabbles about the number of common factors in returns and expected returns (Dhrymes, Friend, and Gultekin (1984), Roll and Ross (1984), Dhrymes, Friend, Gultekin, and Gultekin (1984), Trzcinka (1986), Conway and Reinganum (1988)). The theory, of course, is no help. Shanken (1982) argues that the factor analysis approach to identifying the common factors in returns and expected returns is in any case doomed by fundamental inconsistencies.
I think the factor analysis approach is limited, but for a different reason. It can confirm that there is more than one common factor in returns and expected returns, which is useful. But it leaves one hungry for economic insights about how the factors relate to uncertainties about consumption and portfolio opportunities that are of concern to investors, that is, the hedging arguments for multifactor models of Fama (1970a) and Merton (1973).
Although more studies take the factor analysis approach, the most influential tests of the multifactor model are those of Chen, Roll, and Ross (1986). The alternative approach in Chen, Roll, and Ross is to look for economic variables that are correlated with stock returns and then to test whether the loadings of returns on these economic factors describe the cross-section of expected returns. This approach thus addresses the hunger for factors with an economic motivation, left unsatisfied in the factor analysis approach.
Chen, Roll, and Ross examine a range of business conditions variables that might be related to returns because they are related to shocks to expected future cash flows or discount rates. The most powerful variables are the growth rate of industrial production and the difference between the returns on long-term low-grade corporate bonds and long-term Government bonds. Of lesser significance are the unexpected inflation rate and the difference between the returns on long and short Government bonds. Chen, Roll, and Ross (1986) conclude that their business conditions variables are risk factors in returns, or they proxy for such factors, and the loadings on the variables are priced in the cross-section of expected returns.
Chen, Roll, and Ross confront the multifactor model with the SLB model. They find that including SLB market β's has little effect on the power of their economic factors to explain the cross-section of expected returns, but SLB market β's have no marginal explanatory power. They get similar results in tests of the multifactor model against the consumption-based model (see below). Moreover, Chan, Chen, and Hsieh (1985) argue that the business conditions variables in Chen, Roll, and Ross, especially the difference between low-grade corporate and Government bond returns, explain the size anomaly of the SLB model. These successes of the multifactor model are, however, tempered by Shanken and Weinstein (1990), who find that the power of the economic factors in Chen, Roll, and Ross is sensitive to the assets used in the tests and the way factor loadings are estimated.
The Chen, Roll, and Ross approach (identifying economic factors that are correlated with returns and testing whether the factor loadings explain the cross-section of expected returns) is probably the most fruitful way to use multifactor models to improve our understanding of asset-pricing. As in Ferson and Harvey (1991), the approach can be used to study the links between the common economic factors in the cross-section of returns and the financial (dividend-yield and term-structure) variables that track variation in expected returns through time. Since the approach looks for economic variables that are related to returns and expected returns, it can also be useful in the critical task of modelling the links between expected returns and the real economy (Chen (1991)). In the end, there is some hope with this approach that we can develop a unified story for the behavior of expected returns (cross-section and time-series) and the links between expected returns and the real economy.
There is an important caveat. The flexibility of the Chen, Roll, and Ross approach can be a trap. Since multifactor models offer at best vague predictions about the variables that are important in returns and expected returns, there is the danger that measured relations between returns and economic factors are spurious, the result of special features of a particular sample (factor dredging). Thus the Chen, Roll, and Ross tests, and future extensions, warrant extended robustness checks. For example, although the returns and economic factors used by Chen, Roll, and Ross are available for earlier and later periods, to my knowledge we have no evidence on how the factors perform outside their sample.
C. Consumption-Based Asset-Pricing Models
The consumption-based model of Rubinstein (1976), Lucas (1978), Breeden (1979), and others is the most elegant of the available intertemporal asset-pricing models. In Breeden's version, the interaction between optimal consumption and portfolio decisions leads to a positive linear relation between the expected returns on securities and their consumption β's. (A security's consumption β is the slope in the regression of its return on the growth rate of per capita consumption.) The model thus summarizes all the incentives to hedge shifts in consumption and portfolio opportunities that can appear in Merton's (1973) multifactor model with a one-factor relation between expected returns and consumption β's.
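In symbols, the consumption β and the positive linear relation described above can be written as follows. The notation is an assumption introduced here for illustration: R_i is the return on security i, g_c is the growth rate of per capita consumption, and γ0 and γ1 are the intercept and slope of the cross-sectional relation.

```latex
\beta_{ic} = \frac{\operatorname{cov}(R_i, g_c)}{\operatorname{var}(g_c)},
\qquad
E(R_i) = \gamma_0 + \gamma_1 \beta_{ic}, \quad \gamma_1 > 0 .
```

The first expression is the slope in the simple regression of R_i on g_c; the second states only that expected returns rise linearly with consumption β's.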
The simple elegance of the consumption model produces a sustained interest in empirical tests. The tests use versions of the model that make strong assumptions about tastes (time-additive utility for consumption and constant relative risk aversion (CRRA)) and often about the joint distribution of consumption growth and returns (multivariate normality). Because the model is then so highly specified, it produces a rich set of testable predictions about the time series and cross-section properties of returns.
The empirical work on the consumption model often jointly tests its time-series and cross-section predictions, using the pathbreaking approach in Hansen and Singleton (1982). Estimation is with Hansen's (1982) generalized method of moments. The test is based on a χ² statistic that summarizes, in one number, how the data conform to the model's many restrictions. The tests usually reject. This is not surprising since we know all models are false. The disappointment comes when the rejection is not pursued for additional descriptive information, obscure in the χ² test, about which restrictions of the model (time-series, cross-section, or both) are the problem. In short, tests of the consumption model sometimes fail the test of usefulness; they don't enhance our ability to describe the behavior of returns.
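For reference, the restriction that these tests typically take to the data can be written as a conditional moment condition. The notation is an assumption introduced here for illustration (δ for the subjective discount factor, γ for the coefficient of relative risk aversion, C_t for per capita consumption, R_{i,t+1} for the return on asset i, and Z_t for instruments in the time-t information set). It is the standard textbook form under time-additive CRRA utility, not a transcription of any one paper's notation.

```latex
E\left[\, \delta \left(\frac{C_{t+1}}{C_t}\right)^{-\gamma} \left(1 + R_{i,t+1}\right) - 1 \;\middle|\; Z_t \right] = 0
```

Roughly, GMM chooses δ and γ to bring the sample counterparts of these conditions (one for each asset-instrument pair) as close to zero as possible, and the test statistic asks whether the remaining deviations are too large to be chance. Because all the conditions are pooled into one number, a rejection does not by itself say whether the time-series or the cross-section restrictions are at fault.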
This is not a general criticism. Much interesting information comes out of the tests of the consumption model. For example, one result, from the so-called unconditional tests, that focus on the predictions of the model about the cross-section of expected returns, is the equity-premium puzzle (Mehra and Prescott (1985)). It says that the representative consumer, whose tastes characterize asset prices, must have high risk aversion to explain the large spread (about 6% per year) of the expected returns on stocks over low-risk securities like Treasury bills. In healthy scientific fashion, the puzzle leads to attempts to modify assumptions to accommodate a large equity premium. For example, Constantinides (1990) argues that a large premium is consistent with models in which utility depends on past consumption (habit formation).
The habit formation argument has a ring of truth, but I also think that a large equity premium is not necessarily a puzzle; high risk aversion (or low intertemporal elasticity of substitution for consumption) may be a fact. Roughly speaking, a large premium says that consumers are extremely averse to small negative consumption shocks. This is in line with the perception that consumers live in morbid fear of recessions (and economists devote enormous energy to studying them) even though, at least in the post-war period, recessions are associated with small changes in per capita consumption.
Moreover, the equity-premium puzzle is a special feature of unconditional tests that focus on the cross-section properties of expected returns. In these tests, estimates of the risk-aversion parameter are imprecise. Conditional tests, that also include the time-series predictions of the model, lead to reasonable estimates of the risk-aversion parameter of the representative consumer (Hansen and Singleton (1982, 1983)).
The central cross-section prediction of Breeden's (1979) version of the consumption model is that expected returns are a positive linear function of consumption β's. On this score, the model does fairly well. Breeden, Gibbons, and Litzenberger (1989) test for linearity on a set of assets that includes the NYSE value-weighted portfolio, 12 industry stock portfolios, and 4 bond portfolios. They argue that the expected returns on these assets are a positive linear function of their consumption β's. Wheatley (1988a) comes to a similar conclusion.
Wheatley (1988b) also cannot reject the hypothesis that the same linear relation between expected returns and consumption β's (with β's measured from U.S. consumption) holds for an opportunity set that includes portfolios of the common stocks of 17 international markets, as well as U.S. Government bonds, corporate bonds, and common stocks. Wheatley thus cannot reject the hypothesis that securities are priced as if the consumption-based model holds and capital markets are internationally integrated.
The plots in Breeden, Gibbons, and Litzenberger (1989) and Wheatley (1988a, b) suggest, however, that as in Stambaugh's (1982) tests of the SLB model, the evidence for a positive relation between expected returns and consumption β's comes largely from the spread between bonds (low β's and low average returns) and stocks (high β's and high average returns). The existence of a positive tradeoff among the stock portfolios is less evident in their plots, and they give no tests for stocks alone.
Breeden, Gibbons, and Litzenberger (1989) and Wheatley (1988a, b) bring the tests of the consumption model to about where tests of the SLB model were after the studies of Black, Jensen, and Scholes (1972), Blume and Friend (1973), and Fama and MacBeth (1973). In particular, a positive relation between expected returns and consumption β's is observed, but there is no confrontation between the consumption model and competing models.
Mankiw and Shapiro (1986) test the consumption model against the SLB model. They argue that in univariate tests, expected returns on NYSE stocks are positively related to their market β's and perhaps to their consumption β's. When the two β's are included in the same regression, the explanatory power of market β's remains, but consumption β's have no explanatory power. These results are, however, clouded by a survival bias. The sample of stocks used by Mankiw and Shapiro is limited to those continuously listed on the NYSE during the entire 1959–1982 period. Not allowing for delistings gives upward-biased average returns, and the bias is probably more severe for higher β (consumption or market) stocks.
Chen, Roll, and Ross (1986) include consumption β's with the β's for the economic variables used in their tests of multifactor models. Again, consumption β's have no marginal explanatory power. Thus Chen, Roll, and Ross reject the prediction of the consumption model that the explanatory power of other variables in the multifactor model is subsumed by consumption β's.
Finally, so far, the tests of the consumption model make no attempt to deal with the anomalies that have caused problems for the SLB model. It would be interesting to confront consumption β's with variables like size and book-to-market equity, that have caused problems for the market β's of the SLB model. Given that the consumption model does not seem to fare well in tests against the SLB model or the multifactor model, however, my guess is that the consumption model will do no better with the anomalies of the SLB model.
D. Where Do We Stand?
D.1. The Bad News
Rejections of the SLB model are common. Variables like size, leverage, E/P, and book-to-market equity have explanatory power in tests that include market β's. Indeed, in recent tests, market β's have no explanatory power relative to the anomalies variables (Fama and French (1991)). The SLB model is also rejected in tests against multifactor models (Chen, Roll, and Ross (1986)).
If anything, the consumption-based model fares worse than the SLB model. It is rejected in combined (conditional) tests of its time-series and cross-section predictions (Hansen and Singleton (1982, 1983)). The equity-premium puzzle of Mehra and Prescott (1985) is ubiquitous in (unconditional) cross-section tests. And the model seems to fail miserably (consumption β's have no marginal explanatory power) in tests against the SLB model (Mankiw and Shapiro (1986)) and the multifactor model (Chen, Roll, and Ross (1986)).
The multifactor model seems to do better. It survives tests against the SLB and consumption-based models (Chen, Roll, and Ross (1986)). It helps explain the size anomaly of the SLB model (Chan, Chen, and Hsieh (1985), Chan and Chen (1991)). On the other hand, the evidence in Shanken and Weinstein (1990) that the results in Chen, Roll, and Ross and Chan, Chen, and Hsieh are sensitive to the assets used in the tests and the way the β's of economic factors are estimated is disturbing.
One can also argue that an open competition among the SLB, multifactor, and consumption models is biased in favor of the multifactor model. The expected-return variables of the SLB and consumption models (market and consumption β's) are clearly specified. In contrast, the multifactor models are licenses to search the data for variables that, ex post, describe the cross-section of average returns. It is perhaps no surprise, then, that these variables do well in competitions on the data used to identify them.
D.2. The Good News
Fortunately, rejections of the SLB model and the consumption model are never clean. For the SLB model, it is always possible that rejections are due to a bad proxy for the market portfolio and thus poor estimates of market β's. With bad β's, other variables that are correlated with true β's (like size) can have explanatory power relative to estimated β's when in fact asset pricing is according to the SLB model.
Estimating consumption β's poses even more serious problems. Consumption is measured with error, and consumption flows from durables are difficult to impute. The model calls for instantaneous consumption, but the data are monthly, quarterly, and annual aggregates. Finally, Cornell (1981) argues that the elegance of the consumption model (all incentives to hedge uncertainty about consumption and investment opportunities are summarized in consumption β's) likely means that consumption β's are difficult to estimate because they vary through time.
In this quagmire, it is possible that estimates of market β's are better proxies for consumption β's than estimates of consumption β's, and, as a result, the consumption model is mistakenly rejected in favor of the SLB model. It is even less surprising that the consumption model is rejected in favor of the multifactor model. Since the multifactor model is an expansion of the consumption model (Constantinides (1989)), the estimated β's of the multifactor model may well be better proxies for consumption β's than poorly estimated consumption β's.
These arguments against dismissal of the SLB and consumption models would be uninteresting if the predictions of the models about the cross-section of expected returns were strongly rejected. This is not the case. At least in univariate tests that include both bonds and stocks, expected returns are positively related to market β's and consumption β's, and the relations are approximately linear. Although other predictions of the SLB and consumption models are rejected, the rough validity of their univariate predictions about the cross-section of expected returns, along with their powerful intuitive appeal, keeps them alive and well.
Finally, it is important to emphasize that the SLB model, the consumption model, and the multifactor models are not mutually exclusive. Following Constantinides (1989), one can view the models as different ways to formalize the asset-pricing implications of common general assumptions about tastes (risk aversion) and portfolio opportunities (multivariate normality). Thus, as long as the major predictions of the models about the cross-section of expected returns have some empirical content, and as long as we keep the empirical shortcomings of the models in mind, we have some freedom to lean on one model or another, to suit the purpose at hand.
V. Event Studies
The original event study (of stock splits) by Fama, Fisher, Jensen and Roll (1969) is a good example of serendipity. The paper was suggested by James Lorie. The purpose was to have a piece of work that made extensive use of the newly developed CRSP monthly NYSE file, to illustrate the usefulness of the file, to justify continued funding. We had no clue that event studies would become a research industry. And we can't take much credit for starting the industry. Powerful computers and the CRSP data made it inevitable.
Event studies are now an important part of finance, especially corporate finance. In 1970 there was little evidence on the central issues of corporate finance. Now we are overwhelmed with results, mostly from event studies. Using simple tools, this research documents interesting regularities in the response of stock prices to investment decisions, financing decisions, and changes in corporate control. The results stand up to replication and the empirical regularities, some rather surprising, are the impetus for theoretical work to explain them. In short, on all counts, the event-study literature passes the test of scientific usefulness.
Here I just give a flavor of the results from event studies in corporate finance. The reader who wants a more extensive introduction is well served by the reviews of research on financing decisions by Smith (1986) and corporate-control events by Jensen and Ruback (1983) and Jensen and Warner (1988). Moreover, I mostly ignore the extensive event-study literatures in accounting, industrial organization, and macroeconomics. (See the selective reviews of Ball (1990), Binder (1985), and Santomero (1991).) I dwell a bit more on the implications of the event-study work for market efficiency.
A. Some of the Main Results
One interesting finding is that unexpected changes in dividends are on average associated with stock-price changes of the same sign (Charest (1978), Aharony and Swary (1980), Asquith and Mullins (1983)). The result is a surprise, given that the Miller-Modigliani (1961) theorem, and its refinements (Miller and Scholes (1978)), predict either that dividend policy is irrelevant or that dividends are bad news because (during the periods of the tests) dividends are taxed at a higher rate than capital gains. The evidence on the response of stock prices to dividend changes leads to signalling models (Miller and Rock (1985)) and free-cash-flow stories (Easterbrook (1984), Jensen (1986)) that attempt to explain why dividend increases are good news for stock prices.
Another surprising result is that new issues of common stock are bad news for stock prices (Asquith and Mullins (1986), Masulis and Korwar (1986)), and redemptions, through tenders or open-market purchases, are good news (Dann (1981), Vermaelen (1981)). One might have predicted the opposite, that is, stock issues are good news because they signal that the firm's investment prospects are strong. Again, the evidence is the impetus for theoretical models that explain it in terms of (1) asymmetric information [managers issue stock when it is overvalued (Myers and Majluf (1984))], (2) the information in a stock issue that cash flows are low (Miller and Rock (1985)), or (3) lower agency costs when free cash flows are used to redeem stock (Jensen (1986)).
Like financing decisions, corporate-control transactions have been examined in detail, largely through event studies. One result is that mergers and tender offers on average produce large gains for the stockholders of the target firms (Mandelker (1974), Dodd and Ruback (1977), Bradley (1980), Dodd (1980), Asquith (1983)). Proxy fights (Dodd and Warner (1983)), management buyouts (Kaplan (1989)), and other control events are also wealth-enhancing for target stockholders. The political pressure to restrict the market for corporate control is strong, but my guess is that without the barrage of evidence that control transactions benefit stockholders, the pressure would be overwhelming.
An aside. The research on corporate control is a good example of a more general blurring of the lines between finance and other areas of economics. Many of the corporate-control studies appear in finance journals, but the work goes to the heart of issues in industrial organization, law and economics, and labor economics. The research is widely known and has contributors from all these areas. Likewise, research on time-varying expected returns and asset-pricing models (especially the consumption-based model) is now important in macroeconomics and international economics as well as in finance. At this point, it is not clear who are the locals and who are the invaders, but the cross-breeding between finance and other areas of economics has resulted in a healthy burst of scientific growth.
The cursory review above highlights just a smattering of the rich results produced by event studies in corporate finance. My focus is more on what this literature tells us about market efficiency.
B. Market Efficiency
The CRSP files of daily returns on NYSE, AMEX, and NASDAQ stocks are a major boost for the precision of event studies. When the announcement of an event can be dated to the day, daily data allow precise measurement of the speed of the stock-price response—the central issue for market efficiency. Another powerful advantage of daily data is that they can attenuate or eliminate the joint-hypothesis problem, that market efficiency must be tested jointly with an asset-pricing model.
Thus, when the stock-price response to an event is large and concentrated in a few days, the way one estimates daily expected returns (normal returns) in calculating abnormal returns has little effect on inferences (Brown and Warner (1985)). For example, in mergers and tender offers, the average increase in the stock price of target firms in the 3 days around the announcement is more than 15%. Since the average daily return on stocks is only about 0.04% (10% per year divided by 250 trading days), different ways of measuring daily expected returns have little effect on the inference that target shares have large abnormal returns in the days around merger and tender announcements.
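The arithmetic behind this point can be sketched in a few lines. The example assumes a constant daily normal return and a hypothetical 3-day announcement window whose returns sum to 15%; the function name, the window returns, and the alternative benchmarks are illustrative assumptions, not figures from Brown and Warner or any merger study.

```python
from typing import Dict, Sequence

TRADING_DAYS_PER_YEAR = 250


def cumulative_abnormal_return(event_returns: Sequence[float],
                               normal_daily_return: float) -> float:
    """Sum of (actual return - normal return) over the event window."""
    return sum(r - normal_daily_return for r in event_returns)


# Daily normal return implied by 10% per year: about 0.04% per day.
NORMAL_DAILY_RETURN = 0.10 / TRADING_DAYS_PER_YEAR

# Hypothetical target-firm returns on days -1, 0, +1 around the announcement.
event_window = [0.04, 0.09, 0.02]

# Three deliberately different guesses at the daily normal return.
benchmarks: Dict[str, float] = {
    "zero": 0.0,
    "10% per year": NORMAL_DAILY_RETURN,
    "20% per year": 2 * NORMAL_DAILY_RETURN,
}

cars = {label: cumulative_abnormal_return(event_window, normal)
        for label, normal in benchmarks.items()}

for label, car in cars.items():
    print(f"normal return = {label:>12}: 3-day abnormal return = {car:.2%}")

# The spread across benchmarks is tiny relative to the 15% raw response.
spread = max(cars.values()) - min(cars.values())
print(f"spread across benchmarks: {spread:.2%}")
```

Doubling the assumed normal return moves the 3-day abnormal return by only about a tenth of a percentage point, which is why the choice of expected-return model matters so little when the price response is large and concentrated.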
The typical result in event studies on daily data is that, on average, stock prices seem to adjust within a day to event announcements. The result is so common that this work now devotes little space to market efficiency. The fact that quick adjustment is consistent with efficiency is noted, and then the studies move on to other issues. In short, in the only empirical work where the joint hypothesis problem is relatively unimportant, the evidence typically says that, with respect to firm-specific events, the adjustment of stock prices to new information is efficient.
To be fair, and to illustrate that efficiency issues are never entirely resolved, I play the devil's advocate. (Attacks on efficiency belong, of course, in the camp of the devil.) Although prices on average adjust quickly to firm-specific information, a common finding in event studies (including the original Fama-Fisher-Jensen-Roll split study) is that the dispersion of returns (measured across firms, in event time) increases around information events. Is this a rational result of uncertainty about new fundamental values? Or is it irrational but random over and underreaction to information that washes out in average returns? In short, since event studies focus on the average adjustment of prices to information, they don't tell us how much of the residual variance, generated by the deviations from average, is rational.
Moreover, when part of the response of prices to information seems to occur slowly, event studies become subject to the joint-hypothesis problem. For example, the early merger work finds that the stock prices of acquiring firms hardly react to merger announcements, but thereafter they drift slowly down (Asquith (1983)). One possibility is that acquiring firms on average pay too much for target firms, but the market only realizes this slowly; the market is inefficient (Roll (1986)). Another possibility is that the post-announcement drift is due to bias in measured abnormal returns (Franks, Harris, and Titman (1991)). Still another possibility is that the drift in the stock prices of acquiring firms in the early merger studies is sample-specific. Mitchell and Lehn (1990) find no evidence of post-announcement drift during the 1982–1986 period for a sample of about 400 acquiring firms.
Post-announcement drift in abnormal returns is also a common result in studies of the response of stock prices to earnings announcements (e.g., Ball and Brown (1968)). Predictably, there is a raging debate on the extent to which the drift can be attributed to problems in measuring abnormal returns (Bernard and Thomas (1989), Ball, Kothari, and Watts (1990)).
Bernard and Thomas (1990) identify a more direct challenge to market efficiency in the way stock prices adjust to earnings announcements. They argue that the market does not understand the autocorrelation of quarterly earnings. As a result, part of the 3-day stock-price response to this quarter's earnings announcement is predictable from earnings 1 to 4 quarters back. This result is especially puzzling, given that earnings are studied so closely by analysts and market participants. The key (if there is one) may be in the fact that the delayed stock-price responses are strongest for small firms that have had extreme changes in earnings.
In short, some event studies suggest that stock prices do not respond quickly to specific information. Given the event-study boom of the last 20 years, however, some anomalies, spurious and real, are inevitable. Moreover, it is important to emphasize the main point. Event studies are the cleanest evidence we have on efficiency (the least encumbered by the joint-hypothesis problem). With few exceptions, the evidence is supportive.
VI. Tests for Private Information
The 1970 review points to only two cases of market inefficiency due to the information advantages of individual agents. (1) Niederhoffer and Osborne (1966) show that NYSE specialists use their monopolistic access to the book of limit orders to generate trading profits, and (2) Scholes (1972) and others show that corporate insiders have access to information not reflected in prices. That specialists and insiders have private information is not surprising. For efficiency buffs, it is comfortable evidence against (in the old terms) strong-form efficiency. Moreover, Jensen's (1968, 1969) early evidence suggests that private information is not common among professional (mutual-fund) investment managers.
What has happened since 1970 that warrants discussion here? (1) The profitability of insider trading is now established in detail. (2) There is evidence that some security analysts (e.g., Value Line) have information not reflected in stock prices. (3) There is also some evidence that professional investment managers have access to private information (Ippolito (1989)), but it seems to be more than balanced by evidence that they do not (Brinson, Hood, and Beebower (1986), Elton, Gruber, Das, and Hlavka (1991)).
A. Insider Trading
In the 1970's, with the early evidence (Black, Jensen, and Scholes (1972), Fama and MacBeth (1973)) that the SLB model seemed to be a good approximation for expected returns on NYSE stocks, the thinking was that the model should be used routinely in tests of market efficiency, to replace informal models like the market model and the constant expected returns model. Jaffe's (1974) study of insider trading is one of the first in this mold.
Like earlier work, Jaffe finds, not surprisingly, that for insiders the stock market is not efficient; insiders have information that is not reflected in prices. His disturbing finding is that the market does not react quickly to public information about insider trading. Outsiders can profit from the knowledge that there has been heavy insider trading for up to 8 months after information about the trading becomes public—a startling contradiction of market efficiency.
Seyhun (1986) offers an explanation. He confirms that insiders profit from their trades, but he does not confirm Jaffe's finding that outsiders can profit from public information about insider trading. Seyhun argues that Jaffe's outsider profits arise because he uses the SLB model for expected returns. Seyhun shows that insider buying is relatively more important in small firms, whereas insider selling is more important in large firms. From Banz (1981) we know that relative to the SLB model, small stocks tend to have high average returns and large stocks tend to have low average returns. In short, the persistent strong outsider profits observed by Jaffe seem to be a result of the size effect.
There is a general message in Seyhun's results. Highly constrained asset-pricing models like the SLB model are surely false. They have systematic problems explaining the cross-section of expected returns that can look like market inefficiencies. In market-efficiency tests, one should avoid models that put strong restrictions on the cross-section of expected returns, if that is consistent with the purpose at hand. Concretely, one should use formal asset-pricing models when the phenomenon studied concerns the cross-section of expected returns (e.g., tests for size, leverage, and E/P effects). But when the phenomenon is firm-specific (most event studies), one can use firm-specific “models,” like the market model or historical average returns, to abstract from normal expected returns without putting unnecessary constraints on the cross-section of expected returns.
B. Security Analysis
The Value Line Investment Survey publishes weekly rankings of 1700 common stocks into 5 groups. Group 1 has the best return prospects and group 5 the worst. There is evidence that, adjusted for risk and size, group 1 stocks have higher average returns than group 5 stocks for horizons out to 1 year (Black (1973), Copeland and Mayers (1982), and Huberman and Kandel (1987, 1990)).
Affleck-Graves and Mendenhall (1990) argue, however, that Value Line ranks firms largely on the basis of recent earnings surprises. As a result, the longer-term abnormal returns of the Value Line rankings are just another anomaly in disguise, the post-earnings-announcement drift identified by Ball and Brown (1968), Bernard and Thomas (1989), and others.
Stickel (1985) uses event-study methods to show that there is an announcement effect in rank changes that more clearly implies that Value Line has information not reflected in prices. He finds that the market takes up to 3 days to adjust to the information in changes in rankings, and the price changes are permanent. The strongest price changes, about 2.44% over 3 days, occur when stocks are upgraded from group 2 to group 1 (better to best). For most other ranking changes, the 3-day price changes are less than 1%.
The information in Value Line rank changes is also stronger for small stocks. For the smallest quintile of stocks, a change from group 2 to group 1 is associated with a 3-day return of 5.18%; for the largest quintile, it is 0.7%. Stickel argues that these results are consistent with models in which higher information costs for small stocks deter private information production. As a result, public information announcements (like Value Line rank changes) have larger effects on the prices of small stocks.
The announcement effects of Value Line rank changes are statistically reliable evidence against the hypothesis that information advantages do not exist. But except for small stocks upgraded from group 2 to 1 (or downgraded from 1 to 2), the price effects of rank changes (less than 1% over 3 days) are small. Moreover, Hulbert (1990) reports that the strong long-term performance of Value Line's group 1 stocks is weak after 1983. Over the 6.5 years from 1984 to mid-1990, group 1 stocks earned 16.9% per year compared with 15.2% for the Wilshire 5000 Index. During the same period, Value Line's Centurion Fund, which specializes in group 1 stocks, earned 12.7% per year, live testimony to the fact that there can be large gaps between simulated profits from private information and what is available in practice.
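To see what these annual figures imply over the whole period, the sketch below compounds each rate over 6.5 years. It assumes the quoted numbers are compound annual rates, which the text does not state, so the terminal values are only indicative; the labels and the function name are hypothetical.

```python
YEARS = 6.5

# Annual returns quoted in the text for 1984 to mid-1990.
annual_returns = {
    "Value Line group 1 (paper portfolio)": 0.169,
    "Wilshire 5000 Index": 0.152,
    "Value Line Centurion Fund": 0.127,
}


def terminal_value(annual_return: float, years: float,
                   initial: float = 1.0) -> float:
    """Value of `initial` after compounding at `annual_return` for `years`.

    Assumes the rate is a compound annual rate (an assumption, see lead-in).
    """
    return initial * (1.0 + annual_return) ** years


terminal = {name: terminal_value(rate, YEARS)
            for name, rate in annual_returns.items()}

for name, value in terminal.items():
    print(f"$1 in {name}: ${value:.2f}")

# Gap between the simulated group 1 return and the fund that tries to earn it.
paper_vs_fund_gap = (annual_returns["Value Line group 1 (paper portfolio)"]
                     - annual_returns["Value Line Centurion Fund"])
print(f"paper portfolio minus fund: {paper_vs_fund_gap:.1%} per year")
```

The 4.2 percentage points per year between the paper portfolio and the fund are larger than the 1.7 points by which the paper portfolio beats the index.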
Finally, Lloyd-Davies and Canes (1978), and Liu, Smith, and Syed (1990) find that the touts of the security analysts surveyed in the Wall Street Journal's “Heard on the Street” column result in price changes that average about 1.7% on the announcement day, an information effect similar to that for Value Line rank changes.
The evidence of Stickel (1985), Lloyd-Davies and Canes (1978), and Liu, Smith, and Syed (1990) is that Value Line and some security analysts have private information that, when revealed, results in small but statistically reliable price adjustments. These results are consistent with the “noisy rational expectations” model of competitive equilibrium of Grossman and Stiglitz (1980). In brief, because generating information has costs, informed investors are compensated for the costs they incur to ensure that prices adjust to information. The market is then less than fully efficient (there can be private information not fully reflected in prices), but in a way that is consistent with rational behavior by all investors.
C. Professional Portfolio Management
Jensen's (1968, 1969) early results were bad news for the mutual-fund industry. He finds that for the 1945–1964 period, returns to investors in funds (before load fees but after management fees and other expenses) are on average about 1% per year below the market line (from the riskfree rate through the S&P 500 market portfolio) of the Sharpe-Lintner model, and average returns on more than half of his funds are below the line. Only when all published expenses of the funds are added back do the average returns on the funds scatter randomly about the market line. Jensen concludes that mutual-fund managers do not have private information.
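The distance of a fund from the market line is the intercept (usually called Jensen's alpha) in a regression of the fund's excess return on the market's excess return. The sketch below is a minimal illustration under that assumption. The data are hypothetical and constructed so that the fund sits exactly 1% per year below the line; they are not Jensen's data, and the function name is illustrative.

```python
from typing import Sequence, Tuple


def jensen_alpha(fund: Sequence[float],
                 market: Sequence[float],
                 riskfree: Sequence[float]) -> Tuple[float, float]:
    """Return (alpha, beta) from an OLS regression of fund excess returns
    on market excess returns. Alpha is the deviation from the market line."""
    if not (len(fund) == len(market) == len(riskfree)) or len(fund) < 2:
        raise ValueError("need three equal-length series with at least 2 points")
    fund_excess = [f - rf for f, rf in zip(fund, riskfree)]
    market_excess = [m - rf for m, rf in zip(market, riskfree)]
    n = len(fund_excess)
    mean_f = sum(fund_excess) / n
    mean_m = sum(market_excess) / n
    var_m = sum((m - mean_m) ** 2 for m in market_excess)
    if var_m == 0:
        raise ValueError("market excess return has no variation")
    beta = sum((m - mean_m) * (f - mean_f)
               for m, f in zip(market_excess, fund_excess)) / var_m
    alpha = mean_f - beta * mean_m
    return alpha, beta


# Hypothetical annual data (decimal returns).
riskfree = [0.03, 0.04, 0.05, 0.04, 0.03]
market = [0.12, -0.05, 0.20, 0.08, 0.01]
# Fund built to lie 1% per year below the market line with a beta of 0.9.
fund = [rf - 0.01 + 0.9 * (m - rf) for rf, m in zip(riskfree, market)]

alpha, beta = jensen_alpha(fund, market, riskfree)
print(f"alpha = {alpha:.2%} per year, beta = {beta:.2f}")
```

The measure is only as good as the market line itself: if the single-factor benchmark misprices some class of stocks, a fund tilted toward that class shows a nonzero alpha without any private information.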
Recent studies do not always agree. In tests on 116 mutual funds for the February 1968 to June 1980 period, Henriksson (1984) finds that average returns to fund investors, before load fees but after other expenses, are trivially different (0.02% per month) from the Sharpe-Lintner market line. Chang and Lewellen (1984) get similar results for 1971–1979. This work suggests that on average, fund managers have access to enough private information to cover the expenses and management fees they charge investors.
Ippolito (1989) provides a more extensive analysis of the performance of mutual funds. He examines 143 funds for the 20-year post-Jensen period 1965–1984. He finds that fund returns, before load fees but after other expenses, are on average 0.83% per year above the Sharpe-Lintner market line (from the 1-year Treasury bill rate through the S&P 500 portfolio). He finds no evidence that the deviations of funds from the market line are related to management fees, other fund expenses, or turnover ratios. Ippolito concludes that his results are in the spirit of the “noisy rational expectations” model of Grossman and Stiglitz (1980), in which informed investors (mutual fund managers) are compensated for their information costs.
Ippolito's mutual-fund evidence is not confirmed by performance tests on pension plans and endowment funds. Brinson, Hood, and Beebower (1986) examine the returns on 91 large corporate pension plans for 1974–1983. The individual plans range in size from $100 million in 1974 to over $3 billion in 1983. Individual plans commonly have more than 10 outside managers, and large influential professional managers are likely to be well-represented in the sample. The plans on average earn 1.1% per year less than passive benchmark portfolios of bonds and stocks—a negative performance measure for recent data much like Jensen's early mutual fund results. Beebower and Bergstrom (1977), Munnell (1983), and Ippolito and Turner (1987) also come to negative conclusions about the investment performance of pension plans. Berkowitz, Finney, and Logue (1988) extend the negative evidence to endowment funds.
How can we reconcile the opposite recent results for mutual funds and pension funds? Performance evaluation is known to be sensitive to methodology (Grinblatt and Titman (1989)). Ippolito (1989) uses the Sharpe-Lintner model to estimate normal returns to mutual funds. Brinson, Hood, and Beebower (1986) use passive portfolios meant to match the bond and stock components of their pension funds. We know the Sharpe-Lintner model has systematic problems explaining expected returns (size, leverage, E/P, and book-to-market equity effects) that can affect estimates of abnormal returns.
Elton, Gruber, Das, and Hklarka (1991) test the importance of the Sharpe-Lintner methodology in Ippolito's results. They find that during Ippolito's 1965–1984 period, his benchmark combinations of Treasury bills with the S&P 500 portfolio produce strong positive estimates of “abnormal” returns for passive portfolios of non-S&P (smaller) stocks, which is strong confirmation that there is a problem with the Sharpe-Lintner benchmarks (also used by Jensen (1968, 1969), Henriksson (1984), and Chang and Lewellen (1984)).
Elton, Gruber, Das, and Hklarka then use a 3-factor model to evaluate the performance of mutual funds for 1965–1984. The 3 factors are the S&P 500, a portfolio tilted toward non-S&P stocks, and a proxy for the market portfolio of Government and corporate bonds. As in Brinson, Hood, and Beebower (1986), the goal of the Elton-Gruber-Das-Hklarka approach is to allow for the fact that mutual funds hold bonds and stocks that are not in the universe covered by the combinations of Treasury bills and the S&P 500 that Ippolito uses to evaluate performance. In simplest terms, the Elton-Gruber-Das-Hklarka benchmarks are the returns from passive combinations of Treasury bills with S&P stocks, non-S&P stocks, and bonds.
Elton-Gruber-Das-Hklarka find that for Ippolito's 1965–1984 period, their benchmarks produce an abnormal return on mutual funds of −1.1% per year, much like the negative performance measures for pension funds (Brinson, Hood, and Beebower (1986)) and endowments (Berkowitz, Finney, and Logue (1988)). Moreover, unlike Ippolito, but in line with earlier work (Sharpe (1966)), Elton, Gruber, Das, and Hklarka find that abnormal returns on mutual funds are negatively related to fund expenses (including management fees) and turnover. In short, if mutual, pension, and endowment fund managers are the informed investors of the Grossman-Stiglitz (1980) model, they are apparently negating their inframarginal rents by pushing research and trading beyond the point where marginal benefits equal marginal costs.
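The sensitivity of measured performance to the benchmark can be illustrated with a small sketch. It is not the Elton-Gruber-Das-Hklarka procedure: all return series are invented, the helper `ols` is hypothetical, and the fund is assumed to be a passive mix of three indexes that loses 1.1% per year by construction. The only point is that the same fund can show a positive alpha against a one-index benchmark and a negative alpha against a multi-index benchmark when the omitted index earns more than the one-index model allows.

```python
def ols(y, columns):
    """OLS coefficients [intercept, b1, b2, ...] from the normal equations."""
    rows = [[1.0] + [col[t] for col in columns] for t in range(len(y))]
    k = len(rows[0])
    # Augmented system (X'X | X'y).
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yt for r, yt in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for i in range(k):
        p = max(range(i, k), key=lambda row: abs(a[row][i]))
        a[i], a[p] = a[p], a[i]
        for r in range(k):
            if r != i:
                f = a[r][i] / a[i][i]
                a[r] = [x - f * z for x, z in zip(a[r], a[i])]
    return [a[i][k] / a[i][i] for i in range(k)]

# Hypothetical annual excess returns (decimals), invented for illustration.
sp = [0.10, -0.12, 0.18, 0.04, -0.06, 0.14, 0.02, 0.10]           # S&P 500
small = [0.24, -0.082, 0.238, 0.184, 0.034, 0.184, 0.132, 0.15]   # non-S&P stocks
bond = [0.02, 0.03, -0.01, 0.04, 0.00, 0.01, -0.02, 0.01]         # bonds

# Assumed fund: passive mix of the three indexes, minus 1.1% per year.
fund = [-0.011 + 0.5 * s + 0.3 * n + 0.2 * b
        for s, n, b in zip(sp, small, bond)]

alpha_one = ols(fund, [sp])[0]                 # S&P-only benchmark
alpha_multi = ols(fund, [sp, small, bond])[0]  # three-index benchmark
print(f"one-index alpha:   {alpha_one:+.4f}")
print(f"multi-index alpha: {alpha_multi:+.4f}")
```

In this made-up sample the non-S&P index has a large positive intercept against the S&P, so the one-index benchmark credits the fund's small-stock exposure to the manager, while the three-index benchmark recovers the built-in shortfall.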
The past 20 years have been a fruitful period for research on market efficiency and asset-pricing models. I conclude by reviewing briefly what we have learned from the work on efficiency, and where it might go in the future. (Section IV.D above provides a summary of tests of asset-pricing models.)
A. Event Studies
The cleanest evidence on market efficiency comes from event studies, especially event studies on daily returns. When an information event can be dated precisely and the event has a large effect on prices, the way one abstracts from expected returns to measure abnormal daily returns is a second-order consideration. As a result, event studies can give a clear picture of the speed of adjustment of prices to information.
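The claim that the expected-return model is second-order for daily event studies can be seen in a small sketch. The data are invented, the function `abnormal_returns` is hypothetical, and the two benchmarks shown (mean-adjusted and market-model returns) are common choices assumed here for illustration rather than taken from any particular study.

```python
from statistics import mean

def abnormal_returns(stock, market, est_end, model="market"):
    """Daily abnormal returns over the event window stock[est_end:].

    model="mean":   expected return is the average stock return in the
                    estimation window stock[:est_end]
    model="market": expected return is a + b * market return, with a and b
                    fitted by OLS on the estimation window
    """
    s_est, m_est = stock[:est_end], market[:est_end]
    if model == "mean":
        a, b = mean(s_est), 0.0
    else:
        sbar, mbar = mean(s_est), mean(m_est)
        b = (sum((s - sbar) * (m - mbar) for s, m in zip(s_est, m_est))
             / sum((m - mbar) ** 2 for m in m_est))
        a = sbar - b * mbar
    return [s - (a + b * m)
            for s, m in zip(stock[est_end:], market[est_end:])]

# Hypothetical daily returns: 10 estimation days, then days -1, 0, +1.
market = [0.002, -0.004, 0.006, 0.001, -0.003, 0.005, -0.001, 0.003,
          -0.002, 0.003, 0.001, -0.002, 0.004]
stock = [0.003, -0.005, 0.008, 0.000, -0.002, 0.006, -0.003, 0.004,
         -0.001, 0.002, 0.001, 0.080, 0.003]   # 8% jump on day 0

ar_mean = abnormal_returns(stock, market, est_end=10, model="mean")
ar_mkt = abnormal_returns(stock, market, est_end=10, model="market")
print("day 0, mean-adjusted:", round(ar_mean[1], 4))
print("day 0, market model: ", round(ar_mkt[1], 4))
print("3-day cumulative, market model:", round(sum(ar_mkt), 4))
```

With a daily expected return of a few basis points and an announcement effect of several percent, the two benchmarks give nearly the same day-0 abnormal return. Over long horizons the expected-return component is no longer small, which is where the joint-hypothesis problem bites.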
There is a large event-study literature on issues in corporate finance. The results indicate that on average stock prices adjust quickly to information about investment decisions, dividend changes, changes in capital structure, and corporate-control transactions. This evidence tilts me toward the conclusion that prices adjust efficiently to firm-specific information. More important, the research uncovers empirical regularities, many surprising, that enrich our understanding of investment, financing, and corporate-control events, and give rise to interesting theoretical work.
It would be presumptuous to suggest where event studies should go in the future. This is a mature industry, with skilled workers and time-tested methods. It continues to expand its base in accounting, macroeconomics, and industrial organization, with no sign of a letup in finance.
B. Private Information
There is less new research on whether individual agents have private information that is not in stock prices. We know that corporate insiders have private information that leads to abnormal returns (Jaffe (1974)), but outsiders cannot profit from public information about insider trading (Seyhun (1986)). We know that changes in Value Line's rankings of firms on average lead to permanent changes in stock prices. Except for small stocks, however, the average price changes are small (Stickel (1985)). The stock-price reactions to the private information of the analysts surveyed in the Wall Street Journal's “Heard on the Street” column are likewise statistically reliable but small.
The investors studied in most detail for private information are pension fund and mutual fund managers. Unlike event studies, however, evaluating the access of investment managers to private information involves measuring abnormal returns over long periods. The tests thus run head-on into the joint-hypothesis problem: measured abnormal returns can result from market inefficiency, a bad model of market equilibrium, or problems in the way the model is implemented. It is perhaps no surprise, then, that Ippolito (1989), using the 1-factor benchmarks of the Sharpe-Lintner model, finds that mutual fund managers have private information that generates positive abnormal returns. In contrast, using 2- and 3-portfolio benchmarks that are consistent with multifactor asset-pricing models, Elton, Gruber, Das, and Hklarka (1991) and Brinson, Hood, and Beebower (1986) find that mutual funds and pension funds on average have negative abnormal returns.
The 1-factor Sharpe-Lintner model has many problems explaining the cross-section of expected stock returns (e.g., the size and book-to-market equity anomalies, and, worst of all, the weak relation between average returns and β for stocks). Multifactor models seem to do a better job on expected returns (Chen, Roll, and Ross (1986), Chan and Chen (1991), Fama and French (1991)). These results lean me toward the conclusion that the multifactor performance evaluation methods of Elton, Gruber, Das, and Hklarka (1991) and Brinson, Hood, and Beebower (1986), and their negative conclusions about the access of investment managers to private information, are more reliable than the positive results of Ippolito (1989) and others that are based on the Sharpe-Lintner model. In truth, though, the most defensible conclusion is that, because of the joint-hypothesis problem and the rather weak state of the evidence for different asset-pricing models, strong inferences about market efficiency for performance evaluation tests are not warranted.
Since we are reviewing studies of performance evaluation, it is well to point out here that the efficient-markets literature is a premier case where academic research has affected real-world practice. Before the work on efficiency, the presumption was that private information is plentiful among investment managers. The efficiency research put forth the challenge that private information is rare. One result is the rise of passive investment strategies that simply buy and hold diversified portfolios (e.g., the many S&P 500 funds). Professional managers who follow passive strategies (and charge low fees) were unheard of in 1960; they are now an important part of the investment-management industry.
The market-efficiency literature also produced a demand for performance evaluation. In 1960, investment managers were free to rest on their claims about performance. Now, performance measurement relative to passive benchmarks is the rule, and there are firms that specialize in evaluating professional managers (e.g., SEI, the data source for Brinson, Hood, and Beebower (1986)). The data generated by these firms are a resource for tests for private information that academics have hardly tapped.
C. Return Predictability
There is a resurgence of interesting research on the predictability of stock returns from past returns and other variables. Controversy about market efficiency centers largely on this work.
The new research produces precise evidence on the predictability of daily and weekly returns from past returns, but the results are similar to those in the early work, and somewhat lacking in drama. The suggestive evidence in Fama (1965) that first-order autocorrelations of daily returns on the stocks of large firms are positive (but about 0.03) becomes more precise in the longer samples in French and Roll (1986). They also show that the higher-order autocorrelations of daily returns on individual stocks are reliably negative, but reliably small. The evidence in Fisher (1966) that autocorrelations of short-horizon returns on diversified portfolios are positive, larger than for individual stocks, and larger for portfolios tilted toward small firms is confirmed by the more precise results in Lo and MacKinlay (1988) and Conrad and Kaul (1988). This latter work, however, does not entirely allay Fisher's fear that the higher autocorrelation of portfolio returns is in part the spurious result of nonsynchronous trading.
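The statistic at issue, and the way nonsynchronous trading can inflate it for portfolios, can be sketched as follows. This is a stylized mechanism and not the estimator of any cited paper: the returns are simulated, the seed and sample size are arbitrary, and the "stale" stock that records news exactly one day late is a deliberately crude assumption.

```python
import random

def autocorrelation(returns, lag=1):
    """Sample autocorrelation of a return series at the given lag."""
    n = len(returns)
    mu = sum(returns) / n
    dev = [r - mu for r in returns]
    num = sum(dev[t] * dev[t - lag] for t in range(lag, n))
    den = sum(d * d for d in dev)
    return num / den

random.seed(0)  # arbitrary seed so the sketch is reproducible
news = [random.gauss(0.0, 0.01) for _ in range(5001)]  # serially independent

stock_a = news[1:]    # price reflects each day's news the same day
stock_b = news[:-1]   # assumed thinly traded stock: news shows up a day late
portfolio = [(a + b) / 2 for a, b in zip(stock_a, stock_b)]

print("stock A:  ", round(autocorrelation(stock_a), 3))
print("stock B:  ", round(autocorrelation(stock_b), 3))
print("portfolio:", round(autocorrelation(portfolio), 3))
```

Each simulated stock is close to serially uncorrelated, but the equal-weighted portfolio has first-order autocorrelation near 0.5, because today's portfolio return contains yesterday's news through the stale stock. Actual nonsynchronous trading is far less extreme, so the sketch shows only the direction of the bias, not its size.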
In contrast to the work on short-horizon returns, the new research on the predictability of long-horizon stock returns from past returns is high on drama but short on precision. The new tests raise the intriguing suggestion that there is strong negative autocorrelation in 2- to 10-year returns due to large, slowly decaying, temporary (stationary) components of prices (Fama and French (1988a), Poterba and Summers (1988)). The suggestion is, however, clouded by low statistical power; the data do not yield many observations on long-horizon returns. More telling, the strong negative autocorrelation in long-horizon returns seems to be due largely to the Great Depression.
The recent evidence on the predictability of returns from other variables seems to give a more reliable picture of the variation through time of expected returns. Returns for short and long horizons are predictable from dividend yields, E/P ratios, and default spreads of low- over high-grade bond yields (Keim and Stambaugh (1986), Campbell and Shiller (1988b), Fama and French (1988b, 1989)). Term spreads (long-term minus short-term interest rates) and the level of short rates also forecast returns out to about a year (Campbell (1987), Fama and French (1989), Chen (1991)). In contrast to the autocorrelation tests on long-horizon returns, the forecast power of D/P, E/P, and the term-structure variables is reliable for periods after the Great Depression.
D/P, E/P, and the default spread track autocorrelated variation in expected returns that becomes a larger fraction of the variance of returns for longer return horizons. These variables typically account for less than 5% of the variance of monthly returns but around 25–30% of the variances of 2- to 5-year returns. In short, the recent work suggests that expected returns take large, slowly decaying swings away from their unconditional means.
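Why a persistent expected return explains more of the variance at longer horizons can be shown with a stylized calculation. The model below is an assumption made only for illustration: the one-period return is r(t+1) = x(t) + e(t+1), the expected return x follows an AR(1) with coefficient phi, and the unexpected return e is white noise uncorrelated with shocks to x at all leads and lags. The parameter values are arbitrary, not estimates.

```python
def long_horizon_r2(phi, r2_one_period, horizon):
    """Population R^2 from regressing the K-period return on x(t).

    Assumes r(t+1) = x(t) + e(t+1), x is AR(1) with coefficient phi,
    and e is white noise uncorrelated with shocks to x.
    """
    var_x = 1.0  # normalization; only the ratio var_e / var_x matters
    var_e = var_x * (1.0 - r2_one_period) / r2_one_period
    # The K-period expected return loads on x(t) with 1 + phi + ... + phi^(K-1).
    loading = sum(phi ** k for k in range(horizon))
    explained = loading ** 2 * var_x
    # Variance of x(t) + ... + x(t+K-1) for an AR(1) process.
    var_sum_x = var_x * (horizon + 2 * sum((horizon - j) * phi ** j
                                           for j in range(1, horizon)))
    total = var_sum_x + horizon * var_e
    return explained / total

# Illustrative values: monthly persistence 0.98, monthly R^2 of 2%.
for months in (1, 12, 24):
    print(months, round(long_horizon_r2(0.98, 0.02, months), 3))
```

The explained variance grows roughly with the square of the horizon while the noise variance grows only linearly, so the R² climbs from the assumed 2% at one month to a much larger fraction at one and two years. In this stylized model the R² eventually levels off and declines at very long horizons as the forecast power of x(t) decays.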
Rational variation in expected returns is caused either by shocks to tastes for current versus future consumption or by technology shocks. We may never be able to develop and test a full model that isolates taste and technology shocks and their effects on saving, consumption, investment, and expected returns. We can, however, hope to know more about the links between expected returns and the macro-variables. The task has at least two parts.
1. If the variation in expected returns traces to shocks to tastes or technology, then the variation in expected returns should be common across different securities and markets. We can profit from more work, like that in Keim and Stambaugh (1986), Campbell (1987), and Fama and French (1989), on the common variation in expected returns across bonds and stocks. We can also profit from more work like that in Harvey (1991) on the extent to which the variation in expected returns is common across international markets. Most important, closure demands a coherent story that relates the variation through time in expected returns to models for the cross-section of expected returns. Thus we can profit from more work like that in Ferson and Harvey (1991) on how the variation through time in expected returns is related to the common factors in returns that determine the cross-section of expected returns.
2. The second interesting task is to dig deeper and establish (or show the absence of) links between expected returns and business conditions. If the variation through time in expected returns is rational, driven by shocks to tastes or technology, then the variation in expected returns should be related to variation in consumption, investment, and savings. Fama and French (1989) argue that the variation in expected returns on corporate bonds and common stocks tracked by their dividend yield, default spread, and term spread variables is related to business conditions. Chen (1991) shows more formally that these expected-return variables are related to growth rates of output in ways that are consistent with intertemporal asset-pricing models. Output is an important variable, and Chen's work is a good start, but we can reasonably hope for a more complete story about the relations between variation in expected returns and consumption, investment, and saving.
In the end, I think we can hope for a coherent story that (1) relates the cross-section properties of expected returns to the variation of expected returns through time, and (2) relates the behavior of expected returns to the real economy in a rather detailed way. Or we can hope to convince ourselves that no such story is possible.