Presidential Address: Discount Rates

Authors

  • JOHN H. COCHRANE

    • University of Chicago Booth School of Business, and NBER. I thank John Campbell, George Constantinides, Doug Diamond, Gene Fama, Zhiguo He, Bryan Kelly, Juhani Linnainmaa, Toby Moskowitz, Lubos Pastor, Monika Piazzesi, Amit Seru, Luis Viceira, Lu Zhang, and Guofu Zhou for very helpful comments. I gratefully acknowledge research support from CRSP and outstanding research assistance from Yoshio Nozawa.


ABSTRACT

Discount-rate variation is the central organizing question of current asset-pricing research. I survey facts, theories, and applications. Previously, we thought returns were unpredictable, with variation in price-dividend ratios due to variation in expected cashflows. Now it seems all price-dividend variation corresponds to discount-rate variation. We also thought that the cross-section of expected returns came from the CAPM. Now we have a zoo of new factors. I categorize discount-rate theories based on central ingredients and data sources. Incorporating discount-rate variation affects finance applications, including portfolio theory, accounting, cost of capital, capital structure, compensation, and macroeconomics.

John H. Cochrane President of the American Finance Association 2010

Asset prices should equal expected discounted cashflows. Forty years ago, Eugene Fama (1970) argued that the expected part, “testing market efficiency,” provided the framework for organizing asset-pricing research in that era. I argue that the “discounted” part better organizes our research today.

I start with facts: how discount rates vary over time and across assets. I turn to theory: why discount rates vary. I attempt a categorization based on central assumptions and links to data, analogous to Fama's “weak,” “semi-strong,” and “strong” forms of efficiency. Finally, I point to some applications, which I think will be strongly influenced by our new understanding of discount rates. In each case, I have more questions than answers. This paper is more an agenda than a summary.

I. Time-Series Facts

A. Simple Dividend Yield Regression

Discount rates vary over time. (“Discount rate,” “risk premium,” and “expected return” are all the same thing here.) Start with a very simple regression of returns on dividend yields,1 shown in Table I.

Table I. Return-Forecasting Regressions
The regression equation is $R^e_{t \to t+k} = a + b \times D_t/P_t + \varepsilon_{t+k}$. The dependent variable $R^e_{t \to t+k}$ is the CRSP value-weighted return less the 3-month Treasury bill return. Data are annual, 1947–2009. The 5-year regression t-statistic uses the Hansen–Hodrick (1980) correction. $\sigma[E_t(R^e)]$ represents the standard deviation of the fitted value, $\sigma(\hat{b} \times D_t/P_t)$.

Horizon k    b       t(b)     R^2     σ[E_t(R^e)]    σ[E_t(R^e)]/E(R^e)
1 year       3.8     (2.6)    0.09    5.46           0.76
5 years      20.6    (3.4)    0.28    29.3           0.62

The 1-year regression forecast does not seem that important. Yes, the t-statistic is “significant,” but there are lots of biases and fishing. The 9% $R^2$ is not impressive.

In fact, this regression has huge economic significance. First, the coefficient estimate is large. A one percentage point increase in dividend yield forecasts a nearly four percentage point higher return. Prices rise by an additional three percentage points.

Second, five and a half percentage point variation in expected returns is a lot. A 6% equity premium was already a “puzzle.”2 The regression implies that expected returns vary by at least as much as their puzzling level, as shown in the last two columns of Table I.
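The arithmetic behind these two points can be checked directly from the rounded entries of Table I. The following is a minimal sketch, not part of the original analysis: the variable names are hypothetical, and the implied dividend-yield volatility and mean excess return are backed out from rounded table values, so they are approximate.

```python
# Rounded entries copied from the 1-year row of Table I (annual data, 1947-2009).
b_1yr = 3.8            # slope of the 1-year excess return on D/P
sigma_fit_1yr = 5.46   # sigma[E_t(R^e)], in percentage points
ratio_1yr = 0.76       # sigma[E_t(R^e)] / E(R^e)

# The fitted value is a + b * (D/P), so its standard deviation is b * sigma(D/P).
implied_sigma_dp = sigma_fit_1yr / b_1yr            # percentage points of dividend yield

# The last column divides the fitted-value volatility by the mean excess return.
implied_mean_premium = sigma_fit_1yr / ratio_1yr    # percentage points per year

print(f"implied sigma(D/P): {implied_sigma_dp:.2f} percentage points")
print(f"implied mean excess return: {implied_mean_premium:.2f}% per year")
```

Under these assumptions, a typical one-standard-deviation swing in expected returns is roughly three-quarters of the average premium itself, which is the sense in which expected returns vary by about as much as their level.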

By contrast, $R^2$ is a poor measure of economic significance in this context.3 The economic question is, “How much do expected returns vary over time?” There will always be lots of unforecastable return movement, so the variance of ex post returns is not a very informative comparison for this question.

Third, the slope coefficients and $R^2$ rise with horizon. Figure 1 plots each year's dividend yield along with the subsequent 7 years of returns, in order to illustrate this point. Read the dividend yield as prices upside down: Prices were low in 1980 and high in 2000. The picture then captures the central fact: High prices, relative to dividends, have reliably preceded many years of poor returns. Low prices have preceded high returns.

Figure 1.

Dividend yield and following 7-year return. The dividend yield is multiplied by four. Both series use the CRSP value-weighted market index.

B. Present Values, Volatility, Bubbles, and Long-Run Returns

Long horizons are most interesting because they tie predictability to volatility, “bubbles,” and the nature of price movements. I make this connection via the Campbell–Shiller (1988) approximate present value identity,

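$$dp_t \approx \sum_{j=1}^{k} \rho^{j-1} r_{t+j} - \sum_{j=1}^{k} \rho^{j-1} \Delta d_{t+j} + \rho^{k} dp_{t+k}, \tag{1}$$

where $dp_t \equiv d_t - p_t = \log(D_t/P_t)$, $r_{t+1} \equiv \log R_{t+1}$, and $\rho \approx 0.96$ is a constant of approximation.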

Now, consider regressions of weighted long-run returns and dividend growth on dividend yields:

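$$\sum_{j=1}^{k} \rho^{j-1} r_{t+j} = a_r + b_r^{(k)} dp_t + \varepsilon^{r}_{t+k}, \tag{2}$$
$$\sum_{j=1}^{k} \rho^{j-1} \Delta d_{t+j} = a_d + b_{\Delta d}^{(k)} dp_t + \varepsilon^{d}_{t+k}, \tag{3}$$
$$dp_{t+k} = a_{dp} + b_{dp}^{(k)} dp_t + \varepsilon^{dp}_{t+k}. \tag{4}$$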

The present value identity (1) implies that these long-run regression coefficients must add up to one,

$$1 \approx b_r^{(k)} - b_{\Delta d}^{(k)} + \rho^{k} b_{dp}^{(k)}. \tag{5}$$

To derive this relation, regress both sides of the identity (1) on dpt.
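The adding-up constraint is a mechanical consequence of the linearity of least squares, which a small simulation makes concrete. The sketch below is illustrative only: it uses the one-period case $k = 1$, treats the identity as exact, and the persistence `phi`, the sample length `T`, and the shock volatilities are assumptions chosen for illustration rather than estimates from the data.

```python
import numpy as np

rng = np.random.default_rng(0)
rho, phi, T = 0.96, 0.94, 500   # phi and T are illustrative assumptions

# Simulate a persistent log dividend yield and i.i.d. dividend growth.
dp = np.zeros(T + 1)
for t in range(T):
    dp[t + 1] = phi * dp[t] + 0.15 * rng.standard_normal()
dd = 0.02 + 0.10 * rng.standard_normal(T)        # Delta d_{t+1}

# Build returns from the one-period identity (k = 1), treated as exact here:
# r_{t+1} = Delta d_{t+1} - rho * dp_{t+1} + dp_t.
r = dd - rho * dp[1:] + dp[:-1]

def slope(y, x):
    """OLS slope of y on x, with a constant in the regression."""
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

x = dp[:-1]
b_r, b_d, b_dp = slope(r, x), slope(dd, x), slope(dp[1:], x)

# Regressing both sides of the identity on dp_t gives b_r - b_d + rho * b_dp = 1.
adding_up = b_r - b_d + rho * b_dp
print(b_r, b_d, b_dp, adding_up)
```

Because dividend growth is simulated as unforecastable, the return coefficient has to absorb whatever the persistence of the dividend yield leaves over, which is the logic of the paragraph that follows.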

Equations (1) and (5) have an important message. If we lived in an i.i.d. world, dividend yields would never vary in the first place. Expected future returns and dividend growth would never change. Since dividend yields vary, they must forecast long-run returns, long-run dividend growth, or a “rational bubble” of ever-higher prices.

The regression coefficients in (5) can be read as the fractions of dividend yield variation attributed to each source. To see this interpretation more clearly, multiply both sides of (5) by var(dpt), which gives

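$$\operatorname{var}(dp_t) \approx \operatorname{cov}\Big[dp_t, \sum_{j=1}^{k} \rho^{j-1} r_{t+j}\Big] - \operatorname{cov}\Big[dp_t, \sum_{j=1}^{k} \rho^{j-1} \Delta d_{t+j}\Big] + \rho^{k} \operatorname{cov}(dp_t, dp_{t+k}). \tag{6}$$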

The empirical question is, how big is each source of variation? Table II presents long-run regression coefficients, each calculated three ways.

Table II. Long-Run Regression Coefficients
Table entries are long-run regression coefficients, for example, $b_r^{(k)}$ in $\sum_{j=1}^{k} \rho^{j-1} r_{t+j} = a + b_r^{(k)} dp_t + \varepsilon^{r}_{t+k}$. See equations (2)–(4). Annual CRSP data, 1947–2009. “Direct” regression estimates are calculated using 15-year ex post returns, dividend growth, and dividend yields as left-hand variables. The “VAR” estimates infer long-run coefficients from 1-year coefficients, using estimates in the right-hand panel of Table III. See the Appendix for details.

Method and horizon           b_r^(k)    b_Δd^(k)    ρ^k b_dp^(k)
Direct regression, k = 15    1.01       −0.11       −0.11
Implied by VAR, k = 15       1.05       0.27        0.22
VAR, k = ∞                   1.35       0.35        0.00

The long-run return coefficients, shown in the first column, are all a bit larger than 1.0. The dividend growth forecasts, in the second column, are small, statistically insignificant, and the positive point estimates go the “wrong” way—high prices relative to current dividends signal low future dividend growth. The 15-year dividend yield forecast coefficient is also essentially zero.

Thus, the estimates summarized in Table II say that all price-dividend ratio volatility corresponds to variation in expected returns. None corresponds to variation in expected dividend growth, and none to “rational bubbles.”
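To see roughly where the “VAR” rows of Table II come from, one can sum the 1-year coefficients geometrically. This is a minimal sketch under stated assumptions, not the Appendix procedure: it assumes the dividend yield follows a first-order autoregression with coefficient `phi`, uses the rounded stock coefficients from the right-hand panel of Table III, and the helper `long_run` is a hypothetical name. Because the inputs are rounded and the identity is not imposed, the results match the table only approximately, most visibly for dividend growth.

```python
rho = 0.96                          # constant of approximation from the text
b_r, b_d, phi = 0.13, 0.04, 0.94    # stock panel of Table III (rounded)

def long_run(b, k=None):
    """Sum over j = 1..k of (rho*phi)^(j-1) * b; k=None gives the infinite sum."""
    q = rho * phi
    scale = 1.0 / (1.0 - q) if k is None else (1.0 - q**k) / (1.0 - q)
    return b * scale

# E_t(dp_{t+j-1}) = phi^(j-1) dp_t, so each future coefficient is b * phi^(j-1).
b_r_15, b_d_15 = long_run(b_r, 15), long_run(b_d, 15)
future_dp_15 = (rho * phi) ** 15    # rho^k * b_dp^(k) under the AR(1) assumption
b_r_inf, b_d_inf = long_run(b_r), long_run(b_d)

print(f"k = 15:  b_r = {b_r_15:.2f}, b_dd = {b_d_15:.2f}, rho^k b_dp = {future_dp_15:.2f}")
print(f"k = inf: b_r = {b_r_inf:.2f}, b_dd = {b_d_inf:.2f}")
```

The point of the exercise is the scaling: a 1-year return coefficient near 0.1 combined with a dividend yield persistence near 0.94 is enough to produce a long-run return coefficient above one.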

In the 1970s, we would have guessed exactly the opposite pattern. Based on the idea that returns are not predictable, we would have supposed that high prices relative to current dividends reflect expectations that dividends will rise in the future, and so forecast higher dividend growth. That pattern is completely absent. Instead, high prices relative to current dividends entirely forecast low returns.

This is the true meaning of return forecastability.4 This is the real measure of “how big” the point estimates are—return forecastability is “just enough” to account for price volatility. This is the natural set of units with which to evaluate return forecastability. What we expected to be zero is one; what we expected to be one is zero.

Table II also reminds us that the point of the return-forecasting project is to understand prices, the right-hand variable of the regression. We put return on the left side because the forecast error is uncorrelated with the forecasting variable. This choice does not reflect “cause” and “effect,” nor does it imply that the point of the exercise is to understand ex post return variation.

How you look at things matters. The long-run and short-run regressions are equivalent, as each can be obtained from the other. Yet looking at the long-run version of the regressions shows an unexpected economic significance. We will see this lesson repeated many times.

Some quibbles: Table II does not include standard errors. Sampling variation in long-run estimates is an important topic.5 My point is the economic importance of the estimates. One might still argue that we cannot reject the alternative views. But when point estimates are one and zero, arguing we should believe zero and one because zero and one cannot be rejected is a tough sell.

The variance of dividend yields or price-dividend ratios corresponds entirely to discount-rate variation, but as much as half of the variance of price changes $\Delta p_{t+1} = -dp_{t+1} + dp_t + \Delta d_{t+1}$ or returns $r_{t+1} \approx -\rho\, dp_{t+1} + dp_t + \Delta d_{t+1}$ corresponds to current dividends $\Delta d_{t+1}$. This fact seems trivial but has caused a lot of confusion.

I divide by dividends for simplicity, to capture a huge literature in one example. Many other variables work about as well, including earnings and book values.

C. A Pervasive Phenomenon

This pattern of predictability is pervasive across markets. For stocks, bonds, credit spreads, foreign exchange, sovereign debt, and houses, a yield or valuation ratio translates one-for-one to expected excess returns, and does not forecast the cashflow or price change we may have expected. In each case our view of the facts has changed completely since the 1970s.

  • Stocks. Dividend yields forecast returns, not dividend growth.6

  • Treasuries. A rising yield curve signals better 1-year returns for long-term bonds, not higher future interest rates. Fed fund futures signal returns, not changes in the funds rate.7

  • Bonds. Much variation in credit spreads over time and across firms or categories signals returns, not default probabilities.8

  • Foreign exchange. International interest rate spreads signal returns, not exchange rate depreciation.9

  • Sovereign debt. High levels of sovereign or foreign debt signal low returns, not higher government or trade surpluses.10

  • Houses. High price/rent ratios signal low returns, not rising rents or prices that rise forever.

Since house prices are so much in the news, Figure 2 shows house prices and rents, and Table III presents forecasting regressions. High prices relative to rents mean low returns, not higher subsequent rents, or prices that rise forever. The housing regressions are almost the same as the stock market regressions. (Not everything about house and stock data is the same of course. Measured house price data are more serially correlated.)

Figure 2.

House prices and rents. OFHEO is the Office of Federal Housing Enterprise Oversight “purchase-only” price index. CSW are Case-Shiller-Weiss price data. All data are from http://www.lincolninst.edu/subcenters/land-values/rent-price-ratio.asp.

Table III. House Price and Stock Price Regressions
Left panel: Regressions of log annual housing returns $r_{t+1}$, log rent growth $\Delta d_{t+1}$, and log rent/price ratio $dp_{t+1}$ on the rent/price ratio $dp_t$, $x_{t+1} = a + b \times dp_t + \varepsilon_{t+1}$, 1960–2010. Right panel: Regressions of log stock returns $r_{t+1}$, dividend growth $\Delta d_{t+1}$, and dividend yields $dp_{t+1}$ on dividend yields $dp_t$, annual CRSP value-weighted return data, 1947–2010.

              Houses                      Stocks
              b       t         R^2       b       t         R^2
r_{t+1}       0.12    (2.52)    0.15      0.13    (2.61)    0.10
Δd_{t+1}      0.03    (2.22)    0.07      0.04    (0.92)    0.02
dp_{t+1}      0.90    (16.2)    0.90      0.94    (23.8)    0.91

There is a strong common element and a strong business cycle association to all these forecasts.11 Low prices and high expected returns hold in “bad times,” when consumption, output, and investment are low, unemployment is high, and businesses are failing, and vice versa.

These facts bring a good deal of structure to the debate over “bubbles” and “excess volatility.” High valuations correspond to low returns, and are associated with good economic conditions. All a “price bubble” can possibly mean now is that the equivalent discount rate is “too low” relative to some theory. Though regressions do not establish causality, this equivalence guides us to a much more profitable discussion.

D. The Multivariate Challenge

This empirical project has only begun. We see that one variable at a time forecasts one return at a time. We need to understand their multivariate counterparts, on both the left and the right sides of the regressions.

For example, the stock and bond regressions on dividend yield and yield spread (ys) are

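$$r^{\mathrm{stock}}_{t+1} = a_s + b_s \times dp_t + \varepsilon^{s}_{t+1}, \qquad r^{\mathrm{bond}}_{t+1} = a_b + c_b \times ys_t + \varepsilon^{b}_{t+1}.$$

We have some additional predictor variables $z_t$, from similar univariate or at best bivariate (i.e., including $b_s \times dp_t$) explorations:

$$r^{\mathrm{stock}}_{t+1} = a_s + b_s \times dp_t + d_s \times z_t + \varepsilon^{s}_{t+1}.$$

First, which of these variables are really important in a multiple regression sense? In particular, do the variables that forecast one return forecast another? What are $c_s$, $d_s$, $b_b$, and $d_b$ in regressions

$$r^{\mathrm{stock}}_{t+1} = a_s + b_s \times dp_t + \underline{c_s \times ys_t} + d_s \times z_t + \varepsilon^{s}_{t+1}, \qquad r^{\mathrm{bond}}_{t+1} = a_b + \underline{b_b \times dp_t} + c_b \times ys_t + \underline{d_b \times z_t} + \varepsilon^{b}_{t+1}? \tag{7}$$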

(I underline the variables we need to learn about.)

Second, how correlated are the right-hand terms of these regressions? What is the factor structure of time-varying expected returns? Expected returns $E_t(r^{i}_{t+1})$ vary over time $t$. How correlated is such variation across assets and asset classes $i$? How can we best express that correlation as factor structure?

As an example to clarify the question, suppose we find that the stock return coefficients are all double those of the bonds,

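$$r^{\mathrm{stock}}_{t+1} = a_s + 2 \times dp_t + 4 \times ys_t + \varepsilon^{s}_{t+1}, \qquad r^{\mathrm{bond}}_{t+1} = a_b + 1 \times dp_t + 2 \times ys_t + \varepsilon^{b}_{t+1}.$$

We would see a one-factor model for expected returns, with stock expected returns always changing by twice bond expected returns,

$$E_t(r^{\mathrm{stock}}_{t+1}) = 2 \times \mathrm{factor}_t, \qquad E_t(r^{\mathrm{bond}}_{t+1}) = 1 \times \mathrm{factor}_t. \tag{8}$$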
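One way to look for this kind of structure in practice is to stack the forecasting coefficients in a matrix and examine its rank. The sketch below is only an illustration using the hypothetical coefficients of the example above; the names `B`, `loadings`, and `weights` are mine, and with estimated coefficients the second singular value would be small rather than exactly zero.

```python
import numpy as np

# Rows: stock, bond. Columns: coefficients on dp_t and ys_t
# (the hypothetical numbers of the example, constants omitted).
B = np.array([[2.0, 4.0],
              [1.0, 2.0]])

# A one-factor model for expected returns means B has rank one.
singular_values = np.linalg.svd(B, compute_uv=False)
rank = np.linalg.matrix_rank(B)

# With rank one, B = loadings * weights', so factor_t = weights' [dp_t, ys_t].
loadings = np.array([2.0, 1.0])   # stock and bond loadings, as in (8)
weights = np.array([1.0, 2.0])    # factor_t = 1 * dp_t + 2 * ys_t

print("singular values:", singular_values)
print("rank:", rank)
print("outer product reproduces B:", np.allclose(np.outer(loadings, weights), B))
```

The factor here is itself a particular combination of the forecasting variables, so the same calculation says both how expected returns move together across assets and which combination of right-hand variables drives them.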

Third, what are the corresponding pricing factors? We relate time-varying expected returns to covariances with pricing factors or portfolio returns,

$$E_t(r^{i}_{t+1}) = \operatorname{cov}_t(r^{i}_{t+1}, f_{t+1})\, \lambda_t.$$

As a small step down this road, Cochrane and Piazzesi (2005, 2008) find that forward rates of all maturities help to forecast bond returns of each maturity. Multiple regressions matter as in (7). Furthermore, the right-hand sides are almost perfectly correlated across left-hand maturities.12 A single common factor describes 99.9% of the variance of expected returns as in (8). Finally, the spread in time-varying expected bond returns across maturities corresponds to a spread in covariances with a single “level” factor. The market prices of slope, curvature, and expected-return factor risks are zero.

What similar patterns hold across broad asset classes? The challenge, of course, is that there are too many right-hand variables, so we cannot simply run huge multiple regressions. But these are the vital questions.

E. Multivariate Prices

I advertised that much of the point of running regressions with prices on the right-hand side is to understand those prices. How will a multivariate investigation change our picture of prices and long-run returns?

Again, the Campbell–Shiller present value identity

$$dp_t \approx \sum_{j=1}^{\infty} \rho^{j-1} r_{t+j} - \sum_{j=1}^{\infty} \rho^{j-1} \Delta d_{t+j} \tag{9}$$

provides a useful way to think about these questions. Since this identity holds ex post, it holds for any information set. Dividend yields are a great forecasting variable because they reveal market expectations of dividend growth and returns. However, dividend yields combine the two sources of information. A variable can help the dividend yield to forecast long-run returns if it also forecasts long-run dividend growth. A variable can also help predict 1-year returns $r_{t+1}$ without much changing long-run expected returns, if it has an offsetting effect on longer run returns $\{r_{t+j}\}$. Such a variable signals a change in the term structure of risk premia $\{E_t r_{t+j}\}$.

I examine Lettau and Ludvigson's (2001a, 2001b, 2005) consumption to wealth ratio cay as an example to explore these questions. Table IV presents forecasting regressions.

Table IV. Forecasting Regressions with the Consumption-Wealth Ratio
Annual data 1952–2009. Long-run coefficients in the last two rows of the table are computed using a first-order VAR with $dp_t$ and $cay_t$ as state variables. Each regression includes a constant. Cay is rescaled so $\sigma(cay) = 1$. For reference, $\sigma(dp) = 0.42$. The long-run variables are $r^{lr}_t = \sum_{j=1}^{\infty} \rho^{j-1} r_{t+j}$ and $\Delta d^{lr}_t = \sum_{j=1}^{\infty} \rho^{j-1} \Delta d_{t+j}$.

Left-hand variable    Coefficients          t-statistics          Other statistics
                      dp_t      cay_t       dp_t      cay_t       R^2     σ[E_t(y_{t+1})] %    σ[E_t(y_{t+1})]/E(y_{t+1})
r_{t+1}               0.12      0.071       (2.14)    (3.19)      0.26    8.99                 0.91
Δd_{t+1}              0.024     0.025       (0.46)    (1.69)      0.05    2.80                 0.12
dp_{t+1}              0.94      −0.047      (20.4)    (−3.05)     0.91
cay_{t+1}             0.15      0.65        (0.63)    (5.95)      0.43
r^{lr}_t              1.29      0.033                                     0.51
Δd^{lr}_t             0.29      0.033                                     0.12

Cay helps to forecast one-period returns. The t-statistic is large, and it raises the variation of expected returns substantially. Cay only marginally helps to forecast dividend growth. (Lettau and Ludvigson report that it works better in quarterly data.)

Figure 3 graphs the 1-year return forecast using dp alone, the 1-year return forecast using dp and cay together, and the actual ex post return. Adding cay lets us forecast business-cycle frequency “wiggles” while not affecting the “trend.”

Figure 3.

Forecast and actual 1-year returns. The forecasts are fitted values of regressions of returns on dividend yield and cay. Actual returns $r_{t+1}$ are plotted on the same date as their forecast, $a + b \times dp_t$.

Long-run return forecasts are quite different, however. Figure 4 contrasts long-run return forecasts with and without cay. Though cay has a dramatic effect on one-period return $r_{t+1}$ forecasts in Figure 3, cay has almost no effect at all on long-run return $\sum_{j=1}^{\infty} \rho^{j-1} r_{t+j}$ forecasts in Figure 4.

Figure 4.

Log dividend yield $dp$ and forecasts of long-run returns $\sum_{j=1}^{\infty} \rho^{j-1} r_{t+j}$. Return forecasts are computed from a VAR including dp, and a VAR including dp and cay.

Figure 4 includes the actual dividend yield, to show (by (9)) how dividend yields break into long-run return forecasts versus long-run dividend growth forecasts. The last two rows of Table IV give the corresponding long-run regression coefficients. Essentially all price-dividend variation still corresponds to expected-return forecasts.

How can cay forecast one-year returns so strongly, but have such a small effect on the terms of the dividend yield present value identity? In the context of (9), cay alters the term structure of expected returns.

We can display this behavior with impulse-response functions. Figure 5 plots responses to a dividend growth shock, a dividend yield shock, and a cay shock. In each case, I include a contemporaneous return response to satisfy the return identity $r_{t+1} \approx \Delta d_{t+1} - \rho\, dp_{t+1} + dp_t$.

Figure 5.

Impulse-response functions. Response functions to dividend growth, dividend yield, and cay shocks. Calculations are based on the VAR of Table IV. Each shock changes the indicated variable without changing the others, and includes a contemporaneous return shock from the identity $r_{t+1} \approx \Delta d_{t+1} - \rho\, dp_{t+1} + dp_t$. The vertical dashed line indicates the period of the shock.

These plots answer the question: “What change in expectations corresponds to each shock?” The dividend growth shock corresponds to permanently higher expected dividends with no change in expected returns. Prices jump to their new higher value and stay there. It is thus a pure “expected cashflow” shock. The dividend yield shock is essentially a pure discount-rate shock. It shows a rise in expected returns with little change in expected dividend growth.

Though there is a completely transitory component of prices in this multivariate representation, the implied univariate return representation remains very close to uncorrelated. A fall in prices with no change in dividends is likely to mean-revert, but observing a fall in prices without observing dividends carries no such implication. As a result, stocks are not “safer in the long run”: Stock return variance still scales nearly linearly with horizon.

The cay shock in the rightmost panel of Figure 5 corresponds to a shift in expected returns from the distant future to the near future, with a small similar movement in the timing of a dividend growth forecast. It has almost no effect on long-run returns or dividend growth. We could label it a shock to the term structure of risk premia.13
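The term-structure interpretation can be reproduced, approximately, from the rounded point estimates in Table IV. The following is a sketch under stated assumptions rather than the calculation behind Figure 5: it treats the four one-year regressions as a first-order VAR in $(dp_t, cay_t)$, uses $\rho = 0.96$, and ignores sampling error. The array names are hypothetical.

```python
import numpy as np

rho = 0.96
# First-order VAR for x_t = (dp_t, cay_t), rounded point estimates from Table IV.
A = np.array([[0.94, -0.047],    # dp_{t+1}  on (dp_t, cay_t)
              [0.15,  0.65]])    # cay_{t+1} on (dp_t, cay_t)
b_r = np.array([0.12, 0.071])    # r_{t+1}       on (dp_t, cay_t)
b_d = np.array([0.024, 0.025])   # Delta d_{t+1} on (dp_t, cay_t)

# Long-run coefficients: sum_j rho^(j-1) b' A^(j-1) = b' (I - rho A)^(-1).
M = np.linalg.inv(np.eye(2) - rho * A)
lr_r, lr_d = b_r @ M, b_d @ M

# Expected returns E_t r_{t+j}, j = 1..15, after a one-standard-deviation
# cay shock that leaves the dividend yield unchanged on impact.
shock = np.array([0.0, 1.0])
path = [b_r @ np.linalg.matrix_power(A, j - 1) @ shock for j in range(1, 16)]

print("long-run return coefficients (dp, cay):", np.round(lr_r, 3))
print("long-run dividend growth coefficients (dp, cay):", np.round(lr_d, 3))
print("expected-return path after a cay shock:", np.round(path, 4))
```

In this calculation the near-term expected returns rise and the more distant ones fall, so the discounted sum moves very little. That is the sense in which the shock rearranges the timing of expected returns without much changing their long-run total.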

So, cay strongly forecasts 1-year returns, but has little effect on price-dividend ratio variance attribution. Does this pattern hold for other return forecasters? I don't know. In principle, consistent with the identity (9), other variables can help dividend yields to predict both long-run returns and long-run dividend growth. Consumption and dividends should be cointegrated, and since dividends are so much more volatile, the consumption-dividend ratio should forecast long-run dividend growth. Cyclical variables should work: At the bottom of a recession, both discount rates and expected growth rates are likely to be high, with offsetting effects on dividend yields. Reflecting both ideas, Lettau and Ludvigson (2005) report that “cdy,” a cointegrating vector including dividends, forecasts long-run dividend growth in just this way. However, the lesser persistence of typical forecasters will work against their having much of an effect on price-dividend ratios. Cay's coefficient of only 0.65 on its own lag, and the fact that cay does not forecast dividend yields in my regressions, are much of the story for cay's failure to affect long-run forecasts.

Even so, additional variables can only raise the contribution of long-run expected returns to price-dividend variation. Additional variables do not shift variance attribution from returns to dividends. A higher long-run dividend forecast must be matched by a higher long-run return forecast if it is not to affect the dividend yield.

This is a suggestive first step, not an answer. We have a smorgasbord of return forecasters to investigate, singly and jointly, including information in additional lags of returns and dividend yields (see the Appendix). The point is this: Multivariate long-run forecasts and consequent price implications can be quite different from one-period return forecasts. As we pursue the multivariate forecasting question using the large number of additional forecasting variables, we should look at pricing implications, and not just run short-run $R^2$ contests.

II. The Cross-Section

In the beginning, there was chaos. Practitioners thought that one only needed to be clever to earn high returns. Then came the CAPM. Every clever strategy to deliver high average returns ended up delivering high market betas as well. Then anomalies erupted, and there was chaos again. The “value effect” was the most prominent anomaly.

Figure 6 presents the Fama–French 10 book-to-market sorted portfolios. Average excess returns rise from growth (low book-to-market, “high price”) to value (high book-to-market, “low price”). This fact would not be a puzzle if the betas also rose. But the betas are about the same for all portfolios.

Figure 6.

Average returns and betas. 10 Fama–French book-to-market portfolios. Monthly data, 1963–2010.

The fact that betas do not rise with value is really the heart of the puzzle. It is natural that stocks that have fallen on hard times should have higher subsequent returns. If the market declines, these stocks should be particularly hard hit. They should have higher average returns, and higher betas. All puzzles are joint puzzles of expected returns and betas. Beta without expected return is just as much a puzzle (and as profitable) as expected return without beta.14

Fama and French (1993, 1996) brought order once again with size and value factors. Figure 6 includes the results of multiple regressions on the market excess return and Fama and French's hml factor,

$$R^{ei}_{t} = \alpha_i + b_i \times rmrf_t + h_i \times hml_t + \varepsilon^{i}_{t}.$$

The figure shows the separate contributions of $b_i \times E(rmrf)$ and $h_i \times E(hml)$ in accounting for average returns $E(R^{ei})$. Higher average returns do line up well with larger values of the $h_i$ regression coefficient.
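The decomposition plotted in the figure follows mechanically from the time-series regression. Here is a minimal sketch on synthetic data: the factor means, volatilities, and loadings are made-up illustrative numbers, not the Fama–French series, and the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 576                                      # months; synthetic, not the 1963-2010 sample
rmrf = 0.5 + 4.5 * rng.standard_normal(T)    # hypothetical market excess return, % per month
hml = 0.4 + 3.0 * rng.standard_normal(T)     # hypothetical value factor, % per month
# Hypothetical value portfolio with market beta 1.0 and hml loading 0.7.
re_i = 1.0 * rmrf + 0.7 * hml + 2.0 * rng.standard_normal(T)

# Time-series regression of the portfolio excess return on a constant and the factors.
X = np.column_stack([np.ones(T), rmrf, hml])
alpha, b, h = np.linalg.lstsq(X, re_i, rcond=None)[0]

# Contributions to the average return: E(R^ei) = alpha + b * E(rmrf) + h * E(hml).
market_part = b * rmrf.mean()
value_part = h * hml.mean()

print(f"alpha = {alpha:.3f}, b = {b:.3f}, h = {h:.3f}")
print(f"mean return {re_i.mean():.3f} = {alpha:.3f} + {market_part:.3f} + {value_part:.3f}")
```

Since the regression includes a constant, the three pieces add up to the sample mean return exactly. The economic content is whether the alpha piece is small across portfolios.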

Fama and French's factor model accomplishes a very useful data reduction. Theories now only have to explain the hml portfolio premium, not the expected returns of individual assets.15 This lesson has yet to sink in to a lot of empirical work, which still uses the 25 Fama–French portfolios to test deeper models.

Covariance is in a sense Fama and French's central result: If the value firms decline, they all decline together. This is a sensible result: Where there is mean, there must be comovement, so that Sharpe ratios do not rise without limit in well-diversified value portfolios.16 But theories now must also explain this common movement among value stocks. It is not enough to simply generate temporary price movements in individual securities, “fads” that produce high or low prices, and then fade away, rewarding contrarians. All the securities with low prices today must rise and fall together in the future.

Finally, Fama and French found that other sorting variables, such as firm sales growth, did not each require a new factor. The three-factor model took the place of the CAPM for routine risk adjustment in empirical work.

Order to chaos, yes, but once again, the world changed completely. None of the cross-section of average stock returns corresponds to market betas. All of it corresponds to hml and size betas.

Alas, the world is once again descending into chaos. Expected return strategies have emerged that do not correspond to market, value, and size betas. These include, among many others, momentum,17 accruals, equity issues and other accounting-related sorts,18 beta arbitrage, credit risk, bond and equity market-timing strategies, foreign exchange carry trade, put option writing, and various forms of “liquidity provision.”

A. The Multidimensional Challenge

We are going to have to repeat Fama and French's anomaly digestion, but with many more dimensions. We have a lot of questions to answer:

First, which characteristics really provide independent information about average returns? Which are subsumed by others?

Second, does each new anomaly variable also correspond to a new factor formed on those same anomalies? Momentum returns correspond to regression coefficients on a winner–loser momentum “factor.” Carry-trade profits correspond to a carry-trade factor.19 Do accruals return strategies correspond to an accruals factor? We should routinely look.

Third, how many of these new factors are really important? Can we again account for N independent dimensions of expected returns with K < N factor exposures? Can we account for accruals return strategies by betas on some other factor, as with sales growth?

Now, factor structure is neither necessary nor sufficient for factor pricing. ICAPM and consumption-CAPM models do not predict or require that pricing factors correspond to big common movements in asset returns. And big common movements, such as industry portfolios, need not correspond to any risk premium. There always is an equivalent single-factor pricing representation of any multifactor model: The mean-variance efficient portfolio return is the single factor. Still, the world would be much simpler if betas on only a few factors, important in the covariance matrix of returns, accounted for a larger number of mean characteristics.

Fourth, eventually, we have to connect all this back to the central question of finance: Why do prices move?

B. Asset Pricing as a Function of Characteristics

To address these questions in the zoo of new variables, I suspect we will have to use different methods. Following Fama and French, a standard methodology has developed: Sort assets into portfolios based on a characteristic, look at the portfolio means (especially the 1–10 portfolio alpha, information ratio, and t-statistic), and then see if the spread in means corresponds to a spread of portfolio betas against some factor. But we cannot do this with 27 variables.

Portfolio sorts are really the same thing as nonparametric cross-sectional regressions, using nonoverlapping histogram weights. Figure 7 illustrates the point.

Figure 7.

Portfolio means versus cross-sectional regressions.

For one variable, portfolio sorts and regressions both work. But we cannot chop portfolios 27 ways, so I think we will end up running multivariate regressions.20 The Appendix presents a simple cross-sectional regression to illustrate the idea.
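The equivalence between sorts and regressions is easy to verify numerically. The following sketch uses a synthetic cross-section: the characteristic `log_bm`, the return-generating numbers, and the sample size are assumptions for illustration only, and the cross-sectional regression in the Appendix may differ in its details.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 5000
log_bm = rng.standard_normal(N)                            # hypothetical characteristic
ret = 0.5 + 0.3 * log_bm + 5.0 * rng.standard_normal(N)    # hypothetical returns, %

# Portfolio sort: assign each asset to a decile of the characteristic.
edges = np.quantile(log_bm, np.linspace(0, 1, 11))
decile = np.clip(np.searchsorted(edges, log_bm, side="right") - 1, 0, 9)
port_means = np.array([ret[decile == d].mean() for d in range(10)])

# The same numbers from a regression on nonoverlapping indicator ("histogram") regressors.
D = (decile[:, None] == np.arange(10)[None, :]).astype(float)
dummy_coefs = np.linalg.lstsq(D, ret, rcond=None)[0]

# Parametric alternative: a linear cross-sectional regression on the characteristic.
X = np.column_stack([np.ones(N), log_bm])
a, slope = np.linalg.lstsq(X, ret, rcond=None)[0]

print("decile portfolio means:", np.round(port_means, 2))
print("indicator-regression coefficients:", np.round(dummy_coefs, 2))
print(f"linear regression: a = {a:.2f}, slope = {slope:.2f}")
```

The indicator regression scales poorly: ten bins in each of 27 characteristics would require 10^27 cells, while a linear specification adds only one coefficient per characteristic. The price is that the functional form now has to be chosen and checked.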

More generally, “time-series” forecasting regressions, “cross-sectional” regressions, and portfolio mean returns are really the same thing. All we are ever really doing is understanding a big panel-data forecasting regression,

$$R^{ei}_{t+1} = a + b' C_{it} + \varepsilon^{i}_{t+1}.$$

We end up describing expected returns as a function of characteristics,

$$E(R^{e}_{t+1} \mid C_t),$$

where $C_t$ denotes some big vector of characteristics,

$$C_t = [\text{size, bm, momentum, accruals, dp, credit spread}, \ldots].$$

Is value a “time-series” strategy that moves in and out of a stock as that stock's book-to-market ratio changes, or is it a “cross-sectional” strategy that moves from one stock to another following book-to-market signals? Well, both, obviously. They are the same thing. This is the managed-portfolio theorem:21 An instrument $z_t$ in a time-series test $0 = E[(m_{t+1} R^{e}_{t+1}) z_t]$ is the same thing as a managed-portfolio return $R^{e}_{t+1} z_t$ in an unconditional test $0 = E[m_{t+1} (R^{e}_{t+1} z_t)]$.

Once we understand expected returns, we have to see if expected returns line up with covariances of returns with factors. Sorted-portfolio betas are a nonparametric estimate of this covariance function,

$$\operatorname{cov}_t(R^{ei}_{t+1}, f_{t+1}) = g(C_{it}).$$

Parametric approaches are natural here as well, to address a multidimensional world. For example, we can run regressions such as

$$\left[R^{ei}_{t+1} - E(R^{ei}_{t+1} \mid C_{it})\right] f_{t+1} = c + d' C_{it} + \varepsilon^{i}_{t+1} \;\rightarrow\; g(C) = c + d' C.$$

(The errors may not be normal, but they are mean-zero and uncorrelated with the right-hand variable.)

We want to see if the mean return function lines up with the covariance function: Is it true that

$$E(R^{e} \mid C) = g(C) \times \lambda?$$

An implicit assumption underlies everything we do: Expected returns, variances, and covariances are stable functions of characteristics such as size and book-to-market ratio, and not security names. This assumption is why we use portfolios in the first place. Without this assumption, it is hard to tell if there is any spread in average returns at all. It means that asset pricing really is about the equality of two functions: The function relating means to characteristics should be proportional to the function relating covariance to characteristics.

Looking at portfolio average returns rather than forecasting regressions was really the key to understanding the economic importance of many effects, as was looking at long-horizon returns. For example, serial correlation with an $R^2$ of 0.01 does not seem that impressive. Yet it is enough to account for momentum: The last year's winners went up 100%, so an annual autocorrelation of 0.1, meaning 0.01 $R^2$, generates a 10% annual portfolio mean return. (An even smaller amount of time-series cross correlation works as well.) As another classic example, Lustig, Roussanov, and Verdelhan (2010a) translate carry-trade return-forecasting regressions to means of portfolios formed on the basis of currency interest differentials. This step leads them to look for and find a factor structure of country returns that depends on interest differentials, a “high minus low” factor. This step follows Fama and French (1996) exactly, but no one thought to look for it in 30 years of running country-by-country time-series forecasting regressions.
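The momentum arithmetic can be checked with a small simulation. This is an illustrative sketch only: returns are demeaned, normally distributed, and in arbitrary units, the sample size is hypothetical, and the only number taken from the text is the autocorrelation of 0.1.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 200_000            # hypothetical number of stock-years
rho1 = 0.1             # annual autocorrelation from the text

# Demeaned returns with unit variance in both years and autocorrelation rho1.
r_last = rng.standard_normal(N)
r_next = rho1 * r_last + np.sqrt(1 - rho1**2) * rng.standard_normal(N)

# R^2 of the forecasting regression of next year's return on last year's return.
r_squared = np.corrcoef(r_last, r_next)[0, 1] ** 2

# Momentum portfolio: hold last year's top decile of returns.
winners = r_last >= np.quantile(r_last, 0.9)
carry_over = r_next[winners].mean() / r_last[winners].mean()

print(f"R^2 of the forecasting regression: {r_squared:.3f}")
print(f"fraction of the winners' past return that persists: {carry_over:.3f}")
print(f"implied mean return if winners rose 100%: {100 * rho1:.0f}%")
```

A regression $R^2$ near 0.01 and a portfolio that keeps about one-tenth of an extreme past return are two views of the same autocorrelation. The portfolio view looks large because the sort selects extreme values of the forecasting variable.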

The equivalence of portfolio sorts and regressions goes both ways. We can still calculate these measures of economic significance if we estimate panel-data regressions for means and covariances. From the spread of lagged returns and return autocorrelation, we can calculate the momentum-portfolio implications directly. The 1–10 portfolio information ratio is the same thing as the Sharpe ratio of the underlying factor, or t-statistic of the cross-sectional regression coefficient. (See the Appendix.) We can study the covariance structure of panel-data regression residuals as a function of the same characteristics rather than actually form portfolios,

$$\mathrm{cov}_t(R^i_{t+1}, R^j_{t+1}) = h(C_{it}, C_{jt}).$$

Running multiple panel-data forecasting regressions is full of pitfalls of course. One can end up focusing on tiny firms, or outliers. One can get the functional form wrong.

However, uniting time series and cross-section will yield new insights as well. For example, variation in book-to-market over time for a given portfolio has a larger effect on returns than variation in book-to-market across the Fama–French portfolios, and a recent change in book-to-market also seems to forecast returns. (See the Appendix.)

I did not say it will be easy! But we must address the factor zoo, and I do not see how to do it by a high-dimensional portfolio sort.

C. Prices

Then, we have to answer the central question, what is the source of price variation?

When did our field stop being “asset pricing” and become “asset expected returning?” Why are betas exogenous?22 A lot of price variation comes from discount-factor news. What sense does it make to “explain” expected returns by the covariation of expected return shocks with market expected return shocks? Market-to-book ratios should be our left-hand variable, the thing we are trying to explain, not a sorting characteristic for expected returns.

Focusing on expected returns and betas rather than prices and discounted cashflows makes sense in a two-period or i.i.d. world, since in that case betas are all cashflow betas. It makes much less sense in a world with time-varying discount rates.

A long-run, price-and-payoff perspective may also end up being simpler. As a hint of the possibility, solve the Campbell–Shiller identity for long-run returns,

$$\sum_{j=1}^{\infty}\rho^{j-1} r_{t+j} = \sum_{j=1}^{\infty}\rho^{j-1}\Delta d_{t+j} + dp_t.$$

Long-run return uncertainty all comes from cashflow uncertainty. Long-run betas are all cashflow betas. The long run looks just like a simple one-period model with a liquidating dividend:

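$$R_{t+1} = \frac{D_{t+1}}{P_t} = \left(\frac{D_{t+1}}{D_t}\right)\Big/\left(\frac{P_t}{D_t}\right), \qquad r_{t+1} = \Delta d_{t+1} + dp_t.$$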

A natural start is to forecast long-run returns and to form price decompositions in the cross-section, just as in the time series: to estimate forecasts such as

$$\sum_{j=1}^{\infty}\rho^{j-1} r^i_{t+j} = a + b\,C_{it} + \varepsilon^i,$$

and then understand valuations with present value models as before.23 The Appendix includes two simple examples.

In a formal sense, it does not matter whether you look at returns or prices. The expressions $1 = E_t(m_{t+1}R_{t+1})$ and $P_t = E_t\sum_{j=1}^{\infty} m_{t,t+j}D_{t+j}$ each imply the other. But, as I found with return forecasts, our economic understanding may be a lot different in a price, long-run view than if we focus on short-run returns. What constitutes a “big” or “small” error is also different if we look at prices rather than returns. At a 2% dividend yield, D/P = (r − g) implies that an “insignificant” 10 bp/month expected return error is a “large” 12% price error, if it is permanent. For example, since momentum amounts to a very small time-series correlation and lasts less than a year, I suspect it has little effect on long-run expected returns and hence the level of stock prices. Long-lasting characteristics are likely to be more important. Conversely, small transient price errors can have a large impact on return measures. A tiny i.i.d. price error induces the appearance of mean reversion where there is none. Common procedures amount to taking many differences of prices, which amplify the error-to-signal ratio. For example, the forward spread $f_t^{(n)} - y_t^{(1)} = p_t^{(n-1)} - p_t^{(n)} + p_t^{(1)}$ is already a triple difference of price data.
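The claim that an i.i.d. price error creates apparent mean reversion can be illustrated with a small simulation. This is a sketch under stated assumptions, not an empirical procedure: the true log price is a pure random walk with shock volatility `sigma_u`, the measured price adds an independent error with volatility `sigma_e`, and all parameter values and function names are hypothetical. In that setup, measured returns have first-order autocorrelation $-\sigma_e^2/(\sigma_u^2 + 2\sigma_e^2)$ even though true returns are unforecastable.

```python
import random

def lag1_autocorr(xs):
    """Sample first-order autocorrelation."""
    n = len(xs)
    mean = sum(xs) / n
    num = sum((xs[i] - mean) * (xs[i - 1] - mean) for i in range(1, n))
    den = sum((x - mean) ** 2 for x in xs)
    return num / den

def measured_returns(n, sigma_u, sigma_e, seed=0):
    """Returns computed from a random-walk price observed with i.i.d. error."""
    rng = random.Random(seed)
    true_price = 0.0
    prev_measured = true_price + rng.gauss(0.0, sigma_e)
    returns = []
    for _ in range(n):
        true_price += rng.gauss(0.0, sigma_u)          # unforecastable true return
        measured = true_price + rng.gauss(0.0, sigma_e)  # transient price error
        returns.append(measured - prev_measured)
        prev_measured = measured
    return returns

sigma_u, sigma_e = 1.0, 0.5    # hypothetical volatilities
theory = -sigma_e**2 / (sigma_u**2 + 2 * sigma_e**2)
sample = lag1_autocorr(measured_returns(50_000, sigma_u, sigma_e))
print(f"theoretical autocorrelation {theory:.3f}, simulated {sample:.3f}")
```

Each additional difference of measured prices adds another copy of the error variance while leaving the signal unchanged, which is the sense in which differencing amplifies the error-to-signal ratio.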

III. Theories

Having reviewed a bit of how discount rates vary, let us think now about why discount rates vary so much.

It is useful to classify theories by their main ingredient, and by which data they use to measure discount rates. My goal is to suggest for discount rates something like Fama's (1970) classification of informational possibilities. Here is an outline of the classification:

  1. Theories based on fundamental investors, with few frictions.
    (a) Macroeconomic theories. Ties to macro or microeconomic quantity data.
      i. Consumption, aggregate risks.
      ii. Risk sharing and background risks; hedging outside income.
      iii. Investment and production.
      iv. General equilibrium, including macroeconomics.
    (b) Behavioral theories, focusing on irrational expectations. Ties to price data. Other data?
    (c) Finance theories. Expected return-beta models, return-based factors, affine term structure models. Ties to price data, returns explained by covariances.
  2. Theories based on frictions.
    (a) Segmented markets. Different investors are active in different markets; limited risk bearing of active traders.
    (b) Intermediated markets. Prices are set by leveraged intermediaries; funding difficulties.
    (c) Liquidity.
      i. Idiosyncratic liquidity: Is it easy to sell the asset?
      ii. Systemic liquidity: How does an asset perform in times of market illiquidity?
      iii. Trading liquidity: Is a security useful to facilitate trading?

A. Macroeconomic Theories

“Macro” theories tie discount rates to macroeconomic quantity data, such as consumption or investment, based on first-order conditions for the ultimate investors or producers.

For example, the canonical consumption-based model with power utility $E\sum_t \beta^t u(C_t)$, $u(C) = C^{1-\gamma}/(1-\gamma)$ relates discount rates to consumption growth,

$$m_{t+1} = \beta\frac{u_c(t+1)}{u_c(t)} = \beta\left(\frac{C_{t+1}}{C_t}\right)^{-\gamma}, \qquad E_t(R^{ei}_{t+1}) = -R^f\,\mathrm{cov}(R^{ei}_{t+1}, m_{t+1}) \approx \gamma\,\mathrm{cov}(R^{ei}_{t+1}, \Delta c_{t+1}),$$

where $R^f$ is the risk-free rate, $R^{ei}$ is an excess return, and $c = \log(C)$. High expected returns (low prices) correspond to securities that pay off poorly when consumption is low. This model combines frictionless markets, rational expectations and utility maximization, and risk sharing so that only aggregate risks matter for pricing. It evidently ties discount-rate variation to macroeconomic data.
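A two-state numerical check may help fix ideas. The sketch below prices a single payoff with the power-utility discount factor and compares the expected excess return with the exact covariance expression and with the consumption-growth approximation. Every number (probabilities, consumption growth, payoffs, `beta`, `gamma`) is a hypothetical assumption chosen only for illustration.

```python
import math

beta, gamma = 0.99, 5.0        # hypothetical preference parameters
probs = [0.5, 0.5]             # hypothetical state probabilities
cons_growth = [1.04, 0.98]     # hypothetical gross consumption growth C_{t+1}/C_t
payoff = [1.20, 0.90]          # hypothetical payoff: poor when consumption is low

def mean(xs):
    return sum(p * x for p, x in zip(probs, xs))

def cov(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum(p * (x - mx) * (y - my) for p, x, y in zip(probs, xs, ys))

m = [beta * g ** (-gamma) for g in cons_growth]      # discount factor by state
rf = 1.0 / mean(m)                                   # gross risk-free rate
price = mean([mi * xi for mi, xi in zip(m, payoff)]) # p = E(m x)
excess = [x / price - rf for x in payoff]            # excess return by state
dc = [math.log(g) for g in cons_growth]              # log consumption growth

premium = mean(excess)                 # E(R^e)
exact = -rf * cov(excess, m)           # -R^f cov(R^e, m), holds exactly
approx = gamma * cov(excess, dc)       # gamma cov(R^e, dc), an approximation

print(f"premium {premium:.4f}, exact {exact:.4f}, approximation {approx:.4f}")
```

The first two numbers agree to machine precision because the excess return is constructed to satisfy E(m R^e) = 0; the third differs slightly because it linearizes the discount factor in consumption growth.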

A vast literature has generalized this framework, including (among others)24 (1) nonseparability across goods, such as durable and nondurable,25 or traded and nontraded goods; (2) nonseparability over time, such as habit persistence;26 (3) recursive utility and long-run risks;27 and (4) rare disasters, which alter measurements of means and covariances in “short” samples.28

A related category of theories adds incomplete markets or frictions preventing some consumers from participating. Though they include “frictions,” I categorize such models here because asset prices are still tied to some fundamental consumer or investor's economic outcomes. For example, if nonstockholders do not participate, we still tie asset prices to the consumption decisions of stockholders who do participate.29

With incomplete markets, consumers still share risks as much as possible. The complete-market theorem that “all risks are shared,” marginal utility is equated across people i and j, $m^i_{t+1} = m^j_{t+1}$, becomes “all risks are shared as much as possible.” The projection of marginal utility on asset payoffs X is the same across people, $\mathrm{proj}(m^i_{t+1}|X) = \mathrm{proj}(m^j_{t+1}|X) = x^*$. We can still aggregate, but we aggregate marginal utility rather than aggregate consumption before constructing marginal utility. A discount factor $m_{t+1} = E_{t+1}(m^i_{t+1}) = \int f(i)\, m^i_{t+1}\, di$ prices assets, where $E_{t+1}$ takes averages across people conditional on aggregates. For example, with power utility we have

$$m_{t+1} = \beta E_{t+1}\left[\left(\frac{C^i_{t+1}}{C^i_t}\right)^{-\gamma}\right].$$

The fact that we aggregate nonlinearly across people means that variation in the distribution of consumption matters to asset prices. Times in which there is more cross-sectional risk will be high discount-factor events.30
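The effect of cross-sectional dispersion follows from the convexity of marginal utility and can be seen in a toy calculation. The sketch below assumes four equally weighted consumers with power utility; the growth rates, `beta`, and `gamma` are hypothetical, chosen so that average consumption growth is the same in both scenarios.

```python
beta, gamma = 0.99, 5.0   # hypothetical preference parameters

def discount_factor(individual_growth):
    """m = beta * cross-sectional average of (C^i_{t+1}/C^i_t)^(-gamma)."""
    n = len(individual_growth)
    return beta * sum(g ** (-gamma) for g in individual_growth) / n

calm = [1.02, 1.02, 1.02, 1.02]        # everyone grows at the aggregate rate
dispersed = [0.92, 0.97, 1.07, 1.12]   # same average growth, more cross-sectional risk

m_calm = discount_factor(calm)
m_dispersed = discount_factor(dispersed)
print(f"m with no dispersion {m_calm:.4f}, with dispersion {m_dispersed:.4f}")
```

Aggregate consumption growth is identical in the two cases, so a representative-agent calculation based on aggregate consumption alone would miss the difference entirely.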

Outside or nontradeable risks are a related idea. If a mass of investors has jobs or businesses that will be hurt especially hard by a recession, they avoid stocks that fall more than average in a recession.31 Average stock returns then reflect the tendency to fall more in a recession, in addition to market risk exposure. Though in principle, given a utility function, one could see such risks in consumption data, individual consumption data will always be so poorly measured that tying asset prices to more fundamental sources of risk may be more productive.

If we ask the “representative investor” in December 2008 why he or she is ignoring the high premiums offered by stocks and especially fixed income, the answer might well be “that's nice, but I'm about to lose my job, and my business might go under. I can't take any more risks right now, especially in securities that will lose value or become hard to sell if the recession gets worse.” These extensions of the consumption-based model all formalize this sensible intuition—as opposed to the idea that these consumers have wrong expectations, or that they would have been happy to take risks but intermediaries were making asset-pricing decisions for them.

Investment-based models link asset prices to firms' investment decisions, and general equilibrium models include production technologies and a specification of the fundamental shocks. This is clearly the ambitious goal toward which we are all aiming. The latter tries to answer the vexing questions, where do betas come from, and what makes a company a “growth” or “value” company in the first place?32

B. Behavioral Theories

I think “behavioral” asset pricing's central idea is that people's expectations are wrong.33 It takes lessons from psychology to find systematic patterns to the “wrong” expectations. There are some frictions in many behavioral models, but these are largely secondary and defensive, to keep risk-neutral “rational arbitrageurs” from coming in and undoing the behavioral biases. Often, simple risk aversion by the rational arbitrageurs would serve as well. Behavioral models, like “rational” models, tie asset prices to the fundamental investor's willingness, ability, or (in this case) perception of risk.

Behavioral theories are also discount-rate theories. A distorted probability with risk-free discounting is mathematically equivalent to a different discount rate:

$$p = \sum_s \pi_s m_s x_s = \frac{1}{R^f}\sum_s \pi^*_s x_s,$$

where

$$\pi^*_s \equiv \pi_s m_s R^f = \pi_s m_s \Big/ \sum_s \pi_s m_s,$$

$s$ denotes states of nature, $\pi_s$ are true probabilities, $m_s$ is a stochastic discount factor or marginal utility growth, $x_s$ is an asset payoff in state $s$, and $\pi^*_s$ are distorted probabilities.
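The equivalence is easy to verify numerically. In this minimal sketch, the three states, their probabilities, the discount factor, and the payoff are all hypothetical values; the code prices the payoff once with true probabilities and a stochastic discount factor, and once with distorted probabilities and risk-free discounting.

```python
probs = [0.3, 0.4, 0.3]     # hypothetical true probabilities pi_s
sdf = [1.20, 0.95, 0.70]    # hypothetical discount factor m_s (high in the bad state)
payoff = [0.8, 1.0, 1.3]    # hypothetical payoff x_s

# Price with true probabilities and the stochastic discount factor.
price_sdf = sum(p * m * x for p, m, x in zip(probs, sdf, payoff))

# Risk-free rate and distorted probabilities implied by the same discount factor.
rf = 1.0 / sum(p * m for p, m in zip(probs, sdf))
distorted = [p * m * rf for p, m in zip(probs, sdf)]

# Price with distorted probabilities and risk-free discounting.
price_distorted = sum(q * x for q, x in zip(distorted, payoff)) / rf

print(f"price via discount factor {price_sdf:.4f}, via distorted probabilities {price_distorted:.4f}")
```

The distorted probabilities sum to one and put more weight on the high-marginal-utility state than the true probabilities do, so the same price can be read as risk aversion or as pessimism.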

It is therefore pointless to argue “rational” versus “behavioral” in the abstract. There is a discount rate and equivalent distorted probability that can rationalize any (arbitrage-free) data. “The market went up, risk aversion must have declined” is as vacuous as “the market went up, sentiment must have increased.” Any model only gets its bite by restricting discount rates or distorted expectations, ideally tying them to other data. The only thing worth arguing about is how persuasive those ties are in a given model and data set, and whether it would have been easy for the theory to “predict” the opposite sign if the facts had come out that way.34 And the line between recent “exotic preferences” and “behavioral finance” is so blurred35 that it describes academic politics better than anything substantive.

A good question for any theory is what data it uses to tie down discount rates. By and large, behavioral research so far ties prices to other prices; it looks for price patterns that are hard to understand with other models, such as “overreaction” or “underreaction” to news. Some behavioral research uses survey evidence, and survey reports of people's expectations are certainly unsettling. However, surveys are sensitive to language and interpretation. People report astounding discount rates in surveys and experiments, yet still own long-lived assets, houses, and durable goods. It does not take long in teaching MBAs to realize that the colloquial meanings of “expect” and “risk” are entirely different from conditional mean and variance. If people report the risk-neutral expectation, then many surveys make sense. An “optimistic” cashflow growth forecast is the same as a “rational” forecast, discounted at a lower rate, and leads to the correct decision, to invest more. And the risk-neutral expectation, the expectation weighted by marginal utility, is the right sufficient statistic for many decisions. It treats painful outcomes as if they were more probable than they are in fact.

Of course, “rational” theories beyond the simple consumption-based model struggle as well. Changing expectations of consumption 10 years from now (long-run risks) or changing probabilities of a big crash are hard to distinguish from changing “sentiment.” At least one can aim for more predictions than assumptions, tying together several phenomena with a parsimonious specification.

C. Finance Theories

“Finance” theories tie discount rates to broad return-based factors. That's great for data reduction and practical applications. The more practical and “relative-pricing” the application, the more “factors” we accept on the right-hand side. For example, in evaluating a portfolio manager, hedging a portfolio, or finding the cost of capital for a given investment, we routinely include momentum as a “factor” even though we do not have a deep theory of why the momentum factor is priced.

However, we still need the deeper theories for deeper “explanation.” Even if the CAPM explained individual mean returns from their betas and the market premium, we would still have the equity premium puzzle—why is the market premium so large? (And why are betas what they are?) Conversely, even if we had the perfect utility function and a perfect consumption-based model, the fact that consumption data are poorly measured means we would still use finance models for most practical applications.36

A nice division of labor results. Empirical asset pricing in the Fama and French (1996) tradition boils down the alarming set of anomalies to a small set of large-scale systematic risks that generate rewards. “Macro,”“behavioral,” or other “deep” theories can then focus on why the factors are priced.

D. Theories with Frictions

Models that emphasize frictions are becoming more and more popular, especially since the financial crisis. At heart, these models basically maintain the “rational” assumption. Admittedly, there are often “irrational” agents in such models. However, these agents are usually just convenient shortcuts rather than central to the vision. A model may want some large volume of trade,37 or to include some “noise traders,” while focusing clearly on the delegated management problem or the problem of leveraged intermediaries. For such a purpose, it is easy simply to assume a slightly irrational class of trader rather than spell out those trader's motives from first principles. However, such assumptions are not motivated by deep reading of psychology or lab experiments. The focus is on the frictions and behavior of intermediaries rather than the risk-bearing ability of ultimate investors or their psychological misperceptions.

I think it is useful to distinguish three categories of frictions: (1) segmented markets, (2) intermediated markets or “institutional finance,”38 and (3) liquidity. Surely, this is a broad brush categorization and more detailed divisions can usefully be made.

E. Segmented and Intermediated Markets

I distinguish “segmented markets” from “intermediated markets,” as illustrated in Figure 8. Segmentation is really about limited risk sharing among the pool of investors who are active in a particular market.39 Their limited risk bearing can generate “downward-sloping demands” (in quotes, because maybe it is “supply”), and average returns that depend on a “local” factor, little and poorly linked CAPMs.40 Given the factor zoo, that is an attractive idea.

Figure 8.

Segmented markets versus intermediated markets.

“Intermediated markets” or “institutional finance” refers to a different, vertical rather than horizontal, separation of investors from payoffs. Investors use delegated managers. Agency problems in delegated management then spill over into asset prices. For example, suppose investors have “equity” and “debt” claims on the managers. When losses appear, the managers stave off bankruptcy by trying to sell risky assets. But since all the managers are doing the same thing, prices fall and discount rates rise. Colorful terms such as “fire sales” and “liquidity spirals” describe this process.41

Of course, one must document and explain segmentation and intermediation. As suggested by the dashed arrows in Figure 8, there are strong incentives to undo any price anomaly induced by segmentation or intermediation. Models with these frictions often abstract from deep-pockets unintermediated investors (sovereign wealth funds, pension funds, endowments, family offices, and Warren Buffetts) or institutional innovation to bridge the friction. Your “fire sale” is their “buying opportunity” and business opportunity. A little more attention to the reasons for segmentation and intermediation may help us to understand when and for how long these models apply. For example, transactions costs, attention costs, or limited expertise suggest that markets can be segmented until the “deep pockets” arrive, but that they do arrive eventually. So if this is why markets are segmented, segmentation will be more important in the short run, after unusual events, or in more obscure markets. If I try to sell a truckload of tomatoes at 2 am in front of the Booth school, I am not likely to get full price. But if I do it every night, tomato buyers will start to show up. In the flash crash, it took about 10 minutes for buyers to show up, which is either remarkably long or remarkably short, depending on your point of view.

A crucial question is, as always, what data will this class of theories use to measure discount rates? Arguing over puzzling patterns of prices is weak. The rational-behavioral debate has been doing that for 40 years, rather unproductively. Ideally, one should tie price or discount-rate variation to central items in the models, such as the balance sheets of leveraged intermediaries, data on who is actually active in segmented markets, and so forth. I grant that such data are hard to find.42

F. Liquidity

We have long recognized that some assets have higher or lower discount rates in compensation for greater or lesser liquidity.43 We have also long struggled to define and measure liquidity. There are (at least) three kinds of stories for liquidity that are worth distinguishing. Liquidity can refer to the ease of buying and selling an individual security. Illiquidity can also be systemic: Assets will face a higher discount rate if their prices fall when the market as a whole is illiquid, whether or not the asset itself becomes more or less illiquid. Finally, assets can have lower discount rates if they facilitate information trading for assets, as money facilitates physical trading of goods.

I think of “liquidity” as different from “segmentation” in that segmentation is about limited risk-bearing ability, while liquidity is about trading. Liquidity is a feature of assets, not the risks to which they are claims. Many theories of liquidity emphasize asymmetric information, not limited risk-bearing ability—assets become illiquid when traders suspect that anyone buying or selling knows something, not because traders are holding too much of a well-understood risk. Some kinds of liquidity, such as the off-the-run Treasury spread, refer to different prices of economically identical claims. Understanding liquidity requires us to unravel the puzzle of why real people and institutions trade so much more than they do in our models.

G. Efficiency and Discount Rates

All of these theories and related facts are really about discount rates, expected returns, risk bearing, risk sharing, and risk premiums. None are fundamentally about slow or imperfect diffusion of cashflow information, informational “inefficiency” as Fama (1970) defined it. Informational efficiency is not wrong or disproved. Efficiency basically won, and we moved on. When we see information, it is quickly incorporated into asset prices. There is a lot of asset-price movement not related to visible information, but Hayek (1945) told us that would happen, and we learned that a lot of such price variation corresponds to expected returns. Little of the (large) gulf between the above models is really about information. Seeing the facts and the models as categories of discount-rate variation seems much more descriptive of most (not all) theory and empirical work.

Informational efficiency is much easier for markets and models to obtain than wide risk sharing or desegmentation, which is perhaps why it was easier to verify. A market can become informationally efficient with only one informed trader, who does not need to actually buy anything or take any risk. He should run into a wall of indexers, and just bid up the asset he knows is underpriced.44 Though in reality price discovery seems to come with a lot of trading, it does not have to do so. Risk sharing needs everyone to change their portfolios and bear a risk in order to eliminate segmentation. For example, if the small-firm effect came from segmentation, the passively managed small stock fund should have ended it—but it also took the invention and marketing of such funds to end it. The actions of small numbers of arbitrageurs could not do so.

IV. Recent Performance

This is not the place for a deep review of theory and empirical work supporting or confronting theories. Instead, I think it will be more productive to think informally about how these classes of models might be able to handle big recent events.

A. Consumption

I still think the macro-finance approach is promising. Figure 9 presents the market price-dividend ratio, and aggregate consumption relative to a slow-moving “habit.” The habit is basically just a long moving average of lagged consumption, so the surplus-consumption ratio line is basically detrended consumption.45

Figure 9.

Surplus-consumption ratio and price-dividend ratio. The price-dividend ratio is that of the CRSP NYSE value-weighted portfolio. The surplus consumption is formed from monthly real nondurable consumption using the Campbell and Cochrane (1999) specification and parameters, multiplied by three to fit on the same scale.

Consumption and stock market prices did collapse together in 2008. Many high average-return securities and strategies (stocks, mortgage-backed securities, low-grade bonds, momentum, currency carry) collapsed more than low average-return counterparts. The basic consumption-model logic—securities must pay higher returns, or fetch lower prices, if their values fall more when consumption falls—is not drastically wrong.

Furthermore, the habit model captures the idea that people become more risk averse as consumption falls in recessions. As consumption nears habit, people are less willing to take risks that involve the same proportionate risk to consumption. Discount rates rise, and prices fall. Lots of models have similar mechanisms, especially models with leverage.46 In the habit model, the price-dividend ratio is a nearly log-linear function of the surplus-consumption ratio. The fit is not perfect, but the general pattern is remarkably good, given the hue and cry about how the crisis invalidates all traditional finance.
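The mechanism can be made concrete with the local curvature of a habit utility function. The sketch below assumes utility of the form $(C - X)^{1-\gamma}/(1-\gamma)$ with habit level $X$, for which relative risk aversion with respect to consumption equals $\gamma/S$, where $S = (C - X)/C$ is the surplus-consumption ratio. The parameter values and function names are hypothetical, and the habit is held fixed rather than evolving as a moving average.

```python
def surplus_ratio(consumption: float, habit: float) -> float:
    """S = (C - X) / C."""
    return (consumption - habit) / consumption

def local_risk_aversion(gamma: float, consumption: float, habit: float) -> float:
    """Curvature -C u''(C)/u'(C) for u = (C - X)^(1-gamma)/(1-gamma), equal to gamma / S."""
    return gamma / surplus_ratio(consumption, habit)

gamma, habit = 2.0, 0.90   # hypothetical curvature parameter and habit level
boom = local_risk_aversion(gamma, 1.00, habit)        # S = 0.10
recession = local_risk_aversion(gamma, 0.95, habit)   # consumption falls 5%

print(f"risk aversion in boom {boom:.1f}, in recession {recession:.1f}")
```

With these numbers a 5% fall in consumption nearly doubles effective risk aversion, which is the sense in which modest consumption declines can coincide with large rises in discount rates.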

B. Investment

The Q theory of investment is the off-the-shelf analogue to the simple power-utility model from the producer point of view. It predicts that investment should be low when valuations (market to book) are low, and vice versa,

$$1 + \alpha\frac{i_t}{k_t} = \frac{\text{market value}_t}{\text{book value}_t} = Q_t, \tag{10}$$

where $i_t$ = investment and $k_t$ = capital.

Figure 10 contrasts the investment-capital ratio, market-to-book ratio, and price-dividend ratio. The simple Q theory also links asset prices and investment better than you probably supposed, both in the tech boom and in the financial crisis.

Figure 10.

Investment-capital ratio, price-dividend ratio, and market-to-book ratio. Investment is real private nonresidential fixed investment. Capital is cumulated from investment with an assumed 10% annual depreciation rate. The price-dividend ratio is that of the CRSP S&P500 portfolio. The market-to-book ratio comes from Ken French's website.
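The capital series described in the caption can be built by a perpetual-inventory recursion, and the investment-capital ratio then maps into Q through equation (10). The following is a rough sketch only: the investment numbers, the initial capital stock, the timing convention (capital measured at the start of the period), and the adjustment-cost parameter `alpha` are all assumptions, not the values behind Figure 10.

```python
def cumulate_capital(investment, initial_capital, depreciation=0.10):
    """Perpetual inventory: k_{t+1} = (1 - depreciation) * k_t + i_t.
    Returns start-of-period capital for each period of investment."""
    capital = [initial_capital]
    for i in investment[:-1]:
        capital.append((1.0 - depreciation) * capital[-1] + i)
    return capital

def implied_q(investment, capital, alpha):
    """Q_t = 1 + alpha * i_t / k_t, as in equation (10)."""
    return [1.0 + alpha * i / k for i, k in zip(investment, capital)]

investment = [10.0, 12.0, 15.0, 9.0]   # hypothetical annual real investment
capital = cumulate_capital(investment, initial_capital=100.0)
q = implied_q(investment, capital, alpha=5.0)   # alpha = 5 is an assumed value

for year, (i, k, qt) in enumerate(zip(investment, capital, q)):
    print(f"year {year}: i/k = {i / k:.3f}, implied Q = {qt:.3f}")
```

In practice the comparison runs the other way as well: one observes market-to-book ratios and asks whether investment moves with them as the first-order condition predicts.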

Many finance puzzles are stated in terms of returns. To make that connection, one can transform (10) into a relation linking asset returns to investment growth. Many return puzzles are mirrored in investment growth as the Q theory suggests.47

Q theory also reminds us that supply as well as demand matters in setting asset prices. If capital could adjust freely, stock values would never change, no matter how irrational investors are. Quantities would change instead.

I do not argue that consumption or investment caused the boom or the crash. Endowment-economy causal intuition does not hold in a production economy. These first-order conditions are happily consistent with a view, for example, that losses on subprime mortgages were greatly amplified by a run on the shadow banking system and flight to quality,48 which certainly qualifies as a “friction.” The first-order conditions are consistent with many other views of the fundamental determinants of both prices and quantities. But the graphs do argue that asset prices and discount rates are much better linked to big macroeconomic events than most people think (and vice versa). They suggest an important amplification mechanism: If people did not become more risk averse in recessions, and if firms could quickly transform empty houses into hamburgers, asset prices would not have declined as much.

I do not pretend to have perfect versions of either of these first-order conditions, let alone a full macro model that captures value or the rest of the factor zoo. These are very simple and rejectable models. Each makes a 100% $R^2$ prediction that is easy, though a bit silly, to formally reject: The habit model predicts that the price-dividend ratio is a function of the surplus-consumption ratio, with no error. The Q theory predicts that investment is a function of Q, with no error, as in (10). The point is only that research and further elaboration of these kinds of models, as well as using their basic intuition as an important guide to events, is not a hopeless endeavor.

C. Comparisons

Conversely, I think the other kinds of models, though good for describing particular anomalies, will have greater difficulty accounting for big-picture asset-pricing events, even the huge movements of the financial crisis.

We see a pervasive, coordinated rise in the premium for systematic risk,49 common across all asset classes, and present in completely unintermediated and unsegmented assets. For example, Figure 11 plots government and corporate rates, and Figure 12 plots the BAA-AAA spread with stock prices. You can see a huge credit spread open up and fade away along with the dip in stock prices.

Figure 11.

BAA, AAA, and Treasury yields. Source: Board of Governors of the Federal Reserve via Fred website.

Figure 12.

Common Risk Premiums. P/D is the S&P500 price-dividend ratio from CRSP. S&P500 is the level of the S&P500 index from CRSP. BAA-AAA is that bond spread, from the Board of Governors. P/D is divided by 15 and the S&P500 is divided by 500 to fit on the same scale.

Behavioral ideas—narrow framing, salience of recent experience, and so forth—are good at generating anomalous prices and mean returns in individual assets or small groups. They do not easily generate this kind of coordinated movement across all assets that looks just like a rise in risk premium. Nor do they naturally generate covariance. For example, “extrapolation” generates the slight autocorrelation in returns that lies behind momentum. But why should all the momentum stocks then rise and fall together the next month, just as if they are exposed to a pervasive, systematic risk?

Finance models do not help, of course, because we are looking at variation of the factors, which those models take as given.

Segmented or institutional models are not obvious candidates to understand broad market movements. Each of us can easily access stocks and bonds through low-cost indices.

And none of these models naturally describe the strong correlation of discount rates with macroeconomic events. Is it a coincidence that people become irrationally pessimistic when the economy is in a tailspin, and they could lose their jobs, houses, or businesses if systematic events get worse?

Again, macro is not everything; understanding the smaller puzzles is important. The point is that looking for macro underpinnings of discount-rate variation, even through fairly simple models, is not as hopelessly anachronistic as many seem to think.

D. Arbitrages?

One of the nicest pieces of evidence for segmented or institutional views is that arbitrage relationships were violated in the financial crisis.50 Unwinding the arbitrage opportunities required one to borrow dollars, which intermediary arbitrageurs could not easily do.

Figure 13 gives one example. CDS plus Treasury should equal a corporate bond, and usually does. Not in the crisis.

Figure 13.

Citigroup CDS and Bond Spreads. Source: Fontana (2010).

Figure 14 gives another example: covered interest parity. Investing in the United States versus investing in Europe and returning the money with forward rates should give the same return. Not in the crisis.

Figure 14.

Three-month Libor and FX Swap Rates. Source: Baba and Packer (2009).

Similar patterns happened in many other markets, including even U.S. Treasuries.51 Now, any arbitrage opportunity is a dramatic event. But in each case here the difference between the two ways of getting the same cashflow is dwarfed by the overall change in prices. And, though an “arbitrage,” the price differences are not large enough to attract long-only deep-pocket money. If your precious cash is in a U.S. money market fund, 20 basis points in the depth of a financial crisis is not enough to get you to listen to the salesman offering offshore investing with an exchange rate hedging program.

So maybe it is possible that the “macro” view still builds the benchmark story of overall price change, with very interesting spreads opening up due to frictions. At least we have a theory for that. Constructing a theory in which the arbitrage spreads drive the coordinated rise in risk premium seems much harder.

The price of coffee displays arbitrage opportunities across locations at the ASSA meetings. (The AFA gave it away for free downstairs while Starbucks was selling it upstairs.) The arbitrage reflects an interesting combination of transactions costs, short-sale constraints, consumer biases, funding limits, and other frictions. Yet we do not dream that this fact matters for big-picture variation in worldwide commodity prices.

E. Liquidity Premia and Trading Value

Trading-related liquidity does strike me as potentially important for the big picture, and a potentially important source of the low discount rates in “bubble” events.52

I am inspired by one of the most obvious “liquidity” premiums: Money is overpriced—it has a lower discount rate—relative to government debt, though they are claims to the same payoff in a frictionless market. And this liquidity spread can be huge, hundreds of percent in hyperinflations.

Now, money is “special” for its use in transactions. But many securities are “special” in trading. Trading needs a certain supply of their physical shares. We cannot support large trading volumes by recycling one outstanding share at arbitrarily high speed. Even short sellers must hold a share for a short period of time.

When share supply is small, and trading demand is large, markets can drive down the discount rate or drive up the price of highly traded securities, as they do for money. These effects have long been seen in government bonds, for example, in the Japanese “benchmark” effect, the spreads between on-the-run and off-the-run Treasuries, or the spreads between Treasury and agency bonds.53 Could these effects extend to other assets?

Figures 15 and 16 are suggestive. The stock price rise and fall of the late 1990s was concentrated in NASDAQ and NASDAQ Tech. The stock volume rise and fall was concentrated in the same place. Every asset price “bubble” (defined here by people's use of the label) has coincided with a similar trading frenzy, from Dutch tulips in the 1630s to Miami condos in 2006.

Figure 15.

NASDAQ Tech, NASDAQ, and NYSE indices. Source: Cochrane (2003).

Figure 16.

Dollar volume in NASDAQ Tech, NASDAQ, and NYSE. Source: Cochrane (2003).

Is this a coincidence? Do prices rise and fall for other reasons, and large trading volume follows, with no effect on price? Or is the high price—equivalently a low discount rate—explained at least in part by the huge volume, that is, by the value of shares in facilitating a frenzy of information trading?

To make this a deep theory, we must answer why people trade so much. At a superficial level, we know the answer: The markets we study exist to support information-based trading. Yet, we really do not have good models of information-based trading.54 Perhaps the question of how information is incorporated in asset markets will come back to the center of inquiry.

V. Applications

Finance is about practical application, not just deep explanation. Discount-rate variation will change applications a lot.

A. Portfolio Theory

A huge literature explores how investors should exploit the market-timing and intertemporal-hedging opportunities implicit in time-varying expected returns.55

But the average investor must hold the market portfolio. We cannot all time the market, we cannot all buy value, and we cannot all be smarter than average. We cannot even all rebalance. No portfolio advice other than “hold the market” can apply to everyone. A useful and durable portfolio theory must be consistent with this theorem. Our discount-rate facts and theories suggest one, built on differences between people.

Consider Fama and French's (1996) story for value. The average investor is worried that value stocks will fall at the same time his or her human capital falls. But then some investors (“steelworkers”) will be more worried than average, and should short value despite the premium; others (“tech nerds”) will have human capital correlated with growth stocks and buy lots of value, effectively selling insurance. A two-factor model implies a three-fund theorem, and a three-dimensional multifactor efficient frontier as shown in Figure 17.56 It is not easy for an investor to figure out how much of three funds to hold.

Figure 17.

Multifactor efficient frontiers. Investors minimize variance given mean and covariance with the extra factor. A three-fund theorem emerges (left). The market portfolio is multifactor efficient, but not mean-variance efficient (right).

And now we have dozens of such systematic risks for each investor to consider. Time-varying opportunities create more factors, as habits or leverage shift some investors' risk aversion over time more or less than others. Unpriced factors are even more important. Our steelworker should start by shorting a steel industry portfolio, even if it has zero alpha. Zero-alpha portfolios are attractive, as they provide actuarially fair insurance. We academics should understand the variation across people in risks that are hedgeable by systematic factors, and find low-cost portfolios that span that variation.57 Yet we have spent all our time looking for priced factors that are only interesting for the measure-zero mean-variance investor!

All of this sounds hard. That's good! We finally have a reason for a fee-based “tailored portfolio” industry to exist, rather than just to deplore it as folly. We finally have a reason for us to charge large tuitions to our MBA students. We finally have an interesting portfolio theory that is not based on chasing zero-sum alpha.

A.1. State Variables

Discount-rate variation means that state-variable hedging should matter. It is almost completely ignored in practice. Almost all hedge funds, active managers, and institutions still use mean-variance optimizers. This is particularly striking given that they follow active strategies, predicated on the idea that expected returns and variances vary a lot over time!

Perhaps state-variable hedging seems nebulous, and therefore maybe small and easy to ignore. Here is a story to convince you otherwise. Suppose you are a highly risk-averse investor, with a 10-year horizon. You are investing to cover a defined payment, say your 8-year-old child's future tuition at the University of Chicago. The optimal investment is obviously a 10-year zero-coupon indexed Treasury (TIP).58 Figure 18 tracks your investment over time.

Figure 18.

Bond Price Example. The solid line tracks the price of a zero-coupon bond that matures in year 10. The dashed lines illustrate the yield, and the fact that the bond price will recover by year 10. The shaded area represents a “crisis” in which interest rates rise and interest-rate volatility rises.

Suppose now that bond prices plunge, and volatility surges, highlighted in the graph. Should you sell in a panic to avoid the risk of further losses? No. You should tear up the statement. “Short-term volatility” is irrelevant. Every decline in price comes with a corresponding rise in expected return. Evaluating bonds with a one-period mean-variance, alpha–beta framework is silly—though a surprising amount of the bond investing world does it!
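The arithmetic behind this story can be sketched in a few lines. This is only an illustration: the continuously compounded yields, the 2% starting yield, and the jump to 5% in year 3 are hypothetical numbers of my choosing, not values read from Figure 18.

```python
import math

def zero_price(yield_, years_left, face=1.0):
    """Price of a zero-coupon bond under continuous compounding."""
    return face * math.exp(-yield_ * years_left)

# Hypothetical inputs, chosen only for illustration.
horizon = 10       # years until the tuition payment
y_before = 0.02    # yield before the "crisis"
y_after = 0.05     # yield after rates jump in year 3

p0 = zero_price(y_before, horizon)              # purchase price at t = 0
p3_before = zero_price(y_before, horizon - 3)   # year-3 price without the shock
p3_after = zero_price(y_after, horizon - 3)     # year-3 price after the shock

# Loss shown on the year-3 statement.
mark_to_market_loss = p3_after / p3_before - 1
# Annualized return from year 3 to maturity if the bond is held.
forward_return = math.log(1.0 / p3_after) / (horizon - 3)
# Annualized return over the full 10 years, locked in at purchase.
hold_to_maturity_return = math.log(1.0 / p0) / horizon

print(f"mark-to-market loss in year 3: {mark_to_market_loss:.1%}")
print(f"return from year 3 to maturity: {forward_return:.1%}")
print(f"10-year return if held: {hold_to_maturity_return:.1%}")
```

The price drop and the rise in the return from year 3 onward offset exactly, so the 10-year return is whatever it was at purchase.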

That is pretty obvious, but now imagine that you are a stock investor in December 2008; say, you are running your university's endowment. As shown in Figure 19, stocks plummeted, and stock volatility rose dramatically, from 16% to 70%.

Figure 19.

S&P500 Price Index and index volatility. Realized volatility is a 20-day average.

Figure 20.

Volatility. A 20-day realized daily volatility, and VIX index.

Should you sell, to avoid the risks of further losses? The standard formula says so. Picking a mean return and risk aversion to justify a 60% equity share in normal times, you should reduce the portfolio's equity share to 4%:

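$$\text{Equity share} = \frac{1}{\gamma}\,\frac{E(R^e)}{\sigma^2(R^e)}.$$

$$0.6 = \frac{1}{2}\,\frac{0.04}{0.18^2} \;\Rightarrow\; \frac{1}{2}\,\frac{0.04}{0.70^2} = 0.04.$$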

(You might object that mean returns rose too. But they would have had to rise to $4\% \times 0.70^2/0.18^2 \approx 60\%$ for this formula to tell you not to change allocation. You also may object that many investors had leverage, tenured professor salaries to pay, or other habit-like considerations for becoming more risk averse. Fair enough, but then one-period mean-variance theory is particularly inappropriate in the first place.)
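For readers who want to reproduce the arithmetic, here is a minimal sketch using the numbers in the formula above (risk aversion of 2, a 4% mean excess return, and volatility of 18% versus 70%). The function and variable names are mine.

```python
def equity_share(mean_excess, vol, risk_aversion):
    """One-period mean-variance allocation: (1/gamma) * E(Re) / sigma^2(Re)."""
    return mean_excess / (risk_aversion * vol ** 2)

gamma = 2.0          # risk aversion that justifies roughly 60% in normal times
mean_excess = 0.04   # expected excess return
vol_normal = 0.18    # volatility in normal times
vol_crisis = 0.70    # volatility in the crisis

share_normal = equity_share(mean_excess, vol_normal, gamma)
share_crisis = equity_share(mean_excess, vol_crisis, gamma)

# Mean excess return that would leave the allocation unchanged in the crisis.
required_mean = mean_excess * vol_crisis ** 2 / vol_normal ** 2

print(f"normal-times equity share: {share_normal:.2f}")
print(f"crisis equity share: {share_crisis:.2f}")
print(f"mean return needed to keep the allocation: {required_mean:.1%}")
```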

But not everyone can do this. The market did not fall 1 − 4/60 = 93%. If you are selling, who is buying? Is everyone else being stupid? Does it make any sense to think that the market was irrationally over-valued in the midst of the financial crisis?

The answer, of course, is that one-period mean-variance analysis is completely inappropriate. If the world were i.i.d., volatility could not change in the first place. Stocks are a bit like bonds: Price-dividend drops increase expected returns.59 To some extent, “short-run volatility” does not matter to a long-run investor. State-variable hedging matters a lot, even for simple real-world applications. And, by ICAPM logic, we should therefore expect to observe multiple priced factors. Time-series predictability should be a strong source of additional pricing factors in the “cross section.”

A.2. Prices and Payoffs

Or maybe not. Telling our bond investor to hold 10-year zeros because their “price happens to covary properly with state variables for their investment opportunities” just confuses the obvious. It is much clearer to look at the final payoff and tell him to ignore price fluctuations. Maybe dynamic portfolio theory overall might get a lot simpler if we look at payoff streams rather than look at dynamic trading strategies that achieve those streams.

If you look at payoff streams, it is obvious that an indexed perpetuity (or annuity) is the risk-free asset for long-term investors, despite arbitrary time-varying return moments, just as the 10-year zero was obviously the risk-free asset for my bond investor. It is interesting that coupon-only TIPS are perceived to be an exotic product, or a way to speculate on inflation, not the benchmark risk-free asset for every portfolio in place of a money-market investment.

How about risky investments? Here is a simple and suggestive step.60 If utility is quadratic,

$$\max_{\{c_t\}} E \sum_{t=0}^{\infty} \beta^t \left(-\frac{1}{2}\right)\left(c_t - c^*\right)^2,$$

it turns out that we can still use two-period mean-variance theory to think about streams of payoffs (loosely, streams of dividends), no matter how much expected returns vary over time. To be specific,

  1. Every optimal payoff stream combines an indexed perpetuity and a claim to the aggregate dividend stream. Less risk-averse investors hold more of the claim to aggregate dividends, and vice versa.
  2. Optimal payoffs lie on a long-run mean/long-run variance frontier. The “long-run mean” $\tilde{E}(x)$ in this statement sums over time as well as across states,
     $$\tilde{E}(x) = \frac{1}{1-\beta}\sum_{j=0}^{\infty}\beta^j E(x_{t+j}).$$
  3. State variables disappear from portfolio theory, just as they did for our 10-year TIP investor, once he looked at the 10-year problem.

If our stock market investor thought this way when facing the crisis, he would answer: “I invested in the aggregate dividend stream. Why should I buy or sell? I don't look at the statements.” This is a lot simpler to explain and implement than state-variable identification, deep time-series modeling of investment opportunities, value function calculations, and optimal hedge portfolios!

If investors have outside income in this framework, they first short a payoff stream most correlated with their outside income stream, and then hold the mean-variance efficient payoffs. Calculating long-run correlations of income streams this way may be easier than trying to impute discount-rate induced changes in the present value of outside income streams, which are needed to calculate return-based hedge portfolios.

If investors have no outside income, long-run expected returns (payoffs divided by initial prices) line up with long-run market betas. A CAPM emerges, despite arbitrary time variation in expected returns and variances. ICAPM pricing factors fade away as we look at longer horizons. If investors do have outside income, the payoff corresponding to average outside income payoff emerges as a second priced factor, in the style of Fama and French's (1996) human capital story for the value effect.

Of course, quadratic utility is a troublesome approximation, especially for long-term problems. Still, this simple example captures the possibility that a price and payoff approach can give a much simpler view of pricing and portfolio theory than we get by focusing on the high-frequency dynamic trading strategy that achieves those payoffs in a given market structure.

B. Alphas, Betas, and Performance Evaluation

In the 1970 view, there is one source of systematic risk, the market index. Active management chases “alpha,” which means uncovering assets whose prices do not reflect available information.

Now we have dozens of dimensions of systematic risks. Many hedge fund strategies include an element of option writing. For example, Figure 21 shows the annual returns of “equity market neutral” hedge funds together with the market return. “Providing liquidity” looks a lot like writing out-of-the-money puts!61

Figure 21.

Hedge Fund Returns. One-year excess returns of the “equity market neutral” hedge fund index and the CRSP value-weighted portfolio. Data source: hedgeindex.com and CRSP.

I tried telling a hedge fund manager, “You don't have alpha. Your returns can be replicated with a value-growth, momentum, currency and term carry, and short-vol strategy.” He said, “‘Exotic beta’ is my alpha. I understand those systematic factors and know how to trade them. My clients don't.” He has a point. How many investors have even thought through their exposures to carry-trade or short-volatility “systematic risks,” let alone have the ability to program computers to execute such strategies as “passive,” mechanical investments? To an investor who has not heard of it and holds the market index, a new factor is alpha. And that alpha has nothing to do with informational inefficiency.

Most active management and performance evaluation today just is not well described by the alpha–beta, information-systematic, selection-style split anymore. There is no “alpha.” There is just beta you understand and beta you do not understand, and beta you are positioned to buy versus beta you are already exposed to and should sell.

C. Procedures, Corporate Finance, Accounting, and Regulation

Time-varying discount rates and multiple factors deeply change many applications.

The first slide in a capital budgeting lecture looks something like this

$$\text{Value of investment} = \frac{\text{Expected payout}}{R^f + \beta\left[E(R^m) - R^f\right]},$$

with a 6% market premium. All of which, we now know, is completely wrong. The market premium is not always 6%, but varies over time by as much as its mean. (And uncertainty about the market premium is also several percentage points.) Expected returns do not line up with CAPM betas, but rather with multifactor betas to the extent we understand them at all. And since expected returns change over time, the discount rate is different for cashflows at different horizons.
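To see how much the assumed premium matters even in this one-period formula, consider a small sketch. The project (a payout of 100 next year, a beta of 1.5, a 2% risk-free rate) is hypothetical, and I treat $R^f$ as a gross return, which the slide formula leaves implicit.

```python
def project_value(expected_payout, rf_gross, beta, market_premium):
    """One-period CAPM valuation: payout / (Rf + beta * [E(Rm) - Rf]).

    rf_gross is a gross risk-free return (for example 1.02); this choice of
    units is an assumption made for the illustration.
    """
    return expected_payout / (rf_gross + beta * market_premium)

# Hypothetical project.
payout, rf_gross, beta = 100.0, 1.02, 1.5

# The conventional 6% premium, and a premium one "mean" below and above it.
premia = [0.00, 0.06, 0.12]
values = [project_value(payout, rf_gross, beta, p) for p in premia]

for p, v in zip(premia, values):
    print(f"market premium {p:.0%}: value {v:.2f}")
```

For cashflows further out, the same swing in the premium compounds over the horizon, so the sensitivity is larger than this one-year case suggests.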

It is interesting that investment decisions get so close to right anyway, with high investment when stock prices are high. (Remember Figure 10.) Evidently, a generation of our MBAs figured out how to jigger the numbers and get the right answer despite a wrong model. Perhaps what we often call “irrational” cashflow forecasts, optimistic in good times and pessimistic in bad times, are just a good way to offset artificially constant discount rates. Or perhaps they understood the Q theory lecture and just follow its advice.

I do not think the answer lies in multifactor betas62 or discounting with dynamic present value models and time-varying risk premia, at least not yet. Capital budgeting is a “relative pricing” exercise—we want to use available information in asset markets to help us decide what the discount rate for a given project should be. For this purpose, simply looking at average returns of “similar” securities is enough. Understanding discount rates as a function of characteristics—or, better, understanding valuations directly as a function of characteristics (the use of “comparables”)—may end up being more fruitful. We do not have to explain discount rates—relate expected returns to betas and understand their deep economics—in order to use them. We do not need an all-purpose model of everything to extend prices from known assets to a new one. Even when discount rates are explained, the characterization (characteristic models) may be a better measure for practical relative pricing than the explanation (beta models). Conversely, capital budgeting gives the same answer if discount rates are “wrong.” When you shop for a salad, all you care about is the price of tomatoes. Whether tomatoes are expensive because the trucks got stuck in bad weather or because of an irrational bubble in the tomato futures market makes no difference to your decision.

Many procedures in accounting, regulation, and capital structure implicitly assume that returns are independent over time, and hence that prices only reflect cashflow information.

Suppose that a firm has a single cashflow in 10 years, and is funded by a zero-coupon bond and equity. In most accounting, capital structure, and regulation procedures we would use the stock and bond prices to calculate the probability of and distance to default. But if prices decline because discount rates rise, that fact has no implication for the probability of or distance to default.

Perhaps banks' complaints that low asset prices represent “illiquidity” or “temporarily depressed valuations” rather than insolvency (a lesser chance of making future interest and principal repayments) make some sense. Perhaps capital requirements do not have to respond immediately to such events. Perhaps “hold to maturity” accounting is not as silly as it sounds. Perhaps the fact that firms change capital structures very slowly in response to changes in equity valuations makes some sense.63

Of course, in such an event the risk-neutral probability of default has risen. Maybe regulators, bondholders, and capital structure should respond to a rise in the state price of the default event exactly as they respond to a rise in the real probability of that event. Maybe, but at least it is a very different issue and worth asking the question.

I am not arguing that mark-to-market accounting is bad, or that fudging the numbers is a good idea. The point is only that what you do with a mark-to-market number might be quite different in a world driven by discount-rate variation than in a world driven by cashflow variation.64 The mark-to-market value is no longer a sufficient statistic. A loss of value coincident with a rise in expected return has different implications than a loss of value with a decline in expected return. Decisions need to incorporate more information, not less.

The view that the stock price is driven by expected earnings lies behind stock-based executive compensation as well. It is already a bit of a puzzle that executives holding company stock or options should therefore hold systematic risks due to recessions, market betas, or commodity price exposures, about which they can do nothing. Understanding that a large fraction of stock returns reflect changes in discount rates or new-factor beta exposures makes the logic of such incentives even more curious. Perhaps stock-based compensation has less to do with effort and operating performance, and more with tax treatment or incentives for risk management.

D. Macroeconomics

Large variation in risk premia implies exciting changes for macroeconomics. Most of macroeconomics focuses on variation in a single intertemporal price, “the” interest rate, which intermediates saving and investment. Yet in the recent recession, as shown in Figure 11, interest rates paid by borrowers (and received by any investors willing to lend) spiked up, while short-term government rates went down. Recessions are all about changes in credit spreads, about the willingness to bear risk and the amount of risk to be borne, far more than they are about changes in the desire for current versus future certain consumption. Most of the Federal Reserve's response consisted of targeting risk premiums, not changing the level of interest rates or addressing a transactions demand for money.

Macroeconomics and finance have thought very differently about consumer (we call them investors) and firm behavior. For example, the consumers in the Campbell and Cochrane (1999) habit model balance very strong precautionary saving motives with very strong intertemporal substitution motives, and have large and time-varying risk aversion. Their behavior is very far from the permanent-income intuition, its borrowing-constrained alternative, or the central role of simple intertemporal substitution (the modern “IS” curve) in macroeconomic thinking.

As one simple example, macroeconomists often think about how consumers will respond to a change in “wealth” coming from a change in stock prices or house prices. Financial economists might suspect that consumers will respond quite differently to a decline in value coming from a discount-rate rise—a temporary change in price with no change in capital stock or cashflow—than one that comes from a change in expected cashflows or destruction of physical capital stock.

Financial models also emphasize adjustment costs or irreversibilities: If firms can freely transform consumption goods to capital, then stock prices (Q) are constant. Yet, most “real business cycle” literature following King, Plosser, and Rebelo (1988) left out adjustment costs because they did not need them to match basic quantity correlations. The first round of “new-Keynesian” literature abstracted from capital altogether, and much work in that tradition continues to do so. Figure 10 pretty strongly suggests that Q is not constant. This figure, the regression evidence of Table I, and interest rates in Figure 11, together suggest that variations in the risk premium drive investment, not variation in the level of risk-free interest rates. Adjustment costs lead to basic differences in analysis. For example, without adjustment costs, the marginal product of capital can be less than one, clashing with the zero bound on nominal rates. With adjustment costs, the price of capital can fall, giving a positive real rate of interest.

Formal macroeconomics has started to introduce some of the same ingredients that macro-finance researchers are using to understand discount-rate variation, including “new” preferences, adjustment cost or other frictions in capital formation.65 Since the 2008 financial crisis, there has been an explosion of macroeconomic models with financial frictions, especially credit-market frictions. Still, we are a long way from a single general-equilibrium model that matches basic quantity and price facts.

Everyone is aware of the question. The job is just hard. Macroeconomic models are technically complicated. Macroeconomic models with time-varying risk premia are even harder. Adding financial frictions while maintaining the models' dynamic intertemporal character is harder still. At a deeper level, successful “grand synthesis” models do not consist of just mixing all the popular ingredients together and stirring the pot; they must maintain the clear quantitative-parable feature of good economic analysis.

An asset-pricing perspective also informs monetary economics, and the interaction between monetary and fiscal policy. From a finance perspective, nominal government debt is “equity” in the government: it is the residual claim to primary fiscal surpluses. Hence, the price level must satisfy the standard asset-pricing equation:

$$\frac{\text{Debt}_t}{\text{Price level}_t} = E_t \sum_{j=0}^{\infty} m_{t,t+j}\,\left(\text{real primary surplus}_{t+j}\right). \tag{11}$$

Inflation can absorb shocks to surpluses, just as equity absorbs shocks to profit streams. This fact is at least an important constraint on monetary policy, especially in a time of looming deficits.66 It suggests there is a large component of inflation that the Fed is powerless to avoid. The analogy to stocks also suggests that variation in the discount rate $m_{t,t+j}$ for government debt is important. A “flight to quality” lowers the discount rate for government debt, raising the right-hand side of (11). People want to hold more government debt, which means getting rid of goods and services. This story links the “rising risk premium” which finance people see as the core of a recession with the “decline in aggregate demand” which macroeconomists see. The standard corporate finance perspective also illuminates the choice of government debt maturity structure and denomination. Indexed or foreign-currency debt is debt, which must be repaid or defaulted. Domestic-currency debt is equity, which can be inflated. Long-term debt allows bond prices to temporarily absorb shocks to future surplus, rather than result in inflation immediately when short-term debt cannot be rolled over.
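A stylized sketch of the discount-rate channel in equation (11) follows. It assumes a known, constant surplus path and a constant discount rate, so that $m_{t,t+j} = (1+\text{rate})^{-j}$; both are simplifications of mine, and all numbers are hypothetical.

```python
def real_debt_value(surpluses, discount_rate):
    """Present value of a known surplus path, with m_{t,t+j} = (1 + rate)**-j."""
    return sum(s / (1.0 + discount_rate) ** j for j, s in enumerate(surpluses))

def price_level(nominal_debt, surpluses, discount_rate):
    """Price level that satisfies equation (11) for the given inputs."""
    return nominal_debt / real_debt_value(surpluses, discount_rate)

# Hypothetical numbers: a constant real surplus of 10 for 300 periods.
nominal_debt = 1000.0
surpluses = [10.0] * 300

p_normal = price_level(nominal_debt, surpluses, discount_rate=0.03)
# A "flight to quality": same surpluses, lower discount rate.
p_flight = price_level(nominal_debt, surpluses, discount_rate=0.02)

print(f"price level at a 3% discount rate: {p_normal:.2f}")
print(f"price level at a 2% discount rate: {p_flight:.2f}")
```

With nominal debt and surpluses held fixed, the lower discount rate raises the real value of the debt, so the price level that satisfies (11) is lower.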

VI. Conclusion

Discount rates vary a lot more than we thought. Most of the puzzles and anomalies that we face amount to discount-rate variation we do not understand. Our theoretical controversies are about how discount rates are formed. We need to recognize and incorporate discount-rate variation in applied procedures.

We are really only beginning these tasks. The facts about discount-rate variation need at least a dramatic consolidation. Theories are in their infancy. And most applications still implicitly assume i.i.d. returns and the CAPM, and therefore that price changes only reveal cashflow news. Throughout, I see hints that discount-rate variation may lead us to refocus analysis on prices and long-run payoff streams rather than one-period returns.

Appendix

A. Return Forecasts and VARs

A.1. Identities

Campbell and Shiller (1988) Taylor-expand the definition of log return to obtain the approximation

$$r_{t+1} = \kappa - \rho\, dp_{t+1} + dp_t + \Delta d_{t+1}, \tag{A1}$$

where $dp_t \equiv \log(D_t/P_t)$, $r_t = \log(R_t)$, $d_t = \log(D_t)$, $\rho \equiv \frac{PD}{1+PD}$, $PD$ denotes the point about which one takes the approximation, and $\kappa = -(1-\rho)\log(1-\rho) - \rho\log(\rho)$. The point of approximation $PD$ need not be the mean, so one can use this and following identities to examine cross-sectional variation in dividend yields without security-specific approximation points. When we are only interested in second moments, we interpret symbols as deviations from means, leaving

$$r_{t+1} = -\rho\, dp_{t+1} + dp_t + \Delta d_{t+1}. \tag{A2}$$

Iterating (A2) forward, we obtain

$$dp_t = \sum_{j=1}^{k}\rho^{j-1} r_{t+j} - \sum_{j=1}^{k}\rho^{j-1}\Delta d_{t+j} + \rho^{k} dp_{t+k}.$$

Assuming the last term goes to zero (the absence of “rational bubbles”), we obtain the Campbell–Shiller present value identity,

$$dp_t = \sum_{j=1}^{\infty}\rho^{j-1} r_{t+j} - \sum_{j=1}^{\infty}\rho^{j-1}\Delta d_{t+j}. \tag{A3}$$

This is an ex post identity, so it also holds in conditional expectations using any information set.
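A quick simulation gives a feel for how accurate the approximation (A1) is. The sketch below is illustrative only: the expansion point $PD = 25$, the AR(1) process for the log dividend yield, and the i.i.d. dividend growth process are assumptions of mine, not estimates from the data.

```python
import math
import random

random.seed(0)

PD = 25.0   # assumed expansion point for the price-dividend ratio
rho = PD / (1.0 + PD)
kappa = -(1 - rho) * math.log(1 - rho) - rho * math.log(rho)

# Hypothetical dynamics: AR(1) log dividend yield, i.i.d. dividend growth.
phi, sigma_dp, sigma_d = 0.93, 0.10, 0.10
dp_mean = -math.log(PD)

T = 1000
dp = [dp_mean]
for _ in range(T):
    dp.append(dp_mean + phi * (dp[-1] - dp_mean) + random.gauss(0.0, sigma_dp))
dd = [random.gauss(0.02, sigma_d) for _ in range(T)]  # dd[t]: growth from t to t+1

max_err = 0.0
for t in range(T):
    # Exact log return from R = (P' + D') / P, written in terms of dp and dd.
    r_exact = dd[t] + math.log(1.0 + math.exp(-dp[t + 1])) + dp[t]
    # Approximate log return from (A1).
    r_approx = kappa - rho * dp[t + 1] + dp[t] + dd[t]
    max_err = max(max_err, abs(r_exact - r_approx))

print(f"rho = {rho:.4f}, kappa = {kappa:.4f}")
print(f"largest approximation error over {T} periods: {max_err:.4f}")
```

The approximation is exact at the expansion point, and the error grows with the squared distance of the dividend yield from that point.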

A.2. Dividend Construction

I create dividends from the CRSP annual return series with and without dividends. These series are defined as

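$$R_{t+1} \equiv \frac{P_{t+1}+D_{t+1}}{P_t}; \qquad RX_{t+1} \equiv \frac{P_{t+1}}{P_t}.$$

Then, I construct the dividend yield as

$$\frac{D_{t+1}}{P_{t+1}} = \frac{R_{t+1}}{RX_{t+1}} - 1.$$

By using an annual horizon, I avoid the strong seasonal in dividend payments even when using monthly observations.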

This definition reinvests dividends to the end of the year.67 Therefore, dividend growth also includes some return information. Annual sums of dividends are a good deal less volatile. However, the identity $R_{t+1} = (P_{t+1} + D_{t+1})/P_t$ and therefore (A2), (A3), and following calculations do not hold using sums of dividends, or dividends reinvested at the risk-free rate.

This definition of dividends has a small practical advantage as well. The resulting dividend-price ratio is a better univariate return forecaster because it removes a good deal of 1-year dividend growth forecastability. For example, consider the sharp stock market decline in Fall 2008. Using a simple sum of past dividends, we would see a large decline in the price-dividend ratio. But much of the stock price decline surely reflected news that dividends in 2009 were going to fall. Reinvested dividends were lower than the sum, and the resulting price-dividend ratio thus included the information that dividends would decline in 2009.

I construct dividend growth by

$$\frac{D_{t+1}}{D_t} = \frac{(D_{t+1}/P_{t+1})}{(D_t/P_t)}\, RX_{t+1}.$$
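The construction can be sketched as follows. The return series are made-up numbers, not CRSP data, and the sketch uses the price ratio $P_{t+1}/P_t = RX_{t+1}$ to convert the ratio of dividend yields into dividend growth. It then rebuilds price and dividend levels from an arbitrary starting price to confirm that the two routes agree.

```python
# Hypothetical annual gross returns with (R) and without (RX) dividends.
R = [1.12, 0.95, 1.20, 1.07]
RX = [1.09, 0.92, 1.17, 1.04]

# Dividend yield D_{t+1}/P_{t+1} = R_{t+1}/RX_{t+1} - 1.
dy = [r / rx - 1.0 for r, rx in zip(R, RX)]

# Dividend growth D_{t+1}/D_t = (D_{t+1}/P_{t+1}) / (D_t/P_t) * RX_{t+1}.
div_growth = [dy[t + 1] / dy[t] * RX[t + 1] for t in range(len(dy) - 1)]

# Cross-check: rebuild levels from an arbitrary initial price.
price = 100.0
prices, dividends = [], []
for r, rx in zip(R, RX):
    dividends.append(price * (r - rx))  # D_{t+1} = P_t * (R_{t+1} - RX_{t+1})
    price *= rx                         # P_{t+1} = P_t * RX_{t+1}
    prices.append(price)

growth_from_levels = [dividends[t + 1] / dividends[t]
                      for t in range(len(dividends) - 1)]

for g, g_check in zip(div_growth, growth_from_levels):
    print(f"dividend growth {g:.4f} (from levels: {g_check:.4f})")
```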

For the VAR in Tables II–IV, I use instead dividend growth implied by the identity (A1),

$$\Delta d_{t+1} = -\kappa + r_{t+1} + \rho\, dp_{t+1} - dp_t.$$

Actual dividend growth gives very similar results. However, this construction means that Campbell–Shiller approximate identities hold exactly, so it is easier to see the identities in the results. To make identities hold, it is better to use “pure” returns rather than infer returns from dividend growth, otherwise approximation errors can show up as magic investment opportunities.

A.3. Shock Definition

I identify the dividend growth, dividend yield, and cay shocks in Figure 5 by setting changes to the other variables to zero in each case.

By (A2), this dividend growth shock must come with a contemporaneous return shock,

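$$\varepsilon^{d}_{t+1} = 1,\quad \varepsilon^{dp}_{t+1} = 0,\quad \varepsilon^{cay}_{t+1} = 0,\quad \varepsilon^{r}_{t+1} = 1,$$

and this dividend yield shock must come with a contemporaneous negative return shock,

$$\varepsilon^{d}_{t+1} = 0,\quad \varepsilon^{dp}_{t+1} = 1,\quad \varepsilon^{cay}_{t+1} = 0,\quad \varepsilon^{r}_{t+1} = -\rho.$$

Since this cay shock affects neither dividend growth nor dividend yield, it also comes with no change in return $r_t$,

$$\varepsilon^{d}_{t+1} = 0,\quad \varepsilon^{dp}_{t+1} = 0,\quad \varepsilon^{cay}_{t+1} = 1,\quad \varepsilon^{r}_{t+1} = 0.$$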
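The return shock in each case is just the innovation version of (A2), $\varepsilon^{r} = -\rho\,\varepsilon^{dp} + \varepsilon^{d}$. A small check, with $\rho = 0.96$ as an assumed value:

```python
rho = 0.96  # assumed value for illustration

# Innovations to dividend growth, dividend yield, and cay for each shock.
shocks = {
    "dividend growth": {"d": 1.0, "dp": 0.0, "cay": 0.0},
    "dividend yield": {"d": 0.0, "dp": 1.0, "cay": 0.0},
    "cay": {"d": 0.0, "dp": 0.0, "cay": 1.0},
}

def implied_return_shock(shock, rho):
    """Return innovation implied by (A2): eps_r = -rho * eps_dp + eps_d."""
    return -rho * shock["dp"] + shock["d"]

return_shocks = {name: implied_return_shock(s, rho) for name, s in shocks.items()}

for name, eps_r in return_shocks.items():
    print(f"{name} shock: return innovation {eps_r:+.2f}")
```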

I choose this definition of shocks because it leads to nicely interpretable responses as “cashflow” and “discount rate” news. Because dividends remain roughly unpredictable, this “short-run” identification gives almost the same result as a “long-run” identification. If, rather than define the first shock as $\varepsilon^{dp}_{t+1} = 0$ and $\varepsilon^{cay}_{t+1} = 0$, we had identified the first shock by

$$(E_{t+1} - E_t)\sum_{j=1}^{\infty}\rho^{j-1} r_{t+j} = 0,$$

we would have gotten nearly the same result. The resulting shocks are nearly uncorrelated, which is also convenient. However, there is no guarantee that these elegant properties will survive as we add more variables.

This VAR is very simple, since I left dividend growth and returns out of the right-hand side. My purpose is to distill the essential message of more complex VARs, not to deny that there may be some information in additional lags of these variables.

A.4. Identities in the cay VAR

The present-value identity (A3), conditioned down and reproduced here,

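$$dp_t = E\left[\sum_{j=1}^{\infty}\rho^{j-1} r_{t+j} \,\Big|\, I_t\right] - E\left[\sum_{j=1}^{\infty}\rho^{j-1}\Delta d_{t+j} \,\Big|\, I_t\right],$$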

implies that an extra variable can only help dp to forecast long-horizon returns if it also helps dp to forecast long-horizon dividend growth. An extra variable can help to forecast 1-year returns by changing the term structure of return forecasts as well. Here I show how this intuition applies algebraically to multiple regression coefficients and impulse-response functions.

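Regressing both sides of (A3) on $dp_t$ and $z_t$,

$$r^{lr}_t = \sum_{j=1}^{\infty}\rho^{j-1} r_{t+j} = a_r + b^{lr}_r\, dp_t + c^{lr}_r\, z_t + \varepsilon^{r},$$

$$\Delta d^{lr}_t = \sum_{j=1}^{\infty}\rho^{j-1}\Delta d_{t+j} = a_d + b^{lr}_d\, dp_t + c^{lr}_d\, z_t + \varepsilon^{d},$$

we obtain the generalized restriction on long-run multiple regression coefficients,

$$1 = b^{lr}_r - b^{lr}_d, \tag{A4}$$

$$0 = c^{lr}_r - c^{lr}_d. \tag{A5}$$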

Equation (A4) is the same as before, now applied to the multiple regression coefficient. Equation (A5) expresses the idea that a new variable can only help to forecast long-run returns if it also helps to forecast long-run dividend growth. Because $c^{lr}_r = c^{lr}_d$, the extra dividend growth and return forecasts will be perfectly positively correlated. In this way, extra long-run dividend growth forecastability means more long-run return forecastability, not less.

In terms of individual long-horizon regressions

$$r_{t+j} = b^{(j)}_r\, dp_t + c^{(j)}_r\, z_t + \varepsilon^{r}_{t+j},$$

etc., (A3) similarly implies

$$1 = \sum_{j=1}^{\infty}\rho^{j-1} b^{(j)}_r - \sum_{j=1}^{\infty}\rho^{j-1} b^{(j)}_d, \qquad 0 = \sum_{j=1}^{\infty}\rho^{j-1} c^{(j)}_r - \sum_{j=1}^{\infty}\rho^{j-1} c^{(j)}_d.$$

A variable can help to forecast 1-year returns, $c^{(1)}_r \neq 0$, only if it correspondingly changes the forecast of longer horizon returns or dividend growth.

Since impulse-response functions are the same as regression coefficients of future variables such as $r_{t+j}$ on shocks at time $t$, the impulse-response functions must obey the same relation,

$$1 = \sum_{j=1}^{\infty}\rho^{j-1} e^{(j)}_{dp \to r} - \sum_{j=1}^{\infty}\rho^{j-1} e^{(j)}_{dp \to \Delta d}, \qquad 0 = \sum_{j=1}^{\infty}\rho^{j-1} e^{(j)}_{z \to r} - \sum_{j=1}^{\infty}\rho^{j-1} e^{(j)}_{z \to \Delta d},$$

where $e^{(j)}_{dp \to r}$ denotes the response of $r_{t+j}$ to a $dp_t$ shock. This fact lets me easily interpret the change in forecastability by adding cay, in the context of the present value identity, by plotting the impulse responses. The numbers in Figure 5 are terms of this decomposition.

A.5. Results Using Welch–Goyal Predictors

To see if the pattern of the cay VAR holds more generally, I tried a number of the forecasting variables in Welch and Goyal (2008). The results are in Table AI. Each of these variables helps substantially to forecast one-period returns. Yet the variables mean-revert quickly and do not forecast dividends much, so the contribution to the variance of dividend yields still comes mostly from the variance of long-run expected returns.

Table AI. Return Forecasts Using Additional Predictors
The return forecasts are of the form
$$r_{t+1} = a + b \times dp_t + c \times z_t + \varepsilon_{t+1}.$$
Data are from Welch and Goyal (2008), 1947–2009. I calculate the variance of long-horizon expected returns and dividend growth from a bivariate VAR, and using actual (not identity) dividend growth forecasts. EQUIS, percentage equity issuance, is the ratio of equity issuing activity as a fraction of total issuing activity. SVAR is stock variance, computed as the sum of squared daily returns on the S&P500. IK is the investment to capital ratio, the ratio of aggregate (private nonresidential fixed) investment to aggregate capital for the whole economy. DFY, the default yield spread, is the difference between BAA- and AAA-rated corporate bond yields.
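| | dp | cay | EQUIS | SVAR | IK | DFY |
|---|---|---|---|---|---|---|
| $c$ | | 2.21 | −0.71 | 1.48 | −5.30 | 5.25 |
| $t(c)$ | | (1.73) | (−2.53) | (3.40) | (−0.85) | (1.86) |
| $b$ | 0.13 | 0.10 | 0.19 | 0.15 | 0.11 | 0.13 |
| $t(b)$ | (2.61) | (1.82) | (3.75) | (3.05) | (2.16) | (2.53) |
| $R^2$ | 0.10 | 0.16 | 0.19 | 0.15 | 0.11 | 0.13 |
| $\sigma\left(E_t \sum_{j=1}^{\infty}\rho^{j-1} r_{t+j}\right)$ | 0.52 | 0.46 | 0.49 | 0.42 | 0.53 | 0.49 |
| $\sigma\left(E_t \sum_{j=1}^{\infty}\rho^{j-1} \Delta d_{t+j}\right)$ | 0.17 | 0.13 | 0.16 | 0.11 | 0.17 | 0.14 |

A.6. More Lags of Dividend Growth and Returns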

An obvious first source of additional variables is less restrictive VARs than the simple first-order VAR that I presented in the text. As usual in VARs, individual coefficients largely enter insignificantly, so it takes some art or prior information to see robust patterns. And by (A2), the same forecasts can be interpreted as regressions using additional lags of any two of returns, dividend yield, and dividend growth.

The second lag of dividend yields is at least economically important. Table AII presents the regressions. The change in dividend yield helps the return forecast, increasing $R^2$ from 0.09 to 0.15, and correspondingly increasing the more interesting measures of expected return variation. The change in dividend yield really helps to forecast dividend growth, with a 3.27 t-statistic, a 5% standard deviation of forecast, and a forecast that varies by 90% of the mean. However, the 0.10 autocorrelation in $\Delta dp_t$ suggests that this will be a very short-lived signal, one with little impact on forecasts of long-run dividend growth or returns, and thus on our view of the sources of price-dividend ratio volatility.

Table AII. Return Forecasts with Additional Lags
Forecasts using dividend yield and change in dividend yield. CRSP value-weighted return, 1947–2009. $\Delta dp_t = dp_t - dp_{t-1}$.
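| Left-Hand Variable | $dp_t$ | $\Delta dp_t$ | $t(dp_t)$ | $t(\Delta dp_t)$ | $R^2$ | $\sigma[E_t(y)]$ % | $\sigma[E_t(y)]/E(y)$ |
| --- | --- | --- | --- | --- | --- | --- | --- |
| $r_{t+1}$ | 0.13 | 0.26 | (2.45) | (1.83) | 0.15 | 6.76 | 0.65 |
| $\Delta d_{t+1}$ | 0.03 | 0.35 | (0.62) | (3.27) | 0.14 | 4.98 | 0.90 |
| $dp_{t+1}$ | 0.93 | 0.10 | (24.7) | (0.85) | 0.91 | | |
| $\Delta dp_{t+1}$ | −0.07 | 0.10 | (−1.85) | (0.85) | 0.06 | | |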

Similarly, while individual $r_{t-j}$ and $\Delta d_{t-j}$ coefficients do not look big and do not have much pattern, they can nonetheless help as a group, especially if one sensibly restricts the pattern of lagged coefficients. In this vein, Lacerda and Santa-Clara (2010) and Koijen and van Binsbergen (2010) find that moving averages of past dividend growth help to forecast both returns and dividend growth (as they must, given the present value identity), almost doubling the return-forecast $R^2$.

A.7. VAR Calculations

To find long-run regression coefficients implied by a first-order VAR as in Table II, I run

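$$r_{t+1} = b_r\, dp_t + \varepsilon^{r}_{t+1}, \qquad \text{(A6)}$$
$$\Delta d_{t+1} = b_d\, dp_t + \varepsilon^{d}_{t+1}, \qquad dp_{t+1} = \phi\, dp_t + \varepsilon^{dp}_{t+1}. \qquad \text{(A7)}$$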

I then report

$$b_r^{(k)} = b_r\, \frac{1 - (\rho\phi)^k}{1 - \rho\phi}.$$
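
As a minimal sketch of this calculation: under the first-order VAR, the forecast of $r_{t+j}$ is $b_r \phi^{j-1} dp_t$, so weighting by $\rho^{j-1}$ and summing over $j = 1, \ldots, k$ gives the closed form above. The function names and the numerical values of `b_r`, `rho`, and `phi` below are illustrative assumptions, not estimates from Table II.

```python
def long_run_coef(b_r, rho, phi, k):
    """Closed form: b_r * (1 - (rho*phi)**k) / (1 - rho*phi)."""
    return b_r * (1.0 - (rho * phi) ** k) / (1.0 - rho * phi)


def long_run_coef_by_sum(b_r, rho, phi, k):
    """Same coefficient as an explicit sum of rho^(j-1) * phi^(j-1) * b_r, j = 1..k."""
    return sum(b_r * (rho * phi) ** (j - 1) for j in range(1, k + 1))


# Illustrative (assumed) values only.
b_r, rho, phi = 0.10, 0.96, 0.94

for k in (1, 5, 15):
    print(k, round(long_run_coef(b_r, rho, phi, k), 3))

# As k grows, the coefficient approaches b_r / (1 - rho*phi).
print("limit", round(b_r / (1.0 - rho * phi), 3))
```

The closed form and the explicit sum agree for any horizon, which is a quick check that the geometric-series step has been coded correctly.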

To calculate long-run regression coefficients as in Table IV, with z = cay, I write the VAR as

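$$\begin{bmatrix} dp_{t+1} \\ z_{t+1} \end{bmatrix} = \Phi \begin{bmatrix} dp_t \\ z_t \end{bmatrix} + \begin{bmatrix} \varepsilon^{dp}_{t+1} \\ \varepsilon^{z}_{t+1} \end{bmatrix}, \qquad \begin{bmatrix} r_{t+1} \\ \Delta d_{t+1} \end{bmatrix} = B \begin{bmatrix} dp_t \\ z_t \end{bmatrix} + \begin{bmatrix} \varepsilon^{r}_{t+1} \\ \varepsilon^{d}_{t+1} \end{bmatrix}.$$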

I then report

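$$E_t \begin{bmatrix} r^{lr}_t \\ \Delta d^{lr}_t \end{bmatrix} = B \sum_{j=1}^{\infty} \rho^{j-1} \Phi^{j-1} \begin{bmatrix} dp_t \\ z_t \end{bmatrix} = B (I - \rho\Phi)^{-1} \begin{bmatrix} dp_t \\ z_t \end{bmatrix}.$$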
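
The matrix version can be sketched the same way. The code below computes $B(I - \rho\Phi)^{-1}$ and compares it with a long truncated sum of $\rho^{j-1}\Phi^{j-1}$. The entries of `Phi` and `B` are hypothetical numbers chosen only so that the series converges; they are not the estimates behind Table IV.

```python
import numpy as np


def long_run_loadings(B, Phi, rho):
    """Long-run forecast loadings B (I - rho*Phi)^{-1}."""
    n = Phi.shape[0]
    return B @ np.linalg.inv(np.eye(n) - rho * Phi)


def long_run_loadings_by_sum(B, Phi, rho, terms=2000):
    """Same loadings from the truncated sum of (rho*Phi)^(j-1), j = 1..terms."""
    n = Phi.shape[0]
    total = np.zeros((n, n))
    power = np.eye(n)  # holds (rho*Phi)^(j-1)
    for _ in range(terms):
        total += power
        power = rho * (power @ Phi)
    return B @ total


# Hypothetical values: rows of Phi forecast (dp, z); rows of B forecast (r, delta d).
rho = 0.96
Phi = np.array([[0.94, 0.10],
                [0.00, 0.60]])
B = np.array([[0.10, 0.50],
              [0.00, 0.20]])

print(long_run_loadings(B, Phi, rho))
```

The inverse exists and the sum converges whenever all eigenvalues of $\rho\Phi$ lie inside the unit circle, which holds for the assumed values here.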

B. Asset Pricing as a Function of Characteristics

B.1. Portfolio Spreads

In the text, I related 1–10 portfolio means to Sharpe ratios of underlying factors. Here is the result. Suppose that expected returns rise with a characteristic $C_i$ (for example, log book-to-market ratio)

$$E(R^{ei}) = a + b \times C_i,$$

and this variation corresponds exactly to a factor (for example, hml),

$$R^{ei}_t = \beta_i \times f_t + \varepsilon^{i}_t,$$

with betas that also rise with the characteristic

$$\beta_i = \frac{a}{E(f)} + \frac{b}{E(f)} \times C_i,$$

and with residuals that are uncorrelated

$$\mathrm{cov}(\varepsilon^{i}, \varepsilon^{j}) = 0.$$

Now, consider the usual 1–10 or 1–20 spread portfolio. Its mean and variance are

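$$E(R^{ei} - R^{ej}) = b\,(C_i - C_j),$$
$$\sigma^2(R^{ei} - R^{ej}) = (\beta_i - \beta_j)^2 \sigma(f)^2 + \frac{2\sigma_\varepsilon^2}{N} = \frac{b^2}{E(f)^2} (C_i - C_j)^2 \sigma(f)^2 + \frac{2\sigma_\varepsilon^2}{N},$$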

where N is the number of securities in each portfolio. The Sharpe ratio, which is proportional to the t-statistic for the mean spread-portfolio return, is

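$$\frac{E(R^{ei} - R^{ej})}{\sigma(R^{ei} - R^{ej})} = \frac{E(f)}{\sigma(f)} \, \frac{b\,(C_i - C_j)}{\sqrt{b^2 (C_i - C_j)^2 + \dfrac{2\sigma_\varepsilon^2}{N} \dfrac{E(f)^2}{\sigma(f)^2}}}.$$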

This Sharpe ratio rises as we look at further-separated portfolios. As $C_i - C_j$ increases, it approaches the pure Sharpe ratio of the factor $E(f)/\sigma(f)$. The Sharpe ratio does not increase forever. Splitting into finer portfolios can get the magic 1% per month portfolio mean or alpha, but cannot arbitrarily raise Sharpe ratios or t-statistics. Splitting the portfolio more finely reduces N, so splits that are too fine end up reducing the Sharpe ratio and t-statistic by including too much idiosyncratic risk.
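
A small numerical sketch may help fix ideas. The function below evaluates the spread-portfolio Sharpe ratio from the mean and variance formulas above. All parameter values (the slope `b`, factor mean `Ef`, factor volatility `sigma_f`, residual volatility `sigma_eps`, and portfolio size `N`) are made-up illustrative numbers, not estimates.

```python
import math


def spread_sharpe(b, dC, sigma_eps, N, Ef, sigma_f):
    """Sharpe ratio of a long-short portfolio with characteristic spread dC = C_i - C_j."""
    mean = b * dC
    variance = (b / Ef) ** 2 * dC ** 2 * sigma_f ** 2 + 2.0 * sigma_eps ** 2 / N
    return mean / math.sqrt(variance)


# Illustrative (assumed) monthly values, in percent.
b, Ef, sigma_f, sigma_eps = 0.5, 0.4, 3.0, 10.0
factor_sharpe = Ef / sigma_f

# Wider spreads move the Sharpe ratio toward the factor's Sharpe ratio.
for dC in (0.5, 1.0, 2.0, 4.0):
    print("dC", dC, round(spread_sharpe(b, dC, sigma_eps, 100, Ef, sigma_f), 4))

# Fewer securities per portfolio add idiosyncratic risk and lower the Sharpe ratio.
for N in (100, 20, 5):
    print("N", N, round(spread_sharpe(b, 1.0, sigma_eps, N, Ef, sigma_f), 4))

print("factor", round(factor_sharpe, 4))
```

In practice the two effects interact: a finer sort raises the spread in the characteristic but lowers N, so the net effect on the Sharpe ratio depends on which term dominates.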

Having seen this analysis, however, it would seem more efficient to use the information in all assets, not just the tail portfolios, by examining the cross-sectional regression coefficient $\hat{b}$. Since $\hat{b} = \mathrm{cov}(E(R^i), C_i)/\mathrm{var}(C_i) = E(R^i \times [C_i - E(C_i)])/\mathrm{var}(C_i)$, this regression coefficient is the mean of a factor that is formed as a linear function of the characteristic.
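
The equivalence between the cross-sectional slope and the mean of a characteristic-weighted factor can be checked on simulated data. In this sketch the returns, the characteristic grid, and the random seed are all assumptions made for illustration. The factor puts weight proportional to $C_i - E(C_i)$ on each asset, scaled by the cross-sectional sum of squares.

```python
import numpy as np

rng = np.random.default_rng(0)
T, N = 600, 25

# Hypothetical characteristic and simulated excess returns (percent per month).
C = np.linspace(-1.0, 1.0, N)
R = 0.5 + 0.3 * C + rng.normal(0.0, 5.0, size=(T, N))

# Cross-sectional OLS slope of average returns on the characteristic.
mean_R = R.mean(axis=0)
b_hat = np.polyfit(C, mean_R, 1)[0]

# Factor formed as a linear function of the characteristic (weights sum to zero).
weights = (C - C.mean()) / np.sum((C - C.mean()) ** 2)
factor = R @ weights  # one factor return per period

print(b_hat, factor.mean())
```

The two printed numbers coincide up to floating-point error, because the time-series mean of the factor is the weighted sum of average returns, which is the OLS slope formula.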

B.2. Value, Betas, and Samples

In the text, I emphasized that all puzzles are joint puzzles of expected returns and betas, and cautioned that the value puzzle does not hold in pre-1963 U.S. data. Figure A1 presents the CAPM in the Fama–French 10 book-to-market portfolios before and after 1963. In the left-hand panel, you see the familiar failure of the CAPM—average returns are higher in the value portfolio, but there is no association between the wide spread in average returns and market betas. The right-hand panel shows average returns and betas before 1963. Here the CAPM works remarkably well. The big change is not in the pattern of average returns. Value still earns more than growth. The big change is betas: Value firms have higher betas than growth firms in the pre-1963 period.68

Figure A1.

Value effect before and after 1963. Average returns on Fama–French 10 portfolios sorted by book-to-market equity versus CAPM betas. Monthly data. Source: Ken French's website.

B.3. Time Series and Cross-Section

As a first step toward understanding mean returns as a function of characteristics, and to help make the ideas concrete, Table AIII presents regressions using the Fama–French 25 size and book-to-market portfolios. I use log book-to-market and log size relative to the market portfolio.

Table AIII. Characteristic Regressions
The regression specification in the first row is
$$E(R^{ei}_{t+1}) = a + b \times E(size_{it}) + c \times E(bm_{it}) + \varepsilon_i; \quad i = 1, 2, \ldots, 25.$$
The remaining rows are
$$R^{ei}_{t+1} = a + (a_t) + (a_i) + b \times size_{i,t} + c \times bm_{i,t} + d \times (size_{i,t} - size_{i,t-12}) + e \times (bm_{i,t} - bm_{i,t-12}).$$
Terms in parentheses only appear in some regressions. size is log(market equity/total market equity). bm is log(book equity/market equity). Monthly data, 1947–2009. Data from Ken French's website.
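| Method | $size_t$ | $bm_t$ | $\Delta size_t$ | $\Delta bm_t$ |
| --- | --- | --- | --- | --- |
| 1. Cross-section | −0.030 | 0.27 | | |
| 2. Pooled | −0.022 | 0.55 | | |
| 3. Time dummies | −0.031 | 0.29 | | |
| 4. Portfolio dummies | −0.087 | 1.48 | | |
| 5. Pooled | −0.030 | 0.46 | −0.38 | 1.11 |

All columns are regression coefficients.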

The first row of Table AIII gives a pure cross-sectional regression. The regression fits the portfolio average returns quite well, with a 77% $R^2$. (One does better still with a size × bm cross-term, allowing the growth portfolios to have a different slope on size than the value portfolios.)

The second row of Table AIII gives a pooled forecasting regression, which is the most natural way to integrate time-series and cross-sectional approaches. The size coefficient is a little smaller, and the bm coefficient is much larger.

To diagnose the difference between the cross-sectional and pooled regressions, rows 3 and 4 present a regression with time dummies and a regression with portfolio dummies respectively. Variation over time in a given portfolio's book-to-market ratio is a much stronger signal of return variation than the same variation across portfolios in average book-to-market ratio.
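
The mechanics of these four regressions can be sketched on a simulated panel. Including time dummies is equivalent to demeaning both sides within each date, and including portfolio dummies is equivalent to demeaning within each portfolio. Everything below is an assumption made for illustration: the data are simulated, with a permanent cross-sectional component `mu`, a common time component `g`, and a small idiosyncratic component `u`, and the time-varying part is given a larger return slope by construction.

```python
import numpy as np


def slope(x, y):
    """OLS slope of y on x, with an intercept."""
    xd = x - x.mean()
    return float((xd * (y - y.mean())).sum() / (xd * xd).sum())


rng = np.random.default_rng(1)
T, N = 720, 25

mu = np.linspace(-1.0, 1.0, N)            # permanent differences across portfolios
g = rng.normal(0.0, 0.5, size=(T, 1))     # common movement over time
u = rng.normal(0.0, 0.1, size=(T, N))     # idiosyncratic movement
x = mu + g + u                            # simulated characteristic, e.g. log bm
y = 0.3 * mu + 1.5 * (g + u) + rng.normal(0.0, 5.0, size=(T, N))  # next-period return

cross_section = slope(x.mean(axis=0), y.mean(axis=0))
pooled = slope(x.ravel(), y.ravel())
time_dummies = slope((x - x.mean(axis=1, keepdims=True)).ravel(),
                     (y - y.mean(axis=1, keepdims=True)).ravel())
portfolio_dummies = slope((x - x.mean(axis=0)).ravel(),
                          (y - y.mean(axis=0)).ravel())

print(cross_section, pooled, time_dummies, portfolio_dummies)
```

In this constructed example the portfolio-dummy slope isolates the time-series signal, the cross-sectional and time-dummy slopes mostly reflect permanent differences across portfolios, and the pooled slope falls in between.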

When we run such regressions for individual firms, we cannot use dummies, since the average return of a specific company over the whole sample is meaningless. The goal of this regression is to mirror portfolio formation and remove firm name completely from the list of characteristics. The last line of Table AIII gives a way to capture the difference between time-series and cross-sectional approaches without dummies: It allows an independent effect of recent changes in the characteristics. This specification accounts quite well for the otherwise unpalatable time and portfolio dummies. The portfolio dummy regression coefficient that captures time-series variation is quite similar to the sum of the level and recent-change coefficients. It is also gratifyingly similar to the “recent-change” effect in aggregate dividend yield regressions of Table AII. One could of course capture the same phenomenon with portfolios, by sorting based on level and recent change of characteristics. But my goal is to explore the other direction of this equivalence.

Next, we want to run regressions like this on individual data, and find a similar characterization of the covariance matrix as a function of characteristics. Then, we can expand the analysis to multiple right-hand variables.

B.4. Prices in the Cross-Section

Section II.C of the text suggested that we try to understand the variation in prices (price-dividend ratios) across time and portfolios by exploring long-run return predictability in the cross-section. How much of the difference between one asset's price-dividend ratio (or price earnings, book to market, etc.) and another's is due to variation in expected returns, and how much is due to expected dividend growth or other cashflow expectations?69

To explore this question and clarify the idea, I examine the 10 Fama–French book-to-market portfolios. Eventually, we want to do this analysis in individual security data and avoid the use of portfolios altogether, but the portfolios are a simple place to start. Figure A2 presents the average return, dividend growth, and dividend yield of the portfolios.

Figure A2.

Components of Average Returns. Average return $r_{t+1}$, dividend growth $\Delta d_{t+1}$, and dividend yield $dp_t$ for the Fama–French 10 book-to-market portfolios, 1947–2009. The dashed $\Delta d$ line gives mean dividend growth implied by the approximate identity $\Delta d_{t+1} = r_{t+1} - \kappa + \rho\, dp_{t+1} - dp_t$.

Over long horizons, dividend yields are stationary so long-term average returns come from dividend yields and dividend growth. Taking unconditional means of the return identity (A2), and imposing stationarity so $E(dp_t) = E(dp_{t+1})$,

$$E(r^i) = (1 - \rho)\, E(dp^i) + E(\Delta d^i). \qquad \text{(A8)}$$

Figure A2 shows that value portfolio returns come roughly half from greater dividend growth and half from a larger average dividend yield.

Our objective is to produce variance decompositions over time and across securities as I did with the market return. Flipping (A8) around, we have

$$E(dp^i) = \frac{1}{1 - \rho} \left[ E(r^i) - E(\Delta d^i) \right]. \qquad \text{(A9)}$$

The same observation about the sources of return gives a fairly extreme version of the usual surprising result about prices. Low prices—high dividend yields—correspond to high dividend growth and thus to even higher returns.70

The first column of Table AIV, Panel A expresses the same idea in a purely cross-sectional regression. From (A9), the coefficients in such a regression obey

$$1 = \frac{b_r^{cs}}{1 - \rho} - \frac{b_d^{cs}}{1 - \rho}, \qquad \text{(A10)}$$

where the b are the cross-sectional regression coefficients of the terms in (A9). We can interpret these coefficients as the fraction of cross-sectional dividend yield variation driven by discount rates and the fraction driven by dividend growth. (Vuolteenaho (2002) uses a different present value identity to understand variation in the book-to-market ratio directly, rather than use dividend yields as I have. This is a better procedure for individual stocks, which often do not pay dividends. I use dividend yields here for simplicity.) The results are quite similar to the time-series regressions for the market portfolio from Tables II to IV: More than 100% of the cross-sectional variation in average dividend yields of these portfolios comes from cross-sectional variation in expected returns (1.33). Expected dividend growth goes “the wrong way”: low prices correspond to high dividend growth, as seen in Figure A2. (Sample means obey the identity

$$E(dp^i_t) = \frac{1}{1 - \rho} \left[ E(r^i_{t+1}) - E(\Delta d^i_{t+1}) + \rho\, \frac{1}{T} \left( dp^i_T - dp^i_1 \right) \right].$$

The last term is not zero, which is why the coefficients in the $b/(1-\rho)$ column do not add up following (A10).)

Table AIV. Long-Run Panel-Data Regressions
The regression specification in the first column is
$$E(R^{ei}_{t+1}) = a + b \times E(dp_{it}) + \varepsilon_i; \quad i = 1, 2, \ldots, 10,$$
and similarly for Δd. The specification in the remaining columns is
$$R^{ei}_{t+1} = a + (a_t) + (a_i) + b \times dp_{i,t} + (c \times \Delta dp_{i,t}) + \varepsilon_{i,t+1},$$
and similarly for $\Delta d$ and $dp$. $\phi$ represents the coefficient of $dp^i_{t+1}$ on $dp^i_t$ in the same regression. Terms in parentheses only appear in some regressions. Annual data on 10 Fama–French size and book-to-market sorted portfolios, 1947–2009. Data from Ken French's website. $\Delta d_{t+1} = r_{t+1} - \kappa + \rho\, dp_{t+1} - dp_t$. I use $\rho = 0.96$.
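| Left-Hand Variable | Cross-Section: $b$ | Cross-Section: $b/(1-\rho)$ | Portfolio Dummies: $b$ | Portfolio Dummies: $b/(1-\rho\phi)$ | Time Dummies: $b$ | Time Dummies: $b/(1-\rho\phi)$ | Pooled: $b$ | Pooled: $b/(1-\rho\phi)$ | Pooled: $dp$ | Pooled: $\Delta dp$ |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Panel A: Book-to-Market Portfolios | | | | | | | | | | |
| $r$ | 0.053 | 1.33 | 0.107 | 0.90 | 0.044 | 0.33 | 0.095 | 0.97 | 0.090 | 0.074 |
| $\Delta d$ | 0.026 | 0.64 | −0.011 | −0.10 | −0.092 | −0.68 | −0.003 | −0.03 | −0.012 | 0.076 |
| $dp$ | | | 0.92 | | 0.90 | | 0.94 | | 0.94 | 0.002 |
| Panel B: Size Portfolios | | | | | | | | | | |
| $r$ | −0.014 | −0.36 | 0.077 | 1.02 | 0.023 | 0.27 | 0.067 | 0.95 | | |
| $\Delta d$ | −0.048 | −1.20 | 0.002 | 0.02 | −0.063 | −0.73 | −0.004 | −0.05 | | |
| $dp$ | | | 0.963 | | 0.952 | | 0.968 | | | |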
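
As a rough check on how the long-run columns relate to the one-period coefficients, the sketch below recomputes $b/(1-\rho)$ and $b/(1-\rho\phi)$ from the Panel A entries, using $\rho = 0.96$ from the caption. The dictionary layout and function name are my own scaffolding, and the inputs are the rounded table values, so the recomputed figures can differ from the printed ones in the second decimal.

```python
RHO = 0.96  # value stated in the Table AIV caption

# (b_r, b_d, phi) read from Table AIV, Panel A; phi is None for the pure cross-section.
PANEL_A = {
    "cross_section": (0.053, 0.026, None),
    "portfolio_dummies": (0.107, -0.011, 0.92),
    "time_dummies": (0.044, -0.092, 0.90),
    "pooled": (0.095, -0.003, 0.94),
}


def long_run_shares(b_r, b_d, phi, rho=RHO):
    """Return the long-run return and dividend-growth coefficients."""
    denom = 1.0 - rho if phi is None else 1.0 - rho * phi
    return b_r / denom, b_d / denom


for name, (b_r, b_d, phi) in PANEL_A.items():
    r_share, d_share = long_run_shares(b_r, b_d, phi)
    print(f"{name:18s} r: {r_share:5.2f}  d: {d_share:5.2f}  r - d: {r_share - d_share:5.2f}")
```

For the three panel regressions the return share minus the dividend-growth share comes out close to one, as the present value identity requires. For the pure cross-section it does not, consistent with the nonzero end-point term in the sample-mean identity.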

We can, of course, ask how much of the time variation in these dividend yields around their portfolio average corresponds to return versus dividend growth forecasts. A regression that includes portfolio dummies, shown next in Table AIV, Panel A, addresses this question. The 0.11 return-forecasting coefficient for portfolios is almost the same as the return-forecasting coefficient for the market as a whole seen in Tables II–IV. The dividend growth forecast is also nearly zero. So all variation in book-to-market sorted portfolio dividend yields over time, about portfolio means, corresponds to variation in expected returns, much like that of market returns.

The regression with time dummies, next in Table AIV, Panel A, paints a different picture. The return coefficient is smaller at 0.044, and ϕ is smaller as well, so expected returns only account for 33% of the variation in dividend yields. We finally see an important dividend growth forecast, with the right sign, −0.09, accounting for 68% of dividend yield volatility. The strong contrast of this result with the pure cross-sectional regression means that a time of unusually large cross-sectional dispersion in dividend yields corresponds to unusually high dispersion in dividend growth forecasts.

This is an important regression, in that it shows a component of variation in valuations that does correspond to dividend growth forecasts. The unusual dispersion in dividend growth forecasts adds up to zero, so this kind of variation cannot be seen in the aggregate dividend yield and its forecasting relations. There is variation in forecastable dividend growth, which drives some individual variation in dividend-price ratios. But it averages out across all securities, so that the aggregate dividend yield is driven primarily by expected returns.

A pooled regression with no dummies looks much like the time-series regression with portfolio dummies. There is more time variation in dividend yields than cross-sectional variation, so, adding them up evenly, the time variation dominates the pooled regression.

The last column of Table AIV, Panel A follows Table AIII, to try to unite time-series and cross-sectional variation without using dummies. It shows a very similar result, with the Δdp variable accounting for much of the dividend growth forecastability. The next step is to calculate the price implications of this multivariate regression, as I did with cay, but that takes us too far afield of this simple example.

The Fama–French size portfolios, shown in Table AIV, Panel B, paint a quite different picture. The pure cross-sectional regression (first column) shows cashflow effects: Higher prices (low dividend yields) are associated with higher subsequent dividend growth, which by one measure fully accounts for the dividend yield variation! However, with portfolio dummies we again see that practically all dividend yield variation over time for a given portfolio comes from expected return variation, just as for the market as a whole. With time dummies, variation across portfolios in a given time period is split between return and dividend growth forecasts.
