#### A. Empirical Approach

Theory and historical anecdote both suggest that sentiment may cause systematic patterns of mispricing. Because mispricing is hard to identify directly, however, our approach is to look for systematic patterns of mispricing *correction*. For example, a pattern in which returns on young and unprofitable growth firms are (on average) especially low when beginning-of-period sentiment is estimated to be high may represent the correction of a bubble in growth stocks.

Specifically, to identify sentiment-driven changes in cross-sectional predictability patterns, we need to control for two more basic effects, namely, the generic impact of investor sentiment on all stocks and the generic impact of characteristics across all time periods. Thus, we organize our analysis loosely around the following predictive specification:

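$$E_{t-1}[R_{it}] = a + a_1 T_{t-1} + \mathbf{b}_1' \mathbf{x}_{it-1} + \mathbf{b}_2' T_{t-1} \mathbf{x}_{it-1} \tag{1}$$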

where *i* indexes firms, *t* denotes time, **x** is a vector of characteristics, and *T* is a proxy for sentiment. The coefficient *a*_{1} picks up the generic effect of sentiment, and the vector **b**_{1} the generic effect of characteristics. Our interest centers on **b**_{2}. The null is that **b**_{2} equals zero or, more precisely, that any nonzero effect is rational compensation for systematic risk. The alternative is that **b**_{2} is nonzero and reveals cross-sectional patterns in sentiment-driven mispricing. We call Equation (1) a “conditional characteristics model” because it adds conditional terms to the characteristics model of Daniel and Titman (1997).
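For illustration only, one simple way to take Equation (1) to data is a pooled OLS regression of returns on a constant, lagged sentiment, lagged characteristics, and their interactions. The sketch below is a minimal sketch under stated assumptions: it uses a single characteristic, a simulated panel with a hypothetical true **b**_{2} of −0.5, and plain OLS without the inference adjustments a real study would need. It is not the estimation procedure used in this paper.

```python
import numpy as np


def fit_conditional_characteristics(r, sent_lag, x_lag):
    """Pooled OLS estimate of R_it = a + a1*T_{t-1} + b1'x_{it-1} + b2'(T_{t-1} x_{it-1}) + e.

    r:        (n,) returns R_it
    sent_lag: (n,) sentiment proxy T_{t-1}, repeated for every firm in month t
    x_lag:    (n, k) firm characteristics x_{it-1}
    Returns (a, a1, b1, b2), with b1 and b2 of length k.
    """
    n, k = x_lag.shape
    design = np.column_stack([np.ones(n), sent_lag, x_lag, sent_lag[:, None] * x_lag])
    coef, *_ = np.linalg.lstsq(design, r, rcond=None)
    return coef[0], coef[1], coef[2:2 + k], coef[2 + k:]


# Hypothetical simulated panel: 200 months x 500 firms, one characteristic.
rng = np.random.default_rng(0)
months, firms = 200, 500
sent = np.repeat(rng.standard_normal(months), firms)  # T_{t-1}, common to all firms in a month
x = rng.standard_normal((months * firms, 1))          # a standardized characteristic
true_b2 = -0.5                                         # assumed sentiment-driven effect
r = (0.1 + 0.2 * sent + 0.3 * x[:, 0] + true_b2 * sent * x[:, 0]
     + rng.standard_normal(months * firms))

a, a1, b1, b2 = fit_conditional_characteristics(r, sent, x)
print(f"b2 estimate: {b2[0]:.3f}")  # should be close to -0.5
```

The coefficient on the interaction column estimates **b**_{2}; a negative value means that high-*x* firms earn lower subsequent returns after high-sentiment periods, the kind of pattern the alternative hypothesis predicts.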

#### B. Characteristics and Returns

The firm-level data are from the merged CRSP-Compustat database. The sample includes all common stocks (share codes 10 and 11) from 1962 through 2001. Following Fama and French (1992), we match accounting data for fiscal year-ends in calendar year *t*−1 to (monthly) returns from July *t* through June *t*+1, and we use their variable definitions when possible.

Table I shows summary statistics. Panel A summarizes the returns variables. Following common practice, we define momentum, *MOM*, as the cumulative raw return over the 11 months from 12 through 2 months prior to the month of the observed return. Because historical anecdote does not single out momentum as a salient characteristic, and theory does not suggest a direct connection between momentum and the difficulty of valuation or arbitrage, we use momentum merely as a control variable, to check that our results are distinct from known mispricing patterns.
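The momentum window is easy to get off by one month. A minimal sketch, assuming a list of simple monthly returns indexed by month with no missing values (the text does not say how gaps are handled), compounds months *t*−12 through *t*−2 and skips month *t*−1:

```python
from math import prod


def momentum(monthly_returns, t):
    """MOM for the return in month t: compounded raw return over months t-12 .. t-2.

    monthly_returns: simple monthly returns (0.05 means 5%), indexed by month.
    Month t-1 is skipped, so the window covers 11 months.
    """
    if t < 12:
        raise ValueError("need 12 months of history before month t")
    window = monthly_returns[t - 12:t - 1]  # months t-12 through t-2 inclusive
    return prod(1 + r for r in window) - 1


rets = [0.01] * 11 + [0.50]  # months 0-10 earn 1%; month 11 (t-1) earns 50%
print(momentum(rets, 12))    # the 50% month is excluded from the window
```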

Table I. **Summary Statistics, 1963–2001** Panel A summarizes the returns variables. Returns are measured monthly. Momentum (*MOM*) is defined as the cumulative return for the 11-month period between 12 and 2 months prior to *t*. Panel B summarizes the size, age, and risk characteristics. Size is the log of market equity. Market equity (*ME*) is price times shares outstanding from CRSP in the June prior to *t*. Age is the number of years between the firm's first appearance on CRSP and *t*. Total risk (σ) is the standard deviation of monthly CRSP returns over the 12 months ending in the June prior to *t*. Panel C summarizes profitability variables. The earnings-book equity ratio equals *E*/*BE* for firms with positive earnings and zero otherwise. Earnings (*E*) is defined as income before extraordinary items (Item 18) plus income statement deferred taxes (Item 50) minus preferred dividends (Item 19). Book equity (*BE*) is defined as shareholders' equity (Item 60) plus balance sheet deferred taxes (Item 35). We also report an indicator variable equal to one for firms with positive earnings. Panel D reports dividend variables. Dividends (*D*) are equal to dividends per share at the ex date (Item 26) times shares outstanding (Item 25). We scale dividends by book equity and report an indicator variable equal to one for firms with positive dividends. Panel E shows tangibility measures. Property, plant, and equipment (Item 7) and research and development (Item 46) are scaled by assets. We record research and development only from 1972 onward, when it is widely available; in that period, a missing value is set to zero. Panel F reports variables used as proxies for growth opportunities and distress. The book-to-market ratio is the log of the ratio of book equity to market equity. External finance (*EF*) is equal to the change in assets (Item 6) less the change in retained earnings (Item 36). When the change in retained earnings is not available, we use net income (Item 172) less common dividends (Item 21) instead. Sales growth is the percentage change in net sales (Item 12), and the sales growth decile is formed using NYSE breakpoints. In Panels C through F, accounting data from the fiscal year ending in *t*−1 are matched to monthly returns from July of year *t* through June of year *t*+1. All variables are Winsorized at the 0.5th and 99.5th percentiles. The first five numeric columns describe the full sample; the last five report subsample means by decade.

| Variable | *N* | Mean | SD | Min | Max | 1960s | 1970s | 1980s | 1990s | 2000–01 |
|---|---|---|---|---|---|---|---|---|---|---|
| *Panel A: Returns* | | | | | | | | | | |
| *R*_{t} (%) | 1,600,383 | 1.39 | 18.11 | −98.13 | 2,400.00 | 1.08 | 1.56 | 1.25 | 1.46 | 1.28 |
| *MOM*_{t−1} (%) | 1,600,383 | 13.67 | 58.13 | −85.56 | 343.90 | 21.62 | 12.24 | 15.02 | 13.06 | 11.02 |
| *Panel B: Size, Age, and Risk* | | | | | | | | | | |
| *ME*_{t−1} ($M) | 1,600,383 | 621 | 2,319 | 1 | 23,302 | 388 | 238 | 395 | 862 | 1,438 |
| *Age*_{t} (Years) | 1,600,383 | 13.36 | 13.41 | 0.03 | 68.42 | 15.90 | 12.62 | 13.61 | 13.26 | 13.47 |
| σ_{t−1} (%) | 1,574,981 | 13.70 | 8.73 | 0.00 | 60.77 | 9.44 | 12.51 | 13.32 | 13.89 | 19.55 |
| *Panel C: Profitability* | | | | | | | | | | |
| *E*+/*BE*_{t−1} (%) | 1,600,383 | 10.70 | 10.03 | 0.00 | 65.14 | 12.10 | 12.05 | 11.37 | 9.54 | 9.49 |
| *E* > 0_{t−1} | 1,600,383 | 0.78 | 0.41 | 0.00 | 1.00 | 0.95 | 0.91 | 0.78 | 0.71 | 0.68 |
| *Panel D: Dividend Policy* | | | | | | | | | | |
| *D*/*BE*_{t−1} (%) | 1,600,383 | 2.08 | 2.98 | 0.00 | 17.94 | 4.42 | 2.75 | 2.11 | 1.58 | 1.43 |
| *D* > 0_{t−1} | 1,600,383 | 0.48 | 0.50 | 0.00 | 1.00 | 0.77 | 0.66 | 0.50 | 0.37 | 0.33 |
| *Panel E: Tangibility* | | | | | | | | | | |
| *PPE*/*A*_{t−1} (%) | 1,476,109 | 54.66 | 37.15 | 0.00 | 187.69 | 70.21 | 59.14 | 55.49 | 51.28 | 45.49 |
| *RD*/*A*_{t−1} (%) | 1,452,840 | 2.97 | 7.27 | 0.00 | 54.75 | | 1.22 | 2.29 | 3.86 | 4.68 |
| *Panel F: Growth Opportunities and Distress* | | | | | | | | | | |
| *BE*/*ME*_{t−1} | 1,600,383 | 0.94 | 0.86 | 0.02 | 5.90 | 0.70 | 1.37 | 0.95 | 0.76 | 0.82 |
| *EF*/*A*_{t−1} (%) | 1,549,817 | 11.44 | 24.24 | −71.23 | 127.30 | 7.17 | 6.45 | 10.59 | 13.97 | 17.71 |
| *GS*_{t−1} (Decile) | 1,529,508 | 5.94 | 3.16 | 1.00 | 10.00 | 5.67 | 5.66 | 6.01 | 6.08 | 5.91 |

The remaining panels summarize the firm and security characteristics that we consider. The previous sections' discussions point us directly to several variables. To that list, we add a few more characteristics that, by introspection, seem likely to be salient to investors. Overall, we roughly group characteristics as pertaining to firm size and age, profitability, dividends, asset tangibility, and growth opportunities and/or distress.

Size, age, and risk characteristics include market equity, *ME*, from June of year *t*, measured as price times shares outstanding from CRSP. We match *ME* to monthly returns from July of year *t* through June of year *t*+1. Firm age, *Age*, is the number of years since the firm's first appearance on CRSP, measured to the nearest month,^{7} and *Sigma* is the standard deviation of monthly returns over the 12 months ending in June of year *t*. Provided that at least nine returns are available to estimate it, *Sigma* is then matched to monthly returns from July of year *t* through June of year *t*+1. While historical anecdote does not identify stock volatility itself as a salient characteristic, prior work argues that it is likely to be a good proxy for the difficulty of both valuation and arbitrage.
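A minimal sketch of the *Sigma* availability filter and the *Age* measure follows. It assumes the 12 monthly returns from July of year *t*−1 through June of year *t* arrive as a list with `None` marking missing months, uses the sample (n−1) standard deviation, and counts months as integers; these conventions are ours, not necessarily the authors'.

```python
from statistics import stdev


def sigma(last_12_returns, min_obs=9):
    """Std. dev. of monthly returns over the 12 months ending in June of year t.

    last_12_returns: monthly returns, July of t-1 through June of t; None = missing.
    Returns None when fewer than min_obs returns are available.
    """
    available = [r for r in last_12_returns if r is not None]
    if len(available) < min_obs:
        return None
    return stdev(available)  # sample standard deviation (n - 1 denominator), an assumption


def age_in_years(first_crsp_month, current_month):
    """Years since the firm's first CRSP month, counted in whole months.

    Months are integers such as year * 12 + (calendar_month - 1).
    """
    return (current_month - first_crsp_month) / 12
```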

Profitability characteristics include the return on equity, *E+/BE*, which equals earnings over book equity for profitable firms and zero for unprofitable firms. Earnings (*E*) is income before extraordinary items (Item 18) plus income statement deferred taxes (Item 50) minus preferred dividends (Item 19), and *E+* equals *E* when earnings are positive and zero otherwise; book equity (*BE*) is shareholders' equity (Item 60) plus balance sheet deferred taxes (Item 35). The profitability dummy variable *E* > 0 takes the value one for profitable firms and zero for unprofitable firms.
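The profitability variables reduce to a few lines of arithmetic on the Compustat items listed above. In this sketch the argument names are hypothetical labels for those items, and returning `None` when book equity is not positive is an assumption, since the text does not say how such firms are handled.

```python
def profitability(income_before_extraordinary, deferred_taxes_income_stmt,
                  preferred_dividends, shareholders_equity, deferred_taxes_balance_sheet):
    """Return (E+/BE, E>0 dummy) for one firm-year.

    Arguments map to Compustat Items 18, 50, 19, 60, and 35, in that order.
    Returns (None, dummy) if book equity is not positive (our assumption).
    """
    earnings = income_before_extraordinary + deferred_taxes_income_stmt - preferred_dividends
    book_equity = shareholders_equity + deferred_taxes_balance_sheet
    positive = int(earnings > 0)
    if book_equity <= 0:
        return None, positive
    e_plus = earnings if earnings > 0 else 0.0  # E+ is zero for unprofitable firms
    return e_plus / book_equity, positive
```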

Dividend characteristics include dividends to equity, *D/BE*, which is dividends per share at the ex date (Item 26) times Compustat shares outstanding (Item 25), divided by book equity. The dividend payer dummy *D* > 0 takes the value one for firms with positive dividends per share by the ex date. The decline in the percentage of dividend payers noted by Fama and French (2001) is apparent in Table I: the mean of *D* > 0 falls from 0.77 in the 1960s to 0.33 in 2000–01.
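The dividend variables follow the same pattern. This sketch takes dividends per share at the ex date and Compustat shares outstanding as plain numbers (hypothetical argument names) and assumes book equity has already been computed and is positive:

```python
def dividend_vars(dps_ex_date, shares_outstanding, book_equity):
    """Return (D/BE, D>0 dummy) for one firm-year.

    dps_ex_date: dividends per share by the ex date (Item 26)
    shares_outstanding: Compustat shares outstanding (Item 25)
    book_equity: BE as defined for the profitability variables (assumed > 0)
    """
    dividends = dps_ex_date * shares_outstanding
    return dividends / book_equity, int(dps_ex_date > 0)
```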

The referee suggests that asset tangibility may proxy for the difficulty of valuation. Asset tangibility characteristics are measured by property, plant, and equipment (Item 7) over assets, *PPE/A*, and research and development expense (Item 46) over assets, *RD/A*. One concern is the coverage of the R&D variable. We do not consider this variable prior to 1972, because the Financial Accounting Standards Board did not require R&D to be expensed until 1974 and Compustat coverage prior to 1972 is very poor. Also, even in recent years, less than half of the sample reports positive R&D.
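The coverage rule for R&D is the one non-obvious step in building the tangibility variables. A minimal sketch, with hypothetical function names, that drops R&D before 1972 and treats a missing value as zero from 1972 on, following the note to Table I:

```python
def ppe_to_assets(ppe, total_assets):
    """PPE/A: property, plant, and equipment (Item 7) over assets."""
    return ppe / total_assets


def rd_to_assets(rd_expense, total_assets, fiscal_year):
    """RD/A: R&D expense (Item 46) over assets, with the coverage rule from the text.

    Before 1972 the variable is not used (returns None); from 1972 on,
    a missing R&D value (None) is treated as zero.
    """
    if fiscal_year < 1972:
        return None
    return (rd_expense if rd_expense is not None else 0.0) / total_assets
```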

Characteristics indicating growth opportunities, distress, or both include book-to-market equity, *BE/ME*, whose elements are defined above. External finance, *EF/A*, is the change in assets (Item 6) minus the change in retained earnings (Item 36), divided by assets. Sales growth (*GS*) is the change in net sales (Item 12) divided by prior-year net sales. The sales growth decile, *GS*/10, is the decile of the firm's prior-year sales growth relative to NYSE firms' decile breakpoints.
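A sketch of the external finance and sales growth decile calculations follows. It assumes current-year assets as the scaling variable, applies the fallback from the note to Table I when retained earnings are missing, and computes NYSE decile breakpoints with Python's `statistics.quantiles`; the percentile convention the authors used is not stated, so that choice is an assumption.

```python
from bisect import bisect_right
from statistics import quantiles


def external_finance_to_assets(assets, assets_prev, retained=None, retained_prev=None,
                               net_income=None, common_dividends=None):
    """EF/A: change in assets (Item 6) less change in retained earnings (Item 36), over assets.

    If retained earnings are missing, net income (Item 172) less common dividends
    (Item 21) replaces the change in retained earnings, as in the note to Table I.
    """
    if retained is not None and retained_prev is not None:
        delta_re = retained - retained_prev
    else:
        delta_re = net_income - common_dividends
    return (assets - assets_prev - delta_re) / assets


def sales_growth(sales, sales_prev):
    """GS: change in net sales (Item 12) over prior-year net sales."""
    return (sales - sales_prev) / sales_prev


def nyse_breakpoints(nyse_growth_rates):
    """Nine decile cut points computed from NYSE firms only."""
    return quantiles(nyse_growth_rates, n=10)


def sales_growth_decile(gs, breakpoints):
    """Decile 1 through 10 of a firm's sales growth relative to NYSE breakpoints."""
    return bisect_right(breakpoints, gs) + 1
```

Using NYSE-only breakpoints means the many small Nasdaq and AMEX firms are ranked against a fixed NYSE benchmark rather than shifting the cut points themselves.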

As will become clear below, one must grasp the multidimensional nature of the growth and distress variables in order to understand how they interact with sentiment. In particular, book-to-market wears at least three hats: High values may indicate distress; low values may indicate high growth opportunities; and, as a scaled-price variable, book-to-market is also a generic valuation indicator that varies with any source of mispricing or rational expected returns. Similarly, sales growth and external finance wear at least two hats: Low values (which are negative) may indicate distress, and high values may reflect growth opportunities. Further, to the extent that market timing motives drive external finance, *EF/A* also wears a third hat as a generic misvaluation indicator.

All explanatory variables are Winsorized each year at their 0.5th and 99.5th percentiles. Finally, in Panels C through F, the accounting data for fiscal years ending in calendar year *t*−1 are matched to monthly returns from July of year *t* through June of year *t*+1.
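Winsorizing "each year" means the cutoffs come from each year's own cross-section rather than from the pooled sample. A minimal sketch, assuming a dict of per-year values and linear interpolation between order statistics (a common convention, not necessarily the authors'):

```python
def percentile(sorted_vals, q):
    """Percentile q (between 0 and 1) of a sorted list, by linear interpolation."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def winsorize_by_year(values_by_year, lower=0.005, upper=0.995):
    """Clip each year's cross-section at that year's lower and upper percentiles.

    values_by_year: dict mapping year -> list of one characteristic's values.
    """
    result = {}
    for year, values in values_by_year.items():
        ordered = sorted(values)
        lo_cut, hi_cut = percentile(ordered, lower), percentile(ordered, upper)
        result[year] = [min(max(v, lo_cut), hi_cut) for v in values]
    return result
```

Computing the cutoffs year by year means that a trend in a characteristic's level over the sample does not push an entire early or late year into the tails.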

#### C. Investor Sentiment

Prior work suggests a number of proxies for sentiment to use as time-series conditioning variables. There are no definitive or uncontroversial measures, however. We therefore form a composite index of sentiment that is based on the common variation in six underlying proxies for sentiment: the closed-end fund discount, NYSE share turnover, the number and average first-day returns on IPOs, the equity share in new issues, and the dividend premium. The sentiment proxies are measured annually from 1962 to 2001. We first introduce each proxy separately, and then discuss how they are formed into overall sentiment indexes.

The closed-end fund discount, *CEFD*, is the average difference between the net asset values (NAV) of closed-end stock fund shares and their market prices. Prior work suggests that *CEFD* is inversely related to sentiment. Zweig (1973) uses it to forecast reversion in Dow Jones stocks, and Lee et al. (1991) argue that sentiment is behind various features of closed-end fund discounts. We take the value-weighted average discount on closed-end stock funds for 1962 through 1993 from Neal and Wheatley (1998), for 1994 through 1998 from CDA/Wiesenberger, and for 1999 through 2001 from turn-of-the-year issues of the *Wall Street Journal*.
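A minimal sketch of a value-weighted average discount, assuming each fund's discount is measured as a percentage of NAV and that the weights are supplied by the caller (for example, fund net assets). The text specifies value weighting but not the weight variable, so both choices are assumptions.

```python
def value_weighted_discount(navs, prices, weights):
    """Value-weighted average closed-end fund discount, in percent of NAV.

    navs, prices: per-fund net asset value and market price per share
    weights: per-fund value weights (assumed here, e.g., total net assets)
    """
    discounts = [(nav - price) / nav for nav, price in zip(navs, prices)]
    return 100 * sum(d * w for d, w in zip(discounts, weights)) / sum(weights)
```

A fund trading above its NAV contributes a negative discount (a premium), which is how the discount can turn negative in some years, as the minimum in Table II shows.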

NYSE share turnover is based on the ratio of reported share volume to average shares listed from the *NYSE Fact Book*. Baker and Stein (2004) suggest that turnover, or more generally liquidity, can serve as a sentiment index: In a market with short-sales constraints, irrational investors participate, and thus add liquidity, only when they are optimistic; hence, high liquidity is a symptom of overvaluation. Supporting this, Jones (2001) finds that high turnover forecasts low market returns. Turnover displays a positive, exponential trend over our period, and the May 1975 elimination of fixed commissions also has a visible effect. As a partial solution, we define *TURN* as the natural log of the raw turnover ratio, detrended by the 5-year moving average.
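A minimal sketch of the detrending step, assuming an annual dict of raw turnover ratios and reading "detrended by the 5-year moving average" as subtracting the mean of log turnover over the previous five years (the note to Table II says "past 5-year average"; averaging logs rather than taking the log of the average is our assumption):

```python
from math import log
from statistics import mean


def detrended_log_turnover(turnover_by_year):
    """TURN_t: ln(turnover_t) minus the mean of ln(turnover) over years t-5 .. t-1.

    turnover_by_year: dict year -> NYSE share volume / average shares listed.
    Years without five prior observations are skipped.
    """
    log_turn = {y: log(v) for y, v in turnover_by_year.items()}
    turn = {}
    for y in sorted(log_turn):
        past = [log_turn.get(y - k) for k in range(1, 6)]
        if None in past:
            continue
        turn[y] = log_turn[y] - mean(past)
    return turn


# Hypothetical series that doubles every year: a pure exponential trend.
turnover = {y: 2.0 ** (y - 1960) for y in range(1960, 1971)}
print(detrended_log_turnover(turnover))
```

Taking logs turns an exponential trend into a linear one, and subtracting the trailing average removes it: for the doubling series above, *TURN* equals 3 ln 2 in every year, because log turnover exceeds its average over the prior five years by three years' worth of growth.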

The IPO market is often viewed as sensitive to sentiment, with high first-day returns on IPOs cited as a measure of investor enthusiasm, and the low idiosyncratic returns on IPOs often interpreted as a symptom of market timing (Stigler (1964), Ritter (1991)). We take the number of IPOs, *NIPO*, and the average first-day returns, *RIPO*, from Jay Ritter's website, which updates the sample in Ibbotson, Sindelar, and Ritter (1994).

The share of equity issues in total equity and debt issues is another measure of financing activity that may capture sentiment. Baker and Wurgler (2000) find that high values of the equity share predict low market returns. The equity share, *S*, is defined as gross equity issuance divided by gross equity plus gross long-term debt issuance, using data from the *Federal Reserve Bulletin*.^{8}

Our sixth and last sentiment proxy is the dividend premium, *P*^{D−ND}, the log difference of the average market-to-book ratios of payers and nonpayers. Baker and Wurgler (2004) use this variable to proxy for relative investor demand for dividend-paying stocks. Given that payers are generally larger, more profitable firms with weaker growth opportunities (Fama and French (2001)), the dividend premium may proxy for the relative demand for this correlated bundle of characteristics.
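The dividend premium is a one-line formula once the two group averages are in hand. This sketch leaves the weights to the caller, because the text says only "average" (the note to Table II specifies value-weighted averages without naming the weight variable), and returns the premium in log units.

```python
from math import log


def weighted_mean(values, weights):
    """Weighted average of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def dividend_premium(mb_payers, w_payers, mb_nonpayers, w_nonpayers):
    """P^{D-ND}: log difference of average market-to-book, payers minus nonpayers."""
    return log(weighted_mean(mb_payers, w_payers)) - log(weighted_mean(mb_nonpayers, w_nonpayers))
```

A negative value means nonpayers trade at higher valuations than payers, which is consistent with *P*^{D−ND} correlating negatively with the sentiment indexes in Table II.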

Each sentiment proxy is likely to include a sentiment component as well as idiosyncratic, non-sentiment-related components. We use principal components analysis to isolate the common component. Another issue in forming an index is determining the relative timing of the variables—that is, if they exhibit lead-lag relationships, some variables may reflect a given shift in sentiment earlier than others. For instance, Ibbotson and Jaffe (1975), Lowry and Schwert (2002), and Benveniste et al. (2003) find that IPO volume lags the first-day returns on IPOs. Perhaps sentiment is partly behind the high first-day returns, and this attracts additional IPO volume with a lag. More generally, proxies that involve firm supply responses (*S* and *NIPO*) can be expected to lag behind proxies that are based directly on investor demand or investor behavior (*RIPO, P*^{D-ND}, *TURN*, and *CEFD*).

We form a composite index that captures the common component in the six proxies and incorporates the fact that some variables take longer to reveal the same sentiment.^{9} We start by estimating the first principal component of the six proxies and their lags. This gives us a first-stage index with 12 loadings, one for each of the current and lagged proxies. We then compute the correlation between the first-stage index and the current and lagged values of each of the proxies. Finally, we define *SENTIMENT* as the first principal component of the correlation matrix of six variables—each respective proxy's lead or lag, whichever has higher correlation with the first-stage index—rescaling the coefficients so that the index has unit variance.
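The two-stage construction can be sketched with NumPy. This is an illustration under assumptions, not the authors' code: "higher correlation" is read as higher absolute correlation (so that *CEFD* and *P*^{D−ND}, which enter negatively, are handled correctly), proxies are standardized with population moments, the sign of the final component is fixed by a hypothetical `positive_col` argument because principal components have arbitrary sign, and the data are simulated, with three "demand" proxies that move one year ahead of three "supply" proxies.

```python
import numpy as np


def standardize(a):
    """Demean each column and scale it to unit (population) variance."""
    return (a - a.mean(axis=0)) / a.std(axis=0)


def first_pc(z):
    """Loadings and eigenvalue of the first principal component of standardized columns z."""
    eigvals, eigvecs = np.linalg.eigh(np.corrcoef(z, rowvar=False))
    return eigvecs[:, -1], eigvals[-1]  # eigh returns eigenvalues in ascending order


def sentiment_index(proxies, positive_col=0):
    """Two-stage composite index following the steps described in the text.

    proxies: (n_years, k) array of annual proxies in time order.
    positive_col: hypothetical orientation choice; the index is signed so that
        this proxy receives a positive coefficient.
    Returns (index, coefficients on standardized proxies, use_lag flags).
    """
    current, lagged = proxies[1:], proxies[:-1]  # lose one year to form lags
    z12 = standardize(np.hstack([current, lagged]))
    v12, _ = first_pc(z12)
    stage1 = z12 @ v12  # first-stage index with 2k loadings

    use_lag, chosen = [], []
    for k in range(proxies.shape[1]):
        # Keep whichever timing correlates more strongly (in absolute value) with stage 1.
        c_now = abs(np.corrcoef(current[:, k], stage1)[0, 1])
        c_lag = abs(np.corrcoef(lagged[:, k], stage1)[0, 1])
        use_lag.append(bool(c_lag > c_now))
        chosen.append(lagged[:, k] if c_lag > c_now else current[:, k])

    z = standardize(np.column_stack(chosen))
    loadings, eigval = first_pc(z)
    coefs = loadings / np.sqrt(eigval)  # rescale so the index has unit variance
    if coefs[positive_col] < 0:
        coefs = -coefs
    return z @ coefs, coefs, use_lag


# Hypothetical simulation: latent sentiment f; three "demand" proxies lead it by
# one year (the first enters negatively, like CEFD), three "supply" proxies do not.
rng = np.random.default_rng(1)
n = 200
f = rng.standard_normal(n + 1)
demand = np.column_stack([-f[1:], f[1:], f[1:]])
supply = np.column_stack([f[:-1], f[:-1], f[:-1]])
proxies = np.hstack([demand, supply]) + 0.5 * rng.standard_normal((n, 6))

index, coefs, use_lag = sentiment_index(proxies, positive_col=1)
print(use_lag)
print(np.round(coefs, 3))
```

In this simulated setup the flags mark the three demand proxies as entering lagged, mirroring the pattern in which price and investor-behavior variables lead firm supply variables.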

This procedure leads to a parsimonious index

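$$\mathit{SENTIMENT}_t = -0.241\,\mathit{CEFD}_t + 0.242\,\mathit{TURN}_{t-1} + 0.253\,\mathit{NIPO}_t + 0.257\,\mathit{RIPO}_{t-1} + 0.112\,S_t - 0.283\,P^{D-ND}_{t-1} \tag{2}$$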

where each of the index components has first been standardized. The first principal component explains 49% of the sample variance, so we conclude that one factor captures much of the common variation. The correlation between the 12-term first-stage index and the *SENTIMENT* index is 0.95, suggesting that little information is lost in dropping the six terms with other time subscripts.

The *SENTIMENT* index has several appealing properties. First, each individual proxy enters with the expected sign. Second, all but one enters with the expected timing; with the exception of *CEFD*, price and investor behavior variables lead firm supply variables. Third, the index irons out some extreme observations. (The dividend premium and the first-day IPO returns reached unprecedented levels in 1999, so for these proxies to work as individual predictors in the full sample, these levels must be matched exactly to extreme future returns.)

One might object to Equation (2) as a measure of sentiment on the grounds that principal components analysis cannot distinguish between a common sentiment component and a common business cycle component. For instance, the number of IPOs varies with the business cycle, in part for entirely rational reasons. We want to identify when the number of IPOs is high for *no* good reason. We therefore construct a second index that explicitly removes business cycle variation from each of the proxies prior to the principal components analysis.

Specifically, we regress each of the six raw proxies on growth in the industrial production index (Federal Reserve Statistical Release G.17), growth in consumer durables, nondurables, and services (all from BEA National Income Accounts Table 2.10), and a dummy variable for NBER recessions. The residuals from these regressions, labeled with a superscript ⊥, may be cleaner proxies for investor sentiment. We form an index of the orthogonalized proxies following the same procedure as before. The resulting index is
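Orthogonalizing amounts to keeping OLS residuals. A minimal sketch, assuming the macro regressors (industrial production growth, the consumption growth series, and the NBER recession dummy) are already assembled into an annual matrix; the simulated inputs below are hypothetical. Because each regression includes a constant, the residuals have mean zero, which matches the 0.00 means of the orthogonalized proxies in Table II.

```python
import numpy as np


def orthogonalize(proxies, macro):
    """Residuals from regressing each proxy column on a constant plus the macro columns.

    proxies: (n_years, k) raw sentiment proxies
    macro:   (n_years, m) macroeconomic regressors, including any recession dummy
    """
    design = np.column_stack([np.ones(len(macro)), macro])
    beta, *_ = np.linalg.lstsq(design, proxies, rcond=None)  # one regression per column
    return proxies - design @ beta


# Hypothetical annual data: 40 years, three growth rates plus a recession dummy.
rng = np.random.default_rng(2)
growth = rng.standard_normal((40, 3))
recession = (rng.random(40) < 0.2).astype(float)
macro = np.column_stack([growth, recession])
proxies = rng.standard_normal((40, 6)) + macro @ rng.standard_normal((4, 6))
resid = orthogonalize(proxies, macro)
```

The residual matrix can then be passed through the same two-stage principal components procedure used for the raw proxies.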

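$$\mathit{SENTIMENT}^{\perp}_t = -0.198\,\mathit{CEFD}^{\perp}_t + 0.225\,\mathit{TURN}^{\perp}_{t-1} + 0.234\,\mathit{NIPO}^{\perp}_t + 0.263\,\mathit{RIPO}^{\perp}_{t-1} + 0.211\,S^{\perp}_t - 0.243\,P^{D-ND\perp}_{t-1} \tag{3}$$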

Here, the first principal component explains 53% of the sample variance of the orthogonalized variables. Moreover, only the first eigenvalue is above 1.00. In terms of the signs and the timing of the components, *SENTIMENT*^{⊥} retains all of the appealing properties of *SENTIMENT*.

Table II summarizes and correlates the sentiment measures, and Figure 1 plots them. The figure shows immediately that orthogonalizing to macro variables is a second-order issue. It does not qualitatively affect any component of the index or the overall index (see Panel E). Indeed, Table II suggests that on balance the orthogonalized proxies are slightly *more* correlated with each other than are the raw proxies. If the raw variables were driven by common macroeconomic conditions (that we failed to remove through orthogonalization) instead of common investor sentiment, one would expect the opposite. In any case, to demonstrate robustness we present results for both indexes in our main analysis.

Table II. **Investor Sentiment Data, 1962–2000** Means, standard deviations, and correlations for measures of investor sentiment. In Panel A, we present raw sentiment proxies. The first (*CEFD*) is the year-end, value-weighted average discount on closed-end mutual funds. The data on prices and net asset values (NAVs) come from Neal and Wheatley (1998) for 1962 through 1993, CDA/Wiesenberger for 1994 through 1998, and turn-of-the-year issues of the *Wall Street Journal* for 1999 and 2000. The second measure (*TURN*) is detrended natural log turnover. Turnover is the ratio of reported share volume to average shares listed from the *NYSE Fact Book*. We detrend using the past 5-year average. The third measure (*NIPO*) is the annual number of initial public offerings. The fourth measure (*RIPO*) is the average annual first-day return of initial public offerings. Both IPO series come from Jay Ritter, updating data analyzed in Ibbotson, Sindelar, and Ritter (1994). The fifth measure (*S*) is gross annual equity issuance divided by gross annual equity plus debt issuance from Baker and Wurgler (2000). The sixth measure (*P*^{D−ND}) is the year-end log ratio of the value-weighted average market-to-book ratios of payers and nonpayers from Baker and Wurgler (2004). Turnover, the average annual first-day return, and the dividend premium are lagged 1 year relative to the other three measures. *SENTIMENT* is the first principal component of the six sentiment proxies. In Panel B, we regress each of the six proxies on the growth in industrial production, the growth in durable, nondurable, and services consumption, the growth in employment, and a flag for NBER recessions. The orthogonalized proxies, labeled with a “^{⊥},” are the residuals from these regressions. *SENTIMENT*^{⊥} is the first principal component of the six orthogonalized proxies. The *SENTIMENT* and *SENTIMENT*^{⊥} columns report correlations with the two sentiment indexes; the remaining six columns report correlations among the sentiment components. Superscripts a, b, and c denote statistical significance at the 1%, 5%, and 10% level, respectively.

| Variable | Mean | SD | Min | Max | *SENTIMENT* | *SENTIMENT*^{⊥} | *CEFD* | *TURN* | *NIPO* | *RIPO* | *S* | *P*^{D−ND} |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| *Panel A: Raw Data* | | | | | | | | | | | | |
| *CEFD*_{t} | 9.03 | 8.12 | −10.41 | 23.70 | −0.71^{a} | −0.60^{a} | 1.00 | | | | | |
| *TURN*_{t−1} | 11.99 | 18.27 | −26.70 | 42.96 | 0.71^{a} | 0.68^{a} | −0.29^{c} | 1.00 | | | | |
| *NIPO*_{t} | 358.41 | 262.76 | 9.00 | 953.00 | 0.74^{a} | 0.66^{a} | −0.55^{a} | 0.38^{b} | 1.00 | | | |
| *RIPO*_{t−1} | 16.94 | 14.93 | −1.67 | 69.53 | 0.76^{a} | 0.80^{a} | −0.42^{a} | 0.50^{a} | 0.35^{b} | 1.00 | | |
| *S*_{t} | 19.53 | 8.34 | 7.83 | 43.00 | 0.33^{b} | 0.44^{a} | −0.01 | 0.30^{c} | 0.16 | 0.26 | 1.00 | |
| *P*^{D−ND}_{t−1} | 0.20 | 18.67 | −33.17 | 36.06 | −0.83^{a} | −0.76^{a} | 0.52^{a} | −0.50^{a} | −0.56^{a} | −0.58^{a} | −0.12 | 1.00 |
| *Panel B: Controlling for Macroeconomic Conditions* | | | | | | | | | | | | |
| *CEFD*^{⊥}_{t} | 0.00 | 6.25 | −18.32 | 9.60 | −0.62^{a} | −0.63^{a} | 1.00 | | | | | |
| *TURN*^{⊥}_{t−1} | 0.00 | 15.49 | −26.03 | 26.37 | 0.69^{a} | 0.71^{a} | −0.26 | 1.00 | | | | |
| *NIPO*^{⊥}_{t} | 0.00 | 226.30 | −435.98 | 484.15 | 0.73^{a} | 0.74^{a} | −0.45^{a} | 0.39^{b} | 1.00 | | | |
| *RIPO*^{⊥}_{t−1} | 0.00 | 14.31 | −23.55 | 46.54 | 0.77^{a} | 0.83^{a} | −0.46^{a} | 0.53^{a} | 0.44^{a} | 1.00 | | |
| *S*^{⊥}_{t} | 0.00 | 6.15 | −12.17 | 14.29 | 0.55^{a} | 0.67^{a} | −0.41^{a} | 0.32^{b} | 0.50^{a} | 0.47^{a} | 1.00 | |
| *P*^{D−ND⊥}_{t−1} | 0.00 | 16.89 | −43.20 | 35.96 | −0.78^{a} | −0.77^{a} | 0.26 | −0.60^{a} | −0.46^{a} | −0.68^{a} | −0.28^{c} | 1.00 |

More importantly, Figure 1 shows that the sentiment measures roughly line up with anecdotal accounts of fluctuations in sentiment. Most proxies point to low sentiment in the first few years of the sample, after the 1961 crash in growth stocks. Specifically, the closed-end fund discount and dividend premium are high, while turnover and equity issuance-related variables are low. Each variable identifies a spike in sentiment in 1968 and 1969, again matching anecdotal accounts. Sentiment then tails off until, by the mid-1970s, it is low by most measures (recall that for turnover this is confounded by deregulation). Sentiment generally rises from the late 1970s through the mid-1980s, and, according to the composite index, it has not dropped far below a medium level since 1980. At the end of 1999, near the peak of the Internet bubble, sentiment is high by most proxies. Overall, *SENTIMENT*^{⊥} is positive for the years 1968–1970, 1972, 1979–1987, 1994, 1996–1997, and 1999–2001. This correspondence with anecdotal accounts seems to confirm that the measures capture the intended variation.

There are other variables that one might reasonably wish to include in a sentiment index. The main constraint is availability and consistent measurement over the 1962–2001 period. We have considered insider trading as a sentiment measure. Unfortunately, a consistent series does not appear to be available for the whole sample period. However, Nejat Seyhun shared with us his monthly series, which spans 1975 to 1994, on the fraction of public firms with net insider buying (as plotted in Seyhun (1998, p. 117)). Lakonishok and Lee (2001) study a similar series. We average Seyhun's series across months to obtain an annual series. Over the overlapping 20-year period, insider buying has a significant negative correlation with both the raw and orthogonalized sentiment indexes, and also correlates with the six underlying components as expected.