Over the years, many asset pricing studies have employed the sample cross-sectional regression (CSR) R² as a measure of model performance. We derive the asymptotic distribution of this statistic and develop associated model comparison tests, taking into account the impact of model misspecification on the variability of the CSR estimates. We encounter several examples of large R² differences that are not statistically significant. A version of the intertemporal capital asset pricing model (CAPM) exhibits the best overall performance, followed by the Fama–French three-factor model. Interestingly, the performance of prominent consumption CAPMs is sensitive to variations in experimental design.