Forecasting GDP in Europe with Textual Data

We evaluate the informational content of news-based sentiment indicators for forecasting Gross Domestic Product (GDP) and other macroeconomic variables of the five major European economies. Our data set includes over 27 million articles from 26 major newspapers in 5 different languages. The evidence indicates that these sentiment indicators are significant predictors of macroeconomic variables and that their predictive content is robust to controlling for other indicators available to forecasters in real time.


Introduction
Business and consumer surveys are an essential tool used by policy-makers and practitioners to monitor and forecast the economy. Their most valuable feature is to provide timely information about the current and expected state of economic activity, which complements the sluggish release of macroeconomic indicators. Interestingly, surveys are often interpreted as measures of economic sentiment in the sense of providing the pulse of different aspects of the economy, such as the consumers' attitude toward spending or the expectation of purchasing managers about inflation. Some prominent examples are the Survey of Consumers of the University of Michigan (MCS) for the United States (Curtin and Dechaux, 2015) and the Business and Consumer Survey (BCS) for the European Union (European Commission, 2016). Although surveys are very valuable and accurate proxies of economic activity, they are typically released at the monthly frequency, which might limit their usefulness in high-frequency nowcasting of economic variables (Aguilar et al., 2021; Algaba et al., 2023).
The goal of this paper is to contribute an alternative measure of economic sentiment that, similarly to surveys, captures the overall attitude of the public toward specific aspects of the economy. The measure that we propose is obtained from textual analysis of a large data set of daily newspaper articles. We believe that news represents a valuable source of information as it reports on economic and political events that could influence economic decisions. In this sense, we expect that measuring economic sentiment from news text can provide a signal regarding the current state of the economy as well as its future trajectory. Deriving sentiment measures from news provides several advantages. One is that sentiment can be measured at high frequencies since news is available on a daily basis, as opposed to surveys that are typically available at the monthly frequency, as discussed earlier. Such a feature can be very helpful when the economy is rapidly changing direction and high-frequency monitoring in real time is needed. Another convenient feature of news is that it allows sentiment to be computed about any economic aspect of interest at a small additional cost, whereas surveys have a fixed structure that cannot be changed easily. For instance, the MCS and BCS are mostly focused on surveying the attitude of consumers and businesses regarding employment, output, and prices, while neglecting topics such as financial conditions and monetary policy. Overall, the findings in Larsen et al. (2021) show that news has a significant role in shaping the inflation expectations of consumers.
A relevant question to answer is whether there is any residual predictability in the news-based sentiment once we take into account the sentiment from surveys. In this paper, we thus focus on understanding the incremental value of sentiment constructed from textual news data in the context of nowcasting and forecasting GDP for five European countries that are characterized by considerable delays in the release of their official statistics. Estimating GDP is a very complex task that requires gathering information about the economic activities of thousands of consumers and businesses, mostly via surveys. The data collection process is quite involved and time consuming, which leads to significant delays in releasing macroeconomic data. For instance, Eurostat publishes the preliminary flash estimate of GDP 30 days after the end of the quarter based on one or at best two releases of the monthly indicators. More accurate estimates are released with the official flash estimate after 45 days, and the traditional (non-flash) estimates are normally published with a 65-day delay (Eurostat, 2016).
We construct the sentiment indicators based on a data set that includes news from 26 major newspapers in France, Germany, Italy, Spain and the UK. The data set amounts to over 27 million articles and a total of 12.5 billion words in 5 languages. To transform the text into a sentiment measure we follow the Fine-Grained, Aspect-based Sentiment (FiGAS) approach proposed by Consoli et al. (2022) and applied to the US by Barbaglia et al. (2023). The FiGAS approach was originally designed for the analysis of text in the English language, and thus cannot be easily adapted to the analysis of text in other languages. To overcome this problem, we translate the articles from French, German, Spanish and Italian to English relying on a neural machine translation service, which further increases the computational burden of the analysis. This work joins an emerging literature that uses textual data from news to forecast economic variables with a focus, in particular, on the US (Ardia et al., 2019; Barbaglia et al., 2023; Ellingsen et al., 2022; Shapiro et al., 2022), the UK (Rambaccussing and Kwiatkowski, 2020; Kalamara et al., 2022), France (Bortoli et al., 2018), Germany (Feuerriegel and Gordon, 2019), Italy (Aprigliano et al., 2023), Norway (Larsen and Thorsrud, 2019), and Spain (Aguilar et al., 2021). There are very few studies in economics and finance that perform a textual analysis of news across several countries and languages (e.g., Baker et al., 2016; Ashwin et al., 2021). Although this adds considerable complications, in particular in the computational aspect, we believe that it offers the opportunity to validate the robustness of our findings across countries and to reveal country-specific characteristics of the relationship between news and macroeconomic variables.
We design six sentiment indicators to capture the attitude in the news regarding the economy, unemployment rate, inflation, manufacturing, financial sector, and monetary policy.
The choice of a wide range of topics aims at capturing various aspects of economic activity and policy. We find that the sentiment measures related to the real sector have a significant business cycle component in all countries considered. In addition, the evidence indicates that these measures are highly correlated across countries, in particular in the case of the Monetary Policy and Inflation sentiments for the countries in the euro area. This is consistent with the fact that these countries participate in a monetary union that contributes to synchronising news about monetary policy, such as the actions of the European Central Bank.
The results also indicate that the sentiment measures are significant predictors for GDP growth at horizons ranging from 30 days to a year before the official release. This result is robust to controlling for the real-time flow of macroeconomic releases as well as survey information. In this sense, we believe that the predictive power of sentiment is genuinely incremental relative to the informational content of survey confidence indexes. Furthermore, we find that the sentiment indicators that are relevant to predict GDP are heterogeneous across countries. In the case of Germany, the predictive content is mostly embedded in the news regarding the unemployment rate and inflation, while news on monetary policy appears to be particularly relevant in Spain and Italy. We also find that news discussing the financial sector is informative in the case of France and the UK. These results suggest that the attention of the public to news might depend on country-specific characteristics, such as the Germans' concern with inflation or the Italians' sensitivity to monetary policy decisions.
In addition, we validate our findings for GDP on other macroeconomic variables of interest, such as the unemployment and inflation rates and the growth rate of industrial production, and confirm, to a large extent, the evidence that sentiment measures provide genuine predictive power.
The paper is structured as follows. Section 2 introduces the sentiment-based indicators and the forecasting methodology, while Section 3 describes the data and discusses the proposed news-based indicators and their relation with official surveys. The in-sample and out-of-sample real-time forecasting exercises are carried out in Sections 4 and 5, respectively, while Section 6 evaluates the robustness of our findings to forecast other macroeconomic variables. Finally, Section 7 concludes the paper.

Methodology
We follow the FiGAS approach proposed by Consoli et al. (2022) to construct sentiment indicators of different aspects of economic activity at the daily frequency. A key feature of FiGAS is that it computes sentiment that relates to an economic topic (e.g., Economy or Financial Sector) by considering only the words that are connected to the terms of interest in the sentence. More specifically, the goal is to isolate the neighbouring words that are grammatically related to the concept of interest and that modify and characterize its tone.
The approach produces a sentiment indicator that is targeted to monitor the attitude in the news regarding a topic of interest, as opposed to alternative approaches that calculate sentiment for the entire article. Whole-article approaches might be able to capture the sentiment regarding a wide range of concepts that are discussed in the news text and could be useful to measure the overall state of the economy. The FiGAS sentiment, instead, is aspect-based since it refers to a particular topic. This could be relevant for policy-makers when the concern is to extract the sentiment regarding a specific variable, for example inflation, rather than an aggregate indicator.
The sentiment of the text is then obtained by assigning a category to each word (e.g., positive or negative) or a numerical value that measures the sentiment content of the word.
Our approach is fine-grained since it relies on a dictionary that scores words in the range [−1, 1], rather than simply classifying them in predefined categories (Loughran and McDonald, 2011). The advantage of a numerical score is that it weighs words based on the sentiment content they convey. In addition, the sign of the sentiment deriving from the dependent words is adjusted to reflect the tone of the term of interest. For instance, we reverse the sign of the sentiment for words that are dependent on the term "unemployment", which has a negative connotation, or if we detect a negation in the chunk of text considered.
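The scoring logic described above can be sketched in a few lines of Python. This is only an illustration: the dictionary entries and their scores, the set of negative-connotation terms, and the use of a simple average are assumptions made for the example, not the FiGAS dictionary or its aggregation rule, and the grammatical parsing that selects the dependent words is taken as given.

```python
# Hypothetical fine-grained dictionary with scores in [-1, 1] (illustrative values only).
TOY_DICTIONARY = {"worst": -0.9, "recession": -0.8, "strong": 0.6, "fall": -0.5, "rise": 0.4}

# Terms of interest assumed to carry a negative connotation (assumption for this sketch).
NEGATIVE_TERMS = {"unemployment"}


def chunk_sentiment(term, dependent_words, negated=False):
    """Average the dictionary scores of the words attached to `term`,
    flipping the sign for negative-connotation terms and for negations."""
    scores = [TOY_DICTIONARY[w] for w in dependent_words if w in TOY_DICTIONARY]
    if not scores:
        return 0.0  # no scored word is attached to the term
    value = sum(scores) / len(scores)
    if term in NEGATIVE_TERMS:
        value = -value  # e.g. "unemployment falls" is good news
    if negated:
        value = -value  # a negation in the chunk reverses the tone
    return value


print(chunk_sentiment("economy", ["worst", "recession"]))  # negative tone
print(chunk_sentiment("unemployment", ["fall"]))           # sign reversed: positive tone
```

A numerical score lets "worst" weigh more than "fall", which a positive/negative classification could not express.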
Another characteristic of this approach is that it aims at extracting from the text the reference to a geographic location. The goal is to include in the calculation of the sentiment only the text that refers to the country we are targeting. In case no country can be identified in the text, we assume that it refers to the place where the newspaper is published.
Overall, the FiGAS approach promises to deliver an indicator targeted to the topic of interest and that accurately measures sentiment due to its fine-grained and aspect-based nature.
As an illustration, consider the sentence: "The French economy has been experiencing its worst recession since 1968, while Italy entered into recession with a GDP drop ...", which appeared in the French newspaper La Tribune on September 9th, 2020. If we are interested in extracting the sentiment about the topic Economy with location France, the FiGAS algorithm detects that only the first part of the sentence relates to this combination of topic and location. The algorithm also recognizes a specific semantic rule which, in this case, is that the topic of interest is followed by a direct object, and identifies the terms that characterize it, namely "experience", "worst" and "recession". We assign a sentiment score to each of these words based on the proposed fine-grained dictionary and obtain an overall sentiment score of -0.98, which captures the negative outlook expressed in the sentence about the French economy. Consoli et al. (2022) provide a detailed presentation of the algorithm and a comparison with other popular sentiment analysis methods.
In our application, we are interested in tracking the sentiment conveyed by news regarding the overall state of economic activity. We thus choose keywords that relate to the Economy, Financial Sector, Inflation, Manufacturing, Monetary Policy, and Unemployment. More specifically, we use the same terms as in Barbaglia et al. (2023), except for the Monetary Policy indicator where we include terms such as European Central Bank (ECB) and Bank of England (BOE), among others, to adapt the searches to the European context. The terms are:
• Economy: economy;
• Financial Sector (finsector): bank, derivative, lending, borrowing and combinations of [banking, financial] with [sector, commercial, and investment];
• Inflation: inflation;
• Manufacturing (manuf): manufacturing and combinations of [industrial, manufacturing, construction, factory, auto] with [sector, production, output, and activity];
• Monetary Policy (monpol): ecb, european central bank, boe, bank of england, money supply, monetary policy, base rate, interest rate, refinancing rate, marginal lending facility and deposit facility;
• Unemployment: unemployment.
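The keyword lists above can be written down as a small lookup table; the sketch below, in Python, expands the bracketed combinations mechanically. The helper `combine`, the variable `KEYWORDS`, and the word order inside each combination (first list, then second list) are assumptions of this illustration; the source does not state how the combinations are ordered or matched.

```python
from itertools import product


def combine(first, second):
    """Every pairing of a word from `first` with a word from `second`,
    written as 'first second' (the word order is an assumption)."""
    return [f"{a} {b}" for a, b in product(first, second)]


KEYWORDS = {
    "economy": ["economy"],
    "finsector": ["bank", "derivative", "lending", "borrowing"]
    + combine(["banking", "financial"], ["sector", "commercial", "investment"]),
    "inflation": ["inflation"],
    "manuf": ["manufacturing"]
    + combine(
        ["industrial", "manufacturing", "construction", "factory", "auto"],
        ["sector", "production", "output", "activity"],
    ),
    "monpol": [
        "ecb", "european central bank", "boe", "bank of england", "money supply",
        "monetary policy", "base rate", "interest rate", "refinancing rate",
        "marginal lending facility", "deposit facility",
    ],
    "unemployment": ["unemployment"],
}

for topic, terms in KEYWORDS.items():
    print(topic, len(terms))
```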
In the Online Appendix, we discuss in detail the main features of the algorithm and the fine-tuning required to adapt the analysis to the European news. While the sentiment algorithm is the same used by Barbaglia et al. (2023) to study the US case, its implementation differs in a few aspects when working with non-English news from European newspapers.
In addition to the keyword selection detailed above for the case of monetary policy, the filtering on the most relevant geographic location based on named-entity recognition also needs to be adapted. Given that the sentiment algorithm is based on linguistic rules designed for the English language, we decided to translate the news articles from Spanish, French, Italian, and German to English. We refer to the Online Appendix for further details on the robustness of the translation approach in sentiment analysis.

Forecasting models
Our goal is to forecast annualized real GDP growth at the quarterly frequency at horizons that range from 15 days before the official release to approximately a year. The baseline specification that we consider is

$$Y_t^d = \alpha_h + \eta_h S_{d-h} + \beta_h' X_{d-h} + \varepsilon_t^d \qquad (1)$$

where:
• $Y_t^d$ represents the GDP growth for quarter $t$ that is released on day $d$; we use two indexes to track both the calendar time of the variable as well as the irregular release dates;
• $S_{d-h}$ denotes a confidence or sentiment indicator available at the time the forecast is made on day $d-h$, that is, $h$ days before the official release date $d$; the goal here is to accurately track the real-time flow of information that is available to forecasters;
• $X_{d-h}$ indicates a vector of variables that are available on day $d-h$; in this vector we include lags of the GDP growth rate, current and lagged values of macroeconomic variables, and all survey and sentiment indicators other than $S_{d-h}$. In the empirical exercise, we track the release dates of the macroeconomic variables, allowing us to include in the vector $X_{d-h}$ only the values of the variables that are known to a forecaster on day $d-h$ when the forecast is produced;
• $\alpha_h$, $\eta_h$, and $\beta_h$ are the parameters to be estimated; we include the $h$ index to stress the fact that these coefficients are horizon-specific;
• $\varepsilon_t^d$ is the forecast error for the quarter $t$ release of GDP growth that occurs on day $d$ based on the information available on day $d-h$.
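The real-time bookkeeping behind the predictors dated d − h can be illustrated with a short Python sketch. It assumes that each indicator comes with a list of (release date, value) pairs; the function name `known_on`, the release calendar, and all values are invented for the example. The grid of horizons (15 to 495 days in steps of 15) is the one used later in the paper.

```python
from datetime import date, timedelta


def known_on(releases, as_of):
    """Latest value whose release date is on or before `as_of`.

    `releases` is a list of (release_date, value) pairs; returns None
    when nothing has been published yet on that day.
    """
    available = [(d, v) for d, v in releases if d <= as_of]
    if not available:
        return None
    return max(available, key=lambda pair: pair[0])[1]


# Hypothetical release calendar for one monthly indicator (dates and values invented).
ip_releases = [
    (date(2019, 3, 14), 0.4),
    (date(2019, 4, 12), -0.2),
    (date(2019, 5, 14), 0.1),
]

gdp_release_day = date(2019, 5, 15)   # day d (assumed for the example)
horizons = range(15, 496, 15)         # h = 15, 30, ..., 495 days

h = 30
forecast_day = gdp_release_day - timedelta(days=h)   # day d - h
print(forecast_day, known_on(ip_releases, forecast_day))
```

Only the value released on or before day d − h enters the regressors, so a figure published on 14 May is invisible to a forecast made on 15 April.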
The application involves data at mixed frequencies since the dependent variable (GDP growth) is available quarterly, while the predictors are at the monthly frequency. We handle the mixed frequency of the variables following the unrestricted mixed-data sampling (U-MIDAS) approach proposed by Foroni et al. (2015). This approach consists of including the monthly variables and their lags as regressors in the forecasting regression. The method is unrestricted relative to the proposal by Ghysels et al. (2007) and Andreou et al. (2010) where the coefficients are assumed to follow a polynomial form. The dimension of the vector $X_{d-h}$ can be large relative to the sample size. We use the U-MIDAS approach since it can be simply combined with a selection procedure that handles a situation in which the number of parameters is large relative to the sample size, such as lasso.
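To make the U-MIDAS idea concrete, the Python sketch below stacks the most recent monthly observations as separate regressors for each quarterly observation, so that each monthly lag receives its own unrestricted coefficient. The function names, the alignment by "last month of the quarter", and the data are assumptions made for the illustration.

```python
def umidas_row(monthly, last_index, n_lags):
    """Regressors for one quarterly observation: the `n_lags` most recent
    monthly values up to `last_index`, newest first."""
    if last_index - n_lags + 1 < 0:
        raise ValueError("not enough monthly history")
    return [monthly[last_index - k] for k in range(n_lags)]


def umidas_design(monthly, quarter_end_indices, n_lags):
    """One row per quarter, one column per monthly lag."""
    return [umidas_row(monthly, i, n_lags) for i in quarter_end_indices]


# Nine months of an invented monthly indicator and the position of the
# last available month for each of three quarters.
monthly_x = [0.1, 0.3, -0.2, 0.0, 0.5, 0.4, -0.1, 0.2, 0.6]
quarter_ends = [2, 5, 8]

X = umidas_design(monthly_x, quarter_ends, n_lags=3)
for row in X:
    print(row)
```

With several monthly indicators and a few lags each, the number of columns grows quickly, which is why a selection step such as lasso is paired with this design.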
The parameter of interest in our analysis is $\eta_h$, which measures the effect on GDP growth of a change in the confidence or sentiment indicator. To conduct accurate inference on $\eta_h$, we estimate Equation (1) using the double lasso methodology proposed by Belloni et al. (2014). The approach consists of a post-lasso estimation method in which the forecasting model is estimated using only the variables selected in a preliminary lasso selection step. However, the standard post-lasso procedure might lead to inconsistent estimates due to the possibility of eliminating relevant variables in the selection stage. Belloni et al. (2014) propose to make the procedure more robust by running an additional lasso selection step in which the soft indicator $S_{d-h}$ is regressed on $X_{d-h}$. The union of the variables selected in these two preliminary lasso regressions is then used in the post-lasso step, which delivers consistent and asymptotically normal estimates of $\eta_h$. In addition, in our in-sample analysis we test the significance of $\eta_h$ across multiple horizons $h$, which might lead to inaccurate inference. We thus correct the p-values for the multiple testing problem using the approach proposed by Benjamini et al. (2006).
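The three steps of the double lasso (select controls for the outcome, select controls for the indicator, run OLS on the union) can be sketched with NumPy as follows. This is a simplified illustration on simulated data: the coordinate-descent lasso, the fixed penalty `lam`, and the data-generating process are assumptions of the example, whereas Belloni et al. (2014) rely on a data-driven penalty and the paper applies the method to Equation (1).

```python
import numpy as np


def lasso_cd(X, y, lam, n_iter=200):
    """Lasso by cyclic coordinate descent for (1/2n)||y - Xb||^2 + lam*||b||_1.
    Assumes y and the columns of X are centred."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    resid = y.astype(float).copy()
    for _ in range(n_iter):
        for j in range(p):
            resid += X[:, j] * b[j]        # partial residual without column j
            rho = X[:, j] @ resid / n
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]
            resid -= X[:, j] * b[j]
    return b


def double_lasso(y, s, X, lam):
    """Post-double-selection estimate of the coefficient on `s`."""
    Xc = X - X.mean(axis=0)
    sel_y = np.flatnonzero(lasso_cd(Xc, y - y.mean(), lam))   # step 1: y on X
    sel_s = np.flatnonzero(lasso_cd(Xc, s - s.mean(), lam))   # step 2: s on X
    selected = np.union1d(sel_y, sel_s)
    Z = np.column_stack([np.ones(len(y)), s, X[:, selected]])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)              # step 3: OLS on the union
    return coef[1], selected


# Simulated example: column 0 of X drives both the indicator s and the outcome y.
rng = np.random.default_rng(0)
n, p = 400, 20
X = rng.standard_normal((n, p))
s = 0.8 * X[:, 0] + rng.standard_normal(n)
y = 1.0 * s + 2.0 * X[:, 0] + 0.5 * rng.standard_normal(n)

eta_hat, selected = double_lasso(y, s, X, lam=0.2)
print(eta_hat, selected)
```

Dropping column 0 from the final regression would bias the coefficient on `s` upward; the second selection step is there to guard against exactly that kind of omission.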

Data
Our data set consists of approximately 27 million articles published between January 1st, 1995 and September 30th, 2020 in the major economic and general purpose newspapers in Germany, France, Italy, Spain and the United Kingdom. We obtain the articles from the Dow Jones Data, News and Analytics (DNA) commercial platform. The information provided with each article consists of title, snippet, body, date of publication, author(s), and topic categories. We collect articles for all DNA categories except for sport-related news.
Since the semantic rules in our FiGAS algorithm are designed for the English language, we translate all articles from German, French, Italian and Spanish to English using the neural machine translation service provided by the European Commission (EC). We refer to the Appendix for an exploratory evaluation of the translation quality.
In addition to the economic sentiment produced by the FiGAS algorithm, we consider survey expectations as an alternative source of real-time information about the state of the economy. We collect data for the BCS provided by the EC regarding six composite indicators, namely the Construction Confidence Index (CCI), the Industry Confidence Index (ICI), the Retail Sales Confidence Index (RCI), the Consumers Confidence Index (CSMCI), the Services Confidence Index (SCI), and the Economic Sentiment Index (ESI). The survey starts in 1985 and the monthly indicators are released after the 20th day of each month.
Finally, we obtain macroeconomic variables from the ALFRED database of the Federal Reserve Bank of Saint Louis. In our analysis, we forecast the real GDP growth rate for the 5 European countries mentioned earlier and consider as predictors the industrial production index, the consumer price index and the unemployment rate at monthly frequencies. We transform the first two indexes to percentage growth rates, while we take the first difference of the unemployment rate. The database provides historical vintages and release dates starting in January 2011 for the monthly variables and, as of January 2016, for the GDP. In the empirical analysis, we forecast the flash estimate that is typically released 45 days after the end of the quarter (Eurostat, 2016). We consider forecast horizons starting from 15 days before the official release up to approximately a year at intervals of 15 days for both the in-sample and out-of-sample exercise.
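The two transformations mentioned above are simple enough to state in code. The Python sketch below assumes period-on-period growth rates, since the text does not say whether the growth rates are month-on-month or year-on-year; the function names and the numbers are invented.

```python
def pct_growth(levels):
    """Period-on-period percentage growth rate of an index."""
    return [100.0 * (b / a - 1.0) for a, b in zip(levels, levels[1:])]


def first_difference(values):
    """Change between consecutive observations."""
    return [b - a for a, b in zip(values, values[1:])]


ip_index = [100.0, 102.0, 101.0]   # invented industrial production index
unemp_rate = [7.5, 7.25, 7.5]      # invented unemployment rate, in percent

print(pct_growth(ip_index))
print(first_difference(unemp_rate))
```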

Descriptive statistics
Figure 1 shows the sentiment indicators of economic activity together with the recession periods as defined by the Centre for Economic Policy Research (CEPR) and Euro Area Business Cycle Network (EABCN) dating committee. We aggregate the sentiment indicators from the daily to the monthly frequency by averaging the values within the month, and standardize the measures to have mean zero and variance one to make them comparable across countries and measures. The Economy and Unemployment sentiments represent, for most countries, the most pro-cyclical indicators, changing sign approximately at the beginning and at the end of the recessions. We also notice that the decline in sentiment during the 2008-09 economic slowdown is followed by a slow recovery and a further deterioration coinciding with the recession of 2011-2013. Germany seems to diverge from this pattern as there was a significant slump in sentiment about Unemployment during the first recessionary period, but only a minor decline in the Economy measure. Interestingly, we do not observe large swings in sentiment during the double-dip recession, probably due to the fact that negative news was mostly concentrated on the financial sector and the labor market. The Inflation measure appears to be counter-cyclical, becoming negative during the expansion years of the early to mid-2000s, while being positive after 2013 when inflation was moderated by a recovering economy.
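The aggregation and scaling just described (average within the month, then standardize to mean zero and variance one) can be sketched in Python with the standard library. The function names and the daily values are invented, and the use of the population variance is an assumption, since the text does not specify the divisor.

```python
from collections import defaultdict
from datetime import date
from statistics import mean, pstdev


def monthly_average(daily):
    """Average daily sentiment values within each (year, month)."""
    buckets = defaultdict(list)
    for day, value in daily:
        buckets[(day.year, day.month)].append(value)
    return {month: mean(values) for month, values in sorted(buckets.items())}


def standardize(values):
    """Rescale to mean zero and variance one (population variance assumed)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


# Invented daily sentiment scores.
daily = [
    (date(2020, 1, 2), 0.2), (date(2020, 1, 3), 0.4),
    (date(2020, 2, 3), -0.1), (date(2020, 2, 4), -0.3),
    (date(2020, 3, 2), 0.0),
]

monthly = monthly_average(daily)
z = standardize(list(monthly.values()))
print(monthly)
print(z)
```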
The behavior of the sentiment about Monetary Policy reflects the decisions of the ECB and the BOE. In particular, the ECB policy rate peaked at 3.75% between October 2000 and April 2001, followed by a steady decline toward 1% that lasted until December 2005.
After that, the ECB increased rates in steps of 0.25% until reaching another peak of 3.25% between July and October 2008. The overall level of sentiment in Spain, France, and Italy shows an increasing pattern which appears to follow rate increases, while declining as rates were lowered. In addition, the peak in sentiment that is observed between the 2008-09 and 2011-13 recessions in these countries can also be related to the ECB policy decision to increase rates starting in April 2011, a decision that was ultimately reversed in November 2011. The Monetary Policy sentiment for the UK seems to have a break in mean and volatility in 2008 when its level and variation declined significantly. A possible explanation for this change of behavior is the shift in monetary policy by the BOE at the onset of the Great Recession.
In the years between 1995 and 2008 the policy rate fluctuated between 3.5% and 7.5% and the sentiment cycles follow quite closely the dynamics of the rate (see footnote 8). The BOE response to the Great Recession was to lower the rate between September 2008 and March 2009 from 5% to 0.5%, where it stayed for several years. The Monetary Policy sentiment experienced a decline as well and much lower variability, probably due to the fact that the rate was fixed at 0.5% in that period.
The dependence of the scaled sentiment measures on the business cycle is also apparent in Figure 2, which shows the kernel density estimate of the indicators during expansionary and recessionary periods. In addition to the pro-cyclical nature of the Economy and Unemployment sentiments that is also evident in these graphs, another interesting insight is the larger variability of the Inflation indicator during recessions, relative to expansions, in Germany and the UK. A possible explanation is that, during recessions, there is more uncertainty about the impact on inflation of stimulative fiscal and monetary policies and, in general, on the state of the economy. Another indicator that experienced a significant fall during recessions in most countries was the Financial Sector. This is not surprising, as the double-dip recession developed as a consequence of the significant deterioration of financial conditions in many European economies.
Footnote 8: The fact that the sentiment for Monetary Policy co-varies positively with the level of the interest rate is due to the fact that the keyword interest rate has a positive tone which, associated with words like increase or raise, provides an overall positive sentiment. More specifically, the word rate has neutral sentiment while interest is positive in the dictionary. Our algorithm propagates the sentiment of interest to rate so that the combination interest rate has a positive sentiment. Similarly, the words increase and raise carry positive sentiment and when they refer to interest rate reinforce its positive sentiment. The positivity of interest rate is debatable, in particular in a macroeconomic and monetary policy context. However, the interest rate represents a price whose increase benefits lenders and depositors while hurting borrowers. This example demonstrates the complexity of assigning sentiment to words and sentences due to the aspect-specific meaning of many words. On the other hand, since we are using these indicators in a regression context, the negative macroeconomic effect of an increase in rates (associated with positive sentiment) can be accommodated by the negative coefficients.
The time series behavior of the indicators in Figure 1 seems to suggest a significant degree of co-movement in sentiment across European countries. To evaluate this commonality, [...]
[...] ESI index is significant at 5% for France and Italy at horizons of one quarter or shorter.
Regarding the news-based sentiment measures, we find for most countries a clearly defined pattern in which only a few sentiment measures provide incremental predictive power relative to the macroeconomic variables and the confidence indicators. In particular, we notice that Unemployment is strongly significant to forecast GDP at the one-quarter ahead and longer horizons in Germany and Spain. The tone embedded in the news regarding Monetary Policy seems particularly relevant in Spain and Italy at similar forecast horizons, while the Financial Sector indicator seems especially useful to forecast GDP in France and the UK, although for Germany it is most relevant at nowcasting horizons. Interestingly, the Manufacturing sentiment is not selected as a predictor for GDP growth in any of the countries. Finally, the sentiment about the Economy appears to provide predictive power at short horizons, such as in the case of Spain and Italy.
Overall, the evidence suggests that both surveys and news-based sentiment indicators are useful to predict GDP growth. While the survey measures are more focused on capturing the expectations of consumers and businesses, the sentiment indicators aggregate the more general tone of the economic discussion regarding the state of the economy. An advantage of using news is that it allows sentiment to be estimated on a broader set of topics of interest.
The inclusion of news-based sentiment measures about the Financial Sector and Monetary Policy, which are not included in the EC survey, seems to be a relevant factor in predicting GDP growth in Spain, France and the UK. The analysis also shows that sentiment indicators that capture different aspects of economic activity might be useful to accommodate the local attitudes in each country. For example, the German public might be more sensitive to news that discusses the current and future outlook of inflation, which thus conveys sentiment that is more informative to forecast GDP. On the other hand, news articles reporting on the health of the banking and financial sectors might draw more attention given their relevance for the UK economy, and thus carry higher predictive content for GDP.

Out-of-sample analysis
The aim of this section is to evaluate the out-of-sample performance of the sentiment indicators to forecast the real GDP growth of the five European countries we are considering. In particular, we would like to evaluate the incremental predictive content of the news-based sentiment measures relative to the information already provided by the macroeconomic indicators and the confidence indexes of the EC. We do this by including in the vector $X_{d-h}$ in Equation (1) the lags of the dependent variable, and the current and lagged values of the macroeconomic variables and of the survey indicators. Our benchmark forecasting model, which we call ARX, consists of the post-lasso model that includes only the variables selected following the double lasso procedure discussed in Section 2. Instead, the ARXS model augments the baseline specification with a news-based sentiment measure and produces forecasts that are evaluated against the ARX benchmark forecasts. Should we find higher predictive accuracy of the ARXS forecasts relative to the ARX forecasts, it would indicate that the sentiment measures carry genuine and relevant information that is incremental to that provided by the macroeconomic and survey indicators.
The forecast period ranges from the first quarter of 2007 to the fourth quarter of 2019, for a total of 13 years of quarterly forecasts. The choice of starting the out-of-sample period in the first quarter of 2007 is motivated by the intention to include in the evaluation the double-dip recession that occurred in Europe between 2008 and 2013. We use the same forecast horizons h employed for the in-sample analysis, which range from 15 to 495 days before the release date, at intervals of 15 days. The forecasting models are estimated using an expanding window that starts in the first quarter of 1997. The samples for both estimation and forecast evaluation are quite short, and this might negatively affect the accuracy of our forecasts. Nevertheless, we believe that it is still useful and informative to perform an out-of-sample exercise to measure the relative performance of the competing forecasting models.
Besides, we can always rely on the in-sample results presented in Section 4 if we consider that the out-of-sample period is structurally different from the rest of the sample. In addition to the ARX and ARXS forecasts, we consider the case of a pooled forecast obtained by averaging the six ARXS forecasts, which we refer to as the Average forecast.
First, we evaluate the performance of the sentiment indicators in terms of reduction of the Mean Square Forecast Error (MSFE) of the ARXS forecasts relative to the ARX forecasts at each horizon h. Figure 5 shows that in several countries there is a considerable reduction in MSFE that exceeds 20% at some horizons. Similarly to the in-sample results, we find that the sentiment measures play a larger role in increasing the accuracy of the forecasts at long horizons, rather than in nowcasting. However, the patterns are quite different across countries. For Spain we find that the majority of the sentiment measures contribute to increasing forecast accuracy, although the gains tend to deteriorate approaching the release date, with the exception of the Economy indicator. The largest reductions in MSFE are obtained when considering the Unemployment, Financial Sector and Monetary Policy indicators in quarters t − 2 and t − 3. France has a similar behavior, although the sentiments achieve their maximum gains during quarter t − 1, while deteriorating quickly in nowcasting. In the case of Italy, the results indicate that the Monetary Policy indicator provides significant gains in predictability between quarters t − 4 and t − 2, while Inflation and the Average provide more modest improvements. In both the UK and Germany the only measure that provides a gain larger than 10% is the Financial Sector at some selected horizons. In terms of nowcasting, the most significant results are the Financial Sector indicator in Germany, the Economy in Spain and Italy, and several measures in France and the UK, although with more limited gains. Overall, while high-frequency macroeconomic indicators are often found to be powerful predictors at nowcasting horizons (McCracken et al., 2022), our results provide evidence that economic sentiments are relevant to improve forecast accuracy, in particular at forecast horizons of 1 to 4 quarters ahead.
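The accuracy measure used in this comparison, and the pooled Average forecast, can be illustrated with a few lines of Python. The function names and all numbers are invented; the example pools two candidate forecasts for brevity, whereas the paper averages six ARXS forecasts.

```python
def msfe(actual, forecast):
    """Mean square forecast error."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


def relative_msfe(actual, candidate, benchmark):
    """MSFE of the candidate over that of the benchmark.
    A ratio of 0.8 corresponds to a 20% reduction in MSFE."""
    return msfe(actual, candidate) / msfe(actual, benchmark)


def average_forecast(forecasts):
    """Equal-weight pool of several forecast paths."""
    return [sum(step) / len(step) for step in zip(*forecasts)]


# Invented GDP growth outcomes and forecasts at a single horizon.
actual = [1.0, 2.0, 0.0, -1.0]
arx = [0.0, 1.0, 1.0, 0.0]         # benchmark forecasts
arxs_a = [0.5, 2.0, 0.5, -1.0]     # benchmark plus one sentiment measure
arxs_b = [1.5, 2.0, -0.5, -1.0]    # benchmark plus another sentiment measure

pooled = average_forecast([arxs_a, arxs_b])
print(relative_msfe(actual, arxs_a, arx))
print(relative_msfe(actual, pooled, arx))
```

In this toy case the two candidate forecasts err in opposite directions, so pooling cancels their errors; real gains from averaging are typically smaller.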
Figure 5 shows that there are instances of significant improvements in forecast accuracy when considering the sentiment measures as predictors of GDP. To evaluate the statistical significance of these improvements, we apply the multi-horizon average Superior Predictive Ability (aSPA) test proposed by Quaedvlieg (2021). In order to evaluate separately the performance of the sentiment measures at the nowcasting and forecasting horizons, we perform the test on the set of horizons that are smaller or larger than 165 days, respectively (see footnote 9). The multi-horizon test provides a more robust performance assessment of the competing models, although it does not indicate at which horizons, if any, the sentiment-based forecasts outperform the macro/survey-based forecasts.
The results for the multi-horizon test are provided in Table 1, with nowcast and forecast horizons in the top and bottom parts of the table, respectively. The values in the table represent the p-values for the one-sided hypothesis that the ARX macro/survey-based forecasts are outperformed by the ARXS sentiment-based forecasts. The results indicate that the Average provides more accurate nowcasts for all countries except Italy. The performance of the forecasts based on the individual sentiment indicators is mixed. The sentiment about the Financial Sector is significant in nowcasting GDP growth in Spain and the UK. We believe this finding might be driven by the rapid deterioration of financial conditions during the double-dip recession in Europe, which is captured in real-time by the sentiment measures and only reflected with a long delay in the macroeconomic and confidence indicators. Other significant indicators to nowcast GDP are the Economy sentiment for Spain and the UK, Inflation and Unemployment in France, Monetary Policy in Spain, and Manufacturing in the UK. When considering the longer horizons, we find that the out-of-sample results are more consistent with the in-sample evidence discussed in the previous section. In particular, the sentiment about Unemployment is more accurate (relative to the benchmark) for Spain and France, while the Monetary Policy sentiment measure is significant for both Spain and Italy. The sentiment about Inflation provides predictive accuracy for Spain and Italy, whereas the Financial Sector sentiment is significant in forecasting French GDP relative to the benchmark forecasts. Interestingly, none of the sentiment measures is significant at 10% to forecast British GDP. As in the nowcasting case, we find that averaging the sentiment-based forecasts delivers more accurate forecasts relative to the macro/survey-based forecasts for all considered countries. Overall, the results support the in-sample evidence, showing that sentiment measures based on textual analysis of news articles provide a useful addition to the tools of economic forecasters both at short and long horizons.
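To make the logic of a multi-horizon comparison concrete, the sketch below averages the loss differential across a group of horizons and obtains a one-sided p-value by a moving-block bootstrap. It is a simplified stand-in and not the aSPA procedure of Quaedvlieg (2021): the equal weights across horizons, the naive (non-HAC) variance, the block length, the number of replications, the horizon grid, and the simulated losses are all assumptions made for illustration.

```python
import numpy as np

def average_loss_differential_pvalue(loss_bench, loss_cand, block_len=3,
                                     n_boot=999, seed=0):
    """One-sided bootstrap p-value for H0: the benchmark is not outperformed
    on average across horizons. Inputs are (T, H) arrays of squared errors."""
    rng = np.random.default_rng(seed)
    d = np.asarray(loss_bench, float) - np.asarray(loss_cand, float)
    avg_d = d.mean(axis=1)                 # equal weights across horizons (assumption)
    T = avg_d.size
    # Studentized statistic with a naive standard deviation (assumption: no HAC).
    stat = np.sqrt(T) * avg_d.mean() / avg_d.std(ddof=1)

    n_blocks = int(np.ceil(T / block_len))
    boot = np.empty(n_boot)
    for b in range(n_boot):
        # Moving-block bootstrap: draw block start points, stitch blocks, trim to T.
        starts = rng.integers(0, T - block_len + 1, size=n_blocks)
        idx = (starts[:, None] + np.arange(block_len)).ravel()[:T]
        sample = avg_d[idx]
        boot[b] = np.sqrt(T) * (sample.mean() - avg_d.mean()) / sample.std(ddof=1)
    return float(np.mean(boot >= stat))

# Simulated losses (hypothetical): the candidate is better at every horizon.
rng = np.random.default_rng(1)
horizons = np.arange(15, 376, 15)          # days before release; upper end is hypothetical
loss_cand = rng.random((40, horizons.size))
loss_bench = loss_cand + 1.0 + 0.1 * rng.standard_normal((40, horizons.size))

nowcast = horizons < 165                   # split at 165 days, as in the text
forecast = horizons > 165
p_nowcast = average_loss_differential_pvalue(loss_bench[:, nowcast], loss_cand[:, nowcast])
p_forecast = average_loss_differential_pvalue(loss_bench[:, forecast], loss_cand[:, forecast])
print(p_nowcast, p_forecast)
```

Small p-values indicate that the benchmark is outperformed on average over the selected horizons, which is how the entries of Table 1 are read; as noted in the text, such a test does not reveal which individual horizons drive the result.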
The evidence in Barbaglia et al. (2023) for the case of the US indicates that the contribution of news-based sentiment measures to forecast accuracy is episodic, that is, limited to specific periods of time during the sample. To assess this hypothesis in the context investigated here, we perform the fluctuation test proposed by Giacomini and Rossi (2010), which allows us to evaluate the variation over time in the relative performance of the forecasts.
Figure 6 shows the fluctuation test at three horizons that correspond, approximately, to the end of the nowcasting quarter, the previous quarter, and four quarters ahead of the target. The lines represent the predictive accuracy of the ARXS forecasts, relative to the benchmark, using the six sentiment indicators, computed on a rolling window of 20% of the out-of-sample period. We find that the earlier result that the Economy sentiment is useful for forecasting in Germany is probably due to its high accuracy during the second recessionary period in 2011-2013. In addition, for Spain, Italy, and France the higher accuracy of the sentiment indicators is concentrated, to a large extent, during the recessionary period.
These results are, to some extent, consistent with the conditional predictive accuracy test discussed in the Online Appendix, even though the latter might have higher power to detect non-smooth breaks relative to the fluctuation test. For the UK, the Economy measure provides significantly higher accuracy than the benchmark only at the shortest horizons, around 2015.
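The rolling-window logic behind the fluctuation test can be sketched as follows. The statistic is the rolling sum of loss differentials scaled by the window size and a standard deviation, centered at the midpoint of the window. This is only an illustration: the standard deviation below is a naive full-sample estimate rather than a HAC estimator, the simulated losses are hypothetical, and the critical values tabulated by Giacomini and Rossi (2010) are not reproduced.

```python
import numpy as np

def fluctuation_statistic(loss_bench, loss_cand, window_frac=0.2):
    """Rolling, standardized loss differential (positive values favour the candidate).
    Returns the window midpoints and the statistic for each window."""
    d = np.asarray(loss_bench, float) - np.asarray(loss_cand, float)
    P = d.size                                   # out-of-sample size
    m = max(2, int(round(window_frac * P)))      # rolling window length
    sigma = d.std(ddof=1)                        # naive estimate (assumption: no HAC)
    csum = np.concatenate(([0.0], np.cumsum(d)))
    rolling_sum = csum[m:] - csum[:-m]           # sum of d over each window
    stats = rolling_sum / (sigma * np.sqrt(m))
    centers = np.arange(P - m + 1) + m // 2      # index of each window's midpoint
    return centers, stats

# Simulated squared errors (hypothetical): the candidate is better on average.
rng = np.random.default_rng(0)
loss_arx = rng.random(50) + 0.2
loss_arxs = rng.random(50)
centers, stats = fluctuation_statistic(loss_arx, loss_arxs, window_frac=0.2)
print(len(stats), centers[0], centers[-1])
```

Plotting `stats` against `centers` gives a path comparable in spirit to the lines in Figure 6; periods in which the path exceeds the one-sided critical value are those in which the sentiment-based forecast is significantly more accurate than the benchmark.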
The Online Appendix reports additional analyses that attest to the quality and robustness of the paper's results. First, we test the significance of the forecast gains using the predictive accuracy test by Giacomini and White (2006), as well as its conditional version by Granziera and Sekhposyan (2019). The proposed sentiment measures provide more accurate forecast

Interestingly, Table 3 shows that the pattern of significance when forecasting the IPI growth rate is similar to the case of the unemployment rate. In particular, we find that at short horizons the sentiment about Monetary Policy and Unemployment contribute to produce more accurate forecasts in the case of Spain, as well as the Economy indicator for the UK.
However, at the longer horizons there are only a few indicators that are significant to predict the IPI growth rate, in particular Monetary Policy, while the average forecast is significant at all horizons for most countries.
Finally, Table 4 reports the results when forecasting CPI inflation. In this case, the sentiment measure about Inflation is an important predictor at short horizons for Spain, Italy, and the UK, but not at longer horizons. Sentiment about Unemployment is significant at all horizons to forecast inflation in the UK, while the Financial Sector is a useful predictor of inflation in Italy. Overall, we find that economic news is more relevant in forecasting the monthly variables at short horizons of 1 to 3 months, relative to the case of GDP where the bulk of the predictability is at intermediate horizons.

Conclusions
The increased availability of large amounts of textual data, such as news articles, offers the opportunity to construct text-based sentiment indexes that can potentially complement official macroeconomic and survey-based indicators. In addition, the Covid-19 pandemic has amplified the need for timely and reliable economic indicators about different aspects of the current state of economic activity. The goal of this paper is to offer an evaluation of the usefulness of sentiment indicators constructed from news articles. We consider a big data set of 27 million articles from the most important newspapers in five European countries, and propose a set of fine-grained, aspect-based sentiment indicators of economic activity. Real-time sentiment measures that complement macroeconomic and confidence indicators could be extremely useful for European countries given the considerable delay in the publication of official statistics.
Our findings provide in-sample and out-of-sample evidence that the proposed news-based sentiment indicators provide useful information for forecasting GDP across five European countries. In addition, the predictability provided by the sentiment measures is incremental relative to the macroeconomic and survey confidence indicators that are available to forecasters. We thus conclude that there is a case for considering sentiment measures calculated from news as an additional tool that economic forecasters should adopt in the quest to produce accurate predictions of economic activity.
In future work we plan to extend our methodology to perform better in such scenarios. Since our approach is keyword-based, a possible direction could be to extend the set of terms that we consider in constructing the sentiment indicator. On the modelling side, we could make the parameters of the linear forecasting model a function of the number of articles or sentences that discuss the topic on any given day. This would introduce a much-needed nonlinearity in the model that could contribute to explaining such extreme events.

the sentiment in German news about Unemployment is mostly negative in the early part of the sample and turns positive only in 2007, when the unemployment rate declined below 9% for the first time since 1993. The sentiment about the Financial Sector is predominantly positive in the early part of the sample for all countries, but sharply declines during the 2008-2009 recession in Germany and the UK, and during the second recession in Spain, France, and Italy. The different behavior of the measure can be explained by news about the large exposure of German and British banks to the financial crisis in the United States, as opposed to banks in Spain, France, and Italy that were affected by exposure to the Greek debt crisis starting in 2011. Sentiment about Manufacturing is mostly negative during the 1990s and early 2000s in Spain, France, and the UK, while it is generally positive after 2013.

Figure 1: Time series of the standardized news-based sentiment indicators for Germany (DE), Spain (ES), France (FR), Italy (IT), and the United Kingdom (UK). The sentiment is averaged within the quarter and sampled at the quarterly frequency. The shaded areas represent the recessions established by the CEPR-EABCN business cycle dating committee.

Figure 3 provides the estimates of the correlation between the sentiment indicators across countries calculated at the quarterly frequency. The correlation across indicators and countries ranges between -0.08 (Unemployment in Germany and Italy) and 0.68 (Inflation in Italy and Spain). Correlation in the sentiment for Manufacturing is typically lower among all pairs of countries, as opposed to the Financial Sector, where sentiment is more correlated. Interestingly, there are large co-movements in sentiment among the euro area countries about Monetary Policy and Inflation, driven by the common flow of news regarding monetary events. Overall, Germany and the United Kingdom seem to be less correlated on the Economy and Manufacturing indicators, while France, Italy and Spain constitute a more correlated block. In the Online Appendix we also investigate the Granger-causality patterns between survey confidence and news sentiment. The evidence suggests the existence of bidirectional relations, with news having significant additional explanatory power with respect to traditional indicators.

Figure 4: Statistical significance of the survey and sentiment measures as predictors of GDP growth by country with p-values corrected for multiple testing across horizons. The grey area represents the quarter being forecast and release indicates the release date. The x-axis reports the horizon h, which ranges from 15 days before the release date to approximately 4 quarters ahead at intervals of 15 days. The color of the tile represents the p-value of the coefficient of the survey or sentiment indicators η_h in Equation (1): the darker the tile, the smaller the p-value.

Figure 5: Ratio of the MSFE for the ARXS specification relative to the ARX benchmark across horizons. The grey area represents the quarter being forecast and rel. indicates the release date. The x-axis reports the horizon h, which ranges from 15 days before the release date to approximately 4 quarters ahead at intervals of 15 days.

Figure 6: Fluctuation test for the ARXS and AVERAGE forecasts at four representative horizons. The size of the rolling window is 20% of the out-of-sample period. The test statistic is centered at the midpoint of the rolling window and the grey areas represent the CEPR-EABCN recession periods. The dashed line represents the one-sided critical value at 10%.

Table 1: Multi-horizon test for equal predictive accuracy when considering exclusively the nowcast horizons (top part), and only the forecast horizons (bottom part). The benchmark model is the ARX and the alternative is the ARXS specification that augments the ARX with a news-based sentiment measure. The Average forecast is obtained from averaging the ARXS forecasts. The value in the table represents the p-value for the one-sided hypothesis that the forecasting models in the column outperform the ARX forecasts: p-values indicating significance at 5% are denoted in bold and at 10% in italics.

Table 2: The dependent variable is the unemployment rate. Multi-horizon test for equal predictive accuracy when considering exclusively the short-run horizons (top part), and only the long-run horizons (bottom part). The benchmark model is the ARX and the alternative is the ARXS specification that augments the ARX with a news-based sentiment measure. The Average forecast is obtained from averaging the ARXS forecasts. The value in the table represents the p-value for the one-sided hypothesis that the forecasting models in the column outperform the ARX forecasts: p-values indicating significance at 5% are denoted in bold and at 10% in italics.

Table 3: The dependent variable is the Industrial Production Index. Multi-horizon test for equal predictive accuracy when considering exclusively the short-run horizons (top part), and only the long-run horizons (bottom part). The benchmark model is the ARX and the alternative is the ARXS specification that augments the ARX with a news-based sentiment measure. The Average forecast is obtained from averaging the ARXS forecasts. The value in the table represents the p-value for the one-sided hypothesis that the forecasting models in the column outperform the ARX forecasts: p-values indicating significance at 5% are denoted in bold and at 10% in italics.

Table 4: The dependent variable is the Consumer Price Index. Multi-horizon test for equal predictive accuracy when considering exclusively the short-run horizons (top part), and only the long-run horizons (bottom part). The benchmark model is the ARX and the alternative is the ARXS specification that augments the ARX with a news-based sentiment measure. The Average forecast is obtained from averaging the ARXS forecasts. The value in the table represents the p-value for the one-sided hypothesis that the forecasting models in the column outperform the ARX forecasts: p-values indicating significance at 5% are denoted in bold and at 10% in italics.