(i) Motor Vehicles and Parts
As an initial example we use the ‘Motor Vehicles and Parts Dealers’ series from the US Census Bureau ‘Advance Monthly Sales for Retail and Food Services’ report.^{2}
This index summarises results from a survey sent to motor vehicle and parts dealers that asks about current sales. The preliminary index is released two weeks after the end of each month. The data is available in both seasonally adjusted and unadjusted form; here we use the unadjusted data.
Let y_{t} be the log of the observation at time t. We first estimate a simple baseline seasonal AR-1 model for the period 2004-01-01 to 2011-07-01.
             Estimate  Standard Error  t value  Pr(>|t|)
(Intercept)   0.67266         0.76355    0.881  0.381117
lag(y, −1)    0.64345         0.07332    8.776  3.59e−13 ***
lag(y, −12)   0.29565         0.07282    4.060  0.000118 ***
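Written out, a regression with the regressors listed in this table corresponds to the equation below. The coefficient symbols b_{0}, b_{1}, b_{12} and the error term e_{t} are labels introduced here for illustration; they do not appear in the original output.

```latex
y_{t} = b_{0} + b_{1}\, y_{t-1} + b_{12}\, y_{t-12} + e_{t}
```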
Google Trends contains several automotive-related categories. A little experimentation shows that two of these categories, Trucks & SUVs and Automotive Insurance, significantly improve in-sample fit when added to this regression.
             Estimate  Standard Error  t value  Pr(>|t|)
(Intercept)  −0.45798         0.78438   −0.584  0.561081
lag(y, −1)    0.61947         0.06318    9.805  5.09e−15 ***
lag(y, −12)   0.42865         0.06535    6.559  6.45e−09 ***
suvs          1.05721         0.16686    6.336  1.66e−08 ***
insurance    −0.52966         0.15206   −3.483  0.000835 ***
However, the perils of in-sample forecasting are well known. The question of interest is whether the Trends variables improve out-of-sample forecasting.
To check this, we use a rolling window forecast where we estimate the model using the data for periods k through t − 1 and then forecast y_{t} using y_{t−1}, y_{t−12}, and the contemporaneous values of the Trends variables as predictors. Since the series is actually released two weeks after the end of each month, this gives us a meaningful forecasting lead. The value of k is chosen so that there are a reasonable number of observations for the first regression in the sequence. In this case we chose k = 17, which implied the forecasts start on 2005-06-01.
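The procedure can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used for the results reported here: the function names are hypothetical, the estimator is plain least squares via the normal equations, and k is read as the index of the first forecast, with every earlier observation whose lags exist used for estimation.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def ols(rows, targets):
    """Least-squares coefficients from the normal equations (X'X) b = X'y."""
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(p)]
    return solve(xtx, xty)


def rolling_forecasts(y, trends, k, seasonal_lag=12):
    """One-step-ahead forecasts of y[t] for t = k, ..., len(y) - 1.

    y      : list of log observations
    trends : list of tuples of contemporaneous Trends values, same length as y
    k      : index of the first forecast (assumed reading of k in the text)
    """
    def regressors(t):
        return [1.0, y[t - 1], y[t - seasonal_lag], *trends[t]]

    forecasts = {}
    for t in range(k, len(y)):
        # Estimate only on rows that precede t and whose lags are available.
        train = range(seasonal_lag, t)
        beta = ols([regressors(s) for s in train], [y[s] for s in train])
        forecasts[t] = sum(b * x for b, x in zip(beta, regressors(t)))
    return forecasts


def mean_absolute_error(y, forecasts):
    """Average absolute forecast error over the forecast periods."""
    return sum(abs(y[t] - f) for t, f in forecasts.items()) / len(forecasts)
```

Passing an empty tuple for every period in `trends` gives the baseline forecasts, so the two models can be compared by calling `mean_absolute_error` on each set of forecasts. The value of k must leave at least as many estimation rows as there are coefficients.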
The results are shown in Figure 2. The mean absolute error (MAE) of log(y_{t}) using the baseline seasonal AR-1 model is 6.34 per cent while the MAE using the Trends data is 5.66 per cent, an improvement of 10.5 per cent. If we look at the MAE during the recession (December 2007 through June 2009) we find that the MAE without Trends data is 8.86 per cent and with Trends data is 6.96 per cent, an improvement of 21.5 per cent.
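The improvement figures are one minus the ratio of the two MAEs. The short sketch below (the function name is hypothetical) applies this to the rounded MAEs quoted above. Because the inputs are rounded, the results are close to, but not identical with, the reported percentages, which were presumably computed from unrounded errors.

```python
def relative_improvement(mae_base, mae_trends):
    """Fractional reduction in MAE from adding the Trends variables: 1 - trends/base."""
    return 1.0 - mae_trends / mae_base


# Rounded MAEs from the text: full sample, then the recession period.
print(round(100 * relative_improvement(6.34, 5.66), 1))
print(round(100 * relative_improvement(8.86, 6.96), 1))
```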
(ii) Initial Claims for Unemployment Benefits
Each Thursday morning the US Department of Labor releases a report describing the number of people who filed for unemployment benefits in the previous week.^{3}
Initial claims have a good record as a leading indicator. Macroeconomist Robert Gordon indicates that there is a ‘surprisingly tight historical relationship in past US recessions between the cyclical peak in new claims for unemployment insurance (measured as a four-week moving average) and the subsequent National Bureau of Economic Research (NBER) trough.’^{4} Furthermore, a cursory inspection of the relationship between initial claims and the unemployment rate indicates that initial claims tend to peak a few months before the unemployment rate peaks.
When someone becomes unemployed it is natural to expect that they will issue searches such as [file for unemployment], [unemployment office], [unemployment benefits], [unemployment claim], [jobs], [resume] and so on. Google Trends classifies search queries like these into two categories, Local/Jobs and Society/Social Services/Welfare & Unemployment.
In this example we work with the seasonally adjusted initial claims data, since that is the number used by most economic forecasters. Since our dependent variable is seasonally adjusted, it makes sense to seasonally adjust the independent variables as well, so we used the stl command in R to remove the seasonal component of the Trends data.
In this case, our baseline regression is a simple AR-1 model on the log of initial claims.
Start = 2004-01-17, End = 2011-07-02

             Estimate  Standard Error  t value  Pr(>|t|)
(Intercept)   0.25488         0.12951    1.968  0.0498 *
L(y, 1)       0.98022         0.01007   97.368  <2e−16 ***
Note that the coefficient on the lagged term is almost one, suggesting that the process for initial claims is very close to a random walk (with drift).
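As a rough worked illustration of how close the estimates are to a random walk, the fitted equation can be used to compute the long-run mean and the half-life of a shock. The symbols \mu and h are introduced here for illustration only, and the calculation treats the point estimates as exact.

```latex
y_{t} = 0.25488 + 0.98022\, y_{t-1} + e_{t}

\mu = \frac{0.25488}{1 - 0.98022} \approx 12.89
\qquad
h = \frac{\ln 0.5}{\ln 0.98022} \approx 34.7 \text{ weeks}
```

With a half-life of roughly 35 weeks, deviations from the mean die out very slowly, which is why y_{t−1} alone is already a strong predictor of y_{t}.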
As Nelson and Plosser (1982) and many subsequent authors have pointed out, it is very common for macroeconomic data to be represented as a random walk. For a random walk, the best univariate forecast for y_{t} is simply y_{t−1}. However, perhaps we can improve on this baseline forecast by using additional predictors from Google Trends.
Using the Google Trends categories Jobs and Welfare...Unemployment we find that these are marginally significant but have little impact on in-sample fit.
                         Estimate   Standard Error  t value  Pr(>|t|)
(Intercept)             1.0563440        0.2686360    3.932  9.98e−05 ***
L(y, 1)                 0.9183560        0.0208778   43.987  <2e−16 ***
Jobs                    0.0007069        0.0003847    1.838  0.0669
Welfare...Unemployment  0.0003752        0.0001838    2.042  0.0418 *
When we look at one-step-ahead out-of-sample forecasts we find that the MAE goes from 3.37 per cent using the baseline forecast to 3.68 per cent using the Trends data, which is a 5.95 per cent reduction in fit. However, when we look at the series a bit more closely a rather different picture emerges.
It is well known that it is difficult to identify ‘turning points’ in economic series. A smoothly increasing or decreasing trend is easy to fit with a simple linear AR model. Turning points in time series are much harder to forecast.
If we look just at the recession period (December 2007 through June 2009) we find that using Trends data reduces the MAE from 3.98 per cent to 3.44 per cent, an improvement of 13.6 per cent. Looking more closely at the series, we see that there are four notable turning points, indicated by the shaded areas in Figure 3. The MAEs for the periods surrounding these turning points are reported in Table 1. Note that there is a reduction in MAE at all turning points, with particularly pronounced reductions in the first two. In this case, the Google Trends data seem to help in identifying at least two of the turning points in the series.
Table 1. Behavior of MAE around Turning Points

Start       End         MAE base  MAE trends  1 − ratio
2009-03-01  2009-05-01    0.0306     0.02398     21.85%
2009-12-01  2010-02-01    0.0356     0.03127     12.36%
2010-07-15  2010-07-15    0.0513     0.05101      0.65%
2011-01-01  2011-05-01    0.0252     0.02446      3.22%
Figure 4 plots the difference in MAE between the Base and Trends models. A positive value indicates that the Trends forecast had a smaller error. Here it is clear that the Trends model fits better during the recession (December 2007 through June 2009), while the Base model fits better immediately after.
(iii) Travel
The Internet is commonly used for travel planning, which suggests that Google Trends data about destinations may be useful in predicting visits to those destinations. We illustrate this using data from the Hong Kong Tourism Board.^{5}
The Hong Kong Tourism Board publishes monthly visitor arrival statistics, including ‘Monthly visitor arrival summary’ by country/territory of residence. For this study we use visitor data from US, Canada, Great Britain, Germany, France, Italy, Australia, Japan and India.
‘Hong Kong’ is also one of the subcategories under Vacation Destinations in Google Trends. We can examine the query index for this category by country of origin.
The Hong Kong visitor arrival data is not seasonally adjusted, nor is the Google Trends data. We used the average query index in the first two weekly observations of the month to predict the total monthly visitors. Since the data is released with a one-month lag, this gives us roughly a six-week lead in terms of forecasting.
We let y_{t} be the visitors from a given country in month t, and x_{t} be the average Google Trends index for Vacation Destinations/Hong Kong for the first two weeks of that month. We specify a basic seasonal AR-1 model that regresses y_{t} on y_{t−1}, y_{t−12} and x_{t}.
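The construction of the predictor x_{t} from weekly data can be sketched as follows. This is an illustration only: the function name is hypothetical, and it assumes that each weekly observation is assigned to the month of its date stamp, which may differ from the convention used to produce the results reported here.

```python
from collections import defaultdict
from datetime import date


def first_two_week_average(weekly):
    """Map (year, month) to the mean of the first two weekly index values in that month.

    weekly : iterable of (date, value) pairs, one per weekly observation
    """
    by_month = defaultdict(list)
    for day, value in sorted(weekly):
        by_month[(day.year, day.month)].append(value)
    # Months with fewer than two weekly observations are skipped.
    return {
        month: sum(values[:2]) / 2
        for month, values in by_month.items()
        if len(values) >= 2
    }
```

Later weekly observations in the month are deliberately ignored, since the point of the exercise is to use only the information available early in the month.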
We estimate this model for each country and compare the actual to the fitted results in Figure 5. Unlike the previous examples, we have here used in-sample fits. As can be seen, the in-sample fits are pretty good, with the exception of Japan. Excluding Japan, the average R^{2} is 73.3 per cent. In Choi and Varian (2009a) we use a more elaborate random effects model with some additional predictors and find a somewhat better in-sample fit.