SUMMARY
Although speculative activity is central to black markets for currency, the out-of-sample performance of structural models in those settings is unknown. We substantially update the literature on empirical determinants of black market rates and evaluate the out-of-sample performance of linear models and nonparametric Bayesian treed Gaussian process (BTGP) models against the random walk benchmark. Fundamentals-based models outperform the benchmark in out-of-sample prediction accuracy and trading rule profitability measures given future values of fundamentals. In simulated real-time trading exercises, however, the BTGP achieves superior realized profitability, accuracy and market timing, while linear models do no better than a random walk. Copyright © 2013 John Wiley & Sons, Ltd.
INTRODUCTION
In a seminal paper, Meese and Rogoff (1983) demonstrated that empirical models, in particular linear regression models and vector autoregressions (VARs), designed from the most important structural exchange rate models of the 1970s were inferior to a naïve random walk in out-of-sample forecasting performance, even when the realized next-period values of independent variables were taken as given. A great deal of literature, which we review selectively below, has repeatedly verified the basic conclusions of the Meese and Rogoff (1983) study, in particular for major cross-currency exchange rate pairs.
The same year in which Meese and Rogoff published their study, another seminal paper by Dornbusch et al. (1983) on ‘The black market for dollars in Brazil’ laid out the basic stock–flow model that has underlain most subsequent empirical work to date on the behavior of exchange rates in black markets for currency. However, a comprehensive evaluation of the out-of-sample performance of the black market structural models subsequently built upon the Dornbusch framework (the equivalent of the Meese and Rogoff exercise for black markets) has been entirely lacking from the literature. The absence of such a study is particularly puzzling in light of the widespread notion that speculative activity is a key driver of black market activity, and the fact that black markets, which possess certain special features that set them apart from regular currency markets, might be expected a priori to offer better opportunities for intertemporal arbitrage to speculators with superior predictive abilities.
To fill this gap, the present paper exploits a monthly database of 34 black market episodes during the past 50 years to test the out-of-sample fit of structural black market models built from the variables that have proven most useful according to past studies of empirical regularities in black markets for currency. In fact, we go one step further: in addition to testing the out-of-sample fit of structural black market models when the next-period value of the structural variables is known, as in Meese and Rogoff (1983), we evaluate the success of trading rule profitability, predictive accuracy and other measures when the next-period vector of fundamentals is unknown. The latter exercise provides a much more realistic evaluation of the ability (or lack thereof) of currency speculators to generate profits based on real-time trading. Against the standard benchmarks of the random walk with and without drift, we evaluate the performance of parametric linear regression models and a nonparametric Bayesian treed Gaussian process model due to Gramacy and Lee (2008), which makes no assumptions about the stationarity of the conditional relationship between exchange rates and fundamentals, or the stationarity of the exchange rate or fundamentals themselves. The adaptation of the Bayesian treed Gaussian process (BTGP) model to exchange rate prediction, the first of its kind in the literature, is reasonable in light of previous findings, reviewed below, that indicate a role for nonlinear and nonstationary model features in improving exchange rate forecasting performance.
To preview the main findings of the paper, we find that in-sample the implied monthly return to the ‘reverse carry trade’ strategy of borrowing in (high interest) local currency and earning the (low interest) US interest rate in anticipation of large depreciations is by far the most robustly statistically significant determinant of the black market exchange rate across all 34 episodes studied.1 The average coefficient across episodes for this variable, of 0.45, in our linear model in first differences implies that for a 1% increase in the local currency returns to the reverse carry trade strategy the monthly depreciation rate of the black market currency relative to the US dollar increases by nearly half a percent, thus making this determinant economically very significant as well. The second most statistically significant determinant of the black market exchange rate across episodes is the official exchange rate, which is not surprising in light of the fact, emphasized by previous literature, that official devaluations typically have major impacts on black market exchange rates. Across episodes, a 100% official devaluation translates on average into a 59% contemporaneous depreciation of the black market exchange rate. Other variables emphasized in previous empirical studies of black market exchange rates, such as the logarithm of the ratio of M2 to the official exchange rate,2 the logarithm of the real official exchange rate (the official exchange rate multiplied by the ratio of US and domestic price indices), the growth rate of international reserves, and country-specific commodity price indices,3 are less robustly important across country episodes, although there are specific country episodes for which these variables are variously highly statistically significant at conventional levels.
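The two headline magnitudes above follow from reading the average first-difference coefficients linearly. The short sketch below reproduces that arithmetic; the function and constant names are ours and purely illustrative, and the linear reading (coefficient times change in the regressor) is the interpretation used in the text rather than a separate result.

```python
# Illustrative arithmetic only: average first-difference coefficients quoted in the text.
BETA_DREP = 0.45      # average coefficient on the reverse carry trade return (DREP)
BETA_OFFICIAL = 0.59  # average coefficient on the official exchange rate


def implied_depreciation(change_in_regressor_pct: float, beta: float) -> float:
    """Change in the monthly black market depreciation rate (in percentage points)
    implied by a linear coefficient and a given change in the regressor."""
    return beta * change_in_regressor_pct


# A 1% rise in the reverse carry trade return.
carry_effect = implied_depreciation(1.0, BETA_DREP)
# A 100% official devaluation.
devaluation_effect = implied_depreciation(100.0, BETA_OFFICIAL)

print(f"Carry trade effect: {carry_effect:.2f} percentage points")
print(f"Devaluation effect: {devaluation_effect:.0f} percentage points")
```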
Out-of-sample, all of our structural models significantly outperform the random walk and random walk with drift on measures of realized trading rule profitability in ‘ex post’ forecasting exercises in which the next-period vector of fundamentals is given. Consistent with the findings of Meese and Rogoff (1983), however, we find that the random walk achieves significantly lower root mean squared errors (RMSEs) and mean absolute errors (MAEs) than the structural models in this setting. The apparent conflict between the outperformance of structural models on measures of trading rule profitability and underperformance with respect to RMSEs is explained by the fact that the structural models achieve significantly better directional prediction accuracy, particularly in anticipation of large return events, despite the fact that their point predictions are further off from the true values than those of the random walk model, on average. Our findings lend strong support to the claim that comparison of competing models based solely on their average errors and RMSEs may lead to very misleading conclusions with respect to their usefulness for the implementation of profitable speculative trading strategies. In ‘ex ante’ out-of-sample forecasting exercises, in which the next-period values of fundamentals are unknown, we find that while parametric linear models perform no better than a random walk, the BTGP model is capable of achieving economically and statistically significant trading profits, of the order of 17% annual percentage rate (APR) per month on average across episodes. This suggests that the BTGP model may prove a useful forecasting engine in other financial market contexts as well: in particular, foreign exchange.
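The tension between point accuracy and profitability can be seen in a toy example. The sketch below uses invented numbers, not data from any episode, and a hypothetical sign-based rule (long when the forecast change is positive, short when negative) that stands in for, and is not, the trading strategy evaluated in this paper. The forecast labelled `structural` overshoots in magnitude but calls the large moves correctly, so it has the higher RMSE and yet the higher directional accuracy and cumulative return.

```python
import math


def rmse(actual, forecast):
    """Root mean squared error of the forecasts."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def directional_accuracy(actual, forecast):
    """Share of periods in which the forecast and the outcome have the same sign."""
    hits = sum(1 for a, f in zip(actual, forecast) if a * f > 0)
    return hits / len(actual)


def sign_rule_return(actual, forecast):
    """Cumulative return of a hypothetical rule: +1 position if forecast > 0, -1 if < 0."""
    return sum(a * ((f > 0) - (f < 0)) for a, f in zip(actual, forecast))


# Invented monthly changes in the log exchange rate, with two large moves.
actual = [0.01, -0.02, 0.15, -0.01, 0.02, -0.12]
# Hypothetical low-variance benchmark: a small constant drift forecast.
benchmark = [0.005] * len(actual)
# Hypothetical structural forecast: right direction on large moves, wrong magnitudes.
structural = [0.08, -0.09, 0.30, 0.05, 0.10, -0.30]

for name, forecast in (("benchmark", benchmark), ("structural", structural)):
    print(
        name,
        f"RMSE={rmse(actual, forecast):.4f}",
        f"direction={directional_accuracy(actual, forecast):.2f}",
        f"return={sign_rule_return(actual, forecast):.2f}",
    )
```

Ranking the two forecasts by RMSE alone would favour the benchmark, whereas the direction-based measures favour the structural forecast.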
The rest of this paper is organized as follows. Section 2 briefly summarizes the literature on empirical exchange rate models, with a focus on contributions stemming from the seminal Meese and Rogoff (1983) study. Section 3 lays out the basic empirical structural black market models, whose explanatory variables are inspired by contributions stemming from the Dornbusch et al. (1983) framework. Section 4 describes our black market dataset. Section 5 reports in-sample results for the linear models, and substantially updates the literature on the empirical determinants of black market exchange rates by extending and comparing previous results in a common framework. Section 6 describes the forecasting models, the trading strategy used in black markets based on the forecast signals, and the statistics used to measure model forecasting performance. Section 7 reports results for the out-of-sample fit exercise, which is comparable with the exercises in the Meese and Rogoff tradition, as well as the out-of-sample forecasting exercise, in which the next-period values of fundamentals are unknown. Section 8 concludes.
LITERATURE REVIEW
In the years directly following the publication of Meese and Rogoff's (1983) paper, their findings were further ratified in work employing linear models with time-varying features, such as Alexander and Thomas (1987) and Wolff (1987, 1988). Subsequently, given findings of nonlinear dependency in exchange rate changes (Baillie and McMahon, 1989; Hsieh, 1989; Hong and Lee, 2003), some authors attempted to exploit such behavior to beat the forecasting accuracy of the random walk. Such attempts included Diebold and Nason (1990) and Meese and Rose (1990), who used nonparametric kernel regressions, and Engel and Hamilton (1990) and Engel (1994), who employed a Markov switching model. None of these strategies proved useful in beating the random walk in out-of-sample forecasting performance.
More recently, Preminger and Franck (2007) report that work done on exchange rate forecasting using neural networks provides some evidence that they are better than other nonlinear models in terms of out-of-sample forecasting ability. The evidence on that score, however, is also mixed. While Kuan and Tung (1995), Brooks (1997) and Gençay (1999) all report that neural networks can beat random walks for daily exchange rate data, for example by achieving lower root mean squared errors, a study by Qi and Wu (2003), who employ a neural network with monetary fundamentals at one-month, six-month and 12-month horizons, finds that their model cannot beat the random walk model at those horizons. As Rogoff (2001) himself has noted, in a survey of the literature stemming from his paper with Meese, the inability of structural models to explain the movements of G3 exchange rates remains a robust stylized fact.
During the past decade, however, a handful of studies have managed to deliver more optimistic results with respect to the forecasting ability of structural models in foreign exchange. These studies have obtained their results, for the most part, either by implementing existing models using more sophisticated model selection criteria or by expanding the range of exchange rates studied to include smaller country cross rates with the dollar or another major currency. Papers in the first group include Nag and Mitra (2002), Preminger and Franck (2007) and Sarno and Valente (2009). Nag and Mitra (2002) focus on potential improvements from model nonlinearity, providing evidence that a genetically optimized neural network model can modestly outperform generalized autoregressive conditional heteroskedasticity (GARCH) type models on the three nominal exchange rate pairs examined in the Meese and Rogoff (1983) study. Preminger and Franck (2007) propose a robust regression approach, which is less sensitive to data contaminated by outliers, for improving the out-of-sample performance of the standard linear autoregressive and neural network models. Upon implementing their method for the pound/dollar and yen/dollar cross rates, they find that robust models tend to improve the forecasting ability of both types of models at all time horizons studied. Sarno and Valente (2009), using real-time data on a broad set of economic fundamentals for five major US dollar exchange rates, find that the difficulty of selecting the best predictive model is mostly due to frequent shifts in the set of fundamentals driving exchange rates, which can be interpreted as reflecting swings in market expectations over time. These authors employ a model selection procedure due to Pesaran and Timmermann (1995) to outperform a random walk for three out of five of the exchange rates studied. However, they also find that if conventional model selection criteria are used to choose the best model ex ante, the same set of economic fundamentals is not useful at all in forecasting exchange rates out-of-sample.
Papers in the second group, which expand the focus to smaller cross rates, include Liu et al. (1994), Yang et al. (2008) and Cerra and Saxena (2010). Liu et al. (1994) provide evidence showing that a monetary/asset model in a VAR representation does have forecasting value for some exchange rates. Yang et al. (2008) examine the potential martingale behavior of euro exchange rates in the context of out-of-sample forecasts, with special attention paid to potential nonlinearity-in-mean. Their findings indicate that while martingale behavior cannot be rejected for euro exchange rates with the yen, pound and dollar, there is indeed nonlinear predictability in terms of economic criteria with respect to several smaller currencies. Most recently, Cerra and Saxena (2010) revisit the dynamic failure of the monetary models in explaining exchange rate movements by using information content from 98 countries to test for cointegration between nominal exchange rates and monetary fundamentals. They find robust evidence for cointegrating relationships, and that fundamentals-based models are very successful at beating a random walk in out-of-sample exchange rate prediction. The Cerra and Saxena paper, however, uses annual data on nominal exchange rates, many of which were pegged during the period of study, vis-à-vis the dollar.
While the above literature pertains to non-black market exchange rates, we believe that the main lessons of the papers discussed above that manage to achieve improved exchange rate forecasting performance can be capitalized on in the black market context as well. In particular, besides linear models, we employ a BTGP model due to Gramacy and Lee (2008) to forecast one-month-ahead exchange rates. The BTGP is a nonparametric model that can be viewed as a tree whose leaves represent random functions, with the functions modeled using Gaussian processes. The flexibility afforded by being able to classify observations into different ‘leaves’ of the BTGP allows for the modeling of nonstationary data, in which the function relating the black market rate and structural explanatory variables in a given month may represent one of several Gaussian processes indexed by the leaves of the tree. This is consistent with earlier findings that neural network models, and methods that explicitly allow for nonstationarity in the relationship between fundamentals and exchange rates, can improve out-of-sample prediction.
Second, besides the central fact that the out-of-sample performance of structural models is essentially unknown in the black market context, our focus on black markets is reasonable given the evidence that smaller cross rates may provide a more fertile ground in which structural models may have a fighting chance against the random walk out-of-sample. Finally, in contrast to most previous studies, our focus is centered not only upon the question of whether or not out-of-sample forecasting ability for structural models exists, but also on whether or not such an ability confers the capacity to generate speculative profits in simulated real-time trading. For that reason, we go beyond the usual reporting of RMSEs and measures of out-of-sample fit when next-period values of fundamentals are known (à la Meese and Rogoff, 1983). We report directional accuracy measures, the Anatolyev and Gerko (2005) statistic of market timing ability and the cross-sectional distribution of realized monthly APRs from simulated trading, and contrast the results of the out-of-sample fit exercise with those of the out-of-sample forecasting exercise, in which next-period values of fundamentals are unknown, as they would be for currency speculators.
THE BLACK MARKET DATASET
The structural models we consider in both our in-sample and out-of-sample analyses employ the same set of six independent variables, each of which has proven to be a robust determinant of the black market premium in at least one past empirical study. Our vector of fundamentals consists of the following variables: (a) the logarithm of the official exchange rate, ē; (b) the differential rate of expected profits (DREP), proposed by Fishelson (1988) in his empirical test of the model of Dornbusch et al. (1983) on 19 black market episodes; (c) the logarithm of the ratio of M2 to the official exchange rate; (d) the logarithm of the real official exchange rate (the official exchange rate multiplied by the ratio of US and domestic price indices; roer); (e) the logarithm of the international reserves, r; and (f) the logarithm of the country's commodity price index, p, where the index is defined as the price of the country's primary export commodity, in US dollars, during the black market episode. The DREP variable, which measures the deviation from uncovered interest rate parity, is calculated from three quantities: i_{t}, the local currency monthly interest rate earned from time t − 1 to time t; d_{t} = E_{t}/E_{t − 1} − 1, the contemporaneous (time t) monthly rate of depreciation of the local currency relative to the US dollar; and the US monthly interest rate earned from time t − 1 to time t.5 The practice of including the contemporaneous deviation from uncovered interest rate parity (UIP), using either the official devaluation rate or the black market depreciation rate d_{t}, in models of the contemporaneous black market exchange rate (or premium) is standard in the black market literature given the key assumption of the Dornbusch et al. (1983) model that this quantity should directly affect the equilibrium stock of black market dollars held by market participants.
It is important to note that such a specification does not involve explaining the contemporaneous black market rate with itself, and that in fact the in-sample pairwise correlations between the log black market exchange rate e_{t} and the variable DREP_{t} are usually small and positive, and frequently even negative, for most episodes in our sample.6
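To make the ingredients of DREP concrete, the sketch below computes the depreciation rate d_{t} = E_{t}/E_{t − 1} − 1 exactly as defined above and then combines it with the two interest rates. The combining formula, (1 + i*_{t})(1 + d_{t}) − (1 + i_{t}), is an assumption of ours: it is one standard way to write the local currency return to borrowing locally and holding dollars, and it is consistent with the verbal description of the reverse carry trade, but it should not be read as the exact expression used by Fishelson (1988) or in this paper. The symbol i*_{t} for the US rate and all function names and input values are likewise illustrative.

```python
def depreciation_rate(e_now: float, e_prev: float) -> float:
    """d_t = E_t / E_{t-1} - 1, with E in local currency units per US dollar."""
    return e_now / e_prev - 1.0


def uip_deviation(i_local: float, i_us: float, d: float) -> float:
    """ASSUMED form of the deviation from uncovered interest parity:
    gross dollar return converted to local currency, minus gross local funding cost."""
    return (1.0 + i_us) * (1.0 + d) - (1.0 + i_local)


# Hypothetical month: the black market rate moves from 100 to 105 per dollar,
# the local monthly interest rate is 3% and the US monthly rate is 0.4%.
d_t = depreciation_rate(105.0, 100.0)
drep_t = uip_deviation(i_local=0.03, i_us=0.004, d=d_t)
print(f"d_t = {d_t:.4f}, assumed UIP deviation = {drep_t:.4f}")

# Under exact UIP the depreciation just offsets the interest differential,
# so the deviation is zero.
d_uip = (1.0 + 0.03) / (1.0 + 0.004) - 1.0
print(f"deviation under UIP = {uip_deviation(0.03, 0.004, d_uip):.6f}")
```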
In the next section, as a preliminary to the out-of-sample analysis, we report in-sample results for the linear model (1) in levels and in differences for each of the 34 country episodes in our dataset. Besides updating previous work on empirical regularities in black markets for currency by Fishelson (1988), Culbertson (1989) and Shachmurove (1999), whose studies examined 19, 10, and 17 country episodes, respectively, our in-sample analysis substantially expands upon these previous multi-country studies both in terms of country coverage and the range of explanatory variables considered.
Our database of 34 black market episodes is constructed by supplementing data on parallel and official exchange rates provided by Reinhart and Rogoff (2004)7 with data for Taiwan sourced from Luintel (2000) and the Taiwanese Central Bank, as well as data for Venezuela sourced from Malone and ter Horst (2010). Monthly data on macroeconomic aggregates were taken from the International Monetary Fund and the US Federal Reserve, with country-specific commodity price indices sourced from the United Nations Commodity Trade Statistics. The first black market episode in our dataset began in January 1963 in India. Several of the most recent episodes (South Africa, Paraguay, Jordan and Egypt) ended in July 1998. The last episode in our dataset belongs to Venezuela, whose policy of capital controls began in February 2003, with our data on that country extending to July 2009.
IN-SAMPLE RESULTS FOR THE LINEAR STRUCTURAL MODELS
Given the breadth of our coverage of black market episodes in recent history, it is useful to document the performance of equation (1), in levels and in first differences, in-sample. This will provide an initial assessment of the ability of our benchmark parametric model to explain variations in the black market rate, plus an evaluation of the sign and significance of each determinant in driving changes in black market exchange rates. We include the results of the model in first differences due to the possibility that the black market rate or some subset of the determinants might be integrated of order 1. Tables OA.1a–OA.1c of the online Appendix (supporting information) summarize the results of the in-sample exercise, performed at monthly frequency, for all of the countries in our sample.
Overall, the six determinants selected from the black market literature perform well in the in-sample exercises in terms of their ability to explain the variations in the black market exchange rate. The F-statistics for joint significance are all significant at the 1% level, for the linear model in differences as well as in levels. The lowest value of the R^{2} statistic for any country or model was 44%, for the Costa Rica episode for the linear model in levels, with the majority of models displaying R^{2} statistics above 60%. While the models in levels tended to achieve higher R^{2} values, they also tended to achieve higher RMSEs. The RMSEs for the model in differences range from 1%, in the cases of Costa Rica, Ireland, Jamaica, Malaysia, Paraguay, Philippines, Taiwan, Uruguay and Colombia, to 13% for the case of Uganda, with the majority of RMSEs being in the mid to low single digits. The high joint explanatory power of the six variables included in our linear models in-sample, and in-sample RMSEs that are lower, for a large fraction of countries, than the RMSEs achieved in the out-of-sample exercises of Meese and Rogoff (1983) for the random walk or structural models at the monthly frequency, raise the possibility that our structural black market model might stand a fighting chance against the random walk in the out-of-sample exercises we present in the following section.
Before proceeding to that analysis, and given the possibility that many of the level regressions above may be spurious due to first-order integration of some variables, we performed an augmented Dickey–Fuller regression for each variable, in levels and differences, to test for a unit root. The results of those tests are presented in Table 1, and can be summarized as follows. With the exception of the variable DREP, both the black market rate and the other five explanatory variables in levels present strong evidence of first-order integration. After taking first differences, however, the vast majority of (differenced) variables across episodes appear to be integrated of order zero. Thus, while an error correction model with a cointegrating equation might be a more efficient way of estimating the model in levels for most episodes, we can conclude that the t-statistics, F-statistics and R^{2} values obtained for the model in differences are valid for the vast majority of countries. For our purposes, this finding is sufficient to justify limiting the competitors to the random walk in out-of-sample fit and forecasting exercises to the linear model in levels and differences, plus the BTGP model, as the latter is designed in a much more general way to detect and handle nonstationarity in the relationship between the dependent and explanatory variables.
Table 1. Average augmented Dickey–Fuller test statistics and p-values across episodes for variables in levels and differences
Levels
Variable  e  ē  DREP  log(M2/official rate)  roer  r  p
Avg. ADF test statistic  −1.528617  −1.128112  −7.836299  −1.684567  −1.766066  −1.858649  −1.393658
Avg. p-value  0.5511364  0.6572991  0.0007091  0.505599  0.5103677  0.4721053  0.5671836
Differences
Variable  Δe  Δē  ΔDREP  Δlog(M2/official rate)  Δroer  Δr  Δp
Avg. ADF test statistic  −8.11247  −6.480458  −12.67555  −12.14543  −6.820205  −7.613862  −5.954635
Avg. p-value  0.0008661  0.0404112  6.50e−07  0.0020095  0.0067459  0.0029218  0.0111197
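The pattern in Table 1, a unit root in levels that disappears after differencing, can be illustrated with a simulated series. The sketch below is a deliberately simplified Dickey–Fuller regression with a constant and no lagged differences, not the augmented version used for Table 1, and it runs on an artificial random walk rather than on our data. The function name, the seed, the sample length and the comparison against −2.86 (the approximate asymptotic 5% critical value for the constant-only case) are all illustrative choices.

```python
import math
import random


def dickey_fuller_t(series):
    """t-statistic on rho in the regression  dy_t = a + rho * y_{t-1} + error.
    Simplified (non-augmented) version: no lagged differences are included."""
    lagged = series[:-1]
    dy = [b - a for a, b in zip(series[:-1], series[1:])]
    n = len(lagged)
    mean_x = sum(lagged) / n
    mean_dy = sum(dy) / n
    sxx = sum((x - mean_x) ** 2 for x in lagged)
    sxy = sum((x - mean_x) * (d - mean_dy) for x, d in zip(lagged, dy))
    rho = sxy / sxx
    intercept = mean_dy - rho * mean_x
    resid_ss = sum((d - intercept - rho * x) ** 2 for x, d in zip(lagged, dy))
    sigma2 = resid_ss / (n - 2)  # residual variance with two estimated parameters
    return rho / math.sqrt(sigma2 / sxx)


# Simulate a pure random walk (integrated of order 1) and its first difference.
rng = random.Random(0)
level = [0.0]
for _ in range(300):
    level.append(level[-1] + rng.gauss(0.0, 1.0))
diff = [b - a for a, b in zip(level[:-1], level[1:])]

CRITICAL_5PCT = -2.86  # approximate asymptotic 5% critical value, constant, no trend

t_level = dickey_fuller_t(level)
t_diff = dickey_fuller_t(diff)
print(f"levels: t = {t_level:.2f}; differences: t = {t_diff:.2f}")
print("unit root rejected in differences:", t_diff < CRITICAL_5PCT)
```

In applied work a library routine with lag selection and proper p-values would be preferable to this hand-rolled regression.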
The summary statistics of the point estimates of the linear model parameters, for the model in levels and in differences, are reported in Table 2. Several facts are worth mentioning. First, the null hypothesis β^{ē} = 1, according to a two-sided t-test of the average value of this parameter given its variation across episodes for the model in levels, cannot be rejected at the 10% level. This is consistent with the theoretical value of unity for this parameter implied by the structural models of the black market literature. Second, the variable DREP is significant at the 10% level or better in 32 out of 34 episodes for the model in levels, and in all 34 of the episodes for the model in differences, with its average estimated value across countries remaining essentially the same, at around 0.45 (for every 1% increase in the realized deviation from uncovered interest rate parity, the black market rate increases by nearly half a percent), whether the model is estimated in levels or differences. This variable is the single most robust determinant of the black market exchange rate, consistent with the notion that speculative activity plays a key role in such markets. Finally, it should be noted that the incidence of episodes in which the other explanatory variables are significant falls when the linear model is run in differences, consistent with the previous finding that many of the variables in levels are integrated of order 1.
Table 2. Summary statistics of point estimates for model coefficients across episodes
Linear model in levels
Statistic  α  β^{ē}  β^{DREP}  β^{log(M2/official rate)}  β^{roer}  β^{r}  β^{p}
Mean  0.7447638  1.005669  0.4636094  0.0626091  −0.1067344  −0.0548011  0.0728697
SD  7.739526  1.273033  0.1585501  0.4293493  0.6990775  0.1691361  1.569532
Min.  −32.13485  −1.212451  0.0210541  −0.6696389  −1.56892  −0.7680858  −2.758923
Max.  14.0946  7.09862  0.8783977  1.915627  1.937548  0.1662522  7.231605
Incidence of significance at 10% level  22  30  32  19  21  22  22
Linear model in differences
Statistic  α  β^{Δē}  β^{ΔDREP}  β^{Δlog(M2/official rate)}  β^{Δroer}  β^{Δr}  β^{Δp}
Mean  0.0064986  0.5899542  0.4509227  −0.0118268  −0.0545028  −0.0260724  0.023007
SD  0.0156786  0.7522906  0.0916729  0.2237978  0.4269505  0.0829999  1.3197
Min.  −0.0231931  −0.699896  0.0428525  −0.7456525  −1.318188  −0.4223309  −5.85733
Max.  0.0580682  3.786434  0.5256265  0.6747291  0.8715442  0.0668854  4.710825
Incidence of significance at 10% level  6  14  34  6  7  4  2
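The test of β^{ē} = 1 for the model in levels can be reproduced approximately from the Table 2 summary statistics alone. The sketch below assumes a one-sample t-test on the cross-episode mean, treating the 34 episode estimates as independent draws with the reported standard deviation. That setup, the variable names and the approximate critical value of 1.69 (two-sided 10% level, 33 degrees of freedom) are our reading of the test described in the text, not a restatement of the exact procedure.

```python
import math

# Summary statistics for the coefficient on the official rate, model in levels (Table 2).
MEAN_BETA = 1.005669
SD_BETA = 1.273033
N_EPISODES = 34
NULL_VALUE = 1.0

# Standard error of the cross-episode mean and the resulting t-statistic.
standard_error = SD_BETA / math.sqrt(N_EPISODES)
t_stat = (MEAN_BETA - NULL_VALUE) / standard_error

# Approximate two-sided 10% critical value for 33 degrees of freedom.
CRITICAL_10PCT = 1.69

print(f"standard error = {standard_error:.4f}, t = {t_stat:.4f}")
print("reject H0 at the 10% level:", abs(t_stat) > CRITICAL_10PCT)
```

The statistic is close to zero because the mean estimate differs from unity by less than 0.006 while the cross-episode dispersion is large, so the null of a unit coefficient is far from being rejected.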
RESULTS ON ‘EX POST’ AND ‘EX ANTE’ OUT-OF-SAMPLE FORECASTING EXERCISES
A couple of points are worth mentioning as a preface to reporting the results. In a few exceptional cases, the BTGP needed additional months in the training data phase to converge in comparison to the linear models. In the case of Jamaica 17 months were needed instead of 12, and in the case of Egypt substantially more months were needed for the algorithm to converge. We opted to leave Jamaica in the sample, with 43 versus 48 out-of-sample data points, and to remove Egypt entirely from the sample averages for model 5. Also, in a few exceptional cases, bankruptcy was arrived at by at least one of the models during the out-of-sample tests. These cases included, for the ‘ex post’ forecasting exercise, Morocco and Myanmar for the random walk and random walk with drift (models 1 and 2, respectively), and Morocco for the BTGP (model 5). For the ‘ex ante’ forecasting exercise, cases of bankruptcy included Morocco and Myanmar for both the random walk and random walk with drift, Myanmar for the linear model in differences (model 4) and Morocco for the BTGP. In cases of bankruptcy, AE, RMSE and MAE statistics may still be computed in the same manner, as can the Anatolyev–Gerko statistic, the directional prediction accuracy percentages and the Theil U-statistic versus the benchmark. Thus, in the exceptional cases of bankruptcy, we opted to include these countries in the calculations of the aforementioned statistics, and omit them from the calculations of the summary statistics of the monthly APRs by episode, as the monthly APR in cases of bankruptcy is not well defined.
Table 3 contains summary statistics for the ME, RMSE and MAE calculations, both in the case of the ‘ex post’ out-of-sample fit exercise (panel A) and the ‘ex ante’ out-of-sample forecasting exercise (panel B). Table 4 displays the summary statistics, across our sample of 34 episodes, for realized cumulative monthly returns, Theil U-statistics (specifically, instances of significance at the 5% level for either the benchmark or alternative model), the proportion of correct directional forecasts and the Anatolyev–Gerko results (instances of significance at the 5% level along with the sign of the EP statistic), with out-of-sample fit and forecasting exercises, respectively, reported in panels A and B of the table.
Table 3. Sample error statistics for out-of-sample fit and forecasting exercises, models (1)–(5)
Model:  (1)  (2)  (3)  (4)  (5)
Number of countries:  34  34  34  34  33
Panel A. Out-of-sample fit (ex post values of independent variables known)
Mean error
Mean  −0.0163  −0.0011  −0.0078  0.0039  −0.0059
SD  0.0341  0.0081  0.0403  0.0323  0.0166
Min.  −0.1646  −0.0419  −0.2202  −0.0544  −0.0440
Max.  0.0060  0.0141  0.0205  0.1343  0.0387
RMSE
Mean  0.0819  0.0821  0.1258  0.1360  0.2230
SD  0.0752  0.0736  0.1206  0.1490  0.1776
Min.  0.0173  0.0173  0.0170  0.0152  0.0265
Max.  0.2697  0.2806  0.6111  0.7079  0.7289
MAE
Mean  0.0545  0.0554  0.0757  0.0667  0.1473
SD  0.0528  0.0507  0.0716  0.0642  0.1094
Min.  0.0100  0.0105  0.0118  0.0102  0.0168
Max.  0.2012  0.2003  0.3327  0.2883  0.4473
Panel B. Out-of-sample forecasting (ex post values of independent variables unknown)
Mean error
Mean  −0.0163  −0.0011  −0.0067  −0.0022  −0.0065
SD  0.0341  0.0081  0.0231  0.0398  0.0215
Min.  −0.1646  −0.0419  −0.0949  −0.1560  −0.0553
Max.  0.0060  0.0141  0.0362  0.1236  0.0507
RMSE
Mean  0.0819  0.0821  0.1872  0.2330  0.2676
SD  0.0752  0.0736  0.1631  0.2507  0.2458
Min.  0.0173  0.0173  0.0332  0.0258  0.0232
Max.  0.2697  0.2806  0.6193  1.1064  1.0827
MAE
Mean  0.0545  0.0554  0.1111  0.1143  0.1893
SD  0.0528  0.0507  0.1019  0.1110  0.1581
Min.  0.0100  0.0105  0.0201  0.0207  0.0138
Max.  0.2012  0.2003  0.4050  0.4537  0.6402
Table 4. Sample performance statistics for out-of-sample fit and forecasting exercises, models (1)–(5)

Model:                                         (1)     (2)     (3)     (4)     (5)
Number of countries:                            34      34      34      34      33

Panel A. ‘Ex post’ out-of-sample fit (ex post values of independent variables known)
Cumulative monthly return (APR)
  Mean                                      −0.055   0.048   0.311   0.357   0.176
  SD                                         0.327   0.258   0.437   0.446   0.209
  Min.                                      −1.636  −0.235  −0.006   0.098   0.002
  Max.                                       0.297   1.316   2.149   2.076   0.985
Theil U-statistic results (baseline of random walk)
  RW outperforms                               n/a       0      24      14      29
  Insignificant difference                     n/a      33       7      14       4
  Alt. model outperforms                       n/a       1       3       6       0
Theil U-statistic results (baseline of random walk with drift)
  RWWD outperforms                               1     n/a      25      15      30
  Insignificant difference                      33     n/a       6      13       3
  Alt. model outperforms                         0     n/a       3       6       0
Proportion of correct direction forecast
  Mean                                       0.576   0.566   0.667   0.701   0.608
  SD                                         0.150   0.134   0.100   0.073   0.088
  Min.                                       0.167   0.342   0.472   0.563   0.477
  Max.                                       0.889   0.889   0.894   0.851   0.875
Anatolyev–Gerko test
  EP > 0, significant at 5% level               13       2      16      12      11
  EP insignificantly different from zero         1       5       7       9      12
  EP < 0, significant at 5% level               20      27      11      13      10

Panel B. ‘Ex ante’ out-of-sample forecasting (ex post values of independent variables unknown)
Cumulative monthly return (APR)
  Mean                                      −0.055   0.048   0.003  −0.020   0.170
  SD                                         0.327   0.258   0.193   0.178   0.249
  Min.                                      −1.636  −0.235  −0.213  −0.869  −0.045
  Max.                                       0.297   1.316   0.956   0.241   1.028
Theil U-statistic results (baseline of random walk)
  RW outperforms                               n/a       0      33      33      29
  Insignificant difference                     n/a      33       1       1       3
  Alt. model outperforms                       n/a       1       0       0       1
Theil U-statistic results (baseline of random walk with drift)
  RWWD outperforms                               1     n/a      33      33      29
  Insignificant difference                      33     n/a       1       1       3
  Alt. model outperforms                         0     n/a       0       0       1
Proportion of correct direction forecast
  Mean                                       0.576   0.566   0.541   0.534   0.588
  SD                                         0.150   0.134   0.082   0.078   0.085
  Min.                                       0.167   0.342   0.333   0.333   0.417
  Max.                                       0.889   0.889   0.750   0.722   0.813
Anatolyev–Gerko test
  EP > 0, significant at 5% level               13       2       2       2      13
  EP insignificantly different from zero         1       5      15      17      11
  EP < 0, significant at 5% level               20      27      17      15       9
Let us turn first to the results displayed in Table 3. For both the ‘ex post’ (panel A) and ‘ex ante’ forecasting (panel B) exercises, all models appear to achieve average errors close to zero, with the structural models even slightly outperforming the random walk without drift on that measure (the random walk with drift attains the smallest average error in both panels). The RMSEs, however, are strictly increasing by model number for both exercises, with the BTGP displaying the highest RMSEs in both panels A and B, and attaining RMSEs that are roughly three times those achieved by the random walk. The MAEs, in both panels A and B, follow essentially the same pattern. Not surprisingly, the RMSEs and MAEs for nearly all models are slightly lower in panel A than in panel B, owing to the extra uncertainty introduced in the ‘ex ante’ exercise where the next-period values of the fundamentals are unknown. It is instructive to compare the results displayed in panel A of Table 3 with the RMSEs reported in Table 1 of Meese and Rogoff (1983) for major currency cross rates and structural exchange rate models of the 1970s. Compared to the monthly RMSEs reported by those authors, there are two main differences of note. First, the average RMSE obtained for the random walk is just over 8%, whereas the comparable figure for the yen, mark and pound was just over 3% on average in the Meese and Rogoff (1983) study. This is not surprising, and reflects the higher levels of volatility observed in black markets for currency. Second, the structural models studied by Meese and Rogoff (1983) exhibit monthly RMSEs that are only slightly higher than those of the random walk in that setting, with the RMSE obtained from the VAR model approximately double that of the linear structural models, at around 6–7% on average. In our case, linear structural black market models in levels and differences exhibit RMSEs that are roughly 55–65% higher than the benchmark RMSEs obtained by the random walk (about 12.6–13.6% versus 8.2%). The RMSE of the BTGP is even higher, at around 22%.
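The relative magnitudes quoted above can be checked directly from the mean RMSE rows of Table 3. The short sketch below does only that arithmetic; the function and variable names are illustrative, and the ratios are ratios of cross-episode averages, not the episode-by-episode Theil U-statistics tabulated in Table 4.

```python
# Mean RMSE across episodes for models (1)-(5), copied from Table 3.
MEAN_RMSE = {
    "A": [0.0819, 0.0821, 0.1258, 0.1360, 0.2230],  # ex post fit
    "B": [0.0819, 0.0821, 0.1872, 0.2330, 0.2676],  # ex ante forecasting
}
MODELS = [
    "random walk",
    "random walk with drift",
    "linear, levels",
    "linear, differences",
    "BTGP",
]


def rmse_ratios(mean_rmses):
    """Divide each model's mean RMSE by that of the random walk (model 1)."""
    benchmark = mean_rmses[0]
    return [value / benchmark for value in mean_rmses]


for panel, values in MEAN_RMSE.items():
    for name, ratio in zip(MODELS, rmse_ratios(values)):
        print(f"Panel {panel}: {name:<24s} {ratio:.2f} x random walk")
```

On these figures the BTGP's mean RMSE is a little under three times the random walk's in panel A and a little over three times in panel B.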
These differences, which may be explained by the use of a wider range of fundamentals in our exercise, and higher levels of fundamental volatility, nevertheless are instructive, as they suggest that the underperformance (with respect to RMSEs) of fundamental models in the black market context is even starker than in the setting examined by Meese and Rogoff (1983) and many subsequent authors.
An examination of Table 4, nonetheless, suggests that assessing relative model performance according to the RMSE criterion may be very misleading when it comes to the potential to generate speculative profits. In particular, although the tabulation of Theil U-statistics of the structural models versus the benchmarks confirms clear outperformance of the random walk (with and without drift) on the RMSE criterion, a look at the proportion of correct directional guesses tells a quite different story. In the ‘ex post’ out-of-sample fit exercise (panel A), the linear structural model in differences achieves an average directional accuracy across episodes of 70.1%, versus only 57.6% for the random walk. The BTGP does slightly better than the random walk, with a directional accuracy of 60.8%. When we turn to the realized monthly APRs generated by each model, the dominance of structural models is clear and quite impressive: conditional on perfect foresight of the next-period values of fundamentals, the average realized APRs across episodes generated by our simple trading strategy follow the order of the directional accuracy measures almost exactly (only the two random walk benchmarks trade places), and range from 35.7% for the linear model in differences to −5.5% for the random walk. The BTGP is in the middle of the pack, with an average monthly return of 17.6% APR across episodes. The average monthly APRs for the three structural models are all significantly different from zero in the cross-section of episodes at the 1% level, as each average structural model return is greater than four times the cross-sectional standard error for the APR. Neither the random walk nor the random walk with drift, however, produces a mean trading strategy return that is significantly different from zero across episodes.
Turning to panel B of Table 4, we see that in the ‘ex ante’ out-of-sample forecasting exercise, as in the ‘ex post’ exercise in panel A, the Theil U-statistic confirms statistically significant outperformance of the random walk when it comes to generating lower RMSEs versus the structural models. Also, the average directional accuracy statistics reveal that the two linear models achieve slightly lower correct guessing percentages than the random walk and random walk with drift: 54.1% and 53.4% for the linear model in levels and changes, respectively, versus 57.6% for the random walk and 56.6% for the random walk with drift. The BTGP achieves the highest directional accuracy of all, at 58.8%. A large difference emerges, in addition, between the average monthly APRs of the structural strategies, with both linear strategies earning returns that are statistically indistinguishable from zero, while the BTGP manages to earn an average monthly APR across episodes, before transactions costs, of 17%. The latter figure is the only return that is statistically different from zero in the cross-section and, at 3.9 standard errors from zero, is significant at the 1% level. Surprisingly, the average realized trading strategy return for the BTGP in panel B, in which the next-period values of fundamentals were unknown, is only slightly lower than the average value across episodes achieved in panel A, when next-period values were given. It is clear that, even with substantial transactions costs, for example of the order of 5% APR per month, the realized excess trading strategy returns for the BTGP displayed in panel B would remain statistically and economically very significant.
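The significance statements for the average APRs can be approximately reproduced from the means and standard deviations in Table 4. The sketch below divides each mean by its cross-sectional standard error, SD/√n. It rests on two assumptions that are not stated in the text: that n equals the ‘Number of countries’ row of the table (episodes ending in bankruptcy are omitted from the APR summaries, so the effective n may be slightly smaller), and that a transactions cost of 5% APR can be modelled as a flat deduction of 0.05 from the mean. The names used are illustrative.

```python
import math

# (mean, SD, n) of the monthly APR across episodes, copied from Table 4.
# Assumption: n is taken from the "Number of countries" row of the table.
APR = {
    "A": {1: (-0.055, 0.327, 34), 2: (0.048, 0.258, 34), 3: (0.311, 0.437, 34),
          4: (0.357, 0.446, 34), 5: (0.176, 0.209, 33)},
    "B": {1: (-0.055, 0.327, 34), 2: (0.048, 0.258, 34), 3: (0.003, 0.193, 34),
          4: (-0.020, 0.178, 34), 5: (0.170, 0.249, 33)},
}


def standard_errors_from_zero(mean, sd, n, cost=0.0):
    """Mean APR, net of a flat cost, in units of its cross-sectional standard error."""
    return (mean - cost) / (sd / math.sqrt(n))


for panel, models in APR.items():
    for model, stats in models.items():
        print(f"Panel {panel}, model ({model}): {standard_errors_from_zero(*stats):6.2f}")

# BTGP in panel B after a hypothetical flat cost of 5% APR.
print(f"BTGP, panel B, net of cost: {standard_errors_from_zero(*APR['B'][5], cost=0.05):.2f}")
```

Under these assumptions the three structural models in panel A each lie more than four standard errors from zero, the BTGP in panel B lies about 3.9 standard errors from zero, and the two random walk benchmarks lie within roughly one standard error of zero.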
Finally, recall that the Anatolyev–Gerko EP statistic measures the market timing ability of the strategy being tested against that of a benchmark strategy with the same directional guessing accuracy, but which forecasts the direction of the dependent variable according to the flip of a coin biased according to those directional probabilities. While the EP (excess profitability) statistic is not a direct measure of profitability per se, since the benchmark ‘naïve’ strategy may or may not be profitable (and transactions costs have not been taken into account), it is a good measure of the ability of the strategy being tested to correctly identify the precise periods in which the dependent variable will rise or fall. In the out-of-sample fit exercise in panel A of Table 4, the linear model in levels obtains an EP statistic that is positive and significant at the 5% level for a total of 16 countries, the most of any model, followed by the random walk with 13, the linear model in differences with 12, the BTGP with 11, and the random walk with drift with two. The three structural models can nonetheless clearly be said to have outperformed the random walk models, as they generate far fewer instances of negative EP statistics that are significant at the 5% level (11, 13 and 10, respectively), versus 20 such cases for the random walk and 27 for the random walk with drift.
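For readers who want to see the mechanics of the EP statistic, the following is a minimal sketch. It assumes the standard form of the test in Anatolyev and Gerko's original paper: the realized mean return of a sign-following strategy is compared with the return expected from a benchmark that goes long and short in the same proportions but at random, and the difference is scaled to be asymptotically standard normal under the null of no predictability. The exact implementation used for Table 4 is not described in this section, so the function below, including its name and its handling of degenerate cases, should be read as illustrative only.

```python
import math


def ep_statistic(returns, forecasts):
    """Excess profitability (EP) statistic, assuming the standard
    Anatolyev-Gerko form (an assumption, not this paper's own code).

    returns:   realized changes in the dependent variable
    forecasts: predicted changes; only their signs are used
    """
    t = len(returns)
    signs = [1.0 if f >= 0 else -1.0 for f in forecasts]
    mean_return = sum(returns) / t
    mean_sign = sum(signs) / t

    # Mean realized return from trading in the forecast direction.
    strategy = sum(s * r for s, r in zip(signs, returns)) / t
    # Expected return of the biased coin-flip benchmark.
    benchmark = mean_sign * mean_return

    p_up = 0.5 * (1.0 + mean_sign)  # share of "up" forecasts
    variance = (4.0 / t**2) * p_up * (1.0 - p_up) * sum(
        (r - mean_return) ** 2 for r in returns
    )
    if variance <= 0.0:
        # All forecasts share one sign (or returns are constant): undefined.
        return math.nan
    return (strategy - benchmark) / math.sqrt(variance)


# Hypothetical monthly changes, used only to exercise the function.
example_returns = [0.03, -0.02, 0.05, -0.04, 0.01, -0.01]
print(ep_statistic(example_returns, example_returns))                 # perfect timing
print(ep_statistic(example_returns, [-r for r in example_returns]))  # inverted timing
```

A strategy whose forecasts always have the right sign yields a positive statistic and one that is always wrong yields the mirror-image negative value, which is the sense in which the counts of significantly positive and negative EP statistics in Table 4 summarize market timing.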
In panel B of Table 4, both linear models perform substantially worse on the EP score compared to their performance in the out-of-sample fit exercise, with both the linear model in levels and in differences obtaining only two instances of positive EP statistics significant at the 5% level, and 17 and 15 instances of negative EP statistics significant at the 5% level, respectively. The BTGP, however, slightly improves its performance in the ‘ex ante’ out-of-sample forecasting exercise, obtaining 13 instances of positive EP statistics significant at the 5% level, and only nine instances of negative EP statistics significant at the 5% level. On the whole, the results of the EP test suggest that, in out-of-sample fit exercises where future values of fundamentals are given, linear structural models exhibit somewhat better market timing ability compared to the random walk benchmarks and results largely comparable to those of the BTGP. In the simulated real-time trading exercises most relevant to speculators, however, the BTGP model exhibits a market timing ability superior to that of all other models, although again in this context the most naïve model, the random walk, clearly dominates the linear structural models.
There are perhaps three major lessons to take away from the results just presented. First, RMSE results alone may be very misleading as a guide to the economic value of model forecasts, that is, to their ability to generate speculative profits. Second, in out-of-sample fit exercises, linear structural models outperform both the BTGP model and the benchmarks in terms of directional accuracy and realized profitability. Third, when next-period values of fundamentals are unknown, as is the case for real-world speculators, the BTGP model outperforms all other models, and linear structural models are unable to produce profits that are statistically different from zero or from the benchmarks. It can be inferred that the additional uncertainty involved in forecasting next-period fundamentals is what separates, to a significant extent, the BTGP model from the linear models, as the more general structure of the BTGP allows it to capture the evolution of the joint distribution of the fundamentals simultaneously with the evolution of the black market exchange rate.
A Note on Real-World Profitability: Transactions Costs in Black Markets
The average APR of 17% across black market episodes found for the simple monthly trading strategy based on the predictions from the BTGP is highly economically significant, and it bears repeating that this figure does not take into account the bid–ask spread present in the markets studied, which is likely to be substantially greater than bid–ask spreads typically found in large, liquid foreign exchange markets.11 Although very little data exist on the bid–ask spread in black markets, one exception is an early paper by Dornbusch and Pechman (1985), who examine the bid–ask spread in the black market for dollars in Brazil. In terms of monthly averages of daily data extracted from the Journal do Comercio, they report that the bid–ask spread in that market during the period from March 1979 to December 1983 ranged from a low of 1.9% to a high of 8.4%, with a mean of 3.6%. Based on these figures, it is plain to see that our (reverse) carry trade strategy would not be profitable on average for a typical market participant. As an example, the approximately 1.5% average monthly profit we find prior to transactions costs would turn into an approximately 2.2% loss net of transactions costs for a typical carry trade involving borrowing at a 0.5% monthly interest rate in dollars, earning a 2% interest rate in local currency, and assuming a realized monthly depreciation rate of zero, with the bid set at 96.4% of the ask rate quoted by the local dealer, consistent with the average monthly spread reported in the Dornbusch and Pechman (1985) study.12 However, it should also be noted that sophisticated local dealers capable of employing our strategy, in the face of lower transactions costs given their natural position as a market maker, might indeed be able to reap economically significant profits net of the transactions costs they face in such contexts. 
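The arithmetic behind the carry trade example can be laid out explicitly. The sketch below follows one dollar through the round trip described above: it is sold to the dealer at the bid, invested at the local interest rate, and converted back at the ask before the dollar loan is repaid. The function name and the `depreciation` parameter are illustrative additions; the example in the text sets depreciation to zero.

```python
def carry_trade_net_return(borrow_rate, local_rate, bid_over_ask=1.0, depreciation=0.0):
    """Monthly profit per dollar borrowed on a round-trip carry trade.

    borrow_rate:  monthly interest rate paid on the dollar loan
    local_rate:   monthly interest rate earned in local currency
    bid_over_ask: dealer's bid as a fraction of the ask (1.0 means no spread)
    depreciation: monthly rise in the local-currency price of a dollar
                  (hypothetical parameter; zero in the example in the text)
    """
    dollars_returned = (1.0 + local_rate) * bid_over_ask / (1.0 + depreciation)
    return dollars_returned - (1.0 + borrow_rate)


# Figures from the example: 0.5% dollar rate, 2% local rate, bid at 96.4% of ask.
gross = carry_trade_net_return(0.005, 0.02)
net = carry_trade_net_return(0.005, 0.02, bid_over_ask=0.964)
print(f"before transactions costs: {gross:+.4f}")
print(f"net of the bid-ask spread: {net:+.4f}")
```

With these inputs the 1.5% gross monthly profit becomes a loss of roughly 2.2% once the 3.6% average spread is paid, matching the figures quoted in the text.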
Also, it is not clear to what extent we can generalize the findings on the bid–ask spread reported in the Dornbusch and Pechman (1985) study to other black market episodes, especially the more recent ones, in our sample. What is certain is that the level and drivers of transactions costs in black markets for currency appear to be a sorely understudied subject worthy of further attention.