Predicting crypto-currencies using sparse non-Gaussian state space models

In this paper we forecast daily returns of crypto-currencies using a wide variety of different econometric models. To capture salient features commonly observed in financial time series like rapid changes in the conditional variance, non-normality of the measurement errors and sharply increasing trends, we develop a time-varying parameter VAR with t-distributed measurement errors and stochastic volatility. To control for overparameterization, we rely on the Bayesian literature on shrinkage priors that enables us to shrink coefficients associated with irrelevant predictors and/or perform model specification in a flexible manner. Using around one year of daily data we perform a real-time forecasting exercise and investigate whether any of the proposed models is able to outperform the naive random walk benchmark. To assess the economic relevance of the forecasting gains produced by the proposed models we moreover run a simple trading exercise.


Introduction
In the present paper we develop a non-Gaussian state space model to predict the price of three crypto-currencies. Taking a Bayesian stance enables us to introduce shrinkage into the modeling framework, effectively controlling for model and specification uncertainty within the general class of state space models. To control for potential outliers we propose a time-varying parameter VAR model (Cogley and Sargent, 2005; Primiceri, 2005) with heavy-tailed innovations as well as a stochastic volatility specification of the error variances. Since the literature on robust determinants of price movements in crypto-currencies is relatively sparse (for an example, see Cheah and Fry, 2015), we apply Bayesian shrinkage priors to decide whether using information from a set of potential predictors improves predictive accuracy.
The recent price dynamics of various crypto-currencies point towards a set of empirical key features an appropriate modeling strategy should accommodate. First, conditional heteroscedasticity appears to be an important regularity commonly observed (Chu et al., 2017). This implies that volatility changes over time in a persistent manner. If this feature is neglected, predictive densities are either too wide (during tranquil times) or too narrow (in the presence of tail events, i.e. pronounced movements in the price of a given asset). Second, the conditional mean of the process is changing. This implies that, within a standard regression framework, the relationship between an asset price and a set of exogenous covariates is time-varying. In the case of various crypto-currencies this could be due to changes in the degree of adoption by institutional and/or private investors, regulatory changes, issuance of additional crypto-currencies or general technological shifts (Böhme et al., 2015). Thus, it might be necessary to allow for such shifts by means of time-varying regression coefficients. Third, and finally, we observe a rather strong degree of co-movement between various crypto-currencies (see Urquhart, 2017). In our paper, we consider Bitcoin, Ethereum and Litecoin, three popular choices. All three of them tend to be strongly correlated with each other, implying that a successful econometric framework should incorporate this information.
The goal of this paper is to systematically assess how different empirically relevant forecasting models perform when used to predict daily changes in the price of Bitcoin, Ethereum and Litecoin. The models considered include a wide range of univariate and multivariate models that are flexible along several dimensions. We consider vector autoregressions that feature drifting parameters as well as time-varying error variances. To cope with the curse of dimensionality we introduce recent shrinkage priors (see Feldkircher et al., 2017) and a flexible specification for the law of motion of the regression parameters. In addition, we introduce a heavy-tailed measurement error distribution to capture potential outlying observations (see, among others, Carlin et al., 1992; Geweke and Tanizaki, 2001).
We jointly forecast the three crypto-currencies considered by using daily data from November 2016 to October 2017, with the last 160 days being used as a hold-out period. In a forecasting comparison, we find that time-varying parameter VARs with some form of shrinkage perform well, beating univariate benchmarks like the AR(1) model with stochastic volatility (SV) as well as a random walk with SV. Constant parameter VARs tend to be inferior to their counterparts that feature time-varying parameters, but still prove to be relevant competitors. Especially during days which are characterized by large price changes, controlling for heteroscedasticity in combination with a flexible error variance-covariance structure pays off in terms of predictive accuracy. These findings are generally corroborated by considering probability integral transforms, showing that more flexible models lead to better calibrated predictive distributions. Moreover, a trading exercise provides a comparable picture. Models that perform well in terms of predictive likelihoods also tend to do well when used to generate trading signals.
The remainder of this paper is structured as follows. Section 2 provides an overview of the data as well as empirical key features of the three crypto-currencies considered. Moreover, this section details how the additional explanatory variables are constructed. Section 3 introduces the econometric framework adopted, providing a brief discussion of the model as well as the Bayesian prior setup and posterior simulation. Section 4 presents the empirical forecasting exercise while Section 5 focuses on applying the proposed models to perform portfolio allocation tasks. Finally, the last section summarizes and concludes the paper.

Empirical key features
In this section we first identify important empirical key features of crypto-currencies and then propose a set of covariates that aim to explain the low to medium frequency behavior of the underlying price changes.
For the present paper, we focus on the daily change in the log price of Bitcoin, Ethereum and Litecoin. To explain movements in the price of the three crypto-currencies considered, we include information on equity prices (measured through the log returns of the S&P500 index), the relative number of search queries for each respective crypto-currency from Google trends, the number of English Wikipedia page views as well as the difference between the weekly cumulative price trend from common mining hardware and similar, but mining-unsuitable, GPU-related products to capture the effect of supply-side factors.
The data spans the period from 22nd November 2016 to 3rd October 2017, yielding a panel of 316 daily observations. Bitcoin, Ethereum and Litecoin closing prices are taken from a popular crypto-currency meta-platform. They originate from major crypto exchanges and are averaged according to their daily trading volume. Furthermore, alternative financial investments are represented by the daily closing prices of the S&P500 index. Additionally, demand-side predictors like the relative number of world-wide search operations from Google Trends and the number of Wikipedia page views (in English) are used. Because large-scale crypto-currency mining impacts supply and prices for the required equipment at the same time, hardware price trends are utilized to express changes in supply. To capture these effects, we gather GPU prices from Amazon's bestseller lists and extract the price trend of common mining hardware. We construct this predictor by computing the difference between the weekly cumulative price trend from common mining hardware (e.g., AMD Radeon RX 480 graphics cards) and similar, but mining-unsuitable, GPU-related products.
Second, the degree of co-movement between the three currencies increased markedly in 2017, where most major peaks and troughs coincide. This carries over to the squared returns, where we find that especially the sharp increase in volatility in September 2017 was common to all three currencies considered.
These two empirical regularities suggest that the proposed model should be able to capture co-movement between Bitcoin, Litecoin and Ethereum prices as well as changes in the first moment of the sampling density. Moreover, the right panel indicates that large shocks appear to be quite common, calling for a flexible error distribution that allows for heteroscedasticity.
In order to provide further information on the amount of co-movement in our dataset, Fig. 2 shows a heatmap of the lower Cholesky factor of the empirical correlation matrix of the nine time series included. The upper part of the figure reveals that all three assets display a pronounced degree of co-movement. This indicates that each individual time series might carry important information on the behavior of the remaining two time series, pointing towards the necessity to control for this empirical regularity. For the remaining factors we do find non-zero correlations, but these appear to be rather muted. Nevertheless, we conjecture that the set of fundamentals above should be a reasonable starting point to explain movements in the price of crypto-currencies.
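The quantity displayed in such a heatmap can be computed along the following lines. This is a minimal sketch in plain Python on simulated data: the three series, their common-factor construction and the function names are illustrative assumptions, not the data or code used in the paper.

```python
import math
import random

def correlation_matrix(series):
    """Empirical correlation matrix of equally long series (list of lists)."""
    n = len(series[0])
    centered = []
    for s in series:
        mean = sum(s) / n
        centered.append([x - mean for x in s])

    def cov(a, b):
        return sum(x * y for x, y in zip(a, b)) / (n - 1)

    sd = [math.sqrt(cov(c, c)) for c in centered]
    k = len(series)
    return [[cov(centered[i], centered[j]) / (sd[i] * sd[j]) for j in range(k)]
            for i in range(k)]

def cholesky_lower(a):
    """Lower-triangular L with L L' = a, for a symmetric positive definite a."""
    k = len(a)
    low = [[0.0] * k for _ in range(k)]
    for i in range(k):
        for j in range(i + 1):
            partial = sum(low[i][q] * low[j][q] for q in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - partial)
            else:
                low[i][j] = (a[i][j] - partial) / low[j][j]
    return low

# Simulated stand-in for three strongly co-moving return series.
rng = random.Random(1)
common = [rng.gauss(0.0, 1.0) for _ in range(316)]
returns = [[0.8 * f + rng.gauss(0.0, 0.6) for f in common] for _ in range(3)]

corr = correlation_matrix(returns)
chol = cholesky_lower(corr)
```

Large off-diagonal entries in the first column of `chol` indicate that the first series explains a large share of the variation in the others.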

A multivariate state space model
To capture the empirical features of the three crypto-currencies, a flexible econometric model is needed. We assume that the three crypto-currencies as well as the additional covariates are stored in an $m$-dimensional vector $\{y_t\}_{t=1}^{T}$ that follows a VAR($p$) model with time-varying coefficients,
\[ y_t = A_{1t} y_{t-1} + \dots + A_{pt} y_{t-p} + \varepsilon_t, \tag{3.1} \]
with $A_{jt}$ (for $j = 1, \dots, p$) being a set of $m \times m$-dimensional coefficient matrices and $\varepsilon_t$ a vector of reduced form shocks with a time-varying variance-covariance matrix,
\[ \Sigma_t = U_t H_t U_t'. \tag{3.2} \]
Hereby we let $U_t$ be a lower uni-triangular matrix with $\mathrm{diag}(U_t) = \iota_m$ and $\iota_m$ being an $m$-dimensional vector of ones. Moreover, $H_t$ is a diagonal matrix with typical diagonal element $[H_t]_{jj} = e^{h_{jt}}$. The logarithmic volatilities are assumed to follow an AR(1) process,
\[ h_{jt} = \mu_j + \rho_j (h_{j,t-1} - \mu_j) + \zeta_{jt}, \qquad \zeta_{jt} \sim \mathcal{N}(0, \varsigma_j), \tag{3.3} \]
where $\mu_j$ denotes the unconditional mean of the log-volatility process while $\rho_j$ and $\varsigma_j$ are the persistence and variance parameters, respectively. Following Carriero et al. (2015) and Feldkircher et al. (2017) we rewrite Eq. (3.1) as follows,
\[ \tilde{U}_t \left( y_t - A_{1t} y_{t-1} - \dots - A_{pt} y_{t-p} \right) = \eta_t, \tag{3.4} \]
where $\tilde{U}_t = U_t^{-1}$ and $\eta_t$ is a vector of orthogonal shocks with time-varying variance-covariance matrix $H_t$.
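To make the role of the log-volatilities concrete, the following sketch simulates AR(1) log-volatility paths and assembles a time-varying covariance matrix. It assumes the common decomposition of the covariance matrix into a lower uni-triangular factor and a diagonal matrix of exponentiated log-volatilities; the parameter values, the constant triangular factor and the function names are hypothetical and chosen only for illustration.

```python
import math
import random

def simulate_log_volatility(T, mu, rho, var, rng):
    """Simulate h_t = mu + rho * (h_{t-1} - mu) + shock, with shock ~ N(0, var)."""
    h = [rng.gauss(mu, math.sqrt(var / (1.0 - rho ** 2)))]  # stationary start
    for _ in range(T - 1):
        h.append(mu + rho * (h[-1] - mu) + rng.gauss(0.0, math.sqrt(var)))
    return h

def covariance_matrix(U, h_t):
    """Sigma_t = U H_t U' with H_t = diag(exp(h_1t), ..., exp(h_mt))."""
    m = len(h_t)
    return [[sum(U[i][k] * math.exp(h_t[k]) * U[j][k] for k in range(m))
             for j in range(m)] for i in range(m)]

rng = random.Random(42)
T, m = 250, 3
# Hypothetical parameter values, not estimates from the paper.
paths = [simulate_log_volatility(T, mu=-7.0, rho=0.95, var=0.05, rng=rng)
         for _ in range(m)]
# Lower uni-triangular factor, held constant over time in this sketch.
U = [[1.0, 0.0, 0.0],
     [0.5, 1.0, 0.0],
     [0.3, 0.4, 1.0]]
Sigma = [covariance_matrix(U, [paths[j][t] for j in range(m)]) for t in range(T)]
```

In the model described above the triangular factor is itself time-varying; keeping it fixed here only serves to isolate the effect of stochastic volatility.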
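Note that the $i$th equation (for $i > 1$) of this system can be written as,
\[ y_{it} = A_{i\bullet,t}\, x_t - \sum_{s=1}^{i-1} \tilde{u}_{is,t}\, \varepsilon_{st} + \eta_{it}, \tag{3.5} \]
where $\tilde{u}_{is,t}$ denotes the $(i,s)$th element of $\tilde{U}_t$. We let $x_t = (y_{t-1}', \dots, y_{t-p}')'$ be the stacked vector of covariates and $A_t = [A_{1t}, \dots, A_{pt}]$ is the $m \times mp$ matrix of stacked coefficients with $A_{i\bullet,t}$ selecting the $i$th row of the matrix concerned. Eq. (3.5) is a simple regression model with heteroscedastic innovations and the negative of the reduced form shocks of the preceding $i - 1$ equations as additional regressors. In the case of $i = 1$, Eq. (3.5) reduces to a simple univariate regression with $x_t$ as covariates. It proves to be convenient to rewrite Eq. (3.5) as follows,
\[ y_{it} = \beta_{it}' z_{it} + \eta_{it}, \tag{3.6} \]
with $\beta_{it}$ stacking the VAR coefficients $A_{i\bullet,t}$ and the covariance parameters $\tilde{u}_{is,t}$ of equation $i$, and $z_{it} = (x_t', -\varepsilon_{1t}, \dots, -\varepsilon_{i-1,t})'$ collecting the corresponding regressors. One important implication of Eq. (3.6) is that the covariance parameters are effectively estimated in one step alongside the VAR coefficients. We assume that $\beta_{it}$ evolves according to a random walk process,
\[ \beta_{it} = \beta_{i,t-1} + e_{it}, \qquad e_{it} \sim \mathcal{N}(0, \Omega_i). \tag{3.7} \]
The shocks to the states, $e_{it}$, have a diagonal variance-covariance matrix $\Omega_i$ whose $j$th diagonal element is $\vartheta_{ij}$. Rewriting the model in terms of normalized states yields the non-centered parameterization, which implies that the state innovation variances are moved into the observation equation (see Eq. (3.8)) and treated as standard regression coefficients. Thus, if $\vartheta_{ij} = 0$, the coefficient associated with the $j$th element in $z_{it}$ is constant over time.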
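A common way of writing such a non-centered parameterization is sketched below. It assumes that the state innovation covariance matrix $\Omega_i$ is diagonal; the symbols $\tilde{\beta}_{it}$ (normalized states) and $u_{it}$ are introduced here for illustration only, and the exact form of the paper's equations may differ.

```latex
\begin{align*}
y_{it} &= \beta_{i0}' z_{it} + \tilde{\beta}_{it}' \sqrt{\Omega_i}\, z_{it} + \eta_{it}, \\
\tilde{\beta}_{it} &= \tilde{\beta}_{i,t-1} + u_{it}, \qquad
u_{it} \sim \mathcal{N}(0, I_{k_i}), \qquad \tilde{\beta}_{i0} = 0, \\
\beta_{ij,t} &= \beta_{ij,0} + \sqrt{\vartheta_{ij}}\, \tilde{\beta}_{ij,t}
\quad \text{for } j = 1, \dots, k_i .
\end{align*}
```

Under this form, $\sqrt{\vartheta_{ij}}$ multiplies the regressor $\tilde{\beta}_{ij,t} z_{ij,t}$ and can therefore be shrunk towards zero like any other regression coefficient.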
Up to this point we have remained silent on the distributional assumptions on the measurement errors. In what follows we depart from the literature on TVP-VARs and assume that the measurement errors are heavy-tailed and follow a t-distribution. This choice is based on evidence in the literature (Geweke, 1994; Gallant et al., 1997; Jacquier et al., 2004) which calls for heavy-tailed distributions when modeling daily financial market data. As can be seen in Fig. 1, we also observe multiple outlying observations for all three crypto-currencies under consideration.
Since the assumption of non-Gaussian errors would render typical estimation methods like the Kalman filter infeasible, we follow Harrison and Stevens (1976), West (1987) and Gordon and Smith (1990) and use a scale mixture of Gaussians to approximate the t-distribution,
\[ \eta_{it} \mid \phi_{it}, h_{it} \sim \mathcal{N}(0, \phi_{it}\, e^{h_{it}}), \qquad \phi_{it} \sim \mathcal{IG}(v_i/2, v_i/2). \]
Notice that the degree of freedom parameter $v_i$ is equation-specific, implying that the excess kurtosis of the underlying error distribution is allowed to change across equations, a feature that might be important given the different time series involved. The latent process $\phi_{it}$ simply serves to rescale the Gaussian distribution in case of large shocks. Notice that if $\phi_{it} = 1$ for all $i, t$ we obtain the standard time-varying parameter VAR as in Primiceri (2005).
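The scale mixture idea can be checked by simulation. The sketch below assumes the standard representation in which the latent scale follows an inverse Gamma distribution with both parameters equal to half the degrees of freedom; the value of `v` and the function name are hypothetical.

```python
import math
import random

def draw_t_via_mixture(v, rng):
    """Draw from a standard t with v degrees of freedom as a Gaussian scale mixture."""
    # phi ~ inverse Gamma(v/2, v/2), i.e. 1/phi ~ Gamma(shape v/2, rate v/2).
    # random.gammavariate takes (shape, scale), so the scale is 2/v.
    phi = 1.0 / rng.gammavariate(v / 2.0, 2.0 / v)
    return math.sqrt(phi) * rng.gauss(0.0, 1.0)

rng = random.Random(0)
v = 5.0  # illustrative degrees of freedom
draws = [draw_t_via_mixture(v, rng) for _ in range(100_000)]

# The variance of a t-distribution is v / (v - 2) for v > 2.
sample_var = sum(x * x for x in draws) / len(draws)
# Share of draws beyond three units; far larger than under a standard normal.
tail_share = sum(abs(x) > 3.0 for x in draws) / len(draws)
```

Conditional on the latent scale the errors are Gaussian, which is what keeps Kalman-filter-based state sampling applicable.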

Prior specification
The prior setup adopted closely follows Feldkircher et al. (2017). More specifically, we use a Normal-Gamma (NG) shrinkage prior on the elements of $\beta_{i0}$ and $\sqrt{\Omega_i}$. The NG prior comprises a Gaussian prior on the coefficients alongside a set of local and global shrinkage parameters. For the first $mp$ elements of $\beta_{i0}$ and $\mathrm{diag}(\sqrt{\Omega_i})$ it is specified in Eqs. (3.13) and (3.14) for $i = 1, \dots, m$ and $j = 1, \dots, mp$. Here we let $\tau^2_{s,ij}$ (for $s \in \{\beta, \vartheta\}$) denote local shrinkage parameters, $\kappa$ is a hyperparameter specified by the researcher and $\lambda_L$ is a global shrinkage parameter that is lag-specific, i.e. applied to the elements in $\beta_{i0}$ and $\sqrt{\Omega_i}$ associated with the $L$th lag of $y_t$, and constructed as follows,
\[ \lambda_L = \prod_{l=1}^{L} \pi_l, \qquad \pi_l \sim \mathcal{G}(c_0, d_0). \tag{3.16} \]
This implies that if $\pi_l > 1$, the prior introduces more shrinkage with increasing lag orders. The degree of overall shrinkage is controlled through the hyperparameters $c_0$ and $d_0$.
Notice that this specification pools the parameters that control the amount of time-variation as well as the time-invariant regression parameters. This captures the notion that if a variable is not included initially, the probability of having a time-varying coefficient also decreases (by increasing the lag-specific shrinkage parameter $\lambda_L$).
For the covariance parameters indexed by $j = mp + 1, \dots, k_i$ the prior is specified analogously to Eqs. (3.13)-(3.14) but with $\lambda_L$ replaced by a separate global shrinkage parameter that is common to all covariance parameters. This choice implies that all covariance parameters as well as the corresponding process innovation variances are pushed to zero simultaneously. For this global shrinkage parameter we again use a Gamma distributed prior, with $a_0, b_0$ being hyperparameters. This prior specification has the convenient property that the two types of global shrinkage parameters introduce prior dependence, pooling information across different coefficient types (i.e. regression coefficients and process innovation variances) and introducing strong global shrinkage on all coefficients concerned. By contrast, the local scaling parameters $\tau_{s,ij}$ provide flexibility in the presence of strong overall shrinkage. Thus, even if the global scaling parameters are large (i.e. heavy shrinkage is introduced in the model), the local scalings provide sufficient flexibility to drag posterior mass away from zero, allowing for non-zero coefficients. The role of the hyperparameter $\kappa$ is to control the tail behavior of the prior. If $\kappa$ is small (close to zero), the prior places more mass on zero but the tails of the marginal prior obtained after integrating over the local scales become thicker (see Griffin et al., 2010, for a discussion).
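The interplay between the global, lag-specific shrinkage parameter and the prior mass near zero can be illustrated by simulation. The sketch below assumes one common parameterization of the NG prior, in which the local scale is Gamma distributed with shape $\kappa$ and rate $\kappa\lambda/2$ and the coefficient is Gaussian with that scale as its variance; the exact scaling used in the paper may differ, and the values of `pis` are hypothetical.

```python
import math
import random

def draw_ng_coefficient(kappa, lam, rng):
    """One prior draw: tau2 ~ Gamma(kappa, rate kappa*lam/2), b ~ N(0, tau2)."""
    # random.gammavariate takes (shape, scale); the scale is 2 / (kappa * lam).
    tau2 = rng.gammavariate(kappa, 2.0 / (kappa * lam))
    return rng.gauss(0.0, math.sqrt(tau2))

def lag_specific_shrinkage(pis):
    """Cumulative products lambda_L = pi_1 * ... * pi_L."""
    lambdas, running = [], 1.0
    for pi_l in pis:
        running *= pi_l
        lambdas.append(running)
    return lambdas

rng = random.Random(7)
kappa = 0.1
lambdas = lag_specific_shrinkage([4.0, 4.0, 4.0])  # hypothetical pi_l > 1

n = 20_000
share_near_zero = []
for lam in lambdas:
    draws = [draw_ng_coefficient(kappa, lam, rng) for _ in range(n)]
    share_near_zero.append(sum(abs(b) < 0.01 for b in draws) / n)
```

With all `pis` above one, `share_near_zero` increases with the lag order, which is the sense in which higher lags are shrunk more heavily.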
For the parameters of the log-volatility equation in Eq. (3.3) we follow Kastner and Frühwirth-Schnatter (2014) and Kastner (2015a) and use a normally distributed prior on $\mu_j \sim \mathcal{N}(0, 10^2)$, a Beta prior on $\frac{\rho_j + 1}{2} \sim \mathcal{B}(25, 5)$ and a Gamma prior on $\varsigma_j \sim \mathcal{G}(1/2, 1/2)$. In addition, we specify a uniform prior on $v_i \sim \mathcal{U}(2, 20)$, effectively ruling out the limiting case of a Gaussian distribution that arises if $v_i$ becomes excessively large.

Full conditional posterior simulation
Estimation of the model is carried out using Markov chain Monte Carlo (MCMC) techniques. Our MCMC algorithm consists of the following blocks:
1. Conditional on the remaining parameters/states in the model, simulate the full history of $\{\beta_{it}\}_{t=1}^{T}$ using a forward-filtering backward-sampling algorithm (Carter and Kohn, 1994; Frühwirth-Schnatter, 1994) on an equation-by-equation basis.
2. The full history of the log-volatility process as well as the parameters of Eq. (3.3) are obtained by relying on the algorithm proposed in Kastner and Frühwirth-Schnatter (2014) and implemented in the R package stochvol (Kastner, 2015b).
3. The time-invariant components $\beta_{i0}$ as well as $\theta_i = \mathrm{diag}(\Theta_i)$ are simulated from a multivariate Gaussian posterior that takes a standard form (see Feldkircher et al., 2017).
4. The sequence of local scaling parameters $\tau^2_{s,ij}$, for $j \in A_L$, is simulated from a generalized inverse Gaussian (GIG) posterior distribution. The posterior distribution for the scalings associated with the covariance parameters is similar, with $\lambda_L$ replaced by the global shrinkage parameter on the covariance parameters.
5. We obtain draws from the posterior of the lag-specific shrinkage parameter associated with the $l$th lag by combining the likelihood $\prod_{i=1}^{m} \prod_{j \in A_l} p(\tau^2_{\beta,ij}, \tau^2_{\vartheta,ij} \mid \pi_l, \lambda_{l-1})$ with the prior on $\pi_l$. The resulting posterior distribution is a Gamma distribution, with $\bullet$ indicating the conditioning on everything else, $R = 2pm^2$ and $\lambda_0 = 1$. The set $A_l$ selects all coefficients associated with the $l$th lag of $y_t$.
6. The conditional posterior of the global shrinkage parameter on the covariance parameters is obtained in a similar fashion, where $\nu = m(m - 1)$ denotes the number of covariance parameters in addition to the number of process variances for the corresponding parameters.
7. To simulate the degrees of freedom $v_i$, we perform an independent Metropolis-Hastings (MH) step described in Kastner (2015c).
This algorithm is repeated a large number of times with the first $N_{\text{burn}}$ draws being discarded as burn-in. Notice that the equation-by-equation algorithm yields significant computational gains relative to competing estimation algorithms that rely on full-system estimation of the VAR model.
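The first block of the sampler relies on forward filtering and backward sampling. The sketch below shows the idea for a single scalar coefficient that follows a random walk, with known observation variances and a known state innovation variance. It is an illustration under these simplifying assumptions, not the authors' implementation, which operates on the full coefficient vector of each equation; all names and numerical values are hypothetical.

```python
import math
import random

def ffbs(y, z, obs_var, state_var, m0, P0, rng):
    """Draw beta_1..beta_T for y_t = z_t*beta_t + eta_t, beta_t = beta_{t-1} + e_t."""
    T = len(y)
    means, variances = [], []
    m_prev, P_prev = m0, P0
    # Forward filtering (Kalman filter).
    for t in range(T):
        R = P_prev + state_var                # predictive state variance
        f = z[t] * R * z[t] + obs_var[t]      # predictive variance of y_t
        gain = R * z[t] / f
        m_prev = m_prev + gain * (y[t] - z[t] * m_prev)
        P_prev = (1.0 - gain * z[t]) * R
        means.append(m_prev)
        variances.append(P_prev)
    # Backward sampling, starting from the last filtered distribution.
    beta = [0.0] * T
    beta[-1] = rng.gauss(means[-1], math.sqrt(variances[-1]))
    for t in range(T - 2, -1, -1):
        J = variances[t] / (variances[t] + state_var)
        mean = means[t] + J * (beta[t + 1] - means[t])
        var = variances[t] * (1.0 - J)
        beta[t] = rng.gauss(mean, math.sqrt(var))
    return beta

# Simulate a random walk coefficient observed with noise (z_t = 1 throughout).
rng = random.Random(3)
T = 200
z = [1.0] * T
truth, level = [], 0.0
for _ in range(T):
    level += rng.gauss(0.0, 0.1)
    truth.append(level)
y = [b + rng.gauss(0.0, 0.1) for b in truth]

draw = ffbs(y, z, obs_var=[0.01] * T, state_var=0.01, m0=0.0, P0=1.0, rng=rng)
mean_abs_error = sum(abs(d - b) for d, b in zip(draw, truth)) / T
```

Passing time-varying entries in `obs_var` is how stochastic volatility and the latent t-mixture scales would enter such a step, since both only rescale the Gaussian measurement error.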

Model specification and design of the forecasting exercise
In this section, we briefly describe model specification and the design of the forecasting exercise. The prior setup for our benchmark specification (henceforth labeled the t-TVP NG model) closely follows the existing literature on NG shrinkage priors (Griffin et al., 2010; Bitto and Frühwirth-Schnatter, 2016; Feldkircher et al., 2017). More specifically, we set $\kappa = 0.1$, $c_0 = 1.5$ and $d_0 = 1$ to center the prior on $\pi_l$ above unity while $a_0 = b_0 = 0.01$. The choice for $\kappa$ implies that we place a large amount of prior mass on zero while at the same time allowing for relatively thick tails. Our choice for the Gamma prior on the global shrinkage parameter of the covariance parameters introduces heavy shrinkage on the covariance parameters as well as the corresponding process standard deviations. For all models we consider, i.e. the proposed model as well as the competitors introduced in the next subsection, we include a single lag of the endogenous variables. Higher lag orders are generally possible but given the high dimension of the state space and the increased computational complexity we stick to one lag. In addition, experimenting with slightly higher lag orders leads to models that are relatively unstable at several points in time in our estimation sample.
The design of our forecasting exercise is the following. We start with an initial estimation period that spans the period between the end of November 2016 (22nd of November) and the end of April 2017 (26th of April). The remaining 160 days are used as a hold-out period. After obtaining the one-step-ahead predictive density for the 27th of April 2017, we successively expand the estimation sample by a single day until the end of the sample is reached. This yields a sequence of 160 one-day-ahead predictive densities.
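The expanding-window design can be written down as a simple index generator. The sketch below assumes one observation per calendar day starting on the 22nd of November 2016; the function name is hypothetical and model estimation itself is left out.

```python
from datetime import date, timedelta

def expanding_window_splits(n_obs, n_holdout):
    """Yield (train_end, target) index pairs for one-step-ahead forecasts.

    The estimation sample is observations 0..train_end (inclusive) and the
    forecast target is observation train_end + 1.
    """
    first_target = n_obs - n_holdout
    for target in range(first_target, n_obs):
        yield target - 1, target

start = date(2016, 11, 22)
dates = [start + timedelta(days=i) for i in range(316)]
splits = list(expanding_window_splits(n_obs=316, n_holdout=160))

first_train_end, first_target = splits[0]
last_train_end, last_target = splits[-1]
print(dates[first_train_end], dates[first_target])  # 2017-04-26 2017-04-27
print(dates[last_target])                            # 2017-10-03
```

Each pair would trigger one re-estimation of the model on the enlarged sample followed by the evaluation of the predictive density at the target date.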
To assess the predictive fit of our model we use the log-predictive likelihood (LPL), motivated in, e.g., Geweke and Amisano (2010), and the root mean square forecast error (RMSE). Using LPLs enables us to assess not only how well the model fits in terms of point predictions but also how well higher moments of the predictive density are captured. In addition, to assess model calibration we use univariate probability integral transforms (Diebold et al., 1998;Clark, 2011;Amisano and Geweke, 2017).
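Both criteria are straightforward to compute from MCMC output. The sketch below assumes that each posterior draw implies a Gaussian one-step-ahead density, so that the predictive likelihood is the average of Gaussian densities across draws; the toy numbers and function names are hypothetical.

```python
import math

def log_normal_pdf(y, mean, var):
    return -0.5 * (math.log(2.0 * math.pi * var) + (y - mean) ** 2 / var)

def log_predictive_likelihood(y, draws):
    """Log of the average Gaussian density of y over posterior draws (mean, var)."""
    logs = [log_normal_pdf(y, m, v) for m, v in draws]
    top = max(logs)  # log-sum-exp trick for numerical stability
    return top + math.log(sum(math.exp(l - top) for l in logs) / len(logs))

def rmse(actual, predicted):
    """Root mean square forecast error of point predictions."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Toy hold-out sample with two posterior draws (mean, variance) per day.
actual = [0.01, -0.03, 0.02]
draws_by_day = [[(0.0, 0.02 ** 2), (0.005, 0.03 ** 2)] for _ in actual]

point_forecasts = [sum(m for m, _ in d) / len(d) for d in draws_by_day]
lpl = sum(log_predictive_likelihood(y, d) for y, d in zip(actual, draws_by_day))
error = rmse(actual, point_forecasts)
```

Because the log score depends on the whole density, two models with identical point forecasts can still differ markedly in `lpl` if their predictive variances differ.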

Competing models
Our set of competing models ranges from univariate benchmark models that feature SV to a wide set of multivariate benchmark models. The first set of models considered are a random walk (RW-SV) and the AR(1) model (henceforth labeled AR-SV), both estimated with SV. We use non-informative priors on the AR(1) regression coefficient and the same prior setup for the log-volatility equation as discussed in the previous section. These two models serve to illustrate whether a multivariate modeling approach pays off and, in addition, whether allowing for structural changes in the underlying regression parameters improves predictive capabilities.
In addition, we consider a set of nested multivariate benchmark models. To quantify the accuracy gains of time-varying parameter specifications, we estimate three constant parameter VARs with SV. The first VAR uses the prior setup described above but with $\sqrt{\vartheta_{ij}} = 0$ for all $i, j$. The second model is a non-conjugate Minnesota VAR with asymmetric shrinkage across equations. To select the hyperparameters we follow Giannone et al. (2015) and place hyperpriors on all hyperparameters and estimate them using a random walk Metropolis-Hastings step. The last VAR we consider is a model that features a stochastic search variable selection (SSVS) prior specified as in George et al. (2008). This implies that a two-component Gaussian prior is used with the Gaussians differing in terms of their prior variance. One component features a large prior variance (labeled the slab distribution) which introduces relatively little prior information whereas the second component has a prior variance close to zero (the spike component) that strongly forces the posterior of the respective coefficient to zero. We set the hyperparameters (i.e. the prior standard deviations) for the slab distribution by using the OLS standard deviation times a constant (ten in our case) while the prior standard deviation on the spike component is set equal to 0.1 times the OLS standard deviation.
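The way the spike and slab components compete for a coefficient can be sketched as follows. The multipliers of ten and 0.1 on the OLS standard deviation are taken from the text; the prior inclusion probability of 0.5 and the function names are assumptions made for illustration.

```python
import math

def normal_pdf(x, sd):
    """Density of a zero-mean Gaussian with standard deviation sd."""
    return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def inclusion_probability(coef, ols_sd, prior_incl=0.5,
                          slab_mult=10.0, spike_mult=0.1):
    """Conditional posterior probability that coef belongs to the slab component."""
    slab = prior_incl * normal_pdf(coef, slab_mult * ols_sd)
    spike = (1.0 - prior_incl) * normal_pdf(coef, spike_mult * ols_sd)
    return slab / (slab + spike)

# A coefficient at zero is assigned to the spike with high probability,
# while one that is a full OLS standard deviation away is assigned to the slab.
p_at_zero = inclusion_probability(0.0, ols_sd=1.0)
p_far_out = inclusion_probability(1.0, ols_sd=1.0)
```

Within a Gibbs sampler this probability would be used to draw the component indicator of each coefficient in every sweep.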
Moreover, we include two time-varying parameter models with SV and Gaussian measurement errors. The first TVP-VAR considered (labeled TVP) is based on an uninformative prior (obtained by setting the prior variances to unity for both the initial states and the process standard deviations). The next benchmark model (called TVP NG) is our proposed specification with an NG prior but with Gaussian errors (i.e. $\phi_{it} = 1$ for all $i, t$). This choice serves to assess whether additional flexibility on the measurement errors is needed.
Finally, the last model considered is the most flexible specification in terms of the law of motion of the latent states. This model, labeled the threshold TVP-VAR (labeled TTVP) is based on  and captures the notion that parameter movements are only allowed if they are sufficiently large. To achieve this, a threshold specification for the process variances is adopted. This specification depends on a latent indicator that, in turn, is driven by the absolute size of parameter changes. Thus, if the change in a given regression parameter is large (i.e. exceeds a certain threshold we estimate), we use a large variance in Eq. (3.7). By contrast, if the change is small the process variance is set to a small constant that is close to zero. The prior specification adopted here closely follows the benchmark specification outlined in  and we refer to the original paper for additional details.

Out of sample forecasting performance
We start by considering the forecasting performance in terms of log predictive scores (LPS). Table 1 displays the LPS as well as the RMSEs for the competing models. The first column shows the joint LPS for the three crypto-currencies considered while the next three columns display the marginal LPS for a given crypto-currency. The final three columns show the RMSEs.
Considering the joint LPS indicates that across models, the t-TVP NG specification outperforms the remaining models. This points towards the necessity to allow for both a flexible error distribution and time-varying parameters with appropriate shrinkage priors. Especially when compared to the constant parameter VAR models, all three TVP-VAR specifications with some form of shrinkage yield pronounced accuracy gains. Notice also that the AR(1) model with SV proves to be a tough competitor relative to the set of Bayesian VARs. The necessity of introducing shrinkage in the TVP-VAR framework can be seen by comparing the joint forecasting performance of the TVP model with the remaining TVP-VARs considered. Notice that in our medium-scale model, a TVP-VAR with relatively little shrinkage leads to overfitting issues which in turn are detrimental for forecasting performance.
Zooming into the results for the three crypto-currencies, we generally observe that models performing well in terms of the joint LPS also do well on average. One interesting exception is our proposed t-TVP NG specification. While the performance gains for Litecoin and Ethereum appear to be substantial vis-a-vis the competing models, we find that Bitcoin predictions appear to be inferior relative to the TTVP and the TVP NG specifications. If the researcher is interested in predicting the price of Bitcoin, the two best performing models are the TTVP specification and the Bayesian VAR with a Normal-Gamma shrinkage prior. Interestingly, notice that the comparatively weaker joint performance of the BVAR models stems from weaker Litecoin and Ethereum predictions whereas Bitcoin predictions appear to be rather precise.
Considering point forecasting performance generally corroborates the findings for density forecasts. Here we again observe that models which yield precise predictive densities also work well when only point predictions are considered. Notice, however, that the differences in terms of RMSE between multivariate models and the univariate AR(1) model are negligible. This somewhat highlights that forecasting gains in terms of predictive likelihoods stem from higher moments of the predictive density like the predictive variance (in terms of the marginal log scores) or a more appropriate modeling strategy for the predictive variance-covariance structure.
Next, we investigate whether differences in forecasting performance appear to be time-varying. Fig. 3 shows the log predictive Bayes factors relative to the random walk with SV. Comparing the model performances over time points towards a pronounced degree of heterogeneity over time. For Bitcoin, panel (a) shows that the two best performing models are the TTVP and the TVP NG specifications. While the former yields a slightly better performance over time, the latter proves to be the best performing model during the first part of the hold-out period. For the remaining models we find only relatively little time-variation in their predictive performance. Considering the results for Litecoin (see panel (b)) we find pronounced movements in relative forecasting accuracy. More specifically, we find that forecasting performance appears to be homogeneous during the first months of the hold-out period. From May 2017 onward, however, the t-TVP NG specification starts to perform extraordinarily well, improving upon all competitors by large margins.
Finally, panels (c) and (d) show the performance for Ethereum as well as the overall performance over time. Here we generally find results that are comparable with the findings described above. Notice that the overall log predictive likelihood displays a pattern similar to the one of the marginal LPS for the remaining crypto-currencies. However, compared to panel (a) we observe that the t-TVP specification also excels in terms of joint density predictions. The main difference is that the superior performance of the t-TVP NG model in terms of predicting Litecoin prices lifts the log predictive Bayes factor above the ones obtained for all competing models.
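The paths of log predictive Bayes factors discussed above are running sums of daily log score differences. A minimal sketch, with made-up daily log scores and a hypothetical function name:

```python
from itertools import accumulate

def cumulative_log_bayes_factor(model_scores, benchmark_scores):
    """Running sum of daily log score differences (model minus benchmark)."""
    return list(accumulate(m - b for m, b in zip(model_scores, benchmark_scores)))

model = [2.0, 1.5, 2.5, 3.0]       # hypothetical daily log scores of a model
benchmark = [1.0, 2.0, 2.0, 1.0]   # e.g. the random walk with SV
path = cumulative_log_bayes_factor(model, benchmark)
print(path)  # [1.0, 0.5, 1.0, 3.0]
```

An upward move in `path` marks a day on which the model assigned a higher predictive density to the realized return than the benchmark did.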

Model evaluation using probability integral transforms
Following Diebold et al. (1998), Clark (2011) and Amisano and Geweke (2017), if a given model $M_i$ is correctly specified one can show that
\[ z_{jt,i} = \Phi^{-1}\left( F_y(y_{jt} \mid y_{1:t-1}, M_i) \right) \sim \mathcal{N}(0, 1), \]
for $t = t_0, \dots, T$ and $j = 1, \dots, m$, with $t_0$ indicating the first observation of the hold-out period (i.e. the 27th of April 2017). Hereby we let $\Phi^{-1}$ denote the inverse distribution function of the standard normal distribution and $F_y(y_{jt} \mid y_{1:t-1}, M_i)$ denotes the cumulative distribution function associated with the underlying predictive distribution of model $i$. If the model is correctly specified the sequence of normalized forecast errors $\{z_{jt,i}\}_{t=t_0}^{T}$ is independent and identically standard normally distributed. Fig. 4 (a) to (c) shows the normalized forecast errors across models and for all three crypto-currencies considered while Table 2 provides statistical tests that aim to support our visual assessment of Fig. 4. In the case of Bitcoin and Litecoin, we find that the mean appears to be close to zero. This finding is corroborated by the first column in Table 2 which displays the empirical mean obtained by regressing $z_{jt,i}$ on a constant, with p-values in parentheses. Notice that for Ethereum, we find the normalized forecast errors of the majority of models to be centered above zero. The two exceptions are the TVP NG specification and the Minnesota prior VAR. Considering again panel (c) reveals that these deviations from zero are mainly driven by the failure to capture the conditional mean during the beginning of the hold-out period.
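The normalized forecast errors can be computed from posterior draws as sketched below. The sketch assumes that the predictive distribution is approximated by an equally weighted mixture of Gaussians, one per posterior draw; the clipping constant and the function names are assumptions introduced for illustration.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()

def predictive_cdf(y, draws):
    """Mixture-of-Gaussians predictive CDF from posterior draws (mean, sd)."""
    return sum(NormalDist(m, s).cdf(y) for m, s in draws) / len(draws)

def normalized_forecast_error(y, draws, eps=1e-12):
    """z = Phi^{-1}(F(y)), with F clipped away from 0 and 1."""
    u = min(max(predictive_cdf(y, draws), eps), 1.0 - eps)
    return STD_NORMAL.inv_cdf(u)

# If the predictive density is exactly right, z reproduces the standardized outcome.
z_exact = normalized_forecast_error(1.3, [(0.0, 1.0)])
# If the predictive standard deviation is twice too large, z is pulled towards zero.
z_wide = normalized_forecast_error(1.3, [(0.0, 2.0)])
```

The second case mirrors the situation in which the variance of the normalized errors falls below one because the predictive variance is too high.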
Considering the variances reveals that in the case of Bitcoin, the variances of the normalized errors are all well below unity, indicating that the estimated predictive variance is generally too high. Put differently, this is an indication for a situation where too many actual observations fall in the center of the predictive distribution. This finding appears to be strongly supported by the second column of Table 2, which displays the estimated variance of the normalized forecast error obtained by regressing the squared error on a constant. For the t-TVP NG and TTVP specifications we find slightly higher variances. Our interpretation is that allowing for a flexible error specification either by directly using non-Gaussian shocks in conjunction with stochastic volatility or by introducing more flexibility on the law of motion of the latent states slightly helps to push the variances towards one.
For Litecoin and Ethereum, the variances appear to be closer to one for all TVP specifications except for the TTVP model (in the case of Litecoin). It is noteworthy that especially for Litecoin, constant parameter models with SV tend to either underestimate the predictive variance or fail to capture observations in the tail of the empirical distribution.
Finally, considering the persistence of $z_{jt,i}$ reveals that most models tend to produce normalized errors which display muted persistence levels. This is corroborated by the final column of Table 2 which shows the persistence parameter obtained by estimating AR(1) models in $z_{jt,i}$ along with its p-values.
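The three summary statistics discussed in this section can be obtained with a few lines of code. The sketch below computes point estimates only and omits p-values; it assumes that the AR(1) regression includes an intercept, and the function name is hypothetical.

```python
def pit_diagnostics(z):
    """Mean, second moment and AR(1) slope of normalized forecast errors."""
    n = len(z)
    mean = sum(z) / n                           # regression of z on a constant
    second_moment = sum(v * v for v in z) / n   # regression of z^2 on a constant
    x, y = z[:-1], z[1:]                        # lagged and current values
    x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    return mean, second_moment, sxy / sxx

# An alternating sequence has mean zero, unit second moment and AR(1) slope -1.
mean, second_moment, ar1 = pit_diagnostics([1.0, -1.0, 1.0, -1.0])
```

For a well calibrated model the three numbers should be close to zero, one and zero, respectively.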

Economic performance criteria: A simple trading exercise
To assess which model performs well in terms of economic performance criteria, we perform a trading exercise where each model is used to generate a set of optimal weights attached to each of the three crypto-currencies considered. Using the models discussed in the previous sections as well as two additional investment strategies that are based on equal weights and a simple passive investment in Bitcoin allows us to infer whether constructing a trading strategy based on more sophisticated econometric models pays off in terms of generating superior returns. We assume that investors adopt two strategies to find an optimal sequence of weights $w_{it} = (w_{1i,t}, w_{2i,t}, w_{3i,t})'$. The first one is the standard minimum variance portfolio problem that aims to allocate money between the three assets considered such that the portfolio variance is minimized. This implies that for $t = t_0, \dots, T$ the investor solves
$$\min_{w_{it}} \; w_{it}' P_{i,t|t-1} w_{it} \quad \text{subject to} \quad \mathbf{1}' w_{it} = 1, \tag{5.1}$$
where $\mathbf{1}$ is a 3-dimensional vector of ones and $P_{i,t|t-1}$ denotes the variance of model $i$'s one-step-ahead predictive density. The second strategy adds a specific portfolio target return to the optimization problem in Eq. (5.1), i.e.,
$$w_{it}' \mu_{it|t-1} \geq r_t^*. \tag{5.2}$$
Here we let $\mu_{it|t-1}$ denote the one-step-ahead predictive mean of model $i$, and $r_t^*$ is a potentially time-varying target return the investor wants to match. This strategy, called the target mean-variance portfolio, tries to minimize the overall portfolio variance while at the same time maintaining the desired return $r_t^*$ (see Markowitz, 1952). Table 3 shows annualized Sharpe ratios for the minimum variance portfolio strategy as well as for the target mean-variance portfolio strategy for $r_t^* = r^* \in \{0.10/252,\ 0.15/252,\ 0.30/252\}$. Considering the performance of the minimum variance portfolio (see first column in Table 3) shows that performance differences across models appear to be relatively small. This indicates that weights generated by the set of econometric models are similar and, when compared to the other strategies, more stable over time. Inspection of the weights (not shown) also suggests that this strategy yields weights that seldom exceed one in absolute value (i.e., leveraged long/short positions are rare). The single best performing model is the no-shrinkage TVP specification, closely followed by the TVP NG model. Notice that using simple equal weights also yields favorable risk/return ratios.

Table 3: Annualized Sharpe ratios of various competing investment strategies over the holdout sample. Min-Variance refers to the minimum variance portfolio whereas target mean-variance is the target mean-variance portfolio for different target returns. Equal weights refers to using $w_{jt} = 1/3$ for all $j, t$ and only BTC sets the weight associated with Bitcoin equal to one.
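As an illustration of the two allocation problems, the following sketch computes the weights in closed form for a single period. It is not the authors' implementation: the function names are hypothetical, and the predictive covariance `P` and predictive mean `mu` are made-up placeholder values rather than output from any of the models. The sketch relies on a standard convexity argument: if the minimum variance portfolio already meets the target return, it also solves the constrained problem; otherwise the return constraint binds and the problem reduces to one with two equality constraints.

```python
import numpy as np

def min_variance_weights(P):
    """Solve min w'Pw subject to 1'w = 1: w = P^{-1} 1 / (1' P^{-1} 1)."""
    ones = np.ones(P.shape[0])
    x = np.linalg.solve(P, ones)
    return x / (ones @ x)

def target_mean_variance_weights(P, mu, r_star):
    """Solve min w'Pw subject to 1'w = 1 and mu'w >= r_star.

    Assumes mu is not proportional to a vector of ones; otherwise the
    target may be infeasible and the linear system below is singular.
    """
    w_mv = min_variance_weights(P)
    if w_mv @ mu >= r_star:
        return w_mv                          # return constraint is slack
    # Constraint binds: w = P^{-1} A (A' P^{-1} A)^{-1} b with A = [1, mu].
    A = np.column_stack([np.ones(len(mu)), mu])
    P_inv_A = np.linalg.solve(P, A)
    lam = np.linalg.solve(A.T @ P_inv_A, np.array([1.0, r_star]))
    return P_inv_A @ lam

# Placeholder one-step-ahead predictive moments for three assets.
P = np.array([[4.0, 1.0, 0.5],
              [1.0, 9.0, 2.0],
              [0.5, 2.0, 16.0]]) * 1e-4
mu = np.array([0.0002, 0.0006, 0.0012])

w_min = min_variance_weights(P)
print("min variance:", np.round(w_min, 3))
for annual_target in (0.10, 0.15, 0.30):
    r_star = annual_target / 252             # daily target as in the text
    w = target_mean_variance_weights(P, mu, r_star)
    print(f"target {annual_target:.2f}/252:", np.round(w, 3))
```

In a real-time exercise these functions would be called once per period with that period's predictive moments, which yields the sequence of weights over the holdout sample.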
Considering the target mean-variance strategy for different target returns yields more heterogeneous model performances. The two best performing models are the TTVP model and the constant parameter VAR coupled with the SSVS prior. For the TVP VAR and the TVP NG model, we find that performance decreases when compared to the minimum variance portfolio strategy while for the proposed t-TVP NG we observe increasing Sharpe ratios. Comparing different values of $r^*$ yields no discernible differences, with most models that do well for modest target returns also performing well if target returns become more ambitious.
Across strategies it is worth noting that performing a passive investment in Bitcoin only (i.e. setting the corresponding weight equal to one for all t) also works well but one could still improve upon that strategy by considering more flexible portfolio allocation strategies.
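To make the comparison of strategies concrete, the sketch below computes an annualized Sharpe ratio for the equal-weights and Bitcoin-only benchmarks. Several choices here are assumptions not stated in the text: a zero risk-free rate, annualization by $\sqrt{252}$ (chosen to match the 252-day scaling of the target returns), and simulated heavy-tailed returns that stand in for the actual holdout data. The function names are hypothetical.

```python
import numpy as np

def annualized_sharpe(returns, periods_per_year=252):
    """Annualized Sharpe ratio, assuming a zero risk-free rate."""
    r = np.asarray(returns, dtype=float)
    return np.sqrt(periods_per_year) * r.mean() / r.std(ddof=1)

def portfolio_returns(asset_returns, weights):
    """Per-period portfolio returns from a (T, n) return matrix.

    `weights` may be a fixed vector of length n or a (T, n) array of
    time-varying weights, e.g. produced by a model-based strategy.
    """
    R = np.asarray(asset_returns, dtype=float)
    W = np.broadcast_to(np.asarray(weights, dtype=float), R.shape)
    return (W * R).sum(axis=1)

# Simulated placeholder returns for three assets; column 0 plays the role
# of Bitcoin. These numbers are NOT the data used in the paper.
rng = np.random.default_rng(1)
R = 0.03 * rng.standard_t(df=5, size=(250, 3))

equal = portfolio_returns(R, np.full(3, 1.0 / 3.0))
only_btc = portfolio_returns(R, np.array([1.0, 0.0, 0.0]))
print("equal weights:", round(annualized_sharpe(equal), 2))
print("only BTC:", round(annualized_sharpe(only_btc), 2))
```

Passing a (T, n) array of model-implied weights to `portfolio_returns` gives the return series of the corresponding model-based strategy, which can then be compared against the two benchmarks on the same footing.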

Concluding remarks
In this paper we perform a systematic comparison of univariate and multivariate time series models in terms of predicting one-day-ahead returns for three crypto-currencies, namely Bitcoin, Litecoin and Ethereum. To match the pronounced degree of volatility observed in daily returns of crypto-currencies, we propose a medium-scale multivariate state space model that features heavy-tailed measurement errors and stochastic volatility, a feature that turns out to be advantageous for density predictions. More generally, we find that it pays off to allow for time-varying parameters and a flexible error distribution only if suitable shrinkage priors are introduced. These priors introduce shrinkage to select the subset of time-varying coefficients in a flexible manner. To gauge the economic significance of our findings we also perform a trading exercise. The results show that models which perform well in forecasting also tend to work well when used to guide investment decisions.