An n -dimensional Rosenbrock distribution for Markov chain Monte Carlo testing

The Rosenbrock function is a ubiquitous benchmark problem in numerical optimization, and variants have been proposed to test the performance of Markov chain Monte Carlo algorithms on distributions with a curved and narrow shape. In this work we discuss the Rosen-brock distribution and the advantages and limitations of its current n -dimensional extensions. We then pro-pose a new extension to arbitrary dimensions called the Hybrid Rosenbrock distribution, which addresses all the limitations that affect the current extensions. The Hybrid Rosenbrock distribution is composed of conditional normal kernels arranged in such a way that preserves the key features of the original Rosenbrock kernel. Moreover, due to its structure, the Hybrid Rosen-brock distribution is analytically tractable, and possesses several desirable properties which make it an excellent test model for computational algorithms. We conclude with numerical experiments that show how commonly used Markov chain Monte Carlo algorithms may fail to explore densities with curved correlation structure, restating the importance of a reliable benchmark problem for this class of densities.


INTRODUCTION
The Rosenbrock function (Rosenbrock, 1960) is a popular test problem in the optimization literature due to its challenging features: Its minimum is located at the bottom of a long and narrow parabolic valley.The original function can be turned into a probability density that maintains these features, and has been adopted by the Markov chain Monte Carlo (MCMC) community to serve as a benchmark problem when testing MCMC algorithms (e.g., Goodman & Weare, 2010).One of the current frontiers of research in this field is developing algorithms (e.g., Girolami et al., 2011;Parno, 2014) that can sample efficiently from densities that have 2-d marginals with nonconstant or curved correlation structure (see, e.g., Figure 1).Such shapes make it difficult for MCMC algorithms to take large steps, increasing the autocorrelation time and decreasing the quality of the MCMC sample.
Researchers developing new methods for distributions with nonlinear correlation structure often test their algorithms on only a handful of benchmark models, among which the (two-dimensional) Rosenbrock kernel is quite popular (Hogg & Foreman-Mackey, 2018).However, few properties of the Rosenbrock kernel have been investigated and formalized, especially regarding multivariate extensions of the density for the purpose of MCMC sampling.As we will show in Section 2.1, sometimes the properties of this distribution are so poorly understood that extending the kernel from two to three dimensions radically changes its shape.
In this work we present the Hybrid Rosenbrock distribution, a benchmark model that i) has curved 2-d marginal distributions; ii) is easily extendable to more than two dimensions; iii) has known normalization constant; iv) explains clearly the effects that changes in the parameters have on its shape; v) allows for direct and efficient Monte Carlo sampling.To our knowledge, there is no benchmark model for distributions with curved correlation structure that possesses all of the above properties.
The Hybrid Rosenbrock distribution can be used by researchers developing new MCMC methods to test how algorithms perform on distributions with curved 2-d marginals.Moreover, the Hybrid Rosenbrock distribution can also be used by savvy MCMC practitioners to perform algorithm selection.The shape and features of the Hybrid Rosenbrock can be tweaked to match those of the model of interest, which would provide the practitioners with a tailor-made toy problem to test their algorithm of choice and assess how well it performs when compared with the true solution.
Furthermore, the Hybrid Rosenbrock distribution can be used to test the accuracy of algorithms that estimate the normalizing constant of a kernel (Gelfand & Smith, 1990;Satagopan et al., 2000).Prominent approaches include Chib (1995), DiCiccio et al. (1997), and Moral et al. (2006), among various other contributions.Due to the number of approaches suggested,  having a challenging benchmark problem for which the normalizing constant is known would prove a valuable assessment tool.The structure of the article is as follows.In Section 2 we review the current literature on 2-d Rosenbrock distributions and the available n-dimensional extensions.In Section 3 we present our n-dimensional extension, and discuss how it improves on the shortcomings of current solutions.In Section 4 we discuss how changes in the structure and shape of the Hybrid Rosenbrock density affect the difficulty of obtain an MCMC sample from it, and compare the performance of some popular MCMC algorithms.In Section 5 we report our conclusions.
Our code is available in the form of a simple tutorial at https://github.com/FilippoPagani/hybridRosenbrock, and the R package "Rosenbrock" is available on CRAN.

CURRENT LITERATURE
The simplest nontrivial case of the Rosenbrock distribution is the 2-d case, where the kernel can be written as We follow (Goodman & Weare, 2010) when rescaling Equation ( 1) by 1/20, so that the distribution takes the shape of a curved narrow ridge-shown in Figure 1 on the left side-which is normally quite challenging for MCMC algorithms to explore.
It is not clear from the literature how the shape of the kernel in Equation ( 1) is affected by changes in the coefficients.Moreover, the normalizing constant is generally unknown, and there is more than one way to extend the distribution beyond two dimensions.Two methods have been proposed in the literature, and we will review them to point out their advantages and limitations.

Full Rosenbrock distribution
We will refer to the n-dimensional extension in Goodman and Weare (2010) as "Full Rosenbrock" kernel in the following paragraphs.The kernel has the following structure: The normalizing constant is unknown, and the effect of the coefficients on the shape of the distribution is unclear.
Figure 1 shows a comparison between the variables x 1 and x 2 from the 2-d kernel, and the same variables from a 3-d Full Rosenbrock kernel.A more exhaustive plot of a 3-d Full Rosenbrock kernel is shown in Figure A1 in Appendix A. The stark difference between the plots in Figure 1 shows how extending the Rosenbrock kernel from two to three dimensions as described in Equation (2) alters the joint plot between the variables x 1 and x 2 .The long narrow ridge has become much more concentrated around the mode, decreasing the difficulty of the problem from an MCMC perspective.
However, the Full Rosenbrock kernel does have some desirable features: as n increases, the variance of x n increases steeply.Densities with such properties (e.g., Neal's Normal in Neal, 2010) are challenging to MCMC algorithms, especially if the dependence between components is nonlinear.

Even Rosenbrock distribution
In the optimization literature, Dixon and Mills (1994) proposes a simpler version of the Full Rosenbrock function used in Section 2.1, which can be turned into a kernel as where n must be an even number, and we maintain the 1/20 mentioned in the previous section.The normalizing constant is unknown, and the effect of the coefficients on the shape of the distribution is unclear.This density is in essence the product of n/2 independent blocks, each containing a 2-d Rosenbrock kernel.Unlike the Full Rosenbrock kernel, the Even Rosenbrock does maintain the shape of the joint 2-d marginals as n increases.However, only a small fraction of the joint distributions are curved narrow ridges, while the majority of the 2-d marginals are uncorrelated (see Figure A2 in Appendix A for more details).

THE HYBRID ROSENBROCK DISTRIBUTION
As outlined in Section 1, the overall goal of this article is presenting an n-dimensional benchmark model that has the required marginal structure, known normalization constant, parameters whose changes have clear effects on its shape, and it admits simple and robust Monte Carlo sampling.These properties are vital for a suitable benchmark distribution.The Hybrid Rosenbrock density fulfills all of the outlined properties, and draws on both models described in Section 2 to provide a reliable target where every single 2-d marginal distribution has a complex dependency structure.Its kernel can be written as: where , x j,i ∈ R; a, b j,i ∈ R + (∀j, i), and where the final dimension of the distribution is given by the formula n = (n 1 − 1)n 2 + 1.The dependency structure between the components x 1 , … , x n 2 ,n 1 of the Hybrid Rosenbrock distribution can be represented with a graphical model as shown in Figure 2.
Each "row" of the diagram in Figure 2 represents a "block of variables."The index i in Equation (4) identifies variables within a "block", with n 1 denoting the block size, while the index j identifies a single block among the n 2 blocks present.The indices on the coefficients b j,i follow the block structure, as do the indices on the random variables x j,i .The variable x j,1 = x 1 , ∀ j = 1, … , n 2 is represented with only one index, as it is common to all blocks.
Figure 3 shows the contour plots obtained from a Monte Carlo sample of the kernel in Equation (4) when taking n 2 = 2, n 1 = 3, and  = 1, a = 1/20, and b j,i = 100/20 (∀j, i), that is, The Hybrid kernel inherits from the Full Rosenbrock kernel the feature of having variables with very different variances, as can be observed in the scales of the plots in Figure 3.Moreover, as opposed to the Even Rosenbrock (contours shown in Figure A2 in Appendix A), no plot in Figure 3 presents trivial correlation structure: every 2-d marginal is a straight or curved ridge with very long tails.At the same time, the Hybrid kernel inherits from the Even kernel the block F I G U R E 2 Graphical model representing the dependency structure of the Hybrid Rosenbrock distribution.The circles represent the kernels of each variable, while the edges represent the direct dependence between the kernels x 1,3 . . .structure, which guarantees that as n grows, the variance of each variable is computationally stable to compute.Remark 2 in Section 4 will address this point.
The true strength of the Hybrid Rosenbrock distribution lies not only in the way we connect the terms in Equation ( 4), but also in recognizing that each term in Equation ( 4) is in fact a conditional normal kernel.For example, the term exp and variance (2b j,i ) −1 , while the first term in Equation ( 4) simply represents an unconditional normal kernel  (x 1 |, (2a) −1 ).This new perspective allows us to give the normalization constant for the Hybrid Rosenbrock distribution.
Proposition 1.The normalization constant of the Hybrid Rosenbrock kernel given in Equation ( 4) Proof.We use the conditional structure of the density to split the integrals of the normal kernels and solve them one at a time.The details are shown in Appendix C. ▪ Interpreting the Hybrid Rosenbrock density as the composition of normal kernels also provides us with a simple interpretation for the coefficients.As 2a and 2b j,i represent the precisions of the conditional normal kernels, increasing a increases the slope of the distribution along the ridge formed around the parabola x 2 j,i−1 , while increasing b j,i decreases the dispersion around the parabola.The parameter  determines the position of the mode of the variable x 1 along the parabola.
Remark 1.In this work we will only investigate the case where the kernels are normal and connected through the mean function, and where the mean function is the parabola x 2 .Our choice is based on the historical importance of the Rosenbrock function, and on the fact that "banana"-like distributions are common in many fields of knowledge.Indeed, any polynomial can be used as mean function, as well as other functions such as exp(x), sin(x), 1/x.In fact, any function f (x) ∶ R → E ⊆ R that does not alter the behavior of the integrals in the proof of Proposition 1 is a viable candidate as mean function.Furthermore, as long as the same conditions are satisfied, kernels other than normal can be used.This should provide more than enough variety for the practitioner to adapt the Hybrid Rosenbrock to their own specific problem.Depending on the choice of the mean function, the block structure can be adopted to control the variance of those components that grow too quickly, as mentioned in Remark 2.
Using the conditional normal structure of the model, it is possible to obtain an i.i.d.Monte Carlo sample from the joint distribution by using Algorithm 1. Notably, Algorithm 1 is not a Gibbs sampler as the first kernel in x 1 is always independent of any other kernel.
Algorithm 1. Pseudocode to sample from a Hybrid Rosenbrock distribution

NUMERICAL TESTS
In this section we investigate how varying the parameters n 1 , n 2 , , a, b j,i (∀ j, i) influences the performance of MCMC algorithms sampling from the Hybrid Rosenbrock distribution.

Model comparison
In this section we compare the integrated autocorrelation time  calculated using MCMC samples from models with different sets of parameters.The integrated autocorrelation time roughly measures how many steps on average an MCMC algorithm needs in order to return a sample that is completely uncorrelated with the original position x.Studying how  varies for each model provides insights into how easy it is to sample from that model via MCMC (see, e.g., Goodman & Weare, 2010 and references therein).
To obtain the MCMC samples we rely on a simplified manifold MALA (sMMALA) algorithm (Girolami et al., 2011) with SoftAbs metric (Betancourt, 2013), tuned with  = 10 6 and acceptance ratio set at roughly 50%.sMMALA is an algorithm particularly well suited for this class of densities, as it uses the local correlation structure of the target to make more ambitious moves (see Appendix D for an accurate description).As all our models are multidimensional, each MCMC sample yields a vector of n autocorrelation times  i , one for each component of the state space, where  i is defined as where y i is the MCMC sample from the ith component of the state space, and L is an integer number representing the last lag where the sample autocorrelation is significantly different from zero.We then record only the highest autocorrelation time among all components: Naturally, the smaller the value of , the better the algorithm mixes.Assuming the autocorrelation in an MCMC sample is always nonnegative, an algorithm that generates i.i.d.samples achieves the smallest possible value of , that is,  = 1.The integrated autocorrelation time is closely related to the effective sample size (ESS), as ESS = N∕, where N is the number of Monte Carlo samples available.The integrated autocorrelated times  i are calculated here with initial sequence estimators (see, e.g., Christen & Fox, 2010;Geyer, 1992), and the results are reasonably consistent between different runs, as the uncertainty bars confirm in Figure 5.Our results are also consistent with estimating  i by fitting an autoregressive (AR) process to the time series (Thompson, 2010).Unlike Roberts and Rosenthal (2001) we deliberately did not divide  i by n, as it is of interest to us to capture the effect that an increase in n has on the difficulty of sampling from the target.
In this section we will test six separate distribution structures or models, indexed by the parameters n 1 , n 2 , and for each of them, we will vary the parameters , a, b j,i (∀ j, i) to change the model's shape.For simplicity, we will fix b j,i = b, ∀j, i.The structure of the six models is represented in Figure 4.

F I G U R E 4 Graphical models of the six different Hybrid Rosenbrock structures tested in this section
Model 1 corresponds to the 2-d Rosenbrock density, that is, Equation (4) with n 2 = 1 and n 1 = 2, and is included to represent the baseline against which every other model is compared.
Model 2 (n 2 = 2, n 1 = 2) and Model 3 (n 2 = 1, n 1 = 3) are both 3-d distribution.Model 2 captures the effect of extending the 2-d density by adding an extra block, while Model 3 captures the effect of increasing the number of variables in the same block.We expect Model 3 to be more challenging than Model 2, as the difference between the variance of the variables in Model 3 should be higher than in Model 2.
Model 4 (n 2 = 4, n 1 = 2) and Model 5 (n 2 = 1, n 1 = 5) are simply larger versions of Models 2 and 3, and they capture the effects that an increase in dimension of the state space has on the sampling algorithm.
Overall, Models 2 to 5 are extensions of the 2-d case obtained by only increasing either the number of blocks or the number of variables in the single block available.
Instead, Model 6 (n 2 = 2, n 1 = 3) is a fully Hybrid Rosenbrock distribution, with multiple blocks and multiple variables in each block.Its 2-d marginals can be seen in Figure 3. Remark 2. Model 5 was included for comparison, but it is a viable option only for certain parameter values and only in low dimension.The reason is that as n increases, the variance of x 1,n grows too quickly.Using the parameters  = 1, a = 1/20, b = 100/20, already with n = 10 some of the values of the sample covariance are so large that computers treat them as infinite.Algorithms that rely on the sample covariance matrix or the Hessian to adaptively explore the target would not work properly in that case.This behavior onsets for even lower values of n if  has a value far from zero, and a is small (with respect to the standard parametrization).Hence we recommend using the block structure, to be able to increase n at pleasure while avoiding uncontrolled behavior, which is particularly undesirable in a test problem.As the choices for the shape parameters , a, and b are infinite, we performed our experiments for nine different sets of parameters, which provide a wide selection of the shapes that the Hybrid Rosenbrock distribution can take.For reference, Figure B1 in Appendix B shows how the shape of the first two variables of a Hybrid Rosenbrock vary for each parameter set we chose to utilize, complementing the description we gave in Section 3. From now on, we will refer to the set of parameters  = 1, a = 1∕20, and b = 100/20 (values that originate from Equation 1) as the standard parametrization, which occupies the central plot in Figure 5.
Figure 5 shows how sMMALA tends to perform well (low ) on Models 1, 2, and 4, with low variability, while it tends to perform worse on Models 3, 5, and 6, with higher variability.This behavior is explained by the fact that Models 1, 2, and 4 are all parametrized by the same value of n 1 = 2, while significant differences in the scales of the components start appearing only when n 1 ≥ 3, as is the case for Models 3, 5, and 6.Therefore the difference between the variance of the components of the state space appears to be responsible for the increased difficulty of sampling from Models 3, 5, and 6.However, there does not appear to be a significant difference between Models 3, 6, both with a value of n 1 = 3, and Model 5, with a value of n 1 = 5.This suggests that increasing the value of n 1 beyond three creates difficulties even for a sophisticated algorithm like sMMALA.
Moreover, examining the performance of sMMALA in Figure 5 in conjunction with the distribution shapes (Figure B1) suggests that models with values of , a, and b that yield rounder and more concentrated shapes-that is, large values of a, small values of b and to a lesser extent, values of  near zero-tend to have lower  values.Conversely, values of , a, and b that yield narrower and more elongated shapes-that is, large a, small b and  far from zero-tend to have higher  values.
Notably, the integrated autocorrelation time from Model 5 does not seem to be sensitive to changes in the shape parameters , a, and b.However, the uncertainty is quite large, so more computationally intensive tests may be needed to pinpoint the exact effects that the parameters , a, and b have on the difficulty of sampling from Model 5 with a sMMALA algorithm.It is likely that the tails of Model 5 are so elongated that even a state of the art algorithm like sMMALA has difficulties exploring them sufficiently well.Nonetheless, as explained in Remark 2, such results may not be particularly useful in practice.

Algorithm comparison
In this section we study how the performance of popular MCMC algorithms changes as we vary the parameters of our model.Preliminary experiments suggest that, due to the difficult nature of the target, the integrated autocorrelation time of simple or naively tuned algorithms may be too large to be measured accurately with our computational means.Results based on the ESS and ESS per second (ESS/s) are similarly affected1 .
In this section we compare MCMC algorithms in terms of both the Kolmogorov-Smirnov (KS) distance, and the Anderson-Darling (AD) distance.The KS distance for the ith component of the state space is defined as that is, the supremum of the absolute value of the difference between the empirical cumulative distribution function (CDF) constructed from the MCMC sample, Fi , and the empirical CDF constructed from a large i.i.d.Monte Carlo sample, F i .We then only store the largest KS distance on the marginal distributions, which is simply The KS distance is a good measure of similarity between two CDFs.However, as the supremum of the distance between two CDFs is quite sensitive to local variation, it sometimes focuses on the region around the mode while ignoring differences in the tails.To address this limitation, we also measured the AD distance, which is defined as where N is the number of samples, and only stored the largest distance among all components, The AD distance is a squared distance similar to the Cramer-Von Mises (CM) distance.Like the CM distance, the integrand consists of the square difference between two CDFs, which is then weighted by the term ( The AD distance tends to assign more weight to the tails of the distribution than the CM distance or the KS distance (Stephens, 1974), which is where we expect to find discrepancies in our analysis.
In practice, we constructed F i from 100 million i.i.d.Monte Carlo samples, which is a considerable number for our purposes.Every marginal CDF F i constructed this way takes up 2 Gb of memory on our machine, and our tests suggest that the effect of the sample variation is orders of magnitude smaller than the results we obtain in our experiments.
In this section we restrict our attention to Model 6, and we vary the parameters to change the difficulty of sampling from it via MCMC.In particular, we focus on the parameter b, which controls the dispersion of the distribution around the curved ridge.Roughly, the smaller the value of b, the flatter the distribution becomes, thereby making it easier for the MCMC algorithms to jump from one arm of the distribution to the other.
The MCMC algorithms tested in this section are the Random Walk Metropolis (RWM) (Metropolis et al., 1953), Hamiltonian Monte Carlo (HMC) (Duane et al., 1987;Neal, 2010), simplified Manifold MALA (sMMALA) (Girolami et al., 2011) with SoftAbs metric (Betancourt, 2013), and NUTS (Homan & Gelman, 2014) as implemented in (Carpenter et al., 2017).An important difference that sets these algorithms apart is that RWM, HMC, and NUTS explore the target using a global metric (i.e., both the variance of the transition kernel and the mass matrix do not depend on the current position of the algorithm), while the sMMALA algorithm takes advantage of the local curvature of the target when proposing moves in the state space.This property makes sMMALA particularly well suited to sample from targets like the Hybrid Rosenbrock distribution, and it provides a meaningful term of comparison for the RWM and HMC algorithms in this setting.Even though the details of the sMMALA algorithm are not essential to the exposition of our findings, we included a description of the algorithm in Appendix D.
We tuned the RWM to accept about 23% of the proposed moves (Roberts & Rosenthal, 1997), although in practice this probability oscillated between 22% and 29% depending on how difficult the target was.We took the variance of the transition kernel to be the identity matrix, as the local correlation structure changes significantly depending on the current position of the algorithm.
Tuning HMC on our target is particularly difficult as it is characterized by strong nonlinear correlations, and the optimality results in Beskos et al. (2013) only apply to multivariate normal targets with independent components.We decided to tune HMC naively, ignoring the problems that a curved density may cause to MCMC algorithms, and choosing the parameter values that yielded the best ESS/s.We took the mass matrix to be the identity matrix.We maintained the acceptance ratio between 78% and 85%, and found that 20 leapfrog steps produced an acceptable result for all the values of b considered in this section.We tuned sMMALA to maximize the value of the KS and AD distance, which yielded a different acceptance probability for each value of b, as shown in the below table.We used a value of  = 10 6 for the correction parameter (see Appendix D for the meaning of ).The algorithm NUTS was also tuned to obtain the best result in terms of the KS and the AD distance.Based on purely empirical arguments, we sought an acceptance of 98%.We selected a diagonal mass matrix, as that yields slightly more stable results than the identity matrix in this case.
The number of iterations for RWM, HMC, and sMMALA was selected by measuring the same wall clock time for the three algorithms, and making sure that the number of iterations was large enough to yield a good result, yet not too large for our computational resources.The results are 21.5 million samples for RWM, 1 million samples for HMC, and 1.2 million samples for sMMALA.Tuning NUTS in the same way was problematic, as our RMW, HMC, and sMMALA algorithms are written in R, while NUTS and the package Stan are written in C++, which is significantly faster.In order to balance our analysis, once the number of iterations had been selected for RWM, HMC, and sMMALA, we counted the number of gradient evaluation that our naive HMC performed during the whole run.We then run NUTS and stopped it when the number of leapfrog steps (i.e., gradient evaluations) reached the same value.A consequence of this setup is that the number of iterations that NUTS performed was different for every run.
Figure 6 shows the results of our analysis.We repeated the experiments 16 times in order to assess the uncertainty around our estimates.The dot represents the mean value of the KS and AD distance, while the whiskers represent the two-sided 95% credibility region around our estimate.Note that both axes are expressed in the logarithmic scale.
As mentioned before, low values of the parameter b make the distribution flatter around the ridge, and easier to sample from.High values of b make the distribution narrower, restricting movement and hampering exploration.The results from the sMMALA algorithm support this view, with low values of the KS and AD distance for small values of b, and large values of higher values of the KS and AD distance for large values of b.
Surprisingly, despite tuning RWM as optimally as possible and taking a large number of samples, the performance of the algorithm is quite poor, even for low values of b.This result highlights the risk of trusting simple MCMC methods when the target is higher dimensional and exhibits curved correlation structure, even though the number of samples drawn may be large.
Interestingly, the naively tuned HMC algorithm seems to achieve results that are only marginally better than RWM.The uncertainty bars on the KS distance between RWM and HMC overlap in most of the measurements.The naively tuned HMC seems to achieve better results in terms of AD distance, suggesting that it explores the tails better than RWM.
Perhaps the most interesting detail of Figure 6 is the performance of NUTS.For high values of b, it equals the performance of sMMALA in terms of KS distance, while it surpasses sMMALA for low values of b.In terms of AD distance, NUTS is superior to sMMALA for all values of b considered, as much as by two orders of magnitude.This result suggests that the coherent exploration of straight ridges that characterizes HMC is also effective on curved ridges, when the number of steps is adjusted properly.On the contrary, sMMALA explores the state space by taking one step at a time, which can lead to diffusive behavior and slower exploration.However, the high computational cost of each NUTS iteration leads to a small final sample, which may be undesirable for certain applications.The algorithm sMMALA may be a more effective compromise between precision and final sample size.

CONCLUSIONS
The 2-d Rosenbrock distribution and its current extensions to higher dimensions are common benchmark models in the field of Markov chain Monte Carlo.However, their normalization constant is generally unknown, and their shape seems to change in unexpected ways as the dimension of the state space increases, as shown in Section 2. These features characterize them as poor benchmark models for assessing the quality of MCMC samples.This is particularly true in higher dimensions, when visual inspection involves a significant amount of resources and is not always possible.A poor benchmark model may cause confusion in the interpretation of the results, as it can appear that an algorithm is working appropriately, while it is struggling to explore entire regions of the support of the distribution.
In this article we present the Hybrid Rosenbrock distribution, a reliable benchmark model with curved correlation structure that addresses all the shortcomings of the Rosenbrock distribution and of its higher dimensional extensions.The Hybrid Rosenbrock has a very challenging structure, due to how the conditional normal kernels are linked to each other.
Its shape can be made arbitrarily easier or harder to sample from by varying its parameters, for which we provide clear guidance.In Section 3 we give its normalization constant, and an algorithm to obtain an i.i.d.Monte Carlo sample without having to rely on MCMC samplers.The ability to sample the model directly proved to be very useful in Section 4.2, where we tested and compared the performance of some popular MCMC algorithms when sampling from the Hybrid Rosenbrock distribution.
Our results from Section 4 show how popular MCMC algorithms are not equipped to sample from a target with curved correlation structure, especially in high dimensions, and perform poorly.Despite that, common performance metrics used by practitioners may still suggest that the algorithms are performing well.A sophisticated algorithm such as sMMALA or NUTS can help detect the problem, but models used in the applied sciences are often too complex to obtain all the quantities sMMALA needs (e.g., the gradient and Hessian matrix of the target).This highlights the importance of testing MCMC algorithms on a reliable benchmark problem such as the Hybrid Rosenbrock, where results can be easily compared against the ground truth.

APPENDIX A. CURRENT LITERATURE
In three dimensions the Full Rosenbrock kernel in Equation ( 2) can be written as Figure A1 shows contour plots of a 2 million sample obtained running a sMMALA algorithm (described in Appendix D) on Equation (A1), with starting point x = [1, … , 1] ⊤ ,  = 10 6 , and an acceptance ratio of about 50%.
Note the difference in curvature between the (x 1 , x 2 ) plot and the (x 1 , x 3 ) plot.Also, the three variables involved have very different variances, as can be seen from the scales of the plots.
Figure A2 shows the shape of the 2-d marginal distributions of (3) when taking n = 4 and  1 =  3 = 1, which result in the kernel The contours were plotted using a sample from a sMMALA algorithm tuned exactly as described in the previous section, with  = 10 6 , x = 1, and acceptance ratio roughly 50%.Equation (3) represents a more straightforward problem than Equation (A1).The round shapes and lack of ridges in the lower left plots of Figure A2 (specifically for the pairs (x 1 , x 3 ), (x 1 , x 4 ), (x 2 , x 3 ), and (x 2 , x 4 )) confirm the lack of complex dependencies that characterize the Full Rosenbrock kernel.

APPENDIX B. INTERPRETATION OF THE PARAMETERS
We can rewrite the 2-d Rosenbrock kernel from Equation (1) in general form as: where a = 1/20, b = 100/20, and  = 1, and more generally a, b ∈ R + ,  ∈ R, x 1 , x 2 ∈ R. Equation (B1) should make it clear that the density is composed of two normal kernels, that is, ) .
This allows us to calculate the normalization constant as follows.Consequently, the integral of (B2) with respect to x 3 , does not yield Equation (1).In order to obtain a more compact expression for the kernel in the variable x 2 , we expand the second and third terms of Equation (B2) to a sum of monomials, and complete the squares by adding and subtracting the necessary terms: which can be substituted back in (B2).The first term in (B3) represents the new normal kernel for x 2 , that is, This kernel is not influenced by the x 2 variable present in the last term of (B2) as it disappears after integrating in the variable x 3 , as we showed in the proof of Proposition 2. The other terms in (B3) are remaining terms from the calculations that we cannot simply include in the normalizing constant, as they depend on x 1 .This changes the kernel of the variable x 1 , whose distribution is now unknown and cannot be sampled from directly.The variable x 2 |x 1 also changes shape drastically.Looking at Equation (B4), the value of the variance of x 2 |x 1 changes from 1/2b to 1/(2b + 2c), producing the effect observed in Figure 1.
These considerations extend to higher dimensions (n > 3), where every time the dimension of the model is increased to n + 1 according to the scheme in (2), the kernels of the variables x 1 , … , x n change as described above.On the other hand, the quantiles from RWM and HMC appear to be very far from the true solution, even despite the use of a considerably larger sample than in Section 4.2.Both algorithms appear to have troubles exploring the tails in both arms of the distribution, especially in the right arm, where the tail reaches farther.Despite the large number of samples, RWM never abandons the local region around the mode of the distribution.This is particularly important when trying to estimate tail probabilities in order to produce constraints on parameters, for example, in cosmology.HMC appears to be marginally better than RWM, but it should be noted that the tuning parameters used in this example make each HMC iteration between 15 and 20 times more expensive than a RWM iteration in terms of run time.
The variables x 1,2 and x 2,2 , shown in the middle plots, have tails that stretch moderately far from the mode.Again, the algorithm sMMALA agrees quite well with the sample from Algorithm 1.However, the samples from RWM and HMC reveal how these algorithms struggle to abandon the mode.
The last two plots on the right side of Figure E1 show the QQ-plots for variables x 2,2 and x 2,3 , which have tails that reach very far from the mode.Once again, the results from sMMALA and Algorithm 1 are in good agreement, while RWM and HMC never explore the tails.
In order to control for Monte Carlo error in the quantiles constructed using Algorithm 1, we repeated the experiment in this section with 20 million samples instead of 10.The results in Figure E1 did not change significantly, leading us to believe that the Monte Carlo error that Algorithm 1 introduces in our analysis is negligible.
These experiments show how despite taking very large MCMC samples and using well established metrics, having a reliable benchmark model is crucial when testing MCMC algorithms on difficult targets with curved correlation structure.
Contour plot of a (n 1 , n 2 ) = (3, 2) Hybrid Rosenbrock density as described in Equation (4), with parameters a = 1/20, b j,i = 100/20 (∀j, i),  = 1, obtained via direct sampling.Every joint distribution is either a straight or curved ridge [Color figure can be viewed at wileyonlinelibrary.com] Integrated autocorrelation times  obtained varying the parameters , a, b for Models 1 to 6, as described in Figure4.The horizontal axis shows the single parameter value of either , a or b that takes a different value from the standard parametrization (i.e.,  = 1, a = 1∕20, and b = 100/20).The dot represents the average value, while the whiskers represent the two-sided 95% credibility region [Color figure can be viewed at wileyonlinelibrary.com] Performance comparison of the algorithms sMMALA, HMC, RWM, and NUTS in terms of the Kolmogorov-Smirnov distance, and of the Anderson-Darling distance, measured on Model 6 for different values of the parameter b.The dot represents the average value of 16 runs, while the whiskers represent the two-sided 95% confidence region around the estimate [Color figure can be viewed at wileyonlinelibrary.com]

F
I G U R E A1 Contour plot of a 3-d Full Rosenbrock density, as described in Equation (A1), obtained from a sMMALA MCMC sample [Color figure can be viewed at wileyonlinelibrary.com]F I G U R E A2 Contour plot of a 4-d Even Rosenbrock density, as described in Equation (3).Most of the joint distributions are uncorrelated [Color figure can be viewed at wileyonlinelibrary.com]

F
I G U R E B1 Contour plot for the variables (x 1 , x 1,2 ) of a Hybrid Rosenbrock density, as the parameters , a, and b take different values.For comparison, the central plot represents a kernel with the original values of the parameters, that is,  = 1, a = 1/20, and b 1,2 = 100/20 [Color figure can be viewed at wileyonlinelibrary.com] QQ-plots for each variable of the Hybrid Rosenbrock distribution (Equation5).The horizontal axis show the quantiles obtained from direct Monte Carlo sampling, while the vertical axis shows the quantiles calculated from 100 million RWM MCMC samples, 100 million HMC, and 1 million sMMALA samples [Color figure can be viewed at wileyonlinelibrary.com] The step sizes used are shown in the below table.