A regularity structure for rough volatility

A new paradigm has recently emerged in financial modelling: rough (stochastic) volatility. First observed by Gatheral et al. in high-frequency data, and subsequently derived within market microstructure models, it also turned out to capture parsimoniously key stylized facts of the entire implied volatility surface, including extreme skews that were previously thought to be outside the scope of stochastic volatility. On the mathematical side, Markovianity and, partially, the semimartingale property are lost. In this paper we show that Hairer's regularity structures, a major extension of rough path theory that caused a revolution in the field of stochastic partial differential equations, also provide a new and powerful tool to analyze rough volatility models.


INTRODUCTION
We are interested in stochastic volatility (SV) models given in Itô differential form
$$dS_t = S_t\,\sigma_t\,dB_t, \quad (1)$$
a class that includes Dupire's local volatility model and the SABR, Stein-Stein, and Heston models. In all named SV models, one has Markovian dynamics for the variance process $v \equiv \sigma^2$. Constant correlation $\rho := d\langle B, W\rangle_t / dt$ is incorporated by working with a two-dimensional standard Brownian motion $(W, \bar W)$ and setting $B := \rho W + \sqrt{1-\rho^2}\,\bar W$. This paper is concerned with an important class of non-Markovian (fractional) SV models, dubbed rough volatility (RV) models, in which $\sigma$ (equivalently, $v \equiv \sigma^2$) is modeled via a fractional Brownian motion (fBm) $\hat W$ in the regime $H \in (0, 1/2)$. The term "rough" stems from the fact that in such models, SV (variance) sample paths are $(H-\varepsilon)$-Hölder continuous, for any $\varepsilon > 0$, hence "rougher" than Brownian sample paths. Note the stark contrast to the idea of "trending" fractional volatility, which amounts to taking $H > 1/2$. The evidence for the rough regime (recent calibration suggests $H$ as low as $0.05$) is now overwhelming, both under the physical and the pricing measure (see Alòs, León, & Vives, 2007; Bayer, Friz, & Gatheral, 2016; Forde & Zhang, 2017; Fukasawa, 2011, 2017; Gatheral, Jaisson, & Rosenbaum, 2018; Mijatović & Tankov, 2016). It should be noted, however, that these different regimes can easily be mixed, so that rough volatility governs the short-time behavior, while trending volatility affects the long-time behavior; we refer to Comte and Renault (1998), Comte, Coutin, and Renault (2012), Alòs and Yang (2017), and Bennedsen, Lunde, and Pakkanen (2016) for more information on this.
Much attention in the above references on rough volatility models has, in fact, been given to "simple" rough volatility models of the form
$$\sigma_t = f(\hat W_t), \qquad \hat W_t = \int_0^t K(t,s)\,dW_s. \quad (3)$$
(Later on, we will allow for explicit time dependence of $f$ in order to cover the rough Bergomi model.) In other words, volatility is an explicit function of an fBm, with fixed Hurst parameter $H$. More specifically, we work with the Volterra fBm, a.k.a. Riemann-Liouville fBm, with kernel $K(t,s) = \sqrt{2H}\,(t-s)^{H-1/2}$, but other choices, such as the Mandelbrot-Van Ness fBm with suitably modified kernel $K$, are possible. Note that, in contrast to many classical SV models (such as Heston), the SV is explicitly given, and no rough or stochastic differential equation needs to be solved (hence the term "simple"). Rough volatility not only provides remarkable fits to both time series and option price data, but it also has a market microstructure justification: starting from a Hawkes process model, Rosenbaum and coworkers (El Euch, Fukasawa, & Rosenbaum, 2018) find, in a suitable scaling limit, functions $f(\cdot)$, $b(\cdot)$, $\sigma(\cdot)$ such that $v := f(\hat v)$, with "nonsimple rough volatility (RV)" dynamics
$$\hat v_t = \hat v_0 + \int_0^t K(t,s)\,b(\hat v_s)\,ds + \int_0^t K(t,s)\,\sigma(\hat v_s)\,dW_s. \quad (7)$$
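To make the "simple" setting concrete, here is a minimal simulation sketch (not from the paper; parameter values and the exponential choice of $f$ are illustrative, in the spirit of rough Bergomi). It uses a crude left-point discretization of the Volterra integral; more accurate schemes exist, see Section 6.

```python
import math, random

def simulate_rl_fbm(H, T, n, rng):
    """One path of the Riemann-Liouville (Volterra) fBm on a uniform grid,
    via the left-point discretization
      W_hat(t_i) ~ sum_{j < i} sqrt(2H) * (t_i - t_j)^(H - 1/2) * dW_j,
    with K(t, s) = sqrt(2H) * (t - s)^(H - 1/2)."""
    dt = T / n
    dW = [rng.gauss(0.0, math.sqrt(dt)) for _ in range(n)]
    W_hat = [0.0] * (n + 1)
    for i in range(1, n + 1):
        acc = 0.0
        for j in range(i):
            acc += math.sqrt(2 * H) * ((i - j) * dt) ** (H - 0.5) * dW[j]
        W_hat[i] = acc
    return dW, W_hat

# "Simple" rough volatility: spot vol is an *explicit* function of the fBm,
# e.g. f(x) = sigma0 * exp(eta * x) (an illustrative, rough-Bergomi-type choice).
rng = random.Random(42)
H, T, n = 0.1, 1.0, 256
dW, W_hat = simulate_rl_fbm(H, T, n, rng)
sigma = [0.2 * math.exp(1.0 * x) for x in W_hat]
```

No equation is solved here: the volatility path is read off directly from the simulated fBm, which is exactly what "simple" means above.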
Such stochastic Volterra dynamics provide a natural generalization of simple rough volatility. We refer to this class of models as "nonsimple": in contrast to the aforementioned simple model, (7) generally does not admit a closed-form solution.

Markovian stochastic volatility models
For comparison with rough volatility, which will be discussed in more detail below, we first mention a selection of tools and methods well known for Markovian SV models.
• PDE methods are ubiquitous in (low-dimensional) pricing problems.
• Monte Carlo methods are equally widespread; knowledge of strong and weak rates of convergence of time discretizations of stochastic differential equations (typically with rates 1/2 and 1, respectively) is the starting point of modern multilevel methods (multilevel Monte Carlo [MLMC]).
• Quasi-Monte Carlo (QMC) methods are widely used; related in spirit is the Kusuoka-Lyons-Victoir cubature approach, popularized in the form of the Ninomiya-Victoir splitting scheme and nowadays available in standard software packages.
• Freidlin-Wentzell's theory of small-noise large deviations is essentially immediately applicable, as are various "strong" large deviations (a.k.a. exact asymptotics) results, used, for example, to derive the famous SABR formula.
For several reasons, it can be useful to write model dynamics in Stratonovich form: From a PDE perspective, the operators then take a sum-of-squares form that can be exploited in many ways (think Hörmander theory, Malliavin calculus, etc.). From a numerical perspective, we note that the Kusuoka-Lyons-Victoir scheme (Kusuoka, 2001; Lyons & Victoir, 2004) also requires the full dynamics to be rewritten in Stratonovich form. In fact, viewing the Ninomiya-Victoir scheme (Ninomiya & Victoir, 2008) as level-5 cubature, in the sense of Lyons and Victoir (2004), its level-3 variant is nothing but the familiar Wong-Zakai approximation for diffusions. Another financial example that requires a Stratonovich formulation comes from interest rate model validation (Davis & Mataix-Pastor, 2007), based on the Stroock-Varadhan support theorem. We further note that QMC (based on Sobol numbers, say) works particularly well if the noise has a multiscale decomposition, as obtained by interpreting a (piecewise) linear Wong-Zakai approximation as a Haar wavelet expansion of the driving white noise. Indeed, the naturally induced ordering of the random coefficients, in terms of their importance, leads to a lower "effective dimension" of the integration problem; see, for instance, Acworth, Broadie, and Glasserman (1998).
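The classical Wong-Zakai mechanism mentioned above can be illustrated numerically (a small sketch, not from the paper; all parameter values are arbitrary): an ODE driven by a piecewise-linear Brownian path converges to the Stratonovich, not the Itô, solution.

```python
import math, random

rng = random.Random(1)
n = 512            # mesh of the piecewise-linear (Wong-Zakai) driver
sub = 20           # Euler sub-steps of the ODE per linear segment
sigma, T, y0 = 0.5, 1.0, 1.0

# Piecewise-linear approximation B^(n) of a Brownian path on [0, T]
dt = T / n
B = [0.0]
for _ in range(n):
    B.append(B[-1] + rng.gauss(0.0, math.sqrt(dt)))

# Solve the random ODE dy = sigma * y * (dB^(n)/dt) dt by Euler on a finer grid
y = y0
for i in range(n):
    slope = (B[i + 1] - B[i]) / dt
    h = dt / sub
    for _ in range(sub):
        y += sigma * y * slope * h

# Wong-Zakai: the ODE solutions converge to the *Stratonovich* SDE solution
# Y_t = y0 * exp(sigma * B_t), not to the Ito one y0 * exp(sigma*B_t - sigma^2 t / 2).
strat = y0 * math.exp(sigma * B[-1])
```

This is exactly the diffusion picture that fails for rough volatility, as explained in Section 1.2.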

Complications with rough volatility
Due to the loss of Markovianity, PDE methods are not applicable, and nor are (off-the-shelf) Freidlin-Wentzell large deviation estimates (but see Forde & Zhang, 2017). Moreover, the variance process in rough volatility models is not a semimartingale, which complicates the use of several established stochastic analysis tools. In particular, rough volatility admits no Stratonovich formulation. Closely related, one lacks a (Wong-Zakai type of) approximation theory for rough volatility. To see this, focus on the "simple" situation, that is, (1) and (3). Inside the (classical) stochastic exponential $\mathcal E(M)_t = \exp(M_t - \frac12 [M]_t)$ we have the martingale term
$$M_t = \int_0^t f(\hat W_s)\,dB_s = \rho \int_0^t f(\hat W_s)\,dW_s + \sqrt{1-\rho^2}\int_0^t f(\hat W_s)\,d\bar W_s.$$
In essence, the trouble is due to the first, innocent-looking Itô integral. Indeed, any naive attempt to put it in Stratonovich form,
$$\int_0^t f(\hat W_s)\circ dW_s := \int_0^t f(\hat W_s)\,dW_s + \tfrac12\,[f(\hat W), W]_t \quad \text{(Itô-Stratonovich correction)}, \quad (10)$$
or, in the spirit of Wong-Zakai approximations, must fail for $H < 1/2$. The Itô-Stratonovich correction is given by the quadratic covariation, defined (whenever possible) as the limit, in probability, of
$$\sum_{[u,v] \in P} \big(f(\hat W_v) - f(\hat W_u)\big)\,(W_v - W_u) \quad (12)$$
along any sequence $(P_n)$ of partitions with mesh size tending to zero. Disregarding trivial situations, this limit does not exist. For instance, when $f(x) = x$, fractional scaling immediately gives divergence (at rate $H - 1/2$ in the mesh size) of the expression (12). This issue also arises in the context of option pricing; compare Theorem 1.4 and Section 6 below. All these problems remain present, of course, in the more complicated situation of "nonsimple" rough volatility, as discussed in Section 5.
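The divergence of (12) for $f(x) = x$ can be made fully explicit (a sketch under the kernel convention $K(t,s) = \sqrt{2H}(t-s)^{H-1/2}$ stated above): on a uniform partition, only the overlap $[t_i, t_{i+1}]$ contributes to $E[(\hat W_{t_{i+1}} - \hat W_{t_i})(W_{t_{i+1}} - W_{t_i})]$, so the expected covariation sum grows like $n^{1/2 - H}$, i.e., diverges at rate $H - 1/2$ in the mesh.

```python
import math

def expected_covariation(H, T, n):
    """E[ sum_i (What_{t_{i+1}} - What_{t_i}) (W_{t_{i+1}} - W_{t_i}) ] on a
    uniform n-point partition of [0, T].  Each summand equals
      int_{t_i}^{t_{i+1}} K(t_{i+1}, s) ds = sqrt(2H)/(H + 1/2) * (T/n)^(H + 1/2),
    so the total is n * sqrt(2H)/(H + 1/2) * (T/n)^(H + 1/2), which grows
    like n^(1/2 - H) as the mesh T/n goes to zero."""
    d = T / n
    return n * math.sqrt(2 * H) / (H + 0.5) * d ** (H + 0.5)

H, T = 0.1, 1.0
for n in (10, 100, 1000, 10000):
    print(n, expected_covariation(H, T, n))
```

Refining the partition by a factor 4 multiplies the expected sum by $4^{1/2-H}$, so there is no finite limit and hence no Itô-Stratonovich correction.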

Description of main results
Motivated by singular SPDE theory, such as Hairer's work on Kardar-Parisi-Zhang (KPZ) (Hairer, 2013) and the Hairer-Pardoux "renormalized" Wong-Zakai theorem (Hairer & Pardoux, 2015), we provide a (necessarily renormalized) strong approximation theory for rough volatility. Rough path theory, despite its very purpose of dealing with low-regularity paths, is not applicable to the problem at hand. (We shall elaborate on this at the beginning of Section 2.) In essence, what one needs is a more flexible type of rough path theory, which is exactly what Hairer's theory of regularity structures (Hairer, 2014) supplies. As a consequence of fundamental continuity statements in "model" (think: "rough path") metrics, we will discuss short-time large deviations for rough volatility models. Following, for example, P. K. Friz and Hairer (2014, Section 9.3), we also envision support results in "rough" interest rate models in the spirit of Davis and Mataix-Pastor (2007). To state our basic approximation results, write $\dot W^\varepsilon$ for a suitable approximation at scale $\varepsilon$ to the white noise $\dot W$, with the induced approximation to the fBm denoted by $\hat W^\varepsilon$. Throughout, the Hurst parameter $H \in (0, 1/2]$ is fixed and $f$ is a smooth function such that (8) is a (local) martingale, as required by standard financial theory. More precisely, let $W^\varepsilon$ denote the Haar wavelet construction of the Brownian motion $W$, truncated at level $N = -\log_2(\varepsilon)$; see Section 3.4 for details. Then $\dot W^\varepsilon$ is simply defined as the time derivative of the (piecewise linear) process $W^\varepsilon$, and $\hat W^\varepsilon$ is obtained by integrating (in a pathwise fashion) $\dot W^\varepsilon$ against the Volterra kernel $K$, that is, $\hat W^\varepsilon_t = \int_0^t K(t,s)\,\dot W^\varepsilon_s\,ds$.
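A hands-on version of this construction (an illustrative sketch, not the paper's code): truncating the Haar expansion of the white noise at level $N$ produces a piecewise-constant $\dot W^\varepsilon$, $\varepsilon = 2^{-N}$, i.e., $W^\varepsilon$ is the piecewise-linear interpolation of $W$ on the dyadic grid of mesh $2^{-N}$; $\hat W^\varepsilon$ is then computed exactly, segment by segment, against $K(t,s) = \sqrt{2H}(t-s)^{H-1/2}$.

```python
import math, random

def haar_bm_and_fbm(H, N, rng):
    """Truncated-Haar approximation on [0, 1] at level N (scale eps = 2^-N).
    The projected white noise is piecewise constant, so W^eps is the
    piecewise-linear interpolation of W at the dyadic points k * 2^-N.
    What_eps(t) = int_0^t K(t, s) Wdot^eps(s) ds is evaluated in closed form
    on each linear segment (the slope is constant there)."""
    n = 2 ** N
    dt = 1.0 / n
    W = [0.0]
    for _ in range(n):
        W.append(W[-1] + rng.gauss(0.0, math.sqrt(dt)))
    slopes = [(W[i + 1] - W[i]) / dt for i in range(n)]

    def What_eps(t):
        c = math.sqrt(2 * H) / (H + 0.5)
        total = 0.0
        for i in range(n):
            a, b = i * dt, (i + 1) * dt
            if a >= t:
                break
            b = min(b, t)
            total += slopes[i] * c * ((t - a) ** (H + 0.5) - (t - b) ** (H + 0.5))
        return total

    return W, What_eps

rng = random.Random(7)
W, What_eps = haar_bm_and_fbm(H=0.3, N=8, rng=rng)
```

Sampling is in terms of IID standard normals with a canonical coarse-to-fine hierarchy, the point made in Remark 1.3 below.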
Similar results hold for more general ("nonsimple") rough volatility models.
Remark 1.2. When $H = 1/2$, this result is an easy consequence of the well-known Itô-Stratonovich conversion formula. In the case $H < 1/2$, Theorem 1.1 provides the interesting insight that genuine renormalization (in the sense of subtracting diverging quantities) is required if and only if the correlation parameter $\rho$ is nonzero. This is the case in equity (and many other) markets. Also note that naive approximations without renormalization (i.e., without subtracting the diverging correction term) will in general diverge.
Remark 1.3. Mollification of the noise by truncation of the wavelet representation of the driving Brownian motion is natural for numerical purposes. First, it gives a simple sampling technique in terms of independent, identically distributed (IID) standard normals. Second, the construction provides a canonical hierarchy that is beneficial for QMC methods, compare the discussion in Section 1.1.
To formulate implications for option pricing, define the Black-Scholes pricing function
$$C_{BS}(S_0, K, v) := E\big[(S_0 \exp(\sqrt v\,Z - \tfrac12 v) - K)^+\big],$$
where $Z$ denotes a standard normal random variable and $v$ a total variance. We then have the following theorem.
Consider the approximate, renormalized stochastic exponential $\mathscr E^\varepsilon_T$, built from $\rho\int_0^T f(\hat W^\varepsilon_t)\,\dot W^\varepsilon_t\,dt$ as in Theorem 1.1, and also the approximate total variance $\mathscr V^\varepsilon_T := (1-\rho^2)\int_0^T f(\hat W^\varepsilon_t)^2\,dt$. Then the price of a European call option, under the pricing model (1), (3), struck at $K$ with time to maturity $T$, is given by
$$\lim_{\varepsilon \to 0}\, E\big[C_{BS}(S_0\,\mathscr E^\varepsilon_T,\ K,\ \mathscr V^\varepsilon_T)\big].$$
Similar results hold for more general ("nonsimple") rough volatility models.
Let us discuss right away how to reduce the statements of Theorems 1.1 and 1.4 to the actual convergence statements that will occupy us in Section 3 of the main text. First, note that the approximations $W^\varepsilon$, $\hat W^\varepsilon$, and $B^\varepsilon := \rho W^\varepsilon + \sqrt{1-\rho^2}\,\bar W$ converge uniformly to the obvious limits, so that it suffices to understand the convergence of the stochastic integral. Note that $\hat W$ is heavily correlated with $W$ but independent of $\bar W$. The interesting part is then the convergence of (the renormalization of) $\int_0^\cdot f(\hat W^\varepsilon_s)\,\dot W^\varepsilon_s\,ds$, as stated and proved in Theorem 3.25. For the other part, no correction terms arise, due to independence, and it can be seen with standard methods that $\int_0^\cdot f(\hat W^\varepsilon_s)\,d\bar W_s \to \int_0^\cdot f(\hat W_s)\,d\bar W_s$, in the sense of convergence in probability, uniformly on compacts in $t$. The convergence result of Theorem 1.1 then follows readily. As for pricing, in Theorem 1.4, we consider the call payoff $(S_T - K)^+$. An elementary conditioning argument w.r.t. the $\sigma$-field generated by $W$ (first used by Romano-Touzi in the context of Markovian SV models) then shows that the call price is given as the expectation of a Black-Scholes price with suitably adjusted spot and total variance. Specializing to the case $\sigma = f(\hat W)$, in combination with Theorem 3.25, then yields Theorem 1.4. Note that extensions to nonsimple RV are immediate from suitable extensions of Theorem 3.25, as discussed in Section 5.2. From a mathematical perspective, the key issue in proving the above theorems is to establish convergence of the renormalized approximate integrals, as $\varepsilon \to 0$,
$$\int_0^t f(\hat W^\varepsilon_s)\,\dot W^\varepsilon_s\,ds - \int_0^t C^\varepsilon(s)\,f'(\hat W^\varepsilon_s)\,ds \;\longrightarrow\; \int_0^t f(\hat W_s)\,dW_s.$$
Here, we find much inspiration from singular SPDE theory, which also requires renormalized approximations for convergence to the correct Itô object. Specifically, we see that the theory of regularity structures (Hairer, 2014), which essentially emerged from the theory of rough paths and Hairer's KPZ analysis (see P. K. Friz & Hairer, 2014, for a discussion and references), is a very appropriate tool here. In turn, we add an interesting new class of examples to the existing instances of regularity structures (polynomials, rough paths, many singular SPDEs, etc.).
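The Romano-Touzi conditioning step can be sketched as a conditional Monte Carlo pricer (illustrative only; the left-point kernel discretization and all parameter values are assumptions, not the paper's scheme): conditionally on $W$, the residual $\bar W$-noise is integrated out exactly via the Black-Scholes formula.

```python
import math, random

def norm_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def bs_call(S0, K, total_var):
    """Black-Scholes call price parametrized by total variance sigma^2 * T."""
    if total_var <= 0:
        return max(S0 - K, 0.0)
    v = math.sqrt(total_var)
    d1 = (math.log(S0 / K) + 0.5 * total_var) / v
    return S0 * norm_cdf(d1) - K * norm_cdf(d1 - v)

def rough_vol_call(S0, K, T, H, f, rho, n=64, n_paths=200, seed=0):
    """Conditional (Romano-Touzi) Monte Carlo: only W, driving the volatility,
    is sampled; the independent factor Wbar is never simulated."""
    rng = random.Random(seed)
    dt = T / n
    acc = 0.0
    for _ in range(n_paths):
        dW = [rng.gauss(0.0, math.sqrt(dt)) for _ in range(n)]
        # left-point discretization of the Volterra fBm
        What = [0.0] * n
        for i in range(1, n):
            What[i] = sum(math.sqrt(2 * H) * ((i - j) * dt) ** (H - 0.5) * dW[j]
                          for j in range(i))
        sig = [f(x) for x in What]
        int_f_dW = sum(s * dw for s, dw in zip(sig, dW))
        int_f2_dt = sum(s * s * dt for s in sig)
        # conditional spot adjustment and residual total variance
        E = math.exp(rho * int_f_dW - 0.5 * rho * rho * int_f2_dt)
        acc += bs_call(S0 * E, K, (1.0 - rho * rho) * int_f2_dt)
    return acc / n_paths
```

For $\rho = 0$ and constant $f$, every path reproduces the flat Black-Scholes price exactly, which is a useful sanity check of the conditioning identity.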
This new example avoids all considerations related to spatial structure (notably multilevel Schauder estimates; cf. Hairer, 2014, Chapter 5), yet comes with a genuine need for renormalization. In fact, because we do not restrict ourselves to approximations of the white noise obtained by mollification (i.e., by convolution of $\dot W$ with a rescaled mollifier function, say $\delta^\varepsilon(t, s) = \varepsilon^{-1}\theta(\varepsilon^{-1}(t - s))$), our analysis naturally leads us to renormalization functions. In the case of mollifier approximations, which is the usual choice of Hairer and coworkers (Chandra & Hairer, 2016; Hairer, 2013, 2014) but rules out wavelet approximations, the renormalization function turns out to be constant (because $\dot W^\varepsilon$ is still stationary). In this case, we would obtain a constant $C^\varepsilon = C(\varepsilon)$, explicitly given by an integral; compare (40). If, on the other hand, we consider a Haar wavelet approximation of white noise, we get a genuinely time-dependent renormalization function $C^\varepsilon(t)$. It is natural to ask if $C^\varepsilon(t)$ can be replaced, after all, by its mean $\bar C^\varepsilon$. (Of course, this mean still diverges as $\varepsilon \to 0$, since $H < 1/2$.) For $H > 1/4$, the answer is yes, with an interesting phase transition when $H = 1/4$; compare Section 3.2.
From a numerical perspective, Theorem 1.4 avoids any sampling of the independent factor $\bar W$. A brute-force approach then consists in simulating a scalar Brownian motion $W$, followed by a discrete approximation of the stochastic integral $\hat W_t = \int_0^t K(t, s)\,dW_s$. However, given the singularity of the Volterra kernel $K$, this is not advisable, and it is preferable to simulate the two-dimensional Gaussian process $(W_t, \hat W_t : 0 \le t \le T)$, whose covariance function is readily available. A remaining problem is that the speed of convergence of such approximations, taken over a partition of mesh size $\sim 1/n$, is very slow, as $\hat W$ has little regularity when $H$ is small (Gatheral et al., 2018, report $H \approx 0.05$). Here, higher-order approximations come to help, and we include quantitative estimates, more precisely strong rates, throughout. Such rates are essential for the design of MLMC algorithms, as was also seen in the context of general Gaussian rough differential equations (Bayer, Friz, Riedel, & Schoenmakers, 2016). (The important analysis of weak rates is left for future work.) Numerical aspects are further discussed in Section 6.
The second set of results concerns large deviations for rough volatility models. Thanks to the contraction principle and fundamental continuity properties of Hairer's reconstruction map, the problem is reduced to understanding a large deviations principle (LDP) for a suitable enhancement of the noise. This approach requires (sufficiently) smooth coefficients, but comes with no growth restrictions, which is indeed quite suitable for financial modeling: we improve the Forde-Zhang short-time large deviations (Forde & Zhang, 2017) for simple rough volatility models so as to include $f$ of exponential type, a defining feature in the works of Gatheral and coauthors (Gatheral et al., 2018). (Such an extension is also the subject of the recent works Jacquier, Pakkanen, & Stone, 2018; Gulisashvili, 2018.)
Theorem 1.5. Let $X_t = \log(S_t/S_0)$ be the log-price under the simple rough SV model, that is, (1) and (3). Then $(X_t\,t^{H - 1/2} : t \ge 0)$ satisfies a short-time large-deviation principle (LDP) with speed $t^{2H}$ and rate function identified in Section 4.
Theorem 1.5 is proved below as Corollary 4.3.
Remark 1.6. A potential shortcoming is the nonexplicit form of the rate function. Geometric or "Hamiltonian" interpretations of the rate function, studied in a Markovian setting by many authors (e.g., Deuschel et al., 2014a; Deuschel, Friz, Jacquier, & Violante, 2014b), are then lost. A partial remedy here is to move from large deviations to (higher-order) moderate deviations. Analytic tractability is thus restored, and one still captures the main features of the volatility smile close to the money. This method was introduced in a Markovian setting in P. Friz, Gerhold, and Pinter (2018); the extension to simple rough volatility models was given in Bayer, Friz, Gulisashvili, Horvath, and Stemper (2019), relying either on Forde and Zhang (2017) or on the above Theorem 1.5.
The reader may be interested in further applications of the regularity structure view on rough volatility developed in this paper. The Stratonovich formulation opens up the possibility of constructing cubature methods (in the sense of Kusuoka, 2001; Lyons & Victoir, 2004) for rough SV models. Indeed, our method can be seen as a level-3 Ninomiya-Victoir scheme (Ninomiya & Victoir, 2008). Further, having said much about large deviations, it is not far-fetched to think about a support theorem (another classical application area of rough paths and regularity structures, cf. P. K. Friz & Hairer, 2014, Section 9.3), which, in turn, invites one to revisit Davis and Mataix-Pastor (2007) in a setting of "rough" interest rate models. Another concrete application, content of the recent P. K. Friz, Gassiat, and Pigato (2018), concerns precise asymptotics, allowing for considerable refinement of large deviations. (Translated into financial terms, this improvement leads to higher-order implied volatility expansions.)
Structure of the article. In Section 2, we explain why the classical formulation of rough paths is not suitable for rough volatility models, and then go on to introduce essentials of the theory of regularity structures. We use the KPZ equation as a guiding example, which exhibits several similarities to rough volatility. The most basic "pricing structure" is introduced in Section 3. In Section 4, we consider a regularity structure for two-dimensional noise, which is necessary to study the asset price process in addition to the volatility process. Section 5 then discusses the case of nontrivial dynamics for rough volatility. Some numerical results are presented in Section 6, followed by several appendices with technical details. From Section 3 on, all our work relies on the framework of Hairer's regularity structures.
There seems to be no point in repeating all the necessary definitions and terminology, which the reader can find in Hairer (2013), Hairer (2014), Hairer (2015), and P. K. Friz and Hairer (2014) and a variety of survey papers on the subject. (For the reader in search of one concise reference, we recommend P. K. Friz & Hairer, 2014, Section 13.) Participants of Global Derivatives 2017 (Barcelona) and Gatheral 60th Birthday conference (CIMS, NYU) are thanked for their valuable feedback. We are also very thankful to anonymous referees for their very constructive feedback.

ON ROUGH PATHS AND LESSONS FROM KPZ AND SINGULAR SPDE THEORY
We already pointed out in Section 1.2 that any analysis of correlated ($\rho \neq 0$) rough volatility models will involve the (Itô) integral
$$\int_0^t f(\hat W_s)\,dW_s, \quad (22)$$
where $\hat W$ is an fBm with Hurst parameter $H < 1/2$, itself given as an integral of a singular kernel against the Brownian motion $W$. Although the scalar Brownian motion $W$ can easily be lifted to a Brownian rough path (of Itô or Stratonovich type), the integral in (22) cannot be viewed as a rough integral. Indeed, a generic integrand $f(\hat W)$, and even $\hat W$ itself, is not at all a rough path controlled by $W$ (in the sense of Gubinelli). Hence, the stochastic integral cannot be defined by standard rough path theory as found in P. K. Friz and Hairer (2014, Section 4).
A third attempt, in view of the nongeometric nature of Itô integration, is to use branched rough paths; it is also doomed, for it requires, like classical geometric rough path theory, all iterated integrals. In our case, there is already an obstacle at level 2, before the appearance of any branching, in that the full set of second iterated integrals,
$$\begin{pmatrix} \int W\,dW & \int \hat W\,dW \\ \int W\,d\hat W & \int \hat W\,d\hat W \end{pmatrix}, \quad (23)$$
is an ill-defined object: the integrals against $dW$ are well-defined Itô integrals ($*$), whereas the integrals against $d\hat W$ have no clear meaning ($?$). Note that imposition of a first-order (respectively, Itô) product rule would manifestly clash with Itô calculus, as $\hat W$ has infinite quadratic variation when $H < 1/2$.
On the other hand, formal expansion of (22) over some interval $[s, t]$ gives
$$\int_s^t f(\hat W_r)\,dW_r \approx f(\hat W_s)\int_s^t dW_r + f'(\hat W_s)\int_s^t (\hat W_r - \hat W_s)\,dW_r + \cdots,$$
so that the troubling terms, the "?" in (23), do not appear. What is needed then, in the general case, is a higher-order "partial" branched rough path theory (for we deal with nongeometric/Itô objects), in which only partial information on the iterated integrals is stored. But even then, one faces failure of canonical (Wong-Zakai type) approximations, that is,
$$\int_0^\cdot f(\hat W^\varepsilon_s)\,\dot W^\varepsilon_s\,ds \;\not\longrightarrow\; \int_0^\cdot f(\hat W_s)\,dW_s. \quad (24)$$
Such failures are atypical for rough path theory. Having made all these observations, Hairer's regularity structures (see below for more details) provide everything we desire: a tailor-made algebraic structure (which by construction only stores the required higher-order information), together with a machinery that gives continuity properties of all operations of interest and a consistent way to renormalize approximate stochastic integrals, such as the one appearing in (24). The absence of a canonical approximation theory, as seen in (24), is a defining feature of the singular SPDEs recently considered by Hairer, Gubinelli, and now many others. In particular, approximation of the noise (say, $\varepsilon$-mollification for the sake of argument) typically does not give rise to convergent approximations. To be specific, it is instructive to review the very example that led Hairer to regularity structures: the universal model for fluctuations of interface growth given by the KPZ equation
$$\partial_t h = \partial_x^2 h + (\partial_x h)^2 + \xi,$$
with space-time white noise $\xi = \xi(t, x; \omega)$. As a matter of fact, and without going into further details, there is a well-defined Itô solution $h = h(t, x; \omega)$ (known as the "Cole-Hopf" solution), but if one considers the equation with $\varepsilon$-mollified noise $\xi^\varepsilon$, then its solution $h^\varepsilon$ diverges as $\varepsilon \to 0$. In this sense, there is a fundamental lack of approximation theory, and no Stratonovich solution to KPZ exists.
To see the problem, take $h_0 \equiv 0$ for simplicity and write
$$h = P \star \big((\partial_x h)^2 + \xi\big),$$
with the space-time convolution denoted by $\star$ and the heat kernel $P$. One can proceed with a Picard iteration, but there is an immediate problem with $(P' \star \xi)^2$, (naively) defined as the $\varepsilon$-to-zero limit of $(P' \star \xi^\varepsilon)^2$, which does not exist. However, there exists a diverging sequence of real numbers $C^\varepsilon$ such that, in probability,
$$(P' \star \xi^\varepsilon)^2 - C^\varepsilon \;\longrightarrow\; (P' \star \xi)^{\diamond 2}.$$
The idea of Hairer, following the philosophy of rough paths, was then to accept $P \star \xi$, $(P' \star \xi)^{\diamond 2}$ (and a few more objects) as an enhancement of the noise ("model"), upon which the solution depends in a pathwise robust fashion. This unlocks the seemingly fixed (and here even nonsensical) relation $(P' \star \xi)^2 = (P' \star \xi) \cdot (P' \star \xi)$. Loosely speaking, one has
Theorem 2.1 (Hairer). There exist diverging constants $C^\varepsilon$ such that a Wong-Zakai result holds of the form $\tilde h^\varepsilon \to h$, in probability and uniformly on compacts, where
$$\partial_t \tilde h^\varepsilon = \partial_x^2 \tilde h^\varepsilon + (\partial_x \tilde h^\varepsilon)^2 - C^\varepsilon + \xi^\varepsilon.$$

Similar results hold for a number of other singular semilinear SPDEs.
In a sense, this can be traced back to the Milstein scheme for SDEs and then rough path theory. Consider $dY = f(Y)\,dB$, with $Y_0 = 0$ for simplicity, and consider the second-order (Milstein) approximation
$$Y_{t+\Delta} \approx Y_t + f(Y_t)\,(B_{t+\Delta} - B_t) + f'(Y_t)\,f(Y_t)\int_t^{t+\Delta} (B_s - B_t)\,dB_s.$$
One has to unlock the seemingly fixed relation between $B$ and its iterated integral, for there is a choice to be made. For instance, the last term can be understood via the Itô integral $\int_0^\cdot B\,dB$ or the Stratonovich integral $\int_0^\cdot B \circ dB$ (and, in fact, there are many other choices; see, for example, the discussion in P. K. Friz & Hairer, 2014). It suffices to take this thought one step further to arrive at rough path theory: accept $\mathbb B_{s,t} := \int_s^t (B_r - B_s)\,dB_r$ as a new (analytic) object, which leads to the main (rough path) insight: SDE theory = analysis based on $(B, \mathbb B)$.
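The Milstein correction term just described can be seen at work numerically (a standard textbook experiment, not taken from the paper; geometric Brownian motion is chosen because its exact solution is known): at the same step size, the second-order term visibly reduces the strong error relative to Euler.

```python
import math, random

# Strong error of Euler vs Milstein for dY = f(Y) dB with f(y) = sigma * y,
# exact solution Y_t = y0 * exp(sigma * B_t - sigma^2 t / 2).
# All three use the *same* Brownian increments per path.
rng = random.Random(3)
sigma, y0, T, n, n_paths = 1.0, 1.0, 1.0, 64, 200
dt = T / n
err_euler = err_milstein = 0.0
for _ in range(n_paths):
    B = 0.0
    ye = ym = y0
    for _ in range(n):
        dB = rng.gauss(0.0, math.sqrt(dt))
        B += dB
        ye += sigma * ye * dB                                          # Euler
        ym += sigma * ym * dB + 0.5 * sigma**2 * ym * (dB * dB - dt)   # Milstein
    exact = y0 * math.exp(sigma * B - 0.5 * sigma**2 * T)
    err_euler += abs(ye - exact)
    err_milstein += abs(ym - exact)
err_euler /= n_paths
err_milstein /= n_paths
```

Note that `dB * dB - dt` is precisely a discrete realization of the iterated-integral object $\mathbb B$, in its Itô version; replacing it by `dB * dB` would implement the Stratonovich choice.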
As before, every symbol is given concrete meaning by "realizing" it as an honest function (or Schwartz distribution). Naturally, one can realize $\Xi$ as the (mollified) noise itself. More interestingly, the higher-order symbols can be mapped to any of the candidate products: canonical, mollified, or renormalized. This realization map is called a "model" and captures exactly a typical, but otherwise fixed, realization of the noise (mollified or not) together with some enhancement thereof, renormalized or not. For instance, writing $\hat\Pi^\varepsilon$ for the realization map for the renormalized enhanced noise, one has
$$\big(\hat\Pi^\varepsilon_s\,\Xi\,\mathcal I(\Xi)\big)(t) = \dot W^\varepsilon_t\,\big(\hat W^\varepsilon\big)^{(s)}_t - C^\varepsilon(t),$$
where $(\,\cdot\,)^{(s)}$ indicates suitable centering at the base point $s$. Mind that a modeled distribution takes values in a (finite-dimensional) linear space spanned by (sufficiently many) symbols; the precise definition is a mix of suitable analytic and algebraic conditions (similar to the notion of a controlled rough path).
The analysis requires keeping track of the degree (a.k.a. homogeneity) of each symbol: for instance, the symbol representing $B$ has degree $1/2 - \kappa$ (related to the Hölder regularity of the realized object one has in mind), its square has degree $2(1/2 - \kappa)$, and so on. All these degrees are collected in an index set. To compare jets at different points (think of re-expanding $(x-1)^3 = \cdots$ around a new base point), a group of linear maps on $\mathcal T$ is used, called a structure group. Last but not least, the reconstruction map uniquely maps modeled distributions to functions or Schwartz distributions. (This can be seen as a generalization of the sewing lemma, the essence of rough integration, see, for example, P. K. Friz & Hairer, 2014, which turns a collection of sufficiently compatible local expansions into one function or Schwartz distribution.) In the KPZ context, the (Cole-Hopf or Itô) solution is then indeed obtained as the reconstruction of the abstract (modeled distribution) solution.

THE ROUGH PRICING REGULARITY STRUCTURE
In this section, we develop the approximation theory for integrals of the type $\int_0^t f(\hat W_s)\,dW_s$. In the first part, we present the regularity structure and the associated models we will use. In the second part, we apply the reconstruction theorem from regularity structures to conclude our main result, Theorem 3.25.

Basic pricing setup
We are given a Hurst parameter $H \in (0, 1/2]$, associated with an fBm (in the Riemann-Liouville sense) $\hat W$, and fix an arbitrary $\kappa \in (0, H)$ and an integer $M$ such that
$$(M+1)(H - \kappa) > \tfrac12 + \kappa. \quad (27)$$
At this stage, we can introduce the "level-$(M+1)$" model space
$$\mathcal W := \big\langle\, \Xi,\ \Xi\,\mathcal I(\Xi),\ \ldots,\ \Xi\,\mathcal I(\Xi)^M,\ \mathbf 1,\ \mathcal I(\Xi),\ \ldots,\ \mathcal I(\Xi)^M \,\big\rangle,$$
where $\langle\ldots\rangle$ denotes the vector space generated by the (purely abstract) symbols in $\{\ldots\}$.
Remark 3.1. It is useful, here and in the sequel, to consider as a sanity check the special case $H = 1/2$, in which case we recover the "level-2" rough path structure as introduced in P. K. Friz and Hairer (2014, Chapter 13). More specifically, if we take a Hölder exponent $\alpha := 1/2 - \kappa < 1/2$, we may choose $M = 1$. Then, condition (27) is precisely the familiar condition $\alpha > 1/3$.
The interpretation of the symbols in $\mathcal W$ is as follows: $\Xi$ should be understood as an abstract representation of the white noise belonging to the Brownian motion $W$, that is, of $\dot W$, where the derivative is taken in the distributional sense. Note that because we set $W_t = 0$ for $t \le 0$, we have $\dot W(\varphi) = 0$ for $\varphi \in C_c^\infty((-\infty, 0))$. The symbol $\mathcal I(\ldots)$ has the intuitive meaning of "integration against the Volterra kernel," so that $\mathcal I(\Xi)$ represents the integration of the white noise against the Volterra kernel, that is, $\hat W$; symbols such as $\Xi\,\mathcal I(\Xi)^k$ should be read as products between the objects above. These interpretations of the symbols generating $\mathcal W$ will be made rigorous by the model $(\Pi, \Gamma)$ in the next subsection. Every symbol in $\mathcal W$ is assigned a homogeneity, which we define by
$$|\Xi| := -\tfrac12 - \kappa, \qquad |\mathcal I(\Xi)| := H - \kappa, \qquad |\mathbf 1| := 0,$$
extended multiplicatively to products. We collect the homogeneities of elements of $\mathcal W$ in a set $A := \{|\tau| : \tau \in \mathcal W\}$, whose minimum is $|\Xi| = -1/2 - \kappa$. Note that the homogeneities are multiplicative in the sense that $|\tau\tau'| = |\tau| + |\tau'|$ for $\tau, \tau' \in \mathcal W$ such that $\tau\tau' \in \mathcal W$ (with the product defined in the obvious way). At last, our regularity structure comes with a structure group $G$, an (abstract) group of linear operators on the model space $\mathcal T$, which should satisfy $\Gamma\tau - \tau \in \bigoplus_{\tau' \in \mathcal W :\, |\tau'| < |\tau|} \mathbb R\,\tau'$ and $\Gamma\,\mathbf 1 = \mathbf 1$, for $\tau \in \mathcal W$ and $\Gamma \in G$. We will choose $G = \{\Gamma_h \mid h \in (\mathbb R, +)\}$, given by $\Gamma_h\,\mathcal I(\Xi) := \mathcal I(\Xi) + h\,\mathbf 1$ and $\Gamma_h\,\Xi := \Xi$, extended multiplicatively.
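The homogeneity bookkeeping above is simple enough to mechanize (a purely illustrative sketch; the numerical values of $H$ and $\kappa$ are arbitrary, and the homogeneity assignment and condition (27) are as reconstructed in the text):

```python
import math

# Assumed assignment: |Xi| = -1/2 - kappa, |I(Xi)| = H - kappa, |1| = 0,
# extended multiplicatively; M is the smallest integer with
# (M + 1)(H - kappa) > 1/2 + kappa   (condition (27)).
H, kappa = 0.1, 0.02

M = math.floor((0.5 + kappa) / (H - kappa))
while (M + 1) * (H - kappa) <= 0.5 + kappa:
    M += 1

def hom(n_xi, n_int):
    """Homogeneity of the symbol Xi^n_xi * I(Xi)^n_int."""
    return n_xi * (-0.5 - kappa) + n_int * (H - kappa)

symbols = [("Xi*I(Xi)^%d" % k, hom(1, k)) for k in range(M + 1)] + \
          [("I(Xi)^%d" % k, hom(0, k)) for k in range(M + 1)]
A = sorted(h for _, h in symbols)   # the index set, minimum |Xi| = -1/2 - kappa
```

Note how small $H$ forces a large $M$: the rougher the volatility, the more symbols are needed before the expansion reaches positive homogeneity.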

The limiting model $(\Pi, \Gamma)$
Let $W$ be a Brownian motion on $\mathbb R_+ := [0, \infty)$ and extend it to all of $\mathbb R$ by requiring $W_t = 0$ for $t \le 0$. We will frequently use the notations
$$\int_s^t g_r\,dW_r \qquad \text{and} \qquad \int_s^t g_r \diamond dW_r,$$
which denote the Itô integral and the Skorokhod integral (which boils down to an Itô integral whenever the integrand is adapted), respectively. For background on Skorokhod integration, we refer to Janson (1997, Section 7.3); Nualart's ICM lecture (Nualart, 2006) is also highly recommended. Skorokhod integrals have the distinct advantage of not requiring an adapted integrand, but coincide with Itô integrals once the integrand is adapted. For a reader unfamiliar with the (beautiful) theory of Skorokhod integration, it should be sufficient, for the purposes of this article, to simply think of an Itô integral with possibly nonadapted integrands. Whenever we make use of specific properties of Skorokhod integration, we will make this explicit.
Proof. This is a direct consequence of the binomial theorem. □
We extend the domain of the two-parameter object defined above to all of $\mathbb R^2$ by imposing Chen's relation for all $s, u, t \in \mathbb R$. We are now in the position to define a model $(\Pi, \Gamma)$ that gives rigorous meaning to the interpretation we gave above for $\Xi$, $\mathcal I(\Xi)$, $\Xi\,\mathcal I(\Xi)$, .... Recall that in the theory of regularity structures, a model is a collection of linear maps $\Pi_s : \mathcal T \to C_c^1(\mathbb R)'$ and operators $\Gamma_{s,t} \in G$, for indices $s, t \in \mathbb R$, that satisfy the algebraic compatibility (33) and the analytic bounds (34) and (35), where the loosely stated bounds in (34) and (35) hide a multiplicative constant, which can be chosen uniformly for $\tau \in \mathcal W$, for $s, t$ in a compact set, and for test functions $\varphi_s^\lambda := \lambda^{-1}\varphi(\lambda^{-1}(\cdot - s))$ with $\lambda \in (0, 1]$ and $\varphi \in C_c^1$ with compact support in the ball $B(0, 1)$.
We will work with the following "Itô" model $(\Pi, \Gamma)$, which makes our interpretations of the elements of $\mathcal W$ more precise. (We will occasionally write $(\Pi^{\text{Itô}}, \Gamma^{\text{Itô}})$ to avoid confusion with a generic model, which we also denote by $(\Pi, \Gamma)$.) We extend both maps from $\mathcal W$ to $\mathcal T$ by imposing linearity.
Proof. The only symbol in $\mathcal W$ for which (33) is not straightforward is $\Xi\,\mathcal I(\Xi)$, where the statement follows by Chen's relation. The bounds (34) and (35) follow trivially for $\Xi$, and for $\mathcal I(\Xi)$ by the $(H - \kappa')$-Hölder regularity of $\hat W$, $\kappa' \in (0, \kappa)$. It is straightforward to check condition (35) by using the rule $\Gamma_{h+h'} = \Gamma_h \cdot \Gamma_{h'}$, so that we are only left with the task of bounding $\Pi_s\,\Xi\,\mathcal I(\Xi)\,(\varphi_s^\lambda)$; this follows along the lines of the proof of P. K. Friz and Hairer (2014, Theorem 3). □
As we will see below in Section 3.2, this model is the toolbox from which we can build pathwise Itô integrals of the type $\int_0^t f(s, \hat W_s)\,dW_s$. For an approximation theory for such expressions, we need a comparable setup that describes approximations, which will be achieved by introducing a model $(\Pi^\varepsilon, \Gamma^\varepsilon)$.

The approximating model $(\Pi^\varepsilon, \Gamma^\varepsilon)$
The whole definition of the model $(\Pi, \Gamma)$ is based on the object $\dot W$. It is therefore natural to build an approximating model by replacing $\dot W$ by some modification $\dot W^\varepsilon$ that converges (as a distribution) to $\dot W$ as $\varepsilon \to 0$.
The definition of $\dot W^\varepsilon$ will be based on an object $\delta^\varepsilon$, which should be thought of as an approximation to the Dirac delta distribution. Our goal is to build $\delta^\varepsilon$ from wavelets, which can be as irregular as the Haar functions. We find it therefore convenient to allow $\delta^\varepsilon$ to take values in the Besov space $\mathcal B^\gamma_{1,\infty}(\mathbb R)$, $\gamma > 1/2 + \kappa$, which includes functions like $\mathbf 1_{[0,1]} \in \mathcal B^1_{1,\infty}(\mathbb R)$.
Locally, $\dot W$ is contained in $\mathcal B^{|\Xi|}_{\infty,\infty}(\mathbb R)$ (recall: $|\Xi| = -1/2 - \kappa$), so that, due to $\mathcal B^{|\Xi|}_{\infty,\infty}(\mathbb R) \subseteq (\mathcal B^\gamma_{1,\infty}(\mathbb R))'$, we can define $\dot W^\varepsilon_t := \dot W(\delta^\varepsilon(t, \cdot))$, which is a Gaussian process with pathwise measurable and locally bounded trajectories. For (possibly stochastic) integrands $g$, we introduce the notation $\int_0^t g_s\,\dot W^\varepsilon_s\,ds$. If $g$ takes values in some (nonhomogeneous) Wiener chaos induced by $\dot W$, we also introduce
$$\int_0^t g_s \diamond \dot W^\varepsilon_s\,ds, \quad (36)$$
where $\diamond$ denotes the Wick product. Note that these two objects do not coincide in general. A complete repetition of the definition of Wick multiplication would stray too far from the focus of this article; we refer to Janson (1997, Section 3.1) for more details. In essence, a Wick product combines two random variables $X$ and $Y$ (which lie in a suitable space, namely, the Wiener chaos) in a symmetric, bilinear manner to a new random variable $X \diamond Y$, by subtracting from the usual product $X \cdot Y$ a sum of correlation terms. If $X$ and $Y$ are independent, these correlations vanish, so that the Wick product coincides with the pointwise product $X \cdot Y$. This includes the case of $X$ being constant, so that, in particular, $1 \diamond Y = Y$. Another rather simple example arises if $X$, $Y$ are both (centered) Gaussian random variables: the Wick product is then simply given by $X \diamond Y = X \cdot Y - E[XY]$. In this article, $X$ and $Y$ are themselves products of Gaussian random variables; in this case, an explicit formula for $X \diamond Y = X \cdot Y - \cdots$ is given in equation (3.6) of Janson (1997).
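The Gaussian special case is easy to check numerically (a small sketch with arbitrary coefficients, assuming the centered-Gaussian formula $X \diamond Y = XY - E[XY]$ quoted above):

```python
import random

# Wick product of two centered jointly Gaussian variables: X <> Y = X*Y - E[XY].
# Build X = g1 and Y = a*g1 + b*g2 from independent standard normals, so that
# Cov(X, Y) = a, and verify that the Wick product is centered.
rng = random.Random(5)
M = 50000
g1 = [rng.gauss(0.0, 1.0) for _ in range(M)]
g2 = [rng.gauss(0.0, 1.0) for _ in range(M)]
a, b = 1.0, 0.5
X = g1
Y = [a * u + b * v for u, v in zip(g1, g2)]
wick = [x * y - a for x, y in zip(X, Y)]   # subtract E[XY] = a
mean_wick = sum(wick) / M
```

If instead $Y$ depended on $g_2$ only, the correction term would vanish and the Wick product would reduce to the ordinary product, exactly as stated for independent factors.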
The motivation for using the same symbol "⋄" for Wick products and for Skorokhod integrals (cf. (29)) is that Skorokhod integrals can be seen as "infinitesimal Wick products," in the sense that they are limits of sums of Wick products; compare Remark 3.7 below. As sketched in Remark 3.7, one might want to read the Skorokhod integral $\int g \diamond \mathrm d W$ as $\int g \diamond \frac{\mathrm d W}{\mathrm d t}\,\mathrm d t$, where ⋄ should be read as Skorokhod integration on the left-hand and as Wick multiplication on the right-hand side (which is ill defined, as $\frac{\mathrm d W}{\mathrm d t}$ only exists as a distribution). The ill-defined identity (37) can be read as a motivation for the (well-defined) definition (36). The close relation between Skorokhod integration and Wick multiplication plays a crucial role in the proof of Theorem 3.14 in the Appendix.
Remark 3.7. For the reader's convenience, we briefly comment on the close relation between the Skorokhod integral and the Wick product. Indeed, when $g = \sum_i g_i \mathbf 1_{[t_i, t_{i+1}]}$, with summation over a finite partition of $[0, T]$, and each $g_i$ a (nonadapted) random variable in a finite Wiener-Itô chaos, it follows from Janson (1997, Theorem 7.40) that $\int g \diamond \mathrm d W = \sum_i g_i \diamond \delta_i W$, where ⋄ denotes Skorokhod integration on the left and Wick multiplication on the right-hand side, and where $\delta_i W = W(t_{i+1}) - W(t_i)$. Passage to $L^2$-limits is then standard, so that a Skorokhod integral $\int g \diamond \mathrm d W$ can be interpreted as the integrated Wick product "$g \diamond \frac{\mathrm d W}{\mathrm d t}$," which can be seen as a motivation for our definition (36). See also Nualart (2013) and the references therein.
We now define an approximate fBm $\hat W^\varepsilon$ by replacing $\dot W$ with $\dot W^\varepsilon$ in the Volterra representation of $\hat W$; it has the expected regularity, as shown in the following lemma.
Proof. The proof is elementary but a bit bulky, and therefore postponed to the Appendix. □ Finally, we can give the definition of the approximating model $(\Pi^\varepsilon, \Gamma^\varepsilon)$, the "canonical" model built from the approximate (and hence regular) noise $\dot W^\varepsilon$.
Proof. The identity $\Pi^\varepsilon_s \Gamma^\varepsilon_{st} = \Pi^\varepsilon_t$ is straightforward to check. The bounds (34) and (35) on $\Gamma^\varepsilon$ and on $\Pi^\varepsilon\,\mathcal I(\Xi)$ follow from the regularity of $\hat W^\varepsilon$ as proved in Lemma 3.8. The blowup of $\Pi^\varepsilon_s\,\Xi\mathcal I(\Xi)(t)$, however, is even better than we need, because, by the choice of $\delta^\varepsilon$, we have $|\dot W^\varepsilon| \le M$ on compact sets, for some random constant $M$. □ The definition of this model is justified by the fact that application of the reconstruction operator (as in Lemma 3.23) yields integrals of the form $\int_0^t f(\hat W^\varepsilon(s), s)\,\dot W^\varepsilon(s)\,\mathrm d s$. As we pointed out already in Section 1, there is no hope that integrals of this type converge as $\varepsilon \to 0$ if $H < 1/2$. This can be cured by working with a renormalized model $(\hat\Pi^\varepsilon, \hat\Gamma^\varepsilon)$ instead.
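The failure of convergence can be seen numerically. The sketch below is entirely our own illustration: the kernel is taken as $(t-s)^{H-1/2}$ without normalizing constant, and all function names are ours. It builds the piecewise-constant noise approximation with block-wise exact kernel integrals and evaluates a naive right-point approximation of $\int_0^1 \hat W^\varepsilon\,\mathrm dW^\varepsilon$; under these conventions its mean is exactly $\varepsilon^{H-1/2}/(H+1/2)$, which blows up as $\varepsilon \to 0$ whenever $H < 1/2$:

```python
import numpy as np

def naive_integral_mean(hurst, level, n_paths, rng):
    """Monte Carlo mean of the right-point sum
    sum_k W_hat^eps(t_{k+1}) (W(t_{k+1}) - W(t_k)) on [0, 1]."""
    eps = 2.0 ** (-level)
    n = 2 ** level
    t = np.arange(n + 1) * eps
    j = np.arange(n)
    # a[k, j] = integral over block j of (t_k - s)^(H - 1/2) ds (zero for j >= k).
    left = np.maximum(t[:, None] - j[None, :] * eps, 0.0)
    right = np.maximum(t[:, None] - (j[None, :] + 1) * eps, 0.0)
    a = (left ** (hurst + 0.5) - right ** (hurst + 0.5)) / (hurst + 0.5)
    z = rng.standard_normal((n, n_paths))     # Haar coefficients, i.i.d. N(0, 1)
    w_hat = a @ (z / np.sqrt(eps))            # W_hat^eps at grid points 0..n
    dw = z * np.sqrt(eps)                     # Brownian increments per block
    return np.mean(np.sum(w_hat[1:] * dw, axis=0))

# The mean grows like eps^(H - 1/2) / (H + 1/2) as the level increases.
```

The renormalized model introduced next is designed to remove exactly this kind of divergence.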

The renormalized model

From the perspective of regularity structures, the fundamental reason why integrals like (38) fail to converge to $\int_0^t f(\hat W_s, s)\,\mathrm d W_s$ lies in the fact that the corresponding models will not satisfy $(\Pi^\varepsilon, \Gamma^\varepsilon) \to (\Pi, \Gamma)$ in a suitable norm. To see what is going on, we will first rewrite $\Pi_s\,\Xi\mathcal I(\Xi)$. Remark 3.11. At first glance, it might seem surprising (and maybe confusing) that we need a Skorokhod integral to write down the integrals in Lemma 3.10, especially for a reader more familiar with rough path theory. Note that the expression $\int_0^\infty \varphi(r)\,(\hat W_r - \hat W_s) \diamond \mathrm d W_r$ is ill defined as an Itô integral: As we allow $\varphi$ to have support below $s$, the domain of integration involves times $r < s$, in which case $\hat W(r) - \hat W(s)$ is not adapted with respect to the filtration of $W$ at time $r$. The concept of a Skorokhod integral is, in such cases, a natural extension of Itô's notion of integration, which boils down to the classical Itô integral once the integrand is adapted. In the theory of rough paths, one usually takes $\varphi = \mathbf 1_{[s, s']}$ for $s < s'$ (see also P. K. Friz & Hairer, 2014, Section 13.3.2), which explains why issues of this kind never arise in the rough path framework.
There is, in fact, a way to write the integral $\int_0^\infty \varphi(r)\,(\hat W_r - \hat W_s) \diamond \mathrm d W_r$ as an Itô integral: Expand the powers $(\hat W_r - \hat W_s)^m$ via the binomial theorem and "pull out" all factors depending on $s$ (such a point of view is essentially behind Chen's relation in (32)). However, as this leads to a rather nebulous definition, we consider it more convenient to work with a more appropriate notion of integration in this article.
Proof of Lemma 3.10. We prove this by re-expressing $\Pi_s\,\Xi\mathcal I(\Xi)$. For $s < t$, we already have the claimed form, so it remains to see what happens for $t < s$. With relation (32), we then obtain an analogous expression, where, for the sake of concision, we use formal notation that is easy to translate into a rigorous formulation. Using a product formula for Gaussian random variables $X_1, X, X_2$ (a consequence of Janson, 1997, Theorems 3.15 and 7.33), together with the fact that $E[\dot W(r) \cdot (\hat W(r) - \hat W(s))] > 0$ for $t < r < s$, we can reformulate this expression and obtain the claimed identity. (An alternative derivation can be given following Nualart & Pardoux, 1988, Theorem 3.2.) As $\Pi_s\,\Xi\mathcal I(\Xi)(\varphi)$ is given by integrating $\varphi$ against this expression, the claim follows. □ Let us also re-express the approximating model in a suitable form.
Lemma 3.13. For all $s, t \in \mathbb R$, we have Our hope is now that the new model $\hat\Pi^\varepsilon$ converges to Π in a suitable sense. Similar to Hairer (2014, (2.17)), we define the distance between two models $(\Pi, \Gamma)$ and $(\bar\Pi, \bar\Gamma)$ on a compact time interval, where $|\cdot|_a$ denotes the absolute value of the coefficient of the symbol $\tau'$ with $|\tau'| = a$ and where the first supremum runs over test functions $\varphi \in \mathcal B^1$ with $\|\varphi\|_{C^1} \le 1$. We will also need $\|\Pi\|$, defined by a supremum over test functions $\varphi$ with $\operatorname{supp}\varphi \subseteq (0, 1)$. We are now ready to give the fundamental result of this subsection. Recall that the (minimal) homogeneity is $|\Xi| = -1/2 - \kappa$, which corresponds to $W$ being Hölder with exponent $1/2 - \kappa$.
for any $\theta \in (0, 1)$ and $p \in [1, \infty)$. In particular, the distance between the renormalized model and the Itô model decays with rate almost $\theta H$ for $M = M(\theta, H)$ large enough.
Remark 3.15. In the special case of the level-2 Brownian rough path (i.e., $H = 1/2$, $M = 1$), the above result is in precise agreement with known results; note, however, that we are dealing with the simple case of scalar Brownian motion. More specifically, we do not see the usual (strong) rate "almost" 1/2, but have to subtract the Hölder exponent used in the rough path/model topology (here: $1/2 - \kappa$), which almost leads to the rate $\kappa$. As $M = 1$ entails the condition $1/2 - \kappa > 1/3$, we see that $\kappa < 1/6$, exactly as given, for example, in P. K. Friz and Hairer (2014, Ex. 10.14). A better rate can be achieved by working with higher level rough paths (here: $M > 1$), and indeed, the special case of $H = 1/2$ but general $M$ can be seen as a consequence of P. Friz and Riedel (2011): at the price of working with $M \sim 1/(1/2 - \kappa)$ levels, one can choose $\kappa$ arbitrarily close to 1/2 and so recover the usual "almost" 1/2 rate. Of course, the case $H < 1/2$ is out of reach of rough path considerations.

Approximation and renormalization theory
We now address the central question of how the integral $\int_0^t f(\hat W^\varepsilon_s, s)\,\dot W^\varepsilon_s\,\mathrm d s$ has to be modified to make it converge to $\int_0^t f(\hat W_s, s)\,\mathrm d W_s$. The key idea is to combine the convergence result from Theorem 3.14 with Hairer's reconstruction theorem, which we state below.
We first recall the notion of a modeled distribution; compare Hairer (2014, Definition 3.1). We say that a map $F: \mathbb R \to \mathcal T$ is in the space $\mathcal D^\gamma(\Gamma)$, $\gamma > 0$, for some time horizon $T > 0$ if the corresponding seminorm is finite, where, as above, $|\cdot|_a$ denotes the absolute value of the coefficient of the symbol $\tau$ with $|\tau| = a$. Given two models $(\Pi, \Gamma)$ and $(\bar\Pi, \bar\Gamma)$ and two maps $F, \bar F: \mathbb R \to \mathcal T$, it is also useful to have a notion of distance between modeled distributions. The reconstruction theorem now states that, for $\gamma > 0$, a map $F \in \mathcal D^\gamma(\Gamma)$ can be uniquely identified with a distribution $\mathcal R F$ that locally behaves like $\Pi_s F(s)$.
Proof. We prove this via wavelet methods in the Appendix. □ In the following, we use the superscript $(\varepsilon)$ to refer to both the Itô objects and their approximating counterparts. To study objects like $\int_0^t f(\hat W(s), s)\,\mathrm d W(s)$ with the reconstruction theorem, we first "expand" the integrand $f(\hat W(s), s)$ in the regularity structure, obtaining a modeled distribution. On the level of the regularity structure, these objects can be multiplied with the "noise" Ξ, which again gives a modeled distribution. We will analyze the resulting object by writing it as the composition of a (random) modeled distribution with the smooth function $f$. To this end, we need the following. Remark 3.20. Our notion of admissibility mimics Hairer (2014, Definition 5.9), which, however, is not directly applicable here (due to the failure of assumption 5.4 in Hairer, 2014).
Proof. By definition of the space of modeled distributions, we need to understand the action of Γ on all constituting symbols. As $\{\mathbf 1, \mathcal I(\Xi)\}$ spans a sector, that is, a space invariant under the action of the structure group, this action is explicit. Application of the realization map $\Pi_s$, followed by evaluation at $s$, immediately identifies the resulting expression, where we used admissibility and $\Pi\,\Xi = \Pi^\varepsilon\,\Xi$ in the last step. As a consequence, the lift is invariant under Γ, so that it trivially lies in $\mathcal D^\gamma$ for any $\gamma < \infty$. □ For a given (sufficiently smooth) function $f$ and a generic model (Π, Γ) on our regularity structure, define the composed modeled distribution $F^\Pi$. Note that the lift is function-like, that is, takes values in the span of symbols with nonnegative degree. From Hairer (2014, Proposition 3.28), we then have the stated identity. (In particular, we see that the reconstruction coincides with the corresponding pointwise evaluation when Π is taken as either the approximate or the renormalized approximate model.) We can also define $\Xi F^\Pi$, simply obtained by multiplying $F^\Pi$ with Ξ. The properties of $F^\Pi$ and $\Xi F^\Pi$ are summarized in the following lemma.
Proof. The map $F^\Pi$ is simply the composition (in the sense of Hairer, 2014, Section 4.2) of the function $f$ with modeled distributions. The result then follows from Hairer (2014, Theorem 4.16); polynomial dependence on $\|\Pi\|$ is not stated there but is clear from the proof. □ Remark 3.22. In the case when $f \in \mathcal C^{2M+3}$, but with no global bounds, the result still holds, as we only consider the values of $f$ on the range of the continuous process $\hat W^{(\varepsilon)}$ (which is bounded by some $R \ge 0$). The resulting bounds then depend linearly on $\|f\|_{\mathcal C^{2M+3}(K \times [0, T])}$.
In the case of the Itô model (Π, Γ) and the approximating renormalized models $(\hat\Pi^\varepsilon, \hat\Gamma^\varepsilon)$, we simply denote $F^\Pi$ by $F$ and $F^\varepsilon$, respectively. We are then allowed to apply Hairer's reconstruction theorem (see Theorem 3.16). Note that we have two reconstruction operators $\mathcal R$ and $\mathcal R^\varepsilon$, because we start with two models. The reconstruction $\mathcal R^{(\varepsilon)}(\Xi F^{(\varepsilon)})$ can be written down explicitly.
Lemma 3.23. We have (a.s.) Proof. The proof is in the Appendix. □ If we take $g = \mathbf 1_{[0,t)}$, we obtain $\mathcal R^\varepsilon(\Xi F^\varepsilon)(\mathbf 1_{[0,t)}) = \int_0^t f(\hat W^\varepsilon(s), s)\,\dot W^\varepsilon(s)\,\mathrm d s$, so that it is natural to choose $\tilde{\mathscr I}^\varepsilon(t) = \mathcal R^\varepsilon(\Xi F^\varepsilon)(\mathbf 1_{[0,t)})$ as an approximation. However, note that the key property of the reconstruction operator $\mathcal R^{(\varepsilon)}$ is that it is locally close to the corresponding model $\Pi^{(\varepsilon)}$, so that we, in fact, have two natural approximations: Definition 3.24. For $f$ as in Lemma 3.21 and $t \ge 0$, we set the two approximations accordingly. We may drop the indices $\varepsilon$ and $\delta$ on $\tilde{\mathscr I}^\varepsilon$ and $\tilde{\mathscr I}^{\varepsilon,\delta}$ if there is no risk of confusion. The following theorem, which can be seen as the fundamental theorem of our regularity structure approach to rough pricing, shows that both approximations converge.
Theorem 3.25. Fix $T > 0$. For $f$ smooth, bounded, with bounded derivatives, and $\tilde{\mathscr I}^\varepsilon$, $\tilde{\mathscr I}^{\varepsilon,\delta}$ as in Definition 3.24, we have: (i) for any $\theta \in (0, 1)$ and any $p < \infty$, there exists a constant such that the stated estimate holds; (ii) for every $\theta \in (0, 1)$, we can pick $M = M(\theta, H)$ large enough such that, for any $p < \infty$, there exists a constant such that the stated estimate holds. Remark 3.26. With regard to (i), although $\tilde{\mathscr I}^\varepsilon(t)$ does not depend on the choice of $M$, and nor does its (Itô) limit, the choice of $M$ affects the entire regularity structure and so, implicitly, also the reconstruction operator $\mathcal R^\varepsilon$ used in the definition of $\tilde{\mathscr I}^\varepsilon$, as well as the modeled distribution $F^\varepsilon$. The latter, in turn, requires $f \in \mathcal C^{2M+3}$ for the construction to make sense. If $\theta$ is chosen arbitrarily close to 1, $M$ has to be taken arbitrarily large, so $f$ needs to have derivatives of arbitrary order, hence our smoothness assumption.
Remark 3.27. By an easy localization argument, one shows that, for $f$ smooth (but without any further bounds), one still has the convergence in probability, locally uniformly. The original rough volatility model due to Gatheral et al. (2018) makes a point that $f$ should be of exponential form. Now, the result with $L^p$-estimates still holds, because we only consider the values of $f$ on the range of the continuous process $\hat W^{(\varepsilon)}$ (which is bounded by some $R \ge 0$). As pointed out in Remark 3.22, the bounds then depend linearly on $\|f\|_{\mathcal C^{2M+3}(K \times [0, T])}$. Because (Π, Γ) is a Gaussian model, the underlying process (say, $\hat W$ or $\hat W^\varepsilon$) is Gaussian, and hence we have Gaussian concentration of Fernique type for $\sup_{t \in [0, T]} |\hat W^{(\varepsilon)}(t)|$. So, for instance, if $f$ and its derivatives have exponential growth, we do have the bounds of the above theorem, for all $p < \infty$. This remark justifies, in particular, the choice $f(x) = \exp(x)$ and $p = 2$ in the numerical discussion of Section 6.
Remark 3.28. In Neuenkirch and Shalaiko (2016), it is shown (in a slightly different setting) that the strong rate for the standard Euler scheme (or, more precisely, the left-point rule) is in general no better than $H$, even when the fractional process is simulated exactly. In that sense, the scheme suggested in Theorem 3.25 is almost optimal.
We then obtain that the error is of order $\varepsilon^{\theta H}$, $\theta \in (0, 1)$, using Theorem 3.14, Lemma 3.21, and (41) for the first term, and also Theorem 3.16 for the second term. Letting $\theta \uparrow 1$ and $M \uparrow \infty$, our total rate can be chosen arbitrarily close to $H$.
To obtain the second estimate, we can bound $\tilde{\mathscr I}^\varepsilon(t) - \tilde{\mathscr I}^{\varepsilon,\delta}(t)$ with the first inequality in Theorem 3.16. □

Nonconstant versus constant renormalization
If $\delta^\varepsilon$ comes from a mollifier (cf. Example 3.6), then the renormalization function $\mathscr C^\varepsilon(\cdot)$ that was applied in Theorem 3.14, and thus in Definition 3.24, is a constant, which is the familiar situation one encounters in the study of singular SPDEs (Chandra & Hairer, 2016; Hairer, 2013, 2014). If $\delta^\varepsilon$ comes from wavelets such as the Haar basis, $\mathscr C^\varepsilon(\cdot)$ is usually not constant but a periodic function with period $\varepsilon$. Thus, we see that our analysis gives rise to a "nonconstant renormalization." It is natural to ask if one can do with constant renormalization after all. Assume that $\mathscr C^\varepsilon$ is periodic with mean $\bar{\mathscr C}^\varepsilon$. From Lemma 3.13, it follows that $\mathscr C^\varepsilon$ (and hence its mean) is bounded by $\varepsilon^{H-1/2}$, uniformly in $\varepsilon$. Putting all this together, it easily follows that $|\langle \mathscr C^\varepsilon - \bar{\mathscr C}^\varepsilon, g\rangle| \lesssim \varepsilon^{\alpha + H - 1/2}$, uniformly over all $g$ bounded in $\mathcal C^\alpha$, with convergence to zero when $\alpha > 1/2 - H$. As a consequence, taking $g = f'(\hat W)$, for smooth $f$, we can clearly apply this estimate with any $\alpha < H$. Hence, by equating the constraints on $\alpha$, we arrive at $H > 1/4$. The practical consequence regarding part (i) of Theorem 3.25 then is that we can indeed replace nonconstant renormalization by constant renormalization, however at the price of restricting to $H > 1/4$ and with an according loss in the convergence rate. Interestingly, our numerical simulations suggest that no loss occurs and constant renormalization works for any $H > 0$. While we have refrained from investigating this (technical) point further,5 we can understand the mechanism at work by looking at the following toy example: Consider the Itô integral $\int_0^1 B\,\mathrm d W$, where $B$ is an fBm, but now with Hurst parameter $H > 1/2$, built, say, as a Volterra process over $W$. Using Young integration theory, one can give a pathwise argument showing that Riemann-Stieltjes approximations converge a.s. (with vanishing rate as $H \downarrow 1/2$). However, we know from stochastic integration theory (Itô integration) that this convergence holds in $L^2$ (and then in probability) for any $H > 0$.
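The toy example admits a direct numerical probe. The sketch below is entirely our own construction (the grid sizes, path count, and $H = 0.75$ are arbitrary illustrative choices): it builds $B$ as a Volterra process over $W$ on a fine grid, forms left-point Riemann sums of $\int_0^1 B\,\mathrm d W$ at two coarser meshes, and checks that the mean-square distance to a fine-mesh reference shrinks under refinement:

```python
import numpy as np

rng = np.random.default_rng(2)
n, hurst, n_paths = 1024, 0.75, 200
dt = 1.0 / n
t = np.arange(n + 1) * dt

# Brownian increments and paths, all Monte Carlo samples at once.
dw = rng.standard_normal((n, n_paths)) * np.sqrt(dt)
w = np.vstack([np.zeros(n_paths), np.cumsum(dw, axis=0)])

# B as a left-point Volterra process over W, with kernel (t - s)^(H - 1/2).
a = np.zeros((n + 1, n))
for k in range(1, n + 1):
    a[k, :k] = (t[k] - t[:k]) ** (hurst - 0.5)
b = a @ dw

def left_sum(stride):
    """Left-point Riemann sum of int B dW at mesh stride * dt."""
    idx = np.arange(0, n + 1, stride)
    return np.sum(b[idx[:-1]] * (w[idx[1:]] - w[idx[:-1]]), axis=0)

ref = left_sum(1)                                          # fine-mesh reference
err_coarse = np.sqrt(np.mean((left_sum(64) - ref) ** 2))   # mesh 1/16
err_fine = np.sqrt(np.mean((left_sum(16) - ref) ** 2))     # mesh 1/64
```

The errors vanish pathwise (Young) and, as emphasized above, also in $L^2$; here we only check the mean-square decay.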
We would thus expect that constant renormalization is still valid when $H \in (0, 1/4]$, but with the difference now vanishing only in the mean-square sense. This conjecture is checked numerically in Section 6.

The case of the Haar basis
The following special case of the above approximations to $\int_0^t f(\hat W_s, s)\,\mathrm d W_s$ is of particular interest for our purposes. We next collect some more concrete formulas that arise in this case. Let $\varepsilon = 2^{-N}$, $\varphi := \mathbf 1_{[0,1)}$, and $\varphi_{N,k} = 2^{N/2}\varphi(2^N \cdot - k)$, $k \in \mathbb Z$. The corresponding approximation $\delta^\varepsilon$ of the Dirac delta, the mollified Volterra kernel (40), and the diagonal function that plays a special role as a renormalization then all take explicit forms. We additionally have an explicit expansion of $\hat W^\varepsilon$, where $Z_{N,k} = \langle \dot W, \varphi_{N,k}\rangle$ are i.i.d. $\mathcal N(0, 1)$-distributed variables. As our approximation, we can finally take $\tilde{\mathscr I}^{\varepsilon,\delta}(t)$ from Definition 3.24 with partition $\{[t_k, t_{k+1})\} = \{[k 2^{-N}, (k+1) 2^{-N} \wedge t)\}$. As explained at the end of the last section, the renormalization function $\mathscr C^\varepsilon(s)$ in these formulas could be replaced by its local mean, the constant $\bar{\mathscr C}^\varepsilon$.
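To make the diagonal renormalization function concrete, the following sketch evaluates it for the unnormalized kernel $K(t,s) = (t-s)^{H-1/2}$. The closed forms $\mathscr C^\varepsilon(t) = \varepsilon^{-1}(t - k\varepsilon)^{H+1/2}/(H+1/2)$ on the $k$th block, and mean $\varepsilon^{H-1/2}/((H+1/2)(H+3/2))$ over one period, are our own elementary computation under these conventions, not formulas quoted from the paper. The code checks that the function is $\varepsilon$-periodic and that its average over one period matches the constant:

```python
import numpy as np

def renorm_function(t, hurst, level):
    """Diagonal renormalization for mesh eps = 2^-level:
    C_eps(t) = eps^-1 * (t - k*eps)^(H + 1/2) / (H + 1/2) on block k."""
    eps = 2.0 ** (-level)
    u = t - np.floor(t / eps) * eps        # offset within the current block
    return u ** (hurst + 0.5) / ((hurst + 0.5) * eps)

def renorm_mean(hurst, level):
    """Average of C_eps over one period of length eps."""
    eps = 2.0 ** (-level)
    return eps ** (hurst - 0.5) / ((hurst + 0.5) * (hurst + 1.5))
```

Both quantities blow up like $\varepsilon^{H-1/2}$ as $\varepsilon \to 0$, consistent with the uniform bound discussed in the previous section.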

Basic setup
We want to add an independent Brownian motion, so we introduce an additional symbol $\bar\Xi$. We again fix $M$ and define the corresponding model space. We fix $|\bar\Xi| = -1/2 - \kappa$, and the homogeneity of the other symbols is defined multiplicatively, as before.
Arguments similar to the proof of Lemma 3.9 show that this indeed defines a model on the extended structure.
Theorem 4.2. The models $\Pi^\varepsilon$ satisfy an LDP in the space of models, with speed $\varepsilon^2$ and rate function given by As an immediate consequence, we have the following corollary, again with speed $\varepsilon^2$ and rate function given by Remark 4.4. This improves a similar result obtained in Forde and Zhang (2017). In fact, we now cover functions of exponential form, as required in rough volatility modeling (Bayer et al., 2019; Gatheral et al., 2018).
Proof. Note the representation below, where $F \equiv F^\Pi$ as defined in Lemma 3.21. By the contraction principle and the continuity estimate from Theorem 3.16, it holds that $\mathscr I_1$ satisfies an LDP, with rate function as stated, where we used $F^h \equiv F^{\Pi^h}$. It then suffices to note the resulting identity and, optimizing over $h_2$ for fixed $h_1$, we obtain (53). □ We note that, due to Brownian, respectively, fractional Brownian scaling, small-noise large deviations translate immediately into short-time large deviations; compare Forde and Zhang (2017).
Although the rate function here is not given in a very useful form, it is possible to expand it in the small-noise parameter and so compute (explicitly in terms of the model parameters) higher-order moderate deviations. In Bayer et al. (2019), this was related to implied volatility skew expansions. Rosenbaum and coworkers (El Euch et al., 2018; El Euch & Rosenbaum, 2018, 2019) show that stylized facts of modern market microstructure naturally give rise to fractional dynamics and leverage effects. Specifically, they construct a sequence of Hawkes processes, suitably rescaled in time and space, which converges in law to a rough volatility model of rough Heston form.

Motivation from market microstructure
(As earlier, $W$ and $\bar W$ are independent Brownian motions.) Similar to the case of the classical Heston model, the square root provides both pain (with regard to any method that relies on sufficiently smooth coefficients) and comfort (an affine structure, here infinite-dimensional, which allows for closed-form computation of moment-generating functions). Arguably, there is no real financial reason for the square-root dynamics,7 and ongoing work attempts to modify the above square-root dynamics so as to obtain (something close to) log-normal volatility. We note that log-normal volatility was a key feature of the rough volatility model discussed in Gatheral et al. (2018). This motivates the study of more general dynamic rough volatility models of the form (55), with sufficiently nice functions $b$, $\sigma$, and $f$. (While $\sigma(v) = \sqrt v$ is a possible choice in what follows, we assume $b, \sigma \in \mathcal C^3$ for a local solution theory and then, in fact, impose $b, \sigma \in \mathcal C_b^3$ for global existence. One clearly expects nonexplosion under, for example, linear growth, but in order not to stray too far from our main line of investigation, we refrain from a discussion.) Note that $f(v_t)$ plays the role of spot volatility. Further note that the choice $v_0 = 0$, $b \equiv 0$, $\sigma \equiv 1$ brings us back to the "simple" case with (rough stochastic) volatility $f(\hat W)$ considered in earlier sections.

Regularity structure approach
We insist that (55) is not a classical Itô-SDE (solutions will not be semimartingales), nor a rough differential equation (in the sense of rough paths, driven by a Gaussian rough path as in P. K. Friz & Hairer, 2014, Chapter 10). Just as rough paths have established themselves as a powerful tool to analyze classical Itô-SDEs, we here make the point that Hairer's theory is an equally powerful tool to analyze stochastic Volterra (respectively, mixed Itô-Volterra) equations in the singular regime of interest.
As a preliminary step, we have to find the correct model space, spanned by symbols that arise by formal Picard iteration. To this end, rewrite (55) as an equation for modeled distributions, from which one can guess (or rigorously derive along Hairer, 2014, Section 8.1) the need for the symbols listed. We have degrees $|\mathbf 1| = 0$, $|\mathcal I(\Xi)| = H - \kappa$. For subsequent symbols, the degree is computed multiplicatively, with $\mathcal I$ increasing the degree by $H + 1/2$. For a modeled distribution, $\mathcal V(s)$ takes values in the linear span of sufficiently many symbols, the (minimal) number of which is dictated by the Hurst parameter $H$. Loosely speaking, $\mathcal V \in \mathcal D^\gamma$ indicates an expansion with $\gamma$-error estimate, in practice easy to read off from the degree of the lowest-degree symbols that do not figure in the expansion. For example, in the case of a "level-2 expansion," we can expect $\gamma = 2H - 2\kappa$, as $|\mathcal I(\Xi)^2| = |\mathcal I(\Xi\mathcal I(\Xi))| = 2H - 2\kappa$. It follows from general theory (Hairer, 2014, Theorem 4.16) that if $\mathcal V \in \mathcal D^\gamma_0$, then so is $\sigma(\mathcal V)$, the composition with a smooth function, and, by Hairer (2014, Theorem 4.7), the product with $\Xi \in \mathcal D^\infty_{-1/2-\kappa}$ is a modeled distribution in $\mathcal D^{\gamma - 1/2 - \kappa}$. For both reconstruction and convolution with singular kernels, one needs modeled distributions of positive degree, $\gamma - 1/2 - \kappa > 0$. Given $H \in (0, 1/2]$, we can then determine which symbols (up to which degree) are required in the expansion. As earlier, fix an integer $M$ (so that $(M+1)(H - \kappa) - 1/2 - \kappa > 0$) and see that $\mathcal V \in \mathcal D^{(M+1)(H-\kappa)}_0$ will do. When $H > 1/4$, and by choosing $\kappa > 0$ small enough, we see that $M = 1$ will do. That is, the symbols required to describe $\mathcal V$ are $\{\mathbf 1, \mathcal I(\Xi)\}$, and if one adds the symbols required to describe the right-hand side, one ends up with the level-2 model space spanned by $\langle \Xi, \Xi\mathcal I(\Xi), \mathbf 1, \mathcal I(\Xi)\rangle$, which is exactly the model space for the "simple" rough pricing regularity structure, (28) in the case $M = 1$. When $H \le 1/4$, this precise correspondence is no longer true.
To wit, in the case $H \in (1/6, 1/4]$, taking $M = 2$ accordingly, solving (56) on the level of modeled distributions will require a ("level-3") model space given by $\langle \Xi, \Xi\mathcal I(\Xi), \Xi\mathcal I(\Xi)^2, \Xi\mathcal I(\Xi\mathcal I(\Xi)), \mathbf 1, \mathcal I(\Xi), \mathcal I(\Xi)^2, \mathcal I(\Xi\mathcal I(\Xi))\rangle$, which is strictly larger than the corresponding level-3 simple model space given in (28). In general, one needs to consider an extended model space, so as to be stable under the operations used in the Picard iteration (with the understanding that only finitely many such symbols are needed, depending on $H$, as explained above). As a result, symbols such as $\Xi\mathcal I(\Xi\mathcal I(\Xi)^m)$, $m \ge 0$, or $\mathcal I(\Xi\mathcal I(\Xi\mathcal I(\Xi)^m)^{m'})$, $m, m' \ge 0$, … will appear. At this stage, a tree notation (omnipresent in the works of Hairer) would come in handy, and we refer to Bruned, Chevyrev, Friz, and Preiß (2019) (and the references therein) for a recent attempt to reconcile the tree formalism of branched rough paths (Gubinelli, 2010; Hairer & Kelly, 2015) and the most recent algebraic formalism of regularity structures. (In a nutshell, the simple case (28) corresponds to trees where only one node has branches; in the present nonsimple case, branching can happen everywhere.) Carrying out the following construction in the general case of fixed $H > 0$ is certainly possible;9 however, the algebraic complexity is essentially that of branched rough paths, and hence the general case requires a Hopf-algebraic (Connes-Kreimer, Grossman-Larson, etc.) construction of the structure group (a.k.a. positive renormalization). Although this, and negative renormalization, is well understood (Bruned, Hairer, & Zambotti, 2019; Hairer, 2014; also Bruned et al., 2019, for a rough path perspective), a complete exposition would lead us too far astray from the main topic of this paper. Hence, for simplicity only, we shall restrict from here on to the level-2 case $H > 1/4$ (with $M = 1$ accordingly), but will mention general results whenever useful.

Solving for rough volatility
We rewrite (56) as an equation for modeled distributions in $\mathcal D^\gamma$. (Here, the operators in the equation are those associated with composition with $b, \sigma \in \mathcal C^{M+2}$, respectively.) We also impose $\gamma \in (1/2 + \kappa, 1)$, which is clearly necessary in order for the product $\sigma(\mathcal V) \cdot \Xi$ to lie in a modeled distribution space of positive parameter, so that reconstruction, convolution, and so on make sense. Let $H > 1/4$, $M = 1$, and pick $\kappa \in (0, \frac{4H-1}{6})$, so that $(M+1)(H - \kappa) - 1/2 - \kappa > 0$. As explained in the previous section, this exactly allows us to work in the familiar structure of Section 3.1, that is, with $M = 1$, with index set and structure group as given in that section. This structure is equipped with the Itô model and its (renormalized) approximations. Equation (58) critically involves the convolution operator $\mathcal K$ acting on $\mathcal D^\gamma$. The general construction (Hairer, 2014, Section 5) is among the most technical in Hairer's work and, in fact, not directly applicable (our kernel, although $\beta$-regularizing with $\beta = 1/2 + H$, fails assumption 5.4 in Hairer, 2014), so we shall be rather explicit.
Remark 5.2. Hairer (2014, Theorem 5.2) suggests the estimate that $\mathcal K$ maps $\mathcal D^\gamma \to \mathcal D^{\gamma + \beta}$. The difference to our baby Schauder estimate stems from the fact that, unlike assumption 5.3 in Hairer (2014, p. 64), we do not assume that our regularity structure contains the polynomial structure.
Proof. (Sketch) The special case $F \equiv \Xi \in \mathcal D^\infty$ was already treated in Lemma 3.19. We only show that, in the general case, $\mathcal K F$ necessarily has the stated form, but will not check the properties. It is enough to consider $F$ with values in $\langle \Xi, \Xi\mathcal I(\Xi)\rangle$ and make the corresponding ansatz. Applying reconstruction, together with Hairer (2014, Proposition 3.28), we see that the reconstruction of $\mathcal K F$ must equal $K * \mathcal R F$, provided that we postulate validity of (ii). This is the given definition of $\mathcal K F$. □ We return to our goal of solving (58), noting perhaps that $\sigma(\mathcal V)$ makes sense for every function-like modeled distribution $\mathcal V$. (Similar remarks apply to the composition operator associated with $b \in \mathcal C^{M+2}$.) Recall $M = 1$. $v = \mathcal R\mathcal V$ is clearly the (unique) reconstruction of the (unique) solution $\mathcal V$ to the abstract problem. We also checked that $v$ is indeed a solution of the Itô-Volterra equation. However, if one desires to know that $v$ is the unique strong solution to the stochastic Itô-Volterra equation, it is clear that one has to resort to uniqueness results of the stochastic theory; see, for example, Coutin and Decreusefond (2001).
Proof. The well-posedness and continuous dependence on the parameters essentially follow from results of Hairer (2014); details are spelled out in Appendix C.
The fact that the reconstruction of the solution solves the Itô equation can be obtained by considering approximations, as is done in Hairer and Pardoux (2015, Theorem 6.2) or P. K. Friz and Hairer (2014, Chapter 5). □ Using the large deviation results obtained in the previous subsection, we can directly obtain an LDP for the log-price. For square-integrable $h$, let $v^h$ be the unique solution to the integral equation Corollary 5.5. Let $H \in (1/4, 1/2]$ and $f$ smooth (without boundedness assumption). Then the rescaled log-price satisfies an LDP with speed $\varepsilon^2$ and rate function given by where Remark 5.6. Concerning the case $H \le 1/4$, the following proof extends to any $H > 0$, provided that one builds the correct regularity structure, as discussed at the end of Section 5.2. (In particular, the proof of Theorem 4.2 for obtaining Schilder-type large deviations for the appropriate Itô model extends immediately.) Proof. We ignore the second part $\int_0^\cdot(\dots)$ in the log-price, which is of lower order since $f$ is bounded. By scaling, the rescaled quantity is equal in law to the time-1 evaluation of the analogous objects defined with respect to a suitably shifted model. We then note that the solution map Ψ is locally Lipschitz by Theorem 5.3. We can then directly combine the fact that $\Pi^\varepsilon$ satisfies an LDP (Theorem 4.2) with a contraction principle such as Hairer and Weber (2015, Lemma 3.3) to obtain that the time-1 evaluation satisfies an LDP with rate function as stated. It then suffices to note that $v^h$ is exactly $\mathcal R\mathcal V$ for $\mathcal V$ the solution to (59) corresponding to a model $\Pi^{(h_1, h_2)}$ with $\varepsilon \equiv 0$, and to optimize separately over $h_2$, as in the proof of Corollary 4.3. □ We also have an approximation result. (Assume $b, \sigma$ to be smooth with three bounded derivatives.) Corollary 5.7. Let $H > 1/4$ (but see the remark below). Then $v = \lim_{\varepsilon \to 0} v^\varepsilon$, uniformly on compacts and in probability, where Remark 5.8. Replacing the renormalization function by its mean is possible, provided $H > 1/4$.
However, unlike the discussion at the end of Section 3.2, this is no longer a consequence of quantifying the distributional convergence. In the present context, it is achieved by directly checking model convergence, which, fortunately, is not much harder. We leave the details to the interested reader.
Remark 5.9. In contrast to the previous statement, the above result is more involved for $H \in (0, 1/4]$, and additional renormalization terms appear, the general description of which would benefit from pre-Lie products, as recently introduced.
Proof. Thanks to Theorem 3.14 and Theorem 5.3, it follows from continuity of reconstruction that $v^\varepsilon$ converges as claimed, so that the only thing left to do is to check that $v^\varepsilon$ solves (62). Note that (59) implies the corresponding identity (omitting the superscript $\varepsilon$ on all normal and calligraphic letters) and, with (60), a further identity. But then, because $\hat\Pi^\varepsilon$ is a "smooth" model in the sense of Remark 3.15 in Hairer (2014), and as convolution commutes with reconstruction (compare Lemma 5.1), it follows that $v^\varepsilon$ is indeed a solution to (62). □

NUMERICAL RESULTS
We revisit the case of European option pricing under rough volatility. Building on the theoretical underpinnings of Section 3, we present a concise, but self-contained, description of the central algorithm of this paper (for simplicity restricted to the unit time interval) and complement the theoretical convergence rates obtained in previous sections with numerical counterparts. The code used to run the simulations is available at https://www.github.com/RoughStochVol.

Implementation
Without loss of generality, set the time to maturity $T = 1$. We are interested in pricing a European call option with spot $S_0$ and strike $K$ under rough volatility; the risk-free interest rate is assumed to satisfy $r = 0$. From Theorem 1.4, we have a pricing formula in which the computational challenge obviously lies in the efficient simulation of $\mathscr I$.
As explored in Subsection 3.4, we take a Wong-Zakai-style approach to simulating $\mathscr I$, that is, we approximate the white noise process $\dot W$ on the grid associated with the Haar basis as follows.
Let $\{Z_k\}_k$ be i.i.d. $\mathcal N(0, 1)$ random variables and choose a Haar grid level $N \in \mathbb N$, so that the step size of the grid satisfies $\varepsilon = 2^{-N}$. Then, for all $t \in [0, 1]$, we set $\dot W^\varepsilon(t) = \varepsilon^{-1/2} Z_k$ for $t \in [k\varepsilon, (k+1)\varepsilon)$, which induces an approximation $\hat W^\varepsilon$ of the fBm. As outlined above, the central issue is that the object $\int_0^1 f(\hat W^\varepsilon(s), s)\,\dot W^\varepsilon(s)\,\mathrm d s$ does not converge in an appropriate sense to the object of interest $\mathscr I$ as $\varepsilon \to 0$. This is overcome by renormalizing the object, two possible approaches to which are explored in Subsection 3.4. For the remainder, we will consider the "simpler" renormalized object given by (67), where the renormalization term $\mathscr C^\varepsilon(s)$ can be one of the two choices in (68). Inserting the nonconstant version of (68) into (67), we obtain the scheme used in the sequel.
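The steps above can be condensed into a compact sketch. Everything below is our own reading of the scheme, not code from the paper's repository: the kernel is taken as $(t-s)^{H-1/2}$ without normalizing constant, $\hat W^\varepsilon$ is evaluated at block left endpoints (a further simplification), and the constant renormalization $\bar{\mathscr C}^\varepsilon = \varepsilon^{H-1/2}/((H+1/2)(H+3/2))$ is the elementary closed form under these conventions:

```python
import numpy as np

def renormalized_integral(f, df, hurst, level, rng):
    """One sample of the renormalized approximation on [0, 1]:
    sum_k f(W_hat(t_k), t_k) dW_k  -  sum_k df(W_hat(t_k), t_k) C_bar * eps,
    where df is the derivative of f in its first argument."""
    eps = 2.0 ** (-level)
    n = 2 ** level
    h = hurst
    z = rng.standard_normal(n)            # Haar coefficients, i.i.d. N(0, 1)
    dw = z * np.sqrt(eps)                 # Brownian increments per block
    t = np.arange(n) * eps                # left endpoints t_k

    # Mollified fBm at the left endpoints: block-wise exact integrals of
    # the kernel (t - s)^(H - 1/2) against the piecewise constant noise.
    w_hat = np.zeros(n)
    for k in range(1, n):
        j = np.arange(k)
        blocks = ((t[k] - j * eps) ** (h + 0.5)
                  - (t[k] - (j + 1) * eps) ** (h + 0.5)) / (h + 0.5)
        w_hat[k] = np.sum(z[:k] * blocks) / np.sqrt(eps)

    # Constant renormalization (mean of the diagonal over one period).
    c_bar = eps ** (h - 0.5) / ((h + 0.5) * (h + 1.5))

    return np.sum(f(w_hat, t) * dw) - np.sum(df(w_hat, t) * c_bar * eps)
```

For $f \equiv 1$ the correction vanishes and the scheme returns exactly $W(1)$; for the exponential choice of Remark 6.1, one passes `df = f`.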

Observed convergence rates
In this subsection, we discuss strong convergence of the approximating object $\tilde{\mathscr I}^\varepsilon$ to the actual object of interest $\mathscr I$, as well as weak convergence of the option price itself, as the Haar grid interval size $\varepsilon \to 0$. Specifically, we will be looking at Monte Carlo estimates of our errors; that is, in order to approximate some quantity $E[X]$ for a random variable $X$, we will instead be looking at $\frac{1}{n}\sum_{i=1}^n X_i$, where the $X_i$ are i.i.d. samples drawn from the same distribution as $X$. In other words, we need to generate realizations of the relevant bivariate stochastic object, a task that can be vectorized as described below, thus avoiding expensive looping through realizations.
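The vectorization can be organized as a single triangular matrix multiply: one kernel matrix maps an $n \times n_{\mathrm{paths}}$ array of Haar coefficients to all mollified-fBm paths at once. This is a sketch under our own conventions (unnormalized kernel, our function names); the repository's actual implementation may differ:

```python
import numpy as np

def haar_paths(n_paths, hurst, level, rng):
    """All realizations at once: returns (W_hat^eps at left endpoints,
    Brownian increments), each of shape (2^level, n_paths)."""
    eps = 2.0 ** (-level)
    n = 2 ** level
    t = np.arange(n) * eps                           # left endpoints t_k
    j = np.arange(n)
    # a[k, j] = integral over block j of (t_k - s)^(H - 1/2) ds, zero for j >= k.
    left = np.maximum(t[:, None] - j[None, :] * eps, 0.0)
    right = np.maximum(t[:, None] - (j[None, :] + 1) * eps, 0.0)
    a = (left ** (hurst + 0.5) - right ** (hurst + 0.5)) / (hurst + 0.5)
    z = rng.standard_normal((n, n_paths))            # one column per sample
    return a @ (z / np.sqrt(eps)), z * np.sqrt(eps)
```

A single matrix product replaces the per-realization loop, so the cost is one $O(n^2 \cdot n_{\mathrm{paths}})$ BLAS call.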

Strong convergence
We verify Theorem 3.25(i) numerically, albeit in the $L^2(\Omega)$-sense and, for simplicity, with $f(x, t) = \exp(x)$; we expect an error "almost" of order $H$. Remark 6.1. We choose $f(x, t) = \exp(x)$ because this closely resembles the rough Bergomi model (see Bayer et al., 2016, and below). Also, for the simplest nontrivial choice $f(x, t) = x$, the discretization error is overshadowed by the Monte Carlo error, even for very coarse grids.
As $(W, \hat W)$ is a two-dimensional Gaussian process with known covariance structure, it is possible to use the Cholesky algorithm (cf. Bayer et al., 2019) to simulate the joint paths on some grid, and then use standard Riemann sums to approximate the integral. The value obtained in this way could serve as a reference value for our scheme. However, for strong convergence, we need both objects to be based on the same stochastic sample. For this reason, we find it easier to construct a reference value by the wavelet-based scheme itself; that is, we simply pick some $\varepsilon' \ll \varepsilon$ and consider the error as $\varepsilon \to \varepsilon'$. As can be seen in Figures 1 and 2, both renormalization approaches stated in (68) are consistent with a theoretical strong rate of almost $H$, even for $H < 1/4$ (cf. the discussion at the end of Section 3.2).

4. $\mathcal C^{|\Xi|}(\mathbb R)$ denotes the space of distributions that are locally in the Besov space $\mathcal B^{|\Xi|}_{\infty,\infty}(\mathbb R)$ (cf. Hairer, 2014, Remark 3.8).
5. Some computations lead us to believe that this question can be settled with the aid of the mixed $(1, \rho)$-variation of the covariance function of the Volterra process, compare P. K. Friz, Gess, Gulisashvili, and Riedel (2016), which we expect to hold uniformly over the approximation.
6. Upon setting $\Gamma\,\mathcal I(\Xi)$ as indicated, the given relation is precisely implied by the multiplicativity of Γ.
7. This is also a frequent remark about the classical Heston model.
8. We are not aware of any literature on mixed Itô-Volterra systems (although we expect no difficulties). Here, of course, it suffices to first solve for $v$ and then construct $S$ as a stochastic exponential.
9. We note that, as $H \downarrow 0$, the number of symbols tends to infinity. In comparison, among all recently studied singular SPDEs, only the sine-Gordon equation (Hairer & Shen, 2016) exhibits the similar feature of requiring arbitrarily many symbols.
10. $\mathcal K$ is extended linearly to all of $\mathcal T$ by taking $\mathcal K\tau = 0$ for symbols $\tau \neq \Xi$.