Semi-Static and Sparse Variance-Optimal Hedging

We consider hedging of a contingent claim by a 'semi-static' strategy composed of a dynamic position in one asset and static (buy-and-hold) positions in other assets. We give general representations of the optimal strategy and the hedging error under the criterion of variance-optimality and provide tractable formulas using Fourier-integration in case of the Heston model. We also consider the problem of optimally selecting a sparse semi-static hedging strategy, i.e. a strategy which only uses a small subset of available hedging assets. The developed methods are illustrated in an extended numerical example where we compute a sparse semi-static hedge for a variance swap using European options as static hedging assets.


INTRODUCTION
Semi-static hedging strategies are strategies that are composed of a dynamic (i.e. continuously rebalanced) position in one asset and of static (i.e. buy-and-hold) positions in other assets. Such hedging strategies have appeared in mathematical finance in several different contexts: the hedging of barrier options (cf. [Car11]), model-free hedging approaches based on martingale optimal transport (cf. [BHLP13]), and, most relevant in our context, the semi-static replication of variance swaps by Neuberger's formula (cf. [Neu94]). Compared with fully dynamic strategies, semi-static strategies have the advantage that no rebalancing costs or liquidity risks are associated with the static part of the strategy and hence even assets with limited liquidity can be used as static hedging assets.
Remarkably, for certain hedging problems, semi-static strategies allow for perfect replication even in incomplete markets, at least theoretically. Again, the most prominent example is the replication formula for a variance swap, given by [Neu94,CM01]: In any continuous martingale model, a variance swap can be replicated by dynamic hedging in the underlying and a static portfolio of European put and call options. This very replication formula is at the heart of the computation of the volatility index VIX, whose value is determined precisely from a discretization of Neuberger's replicating option portfolio (cf. the CBOE's technical document [Exc14]).
However, Neuberger's result relies on certain idealizations: Most importantly, the static part of the strategy consists of infinitesimally small positions in an infinite number of puts and calls with strikes ranging from zero to infinity. Any practical implementation of this strategy therefore has to decide on a certain quantization of the theoretical strategy, i.e. how to assign non-infinitesimal weights to the actually tradable put and call options. Rather than doing this in an ad-hoc manner, our goal is to determine how to optimally implement a semi-static hedging strategy when a finite number n of hedging assets is available. Our optimality criterion is the well-known variance-optimality criterion introduced by [Sch84,FS86], i.e. we minimize the variance of the residual hedging error under the risk-neutral measure. As we show in Section 2, this criterion is pleasantly compatible with semi-static hedging: The semi-static hedging problem separates into an inner problem, which is equivalent to the variance-optimal hedging problem with a single asset (as considered in [Sch84,FS86]), and an outer problem, which is an n-dimensional quadratic optimization problem, cf. Theorem 2.3.
After having analyzed the general structure of the variance-optimal semi-static hedging problem, we turn to another question in Section 3: How many assets d < n are enough to obtain a 'reasonably small' hedging error? In the case of Neuberger's formula for the variance swap, where infinitely many European options reduce the hedging error to zero, how good is using 12, 6 or even just 3 options? Beyond that, which 3 options should one select from, say, 30 that are available in the market? It turns out that this problem of finding a sparse semi-static hedging strategy is closely related to the well-known problem of variable selection in high-dimensional regression, cf. [HTF13, Sec. 3.3], and more generally to sparse modelling approaches in statistics and machine learning. Indeed, to solve the problem of optimal selection we will draw from methods developed in statistics, such as the LASSO, greedy forward selection and the method of Leaps-and-Bounds.
Finally, with the goal of a numerical implementation of sparse semi-static hedging in mind, we have to find tractable methods to compute variances and covariances of hedging errors, expressed mathematically as residuals in the Galtchouk-Kunita-Watanabe (GKW) martingale decomposition. Here, we build on results from [KP10], which allow one to calculate the GKW-decomposition 'semi-analytically', i.e. in terms of Fourier integrals, in several models of interest, such as the Heston model. The results of [KP10], which focus on the calculation of the strategy and the hedging error in a classic variance-optimal hedging framework, are however not sufficient for the semi-static hedging problem, and we draw from some extensions that are developed in the technical companion paper [DTHKR17].
We conclude in Section 5 with a detailed numerical example, implementing the sparse semi-static hedging problem for hedging of a variance swap with put and call options in the Heston model. In particular, we compare the performance of the different solution methods for the subset selection problem, analyze the dependency of optimal hedging portfolio and hedging error on the number d of static hedging assets, and study the influence of the leverage parameter ρ on the optimal solution.

VARIANCE-OPTIMAL SEMI-STATIC HEDGING
To set up our model for the financial market, we fix a complete probability space $(\Omega, \mathcal{F}, \mathbb{P})$ equipped with a filtration $\mathbb{F} = (\mathcal{F}_t)_{t \ge 0}$ satisfying the usual conditions. We fix a time horizon $T > 0$, assume that $\mathcal{F}_0$ is the trivial $\sigma$-algebra and set $\mathcal{F} = \mathcal{F}_T$. We also assume that a state-price density $\frac{d\mathbb{Q}}{d\mathbb{P}}$ is given, which uniquely specifies a risk-neutral pricing measure $\mathbb{Q}$. All expectations $\mathbb{E}[\cdot]$ denote expectations under this risk-neutral measure $\mathbb{Q}$. We denote by $S = (S_t)_{t \ge 0}$ the price process of a traded asset, set interest rates to zero to simplify the exposition of results, and assume that S is a continuous square-integrable martingale under $\mathbb{Q}$. More generally, we denote by $\mathcal{H}^2 = \mathcal{H}^2(\mathbb{F})$ the set of real-valued $\mathbb{F}$-adapted square-integrable $\mathbb{Q}$-martingales, which becomes a Hilbert space when equipped with the norm $\|X\|^2_{\mathcal{H}^2} := \mathbb{E}[X_T^2]$. We also set $\mathcal{H}^2_0 := \{X \in \mathcal{H}^2 : X_0 = 0\}$.
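2.1. Variance-optimal hedging. Before discussing semi-static hedging, we quickly review variance-optimal hedging of a claim $H^0$ in $L^2(\Omega, \mathcal{F}, \mathbb{Q})$, up to the time horizon $T > 0$, as discussed e.g. in [FS86]. We identify the claim $H^0$ with the martingale
$$H^0_t = \mathbb{E}[H^0 \mid \mathcal{F}_t], \quad t \in [0, T],$$
which is an element of $\mathcal{H}^2$. The set of all admissible dynamic strategies is denoted by
$$L^2(S) := \Big\{\vartheta \text{ predictable and } \mathbb{R}\text{-valued} : \mathbb{E}\Big[\int_0^T \vartheta_t^2 \, d\langle S, S\rangle_t\Big] < \infty\Big\},$$
where $\langle S, S\rangle$ denotes, as usual, the predictable quadratic variation of S. The variance-optimal hedge $\vartheta$ with initial capital c of the claim $H^0$ is the solution of
$$\varepsilon^2 = \min_{\vartheta \in L^2(S),\, c \in \mathbb{R}} \mathbb{E}\Big[\Big(c + \int_0^T \vartheta_t \, dS_t - H^0_T\Big)^2\Big].$$ (2.1)
The resulting quantity $\varepsilon$ is the minimal hedging error. The minimization problem (2.1) can be interpreted as orthogonal projection (in $\mathcal{H}^2$) of the claim $H^0$ onto the closed subspace spanned by deterministic constants (corresponding to the initial capital c) and by $\mathcal{L}^2(S) := \{\int_0^T \vartheta_t \, dS_t,\ \vartheta \in L^2(S)\} \subset \mathcal{H}^2_0$, the set of claims attainable with strategies from $L^2(S)$. The resulting orthogonal decomposition
$$H^0_t = \mathbb{E}[H^0_T] + \int_0^t \vartheta_s \, dS_s + L_t$$ (2.2)
of $H^0$ is known as the Galtchouk-Kunita-Watanabe (GKW) decomposition of $H^0$ with respect to S, cf. [KW67,AS93]. From the financial mathematics perspective, (2.2) decomposes the claim $H^0$ into initial capital, hedgeable risk, and unhedgeable residual risk. The orthogonality of L to $\mathcal{L}^2(S)$ in the Hilbert space sense implies orthogonality of L to S in the martingale sense, i.e. it holds that $\langle L, S\rangle = 0$. Hence, the variance-optimal strategy $\vartheta$ can be computed from (2.2) as
$$\langle H^0, S\rangle_t = \int_0^t \vartheta_s \, d\langle S, S\rangle_s,$$ (2.3)
and $\vartheta$ can be expressed as the Radon-Nikodym derivative $\vartheta = d\langle H^0, S\rangle / d\langle S, S\rangle$.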
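The projection interpretation can be made concrete in a toy setting. The following sketch is a one-period, finite-state analogue of the variance-optimal hedge, not the continuous-time construction of the text: the three-state market, the function name `one_period_hedge` and all numbers are hypothetical. With a martingale increment dS (zero mean), the optimal capital is the mean of the claim, the hedge ratio is the discrete counterpart of the Radon-Nikodym derivative above, and the residual plays the role of L.

```python
def one_period_hedge(probs, claim, ds):
    """Minimize E[(c + theta*dS - H)^2] in a finite-state model with E[dS] = 0.

    Returns the capital c, the hedge ratio theta and the state-wise residual
    L = H - c - theta*dS (the one-period analogue of the GKW residual).
    """
    def mean(xs):
        return sum(p * x for p, x in zip(probs, xs))

    c = mean(claim)  # optimal initial capital is the price E[H]
    # discrete analogue of theta = d<H,S> / d<S,S>
    theta = mean([h * s for h, s in zip(claim, ds)]) / mean([s * s for s in ds])
    resid = [h - c - theta * s for h, s in zip(claim, ds)]
    return c, theta, resid


# Hypothetical three-state market: martingale increment dS and a call-like claim H
probs = [0.25, 0.5, 0.25]
ds = [-10.0, 0.0, 10.0]
claim = [0.0, 0.0, 10.0]
c, theta, resid = one_period_hedge(probs, claim, ds)
```

In this example the residual has zero mean and zero covariance with dS, mirroring the orthogonality of L to S.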

2.2. The variance-optimal semi-static hedging problem. We are now prepared to discuss the variance-optimal semi-static hedging problem and its solution. In addition to the contingent claim $H^0$ which is to be hedged, denote by $H = (H^1, \dots, H^n)^\top$ the vector of supplementary contingent claims, all assumed to be square-integrable random variables in $L^2(\Omega, \mathcal{F}, \mathbb{Q})$. Again, we associate to each $H^i$ the martingale
$$H^i_t = \mathbb{E}[H^i \mid \mathcal{F}_t], \quad t \in [0, T].$$
The static part of the strategy can be represented by an element v of $\mathbb{R}^n$, where $v_i$ represents the quantity of claim $H^i$ bought at time $t = 0$ and held until time $t = T$. The dynamic part $\vartheta$ of the strategy is again represented by an element of $L^2(S)$.
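Definition 2.1 (Variance-Optimal Semi-Static Hedging Problem). The variance-optimal semi-static hedge $(\vartheta, v) \in L^2(S) \times \mathbb{R}^n$ and the optimal initial capital $c \in \mathbb{R}$ are the solution of the minimization problem
$$\varepsilon^2 = \min_{(\vartheta, v) \in L^2(S) \times \mathbb{R}^n,\ c \in \mathbb{R}} \mathbb{E}\Big[\Big(c - v^\top \mathbb{E}[H_T] + v^\top H_T + \int_0^T \vartheta_t \, dS_t - H^0_T\Big)^2\Big].$$ (2.5)
Note that $v^\top \mathbb{E}[H_T]$ is the cost of setting up the static part of the hedge and its terminal value is $v^\top H_T$. The dynamic part is self-financing and results in the terminal value $\int_0^T \vartheta_t \, dS_t$. Adding the initial capital c and subtracting the target claim $H^0_T$ yields the above expression for the hedging problem.
To solve the variance-optimal semi-static hedging problem, we decompose it into an inner and an outer minimization problem and rewrite (2.5) as
$$\varepsilon^2 = \min_{v \in \mathbb{R}^n} \Big\{ \min_{\vartheta \in L^2(S),\ c \in \mathbb{R}} \mathbb{E}\Big[\Big(c - v^\top \mathbb{E}[H_T] + \int_0^T \vartheta_t \, dS_t - (H^0_T - v^\top H_T)\Big)^2\Big] \Big\}.$$ (2.6)
The inner problem is of the same form as (2.1), while the outer problem turns out to be a finite-dimensional quadratic optimization problem. To formulate the solution, we write the GKW-decompositions of the claims $(H^0, \dots, H^n)$ with respect to S as
$$H^i_t = \mathbb{E}[H^i_T] + \int_0^t \vartheta^i_s \, dS_s + L^i_t, \quad i = 0, \dots, n.$$ (2.7)
Similarly to (2.3) we get
$$\langle H^i, S\rangle_t = \int_0^t \vartheta^i_s \, d\langle S, S\rangle_s, \quad i = 0, \dots, n,$$
and introduce the vector notation $\vartheta := (\vartheta^1, \dots, \vartheta^n)^\top$ for the strategies and $L := (L^1, \dots, L^n)^\top$ for the residuals in the GKW-decomposition. Finally we formulate the following condition:
Definition 2.2 (Non-redundancy condition). There is no $x \in \mathbb{R}^n \setminus \{0\}$ with $x^\top L_T = 0$ almost surely.
Intuitively, existence of a non-zero x with $x^\top L_T = 0$ means that the number of supplementary assets can be reduced without changing the hedging error $\varepsilon^2$ in (2.5). We are now prepared to state our main result on the solution of the variance-optimal semi-static hedging problem:
Theorem 2.3. Set $A := \mathrm{Var}(L^0_T)$, $B := \mathrm{Cov}(L_T, L^0_T)$ and $C := \mathrm{Cov}(L_T, L_T)$. Under the non-redundancy condition, C is invertible and the unique solution of the semi-static hedging problem is given by
$$v = C^{-1} B, \quad \vartheta^v = \vartheta^0 - v^\top \vartheta, \quad c = \mathbb{E}[H^0_T].$$
The minimal squared hedging error is given by
$$\varepsilon^2 = A - B^\top C^{-1} B.$$
Moreover, the elements of A, B and C can be expressed as
$$A = \mathbb{E}[\langle L^0, L^0\rangle_T], \quad B_i = \mathbb{E}[\langle L^0, L^i\rangle_T], \quad C_{ij} = \mathbb{E}[\langle L^i, L^j\rangle_T], \quad i, j = 1, \dots, n.$$
Corollary 2.4. If the non-redundancy condition does not hold true, then any solution $v \in \mathbb{R}^n$ of the linear system $Cv = B$, together with $c = \mathbb{E}[H^0_T]$ and $\vartheta^v = \vartheta^0 - v^\top \vartheta$, is a solution of the semi-static hedging problem. The solution set is never empty, and the solution which minimizes the Euclidean norm of v can be obtained by setting $v = C^\dagger B$, where $C^\dagger$ denotes the Moore-Penrose pseudo-inverse of C.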
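Notice that the minimal squared hedging error $\varepsilon^2$ in Theorem 2.3 is the Schur complement of the block C in the 'extended covariance matrix'
$$\mathrm{Cov}\big((L^0_T, L_T)\big) = \begin{pmatrix} A & B^\top \\ B & C \end{pmatrix}.$$
In particular, if $(L^0_T, L_T)$ has normal distribution, then $\varepsilon^2$ can be expressed as the conditional variance $\varepsilon^2 = \mathrm{Var}(L^0_T \mid L_T)$.
Proof of Theorem 2.3 and Corollary 2.4. First, we consider the inner minimization problem in (2.6). This problem is equivalent to the variance-optimal hedging problem for the claim $H^v_T := H^0_T - v^\top H_T$. The solution $\vartheta^v$ is given by the GKW-decomposition
$$H^v_t = \mathbb{E}[H^v_T] + \int_0^t \vartheta^v_s \, dS_s + L^v_t, \quad \vartheta^v = \frac{d\langle H^v, S\rangle}{d\langle S, S\rangle} = \vartheta^0 - v^\top \vartheta,$$
using the bilinearity of the predictable quadratic covariation. Uniqueness of the GKW-decomposition yields $L^v_t = L^0_t - v^\top L_t$ and the squared hedging error is given by
$$\mathbb{E}[(L^v_T)^2] = \mathbb{E}[(L^0_T - v^\top L_T)^2] = A - 2 v^\top B + v^\top C v.$$
Thus, the outer optimization problem in (2.6) becomes
$$\varepsilon^2 = \min_{v \in \mathbb{R}^n} \big(A - 2 v^\top B + v^\top C v\big).$$
Since C is positive semi-definite, the first-order condition $Cv = B$ is necessary and sufficient for optimality of v. Under the non-redundancy condition, $\mathrm{Var}(x^\top L_T) > 0$ for any $x \in \mathbb{R}^n \setminus \{0\}$, hence C is positive definite and in particular invertible. The unique solution of the outer problem is therefore given by $v = C^{-1} B$, and inserting it into the objective yields $\varepsilon^2 = A - B^\top C^{-1} B$, completing the proof of Theorem 2.3.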
For the corollary, it remains to show that $Cv = B$ has a solution even when the non-redundancy condition does not hold. A solution exists if B is in the range of C or, equivalently (since C is symmetric), if B is in $(\ker C)^\perp$. By assumption $\ker C$ is non-trivial, and we can choose some $x \in \ker C \setminus \{0\}$, i.e. with $x^\top C = 0$. Since C is the covariance matrix of $L_T$, it follows that $\mathrm{Var}(x^\top L_T) = x^\top C x = 0$ and hence $x^\top L_T = 0$ a.s. This implies that also $x^\top B = \mathrm{Cov}(x^\top L_T, L^0_T) = 0$ for all $x \in \ker C$, and hence that $B \in (\ker C)^\perp$.
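Once A, B and C are available, the outer problem reduces to a small linear system. The sketch below, in plain Python, solves $Cv = B$ by Gaussian elimination and evaluates the Schur complement $A - B^\top C^{-1} B$; it assumes the non-redundancy condition (C invertible). The function names and the numerical values of A, B, C are hypothetical and serve only as illustration.

```python
def solve(C, B):
    """Solve the linear system C v = B by Gaussian elimination with partial pivoting."""
    n = len(B)
    M = [[float(x) for x in C[i]] + [float(B[i])] for i in range(n)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))  # pivot row
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for col in range(k, n + 1):
                M[r][col] -= f * M[k][col]
    v = [0.0] * n
    for k in reversed(range(n)):  # back substitution
        v[k] = (M[k][n] - sum(M[k][j] * v[j] for j in range(k + 1, n))) / M[k][k]
    return v


def semi_static_hedge(A, B, C):
    """Return the static weights v = C^{-1} B and the squared error A - B^T C^{-1} B."""
    v = solve(C, B)
    eps2 = A - sum(b * x for b, x in zip(B, v))
    return v, eps2


# Hypothetical covariance data of the GKW residuals (L^0, L^1, L^2)
A = 4.0
B = [1.0, 1.0]
C = [[2.0, 1.0], [1.0, 2.0]]
v, eps2 = semi_static_hedge(A, B, C)
```

For the singular case of Corollary 2.4 a pseudo-inverse routine from a numerical library would be needed instead of plain elimination.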
Finally, we compute the hedge contribution of a single supplementary asset H n+1 . By hedge contribution, we mean the reduction in squared hedging error that is achieved by adding the asset H n+1 to a given pool of supplementary assets (H 1 , . . . , H n ). We denote by ε 2 n and ε 2 n+1 the minimal hedging error achieved with supplementary assets (H 1 , . . . , H n ) and (H 1 , . . . , H n+1 ) respectively.
Proposition 2.5 (Relative Hedge Contribution). Suppose that the non-redundancy condition holds true for all supplementary assets H 1 , . . . , H n+1 . Then the relative hedge contribution RHC n+1 of H n+1 is given by . . , n. Remark 2.6. The expression for the relative hedge contribution has an intuitive interpretation under the assumption that the residuals (L 0 T , . . . , L n+1 T ) have multivariate normal distribution. In this case the hedge contribution of H n+1 is equal to the partial correlation Cor(L 0 Thus, roughly speaking, a supplementary asset has a high hedge contribution, if it is strongly correlated with H 0 , even after conditioning on all claims that are attainable with semistatic strategies in S and (H 1 , . . . , H n ).
T ] − K C −1 K) for the Schur complement of C in C new . Using the Schur complement, the inverse of the block matrix C new can be written as cf. [HJ12], and applying Theorem 2.
2.3. The variance swap and long/short constraints. We review the semi-static hedging of a variance swap with an infinite pool of European put and call options, as discussed in [Neu94,CM01]. We will apply the methods developed in this paper to this hedging problem in Section 5. It also serves as a motivation to add long/short constraints to the semi-static hedging problem.
Recall that a variance swap is a contingent claim on an underlying traded asset S, which at maturity T pays an amount
$$H = [\log S, \log S]_T - k.$$
Usually, k is chosen such that the value of the contract is zero at inception, and the corresponding value $k^* = \mathbb{E}[[\log S, \log S]_T]$ is called the swap rate. Recall that our only assumption on the discounted price process S is that it is a square-integrable, strictly positive continuous martingale. Applying Itô's formula to $\log S_T$ we get
$$\log \frac{S_T}{S_0} = \int_0^T \frac{dS_t}{S_t} - \frac{1}{2} [\log S, \log S]_T,$$ (2.14)
that is, to hedge the variance swap, it is enough to dynamically trade in the stock S and enter a static position in the 'log-contract' with payoff $\log \frac{S_T}{S_0}$, cf. [Neu94]. Furthermore, from [CM01], we have
$$-\log \frac{S_T}{S_0} = -\frac{S_T - S_0}{S_0} + \int_0^{S_0} (K - S_T)^+ \frac{dK}{K^2} + \int_{S_0}^\infty (S_T - K)^+ \frac{dK}{K^2},$$ (2.15)
which, inserted into the above equation, yields
$$[\log S, \log S]_T = 2 \int_0^T \Big(\frac{1}{S_t} - \frac{1}{S_0}\Big) dS_t + 2 \int_0^{S_0} (K - S_T)^+ \frac{dK}{K^2} + 2 \int_{S_0}^\infty (S_T - K)^+ \frac{dK}{K^2}.$$ (2.16)
This equality can be interpreted as a semi-static replication strategy for the variance swap, which uses a dynamic position in S and a static portfolio of infinitesimally small positions in an infinite number of out-of-the-money puts and out-of-the-money calls. We make several observations:
(a) For any practical implementation the 'infinitesimal portfolio' has to be discretized and portfolio weights have to be assigned to each put and call.
(b) Since they are calls and puts on the same underlying asset, the static hedging assets are highly correlated.
(c) The static positions in puts and calls are long positions only.
To address point (a) different ad-hoc discretizations of the integrals in (2.16) are possible (e.g. left or right Riemann sums, trapezoidal sums, etc.). However, it is not obvious which discretization is optimal in the sense of minimizing the hedging error. The choice of an optimal discretization in the variance-minimizing sense is precisely given by Theorem 2.3. Point (b) suggests that given a moderate number (say 30) of puts and calls as static hedging assets, many of them will be redundant in the sense that their hedge contribution (given the other supplementary assets) is small. This observation motivates the sparse approach of the next section and will be confirmed numerically in the application Section 5.
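To illustrate how an ad-hoc discretization behaves, the following sketch checks the Carr-Madan identity $-\log(S_T/S_0) + (S_T - S_0)/S_0 = \int_0^{S_0} (K - S_T)^+ K^{-2}\,dK + \int_{S_0}^\infty (S_T - K)^+ K^{-2}\,dK$ at a single terminal price, using trapezoidal weights on a finite strike grid. This is a payoff-level check only, not the variance-optimal discretization of Theorem 2.3; the function name, the strike grids and the terminal price are hypothetical choices.

```python
import math


def log_contract_residual(s_T, s_0, strikes):
    """Error of a trapezoidal strike discretization of the Carr-Madan option strip.

    `strikes` must be sorted increasingly. Puts are used below s_0, calls above,
    and the strike equal to s_0 (if present) is split evenly between put and call.
    """
    target = -math.log(s_T / s_0) + (s_T - s_0) / s_0
    approx = 0.0
    for i, k in enumerate(strikes):
        lo = strikes[i - 1] if i > 0 else k
        hi = strikes[i + 1] if i + 1 < len(strikes) else k
        weight = 0.5 * (hi - lo) / k**2  # trapezoidal weight times 1/K^2
        put, call = max(k - s_T, 0.0), max(s_T - k, 0.0)
        if k < s_0:
            payoff = put
        elif k > s_0:
            payoff = call
        else:
            payoff = 0.5 * (put + call)
        approx += weight * payoff
    return target - approx


coarse = [50 + 5 * i for i in range(21)]  # strikes 50, 55, ..., 150
fine = [50 + i for i in range(101)]       # strikes 50, 51, ..., 150
err_coarse = log_contract_residual(90.0, 100.0, coarse)
err_fine = log_contract_residual(90.0, 100.0, fine)
```

Refining the grid reduces the replication error at this terminal price, but the sketch says nothing about which finite portfolio minimizes the hedging variance.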
Point (c) finally motivates the addition of short/long constraints, or more generally, linear constraints of the type where p ∈ R n is fixed, to the outer problem in (2.6). With these constraints, the outer problem is a linearly constrained quadratic optimization problem, which can still be efficiently solved by standard numerical software.
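As a minimal sketch of the constrained outer problem, the code below minimizes $v^\top C v - 2 v^\top B$ under the long-only constraint $v \ge 0$, which is taken here as an assumed special case of the linear constraints discussed above. It uses plain projected gradient descent rather than a dedicated quadratic programming solver; the function name and the data are hypothetical.

```python
def long_only_weights(B, C, steps=2000):
    """Minimize v^T C v - 2 v^T B subject to v >= 0 by projected gradient descent."""
    n = len(B)
    # Crude upper bound on the largest eigenvalue of C (maximal absolute row sum)
    bound = max(sum(abs(x) for x in row) for row in C)
    v = [0.0] * n
    for _ in range(steps):
        grad = [2.0 * (sum(C[i][j] * v[j] for j in range(n)) - B[i]) for i in range(n)]
        # Gradient step with step size 1/(2*bound), then projection onto v >= 0
        v = [max(v[i] - grad[i] / (2.0 * bound), 0.0) for i in range(n)]
    return v


# Hypothetical data: the unconstrained optimum C^{-1} B = (1, -1) contains a short position
B = [1.0, -1.0]
C = [[2.0, 1.0], [1.0, 2.0]]
v_long = long_only_weights(B, C)
```

In this example the constraint removes the short position entirely and the remaining weight adjusts to 0.5, which differs from simply truncating the unconstrained solution.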

SPARSE SEMI-STATIC HEDGING
We now focus on the problem of optimal selection of static hedging assets, as outlined in the introduction and motivated in the previous section. Note that the subset selection only affects the static part of the strategy and hence only the outer problem in (2.6). Recall the $\ell^1$-norm $\|v\|_1 = \sum_{i=1}^n |v_i|$ on $\mathbb{R}^n$ and the (non-convex) $\ell^0$-quasinorm $\|v\|_0$, which counts the number of non-zero elements of v, cf. [FR13].
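Definition 3.1 (Sparse Variance-Optimal Semi-Static Hedging Problem). The sparse variance-optimal semi-static hedge $(\vartheta, v) \in L^2(S) \times \mathbb{R}^n$ with effective portfolio size $d < n$ and its optimal initial capital $c \in \mathbb{R}$ are the solution of the minimization problem (2.6), with the outer problem replaced by
$$\varepsilon^2 = \min_{v \in \mathbb{R}^n} \big(A - 2 v^\top B + v^\top C v\big) \quad \text{subject to } \|v\|_0 \le d.$$ (3.1)
The $\ell^1$-relaxation of this problem is given by
$$\min_{v \in \mathbb{R}^n} \big(A - 2 v^\top B + v^\top C v + \lambda \|v\|_1\big),$$ (3.2)
where $\lambda > 0$ is a tuning parameter that replaces d. In both problems, we allow for long/short constraints of the form (2.17).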
Of course, the minimization problem (3.1) is equivalent to the extensively studied subset selection problem in linear regression and (3.2) to its convex relaxation in Lagrangian form, usually called LASSO. We refer to [HTF13] for a general overview and to [Tib96] for the LASSO. We emphasize that
• The $\ell^0$-constrained subset selection problem (3.1) is non-convex and hard to solve exactly if the dimension n is high.
• The $\ell^1$-penalized minimization problem (3.2) is convex and efficient numerical solvers exist even for large n. Its solution is usually a good approximation to the exact subset selection problem, but no guarantee of being close to the solution of (3.1) can be given in general.
To illustrate the effect of the $\ell^1$-penalty, denote by $v^*$ the solution of the unpenalized hedging problem (2.5) and assume for a moment that the GKW-residuals $(L^1_T, \dots, L^n_T)$ of the supplementary claims are mutually uncorrelated. This assumption is highly unrealistic in the hedging context, but leads to a simple form of the solution of the penalized problem, cf. [Tib96]: It is given componentwise by $v_i = \mathrm{sign}(v^*_i)(|v^*_i| - \lambda_i)^+$ with $\lambda_i = \lambda / (2 C_{ii})$, i.e. all static positions are shrunk towards zero by an amount proportional to $\lambda$ and truncated when zero is reached. This nicely illustrates the sparsifying effect of the penalty and the role of $\lambda$.
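The shrink-and-truncate behaviour can be sketched with a simple coordinate descent, assuming the penalized outer objective has the form $v^\top C v - 2 v^\top B + \lambda \|v\|_1$ (the constant A does not affect the minimizer). This is an illustrative solver, not the LARS implementation used later in the numerical section; function names and data are hypothetical.

```python
import math


def soft_threshold(x, t):
    """Shrink x towards zero by t and truncate at zero."""
    return math.copysign(max(abs(x) - t, 0.0), x)


def lasso_cd(B, C, lam, sweeps=500):
    """Coordinate descent for min_v  v^T C v - 2 v^T B + lam * ||v||_1."""
    n = len(B)
    v = [0.0] * n
    for _ in range(sweeps):
        for i in range(n):
            # In coordinate i the objective reads C_ii v_i^2 - 2 r v_i + lam |v_i|
            r = B[i] - sum(C[i][j] * v[j] for j in range(n) if j != i)
            v[i] = soft_threshold(r, lam / 2.0) / C[i][i]
    return v


# Hypothetical uncorrelated residuals with C_ii = 1/2, so that v* = (2.0, 0.2)
B = [1.0, 0.1]
C = [[0.5, 0.0], [0.0, 0.5]]
v_lasso = lasso_cd(B, C, lam=0.5)
```

With $C_{ii} = 1/2$ each unpenalized weight is shrunk by exactly $\lambda = 0.5$: the large position moves from 2.0 to 1.5 and the small one is set to zero.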
While (3.2) is frequently used as a surrogate for (3.1), the following alternatives exist for solving (3.1) directly, or for approximating its solution. Again, we refer to [Tib96] for further details on the described methods:
Brute-Force: Solve the quadratic optimization problem for each possible subset of cardinality d. Since there are $\binom{n}{d}$ of these subsets, this approach is usually not efficient and becomes completely infeasible for large n.
Leaps-and-Bounds: 'Leaps-and-Bounds' is a branch-and-bound algorithm introduced by [FW74] for subset selection in linear regression, which gives an exact solution to (3.1) without testing all possible subsets.
Greedy Forward Selection: A simple greedy approximation to (3.1) is to assume that the optimal subsets of different cardinality are nested. In the forward approach the problem (3.1) is first solved for d = 1, which is easy. Then, iteratively, the supplementary claim with the largest relative hedge contribution (see Proposition 2.5) is added to the set of active static positions in each step. The same procedure could be used backwards ('greedy backward selection'), i.e. starting with d = n and then iteratively removing the supplementary claim with the smallest hedge contribution. In general, no guarantee of being close to the exact solution of (3.1) can be given for these methods.
We will compare the practical performance of the different solution methods in Section 5.
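The difference between the greedy and the exhaustive approach can be sketched on a small example. The code below measures each candidate subset by its squared hedging error $A - B_s^\top C_{ss}^{-1} B_s$; choosing the asset that minimizes this error is equivalent to choosing the one with the largest reduction relative to the current error. All function names and the covariance data are hypothetical, and the data were constructed so that the optimal subsets are not nested.

```python
from itertools import combinations


def solve(C, B):
    """Solve C v = B by Gaussian elimination with partial pivoting."""
    n = len(B)
    M = [[float(x) for x in C[i]] + [float(B[i])] for i in range(n)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for col in range(k, n + 1):
                M[r][col] -= f * M[k][col]
    v = [0.0] * n
    for k in reversed(range(n)):
        v[k] = (M[k][n] - sum(M[k][j] * v[j] for j in range(k + 1, n))) / M[k][k]
    return v


def hedging_error(A, B, C, subset):
    """Squared hedging error when only the assets in `subset` are held statically."""
    idx = list(subset)
    if not idx:
        return A
    Bs = [B[i] for i in idx]
    Css = [[C[i][j] for j in idx] for i in idx]
    return A - sum(b * x for b, x in zip(Bs, solve(Css, Bs)))


def greedy_forward(A, B, C, d):
    """Add, d times, the asset that reduces the hedging error the most."""
    chosen = []
    for _ in range(d):
        rest = [i for i in range(len(B)) if i not in chosen]
        chosen.append(min(rest, key=lambda i: hedging_error(A, B, C, chosen + [i])))
    return chosen


def brute_force(A, B, C, d):
    """Exact solution: test every subset of cardinality d."""
    return min(combinations(range(len(B)), d), key=lambda s: hedging_error(A, B, C, s))


# Hypothetical covariances of residuals L^0 = e1+e2, L^1 = e1+e2+e3, L^2 = e1+e3, L^3 = e2-e3
A = 2.0
B = [2.0, 1.0, 1.0]
C = [[3.0, 2.0, 0.0], [2.0, 2.0, -1.0], [0.0, -1.0, 2.0]]
greedy = greedy_forward(A, B, C, 2)
exact = brute_force(A, B, C, 2)
```

Here the greedy method first picks asset 0 (the best single hedge) and ends with a squared error of 1/6, while the exhaustive search finds the pair (1, 2), which replicates the target residual exactly.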

STOCHASTIC VOLATILITY MODELS WITH FOURIER REPRESENTATION
The final ingredient that is still missing for a numerical solution of the (sparse) semi-static hedging problem is an efficient method to compute the quantities A, B and C from Theorem 2.3. One possible approach would be to compute transition densities of S and H 0 , . . . , H n by Monte-Carlo-Simulation and to compute the GKW-decomposition by sequential backward regression, cf. [FS88]. Due to the fact that the joint distribution of S and all price processes of supplementary claims is needed, we expect a heavy computational load in order to obtain reasonably large accuracy with this method. An interesting alternative method has been suggested by [KP10] (see also [HKK06,Pau07]) for the classic variance-optimal hedging problem (2.1) of European claims. This alternative is based on the well-known Fourier method for pricing of European claims, cf. [CM01, Rai00, KP10].

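4.1. Fourier representation of strategies and hedging errors. We stay close to the framework of [KP10] and assume that the payoff of some option H is given by $H = f(X_T)$, where X is the log-price process of the underlying stock, i.e. we also assume $S = \exp(X)$. The payoff of a call, for example, can be written as $f(x) = (e^x - K)^+$, but it is not necessary to restrict to this specific case. Furthermore, we assume that the (rescaled) two-sided Laplace transform $\tilde f$ of f exists and is integrable along a strip $S(R) := \{u \in \mathbb{C} : \mathrm{Re}(u) = R\}$ with $\mathbb{E}[e^{R X_T}] < \infty$, such that the payoff can be written as
$$f(x) = \int_{S(R)} e^{ux} \tilde f(u) \, du.$$
The price process of the claim then has the Fourier representation
$$H_t = \int_{S(R)} H_t(u) \tilde f(u) \, du$$ (4.2)
along $S(R)$, where we denote the conditional moment generating function (analytically extended to the complex plane) of $X_T$ by $H_t(u) := \mathbb{E}[e^{u X_T} \mid \mathcal{F}_t]$. Note that $H_t(u)$ is well-defined on $S(R)$ due to the integrability condition imposed on $X_T$. In the important cases of European puts and calls, the two-sided Laplace transform $\tilde f$ is given by
$$\tilde f(u) = \frac{1}{2\pi i} \frac{K^{1-u}}{u(u-1)},$$
with $R > 1$ for calls and $R < 0$ for puts, cf. [HKK06, Sec. 4].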
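As a sanity check of the Fourier pricing method, the sketch below prices a European call as $\int_{S(R)} H_0(u) \tilde f(u)\,du$ with $\tilde f(u) = K^{1-u} / (2\pi i\, u(u-1))$, which follows from a direct computation of the two-sided Laplace transform of $(e^x - K)^+$ for $\mathrm{Re}(u) > 1$. To keep the example self-contained, the Black-Scholes model with zero interest rate is assumed in place of a stochastic volatility model, because its moment generating function and call price are available in closed form. Function names, the truncation `y_max` and the grid size `n` are hypothetical choices.

```python
import cmath
import math


def call_price_fourier(s0, K, sigma, T, R=1.5, y_max=100.0, n=4000):
    """Call price as the integral of H_0(u) * f~(u) along the line Re(u) = R > 1."""
    x0 = math.log(s0)

    def integrand(y):
        u = complex(R, y)
        # E[exp(u X_T)] for X_T = log S_T in the (assumed) Black-Scholes martingale model
        mgf = cmath.exp(u * x0 + 0.5 * sigma**2 * T * (u * u - u))
        # With u = R + iy we have du = i dy, so 1/(2 pi i) du = dy / (2 pi)
        return (mgf * K ** (1 - u) / (u * (u - 1))).real / (2.0 * math.pi)

    h = 2.0 * y_max / n  # trapezoidal rule on [-y_max, y_max]
    total = 0.5 * (integrand(-y_max) + integrand(y_max))
    total += sum(integrand(-y_max + k * h) for k in range(1, n))
    return h * total


def black_scholes_call(s0, K, sigma, T):
    """Closed-form Black-Scholes call price with zero interest rate."""
    def cdf(x):
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

    d1 = (math.log(s0 / K) + 0.5 * sigma**2 * T) / (sigma * math.sqrt(T))
    return s0 * cdf(d1) - K * cdf(d1 - sigma * math.sqrt(T))


fourier_price = call_price_fourier(100.0, 100.0, 0.2, 1.0)
closed_form = black_scholes_call(100.0, 100.0, 0.2, 1.0)
```

Replacing the moment generating function by that of another model changes only the line computing `mgf`; the truncation and the grid size would then have to be checked against the decay of the new integrand.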
The key insight, pioneered by [HKK06] for variance-optimal hedging in models with independent increments and by [KP10,Pau07] for affine stochastic volatility models, is that the Fourier representation (4.2) of European claims can be extended to their GKW-decomposition (2.2). More precisely, both the strategy $\vartheta$ and the hedging error $\varepsilon^2 = \mathbb{E}[L_T^2]$ of the variance-optimal hedging problem (2.1) can be expressed in terms of Fourier-type integrals, similar to (4.2). For our problem of interest, the semi-static hedging problem (2.5), the results of [HKK06,KP10,Pau07] are not sufficient: To obtain the quantities A, B and C of Theorem 2.3, we also need to compute the covariances $\mathbb{E}[L^i_T L^j_T]$ between the GKW-residuals of different claims. In the companion paper [DTHKR17] we extend the results of [HKK06,KP10,Pau07] to the semi-static hedging problem. Moreover, we show that the method can be used in any stochastic volatility model where the Fourier transform of the log-price X is known (e.g. the Heston, the 3/2 or the Stein-Stein model, cf. [Lew00]). Here, we only need a special case of the more general results in [DTHKR17], which is condensed into Theorem 4.1(i) below.
In order to formulate the representation result, we assume that a claim H 0 (e.g. a variance swap), and supplementary assets H 1 , . . . , H n with Fourier representations (4.1) are given, and define for u, u 1 , u 2 ∈ C complex-valued predictable processes of finite variation A, B(u), C(u 1 , u 2 ) by Theorem 4.1. Let a stochastic volatility model with forward price process S = e X and variance process V be given, and let T > 0 be a fixed time horizon. Let H 0 be a variance swap with payoff [X, X] T = T 0 V t dt and let the supplementary assets (H 1 , . . . , H n ) be European puts or calls with Fourier representations given by (4.1). Assume that (S,V ) are continuous square-integrable semimartingales and that there exist functions h(u,t,V t ), γ(t,V t ), continuously differentiable in the last component, such that Then the following holds true: (i) The quantities A, B and C in Theorem 2.3 can be represented as The processes (4.3) can be written as Proof. Part (i) of the theorem is technically demanding and follows from Theorems 4.5, 4.6 and 4.8 in the companion paper [DTHKR17]. In order to show part (ii), let Y be a R n -valued continuous semi-martingale and let α, β be functions in C 2 (R n , C Inserting the definition of the variance-optimal strategy ϑ (u) = d H(u),S S,S into (4.3) and recognizing that for continuous martingales predictable variation ., . and quadratic variation [., .] coincide, we obtain for C and similar expressions for B and A. Using assumption (4.4) and applying (4.8) several times we obtain (4.6).
The joint moment generating function of the Heston model is known explicitly and of the form well-defined for real arguments in the set (4.11) and with analytic extension to the associated 'complex strip' Then the explicit expression of ψ t is given, for (u, w) ∈ D t by (cf. [Alf15, Prop. 4.2.1]), whenever the denominator of g is equal to zero. Moreover, φ t (u, w) is given by The following theorem specializes Theorem 4.1 to the Heston model and gives (up to integration) explicit expressions for the quantities A, B and C from Theorem 2.3. The proof of the theorem is given in Appendix A.
Theorem 4.2. Let (X,V ) be given by the Heston model (4.9) and let the claim H 0 be a variance swap, i.e. with payoff H 0 T = [X, X] T at maturity T . Let the supplementary claims (H 1 , . . . , H n ) be European puts and calls with payoffs f i and two-sided Laplace transformsf i , integrable along strips S(R i ), as in (4.2). If E e 2R i X T < ∞ for all i = 1, . . . , n then the quantities A, B and C, defined in Theorem 2.3 are given by Remark 4.3. Note that the common leading factor σ 2 (1 − ρ 2 ) of A, B and C also becomes the leading factor of the minimal squared hedging error ε 2 , cf. (2.12). This makes perfect sense, since it makes the hedging error roughly proportional to vol-of-vol σ and shows that the hedging error vanishes in the complete-market boundary cases ρ = ±1 of the Heston model. However, ρ and σ also appear inside φ , ψ and therefore their influence on ε 2 is not limited to the leading factor σ 2 (1 − ρ 2 ) alone.

NUMERICAL RESULTS
The following numerical implementation should be considered in terms of a 'stylized financial market' setting, i.e., while we do not calibrate the model to current market data, we use parameters that are realistic in a market setting. More specifically, we use the Heston model parameters from [Gat06]: In Subsection 5.4 we vary the leverage parameter ρ, but keep all other parameters fixed. The current stock price is normalized to S 0 = 100 and we use a time-to-maturity of T = 1 (years) for the variance swap and the call options. The price of a variance swap (i.e. the swap rate k * = E [[log S, log S] T ]) can be readily calculated as The supplementary assets are OTM-puts and OTM-calls with strikes ranging from K min = 50 to K max = 150 in steps of ∆K = 5.
We focus on three aspects of the semi-static hedging problem:
• Comparing the different methods that were proposed in Section 3 to solve the sparse semi-static hedging problem;
• Analyzing the dependency of hedging error and optimal portfolio composition on effective portfolio size d;
• Analyzing the dependency of hedging error and optimal portfolio composition on the leverage parameter ρ.
5.1. Comparison of methods. As a first step, we computed A, B and the matrix C from Theorem 2.3 using the Fourier representation in Theorem 4.2 by adaptive integration in MATLAB. Next, we implemented the methods described in Section 3, i.e.
(1) Greedy forward selection (with and without short-sale constraints)
(2) Leaps-and-Bounds (with and without short-sale constraints)
(3) LASSO
in R [R C16], using the function lars in the package lars [HE13] with option type="lasso" for the computation of the LASSO solution. While computationally most demanding, the Leaps-and-Bounds solution can serve as a benchmark solution, since it is (up to numerical error) the exact solution of the sparse semi-static hedging problem (3.1). The other methods, in contrast, only return a 'reasonably close' solution to (3.1). In all cases, we report the relative hedging error $\varepsilon / k^*$, i.e. the hedging error normalized by the price of the variance swap.
A challenge that is faced by all methods is the ill-conditioning of the matrix C. With parameters chosen as above (5.1), the reciprocal condition number of C is $1.11 \times 10^{-6}$. While small, this number is still several orders of magnitude larger than the machine precision of $2.22 \times 10^{-16}$ (double precision arithmetic) on the computer that was used. The ill-conditioning of C is not surprising, since put and call options with neighboring strikes are highly correlated. This effect is likely amplified by the fact that C contains the correlations of the GKW-residuals and not the correlations of the option prices themselves. While we have considered pre-conditioning of C, along the lines of [Neu98], we have found that greedy forward selection and Leaps-and-Bounds perform well even without additional conditioning. The addition of short-sale constraints also seems to have a regularizing effect on the methods.
Figure 1 shows the relative hedging error (as a percentage of the variance swap price) attained with the optimal portfolio returned by methods 1-3 for different effective portfolio sizes d = 0, ..., 21. Notice that the implementation of LASSO adds and removes supplementary assets from the active set, such that the graph can show multiple solutions for the same effective portfolio size (e.g. for d = 15). Focusing on the comparison of methods, we find that
• The Leaps-and-Bounds method returns the solution with the smallest hedging error, consistent with the fact that it solves (3.1) exactly. It is remarkably fast, but further numerical experiments indicate that its runtime is sensitive to the choice of model parameters.
• The greedy method is the fastest method and the residual hedging error of its solution is only slightly higher than the hedging error of the Leaps-and-Bounds solution. Moreover, the performance of the greedy method is stable with respect to parameter choice.
• The LASSO method seems to be severely affected by the ill-conditioning of C. This is not surprising, since it has been remarked e.g. in [BVDG11, Sec. 2.6] that the LASSO method has problems with highly correlated data.
Summing up, we can recommend the greedy method as fast, reliable and easy to implement. The Leaps-and-Bounds method is useful as an efficient way to compute an exact benchmark solution.
We cannot recommend LASSO, as it cannot deal well with the ill-conditioning of C. Interestingly, these observations are contrary to the usual wisdom in variable selection for regression problems, where greedy forward selection often has unstable performance and LASSO yields superior results, cf. [BVDG11, Ch. 2]. We attribute these findings to the highly correlated nature of the matrix C, which is untypical in regression scenarios, but a natural feature of our hedging problem.
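The link between high correlation and ill-conditioning can be made explicit in the simplest case. As an illustrative assumption, consider just two GKW-residuals with unit variance and correlation r (the symbol r is chosen here to avoid a clash with the leverage parameter ρ):

```latex
C = \begin{pmatrix} 1 & r \\ r & 1 \end{pmatrix}, \qquad
\lambda_{\pm} = 1 \pm r, \qquad
\frac{\lambda_{\min}}{\lambda_{\max}} = \frac{1 - |r|}{1 + |r|}.
```

For r = 0.999 the reciprocal condition number (in the spectral norm) is already about $5.0 \times 10^{-4}$; with many neighboring strikes instead of two, much smaller values are to be expected.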

5.2. Analysis of the hedging error. We return to Figure 1 to analyze the hedging error resulting from the sparse variance-optimal semi-static hedging problem (3.1) for different effective portfolio sizes d. We consider the benchmark solution returned by the Leaps-and-Bounds method with short-sale constraints. First, we note that dynamic hedging in the underlying S, without using any static positions in puts and calls (d = 0), results in a relative hedging error of 59.7 %. This error is already reduced to 5.7 % by just adding three supplementary assets (d = 3) and can be further reduced to 3.4 % by selecting six supplementary assets (d = 6). Finally, the error levels off at 1.6 % when the full range (d = 21) of puts and calls between $K_{\min} = 50$ and $K_{\max} = 150$ is used. Further substantial reductions of the hedging error can only be achieved by extending the range of available strikes; adding more options within the current range has only negligible effects.
The sharp decrease of the hedging error between d = 0 and d = 3 affirms the basic premise of sparse semi-static hedging: That selecting only a small number of supplementary assets already leads to a significant reduction of the hedging error. On the other hand, the poor performance of the LASSO solution shows that a sub-optimal choice of supplementary assets does not result in a satisfactory reduction of the hedging error. In other words, it is important that the sparse sub-portfolios are chosen optimally, and not arbitrarily.

5.3. Composition of the hedging portfolios. We now turn to the composition of the static hedging portfolio, i.e. the vector $v \in \mathbb{R}^n$ with the constraint $\|v\|_0 \le d$ that is returned by the solution methods for the sparse semi-static hedging problem (3.1). Recall that the element $v_i$ is the nominal size of the position in the supplementary asset $H_i$, with a negative sign indicating a short position. In our setting, the elements of v can simply be indexed by the strike K of the corresponding put/call. The optimal portfolios returned by the different solution methods, along with their dependence on the effective portfolio size d, are shown in Figure 2. We make the following observations:
• With the exception of the put K = 55, only long positions are observed;
• Positions in OTM puts (K < 100) are larger than in OTM calls (K > 100), in line with Neuberger's replicating portfolio (2.16);
• The general pattern (going from effective portfolio size d = 1 to 21) for all methods can be described as follows: Start with an (approximately) ATM option. Proceed by selecting both OTM puts and calls, going outwards as d increases and putting more weight on OTM puts, until the limit $K_{\min} = 50$ is reached. Continue by adding OTM calls and by filling up the gaps from earlier stages.
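The second observation can be illustrated with a small sketch of a discretized Neuberger-type portfolio, in which the option with strike K receives a weight proportional to 1/K². The strike spacing of 5 (which gives 21 strikes between 50 and 150), the ATM level of 100 and the omitted normalization constant are assumptions of this sketch, not values taken from the numerical example.

```python
import numpy as np

# Assumed strike grid: 21 strikes from 50 to 150 in steps of 5.
dK = 5.0
strikes = np.arange(50.0, 155.0, dK)

# Discretized replicating density 1/K^2 (normalization constant omitted).
weights = dK / strikes**2

# Assumed ATM level separating OTM puts (K < spot) from OTM calls (K > spot).
spot = 100.0
put_weights = weights[strikes < spot]
call_weights = weights[strikes > spot]

# In doubly logarithmic coordinates the weights lie on a straight line;
# a linear fit of log(weight) against log(strike) recovers its slope.
slope, intercept = np.polyfit(np.log(strikes), np.log(weights), 1)

print("smallest put weight :", put_weights.min())
print("largest call weight :", call_weights.max())
print("log-log slope       :", slope)
```

Under these assumptions every OTM put carries a larger weight than every OTM call, and the fitted slope is −2 up to rounding. This matches the qualitative pattern described above, although the sparse variance-optimal portfolios need not coincide with this discretization.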
We suspect that the rare short positions are numerical artifacts rather than part of the true optimal solution of (3.1). Indeed, their effect on the hedging error is minuscule, and we hence recommend using a-priori short-sale constraints when hedging a variance swap.
Figure 2 gives a good overview of the portfolio composition, but it is difficult to assess the precise size of the individual positions $v_i$. For this reason, we provide in Figure 3 an additional plot of the portfolio weights v indexed by strike K for the optimal portfolios of effective sizes d = 3, 6, 12 in doubly logarithmic coordinates. Note that Neuberger's replicating portfolio (2.16) puts an infinitesimal weight of $v(K)\,dK = \frac{1}{K^2}\,dK$ on an option with strike K. In doubly logarithmic coordinates, this becomes $\log v(K) = -2 \log K$, i.e. the portfolio weights should form a line of slope −2. Figure 3 shows reasonable agreement with this asymptotic result, even for effective portfolio sizes as small as d = 3. For d = 12, numerical errors from the ill-conditioning of the matrix C seem to accumulate and could explain the unruly shape of the graph.

5.4. The role of correlation. Finally, we turn to the role of the correlation parameter ρ, which is interesting for several reasons: First, the value of ρ does not affect the theoretical price of the variance swap. Second, ρ also does not affect the infinitesimally optimal strategy (2.16). Finally, ρ allows us to tune the degree of market incompleteness, since the Heston model becomes a complete market model in the boundary cases ρ = ±1. Despite the first two points, it turns out that ρ has a significant effect on the attainable hedging error and the composition of the optimal portfolio in the sparse semi-static hedging problem. This influence can already be suspected from the leading factor $1 - \rho^2$ appearing in Theorem 4.2, which propagates to the (squared) hedging error itself, see also Remark 4.3.
Indeed, as Figure 4 shows, the dependency of the relative hedging error on ρ is very close to a 'semi-circle law' $f(\rho) = c_d \sqrt{1 - \rho^2}$, with different constants $c_d$ for different effective portfolio sizes d.
[...] $\frac{dM}{dQ}\big|_{\mathcal{F}_t} = \exp(aX_t + bV_t)/K$, i.e. by exponential tilting of Q. Clearly, the characteristic function of $(X_t, V_t)$ under M is given by $\mathbb{E}^M\left[e^{iyX_t + izV_t}\right] = \mathbb{E}^Q\left[e^{uX_t + wV_t}\right] = \exp\left(\phi_t(u, w) + V_0 \psi_t(u, w) + uX_0\right)$. Due to the analyticity properties of $\phi_t(u, w)$ and $\psi_t(u, w)$, cf. Lemma A.1(c), all partial derivatives of the left hand side with respect to (y, z) exist. Standard results on differentiability of characteristic functions (cf. [Luk60, Sec. 2.3]) yield that [...] Transforming the left hand side back to Q yields the desired result.
If the expectations in (4.14) are finite, then an application of Theorem 4.1 yields the desired representations of A, B, C. Thus, it remains to show integrability and to determine the explicit expressions in (4.14). First, (4.14a) is easily obtained from the Heston SDE (4.9). To show (4.14b) we make use of Lemmas A.1 and A.2. Let $u = x + iz$ be an element of some strip $S(R_j)$ and note that the integrability condition on $X_T$ implies that $(x, 0) \in D_T$. From Lemma A.1(b) we conclude that $(x, \psi_{T-t}(x, 0)) \in D_t$. Now $\mathrm{Re}\,\psi_{T-t}(u, 0) \le \psi_{T-t}(x, 0)$, together with Lemma A.1(d), shows that also $(\mathrm{Re}\,u, \mathrm{Re}\,\psi_{T-t}(u, 0)) \in D_t$, which is equivalent to $(u, \psi_{T-t}(u, 0)) \in S(D_t)$. Applying Lemma A.2 with $w = \psi_{T-t}(u, 0)$ yields (4.14b). For (4.14c) we can use a similar argument: Write $u_1 = x_1 + iz_1$ and $u_2 = x_2 + iz_2$. The integrability condition on $X_T$ implies that $(2x_1, 0)$ and $(2x_2, 0)$ are in $D_T$. From Lemma A.1(b) we conclude that $(2x_1, \psi_{T-t}(2x_1, 0)) \in D_t$, and similarly for $x_2$. Convexity of $D_t$, see Lemma A.1(a), shows that (x 1 + x 2 , 1