Realized kernels in practice: trades and quotes

Abstract

Summary  Realized kernels use high-frequency data to estimate daily volatility of individual stock prices. They can be applied to either trade or quote data. Here we provide the details of how we suggest implementing them in practice. We compare the estimates based on trade and quote data for the same stock and find a remarkable level of agreement.

We identify some features of the high-frequency data that are challenging for realized kernels. They occur when there are local trends in the data, over periods of around 10 minutes, during which prices and quotes are driven up or down. These episodes can be associated with high volumes. One explanation for this is that they are due to non-trivial liquidity effects.

1. INTRODUCTION

The class of realized kernel estimators, introduced by Barndorff-Nielsen et al. (2008a), can be used to estimate the quadratic variation of an underlying efficient price process from high-frequency noisy data. This method, together with alternative techniques such as subsampling and pre-averaging, extends the influential realized variance literature which has recently been shown to significantly improve our understanding of time-varying volatility and our ability to predict future volatility—see Andersen et al. (2001), Barndorff-Nielsen and Shephard (2002) and the reviews of that literature by, for example, Andersen et al. (2008) and Barndorff-Nielsen and Shephard (2007).1 In this paper, we detail the implementation of our recommended realized kernel estimator in practice, focusing on end effects, bandwidth selection and data cleaning across different types of financial databases.

We place emphasis on methods that deliver similar estimates of volatility when applied to either quote data or trade data. This is difficult as the two have very different microstructure properties. We show realized kernels perform well on this test. We also identify a feature of some data sets that causes these methods difficulties: gradual jumps. These are rare in financial markets; they occur when prices exhibit strong, roughly linear trends over periods of several minutes. We discuss this issue at some length.

In order to focus on the core issue, we represent the period over which we wish to measure the variation of asset prices as the single interval [0, T]. We consider the case where Y is a Brownian semimartingale plus jump process, given by

Y_t = ∫_0^t a_u du + ∫_0^t σ_u dW_u + J_t,   with   J_t = Σ_{j=1}^{N_t} C_j,    (1.1)

where J is a finite activity jump process (meaning it has a finite number of jumps in any bounded interval of time). So, N_t counts the number of jumps that have occurred in the interval [0, t] and N_t < ∞ for any t. We assume that a is a predictable locally bounded drift, σ is a càdlàg volatility process and W is a Brownian motion, all adapted to some filtration F_t. For reviews of the econometrics of processes of the type Y see, for example, Shephard (2005).

Our object of interest is the quadratic variation of Y,

[Y] = ∫_0^T σ_u² du + Σ_{j=1}^{N_T} C_j²,

where ∫_0^T σ_u² du is the integrated variance. We estimate it from the observations

X_{τ_0}, X_{τ_1}, …, X_{τ_n},   0 = τ_0 ≤ τ_1 ≤ ⋯ ≤ τ_n = T,

where X_{τ_j} is a noisy observation of Y_{τ_j},

X_{τ_j} = Y_{τ_j} + U_{τ_j}.

We initially think of U as noise and assume E(U_τ) = 0. It can be due to, for example, liquidity effects, bid/ask bounce and misrecording. Specific models for U have been suggested in this context by, for example, Zhou (1996), Hansen and Lunde (2006), Li and Mykland (2007), and Diebold and Strasser (2007). We will write U ∈ WN to denote the case where the U_{τ_j} are mutually independent and jointly independent of Y.

There has been substantial recent interest in learning about the integrated variance and the quadratic variation in the presence of noise. Leading references include Zhou (1996), Andersen et al. (2000), Bandi and Russell (2008), Hansen and Lunde (2006), Zhang et al. (2005), Zhang (2006), Kalnina and Linton (2008), Jacod et al. (2007), Fan and Wang (2007), and Barndorff-Nielsen et al. (2008a).

Our recommended way of carrying out estimation based on realized kernels is spelt out in Barndorff-Nielsen et al. (2008b). Their non-negative estimator takes on the following form:

K(X) = Σ_{h=−H}^{H} k(h/(H+1)) γ_h,   where   γ_h = Σ_{j=|h|+1}^{n} x_j x_{j−|h|},    (1.2)

where k(x) is a kernel weight function. We focus on the Parzen kernel, because it satisfies the smoothness conditions, k′(0) = k′(1) = 0, and is guaranteed to produce a non-negative estimate.2 The Parzen kernel function is given by

k(x) = 1 − 6x² + 6x³   for 0 ≤ x ≤ 1/2,
k(x) = 2(1 − x)³       for 1/2 ≤ x ≤ 1,
k(x) = 0               for x > 1.

Here x_j is the jth high-frequency return, calculated over the interval from τ_{j−1} to τ_j in a way that is detailed in Section 2.2. The method by which these returns are calculated is not trivial: the accuracy and depth of the data cleaning are important, as is the influence of the end conditions.
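To fix ideas, the following minimal Python sketch computes (1.2) with the Parzen weight function. It is only an illustration of the formula: `x` is assumed to be a one-dimensional array holding the high-frequency returns x_1, …, x_n constructed as in Section 2.2, and the function names are ours rather than the paper's.

```python
import numpy as np

def parzen(u):
    """Parzen kernel weight k(u)."""
    u = abs(u)
    if u <= 0.5:
        return 1.0 - 6.0 * u**2 + 6.0 * u**3
    if u <= 1.0:
        return 2.0 * (1.0 - u)**3
    return 0.0

def realized_kernel(x, H):
    """Non-negative realized kernel (1.2):
    K(X) = sum_{h=-H}^{H} k(h/(H+1)) gamma_h, gamma_h = sum_{j=|h|+1}^{n} x_j x_{j-|h|}."""
    x = np.asarray(x, dtype=float)
    rk = float(np.dot(x, x))                            # gamma_0, weight k(0) = 1
    for h in range(1, min(H, x.size - 1) + 1):
        gamma_h = float(np.dot(x[h:], x[:-h]))          # realized autocovariance at lag h
        rk += 2.0 * parzen(h / (H + 1.0)) * gamma_h     # gamma_h equals gamma_{-h}
    return rk
```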

This realized kernel has broadly the same form as a standard heteroskedasticity and autocorrelation consistent (HAC) covariance matrix estimator familiar in econometrics (e.g. Andrews, 1991), but unlike those estimators it is not normalized by the sample size. This makes its analysis more subtle and the influence of end effects theoretically important.

Barndorff-Nielsen et al. (2008b) show that, as n → ∞, if H → ∞ with H/n → 0 at a suitable rate, then

K(X) →p [Y].

The dependence between U and Y is asymptotically irrelevant. They need H to increase with n in order to eliminate the noise in such a way that K(U) →p 0. With H ∝ n^η, we will need η > 1/3 to eliminate the variance and η > 1/2 to eliminate the bias of K(U), when U ∈ WN.3 For K(Y) →p [Y], we simply need η < 1. Barndorff-Nielsen et al. (2008b) show that H ∝ n^{3/5} is the best trade-off between asymptotic bias and variance.4

Their preferred choice of bandwidth is

H* = c* ξ^{4/5} n^{3/5},   with   ξ² = ω² / √(T ∫_0^T σ_u⁴ du),    (1.3)

where c* = (12²/0.269)^{1/5} = 3.5134 for the Parzen kernel. The bandwidth H* depends on the unknown quantities ω² and ∫_0^T σ_u⁴ du, where the latter is called the integrated quarticity. In the next section, we define an estimator of ξ², which leads to a bandwidth, Ĥ, that can be implemented in practice.

Although the assumption that U ∈ WN is a strong one, it is not needed for consistency. Consistency, K(X) →p [Y], has previously been shown under quite wide conditions, allowing, for example, U to be a weakly dependent covariance stationary process. The realized kernel estimator in (1.2) is robust to serial dependence in U and can therefore be applied to the entire database of high-frequency prices. In comparison, Barndorff-Nielsen et al. (2008a) applied the flat-top realized kernel to prices sampled approximately once per minute, in order not to be in obvious violation of U ∈ WN, an assumption that the flat-top realized kernel estimator is based upon.

The structure of the paper is as follows. In Section 2, we discuss the selection of the bandwidth H and the important role of end effects for these statistics. This is followed by Section 3, which describes the data we used in our analysis and the data cleaning we employed. We then turn to our data analysis in Section 4, which suggests that there are some days where our methods are really challenged, while on most days the analysis is quite successful. Overall, we produce the empirically important result that realized kernels applied to quote and trade data give very similar results, so applied workers can use these methods on either type of data source with some comfort. This analysis is followed by a conclusion in Section 5.

2. PRACTICAL IMPLEMENTATION

2.1. Bandwidth selection in practice

Initially, Barndorff-Nielsen et al. (2008a) studied flat-top, unbiased realized kernels, but their flat-top estimator is not guaranteed to be non-negative. This work has been extended to the non-negative realized kernels (1.2) by Barndorff-Nielsen et al. (2008b), and it is their results we use here. Their optimal bandwidth depends on the unknown parameters ω² and ∫_0^T σ_u⁴ du through ξ², as spelt out in (1.3). We estimate ξ² very simply by

ξ̂² = ω̂² / ÎV,

where ω̂² is an estimator of ω² and ÎV is a preliminary estimate of IV = ∫_0^T σ_u² du. The latter is motivated by the fact that it is not essential to use a consistent estimator of ξ², that T ∫_0^T σ_u⁴ du ≈ (∫_0^T σ_u² du)² when σ_u² does not vary too much over the interval [0, T], and that it is far easier to obtain a precise estimate of IV than of the integrated quarticity.5

In our implementation we use

ÎV = RV_sparse,

which is a subsampled realized variance based on 20-minute returns. More precisely, we compute a total of 1200 realized variances by shifting the time of the first observation in 1-second increments; RV_sparse is simply the average of these estimators.6 This is a reasonable starting point, because market microstructure effects have a negligible impact on the realized variance at this frequency.7 To estimate ω² we compute the realized variance using every qth trade or quote. By varying the starting point, we obtain q distinct realized variances, RV^(1)_dense, …, RV^(q)_dense, say. Next we compute

ω̂²_(i) = RV^(i)_dense / (2 n^(i)),   i = 1, …, q,

where n^(i) is the number of non-zero returns that were used to compute RV^(i)_dense. Finally, our estimate of ω² is the average of these q estimates,

ω̂² = (1/q) Σ_{i=1}^{q} ω̂²_(i).

For the case q = 1, this estimator was first proposed by Bandi and Russell (2008) and Zhang et al. (2005). The reason that we choose q > 1 is robustness: for ω̂²_(i) to be a sensible estimator of E(U_τ²), it is important that the noise in prices sampled q ticks apart is essentially uncorrelated. There is overwhelming evidence against this assumption when q = 1, particularly for quote data; see Hansen and Lunde (2006) and the figures presented later in this paper. So, we choose q such that every qth observation is, on average, 2 minutes apart. On a typical day in our empirical analysis in Section 4, we have q ≈ 25 for transaction data and q ≈ 70 for mid-quote data. These values of q are deemed sufficient for the assumption to be reasonable.
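As an illustration of this noise-variance estimator, consider the following Python sketch. The array `prices` of one day's cleaned tick-by-tick log-prices and the function name are our own assumptions; q should be chosen so that successive retained observations are roughly 2 minutes apart, as discussed above.

```python
import numpy as np

def estimate_noise_variance(prices, q):
    """Average of the q estimates omega^2_(i) = RV_dense^(i) / (2 n^(i)), where
    RV_dense^(i) is the realized variance of every q-th log-price starting at offset i
    and n^(i) is the number of non-zero returns used."""
    prices = np.asarray(prices, dtype=float)
    estimates = []
    for offset in range(q):
        returns = np.diff(prices[offset::q])       # returns over every q-th observation
        n_i = np.count_nonzero(returns)
        if n_i > 0:
            estimates.append(np.sum(returns**2) / (2.0 * n_i))
    return float(np.mean(estimates))
```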

Another issue in using RV^(i)_dense/(2 n^(i)) as an estimator of ω² is the implicit assumption that ω² is large relative to [Y]/(2 n^(i)). This problem was first emphasized by Hansen and Lunde (2006), who showed that the variance of the noise has become very small after decimalisation, in particular for actively traded assets, for which they found the noise variance to be tiny relative to daily volatility. The main reason is that decimalisation has reduced some of the main sources of the noise, U, such as the magnitude of 'rounding errors' in the observed prices and the bid-ask bounce in transaction prices. So our estimator, ω̂², is likely to be upward biased, which results in a conservative choice of bandwidth parameter. But there are a couple of advantages to using a conservative value of H. One is that too small a value of H will, in theory, cause more harm than too large a value; another is that a larger value of H increases the robustness of the realized kernel to serial dependence in U_τ.

So, in our empirical analysis we use the expression Ĥ = c* ξ̂^{4/5} n^{3/5} to choose the bandwidth parameter for the realized kernel estimator that is based on the Parzen kernel function.
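Putting the pieces together, here is a sketch of the bandwidth rule under the assumptions above. The helper names are ours; ÎV is approximated by a subsampled realized variance built from 20-minute returns with the grid shifted in 1-second increments, and Ĥ = c* ξ̂^{4/5} n^{3/5} with c* = 3.5134 for the Parzen kernel.

```python
import numpy as np

C_STAR_PARZEN = (12**2 / 0.269) ** (1 / 5)     # = 3.5134, the Parzen constant c*

def subsampled_sparse_rv(log_prices, times, span=1200, n_offsets=1200):
    """Average of realized variances based on 'span'-second (20-minute) returns,
    with the sampling grid shifted in 1-second increments (a simple RV_sparse)."""
    log_prices = np.asarray(log_prices, dtype=float)
    times = np.asarray(times, dtype=float)
    rvs = []
    for shift in range(n_offsets):
        grid = np.arange(times[0] + shift, times[-1], span)
        idx = np.unique(np.searchsorted(times, grid, side="left"))
        rvs.append(np.sum(np.diff(log_prices[idx]) ** 2))
    return float(np.mean(rvs))

def bandwidth(omega2_hat, iv_hat, n, c_star=C_STAR_PARZEN):
    """H_hat = c* xi_hat^{4/5} n^{3/5}, with xi_hat^2 = omega2_hat / iv_hat."""
    xi2_hat = omega2_hat / iv_hat
    return int(np.ceil(c_star * xi2_hat ** (2 / 5) * n ** (3 / 5)))
```

In practice one would compute ω̂² with the previous sketch, ÎV with subsampled_sparse_rv, and then pass the resulting Ĥ to the realized kernel of Section 1.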

It should be emphasized that our bandwidth choice is optimal in an asymptotic MSE sense. Alternative selection methods that seek to optimize the finite sample properties of estimators (under the assumption that U ∈ WN and Y ⊥⊥ U) have been proposed in Bandi and Russell (2006b). They focus on flat-top realized kernels (and related estimators), but their approach can be adapted to the class of non-flat-top realized kernels that are defined by (1.2).

2.2. End effects

In this section, we discuss end effects. From a theoretical angle, we will explain why they show up in this estimation problem, why they are important, and how these effects are eliminated in the computation of the realized kernel. From an empirical perspective, we will then argue that they can largely be ignored in practice.

The realized autocovariances, γ_h, h = 0, 1, …, H, are not divided by the sample size. This means that the realized kernel is influenced by the noise components of the first and last observations in the sample, U_0 and U_T, respectively. The problem is that the contribution of U_0 and U_T to K(X) does not vanish as n → ∞. The important theoretical implication is that K(X) would be inconsistent if applied to raw price observations. Fortunately, this end-effect problem is easily resolved by replacing the first and last observations by local averages. The implication is that U_0 and U_T are replaced by Ū_0 and Ū_T, which are both averages of m, say, observations. If U_t is ergodic with E(U_t) = 0, then it follows that Ū_0, Ū_T →p 0 as m → ∞. So, the local averaging at the two end-points eliminates the end effects.

While the contribution from end effects is dampened by the local averaging (jittering), a drawback from increasing m is that fewer observations are available for computing the realized kernel, because 2m observations are used up by the two local averages. This trade-off defines a mean-squared-error optimal choice for m. In practice, the optimal choice is often m = 1, as shown in Barndorff-Nielsen et al. (2008b). This is the reason that end effects can safely be ignored in practice, despite their important theoretical implications for the asymptotic properties of the realized kernel estimator. To quantify this empirically, we computed the realized kernels for m = 1, …, 4 for Alcoa Inc. and found that they led to almost identical estimates: across our sample period, the absolute difference was on average less than 0.5%.

Loosely speaking, end effects can safely be ignored whenever the quadratic variation, [Y], is thought to dominate the size of U_0² + U_T². This is the case for actively traded equities. However, for less liquid assets this could be a problem, e.g. on days where the squared spread is, say, 5% of the daily variance of returns. In any case, we now discuss how this local averaging is carried out in practice, for the case m = 2, which is the value we use in our empirical analysis.

Write the times at which the log-price process, X, is being recorded as 0 = τ_0 ≤ ⋯ ≤ τ_N = T. When the recording is being carried out regularly in time, we have τ_j − τ_{j−1} = T/N, for j = 1, …, N, but in practice we typically have irregularly spaced observations. Define the discrete time observations X_0, X_1, …, X_n, where

X_0 = (X_{τ_0} + X_{τ_1})/2,   X_j = X_{τ_{j+1}} for j = 1, …, n − 1,   X_n = (X_{τ_{N−1}} + X_{τ_N})/2,   with n = N − 2.

Thus, the end points, X_0 and X_n, are local averages of two available prices over a small interval of time. These prices allow us to define the high-frequency returns as x_j = X_j − X_{j−1}, for j = 1, 2, …, n, that are used in (1.2).

3. PROCEDURE FOR CLEANING THE HIGH-FREQUENCY DATA

Careful data cleaning is one of the most important aspects of volatility estimation from high-frequency data. The cleaning of high-frequency data has been given special attention in, for example, Dacorogna et al. (2001, chapter 4), Falkenberry (2001), Hansen and Lunde (2006), and Brownless and Gallo (2006). Specifically, Hansen and Lunde (2006) show that discarding a large number of observations can in fact improve volatility estimators. This result may seem counter-intuitive at first, but the reasoning is fairly simple. An estimator that makes optimal use of all data will typically put high weight on accurate data and be less influenced by the least accurate observations; the generalized least-squares (GLS) estimator in the classical regression model is a good analogy. On the other hand, the precision of the standard least-squares estimator can deteriorate when relatively noisy observations are included in the estimation, so the inclusion of poor-quality observations can cause more harm than good to the least-squares estimator, and this is the relevant comparison to the present situation. The realized kernel and related estimators 'treat all observations equally', and a few outliers can severely influence these estimators.

3.1. Step-by-step cleaning procedure

In our empirical analysis, we use trade and quote data from the NYSE Trade and Quote (TAQ) database, with the objective of estimating the quadratic variation for the period between 9:30 am and 4:00 pm. The cleaning of the TAQ high-frequency data was carried out in the following steps: P1–P3 were applied to both trade and quote data, T1–T4 are only applicable to trade data, and Q1–Q4 are only applicable to quote data.

All data

  • P1. Delete entries with a time stamp outside the 9:30 am–4 pm window when the exchange is open.

  • P2. Delete entries with a bid, ask or transaction price equal to zero.

  • P3. Retain entries originating from a single exchange (NYSE in our application). Delete other entries.

Quote data only

  • Q1. When multiple quotes have the same time stamp, we replace all these with a single entry with the median bid and median ask price.

  • Q2. Delete entries for which the spread is negative.

  • Q3. Delete entries for which the spread is more than 50 times the median spread on that day.

  • Q4. Delete entries for which the mid-quote deviated by more than 10 mean absolute deviations from a rolling centred median (excluding the observation under consideration) of 50 observations (25 observations before and 25 after).

Trade data only

  • T1. Delete entries with corrected trades. (Trades with a Correction Indicator, CORR ≠ 0).

  • T2. Delete entries with abnormal Sale Condition. (Trades where COND has a letter code, except for ‘E’ and ‘F’). See the TAQ 3 User's Guide for additional details about sale conditions.

  • T3. If multiple transactions have the same time stamp, use the median price.

  • T4. Delete entries with prices that are above the 'ask' plus the bid–ask spread. Similarly, delete entries with prices below the 'bid' minus the bid–ask spread.
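As an illustration of how some of these rules might be coded, here is a pandas/NumPy sketch of P2 and Q1–Q4 for a single day of quotes (T3 is the analogous per-second median applied to trade prices). The column names `bid` and `ask`, the time-stamp index and the helper names are our own assumptions, not part of the TAQ format.

```python
import numpy as np
import pandas as pd

def q4_outliers(mid, k=25, n_mad=10):
    """Rule Q4: flag mid-quotes that deviate by more than n_mad mean absolute deviations
    from the median of the k observations before and the k after (the observation under
    consideration is excluded from its own window)."""
    mid = np.asarray(mid, dtype=float)
    flag = np.zeros(mid.size, dtype=bool)
    for i in range(mid.size):
        lo, hi = max(0, i - k), min(mid.size, i + k + 1)
        window = np.delete(mid[lo:hi], i - lo)
        if window.size == 0:
            continue
        med = np.median(window)
        mad = np.mean(np.abs(window - med))
        if mad > 0 and abs(mid[i] - med) > n_mad * mad:
            flag[i] = True
    return flag

def clean_quotes(quotes):
    """quotes: one day's DataFrame, indexed by time stamp, with columns 'bid' and 'ask'."""
    q = quotes[(quotes["bid"] > 0) & (quotes["ask"] > 0)]   # P2: drop zero prices
    q = q.groupby(q.index).median()                         # Q1: median bid/ask per time stamp
    q = q[q["ask"] - q["bid"] >= 0]                         # Q2: drop negative spreads
    spread = q["ask"] - q["bid"]
    q = q[spread <= 50 * spread.median()]                   # Q3: spreads > 50x daily median
    mid = 0.5 * (q["bid"] + q["ask"])
    return q[~q4_outliers(mid.values)]                      # Q4: purge remaining outliers
```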

3.2. Discussion of filter rules

The first step P1 identifies the entries that are relevant for our analysis, which focuses on volatility in the 9:30 am–4 pm interval.

Steps P2 and T1 remove very serious errors in the database, such as misrecorded prices (e.g. zero prices or a misplaced decimal point) and time stamps that may be way off. T2 rules out data points that the TAQ database flags as problematic. Table 1 gives a summary of the counts of data deleted or aggregated using these filter rules for the database used in Section 4, which analyses the Alcoa share price.

Table 1.  Summary statistics for the cleaning and aggregation procedures when applied to Alcoa Inc. (AA) data from different exchanges.
                         Trade data                                Quote data
           Obs. (P1)   P2    T1    T2    T3    T4     Obs. (P1)   P2    Q1    Q2    Q3    Q4
  1. Notes: The first column in each panel gives the number of observations recorded between 9:30 am and 4:00 pm (P1). Subsequent columns state the reductions in the number of observations due to each of the cleaning/aggregation rules. A blank entry means that the filter was not applied in the particular case. NYSE (N): New York Stock Exchange; PACIF (P): Pacific Exchange; NASD (D): National Association of Securities Dealers; NASDAQ (T): National Association of Securities Dealers Automated Quotations; in each case the letter in parentheses is the TAQ exchange identifier.

January 24, 2007
NYSE72760002299542,121028,2050068
PACIF68470004678115,909077680012
NASD981300146365130,2311520,62508757
NASDAQ0     0     
Other142003  323    
January 26, 2007
NYSE87870003454451,115036,843006
PACIF46060002824421,509012,024000
NASD10,74300267281140,1302628,922019749
NASDAQ0     0     
Other479003  363    
May 4, 2007
NYSE84870003234848,812034,1810035
PACIF47950003117428,676019,250000
NASD140200163720239401491060
NASDAQ10,1310007155049,720039,751006
Other485001  34,92688    
May 8, 2007
NYSE24,34700114,47553109,240090,766008
PACIF24,84000019,0961376,900062,386000
NASD6,64304152384117,003012,90801081
NASDAQ42,16200034,48323138,1400122,610004
Other1,897003  102,8107    

By far, the most important rules here are P3, T3 and Q1. In our empirical work, we will see the impact of suspending P3. It is used to reduce the impact of time-delays in the reporting of trades and quote updates. Some form of T3 and Q1 rule seems inevitable here, and it is these rules which lead to the largest deletion of data.

We use Q4 to catch the outliers that are missed by Q3. Because the window is based on observation counts, it expands and contracts in clock time depending on the trading intensity. The choice of 50 observations for the window is ad hoc, but it has been validated through extensive experimentation.

T4 is an attractive rule, as it disciplines the trade data using quotes. However, it has the disadvantage that it cannot be applied when quote data are not available.8 We see from Table 1 that it is rarely activated in practice, while the results on realized kernels in Table 2, discussed below, demonstrate that the RK estimator (unlike the RV statistic) is not very sensitive to the use of T4.

Table 2.  Sensitivity of RV and RK to our filtering rules P2, T3 and T4 for trade data from Alcoa Inc. (AA) on three specific days and averaged across the full sample.
           No. of observations       Realized variance           Realized kernel
           P2      T3.•    T4.E      P2      T3.E    T4.E   P2     T3.A   T3.B   T3.C   T3.D   T3.E   T4.E
  1. Notes: Analysis based on data from the common exchanges (NYSE, PACIF, NASD and NASDAQ) and all exchanges (denoted All). T3.A–E vary how multiple data on single seconds are aggregated. Our preferred method is T3.E, which takes the median price. The first three columns report the observation count at each stage. T3.• signifies that T3.A–E all result in the same number of observations. A dash indicates that the filter was not applied.

January 24, 2007
NYSE       7276    4977    4972      3.25    2.20    2.14   0.91   0.81   0.83   0.83   0.83   0.82   0.82
PACIF      6847    2169    2168      1.34    1.26    1.07   0.97   0.83   0.83   0.84   0.83   0.83   0.76
NASD       9813    3434    3433      2.65    1.71    1.55   0.95   0.84   0.84   0.83   0.83   0.84   0.84
All        24,078  7815    –         7.19    2.88    –      1.02   0.96   0.95   0.92   0.92   0.92   –
January 26, 2007 (excluding 12:13 to 12:21 pm)
NYSE       8169    5094    5090      6.95    5.61    5.67   5.10   5.30   5.31   5.31   5.31   5.31   5.31
PACIF      4160    1663    1660      4.85    4.84    4.86   5.27   5.14   5.14   5.13   5.14   5.14   5.13
NASD       9828    3815    3805      6.20    5.27    5.12   4.79   5.08   5.08   5.08   5.08   5.09   5.09
All        22,630  7757    –         11.00   6.31    –      4.86   5.16   5.17   5.17   5.17   5.16   –
May 8, 2007
NYSE       24,347  9871    9818      14.27   7.32    7.72   6.25   6.82   6.73   6.70   6.71   6.72   6.69
PACIF      24,840  5744    5731      7.94    5.52    5.51   7.08   7.10   7.09   7.09   7.09   7.10   7.08
NASD       6643    4240    4239      23.69   12.50   9.24   7.57   6.99   7.02   7.02   7.01   7.01   7.04
NASDAQ     42,162  7679    7656      7.57    5.38    5.39   6.51   6.89   6.87   6.84   6.87   6.90   6.89
All        99,889  13585   –         62.62   7.34    –      6.17   6.90   6.88   6.88   6.87   6.88   –
Averages over full sample
NYSE       9719    5476    5460      4.91    3.27    3.24   2.46   2.42   2.41   2.41   2.41   2.41   2.41
NASD       4109    2196    2194      12.26   4.08    3.81   2.43   2.37   2.37   2.37   2.37   2.37   2.38
PACIF      7602    2356    2351      2.81    2.48    2.47   2.53   2.44   2.44   2.44   2.44   2.44   2.44
NASDAQ     12,846  3526    3447      8.36    2.41    2.50   2.69   2.57   2.57   2.56   2.56   2.57   2.60
All        31,735  8344    –         83.83   17.61   –      2.70   2.54   2.53   2.53   2.53   2.54   –

It is interesting to compare some of our filtering rules to those advocated by Falkenberry (2001) and Brownless and Gallo (2006). In such a comparison, it is mainly the rules designed to purge outliers/misrecordings that could be controversial.

Among our rules, Q4 and T4 are the relevant ones. Q4 is very closely related to the procedure Brownless and Gallo (2006, p. 2237) advocate for removing outliers. They remove observation i if the condition |p_i − p̄_i(k)| ≥ 3 s_i(k) + γ is true. Here p̄_i(k) and s_i(k) denote, respectively, the δ-trimmed sample mean and sample standard deviation of a neighbourhood of k observations around i, and γ is a granularity parameter. We use the median in place of the trimmed sample mean, p̄_i(k), and the mean absolute deviation from the median in place of s_i(k). By not using the sample standard deviation, we become less sensitive to runs of outliers.

Falkenberry (2001) also uses a threshold approach to determine whether a certain observation is an outlier, but instead of using a 'Search and Purge' approach he applies a 'Search and Modify' methodology: prices that deviate by a certain amount from a moving filter of all prices are modified to the filter value. For transactions, this has the advantage of maintaining the volume of a trade even if the associated price is bad.

Finally, we note that our approach of disciplining the trade data using quotes, T4, has previously been applied only in Hansen and Lunde (2006), Barndorff-Nielsen et al. (2006) and Barndorff-Nielsen et al. (2008a).

4. DATA ANALYSIS

We analyse high-frequency stock prices for Alcoa Inc. which has the ticker symbol AA. It is the leading producer of aluminium, and its stock is currently part of the Dow Jones Industrial Average (DJIA). We have estimated daily volatility for each of the 123 days in the six-month period from January 3 to June 29, 2007. Much of our discussion will focus on four days that highlight some challenging empirical issues. The data are transaction prices and quotations from NYSE and all data are from the TAQ database extracted from the Wharton Research Data Services (WRDS). We present empirical results for both transaction and mid-quote prices that are observed between 9:30 am and 4:00 pm.

We first present results for a regular day, by which we mean a day where the high frequency returns are such that it is straightforward to compute the realized kernel. Then we present empirical results on the use of realized kernels using the entire sample of 123 separate days, indicating the realized kernels behave very well and better than any available realized variance statistic. Then we turn our attention to days where the high-frequency data have some unusual and puzzling features that potentially could be harmful for the realized kernel.

4.1. Sensitivity to data cleaning methods

In Table 2, we give a summary of the various effects of aggregating and excluding observations in different manners. We have carried out the analysis along two dimensions. First, we have separated data from different exchanges. Specifically, we consider trades on NYSE, PACIF, NASD and NASDAQ in isolation. We also investigate the performance of the estimator when all exchanges are considered simultaneously, which is the same as dropping P3 entirely. This defines the first dimension that is displayed in the rows of Table 2, for three of the four days we give special attention and averaged over the full sample for AA.

Our second dimension is the amount of cleaning, aggregation and filtering that we apply to the data. With reference to the cleaning and filtering step in Section 3.1, the columns of Table 2 have the following information.

P2: This is the data with a time stamp inside the 9:30 am–4 pm window, when the exchanges are open. We have deleted entries with a bid, ask or transaction price equal to zero. So, this is basically the raw data, with the only purged observations being clearly nonsensical ones.

T3.A–E: This is what is left after step T.3. The different letters represent five different ways of aggregating transactions that have the same time stamp:

  • A. First single out unique prices and aggregate volume. Then use the price that has the largest volume.
  • B. First single out unique prices and aggregate volume. Then use the price by volume weighted average price.
  • C. First single out unique prices and aggregate volume. Then use the price by log(volume) weighted average price.
  • D. First single out unique prices and aggregate volume. Then use the price by number of trades weighted average price.
  • E. Use the median price. This is the method that we used in the paper.

T4.E: This is what is left after applying step T4 to the data remaining after T3.E.

In Table 2, we present observation counts, realized variances and realized kernels. Two things are particularly conspicuous. On January 24 at PACIF, only one observation was filtered out by T4.E, yet both the realized variance and the realized kernel are quite sensitive to whether this observation is excluded; it is the only day and exchange where this is the case. In the left-hand panel of Figure 1, we display the data around this observation, and it is clear that it is out of line with the rest. Also, on May 8 at NASD, only one observation was filtered out by T4.E; here only the realized variance is sensitive to whether this observation is excluded. In the right-hand panel of Figure 1, we display the data around this observation, and again it is clear that it is out of line with the rest. Hence we conclude that T4 is useful when it can be applied, but it does not usually make very much difference in practice when RK estimators are used.

Figure 1.

Transaction prices for Alcoa Inc. over a period of 5 minutes surrounding one observation deleted by T4.E. The left-hand panel displays January 24 on PACIF, and the right-hand panel shows May 8 on NASD.

A noteworthy feature of Table 2 is how badly RV does when we aggregate data across exchanges and only apply P2, that is, when only trivial cleaning is implemented. The upward bias we see for RV based on trade-by-trade data is dramatically magnified. Some of this is even picked up by the RK statistic, which benefits significantly from the application of T3. It is clear from this table that if one wanted to use information across exchanges, then it is better to carry out RK on each exchange separately and then average the answers across exchanges, rather than treat all the data as if they came from a single source.

4.2. A regular day: May 4, 2007

Figure 2 shows the prices that were observed in our database after cleaning. They are based on the irregularly spaced time series of transaction (left-hand panels) and mid-quote (right-hand panels) prices on May 4, 2007. The two upper plots show the actual tick-by-tick series, comprising 5246 transactions and 14,631 quotations recorded on distinct seconds. Hence, for transaction data we have a new observation on average every 5 seconds, while for mid-quotes it is more often than every couple of seconds. The middle panels display the corresponding price changes; changes above 5 cents or below minus 5 cents are marked by a large (red) star and are truncated in the picture at ±5 cents. May 4 was a quite tranquil day, with only a couple of changes outside the range of the plot. The lower panels give the autocorrelation function of the log-returns. The acf(1) is omitted from the plot, but its value is given in the subtext. For the transaction series, the acf(1) is about −0.24, which is fundamentally different from that of the mid-quote series, which equals 0.088. This difference is typical for NYSE data, as first noted in Hansen and Lunde (2006). It is caused by the smoother character of most mid-quote series, which induces a negative correlation between the innovations in Y and the innovations in U. The negative correlation results in a smaller, possibly negative, bias for the RV, and this feature of mid-quote data will be evident from Figure 5, which we discuss in the next subsection. The negative bias of the RV is less common when mid-quotes are constructed from multiple exchanges; see, for example, Bandi and Russell (2006a). A possible explanation for this phenomenon was given in Hansen and Lunde (2006, pp. 212–214), who showed that pooling mid-quotes from multiple exchanges can induce additional noise that overshadows the endogenous noise found in single-exchange mid-quotes.

Figure 2.

High-frequency prices and returns for Alcoa Inc. (AA) on May 4, 2007, and the first 100 autocorrelations for tick-by-tick returns. Left-hand panels are for transaction prices and right-hand panels are for mid-quote prices. Returns larger than 5 cents in absolute value are marked by red dots in the middle panels. The largest and smallest (most negative) returns are reported below the middle panels. Lower panels display the autocorrelations for tick-by-tick returns, starting with the second-order autocorrelation. The numerical value of the first-order autocorrelation is given below these plots. A log-scale is used for the x-axis so that the values for lower-order autocorrelations are easier to read.

Figure 5.

Histograms for various characteristics of the 102 days in our sample. Left-hand panels are for transactions prices, right-hand panels are for mid-quote prices. The two upper panels are histograms for the difference between the realized kernel based on 1-tick returns and that based on five-minute returns. The panels in the second row are the corresponding plots for the realized variance. Histograms of the first-order autocorrelation are displayed in the panels in the third row. The fourth row of panels are histograms for the sum of the 2nd to the 10th autocorrelation. The 4 days for which detailed results are provided are identified in each of the histograms.

May 4, 2007 is an exemplary day. The upper panels of Figure 3 present volatility signature plots for the irregularly spaced time series of transaction prices (left-hand panels) and mid-quote prices (right-hand panels).9 The dark line is the realized kernel with the Parzen weight function and bandwidth Ĥ = c* ξ̂^{4/5} n^{3/5}, and the light line is the simple realized variance.

Figure 3.

Signature plots for the realized kernel and realized variance on May 4, 2007 for Alcoa Inc. Those based on transaction prices are plotted in the left-hand panels and those based on mid-quote prices are plotted in the right-hand panels. The horizontal line in these plots is the subsampled realized variance based on 20-minute returns. The thicker dark line in the upper panels represents the realized kernels using the bandwidth Ĥ = c* ξ̂^{4/5} n^{3/5}, and the thin line is the usual realized variance. The lower panels plot the point estimates of the realized kernel as a function of the bandwidth, H, where the sampling frequency is the same (tick-by-tick returns) for all realized kernels. The estimate of the optimal bandwidth is highlighted in the lower panels.

The lower panels of Figure 3 present a kernel signature plot in which the realized kernel computed on tick-by-tick data is plotted against increasing values of H. In these plots, we have indicated the optimal choices of H. In both plots, the horizontal line is an average of simple realized variances based on 20-minute returns sampled with different offsets. The shaded areas denote the 95% confidence interval based on 20-minute returns, using the feasible realized variance inference method of Barndorff-Nielsen and Shephard (2002). We characterize May 4, 2007 as an exemplary day because the signature plots are almost horizontal. This shows that the realized kernel is insensitive to the choice of sampling frequency. An erratic signature plot indicates potential data issues, although pure chance is also a possible explanation.
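To make the construction of such plots concrete, here is a small sketch reusing `jittered_returns`, `realized_kernel` and `bandwidth` from the earlier sketches; sampling every k-th tick is used here as a simple stand-in for the activity-based tick-time sampling described in footnote 9.

```python
import numpy as np

def rv_signature(p, skips=(1, 2, 5, 10, 30, 60)):
    """Realized variance computed from every k-th price, for several skip values k."""
    p = np.asarray(p, dtype=float)
    return {k: float(np.sum(np.diff(p[::k]) ** 2)) for k in skips}

def rk_signature(p, bandwidths):
    """Realized kernel on tick-by-tick returns for a range of bandwidths H."""
    x = jittered_returns(p)
    return {H: realized_kernel(x, H) for H in bandwidths}
```

Plotting rv_signature against k, and rk_signature against H, gives the volatility and kernel signature plots; a flat line indicates insensitivity to the sampling frequency or to the bandwidth.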

4.3. General features of results across many days

Transaction prices and mid-quote prices are both noisy measures of the latent ‘efficient prices’, polluted by market microstructure effects. Thus, a good estimator is one that produces almost the same estimate with transaction data and mid-quote data. This is challenging, as we have seen the noise has very different characteristics in these two series.

Figure 4 presents scatter plots in which estimates based on transaction data are plotted against the corresponding estimates based on mid-quote data. The upper two panels are scatter plots for the realized kernel: the left-hand plot uses tick-by-tick data and the right-hand plot is based on 1-minute returns. Both scatter plots are very close to the 45° line, suggesting that the realized kernel produces accurate estimates at these sampling frequencies, with little difference between the two graphs. The lower four panels are scatter plots for the realized variance using different sampling frequencies: tick-by-tick returns (middle left-hand panel), 1-minute returns (middle right-hand panel), 5-minute returns (lower left-hand panel) and 20-minute returns (lower right-hand panel). These plots strongly suggest that the realized variance is substantially less precise than the realized kernel. The realized variance based on tick-by-tick returns is strongly influenced by market microstructure noise, and the characteristics of that noise in transaction prices are very different from those in mid-quote prices. Thus, as already indicated, trade data cause the realized variance to be upward biased, while for quote data the bias is typically downward. This explains why the scatter plot for tick-by-tick data (middle left-hand panel) is shifted away from the 45° line.

Figure 4.

Scatter plots of estimates based on transaction prices plotted against the estimates based on mid-quote prices for Alcoa Inc. Regression lines and regression statistics are included with the 45° line.

Table 3 reports a measure of the disagreement between the estimates based on transaction prices and those based on mid-quote prices. The statistics computed in the first row for each stock are the average Euclidean distance from the pair of estimates to the 45° line. To be precise, let V_{T,t} and V_{Q,t} be estimators based on transaction data and quotation data, respectively, on day t, and let V̄_t = (V_{T,t} + V_{Q,t})/2 be the average of the two. The distance from (V_{T,t}, V_{Q,t}) to the 45° line is given by

√{(V_{T,t} − V̄_t)² + (V_{Q,t} − V̄_t)²} = |V_{T,t} − V_{Q,t}| / √2,

and the first row of Table 3 reports the average of this distance computed over the 123 days in our sample.
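For concreteness, the statistic in the first row of Table 3 could be computed as follows (a sketch; `v_trade` and `v_quote` are assumed arrays holding the daily estimates):

```python
import numpy as np

def mean_distance_to_45_degree_line(v_trade, v_quote):
    """Average Euclidean distance from the daily points (V_T,t, V_Q,t) to the 45-degree line."""
    v_trade = np.asarray(v_trade, dtype=float)
    v_quote = np.asarray(v_quote, dtype=float)
    return float(np.mean(np.abs(v_trade - v_quote)) / np.sqrt(2.0))
```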

Table 3.  Statistics that measure the disagreement between the daily estimates based on transaction prices and those based on mid-quote prices.

                      Realized kernel         Simple realized variance
                      tick      1 min         tick     1 min    5 min    20 min
Alcoa Inc (AA)
Distance              0.089     0.105         1.119    0.170    0.312    0.406
Relative Distance     1.000     1.182         12.62    1.922    3.523    4.575
American International Group, Inc (AIG)
Distance              0.020     0.038         0.458    0.061    0.088    0.132
Relative Distance     1.000     1.892         22.75    3.035    4.382    6.558
American Express (AXP)
Distance              0.079     0.060         0.578    0.133    0.166    0.248
Relative Distance     1.000     0.755         7.277    1.669    2.095    3.117
Boeing Company (BA)
Distance              0.047     0.051         0.564    0.106    0.121    0.242
Relative Distance     1.000     1.083         11.96    2.246    2.567    5.132
Bank of America Corporation (BAC)
Distance              0.028     0.070         0.620    0.050    0.084    0.345
Relative Distance     1.000     2.509         22.21    1.775    3.004    12.35
Citigroup (C)
Distance              0.033     0.052         0.722    0.080    0.139    0.250
Relative Distance     1.000     1.604         22.12    2.467    4.270    7.664

The distance is substantially smaller for the realized kernels than for any of the realized variances, and our preferred estimator, the realized kernel based on tick-by-tick returns, has the least disagreement between estimates based on transaction data and those based on quote data. The relative distances are reported in the second row for each stock in Table 3, and we note that the disagreement for any of the realized variance estimators is more than twice that of the realized kernel.

Table 4 contains summary statistics for realized kernel and realized variance estimators for the Alcoa Inc. data over our 123 distinct days. The estimators are computed with transaction prices and mid-quote prices using different sampling frequencies. The sample average and standard deviation are given for each of the estimators, and the fourth column has the empirical correlation between each of the estimators and the realized kernel based on tick-by-tick transaction prices. The table confirms the high level of agreement between the realized kernel estimators based on transaction data and mid-quote data. They have essentially the same sample mean, and the sample correlation is nearly one. The time-series standard deviation of the daily mid-quote-based realized kernel is marginally lower than that for the transaction-based realized kernel. The table also shows the familiar upward bias of the tick-by-tick trade-based RV and the downward bias of the mid-quote version. Low-frequency RV statistics have more variation than the tick-by-tick RK, while the RK statistic behaves quite like the 1-minute mid-quote RV.

Table 4.  Summary statistics for realized kernel and realized variance estimators, applied to transaction prices or mid-quote prices at different sampling frequencies for Alcoa Inc. (AA).
            Mean (HAC)      Std.     Corr.    acf(1)   acf(2)   acf(5)   acf(10)
  1. Notes: The empirical correlations between the realized kernel based on tick-by-tick transaction prices and each of the estimators are given in column 4 (Corr.), and some empirical autocorrelations are given in columns 5–8.

Realized kernels based on transaction prices
1 tick      2.401 (0.268)   1.750    1.000    0.50     0.29     −0.08    0.10
1 minute    2.329 (0.290)   1.931    0.952    0.44     0.23     −0.08    0.10
RV based on transaction prices
1 tick      3.210 (0.232)   1.670    0.916    0.44     0.25     −0.12    0.10
1 minute    2.489 (0.225)   1.555    0.969    0.46     0.28     −0.12    0.10
5 minute    2.458 (0.293)   2.001    0.953    0.40     0.26     −0.08    0.06
20 minute   2.315 (0.262)   1.745    0.878    0.30     0.22     −0.04    0.10
Realized kernels based on mid-quotes
1 tick      2.402 (0.258)   1.720    0.997    0.49     0.29     −0.09    0.09
1 minute    2.299 (0.281)   1.877    0.944    0.42     0.22     −0.08    0.12
RV based on mid-quotes
1 tick      1.897 (0.173)   1.209    0.910    0.41     0.26     −0.09    0.11
1 minute    2.398 (0.234)   1.529    0.973    0.50     0.31     −0.09    0.10
5 minute    2.464 (0.317)   2.138    0.966    0.45     0.23     −0.08    0.08
20 minute   2.286 (0.298)   2.061    0.884    0.34     0.19     −0.03    0.06

Figure 5 contains histograms that illustrate the dispersion (across the 123 days in our sample) of various summary statistics. In a moment we will provide a detailed analysis of three other days, and we have marked the position of these days in each of the histograms. As is the case in most figures in this paper, the left-hand panels correspond to transaction data and the right-hand panels to mid-quote data. The first row of panels presents the log-difference between the realized kernel computed with tick-by-tick returns and the realized kernel based on five-minute returns. The day we analysed in greater detail in the previous subsection, May 4, is fairly close to the median in all of these dimensions. The three other days, May 8, January 24 and January 26, are our examples of 'challenging days'. January 24 and January 26 are placed in the two tails of the histogram related to the variation in the realized kernel. The three other dimensions for which we provide histograms are: (second row) the log-difference between the realized variance computed with tick-by-tick returns and that computed with five-minute returns; (third row) the distribution of the estimated first-order autocorrelation; and (fourth row) the sum of the next nine autocorrelations (acf(2) to acf(10)).

Note the bias features of the realized variance that are shown in the second row of histograms. For transaction data the tick-by-tick realized variance tends to be larger than the realized variance sampled at lower frequencies, whereas the opposite is true for mid-quote data.

Next we turn to three potentially harder days that have features that are challenging for the realized kernel. These days were selected to reflect important empirical issues we have encountered when computing realized kernels across a variety of datasets.

4.4. A heteroskedastic day: May 8, 2007

We now look in detail at a rather different day, May 8, 2007. Figure 6 suggests that this day has a lot of heteroskedasticity, with a spike in volatility at the end of the day. This day is also characterized by several large changes in the price. The transaction price changed by as much as 25 cents from one trade to the next and the mid-quote price by as much as 19 cents over a single quote update. Informally, this is suggestive of jumps in the process. Although jumps can alter the optimal choice of H, they do not cause inconsistency in the realized kernel estimator.

Figure 6.

High-frequency prices and returns for Alcoa Inc. on May 8, 2007, and the first 100 autocorrelations for tick-by-tick returns. For details see Figure 2.

The middle panels of Figure 6 visualise the different behaviour of the price throughout the day. The jump in volatility around 2:30 pm is quite clear from these plots.

In spite of the jump in volatility, and possibly jumps in the price process, Figure 7 offers little to be concerned about in terms of the realized kernel estimator. Again the volatility signature plot is reasonably stable for both transaction prices and mid-quote prices, and so one can have considerable confidence in the estimate.

Figure 7.

Signature plots for the realized kernel and realized variance for Alcoa Inc. on May 8, 2007. For details see Figure 3.

4.5. A ‘gradual jump’: January 26, 2007

The high-frequency prices for January 26 are plotted in Figure 8. On this day, the price increases by nearly 1.5% between 12:13 and 12:20. The interesting aspect of this price change is the gradual and almost linear manner in which the price increases, in a large number of smaller increments. Such a pattern is highly unlikely to be produced by a semi-martingale adapted to the natural filtration. The gradual jump produces rather disturbing volatility signature plots in Figure 9, which show that the realized kernel is highly sensitive to the bandwidth parameter. This is certainly a challenging day.

Figure 8.

High-frequency prices and returns for Alcoa Inc. on January 26, 2007, and the first 100 autocorrelations for tick-by-tick returns. For details see Figure 2.

Figure 9.

Signature plots for the realized kernel and realized variance for Alcoa Inc. on January 26, 2007. For details see Figure 3.

We zoom in on the gradual jump in Figure 10. The upper left-hand panel has 96 upticks and 43 downticks. The lower plot shows that the volumes of the transactions in the period in which the price changes are not negligible; in fact, the largest-volume trades on January 26 occur in this period.

Figure 10.

The 'gradual' jump on January 26, 2007. Prices and returns in the period from 12:12 pm to 12:22 pm are shown in the two upper panels. The lower panel shows the prices and volume (vertical bars) between 11:45 am and 1:00 pm.

One possible explanation of this is that one or a number of large funds wished to increase their holding of Alcoa (perhaps based on private information), and as they bought the shares they consumed the immediately available liquidity: they could not buy more at that price because the instantaneous liquidity did not exist, and their demand could only be met by waiting for the order book to refill. If the liquidity had existed, then the price might have shot up in a single move.

An explanation of such a scenario can be based on market microstructure theory (see, for example, the surveys by O'Hara, 1995, or Hasbrouck, 2007). Dating back to Kyle (1985) and Admati and Pfleiderer (1988a,b, 1989), the idea is to model the trading environment as comprising three kinds of traders: risk neutral insiders, random noise traders and risk neutral market makers. The noise traders are also known as liquidity traders because they trade for reasons that are not directly related to the expected value of the asset. As such they provide liquidity, and it is their presence that explains what we encounter in Figure 10. An implication of the theory is that without these noise traders, there would be no one willing to sell the asset on the way up to the new price level at 12:25. So, without the noise traders, we would have seen a genuine jump in the price. Naturally, this line of thinking is speculative, and it abstracts from the fact that some market makers, including those at the NYSE, are obliged to provide some liquidity. This 'compulsory' liquidity will also tend to erase genuine jumps in the observed prices.

Mathematically, we can think of a gradual jump in the following way. The efficient price jumps at time τ_j by an amount C_j, but the observed price barely moves, so the noise must absorb the jump,

U_{τ_j} = X_{τ_j} − Y_{τ_j} ≈ U_{τ_{j−1}} − C_j.

Hence the noise process is now far from zero. As trade or quote time evolves, the noise trends back to zero, revealing the impact of the jump on X, but this takes a considerable number of new observations if the jump is quite big. This framework suggests a simple model,

U_{τ_j} = Ũ_{τ_j} + D_{τ_j} 1_{G_j},

where Ũ is covariance stationary, the indicator 1_{G_j} is one for gradual jumps and D_{τ_j} is the temporary displacement that dies out as the observed price catches up with the efficient price. Obviously, this could induce very significant correlation between the noise and the price process. Of course, not all jumps will have this characteristic. When public announcements are made, where the timing of the announcement is known a priori, jumps tend to be absorbed immediately in the price process; in those cases the indicator is zero. These tend to be the economically most important jumps, as they are difficult to diversify.
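To illustrate the mechanism, here is a small, purely illustrative simulation (all parameter values are made up): the efficient price jumps at once, while the observed price incorporates the jump linearly over several hundred ticks, so the noise first absorbs the jump and then trends back to zero, exactly the pattern sketched above.

```python
import numpy as np

rng = np.random.default_rng(0)
n, jump_at, jump_size, adjust_ticks = 2000, 1000, 0.015, 300
sigma_tick, noise_sd = 0.0004, 0.0002          # illustrative per-tick volatility and noise scale

# Efficient log-price: Brownian-type increments plus one genuine jump.
y = np.cumsum(sigma_tick * rng.standard_normal(n))
y[jump_at:] += jump_size

# Observed log-price: the jump is absorbed gradually (linearly) over adjust_ticks trades.
absorbed = np.clip((np.arange(n) - jump_at) / adjust_ticks, 0.0, 1.0) * jump_size
x = y - jump_size * (np.arange(n) >= jump_at) + absorbed + noise_sd * rng.standard_normal(n)

u = x - y   # noise: roughly -jump_size right after the jump, then trending back to zero
```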

This line of thinking encouraged us to remove this gradual jump and replace it by a single jump. This is shown in Figure 11, while the corresponding results for the realized kernels are given in Figure 12, which should be compared with Figure 9. This seems to deliver very satisfactory results. The autocorrelations are now very different after the observations between 12:13 and 12:21 have been removed (compare with Figure 8). Hence 'gradual jumps' seem important in practice and challenging for this method. We do not currently have a method for automatically detecting gradual jumps and removing them from the database.

Figure 11.

High-frequency prices and returns for Alcoa Inc. on January 26, 2007, and the first 100 autocorrelations for tick-by-tick returns, after prices between 12:13 and 12:21 are removed from the sample. For details see Figure 2.

Figure 12.

Signature plots for the realized kernel and realized variance for Alcoa Inc. on January 26, 2007, after deleting the prices between 12:13 pm and 12:21 pm. For details see Figure 3.

4.6. A puzzling day: January 24, 2007

The feature we want to emphasize with this day is related to the spiky price changes. The upper panel of Figure 13 shows this jittery variation in the price, in particular towards the end of the day, where the price moves a lot within a narrow band. We believe this variation is true volatility rather than noise, because the bid–ask spread continues to be narrow in this period, about 2 cents most of the time.

Figure 13.

High-frequency prices and returns for Alcoa Inc. on January 24, 2007, and the first 100 autocorrelations for tick-by-tick returns. For details see Figure 2.

January 24, 2007 is a day where the realized kernel is sensitive to the sampling frequency and to the choice of bandwidth parameter, H, as is evident from Figure 14. This may partly be attributed to pure chance, but we do not think that chance is the whole story here. Chance plays a role because the standard error of the realized kernel estimator depends on both the sampling frequency and the bandwidth parameter. Rather, the problem is that too large an H, or too low a sampling frequency, will overlook some of the volatility on this day, a problem that is even more pronounced for the low-frequency realized variance. We will return to this issue in Figure 15.

Figure 14.

Signature plots for the realized kernel and realized variance for Alcoa Inc. on January 24, 2007. For details see Figure 3.

Figure 15.

Transaction prices for Alcoa Inc. on January 24, 2007 at different sampling frequencies. The lower panel presents the tick-by-tick return on transaction data (dots), and the spread as it varied throughout the day (vertical lines).

Figure 14 also reveals a rather unusual volatility signature plot for the realized variance based on mid-quote prices. Usually the RV based on tick-by-tick returns is smaller than that based on moderate sampling frequencies, such as 20-minutes, but this is not the case here.

Figure 15 shows the prices that will be extracted at different sampling frequencies. The interesting aspect of these plots is that the realized variance, sampled at moderate and low frequencies, largely overlooks the intense volatility seen towards the end of the day.

Returns based on 20 minutes, say, will tend to be large in absolute value during periods where the volatility is high. However, there is a chance that the price will stay within a relatively narrow band over a 20-minute period, despite the volatility being high during this period. This appears to be the case toward the end of the trading day on January 24, 2007. The reason that we believe the rapid changes in the price are volatility rather than noise is that the bid–ask spread is narrow in this period, so both bid and ask prices jointly move rapidly up and down. Naturally, when prices are measured over such long intervals and the intraday returns happen to be small even though volatility is high, the realized variance based on sparsely sampled returns will underestimate the volatility, for the simple reason that the intraday returns do not reflect the actual volatility. This seems to be the case on this day, as illustrated in the two lower panels of Figure 15: the two sparsely sampled RVs cannot capture this variation in full, because the intense volatility cannot be fully unearthed by 20-minute intraday returns.

Because the realized kernel can be applied to tick-by-tick returns, it does not suffer from this problem to the same extent. Utilizing tick-by-tick data gives the realized kernel a microscopic ability to detect and measure volatility that would otherwise be hidden at lower frequencies (due to chance). The 'strength' of this 'microscope' is controlled by the bandwidth parameter, and the realized kernel gradually loses its ability to detect volatility at the local level as H is increased. However, H must be chosen sufficiently large to alleviate the problems caused by noise.

On January 24, 2007, we believe that K(X) ≃ 0.90 is a better estimate of volatility than the subsampled realized variance based on 20 minute returns, whose point estimate is nearly half that of our preferred estimator.

5. CONCLUSIONS

In this paper, we have tried to be precise about how to implement our preferred realized kernel on a wide range of data. Based on a non-negative form of the realized kernel, which uses a Parzen weight function, we implement it with local averaging of the data at the end points. The realized kernel is sensitive to its bandwidth choice, and we detail how to choose this in practice.

A key feature of estimating volatility in the presence of noise is data cleaning. There is very little discussion of this in the literature, and so we provide quite a sustained discussion of the interaction between cleaning and the properties of realized kernels. This is important in practice, for in some application areas it is hard to clean the data extensively (e.g. quote data may not be available), while in other areas (such as when both trades and quotes are available from the TAQ database) extensive and rather accurate cleaning is possible.

We provide an analysis of the properties of the realized kernel applied simultaneously to trade and quote data. We would expect the two to deliver similar estimates of [Y], and they do, indicating the strength of these methods.

Finally, we identify an unsolved problem for realized kernels when they are applied over relatively short periods. We call days on which it arises 'challenging days'. They are characterized by lengthy strong trends in the data, which are not compatible with standard models of market microstructure noise.

Footnotes

  • 1

    Leading references on this include Zhang et al. (2005), Zhang (2006) and Jacod et al. (2007).

  • 2

    The more famous Bartlett kernel has k(x) = 1 − |x|, for |x| ≤ 1. This kernel is used in the Newey and West (1987) estimator. The Bartlett kernel will not produce a consistent estimator in the present context. The reason is that we need both k(0) − k(1/H) = o(1) and H/n = o(1), which is not possible with the Bartlett kernel.

  • 3

    This assumes a smooth kernel, such as the Parzen kernel. If we use a 'kinked' kernel, such as the Bartlett kernel, then we need η > 1/2 to eliminate the variance and the impractical requirement that η > 1 in order to eliminate the bias. Flat-top realized kernels are unbiased and converge at a faster rate, but are not guaranteed to be non-negative. The latter point is crucial in the multivariate case. In the univariate case, having a non-negative estimator is attractive, but the flat-top kernel is only rarely negative with modern data. However, if [Y] is very small and ω² very large, which we saw on slow days on the NYSE when the tick size was $1/8, then it can happen quite often when the flat-top realized kernel is used. Of course, our non-negative realized kernels do not have this problem. We are grateful to Kevin Sheppard for pointing out these 'negative' days.

  • 4

    This means that K(X) converges to [Y] at rate n^{1/5}, which is not the optimal rate obtained by Barndorff-Nielsen et al. (2008a) and Zhang (2006), but it has the virtue that K(X) is non-negative with probability one, which is generally not the case for the other estimators available in the literature.

  • 5

    Consider, for instance, the simple case without noise and T = 1, where the realized variance Σ_j x_j² is consistent for IV and the realized quarticity (n/3) Σ_j x_j⁴ is consistent for ∫_0^1 σ_u⁴ du. With constant volatility, the asymptotic variance of the quarticity estimator is considerably larger than that of the realized variance. Further, the latter estimator is more sensitive to noise.

  • 6

    The initial two scale estimator of Zhang et al. (2005) takes this type of average RV statistic and subtracts a positive multiple of a non-negative estimator of ω², to try to bias-adjust for the presence of noise (assuming Y ⊥⊥ U). Hence this two-scale estimator must be below the average RV statistic. This makes it unsuitable, by construction, for mid-quote data, where RV is typically below the integrated variance due to its particular form of noise. Their bias-corrected two-scale estimator is re-normalized and so may be useful in this context.

  • 7

    RVsparse was suggested by Zhang et al. (2005) and has a smaller sampling variance than a single RV statistic and is more objective, for it does not depend upon the arbitrary choice of where to start computing the returns.

  • 8

    When quote data is not available, Q4 can be applied in place of T4, replacing the word mid-quote with price.

  • 9

    These pictures extend the important volatility signature plots for realized volatility introduced by Andersen et al. (2000). To construct the plots we use activity-fixed tick time, where the sampling frequency is chosen such that we get approximately the same number of observations each day. To explain it, assume that the first trade on the ith day occurred at time t_{i,0} and the last trade on the ith day occurred at time t_{i,N_i}. Approximate '60 second' sampling is then constructed by sampling every k_i-th observation in tick time, with k_i ≈ 60 N_i/(t_{i,N_i} − t_{i,0}). In this way, there will be approximately 60 seconds between observations when one takes the intraday average over the sampled intratrade durations. The actual sampled durations will in general be more or less widely dispersed.

ACKNOWLEDGMENTS

This paper was presented at the Econometrics Journal invited session on Financial Econometrics at the Royal Economic Society's Annual Meeting. We thank Richard Smith for his invitation to give it and the co-editor, Jianqing Fan and two anonymous referees for valuable comments that improved this manuscript. We also thank Roel Oomen, Marius Ooms and Kevin Sheppard for helpful comments. The second and fourth author are also affiliated with CREATES, a research centre funded by the Danish National Research Foundation.
