Comparison of Three Methodologies for Removal of Random‐Noise‐Induced Biases From Second‐Order Statistical Parameters of Lidar and Radar Measurements

Abstract Random‐noise‐induced biases are inherent issues to the accurate derivation of second‐order statistical parameters (e.g., variances, fluxes, energy densities, and power spectra) from lidar and radar measurements. We demonstrate here for the first time an altitude‐interleaved method for eliminating such biases, following the original proposals by Gardner and Chu (2020, https://doi.org/10.1364/ao.400375) who demonstrated a time‐interleaved method. Interleaving in altitude bins provides two statistically independent samples over the same time period and nearly the same altitude range, thus enabling the replacement of variances that include the noise‐induced biases with covariances that are intrinsically free of such biases. Comparing the interleaved method with previous variance subtraction (VS) and spectral proportion (SP) methods using gravity wave potential energy density calculated from Antarctic lidar data and from a forward model, this study finds the accuracy and precision of each method differing in various conditions, each with its own strengths and weakness. VS performs well in high‐SNR, yet its accuracy fails at lower‐SNR as it often yields negative values. SP is accurate and precise under high‐SNR, remaining accurate in worse conditions than VS would, yet develops a positive bias under low‐SNR. The interleaved method is accurate in all SNRs but requires a large number of samples to drive random‐noise terms in covariances toward zero and to compensate for the reduced precision due to the splitting of return signals. Therefore, selecting the proper bias removal/elimination method for actual signal and sample conditions is crucial in utilizing lidar/radar data, as neglecting this can conceal trends or overstate atmospheric variability.

2 of 16 led to decades of impressive remote sensing campaigns (e.g., Li et al., 2020;She et al., 2019;Stober et al., 2021), yet to make full use of the data, advances must be made in data handling. Advances in science inevitably require the use of second-order statistics such as variances and fluxes, which are inherently biased by random-noise in the data generated during the detection processes (e.g., Chu et al., 2018;Gardner & Liu, 2014;Whiteway & Carswell, 1995). These biases grow increasingly problematic when analyzing the higher or wider reaches of lidar and radar data. To deal with these biases, various correction methods have been developed over the years (e.g., Chu et al., 2018;Whiteway & Carswell, 1995), each with its own advantages and disadvantages. These methods have not yet been compared side-by-side to assess their effectiveness under various conditions, which are therefore the subject of this work.
This study focuses on lidar-measured variance and covariance, but the same principles apply when calculating other second-order statistics using radar and lidar data. Variance is a statistic dependent on the perturbation of a value from its mean, so it is important to understand the anatomy of the perturbation, which can be represented as where the stand-in variable represents an arbitrary atmospheric parameter (such as density, temperature, zonal, meridional, or vertical wind, etc.) and and represent altitude and time, respectively. Here, ′ Total is the total measured perturbation that consists of two components: ′ which is the perturbation caused by atmospheric waves, and Δ which is the noise-induced perturbation. The ′ Total is found by subtracting the mean 0( ) = ( ) from , where the overbar denotes the sample time average over the chosen observational period. It is worth noting that for some real observations it may be suitable to subtract the sample median, instead of the sample mean, as the median can be more robust than the mean and less biased by outliers.

If calculating the variance of wave-induced perturbation Var[ ′ ( )] ≃ [ ′ ( )] 2 using the measured perturbation
On the righthand side of Equation 2, the second term is the noise-variance Var[Δ ( )] ≃ [Δ ( )] 2 and the third term is the cross-term between the wave-induced and noise-induced perturbations. Because ′ and Δ are independent and Δ is random and possesses a zero-mean, the expectation of the cross term [2 ′ Δ ] = [2 ′ ] ⋅ [Δ ] = 0 , that is, this cross-term will vanish when averaging over a large number of samples. However, the noise-variance term Var(Δ ) will remain and contribute a positive bias. Such a bias must be corrected to accurately estimate the wave-induced variance. It is important to note that all the overbars above represent the sample average over time, not ensemble average (e.g., Gubner, 2006). Statistics performed in this kind of geophysical studies have assumed that the sampled perturbations induced by atmospheric waves arise from a stationary, ergodic random process. Therefore, the sample average can be used to approximate the ensemble average , and in this case, the variance computed as the sample average like above can approximate the ensemble expectation of the variance, where "variance" is treated as a signal representing the wave strength, in the limit of very large number of samples. However, the limited number of samples not only differentiates computed variance from the theoretical expectation of signals but also introduces another consequence-the cross-term may not approach zero enough, adding additional uncertainties to the estimate of wave-induced variance.
The most direct way to isolate ( ′ ) 2 is to estimate (Δr) 2 at each altitude and then subtract it from the total variance. As the cross-term approaches zero due to non-correlation after averaging over sufficient samples, only the wave-induced variance will remain. This approach is named the variance subtraction method (e.g., Duck et al., 2001;Whiteway & Carswell, 1995). Estimating this term becomes problematic in low-SNR conditions, as large uncertainties in the estimation often result in the estimated variance-bias being greater than the geophysical variance itself, yielding a physically-impossible "negative variance" when the noise term is subtracted.
To overcome this issue, Chu et al. (2018) developed a solution called the spectral proportion method, where Monte Carlo simulations based on parameter uncertainties are used to estimate the wave-occupied proportion ( ) in the total Var( ′ Total ) at each altitude. Then the total variance is scaled down to estimate Var( ′ ) = ( )Var( ′ Total ) . By this method, there will be no negative variance induced by the waves. This technique may overestimate Var( ′ ) in high-noise scenarios as the uncertainty in determining ( ) increases substantially.

of 16
Gardner and Chu (2020) developed a new approach named the interleaved method. In this method, the return photon counts are split into two separate but interleaved samples so that two statistically independent samples probe the same air parcel over the same time period. Consequently, the variance in Equation 2 is substituted with a covariance between these two samples (see Section 2.3 for details) which no longer contains the noise-induced bias once a statistically-sufficient sample size is used. This method improves the accuracy of the variance estimate by eliminating the noise-induced bias yet decreases the precision both through increased uncertainty caused by photon count splitting and by any remaining terms containing the noise-induced perturbations which have not approached zero under a limited sample size.
Since each approach has strengths and weaknesses, this paper compares these three methods in terms of their accuracy and precision using Antarctic lidar data as well as a forward model. The lidar data used here are the Rayleigh scattering signals collected with an Fe Boltzmann lidar from 2011 to 2020 at the Arrival Heights Observatory near McMurdo Station (Chu, Huang, et al., 2011;Chu et al., 2020;. Although these techniques are demonstrated on lidar measurements, they can be applied to radar data as both are similarly affected by noise-induced biases in higher-order parameters. Additionally, this paper demonstrates an alternative way to apply the interleaved method by interleaving in altitude as opposed to time-interleaving as initially demonstrated in Gardner and Chu (2020).

Three Methodologies
This paper calculates gravity wave potential energy mass density ( pm ) to demonstrate these methods, as it is directly proportion to the atmospheric wave-induced variance discussed in the introduction and it has been studied extensively using lidar.
pm is calculated as where ( ) is gravitational acceleration and Bkg( ) is the background value of the parameter. For Equation 3 specifically, must be either atmospheric temperature or density by the definition of pm.
( ) in Equation 3 is the buoyancy frequency and its square is defined as below 2 can be calculated either from temperature gradient or from atmospheric density gradient and speed of sound Here, Bkg( ) is the background temperature, ( ) is the background density, and and are the specific heat at constant pressure and constant volume, respectively. We use the density-based 2 calculation for estimating pm from density, and the temperature-based 2 to calculate pm from temperature, though the two yielded nearly identical results.
In practice, we cannot calculate the ′ term directly but must estimate it from ′ Total , meaning that the resulting total pm is proportional to Var( ′ Total ) , that is, pm,Total ∝ Var( ′ Total ) which includes the Δ terms: In Equation 5, the last two terms are not wave-induced but introduced by noise, with the second introducing a positive bias and the third term introducing additional noise. Although pm is used as the demonstration here, atmospheric kinetic energy has similar issues with noise-induced bias, as ∝ 1 2 [( ′ ) 2 + ( ′ ) 2 ] where ′ and ′ are the wave-induced wind perturbations, and therefore can be treated similarly to pm .

Noise-Variance Subtraction
As seen in Equation 5, when the pm is calculated using ′ Total , there are noise terms present alongside the wave-induced pm. Even after averaging sufficient samples to drive the third term of Equation 5 to zero, there is still a bias caused by the noise-induced variance. In response, the noise-variance subtraction method was introduced, 4 of 16 which estimates and subtracts the Var(Δ ) term from the total variance. Whiteway and Carswell (1995) derived the following expression for the noise-variance: where is the number of short-average profiles used within one observational sample, is the uncertainty in the parameter . The variance subtraction-corrected pm is calculated as The limitations of this method become obvious when attempting to use it on noisy data. Here, growing uncertainty in parameter error can cause Var(Δ ) to become so large that it exceeds Var( ′ Total ) , producing a negative value for pm,VS , a non-physical result (see Sections 3 and 4).

Spectral Proportion Method
These negative values led to the development of the spectral proportion method in Chu et al. (2018) which eliminates the possibility of negative pm . This method also uses the calculated total variance, and then scales it down to estimate the wave-induced energy. As demonstrated in Chu et al. (2018), we first perform a Monte Carlo simulation by constructing 1,000 copies of ( ) , adding normally-distributed noise onto them with a standard deviation equal to the measurement uncertainty at that altitude and time, and calculating the perturbations for each of these 1000-fields individually. Then, for each altitude, the 1D-FFT of the perturbations is calculated for each field and averaged over all 1,000 iterations. The noise floor level is then estimated from the averaged-spectral plots shown in Figures 1a-1d by taking the average of all minima locations above frequency ( ) = 0.1 hr −1 , ignoring the highest minima. We only use minima above = 0.1 hr −1 to avoid the influence of any spectral filters applied during the background subtraction, and we ignore the highest minima to ensure that we do not include any troughs which are actually above the noise floor.
After the noise floor is determined, the proportion of wave energy occupying the total energy ( ) is calculated as Examples of these averaged spectra and noise floors can be seen in Figure 1 for a variety of altitudes while the ( ) is illustrated in Figure 1e. Note that the floor-level increases with increasing altitude as the corresponding SNR decreases. As a result, ( ) decreases with altitude in general. This ( ) profile is then used to scale down pm,Total so that the wave-induced pm is obtained as This method can, however, introduce positive-bias under high noise, which is discussed further in Sections 3 and 4.

Interleaved Method
The common idea of the previous two methods was based on the total variance calculation that includes both wave and noise-induced variances, and then each method employed some algorithm to either remove the estimated noise-variance or scale down the total variance to estimate the wave-induced variance. The interleaved method instead eliminates the noise-induced bias altogether by calculating the covariance of simultaneous, collocated samples taken in a way such that the noise-terms are driven toward zero. Gardner and Chu (2020) point out that this bias elimination would optimally be done using two adjacent lidars, but such a setup would be complex, expensive, and uncommon. The interleaved method they propose instead introduces a practical way to achieve the same bias-elimination using a single lidar (see diagrams in Figure 2). Gardner and Chu (2020) have demonstrated interleaving time bins (Figures 2a and 2c) for the covariance calculation but suggested that in many lidar systems it may make more sense to apply it to adjacent altitude bins. Here we describe the altitude-interleaving method.
Implementation of this altitude-interleaving process (Figure 2e) is best described by comparing it to a standard lidar data processing approach. In the standard data processing, the photon counts from n-adjacent fine bins, in height, are summed into a single, coarse bin with a height of in order to improve the SNR of data. After deriving from these coarse bins across the entire observation, the data is processed to yield perturbations and variance. In the interleaved method the bins are summed into two separate groups: one a sum of the odd numbered fine bins, and one a sum of the even numbered fine bins. These groupings are then individually processed to generate two distinct sets of atmospheric parameters and from which perturbations ′ Total and ′ Total are derived. 6 of 16 Each of these perturbations has the structure of Equation 1, comprised of both wave-induced and noise-induced perturbations. We then compute the covariance between these two sets of perturbations as where and are altitudes representing samples A and B and are separated by vertical distance , regardless how large the number is. It is worth noting that ′ and ′ , the wave-driven atmospheric perturbations, are highly correlated as they are measured at the same time over approximately the same altitude, shifted by a small value . Therefore, their covariance (first term in the righthand side of Equation 10) should be very similar to the variance ( ′ ) 2 . In contrast, Δr A and ΔrB are statistically independent random perturbations with zero means so their covariance and terms containing them, that is, the last 3 terms in Equation 10, will approach zero when averaging over a sufficient number of samples. If the number of samples is sufficiently large, all three terms will drop out, leaving a wave term without noise-induced bias.
The difference between the wave-induced variance and covariance depends on the level of correlation between ′ and ′ . If ′ and ′ are taken simultaneously over the exactly same altitudes, then their covariance is exactly equal to the wave-induced variance. With the small shift in altitude (similar to the small shift in time when using the time-interleaved method), the covariance is slightly smaller than the variance as theoretically derived in Gardner and Chu (2020). However, if the shift gets larger, the covariance will be considerably lower than the variance. Early works which suggested the viability of this covariance-substitution (Gardner & Liu, 2014) took covariance between samples of alternating coarse-bins, which is termed "time-lagged method" here as shown in Figures 2b and 2d, instead of generating two subsamples created at the fine-bin level as is done in the interleaved method (Figures 2a and 2c). The downside of this time-lagged approach is that the two coarse-bins are separated by a large time shift weakening the correlation in the first term of Equation 10 and yielding a falsely-lower covariance. In the time-interleaved method, the data is split so that samples A and B are measuring the same parcel of atmosphere over nearly the same time. By minimizing as demonstrated in Figures 2a and 2c, the difference between Var( ′ Total ) and Cov( ′ Total ′ Total ) is minimized. That difference can be corrected by a correction factor provided in Gardner and Chu (2020) which is often minimal and can be ignored as long as the interleaved method is used properly. This same theory applies when interleaving in altitude (see Figure 2e). By minimizing the correlation between ′ and ′ is maximized, meaning that the covariance can now be substituted into Equation 3 to calculate pm free of photon-noise-induced bias: This flexibility in interleaving-direction allows for the interleaved method to be used on data taken by many different lidar systems. For example, the Na Doppler lidar used as an example in Gardner and Chu (2020) saves its raw data in time intervals of 4.5 s, which means the data can be very finely time-interleaved, while the Fe Boltzmann lidar used for this study saves its data in time intervals of 1 min, likely too coarse to use a time-interleaved approach. However, the Fe Boltzmann lidar saves its counts at a high-vertical resolution of 48 m, allowing the interleaved method to be utilized altitudinally.
Moreover, as [Δ ] = 0, interleaving in derived atmospheric parameters (e.g., wind velocities , , ) is equivalent to interleaving in raw data (e.g., radar power or lidar photon counts). Therefore, when applying the interleaved method to radar measurements, one may choose to interleave high-resolution , , in time or altitude domain if interleaving the raw return signals is challenging. By averaging together odd and even (in either time or altitude) wind values, respectively, to create two independent samples, for example, and , the covariance Cov( , ) will be bias-free and can be used in place of variance. Researchers using radar techniques like coherent detection may be able to cleverly implement this interleaving idea in the raw data (more analogous to the photon count interleaving for lidars).

Error Analyses
Results obtained from these three methods are compared in Figure 3, where four cases are shown-winter and summer observations representing high and low SNRs, respectively, with both large and small samples sizes. By comparing these cases, one can get a sense for how the accuracy and precision of each method respond to increased sample size, as well as how they behave in varying noise levels. Before discussing these results in Section 4, it is necessary to introduce the analyses of uncertainty in precision and bias in accuracy.

Precision (Uncertainty) Analysis
The measurement uncertainty (Δ ) of parameter , caused by photon noise and other random noise in lidar detection or by similar random noise in radar detection, introduces uncertainties in the second-order statistics (such as Var( ′ ) and pm ) through the error propagation procedure. If the small errors in 2 and 2 Bkg are neglected, the pm uncertainty, Δ pm , is proportional to ΔVar( ′ ) -the uncertainty of Var( ′ ), according to Equation 3: We tabulate the uncertainty equations in Table 1, among which Equations 18-20 consider the photon-noise-induced uncertainty only. That is, only the error propagation from Δ is considered while any statistical error (see below) is omitted. The uncertainty in the variance subtraction method is calculated using the noise-variance given , and the spectral proportion method's uncertainty is found using ( ) as in Chu et al. (2018). In the interleaved method, the uncertainty is increased because of the reduction in SNR caused by splitting the photon counts into two groups. That splitting manifests itself as a √ 2 on thus there is a factor of two in Equation 20. This factor decreases the precision of the measurement, necessitating a larger sample size to reduce the overall uncertainty.
Equations 18-20 are suitable for estimating the random-noise-induced uncertainty for pm of individual measurements. If regarding each individual measurement of pm as a sample of the overall stationary ergodic signal, then the statistical error of sampling needs to be taken into account when taking the sample time average. Gardner and Chu (2020) have presented extensive analyses on such overall uncertainties of the estimated sample variances and covariances. Here, we apply their results of ΔVar( ′ ) and tabulate the total Δ pm equations as Equation 21 through Equation 23. The statistical error component, the first term of Equations 21-23, relates the individually measured atmospheric variances to the presumed stationary signal and expresses uncertainty based on the observational length and the correlation time of the parameter ′ , that is, the number of total independent samples taken during the observations for the sample average. Driving this statistical error toward zero requires a large number of samples, so this statistical error introduces a persistent value to the calculated statistical uncertainty.

Accuracy Analysis With Forward Modeling
While the precision of each method can be assessed from Figure 3 and Table 1, the analysis in Section 3.1 does not give much insight into the accuracies of these methods. To address this issue, a forward model was developed to test their performance. As the modeled wave energy is known, it is possible to assess any potential systematic bias introduced by each of the three methods, in addition to the assessment of their uncertainties in precision.
First, the atmospheric number density and wave-induced perturbations in this density field are modeled as a background with wave-induced perturbations: The background density field 0 ( ) is taken from the NRLMSISE-00 model (Picone et al., 2002). The perturbations ′ ( ) are modeled as a superposition of two plane-waves with downward phase progression (as shown in  (22) Interleaved method Note. , the correlation time of temperature or density perturbations (∼1 hr); obs , the total observation time length; Δ , the time resolution of data.   (Zhao et al., 2017). To replicate the growth of gravity wave strength with altitude due to exponentially decreasing background density, the wave amplitudes scale as where and 0 denote altitude and a reference altitude, respectively.
Next, we simulate the photon return from a lidar shooting vertically into this modeled density field. The photon counts are generated by where the relative perturbations are defined as Equation 15 is essentially a modification of the general lidar equation for Rayleigh scattering as written in Chu and Papen (2005) with the addition of the density perturbations. ( 0) is representative of typical photon counts observed by the Fe Boltzmann lidar at a reference altitude (typically around 0 = 30 km ). After including background photons resulting from the sunlight or star light, we get Bkg represents the photon counts from these unwanted sources and is estimated by averaging the Fe Boltzmann lidar's photon counts from 150-180 km over many observations during the appropriate season. Poisson-distributed noise is then added to Total( ) by using the photon count at each altitude-time grid point as the rate parameter for a Poisson-distributed sampling, resulting in received( ) , which is a matrix of noisy counts resembling those received by the photon counting system.

received(
) is then treated as if it were a real lidar observation and processed as usual using all three bias-removal/elimination methods. Wave( ) is processed identically to received to generate a reference for the true, modeled pm in the field with no bias from photon noise. This simulation process was repeated with the same wave spectra many times, each iteration with distinct, random noise, and the results from each method were averaged to create Figure 5. We used the same spectra in generating the wave-field in each of these plots to satisfy the stationary signal requirement (discussed in Section 1) for averaging large amounts of data, which also serves to minimize any additional effects which could obscure the accuracy of the methods.

Comparison of Three Methods
We now analyze how the three methods perform under different conditions of SNR and number of samples, in terms of their accuracy and precision. We define accuracy as how close the results are to the true atmospheric pm and precision as how well the results are determined (with no reference to the true values) as well as how repeatable the results would be (Bevington & Robinson, 2003). In this dataset, the summer measurements have lower SNRs than winter due to the full sunlight under which the data was collected. Additionally, as this data is derived from Rayleigh scattering, higher altitudes correspond to lower SNRs due to the exponential decrease of atmospheric number density with increasing altitude. Both temperature and density can be derived from these Rayleigh scattering signals. For this study, we use density, but these same pm values can also be derived from temperature. This relation is explored further in Appendix A.
We first compare the precision of these methods under various conditions. The precision of each method is largely determined by SNR and is increased with the use of a greater sample size (as the uncertainty is decreased). This is evidenced by Equations 18-23, as the Δ term (which reflects SNR) scales directly with the uncertainty, and the Δ obs term scales down this uncertainty as the number of observations increases. In Figures 3a and 3e, the variance subtraction and the interleaved method cannot be scientifically interpreted as shown due to their negative pm values. The spectral proportion method appears to show a realistic trend, even when using the small sample size. In Figures 3c and 3g, as more samples are incorporated, the variance removal and spectral proportion method continue to show similar, yet more precise trends. The interleaved method now yields nearly-entirely positive results which trend at a lower pm than the spectral proportion derived results.
We then assess the accuracy of the methods using the forward modeled results in Figure 5. The performance of the variance subtraction method in all conditions clearly shows that this method has low accuracy under Figure 5. Application of the three bias-removal/elimination methods to the forward-modeled lidar density measurements. Density derivation, pm calculation, and plotting procedures are identical to those used for Figure 3 except the data used here is from the forward model introduced in Section 3.2. The profile labeled "Forward Model" in each subpanel shows the pm as calculated directly from the modeled wave field, which is treated as the reference, that is, a bias-free profile.
11 of 16 high-noise due to its negative values in Figure 3 and its strong departure from the modeled pm seen in Figure 5. While the spectral proportion method yields a precise profile in all cases, Figures 5b and 5d show that it overestimates pm when the SNR becomes low, as evidenced by the departure from the modeled pm . While the interleaved method showed an especially noisy result under a small sample-size, Figure 5 shows that this noisy profile still generally centers around the modeled pm , with the same remaining true when using a larger-sample size.
The negative-bias of variance subtraction method is due to the uncertainty in the estimation of the noise-variance. As the noise-variance is computed using the temperature or density error as in Equation 6, large uncertainties in these error values inevitably occur near the top of the measurement (where the SNR has significantly declined) which cause the noise-variance to increase dramatically. When subtracted from total variance via Equation 7 these often yield negative variances and thus, physically-impossible negative pm values. Alongside the positive bias caused by the second term of Equation 2, noise can be introduced by the third term if the sample-size was insufficient to drive it close to zero. While these tests show poor performance from the variance subtraction method at higher-altitudes, the plots still show that it is a valid approach under high-SNRs. It has been successfully utilized many times before for lower-altitude studies (e.g., Chu et al., 2009;Duck et al., 2001;Yamashita et al., 2009).
The positive bias of spectral proportion method seen in Figures 3g and 5d is caused by high-noise in the initial sample contaminating the spectra at a given altitude. Looking at Figures 1a-1d, there is a regularly occurring peak near ≈ 0.18 hr −1 , yet Figure 1d shows additional peaks which are not present at the other altitudes. These could be due to a localized wave present at this altitude throughout the signal, but extensive testing and modeling has revealed that these peaks can be caused by strong noise present in the observation. As this noise-induced peak disguises itself as a wave-induced peak, any noise-floor determination method which captures the energy in the wave-peaks will also capture the energy in the noise-induced peak, causing an overestimation in the wave-energy calculated in the ( ). This systematic underestimation of the noise floor reflects how strong noise is able to cause a positive bias in the form of an overestimated ( ) when using the spectral proportion method, leading to an overestimated pm on noisy data. The interleaved method does not suffer from either positive or negative bias and generally remains centered around the modeled pm . Its core drawback is the increased uncertainty due to the splitting of the photon counts into two subsamples and additional noise contributed by the final three terms in Equation 10 (if they have not been driven to zero). The bin splitting reduces the signal level in each coarse bin by half, increasing overall uncertainty by √ 2 . This increase in uncertainty is significant, and, coupled with the last three terms in Equation 10, can often lead to the derived pm displaying negative values. Without enough samples to beat down the uncertainty to a certain level and zero-out the noise terms from Equation 10, single-observation results from the interleaved method may not be scientifically meaningful. Additionally, remaining noise-terms which did not approach zero due to a small sample size will affect the resultant pm with the capability to strongly offset the value calculated by the covariance. The stronger the noise-terms, the more samples are needed to drive the noise terms toward zero and remove their influence. Figure 6 further illustrates these trends by showing the behavior of an entire month of summer observations which is overlayed with its mean. For many of these individual runs, negative values may occur at many altitudes, but given a large enough samples size, the mean value becomes positive. However, even with a large number of samples as shown in Figure 3g, the uppermost bin is still negative, and obviously deviates from the trend established by the bins below. This result reinforces that lower-SNR increasingly necessitates the use of a larger sample size to compensate for the bin-splitting-induced uncertainty increase as well as to facilitate the driving of the noise terms toward zero. Appendix B further elaborates on the development of this precision increase with the increasing sample size.

Comparison to Previous Results
We now compare the performance of the interleaved method to that of the spectral proportion in Figure 7 by replicating results from a previously published study. In Chu et al. (2018), lidar data from 2011 to 2015 taken by the McMurdo Fe Boltzmann lidar was processed with the spectral proportion method to yield pm . One of the major findings from that study was the strong asymmetry between summer and winter pm . That study found significantly stronger pm in winter as opposed to summer, with this pattern repeating annually. The comparison in Figure 7 has demonstrated the repeatability of the seasonal asymmetry observations. The interleaved method generally agreed with the trends attained from the spectral proportion method. While some of these interleaved values are negative, it is now known from Section 4 that we must average over many samples (or apply a fit, as we do here) to reveal the true trend. Not only are the observational results replicated by the interleaved method, but since the interleaved method is more accurate in the high-noise summer and does not overestimate pm like the spectral proportion method, the summertime pm derived using the interleaved method yields lower summer minimums, and in the winters, a slightly higher maximum. This result means that the major conclusions of Chu et al. (2018) are strengthened by the use of the interleaved method.

Conclusions and Recommendations
Random-noise-induced biases are inherent issues to the accurate derivation of second-order statistical parameters (such as temperature, wind and species variances, momentum, heat and constituent fluxes, potential and kinetic energy densities of atmospheric waves, and power spectrum estimates) from lidar and radar measurements. As the boundaries of existing research expand, powerful techniques for removal of such biases must be developed to take full advantage of data collection campaigns. The variance subtraction, spectral proportion, and interleaved methods are all viable means to correct for the biases, yet the performance of each method varies depending on the conditions under which they are applied.
Based on the comparisons using the lidar observational datasets from Antarctica as well as the forward-modeled cases, we draw the following conclusions. The variance subtraction method is best used with high-SNR observations, as it is easily biased-negatively by noise in the data. It provides a precise, yet not always accurate, measurement of atmospheric variance even with a relatively-small sample size. The spectral proportion method is more robust, yielding precise and accurate measurements of variance in significantly noisier data than the variance subtraction method, and also does not rely on a large sample size (Chu et al., 2018). However, it begins to display a positive-bias under high-noise conditions. The interleaved method is the only method which will intrinsically not have a bias because it eliminates the random-noise-induced biases utilizing two statistically independent datasets that cover the same altitude range and time period . However, such improved accuracy is attained at the price of reduced precision, necessitating a much-larger sample size than the others even for high-SNR measurements.
This work is the first demonstration of altitude/range-interleaved method for deriving second-order statistics, following the original proposal by Gardner and Chu (2020). Interleaving in altitude (or range) bins provides two statistically independent samples over the same time period and altitude range even if the original raw data were not saved in high temporal resolutions but sufficiently high spatial (range) resolutions. Therefore, the 13 of 16 altitude/range-interleaved method provides a suitable solution to many current and historic lidar and radar datasets for accurately deriving variances, fluxes, wave energy densities, and power spectrum estimates, etc.
Given the overall considerations we recommend applying the interleaved (either in time or in altitude/range bins) method and the spectral proportion method in real applications because they are superior to the noise subtraction method as demonstrated in this work. When the application goals are to derive statistically mean profiles with high accuracy and/or there are a large number of samples, the interleaved method would be the best choice because it inherently eliminates the noise-induced biases to give the highest accuracy while the large sample size reduces uncertainties to ensure sufficient precision as well. However, if the application goals are to derive second-order statistics within a small number of samples and then study the time evolution of such statistics over month, season, and/or year (i.e., non-stationary signals in longer time periods), the spectral proportion may be a better choice for its higher precision and the ability to handle small sample sizes with a caveat of potentially positive-biases in high-noise conditions. Applying the proper bias-removal method can unlock the full potential of a dataset, allowing retrieval of second or higher-order parameters into lower-SNR regions of the data. Additionally, it can reveal trends in the data that may otherwise be concealed by the bias, such as the seasonal asymmetry demonstrated prior, or altitudinal trends which have not yet been discovered.

Appendix B: Development of Precision With Sample Size in the Interleaved Method
A primary downside of the interleaved method is the reduction of precision caused by splitting the samples into two groups. This is best countered by increasing the number of samples used for the measurements, which reduces the uncertainties roughly by a factor of 1 √ , where is the number of independent samples used (Bevington & Robinson, 2003). Figure B1a demonstrates this by showing the development of interleaved profiles with the inclusion of additional samples, where it can clearly be seen that additional samples reduce the noise-induced uncertainties, particularly those present at high altitudes where the SNRs are low. To quantify the uncertainty reduction, we define the root-mean-square (RMS) errors as the difference between the pm derived from all samples and the pm derived from a smaller sample size. Such RMS errors are similar to the total uncertainties given in Table 1, which include both the statistical errors and the random-noise-induced errors. Figures B1b and B1c show that the RMS errors in the altitude ranges of 30-69 km and 30-50 km, respectively, decrease with the increasing sample length in hours at a trend proportional to 1 √ , with only minor deviations from this trend due to non-uniform SNRs in individual samples. For our data in July at Antarctica, after 200 hr of data were averaged (i.e., 100, 2-hr samples), the uncertainty was already decreased by ∼80%. For any observations, the exact number of samples needed to improve the measurement precision depends on the general quality of the actual data, especially its SNRs, as well as the correlation time of the measured atmospheric parameters (see Table 1). Figure A1. Demonstration of the interleaved and spectral proportion methods applied to pm derivations using relative temperature and density perturbations, respectively. Results from the variance subtraction method are not plotted as its values are invalid for much of the range. (a) pm profiles from 186 hr of data from July 2019 at McMurdo. (b) pm profiles from 180 hr of data from January 2019 and 2020 at McMurdo. The densitypm data here is calculated identically to the other pm figures, and the temperaturepm is derived at the same resolution (Δ = 1 km, Δ = 2 hr ) as the density product.