Determination of the most pertinent EUV proxy for use in thermosphere modeling



[1] Two major issues in the specification of the thermospheric density are the definition of proper solar inputs and the empirical modeling of thermosphere response to solar and to geomagnetic forcings. This specification is crucial for the tracking of low Earth orbiting satellites. Here we address both issues by using 14 years of daily density measurements made by the Stella satellite at 813 km altitude and by carrying out a multiscale statistical analysis of various solar inputs. First, we find that the spectrally integrated solar emission between 26–34 nm offers the best overall performance in the density reconstruction. Second, we introduce linear parametric transfer function models to describe the dynamic response of the density to the solar and geomagnetic forcings. These transfer function models lead to a major error reduction and in addition open new perspectives in the physical interpretation of the thermospheric dynamics.

1. Introduction

[2] The density and composition of Earth's thermosphere is mostly sensitive to variations of the solar irradiance in the Extreme UltraViolet (EUV, 10–121 nm) spectral range. EUV radiation heats the upper atmosphere, and an intensifying flux causes the density at a given altitude to increase. Such changes in the atmospheric density mainly affect objects in low Earth orbit, where the drag force becomes the second-largest (but secular) perturbation.

[3] Errors in upper atmosphere density models are one of the main reasons for the uncertainty in the knowledge of spacecraft and debris location, particularly so when tracking data are not available. A major source of error is the definition of the solar EUV forcing which, by lack of continuous observations until 2002, is customarily replaced by EUV proxies [Tobiska et al., 2008; Bowman et al., 2008a; Lean et al., 2009], such as the F10.7 index (the solar radio flux at 10.7 cm) and nowadays more frequently the MgII index (the core-to-wing ratio of the Mg II K-line at 280 nm). A second source of error is the static nature of the models, which precludes preconditioning due to lack of memory.

[4] Here we show how a new approach allows both sources of error for thermospheric density nowcast to be reduced. First, we determine which single solar input is most appropriate by multiscale statistical analysis. Second, we introduce an empirical convolutive model that incorporates memory effects and allows both solar and geomagnetic forcings to be described simultaneously. This methodology can easily be extended to more than one solar input.

[5] The densities are inferred from precise orbit determination of the French geodetic Stella satellite, which is in a 96 °inclination and near-circular orbit at approximately 813 km altitude. Stella is a suitable spacecraft for this kind of analysis because of its spherical shape (no attitude-related errors), the perfect knowledge of the satellite characteristics (mass, surface, reflectivity), and the very accurate laser tracking by the International Laser Ranging Service [Pearlman et al., 2002].

2. The Data

[6] In this study, we use 14 years of mean densities (from January 7, 1997 through July 31, 2010) derived from orbit perturbation analysis [Jacchia and Slowey, 1963] from Stella. The density is averaged over intervals of 24 hours. This data set has the advantage of being homogenous, with no averaging over various satellites. The first Drag Temperature Model (DTM) [Barlier et al., 1978] is based upon such measurements, which essentially tie the observed decay of the semi-major axis to a mean density, which is estimated here with a relative uncertainty of 5%. Seasonal variations, which are important at the altitude of Stella, are removed by windowed Fourier analysis.

[7] The substitutes of the solar EUV flux we consider here are: 1) the F10.7 index from Penticton Observatory, Canada; 2) the MgII index from the LASP composite; 3) the integrated flux between 26–34 nm from the SEM radiometer onboard SoHO [Judge et al., 1998]; 4) the s10.7 index, which has been built for orbitography purposes, using SEM data [Tobiska et al., 2008]; 5) Lya, the intensity of the bright Lyman-α line (LASP composite); and 5) XUV, the baseline of the daily soft X-ray flux in the 0.1–0.8 nm band (from GOES). Data gaps in SEM are interpolated using a multivariate technique [Dudok de Wit, 2011]. Geomagnetic activity is represented by the planetary geomagnetic index Ap. Here, however, the focus is on testing solar inputs and not (yet) on optimizing the geomagnetic forcing.

3. Determination of the Best Solar Inputs

[8] Different time scales of the density are also associated with different physical mechanisms: fast variations are caused by geomagnetic activity and by solar rotation modulation of the EUV flux whereas longer time scales are associated with the lifetime of active regions and solar cycle. For that reason, we first decompose all quantities into a slowly-varying (DC) and a fluctuating (AC) component: x(t) = xDC(t) + xAC(t).

[9] The DC component is traditionally computed by running the data through a smoothing filter. An 81-day cutoff time is used in thermosphere models such as JB2008 [Bowman et al., 2008b], DTM2000 [Bruinsma et al., 2003] and MSIS[Picone et al., 2002]. This smoothing, however, incorporates part of the fast variations in the DC component. This can be detrimental during geomagnetic storms, when sudden density bursts may cause the DC value to increase, see Figure 1. We recommend instead the baseline or lower envelope, which is known to provide a better description of the slowly varying component in radio observations [Schmahl and Kundu, 1998]. We extract the baseline by taking the minimum value in a sliding 21-day window and subsequently smoothing that time series with a Gaussian filter that has a 21-day full width at half maximum. The major asset of the baseline is its resilience to peaks associated with geomagnetic storms, whose signature does not have to be removed manually.

Figure 1.

Excerpt of the thermospheric density, showing the difference between the baseline and a Gaussian smoothing over 81 days.

[10] The DC component of the density is found to be both in phase with and proportional to the solar forcing. Figure 2 shows a scatter plot of the DC component of the density ρ versus that of solar inputs. To quantify the correlation between the two, we shall henceforth use two complementary descriptors: 1) Spearman's rank correlation r is a direct measure of correlation [Wilks, 2011]; we prefer it to the customary Pearson coefficient since it is invariant to nonlinear rescalings; 2) RMS is the classical Root Mean Square error, normalized to the standard deviation σρ of the density ρ, i.e., RMS = equation image 〈(ρequation image)21/2, where equation image is the modeled density; here the density is linear function of each solar proxy x, i.e., equation image = α + βx. Cross-validation is done by first estimating the model coefficients and then the RMS from independent subsets of the data. A RMS of 100% means that none of the observed variability can be described by the model. Note also that a low RMS does not necessarily imply a high correlation, and vice-versa. Both are therefore needed to assess a solar input.

Figure 2.

Scatter plot of the DC component of the thermospheric density versus the DC component of various solar inputs. The RMS and Spearman's correlation coefficient r, together with their standard deviation, are indicated for each proxy. The standard deviation has been obtained by bootstrapping. In all plots, a linear model equation imageDC = α + βxDC is used for computing the RMS. Densities are in units of [10−14 kg m−3].)

[11] Based on the combined score of the correlation coefficient and RMS, we find from Figure 2 that inputs such as the XUV flux and the sunspot number (not shown) can be readily excluded. We have selected F10.7, MgII and SEM for further analysis because they have the best scores but also because these quantities are guaranteed to remain available in the next decade and are best adapted for operational use. The scatter plots suggest that the DC component of the density ρ can be relatively well modeled using a weakly non-linear function. For that reason, we define three new proxies labelled as F (from F10.7), M (from MgII) and S (from SEM). Each one is obtained by fitting the density with a cubic polynomial equation image = α + βx + γx3. Adding a quadratic or higher order terms does not reduce the RMS significantly with respect to its uncertainty. According to the RMS criterion, the best candidate for the DC component of the density is S, followed by M and F. We find that the S proxy also properly reproduces the density drop observed between the 1995–1996 and 2009–2010 solar minima, which supports the low EUV flux as being the primary cause of the low densities observed at the end of solar cycle 23 [Solomon et al., 2010, 2011].

[12] To compare the performance of individual solar inputs for short-term variations, we consider the multidimensional scaling technique used by Dudok de Wit et al. [2009]: all quantities are displayed on a 2D (so-called correspondence map) in such a way that their distance reflects their dissimilarity, which is expressed here by their pairwise RMS. The point of interest is the relative distance between quantities, not the axes. Such maps are widely used in statistics for their ability to provide a single global picture of the similarity between all the quantities.

[13] Here, we compute the correspondence maps after using the à trous wavelet transform to first decompose the AC component of each quantity into different time scales. By doing so, we investigate the similarities at different scales. Let us concentrate on three characteristic time scales that respectively correspond to half a solar rotation (i.e., center-to-limb effects), solar rotation and long-term effects, see Figure 3. The latter is simply based on the DC component. At the shortest time scales of 1–2 days (not shown) the Ap geomagnetic index is the quantity that is located closest to the density. At longer time scales, however, the shortest distances are systematically obtained with S, followed by F or by M. From this, we conclude that the EUV flux in the 26–34 nm band, after a nonlinear rescaling, is the best overall solar proxy for the thermospheric density. The F proxy, which is based on the widely used F10.7 index, is a fallback option for time scales larger than a month, whereas the M proxy is more suitable for short-term variations. This distinction highlights the importance of distinguishing different time scales. Other inputs, such as the intensity of the H Lyman-α line and the Magnetic Plage Strength Index systematically perform more poorly.

Figure 3.

Correspondence maps for three characteristic scales. The distance between each pair of points approximates their RMS error; axes have no immediate meaning. Characters correspond to the legend used in the text.

[14] Interestingly, when correspondence maps show three aligned and closely-spaced quantities, then the quantity in the middle can be approximated by a linear combination of the two others. Figure 3 reveals that this is not the case with the density ρ, except for the largest scales. So, even though some improvement is possible by using more than one input (as in the JB2008 model), adding more inputs is unlikely to reduce the RMS further. Our representation thereby provides a strategy to determine the smallest combinations of inputs.

4. Model for the Time-Evolution

[15] Direct observations of the thermospheric density show that it does not respond instantaneously to external forcings but rather reacts with some delay. Two reasons for this are the inertia of the atmosphere and wavelength-dependent center-to-limb effects in the solar spectral irradiance. To incorporate this property in our reconstruction, we model the AC component using transfer functions by doing system identification (SI) [Ljung, 1997]. We consider a particular class of discrete linear time-invariant models called Output Error (OE), in which the modeled density equation image is a function of two inputs: u1 (solar forcing) and u2 (geomagnetic forcing), using the current date index t, and preceding days:

equation image

where Bk(z−1) = bk,1 + bk,2z−1 + ⋯ + bk,nbkz−1nbk+1, Fk(z−1) = 1 + fk,2z−1 + ⋯ + bk,nfkz−1nfk+1, and z−1 is the delay operator of the z-transform, namely z−1u[t] = u[t − 1]. OE models are widely used to model linear systems with additive noise. We use information theoretic criteria to determine the optimum order of the model, and find typically nf1 = 2, nf2 = 3, nb1 = nb2 = 3, which means that 2 to 3 past values only are needed to describe the internal dynamics of the density (described by the F(z−1) polynomials) and the response to the forcings (described by the B(z−1) polynomials).

[16] OE models bring a major improvement over classical attempts to model the density. First, they provide a rigorous framework that contrasts with the (often subjective) selection criteria used to determine past and present combinations of solar inputs. Second, both solar and geomagnetic forcings can now be described simultaneously with a single model. This is a major improvement because, so far, all attempts to isolate the thermospheric response either to solar or to geomagnetic activity were severely constrained by the necessity to consider the very few intervals during which one of the forcings could be ignored [see, e.g., Sutton et al., 2006]. So, by assuming that the response to the two inputs is linear and additive, we can now model the response of the density to any combination and any evolution of solar and geomagnetic activity levels. Incidentally, the OE model can give much deeper insight into the underlying physics, for example by providing access to the impulse or the step response of the density.

[17] Figure 4 illustrates the good fit achieved by the OE model, and compares it with the DTM2000 and JB2008 models. To quantify the performance, we summarize in Table 1 the RMS of the reconstructed density for various cases. We list three alternative models, but the OE model should be really compared to the static (i.e., memoryless) model only, because this is the only one that uses use exactly the same solar and geomagnetic inputs. The DTM2000 and JB2008 are only listed as examples of performance; the former uses the F10.7 and Ap indices, and the latter four solar and two geomagnetic inputs.

Figure 4.

Excerpt of the thermospheric density, also showing the fits from the DTM2000, the JB2008 and OE models.

Table 1. RMS of the Reconstructed Densitya
ModelSolar InputDC OnlyAC OnlyDC & AC
  • a

    RMS of the reconstructed density, for different cases: the DC part only (third column), the AC part only (fourth column) and the full density (fifth column). The static case refers to the S proxy with an instantaneous response of the density (no convolutive model). All other models use different (and generally more) inputs. Values strictly inferior to the median of each column are shown in boldface.

JB20084 inputs0.160.360.25
OEF, M, S0.160.400.26

[18] For the DC component of the density, the best performance is achieved with the S proxy from SEM. Adding a second or a third solar proxy does not bring a major improvement, which supports our hypothesis that the RMS for long-term changes cannot be reduced further by using more solar forcing terms. Differences in the RMS, however, are also likely to be caused by the intrinsic long-term variability of the density, by possible instrumental drifts and by the modeling of the seasonal variation.

[19] The main improvement occurs in the AC component, which is also the one interest here. Not surprisingly, models that use several inputs (JB2008 or OE with 3 inputs) perform best. However, the OE model with one single input only (S) does almost as good and compared to the static case, the RMS is reduced by 20%. This is the most important result of the table, as it highlights the good performance of our simple empirical model. This is again reflected in the global performance (DC & AC), from which we conclude that S is the best all-purpose solar input, way ahead of the MgII and F10.7 indices.

5. Conclusions

[20] This study shows that major improvements can still be made in the methodology used for modeling of the thermospheric density response to external forcings. Here, we focused on the solar forcing, using 14 years of daily-mean density measurements made at 813 km altitude.

[21] We find that the EUV flux in the 26–34 nm band (as measured by SoHO/SEM) does systematically better than either the F10.7 or the MgII indices. The RMS on the reconstructed density is typically 20% lower and the superiority of this proxy is observed at all time scales, including solar rotation and solar cycle. This EUV flux is presently measured by SDO/EVE and soon will be by GOES/EUVS, making it a good candidate for operational space weather applications. Our method also provides a visual strategy for selecting the best combinations of solar inputs.

[22] For the first time transfer function models have been used to describe the dynamic response of the thermosphere to the solar and geomagnetic forcings, thereby casting this problem in the rigorous framework of SI. Using a linear output error model, we find that the RMS can be reduced by 20% compared to the equivalent static model. The SI framework brings numerous additional advantages. Primarily, it allows to describe the response to arbitrary temporal evolutions of the inputs without the need to isolate periods during which one the forcings only is active; that constraint has so far been a major impediment to the analysis of the thermospheric dynamics. Second, linear transfer function models allow to estimate the impulse response of the thermosphere, which provides deeper insight into its physical characteristics. This response differs from the one estimated using flares [e.g., Sutton et al., 2006], because in the EUV flare spectra differ from daily averaged spectra. These aspects will be detailed in a forthcoming publication.


[23] This study received funding from the European Community's Seventh Framework Programme (FP7-SPACE-2010-1) under the grant agreement 261948 (ATMOP project, SB is equally supported by GRGS. The following institutes are acknowledged for providing the data: Laboratory for Atmospheric and Space Physics (Boulder), National Geophysical Data Center (NOAA), the Space Sciences Center at the University of Southern California, and Space Environment Technologies. We also thank S. C. Solomon and B. R. Bowman for their review.

[24] The Editor thanks Bruce Bowman and Stanley Solomon for their assistance in evaluating this paper.