Qualitative and Quantitative Assessment of the SET HASDM Database

The High Accuracy Satellite Drag Model (HASDM) is the operational thermospheric density model used by the US Space Force Combined Space Operations Center. By using real‐time data assimilation, HASDM can provide density estimates with increased accuracy over other empirical models. With historical HASDM density data being released publicly for the first time, we can analyze the data to compare dominant modes of variability in the upper atmosphere as modeled by HASDM and the Jacchia‐Bowman 2008 Empirical Thermospheric Density Model (JB2008), a Jacchia family model upon which density corrections are made as a part of the HASDM framework. This model comparison is conducted through the use of principal component analysis (PCA) which shows the increased variability of the HASDM dataset. We highlight HASDM's ability to capture the movement of lighter species during solar minimum conditions, unlike many empirical models. We then compare density from both models to the CHAllenging Minisatellite Payload (CHAMP) and Gravity Recovery and Climate Experiment (GRACE) accelerometer‐derived density estimates. This comparison shows that HASDM more closely matches the accelerometer‐derived densities with mean absolute differences of 30.93% compared to CHAMP and GRACE‐A, respectively. The comparison also reveals improved representation of cooling mechanisms due to NO and CO2 by the HASDM database.

The thermosphere is a dynamic, highly driven system impacted by external forces (e.g., space weather events) and internal dynamics (e.g., species transport). Solar irradiance is a major source of variation, providing the baseline average density (Qian & Solomon, 2012). This process is well-represented by solar indices, particularly at low latitudes (Vickers et al., 2014). However, these indices are not adequate in characterizing the thermosphere during solar minimum , when composition changes and other processes become more relevant (Mehta et al., 2019). During events like solar flares or coronal mass ejections, mass and energy from the Sun interact with the magnetosphere causing Joule heating and auroral particle precipitation into the thermosphere (Deng et al., 2013;Fedrizzi et al., 2012). This causes sudden and often large changes in mass density. Due to a lack of pre-storm conditioning, empirical models frequently under-perform during these events (Bruinsma et al., 2021). The thermosphere consists of multiple neutral species that are influenced by these processes. Global circulation causes interhemispheric transport of lighter species resulting in latitudinal variations in species and neutral density (Qian & Solomon, 2012). In addition to horizontal species movement, upwelling and downwelling can transport species vertically impacting neutral density as a function of altitude. Horizontal and vertical transport mechanisms have been recently investigated by Sutton (2016). Certain species (e.g., nitric oxide (NO) and carbon dioxide (CO 2 )) provide cooling mechanisms, particularly in response to geomagnetic activity (Houghton, 1970;Kockarts, 1980;Mlynczak et al., 2003).
Over the past six decades, the scientific community has developed and advanced thermospheric density models. A significant subset of these models are empirical. Empirical models use long-term trends from measurements over an array of instruments to fit parametric equations that describe the system. Even within this subset, there are multiple families/series of models that use different types of measurements and have evolved over decades. Three of these series, discussed by Emmert (2015), are the mass spectrometer incoherent scatter radar (MSIS; Picone et al., 2002), DTM (Bruinsma, 2015), and Jacchia series . MSIS models typically use mass spectrometer and incoherent scatter radar measurements but have evolved and now incorporate additional data (e.g., accelerometer-derived density estimates). The Drag Temperature Model (DTM) series used orbit-derived density data but more recently incorporated accelerometer-derived density and mass spectrometer data. The Jacchia series of models (e.g., Jacchia-70 and the Jacchia-Bowman 2008 Empirical Thermospheric Density Model (JB2008)) strictly use both orbit-and accelerometer-derived density estimates. The availability of accelerometer-derived density estimates has been advantageous for model development and assessment. Over the lifetime of satellites with on board accelerometers (e.g., CHAllenging Minisatellite Payload (CHAMP) and Gravity Recovery and Climate Experiment (GRACE)), we accumulate measurements over many altitudes and space weather conditions (Bettadpur, 2012;Luhr et al., 2002). Researchers have used these measurements to derive density estimates by removing accelerations from other sources (e.g., gravity and solar radiation pressure, SRP; Calabia & Jin, 2016b;Doornbos, 2012;Sutton, 2008). HASDM was developed by Storz et al. (2005) and is an assimilative extension of the Jacchia (1970) upper atmosphere density model. HASDM employs dynamic calibration of atmosphere (DCA) which uses calibration satellite observations to make corrections to its background empirical density model. This assimilation technique was introduced as an application for HASDM by Casali and Barker (2002), but was expanded later to estimate 13 global density correction parameters (Storz et al., 2005). HASDM is not available for public use, but the global density outputs from the model were recently released to the public for the first time by Space Environment Technologies (SET; Tobiska et al., 2021). It is called the SET HASDM density database. This database contains three-dimensional (3D) density grids from the start of 2000 to the end of 2019 at a 3-h cadence. The HASDM dataset has a spatial resolution of  15 longitude,  10 latitude, and 25 km in altitude from 175 to 825 km.
In this work, we will leverage principal component analysis (PCA), also referred to in literature as empirical orthogonal function analysis or proper orthogonal decomposition, in order to study and compare the most dominant sources of variance within the HASDM dataset and spatiotemporally matched JB2008 model outputs. The resulting modes of variation and temporal coefficients give insight into the processes that drive the variance within the models. A similar methodology has been used to analyze thermospheric density datasets previously and is often used in the development of reduced-order models (Gondelach & Linares, 2020;Mehta et al., 2018;Mehta & Linares, 2017). PCA has also been used to study satellite accelerometer datasets (Calabia & Jin, 2016b;Lei, Matsuo, et al., 2012;Matsuo & Forbes, 2010). For this article, the use of PCA is restricted to a quantitative (determining correlation) and qualitative (examining modes and coefficients) investigation.
This article is organized as follows, we start by detailing the HASDM and JB2008 models and their drivers. Then, we discuss the use of PCA as an investigatory tool followed by the results of the analysis. After, we compare the HASDM and JB2008 densities to CHAMP and GRACE density estimate over the entire availability of their measurements, with a focus on storm-time and quiet conditions.

JB2008
The most recent in the Jacchia series is the JB2008 density model. JB2008 was an improvement to its predecessors and incorporated new solar and geomagnetic indices to drive the model. It uses the 10 F , 10 S , 10 M , and 10 Y indices and proxies to model variations caused by solar heating. In addition to p a , JB2008 utilizes Dst to improve model density during geomagnetic storms. These indices are used in temperature corrections, semiannual functions, and new Dst temperature equations. The model reduced non-storm density errors by  5% and reduced storm-time density errors from Jacchia-70 by  60%, from NRLMSIS by  35%, and from JB2008 (with only p a ) by 16% .

HASDM
Using JB2008 as the background density model, HASDM is able to further reduce these errors. By building on the density correction work of Marcos et al. (1998) and Nazarenko et al. (1998), HASDM can provide dynamic global density corrections via 13 global temperature correction coefficients through its DCA algorithm. HASDM also exploits a prediction filter for its DCA corrections. Through this filter, the model adjusts an extrapolated time series of 27 days (one solar rotation) for the correction coefficients using wavelet and Fourier analysis (Storz et al., 2005). For satellite trajectory estimation, HASDM uses a technique, called segmented solution for ballistic coefficient (SSB), that enables the estimated ballistic coefficient to deviate over the fitting period. At the time of the original HASDM publication, it used approximately 75 calibration satellites for fitting. As of 2001, the satellites spanned from 190 to 900 km in altitude with a majority being between 300 and 600 km (Bowman & Storz, 2003). The temperature correction coefficients produced by HASDM are not released to the public. However, SET recreates the HASDM output weekly for validation, and this recreation makes up the SET HASDM density database. For more details on HASDM, the reader is referred to Storz et al. (2005), and for more details on SET's validation and on the database, the reader is referred to Tobiska et al. (2021). While HASDM is an assimilative correction scheme to a background model, we refer to the SET HASDM dataset as a model for convenience.

Model Drivers
The most common solar proxy used in density modeling is 10.7 F , referred to in this article as 10 F . Originally identified and measured by Covington (1948), 10 F serves as a proxy for solar extreme ultraviolet (EUV) emissions which deposit energy into the thermosphere. The 10.7 in the subscript refers to the 10.7 cm wavelength of the solar radio flux being measured. While this does not directly interact with the Earth's atmosphere, it has been shown to be a reliable proxy for thermospheric heating  10 Wm Hz ) indicated as sfu.
The 10 S index characterizes the integrated 26-32 nm solar EUV emission, which penetrates into the middle thermosphere and is absorbed by atomic oxygen . While the emissions that 10 S represents have no relationship to the 10.7 cm wavelength, they are normalized and converted to sfu through linear regression. Similar fits are done for 10 M and 10 Y to convert to uniform units. 10 M is a proxy representative of far ultraviolet photospheric 160 nm Schumann-Runge Continuum emissions and is consistent with molecular oxygen dissociation in the lower thermosphere . The final solar driver for JB2008 is 10 Y , which represents X-ray emissions in the 0.1-0.8 nm range and H Lyman- 121 nm emissions. During solar maximum, the X-ray emissions are more heavily weighted, and LICATA ET AL.

10.1029/2021SW002798
3 of 20 the opposite is true for solar minimum. For each of these four solar drivers, 81-days centered averages are generated and used for prediction in JB2008.
The first of the two geomagnetic drivers for JB2008 is the geomagnetic planetary amplitude, p a . It has a 3-h cadence and is often used in density models. However, using Dst during geomagnetic storms results in increased accuracy over p a for density modeling . The Dst index is largely driven by the strength of the ring current in the inner magnetosphere. This makes it an ideal indicator of ring current strength and therefore geomagnetic storms (Ganushkina et al., 2017).
For operational use of HASDM, forecasts of these drivers are required. SET provides the driver forecasts using multiple algorithms/sources. The solar drivers are forecasted using the SOLAR2000 algorithm (Tobiska et al., 2000). p a forecasts come from the National Oceanic and Atmospheric Administration Space Weather Prediction Center forecasters, and Dst forecasts are a produced by the Anemomilos algorithm (Tobiska et al., 2013). Error statistics of historical forecasts for all six drivers were presented as a community benchmark by Licata et al. (2020).

CHAMP and GRACE Density Estimates
The CHAMP and GRACE datasets used in this study are from , which originate from Sutton (2008) but are scaled to account for higher fidelity satellite geometry and improved gas-surface interaction simulations (Mehta et al., 2013Walker et al., 2014). However, there is no correction to the SRP accelerations which used the simplified 13-panel geometry. Both satellites have near polar orbits, covering nearly all latitudes, and over their respective lifetimes, CHAMP and GRACE datasets cover altitudes ranging from 300 to 535 km. This, in conjunction with the date range covered by the satellites, makes their density estimates invaluable for model comparison. Figure 1 shows altitudes each dataset covers along with orbit-averaged densities over their mission spans. This study only included CHAMP and GRACE-A data due to similarities between the orbits of GRACE-A and GRACE-B.
The orbit-averaged densities were computed using a centered window with a span of 90 min, approximately one orbit. Discontinuities in Figure 1 represent data gaps. Both the CHAMP and GRACE-A datasets contain files for every day containing information such as GPS time, local solar time, latitude, altitude, and density. CHAMP has measurements every 10 s, while GRACE-A provides measurements every 5 s. LICATA ET AL.

Principal Component Analysis
The spatial resolution of these models are  15 longitude,  10 latitude, and 25 km altitude spanning from 175 to 825 km. This results in 12,312 grid points for every 3 h between the start of 2000 to the end of 2019. The high dimensionality of the data makes it a prime candidate for PCA. PCA is an eigen decomposition technique that determines uncorrelated linear combinations of the data that maximize variance (Hotelling, 1933;Karl Pearson, 1901).
PCA has been used in the last two decades on atmospheric density models and accelerometer-derived density sets alike. Matsuo and Forbes (2010) applied linear regression using spherical harmonics to extract the mean of CHAMP densities and performed PCA on the residuals, and Lei, Matsuo, et al. (2012) expanded this methodology to both CHAMP and GRACE data. Calabia and Jin (2016a) interpolated GRACE accelerations to a grid prior to performing PCA. This methodology was expanded to a 3D global grid by Mehta and Linares (2017) and is used in this work.
For both models, the spatial dimensions are reshaped into a single dimension to make the spatiotemporal dataset two-dimensional. Then a common logarithm of the density values is taken in order to reduce the variance of the dataset from five orders of magnitude to less than one. To examine the inherent differences between HASDM and JB2008, the difference (HASDM − JB2008) is taken between the log-density values across the spatial and temporal dimensions for a separate analysis. Next, we subtract the temporal mean for each cell to center the data. Finally, we perform PCA using the svds function in MATLAB to obtain the U , , and V matrices. PCA decomposes the data and separates spatial and temporal variations such that: where  n x  is the model output state (density on its full 3D grid), x is the mean,  x is the variance, r is the choice of order truncation,  i are temporal coefficients, and i U are orthogonal modes or basis functions. The modes are the first r columns of the left singular vector derived by performing PCA on an ensemble of model output solutions such that: In Equation 2, m represents the ensemble size (two solar cycles with HASDM and JB2008). The normalized and centered data is denoted by X. U is the left unitary matrix, and it is made of orthogonal vectors that represent the modes of variation.  is a diagonal matrix consisting of the squares of the eigenvalues that correspond to the vectors in U . We can extract temporal coefficients by performing matrix multiplication between  and T V . Therefore, the signs of the modes and coefficients are important in the analysis phase. The reader is referred to Bjornsson and Venegas (1997) for more details on the distinction between PCA and singular value decomposition for use on climatic data.
Upon obtaining the temporal coefficients, we compute the Pearson correlation coefficients between each driver and PCA coefficient (Pearson, 1920). In addition to the space weather drivers, the models use temporal inputs (e.g., universal time and day of year). To correlate seasonal and annual trends, we generated sinusoidal inputs based on the day of year (Weimer et al., 2020). The first two are sine and cosine functions with periods of 6 months. This is used to test correlations with semiannual trends. The last two are sine and cosine functions with periods of 1 yr to correlate with annual trends.

Model-Data Comparison
In order to compare the satellite density estimates to the two models, we implement a trilinear interpolation algorithm using the global density grids from the models. Since the temporal resolution of the model densities is only every 3 h, we maintain the same density grids over each 3-h period and interpolate the LICATA ET AL.
10.1029/2021SW002798 5 of 20 model densities to the satellite positions at their respective cadence (5 or 10 s). The authors appended the existing CHAMP and GRACE density data of  with the HASDM and JB2008 densities and have made them publicly available to the community (see Data Availability Statement section).
We compute mean absolute percent difference shown in Equation 3 between the models and satellites using a 1-yr window to investigate the agreement between the four density sources (HASDM, JB2008, CHAMP, and GRACE-A). This is done for the individual density values and the orbit-averaged density values. Differences in the individual density values give an idea of localized deviations in the datasets while differences in the orbit-averaged density values represent the general deviations in the datasets. We also perform two case studies to compare densities during the 2003 Halloween Storm and a quiet period in 2009. There is no scaling performed on the densities, because the goal is to compare relative trends in the density data. Therefore, drag coefficient uncertainty can play a role in biases between models.

PCA Results
We begin by performing PCA on the entire dataset  to get insight into the general density formulations. Then, we look into specific conditions, such as solar maximum and solar minimum, where different processes drive the global density variations. In the solar maximum and solar minimum analyses, PCA is not conducted again on those individual years, we calculate the correlations for only the selected period. Figure 2 shows the variance captured by the first 10 modes of variation for the log 10 HASDM and JB2008 densities. We also spatiotemporally decompose the difference between the HASDM and JB2008 model output to investigate the additional modes of variation within the HASDM database. Figure 2 also shows the variance captured by the first 10 modes of the differenced dataset. Note that the log 10 density differences are taken prior to conducting PCA; therefore, it serves as a stand-alone analysis and will be discussed in Section 3.2.

2000-2020, Solar Maximum, Solar Minimum Analysis
It is clear that the contribution of the first PC for both models is significant, capturing over 60% of the system's total variance. More importantly, the first PC for JB2008 captures over 10% more variance than it does for HASDM. There is also more variance captured by the second PC for JB2008, but beyond that, the individual variance captured is marginally greater for HASDM. JB2008 densities are more generalized relative to HASDM. Therefore, PCA can capture more variance in the first few modes while HASDM contains more variability, spreading out the variance captured over an increased number of modes. The mean log 10 density is shown in the first row of Figures 3 and 4 even though they are removed prior to performing PCA. The first mode for both models largely represents the solar flux cycle and is driven by temperature. There is a peak around 14 h local time and a minimum at 2 h. From Figure 8, the first coefficient is highly correlated to 10 F and the other solar indices; there is a 90% or greater correlation to all four solar indices/proxies and their centered averages. There is lower correlation between  1 and the solar drivers when considering only solar maximum or solar minimum. In these periods, there are higher correlations to the geomagnetic and temporal drivers. Between 400 and 600 km, mode 1 for HASDM flips peak locations; the transition occurs at 525 km. This is hypothesized to be the oxygen to helium transition LICATA ET AL.
10.1029/2021SW002798 7 of 20   as lighter species have an inverse correlation to temperature (Mehta et al., 2019). This is also represented in JB2008, but it occurs at 600 km which can be seen in Figure 4. The second mode seemingly represents a hemispherical/latitudinal variation due to summer-winter variation. Based on the day of year, this mode can change in intensity and orientation. This is caused by the sinusoidal trend of  2 with a period of  365 days. There is a stronger signal in  2 for JB2008 highlighting the increased variability in the HASDM dataset.
It is important to note that PCA does not guarantee that each mode represents physical processes or that a given mode corresponds to a single process. For the first two modes, there is an evident dominant process LICATA ET AL.

10.1029/2021SW002798
8 of 20 representing it but does not seem to be the case for mode 3 based on the correlation values in Figure 8. In Figure 6,  3 is nearly a mirror image of the first coefficient for both HASDM and JB2008, visually. However, Figure 7 shows that there are no longer strong similarities between  1 and  3 . The general appearance of  3 in 2019 is even different between the two models. These observations are reinforced by the correlation values in Figure 8 for  3 during the three periods. There are negligible correlation values for the entire period, but the coefficient correlates strongly with different drivers during solar maximum and solar minimum.
In Figures 3 and 4, the movement of the peak in mode 3 provides insight to HASDM. At lower altitudes (200-400 km), the HASDM peak has a  4 hour shift relative to JB2008. The JB2008 peak is located between 12 and 14 h local time while HASDM's are around 9 h. Beyond 400 km, the peak in both models shifts to 2 h local time and toward the equator. They exhibit similar trends up to 825 km which hints at the reliance on HASDM's background model when the signal decreases at higher altitudes. We interpret this as HASDM's ability to accurately capture the winter helium budge prominent during solar minimum conditions (Keating & Prior, 1968;Reber & Hays, 1973) as the local time peak coincides with the high concentration of helium observed by Cageao and Kerr (1984) and Thayer et al. (2012) around 9 h local time.
The last two modes shown in Figures 3 and 4 for the two models are flipped, meaning the fourth for HASDM has the same source as the fifth for JB2008 and vice versa. There is only a 2.25% difference between modes 4 and 5 for HASDM which signifies that their respective contribution to the system's overall variance is LICATA ET AL.
10.1029/2021SW002798 9 of 20 similar. Therefore, in all mode and coefficient figures, we switch JB2008 modes 4 and 5 to have them match that of HASDM.
Mode 4 seems to be further effects of solar heating, while mode five is similar to mode three in its structure.
 4 has some correlation to geomagnetic activity, while  4 for JB2008 has moderately strong correlation to the semiannual cosine wave. In Figures 3 and 4, the difference between modes 3 and 5 is the location of the peak present in either the northern or southern poles.
During all three periods,  4 for JB2008 has a high correlation with the semiannual cosine wave (around −0.70), while  4 for HASDM has its highest correlation with p a and Dst. In the coefficient figures,  4 for HASDM has spikes that coincide with the geomagnetic storms seen in the Dst plot.  5 for both models have high correlations with the annual cosine wave and likely represent residuals from the summer-winter variation. Figure 9 shows the mean log 10 density difference and the first three modes from the density difference PCA at 200, 400 and 600 km. Figure 11 contains the correlation coefficients between the drivers and temporal coefficients for the difference analysis. The temporal coefficients for this analysis can be found in Figure 10.

Density Difference PCA
LICATA ET AL.

10.1029/2021SW002798
10 of 20 The first row in Figure 9 contains the mean density differences at three altitudes. These show that the diurnal structure of the models' densities deviate from one another, particularly with respect to altitude. The color bar limits are centered around zero showing that HASDM predicts higher night-side densities and lower dayside densities. The specific structure of this difference evolves with altitude.
The first mode is representative of the additional variation captured in  1 for HASDM relative to JB2008. While Figure 11 shows no meaningful correlation between  1 and the solar drivers, Mode 1 has peaks at 14 h and a general diurnal structure. Considering how  1 oscillates about zero in the first column of Figure 10, the first mode is likely representative of structural differences in density related to the expansion of the thermosphere. In the last column of Figure 10,  1 has a semiannual period, supported by its correlation with the semiannual cosine wave (−0.37). Figure 11 shows that  2 correlates to all the solar drivers, and has some correlation with the geomagnetic and temporal drivers. In the first column of Figure 10,  2 is mostly negative during solar maximum and does not have any significant long-term trends during those periods. However, during solar minimum, it oscillates above zero with a defined semiannual period, as seen in the last column. This suggests that it may represent a process that is prevalent during solar minimum.
LICATA ET AL.  There is an interesting structural evolution for mode two in Figure 9 with increasing altitude. Based on the peak and trough locations at 600 km, we hypothesize that it represents the movement of lighter species during solar minimum. Lighter species rise through vertical advection and move laterally through horizontal divergence and advection (Sutton, 2016). Looking at the 600 km profile for mode 2, there is a trough centered above the equator at 14 h, where density is highest due to heating.  2 is positive during solar minimum, holding the peak-trough locations, and negative during solar maximum, flipping the peaktrough locations. Mode 2 during solar maximum would revert to a simple heating profile.
In Figure 9, mode 3 shows a latitudinal dependence with asymmetry between the poles.  3 in Figure 10 changes with the seasons, which follows the interhemispheric transport of light species (Mehta et al., 2019). The modes in Figure 9 and the temporal coefficients in Figure 10 come from the density difference analysis, meaning it is representative of a process in only one of the models. Most empirical models, like JB2008, use diffusive equilibrium to predict temperature as a function of altitude, which does not take into account the profiles of the individual species (Picone et al., 2016). The technique breaks down during solar minimum when the movement of lighter species becomes critical. We hypothesize that HASDM is able to capture this phenomenon. Figure 12 shows histograms of the 10 log orbit-averaged densities for both satellites and models, and Table 1 shows mean absolute differences for both the orbit and orbit-averaged densities with respect to the satellite densities, broken down by year. Both HASDM and JB2008 overpredict density relative to CHAMP and GRACE-A. However, the HASDM distributions have a marginally smaller bias for both satellites. The shape of the HASDM distributions more closely matches the high fidelity CHAMP and GRACE-A estimates with the smaller peaks being present on the right side. The JB2008 has similar distributions to both satellites but LICATA ET AL.
HASDM densities more closely match both CHAMP and GRACE-A estimates overall. However, there are years for both satellites where JB2008 predicts densities closer to the satellite estimates, particularly as the solar cycle begins to transition out of solar maximum (2003)(2004) and as it transitions out of solar minimum (2010). In general, there is more agreement between the model and satellite densities during solar maximum. The models have lower percent differences when looking at orbit-averaged values, because they are tracking general density trends much better than the short period disturbances. The decrease in the density differences range from 3 -13% and 4 -15% for HASDM and JB2008, respectively, when comparing the orbit to obit-averaged differences for CHAMP. For GRACE-A, the differences are more pronounced, being 5 -20% for HASDM and 5 -19% for JB2008. Considering the similarity in orbit inclination, this disagreement between the orbit and orbit-averaged differences may be attributed to the altitude.

CHAMP and GRACE Case Study
To more closely examine the densities, we look at both active and quiet 6-day periods (Figures 13 and 14, respectively). Figure 13 shows densities along CHAMP and GRACE-A orbits during the 2003 Halloween storm.
The 2003 Halloween storm was one of the strongest geomagnetic storms in the 21st century with p a reaching its maximum possible value on two occasions in a 2-day period. Figure 13 shows Dst at a three-hourly cadence (to line up with the HASDM dataset), but a higher resolution progression of Dst during this storm shows minimum values of −400 nT in the second phase of the storm (Pulkkinen et al., 2005). Note LICATA ET AL.

10.1029/2021SW002798
13 of 20 that Dst is multiplied by −1 in this figure to coincide with the axes for p a . During the two density peaks, the models and satellites all have similar timing, but HASDM overpredicts relative to the satellites and JB2008.
The density recovery post-storm is modeled significantly closer to the satellites by HASDM than JB2008. We interpret this as HASDM's ability to model the impact of post-storm cooling mechanisms, such as NO. Knipp et al. (2017) showed that in shock-led geomagnetic storms, there is an overproduction of NO. Lei, Burns, et al. (2012) studied this particular storm and found that temperature and density post-storm were lower than pre-storm levels. This is observed in Figure 13 in the HASDM and satellite densities. The percent difference for JB2008 grows considerably in the recovery phase. This difference is considerably larger compared to GRACE-A than it is for CHAMP, which reinforces observations made by Oliveira and Zesta (2019). HASDM is able to capture processes like this through its calibration satellite assimilation. The mean differences for orbit-averaged densities with respect to CHAMP are 16.47% and 34.98% for HASDM and JB2008, respectively. Relative to GRACE-A, the mean differences are 23.24% and 47.51% for HASDM and JB2008, LICATA ET AL.

10.1029/2021SW002798
14 of 20 respectively. The constant bias in HASDM may be attributed to the aforementioned D c and SRP uncertainty or errors in HASDM.
In Figure 14, the same information is presented for a quiet period in 2009. For this period, 10 F remains at solar minimum levels, and stays between 69 and 70 sfu. Concurrently, p a varies continuously but never exceeds 12 ( p K = 3-).
In the orbit-averaged plots, HASDM is discernibly closer to the satellite densities. JB2008 predicts density closer to HASDM for CHAMP (lower altitude) than for GRACE-A. Referring back to Figure   in average altitude for the satellites in 2009 is approximately 150 km. This explains the order of magnitude difference in densities and shows that HASDM more closely matches satellite estimates at higher altitudes. For both satellites' orbits, HASDM is predicting lower density than JB2008 for a majority of the period. HASDM may be better capturing the effects of radiative cooling by CO 2 , not modeled by JB2008. Emmert et al. (2010) found that in the 2008 solar minimum, there were record-low densities that could not be entirely attributed to 10 F . They identified that enhanced radiative cooling by CO 2 was a major contributor to the decreased density. HASDM's data assimilation scheme is able to make corrections based observations of lower density. The mean differences for orbit-averaged densities with respect to CHAMP are 26.47% and 36.98% for HASDM and JB2008, respectively. Regarding to GRACE-A, the mean differences are 16.01% and 36.00% for HASDM and JB2008, respectively.
There are considerable peaks in the JB2008 densities, particularly for the GRACE-A orbit. These seem to be a response to the geomagnetic activity (seen in the bottom-right panel). All JB2008 density peaks lag p a spikes by about 12 h and deviate from HASDM and satellite densities. p a is a good indicator of the source, because JB2008 uses p a when no storms are detected. JB2008 is likely overestimating the impact of the p a fluctuations relative to HASDM and considering the satellite densities.

Summary
In this work, we investigate HASDM and JB2008 densities by leveraging PCA. To analyze the model data, PCA was applied after normalization and centering for both models, along with the difference of the log 10 densities. The comparison of the modes for the two models showed that HASDM likely models the presence of the winter helium bulge, unlike JB2008. The density difference PCA demonstrated HASDM's ability to model the short and long-term movement of lighter species during solar minimum, which was observed through its modes and the comparison of its temporal coefficients across the three periods of interest. Further investigation is required to confirm these observations. By comparing the two models to CHAMP and GRACE-A accelerometer-derived density estimates, we found that HASDM had a more similar density distribution to the satellite estimates than JB2008. Overall, HASDM's predictions were closer to CHAMP's estimates (orbit = 29.35% and averaged = 23.84%) than JB2008's (orbit = 33.96% and averaged = 27.74%). This observation was also true with respect to GRACE-A: LICATA ET AL.  HASDM (orbit = 41.80% and averaged = 30.93%) JB2008 (orbit = 50.52% and averaged = 39.93%). By examining the model and satellite densities during the 2003 Halloween storm, it was observed that HASDM achieved significant improvements post-storm, capturing the over-cooled state of the thermosphere due to an overproduction of NO. The difference in density for JB2008 relative to GRACE-A reached over 150% in the 48 h period following the storm while HASDM maintained a 20%-25% difference. Looking at the densities for a quiet period, we saw that HASDM predicted lower densities than JB2008, potentially representing its ability to capture long-term cooling effects from CO 2 during solar minimum through its data assimilation scheme. Across this 6-day window, HASDM differences relative to the satellite estimates were 10%-16% lower than JB2008.
In the future, we plan to develop machine-learned (ML) models using various drivers. Not only will it generate a computationally efficient predictive model, it will allow us to perform nonlinear analysis into the contribution of these and additional drivers. These models can leverage ML techniques to also model uncertainty in the system (Licata et al., 2021). More studies of HASDM's predictions during major storms are required and detailed statistics of its prediction capability during storm phases would provide great insight LICATA ET AL. for future model development. While it is outside the scope of this work, it is necessary to further investigate HASDM's ability to capture light species movement during solar minimum.

Data Availability Statement
The JB2008 model is available for download at https://spacewx.com/jb2008/. Requests can be submitted for access to the SET HASDM density database at https://spacewx.com/hasdm/ and all reasonable requests for scientific research will be accepted as explained in the rules of road document on the website. The historical space weather indices used in this study can also be found at the JB2008 link. Original CHAMP and GRACE density estimates from  can be found at http://tinyurl.com/densitysets. As a product of this work, we appended the HASDM and JB2008 density estimates to those files. These updated files can be found at https://zenodo.org/record/4602380#.YEwEw-1KhuU.
LICATA ET AL.