Assessing Thermospheric Neutral Density Models Using GEODYN's Precision Orbit Determination

This study focuses on utilizing the increasing availability of satellite trajectory data from global navigation satellite system‐enabled low‐Earth orbiting satellites and their precision orbit determination (POD) solutions to expand and refine thermospheric model validation capabilities. The research introduces an updated interface for the GEODYN‐II POD software, leveraging high‐precision space geodetic POD to investigate satellite drag and assess density models. This work presents a case study to examine five models (NRLMSIS2.0, DTM2020, JB2008, TIEGCM, and CTIPe) using precise science orbit (PSO) solutions of the Ice, Cloud, and Land Elevation Satellite‐2 (ICESat‐2). The PSO is used as tracking measurements to construct orbit fits, enabling an evaluation according to each model's ability to redetermine the orbit. Relative in‐track deviations, quantified by in‐track residuals and root‐mean‐square errors (RMSe), are treated as proxies for model densities that differ from an unknown true density. The study investigates assumptions related to the treatment of the drag coefficient and leverages them to eliminate bias and effectively scale model density. Assessment results and interpretations are dictated by the timescale at which the scaling occurs. DTM2020 requires the least scaling (∼−7%) to achieve orbit fits closely matching the PSO within an in‐track RMSe of 7 m when scaled over 2 weeks and 2 m when scaled daily. The remaining models require substantial scaling of the mean density offset (∼30 − 75%) to construct orbit fits that meet the aforementioned RMSe criteria. All models exhibit slight over or under‐sensitivity to geomagnetic activity according to trends in their 24‐hr scaling factors.

Precision orbit determination (POD) programs are employed in both operational and research capacities to provide high-fidelity orbit trajectories of LEO satellites.The quality of such trajectories is directly dependent on the ability of a POD's force model to realistically capture the conservative and non-conservative forces impacting a satellite's orbit.Due to advancements in conservative force modeling, the largest source of error preventing more accurate orbit trajectories is now associated with non-conservative forces (Reigber et al., 2006;Tapley et al., 2005;Velicogna & Wahr, 2005).Of these, atmospheric drag is the most variable and uncertain as a consequence of its reliance on modeling the thermospheric neutral mass density (ρ) variations and the satellite drag-coefficient (C D ) (Hejduk & Snow, 2018).The largest source of uncertainty is ρ, but for satellites with complex shapes, C D can contribute to this uncertainty.Mehta et al. (2022) describes this issue as the interconnectedness of uncertain parameters, an extremely challenging problem to solve for the satellite drag community and one that has a significant impact on the assumptions made in this work.The burden for achieving more precise and reliable LEO nowcasting and forecasting largely relies on the ability of thermospheric density models to accurately capture the behavior of neutral density and reliably predict it into the future.Adding to the problem, assessing the performance of density models presents a massive challenge due to the scarcity of satellite measurement data and the lack of absolute truth due to the complexity of interconnected uncertainties.This necessitates the community to seek alternative methods to add to the validation method repertoire.The growing prevalence of global navigation satellite system (GNSS)-enabled low-Earth orbiting satellites and their POD solutions represent one such potential data source, and providing methods to take advantage of these data sets will help the community expand and refine model validation capabilities.
POD programs such as the NASA Goddard Space Flight Center's (GSFC) GEODYN II software (henceforth referred to as GEODYN) have been developed within the geodesy scientific community with the above challenges in mind-implementing techniques such as reduced-dynamics paired with extremely high-quality tracking measurements from GNSS to mitigate the need for highly accurate non-conservative force models when performing non-predictive orbit determination.Through these means, centimeter-level radial accuracy has been demonstrated to produce precise science orbit (PSO) solutions for missions such as the Ice, Cloud, and Land Elevation Satellite-2 (ICESat-2), which orbits at approximately 500 km (Thomas et al., 2021).These techniques-combined with GEODYN's legacy of precise conservative force and measurement modeling, meticulous time systems, and accurate coordinate reference frames-have made the program a top-tier POD tool that is well-positioned to study thermospheric neutral density models and their distinct impacts on the estimation of satellite drag (Lemoine et al., 2016;Loomis et al., 2019;Luthcke et al., 2003;Zelensky et al., 2010).This work aims to provide a method to improve the specification of satellite drag physics and the assessment of neutral density model performance to help the Ionosphere-Thermosphere (IT) community advance model predictions, and consequently improve the accuracy of POD solutions.This paper presents the development of a modernized Python interface for the GEODYN software, leveraging the high-precision nature of space geodetic POD, but refashioned to study satellite drag and to enable density model assessment.We make use of the well-specified, low-error ICESat-2 PSO to perform a case study assessment of five thermospheric density models, three of which are empirical while the other two are physics-based.The ICESat-2 PSO serves as tracking measurements to POD-based orbit fits in which the drag effects from density models are assessed according to each model's ability to redetermine the orbit.Implications regarding the treatment of the drag coefficient are investigated and discussed.This work reports an initial result using a fixed drag coefficient of C D = 2.5, followed by two methods for debiasing the assessment results using a drag acceleration scaling factor over both a 2-week and a daily time interval.Each model's orbit fit contains relative in-track deviations, quantified by in-track residuals and root-mean-square errors from the ICESat-2 PSO, which are treated as proxies for model densities that differ from a true, unspecified density.By developing these methods, we aim to provide the community with the means to take advantage of emerging GNSS-tracked satellite data sets and POD solutions to objectively quantify density model performance.In addition, we hope to address deficiencies in non-conservative force modeling that may currently impede higher quality predictions of LEO trajectories.The presented model assessment results will be parsed into the Community Coordinated Modeling Center's Comprehensive Assessment of Models and Events using Library Tools (CAMEL) framework, for community use.

10.1029/2023SW003603
3 of 27 Section 2 gives the necessary science background needed to understand our methodology.Section 3 details the GEODYN software, provides information regarding the ICESat-2 POD solutions, and offers an overview description of the upper atmospheric density models that are assessed in this work.Section 4 details the methodology, the setup procedure for conducting the model assessment, and the methods for debiasing the assessment results using drag acceleration scaling factors.Section 5 provides the results and discussion of the assessment using ICESat-2 PSO as a case study.

Background
The precision of a POD solution relies on the fidelity of the tracking measurement models, the quality of the tracking data, and the ability of the POD force model to capture realistic accelerations acting on the satellite.In general, the force model describes the overall motion of a spacecraft by calculating the sum of all impacting forces.These forces are defined as either conservative which are potential in nature, or non-conservative which act to dissipate the satellite's orbital energy.Conservative forces captured by the GEODYN force model include the Earth's static gravity field (geopotential), solid Earth and ocean tides, the effects of dynamic polar motion, the acceleration from time variable gravity, Third-body perturbations (primarily from the Sun and Moon), and contributions from general relativity.Recent improvements in conservative force modeling as well as advances in the internal measurement models have shifted the primary source of error in POD solutions to the non-conservative forces (Loomis et al., 2019;Luthcke et al., 2006;Reigber et al., 2006).The non-conservative forces modeled in GEODYN are atmospheric drag, solar radiation pressure (SRP), and Earth radiation pressure (ERP).As altitude decreases in the LEO regime, atmospheric drag increases exponentially becoming the largest non-gravitational force acting on satellites in the lower register of LEO.In addition, the drag force's dependence on the upper atmospheric neutral mass density makes it the most error-bound perturbing force (Hejduk & Snow, 2018).While force model errors can be largely accommodated via reduced-dynamics and high-quality tracking measurements, this technique is limited in its application for the eventual goal of orbit prediction, which requires an improved, more-realistic force model (Luthcke et al., 2019;Tapley et al., 2004).
The drag force acting on a satellite of mass m sat is proportional to the atmospheric neutral mass density ρ, the drag coefficient C D , the projected area perpendicular to the flow direction A sat , and the velocity of the satellite relative to the atmosphere  ⃗  .The drag acceleration    due to the drag force per unit mass acting on a satellite is given in Equation 1as Physically, the total drag force acting on a satellite surface is given by the force due to incident atmospheric particles impacting the surface combined with the force from scattered particles departing from the surface.These effects are represented by the drag coefficient C D , which depends on the geometry, orientation, material, and surface temperature of a spacecraft, the local atmospheric composition, gas-surface interactions, and other effects (Bernstein & Pilinski, 2022).In the context of spacecraft dynamics, the C D is generally characterized as either fixed, fitted, or physical.Fixed C D uses a predetermined value that does not change.Fitted C D is derived using some form of a fitting or filtering process and is typically updated over time (every few hours or orbits).Physical C D is computed by modeling the momentum and energy exchange between the flow-field particles and the satellite (see Mehta et al. (2022) for more details).If not physically calculated, C D 's presence in Equation 1 may be thought of as a scaling factor that effectively serves to average out errors in the atmospheric density model and gas-surface interactions.In its base state, GEODYN can use either a fitted or fixed C D .In the fitted case C D is an adjustable parameter that accounts for mismodeled physics and for uncertainties in ρ associated with the upper atmospheric density model.
Earth's upper atmosphere is driven by a broad range of external energy inputs, leading to complex thermal, electromagnetic, and chemical processes that result in a thermospheric neutral mass density ρ that is highly dynamic and whose variability is difficult to specify (Emmert, 2015).Upper-atmospheric density models are employed within POD force models to represent the complex behavior of ρ when calculating the force of satellite drag acting on a spacecraft, directly or indirectly through C D .The three types of density models most commonly used by upper atmospheric communities are semi-empirical, physics-based, and data assimilative models.The simple yet effective semi-empirical models are most commonly employed in POD force models since they offer excellent 10.1029/2023SW003603 4 of 27 climatological pictures of upper atmospheric variability and are computationally inexpensive.Physics-based models are more complex, taking the form of general circulation models which solve the first-principle equations that govern the coupled thermosphere-ionosphere system.They are not typically used in POD geodetic settings due to the computational expense.A data assimilative technique can be used to calibrate modeled density and has given rise to data assimilative (also referred to as dynamically calibrated) models.These combine analyses from a multitude of space objects to produce corrections to empirical (and occasionally physics-based) thermospheric models.The most prominent example of assimilative thermospheric density models is the United States Space Force, High Accuracy Satellite Drag Model (HASDM) (Storz et al., 2005).It is a common practice in the IT modeling community to compare model performances against HASDM outputs since it performs real-time calibration using ∼75 space objects.
Different models, and even model types, have varying degrees of performance under specified conditions.Individual model performances are known to depend greatly on the solar flux and geomagnetic conditions that drive them, and their respective strengths make some models better qualified for some scenarios than others.
Semi-empirical models are often computationally fast and accurate for climatological uses, but their ability to accurately project into the future is closely tied to the fidelity of their drivers.Physics models offer great potential for forecasting, but lack the accuracy of semi-empirical models in near real-time scenarios (Shim et al., 2014;Sutton, 2018).The vast range in model performances makes the evaluation of models a critical goal for upper atmospheric science and satellite drag communities.The scarcity, and coupled uncertainty (via C D uncertainty) of thermospheric density measurements makes this a significant challenge.The most common method for objectively quantifying a density model's performance is to compare the sampled model outputs against satellite measurements, for example, see Walterscheid et al. (2023)-usually in the form of accelerometer-derived densities from the Challenging Minisatellite Payload (CHAMP), Gravity Recovery and Climate Experiment (GRACE) or Gravity Field and Steady-State Ocean Circulation (GOCE) missions (Bruinsma et al., 2004;Doornbos et al., 2010;Mehta et al., 2017;Sutton et al., 2005).
In a series of papers motivated to provide community organization for conducting model comparison and evaluation, Bruinsma et al. (2017) and Bruinsma et al. (2018Bruinsma et al. ( , 2021) ) provide commonalities for inter-model scoring.They report on chosen observed density data sets, time periods of interest, and provide a scoring metric in the form of the mean, standard deviation and root mean square error (RMSe) of the observation-to-model density ratios.He et al. (2018) similarly presents an assessment of several semi-empirical thermosphere models, focusing on their ability to reproduce spatial variations and capture complex features in thermosphere mass density.Shim et al. (2014) provides a systematic evaluation of thermospheric and ionospheric models, quantifying model performance using four skill scores calculated as functions of geomagnetic activity and geographic latitude: RMS error, prediction efficiency, ratio of maximum-to-minimum, and ratio of maximum amplitude.Thayer et al. (2023) investigates the use of the day-to-night density ratio as a metric for representing the atmosphere's response to large scale perturbations (i.e., the transition from solar maximum to solar minimum), providing inter-model and model-to-observation comparisons, and unearthing discrepancies that are not observed between models and observations when viewed using more common metrics.Each of these reports makes use of the accelerometer-derived density data sets to objectively quantify model performance.
While advances have also been made via earlier work to assess satellite drag accelerations and extract densities from POD processes (see Doornbos et al. (2005), Hackel et al. (2017), van den IJssel et al. (2020), andDoornbos et al. (2002)), this work aims to contribute an additional method to the community in which accurately developed and well-honed POD tools can be leveraged for assessing density model performance.For the purposes of this paper, we make a distinction when referring to the different stages of model assessment.We use the term "assessment" to refer more generally to methods and results that offer insight into model performance."Verification" refers to using other well-specified methods and data sets to confirm the fidelity of our methods and results."Validation" refers to the act of objectively quantifying modeled densities against observed/derived values.This paper offers a verification of our method and results by comparing against the HASDM densities, and provides an example performance assessment using 2-weeks of the ICESat-2 PSO as a case study.A more formal validation scheme is the eventual goal of this work, however, this requires additional considerations and is a source of continuing effort.

GEODYN and the Pygeodyn Wrapper
The GEODYN-II program is a precision orbit determination and parameter estimation tool that has been used on every NASA geodetic Earth and planetary altimeter mission since 1985.The program is used extensively for orbit determination, geodetic parameter estimation, tracking instrument calibration, and satellite orbit prediction, as well as for many other applied research studies in satellite geodesy (Loomis et al., 2019;Luthcke et al., 2019;Pavlis et al., 2019).GEODYN is capable of ingesting essentially all types of tracking measurements, the most common of which include observations from global navigation satellite systems (GNSS) and satellite laser ranging (SLR), as well as post-processed orbits in the form of orbit trajectories or precisely converted elements (PCE) (Pavlis et al., 2019).GEODYN performs orbit propagation using Cowell's method of numerical integration, and performs data-reduction utilizing a Bayesian least-squares batch estimation process to optimally estimate parameters by minimizing the residuals between tracking data and orbit propagations (see Vallado (2013) for more information).GEODYN's long history in geodetic applications has ensured the development of very precise conservative force and measurement models, as well as accurate time systems and coordinate reference frames, making the program a top-tier POD tool (see Luthcke et al. (2003), Lemoine et al. (2016), and the references therein).With this understanding, the errors found in the observed residuals between tracking data and determined orbit are more related to uncertainties in the satellite specific non-conservative force models, rather than being related to the quality of measurement modeling or orbit determination methods and tools.In the lower register of LEO, where atmospheric drag dominates, the observation residuals can provide valuable information on the drag model errors.
Pygeodyn is an internally developed Python-based wrapper meant to offer improved user access to the FORTRAN-based GEODYN software.Pygeodyn offers users a streamlined and simplified tool to navigate the complex steps for modifying, controlling, running, and reading the various data sets and files that compose the GEODYN program.The main portion of GEODYN II is composed of two sequenced programs: GEODYN-IIS, a scheduling program and GEODYN-IIE, an execution program stage.The scheduling program reads and organizes input data, ancillary data files, and the user's setup options.The execution program then integrates the satellite trajectory and applies the selected models, performs orbit determination to provide computed observables, and uses the least squares scheme, along with any measured observables, to provide solutions for updated orbits as well as any requested geophysical parameters.The two stages communicate via a series of binary files which are output from the scheduling program and fed into the execution program.Historically, adding atmospheric density models to GEODYN required modification to IIS as well as subsequent data tracking and modification to IIE, a series of complications that have been circumvented with our Pygeodyn tool.Pygeodyn gives the ability to switch between different atmospheric density models that have been connected to GEODYN-IIE without the need to modify GEODYN-IIS, simplifying the user experience for adding and selecting the models.
Programming in Python has also afforded Pygeodyn the ability to interface with the NASA Goddard Community Coordinated Modeling Center's (CCMC) Kamodo API (Ringuette et al., 2023), granting access to their sophisticated model readers and allowing Pygeodyn to connect physics-based density model outputs to the POD scheme.

ICESat-2 PSO Solutions as Tracking Data
ICESat-2 flies in a near-circular, near-polar, low-Earth orbit at ∼496 km altitude and an orbital period of 94.22 min.Details of the orbital parameters are reported in Luthcke et al. (2019).The ICESat-2 PSO (i.e., the science quality POD solutions), and their corresponding setup files, are provided by the Geodesy and Geophysics Laboratory within NASA/GSFC, which maintains the GEODYN program and provides science quality POD for many NASA missions.ICESat-2 is an excellent platform for orbital drag-based model assessment because of its science requirements to have such high-quality orbit solutions, as well as stable attitude specifications.The ICESat-2 PSO solutions are generated through the reduction of GNSS double-difference carrier phase observable residuals and are reported by Thomas et al. (2021) as having a radial orbit accuracy below 1.5 cm over a 24-hr orbit solution according to independent assessment using SLR measurement residual analysis.Note that Thomas et al. (2021) gave the 1.5 cm accuracy in terms of the radial component only because of the mission requirements.
The precisions of the orbit solutions were also verified in all three components using orbit overlap analysis and were found to have a 3D RMSe less than 1.5 cm.Technical details regarding the construction and analysis plan for the ICESat-2 PSO can be found in Luthcke et al. (2019).
The centimeter-level orbit of the PSO data is achieved using the previously mentioned reduced-dynamics technique in which GEODYN solves for empirical acceleration parameters that describe the difference between the actual positions, that is, those derived from the GPS tracking measurements, and the positions that are calculated by the program's physical force models and satellite propagator.The PSO data includes estimations of along-track and cross-track empirical accelerations every quarter of an orbit, applying a nearest neighbor covariance constraint.With the use of reduced-dynamic empirical accelerations, it is possible to compensate for errors associated with using the MSIS86 model to calculate the effects of atmospheric drag.Luthcke et al. (2019) notes that even though a reduced-dynamic approach is commonly employed by the geodesy community to overcome any inadequacies in a force model, the technique relies on an orbit solution that has already attained sufficient radial accuracy through the use of a high-quality physical force model.Dense tracking measurements and the reduced-dynamic technique do not obviate the use of accurate orbit modeling, and improvements in the orbit fit will be realized when the force models are improved.We also note that while MSIS86 was used to estimate the drag accelerations in the ICESat-2 PSO, due to them being combined with additional empirical accelerations in the along-track and cross-track directions no related bias is found that favors the MSIS series of models in the assessments reported in this work.
This work uses the ICESat-2 PSO as tracking measurement input to a data-reduction run of GEODYN-the goal being to assess the ability of each selected density model to re-determine the orbit of ICESat-2.A data-reduction run in GEODYN is one in which orbit parameters (i.e., initial conditions) and optionally geophysical parameters (such as gravitational coefficients or the drag coefficient) are adjusted to minimize residuals and provide an improved solution.This data reduction is computed over an orbital arc, a set time period for which continuous tracking data is available.The term "orbit fit" refers to the outputs of GEODYN runs in which the ICESat-2 PSO is the tracking data type and respective density models are used to iteratively re-determine the orbit.
The following capabilities for density model assessment are enabled by using GEODYN to construct orbit fits from ICESat-2 orbit solutions: 1. Leverage GEODYN's high-fidelity physical force models which have been honed by the program's long legacy in space geodesy.2. Perform data-reduction runs in which we compare the relative ability of each atmospheric density model to re-determine the orbit of ICESat-2 given the isolated satellite drag effects.3. Control the POD and force model parameters such that for each respective run, the only relative variable impacting the overall fit of the orbit solution for a given arc is the atmospheric density model used to estimate the drag term.4. Control for relative errors between runs associated with an unknown drag coefficient by using a realistic fixed value of C D = 2.5.This value is determined by physically calculating C D using the Diffuse Reflection with Incomplete Accommodation (DRIA) method along the orbit of ICESat-2, as is described in Section 4.

Model Descriptions
This section provides a brief overview of the atmospheric density models that are used for verification and assessment.Table 1 lists the models, providing the Model ID used for referencing in this paper, the full name and version number, the run conditions based on the drivers, and the models' spatial and temporal resolutions.The authors acknowledge that while there are a number of ways to improve a density model's output at runtime (see Sutton (2018), Shim et al. (2014)), the output used in this work is intended to reflect typical community use, with each model being run according to the developer's operational instructions.The TIEGCM model was initialized using the standard default seasonal initial states, while the CTIPe model is run according to conditions set by the CCMC's Runs on Request feature.Additional information regarding each model can be found in the references provided in the second column of Table 1.
We provide a verification using SET-HASDM, a data-assimilative model, and assessment results for MSIS2, JB2008, DTM2020, TIEGCM, and CTIPe.The semi-empirical models (MSIS2, JB2008, DTM2020) are interfaced directly into GEODYN's FORTRAN-based source code.The physics-based models (TIEGCM and CTIPe) are interfaced to GEODYN via the CCMC's Kamodo program which reads and interpolates the model output files.These interpolated outputs are connected to GEODYN through the Pygeodyn wrapper using an orbit cloud interpolation technique which is detailed in Appendix C. In addition, physics-based models whose maximum altitude is below the orbit altitude of ICESat-2 include a diffusive equilibrium extrapolation of the neutral densities (see Chapter 10 of Schunk and Nagy (2009)).The use of Kamodo makes the analysis techniques in this paper easily extensible to additional models.Any thermospheric model that is supported by Kamodo, with the appropriate diffusive equilibrium extrapolation, can be added to this and similar analyses in the future.

Setup for ICESat-2 Case Study
This method uses a satellite's PSO as tracking measurements to construct dynamic POD-based orbit fits from different density models.The dynamic POD technique uses a batch least-squares approach to iteratively reduce errors between the propagating orbit fit and the ingested PSO-GEODYN refers to this as data-reduction mode.
The initial conditions, and any other adjustable parameters, are iteratively estimated and updated to refine the orbit fit until it consistently reaches a convergence threshold.The remaining errors that persist between the PSO and a given model's orbit fit are understood to be primarily due to atmospheric drag effects from the respective density model.This understanding is leveraged to investigate density model performance through the assessments that are presented in this paper.Figure 1 provides a visualization showing connections between the high-level data sets and processes.While the true density along the ICESat-2 orbit remains unknown, each model's orbit fit contains in-track deviations from the ICESat-2 PSO, which are treated as proxies for model density deviations from the true density.Note.For the semi-empirical models the spatial resolution is given as the maximum degree and order used in the spherical harmonic expansion or relevant details.The GEODYN run setup that is used to construct the POD-based orbit fits is kept as similar as possible to the setup used by the team at NASA-GGL to produce the ICESat-2 PSO-meaning we modify only what is necessary to use PSO as the tracking measurement type, and to control the procedure such that drag is the only independent variable in each model's run.An extended overview of GEODYN's setup and force model parameters for the model assessment runs is provided in the appendix in Table B1, with only the most impactful considerations being discussed here.In addition to each orbit fit using the same background force models, the ICESat-2 attitude information (i.e., the time series of quaternions describing the orientation of the satellite in space) is also utilized to properly orient the spacecraft body and the solar array.The satellite geometry is modeled using the same configuration that was used to construct the PSO.14 panels make up a box wing model that is used to calculate the time-varying area.The orbit fits are split into 24-hr, consecutive daily arcs.The arc length can theoretically be much shorter; however, orbit errors related to force model perturbations (i.e., drag) require propagation time to accumulate.An arc length on the order of 1-2 orbital periods may not depict substantial trajectory deviations in the residuals, making 24-hr arcs a balanced choice to demonstrate this assessment method.In theory, reducing Other non-conservative forces that must be considered in addition to atmospheric drag are SRP and ERP which are both calculated by GEODYN according to the descriptions shown in Table B1, and the references therein.
Using GEODYN and its high-fidelity force model ensures that the estimated SRP and ERP accelerations are more precise than what would be modeled by a standard satellite flythrough scheme.For each density model's orbit fit the acceleration due to drag will vary according to the error in the respective model, while the contributions from SRP and ERP will remain consistent for each arc across each model run.The magnitude of the SRP acceleration is often on par with that of the drag acceleration at the ICESat-2's altitude; however, ICESat-2 is a very stable vehicle with high fidelity time series of quaternions describing the orientation of the satellite in space, and errors related to modeling SRP and ERP are assumed to be low relative to those of drag.Errors related to mismodeling non-drag forces can still potentially be transferred into the residuals, and as a result, this method is presented as a relative assessment between controlled model runs rather than an absolute validation of performance.

Assessment Procedures
The interconnected, uncertain nature of C D and ρ makes the absolute determination and assessment of either value a very complex problem that is still an active field of research within this community.In this case study, the assessment procedure is split into three subsequent orbit fit methods, each based on assumptions made to characterize C D when calculating the drag acceleration for each model during the orbit fit procedure.In a preliminary orbit fit method, explicit biases are identified via orbit fits constructed using a fixed C D that is held constant across all models.The constant fixed C D is chosen to be a physically realistic value of C D = 2.5 based on the average result calculated from the Diffuse Reflection with Incomplete Accommodation (DRIA) model along the ICESat-2 orbit.This demonstrates that the method is sufficiently sensitive to recognize differences in the drag effects between the models and provides an understanding of each model's approximate mean density offset relative to the ICESat-2 PSO.In an alternative orbit fitting method, GEODYN's parameter estimation procedure is used to adjust the C D for every 24-hr arc for the 2-week period from the a priori estimate of C D = 2.5.The mean-adjusted C D over the 2 weeks is then used as a fixed, model-specific value that is constant for the time period.This provides a fixed, unique C D for each density model that effectively scales the density over the 2 weeks to account for each model's mean density offset and examine the model response to solar and geophysical dynamics.In a final orbit fitting method, the daily C D adjustments are used without a 2-week averaging.The residuals using these daily, model-specific C D adjustments provide an assessment of model performance on time periods less than a day.In these cases, the scaling factors extracted from the orbit fits are directly coupled to the density discrepancies produced by the different models.
The preliminary orbit fits use a fixed C D of 2.5 that is constant with respect to each model.This enables direct model comparison but subjects an assessment of the density models to explicit biases depending on each model's density offset relative to this C D value.Each model's sampled densities along the ICESat-2 orbit have an overall mean-density offset relative to each other.Fixing the C D to a specific value will cause a particular offset amount to be favored.For instance, C D = 3.5 will produce favorable orbit fits for models that trend toward a lower density, whereas C D = 1 would favor models that trend toward higher densities.Due to these circumstances, the DRIA model is independently used to calculate a physically realistic value of the ICESat-2's C D along its orbit.DRIA is a relatively simple, computationally fast model for capturing the gas-surface interactions between the upper atmosphere and a spacecraft.In the DRIA model, particles are always reflected with a diffuse angular distribution, but their energy exchange with the surface varies depending on the value of the energy accommodation coefficient α.This work uses Sentman's closed-form solutions for the DRIA model as depicted in Equation 12of Walker et al. (2014).The energy accommodation is assumed to be fixed at α = 0.89-a tenuous assumption based on the limited empirical data for α near 500 km during solar minimum (Pilinski et al. (2010); Pilinski (2008)).This α value is likely higher than is realistic for this altitude and solar flux, therefore providing a lower limit for what a physically realistic drag coefficient might be; however, given the complex changes in atmospheric structure that occur in this altitudinal regime, this empirical value is still the most representative until further observations can be made.There are other physical C D models that could 10.1029/2023SW003603 10 of 27 be used instead of DRIA, but choosing and assessing the C D models quickly expands beyond the scope of this work.For the sake of being able to conduct a model assessment as a proof-of-concept in this case study, this assumption is made with the intention to improve the treatment of C D in future efforts.Constructing orbit fits with a fixed, common C D of 2.5 for each model represents the type of method that is possible without being able to model the physical drag coefficient or without GEODYN's capability to adjust the parameter.This adjustment procedure was performed for different a priori C D values and found that the final adjusted C D for each model was consistently the same.
The 24-hr debiasing method uses GEODYN's parameter estimation capabilities to determine a daily fitted value of the C D that accounts for accumulated errors from the force model over the 24-hr arc-the most prominent of which being due to density uncertainty.In the field of space geodetic POD, C D is often adjusted in conjunction with reduced-dynamic empirical accelerations to account for disagreement between the observed accelerations from tracking measurements, and calculated accelerations from uncertainties in the drag force model.This technique is used to get very low error and precise orbit solutions but limits the ability to distinguish errors that are specific to drag or the density models.By allowing only the C D to adjust and match the orbit fit's modeled accelerations with the PSO observation, density errors over the 24-hr period are incorporated into the adjustment.A density model that is found to be over-/under-estimating the density, will have a C D that is adjusted to be smaller/ larger in a non-physical way-effectively using C D as a scaling term between the PSO observation and uncertainty in the density model orbit fits.In practice, the C D also absorbs any errors from mismodeled forces, but these are held constant in the model-to-model comparison.Each model is given an a priori estimate of C D = 2.5 at the start of the 24-hr arc, which is allowed to adjust within a standard deviation of 10.The drag coefficient fitting occurs concurrently with the iterative orbit fit routine.Due to this non-physical use of C D to effectively debias the density, the term "drag acceleration scaling factor" is adopted.The 24-hr scaling factor for each model (m) can be calculated for each arc (i) as, 24,, = ,,,∕2.5 (2) The 2-week debiasing method acts as a combination of the previous two methods.The C D adjustments for each model are averaged over the 2-week period to provide a mean adjusted C D .Each model's orbit fit is then re-determined using the mean adjusted C D for each respective model as the fixed value for the 2-week period.This assessment permits a scaling of the density models over an extended period of time, highlighting errors in the orbit fits that are due to variations that take place on a longer time scale than 24-hr.This method is also motivated by the need to provide a scoring metric for each density model that can be parsed into the CCMC's CAMEL model validation infrastructure.While the C D = 2.5 case is dominated by the model biases and the 24-hr debiased case demonstrates a method to debias daily densities, the 2-week debiased case quantifies the ability of the models to capture dynamics caused by geomagnetic and solar activity over a more prolonged time period.This is a method that could be used in the future to assess model performance during individual stormtime periods.

Assessment Metrics
Using a PSO as tracking data makes use of GEODYN's data-reduction mode combined with a dynamic technique for estimating the orbit of a satellite.This technique uses the trajectory input to estimate updates to the initial conditions which define the motion of the satellite, thus refining the orbit.The orbit residuals obtained in this setup are the absolute differences between the PSO and each density model's orbit fit.Since other force model parameters are held constant between each density model's run, the inter-comparison of the residuals contains information primarily corresponding to relative errors in each density model's ability to replicate the drag effects seen in the ICESat-2 PSO.
To best observe satellite drag effects, all output orbits are transformed from the J2000, geocentric inertial reference system to the NTW, orbit-aligned satellite coordinate system (Vallado, 2013).This system is composed of an in-track component  T that is parallel to the inertial velocity vector, a normal component  N that is perpendicular to the velocity and nominally in the radial direction, and a cross-track component  Ŵ that is normal to the orbit plane and completes the right-hand coordinate frame.The in-track component  T is parallel to the velocity vector direction and contains any indication that the spacecraft's trajectory has changed since orbital energy dissipations from drag will impact in the velocity direction.Information regarding this transformation as well as supporting coordinate frame details can be found in Appendix A.
For any given arc, y o is defined to be a component of the orbit from the PSO data set in the NTW frame, and y m to be the orbit fit for each density model m.The residuals for each component of the orbit and orbit fit are then calculated (in terms of the in-track component) as, The root-mean-square error (RMSe) of the residuals represents the square root of the variance of the absolute difference in the two orbits, indicating how well the density model's orbit fit matches the PSO for that arc.For the in-track component, this is computed for every ith time step of an arc with n time steps as, During the POD process, GEODYN iteratively minimizes the discrepancies between the observed orbit (i.e., PSO) and computed orbit (i.e., orbit fits from a density model) across the entire 24-hr arc by adjusting the initial conditions to converge toward a computed trajectory.Since the minimization occurs across the entire arc, the resulting residuals take on the non-linear shapes shown in Figure 2. On a given arc and when comparing the resulting orbit fits from each density model, the only variable that has been permitted to impact each orbit fit's performance relative to the PSO is the drag effects from the selected density model.Therefore, we reason that the relative differences in the residuals for each orbit fit is indicative of density model performance.Other potential errors from mismodeled physics may persist in the residuals, but they are held constant between each model run and will impact the orbit fits consistently.If we run the same arc using the same force model and conditions, but only change the density model used to calculate satellite drag accelerations, the residuals will contain the errors related to the program attempting to reconcile errors in the density model.The RMSe for each orbit fit represents a single value for how well the program can reconcile each model's errors in density over the entire 24-hr arc.
As described in Section 3.2, the ICESat-2 PSO was generated through the reduction of GNSS double-difference carrier phase observable residuals and has been shown to have a radial orbit accuracy of below 1.5 cm through independent assessment using SLR measurement residual analysis (Thomas et al., 2021).The precision of the orbit solutions was also verified to be less than 1.5 cm in all three components using orbit overlap analysis.Given this, relative deviations from the low-error PSO are treated as a rough proxy for the density model errors relative to some unknown true density.The true density value is obscured by the various interconnected unknowns of C D , SRP, and ERP and therefore remains unspecified.Over the course of a single arc, drag forces from each density model dissipate the satellite's orbital energy at distinct rates, resulting in drag accelerations that are either greater or less than what is represented by the in-track position of the PSO.A strongly negative in-track residual indicates a modeled density that is larger than truth, while a strongly positive in-track residual indicates a modeled density that is smaller than truth.Additional details regarding the shape of the in-track residuals and the relationship between in-track position of the PSO and orbit fits and the density can be found in Figure A2 of Appendix A.
The RMSe is the standard deviation of the residuals and serves as a measure of the difference between a respective orbit fit and the PSO over a single whole arc.Theoretically, an in-track RMSe of zero would mean no difference between an orbit fit and the PSO, indicating near-perfect agreement on average between the modeled density and the POD-based true density across the 24-hr arc.In this setup, perfect agreement for any model is unlikely since the residuals may additionally contain errors related to mismodeled forces, as well as bias/offsets related to fixing the C D to a common value for all models.A further limitation of the metric is that the RMSe lacks information regarding timescales less than the arc length, and is unsigned, meaning it does not indicate if the modeled density is above or below the truth for a given arc.For these reasons, the in-track residuals and their respective RMSe values are assessed in conjunction with each other.

Results and Discussion
This section is organized as follows: (a) the preliminary method for orbit fit construction using a fixed C D = 2.5 is presented, and the orbit fit method is verified using the SET-HASDM density database to determine baseline understanding; (b) an assessment of the semi-empirical and physics-based models is presented via orbit fit results that are debiased using a mean drag acceleration scaling factor over the full 2 week period; (c) an assessment of the semi-empirical and physics-based models is presented via orbit fit results that are debiased using a 24-hr drag acceleration scaling factor.
The specific conditions for producing density values for each model are detailed in Section 3.3.The authors acknowledge that while there are a number of ways to improve a density model's outputs at runtime (see Sutton (2018), Shim et al. (2014)), the outputs used in this work are intended to reflect typical community use, with each model being run according to the developer's operational instructions.Results are presented by focusing on a 2-week time period from 9 November 2018 to 23 November 2018, providing 14 adjacent daily arcs with no maneuver-based data gaps.The assessment conditions are for the altitude regime near ∼490 km, in an atmosphere with very low solar flux, and low-to-minor geomagnetic activity.Note that this is a notoriously difficult altitude regime and activity condition for empirical models due to the minimal access to satellite density data at this altitude during times of prolonged solar minimum-increasing the potential value of this style of assessment for these models especially.Figure 3 shows low solar activity for the time period, both in terms of the magnitude and variation of solar EUV and FUV, as approximated by measurements of the 10.7 cm solar radio flux (F 10.7 , top panel).The Kp geomagnetic index (bottom panel) depicts low-to-minor geomagnetic activity during the time of interest, with the 2-week period being bookended by minor geomagnetic disturbances that reach no higher than 10.1029/2023SW003603 13 of 27 Kp = 4.3.Only one minor-to-moderate disturbance occurs on 5 November 2018, four days before the period of interest, reaching a peak of Kp = 5.7.

Preliminary Orbit Fits Using a Fixed C D of 2.5
The most straightforward way to construct the orbit fits is by calculating the acceleration due to drag from different density models using a fixed drag coefficient value for all arcs and models.This permits bias depending on the relationship between each model's mean density and the chosen C D , but importantly demonstrates that the method is sufficiently sensitive for recognizing differences in the drag effects between the models.The in-track residual errors in this method should not be interpreted as an indication of performance, but rather as indicators of each model's mean density offset relative to the true-unknown density.Figure 4, shows the fixed C D assessment results for the semi-empirical and physics-based models.The top panel shows each model's orbit averaged density along the orbit of ICESat-2, the middle panel shows the in-track residuals for each model and arc, and the bottom panel shows the in-track RMSe values.
The negative parabolic shape of the in-track residuals of MSIS2, TIEGCM, and JB2008 indicate that these modeled densities are too high-i.e., these orbit fits experience more drag acceleration and their fits tend to lag behind the PSO.The positive parabolic shape of the in-track residuals of CTIPe indicates modeled densities that are too low-i.e., the drag acceleration is lower and the orbit fits tend to be in front of the PSO.The DTM2020 model estimates densities that are too large at the start of the time period and grow steadily closer to the true density, as indicated by the relatively flat parabolic shape of the residuals in the second half of the time period.
In reality, the PSO-to-orbit fit relationship is slightly more complex over an arc, with the above being a generalization of the overall trend.A more detailed understanding of the orbit fit movement relative to the PSO can be found in Appendix A.
The orbit fits from SET-HASDM are separated for use as verification since it uses similar assumptions of a fixed drag coefficient and a satellite drag data assimilation technique in its internal workings.The SET-HASDM density database affords the opportunity to access historical records of HASDM densities that have been corrected through the real-time data-assimilative calibration to ∼80 low earth orbiters.The HASDM model is the operational standard used by the 18th Space Defense Squadron which is tasked with executing command and control over United States' space assets and all resident space objects for sake of space situational awareness.Verification with SET-HASDM provides a baseline understanding of the fidelity of the orbit fit results.Since the WALDRON ET AL. 10.1029/2023SW003603 14 of 27 SET-HASDM density values have already been effectively debiased in its data-assimilation scheme, we do not go through the steps of debiasing using the methods presented in Sections 5.2 and 5.3.Referring to Figure 5, the SET-HASDM model consistently depicts in-track RMSe values that are on the order of 8.18 m over the 2-week period.The in-track residuals have a negative shape, indicating that the densities from SET-HASDM are slightly larger than what would be expected from the PSO.results in Figure 5 are intended to serve as an approximate consistency check that our overall methodology, and more specifically our debiasing method, provide orbit fits with in-track errors in a reasonable range.

Debias Using Two-Week Scaling Factor
The second orbit fit method debiases the density models using a mean-adjusted C D over the 2-week period (average values of the 24-hr adjusted C D s shown later in Figure 8).This provides a fixed C D that is unique for each density model, adjusted to account for biases due to each model's mean density offset over the time period (see Figure 4).The in-track residuals and RMSe values for this assessment, shown in Figure 6, quantify the error due to density variation over the 2-week time period.The average adjusted C D used to construct the 2-week debiased orbit fits for each model is reported in Table 2.
As mentioned in Section 4.2, the DRIA calculations indicate that C D = 2.5 is a realistic value if one assumes that a fixed energy accommodation of α = 0.89 is reasonable-an assumption limited by lack of empirical observation.According to the DRIA model, 2.5 is a realistic lower limit for the drag coefficient.Looking at Table 2, we can see that the mean adjusted C D for MSIS2, TIEGCM, and JB2008 are all well below this lower limit, offering further evidence that these models are, on average, over-estimating the density.The upper bound is slightly more difficult to estimate in this setup, but CTIPe's adjusted C D of 4.38 is likely too high.This will need to be investigated further in the future.
After removing the bias from the models, the relative effects of the minor geomagnetic activity become more stark in the in-track residuals.Here the changing shape of the residual curves indicates whether the model is over or underestimating the effects of geomagnetic activity on the modeled density.For example, several of the models display the positive parabolic curves during geomagnetic activity, indicating densities that are too low as compared to their quiet time densities.DTM2020's residuals appear to show an anti-correlation to the geomagnetic activity, beginning with densities that are too high and ending the 2 weeks with densities that are too low, which may be reflective of an overly sensitive response to geomagnetic activity and the overall downward trend in Kp.The effects of a model capturing density variations during geomagnetically active times are now better quantified by the in-track RMSe after 2-week debiasing is applied.

Debias Using 24-hr Scaling Factor
The 24-hr debiasing procedure described in Section 4.2, is used to scale the orbit fits and their residuals to a daily cadence.Figure 7 presents the resulting in-track residuals (top panel) and in-track RMSe values (bottom panel) for each 24-hr arc for each model.The 24-hr drag acceleration scaling factor is derived by adjusting the C D from the a priori of 2.5 over each daily arc, absorbing the average density offset for that day.The debiasing effect is seen in the overall reduction in residual error from Figures 4-7.The calculated 24-hr scaling factors are presented in the top panel of Figure 8 as a percent change from the fixed value of 2.5.The bottom and right panels show the Kp index and Pearson's correlation coefficient between each model's scaling factors and the Kp, respectively.
The 24-hr scaling accounts for both the overall model bias and uncertainties in the density on timescales that are on the order of, or greater than, the chosen arc length of 24 hr (i.e., combination of mean density offset and daily geomagnetic variation).The remaining error depicted by the in-track residuals of Figure 7 are likely due to higher frequency variations in density that are not captured by the 24-hr debiasing (e.g., day-night variations in the neutral density).The correlation between the Kp index and the scaling factors demonstrates how this metric can be used to determine how well a model accounts for geomagnetic activity.As shown by the scaling factors in

Discussion
While the in-track residuals of the ICESat-2 orbit fits offer an effective means of assessing the density models, most methods that use drag acceleration to study density are going to be limited by the complex interconnected uncertainties in the C D and the density.For these reasons, we split our overall assessment into the three methods presented, each of which help to further illuminate the functioning of the models.For the sake of brevity, only JB2008 and TIEGCM are given in-depth discussions that synthesize an understanding of model behavior from their results.The reader will then be able to apply these discussions to the remaining models, which here are discussed in more general terms.
Considering the cumulative results for JB2008 (plotted in orange for all relevant figures), Figure 4 shows that with a fixed C D of 2.5, the in-track residuals exhibit a negative parabola shape, indicating that the JB2008 density provides larger drag accelerations than what the PSO experiences.Figure 6 provides the assessment method in which the JB2008 densities are effectively scaled by ∼− 34% for the full 2-week period.In this case, the in-track residuals better highlight the arcs in which JB2008 performs differently relative to the other models, specifically on November 11th and 12th when the geomagnetic activity fluctuates around Kp = 3 after having been moderately elevated for several days.In the 24-hr debiased case, JB2008's daily scaling factors (shown in orange in Figure 8) effectively compensate for density variations on the order of or greater than 24-hr.The mean density offset adjustment is ∼− 34% and is seen to be slightly inversely correlated (R = −0.18)with the geomagnetic activity-indicating that JB2008 tends to slightly overestimate densities during active times.This effect can be clearly seen in JB2008's orbit averaged densities, represented by the orange line in the top panel of Figure 4, where the model provides much sharper density peaks than the other models during the times of slightly elevated geomagnetic activity.This effect is also clearly represented in the 24-hr debiased in-track RMSe values (bottom panel of Figure 7), which shows significant variance during active times between the PSO and the JB2008 orbit fit, even after the 24-hr scaling.The higher RMSe values shown in Figure 7 on November 9th-13th indicate that JB2008's overestimation of density during active times is not sufficiently compensated by the 24-hr scaling factor, meaning that the variation likely occurs at a higher frequency.JB2008's estimation of quiet time density is much better than its active time density, needing ∼−34% adjustment from the fixed C D case to provide orbit fits that match the PSO to within 1 m (during quiet times).
Considering the cumulative results for TIEGCM (plotted in cyan for all relevant figures), Figure 4 shows that with a fixed C D of 2.5, the in-track residuals exhibit a negative parabola shape, indicating that the TIEGCM density provides larger drag accelerations than what the PSO experiences (i.e., the TIEGCM densities are too high).TIEGCM offers interesting results from Figure 6 of the 2-week scaled case.The TIEGCM densities are effectively scaled down by ∼−45% over the 2-week period to compensate for the mean density offset.Since this value is the 2-week average from the scaling factors shown in Figure 8, it is skewed to only partially compensate for lack of geomagnetic sensitivity (i.e., densities are scaled to be too low in active times, increase in RMSe on the bookends of the period) and partially compensate for mean density offset in quiet times (densities are scaled to be too high in quiet times, increase in RMSe from Nov. 13th to 17th).The 2-week scaled case is able to clearly show in the in-track residuals that TIEGCM struggles more than the other models to properly capture variation during the period of this study.TIEGCM's daily scaling factors (cyan line in Figure 8) effectively compensate for errors in density variations on the order of or greater than 24-hr.The daily variation of the scaling factors is found to slightly correlate with the Kp (R = 0.29)-indicating that TIEGCM tends to underestimate densities during active times.This is seen in TIEGCM's orbit averaged densities (cyan line in the top panel of Figure 4) where the model provides significantly less sensitivity to the times of slightly elevated geomagnetic activity, and less variation overall.The 24-hr debiased in-track RMSe values (bottom panel of Figure 7), interestingly show very low variance between the PSO and the TIEGCM orbit fit after the 24-hr scaling.This is most likely explained by the overestimation of density during active times being sufficiently compensated for by the 24-hr scaling factors despite its orbit average densities seeming to lack much variation at all.This adds further suspicion to the higher frequency variations seen in models such as JB2008.It is possible that by moving to shorter arc lengths, such as 3 hr, the time series of scaling factors could better capture these variations, and this is a future goal of this work.
CTIPe is the only model that underestimates the density for all arcs, each requiring a ∼76% increase to bring the in-track residuals to within two m.DTM2020 requires the least overall scaling, requiring only ∼−7% shift to bring the inxtrack residuals to within 2 m across all arcs.All models capture the geomagnetic activity relatively well as demonstrated by their scaling factors not being very highly correlated to Kp. DTM2020 (R = −0.13)and JB2008's (R = −0.18)scaling factors are slightly inversely correlated to Kp, indicating a subtle over-sensitivity to geomagnetic activity during this time period, while TIEGCM (R = 0.29), MSIS2 (R = 0.35), CTIPe (R = 0.19) all indicate a subtle under-sensitivity.The scaling undergone for each model produces RMSe values that are comparable to that of the SET-HASDM orbit fit, which was separated out to serve as an approximate consistency check of our debiasing method due to its data-assimilative technique.

Conclusions and Future Work
This work presents the development of a modernized interface for the GEODYN-II POD software.The approach leverages the high-precision nature of space geodetic POD and an upgraded utility of the neutral density models to focus POD methods toward studying satellite drag and conducting density model assessment.The assessment method uses high-fidelity PSO as observed tracking measurements that are input into POD-based orbit fits.The drag effects from each density model are assessed according to each model's ability to redetermine the satellite's orbit.Each density model's orbit fit contains relative in-track deviations from the PSO which are treated as a proxy for model densities that differ from a true, unknown, density.These deviations are quantified with the in-track residuals and their RMS errors.We demonstrate the capabilities of this tool via a case study assessment of five thermospheric density models (MSIS2, DTM2020, JB2008, TIEGCM, and CTIPe, and a verification using SET-HASDM) using the ICESat-2 mission PSO as the observed measurements.Preliminary orbit fits are constructed after determining a mean C D from a physics-based solution.A fixed C D of 2.5 is applied for all models before being debiased by adjusting the C D to account for density errors in the drag acceleration.The debiasing is performed at two different cadences, 24-hr and 2-weeks, with each method highlighting different temporal aspects of the model density errors.The scaling factors extracted from the 24-hr and 2-week debiasing methods are well-equipped for use in improving forecasting and modeling methods.The 24-hr scaling factors provide a more accurate representation of the true density variations for each model, while the 2-week scaling factors are computationally simpler and indicate more baseline density effects.In addition, the 2-week extended time period scaling factors are compatible for parsing into the CCMC's CAMEL database to move in the direction of community-oriented model validation.
We continue our efforts on this project as we move in the direction of offering a more robust thermospheric model validation scheme.Possible improvements include improving the non-conservative force modeling in GEODYN for ICESat-2 using a more realistic 3-D model of the satellite shape that would account for self-shadowing and variations in cross-sectional area with incidence angle for example, as in March et al. (2019).The orbit determination for the primary science orbits and the subsequent analyses described in this paper would have to use these improved geometry models.A further improvement would be to incorporate SLR measurements of ICESat-2 into the evaluations of the density models.One could include the SLR data along with the PSO trajectory data in the evaluation.See Thomas et al. (2021) for a description of these data for ICESat-2.Planned future work involves addressing the key constraints highlighted in the methodology, the foremost of which is the need to evaluate the drag coefficient more frequently along the ICESat-2 orbit.Future work will also involve expanding the study to encompass the entire ICESat-2 mission time period.Additional expansion includes incorporating additional satellites and constellations that may illuminate model differences within atmospheric regimes that lack observations of neutral density.We aim to make our expanded results available through the CCMC's CAMEL framework as well as through future publications.
The assumptions made in this paper are limited by the current status of unknowns between gas surface interaction research and thermospheric variability research.At this time, the true drag coefficient is not known for any satellite, and modeling the C D will always introduce some inherent bias into the results.We aim to address this issue by implementing several of the satellite gas-surface interaction models currently used in the satellite 10.1029/2023SW003603 20 of 27 drag community to calculate the time and compositionally dependent drag coefficient.Isolating the effects of the C D will aid to better identify the various non-density related errors that may be present in the in-track residuals.Being able to distinguish these errors and accurately quantify the amount of deviation introduced by a given density model will provide significant insight regarding model performance to the earth-space environment modeling community.As the ability to model C D improves, the results provided by this method will similarly become more valid.The Geospace Dynamics Constellation (GDC) is an upcoming NASA mission that is intended to help fill the gaps in understanding gas-surface interactions by providing a stable platform with full measurements of neutral composition, density, and temperature along with a high-fidelity POD in which cross-validation of density model assessment is possible.As a result, in addition to providing its own neutral density observations that can be used for research, operations, and model validation, these advances expected from the GDC mission will improve the accuracy and usability of density proxies derived from POD solutions like those used here.These advances will effectively multiply our density observations to be able to use any satellite with sufficiently accurate GNSS positioning and knowledge of spacecraft parameters as a density observing platform.
This work provides a step in the direction of being able to use high-fidelity GNSS-enabled LEO satellite POD solutions to objectively quantify and validate thermospheric model performance.The strength of assessment using this method is its ability to identify the relative accuracy of the models in a way that is directly tied to operational use for orbit propagation.There are a multitude of uses for the tools and methods presented in this work, such as for density retrievals along the orbit of a satellite, which is a planned future effort; however, this report focuses specifically on model assessment.As work continues to refine these methods and address the caveats presented in this paper, the results of model assessments using this technique will continue to become better suited to aid satellite operators when choosing a model that will perform best under specified conditions.Having a multitude of methods for assessing upper atmospheric models under various conditions helps model developers refine the models themselves, making them better suited for orbit prediction.

Appendix A: Coordinate System to Study Drag
GEODYN's input and output trajectories make use of the J2000 inertial reference system.The  X ,  Ŷ , and  Ẑ components of the inertial coordinate system offer limited information on how a satellite's orbit is impacted by atmospheric drag, leading us to convert to the more suitable Satellite Coordinate System.Two coordinate frames suited for this assessment are the NTW and RSW frames, with differences between the two being highlighted in Figure A1.We make use of the NTW system, which aligns with the orbit plane and is composed of an in-track component  T that is parallel to the velocity vector    , a normal component  N that is perpendicular to the velocity and nominally in the radial direction, and a cross-track component  Ŵ that is normal to the orbit plane and completes the right-hand coordinate frame.Being parallel to the velocity vector means that the in-track component  T will contain any indication that the spacecraft's trajectory has changed since orbital energy dissipations from drag will impact in the velocity direction.
Figure A2 contains additional visualization related to the shape of the in-track residuals and how it relates to the movement of the PSO relative to the orbit-fit satellite.The overall shape of the in-track residuals is a result of the batch-least squares fitting routine as it attempts to minimize the distance between the PSO and the orbit fit across the whole arc.
Variations in the in-track component  T are not the same as variations in the along-track component  Ŝ of the RSW system.In-track variations act in the direction of the velocity vector, whereas along-track variations are merely along, but not necessarily parallel, to the direction of the velocity vector.We make the distinction to use the NTW system rather the RSW system whose radial component is often used to assess orbit accuracy in geodetic POD studies.The NTW coordinate system is described in Vallado (2013) to have the following unit vectors and transformation:

Appendix C: Orbit Uncertainty Interpolation Technique and the Kamodo Interface
Kamodo is a CCMC tool for access, interpolation, and visualization of space weather models and data in Python (Ringuette et al., 2023).Kamodo allows model developers to represent simulation results as mathematical functions which may be manipulated directly by end users.Kamodo handles unit conversion transparently and supports interactive science discovery through Jupyter notebooks with minimal coding in Python.Kamodo is chosen for this project due to its ability to offer model agnostic methods for reading data output from different model sources.Kamodo is called using its Satellite Flythrough capabilities, in which a user is able to sample the models with satellite ephemeris and return requested values from the chosen model.The orbit is pre-initialized in GEODYN using MSIS2 to get an a priori estimate for the orbit coordinates.Then using the a priori orbit, extend out the uncertainty of the coordinates to create a cube of possible values centered on the orbit.This approach accounts for possible model output differences as the orbit iteratively converges toward a solution.Finally, we plug the orbit and its uncertainty cubes into Kamodo to interpolate the model densities at all requested points.By doing this, the orbit density values from the physics model can be quickly ingested into the POD program.Figure C1 visualizes this procedure.

Figure 1 .
Figure 1.A flowchart visualizing the assessment process and how the data sets and POD methods fit together for model assessment.

4 )Figure 2
Figure 2 provides an example of the observation residuals for sample orbits over four, 24-hr arcs.The normal  N (top panel) and cross-track  Ŵ (bottom panel) residuals are included in this figure only to demonstrate that the in-track component contains the majority of the variance associated with residuals in this reference frame.The RMSe values for each component and arc are included as an overlay.

Figure 2 .
Figure 2. Depicted here are example observational residuals for each component of the NTW system across four 24-hr arcs using the MSIS2 model.The RMSe for each arc/component is given under each curve, showing that the majority of the residual variance (i.e., the orbit error due to drag) is contained within the in-track direction.Note that the vertical scale is different for each plot.

Figure 3 .
Figure 3. Top: observed solar F 10.7 radio flux.The dashed curve is the daily measured value from the Ottawa observatory normalized to 1 AU sun-earth distance; the solid curve is an 81-day (∼3 solar rotation) centered average.Bottom: the 3-hourly planetary magnetic index, Kp.Both panels depict the period of interest from 9 November 2018 to 23 November 2018.A few days before and after the period of interest are depicted in the shaded portions.

Figure 5 .
Figure 5. Verification results using SET-HASDM across 14 adjacent, 24-hr arcs from 9 November 2018-23 November 2018.Top: The solid blue curve depicts the neutral densities along the ICESat-2 orbit as an orbit average.Middle: In-track orbit residuals for each of the 14 adjacent, 24-hr arcs.Bottom: In-track RMSe values for each arc's in-track residuals.The range of the y-axes is chosen to facilitate comparison with Figure 4.

Figure 6 .
Figure 6.Assessment results for orbit fits that are debiased using 2-week drag acceleration scaling factors.Top: Debiased in-track orbit residuals for each arc.Bottom: Debiased in-track RMSe for each arc's in-track residuals.

Figure 7 .
Figure 7. Assessment results for orbit fits using 24-hr drag acceleration scaling factors.Scaling factors are extracted in the least squares orbit fitting procedure by allowing the C D to adjust once-per-arc to absorb observed errors between the PSO and the converging orbit fit.Top: In-track orbit residuals for each arc.Bottom: In-track RMSe for each arc's in-track residuals.

Figure 8 .
Figure 8. Top: Drag acceleration scaling factors extracted from the orbit fits shown in Figure 7, presented as a percent change from the fixed C D = 2.5.Bottom left: The Kp index for the time period.Bottom right: Pearson's correlation coefficient between the scaling factors and Kp.

Figure 8 ,
Figure 8, MSIS2 (R = 0.35), TIEGCM (R = 0.29), and CTIPe (R = 0.19) all exhibit a subtle positive correlation, indicating a slight underestimation of density enhancements from geomagnetic activity and resulting in the scaling factors being used to compensate for these errors.Contrarily, the subtle inverse relationships shown by DTM2020 (R = −0.13)and JB2008 (R = −0.18)indicate a slight overestimation of these models' densities during days of increased activity.

Figure A1 .
Figure A1.The above is a schematic showing the NTW and RSW satellite coordinate systems and details regarding their components.The NTW system's in-track component is parallel to the velocity vector, making it an effective tool for assessing relative effects due to atmospheric drag.

Figure A2 .
Figure A2.Above is a diagram which explores the circumstances that result in the parabolic shape of the in-track residuals.Representative in-track residuals for a single arc is given as a negative quadratic curve in the top panel.The bottom panel provides corresponding frames of schematics for each marked point to depict how the orbit fit and PSO are positioned relative to each other.Note that the shape of the in-track residuals is not limited to be only positive or negative parabolas, but this scenario represents a simple case for understanding.

Figure C1 .
Figure C1.A representative schematic showing the constructed "cube of uncertainty" that surrounds a given coordinate along the orbit of a satellite.Each point that makes up this cube will contain modeled neutral density values between which we can interpolate in GEODYN as the orbit drifts from the a priori orbit.This figure also demonstrates how perturbations due to different density models, represented here by different colored orbits, may necessitate a range of uncertainty for the satellite's indexed location.

Table 1
Thermospheric Density Models Used in This Assessment

Table 2
Summary of the Assessment Procedure Assuming a Fixed C D That is Equal to the Average Adjusted Value for Each Model, Assuming an a Priori of C D = 2.5