Performance Characterization of ESA's Tropospheric Delay Calibration System for Advanced Radio Science Experiments

Precise radiometric tracking is of key importance during operations of interplanetary missions and for advanced radio science applications. Radio science investigations on deep space missions such as Cassini, Juno, BepiColombo, and the upcoming JUICE mission rely on a combination of X- and Ka-band radio links to mitigate the dispersive effects of propagation through the interplanetary plasma, the solar corona, and the Earth's ionosphere. This leaves tropospheric delay as one of the main error contributors to Doppler and ranging measurements.

The European Space Agency (ESA) performed a preliminary study named AWARDS (Graziani et al., 2014; Tortora et al., 2013) to define the requirements and the preliminary system design of a tropospheric delay calibration system (TDCS). In addition, media calibration performance requirements for accurate spacecraft tracking were studied in detail in another ESA study called ASTRA (Iess et al., 2012, 2014). The TDCS described here is the prototype of a new instrument for the calibration of tropospheric delay and delay rate based on a high-stability, high-accuracy Ka/V-band microwave radiometer (MWR). It was developed within an ESA contract by a consortium formed by Radiometer Physics GmbH (RPG), the University of Bologna, and the Université Catholique de Louvain.
Specifically, this work focuses on the end-to-end performance characterization of the TDCS, carried out through a detailed orbit determination using Doppler measurements acquired at ESA's deep space ground station (DSA3) in Malargüe during several tracking passes of the Gaia spacecraft.
It is important to clarify that the purpose of this test is not to reproduce the full orbit determination solution used for the navigation of Gaia, but to validate the TDCS products through a targeted evaluation of the relative end-to-end noise reduction obtained when TDCS-based calibrations are used in place of standard calibrations based on global navigation satellite system (GNSS) measurements.

Tropospheric Delay Calibration System
The TDCS is a combination of instruments, software tools, and operational procedures that allows an accurate estimation of the tropospheric delay along the slant direction while minimizing the effect of instrument instabilities. The main TDCS subsystem is an ultra-stable MWR for deep space applications, a modified version of the standard HATPRO-G5 radiometer developed by RPG. A description of the concept and design of the first HATPRO generation is given by Rose et al. (2005). In 2015, the radiometric performance was significantly improved with the fifth generation of the series (G5).
The TDCS, shown in Figure 1, measures sky noise emission at frequencies near the water vapor absorption peak at 22.2 GHz, in the oxygen absorption band around 60 GHz, and in the 30 GHz window, which is sensitive to liquid water content (clouds and rain). With respect to the standard HATPRO-G5 radiometer, this instrument was tailored for spacecraft tracking applications by adding (a) a two-axis tracking system providing full-sky scanning capabilities; (b) a modified antenna structure including an external heated parabolic reflector, which narrows the MWR half-power beamwidth down to 1.2°, allowing the radiometer to better replicate the air volume sampled by the deep space antenna and reducing solar radiation contamination during periods of superior conjunction; and (c) a high-precision meteorological station providing air pressure, temperature, relative humidity, rain rate, and wind speed at ground level. The instrument includes internal and external temperature control and antenna blower systems, which prevent water condensation and icing on the exposed surfaces and keep the receiver temperatures stable.
Further updates include new control procedures for the tracking system, specific calibration procedures for the instrument parameters, and the development of an external software tool that acts as a high-level access point for monitoring and controlling the TDCS functions by the ground station front-end controller.
The TDCS was installed at ESA's deep space ground station in Malargüe in February 2019 and is located 31.8 m from the deep space antenna in the north-northeast direction. This physical separation is a relevant error source for the delay and delay-rate calibrations, owing to the different air volumes sampled by the radiometer and the deep space antenna. This consideration prompted JPL to propose a new design for its next-generation tropospheric calibration system, which foresees the direct integration of the water vapor radiometer within the antenna receiver (Tanner et al., 2021). While direct integration was not possible for the current setup, the selected mounting location for the TDCS is well within the 50 m distance limit proposed by Tortora et al. (2013) and represents a trade-off between calibration performance and operational constraints (i.e., availability of supporting infrastructure and visibility during tracking).

Testbed Summary and Data Availability
Two successive testbed campaigns were carried out between February and July 2019, targeting a total of 44 tracking passes of ESA's Gaia spacecraft. Both hardware and software updates were performed in response to issues encountered during the test campaign. As a result, the availability of key data products (Doppler data, GNSS-based calibrations, and TDCS-based calibrations) varied among the tracking passes. Since all of these products must be simultaneously available for the performance evaluation, only 32 of the 44 passes were eventually included in the analysis. Table 1 provides an overview of this subset along with a summary of the atmospheric conditions encountered during each pass.
The range of TDCS-retrieved zenith wet delay (ZWD) values indicates the potential improvement that can be obtained when tropospheric calibrations are introduced in the orbit determination process. Since Gaia is visible only at night, most tracking passes are characterized by dry conditions, which can limit the effectiveness of the TDCS calibrations. TDCS-retrieved liquid water path (LWP) values above ∼10 g/m² indicate the presence of condensed water (clouds or fog) along the instrument line of sight. These values scale with the length of the propagation path through the cloud, so high values (∼500–1,000 g/m²) indicate thick cloud formations and may suggest precipitation, in particular when coupled with the triggering of the rain flag (RF) by the rain sensor included in the TDCS. Characteristic integrated values of wind speed (WS) and turbulence strength (Cn²), derived from the European Centre for Medium-Range Weather Forecasts (ECMWF) database (Molteni et al., 1996), can be considered proxy parameters for the presence of turbulent eddies in the lower portion of the atmosphere, which affect the accuracy of the TDCS calibrations (Lasagni Manghi et al., 2019). Both wind speed and turbulence strength values were provided by UCL (Quibus et al., 2019). They were derived by first averaging over time the local vertical profiles from the ECMWF data set that fell within the tracking pass interval, and then averaging the resulting profiles spatially from ground level to a height of 1 km above the surface.
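The two-step averaging of the ECMWF profiles described above can be sketched as follows. This is a minimal illustration assuming the profiles are already sampled on a common grid of heights above the surface; the helper name, the array layout, and the height-weighted (trapezoidal) vertical mean are assumptions, since the text does not state how the levels were weighted.

```python
import numpy as np


def characteristic_value(times_s, heights_m, profiles, t_start, t_end, top_m=1000.0):
    """Time-average, then height-average, a set of vertical profiles.

    Hypothetical helper (not from the TDCS or UCL software):
    times_s   -- epochs of the profiles, shape (n_times,)
    heights_m -- height above the surface of each level, shape (n_levels,)
    profiles  -- e.g. wind speed or Cn^2, shape (n_times, n_levels)
    """
    times_s = np.asarray(times_s, dtype=float)
    heights_m = np.asarray(heights_m, dtype=float)
    profiles = np.asarray(profiles, dtype=float)

    # Step 1: keep the profiles that fall within the tracking pass and average them in time.
    in_pass = (times_s >= t_start) & (times_s <= t_end)
    if not np.any(in_pass):
        raise ValueError("no profiles fall within the tracking pass")
    mean_profile = profiles[in_pass].mean(axis=0)

    # Step 2: restrict to levels between the surface and top_m, sorted by height.
    below = (heights_m >= 0.0) & (heights_m <= top_m)
    order = np.argsort(heights_m[below])
    h = heights_m[below][order]
    v = mean_profile[below][order]
    if h.size < 2:
        raise ValueError("need at least two levels below top_m")

    # Assumed height-weighted mean: trapezoidal integral divided by the layer depth.
    integral = np.sum(0.5 * (v[1:] + v[:-1]) * np.diff(h))
    return float(integral / (h[-1] - h[0]))
```

A plain arithmetic mean over the selected levels would be an equally plausible choice; the trapezoidal form simply avoids over-weighting regions where the model levels are densely spaced.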
Finally, the wind speed at ground level measured by the TDCS meteo station provides, on the one hand, an indication of the strength of the mechanical noise introduced into the Doppler measurements by the tropospheric calibrations and, on the other hand, another proxy for the effect of wind shear on tropospheric turbulence.

TDCS Data Processing
The retrieval of atmospheric variables from MWR observations is an ill-posed problem, since a given set of brightness temperature measurements may be related to several different atmospheric states (Keihm & Marsh, 1998). To resolve this ambiguity, the TDCS uses a neural network retrieval algorithm, which was trained using a large number of atmospheric vertical profiles extracted specifically for the Malargüe site from a numerical weather prediction model (ECMWF reanalysis). From each of these profiles of temperature, humidity, pressure, and liquid water concentration, the slant wet delay was computed. At the same time, each profile was used as input for the simulation of the corresponding brightness temperature measurements via a state-of-the-art radiative transfer model. The resulting data set was split into a training part and a test part. For the former, simulated brightness temperatures and slant wet delay values were used to derive a set of retrieval coefficients, which minimized the output error over the complete training data set. The latter served for validation of the retrieval performance through a statistical comparison of the slant wet delay values calculated from the test profiles and the ones retrieved via the neural network coefficients.
Table 1 Summary of Data Availability and Main Meteorological Parameters for the Analyzed Tracking Passes
Note. … (h) 99th percentile of the wind speed measured by the TDCS meteo station; (i) characteristic integrated wind speed derived from the ECMWF data set; (j) characteristic integrated turbulence strength derived from the ECMWF data set. ᵃThe start time for these passes corresponds to the previous day with respect to the date reported in the second column.

During operations, the retrieval coefficients mapped the measured brightness temperature vector (14 Ka/V-band channels) to the best-estimate slant wet delay. Since brightness temperature and delay scale differently with the length of the propagation path through the atmosphere, the estimation of the retrieval coefficients was repeated at 19 discrete elevation angles (spaced at constant airmass steps) covering the range between 10° and 90°, producing an elevation-dependent grid of retrieval coefficients. The discrete angle grid may cause artificial effects at low elevation angles (<30°). These were handled by an interpolation scheme based on the following steps. First, the mean radiative temperature was derived from surface sensor observations and a dedicated neural network retrieval, and used to derive the atmospheric opacity from the brightness temperatures. Then, the atmospheric opacity was linearly interpolated to the nearest nodes of the retrieval grid and used to derive the corresponding brightness temperature values, using the mean radiative temperature and neglecting its small variation with airmass. Finally, the slant wet delay output was calculated as an airmass-weighted average of the values retrieved at the two considered grid nodes.
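The airmass interpolation scheme can be illustrated with the short Python sketch below. It assumes the standard non-scattering relation T_b = T_mr(1 − e^(−τ)) + T_c e^(−τ), with T_c ≈ 2.73 K for the cosmic background, an opacity that scales linearly with airmass m = 1/sin(el), and a grid of 19 nodes equally spaced in airmass between 90° and 10° elevation. The callable `retrieve_at_node` is a hypothetical stand-in for the trained neural network at a given node; the actual TDCS software is not reproduced here.

```python
import numpy as np

T_COSMIC = 2.73  # K, cosmic background brightness temperature (assumed value)

# 19 nodes equally spaced in airmass between 90 deg (m = 1) and 10 deg elevation.
AIRMASS_GRID = np.linspace(1.0, 1.0 / np.sin(np.radians(10.0)), 19)


def interpolated_swd(tb, elev_deg, t_mr, retrieve_at_node):
    """Slant wet delay from brightness temperatures at an arbitrary elevation.

    tb               -- measured brightness temperatures (K), one per channel
    elev_deg         -- instantaneous elevation (deg), between 10 and 90
    t_mr             -- mean radiative temperature (K), scalar or one per channel
    retrieve_at_node -- hypothetical callable (node_index, tb_node) -> slant wet delay
    """
    tb = np.asarray(tb, dtype=float)
    m = 1.0 / np.sin(np.radians(elev_deg))

    # Opacity along the observed path, then per unit airmass (zenith opacity).
    tau = np.log((t_mr - T_COSMIC) / (t_mr - tb))
    tau_zenith = tau / m

    # Bracketing grid nodes in airmass.
    i2 = int(np.clip(np.searchsorted(AIRMASS_GRID, m), 1, AIRMASS_GRID.size - 1))
    i1 = i2 - 1
    m1, m2 = AIRMASS_GRID[i1], AIRMASS_GRID[i2]

    def tb_at(m_node):
        # Brightness temperatures expected at the node airmass, keeping t_mr fixed
        # (its small variation with airmass is neglected).
        return t_mr - (t_mr - T_COSMIC) * np.exp(-tau_zenith * m_node)

    swd1 = retrieve_at_node(i1, tb_at(m1))
    swd2 = retrieve_at_node(i2, tb_at(m2))

    # Airmass-weighted (linear-in-airmass) average of the two node retrievals.
    w2 = (m - m1) / (m2 - m1)
    return (1.0 - w2) * swd1 + w2 * swd2
```

With linear-in-airmass weights, a slant delay that scales exactly with airmass is reproduced without interpolation error between nodes.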
For each analyzed tracking pass, slant wet delay values were retrieved from brightness temperature measurements using the trained neural network retrieval and converted to zenith wet delay using a sin(el) mapping function, where el is the instantaneous elevation reported by the tracking system. This simple mapping was preferred to higher-fidelity mapping functions, such as that of Niell (1996), for consistency with the procedures used for the retrieval training.
Then, outliers were identified by computing the distance of each data point from a smoothed data set, which was generated using a median filter with a 10 min time window, and removed when lying outside a fixed number of standard deviations from the median.
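A minimal sketch of the sin(el) mapping and of the median-based outlier screening follows, assuming 1 Hz samples (so a 10 min window spans about 601 samples) and a threshold of three standard deviations. The threshold value, the centered window, and the helper names are assumptions: the text only specifies a 10 min median filter and a fixed number of standard deviations.

```python
import numpy as np


def swd_to_zwd(swd, elev_deg):
    """Map slant wet delay to zenith wet delay with the simple sin(el) mapping."""
    return np.asarray(swd, dtype=float) * np.sin(np.radians(elev_deg))


def outlier_mask(zwd, window=601, n_sigma=3.0):
    """Return a boolean mask that is True for samples that are kept.

    window  -- running-median length in samples (601 ~ 10 min at 1 Hz, assumed)
    n_sigma -- rejection threshold in standard deviations (hypothetical value)
    """
    zwd = np.asarray(zwd, dtype=float)
    half = window // 2
    smooth = np.empty_like(zwd)
    for i in range(zwd.size):
        # Centered running median; the window is truncated at the series edges.
        lo, hi = max(0, i - half), min(zwd.size, i + half + 1)
        smooth[i] = np.median(zwd[lo:hi])
    # Distance of each sample from the smoothed series, screened against its spread.
    distance = zwd - smooth
    return np.abs(distance) <= n_sigma * np.std(distance)
```

The explicit loop keeps the sketch dependency-free; a vectorized running median would be preferable for long 1 Hz time series.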
Finally, calibration cards were generated from the zenith wet delay time series using a piecewise linear fit between consecutive data points, according to the definitions provided by JPL (Media Calibration Interface, 2008). Several calibration cards were generated with increasing zenith wet delay integration times to improve the signal-to-noise ratio. Specifically, TDCS calibrations with 1, 20, and 60 s integration times were used and compared within the orbit determination analysis.
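The role of the integration time can be sketched as a block average of the 1 Hz zenith wet delay followed by piecewise-linear evaluation between consecutive averaged points. Non-overlapping windows time-tagged at the mean epoch of their samples are an assumption, as are the helper names; the JPL calibration card format itself is not reproduced.

```python
import numpy as np


def integrate_zwd(t_s, zwd, integration_s):
    """Average ZWD samples over consecutive, non-overlapping windows.

    t_s must be increasing. Returns the mean epoch and mean ZWD of each
    non-empty window (assumed averaging scheme).
    """
    t_s = np.asarray(t_s, dtype=float)
    zwd = np.asarray(zwd, dtype=float)
    # Window index of each sample, counted from the first epoch.
    idx = np.floor((t_s - t_s[0]) / integration_s).astype(int)
    counts = np.bincount(idx)
    keep = counts > 0
    t_mean = np.bincount(idx, weights=t_s)[keep] / counts[keep]
    z_mean = np.bincount(idx, weights=zwd)[keep] / counts[keep]
    return t_mean, z_mean


def zwd_at(t_query, t_nodes, z_nodes):
    """Piecewise-linear evaluation between consecutive calibration points."""
    return np.interp(t_query, t_nodes, z_nodes)
```

Setting `integration_s` to 1, 20, or 60 s produces the three calibration variants compared in the analysis; longer windows suppress receiver noise at the cost of smoothing out short-term atmospheric variations.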
The zenith hydrostatic delay (ZHD) was computed according to the model of Saastamoinen (1972). The corrected delay values were then smoothed using a Gaussian filter with a 10 min time window and down-sampled to 20 s to remove discontinuities in the 1 Hz time series. These discontinuities, caused by the 0.1 mbar resolution of the TDCS pressure sensor (corresponding to a zenith delay resolution of about 0.23 mm), were comparable in magnitude to the short-scale variations in the zenith wet delay and might have increased the Doppler noise if not properly corrected. Finally, calibration cards were generated from the ZHD time series using a piecewise linear fit between consecutive data points.
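The quoted 0.23 mm delay resolution follows directly from the Saastamoinen model. The sketch below uses the form of the model commonly quoted in the geodetic literature (ZHD in meters, pressure in hPa, height in km); the latitude and height passed in are rough illustrative values, not surveyed coordinates of the DSA3 site.

```python
import math


def zhd_saastamoinen(pressure_hpa, lat_deg, height_km):
    """Zenith hydrostatic delay in meters (Saastamoinen model, common form).

    pressure_hpa -- surface pressure in hPa (= mbar)
    lat_deg      -- geodetic latitude in degrees
    height_km    -- station height in km
    """
    f = 1.0 - 0.00266 * math.cos(2.0 * math.radians(lat_deg)) - 0.00028 * height_km
    return 0.0022768 * pressure_hpa / f


# Delay change produced by one 0.1 mbar step of the pressure sensor, in mm.
# Latitude and height are illustrative assumptions, not site survey values.
step_mm = 1000.0 * (zhd_saastamoinen(850.1, -35.8, 1.55) - zhd_saastamoinen(850.0, -35.8, 1.55))
```

Because the model is linear in pressure, each 0.1 mbar quantization step maps to a fixed delay jump of about 0.23 mm regardless of the absolute pressure.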

Introduction
Gaia is an ESA cornerstone scientific mission whose aim is to measure the three-dimensional position and velocity distributions of stars within the Milky Way using accurate astrometric measurements (Prusti et al., 2016). The selection of this mission for the TDCS testbed campaign was mainly driven by geometrical and operational considerations. Since Gaia operates in a Lissajous-type orbit around the second Sun-Earth Lagrange point, it is constantly near solar opposition. As a result, the impact of the solar plasma and the Earth's ionosphere on the propagation delay is particularly limited, which simplifies the processing and calibration of the Doppler data. Furthermore, several Gaia tracking passes were already …

The overall concept of this analysis was to perform a standard orbit determination for the Gaia spacecraft, using Doppler measurements collected at the DSA3 complex and a priori information on the dynamical model provided by the Flight Dynamics team at ESA's European Space Operations Centre in Darmstadt, Germany (ESOC FD). This process was repeated keeping all parameters fixed while varying the applied tropospheric calibrations (generated either from dual-frequency GNSS measurements or from TDCS measurements), allowing a direct comparison of their respective accuracies.

Data Selection and Processing
Raw Doppler measurements at X/X band acquired during the Gaia tracking passes between February 16 and July 18, 2019 were provided by ESA as collected by the telemetry, tracking and command processor, according to the format definitions provided by Ricart (2018). As a first step, the set of Doppler observables was reduced by removing all measurements in the proximity of the chemically propelled maneuvers to avoid discontinuities.
Then, all observables collected below 15° elevation at the ground station were removed to mitigate the progressive degradation of the radiometric retrieval accuracy at low elevations. This conservative limit was selected to account for retrieval errors due to the granularity of the retrieval coefficients, possible fast variations of the observed atmospheric scene, and contamination from ground and clutter emission.
Doppler measurement weights for each tracking pass were computed as the root mean square value of the residuals for that pass.

Media Calibrations
For the most recent deep space missions, the dispersive effect of charged particles in the solar corona is calibrated using a multi-frequency link with coherent uplink and downlink (Bertotti et al., 1993; Mariotti & Tortora, 2013). This was not possible for the current analysis, since Gaia uses a single-frequency X-band link. However, the effect of solar plasma is assumed to be small, since Gaia operates near solar opposition, with elongation values always larger than 170° (Asmar et al., 2005; Iess et al., 2014).
Similar considerations apply to the propagation delay induced by charged particles in the Earth's ionosphere. Since Gaia is visible only at night, the ionosphere-induced Doppler error at X-band was expected to be small compared with the variations of tropospheric delay (Thornton & Border, 2003), with typical values on the order of 10 μm/s, and was therefore neglected.
The GNSS-based tropospheric calibrations, which were used as a baseline for validating the TDCS products, were provided by ESOC FD in the form of time-normalized polynomials over 6 h intervals, according to the format described by JPL (Media Calibration Interface, 2008). The tropospheric zenith delay was estimated from GNSS measurements with a 5 min temporal resolution using the precise point positioning Kalman filter approach (Zumberge et al., 1997). The ZHD was then derived from surface pressure measurements, following the same procedure described in Section 4, and subtracted from the total delay to obtain the wet component (Feltens et al., 2018).

Dynamical Model
The dynamical model was kept reasonably simple to reduce the likelihood of possible biases in the results caused by mismodelling errors.
The gravitational accelerations that were considered for this analysis include point-mass gravity from the Sun, the planets and their satellites, the Moon, and Pluto. Higher order gravitational harmonics were neglected. State vectors and gravitational parameters for the Solar System bodies were taken from JPL's DE430 planetary ephemerides (Folkner et al., 2014).
Non-gravitational accelerations were introduced in the form of interpolating polynomials using tabulated coefficients generated by ESOC FD, which allows for easier replicability. Specifically, the main non-gravitational accelerations acting on the spacecraft were those from solar and thermal radiation pressure, which were provided in the form of normalized acceleration components.
Gaia performed three main orbit trim maneuvers (OTM) throughout the testbed campaign: two station-keeping maneuvers, in February and April 2019, respectively, and an inclination change maneuver, split into nine burns, in July 2019. All maneuvers occurring during the tracking passes were modeled as impulsive burns and estimated within the filter using a priori ΔV values provided by ESOC FD.
Attitude control during chemically propelled maneuvers was performed using the reaction control system (RCS), which imparted parasitic ΔVs on the spacecraft. As with the OTMs, RCS firings were modeled as impulsive burns and estimated within the filter.
Conversely, attitude control during standard operations was performed using a cold-gas micro-propulsion system, which caused parasitic accelerations to act continuously on the spacecraft. Instantaneous accelerations from the cold-gas thrusters were provided in tabulated form by ESOC FD.

Filter Setup
The analysis was carried out using JPL's MONTE orbit determination software (Evans et al., 2018), which adopts a weighted least-squares batch filter to generate iterative corrections to the a priori dynamical model in order to minimize the difference between the real and the simulated measurements (Bierman, 2006). Table 3 summarizes the solve-for parameters within the square root information batch filter and their associated a priori uncertainties.
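MONTE's internal implementation is not described here, but the core of a square root information batch correction can be sketched generically: the a priori information and the whitened measurement equations are stacked and triangularized with a QR factorization, and the state correction follows by back-substitution (Bierman, 2006). The function below is a hypothetical, single-iteration illustration for a linearized problem with zero a priori correction; in an iterative batch filter it would be applied repeatedly, relinearizing the partials about the updated trajectory.

```python
import numpy as np


def srif_correction(H, r, sigma_obs, sigma_apriori):
    """One linearized batch correction in square root information form.

    H             -- measurement partials, shape (n_obs, n_par)
    r             -- pre-fit residuals (observed - computed), shape (n_obs,)
    sigma_obs     -- measurement noise, scalar or shape (n_obs,)
    sigma_apriori -- a priori 1-sigma of each parameter, shape (n_par,)
    Returns the state correction and its formal covariance.
    """
    H = np.atleast_2d(np.asarray(H, dtype=float))
    r = np.asarray(r, dtype=float)
    w = 1.0 / np.broadcast_to(np.asarray(sigma_obs, dtype=float), r.shape)

    # A priori information as a square root information array (zero a priori correction).
    R0 = np.diag(1.0 / np.asarray(sigma_apriori, dtype=float))
    z0 = np.zeros(H.shape[1])

    # Stack prior and whitened measurements, then triangularize with a QR factorization.
    A = np.vstack([R0, H * w[:, None]])
    b = np.concatenate([z0, r * w])
    Q, R = np.linalg.qr(A)  # reduced QR: R is (n_par, n_par) upper triangular
    z = Q.T @ b

    # Generic solve used for brevity in place of a dedicated triangular back-substitution.
    dx = np.linalg.solve(R, z)
    R_inv = np.linalg.inv(R)
    return dx, R_inv @ R_inv.T
```

Working with the triangular information root rather than forming the normal equations is what gives the square root formulation its numerical robustness.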
A priori values for the spacecraft state were taken from the operational trajectory reconstructed by ESOC FD. Another key parameter that was estimated within the filter is the phase center of the onboard antenna. Since Gaia uses two separate antennas for uplink and downlink, which are respectively a low-gain antenna and a phased array antenna, the estimated coordinates are actually referred to a virtual antenna located at the midpoint between the two. Estimated values for the coordinates of the virtual antenna were consistent with the a priori uncertainty and mostly absorbed the short-term variations in the location of the spacecraft center of mass. It should be noted that the virtual antenna coordinate along the spin axis, which corresponds to the x-axis of the spacecraft body frame, is not observable using Doppler measurements, so only the y and z components were estimated locally for each pass.

Results
The overall quality of the TDCS calibrations is driven by several intrinsic and scene-dependent factors. The former comprise all error sources related to the MWR components, such as the noise characteristics of the MWR radiometric receivers, antenna losses, and spill-over losses of the Ka-band channels over a variable background. The latter comprise all error sources whose magnitude depends on the local atmospheric conditions encountered during the measurements. These include the retrieval error contribution, …

When the tropospheric calibrations are included in the orbit determination process, the radiometric measurements are thus affected by a variable amount of uncalibrated (or residual) tropospheric delay and by additional error sources introduced by the calibration generation process.
The slant wet delay retrieval accuracy of G5 radiometers was estimated through a series of retrieval self-tests using portions of the ECMWF training database for Malargüe. Its value ranged between 0.8 mm at zenith and 6.3 mm at 10° elevation, well below the 20 mm accuracy requirement from the AWARDS study. These accuracies should apply to the TDCS as well, under the assumption that possible antenna losses are corrected effectively. However, an end-to-end assessment of the delay accuracy using range observations was harder to obtain, due to the simplified dynamical model employed for this analysis.
The key factor when assessing the impact of tropospheric calibrations on Doppler measurements is the calibration stability, which is usually characterized in terms of the Allan standard deviation at various time intervals. The stabilities of the radiometric receivers (instrumental noise) and of the retrieval algorithm were investigated previously by the authors (Lasagni Manghi et al., 2019; Maschwitz et al., 2019) and were numerically quantified through simulations and testing in controlled environments. The former showed typical Allan deviation values of 1.3 × 10⁻¹⁵ at 1,000 s, while the latter ranged between values of order 10⁻¹⁶ and 10⁻¹⁵ at 1,000 s for the case of "broken cloud advection" (alternating sequences of clear-sky and cloudy conditions), which was deemed the worst-case retrieval scenario. Analogously, the beam mismatch contribution was quantified by numerical simulations under specific atmospheric conditions (Graziani et al., 2014).
Instead of focusing on individual error contributions, the current analysis provides an estimate of the end-to-end frequency stability of the Doppler residuals obtained when using either GNSS-based or TDCS-based tropospheric calibrations.
As a first step, the Doppler residuals at 60 s count time were visually inspected to highlight major signatures in the data and to identify their possible causes. A count time of 60 s was selected because it represents a standard case for radio science applications (Casajus et al., 2021; Durante et al., 2019; Tortora et al., 2016; Zannoni et al., 2020). This value is sufficiently small compared with the characteristic time scales of the processes typically investigated, and sufficiently large to avoid numerical noise issues (Zannoni & Tortora, 2013). Root mean square (RMS) values of the residuals were then computed for each pass, along with an estimate of the relative noise reduction between the two analyzed cases.
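The per-pass RMS and relative noise reduction can be computed as sketched below. Defining the reduction as 1 − RMS_TDCS/RMS_GNSS and averaging it arithmetically across passes are assumptions; the text reports percentages but does not spell out the formula.

```python
import numpy as np


def rms(residuals):
    """Root mean square of a residual series (e.g. Doppler residuals in mm/s)."""
    r = np.asarray(residuals, dtype=float)
    return float(np.sqrt(np.mean(r ** 2)))


def noise_reduction(res_gnss, res_tdcs):
    """Relative RMS reduction of TDCS-calibrated vs GNSS-calibrated residuals.

    Positive values mean the TDCS calibrations lowered the noise; a negative
    value corresponds to a noise increase.
    """
    return 1.0 - rms(res_tdcs) / rms(res_gnss)


def mean_reduction(passes):
    """Arithmetic mean of per-pass reductions; passes is a list of (gnss, tdcs) pairs."""
    return float(np.mean([noise_reduction(g, t) for g, t in passes]))
```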
Finally, the overall stability of the Doppler residuals was quantified by computing the Allan standard deviation (ASD) according to Equation 3, where y represents the normalized frequency residuals, ΔT is the stability time interval, the brackets ⟨•⟩ indicate an ensemble average over the measured time series, and ȳ(t, ΔT) indicates a time average over the interval between t and t + ΔT, according to Equation 4. Specifically, stability intervals of 20, 60, and 1,000 s were considered, which represent typical values for radio science applications.

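$$\mathrm{ASD}(\Delta T) = \sqrt{\frac{1}{2}\left\langle \left[\bar{y}(t+\Delta T, \Delta T) - \bar{y}(t, \Delta T)\right]^{2} \right\rangle} \tag{3}$$

$$\bar{y}(t, \Delta T) = \frac{1}{\Delta T}\int_{t}^{t+\Delta T} y(s)\, ds \tag{4}$$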
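For evenly sampled residuals, Equations 3 and 4 reduce to the discrete overlapping estimator sketched below, where ΔT is rounded to an integer multiple m of the sample spacing τ0 and the ensemble average runs over all admissible start times. This is an illustrative implementation under those assumptions, not the code used for the analysis.

```python
import numpy as np


def allan_deviation(y, tau0, delta_t):
    """Overlapping Allan standard deviation of evenly sampled frequency residuals.

    y       -- normalized frequency residuals, sampled every tau0 seconds
    tau0    -- sample spacing (e.g. the Doppler count time) in seconds
    delta_t -- stability interval, rounded to an integer multiple m of tau0
    """
    y = np.asarray(y, dtype=float)
    m = int(round(delta_t / tau0))
    if m < 1 or 2 * m > y.size:
        raise ValueError("delta_t must satisfy tau0 <= delta_t <= len(y) * tau0 / 2")
    # Discrete Equation 4: running means over m consecutive samples.
    csum = np.concatenate(([0.0], np.cumsum(y)))
    y_bar = (csum[m:] - csum[:-m]) / m  # y_bar[i] = mean(y[i:i + m])
    # Discrete Equation 3: differences of adjacent averages, ensemble-averaged over start times.
    diffs = y_bar[m:] - y_bar[:-m]
    return float(np.sqrt(0.5 * np.mean(diffs ** 2)))
```

For example, `allan_deviation(y, 1.0, 1000.0)` would evaluate the 1,000 s stability interval for residuals sampled every second.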
In the following, a single representative pass is analyzed in detail according to the procedure described above. Then, a summary of all passes is produced with a quantitative comparison of the relative performances between the analyzed test cases.

Example Tracking Pass (April 19, 2019)
This pass was selected as representative of standard conditions encountered at the deep space ground station in Malargüe. From Table 1, we observe that this pass was characterized by moderate liquid water content and that the rain flag was not triggered, suggesting the presence of clouds along the line of sight, with no precipitation. Moderate to high values of wind speed were also observed at ground level, in particular during the first hours of the pass. Vertical profiles from the ECMWF data set also suggest the presence of moderate to high turbulence levels. Figure 2 compares the Doppler residuals at 60 s count time, using GNSS-based calibrations (left) and TDCS-based calibrations with 20 s integration time (right). With the introduction of TDCS calibrations, we observe a consistent improvement in the residuals, with an overall 51% reduction of the root mean square values and no apparent signature being introduced. The observed improvement is particularly pronounced during the first half of the tracking pass, where higher wind speed and liquid water path values are observed. This may provide an indication of the ability of the selected neural network retrieval to correctly separate the information content of liquid water from that provided by water vapor.
The left plot in Figure 3 compares the ASD curves obtained by applying Equation 3 to the Doppler residuals for the four analyzed test cases: GNSS calibrations and TDCS calibrations at 1, 20, and 60 s integration time. Up to stability intervals of 10 s, all curves except the one for TDCS calibrations at 1 s integration time overlap and approximately follow a power law with a slope of −1. This behavior may suggest that the dominant error source at these time scales is the Doppler thermal noise (Iess et al., 2014). However, for short integration times of the tropospheric products, the thermal noise of the MWR receiver components, which is introduced through the calibrations, becomes comparable in magnitude and produces the observed offset in the 1 s curve.
At longer stability intervals, the uncalibrated tropospheric delay becomes progressively more relevant, as indicated by the departure of all curves from the initial linear trend. Figure 3 shows that the TDCS-based calibrations capture the atmospheric variability along the slant path much better than their GNSS-based counterpart, with the minimum ASD values obtained for a 20 s integration time of the zenith wet delay; this integration time is therefore used in the following sections for the overall performance characterization. Similar results are obtained by comparing the power spectral densities of the Doppler residuals, which were computed using an adaptive multitaper spectral estimation method (Percival & Walden, 1993). The right plot of Figure 3 shows that above 10⁻¹ Hz the spectrum is dominated by the Doppler thermal noise, while most of the atmospheric instability calibrated using TDCS data occurs at characteristic frequencies between 10⁻⁴ and 10⁻² Hz.
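A simplified multitaper power spectral density estimate is sketched below. It deliberately swaps in sine tapers with a plain (unweighted) average of the eigenspectra, which are easy to write without special libraries; it does not reproduce the adaptive weighting of Slepian-taper eigenspectra described by Percival and Walden (1993), and the function name is hypothetical.

```python
import numpy as np


def sine_multitaper_psd(x, fs, n_tapers=5):
    """One-sided PSD estimate averaged over orthonormal sine tapers.

    x        -- evenly sampled residual series (e.g. Doppler residuals)
    fs       -- sampling frequency of the residuals in Hz
    n_tapers -- number of tapers; more tapers lower the variance but coarsen resolution
    """
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = x.size
    k = np.arange(1, n_tapers + 1)[:, None]
    t = np.arange(n)[None, :]
    # Orthonormal sine tapers, shape (n_tapers, n), each with unit energy.
    tapers = np.sqrt(2.0 / (n + 1)) * np.sin(np.pi * k * (t + 1) / (n + 1))

    # Eigenspectra of each tapered copy, then a simple (unweighted) average.
    spectra = np.abs(np.fft.rfft(tapers * x, axis=1)) ** 2 / fs
    psd = spectra.mean(axis=0)

    # Fold negative frequencies into the one-sided estimate (except DC and Nyquist).
    psd[1:] *= 2.0
    if n % 2 == 0:
        psd[-1] /= 2.0
    return np.fft.rfftfreq(n, d=1.0 / fs), psd
```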

Overall Statistics
The procedure described above was repeated for all the passes included in the analysis. Figure 4 compares the Doppler residuals at 60 s count time obtained using GNSS-based and TDCS-based calibrations, respectively, for the whole testbed campaign. Even though the root mean square value is dominated by a few noisy passes, likely caused by adverse meteorological conditions, an overall improvement of the residuals is clearly detectable. Figure 5 shows the root mean square values of the Doppler residuals as a function of pass ID, along with the ratio between the corresponding values for the two test cases. The average noise reduction across tracking passes is approximately 34% when using TDCS-based calibrations instead of GNSS-based ones, with a maximum reduction of 61% for pass 32 (July 18, 2019). Although with different magnitudes, all passes show a noise reduction, with the exception of pass 31 (July 16, 2019), for which the noise increased by approximately 11%. However, this pass coincided with a series of reaction control system firings and OTMs, which increased the number of estimated parameters and limited the availability of Doppler observables. Moreover, Table 1 shows that this pass was characterized by extremely dry conditions, which corresponded to a reduced signal-to-noise ratio of the estimated calibrations.
Considering the limited number of observed tracking passes, it is difficult to pinpoint an exact cause for the variability in the performance of the TDCS products. The amount of uncalibrated atmospheric variability affecting the Doppler residuals depends both on the actual value of the integrated zenith wet delay and on the accuracy of the calibrations, which strongly depends on the atmospheric conditions. Using TDCS calibrations may also introduce additional error sources such as the mechanical noise from wind-induced vibrations of the tracking system mounting structure or radiometric retrieval errors induced by fast variations in the observed atmospheric scene (in particular at low elevations), which may dominate over the tropospheric noise for particular tracking passes.
Finally, Figure 6 compares the ASD curves computed at characteristic stability intervals of 20, 60, and 1,000 s, which represent typical values for radio science applications. Both the 20 and 60 s curves show a consistent reduction of the ASD values when using TDCS-based calibrations, more pronounced for the latter. A similar reduction is observed for the 1,000 s curves, with the exception of two tracking passes, pass IDs 11 and 14. A detailed inspection of the Doppler residuals for these passes highlighted small wave-like signatures at elevation angles below 30°, which were introduced by the TDCS calibrations. The cause of these signatures is currently under investigation but is expected to be related to the granularity of the elevation-dependent retrieval coefficients. The retrieval-induced error, which is small for most tracking passes, may become relevant under specific atmospheric conditions, particularly at low elevation angles, where the observed atmospheric scene may be subject to fast variations.
Additional investigations may be required for a fine-tuning of the retrieval algorithm, which could improve the accuracy of the tropospheric calibrations at low elevations.

Conclusions
This work presented the first statistical characterization of the end-to-end performance of the TDCS prototype installed at ESA's deep space ground station in Malargüe, Argentina.
An extensive testbed campaign was carried out between February and July 2019, using the TDCS alongside the 35 m deep space antenna to track the Gaia spacecraft during a series of scheduled passes. The described analysis, which does not replicate the full orbit determination solution for the navigation of Gaia, was mostly intended as a side-by-side comparison of the orbit determination performance when TDCS-based tropospheric calibrations are used in place of the standard GNSS-based calibrations.
The instrument performance was characterized in terms of root mean square values of the Doppler residuals and Allan standard deviation computed at characteristic stability intervals. The results indicate that an average reduction of about 34% in the residual Doppler noise is observed when TDCS-based calibrations are used. The actual magnitude of this improvement strongly varies between the different tracking passes, with maximum reductions around 61% and a few cases with no appreciable improvement. The overall quality of TDCS calibrations depends on several factors, including the magnitude of the actual tropospheric variability (which depends on the integrated water vapor content along the slant direction), the accuracy of the neural network retrieval, and the magnitude of the additional error sources introduced by the calibration process.
A complete statistical characterization of the TDCS performance would require the analysis of a larger sample of tracking passes under diverse observing conditions (e.g., tracking during daytime or at low elongation values). Future work may therefore include additional observations of Gaia, along with the analysis of large data sets for BepiColombo, Mars Express, or the ExoMars orbiter, which are routinely tracked from the DSA3 station complex.
More specifically, an analysis of BepiColombo tracking passes is currently underway as part of the cruise tests and solar conjunction radio science experiments. This analysis is expected to improve the TDCS performance characterization, thanks to the more accurate Ka/Ka-band tracking link, which allows an almost complete cancellation of solar and ionospheric plasma noise.
The inclusion of ranging data within the orbit determination setup, combined with a high-fidelity dynamical model and longer integration arcs, will also enable an end-to-end assessment of the delay accuracy, which is relevant for possible applications of TDCS calibrations to VLBI measurements.

Data Availability Statement
Data used for this research will be made publicly available through the Guest Storage Facility (GSF) within ESA's Planetary Science Archive (https://www.cosmos.esa.int/web/psa/psa_gsf). This data set will include all raw measurements and ancillary information required for replicating the Gaia orbit determination analysis.