Toward High Precision XCO2 Retrievals From TanSat Observations: Retrieval Improvement and Validation Against TCCON Measurements

Abstract TanSat is the 1st Chinese carbon dioxide (CO2) measurement satellite, launched in 2016. In this study, the University of Leicester Full Physics (UoL‐FP) algorithm is implemented for TanSat nadir mode XCO2 retrievals. We develop a spectrum correction method to reduce the retrieval errors by the online fitting of an 8th order Fourier series. The spectrum‐correction model and its a priori parameters are developed by analyzing the solar calibration measurement. This correction provides a significant improvement to the O2 A band retrieval. Accordingly, we extend the previous TanSat single CO2 weak band retrieval to a combined O2 A and CO2 weak band retrieval. A Genetic Algorithm (GA) has been applied to determine the threshold values of post‐screening filters. In total, 18.3% of the retrieved data is identified as high quality compared to the original measurements. The same quality control parameters have been used in a footprint independent multiple linear regression bias correction due to the strong correlation with the XCO2 retrieval error. Twenty sites of the Total Column Carbon Observing Network (TCCON) have been selected to validate our new approach for the TanSat XCO2 retrieval. We show that our new approach produces a significant improvement on the XCO2 retrieval accuracy and precision when compared to TCCON with an average bias and RMSE of −0.08 ppm and 1.47 ppm, respectively. The methods used in this study can help to improve the XCO2 retrieval from TanSat and subsequently the Level‐2 data production, and hence will be applied in the TanSat operational XCO2 processing.


Introduction
Carbon Dioxide (CO 2 ) has been recognized as the most important greenhouse gases causing climate change due to the rise in anthropogenic emissions since the industrial revolution. Accurate measurement of atmospheric CO 2 in order to reduce the uncertainties of CO 2 fluxes is a key requirement for meeting the "measurable, reportable and verifiable" mitigation commitments of the United Nations Framework Convention on Climate Change (UNFCCC) (https://unfccc.int/resource/docs/2007/cop13/eng/06a01.pdf) that is aimed at avoiding disastrous consequences caused by climate change. The existing in-situ surface CO 2 measurement networks provide a wealth of accurate data related to the global carbon cycle. Unfortunately, the sparse coverage of such networks is still a major limitation for global carbon cycle research and large uncertainties in our quantitative understanding of regional carbon fluxes remain. plane is spanned by the vectors from the sun to the surface footprint and from the surface point to the observer). ND provides the routine measurements over land with the satellite flying in consistent rotation angle routine as in GL mode. We only use ND observations in this study. The swath width of TanSat measurements is~18 km across the satellite track and contains 9 footprints each with a size of 2 km × 2 km in nadir (Lin et al., 2017;Zhang et al., 2019). Nadir and glint mode alternate orbit-by-orbit, and the TanSat nadir model ground track is typically between two OCO-2 tracks, which provides potential future opportunities for combined usage of both data products for increased spatial coverage.
The first global XCO 2 dataset from TanSat observations for the first half year of operations has been produced with the Institute of Atmospheric Physics Carbon dioxide retrieval Algorithm for Satellite measurement (IAPCAS) , using the CO 2 weak band only. The nadir XCO 2 dataset has been evaluated using eight TCCON sites to show an average precision of 2.11 ppm for the TanSat retrievals .
In this study, we apply the University of Leicester Full Physics (UoL-FP) algorithm to the TanSat XCO 2 retrieval using a joint CO 2 weak band and O 2 A band retrieval, after adopting a new approach to radiometrically correct TanSat spectra. This new radiometric correction is introduced in Section 2 together with the UoL-FP TanSat retrieval approach. The applied quality filtering and bias correction methods are discussed in Section 3. Section 4 gives the results of the comparisons of the TanSat XCO 2 retrievals to ground-based observations from the TCCON network and Section 5 provides the summary and outlook.
Several publications have already introduced the UoL-FP algorithm and its applications in detail (Boesch et al., 2013;Cogan et al., 2012;Parker et al., 2011) and here we only provide a brief overview. Basically, the retrieval aims to resolve an optimization problem to find the best estimate of a state vector b x by minimizing the difference between a measured and a simulated spectrum taking into consideration additional constraints on the measurement errors and state vector a priori uncertainties. This state vector gives all retrieved parameters and includes atmospheric, surface and instrument variables. Full physics refers to the fact that the algorithm generates the simulated spectrum via an accurate multiple-scattering Radiative Transfer (RT) model. As many processes involved in the transfer of light through the atmosphere respond in a non-linear way to changes in state vector elements, an iterative Levenberg-Marquardt (L-M) inversion scheme is used in the retrieval, where x a is the a priori of the state vector. S a and S ε indicate the covariances of the state vector and the measurement respectively. The weighting function (known as Jacobians) K gives the linear change of the spectrum per change in state vector ∂y/∂x. The update step of the state vector's i th iteration from x i to x i+1 can be adjusted by the L-M factor λ.
The forward model, which includes a vector RT model, is one of the most essential parts of the retrieval algorithm. In UoL-FP, the radiative transfer model LIDORT is used, which is a linearized discrete ordinate radiative transfer model that generates radiances and Jacobians simultaneously (Spurr et al., 2001;Spurr & Christi, 2014). According to the instrument design, ACGS/TanSat only measures one direction of polarized light instead of the total intensity; hence we need to compute the Stokes vector {I, Q, U, V} (Liou, 2002;Mishchenko et al., 2004;Stokes, 1852) in the forward simulation. Since multiple scattering is depolarizing, it is reasonable to expect that the polarization could be accounted for by a low-order scattering approximation. For a relatively clear atmosphere (e.g. aerosol optical depth <0.3), retaining only the second order of scattering components for Q and U is generally sufficient (Natraj & Spurr, 2007). A 2-orders of scattering (2OS) model (Natraj & Spurr, 2007) is applied in our RT model to extend the scalar LIDORT to vector simulation capability (Somkuti, 2018;Somkuti et al., 2017). The Low Stream Interpolation (LSI) method is used to speed-up the RT calculations (O'Dell, 2010).
The atmosphere is discretized into 20 coarse layers from the surface up to 0.1 hPa allowing 10 sublayers within each coarse layer (200 fine layers in total) to reduce interpolation errors from non-linear changes of the gas absorption cross sections with pressure and temperature. Absorption cross sections for O 2 , CO 2 , H 2 O and CH 4 are considered in the RT computation. We use the NASA ACOS/OCO-2 v5.0 absorption coefficients (ABSCO) that have been extensively used in ACOS GOSAT and OCO-2 retrievals Devi et al., 2016;Drouin et al., 2017).
UoL-FP employs two aerosol retrieval types representing large and small aerosol particles. The optical depth profile and optical properties for each type are inferred from the aerosol data from the Copernicus Atmosphere Monitoring Service (CAMS) (https://atmosphere.copernicus.eu/). There are five basic aerosol types, including sea salt, dust, organic matter, black carbon and sulphate, provided by CAMS. Beyond the type, we also consider hydrophobic and hydrophilic effects in computing optical properties, as organic matter and black carbon are separated into hydrophobic and hydrophilic particles, whereas sea salt and sulphate are treated as hydrophilic, and dust as hydrophobic only. For aerosol concentration and vertical distribution, we use the CAMS model data as a climatology which is created from the CAMS data for the years 2014-2016. In addition, a high-altitude cirrus retrieval type is included using the ice particle model of Baum et al. (2014). The surface reflectance is assumed to be Lambertian and is described by a mean albedo and its wavelength dependent slope for each band.

Polarization
As shown by the sensitivity study of Natraja et al. (2007) and Bai et al. (2018), significant errors could be introduced in the O 2 A band and CO 2 weak band RT computation of radiances when using intensity instead of a combination of Stokes components, which can then cause significant errors in the CO 2 retrieval due to the wrong surface pressure and aerosol contributions (Butz et al., 2009;Geddes & Bösch, 2015).
In physical terms, the light measured by the instrument can be represented by a linear combination of Stokes components {I,Q,U,V}, As the circular polarization component V is very weak in a realistic atmosphere, we ignore this parameter by setting a 4 to 0. The Stokes parameter coefficients a 1 , a 2 and a 3 are determined by the satellite position, measurement geometry and the pointing direction of the polarizer, which are provided in the L1B data product. More detail on the definition of the polarization angle is given in Appendix A.

Radiometric Correction Approach of TanSat Spectra 2.3.1. TanSat Solar Calibration Measurement
TanSat has multiple on-orbit calibration strategies, including solar, dark field and white light lamp calibration, which are helpful to monitor the instrument status and performance. TanSat switches to solar calibration measurements when it flies over the ascending end of each orbit until almost in darkness and regularly takes~7 minutes (~1,260 frames) of solar measurements. The solar calibration procedure provides direct measurements of the solar spectrum which has no contamination from the Earth's atmosphere and surface and thus without uncertainties from the radiative transfer of light. During the on-orbit test phase, solar calibration is performed once every two orbits, and then changes to daily during operational observations.
Ideally, we can imagine the solar calibration as a model for a measurement essentially without atmospheric extinction and scattering (there is a well-calibrated solar diffuser used in the solar calibration, which involves a reflection). Therefore, one can use solar calibration measurements to verify the radiometric and spectral calibration applied to a measured spectrum if an accurate solar model is used and the reflection from the diffuser is well known .
Here, we use the new solar line model (2016 version) created by G. C. Toon (2014) (https://mark4sun.jpl. nasa.gov/toon/solar/solar_spectrum.html) combined with the UoL-FP solar continuum model that is obtained from a polynomial fitting of SOLar SPECtrometer (SOLSPEC) measurements (Meftah et al., 2018). The solar line model has been extensively used and verified with GOSAT and OCO-2 retrievals (Uchino et al., 2012). This solar model combines the solar lines and solar continuum, and hence can be directly used for solar model fitting without any further calculation.
TanSat observes the Sun though a reflective diffuser before the relay optical system, hence the wavelength-dependent diffuser reflectance needs to be compensated for, otherwise it could cause an extra pattern in the measured solar spectrum. In this study, we use a wavelength-dependent Bidirectional Reflectance Distribution Function (BRDF) that has been characterized pre-flight in the laboratory (Wang et al., 2014) without any corrections for a time dependent degradation as the instrument performance is stable. The solar irradiance is non-polarized and we use a factor of 0.5 to adjust for the linear polarizer.
To avoid contamination of terrestrial absorption from the upper atmosphere when the satellite observes the limb measurement region, only measurements with boresight solar zenith angle (defined angle between line of sight and solar) between 5°and 6°are used in fitting. Note that TanSat rotates by a 5°pitch angle (away from the Earth) to avoid damage of CAPI from strong incident light Chen, Wang, et al., 2017).

Solar Calibration Analysis and Radiometric Corrections
Monitoring the solar calibration spectrum shows a stable instrument performance during the first year of TanSat operations ( Figure 1, the CO 2 weak band is also stable but this is not shown here). The mean 10.1029/2020JD032794

Journal of Geophysical Research: Atmospheres
normalized solar measurement in the 750 nm region shows little change (mean normalized spectrum changes < 0.1%) for spectra acquired every 3 months, except for a very small difference for solar lines, which is probably caused by small instrument performance changes, e.g. instrument line shape (ILS). The figure shows that spectra acquired for the different cross-track footprints show some differences to each other in the continuum, which means each footprint has to be analyzed separately.
In order to further investigate the solar calibration spectrum, we developed a fitting tool for the solar spectrum based on the solar model corrected for the variable Earth-Sun distance and the Doppler shift effect (in wavelength) due to Earth's rotation and revolution. The fitting tool retrieves the wavelength dependence continuum correction in different forms (e.g. polynomial or Fourier series) as well as a wavelength shift and stretch simultaneously. The fitting tool uses the Gaussian-Newton method without the constraint of measurement noise.
The fitting algorithm is fast and hence we fit all individual solar spectra between March 2017 and May 2018 without averaging or manual spectrum selection. In total, 181,232 retrievals have been carried out and we find consistent fitting residuals for each cross-track footprint, which is expected considering the stable solar calibration measurement (Figure 1). Averaged relative fitting residuals for the NIR band are given in Figure 2. A considerable and consistent pattern with a mean RMSE of 0.48% remained in the fitting residual when we only adjust a wavelength-independent scaling factor to the continuum, which needs to be corrected to avoid errors in the XCO 2 retrieval. Thus, a method is needed that can compensate for these structures and improve the fitting residual. However, this pattern is not a simple linear or quadratic function with wavelength and a more complex model is needed. We found that the relative residual did not change much with changes in intensity of the incident light introduced by instrument degradation and Earth-Sun distance changes and hence we adopt a correction model based on multiplicative continuum scaling rather than a model using additive offsets. A common approach is to use a polynomial as a function of wavelength to scale the continuum (L inc ). We have found that a 5 th order polynomial is the best choice, where a i is the coefficient of the i th order of the polynomial component for a wavelength relative (Δλ = λ − λ 0 , λ 0 is the reference wavelength, in this study, we use first wavelength grid point of each band) to a reference of 750 nm. D(λ) represents an additive offset assumed to be a linear function given by a slope and a constant. This additive offset could compensate the impact from SIF (only O 2 A band) and stray light. As can be seen in Figure 2, this polynomial approach leads only to minor improvements in the fitting residual. Increasing the order of the polynomial did not lead to significant improvements compared to the 5 th order.
An alternative approach that should have a better capability to capture the oscillating nature of the fitting residuals is to use a Fourier series: where a i and b i are coefficients of i th order of the Fourier series cosine and sine components with c being the zero-order coefficient, and ω the scaling coefficient for frequency. Δλ and D(λ) are the same as for the polynomial model above. We have tested a 5 th to 10 th order Fourier series in the solar calibration fitting and finally found that a 8 th order is the best choice, which is because less than 8 th order is not sufficiently compensating the fitting residual and more than 8 leads to limited further improvement compared to 8 th orders. As a result of using Fourier series model, we find a significant improvement in the fitting residual with a mean RMSE of 0.21% ( Figure 2) compared to 0.48% for the polynomial approach. Note that, (1) the fitting residual near solar lines is still larger than the measurement noise, and (2) a gas absorption structure is visible in the residuals, which is also the case in fit residuals for the CO 2 weak band. This could be correlated with stray light from Earth or preflight calibration (e.g. ILS and radiometric calibration), but the exact reason is unclear and needs to be further investigated in the future. In principal, the continuum correction corrects the dominant effects in the fitting residual but some features still remain visible in the residuals. For example, larger residual structures appear for solar lines. Considering that this affects only a small number of pixels throughout a spectral band, it can be expected that impact on the retrieval is limited and we do not attempt to further correct these features. This continuum feature is probably caused by several reasons, e. g. radiometric calibration and stray light. For nadir observation, the component of incident light is more complicated than for solar calibration due to the scattering and absorption in atmosphere. However, we found similar residual features in both non-corrected solar calibaration and nadir observation fitting. Therefore, the applied Fourier series is applicable to nadir observation. But the parameters change between solar calibration and nadir observation fitting and also among the soundings, and hence we retrieve all Fourier series parameters in the cloud screening and XCO 2 retrieval (See details in Text S3, Table S2 and Figure S6, S7of SI).
The applied continuum correction in our study is basically a correction of the radiometric gain and no other corrections are applied. This is because (1) errors of the continuum are more obvious and they are stable in the case of TanSat, (2) solar lines are not sufficiently deep to constrain potential non-linearity corrections, (3) the ILS cannot be easily re-analyzed from space-based data, especially since TanSat did not provide solar calibration measurement for the entire dayside which would scan the ILS due to the Doppler shift (Sun et al., 2017).

O 2 A Band Surface Pressure Retrieval for Cloud Screening
The O 2 A band is important in the XCO 2 retrieval because, (1) it allows cloud screening based on apparent surface pressure to remove measurements that are highly contaminated by thick cloud, and (2) to provide information on aerosol and thin clouds in a joint O 2 A and CO 2 band retrieval to reduce errors that are otherwise introduced by light path modification from scattering.
An Oxygen A-Band (ABO2) cloud-screening algorithm is used as the cloud screening algorithm based on the analysis of a small number of micro-windows in the O 2 A band. The ABO2 algorithm has been applied to GOSAT and OCO-2 cloud screening and verified against MODIS and CALIOP measurement (Taylor et al., 2012. Unfortunately, the continuum correction described above cannot easily be applied to a retrieval that uses narrow micro-windows, and benefit from the fast RT model. Hence, we adopt an apparent surface pressure retrieval (assuming aerosol-free conditions) based on a fast RT model for the whole range of the O 2 A band (0.757-0.772 μm) that covers a multitude of O 2 lines and continuum for both sides of the band. This retrieval includes surface pressure, temperature profile offset, Lambertian surface albedo and its wavelength dependence slope, wavelength stretch and the coefficients of the Fourier series continuum model in the NIR as retrieved parameters. The outputs are used later as a priori values for a subsequent XCO 2 retrieval (Table 1). A priori surface pressure is taken from the European Centre for Medium-Range Weather Forecasts (ECMWF) ERA-Interim 0.75°× 0.75°reanalysis data product (Dee et al., 2011), and is interpolated to the location of the sounding and corrected for elevation differences using the U.S. Geological Survey's (USGS) EROS Data Center Shuttle Radar Topography Mission Global 30 Arc-Second Elevation (SRTM30) digital elevation model (DEM) (ftp://edcsgs9.cr.usgs.gov/pub/data/srtm/SRTM30).
We found that the frequency coefficient (ω in equation (8)) of the Fourier series cannot be well-fitted in the O 2 surface pressure retrieval if the first guess is not very close to the true value due to non-linear effects. Since we observe that the structure in the fitting residuals of the solar spectra changes little with time we can obtain a good first guess from the fitting of solar calibration measurement value, not only for ω but also for other coefficients. A set of 20,000 high quality solar calibration soundings (RMSE < 0.21%, which is the mean RMSE of all retrievals) including all 9 footprints have been selected for this calculation.
Another highly non-linear parameter is the stretch of the wavelength grid. The update of TanSat L1B data from version 1 to version 2 significantly improved the wavelength calibration, but additional corrections are still necessary. The solar calibration fitting also simultaneously provides wavelength stretch coefficients which we then use in the O 2 A band surface pressure retrieval.
The impact of the Fourier continuum correction on the surface pressure retrieval is significant ( Figure 3). We found that (1) the retrieval without the correction has a large bias and scatter, and (2) there are large Journal of Geophysical Research: Atmospheres differences between the cross-track footprints in the retrieval even after continuum correction (with a constant gain factor). Therefore, the application of the apparent surface pressure retrieval for cloud screening without continuum correction is problematic due to the different scatter and bias for different footprints which could mean that a large number of clear-sky measurement will be screened out for some footprints. As is shown for the case given in Figure 3, the surface pressure values for the retrieval with continuum correction mostly fall in the range between −10 to 0 hPa which satisfies the commonly used criterion for cloud screening of −20 to +20 hPa. In contrast, the retrieval without correction spreads between −10 to −30 hPa.
A ± 20 hPa threshold for cloud screening is reasonable and relatively loose, which means more data is allowed to pass the cloud screening. The benefit is obvious: we do not include aerosol and cloud scattering corrections in the apparent surface pressure retrieval, so that there will always be a small bias in the retrieved surface pressure which can become more significant for very dark surfaces ( Figure 3).

TanSat XCO 2 Two-Band Retrieval
The information on CO 2 volume mixing ratio (VMR) comes to a large extent from the 1.6 μm CO 2 weak band, which means that a retrieval based on only the weak band can provide a relatively accurate result if the measurement scene is not impacted by aerosols or if perfect knowledge on aerosols is available. Unfortunately, both are not possible for real scenarios. The preliminary TanSat XCO 2 retrieval (version 1.0) has been created using the CO 2 weak band only Yang et al., 2018). An extremely stringent filter has been applied in post screening, and hence screens out a large number of retrievals. The data produced with this approach provides good information on the global CO 2 distribution and trend but does not yield enough quantity or accuracy for reliable surface flux inversions.
The O 2 A band surface pressure retrieval with our newly developed continuum correction shows reliable results; hence we can extend the CO 2 retrieval with confidence to use the O 2 A and CO 2 weak band together to improve the retrieval accuracy. We have applied a two-band retrieval to produce XCO 2 data from soundings that are recognized as clear-sky scenes by the cloud screening. The retrieval uses a state vector that includes a CO 2 profile, scale factors for temperature and water vapor, surface pressure, surface albedo and spectral slope. In addition, we fit an additive zero offset and its wavelength dependent slope in both the O 2 A and CO 2 weak bands. For the CO 2 weak band, the same correction model as for the O 2 A band is applied. The full state vector is given in Table 1.
Significant improvements to the O 2 A band residual have been found when using the Fourier series correction. The Standard Deviation (SD) of the normalized residual indicates that the fitting is stable (Figure 4). Notice that the residuals of the 9 footprints show a slight difference even with the correction, e.g. at the long-wavelength end that contain few atmospheric absorption features, which means this correction cannot perfectly compensate all of the spectral patterns. The residual still contains structures related to O 2 lines, which is because (1) the spectroscopy is not perfect (Connor et al., 2016), and (2) the observed residual features for the solar calibration fitting which we discussed in section 2.3. The improvement for the CO 2 weak band is also large ( Figure 5) and the RMSE is reduced by almost half for the retrieval with the continuum correction. Similar to the O 2 A band, minor residual patterns remain and need to be investigated in the future.

Dataset
To remove outliers and to correct for small biases, a quality control filter and bias correction is applied using a reference dataset that is reliable enough for indicating retrieval errors. The Total Carbon Column Observing Network (TCCON) provides accurate measurement of XCO2 Wunch et al., 2015), and has been used for GOSAT and OCO-2 filtering and bias correction (Kim et al., 2016;Oshchepkov et al., 2013;Wunch, Wennberg, et al., 2011;Wunch et al., 2017;Wu et al., 2018;Yoshida et al., 2013). The OCO-2 team has also developed additional methods for filtering and bias correction for their version 8 product, including small area approximation, multi-model median and southern hemisphere approximation (O'Dell et al., 2018). In this study we only focus on the retrieval around TCCON sites, and hence only the data of 20 TCCON sites (Table 2) has been selected as a reference dataset for bias/filtering and validation in this study.  Figure 3. The light gray line (right y-axis) shows the measurement spectrum as reference. The red and blue shading indicates the continuum normalized standard deviation (SD) for the retrieval with and without Fourier series continuum correction respectively.

Journal of Geophysical Research: Atmospheres
algorithm (Rodgers & Connor, 2003) has been used in GOSAT (Cogan et al., 2012) and OCO-2 validation (O'Dell et al., 2018;Wunch et al., 2017). In practice, the application of this correction has led to small changes of <0.3 ppm (O'Dell et al., 2018;Nguyen et al., 2014). In this study, we directly compare the TanSat retrieval with TCCON measurement without attempting to remove differences in smoothing errors.
A colocation criterion of ±3°in latitude/longitude for matching up the TanSat soundings and TCCON measurements is applied. Other spatial co-location methods exist which allow to increase the quantity of matched up data, which is important for GOSAT (Guerlet et al., 2013;Wunch, Wennberg, et al., 2011), but since TanSat has a much higher data density compared to GOSAT, the spatial criterion above does provide a sufficient number of soundings. All TCCON measurements within ±1 hour of the satellite overpass are averaged to provide a reference dataset for each overpass, and only overpasses with more than 20 TCCON measurements within the 2 hour period are used. In this study, we found 396,068 co-located TanSat retrievals in 174 overpasses across the 20 TCCON sites.
Using the TCCON average as reference, we define the individual retrieval error as the XCO2 difference between each TanSat retrieval ( b C TanSat ) and the TCCON average ( C TCCON ) for a satellite overpass: δ xco2 ¼ b C TanSat − C TCCON . Hence, there will be hundreds of individual TanSat retrievals for each average TCCON value for an overpass, except when heavily contaminated by clouds.

Semi-Autonomous Sounding Selection 3.2.1. Method
The retrieval algorithm adjusts a number of parameters related to the atmosphere, surface and instrument and due to the complexity and non-linearity of the retrieval problem and limitations of the forward model to perfectly model the real behavior of the instrument, some retrievals will converge to a false solution or a local minimum. Therefore, as a first step, a quality control filter is applied to the retrievals to screen out such outliers before using the XCO 2 retrievals for bias correction and validation.
The main goal of a filter design is to efficiently screen out the poor retrievals (defined by large individual errors δ xco2 ) while keeping a maximum number of high-quality retrievals. Assuming that large errors are introduced by an imperfect forward model and/or measurement, we expect errors to correlate with other parameters used in the retrieval that are adjusted together with CO 2 .
The quality control normally comes with two basic questions, namely how to select the filter parameters and how to decide the best threshold values.
To answer the first question, we select 37 candidate filters and calculate the correlations with δ xco2 (only the top 8 have been listed in Table 3). The significance of each candidate filter is sorted with respect to the correlation coefficient, and the candidates with the lowest impact on δ xco2 (correlation coefficient < 0.3) are abandoned at the beginning. We select the candidates according to their rank, which means the first candidates have the highest priority to be chosen. Actually, it is not possible to decide on the best choice of filters (we call the number of filters complexity from hereon in) before the analysis on the performance for all possible complexity options has been carried out.
The solution to the latter question is often approached in an empirical manner. Physical basis methods have been developed and applied in the NASA Atmospheric CO 2 Observations from Space (ACOS) OCO-2

10.1029/2020JD032794
Journal of Geophysical Research: Atmospheres retrieval for versions V7 , and in subsequent versions V8 (O'Dell et al., 2018) and V9 (Kiel et al., 2019). In this study, we hope to use a method that is less driven by empirical decisions. A machine learning Genetic Algorithm (GA) method has been developed and applied to OCO-2 to generate warn level data (Mandrake et al., 2013). The algorithm optimizes complexity (how many filters are selected), threshold value and transparency (how many data points pass the filter, represented by percentage) simultaneously. Mandrake et al. (2013) use one additional species of gene to control the filter selection and optimize this selection simultaneously, which treats each candidate equally, and then causes the filter combination for each complexity to be different. In this study, the filter has been selected according to the rank of the δ xco2 correlation, which means that the selection of the filter combination is carried out after the complexity is fixed. For each GA, runs with different complexity from 2 to 8 are carried out and shown in Figure 6. We optimize the threshold values of all filters for each GA run with chosen complexity with different transparency simultaneously (see more details of the GA that is used in this paper in Appendix B).
Subjective selection of transparency and complexity is required at the end of the applied GA. The transparency for different complexities against RMSE is shown in Figure 6. It needs to be noted that for each complexity the filter is fixed, which means the complexity +1 represents the addition of an extra filter (Table 3). Few improvements on filter selection have been found when the complexity is larger than 4 for the determination of OCO-2 warn levels (Mandrake et al., 2013). We also found similar results for a complexity of 5 when compared to the complexity runs with values of 2-8 ( Figure 6). The advantages of multi-feature combinations for more than 4 features appears only when the fraction of filtered-out data is close to 50%, and there are few advantages if the fraction of filtered-out data becomes less than 30% (transparency > 70%). For a target of 2 ppm RMSE, we select a transparency between 64-65% (64.3%) with five filters.

Application of the Filter
Retaining as much data as possible for a given requirement of RSME is an advantage of the GA method. However, GA can give an optimal solution in a mathematical sense, which may not be physically reasonable; namely the filter thresholds are unrealistic. For contrasting the results obtained with GA against manual selection, we also applied an empirical selection of filter thresholds for the five first filters applied in GA (Table 3). The results are compared in Figure 7 using the bin-error plot method which is similar to that used by O'Dell et al. (2018) for OCO-2 post screening. In general, we find that the thresholds from GA and the empirical method (manual selection) have similar effects on the filtering. However, to achieve a similar RSME with the empirical selection of thresholds, we reduce the amount filtered data by an additional 13.5% compared to GA. Some filters, e.g. Grad CO 2 , Delta Psurf, and AlbedoB2, are parameters which have a specific physical meaning. Although these parameters are constrained during the retrieval, unreasonable results still occasionally occur. The GA that has been applied in this study has no capability of judging if a threshold has a reasonable value or not. Therefore, in practice, we strongly recommend applying additional physical filters to remove unreliable retrievals, if needed. However, in this study we only use GA screening.
In summary, our retrieval convergence rate is 94.8% of the cloud-free measurement (~30% of the original measurements for a 20 hPa threshold of the apparent surface pressure from the O 2 A band) and 64.3% have been recognized as good retrievals by GA. In total, we keep~18.3% of all nadir measurements, and subsequently apply a bias correction to them.

Bias Correction
The bias correction is applied after the quality control. Biases in retrieved XCO 2 can be introduced by shortcomings in the retrieval algorithm (e.g. parameterized models and radiative transfer), the Figure 6. The optimal target run genetic algorithm (GA) profile (Paretooptimal trade-off curves) for the selection of filter complexity and transparency. The XCO 2 mean individual RMSE is a total RMSE calculated from all retrievals which pass the optimized threshold value (good retrieval). The color indicates the number of parameters (complexity) that are used in the GA run. For each complexity, the filter is determined and fixed (see section 3.2.1 for details and Table 3 for filter definitions). Transparency has a 1% interval through 0-100%. No significant additional reduction in the RMSE was seen when using more than 5 filters. In this study, we cut-off at 2 ppm with 5 filters, which is an ad hoc choice, and the transparency is 64.3%. The 2 ppm RMSE is a compromise between coverage and accuracy. The RMSE is calculated from individual TCCON overpass couples, which means the footprint can be spatially away from the TCCON location, and hence there are would be geolocation differences that casue naturally CO 2 differences. In addition, we also have to consider the measurement error and bias that include in the RMSE. measurements (e.g. stray light or calibration) and the databases which are used (e.g. gas absorption and solar model). The former two will typically lead to variable biases that will depend on other parameters while the latter one more likely induces a global bias. Commonly, the parameter-dependent bias dominates and can be corrected  using a linear combination of identified bias parameters, Δ XCO2 ¼ ∑ n i¼1 a i · p i þ b, with a i being the linear coefficient for the i th parameter p i , and b an offset. For the bias correction, we use the filters that have already been applied in the quality control as these five parameters are most strongly correlated with the error in XCO 2 . A multiple linear regression is applied to find the coefficients a i for each across-track footprint. A further small improvement in RMSE is found with increasing the number of parameters up to 16 (SI section 1). The most significant improvement appears for the first three parameters and further parameters have a smaller effect. Improvement becomes less significant when using more than 12 parameters. Here, we use five parameters to avoid over-optimizing the bias correction. The number and selection of filter parameters is somewhat subjective and has been made to agree with the quality control. Using rank of XCO 2 individual error correlation to select bias correction parameters sometimes could miss bias correlated parameters, and hence more bias relative studies are recommend in the next stage research. The bias mostly comes from imperfect forward model and measurement (e.g. stray light and calibaration issue), and they mixed but independent for each footprint. The parameter bias correction does not only involve physical parameters but also parameters used for the spectrum correction. Therefore, we perform the bias correction separately to each footprint.
The effect of the bias correction is shown in Figure 8. The largest improvements in RMSE are found for footprint 1, 2, 7, and 8. Besides a parameter-related bias, there can also be a footprint dependent bias as has been Figure 7. The performance of the GA with 2-5 filters (rows 1-4) and a manual filter threshold selection with 5 filters (row 5). The histograms (left y-axis) indicate the counts for each filter bin, and red and black solid points show the bias and RMSE for each bin (right y-axis). The gray line is the upper and lower threshold used for each filter. The filters are applied from left to right (columns 1-5) sequentially.

Journal of Geophysical Research: Atmospheres
shown for OCO-2. Typically, a stable and constant bias mainly relates to instrument effects (O'Dell et al., 2018;Wu et al., 2018). TanSat has 9 footprints in a swath across~18 km on the ground on average. In our retrieval, we also investigate the cross-footprint bias by subtracting the mean value of a swath when all 9 across-track footprints are available. After the independent parameter bias correction, no obvious cross-track footprint bias is found and the average bias is less than 0.1 ppm but with a large (>0.3 ppm) standard deviation (SD). This result is also true, when we select the swaths for low variation of surface albedo across the swath (SD < 0.01) to guarantee that the scene is comparable through all footprints (only 532 swaths satisfy this criteria). This result indicates that parameter bias correction, if carried out for each footprint independently, can reduce the across-track bias.

The Discussions on Two-Bands Retrieval
The Fourier series approach, introduced in section 2.3, has been instrumental in allowing a two-bands retrieval that uses the O 2 A band together with the weak CO 2 band. As shown in section 2.3, the continuum correction for the O 2 A band in the XCO 2 retrieval leads to a significantly improved fitting residual for all 9 footprints. We have also found that the apparent surface pressure retrieval (sect.2.4) from the O 2 A band yields reliable results when adopting the continuum correction. In this study, we retrieve XCO 2 from TanSat nadir measurements from March 2017 to May 2018 around 20 TCCON sites (Table 2, and see detail in sect.3.1) by using the setup that has been introduced in section 2.5. The continuum correction effect (as discussed in sect.2.3) on XCO 2 is shown in Figure 9 for TanSat retrievals around the TCCON/Lamont (USA) site using all TanSat retrievals that pass the quality control but without bias correction. The RMSE decreases from 4.08 ppm to 1.59 ppm, while the bias changes from 0.57 ppm to −0.56 ppm. We also found that more retrievals fail to converge if no continuum correction is applied, meaning that the continuum

10.1029/2020JD032794
Journal of Geophysical Research: Atmospheres correction also improved the retrieval robustness. In summary, the continuum correction and usage of the O 2 A band shows a significant improvement on the retrieval precision. We also compared our new approach with the TanSat XCO 2 retrieval from a single CO 2 band only ( Figure 10). The single CO 2 weak band retrieval has been carried out with UoL-FP/TanSat retrieving the CO 2 profile, surface albedo and wavelength stretch, assuming no aerosol and cloud scattering in the atmosphere. This is the same strategy used by the IAPCAS (the Institute of Atmospheric Physics Carbon dioxide retrieval Algorithm for Satellite remote sensing) retrieval to generate preliminary TanSat XCO 2 data Yang et al., 2018). The improvement is significant both on the bias and RMSE. The bias is reduced by~1 ppm, while the RMSE decreases from 3.43 to 1.59 ppm. Figure 9. Histogram of individual XCO 2 retrieval errors (difference between TanSat and TCCON XCO 2 ) for TCCON/ Lamont for clear-sky measurements. The orange (right y-axis) and blue (left y-axis) give the results for the two-band retrieval with (orange) and without (blue) Fourier series continuum correction. All data statistics in this figure passed quality control, but there is no bias correction applied. The RMSE remove bias shown in text box of this figure gives the RMSE calculated from each individual error after subtracting the overall bias. Figure 10. Same as Figure 9, but for a comparison between a single CO 2 weak band retrieval (no continuum correction) and the two-band retrieval with Fourier series continuum correction.

Validation Against TCCON Measurement
To compare TanSat retrievals to TCCON measurements, we average the quality-controlled, bias-corrected TanSat retrievals per overpass including all across-track footprints for a swath, as no obvious across-track footprint bias has been observed. Only overpasses with more than 50 soundings are used. Figure 11 shows the comparisons of the TanSat XCO 2 average per overpass compared to the TCCON retrievals averaged over ±1 hour of the overpass time. From the 174 data pairs found, we can infer a daily mean bias of 0.08 ppm and a RSME of 1.47 ppm, which are parameters often used to characterize retrieval performance (Cogan et al., 2012;O'Dell et al., 2018;Wu et al., 2018). The linear regression has a slope of 0.83 and R 2 of 0.77. However, these statistics can be misleading, as sites with large numbers of overpasses will have a larger weight than sites with fewer overpasses, and therefore this average is dominated by the few sites with a large number of overpasses. Consequently, the figure also displays the overall mean of the mean RMSE per site with the individual values for mean bias and RMSE given in (Table 2).
Overall, we find that mean biases are small but show a noticeable variation between sites which is partly due to the low number of data points at some sites. Considering only the seven sites with more than 10 overpasses, we find that JPL (USA), Pasadena (USA) and Saga (Japan) show a large (~1 ppm) negative bias. JPL (USA) and Pasadena (USA), for example, have a strong impact from the nearby city of Los Angles (USA), and it is suggested not to include these sites for bias correction (O'Dell et al., 2018). The linear regression with a slope of 0.83 and R 2 of 0.77 can be improved to 0.84 and 0.92 respectively by removing measurements at Pasadena (USA), JPL (USA), Tsukuba (Japan) and Saga (Japan) (see SI section 2).
For the other four sites, Lamont, Park Falls, Darwin and East Trout Lake, we observe small biases of a few tenths of a ppm only. For Lauder (New Zealand) and Sodankyla (Finland) we also observe large biases, but

10.1029/2020JD032794
Journal of Geophysical Research: Atmospheres the number of data points are small. In the case of Sodankyla, we find a very large outlier for one day (24 July 2017) with a large difference to TCCON (~+ 8 ppm) and to the a priori value (~+ 5 ppm). Checking cloud information from the RGB Image and MODIS cloud mask did not show cloud contamination within the satellite field of view (FOV). However, if we remove this day, the bias for this site reduces to 0.35 ppm and the RMSE to 1.25 ppm which indicates that the statistics given in Table 3 can be impacted by single outliers. It should be noted that the linear regression will be impacted by the quantity of overpasses used. In this study, we only use 15 months of TanSat nadir measurements, leading to few overpasses for many sites. Including longer time series in the future will help to increase the robustness of the results.

Temporal Trend and Variations
XCO 2 has typically a detrended, seasonal amplitude of~5-8 ppm from peak to trough (Lindqvist et al., 2015) in the Northern Hemisphere and roughly an annual growth of~2 ppm globally, which is also seen from TCCON measurement from 2017 to 2018 ( Figure 12). This behavior is well captured in the TanSat retrievals for both the Northern and Southern Hemisphere. For example, for Edwards (USA) the satellite measurements show a very clear seasonal variation with a peak to peak detrended variation of~6 ppm. A + 1.36 ppm bias and RMSE of 1.39 ppm is found for this site but this is derived from 3 co-located data-pairs only. The negative biases found for JPL (USA) and Pasadena (USA) are clearly visible in the time series.
Higher values from TanSat compared to TCCON are observed toward the end of the time series for Southern Hemisphere sites Darwin (Australia), Lauder (New Zealand) and Wollongong (Australia). A

10.1029/2020JD032794
Journal of Geophysical Research: Atmospheres potential seasonal dependence of biases will not necessarily be corrected by the applied bias correction as it does not include parameters like solar zenith angle (or airmass) or time. To further investigate potential seasonal biases, Figure 13 shows a time series of the observed biases for sites with more than nine co-located overpasses. Although, we find some variations of biases; for example, Saga (Japan) indicates a negative bias in the spring of 2017 and a positive bias in the spring of 2018, we do not observe a general correlation of biases with season across all sites.

Summary and Outlook
In this study, the UoL-FP algorithm has been implemented to retrieve XCO 2 from TanSat nadir mode observations. A major obstacle toward high-quality retrieval of XCO 2 from TanSat has been spectral artifacts in the fitting residual of the O 2 A band. By analyzing the solar calibration measurement, we found that this pattern is stable with time and that it can be effectively eliminated by applying an 8-orders Fourier series model. This correction significantly improves the fitting residual, and accordingly reduces the XCO 2 retrieval RMSE against measurements from the TCCON site at Lamont (USA) from 4.08 to 1.59 ppm.
A data-driven quality control and bias correction strategy has been applied to improve the data quality of the XCO 2 retrievals. Based on the correlation of TanSat-TCCON XCO 2 individual error against different retrieval and forward model parameters, including Grad CO 2 , Delta Psurf, Continuum B1C3, Zeroff B2S and AlbedoB2 (see description in Table 3), a Genetic Algorithm (GA) has been used to select quality filters for 64.3% of transparency and~2 ppm of RMSE. This leads to an overall retrieval throughput of~18.3% (good retrievals). Compared to the manual selection of filters this achieves 13.5% more soundings for comparable RMSE. We apply a multiple linear regression for the same parameters as selected by the GA to derive a 10.1029/2020JD032794

Journal of Geophysical Research: Atmospheres
footprint dependent bias correction. After applying the bias correction, no obvious difference in remaining bias between cross-track footprints is found.
The validation involves 20 TCCON sites with co-location criteria within ±3°latitude/longitude and ±1 hour in time. The mean RMSE is found to be 1.47 ppm and the mean of the RMSE per site is 1.32 ppm. Typically, we find biases of a few tenths of a ppm for individual TCCON sites but larger biases (~1 ppm) are observed for some sites, especially in the proximity of major cities such as for JPL (USA), Pasadena (USA) and Tsukuba (Japan).
Previous TanSat retrievals have adopted an approach using a single (weak) CO 2 band retrieval only. To improve this approach, the development of a 2-band retrieval in this paper combining the O 2 A band with the weak CO 2 band represents a step forward in reliable TanSat retrieval as demonstrated by the improvement of the RMSE against TCCON from 3.43 to 1.59 ppm, which was found in a retrieval study of the Lamont TCCON site.
The methods used in this study, such as continuum correction, can help to improve the XCO 2 retrieval from TanSat and subsequently the L2 data product, and hence will be applied in the IAPCAS XCO 2 retrievals which are used for the operational processing of TanSat data. There are differences in models and retrieval strategy between the UoL-FP/TanSat and IAPCAS/TanSat retrieval. In future studies we will investigate the impact of the differences of the two algorithms and their advantages and disadvantages. In this study we introduced an improved TanSat retrieval over land based on a two-band retrieval. The inclusion of the strong CO 2 band still needs to be investigated in the future and can be expected to further improve the retrieval performance by providing more information on water vapor and temperature, as well as the wavelength dependency of aerosols and clouds in the long-wave end of the SWIR. Further studies on target mode observations would be helpful to improve the retrievals. Furthermore, the retrieval of glint mode observations over both ocean and land surfaces will need to be studied, allowing us to extend the coverage of the TanSat XCO 2 dataset.

Appendix A: The Polarization Angle of TanSat Measurement
To consider polarization effects in RT and the retrieval, the polarization angle needs to be known as accurately as possible. Errors in calculated radiances that could arise from a miscalculation of the polarization angle can be as large as 20% in deep absorption lines in the O 2 A band ( Figure A1), leading to large errors Figure A1. A simulation on the impact of the polarization angle on a simulated O 2 A band spectrum. The left sub-figure shows the changes of the spectrum when the polarization angle is 20°, 40°, 60°and 80°larger than the true angle. The color index is marked in the text box of the right sub- figure. The right sub-figure shows the relative change to the normalized spectrum against radiance level. Notice that the change becomes larger when the radiance decreases. This effect introduces significant errors in deep absorption lines where important information for the retrieval comes from. The simulated scene has a surface albedo of 0.2 and solar zenith angle of 30°in nadir measurement mode. The calculated polarization angle is 86.3°as calculated by the method described in section 2.1.2. in the XCO 2 retrieval. The results shown in Figure A1 have been calculated for an aerosol-free scene with a solar zenith angle of 30 o and a Lambertian albedo of 0.2 assuming the 1976 U. S standard atmosphere. However, the impact will become even worse for larger solar zenith angles; for example, Beijing, China (with latitude 40°) the SZA increases to a value larger than 60°in winter for the TanSat overpass time. The polarization impact in the CO 2 weak and strong bands is lower in aerosol-free scenes as the impact of Rayleigh scattering is greatly reduced, but an accurate calculation is still required in the presence of polarizing surfaces or large polarizing particles (cirrus).
Commonly, RT simulations are carried out in the local meridian plane (a plane defined by the unit vector of the local normal of a footprint and the vector from the footprint to the satellite). Unfortunately, the polarizer direction is the reference to the principal plane due to TanSat in-flight tracking strategy (tracking the principal plane). The change of the Stokes vectors due to the rotation of the reference plane can be described by an angle defined between the principal plane and the local meridian plane (Liou, 2002;Mishchenko et al., 2004), namely the polarization angle (σ), cos σ ¼ cos θ sun − cos θ obs · cos Θ sin θ obs · sin Θ ; (A-1) in which θ sun and θ obs are the footprint solar zenith and observation zenith angles, while Θ is the scattering angle in the scattering plane, defined by the satellite line of sight and the solar direction: cos θ obs cos θ sun þ sin θ obs sin θ sun cos Δφ ð Þ; (A-2) with the relative azimuth angle Δφ defined by the footprint satellite azimuth and the solar azimuth angle.
In practice, due to the instrument optical design and integration strategy, the polarizer direction is parallel to (or in) the principal plane. This could lead to the light that passed the polarizer to be approximately zero (in ideal conditions) meaning that the measurement will become almost negligible when the measurement geometry meets the Brewster angle condition (solar zenith angle approaching to Brewster angle). For the purpose of avoiding this effect and increasing the received signal, the TanSat spacecraft is rotated by a + 45°yaw angle during flight. Therefore, the actual polarization angle is σ+45°. Then the Stokes parameter coefficients are given by, In summary, the Stokes vectors {I,Q,U} at the top of atmosphere are firstly computed by the forward model and then combined to a simulation of the signal received by the instrument according to Eq. 2 in main text.
Appendix B.

B.1 The Genetic algorithm
Genetic algorithms (Goldberg, 1989;Holland, 1975) belong to the class of derivative-free (Jacobi-free) optimization methods (Rios & Sahinidis, 2013), which also includes other popular methods such as simulated annealing (Kirkpatrick et al., 1983) and particle swarm optimization (Kennedy & Eberhart, 1995). As opposed to traditional gradient-based methods, where a cost function is minimized by updating an initial guess along a descent direction identified by computing the derivatives of the cost function with respect to its parameters (Nocedal & Wright, 1999), in derivative-free methods the optimization is carried out through successive evaluations of the cost function itself, until a definite convergence criterion is met. An advantage of derivative-free optimization methods compared to gradient-based methods is that they are more capable of escaping local minima of the cost function to be optimized. Their disadvantage is that they may require a higher number of iterations before reaching the minimum. For this reason, they are particularly advantageous when the cost function is relatively inexpensive to evaluate.
Optimization in genetic algorithms is based on concepts drawn from the process of evolution taking place in biology. If by f(x) we denote a cost function to be optimized with respect to a vector x, a genetic algorithm first generates a dataset (population) {x i } of candidate solutions, and then randomly updates the dataset by applying two operations to its elements: 1. Mutation: one or more component of an element x are randomly perturbed 2. Cross-over: a new element of the population is generated by combining two existing elements All the elements in the population are then ranked with respect to a fitness function which measures the goodness of a certain element as a solution of the optimization problem, and a selection is performed by retaining a user-defined percentage of elements which obtain the best ranking. Typically, the fitness function is the cost function itself. As a result of this selection process, a new population is created, and the random operations of mutation and cross-overfollowed by a selectionare applied to the new population. The process described above is repeated until a convergence criterion is met.
In practice, each filter there are two threshold values that need to be decided: upper boundary and lower boundary (gene). Therefore, we define two segments (upper boundaries and lower boundaries) for each individual respectively, and the combined segments as a whole, containing filters with two boundaries, and we run GA for each complexity with determined filters. The evolution is completed by multi reproductions that typically include cross-over and mutation. Cross-over exchanges the gene segment among individuals that win the fitness selection. The purpose of cross-over is to combine the advantaged genes and to improve the fitness of a new generation of individuals. However, we always hope to change advantaged genes by at least a small distance toward the optimized solution. Therefore, only mutations are applied in reproduction, which has been found to be sufficient (Mandrake et al., 2013). Threshold values for each gene are initiated randomly with a quantity of 100,000 individuals ranging between the maximum and minimum values in the training dataset. For each reproduction, only 15 individuals pass the fitness selection and replace the population of the last reproduction. Therefore, the mutation is only coming from these 15 individuals that contain advantaged genes. We let the GA mutate the gene 100 times for each individual by selecting gene and according number randomly, and then 150,000 individuals have reproduced after each mutation. A convergence criterion has been chosen to stop the GA run when the RSME of the retrieval error reduces to less than 0.2% over the previous ten iterations. This criterion is looser than the one that has been introduced by Mandrake et al. (2013), but in practice, we found this works sufficiently well.