Radio Science

Inversion of Odin limb sounding submillimeter observations by a neural network technique



[1] The limb sounder radiometer on board the satellite Odin is the first instrument measuring emission from space in the submillimeter region to map atmospheric species. Nonlinear inversions of Odin spectra by iterative approaches are computationally very intensive, so a faster neural network technique has been developed. The technique is tested here by inverting simulated observations in the 544.2-545.0 GHz band, retrieving O3 by the neural networks and an optimal estimation approach based on the Marquardt-Levenberg algorithm. Special consideration is given here to the implementation of a spectral reduction technique and the treatment of the main random uncertainties. The reduction technique is based on deriving spectral eigenvectors from the weighting functions of the observations and successfully reduced the dimensionality of the spectral space by two orders of magnitude. The random uncertainties are treated by incorporating their possible realizations into the training sets, and inversion of simulated spectra with thermal noise, temperature, and pointing uncertainties gave similar retrieval errors for the neural networks and optimal estimation, with the neural networks being much faster. However, a problem remains because optimal estimation can easily incorporate last-minute a priori information that reduces the random uncertainty and subsequently the retrieval error, but it is not so easy for the neural networks to incorporate the same information.

1. Introduction

[2] The Odin satellite is the result of a Swedish initiative for a small satellite built in collaboration with Canada, France and Finland, and was launched in February 2001. Its microwave radiometer (Odin-SMR) is the first space-borne instrument to measure thermal emission from the middle atmosphere (10-100 km) in the submillimeter range (480-580 GHz) in a limb sounding mode. Measured species include O3, H2O, HNO3, H2O2, ClO, N2O and NO, aiming at providing both altitude determination and geographical mapping of key species involved in different atmospheric processes, with special emphasis on the ozone depletion phenomena.

[3] Optimal estimation (OEM) [Rodgers, 1976] is the chosen method to perform the operational inversions [Baron et al., 2002]. For some observational bands the assumption of a linear mapping is not of general validity, and iterative inverting schemes are needed. For the Odin-SMR inversions, an iterative scheme combining OEM and the Marquardt-Levenberg algorithm [e.g., Marks and Rodgers, 1993] will be applied, but at the cost of a large computational burden.

[4] A neural network (NN) technique to invert Odin-SMR spectra has already been presented by Jiménez and Eriksson [2001]. The technique consisted in setting up and training a set of NNs to perform a nonlinear regression between spectra and species profiles. For the simulated Odin-SMR inversions tested, the NN technique and OEM did inversions that were very similar in terms of vertical resolution and retrieval error, with the advantage for the NN technique of doing very fast inversions. Here, the NN technique will be discussed regarding its extension into a more operational technique. The main aspects treated here will be a data reduction technique specific for limb sounding atmospheric observations, to reduce the dimensionality of the spectra space, and the treatment of the random observation uncertainties by the NN technique.

[5] The paper is organized as follows. First, the NN technique is introduced by presenting the net topology and the training method. A description of a data reduction scheme to set NNs of practical dimensions follows. Then a concrete NN algorithm is set up to retrieve O3 from simulated radiometric data in the 544.6 GHz band. This is followed by a discussion of the possibilities of the NN technique for data reduction and for the treatment of random uncertainties. Finally retrievals using the NN method are compared with simulations of the operational OEM inversions.

2. Inversion Theory

[6] Following the formalism of Rodgers [1990], the forward model F for the observations, linearized around an a priori state (xapriori, bapriori), can be written as

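$$\mathbf{y} = F(\mathbf{x}_{\mathrm{apriori}}, \mathbf{b}_{\mathrm{apriori}}) + \mathbf{K}_x(\mathbf{x} - \mathbf{x}_{\mathrm{apriori}}) + \mathbf{K}_b(\mathbf{b} - \mathbf{b}_{\mathrm{apriori}}) + \boldsymbol{\epsilon} \tag{1}$$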

where y is the observed spectra, x are the variables to be retrieved, in this case, species profiles, b are other variables of the model, ϵ are the measurement errors, for instance, thermal noise, and Kx and Kb are the state and model parameter weighting functions. If the uncertainty of variables and parameters is expressed in the form of covariance matrices, denoted as Sϵ for ϵ, Sb for b, and Sx for x, the retrieved state is called $\hat{\mathbf{x}}$, the contribution function matrix is called Dy, and the averaging kernels matrix is denoted as A, the expected retrieval error $\boldsymbol{\delta} = \hat{\mathbf{x}} - \mathbf{x}$ can be characterized by its covariance matrix as

$$\mathbf{S}_\delta = (\mathbf{A} - \mathbf{I})\,\mathbf{S}_x\,(\mathbf{A} - \mathbf{I})^T + \mathbf{D}_y\,\mathbf{S}_o\,\mathbf{D}_y^T \tag{2}$$

with I as the identity matrix and the forward model and measurement uncertainties grouped together as $\mathbf{S}_o = \mathbf{S}_\epsilon + \mathbf{K}_b\mathbf{S}_b\mathbf{K}_b^T$.

[7] Applying OEM to solve an inversion problem means to select the maximum likelihood state x given a measurement y. If the forward model can be linearized around the a priori state (xapriori, bapriori), the solution is given by:

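$$\hat{\mathbf{x}} = \mathbf{x}_{\mathrm{apriori}} + \mathbf{D}_y\left[\mathbf{y} - F(\mathbf{x}_{\mathrm{apriori}}, \mathbf{b}_{\mathrm{apriori}})\right], \qquad \mathbf{D}_y = \left(\mathbf{K}_x^T\mathbf{S}_o^{-1}\mathbf{K}_x + \mathbf{S}_x^{-1}\right)^{-1}\mathbf{K}_x^T\mathbf{S}_o^{-1} \tag{3}$$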
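As a rough illustration of the linear solution and of the error characterization in (2), the following NumPy sketch computes a retrieved state, the averaging kernels, and the retrieval error covariance. The function name `linear_oem` and its argument layout are hypothetical, and the gain expression is the standard optimal estimation form, assumed here to play the role of the contribution function matrix Dy.

```python
import numpy as np

def linear_oem(y, y_apriori, K_x, x_apriori, S_x, S_o):
    """Linear optimal estimation around the a priori state (illustrative sketch).

    y_apriori is the forward model evaluated at the a priori state.
    Returns the retrieved state, the averaging kernel matrix A and the
    retrieval error covariance built from the smoothing and observation terms.
    """
    S_o_inv = np.linalg.inv(S_o)
    S_x_inv = np.linalg.inv(S_x)
    # Contribution (gain) matrix, assumed standard optimal-estimation form.
    D_y = np.linalg.solve(K_x.T @ S_o_inv @ K_x + S_x_inv, K_x.T @ S_o_inv)
    x_hat = x_apriori + D_y @ (y - y_apriori)
    A = D_y @ K_x
    I = np.eye(len(x_apriori))
    # Smoothing error plus observation error.
    S_delta = (A - I) @ S_x @ (A - I).T + D_y @ S_o @ D_y.T
    return x_hat, A, S_delta
```

For this particular gain matrix, the two error terms add up to the inverse of $\mathbf{K}_x^T\mathbf{S}_o^{-1}\mathbf{K}_x + \mathbf{S}_x^{-1}$, which gives a convenient numerical check.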

If the mapping between spectra and profiles is not linear, there is no analytical expression giving the maximum likelihood state and the solution has to be found numerically. For the Odin-SMR inversions the Marquardt-Levenberg algorithm will be used. An iteration of the Marquardt-Levenberg algorithm is given by [Marks and Rodgers, 1993]

equation image

where xi is the solution after iteration i, the weighting functions Kx are evaluated at (xi, bapriori), D is a diagonal matrix having the diagonal elements of Sx, and c is the parameter controlling the trade-off between steepest descent and Newtonian iteration.
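To make the structure of such an iteration concrete, here is a minimal sketch of one damped update step. It is not the operational Odin-SMR code: the function name `lm_oem_step`, the toy forward model, and in particular the way the damping enters (as `c * D` with a caller-supplied diagonal scaling matrix) are assumptions made for illustration.

```python
import numpy as np

def lm_oem_step(x_i, y, forward, jacobian, x_a, S_x_inv, S_o_inv, c, D):
    """One damped Gauss-Newton (Marquardt-Levenberg type) update of the state.

    c = 0 gives a Newtonian step; a large c gives short, scaled descent steps.
    The form of the damping term c * D is an assumption of this sketch.
    """
    K_i = jacobian(x_i)  # weighting functions evaluated at the current state
    lhs = K_i.T @ S_o_inv @ K_i + S_x_inv + c * D
    rhs = K_i.T @ S_o_inv @ (y - forward(x_i)) - S_x_inv @ (x_i - x_a)
    return x_i + np.linalg.solve(lhs, rhs)

# Toy usage with a mildly nonlinear, made-up forward model (not ARTS).
def forward(x):
    return np.array([x[0] + 0.1 * x[1] ** 2, x[1], x[0] * x[1]])

def jacobian(x):
    return np.array([[1.0, 0.2 * x[1]], [0.0, 1.0], [x[1], x[0]]])

x_a = np.array([1.0, 1.0])        # a priori state
x_true = np.array([1.2, 0.8])     # state used to synthesize the measurement
S_x_inv = np.eye(2)               # unit a priori variance
S_o_inv = np.eye(3) / 1e-4        # small measurement noise variance
y = forward(x_true)               # noise-free synthetic measurement

x = x_a.copy()
for _ in range(30):
    x = lm_oem_step(x, y, forward, jacobian, x_a, S_x_inv, S_o_inv, c=1.0, D=np.eye(2))
```

In practice c would be adapted between iterations depending on whether the cost decreased; it is kept fixed here for brevity.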

3. Neural Network Technique

3.1. Topology

[8] A NN is an interconnected assembly of processing units called neurons. The relation between inputs and output of the neuron is given in Figure 1. By organizing the neurons in different layers and allowing the input to the NN to propagate in different ways, different types of NN can be set up. When the input signal is allowed to propagate only in the forward direction, there is more than one layer, and at least one of the layers has differentiable activation functions, the NN is called a feed-forward multilayer perceptron (MLP). Following the work of Jiménez and Eriksson [2001], a MLP of one hidden layer and one output node, one for each species and altitude of the retrieval grid, will be used here. If the input vector of the MLP is i and the output of the MLP is u, the way the input signal propagates through the MLP is given by

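$$\mathbf{u} = f_o(\mathbf{W}_o\mathbf{i}_o + \mathbf{b}_o), \qquad \mathbf{i}_o = f_h(\mathbf{W}_h\mathbf{i}_h + \mathbf{b}_h), \qquad \mathbf{i}_h = \mathbf{i} \tag{5}$$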

where fj is the activation function, Wj the weighting matrix, bj the bias, and ij the input at layer j, in this case o for output layer and h for hidden layer. Hyperbolic tangent and linear activation functions were used for the hidden and output neurons, respectively.
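The propagation through such an MLP can be sketched in a few lines of NumPy. The function name `mlp_forward` is hypothetical, the layer sizes (20 inputs, 2 hidden neurons, 1 output) follow the configuration used later in the paper, and the random weights are placeholders rather than trained values.

```python
import numpy as np

def mlp_forward(i, W_h, b_h, W_o, b_o):
    """Forward pass of a one-hidden-layer MLP: tanh hidden layer, linear output."""
    i_o = np.tanh(W_h @ i + b_h)   # output of the hidden layer, input to the output layer
    return W_o @ i_o + b_o         # linear activation at the output layer

rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 20, 2, 1
W_h = rng.normal(scale=0.1, size=(n_hidden, n_in))   # placeholder hidden weights
b_h = np.zeros(n_hidden)
W_o = rng.normal(scale=0.1, size=(n_out, n_hidden))  # placeholder output weights
b_o = np.zeros(n_out)

u = mlp_forward(rng.normal(size=n_in), W_h, b_h, W_o, b_o)
```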

Figure 1.

Schematic of a neural network neuron. The input vector is weighted, the values are summed together, a bias term is added, and the result is used to feed an activation function giving the output of the neuron. The neural nets used here have one hidden layer with two nodes and hyperbolic tangent activation function, and one output node with linear activation function.

3.2. Learning and Generalization

[9] The weights and biases are the adaptive parameters of the MLP. These are determined during a learning phase, called training, where the weights and biases that minimize a cost function, determined by a set of input-output examples, are found. Here the examples are a set of different atmospheric states (tl) and the corresponding Odin-SMR radiances (yl), given as Z pairs of vectors {yl, tl}l=1…Z, where the radiances are simulated by feeding a forward model of the observation with the atmospheric states. It is important that the resulting NN exhibits good generalization properties, that is, the NN should become a model of the statistical processes generating the data, not a model representing exactly the particular set of examples.

[10] A typical cost function is the mean sum of squares of the difference between targets (the training species profiles tl in this case) and current outputs of the MLP to the corresponding input vectors (the training radiances yl), modified with a penalty term to limit the complexity of the model and improve generalization. Here regularization by weight decay [see, e.g., Bishop, 1995] will be used, so the cost function to be minimized takes the form:

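$$C = \frac{1}{ZN}\sum_{l=1}^{Z}\left\|\mathbf{t}_l - \mathbf{u}(\mathbf{y}_l)\right\|_2^2 + \gamma\,\frac{1}{H}\sum_{i=1}^{H} w_i^2 \tag{6}$$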

where N is the number of neurons of the output layer, ∥ ∥2 is the standard 2-norm, u(yl) is the output vector of the MLP for the corresponding spectra yl, H is the total number of weights and bias represented as wi and γ is the regularization parameter. The minimum of the cost function will be searched by using the Marquardt-Levenberg algorithm [Hagan and Menhaj, 1994], and an optimal γ will be found by applying Bayesian techniques [Mackay, 1992]. The practical implementation is done following Foresee and Hagan [1997].
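A weight decay cost of this kind can be evaluated as in the sketch below. The exact weighting between the data term and the penalty is an assumption (mean squared error plus γ times the mean squared parameter value), and the function name `weight_decay_cost` is hypothetical; the Bayesian selection of γ is not shown.

```python
import numpy as np

def weight_decay_cost(targets, outputs, params, gamma):
    """Mean squared error over Z examples and N outputs plus a weight-decay penalty.

    targets, outputs: arrays of shape (Z, N); params: all weights and biases.
    The relative scaling of the two terms is an assumption of this sketch.
    """
    residual = np.asarray(targets, dtype=float) - np.asarray(outputs, dtype=float)
    data_term = np.sum(residual ** 2) / residual.size   # average over Z * N values
    w = np.asarray(params, dtype=float).ravel()
    penalty = np.sum(w ** 2) / w.size                   # average over the H parameters
    return data_term + gamma * penalty
```

A larger γ pushes the parameters toward zero and yields a smoother regression, at the price of a larger misfit on the training examples.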

3.3. Feature Extraction

[11] Feature extraction is a name given to the process of generating linear or nonlinear combinations of original input variables to a NN in order to reduce the dimensionality of the input data [Bishop, 1995]. Because of the usual limitations in size of the training set, a smaller number of inputs means NNs with fewer adaptive parameters, resulting in faster training and a higher likelihood of properly constraining the adaptive parameters. Typical radiance vectors for the Odin-SMR have more than 1 × 10^4 values, so a reduction technique extracting features from the original spectra is needed here.

[12] The reduction technique applied here is based on a Hotelling transformation of the spectral space. The standard Hotelling transformation decomposes the spectral covariance matrix as $\mathbf{S}_y = \mathbf{E}\boldsymbol{\Lambda}\mathbf{E}^T$, where the columns of E are the eigenvectors and the diagonal elements of Λ are the eigenvalues. The transformation, for a given dimension k, that minimizes the mean square error between original (y) and transformed ($\tilde{\mathbf{y}}$) vectors is

$$\tilde{\mathbf{y}} = \mathbf{E}_k^T\,\mathbf{y} \tag{7}$$

where $\tilde{\mathbf{y}}$ has length k and Ek is the part of E containing the first k eigenvectors.

[13] Instead of a standard Hotelling transformation, a reduction technique developed specially for atmospheric inversions will be implemented here following Eriksson et al. [2002]. If equation (1) is valid, the spectral variability Sy can be expressed as:

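$$\mathbf{S}_y = \mathbf{K}_x\mathbf{S}_x\mathbf{K}_x^T + \mathbf{K}_b\mathbf{S}_b\mathbf{K}_b^T + \mathbf{S}_\epsilon \tag{8}$$

To optimize the transformation for the retrieval, since $\mathbf{K}_b\mathbf{S}_b\mathbf{K}_b^T + \mathbf{S}_\epsilon$ can be considered as a perturbing factor for the retrieval, a better suited transformation is:

$$\mathbf{K}_x\mathbf{S}_x\mathbf{K}_x^T = \mathbf{E}\boldsymbol{\Lambda}\mathbf{E}^T \tag{9}$$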

If Sx is approximated by the identity matrix I (that is, the variance of the elements of x is set to one and their correlations are neglected), E can be calculated by a singular value decomposition of Kx as the following relation holds:

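$$\mathbf{K}_x\mathbf{K}_x^T = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^T\,\mathbf{V}\boldsymbol{\Sigma}^T\mathbf{U}^T = \mathbf{U}\boldsymbol{\Sigma}\boldsymbol{\Sigma}^T\mathbf{U}^T, \qquad \text{with } \mathbf{K}_x = \mathbf{U}\boldsymbol{\Sigma}\mathbf{V}^T \tag{10}$$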

so eigenvectors and left singular vectors coincide.
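The reduction described above can be sketched with NumPy's singular value decomposition. The function names `reduction_basis` and `reduce_spectrum` and the random stand-in for the weighting function matrix are hypothetical; only the use of the leading left singular vectors of Kx as the basis follows the text.

```python
import numpy as np

def reduction_basis(K_x, k):
    """Return the first k left singular vectors of K_x and the matching eigenvalues.

    With S_x approximated by the identity, these vectors are the leading
    eigenvectors of K_x @ K_x.T, and the squared singular values its eigenvalues.
    """
    U, s, _ = np.linalg.svd(K_x, full_matrices=False)  # singular values sorted descending
    return U[:, :k], s[:k] ** 2

def reduce_spectrum(y, E_k):
    """Project a spectrum onto the k retained eigenvectors."""
    return E_k.T @ y

rng = np.random.default_rng(1)
K_x = rng.normal(size=(200, 30))                 # made-up weighting functions
E_k, eigvals = reduction_basis(K_x, k=20)
y_reduced = reduce_spectrum(rng.normal(size=200), E_k)   # 200 values reduced to 20
```

Using the thin decomposition (`full_matrices=False`) avoids forming the very large spectral covariance matrix explicitly, which matters when a spectrum has more than 10^4 values.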

[14] A further refinement will be applied here. As each MLP is targeting only one specific retrieval altitude, the relevant information is contained in the spectra from the part of the scan with tangent heights around the retrieval altitude. The same applies to the weighting functions, so for each MLP i a new matrix $\mathbf{K}_x^i$ with only the relevant information from the original weighting function matrix Kx will be formed and (10) applied to it.

4. Odin-SMR Simulations

[15] The NN inversion technique is presented by conducting inversions of simulated Odin-SMR spectra. Odin-SMR is observing in different bands in the millimeter and submillimeter region [Merino et al., 2001]. The 544.2-545.0 (544) GHz band contains, along with some other lines, a strong O3 line, and retrieval of stratospheric O3 from synthetic spectra in this band will be used to discuss the NN technique.

[16] To simulate the Odin-SMR observations, a set of atmospheric states was statistically generated by randomly changing the temperature and the vertical distribution of the main species in the band, O3, HNO3 and H2O2, from the assumed mean atmosphere. Mean species, pressure and temperature profiles from the Odin operational climatology corresponding to the 60-70 deg latitude band were used as the mean atmosphere. To randomly generate species and temperature profiles, covariance matrices reflecting the atmospheric variability were set, and the Cholesky decomposition method was applied [Cressie, 1993]. Gaussian statistics for the species and temperature distribution, and hydrostatic equilibrium for pressure and temperature profiles were assumed. The species variability was modeled following Hoogen et al. [1999], setting the species standard deviation (σ) to 0.3 (relative to a normalized species profile) and assuming an exponentially decreasing correlation, with the correlation length (lc) set to 4 km. The temperature distribution was modeled similarly but assumed a linearly decreasing correlation (the correlation between temperature at two altitudes is a linear function of the altitude difference divided by the correlation length) instead of exponential, with lc set to 5 km and σ set to different values depending on the simulations.
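The random generation of correlated profiles can be illustrated as follows. This is a minimal sketch for the species case only (σ = 0.3, exponential correlation, lc = 4 km); the altitude grid, the function names and the use of a profile of ones as the normalized mean are assumptions made for the example.

```python
import numpy as np

def exponential_covariance(z_km, sigma, corr_length_km):
    """Covariance with constant standard deviation and exponentially decaying correlation."""
    dz = np.abs(z_km[:, None] - z_km[None, :])
    return sigma ** 2 * np.exp(-dz / corr_length_km)

def sample_profiles(mean_profile, cov, n_samples, rng):
    """Draw Gaussian profiles with the given covariance via its Cholesky factor."""
    L = np.linalg.cholesky(cov)                          # cov = L @ L.T
    white = rng.standard_normal((n_samples, len(mean_profile)))
    return mean_profile + white @ L.T                    # each row is one correlated profile

z_km = np.arange(12.0, 60.1, 1.5)                        # assumed altitude grid
cov = exponential_covariance(z_km, sigma=0.3, corr_length_km=4.0)
rng = np.random.default_rng(42)
profiles = sample_profiles(np.ones_like(z_km), cov, n_samples=5000, rng=rng)
```

Each row of `profiles` is a normalized species profile that could be scaled by the mean climatological profile before being passed to the forward model.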

[17] A forward model was then run on the set of atmospheric states. For this study, a modular public domain radiative transfer program called ARTS [Eriksson et al., 2000] was employed. The forward model solved the radiative transfer along the observation path, including the main Odin-SMR instrumental characteristics. When needed, a pointing uncertainty was simulated by disturbing with a constant offset all the tangent heights of the spectra in a scan (nominal scan between 12 and 60 km, step of 1.5 km) before generating the spectra, assuming either a Gaussian or a uniform distribution. The spectra were then degraded by adding thermal noise of magnitude similar to that of the observed noise from the Odin-SMR observations. Typical noise-free spectra corresponding to a scanning of the atmosphere in the 544 GHz band are shown in Figure 2.

Figure 2.

Simulation of a limb sounding Odin-SMR scan in the 544 GHz band. The main features of the spectra are a cluster of HNO3 lines, a H2O2 line, and some O3 lines, including a very strong transition at 544.85 GHz.

[18] The simulated spectra were inverted by both the NN technique and OEM. Two similar sets of spectra were needed for each simulation: The first was used to train the NN, while the second was inverted by both techniques. To implement and train the NN, the Neural Network Toolbox from Matlab [Demuth and Beale, 1993] was used.

5. Results and Discussion

[19] The main random retrieval errors in the 544 GHz band are related to the thermal noise and the uncertainties regarding the atmospheric temperatures and the pointing (assuming an incorrect spectrum tangent height). For the OEM inversions, the thermal noise goes into Sϵ, and temperature and a pointing offset are included as retrieval variables, with the best information available before the inversion used as a priori. For the NN technique, the possible uncertainties have to be incorporated during the learning phase. This can be done by having training sets representing all possible realizations expected during the observations for the random uncertainties, as was demonstrated for the uncertainty of the species a priori by Jiménez and Eriksson [2001].

5.1. Inversion with Thermal Noise

[20] To test the NN technique scheme, first a set of 100 spectra with only thermal noise as random uncertainty (no temperature and pointing uncertainties) was inverted by both the NN technique and OEM. For OEM, both linear and nonlinear inversions were done by applying (3) and (4). No reduction technique was needed because So = Sϵ is a diagonal matrix, only containing the thermal noise description, and is easily invertible despite its very large size. For the NN technique, one MLP for each altitude of the retrieval grid (retrieval altitudes every 1.5 km) was set and trained with 500 pairs of spectra-profiles. For each MLP, those spectra with tangent heights 5 km above and 10 km below the MLP retrieval altitude were judged relevant for the inversion, and the reduced weighting function matrix $\mathbf{K}_x^i$ for MLP i was accordingly formed as described in Section 3.3. Only 20 eigenvectors were needed in Ek. The NNs used were then relatively simple: MLPs with 20 input nodes, 2 neurons in a hidden layer and 1 output neuron, trained in batch mode for 20 epochs to minimize the cost function given in (6).
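The selection of the relevant part of the scan for one MLP might look like the sketch below. It assumes, hypothetically, that the rows of Kx are ordered spectrum by spectrum (all channels of the first tangent height, then the next); the function name `relevant_rows` and the number of channels are illustrative only.

```python
import numpy as np

def relevant_rows(tangent_heights_km, n_channels, retrieval_alt_km,
                  above_km=5.0, below_km=10.0):
    """Boolean mask over the rows of K_x that belong to the relevant spectra.

    A spectrum is kept when its tangent height lies between below_km under
    and above_km over the retrieval altitude of the MLP.
    """
    z = np.asarray(tangent_heights_km, dtype=float)
    keep = (z >= retrieval_alt_km - below_km) & (z <= retrieval_alt_km + above_km)
    return np.repeat(keep, n_channels)   # one flag per channel of each spectrum

tangent_heights = np.arange(12.0, 60.1, 1.5)   # nominal scan, 1.5 km step
mask = relevant_rows(tangent_heights, n_channels=4, retrieval_alt_km=30.0)
# K_x_i = K_x[mask, :] would then be decomposed to obtain the 20 eigenvectors.
```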

[21] Figure 3 plots the bias and standard deviation of the retrieval error, including the characterization of the error following (2). As expected, the linear OEM inversions give the largest retrieval error, both in bias and standard deviation, while the agreement between the error characterization and the nonlinear OEM and NN retrieval error is excellent. This shows that for well characterized inversions, both the NN and OEM have similar accuracy. This also shows that the reduction technique is able to extract the relevant features from the spectral space, as the OEM inversions did not have any reduction technique and both inversions are nearly identical. The main reason to develop the NN technique was to improve processing time, and the NN inversions, including generation of training sets, training and retrieval, were approximately 3 times faster than the OEM inversions, both inversions computed on the same machine. The advantage of the NN is that, for another 100 similar inversions, the total computational burden would be nearly the same, while for OEM twice the computing time would be required.

Figure 3.

Inversion of spectra with only thermal noise as random error. The O3 retrieval error characterized by the mean (bias) and standard deviation (std) of the difference between true and retrieved profiles are shown. Linear inversions done by OEM are plotted as dash-dotted curves, nonlinear OEM inversions are plotted as dotted curves, and those done by the NN technique are plotted as dashed curves. The linear characterization of the error (2) is also plotted as solid curves.
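The two statistics plotted in the error figures can be computed per retrieval altitude as in this short sketch. The function name `retrieval_error_stats` and the choice of the sample standard deviation (`ddof=1`) are assumptions; rows are taken to be individual inversions and columns retrieval altitudes.

```python
import numpy as np

def retrieval_error_stats(x_true, x_retrieved):
    """Bias and standard deviation of (true - retrieved) at each altitude."""
    diff = np.asarray(x_true, dtype=float) - np.asarray(x_retrieved, dtype=float)
    bias = diff.mean(axis=0)             # mean error over the set of inversions
    std = diff.std(axis=0, ddof=1)       # sample standard deviation of the error
    return bias, std
```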

5.2. Inversion with Temperature Uncertainty

[22] Next we tested the NN scheme assuming an uncertainty in the knowledge of the temperature. Training and inverting sets were prepared with randomly disturbed temperature profiles as described in Section 4, setting σ values to 1, 3 and 5 K at all altitudes. The retrieval errors are plotted in Figure 4 and compared with the MLP inversions of spectra with only thermal noise. As expected, the retrieval error becomes larger when increasing the temperature variability, as the temperature realizations deviate more from the a priori temperature profile. Clearly the MLPs face a more complicated regression due to the temperature variability incorporated in the training sets, and the retrieval errors become larger.

Figure 4.

Inversion of spectra with thermal noise and temperature as random errors. The O3 retrieval error for the NN technique with different temperature uncertainty is shown. The error for an uncertainty of 1 K is plotted as dash-dotted curves, 3 K is plotted as dotted curves, and 5 K is plotted as dashed curves. The error with only thermal noise (Figure 3) is also plotted as solid curves.

5.3. Inversion with Pointing Uncertainty

[23] It remained to test the NN inversions with a pointing uncertainty added. The pointing uncertainty was simulated by adding for each scan a constant altitude offset to all the tangent heights of the spectra. Uncertainties were added to both training and inversion sets. For reasons discussed below, the uncertainty distribution was assumed uniform and symmetric around the nominal tangent heights. Inversions for maximum offsets of 180, 375 and 750 m were done and compared again with the inversions of spectra with only thermal noise. The results are plotted in Figure 5, and the same considerations as for the temperature uncertainty apply here.

Figure 5.

Inversion of spectra with thermal noise and tangent height as random errors. The O3 retrieval error for the NN technique with different tangent height uncertainty is shown. The error for an uncertainty of 180 m is plotted as dash-dotted curves, 375 m is plotted as dotted curves, and 750 m is plotted as dashed curves. The error with only thermal noise is also plotted as solid curves.

5.4. Inversion with Temperature and Pointing Uncertainty

[24] Now both temperature and pointing uncertainties were added to the training and inverting sets. Expected uncertainties for the temperature (using ECMWF temperature profiles as a priori) were set to σ values of 1 K (up to 35 km) and 2 K (above 35 km). For the pointing, a σ value of 100 m (Gaussian distribution) was estimated. The result for these inversions can be seen in Figure 6. As for the case with only thermal noise as random uncertainty, the regression between spectra and profiles gave the same retrieval error as the OEM inversions, showing that the NN technique is able to learn a mapping between spectra and profiles even if the mapping is degraded by the presence of random uncertainties.

Figure 6.

Inversion of spectra with thermal noise, temperature and tangent height as random errors. The O3 retrieval error for the NN technique and OEM with a temperature uncertainty of 1 K and tangent height uncertainty of 100 m is shown. The OEM inversions are plotted as dotted curves, and those done by the NN technique are plotted as dashed curves. The linear characterization of the error (2) is also plotted as solid curves.

[25] The problem is that, to be able to characterize the observations with those given uncertainties, the inversion method is supposed to have incorporated the best a priori information available prior to the inversion, for instance, European Centre for Medium-Range Weather Forecasts (ECMWF) temperature profiles and reconstructed tangent heights. To incorporate this last-minute information is not as trivial for the NN inverting scheme as for OEM. For instance, the last-minute information cannot be incorporated into the present NN scheme.

[26] A first approximation to the problem will be to train the NNs assuming a very large set of possible random realizations, that is, a larger uncertainty, but at the cost of a higher retrieval error, as the following inversions proved. Now the temperature uncertainty was set to 5 K and the pointing set to 750 m, and the values adopted are based on the following. In the 544 GHz band, spectra are scanned every 1.5 km in tangent height, but the exact altitudes can change from scan to scan. When preparing a training set, a nominal tangent height is required to specify the Kx matrix. The nominal tangent heights will be a set of fixed altitudes separated by 1.5 km, covering an altitude range where the presence of spectra obtained every 1.5 km is guaranteed. Then any measured spectrum from the scan should lie within ±750 m from one of the nominal tangent heights. The pointing uncertainty to prepare a training set should then correspond to a uniform distribution symmetrically distributed around the nominal tangent heights, with a maximum offset of 750 m. As to temperature, an upper limit of 5 K as σ (Gaussian distribution) was judged to be large enough to cover temperature deviations from the mean a priori profile. The results of the inversion by the NN technique are plotted in Figure 7. For comparison purposes, the OEM retrieval error from the previous inversion (Figure 6) is also plotted, as well as the linear OEM inversion of the same set. The NN technique clearly beats OEM linear inversions and it does a relatively good job, especially considering the large uncertainties assumed, but it cannot match the performance of the OEM nonlinear inversions in error, as expected.
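The ±750 m bound follows from the 1.5 km spacing: a tangent height inside the covered range is never more than half a step away from the nearest nominal altitude. The sketch below checks this numerically and draws uniform training offsets; the function name `nearest_nominal`, the example scan and the random seed are hypothetical.

```python
import numpy as np

def nearest_nominal(measured_km, nominal_km):
    """Index of the closest nominal tangent height and the signed offset to it."""
    measured = np.asarray(measured_km, dtype=float)
    nominal = np.asarray(nominal_km, dtype=float)
    idx = np.abs(measured[:, None] - nominal[None, :]).argmin(axis=1)
    return idx, measured - nominal[idx]

nominal_km = np.arange(12.0, 60.1, 1.5)            # fixed nominal grid, 1.5 km step
rng = np.random.default_rng(7)
shift = rng.uniform(0.0, 1.5)                      # arbitrary scan-to-scan shift
measured_km = 13.0 + shift + 1.5 * np.arange(28)   # made-up scan inside the nominal range
idx, offsets = nearest_nominal(measured_km, nominal_km)

# Uniform, symmetric pointing offsets for a training set of 500 scans (in km).
training_offsets_km = rng.uniform(-0.75, 0.75, size=500)
```

Because the whole scan is shifted by the same amount, all spectra of one scan share the same offset with respect to their nominal tangent heights, which matches the constant offset per scan used in the simulations.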

Figure 7.

Inversion of spectra with thermal noise, temperature and tangent height as random errors. For the OEM retrieval, temperature uncertainty was 1 K and tangent height uncertainty 100 m, while for the NN retrieval it was 5 K and 750 m. The linear OEM retrieval error is plotted as dotted curves, the nonlinear OEM error is plotted as solid curves, and the NN error is plotted as dashed curves.

[27] It is clear that to make the NN technique fully operational, more elaborate schemes are needed where the NNs also incorporate last-minute a priori information, as the OEM does. This can be done, for instance, by adding temperature and tangent height channels to the MLPs next to the spectral channels, so there is a way of incorporating the new information to bias the regression. Further work in this direction needs to be conducted.

6. Summary and Conclusions

[28] This paper describes a neural network technique to derive atmospheric species profiles from Odin-SMR limb sounding data. The technique is intended to be used for nonlinear inversions, where the traditional iterative approaches are computationally very demanding. Special attention is given to the implementation of a reduction technique able to extract relevant features from the spectral space. Considerations are also given to the problem of handling the main observational random uncertainties.

[29] The study was done by inverting the 544 GHz Odin-SMR spectral band, retrieving O3 by the NN technique, and comparing the results with the inversions of similar sets of spectra by OEM. The NN technique was implemented by MLPs with 20 inputs, two neurons in the hidden layer and one output neuron, one for each retrieval altitude. They were trained by weight decay regularization with the regularization parameter optimized by Bayesian techniques. An essential part of the technique was the strong reduction of the dimensionality of the spectral space by a reduction technique that derives the eigenvectors of the space from the weighting functions of the observation.

[30] When the only random uncertainty was the thermal noise, both OEM and the NN technique performed nearly identical inversions, with the advantage for the NN technique of being very fast. This was also the case when the other random uncertainties considered here, temperature and pointing, were incorporated into the training set, proving that the NNs can also do satisfactory regressions between spectra and profiles when the mapping is degraded by the random uncertainties.

[31] In order to extend the NN technique into an operational phase, the problem of incorporating last-minute a priori information to avoid training with large uncertainties (and the subsequent degradation in retrieval performance) needs to be addressed. OEM can do it easily by setting a corresponding new a priori first guess, while the present NN algorithm cannot incorporate the last-minute a priori information. Thus, although the NN technique clearly beats OEM in terms of computational burden, further work is needed before the NN technique can compete with the operational Odin-SMR inversions in terms of retrieval error.


[32] This work was partially supported by the Swedish National Space Board.