Radio Science

Inverting ionospheric radio occultation measurements using maximum entropy



[1] Practical aspects of the inversion of ionospheric radio occultation data using the Abel transform and its inverse are discussed. The linear inverse transform exhibits poor error propagation characteristics, producing significant artifacts preferentially at low altitudes where they might easily be mistaken for intermediate or sporadic layers in the ionosphere. Tikhonov regularization, which can be viewed as fixed linear filtering, reduces the artifacts at the expense of discarding fine structure in the profiles. Improved results are obtained using maximum entropy and Bayesian statistics. The maximum entropy algorithm can be viewed as a nonlinear adaptive filter which suppresses artifacts while preserving fine structure to the degree the data can support. Other advantages of and avenues for improving the basic maximum entropy algorithm are discussed.

1. Introduction

[2] Radio occultation measurements of the Earth's atmosphere and ionosphere using global positioning satellites and receivers placed on satellites in low Earth orbit (LEO) are an important new source of remote sensing information, supplementing ground-based and in situ measurements and serving as sources for assimilative models. Developed for planetary atmospheric research [e.g., Phinney and Anderson, 1968; Fjeldbo et al., 1971], the radio occultation technique involves inferring gradients in the index of refraction from differential phase measurements and the implied optical path length between transmitting and receiving satellites. The index of refraction in the Earth's atmosphere is a function of temperature, pressure, water and water vapor content, and electron density, where the last factor has by far the most significant effect. Inferring total electron content (TEC) on the ray path between the satellites is then a relatively straightforward task. Reviews of different analysis techniques have been given by Kursinski et al. [1997] and Schreiner et al. [1999].

[3] In order to convert the TEC measurements to plasma density profiles, some kind of data inversion must take place. One conventional means of inversion is the inverse Abel transform, which applies in situations where spherical symmetry holds. The Abel transform and its inverse are nothing more than the transformation from a differential ray path length coordinate dl to a differential radial distance coordinate dr, and back. The transformation is such as to suppress features at low altitudes in the occultation profiles. Upon inverse transformation, these features are reamplified, but so too is the experimental uncertainty accompanying the measurements. This effect causes the inverted profiles to oscillate “noisily” at low altitudes, producing artifacts that could easily be mistaken for layers.

[4] The purpose of this paper is to investigate the pathology just described and to explore means of mitigation. Two conceptually related methodologies, Tikhonov regularization and maximum entropy, will be described and compared. While both offer substantial improvement over straight linear inverse transformation, the latter will be seen to outperform the former in its ability to retain fine structure while suppressing artifacts. Some practical issues regarding maximum entropy implementation in the occultation problem and some potential approaches to improving the method are then discussed.

2. Abel Transform and Its Inverse

[5] We are interested in the specific problem of recovering altitude profiles of ionospheric electron density from radio occultation observations made with satellite pairs. We assume spherical symmetry so that the electron density is a function of the radial distance r alone. We further neglect ray bending and birefringence in this analysis and consider only the effect that the ionospheric plasma has on straight-line propagation through variations in the index of refraction n:

$$n \approx 1 - \frac{40.3\, n_e}{f^2}$$

where ne is the electron number density in MKS units and f is the radio frequency in Hertz, which is taken to be well above the critical frequency in the ionosphere. Because of the dispersion, signals received at different frequencies will have different optical path lengths and arrive out of phase. For a given propagation path with closest approach (to the center of the Earth) s, the phase difference will be related to the electron density profile through Abel's transform (see Figure 1):

$$\phi(s) = 2C \int_s^{\infty} \frac{n_e(r)\, r\, dr}{\sqrt{r^2 - s^2}} \qquad (1)$$

where it is assumed that both the transmission and reception points are above the altitudes where significant electron density exists. (This will not generally be true for LEO satellites, but a modified methodology exists in that case that uses positive and negative elevation rays to resolve the problem [Healy et al., 2002].) Here, the constant C for the two frequencies f1 and f2 is 2π(f1 − f2)/(cf1f2), where c (not to be confused with C) is the speed of light. As the satellites move, they sweep out the continuous curves ϕ(s). The task is to recover the electron density profile ne(r) by inverting (1).

Figure 1.

Diagram illustrating the Abel transform, which arises purely from the Pythagorean relationship between the radial distance r, the distance of closest approach s, and the coordinate along the ray path x such that dx = r dr/√(r² − s²).
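The geometry in Figure 1 can be written out as a short worked step. This is a sketch under two assumptions that are not stated explicitly in the text: that C multiplies the electron column along the ray, and that the ray is symmetric about its point of closest approach (which is where the factor of 2 comes from).

```latex
\begin{align*}
x^2 &= r^2 - s^2
  \;\Rightarrow\; x\,dx = r\,dr
  \;\Rightarrow\; dx = \frac{r\,dr}{\sqrt{r^2 - s^2}}, \\
\phi(s) &= C \int_{-\infty}^{\infty} n_e\bigl(r(x)\bigr)\,dx
         = 2C \int_{s}^{\infty} \frac{n_e(r)\, r\,dr}{\sqrt{r^2 - s^2}} .
\end{align*}
```

The kernel is singular but integrable at r = s, which is why rays spend most of their path near the tangent altitude.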

[6] In the event that ne(r) decreases more rapidly than r⁻¹ at the upper bound, the inverse of (1) exists:

$$n_e(r) = -\frac{1}{\pi C} \int_r^{\infty} \frac{d\phi}{ds}\, \frac{ds}{\sqrt{s^2 - r^2}} \qquad (2)$$

This transformation has the added benefit of acting on the derivative of the phase term, sidestepping ambiguities associated with absolute phase measurement. It is a straightforward matter to evaluate (2) using standard numerical integration techniques. We will see that the problem is well posed and that (2) implies no systematic distortion of the desired profile. However, (2) possesses a certain pathology that makes it poorly suited for ionospheric investigation. That pathology and a remedy are described below.

3. Discrete Inversion

[7] Although not strictly necessary, it is expedient to consider the discrete versions of (1) and (2). The electron density and phase difference profiles can be discretized spatially and regarded as column vectors. We designate the phases with the vector d for data and the electron densities with the vector m for model. Let d be specified at n points and m at m points, where the integers n and m need not be equal. The integrals can then be carried out using summation rules. A simple trapezoidal integration scheme will be used here, although higher order schemes are certainly possible. Quadrature schemes can be used in the event that ne and/or ϕ can be specified at arbitrary locations, although that is not expected to be the case for radio occultation problems. The derivative in (2) can be evaluated with backward differencing. Finally, the forward and inverse transforms assume the form

$$d = G m \qquad (3)$$
$$\hat{m} = G^{\dagger} d \qquad (4)$$

where G is the n × m linear transformation implied by the integral in (1) and G† is its m × n generalized inverse. The caret in (4) denotes an estimate with accuracy depending on the qualities of G†; note that G† need not and will not in general be G⁻¹. The way forward is the subject of inverse theory, which is described in a number of standard texts [see, e.g., Aster et al., 2005; Menke, 1984; Tarantola, 1987]. Using the Moore-Penrose generalized inverse gives

$$G = U_p S_p V_p^t$$
$$G^{\dagger} = V_p S_p^{-1} U_p^t \qquad (5)$$

where Up, Sp, and Vp are matrices found through the singular value decomposition of G, truncated to dimensions limited by the rank p of G. Here, Up and Vp are orthogonal matrices (size n × p and m × p, respectively) spanning the data and model ranges, respectively, and Sp is the p × p diagonal matrix composed of positive definite singular values.
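The truncated-SVD construction of the generalized inverse can be sketched in NumPy as follows. The helper `abel_matrix` is hypothetical: it discretizes the forward transform by treating the density as constant inside unit-thickness shells (an onion-peeling scheme chosen here because it integrates the singular kernel exactly), which is not the trapezoidal scheme used in the paper. The constant C is taken as 1, and `r0`, `dr`, and `rtol` are illustrative parameters.

```python
import numpy as np

def abel_matrix(n, r0=1.0, dr=1.0):
    """Forward Abel operator for piecewise-constant shells (illustrative, C = 1).

    Row i is the ray whose closest approach is the lower edge of shell i;
    entry (i, j) is the path length of that ray inside shell j.
    """
    edges = r0 + dr * np.arange(n + 1)   # shell boundaries
    s = edges[:-1]                       # tangent radii
    G = np.zeros((n, n))
    for i in range(n):
        outer = np.sqrt(edges[i + 1:] ** 2 - s[i] ** 2)
        inner = np.sqrt(edges[i:-1] ** 2 - s[i] ** 2)
        G[i, i:] = 2.0 * (outer - inner)
    return G

def generalized_inverse(G, rtol=1e-12):
    """Moore-Penrose inverse from the SVD truncated to the numerical rank p."""
    U, sv, Vt = np.linalg.svd(G, full_matrices=False)
    p = int(np.sum(sv > rtol * sv[0]))   # number of retained singular values
    G_dagger = Vt[:p].T @ np.diag(1.0 / sv[:p]) @ U[:, :p].T
    return G_dagger, p

G = abel_matrix(60)
G_dagger, p = generalized_inverse(G)
```

For this discretization G is upper triangular with a positive diagonal, so it is full rank and the generalized inverse coincides with the ordinary inverse.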

[8] It is illustrative to explore briefly the outcomes in two special cases: the overdetermined case, where p = m < n, and the underdetermined case, where p = n < m. In the former case, we have $V_p V_p^t = I$, giving

$$\hat{m} = (G^t G)^{-1} G^t d$$

which is the classical least-squares estimate assuming uniform, independent errors. It is unique but inexact, as the model in general will be unable to completely reproduce the data. In the underdetermined case meanwhile, UpUpt = I and

$$\hat{m} = G^t (G G^t)^{-1} d$$

Now, the solution is exact in that the data can be completely reproduced from the model estimate. While there is more than one model vector that solves the forward problem in the underdetermined case, the generalized inverse solution is the one with the minimum L2 norm.
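The two special cases can be checked numerically. The sketch below uses small random matrices as stand-ins (they are not Abel operators) and compares the closed-form least-squares and minimum-norm expressions against NumPy's pseudo-inverse.

```python
import numpy as np

rng = np.random.default_rng(0)

# Overdetermined: more data (8) than model parameters (3).
G_tall = rng.normal(size=(8, 3))
d = rng.normal(size=8)
m_ls = np.linalg.solve(G_tall.T @ G_tall, G_tall.T @ d)    # (G^t G)^{-1} G^t d

# Underdetermined: fewer data (3) than model parameters (8).
G_wide = rng.normal(size=(3, 8))
d3 = rng.normal(size=3)
m_mn = G_wide.T @ np.linalg.solve(G_wide @ G_wide.T, d3)   # G^t (G G^t)^{-1} d

# Any other exact solution differs by a null-space vector and has a larger norm.
null_part = rng.normal(size=8)
null_part -= np.linalg.pinv(G_wide) @ (G_wide @ null_part)  # project onto null space
m_other = m_mn + null_part
```

Here `m_ls` fits the data only approximately, while `m_mn` and `m_other` both reproduce `d3` exactly; `m_mn` is the shorter of the two.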

[9] For the purposes of this paper, however, it will suffice to consider the even determined case in which G is full rank and p = m = n. Then the singular values are all nonzero, VpVpt = UpUpt = I, and

$$\hat{m} = V_p S_p^{-1} U_p^t d = G^{-1} d \qquad (7)$$

The inverse is unique and exact, since $\hat{m} = G^{-1} G m = m$.

[10] The accuracy of the model estimate in the presence of noise depends on the stability of the inversion and is the issue motivating this analysis. The structure of (7) gives an indication. Here, the inverse operator can be seen to decompose the data vector into its component singular vectors and to scale each one by the corresponding inverse singular value. Ill-conditioned problems are signaled by ratios of the smallest to largest singular values competing with the floating point precision of the computer, since roundoff error prohibits accurate retention of all the singular modes, leading to instability. For the Abel inversion, the ratio of the largest to smallest singular values is only comparable to n, so numerical instability is not the issue. Even so, this suggests that error amplification will be substantially different for different kinds of features in the data. This phenomenon can be explored further through an example.

[11] Figure 2 illustrates the performance of the Abel transform inversion by applying the linear transformation in (4) to simulated data. For these simulations, n = m = 60. Figure 2a shows the initial electron density profile, ne(r), discretized on a grid with unit spacing. The electron density here is dimensionless and normalized such that the area under the curve is also unity. The profile is composed of three Maxwellian components and is intended to vaguely resemble the ionosphere.

Figure 2.

Example occultation profile inversion comparing different methods. (a) Simulated “true” electron density profile ne(r). (b) Phase profile ϕ(s) calculated using (3). Simulated phase data with (solid) and without (dashed) noise are shown. (c) Direct linear inversion of the simulated data using (4). Models with (solid) and without (dashed) noise are shown. (d) Singular value analysis of the linear transformation (see text). (e) Linear inversion using Tikhonov regularization. The dashed line is the true model. (f) Results obtained using maximum entropy. The dashed line is the true model.

[12] Figure 2b shows simulated occultation data ϕ(s) computed using the transformation in (3). For these calculations, we take the constant C in (1) and (2) to be unity for simplicity. Two curves are shown in Figure 2b, one without added noise, and one with noise added. The error bars reflect the amplitude of the noise, which is independent Gaussian random noise with a standard deviation σ = 0.04. To our knowledge, there is nothing that prevents correlated measurement errors in radio occultation experiments (see below). However, the results here are easily generalized for the case of nonuniform and correlated errors.

[13] Figure 2c shows the results of model inversion using the linear transformation in (4). Results for the noisy and noise-free cases are shown with solid and dashed lines, respectively. The noise-free inversion is indistinguishable from the original density profile. Error bars for the noisy case were computed using the standard error propagation formula:

$$C_m = G^{\dagger} C_d\, G^{\dagger t} \qquad (8)$$

where Cd is the (diagonal) covariance matrix of the simulated data and Cm is the covariance matrix of the inverted model of ne(r). The error bars in Figure 2c reflect the diagonal components of Cm. While Cm is diagonally dominant, significant values exist in the bands near the main diagonal, indicating the tendency for errors in neighboring model bins to be moderately correlated even if the errors in the data are not.
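A minimal sketch of this error propagation step, assuming the same hypothetical shell-based `abel_matrix` discretization (not the paper's trapezoidal scheme), C = 1, and uniform independent noise with σ = 0.04 as in the simulation. In the even-determined, full-rank case the generalized inverse is simply the matrix inverse.

```python
import numpy as np

def abel_matrix(n, r0=1.0, dr=1.0):
    """Hypothetical piecewise-constant-shell forward Abel operator (C = 1)."""
    edges = r0 + dr * np.arange(n + 1)
    s = edges[:-1]
    G = np.zeros((n, n))
    for i in range(n):
        outer = np.sqrt(edges[i + 1:] ** 2 - s[i] ** 2)
        inner = np.sqrt(edges[i:-1] ** 2 - s[i] ** 2)
        G[i, i:] = 2.0 * (outer - inner)
    return G

n, sigma = 60, 0.04
G = abel_matrix(n)
C_d = sigma ** 2 * np.eye(n)          # diagonal data covariance
G_inv = np.linalg.inv(G)              # generalized inverse for full-rank square G
C_m = G_inv @ C_d @ G_inv.T           # model covariance

model_sigma = np.sqrt(np.diag(C_m))   # error bars for each model bin
corr = C_m / np.outer(model_sigma, model_sigma)   # correlation between bins
```

The off-diagonal entries of `corr` show how errors in neighboring model bins become correlated even though the data errors are independent.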

[14] It was pointed out above that the noise-free model inversion is indistinguishable from the initial model. Likewise, both the noisy and noise-free inversions reproduce the simulated data when propagated back through the forward model (3). The linear transformation corresponding to (4) introduces no systematic distortion. This can be illustrated in practice by computing the model resolution matrix,

$$R_m = G^{\dagger} G \qquad (9)$$

which should approach the identity matrix for an unbiased estimator. The matrix was calculated for this example (not shown) and is equivalent to the identity matrix to within the numerical accuracy of the single precision floating point calculations done here.

[15] The problem with the inverse method adopted so far is error propagation. The model inversion in Figure 2c exhibits large fluctuations and wide confidence intervals despite the modest amount of noise added to the simulated data. The problem grows with decreasing range to the point of introducing artificial layers and negative-going excursions. The negative excursions and layers that might easily be confused with intermediate or sporadic E layers in nature are unacceptable artifacts and render direct linear transformation suboptimal.

[16] The cause of the problem is illustrated by Figure 2d. The histograms represent the singular values of G, ranked top to bottom from largest to smallest. The condition number of the inverse transform can be calculated from the ratio of the largest to smallest of the singular values. That ratio is approximately n in this problem. While that is not large enough to give rise to numerical instability, it has serious consequences for error amplification. Also shown in Figure 2d are the singular vectors with the two largest and two smallest singular values. Large (small) singular values correspond to unstructured (highly structured) singular vectors with most of their content at large (small) ranges. Features that resemble the structured singular vectors in the noisy data will be amplified much more than features that resemble the unstructured ones. This explains the preponderance of noise amplification and ragged oscillations in the model inversion at low ranges. The behavior is an inherent feature of the inverse Abel transform.
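The mode-by-mode view of noise amplification can be sketched as follows, again with the hypothetical shell-based `abel_matrix` (so the numerical values of the singular values will differ from those in Figure 2d). The noise vector is projected onto the data-space singular vectors, each coefficient is divided by its singular value, and the result is mapped back to model space.

```python
import numpy as np

def abel_matrix(n, r0=1.0, dr=1.0):
    """Hypothetical piecewise-constant-shell forward Abel operator (C = 1)."""
    edges = r0 + dr * np.arange(n + 1)
    s = edges[:-1]
    G = np.zeros((n, n))
    for i in range(n):
        outer = np.sqrt(edges[i + 1:] ** 2 - s[i] ** 2)
        inner = np.sqrt(edges[i:-1] ** 2 - s[i] ** 2)
        G[i, i:] = 2.0 * (outer - inner)
    return G

G = abel_matrix(60)
U, sv, Vt = np.linalg.svd(G)          # singular values sorted largest first
cond = sv[0] / sv[-1]                 # condition number of the inversion
gain = 1.0 / sv                       # amplification applied to each mode

rng = np.random.default_rng(0)
noise = 0.04 * rng.normal(size=60)
coeffs = U.T @ noise                  # noise content of each singular mode
model_error = Vt.T @ (coeffs * gain)  # same as applying G^{-1} to the noise
```

Modes with small singular values receive the largest gain, so any noise that projects onto them dominates `model_error`.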

[17] Two conventional strategies exist for mitigating the oscillatory behavior of the linear inversion. The first is to truncate the singular value matrix, regarding as zero those values that fall below some threshold. The corresponding, problematic singular vectors and associated noise amplification behavior are thus eliminated. A less drastic approach is to filter the singular values, modifying them artificially so that they do not assume too low values. Such so-called damped SVD methods include the method of Tikhonov regularization. Figure 2e shows a regularized version of the model inversion with error bars recalculated according to (8). The forward and inverse models in (5) were recalculated after dividing the singular values λi by the filter coefficients

$$f_i = \frac{\lambda_i^2}{\lambda_i^2 + \alpha^2} \qquad (10)$$

where α is the regularization parameter. This is equivalent to finding the model that minimizes a weighted combination of the L2 norms of the noisy data prediction error $G\hat{m} - d$ and the model itself $\hat{m}$. The regularization parameter was set to 10 in this example in an attempt to reduce artifacts while retaining fine structure. While the results are superior to the unregularized model in terms of having smaller uncertainties and fewer sporadic oscillations, the profile in Figure 2e is clearly distorted, the peaks being rounded and the valley being filled in. Note that while a non-negativity constraint can be incorporated into regularization, we have not done so here.
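A sketch of the damped-SVD inverse, assuming the filter coefficients take the standard zeroth-order Tikhonov form f_i = λ_i²/(λ_i² + α²) and using the hypothetical shell-based `abel_matrix` with C = 1. Because the singular values depend on the discretization, α = 10 here only mirrors the value quoted in the text and is not tuned.

```python
import numpy as np

def abel_matrix(n, r0=1.0, dr=1.0):
    """Hypothetical piecewise-constant-shell forward Abel operator (C = 1)."""
    edges = r0 + dr * np.arange(n + 1)
    s = edges[:-1]
    G = np.zeros((n, n))
    for i in range(n):
        outer = np.sqrt(edges[i + 1:] ** 2 - s[i] ** 2)
        inner = np.sqrt(edges[i:-1] ** 2 - s[i] ** 2)
        G[i, i:] = 2.0 * (outer - inner)
    return G

n, alpha = 60, 10.0
G = abel_matrix(n)
U, sv, Vt = np.linalg.svd(G)

f = sv ** 2 / (sv ** 2 + alpha ** 2)      # filter coefficients, between 0 and 1
G_reg = Vt.T @ np.diag(f / sv) @ U.T      # inverse built from sv / f

# Equivalent damped least-squares form: (G^t G + alpha^2 I)^{-1} G^t
G_damped = np.linalg.solve(G.T @ G + alpha ** 2 * np.eye(n), G.T)

R_m = G_reg @ G                           # model resolution matrix
```

Unlike the unregularized case, `R_m` is no longer the identity, which is the algebraic signature of the rounding and filling-in seen in the regularized profile.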

[18] Accompanying the linear algebra shown above is a statistical interpretation of inverse theory [see, e.g., Bevington, 1969]. In the presence of Gaussian random noise, the generalized inverse gives the model with the maximum probability or likelihood of occurrence. The solution to the maximum likelihood problem, which is just the linear least-squares problem, is not unique in the underdetermined case, however. There, the generalized inverse gives the model which also minimizes a linear combination of the chi-squared error and the L2 model norm. This can be viewed as maximizing a modified probability which is influenced by a preference for regular models. The efficacy of regularization is also clear in the overdetermined and even determined cases, and modifying the probability metric can be an expedient way to mitigate noise amplification and suppress artifacts. This was demonstrated by the analysis in Figure 2, which exemplifies both the benefits and shortcomings of Tikhonov regularization. Improving the Abel inversion further motivates an investigation of the explicit formulation of the probability problem and of the optimum choice of probability metric for this problem.

4. Maximum Entropy

[19] The solution to the least squares problem may or may not be unique, depending on whether the problem is underdetermined. In the absence of other information, the solution is the maximum likelihood solution. However, there is a class of solutions with chi-squared greater than the minimum but comparable to its expectation. All of these would appear to be reasonable candidate solutions, and additional information is required to select from among them. Preference for regular solutions is a kind of prior information, and regularization implicitly assigns a prior probability associated with one or another model norm to arrive intentionally at something other than the least squares solution.

[20] Another candidate for the prior probability of a model is its entropy, a combinatorial measure from which partition functions in statistical mechanics are constructed. Entropy is also a measure of information content [Shannon and Weaver, 1949]. A model high in entropy has a high probability of occurrence, all other things being equal, and represents minimal independent information, making it minimally committal to unmeasured data. A preference for high-entropy models suppresses artifacts, since every feature in the model must be supported by the data.

[21] Entropy can be derived by considering the number of ways that a given model, which must be nonnegative, could be assembled from quanta falling randomly and indistinguishably into different bins. The greater the number, the more generic the model. If there are $m_i$ quanta in the ith of m bins totaling $M = \mathbf{1}^t m$, where $\mathbf{1}$ is a column vector of 1's, the number of ways the model could be assembled is $M!/\prod_{i=1}^{m} m_i!$. With the application of Stirling's approximation, maximizing this number can be seen to maximize the entropy of the model [e.g., Jaynes, 1979]

$$S = -\sum_{i=1}^{m} m_i \ln\left(\frac{m_i}{M}\right) \qquad (11)$$

where the logarithm may be of any base but will be understood to be the natural log in what follows.
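The link between the combinatorial count and the entropy can be checked numerically. This sketch assumes the entropy takes the form S = −Σ m_i ln(m_i/M) and uses made-up bin counts; the exact log-multiplicity is evaluated with the log-gamma function.

```python
import math

def log_multiplicity(counts):
    """ln( M! / prod_i m_i! ) evaluated exactly with log-gamma."""
    M = sum(counts)
    return math.lgamma(M + 1) - sum(math.lgamma(c + 1) for c in counts)

def entropy(counts):
    """S = -sum_i m_i ln(m_i / M), the Stirling limit of the log-multiplicity."""
    M = sum(counts)
    return -sum(c * math.log(c / M) for c in counts if c > 0)

peaked = [1000, 2000, 3000]    # illustrative bin counts
uniform = [2000, 2000, 2000]   # same total, spread evenly

rel_diff = abs(log_multiplicity(peaked) - entropy(peaked)) / entropy(peaked)
```

For counts this large the two quantities agree to a fraction of a percent, and the uniform arrangement scores higher on both, consistent with entropy preferring generic models.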

[22] Merits of the entropy prior for inverting radio occultation data include that it admits only positive definite models. While it prefers uniform models and therefore smooth or regular ones, it is insensitive to the arrangement of the model bins and does not rule out steep gradients and thin layers, so long as the data can support them. Rationales for using entropy as prior information in this and other applications have been advanced by Ables [1974], Jaynes [1979, 1982, 1985], Skilling [1991], Daniell [1991], Gilmore [1996], and Molina et al. [2001], among others.

[23] Entropy can be incorporated into the inverse problem using Bayes theorem which relates the conditional probabilities of two propositions:

$$p(m|d) = \frac{p(d|m)\, p(m)}{p(d)} \qquad (12)$$

Here, m and d can be regarded as the model and the data, respectively. The marginal or prior probability p(m) encapsulates information about the model in advance of any measurements. Assuming normally distributed measurement errors, the transitional probability p(d|m) is the joint normal multivariate probability density for the prediction error d − Gm computed from the data and the forward model (3). Successive application of Bayes theorem even permits assimilation of new data as they arrive. The inverse solution is the model that maximizes the posterior probability p(m|d). The model or map is found implicitly rather than explicitly, and pathologies associated with (4) are thus avoided.

[24] In the maximum entropy problem, the prior probability is made to be a function of the entropy (usually $e^S$). This step can be somewhat controversial, as entropy lies outside the narrow, frequentist definition of probability, having more to do with the prior state of knowledge and the plausibility of the model. As Gilmore [1996] points out, frequencies are difficult to obtain in scientific measurements, particularly exploratory ones like those of interest here. However, Jaynes [1979] and others have argued that Bayes theorem applies equally well to theoretical probabilities based on information theory as to frequencies. Numerical simulations offer a means of testing the argument.

[25] A slightly different formulation of the maximum entropy problem comes from the application of variational calculus. This particular algorithm is due to Wilczek and Drapatz [1985] and is a generalization of the method of Gull and Daniell [1978] designed for image reconstruction in radio astronomy. Consider maximizing (11) subject to the constraints $M = \mathbf{1}^t m$ and d = Gm. The solution is the extremum of the functional

$$f = S + \lambda^t (d - G m) + L\,(\mathbf{1}^t m - M) \qquad (13)$$

where λ (a column vector) and L are Lagrange multipliers used to enforce the constraints. Maximizing (13) through differentiation with respect to the elements of m yields a model parameterized by λ with the elements

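$$m_i = \frac{M}{Z} \exp\left(-\lambda^t G_{[,i]}\right)$$
$$Z = \sum_{i=1}^{m} \exp\left(-\lambda^t G_{[,i]}\right) \qquad (14)$$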

where G[,i] denotes the ith column of G. Note that the partition function Z enforces the correct normalization. The model with the maximum entropy will have this form, although the λ remain to be determined.
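The parameterization of the model by the Lagrange multipliers can be sketched as below. The sign convention assumed here, m_i = M exp(−λᵗG[,i])/Z, is one consistent choice and may differ from the original by the sign of λ; the function name `maxent_model` and the small random G are illustrative only.

```python
import numpy as np

def maxent_model(lam, G, M=1.0):
    """Model m_i = M exp(-lam . G[:, i]) / Z, with Z fixing the sum to M."""
    a = -G.T @ lam                 # one exponent per model bin
    w = np.exp(a - a.max())        # shift for numerical stability; cancels in ratio
    return M * w / w.sum()

rng = np.random.default_rng(0)
G = rng.uniform(size=(4, 6))       # 4 data, 6 model bins (illustrative)
lam = rng.normal(size=4)

m = maxent_model(lam, G, M=2.0)
m_flat = maxent_model(np.zeros(4), G, M=2.0)   # lam = 0 gives the uniform model

# Stationarity check: -ln(m_i / M) - [G^t lam]_i should equal ln Z for every bin.
stationarity = -np.log(m / 2.0) - G.T @ lam
```

By construction every bin is strictly positive and the bins sum to M, so positivity and normalization never need to be imposed separately.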

[26] Using this parameterization, the complete maximum entropy problem incorporating the error covariance is posed as the maximization of the following functional

$$f = S + \lambda^t (d + e - G m) + \Lambda\,(e^t C_d^{-1} e - \Sigma) \qquad (15)$$

Here, e is a column of measurement errors to account for the difference between the model prediction Gm and the data d, Cd is the full data error covariance matrix, Σ is the expectation of χ2, and Λ is one more Lagrange multiplier meant to constrain the measurement error size. Substituting the model parameterization in (14) affords some simplification.

$$f = \lambda^t (d + e) + M \ln Z + \Lambda\,(e^t C_d^{-1} e - \Sigma) \qquad (16)$$

The functional is maximized through differentiation with respect to the Lagrange multipliers and the errors. Maximizing (16) with respect to the Lagrange multipliers λ yields m algebraic equations:

$$d + e - G m = 0 \qquad (17)$$

which merely restates the forward problem. Maximizing with respect to the error terms e yields equations relating them to λ:

$$C_d^{-1} e + \frac{\lambda}{2\Lambda} = 0 \qquad (18)$$

Maximizing with respect to Λ yields one more equation relating that term to the others.

$$\Lambda^2 - \frac{\lambda^t C_d \lambda}{4\Sigma} = 0 \qquad (19)$$

Equations (18) and (19) can be substituted into (17), yielding a system of m coupled nonlinear equations for m unknowns λ. These can be solved numerically using an iterative method such as Powell's hybrid method [Powell, 1970]. The Jacobian matrix for (17) is readily derived and can be used to optimize numerical performance and stability. The resulting vector λ specifies the maximum entropy model through (14).
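The coupled system can be written as a single residual function of λ, which is the form a root finder expects. This is a sketch under assumed sign conventions: the errors follow e = −C_d λ/(2Λ), Λ² = λᵗC_d λ/(4Σ), and the residual is d + e − Gm(λ). The names `maxent_model`, `maxent_residual`, and the `branch` argument are hypothetical; `branch` selects which square root of Λ² is used, and the default is an assumption rather than something stated in the text.

```python
import numpy as np

def maxent_model(lam, G, M=1.0):
    """Assumed parameterization m_i = M exp(-lam . G[:, i]) / Z."""
    a = -G.T @ lam
    w = np.exp(a - a.max())
    return M * w / w.sum()

def maxent_residual(lam, G, d, C_d, Sigma, M=1.0, branch=-1.0):
    """Residual d + e - G m(lam) after eliminating e and Lambda."""
    Lam = branch * np.sqrt(lam @ C_d @ lam / (4.0 * Sigma))   # root of Lambda^2
    e = -C_d @ lam / (2.0 * Lam)                              # errors implied by lam
    return d + e - G @ maxent_model(lam, G, M)

# Self-consistency check: build data from a known lam, then confirm that the
# residual vanishes there and that the implied chi-squared equals Sigma.
rng = np.random.default_rng(1)
n = 5
G = rng.uniform(0.5, 2.0, size=(n, n))       # illustrative forward operator
C_d = 0.04 ** 2 * np.eye(n)
Sigma = float(n)
lam_true = rng.normal(size=n)

Lam_true = -np.sqrt(lam_true @ C_d @ lam_true / (4.0 * Sigma))
e_true = -C_d @ lam_true / (2.0 * Lam_true)
d = G @ maxent_model(lam_true, G) - e_true

res = maxent_residual(lam_true, G, d, C_d, Sigma)
chi2 = e_true @ np.linalg.solve(C_d, e_true)
```

In practice `maxent_residual` would be handed to an iterative solver started from a nonzero λ (the residual is undefined at λ = 0); SciPy's `scipy.optimize.root` with `method="hybr"`, which wraps MINPACK's modified Powell hybrid method, is one plausible choice, though convergence is not demonstrated here.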

[27] Results of the maximum entropy analysis for the Abel inversion problem are shown in Figure 2f. The inversion appears to produce an unbiased reproduction of the original density profile, even in the vicinity of sharp peaks and steep gradients. The recovered density approaches but never crosses zero at the layer boundaries. The overall behavior is superior to the Tikhonov regularization solution for any choice of regularization parameter α. Evidently, the noisy data contained sufficient information to specify the presence of steep peaks and shallow valleys in the profile. The constraint that the model be positive definite and in some sense smooth was sufficient to suppress spurious layers and other artifacts without any artificial truncation of the model space. Similar results are obtained for different profile shapes, noise levels, and model sizes.

[28] Error propagation in the maximum entropy problem can be treated as follows [see, e.g., Silver et al., 1990]: Recall the form of the posterior probability

equation image
equation image

where the entropy has been used for the prior probability and the chi-squared prediction error for the transitional probability. The Γ factor is a constant which weights the importance of two probabilities and, like the regularization parameter α, must be adjusted according to some criteria. (In the variational approach to maximum entropy above, the Lagrange multiplier Λ plays the role of Γ. That variable is controlled by Σ, however, and so there is always one free parameter to adjust.)

[29] Consider small departures δm about the maximum probability (minimum E) solution. In the neighborhood of a maximum, the gradient of the argument E vanishes, and we can expand

$$E(m + \delta m) \approx E(m) + \frac{1}{2}\, \delta m^t H\, \delta m \qquad (22)$$

with H the Hessian matrix. Now the form of (22) is that of a probability density function for normally distributed model errors δm which is being maximized through the minimization of E. We therefore identify the Hessian matrix with the inverse model covariance matrix. Taking the necessary derivatives yields

equation image

Error bars representing the model variances calculated according to (23) are also shown in Figure 2f. The error bars and associated model fluctuations are now essentially uniform with altitude. The Γ term in (23) represents the influence the data have on the final maximum entropy model. The greater the influence, the smaller the uncertainties. The last term in (23) comes about from the entropy prior and ensures that the model variances become very small where the model values themselves are small. This is an important property in view of the importance of suppressing spurious layers and other artifacts.

5. Discussion

[30] The purpose of this paper is to illustrate some shortcomings of direct linear inversion of radio occultation data and to compare and contrast two common alternative approaches, Tikhonov regularization and maximum entropy. Both reduce the production of spurious layers and artifacts at low altitudes inherent in the structure of the inverse Abel transform. The former is a fixed linear filter that works by attenuating singular modes which tend to amplify noise. The danger with this approach is that it necessarily removes fine structure and distorts the model profile. The latter is an adaptive nonlinear filter which suppresses features in the model that are not supported in the data. Maximum entropy produces positive definite models which retain their fine structure and exhibit little distortion.

[31] The standard maximum entropy algorithm demonstrated here can be improved upon in a number of ways. For example, the definition of entropy can be modified so that the most probable model is something other than uniform [e.g., Jaynes, 1979]. A natural choice for the “default” profile in the occultation case is a Chapman profile or a profile from an empirical or physics-based model. One can also construct prior probability expressions that favor models with statistical correlations between spatially neighboring regions [e.g., Molina et al., 2001; Gull, 1989].

[32] The formulation of the maximum entropy problem above is sufficiently general to handle correlated measurement errors, and an investigation of the properties of the full covariance matrix in practical occultation experiments should be undertaken. The merits of accounting for correlated errors in maximum entropy inversions have been discussed at length by Hysell and Chau [2006]. A more sophisticated treatment of error propagation than given here can also be found in Hanson and Cunningham [1996].

[33] One of the deficiencies of the algorithm described above is the need to set the Σ or Γ parameter. While a natural choice for Σ would appear to be the expected value of chi-squared, there is considerable debate on the matter. Gull [1989] suggested setting the parameter according to the number of significant singular values in the forward model. The efficacy of this strategy is unclear, however, and the problem remains unresolved.

[34] Finally, it should be noted that maximum entropy is a so-called “super-resolution” method, capable of reproducing a greater degree of fine structure than a linear transformation even in the absence of noise. The resolution of the method depends strongly on data quality and the signal-to-noise ratio. An expression for the resolution of maximum entropy inversions has been derived by Kosarev [1989], but the ramifications for radio occultation experiments have yet to be realized.


[35] This work was supported by the Air Force Research Laboratory through award 03C0067 to Cornell University.