An empirical orthogonal function model of total electron content over China


  • Tian Mao,

    1. Institute of Geology and Geophysics, Chinese Academy of Sciences, Beijing, China
    2. National Center for Space Weather, China Meteorological Administration, Beijing, China
    3. Wuhan Institute of Physics and Mathematics, Chinese Academy of Sciences, Wuhan, China
    4. Graduate School of Chinese Academy of Sciences, Beijing, China
  • Weixing Wan,

    1. Institute of Geology and Geophysics, Chinese Academy of Sciences, Beijing, China
  • Xinan Yue,

    1. Institute of Geology and Geophysics, Chinese Academy of Sciences, Beijing, China
    2. Wuhan Institute of Physics and Mathematics, Chinese Academy of Sciences, Wuhan, China
    3. Graduate School of Chinese Academy of Sciences, Beijing, China
  • Lingfeng Sun,

    1. Institute of Geology and Geophysics, Chinese Academy of Sciences, Beijing, China
    2. Wuhan Institute of Physics and Mathematics, Chinese Academy of Sciences, Wuhan, China
    3. Graduate School of Chinese Academy of Sciences, Beijing, China
  • Biqiang Zhao,

    1. Institute of Geology and Geophysics, Chinese Academy of Sciences, Beijing, China
  • Jianpeng Guo

    1. Institute of Geology and Geophysics, Chinese Academy of Sciences, Beijing, China
    2. Graduate School of Chinese Academy of Sciences, Beijing, China


[1] In this paper, a climatology model of total electron content (TEC) over China is developed on the basis of empirical orthogonal function (EOF) analysis, using Global Positioning System (GPS) data from the International Global Navigation Satellite System Service (IGS) and the Crustal Movement Observation Network of China (CMONOC), which cover almost the whole Chinese sector during 1996–2004. The model represents the observational data well, with a mean bias of −0.00994 TECU (1 TECU = 1.0 × 10^16 el·m^−2) and a standard deviation of 5.42 TECU. The EOF model and the International Reference Ionosphere (IRI) are then used separately as backgrounds in three-dimensional variational (3DVAR) data assimilation experiments. The results show that assimilation nowcasting performs better with the EOF model because it provides a more realistic background.

1. Introduction

[2] Over the decades, great efforts have been made to model, as realistically as possible, the ionospheric environment through which radio waves propagate. Several reviews of ionospheric models have been published [e.g., Bilitza, 2002]. Empirical models, which are established by statistical analysis of long records of measured data, are widely investigated because they represent the ionosphere through actual measurements.

[3] Among the ionospheric characteristic parameters, total electron content (TEC) is of great interest both for applications such as satellite navigation, orbit determination, and satellite altimetry and for ionospheric research. Empirical modeling of TEC, like that of other parameters such as the critical frequencies of the ionosphere, the ion composition, and the electron and ion temperatures, is an important part of ionospheric modeling.

[4] Early empirical models of TEC were constructed from TEC measured by the Faraday rotation technique at a single site [Poulter and Hargreaves, 1981; Liu et al., 1992; Baruah et al., 1993; Jain et al., 1996; Chen et al., 2002; Unnikrishnan et al., 2002]. Later, regional TEC models for Europe were built from differential Doppler observations during two successful Actions, COST 238 on PRIME (Prediction and Retrospective Ionospheric Modelling over Europe) and COST 251 on IITS (Improved Quality of Service in Ionospheric Telecommunication Systems Planning and Operation) [Bradley, 1999; Hanbaba, 1999]. Gulyaeva [1999] built a regional analytical model of ionospheric total electron content over the American continent using Faraday rotation TEC observations at four receiving sites around the 75°W meridian. However, early regional empirical TEC models are limited by the sparse distribution of observation sites, and the estimated vertical TEC carries greater uncertainty over areas without observations.

[5] Over the past decade, with a constellation of more than 24 satellites and an ever-growing network of ground receivers, the GPS system has become the major source of electron content data. Many groups have mapped the global distribution of vertical total electron content (TEC) from GPS-based measurements [e.g., Wilson et al., 1995; Komjathy, 1997; Feltens, 1998; Mannucci et al., 1998; Iijima et al., 1999; Schaer, 1999; Hernández-Pajares et al., 1999; Orús et al., 2005]. Although several global ionosphere maps (GIMs) already exist, developing regional ionosphere maps (RIMs) [e.g., Skone, 1998; Ping et al., 2002; Otsuka et al., 2002; Gao and Liu, 2002; Wielgosz et al., 2003; Meggs et al., 2004; Stolle et al., 2005; Fuller-Rowell et al., 2006; Zapfe et al., 2006] is of great importance because their higher resolution makes them better suited to studying local ionospheric characteristics. GIMs and RIMs are data-driven models that focus on providing a reliable real-time TEC distribution.

[6] Although much research has been devoted to GIMs and RIMs, little attention has been paid to developing a climatology model on the basis of GIMs and RIMs. This paper presents a TEC climatology model that uses nine years of GPS TEC over the Chinese sector. The model can be used to investigate the climatological features of TEC. Because it uses abundant observational data and empirical orthogonal function (EOF) analysis, the model may provide a more reliable spatial distribution than early empirical TEC models.

[7] The remainder of this paper is organized as follows. After an introduction to the data and RIM technique, the EOF analysis is introduced to analyze the monthly median maps of RIMs during 1996–2004. Then, an empirical model on the basis of the first three EOF modes is established. Furthermore, an example of 3DVAR assimilation is provided as an application of the model. Finally, a summary ends the paper.

2. Data and RIM Technique

[8] The GPS data presented here are from the IGS and CMONOC GPS tracking networks during 1996–2004. The general approach of combining GPS pseudorange and phase measurements to extract slant TEC, which contains satellite and receiver biases, has been described in the literature [Lanyi and Roth, 1988]. We first briefly explain how the instrumental biases are filtered out and how gridded vertical TEC is obtained. In a solar-geographic reference frame, with the ionosphere assumed to be a single layer [Wild, 1994], the slant TECs can be converted to vertical TECs at the ionospheric pierce points (IPPs) as follows:

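$$ \mathrm{TEC}_{s} = M(E)\,\mathrm{TEC}_{v}(\theta, \lambda) + B_S + B_R \tag{1} $$

$$ M(E) = \frac{1}{\cos z}, \qquad \sin z = \frac{R_E}{R_E + h_s}\,\sin E \tag{2} $$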

where θ is the geocentric latitude of the intersection point of the receiver–satellite line with the ionospheric layer, and λ = ϕ − ϕ0 is the Sun-fixed longitude of the ionospheric pierce point (or subionospheric point), i.e., the difference between the Earth-fixed longitude ϕ and the longitude of the Sun ϕ0. Here the Sun-fixed reference frame refers to the “mean” Sun, so the geographic longitude of the Sun may be written as ϕ0 = π − UT, where UT is the Universal Time (in radians). M(E) denotes the single-layer mapping function; BS and BR are the instrumental biases of the satellites and receivers, respectively; E and z are the (geocentric) zenith distances at the height of the station and of the single layer, respectively; RE = 6371 km is the mean radius of the Earth; and hs is the height of the single layer above the Earth's mean surface [Schaer, 1999]. Then, using a least squares fit and nearest neighbor interpolation, the slant TEC observations are converted to vertical TEC on a 0.5° × 0.5° grid; the instrumental biases are filtered out at the same time. With 2 hours of data, the GPS observations cover almost the whole Chinese sector. A sample 2-hour distribution of IPPs, which are the intersections of the lines of sight from the ground stations to the satellites with a mean ionospheric height of 430 km, on the 100th day, 2006, is shown in Figure 1a. The corresponding vertical TEC values on the 0.5° × 0.5° grid are illustrated in Figure 1b.
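The slant-to-vertical conversion described above can be illustrated with a minimal Python sketch. It assumes the standard single-layer mapping function (M(E) = 1/cos z with sin z = RE/(RE + hs) · sin E) and treats the satellite and receiver biases as already known, whereas the paper estimates them in the same least squares fit. The function names and the sample values are hypothetical.

```python
import math

R_E_KM = 6371.0  # mean Earth radius quoted in the text
H_S_KM = 430.0   # single-layer height; the text quotes a mean ionospheric height of 430 km


def mapping_function(zenith_deg, h_s_km=H_S_KM, r_e_km=R_E_KM):
    """Single-layer mapping function M(E) = 1 / cos z (assumed standard form).

    zenith_deg is the zenith distance E at the station, in degrees;
    z is the zenith distance at the single layer.
    """
    sin_z = r_e_km / (r_e_km + h_s_km) * math.sin(math.radians(zenith_deg))
    return 1.0 / math.sqrt(1.0 - sin_z ** 2)


def slant_to_vertical(stec_tecu, zenith_deg, b_sat=0.0, b_rcv=0.0):
    """Subtract the (assumed known) biases, then divide by M(E)."""
    return (stec_tecu - b_sat - b_rcv) / mapping_function(zenith_deg)


# Hypothetical example: a 40 TECU slant measurement at 60 degrees zenith distance
vtec = slant_to_vertical(40.0, 60.0, b_sat=1.5, b_rcv=-0.5)
```

At zenith the mapping function equals 1, and it grows as the line of sight approaches the horizon, which is why low-elevation observations are more sensitive to the assumed layer height.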

Figure 1.

Examples of data coverage plots and Regional Ionosphere Maps at 0500 UT on the 100th day, 2002. The top left panel shows 430 km intercept points for GPS TEC. The top right panel shows the TEC value of vertical grids. The lower left panel is a part of JPL GIM, and the lower right panel is Kriging RIM.

[9] Kriging is an estimation and interpolation method used in geostatistics that takes known sample values and a variogram to determine unknown values at other locations and times. It exploits the spatial and temporal correlation properties of the underlying phenomenon and incorporates measures of error and uncertainty when determining the estimates [Stanislawska and Cander, 1999; Stanislawska et al., 2000; Wielgosz et al., 2003; Orús et al., 2005]. At each location, Kriging produces an estimate and a confidence bound on that estimate, i.e., the Kriging variance. The uncertainty maps associated with Kriging appear to account naturally for insufficient sampling. Blanch [2002] demonstrated that Kriging can successfully mitigate the undersampling problem caused by sparse data points. As an example, a RIM interpolated by the Kriging method is presented in Figure 1d. The study area extends from 5°N to 55°N in geographic latitude and from 70°E to 140°E in geographic longitude and is divided into 1° × 2° cells. The time interval between successive vertical TEC maps is 2 hours. The GIM developed by the Jet Propulsion Laboratory (JPL) is used for comparison (Figure 1c). The general magnitudes of the two maps are in good agreement, yet there are some discrepancies. As displayed in Figure 1c, the JPL GIM is smooth in the northern crest region of the equatorial ionization anomaly in East Asia because it focuses on the global-scale distribution of TEC and is limited by the sparse observing stations in the Chinese sector. Our map has the advantage of capturing the local structure of the TEC variation. For example, the TEC gradient along the meridional direction is closer to reality. However, the estimates of our RIM are not very convincing over oceanic areas because observations there are sparse.

3. Empirical Orthogonal Function Analysis

[10] In the present work, data from 1996–2004 are used to construct the model. The database consists of monthly median TEC RIMs for each universal time and month. For simplicity, the database can be represented as a 2-D matrix T(m, x), where m = 1, 2, …, 108 is the month number, which runs from January 1996 to December 2004, and x = 1, 2, …, I (I = 35 × 60 × 12) is a serial index that runs over longitude, latitude, and universal time in a fixed order.

[11] Generally, any 2-D data set (e.g., data in space and time) can be represented as a time mean plus a sum of orthogonal functions of space multiplied by time-varying coefficients. Therefore, T(m, x) can be represented as

$$ T(m, x) = \bar{T}(x) + \sum_{i=1}^{M} A_i(m)\,E_i(x) \tag{3} $$

where $\bar{T}(x)$ is the mean value of the monthly median TEC, Ei is the ith empirical orthogonal function, and Ai is the associated coefficient. Ai is a function of month and represents the long-term variability, whereas Ei varies with geographic latitude, longitude, and universal time. In principle, E can be any orthogonal set of functions. However, EOF analysis provides an algorithm for finding the set that minimizes the RMS error for a given number of terms M ≪ 108.

[12] EOF analysis, also known as principal component analysis, was originally introduced into meteorology as a method for extracting the dominant modes of spatial variability in meteorological fields. It has been used extensively to represent meteorological and climatological data since the 1950s [see Storch and Zwiers, 2002, and references therein]. It has also been used for empirical ionospheric modeling. Daniell et al. [1995] applied EOFs to represent the altitude profiles of ion concentration in their parameterized model of the ionosphere.

[13] EOF analysis decomposes the data set on a basis of orthonormal functions that are determined directly by the data set itself (hence the name empirical) and that cannot be represented as analytic functions [Daniell et al., 1995; Baldacci et al., 2001; Xu and Kamide, 2004; Mao et al., 2005; Zhao et al., 2005]. The main idea is to apply a linear transformation to the original data set that produces a new set of orthogonal functions. In many cases, a large fraction of the degrees of freedom of the original data set can be discarded as unimportant while the majority of the information it contains is retained. The decomposition is therefore useful for reducing the dimensionality of the data set and for analyzing its spatial and temporal variability.

[14] The treatment of empirical orthogonal functions (EOFs) is based on work by Lorenz [1956], Kutzbach [1967], Davis [1976], and Peixoto and Oort [1991]; see also Daniell et al. [1995]. The reader is referred to these references for mathematical proofs of the assertions made below. We summarize the algorithm here.

[15] First define the I × I covariance matrix C with elements

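$$ C_{ij} = \frac{1}{108} \sum_{m=1}^{108} \left[ T(m, i) - \bar{T}(i) \right] \left[ T(m, j) - \bar{T}(j) \right] \tag{4} $$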

[16] Now consider the eigenvalue/eigenvector problem Cϕ = ϕL or

$$ \sum_{j=1}^{I} C_{ij}\,\phi_{jk} = \lambda_k\,\phi_{ik} \tag{5} $$

where ϕ = {ϕij} is the matrix of eigenvectors of C = {Cij}, and L = {δijλj} is a diagonal matrix whose elements are the corresponding eigenvalues. (The kth column of ϕ is the eigenvector corresponding to the kth eigenvalue, λk.) By convention, the eigenvectors and eigenvalues are ordered so that λ1 > λ2 > … > λI. Because C is a real symmetric matrix, eigenvectors corresponding to distinct eigenvalues are guaranteed to be orthogonal. Because of the origin of the matrix C, it is unlikely that any of its eigenvalues will be degenerate, so we may assume that ϕ is an orthogonal set. According to Secan and Tascione [1984, and references therein], the set of orthogonal functions that minimizes the RMS error for M terms is just the first M eigenvectors:

$$ E_i(x) = \phi_{xi}, \qquad i = 1, 2, \ldots, M \tag{6} $$

These are the EOFs. And the coefficients Ai(m) are calculated from

$$ A_i(m) = \sum_{x=1}^{I} \left[ T(m, x) - \bar{T}(x) \right] E_i(x) \tag{7} $$
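The algorithm summarized above (covariance matrix, eigen-decomposition, ordering by eigenvalue, projection onto the eigenvectors) can be sketched in NumPy. This is an illustration on synthetic data, not the authors' code: the function name `eof_analysis`, the 1/N normalization of the covariance, and the toy matrix `T` are assumptions.

```python
import numpy as np


def eof_analysis(T):
    """EOF decomposition of a (months x points) data matrix T.

    Returns the time mean, the eigenvalues in descending order, the
    eigenvectors (EOFs, one per column), and the coefficients A[m, i].
    """
    T_mean = T.mean(axis=0)
    anomalies = T - T_mean
    # Covariance between every pair of points; the 1/N factor is an
    # assumption and does not change the eigenvectors.
    C = anomalies.T @ anomalies / T.shape[0]
    eigvals, eigvecs = np.linalg.eigh(C)  # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]     # reorder so that lambda_1 is largest
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    A = anomalies @ eigvecs               # project the anomalies onto each EOF
    return T_mean, eigvals, eigvecs, A


# Synthetic stand-in for T(m, x): 108 months, 40 points (not real TEC data)
rng = np.random.default_rng(0)
n_months, n_points = 108, 40
months = np.arange(1, n_months + 1)
slow = np.sin(2 * np.pi * months / 132.0)   # slow, solar-cycle-like term
annual = np.cos(2 * np.pi * months / 12.0)  # annual term
T = (20.0
     + 10.0 * np.outer(slow, rng.normal(size=n_points))
     + 2.0 * np.outer(annual, rng.normal(size=n_points))
     + 0.1 * rng.normal(size=(n_months, n_points)))

T_mean, eigvals, E, A = eof_analysis(T)
variance_fraction = eigvals / eigvals.sum()
```

For a matrix as wide as the one described in the text (I = 35 × 60 × 12 columns), forming the I × I covariance explicitly is expensive; a singular value decomposition of the anomaly matrix yields the same EOFs without building C.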

[17] Table 1 lists the percentage of variance captured by the first eight EOF modes. As can be seen from Table 1, the variance contributions of the first three modes (A1 × E1, A2 × E2, and A3 × E3) are 97.43%, 0.89%, and 0.46%, respectively. Altogether, they explain 98.79% of the total variance of the data set, leaving only 1.21% unexplained. This illustrates one of the important advantages of EOF analysis: only a few EOF components are required to represent most of the variability of the data set, which allows us to reduce the dimensionality of the original data set sensibly. This simplifies the analysis of the spatial and temporal evolution of the phenomena under investigation.

Table 1. Summary of Variances Captured From the GPS TEC Data Set by the First Eight Empirical Orthogonal Functions (EOFs)
EOF    Variance (%)    Cumulative Variance (%)

[18] Since the first three EOF modes are able to explain more than 98% of GPS TEC data set variance, we only use the first three modes to reconstruct the whole picture of the original data set. Equation (3) is simplified to

$$ T(m, x) \approx \bar{T}(x) + \sum_{i=1}^{3} A_i(m)\,E_i(x) \tag{8} $$

[19] The linear correlation coefficient between the reconstructed TEC (using the first three EOF modes) and the original data set reaches 0.99. The error distribution is plotted in Figure 2a. The standard deviation between the reconstructed TEC and the original TEC is 3.74 TECU (1 TECU = 1.0 × 10^16 el·m^−2), and the mean bias is −0.00994 TECU.
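The truncated reconstruction and the three statistics quoted above (mean bias, standard deviation, and linear correlation) can be computed as in the following sketch. It obtains the modes from a singular value decomposition of the anomalies, which is a substitute for the covariance eigenproblem that gives the same EOFs. The function names and the synthetic rank-3 data set are assumptions, so the numbers it produces are unrelated to those in the paper.

```python
import numpy as np


def truncated_reconstruction(T, n_modes=3):
    """Rebuild T from its time mean plus the leading n_modes EOF terms."""
    T_mean = T.mean(axis=0)
    anomalies = T - T_mean
    # Rows of Vt are the EOFs; squared singular values are proportional
    # to the eigenvalues of the covariance matrix.
    _, s, Vt = np.linalg.svd(anomalies, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)
    A = anomalies @ Vt[:n_modes].T          # coefficients of the kept modes
    T_rec = T_mean + A @ Vt[:n_modes]
    return T_rec, explained


def error_statistics(T_rec, T_obs):
    """Mean bias, standard deviation of the error, and linear correlation."""
    err = (T_rec - T_obs).ravel()
    corr = np.corrcoef(T_rec.ravel(), T_obs.ravel())[0, 1]
    return err.mean(), err.std(), corr


# Synthetic data: three underlying patterns plus noise (not real TEC)
rng = np.random.default_rng(1)
months = np.arange(1, 109)
time_terms = np.column_stack([
    10.0 * np.sin(2 * np.pi * months / 132.0),
    3.0 * np.cos(2 * np.pi * months / 12.0),
    2.0 * np.cos(2 * np.pi * months / 6.0),
])
patterns = rng.normal(size=(3, 50))
T = 20.0 + time_terms @ patterns + 0.1 * rng.normal(size=(108, 50))

T_rec, explained = truncated_reconstruction(T, n_modes=3)
bias, std, corr = error_statistics(T_rec, T)
cumulative_first_three = explained[:3].sum()
```

Because each column of the anomaly matrix has zero mean, the mean bias of a truncated reconstruction is zero up to rounding, so a near-zero bias mainly confirms that the time mean was handled consistently.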

Figure 2.

The left panel is the histogram of errors between reconstructed TEC and observed TEC, and the right panel is the histogram of errors between modeled TEC and observed TEC.

[20] EOF analysis has many advantages: the functions are determined directly by the data set itself, and the expansion converges quickly. One clear disadvantage, however, is its dependence on the data set. Adding new data points may change the model. Thus a large number of observations is needed to develop a reliable model.

[21] The first three EOF coefficients, which illustrate the long-term variations, are shown in Figures 3b–3d. For comparison, the 12 month running average of the monthly mean solar 10.7 cm radio noise, F10712, during 1996–2004 is plotted in Figure 3a. The mean TEC $\bar{T}$ and the spatial distributions of the first three EOFs Ei at 0700 UT (a daytime sample) and 1900 UT (a nighttime sample) are shown in Figure 4. The patterns of A1 and F10712 are quite consistent: the linear correlation coefficient between them reaches 0.904, and the variance contribution of the first component (A1 × E1) is 97.43%, which suggests that solar activity is the dominant factor controlling TEC variability. Moreover, A1 has an apparent semiannual variation characterized by two maxima in March/April and September/October. In contrast, A2 presents a clear annual variation. The amplitude of A1 is larger than that of A2. The same situation appears in the variation of the F2 layer peak electron concentration NmF2. Yu et al. [2004] derived the annual and semiannual components of daytime NmF2 from a global ionosonde network of 104 stations. Their results reveal a very strong semiannual component and a weak annual component in the Far East Asian region. The semiannual component is stronger than the annual component because, as a global average, the thermosphere is more mixed at solstice than at equinox [Rishbeth, 1998]. E2 is positive in the daytime (Figure 4e) and negative at nighttime (Figure 4f), and the value of A2 is larger in summer than in winter. The second mode (A2 × E2) represents the winter anomaly, in which the winter value is greater than the summer value in the daytime and less than the summer value at nighttime [Zou et al., 2000].

Figure 3.

The panels from top to bottom show the 12 month running average of monthly mean F107 and the first three EOF coefficients. Solid lines are EOF coefficients obtained from EOF analysis, and dots represent the modeled coefficients.

Figure 4.

The spatial feature of mean TEC and the first three empirical orthogonal functions are listed from top to bottom. The left panels represent those of 0700 UT, and the right panels denote those of 1900 UT.

4. Construction of the EOF–Based Model

[22] Because the first three EOF coefficients chiefly manifest solar cycle, annual, and semiannual variations, we separate each Ai (i = 1, 2, 3) into three parts,

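$$ A_i(m) = \hat{A}_i(m) + \varepsilon = \hat{A}_{i0}(m) + \hat{A}_{i1}(m) + \hat{A}_{i2}(m) + \varepsilon \tag{9} $$

$$ \begin{aligned} \hat{A}_{i0}(m) &= c_{i0} + c'_{i0}\,F107_{12}(m) \\ \hat{A}_{i1}(m) &= \left[ c_{i1} + c'_{i1}\,F107_{12}(m) \right] \cos\frac{2\pi m}{Y} + \left[ s_{i1} + s'_{i1}\,F107_{12}(m) \right] \sin\frac{2\pi m}{Y} \\ \hat{A}_{i2}(m) &= \left[ c_{i2} + c'_{i2}\,F107_{12}(m) \right] \cos\frac{4\pi m}{Y} + \left[ s_{i2} + s'_{i2}\,F107_{12}(m) \right] \sin\frac{4\pi m}{Y} \end{aligned} \tag{10} $$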

[23] In equation (9), ɛ is the error. The solar cycle variation $\hat{A}_{i0}(m)$ is expressed as a linear function of F10712, and the annual and semiannual variations $\hat{A}_{i1}(m)$ and $\hat{A}_{i2}(m)$ are expressed as modulated sinusoidal functions with periods of one year (Y = 12) and half a year (Y/2), respectively. The modulation of each sinusoidal function is also fitted as a linear function of F107. Thus, by linear regression, the coefficients in equation (10), $c_{i0}, c'_{i0}, \ldots, s_{i1}, s'_{i1}, \ldots$, are first estimated and then used to determine $\hat{A}_i$. $\hat{A}_i$ is then used to model the EOF coefficients and is plotted in Figure 3 (dots). Comparison with the original EOF coefficients (solid lines in the same figure) shows that the two agree well. The solar cycle, annual, and semiannual variations of each coefficient are plotted in Figure 5.
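The regression step can be sketched as an ordinary least squares fit. The sketch assumes that each modeled coefficient is the sum of a term linear in the solar index and annual and semiannual sinusoids whose amplitudes are themselves linear in that index, as the paragraph describes; the exact parameterization, the function names, and the synthetic F107 series are assumptions rather than the authors' implementation.

```python
import numpy as np


def design_matrix(month, f107, year=12.0):
    """Regressors: 1, F, and cos/sin harmonics (annual, semiannual), each
    with a constant amplitude and an amplitude proportional to F."""
    month = np.asarray(month, dtype=float)
    f107 = np.asarray(f107, dtype=float)
    cols = [np.ones_like(month), f107]
    for k in (1, 2):  # k = 1: period Y, k = 2: period Y/2
        phase = 2.0 * np.pi * k * month / year
        for harmonic in (np.cos(phase), np.sin(phase)):
            cols.extend([harmonic, f107 * harmonic])
    return np.column_stack(cols)


def fit_coefficient(month, f107, a_i):
    """Least squares estimate of the ten regression coefficients."""
    X = design_matrix(month, f107)
    beta, *_ = np.linalg.lstsq(X, a_i, rcond=None)
    return beta


# Synthetic check: recover known coefficients from a noise-free series
rng = np.random.default_rng(2)
months = np.arange(1, 109)
f107_12 = 120.0 + 50.0 * np.sin(2 * np.pi * months / 132.0)  # hypothetical index
beta_true = rng.normal(size=10)
a_series = design_matrix(months, f107_12) @ beta_true

beta_fit = fit_coefficient(months, f107_12, a_series)
a_model = design_matrix(months, f107_12) @ beta_fit
```

With the coefficients fitted once, the same design matrix evaluated for a new month and solar index gives the modeled coefficient, which is what makes the model usable for prediction.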

Figure 5.

The different components in the first three EOF coefficients. Circles represent the solar cycle variation, dashed lines represent the annual variation, and solid lines represent the semiannual variations.

[24] After substituting $\hat{A}_i$ for Ai in equation (8), we can model the TEC as follows:

$$ \hat{T}(m, x) = \bar{T}(x) + \sum_{i=1}^{3} \hat{A}_i(m)\,E_i(x) \tag{11} $$

[25] The model (called the EOF model hereafter) can be used to predict regional TEC maps for a given month, UT, and F10712. The approach described in this section may reduce the number of model coefficients and reproduce a reliable spatial distribution.

5. Validation

[26] Figure 2b shows the histogram of errors between the observed TEC and the modeled TEC. The standard deviation between the original and modeled TEC is 5.42 TECU. The linear correlation coefficient between the modeled TEC and the observed TEC reaches 0.9714. Figure 6 compares the original (upper panels) and modeled (lower panels) monthly median TEC maps at 0100, 0700, 1300, and 1900 UT in March 2002. The modeled TEC maps reproduce the observed RIMs well in general morphology and in primary phenomena such as the equatorial anomaly and the sunset enhancement. There are some differences between them, mainly over the ocean, because the observations used in constructing the model are sparse there.

Figure 6.

Comparison of original and modeled monthly median TEC maps on March 2002. The top panels are original TEC maps, and the lower panels are calculated by EOF model. The panels from left to right are at 0100, 0700, 1300, and 1900 UT.

[27] To verify the quality of the EOF model described above, we calculate monthly median TEC with the EOF model and with IRI. Sample values of the modeled TEC (solid line) and the IRI TEC (dashed line) in 2005 at Beijing (a midlatitude station, 115.9°E, 39.6°N) and Hainan (a low-latitude station, 109.8°E, 19.0°N) are shown in Figure 7. The observed TEC (circles) and the deviations of the model TEC and the IRI TEC from the observed TEC are also plotted in Figure 7 to illustrate the ability of the model in long-term prediction. The mean biases of the predicted TEC are −0.098 TECU and −0.38 TECU at Beijing and Hainan, respectively, and the standard deviations are 1.07 TECU and 3.78 TECU. In contrast, IRI tends to overestimate TEC [Liu et al., 1994] at both sites in 2005 and has a lower accuracy, with mean biases of 3.17 TECU and 11.42 TECU and standard deviations of 2.53 TECU and 5.07 TECU. The accuracy of our regional TEC model is thus higher than that of the global IRI model. This is to be expected, since our model is based on nine years of data from this region, whereas none of these data were used in the development of IRI. In fact, only very few ionosonde data from the Chinese subcontinent were used for IRI [Wu et al., 1996; Liu et al., 2004].

Figure 7.

(a and b) (top) Comparison of diurnal variation of the monthly median TEC (circles) and corresponding predictions calculated by EOF model (solid lines) and IRI (dashed line) at Beijing station (115.9°E, 39.6°N) and Hainan station (109.8°E, 19.0°N) in 2005. (bottom) Errors between observed TECs and predicted TEC.

6. Further Application

[28] There is a growing need to nowcast and forecast the ionosphere accurately in order to correct the ionospheric error in satellite navigation and orbit determination and to avoid the detrimental effects of ionospheric weather on military and civilian systems. Global and regional ionosphere maps are among the most important nowcast products. However, an ionosphere map produced by interpolation alone may not be accurate when the number of observation points is inadequate. Some researchers resolve this problem by assimilating observations into an ionospheric model [Pi et al., 2003]. The assimilation technique incorporates the available data, the associated data error covariances, a reasonable background specification, and the expected background error covariance into a coherent specification on a global grid. However, a background TEC calculated from an empirical electron density model such as IRI often differs distinctly from the observations. In this section we give an example of the application of our EOF model.

[29] Here we use the 3DVAR assimilation technique to produce a regional ionosphere map over the Chinese sector. 3DVAR is a statistical minimization method that seeks to minimize a cost function consisting of the data perturbations weighted by the data error covariance and the deviations of the model from the background weighted by the a priori background error covariance. Details of the 3DVAR calculation are given by Bust et al. [2004]. For simplicity, the observation and background errors are at present assumed to be unbiased and proportional to the square of the observation and background, respectively. The observation errors are taken to be independent, and the background errors are taken to be Gaussian correlated with correlation lengths of 5 degrees in latitude and 10 degrees in longitude.
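For a linear observation operator, the minimum of the 3DVAR cost function has a closed form, which the sketch below uses on a coarse toy grid. It follows the error assumptions stated above (independent observation errors, Gaussian-correlated background errors with 5° and 10° correlation lengths, error variances proportional to the squared values). The proportionality constants, the exact Gaussian form, the grid, the station values, and all function names are hypothetical, and this is not the implementation of Bust et al. [2004].

```python
import numpy as np


def gaussian_background_covariance(lat, lon, background,
                                   l_lat=5.0, l_lon=10.0, rel_var=0.1):
    """Background covariance B with Gaussian spatial correlation.

    The variance at each node is rel_var * background**2 (assumed constant).
    """
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    corr = np.exp(-0.5 * (dlat / l_lat) ** 2 - 0.5 * (dlon / l_lon) ** 2)
    sigma = np.sqrt(rel_var) * background
    return corr * np.outer(sigma, sigma)


def analysis_3dvar(x_b, y, H, B, R):
    """Minimizer of (x-x_b)^T B^-1 (x-x_b) + (y-Hx)^T R^-1 (y-Hx) for linear H."""
    innovation = y - H @ x_b
    S = H @ B @ H.T + R
    return x_b + B @ H.T @ np.linalg.solve(S, innovation)


# Toy grid over 5-55N, 70-140E at 5 x 10 degree spacing (coarser than the paper)
lat_axis = np.arange(5.0, 56.0, 5.0)
lon_axis = np.arange(70.0, 141.0, 10.0)
lat, lon = [g.ravel() for g in np.meshgrid(lat_axis, lon_axis, indexing="ij")]
x_b = np.full(lat.size, 20.0)  # hypothetical uniform background TEC (TECU)

# Four hypothetical stations on the 120E meridian, one every 10 degrees
obs_lats = [15.0, 25.0, 35.0, 45.0]
obs_idx = [int(np.flatnonzero((lat == la) & (lon == 120.0))[0]) for la in obs_lats]
y = np.array([32.0, 28.0, 24.0, 21.0])  # hypothetical observed TEC (TECU)
H = np.zeros((len(y), lat.size))
H[np.arange(len(y)), obs_idx] = 1.0      # each observation samples one node

B = gaussian_background_covariance(lat, lon, x_b)
R = np.diag(0.01 * y ** 2)               # independent observation errors
x_a = analysis_3dvar(x_b, y, H, B, R)
```

The correlation lengths in B control how far each observation spreads its correction, so grid nodes far from any station stay close to the background; this is why the quality of the background matters most where data are sparse.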

[30] We take the GPS-TEC observations at 0700 UT on the 40th day, 2006, over the Chinese sector as an example. Only the observations of 4 selected stations are assimilated. These 4 stations are located at intervals of about 10 degrees in latitude near 120°E. Figure 8a shows the GPS-TEC data coverage over the Chinese sector for the 4 selected stations. For comparison, all 25 observation stations of this region are shown in Figure 8b. Figure 9 shows the TEC maps obtained from the empirical models and from assimilation. The upper panels show the TEC maps calculated by the IRI model and by our EOF model. Compared with the observations (Figure 8b), the IRI model obviously overestimates the TEC. The lower panels show the TEC maps obtained by assimilating the observations of the 4 stations into backgrounds calculated separately by the IRI model and by our EOF model. As expected, the TEC map that uses the background calculated by our EOF model is closer to the observations.

Figure 8.

GPS-TEC data coverage plots over Chinese sector for 4 and 25 sites, respectively. The colors of the dots represent the TEC values of vertical grids.

Figure 9.

TEC maps over Chinese sector calculated by (a) IRI model, (b) EOF model, (c) 3DVAR assimilation using IRI model to calculate the background, and (d) 3DVAR assimilation using EOF model to calculate the background, respectively.

7. Summary

[31] A climatology model of TEC has been developed on the basis of EOF analysis using data from GPS tracking networks over China during 1996–2004. (Those interested in obtaining the EOF model should contact T. Mao or W. Wan.) The Kriging method applied to regional GPS data over China can produce more detailed maps of the regional ionosphere than the global GIMs. In addition, EOF analysis extracts the inherent characteristics of the data set and converges quickly, both of which are useful in model construction. The first three EOF modes alone represent more than 98% of the variance of the GPS TEC data set. Good agreement has been found between the observed monthly median TEC and the TEC predicted by the EOF model. Moreover, the EOF model is a better background than IRI for nowcasting with 3DVAR assimilation over China.


[32] The codes of the IRI models are provided by the World Data Center-A. The F107 index is downloaded from the SPIDR web site. The JPL GIM is downloaded from the web site. This research is supported by the KIP Pilot Project (kzcx3-sw-144) of CAS, the National Science Foundation of China (40636032), and the National Important Basic Research Project (2006CB806306).