An empirical orthogonal function (EOF) analysis, combined with regression analysis, is used to construct an empirical model of the ionospheric propagation factor M(3000)F2. First, to demonstrate the modeling technique, which is based on EOF expansion and regression analysis of the EOF coefficients, a single-station model is constructed using monthly median hourly values of M(3000)F2 observed at Wuhan Ionospheric Observatory (geographic 114.4°E, 30.6°N; 45.2° dip) from 1957 to 1991. The resulting climatological model incorporates the diurnal, seasonal and solar-cycle variations of M(3000)F2, and the modeled values agree well with the observations. Then, an attempt is made to model global M(3000)F2 using data from stations distributed around the world. Our preliminary results suggest that this technique, EOF expansion combined with regression analysis of the EOF coefficients, is very promising for global modeling and merits further investigation.
 Empirical ionospheric models are important for both ionospheric research and applications. A large number of station-specific, regional and global models based on ionospheric data have been developed. An excellent review of many available empirical models is given by Bilitza [2002; and references therein]. Among these models, the International Reference Ionosphere (IRI) [Bilitza, 1990, 2001], which is developed and periodically updated by the Committee on Space Research (COSPAR) and the International Union of Radio Science (URSI), is the most widely used.
 The ionospheric F2 layer, which is primarily responsible for reflecting radio waves in high-frequency communication and broadcasting, has been studied for a long time. The M(3000)F2 parameter is of great significance in investigations of the F2 layer. It is defined as

$$M(3000)F2 = \frac{MUF(3000)}{foF2} \qquad (1)$$
where foF2 is the F2-layer critical frequency and MUF(3000) is the maximum usable frequency at which a radio wave can be reflected and received at a horizontal distance of 3000 km. M(3000)F2 is important not only in practical applications, such as frequency planning for radio communication, but also in ionospheric modeling. For instance, empirical ionospheric models such as IRI [Bilitza, 1990, 2001] and NeQuick [Radicella and Zhang, 1995; Radicella and Leitinger, 2001; Leitinger et al., 2005] usually use the F2, F1 and E layer peaks as anchor points, taking as inputs the parameters foF2, hmF2, foF1, hmF1, foE and hmE (the critical frequencies and peak heights of the F2, F1 and E layers, respectively). The F2-layer peak height hmF2 is usually calculated from M(3000)F2 on the basis of their close relationship:

$$hmF2 = \frac{1490}{M(3000)F2 + \Delta F} - 176 \qquad (2)$$
where ΔF is a correction term that accounts for the delay effect caused by ionization in the E layer [Bradley and Dudeney, 1973; Bilitza et al., 1979]. The IRI model uses the CCIR coefficients to predict M(3000)F2 from the 12-month running average sunspot number Rz12, and hmF2 is then calculated from this modeled M(3000)F2 using formula (2). However, several recent studies [Adeniyi et al., 2003; Obrou et al., 2003; Zhang et al., 2004, 2007] found that in equatorial and low-latitude regions the hmF2 values calculated from the CCIR M(3000)F2 model differ markedly from the observed hmF2, which is derived from manually edited ionogram traces using ionogram inversion programs. These studies also showed that the discrepancy stems from the CCIR M(3000)F2 model, because the IRI hmF2 values agree well with the observations when measured M(3000)F2 values are used as input. The CCIR M(3000)F2 model reproduces the large-scale structures reasonably well but fails to reproduce the small-scale features found in the observational data. This is particularly true for the equatorial and low-latitude regions of Africa and southern China [Adeniyi et al., 2003; Obrou et al., 2003; Zhang et al., 2004, 2007]. One possible reason for this limitation is that data from the Chinese mainland were not used in producing the CCIR coefficients; another may be the limited number of terms used in developing the model [Bilitza, 2002]. There is therefore a need to update the existing M(3000)F2 model. Recently, Oyeyemi et al.  proposed a new technique based on neural networks to model M(3000)F2. In this paper, we use another technique, based on empirical orthogonal function (EOF) expansion, to model M(3000)F2. The EOF analysis method was invented by Pearson . 
Since it was introduced into meteorology by Lorenz , it has been widely and successfully applied to prediction problems in meteorology. In ionospheric research, Dvinskikh  first introduced the EOF expansion into the empirical modeling of ionospheric characteristics such as the critical frequencies of the E and F2 layers and the peak height and semi-thickness of the F2 layer. His pioneering work was followed by others [e.g., Singer and Taubenheim, 1990; Bossy and Rawer, 1990; Singer and Dvinskikh, 1991; Dvinskikh and Naidenova, 1991]. Compared with other expansion methods, the EOF expansion offers high compactness and considerably reduced computing time. Many studies have shown that EOF analysis is a powerful method for representing ionospheric data and for empirical modeling [Dvinskikh, 1988; Singer and Dvinskikh, 1991; Dvinskikh and Naidenova, 1991; Daniell et al., 1995; Howe et al., 1998; Matsuo et al., 2002, 2005; Marsh et al., 2004; Materassi and Mitchell, 2005; Zhao et al., 2005; Zapfe et al., 2006]. In this paper, we first focus on single-station modeling using M(3000)F2 data from Wuhan (geographic 114.4°E, 30.6°N; 45.2° dip) to demonstrate our modeling technique. Then, an experiment on modeling global data is carried out to show the possibility of developing regional and global models of M(3000)F2 based on EOF expansion.
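To illustrate why errors in modeled M(3000)F2 propagate into hmF2, the sketch below evaluates formula (2) in its commonly used form hmF2 = 1490/(M(3000)F2 + ΔF) − 176, with hmF2 in km. The helper name `hmf2_km` is hypothetical, and the correction ΔF is simply passed in rather than computed, because its exact parameterization (a function of foF2/foE) differs between published versions.

```python
import numpy as np

def hmf2_km(m3000f2, delta_f=0.0):
    """Hypothetical helper: F2 peak height (km) from M(3000)F2.

    Assumes the relation hmF2 = 1490 / (M(3000)F2 + delta_f) - 176,
    with the E-layer correction delta_f supplied by the caller.
    """
    m = np.asarray(m3000f2, dtype=float)
    return 1490.0 / (m + delta_f) - 176.0

# Sensitivity check with delta_f = 0: lowering M(3000)F2 from 3.0 to 2.9 raises hmF2.
base = hmf2_km(3.0)
lowered = hmf2_km(2.9)
print(round(float(base), 1), round(float(lowered - base), 1))  # 320.7 17.1
```

Under these assumptions, an error of only 0.1 in M(3000)F2 near 3.0 shifts hmF2 by roughly 17 km, which is why discrepancies in the CCIR M(3000)F2 model matter for IRI hmF2.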
2. Data Source for Single Station Modeling
 Since the International Geophysical Year (IGY, 1957), ionosonde measurements have been made routinely at Wuhan Ionospheric Observatory (geographic 114.4°E, 30.6°N; 45.2° dip), which is located on the northern flank of the northern crest of the equatorial anomaly in East Asia. This makes it an important ionospheric sounding station for studying ionospheric dynamics in the equatorial anomaly region. Moreover, the station's long data record makes it extremely useful for empirical modeling of the ionosphere in this region, and it has yielded fruitful modeling results for various ionospheric parameters [Chen et al., 2002, 2004; Liu et al., 2004; Mao et al., 2005; Yue et al., 2006]. The data used in the present study are M(3000)F2 values scaled at one-hour intervals from ionograms recorded from 1957 to 1991. Because we are modeling the climatological variations of the ionosphere, the monthly median hourly values of M(3000)F2 are used.
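The monthly median hourly values can be formed from a month of daily hourly readings. The following is a minimal sketch assuming the readings are held in a (days × hours) NumPy array with NaN marking missing soundings; the function name and the `min_count` threshold (how many valid days an hour needs) are hypothetical, since the text does not state such a rule.

```python
import numpy as np

def monthly_median_hourly(daily_hourly, min_count=1):
    """Monthly median for each hour from a (days, hours) array; NaN marks missing data.

    Hours with fewer than `min_count` valid days are returned as NaN
    (min_count is a hypothetical threshold, not taken from the text).
    """
    values = np.asarray(daily_hourly, dtype=float)
    n_valid = np.sum(~np.isnan(values), axis=0)   # valid days per hour
    medians = np.full(values.shape[1], np.nan)
    enough = n_valid >= min_count
    if enough.any():
        medians[enough] = np.nanmedian(values[:, enough], axis=0)
    return medians

# Toy month with three days and two hours (NaN = no sounding).
toy = np.array([[3.0, np.nan], [3.2, 3.1], [2.9, np.nan]])
print(monthly_median_hourly(toy, min_count=2))
```

Using the median rather than the mean keeps the climatological value robust against occasional disturbed days.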
3. Description of the Modeling Technique
 First, we use the empirical orthogonal function (EOF) expansion method to decompose the monthly median hourly values of M(3000)F2 into EOF base functions Ek, which represent the diurnal variation, multiplied by time-varying EOF coefficients Ak:

$$M(3000)F2(m,h) = \sum_{k=1}^{n} A_k(m)\,E_k(h) \qquad (3)$$
where M(3000)F2(m, h) denotes the monthly median hourly values of the observed data, arranged as a p × n array whose rows correspond to months (m = 1, 2, …, p, where p is the total number of months of data used) and whose columns correspond to local time LT (h = 1, 2, …, n); Ek(h) is the k-th base function, representing the diurnal variation; and Ak(m) is the amplitude of Ek(h), reflecting the longer-term (seasonal and solar-cycle) variations of M(3000)F2. For details of the EOF expansion technique, see Dvinskikh , Storch and Zwiers  and Xu and Kamide . Because the monthly median M(3000)F2 values are highly correlated, the EOF expansion converges quickly, which makes it very convenient for constructing an empirical model of the original data set. In our case, the original data form a 370 × 24 array. In general, all 24 base functions and their associated EOF coefficients are needed to reconstruct every variation in the M(3000)F2 data set exactly. However, owing to the quick convergence of the EOF expansion, only a few EOF components are needed to represent most of the variance of the original data, which greatly reduces the number of components used in model construction. In our case, the first three EOF components explain 99.9% of the variance of the whole data set, so we use only these three components to reconstruct M(3000)F2. The correlation coefficient between the M(3000)F2 values reconstructed with the first three EOF components and the original data is as high as 0.96.
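The decomposition in equation (3) can be sketched with a singular value decomposition (SVD), one standard way of computing EOFs. This is an illustration under stated assumptions, not the authors' code: the mean is not removed before decomposition (consistent with E1 resembling the mean diurnal curve), "explained variance" is taken as each component's share of the total sum of squares, and the input is synthetic data standing in for the 370 × 24 Wuhan array. The function names are hypothetical.

```python
import numpy as np

def eof_decompose(data):
    """Decompose a (months x hours) array as data[m, h] = sum_k A[m, k] * E[h, k].

    The mean is NOT removed first, so E[:, 0] tends to follow the mean diurnal curve.
    """
    u, s, vt = np.linalg.svd(data, full_matrices=False)
    bases = vt.T    # column k holds E_k(h); the columns are orthonormal
    coeffs = u * s  # column k holds A_k(m)
    # SVD signs are arbitrary; flip each (E_k, A_k) pair so that E_k has a positive sum.
    signs = np.sign(bases.sum(axis=0))
    signs[signs == 0] = 1.0
    bases, coeffs = bases * signs, coeffs * signs
    explained = s**2 / np.sum(s**2)  # each component's share of the total sum of squares
    return bases, coeffs, explained

def eof_reconstruct(bases, coeffs, n_keep):
    """Rebuild the array from the first n_keep EOF components only."""
    return coeffs[:, :n_keep] @ bases[:, :n_keep].T

# Synthetic stand-in for the 370 x 24 Wuhan array (NOT real observations).
rng = np.random.default_rng(0)
months = np.arange(370)
hours = np.arange(24)
diurnal = 3.0 + 0.3 * np.cos(2 * np.pi * (hours - 14) / 24)
seasonal = 1.0 + 0.05 * np.sin(2 * np.pi * months / 12)
data = np.outer(seasonal, diurnal) + 0.01 * rng.standard_normal((370, 24))

bases, coeffs, explained = eof_decompose(data)
approx = eof_reconstruct(bases, coeffs, 3)
print("share of first three EOFs:", explained[:3].sum())
print("corr(E1, mean diurnal curve):", np.corrcoef(bases[:, 0], data.mean(axis=0))[0, 1])
```

Truncating at `n_keep = 3` mirrors the three-component reconstruction described above; keeping all 24 components reproduces the input exactly.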
Figure 1 displays the first three EOF base functions (solid lines). The top panel also shows the average diurnal variation of M(3000)F2 (dashed line), which represents an average ionospheric condition. E1 is clearly very similar to the average diurnal variation of M(3000)F2 in both amplitude and pattern; the correlation coefficient between them is 0.9996. This implies that the first base function E1 represents the mean diurnal variation of M(3000)F2.
Figure 2 shows the variations of the first three EOF coefficients A1, A2 and A3 (solid lines with points) obtained by decomposing the original observational data using equation (3). Also shown (solid lines without points) are the modeled results, which are described later in this section. For comparison, the top panel also shows the variation of the monthly median solar activity index F107 (the 10.7 cm solar radio flux). As Figure 2 shows, A1 is much larger than A2 and A3, so the first EOF component E1 × A1 accounts for most of the M(3000)F2 variance. A1 is strongly negatively correlated with F107 (correlation coefficient −0.826) and also contains an obvious annual variation component. Figure 2c shows that A2 also contains mainly an annual variation component, but with a much smaller amplitude than that of A1, and that it also depends somewhat on the solar activity level. Figure 2d shows that A3 contains both annual and semiannual variation components, whose amplitudes also change with the solar activity level represented by F107.
 Based on these features, we model the first three EOF coefficients Ak (k = 1, 2, 3) with the following Fourier series:
where ɛ represents the model residual error and
where Y = 12. In equations (5)–(7), Bk1(m), Bk2(m) and Bk3(m) represent the non-seasonal, annual and semiannual variation components of the EOF coefficients Ak, respectively. Bk1(m) is expressed as a linear function of F107. Bk2(m) and Bk3(m) are expressed as modulated sinusoidal functions with periods of one year (Y = 12 months) and half a year (Y/2), respectively, whose amplitudes are also linear functions of F107, representing their dependence on the solar activity level. The coefficients ck1, dk1, …, sk3, tk3 in equations (5)–(7) are estimated by linear regression. Our single-station model is based on these coefficients, which are used to calculate the EOF coefficients Ak(m) according to equations (4)–(7). Hereafter, the Ak(m) thus determined are referred to as the modeled EOF coefficients, and the M(3000)F2 values calculated from them with equation (3) are referred to as the modeled M(3000)F2. The modeled EOF coefficients A1–A3 are shown as solid lines without points in Figure 2; they reproduce the fitted EOF coefficients (solid lines with points) very well.
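The regression step can be sketched as an ordinary least-squares fit. The exact forms of equations (4)–(7) may differ from what is used here: the sketch assumes Bk1 = c + d·F107 and annual and semiannual terms whose cosine and sine amplitudes are each linear in F107, which matches the description above. The coefficient names c1, d1, …, s3, t3 and the function names are hypothetical stand-ins for ck1, dk1, …, sk3, tk3, and `numpy.linalg.lstsq` performs the regression.

```python
import numpy as np

Y = 12  # months per year, as in the text

def design_matrix(month_index, f107):
    """Regressors for an ASSUMED form of equations (4)-(7):

    A_k(m) = B_k1 + B_k2 + B_k3 + eps, with
      B_k1 = c1 + d1*F
      B_k2 = (c2 + d2*F)*cos(2*pi*m/Y) + (s2 + t2*F)*sin(2*pi*m/Y)   (annual)
      B_k3 = (c3 + d3*F)*cos(4*pi*m/Y) + (s3 + t3*F)*sin(4*pi*m/Y)   (semiannual)
    where F is the monthly F107 value. Column order: c1, d1, c2, d2, s2, t2, c3, d3, s3, t3.
    """
    m = np.asarray(month_index, dtype=float)
    f = np.asarray(f107, dtype=float)
    cols = [np.ones_like(m), f]
    for harmonic in (1, 2):  # 1 -> period Y (annual), 2 -> period Y/2 (semiannual)
        w = 2 * np.pi * harmonic * m / Y
        cols += [np.cos(w), f * np.cos(w), np.sin(w), f * np.sin(w)]
    return np.column_stack(cols)

def fit_coefficients(month_index, f107, a_k):
    """Least-squares estimate of the 10 coefficients for one EOF coefficient series A_k."""
    coef, *_ = np.linalg.lstsq(design_matrix(month_index, f107), a_k, rcond=None)
    return coef

def model_coefficient(coef, month_index, f107):
    """Modeled A_k(m) from fitted coefficients (the residual eps is dropped)."""
    return design_matrix(month_index, f107) @ coef

# Synthetic check (NOT real data): 11 years of months with an idealized F107 cycle.
months = np.arange(1, 133)
f107 = 140.0 + 70.0 * np.sin(2 * np.pi * months / 132)
true_coef = np.array([5.0, -0.01, 0.2, 0.001, -0.1, 0.0005, 0.05, -0.0002, 0.03, 0.0001])
a_k = design_matrix(months, f107) @ true_coef
fitted = fit_coefficients(months, f107, a_k)
print(np.round(fitted, 6))
```

Because every term is linear in the unknown coefficients, one linear least-squares solve per EOF coefficient series (k = 1, 2, 3) is enough.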
Figure 3 shows the variations of Bk1(m), Bk2(m) and Bk3(m), which represent the non-seasonal, annual and semiannual variation components of the fitted EOF coefficients Ak(m) (k = 1, 2, 3), respectively. The solar-cycle variation comes mainly from the first EOF component. The annual variation comes mainly from the first and second EOF components, with a small contribution from the third EOF component during high solar activity years. The semiannual variation comes mainly from the second and third EOF components.
Figure 4 shows the variations of the observed (top panel), reconstructed (middle panel) and modeled (bottom panel) M(3000)F2 over a solar cycle. The modeled M(3000)F2 values reproduce the diurnal, seasonal and solar-cycle variations of the observed and reconstructed data quite well. The correlation coefficient between the modeled and observed data is 0.918.
4. Model Validation Study
 To validate the model, we chose the observational data for a low solar activity year (1976) and a high solar activity year (1981) as test samples. For each test year, we repeated the model construction procedure described in section 3 with that year's data excluded, and then calculated the M(3000)F2 values for the test year with the newly constructed model. Thus, when the low solar activity year 1976 was used for evaluation, its observational data were not included in the database used to generate the model coefficients; the same procedure was applied to the high solar activity year 1981. Figures 5 and 6 compare the diurnal variations of the monthly median M(3000)F2 observed over Wuhan for the 12 months of 1976 and 1981 with the predictions of our single-station model. In general, the model predictions agree well with the observations. To evaluate the overall performance of the model, we calculated the mean difference, the root mean squared error (RMSE), and their relative percentage values between the model predictions and the observations using the following formulae:
where N is the total number of data points, and M(3000)F2obs and M(3000)F2pred are the observed and model-predicted values, respectively. The calculated mean difference and RMSE are −0.019 and 0.091 for the low solar activity year 1976, and 0.006 and 0.106 for the high solar activity year 1981. Given that the observational uncertainty of the M(3000)F2 parameter is of the order of 0.05, this indicates good agreement between the model predictions and the observations. The corresponding relative percentage values of the mean difference and RMSE are −0.51% and 2.96% for 1976, and 0.39% and 3.90% for 1981. This means that, on average, the model prediction uncertainty is about 3–4%.
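The four validation statistics can be computed as in the sketch below. Because the formulae are not reproduced in this text, it assumes that differences are taken as model minus observation and that the relative versions divide each difference by the corresponding observed value; `error_statistics` is a hypothetical name.

```python
import numpy as np

def error_statistics(observed, predicted):
    """Mean difference, RMSE and their relative (%) versions.

    ASSUMED conventions: differences are model minus observation, and the relative
    versions divide each difference by the corresponding observed value.
    """
    obs = np.asarray(observed, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    diff = pred - obs
    rel = diff / obs
    return {
        "mean_diff": float(diff.mean()),
        "rmse": float(np.sqrt(np.mean(diff**2))),
        "mean_diff_pct": float(100.0 * rel.mean()),
        "rmse_pct": float(100.0 * np.sqrt(np.mean(rel**2))),
    }

# Toy example with two hourly values (NOT the Wuhan results).
stats = error_statistics([3.0, 3.2], [3.1, 3.0])
print(stats)
```

With M(3000)F2 values near 3, an RMSE of about 0.1 corresponds to a relative RMSE of about 3%, consistent with the figures quoted above.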
5. Modeling Experiment With Global Data
 As a first attempt to develop a global model of M(3000)F2 based on the EOF analysis method, we carried out a preliminary experiment with global M(3000)F2 data. The data used are monthly median hourly values calculated from the daily hourly values of M(3000)F2 observed by ionosondes/digisondes around the world. Only stations with M(3000)F2 data available for more than 11 years were chosen; Figure 7 shows the distribution of the 73 chosen stations. Because the periods of available data differ among stations, we chose 1975–1985 as the period for global modeling, since data are available for most of the chosen stations during this period. To fill in missing data for the chosen stations during 1975–1985, we preprocessed the data as follows. First, we used the technique described in section 3 to construct a single-station model for each chosen station from all of its available M(3000)F2 data (which, as required above, span more than 11 years). Then, any missing data for a station during 1975–1985 were filled with its single-station model values.
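The gap-filling step amounts to substituting model values wherever an observation is missing. A minimal sketch, assuming the observations and the station's single-station model values are aligned on the same (month, hour) layout with NaN marking missing observations; `fill_gaps` is a hypothetical helper name.

```python
import numpy as np

def fill_gaps(observed, station_model):
    """Replace missing (NaN) monthly median hourly values with single-station model values.

    Returns the filled array and the number of values that were filled.
    """
    obs = np.asarray(observed, dtype=float)
    model = np.asarray(station_model, dtype=float)
    if obs.shape != model.shape:
        raise ValueError("observed and model arrays must have the same shape")
    missing = np.isnan(obs)
    return np.where(missing, model, obs), int(missing.sum())

# Toy example: one station, three hourly values, one missing.
filled, n_filled = fill_gaps([3.0, np.nan, 3.2], [2.9, 3.05, 3.1])
print(filled, n_filled)
```

Observed values are never overwritten, so the filled series stays as close to the measurements as the data coverage allows.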
 After this preprocessing, data on a 5° × 5° grid covering 85°S–85°N in latitude and 0°–360° in longitude were obtained by kriging interpolation. The gridded data were then expanded using the EOF method as follows:

$$M(3000)F2(m, \mathrm{lat}, \mathrm{lon}, \mathrm{UT}) = \sum_{k} A_k(m)\,E_k(\mathrm{lat}, \mathrm{lon}, \mathrm{UT})$$
where Ek is the k-th EOF base function, representing the variation of M(3000)F2 with geographic latitude, longitude and universal time, and Ak is the associated coefficient, representing the seasonal and solar-cycle variations. Our results show that the first three EOF components explain 99.9% of the variance of the whole data set, so only these three components are used to reconstruct M(3000)F2. As in the single-station technique, the EOF coefficients Ak obtained here are modeled using equations (4)–(7) as described in section 3, and the best-fit model coefficients ck1, dk1, …, sk3, tk3 are estimated. The modeled M(3000)F2 values are then calculated from the modeled Ak, which are obtained from these coefficients via equations (4)–(7). Figure 8 shows scatterplots of the M(3000)F2 values calculated with the constructed global model versus the observed values, for the low solar activity year 1976 (upper panel) and the high solar activity year 1981 (lower panel). The modeled and observed M(3000)F2 values are highly linearly correlated, with correlation coefficients of 0.928 for 1976 and 0.951 for 1981. This implies that the model reproduces the observations reasonably well. This preliminary result suggests that the modeling technique based on EOF expansion, combined with regression analysis of the EOF coefficients as described in this paper, is promising and worth investigating further, which will be the subject of our future work.
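The global expansion differs from the single-station case mainly in bookkeeping: each month's whole latitude-longitude-UT field is flattened into one row before the decomposition, and the base functions are reshaped back afterwards. The sketch below shows this with an SVD on a small synthetic grid; it assumes the kriging step has already produced a complete grid (kriging itself is not shown), and the function names are hypothetical.

```python
import numpy as np

def global_eof(grid, n_keep=3):
    """EOF-decompose a (months, n_lat, n_lon, n_ut) array of gridded M(3000)F2.

    Each month's lat-lon-UT field is flattened into one row, so the bases
    E_k(lat, lon, UT) carry the spatial/UT structure and A_k(m) the monthly evolution.
    """
    n_months = grid.shape[0]
    field_shape = grid.shape[1:]
    flat = grid.reshape(n_months, -1)
    u, s, vt = np.linalg.svd(flat, full_matrices=False)
    coeffs = u[:, :n_keep] * s[:n_keep]                   # A_k(m), shape (months, n_keep)
    bases = vt[:n_keep].reshape((n_keep,) + field_shape)  # E_k(lat, lon, UT)
    explained = s**2 / np.sum(s**2)                       # share of total sum of squares
    return bases, coeffs, explained

def global_reconstruct(bases, coeffs):
    """Sum_k A_k(m) * E_k(lat, lon, UT) -> array of shape (months, n_lat, n_lon, n_ut)."""
    return np.tensordot(coeffs, bases, axes=(1, 0))

# Small synthetic grid (NOT real data). A 5 deg x 5 deg grid over 85S-85N would have,
# e.g., 35 latitudes x 72 longitudes x 24 UT hours for each of the 132 months of 1975-1985.
lat = np.deg2rad(np.linspace(-60, 60, 7))
lon = np.deg2rad(np.arange(0, 360, 30))
ut = np.arange(24)
field = (3.0 + 0.2 * np.cos(lat)[:, None, None]
         + 0.1 * np.cos(2 * np.pi * ut / 24)[None, None, :]
         + 0.05 * np.sin(lon)[None, :, None])
months = np.arange(24)
grid = (1.0 + 0.05 * np.sin(2 * np.pi * months / 12))[:, None, None, None] * field

bases, coeffs, explained = global_eof(grid, n_keep=3)
approx = global_reconstruct(bases, coeffs)
print(bases.shape, coeffs.shape, explained[0])
```

The resulting A_k(m) series can then be passed to the same regression on F107 and annual/semiannual harmonics used for the single-station model.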
6. Summary and Conclusion
 In the present study, the EOF expansion method, combined with regression analysis, was used to develop an empirical model of M(3000)F2. First, a single-station empirical model of M(3000)F2 was developed with data from Wuhan, China, to demonstrate the modeling technique. The model incorporates the diurnal, seasonal and solar-cycle variations of M(3000)F2: the diurnal variations are represented by the EOF base functions, and the longer-term (seasonal and solar-cycle) variations by the associated EOF coefficients. The model predictions agree well with the observations. We have shown that, with the technique described in this paper, only a few low-order EOF components are needed to reconstruct the original observational data set with high accuracy. Then, a modeling experiment with global M(3000)F2 data was carried out using EOF expansion combined with regression analysis of the EOF coefficients. The preliminary results suggest that this technique is promising and worth investigating further for developing regional or global models of M(3000)F2, which will be the subject of our future work.
 This research was supported by the National Natural Science Foundation of China (40774092, 40636032), the National Important Basic Research Project (2006CB806306) and the China Grant (GYHY(QX)2007-6-13). The M(3000)F2 data for global modeling and the F107 index data were downloaded from the SPIDR web site http://spidr.ngdc.noaa.gov/.