Empirical Models of foF2 and hmF2 Reconstituted by Global Ionosonde and Reanalysis Data and COSMIC Observations

The F2-peak plasma frequency (foF2) and the height of the F2 peak (hmF2) are two of the most important parameters for any ionospheric model, as well as for radio propagation studies and applications. In this study, we have developed empirical models to capture the most significant variations of foF2 and hmF2. The derived empirical models (referred to as the USTC models within this study) are specified, respectively, through global ionosonde and reanalysis data based on the International Reference Ionosphere (IRI) Consultative Committee on International Radio (CCIR) method, and through Constellation Observing System for Meteorology, Ionosphere, and Climate (COSMIC) observations based on empirical orthogonal function analysis. The USTC models are validated against the IRI CCIR model predictions. The comparison shows that the empirical foF2 model captures the foF2 variations better than the IRI CCIR model and overcomes the underestimation of the IRI CCIR model at low latitudes. Although the empirical hmF2 model corrects the IRI CCIR overestimation at middle latitudes, visible differences between the model predictions and ionosonde observations remain at low latitudes, which could be attributed to the significant difference between COSMIC and ionosonde hmF2 measurements.


Introduction
The terrestrial ionosphere is the plasma layer of electrons and ions that surrounds the Earth in the altitude range of approximately 60-1,000 km. The F2-layer is the high-density plasma produced above 200 km altitude by solar extreme ultraviolet radiation. This layer provides a conductive medium for radio propagation, and it has therefore traditionally been used to reflect radio and radar signals.
The two most important parameters of the F2-layer are the F2-peak plasma frequency (foF2) and the height of the F2 peak (hmF2): foF2 is the highest radio-wave frequency that the F2-layer reflects at vertical incidence, and hmF2 marks the altitude of the highest electron density in the F2-layer. Because of this importance, various empirical models of foF2 and hmF2 have been developed for ionospheric and radio propagation studies and applications (e.g., Altadill et al., 2013; Araujo-Pradere et al., 2002; Bilitza et al., 1979; Fuller-Rowell et al., 2000; Jones & Gallet, 1962, 1965; Rush et al., 1983, 1984; Shubin et al., 2013; Zhang et al., 2009). Only a few relevant empirical models are listed here, and no attempt is made to provide a complete compilation of references.
The International Reference Ionosphere (IRI) model is the most widely used ionospheric model and provides several options for the foF2 and hmF2 parameters (Bilitza et al., 2022). It offers two alternatives for generating foF2, namely the International Radio Consultative Committee (CCIR) model and the International Union of Radio Science (URSI) model. They use the same mathematical functions but different sets of ionosonde data and different methods of filling data gaps over the oceans and other data-sparse regions. The IRI CCIR model uses "screen points": it assumes the same local time variation of foF2 along lines of constant magnetic dip latitude and extrapolates foF2 from a real observation station to virtual stations that cover ocean areas and data-sparse regions (Jones & Gallet, 1965). The IRI URSI model instead applies a theoretical model to determine the "screen points" in those regions (Rush et al., 1983, 1984). For hmF2, the IRI model provides three options, CCIR (BSE-1979), AMTB2013, and SHU-2015, which were developed respectively with ionosonde data of the propagation factor M(3000)F2 (Bilitza et al., 1979), with ionosonde hmF2 data (Altadill et al., 2013), and with ionosonde hmF2 data together with Constellation Observing System for Meteorology, Ionosphere, and Climate (COSMIC) radio occultation (RO) data (Shubin, 2015). In addition, models that are not included in the IRI, based on empirical orthogonal function (EOF) analysis and machine learning among other methods, also attempt to capture the foF2 and hmF2 parameters (e.g., Lin et al., 2014; McKinnell & Poole, 2004; Sai Gowtam & Tulasi Ram, 2017; Zhang et al., 2009).
Although great efforts have been made to improve the accuracy of these models, their imperfections persist and need to be addressed. For instance, the IRI CCIR and URSI foF2 predictions systematically underestimate the observed foF2 during most of the day at low latitudes (Bertoni et al., 2006; Ikubanni et al., 2014; Wang et al., 2009; Zhang et al., 2007). The hmF2 options of the IRI model can underestimate or overestimate hmF2 over the course of a day (Lee & Reinisch, 2006; Mengist et al., 2020; Sethi et al., 2008). One reason for these limitations is that the underlying data are very sparse over the oceans and some continental regions. For instance, few ionosonde stations provide foF2 in South Asia for the IRI model. In addition, the "screen points" derived from real observations or from a theoretical model differ significantly from real ionospheric observations, which can result in poor performance in data-sparse regions. In this study, we build empirical models of foF2 and hmF2 (referred to as the USTC models): the foF2 model uses global ionosonde foF2 data and foF2 reanalysis data with the method of the IRI CCIR model, and the hmF2 model uses COSMIC hmF2 observations with EOF analysis. For the first time, global foF2 reanalysis data from data assimilation are used to fill data gaps over the oceans and other data-sparse regions, in combination with new ionosonde foF2 observations from several stations over South Asia; this data set differs from those of the IRI CCIR and URSI models.

Data Set
In this study, the monthly median foF2 values from global ionosonde observations and monthly foF2 reanalysis data obtained from data assimilation are used to build the empirical foF2 model. Both data sets have a temporal resolution of 1 hr. A total of 174 ionosonde stations provide foF2 data and cover most continental regions, such as Europe, North America, Asia, and Australia. In particular, more than 10 ionosonde stations are used for the first time to fill the data gaps in South Asia. Over two-thirds of the stations have data spanning more than one solar cycle, and about a third have data spanning more than two solar cycles. Because ionosondes provide data only on land, we used the monthly foF2 reanalysis data to fill data gaps over the oceans. The monthly foF2 reanalysis data are obtained from the data assimilation of Yue et al. (2012) and He et al. (2022), who reported that the reanalysis captures the foF2 variability well when compared against real observations. The reanalysis data from April 2002 to September 2011 provided by Yue et al. (2012) and from October 2020 to September 2021 provided by He et al. (2022) are used here; together they cover most of one solar cycle. Note that we only used the reanalysis data from October 2020 to September 2021 between 45°N and 45°S, because they were assimilated from COSMIC-2 data, which provide observations only at equatorial and low latitudes (Lin et al., 2020). The foF2 observations from global ionosonde stations over the continents, combined with the reanalysis results over the oceans and other data-sparse regions, provide a good data resource for building an empirical model that captures the climatological features associated with solar activity. The combined data cover the whole globe, with station spacing ranging from several degrees to ∼15°. The coordinates and time periods of the ionosonde stations and the distribution of the virtual stations ("screen points") taken from the foF2 reanalysis are given in Figure S1 and Table S1 in Supporting Information S1.
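As an illustration of the monthly median preprocessing described above, the following minimal sketch groups hourly foF2 samples by year, month, and universal-time hour and takes the median of each group. The record layout, the function name `monthly_median_by_hour`, and the sample values are assumptions made for this example and are not taken from the study.

```python
from collections import defaultdict
from statistics import median


def monthly_median_by_hour(records):
    """Group hourly foF2 samples by (year, month, hour) and take the median.

    `records` is an iterable of (year, month, day, hour, fof2_mhz) tuples.
    Samples whose foF2 value is None are treated as missing and skipped.
    """
    groups = defaultdict(list)
    for year, month, _day, hour, fof2 in records:
        if fof2 is not None:
            groups[(year, month, hour)].append(fof2)
    return {key: median(values) for key, values in groups.items()}


# Hypothetical samples for one station (MHz); these are not real data.
samples = [
    (2021, 3, 1, 0, 5.0),
    (2021, 3, 2, 0, 7.0),
    (2021, 3, 3, 0, 6.0),
    (2021, 3, 4, 0, None),  # missing sounding
    (2021, 3, 1, 1, 4.0),
    (2021, 3, 2, 1, 5.0),
]
medians = monthly_median_by_hour(samples)
print(medians)
```

The median is used rather than the mean so that a few disturbed days do not dominate the monthly value at a given hour.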
In contrast to the empirical foF2 model, the empirical hmF2 model uses hmF2 data derived from the RO observations of COSMIC-1 and COSMIC-2. The hmF2 data from April 2006 to September 2019 come from the COSMIC-1 mission, which provides global observations from the equatorial region to high latitudes (Lei et al., 2007). The hmF2 data from October 2019 to October 2021 come from the COSMIC-2 mission, which provides a large number of observations at equatorial and low latitudes. The combined hmF2 data have global coverage and span more than 15 years, from April 2006 to October 2021, or about one and a half solar cycles, which makes them well suited to capturing the hmF2 climatological variations associated with solar activity. The distribution of the COSMIC hmF2 data is given in Figure S2 in Supporting Information S1.

Method Description
The empirical foF2 model uses the same method as the IRI CCIR foF2 model. The method, pioneered by Jones and Gallet (1962, 1965), uses a Fourier time series to describe the diurnal variation of the monthly median foF2:

$$\mathrm{foF2}(λ, θ, T) = a_0(λ, θ) + \sum_{j=1}^{M} \left[ a_j(λ, θ) \cos(jT) + b_j(λ, θ) \sin(jT) \right] \quad (1)$$

where λ and θ are the geographic latitude and east longitude (0°, 360°), respectively, T is the universal time expressed as an angle (−180°, 180°), and M is the maximum number of harmonics used to represent the detailed structure of the diurnal variation. M is set to 6, so that 13 Fourier coefficients (a_0, a_1, …, a_6 and b_1, …, b_6) are considered. Furthermore, a special set of geographic functions is used to describe the variation of the Fourier coefficients with geographic coordinates and with the modified dip coordinate (modip μ), which was introduced by Rawer (1963) to describe the magnetic field dependence of foF2. These functions use an eighth-order harmonic expansion to describe the longitudinal structure. For additional compactness of the set of coefficients, Jones and coworkers truncated the latitudinal expansion at a smaller degree J(k) for the higher longitudinal orders; the cutoffs are 11, 11, 8, 4, 1, 0, 0, 0, 0 for k = 0, …, 8, without loss of accuracy (CCIR, 1975). Thus, the total number of longitude-latitude functions is 76 for each of the 13 Fourier coefficients, and 988 coefficients (13 × 76) are needed by the empirical foF2 model, considering the order of expansion in T, λ, and θ, to represent globally the 24-hr variation of the monthly median foF2. We obtain these coefficients for every month of the year to account for the seasonal variation of foF2. The driving index used in this study is IG12, the 12-month running mean of the ionosonde-based Ionospheric Global (IG) index proposed by Liu et al. (1983). A linear fit describes the variation with IG12, and based on this linear relationship two sets of monthly coefficients are established, one for IG12 = 0 and one for IG12 = 100. A detailed description of the method can be found in the review article of Bilitza et al. (2022).
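The coefficient bookkeeping described above can be checked with a short sketch. It assumes the usual Jones-Gallet layout, in which longitudinal order k = 0 contributes J(0) + 1 latitude terms and every order k ≥ 1 contributes one cosine and one sine term per latitude power. It also shows how a diurnal Fourier series with M = 6 harmonics and a linear IG12 dependence could be evaluated. The function names and the numerical coefficients are hypothetical, chosen only for illustration.

```python
import math

# Latitudinal cutoffs J(k) for longitudinal orders k = 0..8, as quoted in the text.
J = [11, 11, 8, 4, 1, 0, 0, 0, 0]
M = 6  # number of diurnal harmonics


def count_geographic_functions(cutoffs):
    """Count the longitude-latitude functions under the assumed layout.

    Order k = 0 has J(0) + 1 terms; each order k >= 1 has a cosine and a
    sine term for every latitude power 0..J(k).
    """
    total = cutoffs[0] + 1
    for j_k in cutoffs[1:]:
        total += 2 * (j_k + 1)
    return total


def diurnal_series(a0, a, b, t_deg):
    """Evaluate a0 + sum_j [a_j cos(jT) + b_j sin(jT)] with T in degrees."""
    t = math.radians(t_deg)
    harmonics = range(1, len(a) + 1)
    return a0 + sum(
        a[j - 1] * math.cos(j * t) + b[j - 1] * math.sin(j * t) for j in harmonics
    )


def interpolate_ig12(value_at_0, value_at_100, ig12):
    """Linear IG12 dependence defined by the values at IG12 = 0 and IG12 = 100."""
    return value_at_0 + (value_at_100 - value_at_0) * ig12 / 100.0


n_geo = count_geographic_functions(J)  # longitude-latitude functions
n_fourier = 2 * M + 1                  # a_0 plus six cosine and six sine terms
n_total = n_geo * n_fourier
print(n_geo, n_fourier, n_total)

# Hypothetical coefficients (MHz) at a single location, for illustration only.
a0 = interpolate_ig12(6.0, 10.0, 50.0)  # a_0 halfway between the two sets
a = [2.0, 0.0, 0.0, 0.0, 0.0, 0.0]
b = [0.0] * M
fof2_at_0 = diurnal_series(a0, a, b, 0.0)
fof2_at_180 = diurnal_series(a0, a, b, 180.0)
print(fof2_at_0, fof2_at_180)
```

Under this layout the count reproduces the 76 functions and the 988 coefficients per month and per IG12 level stated in the text.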
The EOF analysis method, also known as principal component (PC) analysis, is used in this study to capture the most significant variability of hmF2 from the large COSMIC data set. The basic modes of the hmF2 variations, which denote the PCs, are derived via eigen-decomposition. The superposition of the basic modes, weighted by the EOF amplitudes, then reproduces the hmF2 variations:

$$\mathrm{hmF2}(φ, Ø, T) = \sum_{i=1}^{K} \mathrm{Amp}_i(T)\, \mathrm{PC}_i(φ, Ø) + ε \quad (3)$$

where φ, Ø, T, and ε are the geomagnetic latitude, geomagnetic longitude (−180°, 180°), universal time, and residual, respectively. The subscript i stands for the ith PC, and Amp_i(T) represents the EOF amplitude for the ith PC. As shown in Figure S3 in Supporting Information S1, the first three PC series account for 98.47%, 0.34%, and 0.14% of the total variance, respectively, so these three PCs together capture 98.95% of the total variance. Therefore, only the functions of the first three PCs and their associated coefficients (K = 3 in Equation 3) are used to characterize the hmF2 variation.
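The decomposition and truncation described above can be sketched with a singular value decomposition, which is a common way to obtain EOF modes, amplitudes, and variance fractions. This is a minimal sketch on synthetic data, not the authors' implementation: the grid size, the synthetic patterns, and the choice to decompose the data without removing the time mean are all assumptions made for the example.

```python
import numpy as np


def eof_decompose(data, n_modes):
    """Decompose data of shape (n_times, n_points) into spatial modes and amplitudes.

    Returns (modes, amplitudes, variance_fraction). The time mean is not
    removed here; that is an assumption of this sketch.
    """
    u, s, vt = np.linalg.svd(data, full_matrices=False)
    variance_fraction = s**2 / np.sum(s**2)
    modes = vt[:n_modes]                       # (n_modes, n_points)
    amplitudes = u[:, :n_modes] * s[:n_modes]  # (n_times, n_modes)
    return modes, amplitudes, variance_fraction


def eof_reconstruct(modes, amplitudes):
    """Superpose the retained modes weighted by their amplitudes."""
    return amplitudes @ modes


# Synthetic field: three spatial patterns plus weak noise (not real hmF2 data).
rng = np.random.default_rng(0)
n_times, n_points = 200, 50
patterns = rng.normal(size=(3, n_points))
patterns[0] = 1.0 + 0.1 * patterns[0]  # background-like first pattern
amps = rng.normal(size=(n_times, 3)) * np.array([1.0, 5.0, 2.0])
amps[:, 0] += 300.0                    # large mean level, in km
noise = 0.01 * rng.normal(size=(n_times, n_points))
data = amps @ patterns + noise

modes, amplitudes, var_frac = eof_decompose(data, n_modes=3)
recon = eof_reconstruct(modes, amplitudes)
residual = data - recon  # plays the role of the residual term
print(var_frac[:3], np.linalg.norm(residual))
```

Keeping only the leading modes is the best low-rank approximation in the least-squares sense, which is why a few PCs suffice when the variance fractions drop quickly.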
It should be noted that the development of the EOF model also requires a parameterization of the temporal variation of each EOF amplitude. Because each EOF has its own driving source, each amplitude is fitted with a function of the solar-geomagnetic conditions and of the inter-annual and universal time variations (Flynn et al., 2018). In this function, the solar extreme ultraviolet flux proxy F107 on the previous day and the 81-day averaged F107 (F107A), together with the geomagnetic Ap index and the daily mean of Ap (ApD), quantify the solar-geomagnetic dependence in general terms; d is the day number counted from 1 January of each year; and T is the current universal time, which also carries the longitudinal variation. The other coefficients in the function are obtained via linear regression. As shown in Figure S4 in Supporting Information S1, the regression reproduces the temporal variation of the EOF amplitudes well. The correlation coefficient between the EOF amplitude and its regression value is 0.932, 0.936, and 0.692 for the first three PC series, respectively.
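A regression of this kind can be sketched with ordinary least squares. The exact functional form used in the study is not reproduced in this text, so the regressors below (a constant, the four solar-geomagnetic proxies, two annual harmonics in d, and one diurnal harmonic in T) are an assumed, simplified stand-in, and the inputs are synthetic. The names `design_matrix` and `fit_amplitude` are hypothetical.

```python
import numpy as np


def design_matrix(f107_prev, f107a, ap, ap_daily, doy, ut_hours):
    """Build an assumed set of regressors for one EOF amplitude."""
    d = 2.0 * np.pi * np.asarray(doy) / 365.25
    t = 2.0 * np.pi * np.asarray(ut_hours) / 24.0
    columns = [
        np.ones_like(d),                 # constant term
        f107_prev, f107a, ap, ap_daily,  # solar-geomagnetic proxies
        np.cos(d), np.sin(d), np.cos(2 * d), np.sin(2 * d),  # annual, semiannual
        np.cos(t), np.sin(t),            # universal-time variation
    ]
    return np.column_stack(columns)


def fit_amplitude(x, amplitude):
    """Least-squares fit; returns coefficients, fitted values, and correlation."""
    coeffs, *_ = np.linalg.lstsq(x, amplitude, rcond=None)
    fitted = x @ coeffs
    r = np.corrcoef(amplitude, fitted)[0, 1]
    return coeffs, fitted, r


# Synthetic drivers and amplitude series (not real indices or COSMIC data).
rng = np.random.default_rng(1)
n = 500
f107_prev = 70.0 + 100.0 * rng.random(n)
f107a = 70.0 + 100.0 * rng.random(n)
ap = 50.0 * rng.random(n)
ap_daily = 50.0 * rng.random(n)
doy = rng.integers(1, 366, size=n)
ut = rng.integers(0, 24, size=n)

x = design_matrix(f107_prev, f107a, ap, ap_daily, doy, ut)
true_coeffs = np.array([100.0, 0.5, 1.5, -0.2, -0.3, 10.0, -4.0, 2.0, 1.0, 5.0, -3.0])
amplitude = x @ true_coeffs + rng.normal(scale=1.0, size=n)

coeffs, fitted, r = fit_amplitude(x, amplitude)
print(coeffs.round(2), round(r, 4))
```

The correlation coefficient between the amplitude series and its fitted values is the same kind of diagnostic as the values quoted above for the first three PC series.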

Results and Discussion
In this section, we validate the empirical foF2 and hmF2 models through comprehensive comparisons with the IRI CCIR predictions, the reanalysis data, and ionosonde measurements. It should be noted that the CCIR predictions from IRI-2016 used in this study are the latest, because the next version (IRI-2020) does not update the CCIR models of foF2 and hmF2.

Empirical foF2 Model
It should be noted that the model-observation deviation distribution is the whole-day analysis result. We further examined the error distribution at different local times. Figure 5 shows the comparison of model-observation

Empirical hmF2 Model
Figure 6 shows the comparison of the diurnal variation of the monthly median hmF2 values from the USTC model and the IRI CCIR model at low and middle latitudes at 115°E during the equinox (March 2021) and solstice (December 2020) seasons. The variations of the monthly median hmF2 values from both models are consistent with the monthly reanalysis hmF2 observations. The diurnal and latitudinal variations from both models have characteristics similar to those of the reanalysis observations, with only slight differences in a few cases. Further, we examined the global variations of the USTC model predictions by comparing them with those from the IRI CCIR model and the reanalysis observations.

As shown in Figure 9, the model-observation deviation of the monthly median hmF2 values does not form a regular Gaussian distribution for the USTC model at both stations, where its variance σ² is larger than that of the IRI CCIR model. In addition, as depicted by the model-observation deviation distribution at different local times at Shaoyang in Figure 11, the USTC model underestimates hmF2 during daytime and overestimates it during nighttime, although the IRI CCIR model shows a significant hmF2 overestimation during nighttime. The hmF2 retrieved from COSMIC observations differs significantly from the ionosonde hmF2 at low latitudes (Hu et al., 2014; Yue et al., 2010). As illustrated in Figure 12, the COSMIC hmF2 values generally fit the ionosonde observations during both daytime and nighttime at middle latitudes (Mohe and Beijing). However, the situation is different at low latitudes (Wuhan, Shaoyang, and Sanya), where the COSMIC hmF2 values are widely scattered and do not always follow the variations of the monthly median hmF2 values from ionosonde observations. The USTC model does not display a superior performance at low latitudes, which could be attributed to the significant differences between the COSMIC and ionosonde hmF2 observations at these latitudes. Further comparative data analysis and investigation are needed.
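The deviation statistics discussed above can be illustrated with a small sketch that computes model-minus-observation deviations, their mean and variance σ², and the root-mean-square error. Note that it estimates the Gaussian parameters from sample moments instead of fitting a Gaussian curve to a histogram as in the figures, and that the sign convention (model minus observation, so positive values mean overestimation) and the numbers are assumptions for illustration.

```python
import math
from statistics import fmean, pvariance


def deviation_stats(model, observed):
    """Return deviations (model minus observation), their mean, variance, and RMSE."""
    deviations = [m - o for m, o in zip(model, observed)]
    mu = fmean(deviations)
    variance = pvariance(deviations, mu)
    rmse = math.sqrt(fmean([d * d for d in deviations]))
    return deviations, mu, variance, rmse


def gaussian_pdf(x, mu, variance):
    """Gaussian density with the given mean and variance."""
    return math.exp(-((x - mu) ** 2) / (2.0 * variance)) / math.sqrt(2.0 * math.pi * variance)


# Hypothetical monthly median values (not real model output or ionosonde data).
model_values = [10.0, 12.0, 9.0, 11.0]
observed_values = [9.0, 11.0, 10.0, 10.0]

deviations, mu, variance, rmse = deviation_stats(model_values, observed_values)
peak_density = gaussian_pdf(mu, mu, variance)
print(deviations, mu, variance, rmse, peak_density)
```

A mean deviation away from zero indicates a systematic bias, while a larger variance indicates a wider spread of errors around that bias.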
Finally, it should be noted that the monthly reanalysis foF2 data and the COSMIC hmF2 data fall mostly within solar cycles 23 and 24, which are relatively quiet compared with other solar cycles. The ionosondes used are also unevenly distributed over land. In addition, we have not examined the model performance at high latitudes, where the foF2 and hmF2 variations are complicated by ionospheric polar processes (Mayer & Jakowski, 2009; Tsai et al., 2010; Yue et al., 2013), or at longitudes other than South Asia. The validation of the empirical foF2 and hmF2 models over the oceans and other longitude regions, and at high latitudes under different levels of solar activity, as well as the comparison with other IRI options including the URSI model, AMTB2013, and SHU-2015, needs further investigation.

Summary
In this study, the empirical models of foF2 and hmF2, referred to as the USTC models, are specified respectively through global ionosonde and reanalysis data based on the IRI CCIR method and through COSMIC observations based on EOF analysis, in order to capture the most significant variations of foF2 and hmF2. The USTC models are validated against the IRI CCIR foF2 and hmF2 models, reanalysis data, and ionosonde observations. The comprehensive comparison shows that the empirical foF2 model better reproduces the observed foF2 climatological variations associated with solar activity at low and middle latitudes. It overcomes the underestimation of the IRI CCIR model at low latitudes and captures the foF2 variations better than the IRI CCIR model. However, the performance of the empirical hmF2 model varies with latitude. The IRI CCIR model overestimation at middle latitudes is addressed by the empirical hmF2 model, whereas the

• The empirical models of foF2 and hmF2 are respectively reconstituted by using ionosonde and reanalysis data, and Constellation Observing System for Meteorology, Ionosphere, and Climate (COSMIC) observations
• The empirical foF2 model displays a better performance for capturing foF2 variations compared to the International Reference Ionosphere Consultative Committee on International Radio model
• The empirical hmF2 model has a different performance at different latitudes, affected by the COSMIC observations

Supporting Information: Supporting Information may be found in the online version of this article.
Figure 1 shows the comparison of the diurnal variation of the monthly median foF2 values from the USTC model and the IRI CCIR model at low and middle latitudes at 115°E during the equinox (March 2021) and solstice (December 2020) seasons. The monthly foF2 reanalysis observations are given for reference. The two model predictions have characteristics similar to those of the reanalysis observations. The monthly median foF2 values from the USTC model display obvious diurnal variations with the typical equatorial ionization anomaly (EIA) structure during the equinox and solstice. The interhemispheric asymmetry of the EIA in December 2020 is also well reproduced by the USTC model, as it is by the IRI CCIR model, which shows that our model is consistent with previous observations (e.g., Xiong et al., 2013). In addition, we examined the global variations of the USTC model prediction by comparing them with those from the IRI CCIR model and the reanalysis observations. Figure 2 shows the comparison of global variations of the monthly median foF2 values obtained from both models and the monthly reanalysis observations at low and middle latitudes at 2:00 UT of December 2020 and March 2021. The USTC model can describe the global foF2 variations as the IRI CCIR model does. Both model foF2 predictions show global variations similar to those of the reanalysis observations. The foF2 from the USTC model shows an obvious EIA structure that depends on the geomagnetic inclination. The significant interhemispheric asymmetry of the global EIA structure is also well reproduced by the USTC model predictions. Furthermore, we verified the USTC model performance by comparing it with the IRI CCIR model and real ionosonde measurements at several stations over South Asia. Figure 3 shows the comparisons between the monthly median foF2 values from both models and the ionosonde observations at low and middle latitudes over South Asia during 2012-2019. The variations of the monthly median foF2 values from both models are generally consistent with those from the ionosonde observations at all stations. More importantly, the USTC model displays better

Figure 1. Comparison of diurnal variation of monthly reanalysis foF2 data (a, d) and the monthly median foF2 values obtained from the International Reference Ionosphere Consultative Committee on International Radio model (b, e) and the USTC model (c, f) at low and middle latitudes at 115°E in December 2020 and March 2021.

Figure 2. Comparison of global variation of the monthly reanalysis foF2 data (a, d) and the monthly median foF2 values obtained from the International Reference Ionosphere Consultative Committee on International Radio model (b, e) and the USTC model (c, f) at low and middle latitudes at 2:00 UT of December 2020 and March 2021.
Figure 7 shows the comparison of global variations of the monthly median hmF2 values obtained from both models and the monthly reanalysis observations at low and middle latitudes at 2:00 UT of December 2020 and March 2021. The USTC model can describe the global hmF2 variations as the IRI CCIR model does. Both model predictions show global variations similar to those of the reanalysis observations.

Figure 3. Comparison between the monthly median foF2 values obtained from the USTC model (red lines) and the International Reference Ionosphere Consultative Committee on International Radio model (black lines) and those derived from ionosonde observations (blue lines) at low and middle latitude stations over South Asia during 2012-2019. The solar extreme ultraviolet flux proxy F10.7 is also plotted for reference. The root-mean-square error between the model prediction values and ionosonde observations is also given.

Figure 8 displays the comparison between the monthly median hmF2 values from the USTC model, the IRI CCIR model, and ionosonde observations at low and middle latitude stations over South Asia during 2012-2019. The USTC model predicts the hmF2 variations better than the IRI CCIR model at middle latitudes. The monthly median hmF2 values from the USTC model and the ionosonde data are generally consistent at the middle-latitude stations, where the difference between the IRI CCIR model predictions and the ionosonde data is significant during 2016-2019. The IRI CCIR model predictions greatly overestimate the monthly median hmF2 values during 2016-2019 at Mohe (53.5°N, 122.3°E, MLAT: 44.4°N) and Beijing (40.2°N, 116.2°E, MLAT: 30.9°N). In contrast, the USTC model reproduced the observed hmF2 variations at both stations. The RMSE

Figure 4. Model-observation deviation distribution of monthly median foF2 values (blue bars) and its Gaussian-function regression (green line) for the International Reference Ionosphere Consultative Committee on International Radio model (left panel) and the USTC model (right panel) at low and middle latitudes over South Asia during 2012-2019.

Figure 5. Model-observation deviation distribution of monthly median foF2 values (blue bars) and its Gaussian-function regression (green line) for the International Reference Ionosphere Consultative Committee on International Radio model (left panel) and the USTC model (right panel) at different local times during 2012-2019 at Sanya (18.3°N, 109.4°E, MLAT: 8.9°N).

Figure 6. Same as Figure 1 but for the monthly median hmF2 values.

Figure 7. Same as Figure 2 but for the monthly median hmF2 values.

Figure 8. Same as Figure 3 but for the monthly median hmF2 values.

Figure 9. Same as Figure 4 but for the monthly median hmF2 values.

Figure 12. Comparison of the hmF2 observations between the Constellation Observing System for Meteorology, Ionosphere, and Climate (COSMIC) and ionosondes at low and middle latitudes over South Asia during 2013-2015. The red and blue points represent COSMIC hmF2 observations during daytime and nighttime, respectively. The monthly median hmF2 values from ionosonde measurements are plotted as black lines. Note that only the COSMIC hmF2 observations whose geographic latitude and longitude are within 1.5° and 15°, respectively, of the ionosonde station coordinates are used here.
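The collocation criterion in the Figure 12 caption can be sketched as a simple filter. The sketch assumes that the 1.5° and 15° limits are half-widths on either side of the station, which the caption does not state explicitly, and it handles the longitude wrap-around. The function names and the observation tuples are hypothetical; only the Sanya coordinates come from the text.

```python
def lon_difference(lon_a, lon_b):
    """Smallest absolute longitude difference in degrees, with wrap-around."""
    diff = abs(lon_a - lon_b) % 360.0
    return min(diff, 360.0 - diff)


def collocated(observations, station_lat, station_lon, max_dlat=1.5, max_dlon=15.0):
    """Keep (lat, lon, hmf2_km) observations close enough to the station."""
    return [
        obs
        for obs in observations
        if abs(obs[0] - station_lat) <= max_dlat
        and lon_difference(obs[1], station_lon) <= max_dlon
    ]


# Station coordinates of Sanya as given in the text (18.3°N, 109.4°E).
sanya_lat, sanya_lon = 18.3, 109.4

# Hypothetical COSMIC retrievals as (latitude, longitude, hmF2 in km).
cosmic_obs = [
    (18.0, 100.0, 310.0),   # inside the window
    (20.5, 110.0, 295.0),   # too far in latitude
    (17.5, 125.0, 330.0),   # too far in longitude
    (19.0, -245.0, 305.0),  # same meridian as 115°E, inside the window
]
selected = collocated(cosmic_obs, sanya_lat, sanya_lon)
print(selected)
```

A wide longitude window and a narrow latitude window trade longitudinal representativeness for a larger number of matched profiles.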