A Model of High Latitude Ionospheric Convection Derived From SuperDARN EOF Model Data

Forecasting of the effects of thermospheric drag on satellites will be improved significantly with better modeling of space weather effects on the high‐latitude ionosphere, in particular the Joule heating arising from electric field variability. We use a regression analysis to build a model of the ionospheric convection drift velocity which is driven by relatively few solar and solar wind variables. The model is developed using a solar cycle's worth (1997–2008 inclusive) of 5‐min resolution Empirical Orthogonal Function (EOF) patterns derived from Super Dual Auroral Radar Network (SuperDARN) line‐of‐sight observations of the convection velocity across the high‐latitude northern hemisphere ionosphere. At key stages of development of the model, we use the percentage of explained variance $P$ to see how well the model reproduces the EOF data. The final model is driven by four variables: (a) the interplanetary magnetic field component $B_y$, (b) the solar wind coupling parameter epsilon ε, (c) a trigonometric function of day‐of‐year, and (d) the monthly $F_{10.7}$ index. The model can reproduce the EOF velocities with a characteristic $P$ = 0.7. The model and EOF data compare best around the solar maximum of 2001. $P$ is lower around solar minimum, which may be due to occasional limitations in the geographical and temporal coverage of the SuperDARN measurements, or may indicate the need to modify our model around the minimum of the solar cycle. Our model has the potential to be used to forecast the ionospheric electric field using the real‐time solar wind data available from spacecraft located upstream of the Earth.

LAM ET AL.

10.1029/2023SW003428
atmosphere. The dominant unknown variable in orbital trajectory predictions of low Earth orbit (LEO) objects is the density of the thermosphere, which exerts a time and location dependent drag.
Coupling of the solar wind to the magnetosphere results in energy being injected into the magnetosphere-ionosphere system. In the ionosphere, this is measurable as an electric field that is generated by plasma convection. This energy can be dissipated via Joule heating in the ionosphere-thermosphere, which has been estimated to dissipate over 50% of the total solar wind energy input to the Earth system (Østgaard et al., 2002). Joule heating is a key contributor to the power input into the thermosphere and thereby to the drag on satellites (e.g., Knipp et al., 2005), via upward expansion of the polar atmosphere and resultant significant density changes in the thermosphere. Accurate models and prediction of these high latitude processes are therefore vital to safeguard space assets. However, given that the thermospheric density can vary by ∼80% diurnally and by ∼250% during a geomagnetic storm, this is a major modeling challenge (Sutton et al., 2005).
Joule heating is specified by J ⋅ E in the rest frame of the ionosphere, where J and E are the current density and electric field vectors (e.g., Vasyliūnas & Song, 2005). The model presented here will provide the high-latitude ionospheric convection (E × B drift) velocity field and consequently the associated convection electric field E, having specified the magnetic field B. These can be used in fully coupled (neutral and ionized) models of the lower and upper atmosphere to specify the global thermospheric density at the lower end of LEO altitudes. Such models include the Thermosphere-Ionosphere Electrodynamics General Circulation Model (TIEGCM), a general circulation model (GCM) (Qian et al., 2014), and the Advanced Ensemble electron density (Ne) Assimilation System (AENeAS), which is a data assimilation model derived from TIEGCM (Elvidge & Angling, 2019). The motivation for this study, and the production of the new model, was given by the UK's Space Weather Instrumentation, Measurement, Modeling and Risk-Thermosphere (SWIMMR-T) project, a multi-million-pound project during 2020-2024 to operationalize UK space weather modeling and forecasting.
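The step from a convection velocity to a convection electric field can be sketched with the frozen-in relation E = −v × B. The example below is only an illustration: the local (east, north, up) frame, the purely vertical field and its magnitude of 50,000 nT, and the function name `convection_electric_field` are assumptions made for the sketch and are not taken from the model described here.

```python
import numpy as np

def convection_electric_field(v_east, v_north, b_vec):
    """Return E = -v x B (V/m) for a horizontal drift v (m/s) and field B (T).

    Vectors are ordered (east, north, up) in a local frame (an assumption).
    """
    v = np.array([v_east, v_north, 0.0])
    return -np.cross(v, np.asarray(b_vec, dtype=float))

# Illustrative values only: 500 m/s eastward flow in a vertical, downward
# field of 50,000 nT (a rough northern high-latitude magnitude).
b_down = np.array([0.0, 0.0, -5.0e-5])
e_field = convection_electric_field(500.0, 0.0, b_down)

# Consistency check: the E x B drift recovers the input velocity.
drift = np.cross(e_field, b_down) / np.dot(b_down, b_down)
```

With these numbers the eastward flow corresponds to an equatorward-directed field of 25 mV/m; a realistic application would use a geomagnetic field model for B rather than a constant vertical vector.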
Expertise in empirical modeling of the polar ionospheric electric field has been acquired over a period of more than four decades using a variety of data sets. Models have exploited both incoherent scatter radars (e.g., Foster, 1983; Zhang et al., 2007) and coherent scatter radars (e.g., Pettigrew et al., 2010; Ruohoniemi & Greenwald, 1996, 2005; Thomas & Shepherd, 2018), both low-altitude spacecraft (e.g., Hairston & Heelis, 1990; Heppner, 1977; Heppner & Maynard, 1987; V. O. Papitashvili & Rich, 2002; Weimer, 1995, 1996, 2001, 2005) and the high-altitude Cluster spacecraft (e.g., Förster et al., 2007), and ground-based magnetometer arrays (e.g., Ridley et al., 2000). To clarify, the high-latitude ionospheric plasma drift velocity field is typically referred to in this scientific literature as "convection." The basis for our model is 12 years of data from the Super Dual Auroral Radar Network (SuperDARN), which has been used to monitor mid to high-latitude ionospheric plasma velocities over the last few decades (Chisham et al., 2007; Greenwald et al., 1995; Nishitani et al., 2019). The interval of data, from 1997 to 2008 inclusive, starts just after the solar minimum of August 1996, and includes the solar maximum of November 2001 and the solar minimum of December 2008. In building our model, we first note the mostly two-cell morphology of the climatology of the ionospheric convection at high latitudes driven by magnetic reconnection, and its strong dependence on the interplanetary magnetic field (IMF) magnitude $|B|$ and the Sun-Earth component of the solar wind velocity $v_x$. It is also dependent on the IMF clock angle (e.g., Grocott & Milan, 2014).
If we consider the IMF components $B_x$, $B_y$, and $B_z$ in Geocentric Solar Magnetospheric (GSM) co-ordinates (Hapgood, 1992), then the IMF clock angle, $\theta_{clock}$, is the angle between the projection of the IMF vector onto the GSM y-z plane and the GSM z axis,

$$\theta_{clock} = \arctan\left(\frac{B_y}{B_z}\right). \tag{1}$$

Indeed, the solar wind electric field magnitude ($|v_x|\sqrt{B_y^2 + B_z^2}$), the IMF clock angle, and the dipole tilt of the Earth, drive the TS18 model (Thomas & Shepherd, 2018), which is also based on SuperDARN data. Second, we note that a study into the data-driven basis functions of the SuperDARN velocity field that we use here (Shore et al., 2021) finds that the IMF $B_z$ component is the dominant driver for the background mean field and, as expected, $B_z$ also drives a series of non-leading modes that describe the variability of the two-cell motion (e.g., Cowley & Lockwood, 1992) and that of substorms (e.g., Akasofu, 1964). However, it is IMF $B_y$ that dominates the variability of the convection velocity (Shore et al., 2021). This observation, and the strong azimuthal asymmetries imposed by IMF $B_y$ (e.g., Friis-Christensen & Wilhjelm, 1975; Friis-Christensen et al., 1985; Tenfjord et al., 2015), are motivations for the inclusion of IMF $B_y$ as a driver in our model. Third, we consider the time lag between changes in the solar wind impinging on the magnetosphere and the response in the ionosphere. Shore et al. (2019) performed a regression of surface external and induced magnetic field (SEIMF) variations onto solar wind data. Their paper examined the correlation between the terrestrial magnetic field and solar wind coupling parameters, as a function of time lag between the two quantities. It was found that the epsilon coupling parameter ε (Akasofu, 1979) resulted in a greater percentage of explained variance (which Shore et al. term the "prediction efficiency") than the Milan parameter (Milan et al., 2012).
In addition to the cross-polar-cap electric potential driven by magnetic reconnection, the spatial variation of the high-latitude ionospheric convection velocity is also influenced indirectly by the plasma conductivity. Conductivity is a strong function of magnetic local time (MLT), the seasonal change in the tilt of the Earth toward/away from the Sun, and changes in the output irradiance of the Sun. Our final additions to the model will allow for these conductivity variations and will also accommodate solar cycle variations via a dependence on both the day-of-year and the phase of the solar cycle. We are developing a model from purely northern hemisphere data, and so this can be straightforwardly applied to the northern hemisphere. It can also be used for the southern hemisphere by applying the model with all the same input coefficient values, but for an IMF $B_y$ of opposite sign, with the caveat that any hemispheric differences other than the large-scale IMF $B_y$-related ones will be missing. Due to the lower amount of SuperDARN data covering the southern hemisphere, this is the best that we can currently achieve.
We initially create a hindcast model (the Version 1 or v1 model) using ε, and then develop a second hindcasting version (v2) of the model that also includes IMF $B_y$ as a driver. Finally, a version (v3) of the model is produced that is dependent on the day-of-year and a monthly value of $F_{10.7}$, the solar radio flux at 10.7 cm (Covington, 1948), to represent annual and solar cycle variations, respectively, including those in the ionospheric conductivity. We give details of the data sets used to develop the models in Section 2. In Section 3 we present our methodology for producing the three versions of the model. The results are presented in Section 4, including values for the percentage of explained variance, here given as a number between 0 and 1, which gives us a measure of how well the regression model is reproducing the parent data set. In Section 5, we focus on potential causes of variability in the percentage of explained variance, possible future improvements to the model, and our next steps. In Section 6, we present our summary and conclusions.

Data
In this section we discuss the data used to build the models. We build our model using 144 months of data for the interval 1997 to 2008 inclusive. A previous study used data from February 2001 to demonstrate the derivation of Empirical Orthogonal Function (EOF) model patterns from SuperDARN data (Shore et al., 2021), as this month has particularly good data coverage. We also use February 2001 (Figure 1) to illustrate our method. Please refer to the Data Availability Statement at the end of this paper for details regarding access to the data.

Solar Wind Data
We make use of the IMF and the solar wind velocity data as extracted through OMNIWeb, specifically the OMNI 1-min data set (King & Papitashvili, 2005). The variables are in GSM co-ordinates (Hapgood, 1992). The OMNI solar wind data are provided already lagged from near the L1 Lagrangian point to the arrival time at the bow shock nose using the bow shock model of Farris and Russell (1994). This model requires the magnetopause nose distance which has been calculated for the data in terms of the solar wind pressure and IMF $B_z$ using the magnetopause model of Shue et al. (1997). We create 5-min means of the IMF components $B_x$, $B_y$, and $B_z$, and the solar wind velocity component $v_x$, time-stamped at the center of the 5-min epoch. In subsequent calculations, we omit any 5-min epoch for which there is no OMNI coverage. The remaining epochs are used to calculate the coupling parameter ε at 5-min resolution,

$$\varepsilon = \frac{4\pi}{\mu_0}\,|v_x|\,|B|^2 \sin^4\left(\frac{\theta_{clock}}{2}\right) l_0^2, \tag{2}$$

where the permeability of free space $\mu_0 = 4\pi \times 10^{-7}$, $|B|$ is the solar wind magnetic field magnitude, $l_0^2$ is a scale factor intended to represent the cross-sectional area over which dayside reconnection takes place with $l_0 = 7R_E$, and $R_E$ = 6,371.2 km is the mean Earth radius. Here, ε is measured in Watts (e.g., Akasofu, 1979).
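The processing chain described above (5-min averaging followed by evaluation of ε) might be sketched as follows. This is a minimal illustration, not the code used for the model: it assumes the standard SI form of the Akasofu parameter with $|v_x|$ as the speed, inputs in nT and km/s with missing samples already set to NaN, and the hypothetical helper names `five_min_means` and `epsilon_watts`.

```python
import numpy as np

MU0 = 4.0e-7 * np.pi   # permeability of free space (SI)
R_E = 6371.2e3         # mean Earth radius in metres
L0 = 7.0 * R_E         # scale length l0 = 7 R_E in metres

def five_min_means(x_1min):
    """Average 1-min samples into 5-min epochs, ignoring NaNs.

    An epoch with no valid sample stays NaN so that it can be omitted later.
    """
    x = np.asarray(x_1min, dtype=float)
    blocks = x[: (x.size // 5) * 5].reshape(-1, 5)
    valid = ~np.isnan(blocks)
    count = valid.sum(axis=1)
    total = np.where(valid, blocks, 0.0).sum(axis=1)
    return np.where(count > 0, total / np.maximum(count, 1), np.nan)

def epsilon_watts(bx_nT, by_nT, bz_nT, vx_kms):
    """Epsilon coupling parameter in Watts (assumed SI form, see lead-in)."""
    bx, by, bz = (np.asarray(b, dtype=float) * 1e-9 for b in (bx_nT, by_nT, bz_nT))
    speed = np.abs(np.asarray(vx_kms, dtype=float)) * 1e3   # m/s
    b_squared = bx**2 + by**2 + bz**2                        # T^2
    clock = np.arctan2(by, bz)                               # IMF clock angle
    return (4.0 * np.pi / MU0) * speed * b_squared * np.sin(clock / 2.0) ** 4 * L0**2
```

For a purely southward IMF of 5 nT and a 400 km/s solar wind this form gives ε of roughly 2 × 10^11 W, and it vanishes for purely northward IMF.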

The Shore EOF Analysis of SuperDARN Velocity Data
The convection velocities that we use here to develop our models have been published (Shore et al., 2022). These data will be referred to in this paper as the Shore EOF model analysis patterns or values. The Shore EOF model analysis used SuperDARN plasma velocity observations of the F-region ionosphere measured using the northern hemisphere radars of the SuperDARN global array. The fitted Doppler velocities were derived from the original autocorrelation functions using version 4.5 of the radar software toolkit, RSTv4.5 (SuperDARN Data Analysis Working Group, 2021) and, within that toolkit, fitting routine "FitACF v2.5." In the production of the Shore EOF model patterns, the geolocation is dealt with using the Chisham virtual height model (Chisham et al., 2008), which is the state-of-the-art methodology. Further details about the selection of good quality data from the F-region are available in the data section of Shore et al. (2021). The Shore EOF model analysis patterns were determined by applying the EOF method (Shore et al., 2021) to 12 years of SuperDARN plasma velocity measurements (1997-2008 inclusive), which is similar to the length of a typical solar cycle. In contrast to other SuperDARN methods of representing convection, the Shore method includes a self-consistent SuperDARN infill solution, which does not rely on climatological averages or external information, and which achieves complete coverage of the plasma velocity field variability in time and space. The Shore EOF model analysis patterns (Shore et al., 2022) provide a complete representation of the northern hemisphere convection velocity field at ∼250-400 km altitude (within the F-region ionosphere), at 5-min resolution. The velocity data are presented separately as the north-south (NS) and east-west (EW) components of the flow at 559 spatial locations.
These locations are defined as the central co-ordinates of equal-area spatial bins extending from 60°N latitude to the pole in the Quasi-Dipolar co-ordinate system (Richmond, 1995). The latitude step for the equal-area grid is ∼3.0°. The north and east directions are defined as positive for these two velocity components. Figures 1g and 1h show example NS and EW velocity components from the Shore EOF model analysis for February 2001 for a single location (73°N magnetic latitude, 00:09 MLT).

Solar Irradiance Proxy: $F_{10.7}$ Index
We make use of observations of the solar radio flux at 10.7 cm/2800 MHz (often called the $F_{10.7}$ index) from the Low-Resolution OMNI (LRO) data set. This index correlates well with several ultraviolet (UV) and visible solar irradiance records and is easily measured at the Earth's surface. The daily values of $F_{10.7}$ were used to create monthly mean values for the interval from 1997 to 2008 inclusive. Three of the months had missing data (five or fewer daily values missing in each), and the missing days were not included in the calculation of the monthly mean value.
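The monthly averaging with missing days excluded could look like the short sketch below. The function name `monthly_mean_f107` is hypothetical, and the fill value of 999.9 used to flag missing days is an assumption about the input file that should be checked against the OMNI format description.

```python
import numpy as np

def monthly_mean_f107(daily_values, fill_value=999.9):
    """Mean of the valid daily F10.7 values in one month.

    Days flagged with the fill value (assumed to be 999.9) or NaN are
    skipped rather than being allowed to bias the mean.
    """
    daily = np.asarray(daily_values, dtype=float)
    valid = np.isfinite(daily) & (daily != fill_value)
    if not valid.any():
        return np.nan
    return daily[valid].mean()
```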

Methodology
The OMNI data set timestamp takes into account the time lag between the satellite location and the Earth's bow shock nose. In order to best estimate the further time lag τ of the ionospheric response, for use in their solar-wind-driven model of the Earth's magnetic field, Shore et al. (2019) calculated the peak correlation between the terrestrial magnetic field and ε. The peak correlation commonly occurred at a time lag of ∼20 min, but it varied with MLT and latitude. It was longer than 20 min in some parts of the nightside, reflecting the dominance there of the substorm response to the solar wind. In a similar fashion, we calculated the Pearson correlation coefficient between the ε parameter and each velocity component of the ionospheric plasma E × B drift velocity at all locations, for the February 2001 data set. We calculated the correlation for a series of time lags ranging from −10 to +500 min, in 5-min steps up to τ = 150 min and in 10-min steps thereafter, totaling 68 separate lags. The results (not shown) strongly resemble the published correlations between ε and the Earth's SEIMF (Shore et al., 2019) in that the correlation in the polar region tends to peak at a time lag of ∼20 min, with a secondary peak corresponding to a longer time lag for some parts of the nightside ionosphere. The time lag value of ∼20 min is supported by much of the literature (e.g., Grocott & Milan, 2014) and therefore we deem 20 min to be a representative time lag for much of the ionosphere and use that value to develop our models in this paper. The Shore EOF model data set comprises 144 individual monthly analyses of SuperDARN plasma velocity data, extending from January 1997 through to December 2008. For each month of EOF model data, we produce two versions of a hindcast model. Version 1 (v1) is produced via a regression of the data onto ε alone; Version 2 (v2) is produced from a regression of the data onto both ε and IMF $B_y$.
We then produce a Version 3 (v3) model by performing a regression analysis of the 144 sets of monthly regression coefficients from the v2 model, with respect to trigonometric functions of day-of-year and to the monthly $F_{10.7}$. The coefficients resulting from this analysis can be used to form a model that could potentially be used for forecasting, which we will refer to as the Lam 2023 (v3) model. We make use of version 8.7 of the Interactive Data Language (IDL) REGRESS function, which performs a multiple linear regression fit (Harris Geospatial Solutions, 2022).
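The lag scan described above, with its grid of 68 lags, can be illustrated with the following sketch. It assumes two gap-aware time series on a common 5-min cadence; the function names `lag_grid_minutes` and `lagged_pearson` are hypothetical, and a positive lag is taken to mean that the response follows the driver.

```python
import numpy as np

def lag_grid_minutes():
    """Lags from -10 to +500 min: 5-min steps to 150 min, then 10-min steps."""
    fine = np.arange(-10, 151, 5)      # 33 lags
    coarse = np.arange(160, 501, 10)   # 35 lags
    return np.concatenate([fine, coarse])

def lagged_pearson(driver, response, lag_min, step_min=5):
    """Pearson correlation of response(t) with driver(t - lag)."""
    d = np.asarray(driver, dtype=float)
    r = np.asarray(response, dtype=float)
    k = int(round(lag_min / step_min))   # lag expressed in samples
    if k > 0:
        d, r = d[:-k], r[k:]
    elif k < 0:
        d, r = d[-k:], r[:k]
    ok = ~(np.isnan(d) | np.isnan(r))    # drop epochs with missing data
    return np.corrcoef(d[ok], r[ok])[0, 1]
```

Evaluating `lagged_pearson` for every lag returned by `lag_grid_minutes()` at each spatial bin and velocity component, and locating the maximum, reproduces the kind of scan described in the text.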
We will use a simple measure to assess how well each model that we build reproduces the Shore EOF model values. Let $j$ denote the index for the 559 spatial bins and $i$ denote the index for each 5-min interval in a given month. For a given velocity component, let $v_{ij}$ denote the set of Shore EOF model velocity values, $\bar{v}_i$ denote the spatial mean of $v_{ij}$, and $\hat{v}_{ij}$ denote the set of our model velocity values. For each of our model versions, we calculate:

$$P(i) = 1 - \frac{\sum_{j=1}^{559} \left(v_{ij} - \hat{v}_{ij}\right)^2}{\sum_{j=1}^{559} \left(v_{ij} - \bar{v}_i\right)^2}. \tag{3}$$

We can see that Equation 3 can be expressed in terms of the mean squared "error" between one of our regression models (v1, v2, or v3) and the Shore EOF model values, and the mean squared deviation of the Shore values from their spatial average value. This is similar to the formula for the prediction efficiency, which is a measure of skill (e.g., Equation 19 of Liemohn et al. (2021)). We shall use $P(i)$ as a way of evaluating the ability that each of our models has at reproducing the parent data set, namely the Shore EOF model values. Hence, we refer to $P(i)$ as the "percentage of explained variance" rather than the "prediction efficiency." For instance, we can calculate the time development of the percentage $P(i)$ for the NS velocity component in February 2001, $P_{NS}(i)$, for $i$ = 1 to 8,064, where 8,064 is the number of 5-min intervals in that month. We can calculate $P_{EW}(i)$ in a similar fashion. This measure applies to the velocity component across the whole spatial region for any given 5-min interval. It does not provide a validation of the model against other independent data sets provided, for instance, by satellite or incoherent scatter radar. It only provides a measure of how well the regression model performs at reproducing the data set from which it was derived.
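As a concrete reading of this measure, the sketch below evaluates one minus the ratio of the summed squared model error to the summed squared deviation of the EOF values from their spatial mean, for every 5-min epoch. The array layout (epochs along the first axis, spatial bins along the second) and the function name `explained_variance` are assumptions of the sketch.

```python
import numpy as np

def explained_variance(v_eof, v_model):
    """Percentage of explained variance for each 5-min epoch.

    v_eof, v_model : arrays of shape (n_epochs, n_bins) holding one
    velocity component from the EOF analysis and from the regression model.
    """
    v_eof = np.asarray(v_eof, dtype=float)
    v_model = np.asarray(v_model, dtype=float)
    spatial_mean = v_eof.mean(axis=1, keepdims=True)
    squared_error = ((v_eof - v_model) ** 2).sum(axis=1)
    squared_deviation = ((v_eof - spatial_mean) ** 2).sum(axis=1)
    return 1.0 - squared_error / squared_deviation
```

A model that reproduces the EOF values exactly gives 1 at every epoch, while a model that only returns the spatial mean gives 0; values below 0 are possible when the model is worse than the spatial mean.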

Version 1 Model
We first create a simple linear hindcast model. The model relates the NS and EW components of the Shore SuperDARN EOF model velocity to ε via:

$$v_{NS} = \alpha_{1\varepsilon NS}\,\varepsilon + \beta_{1NS}, \tag{4a}$$

$$v_{EW} = \alpha_{1\varepsilon EW}\,\varepsilon + \beta_{1EW}. \tag{4b}$$

In all versions of our models, we use $\alpha$ to denote the slopes and $\beta$ to denote the constant terms. The subscripts in these coefficients are as follows: 1 denotes model version 1, $\varepsilon$ denotes that the coefficient relates to the solar wind coupling parameter and NS (or EW) relates to that specific velocity component. For a given month, for each velocity component and for each of the 559 EOF model analysis location bins, we perform a linear regression analysis of the 5-min velocity data with respect to ε, as indicated by Equations 4a and 4b.
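A per-bin fit of this kind can be sketched with an ordinary least-squares solve that handles all spatial bins at once. This is an illustration using NumPy rather than the IDL routine named in the text; the function name `fit_v1` is hypothetical, and the driver is assumed to have been rescaled to values of order unity (for example ε in units of 10^11 W) purely to keep the least-squares problem well conditioned.

```python
import numpy as np

def fit_v1(eps_scaled, velocity):
    """Least-squares slope and intercept of velocity against epsilon.

    eps_scaled : (n_epochs,) driver, rescaled to order unity (assumption)
    velocity   : (n_epochs, n_bins) one velocity component for one month
    Returns two (n_bins,) arrays: slopes and constant terms.
    """
    eps_scaled = np.asarray(eps_scaled, dtype=float)
    velocity = np.asarray(velocity, dtype=float)
    ok = np.isfinite(eps_scaled)              # omit epochs without solar wind data
    design = np.column_stack([eps_scaled[ok], np.ones(ok.sum())])
    coef, *_ = np.linalg.lstsq(design, velocity[ok], rcond=None)
    return coef[0], coef[1]
```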

Version 2 Model
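The second version of the hindcast model relates the NS and EW components of the Shore EOF model velocity to ε and IMF $B_y$ via:

$$v_{NS} = \alpha_{2\varepsilon NS}\,\varepsilon + \alpha_{2yNS}\,B_y + \beta_{2NS}, \tag{5a}$$

$$v_{EW} = \alpha_{2\varepsilon EW}\,\varepsilon + \alpha_{2yEW}\,B_y + \beta_{2EW}, \tag{5b}$$

where the subscript $y$ in the coefficients denotes the IMF component $B_y$. We again perform the regression analysis for each velocity component, each location bin and for each of the 144 months.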
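The two-driver regression and its use for prediction might be sketched as below. The names `fit_v2` and `predict_v2` are hypothetical, NumPy least squares stands in for the IDL routine used in this work, and the drivers are assumed to be scaled to comparable magnitudes. The `southern` switch simply applies the sign reversal of IMF $B_y$ suggested in the introduction for southern hemisphere use.

```python
import numpy as np

def fit_v2(eps, by, velocity):
    """Multiple linear regression of velocity onto epsilon and IMF By.

    Returns an array of shape (3, n_bins): epsilon slope, By slope, constant.
    """
    eps = np.asarray(eps, dtype=float)
    by = np.asarray(by, dtype=float)
    velocity = np.asarray(velocity, dtype=float)
    ok = np.isfinite(eps) & np.isfinite(by)
    design = np.column_stack([eps[ok], by[ok], np.ones(ok.sum())])
    coef, *_ = np.linalg.lstsq(design, velocity[ok], rcond=None)
    return coef

def predict_v2(coef, eps, by, southern=False):
    """Model velocities of shape (n_epochs, n_bins) from the fitted coefficients."""
    eps = np.atleast_1d(np.asarray(eps, dtype=float))
    by = np.atleast_1d(np.asarray(by, dtype=float))
    if southern:
        by = -by   # same coefficients, opposite sign of By
    return np.outer(eps, coef[0]) + np.outer(by, coef[1]) + coef[2]
```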

Version 3 (Lam 2023) Model
The v3 model is obtained by regressing each of the three v2 model coefficients onto trigonometric functions of day-of-year and the monthly $F_{10.7}$. For the NS velocity component:

$$\alpha_{2\varepsilon NS} = \alpha^{(\varepsilon)}_{s3NS}\sin\phi + \alpha^{(\varepsilon)}_{c3NS}\cos\phi + \alpha^{(\varepsilon)}_{F3NS}\,F_{10.7} + \beta^{(\varepsilon)}_{3NS}, \tag{6a}$$

$$\alpha_{2yNS} = \alpha^{(y)}_{s3NS}\sin\phi + \alpha^{(y)}_{c3NS}\cos\phi + \alpha^{(y)}_{F3NS}\,F_{10.7} + \beta^{(y)}_{3NS}, \tag{6b}$$

$$\beta_{2NS} = \alpha^{(0)}_{s3NS}\sin\phi + \alpha^{(0)}_{c3NS}\cos\phi + \alpha^{(0)}_{F3NS}\,F_{10.7} + \beta^{(0)}_{3NS}, \tag{6c}$$

where $\phi = 2\pi(d_m - 79)/365.25$ and $d_m$ is the day-of-year in the middle of each of the 12 months of the year ($m$ = 1 to 12). Following Shore et al. (2019), we assume that the length of a year is 365.25 days and place the zero of the sine function at the Spring equinox, and we use day-of-year 79 (20 March) to represent the vernal equinox (Coxon et al., 2016). The regression analysis on the coefficient relating to the slope of ε in the v2 model, that is, Equation 6a, yields four coefficients: $\alpha^{(\varepsilon)}_{s3NS}$, $\alpha^{(\varepsilon)}_{c3NS}$, and $\alpha^{(\varepsilon)}_{F3NS}$ are the slopes and $\beta^{(\varepsilon)}_{3NS}$ is the intercept term. This is also the case for the slope in $B_y$ (Equation 6b) and the intercept term in the v2 model (Equation 6c), resulting in 12 coefficients for the NS regression analysis. There is an equivalent set of equations for the EW velocity component (not given here), which means that there are 24 regression coefficients that define the v3 model. These model coefficients (Lam et al., 2023) could be used to forecast by specifying the day-of-year in the middle of the current month (or the actual DOY) and the current monthly mean of $F_{10.7}$. Equations 6a-6c then define values of the three coefficients of the v2 NS model for each of the 559 spatial bins. The same process is used to find the values of the three coefficients for the EW component of the v2 model for each of the 559 spatial bins. Equations 5a and 5b require values of ε and IMF $B_y$ to produce the associated plasma velocities. Real-Time Solar Wind (RTSW) data from spacecraft located upstream of the Earth, typically orbiting the L1 Lagrange point, can provide the values needed to forecast Earth's ionospheric electric field, as will be discussed in Section 5.
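The second-stage regression described above can be sketched as follows for one v2 coefficient. The seasonal phase uses the day-of-year offset of 79 and the 365.25-day year stated in the text; the function names `seasonal_phase`, `fit_v3` and `eval_v3`, and the ordering of the returned rows (sine, cosine, $F_{10.7}$, intercept), are assumptions of this sketch, and NumPy least squares again stands in for the IDL routine.

```python
import numpy as np

def seasonal_phase(day_of_year):
    """Phase with its zero at the vernal equinox (day-of-year 79)."""
    return 2.0 * np.pi * (np.asarray(day_of_year, dtype=float) - 79.0) / 365.25

def fit_v3(coef_v2, doy_mid, f107_monthly):
    """Regress one v2 coefficient onto sin, cos, F10.7 and a constant.

    coef_v2 : (n_months, n_bins) monthly values of a single v2 coefficient
    Returns an array of shape (4, n_bins).
    """
    phi = seasonal_phase(doy_mid)
    design = np.column_stack([np.sin(phi), np.cos(phi),
                              np.asarray(f107_monthly, dtype=float),
                              np.ones_like(phi)])
    k, *_ = np.linalg.lstsq(design, np.asarray(coef_v2, dtype=float), rcond=None)
    return k

def eval_v3(k, day_of_year, f107):
    """Rebuild the v2 coefficient for a given day-of-year and F10.7."""
    phi = seasonal_phase(day_of_year)
    return k[0] * np.sin(phi) + k[1] * np.cos(phi) + k[2] * f107 + k[3]
```

Applying `fit_v3` to the ε slope, the $B_y$ slope and the constant term gives 3 × 4 = 12 coefficients per spatial bin for one velocity component, and 24 for both components, matching the count given in the text.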

Results

Version 1 Model
We use the regression coefficients found from Equations 4a and 4b to determine the v1 model convection velocities. We assess how well the v1 model recreates the Shore EOF model analysis of the SuperDARN data by calculating the regional values of the percentage $P(i)$ using Equation 3. Figures 2a and 2b show the occurrence distribution of $P(i)$ for the two velocity components for February 2001 (dashed line). The shape of the distribution has a clear peak value (the monthly mode value), which is typical of the distributions observed for any of the 144 months.

Version 2 Model
As discussed in the introduction, there are reasons why the model would be improved by adding an explicit IMF $B_y$ dependence to the convection velocity. Having obtained the regression coefficients for the v2 model, we find that the percentage $P(i)$ for the polar region has increased from 0.40 (v1) to 0.80 (v2) for the NS plasma velocity component, and from 0.60 (v1) to 0.85 (v2) for the EW component. The v2 formulation has markedly improved the percentage of explained variance for both velocity components compared with those of the v1 model. This improvement, resulting from the addition of IMF $B_y$, is seen for the whole 12-year period (Figures 2c and 2d). The seasonal variation in the monthly mode value of the percentage $P(i)$ is no longer apparent in the v2 model results, and the sinusoidal-like solar-cycle-related dependence is much reduced in magnitude. The characteristic value of the percentage $P(i)$ for the v2 model, $\chi_{v2}$ = 0.90, is also an improvement over that of the v1 model ($\chi_{v1}$ = 0.70). This indicates that the v2 model is a high-quality hindcast model that is good enough to base a forecast model on. 2021)) relates to the flow driven by variability in IMF $B_y$ after reconnection and is very similar to the Disturbance-Polar Type Y in the magnetic field (Friis-Christensen & Wilhjelm, 1975). Using the v2 model coefficients to build a model that could be used for operational purposes has the advantage that the v2 model velocities can be estimated from only two solar wind quantities, namely the IMF and the solar wind velocity.

Version 3 (Lam 2023) Model
The high-latitude ionospheric convection velocity is influenced by solar cycle and seasonal variations, such as those in the ionospheric plasma conductivity. Hence, we formulate the final version of the model using a regression analysis of v2 model regression coefficients with respect to a trigonometric function of the day-of-year, the monthly value of $F_{10.7}$, and an intercept (Equations 6a-6c). $F_{10.7}$ is more than a conductivity marker: it tracks the solar cycle and so its inclusion may allow the representation of other solar-cycle-related variations. The maps of the regression coefficient values in the polar region (Figures 4 and 5) show a high degree of coherent spatial structure. If the maps lacked coherent spatial structure, then we might question whether the variables in the v3 model are reasonable choices, or whether there are sufficient data in the analysis. The Lam 2023 (v3) model is visibly less skilled (Figures 6a and 6b) than the v2 hindcast model at reproducing the Shore EOF model patterns, but this is understandable given that the v2 model is based on EOF patterns for the month in question, whereas the Lam (v3) model is not. The Lam model has an overall characteristic value for the percentage $P(i)$ of $\chi_{v3}$ = 0.70 and is particularly skilled for the years 2000-2004 inclusive (for which $\chi_{v3}$ = 0.75), as will be discussed further in Section 5.
We represent the convection velocity visually by taking the velocity value for a particular spatial bin and assuming a packet of plasma travels on the velocity trajectory for a characteristic time related to the resolution of the measurement. The end position of the packet is found using spherical trigonometry and scaled for ease of viewing. We present an example of how well the Lam 2023 (v3) model reproduces the Shore EOF model velocity field at a time when the percentage $P(i)$ has a high value (Figures 6a and 6b). One of the lowest values in the monthly mode of the percentage $P(i)$ occurs for June 1999. We again present an example of how well the Lam 2023 (v3) model reproduces the Shore EOF model velocity field, but for a time when the percentage $P(i)$ has a low value. On 18 June 1999 at 07:17 UT, $P(i)$ = 0.40 in both the NS and EW components. The Lam (v3) model (Figure 7d) velocity field possesses a clearer two-cell structure than the Shore EOF model data (Figure 7c), especially around noon. The Lam model has also replaced the anomalous vectors (those not of similar size and/or direction to the surrounding flow) at lower latitudes in the Shore SuperDARN EOF model velocity field. Since these values are likely to be poor data (Shore et al., 2021), the Lam model values may be more accurate than these specific Shore EOF model values. This means that the Lam (v3) model has some usefulness in its own right as a data set (also see Section 5).
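The plotting convention described above, in which a plasma packet is advanced along its velocity for a characteristic time, can be illustrated with the standard great-circle destination formula. The sketch below is not the plotting code used for the figures: the 300 s travel time, the sphere radius of 6,671.2 km (mean Earth radius plus an assumed 300 km altitude), the `scale` factor and the function name `packet_end_position` are all assumptions chosen for illustration.

```python
import math

def packet_end_position(lat_deg, lon_deg, v_north, v_east,
                        dt_s=300.0, scale=1.0, radius_km=6671.2):
    """End point (lat, lon in degrees) of a packet moving along a great circle.

    v_north, v_east are in m/s; scale stretches the vector for display.
    """
    speed = math.hypot(v_north, v_east)
    delta = scale * speed * dt_s / (radius_km * 1.0e3)   # angular distance (rad)
    bearing = math.atan2(v_east, v_north)                # clockwise from north
    lat1 = math.radians(lat_deg)
    lon1 = math.radians(lon_deg)
    lat2 = math.asin(math.sin(lat1) * math.cos(delta)
                     + math.cos(lat1) * math.sin(delta) * math.cos(bearing))
    lon2 = lon1 + math.atan2(math.sin(bearing) * math.sin(delta) * math.cos(lat1),
                             math.cos(delta) - math.sin(lat1) * math.sin(lat2))
    return math.degrees(lat2), math.degrees(lon2)
```

A purely poleward velocity moves the packet along its meridian, whereas a purely eastward velocity follows a great circle and therefore ends at a slightly lower latitude than it started.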

Summary of Model Results
The v1 regression model, based on the epsilon parameter (Akasofu, 1979), has some ability to reproduce the original Shore EOF model patterns, and possesses both a seasonal and a solar-cycle-related sinusoidal-like structure in the monthly mode of the percentage of explained variance $P(i)$. We defined a "characteristic value" for $P(i)$ as the mean value, $\chi_{v1}$, of the monthly mode value of $P(i)$ over the whole 12-year period and found $\chi_{v1}$ = 0.70. The v2 regression model is better at reproducing the Shore EOF model patterns (Figures 2c and 2d), due to the addition of IMF $B_y$.
Compared to the v1 model, the seasonal structure in the monthly v2 mode of $P(i)$ is no longer apparent, the solar-cycle variation has been much reduced, and the v2 model characteristic percentage $P(i)$ is significantly larger ($\chi_{v2}$ = 0.90). The Lam 2023 (v3) model has a lower value for $P(i)$ than the v2 hindcast model and the same value as the v1 hindcast model ($\chi_{v3}$ = 0.70) but, importantly, it has the potential to be used to forecast ionospheric convection or to nowcast outside of the epoch of the Shore EOF model analysis, as outlined in the next section.

Discussion
We have developed a model of high latitude ionospheric convection that is both skilled at reproducing its parent data set and suitable for operational purposes, as part of the SWIMMR-T project. It is worth briefly speculating here on the causes of the drops in the percentage of explained variance $P(i)$ that we see in the 12-year interval examined, both for scientific interest and to aid the development of future improved versions of the model.

First, one potential cause of reductions in the percentage $P(i)$ is a decrease in the amount of SuperDARN data used to produce the Shore EOF model velocity patterns for a particular month. The data coverage within the SuperDARN archive has gaps, both spatially and temporally. The SuperDARN data coverage varies with time of day, year, and solar cycle. This is due to the variation in the high-frequency (HF) propagation conditions, the level of irregularity occurrence, and the dependence of solar wind-magnetosphere coupling on solar cycle phase. In addition, since the radars only measure signals along the line-of-sight direction, the accuracy of the velocity vectors will vary with location and time, causing variations in the degree to which the EOF analysis of the data set is able to represent the plasma motion in any given region. At low latitudes, data gaps have been filled by a sinusoidal fitting procedure (Shore et al., 2021). When the data coverage is very low, the fitted north and east directions can be unrepresentative of the true values, resulting in high-error vectors at lower latitudes in the Shore EOF model values (e.g., Figure 7c) that do not resemble the surrounding flow. We examine the binned monthly SuperDARN data coverage for the Shore EOF model patterns, where full coverage for a given location bin is defined as having a valid data point in all the 6°-wide look direction bins and each of the 5-min epochs for the whole month. The monthly Shore EOF model analysis radar count is the number of data points in that month when summed over the direction bins, 5-min epochs and location bins. We investigate here whether there is any relationship between the amount of SuperDARN data in a particular month and the value of the percentage of explained variance $P(i)$ for the Lam 2023 (v3) model. The monthly Shore EOF model data count (Figure 6c) is mostly below ∼$1.5 \times 10^6$ from the start of the data set (January 1997) until September 2000.
Between September 2000 and April 2003, the data count oscillates and is often above ∼$1.5 \times 10^6$ but is mostly below this value for the remainder of the 12-year interval examined. Therefore, there is a gross correspondence between the SuperDARN data coverage (Figure 6c) and the value of the percentage $P(i)$ for the Lam (v3) model (Figures 6a and 6b). However, there is no indication that there is a one-to-one correspondence on a monthly timescale. This is consistent with values of the correlation (Pearson coefficient) between the mean data count and the Lam 2023 (v3) model value for the percentage $P(i)$, which are 0.24 for both the NS and EW velocity components.
It is undisputed that the reduced SuperDARN data coverage will affect the quality of the Shore EOF model patterns and therefore the percentage of explained variance $P(i)$ of the model developed, but it does not seem to be the only factor. If the quality of the Shore SuperDARN EOF model values is generally high for the 12-year interval, then the quality of the Lam (v3) model has the potential to be good, and indeed appears to be good, according to the characteristic value for the percentage $P(i)$. Therefore, it is possible that at times when the percentage $P(i)$ is low, the Lam (v3) model may give a better idea of the plasma motion when the SuperDARN coverage is low and the Shore EOF model analysis values are unreliable, especially around solar minimum. Future comparisons with an independent data set should resolve this.
A second potential cause for a fall in the percentage of variance explained by the model is a decrease in how well the physical processes driving the convection velocity are being represented. The Lam (v3) model exhibits two intervals of lower and variable values for the percentage $P(i)$. The first spans 1997 to 1999 inclusive, which are the three years just after the solar minimum of August 1996. The second (2005 to 2008 inclusive) comprises the four years leading up to the solar minimum of December 2008. In contrast, the Lam (v3) model reproduces the parent data set consistently well in the months at the end of 1999 and for the interval 2000 to 2004 inclusive, that is, the two years before and the three years after the solar maximum of November 2001. It is possible that our model of solar wind-magnetosphere coupling is more accurate around solar maximum and less accurate around solar minimum, due to the differing nature of the solar wind at different phases of the solar cycle (e.g., McComas et al., 2003) and the resultant differing geomagnetic effects (e.g., Tsurutani et al., 2006). We do not aim to prove that this is the case here but propose this as a hypothesis for further study. Although coupling functions should be used with care (Lockwood, 2022) and other more recent coupling functions are available (Lockwood & McWilliams, 2021), the Lam (v3) model has a respectable level of skill and should prove to be a valuable tool for operational purposes, due to its simplicity.
We anticipate that the Lam (v3) model may be improved by a change of the formulation, including additions to the parameters currently used. First, a study of the spatial dependence of the appropriate time lag, as has been done for the terrestrial magnetic field (Shore et al., 2019), would allow the model to be adjusted to have a location-dependent time lag. Second, an increase in the percentage of explained variance of the model may result from developing the v2 model from the preceding month's data and using it to forecast the coming month. Such a model would require continuous determination of monthly EOF model values, which in turn would require all SuperDARN data to be available on a near-real-time basis. This may be possible in the future. Finally, the movement of plasma in the ionosphere is ordered by the open-closed magnetic field line boundary (OCB) (e.g., Milan & Grocott, 2021). The inclusion of the OCB location and motion in the model should decrease errors that would otherwise arise when the OCB motion is fast and/or significant. We would welcome the production of a new version of the SuperDARN EOF model analysis values at higher spatial and temporal resolution, as we believe that this may prove to be a useful improvement to the data.
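As an illustration of how a location-dependent time lag might be explored, the following minimal sketch scans candidate lags at a single location and picks the one that maximizes the absolute correlation between a lagged driver and the local velocity. The function name `best_lag`, the use of Pearson correlation as the selection criterion, and the lag range are assumptions made for illustration; they are not the method of Shore et al. (2019) or of the model described here.

```python
import numpy as np

def best_lag(driver, response, max_lag):
    """Return the lag (in samples) in 0..max_lag that maximizes the absolute
    Pearson correlation between driver[t - lag] and response[t].

    Hypothetical helper: at 5-min resolution, a lag of 4 samples is 20 min.
    """
    driver = np.asarray(driver, dtype=float)
    response = np.asarray(response, dtype=float)
    n = len(driver)
    scores = []
    for lag in range(max_lag + 1):
        d = driver[: n - lag]   # driver values at time t - lag
        r = response[lag:]      # response values at time t
        scores.append(abs(np.corrcoef(d, r)[0, 1]))
    return int(np.argmax(scores))
```

Applied independently to each spatial location, a scan of this kind would yield a map of lags that could replace a single universal value, although whether correlation is the right criterion would need to be tested.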
Our next challenges are to drive the model using RTSW data rather than the quality-checked OMNI 1-min data set used here, and to implement our model within a GCM such as TIEGCM (Qian et al., 2014). In practice, the latter will involve generating electric potential values from our model velocity values, since this is how the solar wind influence on the high-latitude ionospheric plasma is input into TIEGCM. The U.S. National Oceanic and Atmospheric Administration (NOAA) Space Weather Prediction Center (SWPC) data service provides real-time measurements from L1 (presently DSCOVR is the active satellite, with the Advanced Composition Explorer (ACE) also contributing). These will be provided in real time by a combination of satellites monitoring near-Earth space, and forecast models, and so may have quality and data gap issues. Because L1 lies upstream of the Earth, we have an opportunity to forecast the ionospheric potential that will be imposed on the Earth's ionosphere in advance, by a time interval T_adv. Observations of interplanetary shocks by the ACE spacecraft at L1 and associated sudden commencements in the magnetosphere can give an estimate of the propagation delay between L1 and the Earth. For instance, Baumann and McCloskey (2021) estimate that the delay varies between ∼20 min for extremely fast ICMEs (1,000 km/s) and ∼90 min for slow shocks (300 km/s). If we assume that there is also ∼20 min delay for the ionosphere to respond, then we can estimate the ionospheric potential in advance by a time interval T_adv ≥ 40 min, using the solar wind variables observed at L1 and currently provided by the NOAA SWPC (https://www.swpc.noaa.gov/products/real-time-solar-wind). The term RTSW refers to data from any spacecraft located upwind of Earth (typically orbiting the L1 Lagrange point) that is being tracked by the RTSW Network of tracking stations. Since we can trivially specify the day-of-year and only need the rolling monthly value of the F10.7 index to use our model, all the variables needed are already available to use the Lam (v3) model to forecast the ionospheric electric field. Our ultimate goal is to quantify the model's effectiveness in estimating satellite drag due to Joule heating in comparison to the existing climatological electric field models such as Weimer (1995, 1996, 2001, 2005) and Heelis et al. (1982).
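The lead-time reasoning above can be sketched with a simple constant-speed (ballistic) estimate. This is a simplification and not the method of the cited shock study: the Earth-L1 distance of 1.5 million km is an assumed round figure, the function names are hypothetical, and only the ∼20 min ionospheric response delay is taken from the text.

```python
L1_DISTANCE_KM = 1.5e6      # assumed approximate Earth-L1 distance
IONOSPHERE_LAG_MIN = 20.0   # ionospheric response delay assumed in the text

def propagation_delay_min(speed_km_s, distance_km=L1_DISTANCE_KM):
    """Ballistic travel time from L1 to Earth, in minutes."""
    if speed_km_s <= 0:
        raise ValueError("solar wind speed must be positive")
    return distance_km / speed_km_s / 60.0

def forecast_lead_min(speed_km_s):
    """Estimated T_adv: propagation delay plus ionospheric response delay."""
    return propagation_delay_min(speed_km_s) + IONOSPHERE_LAG_MIN

if __name__ == "__main__":
    for speed in (300.0, 450.0, 1000.0):
        print(f"{speed:6.0f} km/s -> T_adv ~ {forecast_lead_min(speed):.0f} min")
```

Under these assumptions the propagation delay is 25 min at 1,000 km/s and about 83 min at 300 km/s, broadly in line with the ∼20 to ∼90 min range quoted above, so the estimated lead time remains above 40 min even for the fastest case.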

Conclusions
We present the Lam (2023) model of the northern hemisphere high-latitude ionospheric convection velocity (and by extension the convection electric field), suitable for future development as an operational forecast model. It has been developed from 12 years (approximately a solar cycle's worth) of SuperDARN HF radar EOF model analysis patterns. The model has been developed in three key stages. First, we created a linear hindcast model of plasma velocity driven by the epsilon parameter ε, with a universal time lag between the driver and ionospheric plasma at all polar region locations of 20 min. Second, we expanded the hindcast model to include IMF By, which generally resulted in a significant improvement in the monthly value of the percentage of explained variance during the solar cycle examined. Finally, a regression analysis of the seasonal and solar cycle dependence of the hindcast model coefficients was used to build a final version of the model. This final model (the Lam 2023 model) was driven by day-of-year, the monthly value of F10.7, IMF By, and ε. Therefore, forecasts of the ionospheric plasma convection velocity could be obtained from the observed rolling monthly average of F10.7 and currently available real-time values of IMF By and ε at the L1 point upstream of Earth.
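The first two stages summarized above can be illustrated with a minimal sketch of a lagged linear regression at one location, together with a measure of explained variance. The linear form v(t) = a + b·ε(t − lag) + c·By(t − lag), the function names, and the definition P = 1 − Var(residual)/Var(data) are assumptions made for illustration and may differ from the published formulation; the third stage (seasonal and solar cycle dependence of the coefficients) is not shown.

```python
import numpy as np

LAG_SAMPLES = 4  # 20 min at 5-min resolution, as stated in the text

def fit_lagged_model(v, eps, by, lag=LAG_SAMPLES):
    """Least-squares fit of v(t) = a + b*eps(t - lag) + c*by(t - lag).

    Returns the coefficients [a, b, c], the design matrix X and the
    aligned velocity samples y, so that X @ coef is the hindcast.
    """
    v, eps, by = (np.asarray(x, dtype=float) for x in (v, eps, by))
    n = len(v)
    X = np.column_stack([np.ones(n - lag), eps[: n - lag], by[: n - lag]])
    y = v[lag:]
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef, X, y

def explained_variance(y, y_model):
    """Assumed definition of P: fraction of the data variance reproduced."""
    return 1.0 - np.var(y - y_model) / np.var(y)
```

Fitting once with the By column omitted and once with it included, then comparing the two values of P month by month, would mirror the comparison between the first and second stages described above.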
When the Lam 2023 model was assessed to see how well it could reproduce the Shore EOF model analysis patterns (the parent data set), it exhibited a respectable level of skill, and hence it could be a valuable tool for operational purposes due to its simplicity. In this paper, however, we have only verified our model by ensuring that the formulation is consistent with the data set upon which it was constructed. The validation of the model against other independent data sets provided, for instance, by satellite or incoherent scatter radar remains a necessary outstanding task. Although the data coverage of the parent data set from which the model is developed (the Shore et al. (2021) SuperDARN velocity EOF model analysis) will influence the ability of the Lam 2023 model to reproduce the parent data set, it is also possible that the assumed form of the regression equations is more accurate around solar maximum and less so around solar minimum. If so, the regression equations can be improved in future versions of the model.

Data Availability Statement
We used NASA/GSFC's Space Physics Data Facility's OMNIWeb service (https://omniweb.gsfc.nasa.gov) and OMNI data, provided by the Goddard Space Flight Center (GSFC) at the National Aeronautics and Space Administration (NASA), to access the solar wind data such as IMF and velocity (N. E. Papitashvili & King, 2020a). The same service was used to access the solar radio flux at 10.7 cm/2800 MHz, which was obtained from the Low Resolution OMNI (LRO) data set (https://omniweb.gsfc.nasa.gov/ow.html; N. E. Papitashvili & King, 2020b). The British Antarctic Survey (BAS) EOF model analysis of the SuperDARN plasma velocity data and supporting software (https://doi.org/10.5285/2b9f0e9f-34ec-4467-9e02-abc771070cd9), and also the peer-reviewed description of its derivation (Shore et al., 2021), have all been published. The regression coefficients for the models presented in