Machine Learning‐Based Emulator for the Physics‐Based Simulation of Auroral Current System

Using a machine learning technique called echo state network (ESN), we have developed an emulator to model the physics‐based global magnetohydrodynamic simulation results of REPPU (REProduce Plasma Universe) code. The inputs are the solar wind time series with date and time, and the outputs are the time series of the ionospheric auroral current system in the form of two‐dimensional (2D) patterns of field‐aligned current, potential, and conductivity. We mediated a principal component analysis for a dimensionality reduction of the 2D map time series. In this study, we report the latest upgraded Surrogate Model for REPPU Auroral Ionosphere version 2 (SMRAI2) with significantly improved resolutions in time and space (5 min in time, ∼1° in latitude, and 4.5° in longitude), where the dipole tilt angle is also newly added as one of the input parameters to reproduce the seasonal dependence. The fundamental dependencies of the steady‐state potential and field‐aligned current patterns on the interplanetary magnetic field directions are consistent with those obtained from empirical models. Further, we show that the ESN‐based emulator can output the AE index so that we can evaluate the performance of the dynamically changing results, comparing with the observed AE index. Since the ESN‐based emulator runs a million times faster than the REPPU simulation, it is promising that we can utilize the emulator for the real‐time space weather forecast of the auroral current system as well as to obtain large‐number ensembles to achieve future data assimilation‐based forecast.


Introduction
Forecasting the auroral current system in the polar regions has been one of the core parts of the operational space weather forecast because the auroral current system is the origin of the enhanced satellite drag via the Joule heat in the thermosphere.In recent years, need for forecasting has been especially growing, and the spacecraft operations are getting more sensitive along the heavy utilization of the low-Earth orbit.For example, it was remarkable that as many as 38 commercial satellites lost at the same time during moderate storms in February 2022 (e.g., Kataoka et al., 2022).The auroral current system, including auroral electrojet activities as known by the AE index, has been of fundamental importance for other various space weather aspects, including geomagnetically induced currents (GIC) flowing along the ground-based infrastructures (e.g., Kataoka & Ngwira, 2016), and satellite charging and communications malfunctions.
There is a long history of conducting physics-based simulations to understand the variable polar ionosphere (Lyon et al., 1980;Ogino, 1986).Because of the nonlinear nature of the spatially complex evolution of auroral ovals and the magnetospheric plasma flows as driven by the time-varying solar wind conditions, a global magnetohydrodynamic (MHD) simulation with the input of the solar wind parameters is necessary to reproduce the resultant auroral current system, as depicted by the ionospheric conductivities, potential, and field-aligned currents.Among many sophisticated MHD simulations, REPPU (REProduce Plasma Universe) performs as one of the best models for resolving various space weather phenomena including auroral substorms (Ebihara & Tanaka, 2015a, 2015b;Tanaka, Watanabe, et al., 2022;Tanaka et al., 2017Tanaka et al., , 2018)).However, the major difficulty of REPPU and other simulation codes for the operational space weather forecast is that it is computationally expensive to solve the MHD equations, even using the designated cluster computers.
This study shows that the latest development in machine learning techniques can help solve this cost-efficient issue.The very initial approach of such an emulator version 1.0 was proposed by Kataoka et al. (2023), using the time-dependent machine learning model called echo state network (ESN).In this study, we conducted a major upgrade of the ESN-based emulator by training the emulator model using an order of magnitude larger amount of the REPPU simulation outputs from that of ver1.0, as conducted by the long-term simulation runs (Nakamizo & Kubota, 2021) under the space weather forecast operations at National Institute of Information and Communications Technology (NICT).
In Section 2 we describe the REPPU simulation code and explain the technical details of the machine-learning model, especially focusing on how to emulate the REPPU simulation's ionospheric outputs.In Section 3, we show the primary results of the new emulator model.In Section 4, we discuss the performance and the limitation.Concluding remarks are briefly summarized in Section 5.

Magnetohydrodynamic Simulation Code: REPPU
REPPU is an MHD simulation code developed for studying the global magnetosphere-ionosphere coupling (Tanaka, 1995(Tanaka, , 2015)).The REPPU code is characterized by an excellent ionospheric reproduction of fundamental auroral phenomena such as substorms (Ebihara & Tanaka, 2015a, 2015b), sun-aligned arcs (Tanaka et al., 2017), and the theta aurora (Tanaka et al., 2018).In this study, we used an improved REPPU simulation code (Nakamizo & Kubota, 2021), including the effects of a tilted dipole axis and seasonal changes of solar zenith angles.The total number of grid cells in the magnetosphere is 30,722 (horizontal) × 240 (vertical), where the unstructured grid system (Moriguchi et al., 2008;Nakamizo et al., 2009) is employed.The number of grid cells in the ionosphere is 30,722.In this study, for the training and testing data, we took only the northern polar ionosphere, that is, 30 × 80 pixels in latitude and longitude, after applying the 2 × 4 binning in latitude and longitude.The ionospheric outputs of the field-aligned current J//, conductivities Σ xx (north-south), Σ xy (off-diagonal), Σ yy (east-west), and ionospheric potential Φ are saved every min, where the current continuity equation at the two-dimensional height-integrated ionosphere (x: north-south, y: east-west) is satisfied as: The interplanetary magnetic field (IMF) Bx, By, and Bz are defined in the Geocentric Solar Magnetospheric coordinate system.The real-time solar wind data (IMF Bx, By, Bz, solar wind speed V, proton density Np, and temperature Tp) at 1 min resolution was linearly interpolated if there was a data gap and used as the input time series to run the REPPU simulations.The real-time solar wind data can differ from the finally calibrated solar wind data, such as OMNI data set.Nevertheless, it is essentially little problem for the machine-learning model to learn the REPPU simulation results for variable input patterns.
NICT team has been operating the real-time simulation with the improved REPPU code for the space weather forecast (Nakamizo & Kubota, 2021).The REPPU simulation has been running on the High-Performance Computing System at NICT since August 2020.The simulation-run basically works automatically.Still, it is sometimes manually stopped and restarted due to some failures of the computing system, such as the system maintenances and failures of the simulation.The saved results are, therefore, not necessarily continuous.
In this study, we selected major interplanetary shock events and other large-amplitude events since 2021, including both predominantly southward and predominantly northward IMF conditions to include both storm-time and non-storm-time, respectively, as shown in Table 1.We also selected the longterm non-stop runs from December 2020 to January 2021 to compensate for the winter-time training data.Another long-term results from June-July 2021 is also prepared as the testing time interval.

Machine-Learning Model: Echo State Network
The basic flow of the development of Surrogate Model for REPPU Auroral Ionosphere version 2 (SMRAI2) and the relationship of REPPU simulation and ESN model is graphically summarized in Figure 1.Firstly, we adopted the dimensionality reduction for the ionospheric outputs as obtained from REPPU simulations, by applying the principal component analysis (PCA) (e.g., Halko et al., 2011) using the Python 3 scikit-learn/pca (Pedregosa et al., 2011).Very similar method was used by Licata and Mehta (2023) for different purpose (thermosphere model emulator).The time series of each parameter z = {Σ xy , Φ, or J//}, at certain (latitude, longitude) position of the grid indices (i, j), can be represented by the time averaged spatial pattern z 0 and the linear combination of time-dependent PCA variables α and PCA component patterns U as follows: (5) In this study, we independently constructed the emulators for J//, Σ xy , and Φ maps.The numbers of PCA components N r are selected to be 10 for Σ xy and Φ, and 20 for J//to reconstruct >90% variance of the original features.
The ESN model used in this study is described by the reservoir state vector x and the model output vector y at time steps t = n + 1 as follows: ( + 1) =  out ( + 1). (7) Here, the weight matrices W in and W are multiplied by the input vector u (the solar wind time series) and the reservoir state vector x, respectively.We create the random and sparse node connections of W in and W, where only 10% of the matrix elements are random values between −1.0 and 1.0, and the remaining 90% are zero.The weight matrices W in and W are fixed, while only W out is trained by the ridge regression with the regularization parameter β = 10 −3 to minimize the objective function F, where d is a desired data vector consisting of the time series of the PCA variables of J//, Σ xy , and Φ.
We used the time series of the solar wind parameters as the input.As the input vectors u, the solar wind speed and density are normalized as log 10 V-2.5, and log 10 Np-1.0, respectively, before training the ESN model because both the solar wind speed and density follow log-normal distributions (Burlaga & Lazarus, 2000).We discarded the proton temperature from the input vector.The IMF By and Bz components are also used as the input parameters.Further, the dipole tilt angle is newly introduced as the input to adopt the model for all seasons.The dipole tilt angle (Hapgood, 1992;Tsyganenko & Sitnov, 2005) is calculated from the date and time by Python 3 pyspedas/geopack (https://github.com/tsssss/geopack).
The emulator was trained by 107-day worth of outputs (30,816 time steps) of REPPU simulation results.The testing data is 52 days, including both quiet and active months.The selection of training data and testing data was summarized in Table 1.The basic specifications of ESN-based emulators ver1.0 and ver2.0 are summarized in Table 2.
We optimized the number of the nodes (elements of x) to be 400, 250, and 300 for J//, Φ, and Σ xy , respectively, and the spectral radius (maximum eigenvalue of W) to be 0.99 for all J//, Φ, and Σ xy , by finding the minimum values of the normalized root-mean-square errors (NRMSE) using the testing data for the first PCA variables.From these results, the constructed emulator model has NRMSE of ∼0.7, ∼0.5, and ∼0.8 to reconstruct the first PCA variables of J//, Φ, and Σ xy , respectively.These settings roughly give an emulator's time series memory of several to dozen time steps, that is, approximately an hour back in time.
In this study, we independently constructed the emulators for J//, Σxy, and Φ maps.However, the current continuity Equation 1 relates these parameters, and any inconsistencies among these parameters can therefore give hints to evaluate the deviations in the emulation results for future applications.
It takes less than 10 s for the emulator to calculate a 1-day variation of auroral current system using a single node.In contrast, it takes ∼5 days for the REPPU simulation to calculate the same 1-day variation using the 30-node cluster computer.Therefore, the computational cost of the SMRAI2 is approximately a million times more efficient than the original physics-based REPPU simulation.

Results
One of the major upgrades of SMRAI2 from the emulator ver1.0 (Kataoka et al., 2023) is the dipole tilt angle dependence by learning the simulation outputs from different seasons.From the steady state conditions for different tilt angles, Figure 2 shows that the trained model learned the tilt angle dependence of the Hall conductivity Σ xy .Notably, the dayside conductivity is high in the summer season, while the nightside conductivity is low in the summer.The obtained tendency of the nightside conductivity is consistent with the results of Newell et al. (2010).
Figure 3 shows the IMF clock angle dependence of the Region-1 and Region-2 field-aligned current system (Iijima & Potemra, 1978).The IMF clock angle is defined as the angle made in the By-Bz space, that is, atan(By/ Bz).We picked up the steady-state conditions of SMRAI2 results for each input parameter to make this figure.The overall IMF clock angle dependence and the amplitude of J//are reasonable, and consistent with empirical models such as Weimer (2001a) or Weimer (2005).Further, we can see the IMF By dependent cusp current system in the higher latitude region than the Region 1 currents (Fujii & Iijima, 1979), especially during the northward IMF conditions.(Weimer, 2001b) as shown in Figure 5. Comparing Figures 4 and 5, the IMF By dependent appearances of the crescent-and round-shaped cells are clearly captured.However, the amplitude of cross-polar cap potential is only ∼60% compared to the empirical models.Such an underestimating tendency is naturally expected, as we adopted the coarse-graining of ionospheric potential such as the binning and PCA analysis.We will come back to this point later.

Discussions
One way to examine the performance of the SMRAI2 using the open data is to calculate the AE index (https:// wdc.kugi.kyoto-u.ac.jp/aedir/index.html)from the emulator and compare it with the observed values.In this study, we calculate the AU/AL indices (AE = AU-AL) from the emulator results with the electric field as estimated by the spatial derivatives of Φ map using the central difference, where i and j are the indices of latitude and longitude, respectively, and the Δx (north-south) and Δy (east-west) are calculated from the colatitude θ, longitude φ, and the Earth radius R E as The ionospheric Hall current vectors are then calculated as We then applied the so-called equivalent current theorem (Fukushima, 1976;Maeda, 1955) where the eastwest component of the Hall current in the unit of A km −1 is nearly equal to the north-south component of the magnetic field at the ground in the unit of nT to calculate the AU/AL indices from the envelopes of the emulator outputs.For this AU/AL calculation, we use all grid data of Hall currents and we do not use the actual locations of 12 AE stations.The magnetic latitude range for the AU/AL calculation is selected from 60° to 70°.The resultant AU and AL indices are shown in Figure 6 for the 1-month time interval, using the 5-min OMNI solar wind data.The data gaps of the OMNI 5-min data were filled by the forward interpolation using the Python 3 pandas/fillna/ffill method (The Pandas Development Team, 2020).Figure 7 shows the 2D histogram for the 15-year results, indicating that the ESN-based emulator tends to underestimate the AE index.The cross-correlation coefficients between observed and emulated indices for the 15-year data are 0.592, 0.596, 0.666 for AU, AL, and AE indices, respectively.
Although the original REPPU simulation was tuned to have comparable AE values with the observed AE index (Nakamizo & Kubota, 2021), the obtained AE values from SMRAI2 are underestimated.There are multiple causes for this underestimation of the AE index.First, it is natural to expect that the coarse-graining of ionospheric  Since the AE index roughly represents the macroscopic energy release in the polar ionosphere, we can diagnose some hidden characteristics of the new SMRAI2 via inputting the synthetic solar wind data.We prepared the synthetic solar wind data to pick up the peak values of the predicted AE index during the 80 min time interval after the southward IMF turnings from the steady state of the northward IMF Bz = 1.0 nT, changing the IMF amplitude, solar wind speed, and density.Ebihara et al. (2019) showed, using the REPPU simulations, that the positive density dependence of the auroral electrojet intensity is clear during weakly southward IMF, while it is not likely the same during strongly southward IMF.Similarly complex tendency for the density appeared in the results from the SMRAI2, as shown in the right panel of Figure 8.In contrast to the density, the dependence of the AE peak intensity on the solar wind speed is relatively simple, as linearly correlating with the product of southward IMF Bz and solar wind speed V, which was seen in both Ebihara et al. (2019) as well as in the left panel of Figure 8.
Although it is improved from ver1.0 (Kataoka et al., 2023), the temporal resolution of 5 min still gives the major limitation of the SMRAI2.For example, we cannot discuss the highly dynamic phenomena such as the substorm onset and sudden commencement, in which all ionospheric parameters drastically evolve in a short time scale of less than 5 min.Those rapid variation can cause large-amplitude GIC events, which is one of the important targets of the operational space weather forecast.One of the future works, therefore, include improving of the temporal resolution to 1 min since the ESN method can be applied to diverse temporal scales (Tanaka, Matsumori, et al., 2022).Caveat should also be made here that it may not so simply work to solve the substorm-onset-related problems by improving the temporal resolution because there is an essential difficulty in reproducing the variation just before and after the substorm onsets, as coming from the probabilistic nature of the substorm onsets (Nakano & Kataoka, 2022;Nakano et al., 2023).
Therefore, another natural next step would be the data assimilation of the SMRAI2 emulator to correct the exact timing of the substorm onset and the amplitude via the observation data.The million-times faster SMRAI2 emulator has a significant advantage in this direction, compared to the physics-based simulation, because it is essen tial to increase the ensemble number necessary for data assimilation.For realizing the data assimilation-based forecast, it would be reasonable to use any partial data or point data which is available for real-time use via applying the cutting-edge data assimilation techniques (Nakano et al., 2020).
In this paper we have not checked the self-consistency among the SMARAI2 outputs of the potential, conductivity tensor, and field-aligned current.The future works therefore include the understanding in which situation the inconsistency can be large.The data assimilation approach will help to fix the lack of consistency.

Conclusions
We showed that SMRAI2 emulator model is ready-to-use for the real-time space weather forecast of the auroral current system for both the northern and southern hemispheres.We developed the latest upgraded version 2.0 of the ESN-based emulator for the REPPU simulation's ionospheric outputs of the field-aligned current, potential, and conductivity, which runs a million times faster than the REPPU code.The resolutions of the SMRAI2 are significantly improved in time, latitude, and longitude, compared to the ver1.0, and the dipole tilt angle is also newly introduced as one of the input parameters, in addition to IMF By, Bz, V, and Np, thanks to an order of magnitude larger training data set.We confirmed that the IMF clock-angle dependence of the auroral current system is consistent with that obtained from empirical models.New functions of the ESN-based emulator ver2.0 includes automatic OMNI solar wind data input and the AE index output by indicating the date only.High-Performance Computing System at NICT, which is operated with the support of "Promotion of observation and analysis of radio wave propagation," commissioned by the Ministry of Internal Affairs and Communications, Japan.The real-time solar wind data were provided by National Oceanic and Atmospheric Administration's Space Weather Prediction Center (NOAA/ SWPC).This research was supported by ROIS-DS-JOINT 012RP2023 grant from the Research Organization of Information and Systems (ROIS).In addition, this study is part of the Science Program of Japanese Antarctic Research Expedition (JARE) Prioritized Research Project AJ1007 (Space environmental changes and atmospheric response explored from the polar cap), supported by NIPR under MEXT.

Figure 4 Figure 2 .
Figure4shows the IMF clock angle dependence of the ionospheric potential, almost the same with the results from the emulator ver1.0(Kataoka et al., 2023), consistent with empirical models such as Weimer-2K model

Figure 3 .
Figure 3.The interplanetary magnetic field clock angle dependence of field-aligned current in the northern hemisphere.Steady-state conditions from the Surrogate Model for REPPU Auroral Ionosphere version 2 are shown, fixing the tilt angle = 0.0, B = 5.0 nT, Np = 5/cc, and Vsw = 450 km/s.

Figure 4 .
Figure 4.The interplanetary magnetic field clock angle dependence of ionospheric potential in the northern hemisphere.Steady-state conditions from the Surrogate Model for REPPU Auroral Ionosphere version 2 are shown, fixing the tilt angle = 0.0, B = 5.0 nT, Np = 5/cc, and Vsw = 450 km/s.

Figure 5 .
Figure 5.The interplanetary magnetic field clock angle dependence of ionospheric potential in the northern hemisphere as obtained from the Weimer2K empirical model, with the tilt angle = 0.0, B = 5.0 nT, Np = 5/cc, and Vsw = 450 km/s.

Figure 6 .
Figure 6.Example of the calculation of AU/AL indices by Surrogate Model for REPPU Auroral Ionosphere version 2 (SMRAI2), compared with the observed values, for the 1-month time interval from 1 October 1999.

Figure 7 .
Figure 7. 2D histogram for the AE index as predicted by Surrogate Model for REPPU Auroral Ionosphere version 2 (SMRAI2) against the observed AE index for the 15-year time interval from 1 January 2000.

Figure 8 .
Figure 8.Heat map analysis of the SMRAI2-predicted AE peak intensity in the (left) interplanetary magnetic field (IMF) Bz-V space and in (right) IMF Bz-Np space.

Table 1
List of the Selected Events for Training and Testing the Echo State Network Model Figure 1.Block diagram of Surrogate Model for REPPU Auroral Ionosphere version 2 development to graphically summarize the relationship among the REProduce Plasma Universe, principal component analysis, and echo state network.