Evaluation of SaRIF High‐Energy Electron Reconstructions and Forecasts

Increasing numbers of satellites are orbiting through the Earth's radiation belts, and the range of orbits being commonly used is also growing. As a result, there is an increasing need for services to help protect satellites from space weather. The Satellite RIsk prediction and radiation Forecast (SaRIF) system provides reconstructions and forecasts of the high‐energy electron flux throughout the outer radiation belt and translates these predictions into charging currents, dose rates, total ionizing dose and risk indicators. SaRIF both informs satellite operators of current and expected conditions and provides a tool to aid in post‐event analysis. The reconstructions and forecasts are provided by the British Antarctic Survey Radiation Belt Model (BAS‐RBM) running as part of an automatic system using real‐time data to specify the boundary conditions and drive processes within the physics‐based model. If SaRIF is to provide a useful tool, then the accuracy of the reconstructions and forecasts needs to be understood. Here we assess the accuracy of the simulations for geostationary orbit by comparing the model output with measurements made by the GOES 14 spacecraft for the period March–September 2019. No GOES 14 data was used to create the reconstruction or forecasts. We show that, with some improvements to the original system, the reconstructions have a prediction efficiency of 0.82 for >800 keV electrons and 0.87 for >2 MeV electrons, with corresponding prediction efficiencies of 0.59 and 0.78 for the forecasts.

pulsion for orbit raising (Horne & Pitchford, 2015), which requires them to spend 100s of days passing through radiation belts before reaching GEO.
Forecasts of radiation belt fluxes help satellite operators plan ahead and reconstructions can be used to understand the radiation environment when anomalies occur. As the electron flux can vary by orders of magnitude between these different orbits, reconstructions and forecasts of the flux are required for the whole radiation belt. Ideally, flux measurements would be used for anomaly resolution, but in practice there are relatively few measurements available, mainly from satellites in LEO and GEO. However, 3 dimensional radiation belt models can recreate the environment at locations where no measurements are available and they also provide an energy spectrum that can be used by radiation effects models, so the effect of the environment on materials on the spacecraft can also be determined (e.g., Hands et al., 2018). The calculation of some radiation effects also requires knowledge of the proton spectrum, which can be significant within the proton belts (L < ∼2.5) or during solar proton events. Although models for the proton flux exist (e.g., Selesnick & Albert, 2019;Lozinski et al., 2021) they will not be discussed here.
The British Antarctic Survey Radiation Belt Model (BAS-RBM) (Glauert et al., 2014a;Glauert et al., 2014b;Glauert et al., 2018) was developed as a research model for the radiation belts. In the EU Framework 7 project (FP7) SPACE CAST , it was adapted to provide forecasts of the electron flux three hours ahead, updated every hour. Building on this work, the EU FP7SPACE STORM project used the BAS-RBM to forecast the radiation belts and also coupled the electron spectra from the model to the spacecraft effects codes DICTATE (Rodgers et al., 2000) and SHIELDS-2 (Seltzer, 1994), to provide forecasts of internal charging currents and total ionizing dose in addition to the electron flux. This work has been developed further in the European Space Agency funded Satellite RIsk prediction and radiation Forecast (SaRIF) system. SaRIF is freely available, following registration, and accessed via ESA's Space Situational Awareness website (http://swe.ssa.esa.int/web/guest/sarif-federated). The motivation for the SaRIF system, how it is configured, the data used and the displays showing the SaRIF risk indicators, electron flux and spacecraft effects are all described in a companion paper ("The Satellite Risk Prediction and Radiation Forecast System (SaRIF)," 2021). Here we will focus on how the BAS-RBM is used to provide reconstructions and forecasts of the electron environment and present an initial evaluation of the reconstructions and forecasts for geostationary orbit.
SaRIF reconstructs and forecasts the electron flux across the radiation belts (2 ≤ L* ≤ 7) for electrons with energies from ∼100 keV to >10 MeV. The website displays the 800 keV, 2 MeV, >800 keV and >2 MeV electron fluxes for a number of different orbits, though the system can easily be configured to show any energy and orbit in the aforementioned ranges. In particular, a forecast can be made for any satellite within the simulation region, just using data from GOES 15, and without the need for data from the particular satellite for which the forecast is made. Reconstructions and forecasts are provided for GOES 14 and 15, Galileo In-Orbit Validation Element-A (GIOVE-A) and a satellite in a circular, equatorial orbit at an altitude of about 8,000 km. Comparisons between the model results and measured data were displayed for the GOES 14 and 15 satellites at geostationary orbit.
Since SaRIF was developed, GOES 14 and 15 have been replaced by GOES 16 and 17 and until recently only GOES 16 provided electron flux data in near real-time. The particle instruments and the data formats for the measurements for GOES 15 and 16 are very different, so SaRIF is currently being adapted to use GOES 16 data to provide boundary conditions for the BAS-RBM. The website should become fully operational again later in 2021, at which time the reconstructions and forecasts made with GOES 15 data will still be available in the SaRIF archive. Here we assess the accuracy of the reconstructions and forecasts made using GOES 15 data. Since GOES 15 data is used in the simulations, we compare the flux predicted by the reconstructions and forecasts for GOES 14 with data measured by the spacecraft.
Section 1 describes how the reconstructions and forecasts are made using the BAS-RBM and Section 2 evaluates them for the period 1 March 2019-1 September 2019. In Section 3, various improvements to the simulations are described which are then evaluated by rerunning simulations for original period, using only the inputs that were available at the time. Section 4 discusses our results and our conclusions are presented in Section 5.
The diffusion coefficients, D αα , D αE , D EE and D LL are the drift-and bounce-averaged pitch angle, mixed pitch angle and energy, energy and radial diffusion coefficients respectively. More details are given in Glauert et al. (2014b). The radial diffusion coefficients, D LL , are calculated following Brautigam and Albert (2000), just using the electromagnetic component, as using the electrostatic component leads to unrealistically high fluxes at low L* (Kim et al., 2011). Interactions between the electrons and ELF/VLF waves can result in pitch angle, mixed pitch angle and energy, and energy diffusion. Location, activity and energy dependent diffusion coefficients have been calculated for different waves using the PADIE code (Glauert & Horne, 2005). Diffusion coefficients due to whistler mode chorus waves are presented in Horne, Kersten, et al. (2013) and the contribution from EMIC waves are shown in Kersten et al. (2014). The wave models used to calculate the diffusion coefficients resulting from plasmaspheric hiss and lightning-generated whistlers are described in Meredith et al. (2018).
The diffusion coefficients used in the simulations are determined using the Kp index. The choice of Kp as the driving index is a result of the need for a 24-hr forecast of the driving index. Kp may not be the best choice for some processes, for example, Glauert et al. (2014a) suggests that AE is a better choice for driving chorus waves and Ross et al. (2020) shows that solar wind pressure is a good choice for EMIC waves, but reliable forecasts of these parameters are not available in the SaRIF system.
Collisions between the electrons and neutral atoms in the atmosphere also cause pitch angle diffusion and subsequent loss, though this is only significant near the loss cone. This diffusion is modeled following Abel and Thorne (1998) and the timescale τ C for these losses is one quarter of the bounce-time in the loss cone and infinite elsewhere.
Electrons can also be lost to the magnetopause through magnetopause shadowing (Bortnik et al., 2006;Ohtani et al., 2009;Saito et al., 2010;Shprits et al., 2006;Turner, Morley, et al., 2012;Turner, Shprits, et al., 2012;Ukhorskiy et al., 2006). The method of determining these losses is described in Glauert et al. (2014a) with the timescale for this loss (τ M ) set so that the flux is reduced by 3 orders of magnitude in one drift period outside the last closed drift shell.
For the simulations run in the SaRIF system, the mixed pitch angle and energy diffusion terms are omitted from Equation 2 to reduce the time taken for model runs. Albert and Young (2005) and Subbotin et al. (2010) show that these terms have little effect on equatorially mirroring particles, although other work (Tao et al., 2008;Tao et al., 2009) suggests the effect may be significant at small pitch angles (α < 40°) and higher energies (E > 2 MeV). The BAS-RBM calculates the phase-space density which can be converted to differential flux, J, using the relation f = J/p 2 (Schulz & Lanzerotti, 1974). The integral flux above an energy E G , , required for comparison with integral flux measurements, is given by = ∫ ( ) , where E m represents the upper energy of a detector.
The model requires the state of the radiation belt to be specified at the start of each simulation, that is, the phasespace density defined for all pitch angles, energies and L* included in the simulation. Time sequences of Kp, the solar wind pressure, the interplanetary magnetic field (IMF) Bz component and the GOES 15 >800 keV electron flux for the period being simulated must also be specified. The Kp index is used to select the pre-computed diffusion coefficients, the solar wind pressure and IMF Bz determine the magnetopause location in the model and the GOES electron flux is used to derive the outer radial boundary condition.

Boundary and Initial Conditions
To solve Equation 2 boundary conditions need to be provided on six boundaries, each of which is a two-dimensional surface. Since the low and high energy boundaries are defined by following constant first and second adiabatic invariants, the minimum and maximum energies, E min and E max , depend on L*. The high energy and both pitch angle boundaries are straightforward: = 0 for all energies and * , at max( * ) = 0 = 0 for all energies and * .
The phase-space density is specified as a function of energy and pitch angle on the minimum L* (L min ) and maximum L* (L max ) boundaries, and as a function of pitch angle and L* at the minimum energy (E min ).
There is no suitable satellite data set available in real-time to provide the phase-space density on the minimum L* boundary condition. Instead we follow the approach adopted in Glauert et al. (2018) and use a statistical Kp dependent model for the phase-space density at L* = 2 derived from CRRES data (Glauert et al., 2014a), with the flux on the boundary set to zero for E > 850 keV, since Fennell et al. (2015) show that there is currently no significant flux for electrons above about 850 keV in the inner belt.
The outer boundary condition is derived from the 5 min, real-time GOES 15 data, using the technique described in Glauert et al. (2018) and the Olson and Pfitzer (1977) quiet-time magnetic field model. Glauert et al. (2018) derived a set of spectra for different levels of the >2 MeV flux by fitting Lorentzian kappa distributions to data from both the energetic particle sensor (EPS) and magnetospheric electron detector on GOES 15. The spectrum at the GOES 15 location is then approximated by interpolating between the two spectra whose >2 MeV flux bracket the observed flux. This approach is adopted here with two modifications. First, Glauert et al. (2018) derive the outer boundary from the >2 MeV electron channel on the EPS. This channel can suffer from proton contamination during solar proton events (Onsager et al., 2004;Onsager et al., 1996;Onsager et al., 2002) and also has a relatively high noise floor (1 cm −2 s −1 sr −1 ), so SaRIF forecasts are made by applying the same technique to the >800 keV channel which suffers less from contamination and has higher fluxes so is less affected by the noise floor. Second, to provide forecasts for a larger region, the outer boundary condition is translated adiabatically to L* = 7 for the SaRIF forecasts.
The low energy boundary is placed at 84 keV at L* = 7. The phase-space density on this boundary is derived from the outer boundary condition by scaling an average profile of phase-space density with L* to match the outer boundary condition, see Glauert et al. (2018).
The location of the last closed drift shell during a simulation is estimated from the magnetopause location using the same approach as Glauert et al. (2014a). The magnetopause location is determined using the Shue et al. (1998) model driven by solar wind and IMF data. During the period studied here, solar wind data from DSCOVR was not available and data from the ACE satellite was intermittent. When no solar wind data was available it was assumed that the last closed drift shell was outside L* = 7.
Each SaRIF forecast calculates the flux for the previous 7 days (to provide the recent context for the forecast) and a one day forecast. This means that the simulation requires an initial condition across the whole region (2 < L* < 7), corresponding to the state of the belts seven days previously. The only flux data available to the system is the GOES electron flux. A precomputed set of steady state solutions have been calculated for range of Kp and outer boundary fluxes. The initial condition for a run is determined by selecting the precomputed solution whose 800 keV flux at the location of GOES 15 has the best match to the actual measured flux at the time of the start of the simulation.

Forecasts
To use the BAS-RBM for forecasting requires predictions for the drivers of the model, namely the Kp index, the magnetopause location and the electron flux on the outer boundary. Forecasts of Kp, for the period up to 1 day ahead, are supplied by the Met Office Space Weather Operations Center (MOSWOC) as part of the SaRIF system. The magnetopause location in the model is determined from solar wind data. Predicting the solar wind up to 1 day ahead would require simulations from the Sun to the Earth, which are beyond the scope of the SaRIF system, and since the magnetopause location only affects the simulation when the last closed drift shell is inside the outer boundary of the model, the magnetopause location was assumed to remain unchanged for the 24 hr of the forecast.
Likewise the last flux measurement from GOES 15 is assumed to persist for the forecast period. There are several published models that forecast the electron flux at GOES satellites 1 day ahead, but these mostly forecast a daily averaged flux (e.g., Balikhin et al., 2016;Boynton et al., 2016). Shi et al. (2016) do forecast the measured >2 MeV flux using a neural network, but SaRIF uses the >800 keV flux as it is not contaminated during solar proton events. In an operational system like SaRIF, the boundary forecast needs to be straightforward to implement, quick to execute and reliable, so a persistence forecast was adopted initially. For GOES 15, persistence provides a reasonably good forecast with correlation coefficients for the period March 1, 2019-September 1, 2019 of 0.71 for the >800 keV flux and 0.79 for the >2 MeV flux.

Evaluation of Forecasts and Reconstructions
SaRIF began operating during January 2019 and all forecasts have been stored in an archive that is available on the website. On March 1, 2019, the Kp forecasts provided by MOSWOC changed from being updated every 6 hr to being updated every 3 hr. Since the Kp forecasts are a key factor in producing the SaRIF forecasts, the Kp forecasts made from March 1, 2019-September 1, 2019 will be evaluated here.
To assess the accuracy of the SaRIF forecasts near geostationary orbit, the 1 day forecast of the >800 keV and >2 MeV electron flux at GOES 14 were compared to the flux actually measured by GOES 14. This comparison is provided visually on the website but no data from GOES 14 is used to create the forecast.
On GOES 14 and 15 the EPS is part of the space environment monitor (SEM). The electron flux is measured by a wide-aperture dome detector assembly with a field of view of about 60° by 120° (Onsager et al., 1996). Two sets of dome detectors are carried, one set looking eastward and the other westward, depending on the orientation of the spacecraft (Rodriguez, 2014). For the comparisons on the SaRIF site, the westward detector is always selected. It is assumed that the GOES fluxes are dominated by the 90° flux, so comparisons are made between the measured values and the 90° flux simulated in the model. The same assumption is made when deriving the boundary conditions.
The E >2 MeV electron fluxes may be contaminated by solar proton events. However, the NOAA Space Weather Prediction Center defines a solar proton event to be when the flux of E >10 MeV protons, determined from the GOES EPEAD measurements, exceeds 10 cm −2 s −1 sr −1 and this threshold was not exceeded during the period considered here.

Metrics
Many different metrics have been proposed to assess the accuracy of forecasts. Different metrics focus on different properties of the simulation and they also vary in how easily they can be interpreted (Morley et al., 2018). Pearson correlation coefficients, typically denoted by r, provide a well-known measure that is easy to represent graphically. Morley et al. (2018) introduce two metrics for radiation belt modeling; the median symmetric accuracy (MSA) and the symmetric signed percentage bias (SSPB). These metrics are robust in the presence of outliers, easy to interpret and symmetric when penalizing under and over prediction. The median symmetric accuracy is defined by where the accuracy ratio Q i = Y i /X i , with Y i denoting the model value and X i the observation. The symmetric signed percentage bias is defined by Interpretation of these metrics is straightforward. For a set of observations and simulation results, 50% of the unsigned percentage errors are smaller than the MSA. (Alternatively, half of the results are within a factor of 1 + MSA/100 of the data.) A positive (negative) SSPB indicates the median error is an overestimate (underestimate) by the given percentage.
The MSA and SSPB measure the absolute accuracy of the model simulation. A complementary approach for assessing simulation results evaluates them relative to a reference prediction. This evaluates the work in the context of similar studies, while still providing an objective measure of model accuracy. One such measure is the prediction efficiency (PE) (e.g., Balikhin et al., 2016;Turner et al., 2011), defined by Here, R i is the reference prediction to which the current prediction is being compared, and N is the number of data points in the evaluation. Typically, the reference simulation is a long-term average or a simple prediction, for example, persistence. The PE is in the range −∞ and 1, where a PE of 1 indicates a perfect forecast. If the PE is 0 then the average of the predicted values and the average of the reference data set are equal. A negative PE implies the reference simulation provides a better forecast. Since the electron flux can vary by orders of magnitude, the prediction efficiency and Pearson correlation coefficient are usually calculated using the log of the observed and modeled values , and this approach is adopted here. The average flux observed during the period studied will be used as the reference simulation.

Initial Evaluation of SaRIF Reconstructions at Geostationary Orbit
The SaRIF system reconstructs the radiation environment across the radiation belts, and provides a continuous simulation of the environment, displayed a week at a time. These reconstructions are computed weekly, two weeks after the last date in the simulation. This time delay allows time for the definitive Kp values for the dates in the simulation to become available. As well as using the definitive Kp values, these reconstructions also differ from the forecasts in that they use the measured GOES 15 electron flux and solar wind properties for the whole simulation. A visual comparison of the reconstruction with the >800 keV and >2 MeV fluxes measured by GOES 14 is provided on the SaRIF website. Here we seek to quantify that comparison. Although the two often agree within a factor of 2, during extended periods of decay SaRIF fluxes tend to be higher than the measured fluxes. The >2 MeV fluxes shown in panel (b) shows a similar trend, though the agreement is generally less good, and the tendency to over-predict the flux is most apparent when the flux is low. The Kp index for the period is shown in panel (c). Panels (d) and (e) are scatter plots of the predicted flux against the measured flux for >800 keV and >2 MeV electrons, respectively. There is a clear correlation that is stronger for the >800 keV electrons, but both energies show a degree of scatter and there is a tendency to over-predict the flux.
The metrics shown in Table 1 quantify the trends that are apparent in Figure 1. There is reasonable correlation between the model and data (r = 0.84 and 0.87 for the >800 keV and >2 MeV flux respectively) but the model tends to over-predict the flux, with smaller errors for the >800 keV flux (MSA = 56.1%) than the >2 MeV flux (MSA = 435.2%), even though the linear correlation is higher at >2 MeV. The larger errors in the >2 MeV flux are probably due to the outer boundary condition being derived from the >800 keV flux measured by GOES 15. The shape for the rest of the spectrum is assumed (see Glauert et al., 2018) and may not always be correct. This issue is addressed in Section 5. The prediction efficiencies of 0.74 and 0.62, for the >800 keV and >2 MeV flux respectively, show that the model is a much better estimate of the measured flux than using the long-term mean value.

The KP Forecast
In the configuration of the BAS-RBM used in SaRIF, the radial transport and wave-particle interactions are driven by the Kp index. The Kp sequence used in the simulation is a combination of near-real time Kp from GFZ Potsdam for the past 7 days and a Kp forecast provided by the Met Office. The Kp forecast is updated every 3 hr and provides Kp at 3 hr intervals for the 24 hr following the time of the forecast. Met Office forecasts have been evaluated elsewhere (Sharpe & Murray, 2017), so the assessment here focusses on the particular period being studied (March, 1-September 1, 2019) to provide a comparison with the accuracy metrics for the flux. The mean Kp value for the period is used as the reference forecast when calculating the prediction efficiency. It should be noted that the evaluation here is for the forecast Kp as it is used in the model. As the Kp forecast is only updated every 3 hr, the Kp forecast used by the BAS-RBM may have been made over 2 hr previously and, strictly speaking, applies to a time that is now less than 24 hr ahead. In this case it is assumed to persist to 24 hr ahead.  Figure 2b shows a scatter plot of forecast Kp against definitive Kp for the same period. Since Kp takes discrete values, the color of the dots shows the number of samples in the bin. The Pearson correlation coefficient for the forecasts is 0.22, indicating some correlation with the definitive Kp, but the negative prediction efficiency of −1.14 suggests that the mean Kp for the period would   provide a better forecast. However, although the mean Kp for the period is commonly used as the reference forecast (e.g., Balikhin et al., 2016;Boynton et al., 2016), it cannot actually be used as a forecast. When the forecast is compared to the mean Kp for the previous 12 months (March 1, 2018-February 28, 2019), which could be used in practice, the prediction efficiency is 0.28, suggesting that it is worth using the Met Office forecast, rather than the mean Kp. The influence of the Kp forecast on the forecasts of the electron flux is discussed in the companion paper ("The Satellite Risk Prediction and Radiation Forecast System (SaRIF)," 2021).

Initial Evaluation of SaRIF Forecasts at Geostationary Orbit
SaRIF forecasts the electron flux in the radiation belts up to 24 hr ahead, updated every hour. The simulations calculate the flux at 10 min intervals throughout the forecast period, but only the 24 hr forecast will be assessed here. Although the SaRIF system aims to provide a forecast every hour, very occasionally the system fails due to, for example, issues with the host computing resources or an absence of the required input data. The results presented here only include those forecasts that were actually made at the time.  Table 2 quantifies the agreement shown in Figure 3. As with the reconstructions, the forecasts of the >800 keV are more accurate than those for >2 MeV (the MSA is 79.3% for the >800 keV flux and 275.3% for >2 MeV flux), and, again, there is a positive bias at both energies. The prediction efficiencies for the forecasts (0.40 for >800 keV and 0.62 for >2 MeV) are lower than those for the reconstructions (0.74 and 0.62, respectively) but still show that the forecasting is a better predictor than the long-term average. The prediction efficiency for the >800 keV forecasts is lower than that for the reconstructions (0.40 and 0.74, respectively), the forecast's MSA is higher (79.3% compared to 56.1%) and the CC is lower (0.61 and 0.84). Similarly, the CC for the >2 MeV flux forecast is lower than that for the equivalent reconstruction (0.70 and 0.87). However, the MSA is lower for the >2 MeV forecast compared to the reconstruction (275.3% and 435.2%) and the prediction efficiencies are the same (0.62). This may be because the Kp forecasts tend to under-predict Kp during quiet periods, leading to less internal acceleration in the model, which in turn reduces the tendency of the model to over-predict the fluxes during sustained periods of decay.

Choice of Initial Condition
As Table 2 shows, the model has a strong positive bias, that is, it tends to over-predict the flux. This bias is often evident during periods of extended decay, and may in part be due to the choice of initial condition. The simulations require the state of the entire radiation belt at the start of the simulation to be specified, but the only flux data available in the system are the fluxes measured at GEO by GOES 15. There are several ways this initial state could be specified. One option would be to use the results from previous simulations to determine the initial state. However, this would mean that the forecasting runs would be dependent on previous simulations, potentially causing issues when those simulations had not run for some reason, and requiring a more complicated system with "catch-up" time following system outages. In order to avoid this, the system was developed to use initial conditions that were independent of the previous simulations.
A set of steady state simulations were performed with different, fixed values of Kp and different values of the >800 keV flux on the outer boundary. For each forecast, the steady state where the >800 keV flux was closest to that observed by GOES 15 was used as the initial condition. This procedure normally works well as the forecasts are generally fairly insensitive to the initial state after a few days' of simulation and each forecast is preceded by a seven day simulation. The main situation where the initial state has a significant effect on the forecast is during very quiet periods when the initial flux in the simulation is too high. This is illustrated in Figure 4. . The light blue curve shows the flux when the initial condition is the steady state that best matches the GOES 15 >800 keV flux at the start of the simulation. The red curve has the same initial state except it is multiplied by a factor of 10, while the purple curve has the same initial state as the light blue curve multiplied by a factor of 0.1. The three simulations converge fairly rapidly at the start of the simulation and are within a factor of 1.06 after 1 day. Figure 4b shows the same simulation at L* = 5.1. Here the simulation with the steady state initial condition (blue) and the steady state initial condition multiplied by 0.1 (purple) converge rapidly, but the initial condition with the highest flux (red) takes much longer and is still a factor of 1.11 above the other two after 7 days.
Figures 4d-4f show analogous results but for a very quiet period (April 16-23, 2019). Now the convergence from the three different initial states at L* = 6.1 is much slower. After 7 days the two lower initial states have converged to within a factor of 1.05, but the simulation from the highest initial state is still a factor of 1.43 above the others. These results illustrate that while in many situations the initial condition will have little influence on a forecast made after 7 days of simulation, in very quiet conditions the choice    of initial state is important, and an under estimate of the initial flux is likely to result in a better forecast than an overestimate of the initial state.
In active conditions, radial transport is rapid so the boundary conditions have a strong influence on the calculation. If the initial state in the interior is not consistent with the boundary then there will be rapid radial transport due to steep gradients near the boundary combined with high radial diffusion coefficients. Acceleration and loss due to wave-particle interactions will also occur. Different initial states will rapidly converge to the same solution.
In quiet conditions the timescale for diffusion is much longer. The difference illustrated here between over and under estimated initial conditions is a consequence of the total phase-space density in the system. Reducing a flux that has been overestimated by a factor of 10 requires the transport of more phase-space density than increasing a flux that is underestimated by a factor of 10.
One option to address this issue is have a longer simulation before the forecast to ensure that the solution is independent of the initial state before the forecast is made. However, this would increase the time taken for the simulation and would be unnecessary except in very quiet conditions. An alternative approach will be discussed in Section 5.1.

Improvements to the SaRIF Reconstructions
The SaRIF system is a complicated combination of real-time data collection and processing, model simulations, data provision (to the Daniel Heynderickx Consultancy server for the spacecraft effects calculations) and visualizations, see "The Satellite Risk Prediction and Radiation Forecast System (SaRIF)" (2021). The modeling described and evaluated in Sections 2 and 3 was used to develop SaRIF as many of the component parts already existed or could easily be adapted from previous work. This section describes some improvements to the modeling and evaluates their effect. These improvements will be included in the SaRIF system when it is reactivated later in 2021, and all the reconstructions on the site will be recalculated using the techniques described below.

Outer Boundary Condition
In the initial implementation of SaRIF the outer radial boundary condition was derived from GOES 15 data, by applying the technique described in Glauert et al. (2018) to the >800 keV flux. In this approach, GOES fluxes are mapped to a fixed L*, using statistical asynchronous regression (SAR) (O'Brien et al., 2001), removing the diurnal variation from the flux. One of a set of spectra is then selected and fitted to the chosen flux to provide the whole differential spectrum. This approach can fit to either the >800 keV flux or the >2 MeV flux but not both.
An alternate approach, adopted here, maps both the >800 keV and the 2 MeV flux to a fixed L*, and then fits an exponential spectrum through both values.
Following a similar approach to Gannon et al. (2012), assume that the differential flux J (E) can be described by J (E) = Ae −E/ϵ , then the integral flux for a given energy ( ) = [ − ] , where E M is the maximum energy of the detector. Since the GOES 15 data provides two flux values (>800 keV and >2 MeV) then the constants A and ϵ can be determined at each time in the GOES data set, and the phase-space density on the boundary follows from f = J/p 2 .
Other forms for the flux spectrum could also be employed. Two widely used approaches are to assume a power law or a kappa distribution (Summers & Thorne, 1991). Tests (not shown) assuming a power law spectrum J (E) = AE −n , rather than an exponential, made no significant difference to the results. A kappa distribution has three free parameters but for most of the time, the EPS only provides two flux measurements (>800 keV and >2 MeV), as the >4 MeV channel is rarely above background level. A third measurement could have been obtained from the lower energy channels measured by the magnetospheric electron detector (MAGED) on GOES 15, but this data was not included in the SaRIF system. Satellites at GEO move through a range of L* values as they orbit at the geographic, rather than the geomagnetic, equator. For example, the L* for a 90° pitch angle electron at GOES 15 moves between L* = 5.79 to L* = 6.40 at the start of March 2019, using the Olson-Pfitzer quiet-time model (Olson & Pfitzer, 1977). Since SAR calculates the flux at a fixed L* within this range, the L* of GOES 15 will be outside of the L* returned by SAR for some of the orbit, as will the location of other spacecraft at GEO. To enable the model to provide results for all GEO spacecraft, the outer boundary flux is translated to L* = 7 adiabatically. When radial diffusion is the dominant process, then this flux should approximate the flux at the location of GOES 15, once it has diffused inwards. However, inward radial diffusion occurs over a finite time that may be significant during quiet conditions. For example, the timescale for radial diffusion (1/D LL ) at L* = 7 is about 2.3 days when Kp = 1 but only about 9.9 min when Kp = 6, close to the drift time of about 9.4 min. To account for this, the time at which the outer boundary condition is applied is adjusted to allow for the time it will take for the flux to diffuse inwards.
Since D LL = (〈ΔL, ΔL〉)/2 dt, the time, dt, to move a distance ΔL can be approximated by So, if the application of Statistical Asynchronous Regression results in a phase-space density f 0 (E) at L* = L 0 at time t 0 , then the boundary condition becomes ̄0 ( ) at L* = 7 at time t 0 − dt, where ̄0 ( ) is the phase-space density f 0 (E) adiabatically translated to L* = 7, and the dt is calculated using the Kp value at t 0 . As a large value of dt could result in one value of the boundary flux being applied before the previously observed flux, dt was limited to ensure that each value of the boundary flux was used for at least 45 min. Since the outer boundary is derived from 1 hr averages, the 45 min limit is a compromise between allowing each measured value to influence the simulation (so all flux peaks are captured) and the need to allow time for the boundary condition to diffuse inwards.

Pitch Angle Distribution
In the initial implementation of SaRIF, the boundary conditions were derived assuming that the pitch angle distribution could be approximated as a sin distribution in pitch angle, i.e., f (α) = sin(α) f (90°). Shi et al. (2016) used Van Allen Probes data to characterize the pitch angle distribution as f (α) = sin (α) n f (90°), where n depends on location, energy and activity. These distributions from Shi et al. (2016) have been incorporated into the BAS-RBM and are used in the improved reconstructions shown here.

Changes to the Model Configuration
In the original SaRIF simulations the high energy boundary was set at 80 MeV, but in the revised simulation it was reduced to 30 MeV. The diffusion coefficients due to plasmaspheric hiss and lightning-generated whistlers that were used originally only extended out to L* = 6.0 . To improve the modeling of losses during quiet periods the diffusion coefficients from Glauert et al. (2014b), which extend to L* = 6.5 are now used. Figure 5 follows the same format as Figure 1, but the simulations included the changes described above. The agreement between the GOES 14 data and the model results is significantly improved, particularly for the >2 MeV flux (panel (b)). The improved agreement is quantified in Table 3 which can be compared to Table 1.

Evaluation of the Improved Reconstructions
The most noticeable improvement is in the MSA for the >2 MeV flux which has fallen from 435.2% to 92.6%. However, all the metrics have improved at both energies and the prediction efficiencies have risen to 0.82 and 0.87 for the >800 keV and >2 MeV fluxes respectively. This is mainly due to the changed shape of the spectrum on the outer radial boundary, as a result of fitting the spectrum to both the >800 keV and the >2 MeV fluxes, with the other changes having a smaller effect.

Improvements to the SaRIF Forecasts
The developments to the modeling detailed in the previous section can also applied to the forecasting simulations, along with some addition improvements to the initial condition, forecast for the flux on the boundary and the location of the magnetopause.

Initial Condition
As discussed in Section 3.5, using an appropriate steady state solution as the initial condition works well, except when the initial flux is overestimated during quiet times. In these conditions, wave activity is low and the main process acting is radial transport, so translating the outer boundary flux to lower L* adiabatically may provide a better estimate of the initial state. This estimate can be improved by using Equation 6 to estimate the time for inward diffusion, and reducing the flux assuming an average lifetime of 5 days (Claudepierre et al., 2020). For L* < 3.5 the flux can be set to zero to approximate the slot region. This estimate of the initial condition may be less accurate during active conditions, when waves such as chorus will be significant, but, as demonstrated in Figure 4, the initial condition is unlikely to influence the forecast in this case. Note. MSA, median symmetric accuracy; PE, prediction efficiency; SSPB, symmetric signed percentage bias.

Boundary Forecast
Since SaRIF used GOES 15 flux to generate the outer boundary condition, it requires a forecast of the flux on the outer boundary. In the initial implementation this was provided by persistence of the >800 keV flux, after the application of SAR. Persistence provides a reasonable forecast with a correlation coefficients of 0.71 and 0.79 for the >800 keV and >2 MeV flux respectively, and corresponding prediction efficiencies of 0.58 and 0.79 for the period 1 March -1 September 2019.
However, there are two situations where persistence will predictably give a poor forecast. The first is when a flux dropout event has just occurred and the flux has fallen several orders of magnitude. The flux will typically start to recover after about 12 hr (Borovsky & Denton, 2009), resulting in a post-storm flux that exceeds the pre-storm level in 50% of storms, is about the same in 25% and is below the pre-storm level in the remaining storms (Reeves et al., 2003). If the flux has recently fallen by several orders of magnitude then a flux dropout event is likely to be underway, and a prediction that the flux remains at this low level 24 hr later is likely to be incorrect. To avoid this, a possible dropout is detected if the gradient of the log of the flux from 3 hr before the final flux value is less than −0.2 (−0.25) for the >800 keV electrons (>2 MeV electrons). In this case the 24 hr forecast is set to the mean of the last flux and the mean flux for the last 24 hr.
The second situation where persistence is likely to be a poor prediction is when the forecast Kp increases to a high value. In this case a dropout is likely, though not inevitable. So, if the forecast Kp is Kp ≥ 4 at some time in the next 24 hr and is forecast to increase further by at least 1, then a dropout is likely and the predicted flux is reduced by an order of magnitude.

MP Forecast
Persistence is also used to forecast the magnetopause location and hence the last closed drift shell (Glauert et al., 2014a). The magnetopause location is only relevant to the calculation if the last closed drift shell is within the outer boundary of the simulation, i.e., only during active geo-magnetic conditions. When the last closed drift shell is within the simulation domain, it is only likely to remain within the domain for a period of hours, so persisting the location for the full 24 hr forecast is unrealistic. In this case, instead of using persistence for the forecast period, the mean position of the magnetopause in the 24 hr preceding the final known position is calculated. It is then assumed that the magnetopause will return linearly to this location in the next 12 hr and then remain there, and therefore the last closed drift shell moves out as well.

Evaluation of the Improved Forecasts
To illustrate the difference that the changes described above can make, the simulations that produced the forecasts for 1 March -1 September 2019 were rerun, using the same input parameters as the original forecasts but applying the improved methodology described above. These changes will be implemented on the SaRIF site when it is reactivated. Figure 6 is analogous to Figure 3 but shows how the 24 hr forecasts would be improved. The corresponding metrics, given in Table 4, show a significant improvement in the accuracy of the forecasts. As in the case of the reconstructions, the largest improvement is in the predictions of the >2 MeV flux, more than halving the MSA from 275.3% to 119.1% and increasing the PE from 0.62 to 0.78.

Discussion
The metrics presented for the improved SaRIF forecasts suggest they are comparable to previously published models. Balikhin et al. (2016) present correlation coefficients and prediction efficiencies for the Relativistic Electron Forecast Model (REFM) and for the SNB 3 GEO model. Both these models only predict the 24 hr average >2 MeV flux at geostationary orbit, which has much less variation than the measured flux. For predictions 1 day ahead, REFM has a correlation coefficient of 0.85 and a prediction efficiency of 0.70, while for SNB 3 GEO the corresponding values are 0.89 and 0.77. Boynton et al. (2016) also forecast daily average fluxes at GEO but for both >800 keV and >2 MeV. Their correlation coefficients are 0.85 for >800 keV and 0.91 for >2 MeV with corresponding prediction efficiencies of 0.72 and 0.82. Shin et al. (2016) forecast the measured, rather than the daily average, >2 MeV flux at GEO using a neural network. As would be expected, their prediction efficiencies for 1 day ahead are slightly lower than those for the 24 hr average flux mentioned above, at 0.70 for GOES 15 fluxes and 0.68 for GOES 13. Chen et al. (2019) present the PreMeV model that, like SaRIF, forecasts the flux across the radiation belt for a range of energies, though their predictions are 25, rather than 24 hrs ahead. At GEO, their prediction efficiency for 1 MeV electrons is about 0.7, but this falls to about 0.53 for 2 MeV electrons.
The different forecasting techniques discussed above cannot be compared simply on the basis of the metrics quoted, as they predict different types of flux (integral and differential and daily averaged or instantaneous) for different energies. Also, some forecasts are made just for a specific satellite (e.g., Balikhin et al., 2016), while the SaRIF system and Chen et al. (2019) predict the whole radiation belt. They are also evaluated over different time periods,   Further work is required to assess the accuracy of the model in other locations. However, Glauert et al. (2018) compare BAS-RBM simulations, but not forecasts, with data from the Galileo In-Orbit Validation Element-B (GIOVE-B) spacecraft over a four year period from 2008 to 2012 for several L* > 4.6. For detectors with energies roughly corresponding to the >800 keV and >2 MeV discussed here, the correlation coefficients and prediction efficiencies were similar to those presented here with correlation coefficients between 0.72 and 0.88 and prediction efficiencies in the range 0.6-0.8, suggesting that the results at other L* may be similar to those presented here.
In the improved simulations, the forecast for the outer boundary condition is made using persistence with a few adjustments made for particular circumstances, that is, when a high value of Kp is forecast or a dropout event is in progress. A more sophisticated method for the outer boundary could take an approach like that in Shin et al. (2016) to forecast the measured flux at GOES 15, and then use this to determine the boundary condition for the forecast period. Pakhotin et al. (2014) use a more sophisticated prediction of the flux at GEO from the Nonlinear AutoRegressive Moving Averages with eXogenous inputs (NARMAX) model coupled to the VERB 3-d radiation belt model. Comparisons of the model results with data are only provided for three relatively short periods (the longest being 25 days) and no metrics are used to quantify the accuracy of the simulations. However, as the authors point out, NARMAX only provides a 24 hr average of the electron flux at GEO updated every 24 hr and this is not sufficient to resolve the rapid buildups and drops observed at geostationary orbit.
In Section 4.1 it was noted that the timescale for diffusion (1/D LL ) at L* = 7 for Kp = 6, was very close to the drift time for 2 MeV electrons. Radial diffusion is driven by a drift resonance, so it requires the electron to execute multiple drift orbits (Schulz & Lanzerotti, 1974). However, if the timescale for the diffusion is comparable to the drift time, electrons will be transported to other regions before they can execute multiple orbits, suggesting that radial diffusion may not be an appropriate description for the radial transport under these conditions. Alternative transport processes, for example convection, are not currently included in the BAS-RBM, so the assumption that radial diffusion occurs at large L* when Kp is high is largely is a pragmatic one, to enable forecasts to be made under all conditions. However, during these very active conditions, the magnetopause is often compressed so that the last closed drift shell lies inside L* = 7. In this case, outside the last closed drift shell electron loss is the dominant process and transport, radial or otherwise, is less important.
One noticeable feature in Figure 3 is the overestimation of the flux during extended periods of decay, particularly at >2 MeV, for example in the last week of April and first few days of May. The overestimate is also present in the reconstruction and even persists, though reduced, in the improved forecast ( Figure 6). Both the >2 MeV forecasts and reconstructions have a positive SSPB, showing the model has a tendency to over-predict the flux. This may be due to a missing or underestimated loss process in the modeling. For example, losses due to plasmaspheric hiss are not included in the model for L* > 6.5, as this is region is considered to lie permanently outside the plasmapause. However, during extended quiet periods, such as those associated with solar minimum, or in plasmaspheric plumes, the plasmasphere may extend out further and losses due to hiss may be significant in this region. Alternatively, losses due to electromagnetic ion cyclotron (EMIC) waves could be responsible. Since EMIC waves interact primarily with electrons with energies above about 2 MeV, this would explain why the feature is more noticeable at the higher energy. The model for EMIC waves (Kersten et al., 2014) used here was derived from CRRES data and has been shown to underestimate the diffusion due to EMIC waves (Ross et al., 2020). The new EMIC model developed by (Ross et al., 2020) is likely to improve this aspect of the modeling and, if further work demonstrates this, it will be employed when the SaRIF system is reactivated.
With the improvements outlined in Sections 4, the CC and PE for the >800 keV flux (e.g., CC = 0.87, PE = 0.82 for the reconstruction) are lower than those for the >2 MeV flux (CC = 0.89, PE = 0.87) even though the MSA and SSPB are smaller for the >800 keV (MSA = 42.4%, SSPB = −39.9%) than for >2 MeV (MSA = 92.6%, SSPB = 52.0%), illustrating that different metrics assess different aspects of the simulation. Correlation coefficients, for example, simply assess whether the two sets of data correlate, they don't account for a systematic offset.
The better correlation for the >2 MeV flux may be because it takes longer to accelerate electrons to higher energies, providing a better correlation with the data. Since the PE is a comparative measure, it depends not only on the model results but also on the extent to which the data set departs from the reference forecast. The >800 keV fluxes have a range of about 2 orders of magnitude (excluding occasional dropouts), compared to over 3 orders of magnitude for the >2 MeV flux. The PE measures the error in the prediction relative to the error in using the long-term average, so the larger error in reference error term, because of the larger range, is likely to contribute to a higher PE for the >2 MeV flux.
The SaRIF simulations place the outer radial boundary at L* = 7, so that all geostationary orbits (which trace different paths in L* due to the offset between the geographic and geomagnetic equators) are within the region of the calculation. The boundary condition is derived for L* = 6.1 and then translated adiabatically to L* = 7, allowing for the time taken for the psd on the boundary to diffuse inwards. Adiabatically translating the outer boundary condition to a higher L* is a common approach in radiation belt modeling (e.g., Pakhotin et al., 2014;Glauert et al., 2018). However, this assumes that radial diffusion is the dominant process and that pitch angle and energy diffusion due to wave-particle interactions can be neglected. Table 5 shows metrics for the final reconstructions and forecasts when the outer boundary is only translated to L* = 6.6 instead of L* = 7. For the reconstructions, the CCs for the >800 keV and >2 MeV flux increase from 0.87 and 0.89 to 0.91 and 0.92 respectively, and the MSA for the >2 MeV flux reduces from 92.6% to 68.7%. The metrics for the forecasts also show an improvement, though it is smaller. This suggests that processes other than radial diffusion, for example, wave-particle interactions, are playing a significant role in this region and that care should be taken when adiabatically translating boundary conditions. The results presented here use GOES 15 data to determine the outer and low energy boundaries. GOES 15 was placed into storage mode in March 2020 and as a consequence the SaRIF forecasts and reconstructions are temporarily unavailable. Work is underway to convert the website to use GOES 16 data instead. The high energy magnetospheric particle sensor (MPS-HI) on GOES 16 provides differential fluxes at a 10 energies from about 50 keV to about 3 MeV (Dichter et al., 2015), rather than the two integral flux measurements provided by GOES 15. Hence, determining the boundary conditions from GOES 16 data should be simpler and more accurate than using GOES 15 data, as fewer assumptions need to be made about the shape of the spectra and no conversion from integral to differential flux is required. The metrics presented here should improve once GOES 16 data is used for the boundary.

Summary and Conclusions
The SaRIF website provides both post-event reconstructions and forecasts 24 hr ahead for high-energy electrons (> ∼100 keV) across the radiation belts from L* = 2 to geostationary orbit using the BAS-RBM. Amongst its outputs, the website provides visual comparisons between the model results and measurements made by the EPS on the GOES 14 spacecraft. We have presented an evaluation of these comparisons the period March 1-September 1, 2019, by comparing the simulations with data from GOES 14 which is not used in the modeling. After an initial evaluation of both the reconstructions and the forecasts, some alterations to the simulations are described and these are shown to improve the metrics for both the reconstructions and forecasts. Our conclusions are: 1. The original SaRIF post-event reconstructions were a reliable guide to conditions near GEO, with correlation coefficients of 0.84 for the >800 keV flux and 0.87 for the >2 MeV flux compared to GOES 14 data 2. When SaRIF is reactivated later in 2021 the reconstructions will have been updated using the improvements described here, leading to correlation coefficients of 0.87 and 0.89, for the >800 keV and >2 MeV flux respectively, with corresponding prediction efficiencies of 0.82 and 0.87. These reconstructions will provide a useful resource for anomaly resolution 3. The improvements described here had a more significant impact on the accuracy of the forecasts, increasing the correlation coefficients from 0.61 to 0.69 for the >800 keV flux, and from 0.70 to 0.78 for >2 MeV flux. These improvements will be incorporated into the processing of the GOES 16 data when SaRIF is reactivated  4. The reactivated SaRIF website will use the differential electron flux spectrum measured by the MPS-HI instrument on GOES 16, which should improve the accuracy of both the SaRIF reconstructions and forecasts