Comparison of JPL and ESP Solar Proton Fluence Models Using the Background‐Subtracted RDSv2.0 Data Set

High energy protons from solar energetic particle (SEP) events are a hazard to spacecraft systems and instruments. For interplanetary and geosynchronous‐Earth‐orbiting spacecraft, a mission's cumulative SEP fluence is an important consideration for hardware design. The total solar proton fluence for a mission can be dominated by a small number of very high‐fluence events. Because of the sporadic and unpredictable nature of these large events, data sets collected over multiple solar cycles are needed to construct a statistical model that can predict a mission's risk of seeing a given fluence exposure during its mission. Several statistical models have been developed, including the JPL model and the Emission of Solar Protons (ESP) model. The models produce somewhat different results, which could be due in part to the different data sets from which they were derived. To understand the sensitivity of predicted mission fluence to the choice of data set and to the statistical distribution to which that data set is fit, we present a comparison of the JPL and ESP cumulative fluence models as reformulated from the same SEP data set, a background‐subtracted version of the Reference Data Set Version 2.0 (RDSv2.0) based on data from IMP‐8 and GOES, covering 41 years of SEP events from 1974 to 2015 with proton energies between 5 and 289 MeV. The comparisons show that different modeling approaches can produce a factor of 2 or greater difference in the mission fluences even when the same data set is used for model development.

on the data quality. In parallel to the JPL model, the Emission of Solar Protons (ESP) model was also developed, initially to model the peak proton fluxes by fitting the SEP event distribution (Xapsos et al., 1998). The ESP method was later expanded to model the cumulative proton fluence for >1 to >300 MeV using data from IMP-3, -4, -5, -7, and -8 as well as GOES-5, -6, and-7 (Xapsos et al., 1999, 2000). The ESP model was later reanalyzed by combining the best data features from different instruments using data primarily from IMP-8 along with IMP-3, -4, -5, and GOES-6 and -7 to fill in data gaps (Xapsos et al., 2004). More recently, the Solar Accumulated and Peak Proton and Heavy Ion Radiation Environment (SAPPHIRE) model Jiggens, Varotsou, et al., 2018) was developed through the ESA Solar Energetic Particle Environment Modeling (SEPEM) (Crosby et al., 2015). SAPPHIRE uses the SEPEM Reference Data Set Version 2.0 (RDSv2.0) (Sandberg et al., 2014), which aims to resolve data quality issues faced in previous models by combining data across spacecraft through cross-calibration. Statistics of the historical solar particle events have been also studied in terms of event fluences, durations, and time intervals (Jun et al., 2007).
The JPL and ESP methods are two commonly used cumulative proton fluence models and although the underlying statistics used in both models are quite similar, the models produce somewhat different results, particularly at lower energy (Xapsos et al., 2004). The differences between the model outputs could be due in part to the different data sets from which they were derived. A comparison of the two models using the same data set could disentangle the models' statistics-driven differences from data-driven differences such as systematic effects of SEP event identification or data set cross calibration.
In this paper, a comparison of the JPL and ESP cumulative fluence models, as reformulated from the same SEP data set, was performed in order to understand the sensitivity of predicted mission fluence to the choice of data set and statistical methods. The comprehensive RDSv2.0 and the background-subtraction method to isolate the SEP events will be described in Section 2. Throughout the paper, the original data set will be labeled RDSv2.0 while the background-subtracted version used in this study will be labeled RDSv2.0.j. Section 3 will review the statistical methods used in the JPL and ESP models. Finally, in Section 4, the mission fluence predictions made by JPL and ESP methods using the RDSv2.0.j data are compared with each other and to the original JPL and ESP models.

Data
Data used in this analysis are based on the comprehensive RDSv2.0 data set, which provides fluence of discrete SEP events from data collected by IMP-8 and GOES spacecrafts (Sandberg et al., 2014). The data set covers 41 years of SEP data from 1974 to 2015, spanning nearly 4 full solar cycles and containing proton energies from 5 to 289 MeV. Based on previous studies, each solar cycle can be divided into two periods, a high-fluence, active Sun period of 7 years and a low-fluence quiet Sun period of 4 years (Feynman et al., 1990). The active period begins 2 years before the year of the solar sunspot maximum and includes the fourth year after solar maximum. There are a total of 25 active solar years within the RDSv2.0 data set, using the solar cycle maximum dates: 12/1979 for cycle 21, 11/1989 for cycle 22, 11/2001 for cycle 23, and 04/2014 for cycle 24. This data subset will be used for modeling mission fluence of protons during the active phase of the cycle, for both the JPL and ESP method.
This analysis uses a background-subtracted version of the RDSv2.0 data, labeled as RDSv2.0.j, which uses a different background subtraction and SEP event identification method than the background-subtracted SEPEM RDSv2.1 Jiggens, Varotsou, et al., 2018).

Background Subtraction
The calculation of accurate SEP event fluences requires the subtraction of background levels, particularly for higher energy fluxes that often increase only slightly above background. However, due to solar modulation and other effects, the particle background intensity changes significantly with time. To account for this, a background subtraction method was devised to automatically identify the changing background levels across the long, continuous SEPEM RDSv2.0 data set (Minnow et al., 2020;Whitman et al., 2018). This process was carried out iteratively, resulting in a mean background and expected level of variation for each day of the data set with the idea that increases above the expected level of variation are likely due to SEP events. This analysis employs a two-step background subtraction for data processing. Before the first step, a single threshold value was applied to the flux of each energy channel across the data set to roughly exclude very high intensities due to SEP events. Then, the remaining fluxes were binned into histograms over 150 day periods and the mean (μ) and standard deviation (σ) were calculated. A Step-1 "moving" threshold of μ + 3σ for every 150 day period was applied to the SEPEM data set to once again exclude SEP intensities above the threshold. In Step 2, demonstrated in Figure 2, a trailing 27-day window is applied only to the blue points (i.e., all the data points below the green threshold line) in Figure 1 (right) to make a new estimate of the mean background (μ) and expected level of variation (σ) for each day of the SEPEM data set. In other words, for a given day, the previous 27 days of data points were used to estimate the background levels. The 27 days window was chosen for two reasons. First, it is equivalent to a single rotation of the Sun and averages out short-term modulation effects. Second, there are occasions when proton fluxes can remain elevated for extended periods of time due to multiple SEP events in quick succession; the window size attempts to retain enough data points around SEP events to estimate the background. To ensure enough data points are included to be meaningful, the 27 days window was required to contain at least 24 days worth of data points as gaps in data may occur due to elevated periods being excluded. If this criterion was not met, then the last good background value was used. For each day of the data set, a histogram of the background values in the 27 days window is generated and the mean and standard deviation are calculated. This results in a more refined daily "moving" threshold of μ + 3σ. All points above this threshold are identified as possible increases due to SEP events. The background values derived for the full 1974-2015 time period are shown in Figure 3. Note that this background subtraction process has been applied separately to each channel of the original SEPEM data set (RDSv2.0). A final background-subtracted data set, RDSv2.0.j, was produced by setting all background values below threshold to zero and subtracting the daily mean background from the fluxes above threshold. Examples are shown in Figure 4.

SEP Event Selection
The previous section describes the creation of a background-subtracted time series containing only elevated points above background that may be due to SEP events. To produce a final SEP event list, a set of criteria were applied to discard the random points above background and include only those elevated periods due to SEPs. Following similar methodology used to derive the SEPEM Reference SEP Event List for 7 MeV protons Jiggens, Varotsou, et al., 2018), the following criteria had to be met for an elevated period to be identified as an SEP event: • Minimum duration: An increase above threshold was required to continue for at least 4 hr • Missed points: Up to 10 data points within the 4 hr period were allowed to drop below threshold to account for random variations near threshold or bad points during the event • Dwell time: Fluxes must fall below threshold for 3 hr to identify the end of the event; the event end is defined as the beginning of the 3 hour period • Multiple channels: An increase had to be observed in at least 3 energy channels The criteria were chosen to identify as many SEP events as possible and to separate them into individual events even if they occurred in quick succession. The background calculation and these criteria were applied to each energy channel, essentially creating independent SEP data sets for each energy. The requirement for increases to be observed in multiple energy channels was included in an effort to generate a cleaner event list for the lower energies which exhibit high variability due to the influence of the solar wind.
Upon applying these criteria across the RDSv2.0.j timeseries, SEP event lists containing start and end times were derived for each energy channel as shown in the example in Figure 5 (left). The results were reviewed and small corrections were made to exclude any false identifications. Some start and end times include multiple sharp increases in flux which can be defined as distinct SEP events depending on the energy channel. This occurs when fluxes did not return to background levels for 3 or more hours (dwell time) between potential SEP events, therefore the term "events" in this study should be interpreted as continuous increases in proton flux above background levels. This resulted in 384 events for the 5-7. The results described in this paper use each of the event lists derived for each energy channel independently as it is most consistent with the original JPL method, that is, the >5 MeV distribution is created from the 384 events in the 5-7.23 MeV channel. The integral event fluence in cm −2 was calculated for each SEP event by summing together the fluence for all channels with energy above the energy threshold. As the highest energy channel in the RDSv2.0.j data set is between 200-289 MeV, the proton fluence above 289 MeV is lost from all channels which may affect the results of the highest energy channels and will be discussed in Section 4.

JPL Method
The JPL method predicts the cumulative mission fluence by first modeling the active-solar-year SEP event distribution as a log-normal distribution that relates the size of an event to its likelihood of occurrence. For each proton energy channel, all SEP events from the active-solar years are first ranked according to the formula (Cunnane, 1978): where F i is the probability ranking of event i, and n is the number of events in that energy channel. As the model is primarily concerned with the largest fluence events, a lower event fluence cut-off is set before ranking the events (Table 1). The cut-off event fluences for each energy were set to be consistent with the original JPL analysis for existing energy channels within the original JPL model (Feynman et al., 1993). For energy channels not within the original JPL model, the cut-off is set to smoothly transition between the energies to the nearest order of magnitude to stay consistent with the original JPL method. The effect of the cut-off fluence has been studied in Rosenqvist et al. (2005) and found to have a 20%-50% impact at high confidence levels.
The event fluences above the cut-off are plotted as a function of the probability ranking, with the X-axis set to a normal-probability scale where the axis is linear in normal quantiles, and the Y-axis set to a log scale for the event fluences (Chambers, 1985). Using this axis formatting, a distribution following a log-normal function should form a straight line. The event distributions are then fit using the cumulative distribution function (CDF) of a log-normal distribution: .0 data set. Solar modulation due to the solar cycle is clearly visible. Jumps in the data set occur during transitions from one spacecraft to another and wiggles, as are seen between 1989 and 1995, are present in the original SEPEM RDSv2.0 data set and are likely instrumental effects. Increased variability is observed during solar maxima with some contamination in the background levels due to increased number of solar energetic particle (SEP) events, particulary for 31.62-45.73 MeV energies and below. However, SEP increases above background tend to be higher at these energies, particuarly for the energetic events selected for this study.  Note that the full time period shown here is equivalent to a single "event" in the 5-7.23 MeV energy channel, whereas higher energy channels return to background levels more quickly and are separated into two or more events. Thus, the derived start and end times identify continuous increases in proton flux above background.
where σ is the standard deviation and μ is the mean of the log-normal distribution. As the total proton fluence accumulated during a mission is dominated by the probability of occurrence for the largest events, the original JPL model only fits the event fluences >50%. Several thresholds for the fit were studied for the RDSv2.0.j data and a + > + 70% of the event fluence threshold was selected in order to have a better fit for the very high fluence events at lower proton energies, as suggested by M. Xapsos (Private Communication, 2021).
Using the SEP event distribution model, the JPL model then performs Monte Carlo simulations in order to estimate probability of a mission exceeding a given cumulative proton fluence value. The probability for a mission length τ exceeding a cumulative proton fluence f p = 10 F , where F, the log-cumulative fluence, is normally distributed to simplify the Monte-Carlo sampling, is defined as: where w is the true rate of SEP events, which is approximated by using the active solar year SEP event rate from the data, p(n, wτ) is the probability of n SEP events occurring during the mission, and Q(F, n) is the probability that the n events will have a fluence exceeding f p . The probability of a mission exceeding the fluence f p is related to the confidence level (C.L) of a mission by: For example, a 95% C.L means that if 100 missions of duration τ are flown at different, non-overlapping time intervals within the active period of some solar cycle, only 5% of the identical missions will experience a proton fluence greater than f p . The SEP events are assumed to be Poisson distributed (Feynman et al., 2002), so p(n, wτ) is defined as: To obtain P(>F, τ), the total probability is calculated by stepping the n value from 1 to a maximum value. The n values are simulated typically from 1 up to 120. With the typical SEP event rate of 10 events/year, the Poisson probability for n = 120 for mission lengths up to 7 years (the maximum explored herein) is essentially zero.
For each n value, Q(F, n) is calculated by first sampling n random fluence values from the fitted log-normal distribution and obtaining the total fluence by summing them together. This sampling process is performed 10 million times for each n value to reduce simulation uncertainty. The probability that the fluence is greater than F is just: where t n,>F is the number of samples with a total fluence greater than F for a particular n value, and t n,tot is the total number of Monte-Carlo samples drawn.

ESP Method
The ESP model utilizes the Maximum Entropy Principle (Kapur, 1993) and determined that a log-normal distribution is the most suitable model for the yearly cumulative proton fluence. Rather than modeling the distribution of individual SEP events, the ESP method models the active-solar-year annual proton fluence distribution with the CDF of a log-normal function (Xapsos et al., 2000). Note that the JPL method uses a log10 CDF while the ESP model uses a natural log distribution function. The fit functions are mathematically equivalent, offset by a constant factor; however, the respective log functions are used in this study for direct comparisons with the original methods. Once the parameters of the annual proton fluence distribution are determined, the distribution can be used to read out the probability that a 1-year mission will exceed a certain cumulative fluence. To scale the probability distribution to multiple year missions, the ESP model takes the best fit 1-year σ and μ values and calculates the mean fluence (Φ mean ) and relative variance (Φ RV ) values (Xapsos et al., 1999(Xapsos et al., , 2000: The mean fluence and its relative variance for a τ year distribution can then be derived from compound Poisson process theory (Kellerer, 1985). For a mission duration of τ years, the mean mission fluence becomes τ × Φ mean and the relative variance becomes Φ . These relationships can be inserted back into Equations 7 and 8 to solve for μ and σ which define the log-normal function that gives the cumulative probability distribution for the multiyear mission of length τ years.
To implement the ESP method, the SEP events from RDSv2.0.j are first grouped by year, taking only active solar years as defined in Section 2, and summed to obtain the annual proton fluence. The annual proton fluence values are then ranked and the data above the 70th percentile in the annual proton fluence versus probability distribution are then fit using the log-normal distribution. Fitting data above the 70th percentile is not part of the original ESP model and was implemented to be consistent with the implementation of the JPL method applied to the RDSv2.0.j data (as described in Section 3.1). One caveat is that as the number of active solar years is limited in statistics, taking the 70th percentile cutoff generates a sparse data set and creates larger fitting uncertainties which limit the reliability of the model, particularly when extrapolating to long mission durations. Figure 6. The >5, >10.46, >31.62, and >66.13 MeV solar energetic particle event distribution fits from the JPL method using a log-normal function (blue line). The probabilities can be interpreted as the probability of an event having a magnitude less than the fluence plotted. The original JPL model fits from Feynman et al. (2002) are shown in orange line as a comparison. High fluence data (greater than 70th percentile) used in the fit are the blue squares while the low fluence data not used in the fit are the red squares.

Results and Discussion
The JPL and ESP methods were applied to all energy channels of the RDSv2.0.j data. The original JPL model was based on proton data at > 4, >10, >30, and >60 MeV and although the RDSv2.0.j data does not have energy channels exactly matching the JPL data, the >5, >10.46, >31.62, and >66.13 MeV channels are close enough in energy and are highlighted for a direct comparison. The original JPL model output, based on (Feynman et al., 2002), was generated using the JPL NSET webtool (Natural Space Environments Tools (NSET), 2021) while the original ESP model output, based on (Xapsos et al., 2004), was generated using the ESA SPace ENVironment Information System (SPENVIS) webtool (Heynderickx et al., 2012). Figure 6 shows the results of the SEP event distribution fits >70% using the JPL log-normal function. For the >5, >10.46, and >31.62 MeV channels, the fit to the RDSv2.0.j data has a larger σ value so the log-normal CDF is sloped more steeply compared to the original JPL model, suggesting the predicted fluence from the new model will be higher. Finally, the σ value for the >66.13 MeV channel is lower than the original JPL model (denoted "Feynman" in the figure), so the resulting fluence projections will be lower compared to the original JPL model. At energies below 66.13 MeV, the RDSv2.0.j data shows some flattening at the high-fluence tail of the probability distribution at ∼95% which does not fit well to the straight line from the log-normal distribution, a feature observed in previous studies (Feynman et al., 1993). Adjusting the fitting thresholds of the SEP event fluences between greater than 50% to greater than 70% did not significantly improve the fitting performance to these highest fluence events. This suggests that the data does not fit well to log-normal statistics and a different approach like using a different fitting function that better represents the high fluence events may be necessary. As discussed in Section 2.2, it is possible for the highest fluence SEP events in the low energy channels to be a combination of multiple SEP events due to the channel flux never falling below the background threshold. A check was performed for the JPL method by analyzing the four lowest energy channels using the SEP event list from higher energy channels  MeV) which may potentially break apart the continuous high fluence SEP events into lower fluence individual SEP events. However, the highest fluence events were unchanged and the results of the event fluence distribution fits did not significantly   The curves showing the probability of exceeding a certain fluence during a given exposure period (Equation 3) using the JPL method are shown in Figure 7 for the >5, >10.46, >31.62, and >66.13 MeV channels. As expected, the curves produced by the RDSv2.0.j data for >5, >10.46, and >31.62 MeV show a larger fluence at 95% C.L compared to the original JPL model (denoted "Feynman" in the figure). The >66.13 MeV channel predicts a lower fluence at 95% C.L compared to the original JPL model.
The ESP model fits of the annual proton fluence distribution for >5, >10.46, >31.62, and >66.13 MeV channels are shown in Figure 8. Similar to each individual SEP event fluence distribution, the annual SEP distribution at low energies also flattens at higher fluences. The annual SEP data above 70th percentile fits better to a log-normal distribution, however, the fit uses very few years which limits the confidence in the model when extrapolating to high mission duration and C.L. A comparison of the fit to the RDSv2.0.j data to the original ESP model for  (Heynderickx et al., 2012) is shown in Figure 9 and shows good agreement. Note that the annual proton fluence fit comparison for the >10.46 MeV channel in Figure 8 is from Xapsos et al. (2000) while the original ESP model results generated using SPENVIS are from Xapsos et al. (2004). Comparisons of JPL and ESP models of the cumulative fluence for a 1-year mission at 95% C.L using both the RDSv2.0.j data as well as the original model outputs are shown in Figure 10 as well as in Table 2.
Across all energies, the JPL and ESP methods applied to the RDSv2.0.j data are within a factor of 2-3 to each other for a 1-year mission at 95% C.L. Below 60 MeV, the JPL method projects a higher fluence than the ESP method while above 60 MeV the two methods trend closely together. The difference at low energies is likely due to the shape of the SEP event fluence distribution not fitting well to a log-normal function ( Figure 6). As the SEP event fluence tail flattens at high fluences, the log-normal function over-predicts these highest fluence events, leading to a larger cumulative fluence. When the proton energy increases, both the SEP event fluence distribution and the annual fluence distribution are well represented by the log-normal function. As the 95% C.L 1-year mission fluence prediction can be interpreted as an interpolation of the data set itself, the results from the two methods converge when both methods fit the data well as expected.
Comparing the new model outputs to the respective original model predictions as shown in Figure 10, both new models differ by a factor of 2-5 from the respective original model predictions below 60 MeV. As the proton energy increases, the new models begin to diverge more from the original model predictions. In particular, the original JPL model only has data up to 60 MeV so mission fluences for energies beyond 60 MeV in the original JPL model are extrapolated using a power-law, which may possibly explain the higher fluence results compared to the other models. As discussed in Section 2.2, this analysis does not include the >289 MeV proton fluence which may play role in the mission fluence prediction, particularly as the channel energy increases. A sanity check was performed to estimate the order-of-magnitude contribution by the >289 MeV proton fluence by histogramming the ratio between successive channel fluence bins. The ratios between the successive channel tend to be in the range of 0.5. Hence, the >289 MeV proton fluence can be assumed to increase 50% of the >200 MeV energy channel which will bring the mission fluence projections slightly closer to the original model predictions. Given the importance of the high energy protons to spacecraft design, future studies should carefully include the >289 MeV proton fluence into the model input. An additional sanity check was performed by increasing the fitting threshold in order to only focus on the highest fluence years while ignoring the rest of the distribution for the ESP method which should produce an upper bound on the mission fluence prediction. Using the exaggerated fit, the >138 and >200 MeV channel increased by a factor of 1.4 and 2 for the 1-year 95% C.L fluence prediction, which brings the results closer together to the original models but still leaves a significant difference. Additional comparisons between the JPL and ESP models along with the SAPPHIRE model have been made in , Jiggens, Varotsou, et al. (2018) using the original RDSv2.0 data set.
The larger difference between the new and original model predictions highlights the importance of the data set used to generate the model. The factor of 2-3 difference between the new implementations of the JPL and ESP models using the RDS2.0.j data also highlights the inherent model uncertainties, particularly when applied to sparse data sets which increase the fitting uncertainty and reduces the confidence in the model outputs. At JPL, a multiplicative factor, the radiation design factor, of 2 is used on all radiation analyses to account for inherent model uncertainties. Table 3 compare the 95% C.L cumulative fluence projections of the JPL and ESP model using the RDS2.0.j data for mission lengths varying from 1 to 7 years Figure 11 also includes the multi-year trends for the original JPL and ESP models as well. The trends found in the 1-year projections, with the two models being off by a factor of 2-3 at low energies and matching at higher energies, are also reflected in the multi-year projections.

Conclusions
The methodologies of the JPL and ESP cumulative solar proton fluence models were applied to the background-subtracted RDSv2.0.j data set, which spans 41 years, nearly 5 solar cycles of solar proton measurements. The JPL model fits the SEP event distribution with the CDF of a log-normal distribution. The fit results are then used to perform Monte Carlo simulations to generate the probability of a mission exceeding a certain cumulative fluence. Using the Maximum Entropy Principle, ESP model fits the annual proton fluence distribution using a log-normal distribution. The probability of a mission exceeding a certain fluence can be read out directly from the log-normal fit.
The model results were compared with each other, the original JPL model implemented in NSET, and the original ESP model implemented in SPENVIS. The model results evaluated on the RDSv2.0.j data are within a factor of 2-3, while comparisons to the original model outputs show a factor of 2-5 difference below 60 MeV and begin diverge more above 60 MeV due to differences in data used. These results highlight the inherent differences between the JPL and ESP models at energies <60 MeV where the SEP event fluence distribution is not well fit by a log-normal distribution while the annual proton fluence distribution is well fit by the log-normal. The differences between the models derived from the RDSv2.0.j data and the original model outputs also emphasize the importance of the data used to generate the model.

Data Availability Statement
The raw RDSv2.0 data set can be obtained through (Heynderickx et al., 2017). The background subtracted data set RDS2.0.j can be obtained through (Whitman et al., 2023).