On Generalized Additive Models for Representation of Solar EUV Irradiance

In the context of space weather forecasting, solar EUV irradiance specification is needed on multiple time scales, with associated uncertainty quantification for determining the accuracy of downstream parameters. Empirical models of irradiance often rely on parametric fits between irradiance in several bands and various solar indices. We build upon these empirical models by using Generalized Additive Models (GAMs) to represent solar irradiance. We apply the GAM approach in two steps: (a) A GAM is fitted between FISM2 irradiance and solar indices F10.7, Revised Sunspot Number, and the Lyman‐α solar index. (b) A second GAM is fit to model the residuals of the first GAM with respect to FISM2 irradiance. We evaluate the performance of this approach during Solar Cycle 24 using GAMs driven by known solar indices as well as those forecasted 3 days ahead with an autoregressive modeling approach. We demonstrate negligible dependence of performance on solar cycle and season, and we assess the efficacy of the GAM approach across different wavelengths.


Introduction
Accurately describing space weather effects on the upper atmosphere is of critical importance for space situational awareness, satellite collision avoidance, safeguarding the electrical power grid, and protecting astronauts (Bussy-Virat et al., 2018).A key component of space weather operations involves describing the variability and effects of solar extreme ultraviolet (EUV) radiation, nearly all of which is absorbed in the thermosphere and serves as the dominant driver of energy input into the upper atmosphere during geomagnetic quiet times (P.Richards et al., 1981;Stolarski et al., 1975).Solar EUV irradiance additionally plays a central role in modulating the global variation of total electron content (TEC) (Hocke, 2008;Lean et al., 2011) and in driving the thermosphere response at multiple timescales (Guo et al., 2007).
2. The emissions most strongly correlated with each solar index are absorbed in different regions of the thermosphere and mesosphere, resulting in either increasingly complex parameterization for their ingestion into atmospheric models and non-trivial impacts on quantification of uncertainty in derived thermospheric temperatures and densities (Thayer et al., 2021).3. The indices struggle to capture variation of solar irradiance both beyond timescales of ∼27 days and on the order of hours (de Wit et al., 2008;Tapping & Charrois, 1994), and do not capture the influence of solar flares inherent in EUV measurements (Pawlowski & Ridley, 2008).
These limitations have led to observed discrepancies in model performance when using different indices, especially when representing TEC (Tariku, 2019).More significantly, in the context of blackouts in solar proxies, it is possible to reconstruct those proxies for operational usage using solar radio measurements in neighboring wavelengths (Elvidge et al., 2023), which provides additional robustness for space situational awareness, but exacerbates the effects of the already inherent limitations of the index due to imperfect reconstruction.
In response to some of these concerns, empirical models of solar EUV irradiance have been developed, the outputs of which can be ingested into coupled atmospheric models such as the Global Ionosphere Thermosphere Model (GITM) (Ridley et al., 2006) or SAMI3 (Huba & Liu, 2020).The EUVAC model (P.Richards et al., 1994) is one such model that has seen regular used since its inception, and was succeeded by HEUVAC (P.G. Richards et al., 2006), which featured increased spectral resolution and flexible wavelength binning.Both models were developed based on a parameterization of F10.7, 81-day averaged F10.7 and the F74113 quiet sun reference spectrum derived from rocket measurements during the 1960s and 1970s (Heroux & Hinteregger, 1978).Other paradigms include the GOES-R EUVS model developed by the National Oceanic and Atmospheric Administration (NOAA) (Thiemann et al., 2019), the SOLAR2000 empirical solar irradiance model and forecast tool, which was implemented with the development of the E10.7 solar EUV proxy derived from the time-dependent integrated solar EUV flux at the top of the terrestrial atmosphere (Tobiska et al., 2000), and the Flare Irradiance Spectrum Model (FISM) (Chamberlin et al., 2007).The current iteration, FISM2 (Chamberlin et al., 2020), includes a daily component and a flare component, and improved empirical modeling due to the incorporation of measurements from the Solar Dynamics Observatory (SDO)/Extreme Ultraviolet Variability Experiment (EVE), SORCE/X-ray Photometer System (XPS), and SORCE/Solar Stellar Irradiance Comparison Experiment (SOL-STICE).FISM2's daily component models the solar spectral irradiance at a given wavelength as a sum of an unchanging solar minimum reference spectrum for all days and times, a contribution due to variability due to the solar cycle, and a contribution due to variability associated with the solar rotation.The solar cycle and solar rotation components use the centered 108-day smoothed value of proxies and measurements, but must rely on the average of the previous 54 days for near-real time estimations.
This work builds upon the empirical modeling paradigm by introducing a novel framework for parameterizing solar EUV irradiance using Generalized Additive Models (GAMs), a class of models that relate a response variable linearly to a sum of smooth functions of predictor variables of interest (Hastie, 2017).This class of models allows for the capturing of the proportional impacts of various solar processes represented by proxies on each wavelength band, and natively support intuitive quantification of uncertainty in modeled irradiances due to the relating of the distribution of expected values of the response variable to the predictor variables through a link function.Since reliable measured solar EUV irradiance records do not extend before the early 1990s, we leverage FISM2 model results for the entire period considered, and construct a GAM to represent integrated solar EUV irradiance across all the considered wavelength bands using that data, after which we construct a second GAM to model residuals with respect to native FISM2 outputs.The results of the GAM approach are evaluated using known historical space weather drivers, and are compared to TIMED/SEE and SDO/EVE measurements in select wavelength bands.GAM results are also evaluated using space weather drivers hindcasted 3 days into the future using an Autoregressive Modeling approach during two 30-day periods in SC24: one corresponding to low solar activity and one corresponding to high solar activity.The contributions this study include (a) the application of a rigorous mathematical modeling method for representing of solar EUV irradiance, and (b) the initial verification of this method for forecasting over time scales routinely utilized in the space weather community through hindcasts performed over low and high solar activity periods.
We note that the relative weakness of Solar SC24 in comparison to the preceding Solar Cycles may affect the results.While we foresee this effect being minimal since solar indices during this time show a weakness that is similar to the same degree, we acknowledge that it is of enough significance to warrant investigating the cycle-by-cycle modulations in indices and drivers in more detail in a future study.Further, we have limited inclusion of TIMED/SEE results to 16 bands between 280 and 1,170 Å due to the fact that TIMED/SEE did not make spectral measurements below about 280 Å; the TIMED/SEE Level 3 irradiance below 280 Å is derived from a model, subjecting it to model error.Additionally, above 1,170 Å, FISM2 is calibrated against SORCE/SOLSTICE, which is more accurate than TIMED/SEE, and disagreement between FISM2 and TIMED/SEE in this range is expected and systematic due to FISM2's higher absolute accuracy.Likewise, due its limited wavelength range, we only consider 21 wavelength bands corresponding to SDO/EVE, which suffers much less from inaccuracies due to degradation than TIMED/SEE, since it undergoes regular calibrations.
The obtained GAM functions can be used for: (a) operational assessment of the current state of solar irradiance, specifically when there are data outages, (b) improved thermosphere and ionosphere modeling and uncertainty quantification of the atmospheric state, and (c) forecasting future values of solar EUV irradiance across multiple wavelength bands and on multiple time scales.They represent an innovation in that they do not require estimates of time-averaged solar indices centered on the given day in order to be used for nowcasting or forecasting.
The paper is organized as follows: first, the data pre-processing strategy is outlined.Then, the GAM approach for modeling irradiance is described, followed by a description of Autoregressive Models.After that, a description of the techniques used for validation of results is outlined.Thereafter, irradiance estimation results are presented and assessed by comparison against native FISM2 outputs, TIMED/SEE, and SDO/EVE.Finally, the results are discussed and conclusions drawn.

Data Pre-Processing
We considered F10.7, revised Sunspot Number (SSN) and Lyman-α as drivers for the GAMs fit to FISM2, and we compared GAM outputs to native FISM2, as well as TIMED/SEE and SDO/EVE measurements in specific wavelength bands.These indices were chosen primarily due to their widespread use and familiarity within the space weather community.Additionally, since indices serve as proxies primarily for the upper chromosphere/ lower corona and solar photosphere (Johnson et al., 2023;Nusinov & Katyushina, 1994), constraining model parameterization to these parameters demonstrates the versatility of the GAM approach in its ability to address the problem of solar EUV modeling even with limited information.For this reason, we have not included other proxies such as the Mg-II index (taken as the ratio between chromospheric Mg II h and k lines at 2,795.6 and 2,802.7 Å respectively, and the weakly varying photospheric wings) or Ca-II index, even though their inclusion may increase accuracy by capturing the dynamics of slowly varying corona (Schonfeld et al., 2019) and better describing variation within the entire chromosphere (Viereck et al., 2004).The selected indices were obtained from the NASA OMNIWeb Data Explorer (Papitashvili & King, 2020).We used cubic smoothing splines to interpolate over gaps in the OMNIWeb solar index data, using the Python CSAPS package (De Boor & De Boor, 1978).The FISM2, Level 3 TIMED/SEE, and Level 3 SDO/EVE irradiances were obtained from the LASP Interactive Solar Irradiance Datacenter (LASP, 2005) and we used nearest-neighbor interpolation to upsample them from their native daily resolution to the hourly resolution of the OMNIWeb solar indices.Additionally, the FISM2 and TIMED/SEE irradiances were arranged into 59 wavelength bins used by GITM for ease of comparison and for eventual ingestion into GITM (see Table 1 below).Due to the lower accuracy of TIMED/SEE compared to FISM2 outside of 280-1,170 Å, only 16 non-singular wavelength bins between 280 and 1,170 Å were used from TIMED/SEE for additional comparisons.In the case of SDO/EVE, the Level 3 data are bounded between wavelengths centered at 65-1,055 Å, in 10 Å intervals; we consider 21 of these wavelength bands in our analysis.In Table 1 below, the salmon-colored bins correspond to those for which there was SDO/EVE data alone, the orchid-colored bins correspond to where there was both TIMED/SEE and SDO/EVE data, and the sky blue-colored bin corresponds to where there was TIMED/SEE data alone.FISM2 data was available in every bin.
Binning was performed as follows: For each spectrum at a given time, irradiance at the wavelength nearest to each of the singular wavelengths was obtained.These constitute the singular wavelengths, or "lines"; here the irradiance was assigned the single line.Afterward, the irradiance in each wavelength range (the remaining nonsingular bins) was calculated by summing the irradiance in that bin.This was done by adding the values of irradiance corresponding to the wavelengths between the boundaries of each bin and multiplying by the bin width.
Figure 1 shows the EUV spectrum for FISM2, TIMED/SEE, and SDO/EVE during solar maximum in SC24, obtained via our binning procedure.

Generalized Additive Models
Conventional linear models (Nelder & Wedderburn, 1972) assume a response variable y follows an exponential family distribution with mean μ, which may be a nonlinear function of β T X, where β is a vector of coefficients and X = [X 1 , X 2 , …, X n ] are covariates (predictor variables): where g is the link function relating the predictor variables to the expected value of the dependent variable y.
GAMs, by contrast, are linear models in which the response variable depends linearly on unknown smooth functions of several predictor variables.They take the following form: where f i are "feature functions" which may be constructed from various families of bases functions.In the context of this paper, the solar EUV irradiance in a given band y was regressed using the predictor variables Day-ofthe-year (DOY), F10.7, SSN, and Lyman-alpha.We specifically include DOY as a model driver in order to represent seasonal variations in the solar indices.The regression was carried out using the GAM framework, which required the fitting of univariate spline functions for each those predictor variables.We assumed a normal/Gaussian error model for the responses y which naturally leads to a least-squares fitting problem.In our analysis, we proceeded with our approach as follows: 1.For SC20 through the beginning of SC23, an initial GAM Y was fit between solar EUV irradiance represented by FISM2 and the F10.7 index, revised Sunspot Number (SSN) (Clette et al., 2015), and the Lyman-α index (Woods, Tobiska, et al., 2000) using penalized B-splines and the Normal distribution.2. A second GAMs ζ to capture the behavior of initial model residuals was fit during the remainder of SC23. 3. SC24 integrated solar EUV irradiance was modeled using the F = Y ζ for known solar inputs (termed models F K ; likewise, Y K refers to models Y driven by known solar inputs).4. Relative irradiance error ɛ with respect to native FISM2 was compared from F K , and assessed for solar cycle, seasonal, and solar activity dependence.This was also done for TIMED/SEE Level 3 irradiances in 16 wavelength bands between 280 and 1,170 Å and SDO/EVE Level 3 irradiances in 21 bands between 65 and 1,055 Å. 5.The approach was applied again for 59 different wavelength bands, to assess the behavior of the mean, standard deviation, kurtosis, and skew of relative error ɛ as a function of wavelength band i.These bands were selected due to their usage in GITM (a subset of the bands corresponding to those used by EUVAC is also used by SAMI3 (Huba & Liu, 2020)).This step provided us with models ]. 6.The above steps were applied for a 3-day ahead Autoregressive Model hindcasts of the solar indices mentioned in Step 1, which were used to drive the GAMs.This was done for two 30-day periods in SC24: one during the low solar activity in the ascending phase and one during high solar activity during the peak.
The initial fitting for Y i was performed on solar index and FISM2 data between the beginning SC20 (10 October 1964) and the peak of SC23 (taken as 23 January 2002).The fitting of ζ i was done between the Y i residuals and solar indices between the peak of SC23 and the remainder of its descending phase (up to 1 December 2008).The final models F i were evaluated over the entirety of SC24 (1 December 2008-1 December 2019).The GAMs were implemented with the aid of the recently-developed PyGAM package (Servén & Brummitt, 2018).
An example of the results of the fitting procedure in two different wavelength bands (centered at 19.5 and 225.0 Å is shown) is shown in Figure 2 below.The solar indices used to perform the fits also shown, along with the GAM outputs for the same period of time corresponding to the training set used for fitting.
Figure 3 shows the results of the fitting procedure for ζ i for the wavelength bands centered at 19.5 and 225 Å.We observe that the aim of the inclusion of ζ i is to account for any trends in the residuals related to season or solar activity.While we acknowledge the limitations of fitting ζ i only during the descending Phase of SC23, we elected to restrict the fitting to that period of time in order to demonstrate the effectiveness of the approach with use of limited information, and to avoid the problem of over fitting.
Figure 4 shows the results of the fitting procedure in the same two bands, but viewing up close a time period during the descending phase of SC22, between 1 March 1993 and 1 December 1996.From this figure can be seen qualitatively how the inclusion of the second GAM ζ i improves the overall correspondence between the model outputs and FISM2, while avoiding over-fitting.
For added clarification on the construction of the GAM, we show Partial Dependence Functions (PDPs) for GAMs Y i for wavelengths centered at 19.5 and 225 Å in Figures 5 and 6 below.PDPs are widely used within the field of interpretable Machine Learning to explain the marginal effect of single features on a model prediction (Friedman, 2001).In the case of the present study, the PDPs are identical with the aforementioned feature functions used to construct the GAM.Thus, in the examples shown, they illustrate the proportional functional contributions to solar EUV irradiance in a specific band due to a specific feature represented by either DOY (a Note.FISM2 data were available in every bin.proxy for season) or a solar index.The versatility of the GAM approach is demonstrated by the varied nature of PDPs for different wavelengths.This behavior is expected, since each model driver is not expected to retain the same relationship with each wavelength band.
In particular, for the examples shown, we observed oscillatory behavior between DOY and its respective contribution to solar EUV centered at 19.5 Å that, by inspection, was characteristic of a dominant period between 75 and 100 days in duration.This was contrasted with the same PDP for 225 Å, which showed not only similar oscillations on a shorter characteristic period of ∼50 days in duration, but an annual trend that reached a trough just prior to the summer solstice.These oscillations were observed in every other wavelength (not shown), but we attribute them to a combination of (a) residual variations unable to be accounted for by the solar drivers manifesting as variations in the DOY PDPs and (b) noise.This attribution stems from the fact that the FISM2 data itself has been normalized to 1 AU, and as a result, oscillations are unlikely to be due to variations in Earth-Sun Distance or eccentricity associated with seasonal changes.That the oscillations are an order of magnitude smaller than the variations (having <∼10% effect on the system) seen in the solar drivers suggests that the effect, although observable, is minor, and likely can be attenuated by the inclusion of other solar indices and the neglect of DOY as a model input altogether, something we will investigate in subsequent work.
We also observed unique behavior for the PDPs of the solar indices for each wavelength.For 19.5 Å, the PDP for F10.7 is nearly linear, and only fails the to meet the description of a monotonically increasing function at values beyond F10.7 = 300.This, however, was not observed with the wavelength bin centered at 225 Å which behaved nearly like a negative quadratic function until F10.7 = 300.The PDP for SSN at 19.5 Å showed an inverse relationship, while at 225 Å, this inverse relationship persisted only until SSN ∼ 140, after which it reversed, terminating in a sharp, nearly exponential relationship above SSN ∼ 400.Lastly, the PDP for Lyman-α was a nonmonotonic function at 19.5 Å, but it was a monotonically increasing function at 225 Å.

Autoregressive Models
Autoregressive (AR) models represent random processes by modeling values at future time steps as a weighted sum of values at previous time steps.Conventionally, for some AR model of order n, values of a quantity x at time t are given by where the β i are the parameters of the model and δ t is white noise (Box et al., 2015).AR models have been used for probabilistic forecasting of the disturbance storm time index (Chandorkar et al., 2017), predicting MeV electron fluxes in the outer radiation belt (Sakaguchi et al., 2015)   In our approach, we relied on an AR paradigm known as the Autoregressive Integrated Moving Average (ARIMA), which not only models the variable of interest as a function of its own prior values, but utilizes a moving average to model the regression error as a linear combination of error terms whose values occurred contemporaneously in the past.ARIMA models take advantage of non-stationarity (the mean and variance of a process vary as a function of time) and thus are suitable for forecasting solar indices, since they have been found to exhibit heteroscedasticity (Wang et al., 2018).ARIMA models have a general form given by the following: where α is a constant, β i are AR model parameters for an AR model order of p, ϕ i are moving average model parameters up to order q, and δ i are lagged forecast errors.
A key component of ARIMA models is that they employ differencing in order to enforce stationarity.This involves subtracting the previous value from the current value a total of d times.In the present study, during the application of our AR approach, we focused on forecasting daily solar indices, and set p = 33, d = 1, and q = 2, similar to the methods for short-term F10.7 AR forecasting recommended by Du (2020).An example of the solar indices hindcasted during SC24 is shown in Figures 7 and 8 below.We utilized the Python Statsmodels package to perform model fitting and forecasting (Seabold & Perktold, 2010).We briefly consider the percentage difference between average values of the residuals for each hindcasted solar index, using the expression below: where x i is the initial average value of the residuals (in this case corresponding to low solar activity) and x f is the final average value of the residuals (in this case corresponding to high solar activity).Overall, we observed that the hindcasted indices exhibited lower prediction error during low solar activity, with values of the residuals increasing by a percentage difference of P ≈ 179.43%, 156.7%, and 158.97% for F10.7, SSN, and Lyman-α, respectively, from low solar activity to high solar activity.We suspect this is primarily due to increased uncertainty owing to the impulsive nature and sporadic occurrence of active regions on the solar disk during high solar activity as suggested by Du (2020) and observed in F10.7 by Wilson et al. (1987), especially due to the fact that on time scales in excess of 1-2 days, magnetic structures tend to dominate in affecting fluctuations in solar irradiance (Solanki et al., 2003).Despite this, a detailed investigation and compensation for this behavior is beyond the scope of this paper.For more detail on AR models, we refer the reader to Shumway et al. (2017).

Validation
To assess the accuracy of the estimated irradiances, we followed an analog of the analysis of Gondelach & Linares (2021).This involved comparing estimated irradiances with native FISM2 estimates of solar EUV irradiance.Due to non-trivial degradation issues with TIMED/SEE that have worsened in severity since late 2017, and due to the limited wavelength coverage for SDO/EVE, we avoid using both of those sources as controls for the evaluation of the GAM results.Measurements for TIMED/SEE and SDO/EVE are included in select results for qualitative comparison only.For the comparisons, we define the relative irradiance error as follows: where ĪEST and ĪFISM2 indicate the daily average estimated and measured solar irradiance.For additional insight into the behavior of the relative irradiance error, we generated histograms of ɛ by wavelength band along with corresponding fits to a skew normal distribution, and we assessed variation of ɛ as a function of Day-of-the-Year (DOY) and solar activity proxied by F10.7.
We wish to note that particularly at wavelengths below 6 Å, use of Equation 6 becomes less insightful, since the characteristic irradiances are on the order of 10 8 W/m 2 /nm, and discrepancies of two or three orders of magnitude can result in values of ɛ that rapidly grow beyond hundreds of percent.Since these discrepancies are expected (as it is routine for the irradiance in these lower bands to often be measured at 0, or nearly equivalent to it in numerical precision), we also make use of the Normalized Root Mean Square Error (NRMSE) when evaluating performance as a function of wavelength band.RMSE is defined as follows (Wilks, 2011): where n represents the number of data points, and the remaining variables are defined as they are in Equation 6.We performed normalization in the same manner as Lean et al. (2009) by multiplying by 100 and dividing by the mean value of the observed values (i.e., NRMSE = 100RMSE < ĪFISM2 > ).In the determination of this statistic, we minimized the influence of outliers by computing the centered 24-hr rolling NRMSE with respect to FISM2, and considering the median values obtained across all considered wavelength bands.

Results
Results for the GAM approach are first given for integrated solar EUV irradiance across all 59 wavelength bands.We consider how well results from F K correlate with FISM2, as well as the distribution of relative irradiance error ɛ over SC24.Thereafter is detailed the behavior of the variance and skew of ε as a function of band, followed by an example of forecasting 3 days ahead with the AR approach.

Integrated Solar EUV
Figure 9 shows the overall result of the GAM approach using known solar indices.With the exception of the very end of the declining phase, we observe consistent correspondence between F K and FISM2 throughout the entirety of SC24.For Y and ζ, we use the default condition that spline terms have a penalty on their second derivative, which encourages the feature functions to be smoother.In both cases, the regularization parameter λ, which controls the strength of this penalty, was set to 0.6 for all terms.The number of samples η used to fit each model differed due to the training set for Y being ∼5.5 times greater in length than the set used for fitting ζ.We obtained important statistics for Y and ζ, including deviance explained, scale, and McFadden's pseudo R-squared (McFadden, 1973).Deviance, as detailed by Wood (2017), is defined as where l βmax ) is the maximized likelihood of the saturated model (a model with one parameter for each data point, so that the data are fitted exactly), l β) is the maximized likelihood of the fitted model, and ϕ is the scale parameter.The scaled deviance is given by In the case of Y and ζ, ϕ is estimated during model fitting, and represents the residual standard error squared, due to the use of the Normal distribution.The deviance explained Ξ then corresponds to the representing D* as the proportion of total deviance explained by the current model.We also computed McFadden's adjusted pseudo Rsquared ρ 2 adj ) as the coefficient of determination for Y and ζ, allowing us to determine the proportion of variation of integrated solar EUV irradiance predicted by the fitted model parameters, while controlling for the number of those parameters (more detail may be found in Long and Freese (2006)).Succinctly, this statistic gives us an idea of how much variation in each of the model parameters affects changes in the irradiance.Table 2 below shows the values of η, Ξ, and ρ 2 adj for both Y and ζ.
While we expect values of ρ 2 adj to run lower than conventional R 2 , as shown in Figure 5.5 of McFadden (1973), we note that for both Y and ζ, values of ρ 2 adj are sufficiently high (especially for the former case) to show excellent model fit for both parameterizing FISM2 irradiance and the associated residuals with respect to native FISM2 with a GAM, respectively.
Next, we consider how well the GAM correlates with native FISM2 over SC24. Figure 10 shows the correlations for F and FISM2.The correlation is significantly positive with a value of Pearson's Correlation Coefficient of 0.992.We observe that the linear fit between the outputs of F and that of native FISM2 suggests a tendency for overestimation that slightly increases as a function of solar irradiance.We attribute this primarily to the relatively short period over which ζ was fit in comparison to Y, though we note that the overestimation grows only from ~0.0003 to ~0.001 W/m 2 throughout the entire range of irradiance values, showing a high degree of consistency.
We also considered the variation of relative irradiance error ɛ over the solar cycle in general, as shown on the left in Figure 11.To illustrate the improvement afforded by the inclusion of ζ, we show the relative error over SC24 between both Y K and native FISM2 as well as F K and native FISM2.
For added clarity we generated histograms of ɛ for Y K and F K with respect to FISM2, and skew normal distributions were fit in order to elucidate the statistical behavior of the ɛ (on the right of Figure 11).The resulting parameters describing each distribution can be found in Table 3.We consider in particular the shape (α), location (ξ), scale (ω), kurtosis (κ), and skewness (γ).We observe a growth in the value of α after applying ζ, which, along with a decreased value of κ, indicate that the effect of the inclusion of the second GAM is to draw mean error closer to zero and constraint the majority of errors to clustering around values in the vicinity of ~2.5%.The negative excess  kurtosis shown by the value of κ for F K also indicates that the inclusion of ζ results in a reduction of the likelihood of errors attaining values more extreme that corresponding to a normal distribution.

Dependency of ɛ on Season and Solar Activity
To evaluate dependency on season and solar activity, we sorted values of ɛ by DOY and F10.7 (Figure 12).Linear fits to the sorted data resulted in slopes (m) and intercepts (b) suggesting a slightly negative correlation (Pearson's R ≈ −0.303) between season and relative error, while the converse is true regarding dependency on solar activity (Pearson's R ≈ 0.134).We note that the clustering of values of ɛ plays a role in affecting the resulting linear fits, particular for dependency on solar activity.The coefficient of determination R 2 for the linear fit of ɛ to DOY was ~0.092, while it was ~0.018 for F10.7.These values indicate no statistically significant relationship between DOY or solar activity and relative irradiance error.

Behavior of ɛ as a Function of Wavelength
Next, we consider the behavior of mean μ ε i ) and standard deviation (σ ε i ), kurtosis (κ ε i ), and skew γ ε i ) relative irradiance error as a function wavelength band.We additionally compare the values corresponding to F K i with those of TIMED/SEE and SDO/EVE with respect to FISM2, in select bands.We show these results for wavelengths between 4 and 1,750 Å, inclusive (Figure 13).For the bin centered at 1.5 Å, we observed that μ ε 0 ≈ 125.18, σ ε 0 ≈ 1546.73,κ ε 0 ≈ 613.29, γ ε 0 ≈ 23.30, and for the bin centered at 3 Å, we observed that μ ε 1 ≈ 355.44, σ ε 1 ≈ 4800.59,κ ε 1 ≈ 28.22, γ ε 1 ≈ 955.11.These values, particularly for μ ε i and σ ε i , differed significantly than those of the other bands, hence their suppression in Figure 13.In these two wavelength bands, FISM2 irradiances routinely were at values of zero, outside of which they would oscillate according to a pattern typical of neighboring wavelength bands and solar indices.By virtue of its construction, the GAM models quantities smoothly, and therefore would occasionally return negative values of irradiance in these bands over regions where it the prediction should be zero.In these cases, we manually zeroed the model estimate.Unfortunately in cases where FISM2 values were nonzero but exceedingly small (on the order of (10 11 W/m 2 and less) and the GAM result was greater by 1 or 2 orders of magnitude, and in the converse, we observed a dramatic increase in the estimates of relative irradiance error.In these bands, when temporal averaging is employed over a rolling 30-day period, the values of μ ε i in particular, are reduced to 87.07% for the bin centered at 1.5 Å, and to 84.55% for the bin centered at 3.0 Å (Figures 14 and 15).This is expected, as the effect of the temporal averaging is to act as a low-pass filter that excludes high frequencies corresponding to the most extreme deviations of model results  from FISM2; the temporal averaging has the effect of reducing the impact of outliers.It is likely that the use of a non-Gaussian error model in the fitting of the GAMs could serve to reduce the effect of these outliers.
In order to understand the performance of the approach of the present work without recourse to temporal averaging, we also considered NRMSE values across bands.We first considered values of the rolling NRMSE computed in centered 24-hr windows (not shown).In the wavelength bands centered at 1.5 and 3.0 Å, we still observed occasional large spikes in NRMSE similar to that of ɛ, so we initially computed the median NRMSE value as a function of band in order to minimize the influence of outliers (Figure 16).We observe a behavior of the median values of NRMSE similar that of μ ε i , with values for the GAM approach lower than that of SDO/EVE in only wavelength band (centered at 275 Å) and lower than TIMED/SEE in all wavelength bands.Regarding the first two wavelength bands, we observe median NRMSE values of ∼59% for both.Though these are the highest values observed for all wavelengths, we note that they remain below the highest median values for SDO/EVE and TIMED/SEE, which at 1,075 Å were ∼83% and ∼199%, respectively.
Regarding μ ε i , we observe no values in excess of 10% for the GAMs F K i in any wavelength band.In comparison, TIMED/SEE and SDO/EVE show values of μ ε i in excess of in excess of 10% in 8 and 3 of their 16 and 21 wavelength bands, respectively.Only a single wavelength band centered at 475 Å, was μ ε i for TIMED/SEE lower than that of F K i , and then only by an absolute difference of ɛ i ∼ 0.24.In comparison, values of μ ε i for SDO/EVE were lower than that of F K i in 7 of the 21 considered bands, with the most appreciable performance exhibited by SDO/EVE at 125, 175, 225, and 275 Å.We highlight that the values of μ ε i we observe are indicative of a systematic tendency of TIMED/SEE to overestimate FISM2, owing in part to continued degradation and reliance on calibrations corresponding to rocket measurements for which the most recent rocket flight was 2012.The daily calibrations performed for SDO/EVE partially contribute to its greater correspondence to FISM2 in several wavelength bands, such as that centered at 375 Å, as shown in Figure 17.
Results for σ ε i show that for F K i , there is a general downward trend as wavelength increases, and for 46 of the 59 wavelength bands considered (∼78%) we observed values of σ ε i under 5%.These values show tight clustering of   The top shows the original results before temporal averaging, and the bottom shows the results after a rolling centered 30-day average was applied to both FISM2 and F K 0 outputs.We note that histograms were constructed using 50 bins set within the widest range afforded by the 5th and 95th percentiles of both data sets.distributions of relative irradiance error for F K i that indicative of the most favorable performance for the GAM approach in particular above 250 Å.For TIMED/SEE, all values of σ ε i were in excess of 2%, whereas for SDO/ EVE, values of σ ε i all were below those of TIMED/SEE and but in excess of F K i in all but 2 wavelength bands (75 and 1,075 Å).
For κ ε i , for F K i , we observe a general decrease of kurtosis as wavelength increases, with distributions of ɛ i having positive excess kurtosis (leptokurticity) in 34 of the 59 bands and negative excess kurtosis (platykurticity) in 25 of the 59 bands.By inspection, we observe a cutoff at ∼1,000 Å below which leptokurticity dominates and above which we only observe platykurticity.Given that platykurtic distributions produce fewer and less extreme outliers than the normal distribution, we observe again that values of κ ε i show that the GAM approach is again more favorable as wavelength increases.For TIMED/SEE and SDO/EVE, we observe platykurticity in only 1 and 3 of their respective considered bands.Regarding skew, we observe variable skew for F K i that is primarily positive below ∼1,050 Å, after which it is remarkably consistent in the vicinity of ∼0.35.Positive skewness is observed in 39 of the 59 bands for F K i , with a trend that mirrors that of μ ε i .This indicates that over ∼66% of the solar EUV spectrum considered, relative irradiance error is most likely to deviate in the positive direction, a result indicative of a minimal but consistent tendency for the GAMs to overestimate FISM2.

Short-Term Forecasting
The suitability for the GAM approach for forecasting was evaluated through hindcasts of daily solar EUV irradiance integrated between 1 and 1,750 Å.These hindcasts were performed during 30 days of low solar activity during the beginning of SC24 and 30 days high solar activity during the peak of SC24.As before, we assess this suitability first for solar EUV irradiance integrated across all 59 wavelength bands, followed by an evaluation of the behavior of ɛ as a function of wavelength.
In the case of integrated daily solar EUV irradiance, we observe that due to the behavior of the residuals in the hindcasted solar indices, values of ɛ are lower for low solar activity (average of 0.24%) than high solar activity (average of 1.68%) (Figure 18).The movement of the mean value of ɛ from negative to positive from low to high solar activity indicates the predilection of the GAM approach, as applied in the present work, to generally overestimate values of the solar irradiance during high solar activity when forecasting.We also observe a shift in the standard deviation σ ɛ from 0.512% to 3.5%, corresponding to an increase by a percentage difference of ∼148.95%.In absolute terms, this change of P is nearly identical to that corresponding to the mean, which grew by a percentage difference of 150%.No absolute values of ɛ were observed to exceed 10%, and were comparable to values resulting from successful companion techniques such as the Air Force Data Assimilative Photospheric Flux Transport (ADAPT), which relies on comprehensive estimates of the solar magnetic field distribution to derive estimates of EUV irradiance (Arge et al., 2010;Henney et al., 2015).
In closing, we consider values of ɛ as a function of wavelength band, for both low and high solar activity (Figure 19).We focus in particular on the quantities μ ε i and σ ε i , which show the most variability as wavelength increases.We note that values of μ ε i often show opposite sign from low to high solar activity.In particular, during

Space Weather
10.1029/2023SW003680 low solar activity, values of μ ε i are most often negative below 375 Å (associated with higher values of σ ε i ), they oscillate between the boundaries of ±5% between 375 and 1,000 Å, while they remain negative but increasingly close to zero above 1,000 Å.For high solar activity, values of σ ε i show the same decreasing trend as a function of increasing wavelength that is observed at low solar activity, but their baseline is notably higher, indicating a spread in errors indicative of lower accuracy during the peak of the solar cycle.Additionally, although the oscillatory behavior is again observed between 375 and 1,000 Å, the spread is slightly larger, extending up to ±10%, while values below 375 Å and above 1,000 Å show systemic overestimation.Overall, we observed the most favorable performance at low solar activity for wavelengths in excess of 1,000 Å, while for wavelengths below 1,000 Å, results indicate that forecasted values are on average likely to have absolute relative errors at most in the vicinity of 10% when forecasts are on the order of 3 days.This performance, however, becomes less reliable at wavelengths below 10 Å, where difficulty forecasting sharp declines in irradiance in the vicinity of zero can result in significant uncertainty.Overall, and coupled with the NRMSE results in Figure 16, we observe favorable results for the GAM approach in all wavelength bands that are comparable with and routinely exceed the accuracies of the measurements from TIMED/SEE and SDO/EVE, with forecasting errors that are comparable with companion approaches that rely on much more comprehensive information.This speaks to the fidelity of the GAM approach in retaining the statistical characteristics of the FISM2 estimates even when parameterized with a constrained set of solar drivers.

Conclusions and Discussion
When fitted appropriately, the GAM approach demonstrates itself as robust, statistically well-grounded, and accurate for representing solar irradiance in multiple wavelength bands.As shown for the case concerning integrated solar EUV irradiance, a robust GAM may be constructed between integrated solar EUV irradiance from FISM2 and only three solar indices with minimal sacrifice of statistical characteristics of estimated irradiance.This demonstrates the power GAMs for capturing non-linear behavior with limited drivers The present work also highlights the degree to which FISM2 remains a powerful and versatile empirical paradigm for modeling of solar EUV.FISM2's capacities are inherent in its construction using three solar irradiance data sets (from SDO/EVE, SORCE/SOLSTICE, and SORCE/XPS), three solar proxies (F10.7,Mg-II, and Lyman-α), and four additional solar proxies primarily from emission lines measured by SDO/EVE.When the GAM approach is applied in two steps: (a) an initial fit between solar indices and FISM2 outputs, and (b) a second fit between solar indices and residuals between the initial fit and FISM2, the resulting model that is the sum of the two fits can be run entirely independent of FISM2 for the purposes of nowcasting and forecasting.The approach yields robust performance over an entire solar cycle, as shown by absolute mean values of ɛ under 10% across the overwhelming majority of wavelength bands considered.
We additionally observe that combined with well-principled autoregressive model approaches for forecasting solar drivers, the GAM performs well in the context of short-term 3-day forecasts, with the resulting absolute forecast errors again regularly attaining values below 10% for both low and high solar activity, on par with companion techniques that rely on estimates of the solar magnetic field or utilize neural networks (Stevenson et al., 2022).As noted by Chamberlin et al. (2020), FISM2 models solar cycle variations using a 108-day smoothed value of proxies and measurements, centered on the present day, but in an operational context, must resort to use of the previous 54 days, plus whatever days after that are available.The GAM approach presented in this work circumvents this difficulty through parameterization only on the daily average of underlying solar drivers, making it particularly appropriate for operational use.While our approach has demonstrated suitability for nowcasting and forecasting solar EUV irradiance, it experiences some limitations, particularly in the context of heightened forecast errors during solar maximum and at wavelengths below 6 Å.This is not surprising, since this wavelength range is dominated by solar flares, which are significantly more challenging to predict than quiescent irradiance changes.We contend that these drawbacks are attributable to (a) the choice of solar driversimproved performance may be achieved with the inclusion of other drivers such as Ca-II, Mg-II, S10, allowing for greater capturing the influence of solar chromospheric activity and solar active regions, which are particularly important during solar maximum and (b) the fitting of ζ only during the descending phase of a single solar cycle.Use of the descending phase only results in difficulty capturing variability of the residuals to a similar degree as Y captures the variability of integrated solar EUV irradiance due the usage of fewer samples.We contend that the fitting of ζ over multiple solar cycles would thus decrease kurtosis and scale of the resulting skew normal distribution of ɛ.Additionally, the choice of a Gaussian error model during the fitting of the GAM may be primarily responsible for the decreased performance observed below 6 Å. Improved performance may be obtained using a heavy-tailed error model instead (Lee & Nelder, 2006).Likewise, given that it has been determined that the mutual relationship between solar indices such as F10.7 and SSN show quasi-linearity that is dependent on the degree of temporal averaging (Clette, 2021), it is worth investigating how such averaging can lead to improved representation of solar EUV using the GAM approach in wavelengths below 6 Å that have shown difficult to model with a high degree of accuracy in the present work.In order to contextualize this investigation, we contend that it should be placed in the context of how the resulting irradiance estimates affect downstream ionospheric and thermospheric parameters in a coupled thermosphere-ionosphere model.We additionally are aware of the constraints in accuracy related to the AR approach to forecast model drivers.As such, we will consider the evaluation of improvements to the approach of the present work with a multi-step and/or dynamic solar driver forecasting approach as described by Daniell and Mehta (2023).
We emphasize that the principal power of this approach is its applicability for forecasting.GAMs constructed in various wavelength bands in the manner described in this paper enable forecasts of solar EUV irradiance directly from a reduced number of solar indices as drivers.With robust approaches to solar index forecasting, the GAM approach can be used to obtain much more comprehensive and accurate solar EUV forecasts for ingestion into thermospheric models, allowing for the reduction of thermospheric density errors.This is crucial especially for short-term forecasts that are needed to reduce the uncertainty of atmospheric density for satellite collision avoidance (Bussy-Virat et al., 2018).Future work will involve the improvement of the GAM approach with the use of rigorous statistical methods such as Feature Ordering by Conditional Independence (FOCI) (Azadkia & Chatterjee, 2021), and extended application through principled medium and long-term solar driver forecasting for prediction of solar EUV irradiance on multiple timescales with quantified uncertainties.

Figure 1 .
Figure 1.The solar EUV spectrum on 30 April 2010: (left) the spectrum in base units, and (right) the spectrum on a logarithmic scale.In both figures, blue represents FISM2 estimates, cyan represents TIMED/SEE measurements, and red represents SDO/EVE measurements.

Figure 2 .
Figure 2. Time series data for solar EUV irradiance centered at 19.5 Å (top), solar EUV irradiance centered at 225.0 Å. (second from the top), F10.7 (middle), SSN (second from the bottom), and Lyman-α (bottom).The top two solar EUV plots show data for FISM2 and for the GAMs Y i (initial fit) and F i (initial fit-model for residuals).The model fits were performed between the beginning of SC20 and the peak of SC23.

Figure 3 .
Figure 3.The results of fitting the GAM ζ i to model residuals for solar EUV irradiance centered at 19.5 Å and at 225 Å.The time period over which this fit occurred corresponds to the descending phase of SC23.

Figure 4 .
Figure 4.A closer examination (during the descending phase of SC22) of the same model results as shown in the top two plots of Figure 2.

Figure 5 .
Figure 5. PDPs for the GAM Y i used to model solar EUV irradiance centered at 19.5 Å. Clockwise from the top-left: the PDP for DOY, for F10.7, SSN, and Lyman-α.In each plot, the PDP itself is in blue, and the red dashed lines are the corresponding 95% confidence bands.

Figure 6 .
Figure 6.The same as Figure 5, but for solar EUV irradiance centered at 225 Å.

Figure 9 .
Figure 9. Results of the GAM approach applied to solar EUV irradiance integrated across all considered wavelengths, showing the entirety of the training, correction, and test sets (top) and a zoom in on the test set alone (bottom).In the top, the training region over which Y was fitted is shaded gray, the correction region over which ζ is fitted is shaded yellow, and the test region over which F is evaluated is shaded green.The pale orange shaded region in both figures corresponds to the 95% confidence interval for F K .

Figure 10 .
Figure10.The linear relationship between the GAM outputs from F K and FISM2, for solar EUV irradiance integrated across all 59 wavelength bands.The blue line corresponds to the line-of-best fit, and the red shaded region corresponds to the 95% confidence interval.

Figure 11 .
Figure11.Left: Relative error (ɛ) for Y K and F K with respect to FISM2 integrated solar EUV irradiance throughout the entirety of SC24.These results demonstrate laudable performance at under 5% relative error for F K throughout nearly the entirety of SC24.They additionally show that these improvements are possible in part due to the action of ζ in reducing error.Right: Relative error (ɛ) histograms for Y K (top) and for F K (bottom), with respect to FISM2 integrated solar EUV irradiance for the entirety of SC24.The action of ζ is observable in its moving the center of the distribution of errors closer to zero for F K compared to Y K , as well as reducing the width of the distribution.

Figure 12 .
Figure 12. ɛ for integrated solar EUV irradiance across 59 wavelength bins as a function of season (top) and ɛ as a function of F10.7 (bottom).The coloring of the data in each plot relates to the density of the data points, which in each plot, have been distributed over 50 bins in both axes.

Figure 13 .
Figure 13.Clockwise from the top left: Mean, Standard Deviation, Kurtosis, and Skew of relative irradiance error as a function of wavelength band, for GAMs F K i , TIMED/SEE, and SDO/EVE, with respect to FISM2.The bands shown exclude the first two from Table1, and are only those between 4 and 1,750 Å inclusive.The only values not shown on a symmetric log scale are σ ε i .

Figure 14 .
Figure 14.(Left-top and bottom) Solar EUV irradiance in the bin centered at 1.5 Å over SC24.The top shows native FISM2 outputs in blue and GAM outputs in orange.The bottom shows the a rolling 30-day average of the same.(Right-top and bottom) Histograms of the ɛ between F K 0 and FISM2.The top shows the original results before temporal averaging, and the bottom shows the results after a rolling centered 30-day average was applied to both FISM2 and F K 0 outputs.We note that histograms were constructed using 50 bins set within the widest range afforded by the 5th and 95th percentiles of both data sets.

Figure 15 .
Figure 15.The same as Figure 14 but for EUV irradiance in the bin centered 3 Å.

Figure 16 .
Figure 16.Median rolling daily NRMSE as a function of wavelength band, during SC24.

Figure 17 .
Figure 17.Time series of solar EUV irradiance centered at 375 Å, during a period of time corresponding to uninterrupted coverage provided by SDO/EVE during SC24.

Figure 18 .
Figure 18.Time series of hindcasted integrated solar EUV irradiance during low solar activity (top) and high solar activity (bottom) during SC24.The light shaded orange region denotes the 95% Confidence Interval for the GAM results.The y-axes have been harmonized to show irradiance values that span 0.007 W/m 2 (left axis) and values of ɛ between 6% and 10% (right axis).

Figure 19 .
Figure 19.Mean (left) and standard deviation (right) of ɛ i as a function of wavelength band, during low (blue) and high (red) solar activity during SC24, on a symmetric logarithmic scale.

Table 1
Solar EUV Irradiance Wavelength Bins Considered for Analysis: The Salmon-Colored Rows Are for Bins Which There Was SDO/EVE Data Alone, the Orchid-Colored Rows Are for Bins for Which There Was TIMED/SEE and SDO/EVE Data, and the Sky Blue Row Is for Where There Was Only TIMED/SEE Data BRANDT ET AL.

Table 2
Statistics for the Components of F for Representing Integrated Solar EUV Irradiance, Rounded to Three Decimal Places

Table 3
Statistics for Skew Normal Distributions of ɛ for Y K and F K , Rounded to Three Decimal Places BRANDT ET AL.