Bayesian Inference and Global Sensitivity Analysis for Ambient Solar Wind Prediction

The ambient solar wind plays a significant role in propagating interplanetary coronal mass ejections and is an important driver of space weather geomagnetic storms. A computationally efficient and widely used method to predict the ambient solar wind radial velocity near Earth involves coupling three models: Potential Field Source Surface, Wang-Sheeley-Arge (WSA), and Heliospheric Upwind eXtrapolation. However, the model chain has eleven uncertain parameters that are mainly non-physical due to empirical relations and simplified physics assumptions. We, therefore, propose a comprehensive uncertainty quantification (UQ) framework that is able to successfully quantify and reduce parametric uncertainties in the model chain. The UQ framework utilizes variance-based global sensitivity analysis followed by Bayesian inference via Markov chain Monte Carlo to learn the posterior densities of the most influential parameters. The sensitivity analysis results indicate that the five most influential parameters are all WSA parameters. Additionally, we show that the posterior densities of such influential parameters vary greatly from one Carrington rotation to the next. The influential parameters are trying to overcompensate for the missing physics in the model chain, highlighting the need to enhance the robustness of the model chain to the choice of WSA parameters. The ensemble predictions generated from the learned posterior densities significantly reduce the uncertainty in solar wind velocity predictions near Earth.


Introduction
The ambient (or background) solar wind is the long-lived large-scale plasma that emanates from the Sun and travels into interplanetary space, which excludes interplanetary coronal mass ejections and other transient events.It is crucial to accurately predict the ambient solar wind since interplanetary coronal mass ejections, the primary source of extreme space weather events, are modeled as perturbations to the ambient solar wind [35].Additionally, corotating interaction regions between fast and slow ambient solar wind streams are drivers of moderate space weather events [48].In fact, corotating interaction regions have been found to contribute to 70% of geomagnetic activity at Earth during solar minimum and about 30% during solar maximum [43].Thus, generating reliable predictions of the ambient solar wind is essential for improving space weather prediction capabilities and for accurately assessing the risk of space weather events.
State-of-the-art ambient solar wind models couple two regions: the corona and heliosphere.The coronal domain spans from the surface of the Sun (1R S , i.e., one solar radius) up to the coronal outer boundary, which is typically set to a distance between 2.5R S to 30R S depending on the model that is used.The solution at the coronal outer boundary is extrapolated into the heliospheric domain up to Earth's orbit and beyond.High-fidelity simulations of the ambient solar wind are constructed via timedependent magnetohydrodynamic (MHD) models, such as the Magnetohydrodynamics Algorithm outside a Sphere [23,29,51] and the Space Weather Modeling Framework [62,63].Such MHD models simulate the ambient solar wind by relaxing the coronal and heliospheric simulations to a steady-state solution, requiring high computational costs.In an effort to reduce simulation time (especially in operational settings), the space weather community commonly uses lower-fidelity models based on reduced physics and empirical relations.A well-established and widely-used chain of lower-fidelity models couples the Potential Field Source Surface (PFSS) [2,55], Wang-Sheeley-Arge (WSA) [4], and Heliospheric Upwind eXtrapolation (HUX) [45,44,16] models.
There has been a continuous effort in the space weather community to improve ambient solar wind models by comparing the model predictions to in-situ spacecraft observations [42].Such in-situ observations can also be leveraged to reduce prediction uncertainties stemming from initial conditions, boundary conditions, fitting parameters, numerical errors, measurement noise, etc.Here, we study parametric uncertainty in the PFSS→WSA→HUX model chain.We rigorously examine the uncertainty and sensitivity of eleven parameters, such as the source surface height and WSA numerical parameters, and their impact on the solar wind radial velocity predictions near Earth.Quantifying and reducing the parametric uncertainties in the ambient solar wind models is critical for making informed decisions in operational settings.
As a core contribution of this work, we present a comprehensive uncertainty quantification (UQ) framework to advance the use of rigorous UQ techniques in space weather.The proposed UQ framework is described in the following steps and illustrated in Figure 1.First, we perform variance-based global sensitivity analysis to identify which parameters influence the solar wind predictions near Earth the most.Subsequently, parameters that hardly contribute to the prediction variability are set to their fixed deterministic values, which facilitates a posteriori parameter dimensionality reduction.Then, we apply Bayesian inference to uncover the posterior density of the influential parameters, which is the conditional probability of the influential parameters given observational data.Lastly, we sample from the learned posterior densities and generate an ensemble of the ambient solar wind predictions near Earth, which demonstrates that the UQ framework reduces the parametric uncertainty in the predicted solar wind velocity.
Sensitivity analysis quantifies the contribution of parametric uncertainty on the variability of a quantity of interest (QoI).Common methods can be classified into two groups: local and global.Local sensitivity analysis methods vary the parameters about a deterministic value by computing local partial derivatives, whereas global sensitivity analysis methods account for variance effects in the entire parameter space [54].We use variance-based global sensitivity analysis by estimating Sobol' sensitivity indices [59].A sensitivity analysis study by [19] estimated the Sobol' sensitivity indices associated with uncertain parameters in the MHD Alfvén Wave Solar atmosphere Model [63].A recent study by [41] used Morris screening [33], a hybrid local/global sensitivity analysis method, to identify the most influential parameters in the WSA model.The Morris screening method averages local derivative approximations to provide global sensitivity measures [57, §15.2].It is typically used when variance-based methods are prohibitively expensive since it can only rank the parameters based on their importance but, unlike variance-based methods, does not quantify the relative contributions of each parameter to the QoI variance.Our study differs from [41] since we consider parametric uncertainty stemming not only from WSA but also from PFSS and HUX.Additionally, our study differs from [41] since we compute full global sensitivity information.
Parameter estimation techniques are typically divided into two approaches: frequentist and Bayesian.The frequentist approach seeks to find a single 'optimal' value of each parameter by solving an optimization problem.For example, maximum likelihood estimation [30], such that the optimal values of the parameters minimize the difference between the model prediction and observational data.[50], [41], and [21] took a frequentist approach to find the optimal parameters in the WSA model.This Figure 1: A comprehensive UQ framework that allows for ensemble predictions with reduced uncertainties.The framework proceeds in the following steps: First, we identify the problem space and parameters that have uncertainties.Second, through global sensitivity analysis, we find out which of these parameters significantly impacts the variance of the quantity of interest.Third, we infer the full distribution of the most important parameters through Bayesian inference.Fourth, we make ensemble predictions with these newfound parameter distributions.
paper presents a Bayesian approach to learning the uncertain parameters in the PFSS→WSA→HUX model chain.The Bayesian approach views the parameters as random variables and seeks to learn their posterior density.In the Bayesian setting, the solution to the UQ inverse problem is represented by a probability density function of the parameters.In stark contrast, the frequentist approach yields a point estimate.Thus, the Bayesian approach provides a complete picture of the uncertainty associated with the model parameters.Subsequently, relevant point estimates, such as the maximum a posteriori (MAP), variance, and mean, can all be computed from the posterior density.Here, we use Markov chain Monte Carlo (MCMC) [28,13] to learn the posterior densities of the most influential parameters.In particular, we employ the MCMC affine invariant ensemble sampler [10,9], which is robust to different scales in the parameters.
The main questions we seek to answer via the proposed UQ framework are: (1) How does parametric uncertainty in the PFSS→WSA→HUX model chain impact the uncertainty in the solar wind velocity predictions near Earth?Can we reduce such uncertainties using Bayesian inference methods?(2) What are the most influential parameters in the model chain?(3) How do the posterior densities of the influential parameters change over time?Is there a clear trend in the posterior evolution?(4) Is the model chain robust to the choice of its parameters?Is it reliable enough to be used for real-time operational forecasting?
This paper is organized as follows.Section 2 describes the models and observational data used in this work.Section 3 discusses variance-based global sensitivity analysis, an algorithm to compute Sobol' sensitivity indices via Monte Carlo integration, and numerical results.In Section 4 we discuss Bayesian inference via MCMC algorithms and numerical results.Section 5 then offers conclusions and an outlook to future work.

Ambient Solar Wind Model Chain and Observational Data
We consider the coupling of three well-established models: (1) Potential Field Source Surface (PFSS), (2) Wang-Sheeley-Arge (WSA), and (3) Heliospheric Upwind eXtrapolation (HUX), to predict the ambient solar wind radial velocity near Earth.[40,41] and [5] use a similar chain of models with the addition of the Schatten Current Sheet (SCS) model, developed by [55], resulting in the following model chain: PFSS→SCS→WSA→HUX.The SCS model is added to correct the PFSS radial magnetic field latitudinal variations to match Ulysses' observations [65].However, a recent study by [21] showed that adding SCS to the chain of models did not necessarily improve the accuracy of the solar wind radial velocity predictions at L1 (during 2006-2011).We thus analyze the PFSS→WSA→HUX model chain.
The PFSS model is used to predict the magnetic field in the coronal domain.The PFSS magnetic field solution is then used as an input to the WSA relation, which computes the solar wind radial velocity at the outer boundary of the coronal domain.The WSA results are then set as the initial condition for the HUX model, which extrapolates the solar wind radial velocity into the heliospheric domain.Finally, the model chain solar wind radial velocity predictions are compared with ACE spacecraft in-situ observations.We use synoptic magnetograms from Global Oscillation Network Group (GONG) as the inner boundary condition for the PFSS model.A flowchart of the models and data used in this study is shown in Figure 2. The subsequent sections explain the different components of the model chain and observational data in further detail.

Potential Field Source Surface (PFSS) Model
The PFSS model proposed by [2] and [55] solves for the coronal magnetic field B(r, θ, ϕ) = [B r (r, θ, ϕ), B θ (r, θ, ϕ), B ϕ (r, θ, ϕ)] from the photosphere (the visible surface of the Sun) to the outer radius called the source surface.The PFSS model assumes that beyond the source surface, the magnetic field is purely radial, i.e. open magnetic field lines are carried into interplanetary space by the solar wind.Additionally, the PFSS model neglects the coronal electric current density since, above the photosphere, there is a large decrease in particle density and a smaller decrease in magnetic field strength [20]; it also assumes that the corona is electrostatic since during solar minimum, the corona evolves slowly, and features can last for several Carrington rotations (CRs).It is important to mention that some of the above assumptions hold less during specific time periods.For example, during solar maximum, the photospheric field changes more rapidly, challenging the electrostatic assumption.Additionally, [47] found that the concept of spherical source surface is more reasonable during solar maximum than during solar minimum.These assumptions (coupled with Ampère's law) lead to ∇ × B = 0, so that the magnetic field can be described by its potential B = −∇Ψ.By combining the potential description with Gauss's law (∇ • B = 0), we get Laplace's equation subject to the following boundary conditions where θ ∈ [0, π] is Carrington colatitude, ϕ ∈ [0, 2π] is Carrington longitude, R S denotes solar radii unit of distance which is 695,700km, and approximately 1/215th of an astronomical unit (AU), r ∈ [1R S , r SS ] is the radial distance from the center of the Sun, r SS is the source surface height, and g(θ, ϕ) is a given photospheric synoptic map.The PFSS model is typically solved via spherical harmonic expansion or numerical discretization methods [8,24,60].We employ the pfsspy Python package (version 1.1.2),developed by [60], for solving PFSS via finite-difference discretization and for tracing magnetic field lines.The finite-difference discretization is on a rectilinear grid equally spaced in sin(colatitude), longitude, and ln(radius) coordinates, see [60] for more details on the solver.In this study, all simulations are performed on a 180 × 360 × 100 grid resolution in sin(colatitude), longitude, and ln(radius), respectively, i.e. we solve for 6.48 × 10 6 states.As an illustrative example, Figure 3(a) shows the radial magnetic field at the photosphere (inner boundary) for CR 2053 obtained by the GONG synoptic maps (see Section 2.4.1) and Figure 1(b) shows the radial magnetic field results at the source surface (outer boundary), which is set to r SS = 2.5R S for this example.

Wang-Sheeley-Arge (WSA) Model
The WSA model developed by [4] is a semi-empirical model of the ambient solar wind velocity in the inner-heliosphere, which fuses the Wang-Sheeley (WS) model developed by [64] with the distance to the coronal hole boundary (DCHB) model developed by [46].The WSA model (coupled with the MHD Enlil model) is used in operational forecasting at the National Oceanic and Atmospheric Administration (NOAA) Space Weather Prediction Center [36].The WSA model is given by , where v 0 and v 1 correspond to the minimum and maximum solar wind velocities, d is the minimum angular distance that an open field footpoint lies from a coronal hole boundary, f p is the magnetic field expansion factor, and α, β, γ, δ, w, ψ are additional tunable parameters.The magnetic expansion factor f p is derived from the coronal magnetic field by tracing down field lines from the source surface to the photosphere, namely where B r (r, θ, ϕ) is the radial magnetic field component, and the subscripts p and SS refer to the field line trace at the photosphere and solar surface, respectively.The distance to the coronal hole boundary d is also derived from the coronal magnetic field solution via a two-step approach.First, the coronal hole regions are identified by tracing field lines from the photosphere to the source surface and detecting the footpoint of all open magnetic field lines, i.e. coronal hole regions.Second, the great-circle angular distance d is computed between the footpoints of the open magnetic field lines to the nearest coronal hole boundary.To illustrate these concepts, Figure 3(c) presents the magnetic expansion factor for CR 2053, and the black dashed line shows ACE's spacecraft projected trajectory.Similarly, Figure 3(d) shows the coronal hole map for CR 2053 with ACE's trajectory field line traces, which mainly trace down to low-latitude coronal holes.

Heliospheric Upwind eXtrapolation (HUX) Model
The two-dimensional HUX model developed by [45] extrapolates the coronal solar wind radial velocity into the heliospheric domain.The HUX model is based on simplified physical assumptions of the fluid momentum equation, which reduces to the following nonlinear scalar homogeneous timestationary equation where the independent variables are the radial distance from the Sun r and Carrington longitude ϕ, and the dependent variable is the solar wind radial velocity v(r, ϕ).The angular frequency of the Sun's rotation is evaluated at a constant Carrington colatitude θ [44], which is estimated by and is defined on the longitudinal periodic domain 0 ≤ ϕ ≤ 2π and r ≥ r SS .[45] suggest adding an acceleration boost to the boundary condition (before propagation) to account for the residual acceleration present in the inner heliosphere, i.e.
where v rSS (ϕ) is the radial velocity at the source surface (obtained from WSA relation), α acc is the acceleration factor, and r h is the radial location at which the acceleration ends.We discretize Eq. ( 3) via finite-differencing on a uniform mesh with 600 × 300 resolution in ϕ ∈ [0, 2π] and r ∈ [r SS , r max ], respectively.We set r max to be ACE's maximum radial distance from the Sun for the considered CR.We solve the equation using the first-order upwind scheme, see [16] for more details about the numerical scheme and stability requirements.Figure 3(e) shows the coupling of PFSS, WSA, and HUX solar wind speed predictions in comparison to ACE's in-situ observations for CR 2053, see Section 2.4.2 for more details about ACE.We set θ to be ACE's mean latitude over a CR and obtain the inner boundary velocity profile by computing the magnetic field expansion and distance to the coronal hole at ACE's projected trajectory (which are inputs in the WSA model).The solar wind velocity at ACE's trajectory is obtained by linearly interpolating the two-dimensional HUX solution along ACE's trajectory.

Global Oscillations Network Group (GONG) Synoptic Magnetograms
Deployed in 1995, the GONG synoptic magnetograms are produced every hour at GONG's six ground-based sites with identical telescopes.The six sites in California, Hawaii, Australia, India, Spain, and Chile, are distributed worldwide so that the Sun is visible at nearly all times.The lineof-sight full-disk GONG magnetograms are provided every minute by its main instrument known as Fourier Tachometer [14,12].The magnetic field strength (measured in Gauss) is determined spectroscopically using the Zeeman effect.In the presence of a magnetic field, gas spectral lines split into two or more components, and the frequency of the spectral lines depends on the strength of the magnetic field [31].
In this study, we use the GONG full CR synoptic magnetograms as the photospheric radial magnetic field g(θ, ϕ) boundary condition for the PFSS model, see Eq. ( 1), which are publicly available at National Solar Observatory's website1 .The synoptic maps are calibrated from roughly 8,000-10,000 input full-disk 10-min average magnetograms [14] and are provided at 180 × 360 resolution in Carrington sin(colatitude) and Carrington longitude, respectively.These synoptic maps are obtained over a full CR and reasonably approximate the solar conditions at quiet times of the cycle when the solar evolution is slow.Figure 3(a) above shows the GONG synoptic map for CR 2053.

Advanced Composition Explorer (ACE) in-situ Solar Wind Measurements
The NASA ACE satellite launched in 1997 is in Lissajous orbit around L1 (one of Earth-Sun gravitational equilibrium points), located about 1.5 × 10 6 km forward of Earth.The location of ACE gives about 1-hour advance warning of the arrival of space weather events on Earth.The ACE instruments measure the solar wind, interplanetary magnetic field, and high-energy particles.This study uses the solar wind radial velocity in-situ measurements provided by ACE's Solar Wind Electron Proton Alpha Monitor (SWEPAM) instrument [25].To download the radial velocity data and ACE's trajectory at a 1-hour cadence, we used HelioPy, a community-developed Python package [61], for retrieving space physics datasets from NASA's Space Physics Data Facility website2 .

Model Chain Simulations
The PFSS→WSA→HUX model chain simulations are run on the Alfvén server at the University of Colorado SWx-TREC (Space Weather Technology, Research, and Education Center), which is equipped with 2x AMD EPYC 74F3 24-Core processors (3.2 GHz) and a total 2 Tb of RAM.The model chain takes about 16 seconds to simulate on one CPU.We profiled the model chain computations and found that 98% of the total time is spent solving the PFSS model and computing the distance to the coronal hole and magnetic expansion, 1.8% is spent on solving the HUX model and less than a percent is spent on evaluating the WSA model.The sensitivity analysis results required 3×1.3×10 5 = 3.9×10 5 simulations and the MCMC results required 10 × 250 × 2.6 × 10 4 = 6.5 × 10 7 model simulations, i.e. a total of approximately 10,400 CPU hours.

Global Sensitivity Analysis
Variance-based global sensitivity analysis aims to identify the parameters that contribute the most to a given QoI variability, which can be done quantitatively via computing Sobol' sensitivity indices [59].Parameters with high sensitivity indices are classified as influential, whereas parameters with low sensitivity indices are classified as non-influential.Computing Sobol' sensitivity indices facilitate a posteriori parameter dimensionality reduction in subsequent inverse UQ tasks (such as Bayesian inference).This is established by setting non-influential parameters to their deterministic values and only considering parametric uncertainty stemming from influential parameters.Parametric dimensionality reduction is often necessary for computationally demanding models and unbiased inverse UQ methods.

Uncertain Parameters
The PFSS→WSA→HUX model chain has many parameters that are uncertain, inducing uncertainty in the solar wind velocity forecasts near Earth.All uncertain parameters in the model chain are mainly non-physical.We identified a total of eleven uncertain continuous parameters: one parameter in PFSS (source surface height), eight parameters in WSA (numerical parameters), and two parameters in HUX (acceleration parameters).Table 1 lists the uncertain input parameters and their corresponding prior densities.We set all prior densities to be uniform with reasonable ranges determined in previous parametric studies by [22, §2], [3, §2.5], [27, Eq. 9], [21, Table 1], and [50, Table 1].
The source surface radial height r SS in the PFSS model has intrinsic uncertainties since, in reality, it is non-spherical and is a function of space and time.Lowering the source surface results in more coronal holes, open flux, and strong curvature in the heliospheric current sheet, whereas raising the source surface height results in the opposite effect.[47] suggested avoiding the strict constraint of a spherical source surface by a detailed comparison of PFSS to MHD models, and [20] altered the PFSS model to employ an oblate or prolate elliptical source surface.[3] show that the source surface  [22] found that setting the source surface to 1.8R S matched best the IMF strength during the minimum of solar cycle 23.The optimal source surface heights determined in [3] and [22] do not agree and further emphasize the need for additional numerical investigation.Additionally, [22, Figure 14] and [34, Figure 3] compared the PFSS coronal holes to observed extreme ultraviolet synoptic images, their results suggest 1.5 − 1.8R S for the source surface during CR 2060.Similar to our study, [27] also considers the source surface as an uncertain input parameter and learns its density via particle filtering and WIND spacecraft observations.Based on [22, §2], [3, §2.5] and [27, Eq. 9], we allow the source surface to vary from 1.5R S to 4R S .The eight numerical parameters of the WSA model, v 0 , v 1 , α, β, γ, δ, w, ψ, similar to the source surface, cannot be directly measured and are usually adjusted for different observatories, e.g.Wilcox solar observatories and GONG [50].Additionally, [50] and [21] showed that the optimal parameter can vary greatly from one CR to the next.It is, therefore, important to understand the uncertainties in the WSA parameters and their impact on predicted solar wind speed near Earth.We set the eight parameter ranges based on previous parametric studies by [50, Table 1] and [21, Table 1].
The HUX model has two free parameters α acc and r h in the acceleration boost term, see Eq. ( 4).[45] suggest setting α acc = 0.15 and r h = 50R S .A recent study by [44] compared HUX to threedimensional MHD velocity predictions and found the optimal α acc and r h via nonlinear least-squares optimization for a few CRs spanning from CR 2029 to CR 2231.They found that the average optimal α acc and r h are 0.16 and 52.6R S , respectively.[44] took a frequentist approach to find the optimal HUX parameters.In this study, we formulate the inference problem using the Bayesian approach, which provides a complete picture of parametric uncertainty in the form of a non-parametric posterior density.From this, one can compute any relevant estimates, such as the MAP, mode, etc.We allow the two HUX parameters to vary based on physically reasonable ranges specified in Table 1.

Sobol' Indices
To introduce the notion of global sensitivity indices, let (Ω, F, P) be a probability space with sample space Ω, σ-algebra F, and the probability measure P, where X : Ω → X is a random vector with its entries being independent random variables X i for i = 1, . . .d.We denote with x = X(ω) a sample (realization) of the random vector X for a given event ω ∈ Ω.From the independence assumption, the joint probability density function (pdf) π(x) is the product of the marginals, i.e.
We assume that f is square-integrable with respect to π, such that the expectation (mean) 2 π(x)dx of the QoI are both finite.Definition 1 (Sobol' Indices) The first-order Sobol' sensitivity indices measure the main variance contribution due to the ith random input parameter, such that where Var Xi denotes the variance with respect to only the X i random input parameter, Var without subscript denotes variance involving all parameters, and X ∼i denotes all random input parameters but X i .The second-order Sobol' sensitivity indices measure the secondary variance contribution due to the interaction of the ith and jth parameters (where i ̸ = j), such that The total-order Sobol' sensitivity indices measure the total variance contributions of the ith parameter, such that where H.O.T. refers to higher-order terms.The first, second, and higher-order indices sum up to 1, such that Notice that if the total-order index T i ≈ 0, then E X∼i [Var Xi (f (X)|X ∼i )] ≈ 0, which, by the nonnegativity of the variance operator, implies that Var Xi (f (X)|X ∼i ) ≈ 0. Therefore, if T i ≈ 0, the uncertainty in X i hardly influences the variance of the QoI, and X i can be deemed as non-influential.
Sobol' sensitivity indices can not be computed in closed form except for QoIs that are integrable with respect to π(x) (the joint probability of the uncertain parameters X).Appendix A shows that the sensitivity indices can be computed analytically for the simple Wang-Sheeley model [64].However, the QoI that we consider (like most model QoIs arising from simulations of complex systems) is not integrable with respect to π(x).Thus, we need to approximate the indices numerically.

Estimating Sobol' Indices via Monte Carlo Integration
The first-and total-order Sobol' sensitivity indices described in Eqns.( 5) and ( 6) can be estimated via Monte Carlo (MC) integration, which requires N (d + 2) model evaluations, where N is the number of independent samples of X and d is the number of uncertain parameters.Since each model evaluation is independent of the other, the MC model evaluations can be easily computed in parallel.The fourstep algorithm of [53], which is based on [59] original work, is implemented as follows: 1. Draw 2N quasi-random samples of the random vector X and store them as i denotes the ith entry and jth sample of X. Quasi-MC methods generate near-random samples that aim to distribute well over the parameter space.These sampling strategies usually result in a faster rate of convergence in MC integration.We use Latin hypercube sampling developed by [26].Other common quasi-random low-discrepancy sequences are Sobol' [58] and Halton [11].
2. Define matrices C (i) for i = 1, 2, . . ., d, which are a copy of B except the ith column is replaced by A(:, i), the ith column of A, so that 3. Evaluate the QoI for each row of the matrices A, B, C (i) , denoted as A(j, :), B(j, :), C (i) (j, :), i.e. y (j) B = f (B(j, :)) ∈ R and y (j) )) ∈ R, for j = 1, . . ., N .The evaluation of y A and y B requires 2N model evaluations, whereas the evaluation of y C (i) requires d • N model evaluations, which results in a total of N (d + 2) model evaluations.
4. Estimate using MC integration the first-order S i and total-order T i sensitivity indices for i = 1, . . ., d.We use the unbiased Janon/Monod's estimator [17,32], such that A y (j) .
The algorithm intuition can be explained as follows.The first-order sensitivity estimation is based on the product of y A and y C (i) , which multiplies the QoI with input A and the QoI with input C (i) where all parameters except X i have been re-sampled.Intuitively, if X i is influential then y A and y C (i) are correlated and S i is large.We can intuit the estimation of the total-order indices T i in a similar way.The product of y B and y C (i) multiplies the QoI with input B and the QoI with input C (i) where we only re-sample X i .Thus, if X i is influential then y B and y C (i) are not correlated and T i is large.
There are many other MC Sobol' sensitivity indices estimators, see [39] for a comprehensive comparison.Saltelli's [53] and Jansen's [18] estimators are also commonly used estimators.Such estimators require N (d + 2) model evaluations to compute the first and total-order indices.Computing second-order indices requires additional model evaluation, i.e. a total of N (2d + 2) model evaluations.We thus only compute first and total-order indices, which give a strong indication if a parameter is influential or not.We compared Saltelli's, Jansen's, and Janon/Monod's estimators and found that Janon/Monod's estimator resulted in faster convergence for our study.

Global Sensitivity Analysis Numerical Results
We perform global sensitivity analysis using the PFSS→WSA→HUX model chain for CR 2048 (September 21st, 2006 to October 18th, 2006), CR 2053 (February 4th, 2007 to March 4th, 2007), and CR 2058 (June 21st, 2007 to July 18th, 2007).The three CRs occurred during the declining phase of solar cycle 23.The eleven model input parameters are listed in Table 1 and are described in Section 3.1.We use N = 10 4 Latin Hypercube samples [26] to estimate the Sobol' sensitivity indices via MC integration (Section 3.3), which requires N (d + 2) = 1.3 × 10 5 model evaluations for each CR.We consider two QoIs: the root mean squared error (RMSE) between ACE velocity measurements and the model predictions at L1 and longitude-dependent model predictions at L1 (independent of ACE observations).The results are discussed in the following subsections.

Sobol' Indices for RMSE
The QoI f (X) for which we compute the Sobol' sensitivity indices is the RMSE between the model chain solar wind radial velocity prediction at L1 and ACE at 1-hour cadence observations.Figure 4 shows the total-order indices for CR 2048, CR 2053, and CR 2058.The total-order indices for all three CRs have the same ordering for the five most influential parameters, i.e. β, γ, α, v 1 , w (listed in descending order).The five most influential parameters are all WSA model parameters.The other six parameters r SS , ψ, v 0 , δ, r h , α acc are deemed as non-influential since their total-order indices are less than 0.05.We also estimate the uncertainty in the estimated total-order indices by using bootstrapping with N = 3 × 10 3 samples and 100 replications.The box plots for each index are shown in Figure 4.The uncertainty in the estimated total-order indices does not influence the classification between influential and non-influential parameters.The total-order indices show that six out of the eleven uncertain parameters are non-influential, and subsequently, they hardly contribute to the predicted solar wind radial velocity variability at L1. We, therefore, set the six non-influential parameters to their fixed deterministic values (see Table 1) in the subsequent Bayesian inference, which facilitates a posteriori dimensionality reduction and significant computational speed up for performing MCMC.
Figure 5(a) shows an ensemble of the global sensitivity analysis model evaluations for CR 2053.We plot the median and 50% and 95% credible interval of the 1.3 × 10 5 model evaluations used to compute the sensitivity indices (which were constructed via prior density samples).A credible interval is an interval within which the ensemble members fall with a particular probability.The 95% credible interval shown in Figure 5(a) spans an excessively large range and includes non-physical solutions (for example, solar wind radial velocity at 1800 km s ).This is because the uncertainties in the model chain parameters highly influence the solar wind velocity predictions at L1.We aim to reduce such large parametric uncertainties via Bayesian inference; see Section 4. Figure 5(b) and Figure 5(c) show histograms of the RMSE and Pearson correlation coefficient (PCC) between the global sensitivity analysis simulations in comparison to ACE observations for CR 2053.The histograms show that the RMSE mean is 217.2 km s and the PCC mean is 0.5.Also, the RMSE MAP is 101.5 km s and the PCC MAP is 0.71.By reducing the uncertainty in the model parameters, we expect the ensemble to be more accurate [57, §8.1].

Longitude-Dependent Sobol' Indices
We define longitude-dependent (or time-dependent) QoI, which is the solar wind radial velocity predictions at L1 at a 1-hour cadence.In contrast to the RMSE indices, defined in Subsection 3.4.1, the longitude-dependent QoI is independent of ACE observations.Figure 6 presents the seven largest first-order S i and total-order T i indices as a function of Carrington longitude for CR 2048, CR 2053, and CR 2058.We do not plot the first-and total-order indices of ψ, δ, α acc , r h since their maximum indices (in longitude) are less than 0.05.The five most influential parameters during all three CRs are β, γ, α, v 1 , w, which agree with the RMSE indices results, see Figure 4.The first-order indices are significantly smaller than the total-order indices, which indicates that higher-order interactions between parameters influence the predicted solar wind velocity variability.Additionally, the influence of w, the width over which the solar wind ramps up from low-to high-speed flow at coronal-hole  boundaries, is mainly around a small longitudinal region.For example, during CR 2053, w is only influential from approximately 190 • to 280 • .We suspect this is because ACE's footpoints lie closer to the center of a low-latitude coronal hole at approximately 250 • to 310 • , see Figure 3(d).The large distance to coronal hole boundary d corresponds to high solar wind speed in this region, which is then advected to 190 • to 280 • at L1 (at 1  v speed, see Eq. ( 3)).Thus, w seems to be influential only in regions where d, the distance to the coronal hole boundary, in the WSA relation is large.Similarly, the influence of α seems to be correlated to f p , e.g. when f p is low at approximately 220 • to 300 • in longitude during CR 2053, see Figure 3(c), we see a decrease in the total-order index of α, see Figure 6(b).

Bayesian Inference via Markov Chain Monte Carlo Sampling
After identifying the set of influential parameters via variance-based global sensitivity analysis, our goal is to learn the uncertainties of such influential parameters, which we achieve through Bayesian inference.Bayesian parameter estimation leverages Bayes' theorem to learn the pdf of uncertain model parameters given observational data.Samples from such pdfs can be directly obtained using Markov chain Monte Carlo (MCMC) algorithms.These samples are then used to generate an ensemble prediction to quantify and reduce the effect of the parametric uncertainty on the QoI.The following subsections introduce Bayesian inference and MCMC sampling.

Bayesian Parameter Estimation
The philosophy behind Bayesian statistics is that the model parameters are random variables with an unknown pdf.This differs from the frequentist perspective, where the parameters are assumed deterministic but unknown.In the Bayesian setting, we seek to estimate the pdf of model influential parameters X given a parameter-dependent QoI f (•) (e.g.solar wind radial velocity at L1) and measurements of the QoI z = {z 1 , z 2 , . . ., z n } (e.g.ACE radial velocity measurements) taken at time instances t 1 < t 2 < . . .< t n (e.g., at a 1-hour cadence).In other words, we aim to estimate the conditional pdf π(x|z), which is referred to as the posterior density or simply posterior.The posterior density can be evaluated via Bayes' rule: where π(x) is the prior, π(z|x) is the likelihood, and π(z) is the evidence (also referred to as the marginal likelihood or normalizing constant).The parameters x are samples of the random variable X, and the observations z i are samples of the random variable Z i .Most often, the evidence can not be properly defined, so we estimate the posterior up to a normalizing constant.The posterior density can be continuously refined as more measurements are included.
The prior density π(x) is chosen based on physically meaningful ranges and previous studies in the literature; see Table 1 for the list of the uniform prior densities used in this study.Generally, the priors are not restricted to uniform densities and may weigh favorable values more heavily.However, if prior knowledge is of questionable accuracy, it is better to use non-informative priors [57, §8.1].
We assume that the QoI of the model and measurements are related via where Z i is a random variable representing the measurements at time instance t i , f (X; t i ) is the QoI at time instance t i and ϵ i is a random variable representing the discrepancies between the QoI and measurements.Here, we model ϵ i as a Gaussian random variable with zero mean and standard deviation σ ∈ R + .We note that the model discrepancies and measurement noise are modeled as additive and mutually independent of X.Thus, by the assumption of Gaussian additive error and independence of measurements, we can write the likelihood as Although we derived an expression for the prior and likelihood densities, we can not directly sample from the posterior density since the evidence (or normalizing constant) remains unknown.To overcome this issue, MCMC algorithms enable sampling from arbitrary pdfs and allow for the unbiased estimation of the posterior density, mean, and variance.

Markov Chain Monte Carlo Sampling
MCMC algorithms generate samples from an arbitrary target pdf (such as posterior pdf) by generating a random walk in the parameter space that draws a representative set of samples from the target pdf.The random walk is a Markov chain, with the property that each sample only depends on the position of the previous sample.MCMC algorithms converge to the exact target pdf as the number of samples increases.This convergence property is established by the ergodicity property of MCMC, which requires the Markov chain to be aperiodic, irreducible, and reversible with respect to the target pdf [52].
The first and most frequently used MCMC algorithm is the Metropolis-Hastings algorithm [28,13] developed at Los Alamos National Laboratory.The Metropolis-Hastings algorithm generates samples from an arbitrary pdf iteratively.The samples are drawn from a proposal density which is chosen a priori and depends on the position of the previous sample of the Markov chain.A proposed sample is then accepted or rejected with some probability.If accepted, the proposed sample is appended to the Markov chain and used to generate the next sample; otherwise, if rejected, the proposed sample is discarded, and the previous sample is appended to the Markov chain.A common choice of proposal density is the Gaussian distribution centered at the previous sample location.
We employ the affine invariant ensemble sampler (AIES) developed by [10], which we describe in detail in Appendix B. The AIES offers several advantages over the Metropolis-Hastings algorithm.Firstly, AIES has only two hyperparameters, while Metropolis-Hastings has approximately d 2 hyperparameters, where d represents the number of uncertain parameters.Secondly, AIES remains invariant to affine transformations, enabling easy sampling from anisotropic pdfs.Finally, AIES can be run in parallel, leading to considerably faster convergence (measured by the integrated autocorrelation time, see Appendix B) compared to the single-chain Metropolis-Hastings algorithm.We also elaborate in Appendix B on MCMC burn-in and convergence assessment which are important heuristics that verify the Markov chain has reached a stationary distribution.

Markov Chain Monte Carlo Numerical Results
We use the AIES sampler (described in Appendix B) to uncover the posterior density of the five most influential parameters β, γ, α, v 1 , w, for CR 2048 to CR 2058 (spanning from September 21st, 2006 to July 18th, 2007).We performed the computationally expensive global sensitivity analysis for CR 2048, CR 2053, and CR 2058, and assume the results hold for the whole time period between CR 2048 to CR 2058.We exclude CR 2051 since, during this time period, three CMEs reached L1, and the PFSS→WSA→HUX model chain does not account for transient events such as CMEs.We assume in Eq. ( 7) that the model chain solar wind radial velocity predictions at L1 and ACE measurements (at 1-hour cadence) are related via Gaussian error with mean zero and standard deviation σ = 80 km s .The standard deviation is chosen from previous parametric studies by [41, Table 1] and [21, Figure 8].The posteriors are approximated with 2.6 × 10 4 MCMC iterations, 10 3 iterations excluded for burn-in, and L = 250 walkers, resulting in M = 6.25 × 10 6 MCMC posterior samples per CR.

Markov Chain Monte Carlo Posterior Densities
The posterior densities for CR 2052 and CR 2053 in one-and two-dimensional projected parameter space are shown in Figure 7.It is apparent that the marginal posterior of v 1 is uniformly distributed (resembling the prior density in Table 1), meaning that v 1 is highly uncertain and can take any value in the prior range with equal probability.This means that the likelihood function is flat in the v 1 direction, i.e. v 1 may not be identifiable from ACE observations.The corner plot shows that β and v 1 are negatively correlated.Note that the marginal posteriors of parameters α and β have little to no support overlap in CR 2052 and CR 2053.This suggests that such parameters are difficult to predict in advance.
The marginal posterior densities for CR 2048 to CR 2058 (excluding CR 2051) are shown in Figure 8, which indicates that the posterior densities evolve from one CR to the next in a nonpredictable fashion.For example, the marginalized posterior of α has relatively small support that varies randomly from one CR to the next.We also notice that the MAP, shown in dashed vertical lines, changes greatly from one CR to the next, which agrees with the previous parametric studies by [21] and [50].Since the posterior densities vary greatly from one CR to the next, it is not possible to use the posterior samples from a given CR to create an accurate ensemble prediction of the next CR (in contrast to the adaptive-WSA method proposed by [41]).If the model chain is used for reanalysis studies, we recommend using the proposed UQ framework to generate accurate ensembles.The ensembles generated from the MCMC posterior samples will be highly accurate as the parameter posteriors are learned using observational data at L1 and will also be able to successfully capture the parametric uncertainty on the predictions.The next subsection discusses our ensemble prediction numerical results.

Ensemble Prediction
We generate an ensemble prediction based on varying model parameters using MCMC posterior samples.The ensemble members are then used to compute ensemble statistics, such as the ensemble median and prediction interval.The prediction interval accounts for both the propagated parametric where Z i is independent of the data used to construct the random variables Z l and Z u [57, §9.4.1].
We estimate the interval [Z l , Z u ] via computing the α/2th and 1 − α/2th quantiles of the set of ensemble members with added Gaussian model discrepancy errors.In general, one should be cautious when assessing the ensemble prediction via quantiles when the posterior predictive density, defined as π(Z i |Z), is multimodal.We have checked that the computed posterior predictive density is indeed unimodal for CR 2052 and CR 2053 (not shown here).Figure 9(a) and Figure 10(a) show the median and 50% and 95% prediction interval of the 5 × 10 3 ensemble members during CR 2053 and CR 2052, respectively.The ensemble is generated using posterior samples trained separately on each CR.  ), it is apparent that the ensemble generated from the posterior density is able to substantially reduce the parametric uncertainty and improve the accuracy of the ensemble prediction.Specifically, the mean RMSE is reduced from 217.2 km s to 55.7 km s , and the mean PCC is increased from 0.5 to 0.88.The results show that the proposed UQ framework is able to successfully reduce the uncertainty of the model parameters on the solar wind radial velocity prediction and should be utilized in further re-analysis studies.
Figure 9(a) and Figure 10(a) also show the model prediction with all parameters set to their deterministic values listed in Table 1, which we label as 'deterministic estimate'.The figures show that in both time periods, CR2052 and CR2053, the ensemble prediction after UQ is substantially more accurate than the deterministic estimate before UQ.In particular, the deterministic estimate underestimated the solar wind velocity in both time periods.For a more quantitative comparison, Table 2 lists the RMSE and PCC of ACE observations in comparison to the ensemble mean and deterministic estimate during CR2052 and CR2053.The numerical results show that the ensemble mean RMSE is nearly 50% smaller than the deterministic estimate RMSE for both time periods.Thus, the proposed UQ framework is also able to substantially improve the accuracy of the model's solar wind radial velocity predictions at L1.

Conclusions and Discussion
The PFSS→WSA→HUX model chain is commonly used to predict the ambient solar wind radial velocity near Earth.The model chain has eleven uncertain input parameters that can not be directly measured since they are mainly non-physical.We, therefore, propose a comprehensive UQ framework for quantifying and reducing the parametric uncertainty of the model chain.The proposed framework utilizes variance-based global sensitivity analysis to reduce the dimensionality of the parameter space, followed by Bayesian inference to learn the full parameter pdfs via MCMC.We apply the UQ framework on a time period spanning from CR 2048 to CR 2058 during the declining phase of solar cycle 23.The sensitivity analysis results show that β, γ, α, v 1 , w are the five most influential parameters in the model chain.These parameters are all WSA parameters.We learned the posterior densities of the five  1), which we label as 'deterministic estimate', and ACE observations.most influential parameters using AIES (an MCMC sampler).The posterior samples are then used to generate an ensemble prediction and quantify the parametric uncertainty in the predicted solar wind velocity.We found that the ensemble results are able to accurately quantify the uncertainty in the predictions and thus suggest the proposed UQ framework should be utilized in further re-analysis studies employing the model chain.
The Bayesian inference numerical results also show that the posterior densities vary randomly from one CR to the next.This is mainly due to the following reasons: (1) the model chain is not robust to the choice of WSA numerical parameters, and (2) the WSA model is overparameterized (i.e.needs to be reformulated for forecasting purposes).The reformulation of the WSA model will involve searching for a parsimonious model that is robust to its choice of parameters.Candidates of models that balance the trade-off between accuracy and parsimony can be found using sparse regression techniques with different regularization [7].The optimal model can later be selected via the Bayesian or Akaike information criterion [56,1].The substantial and unpredictable change in the posterior densities from one CR to the next questions the applicability of the model chain in operational real-time forecasting.
We suspect the drastic changes in the posterior densities are also due to the parameters trying to overcompensate the intrinsic and "hard-wired" limitations of each of the models (i.e.biases due to epistemic uncertainties).We next discuss such limitations.First, we do not have an accurate estimate of the photospheric fields [38].There are differences between magnetograms from different observatories.There are also different saturation levels and noise [49].Second, the PFSS solutions rely on the existence of a spherical source surface, which does not exist [47].The sensitivity analysis results show that the choice of the source surface height is non-influential on the predicted solar wind velocity at L1, yet in the analysis we assume it exists.Also, the fields are not potential, particularly around active regions.Third, the WSA model has known inaccuracies, e.g. the expansion factor in the vicinity of pseudostreamers [50], as well as unknown inaccuracies.Fourth, the HUX model assumes only radial propagation and neglects external forces and the pressure gradient [45].Fifth, time dependence is not included in synoptic maps and all throughout the model chain.Thus, the physics simplifications in the model chain introduce model discrepancies between the spacecraft observations and model predictions.We assume such discrepancies are Gaussian distributed in the Bayesian inference setup.This is generally a reasonable assumption, which is necessary in order to formulate the likelihood in the Bayesian setting, yet it is important to point out that the model chain discrepancies are structured and are not i.i.d.
Future studies can incorporate the proposed UQ framework for learning the posterior densities of uncertain parameters for various (and more complex) space weather models, for example, the WSA-Enlil model [36].It will be interesting to apply the proposed UQ framework to WSA-Enlil to make sure the WSA posteriors do not change drastically in time and verify that the WSA-Enlil model is reliable for real-time forecasting.Depending on the computational resources and computational complexity of the model at hand, one might need to incorporate surrogate models (computationally efficient approximate models), such as projection-based reduced-order models [6,15] and interpolatory surrogates [66], to compute Sobol' sensitivity indices and run MCMC.If the model is computationally efficient (i.e.order of seconds/minutes) we recommend using the MC methods presented in this study as they are unbiased estimators.Other unbiased estimators include multi-fidelity estimators, see [37] for a detailed survey.where v 0 and v 1 correspond to the minimum and maximum solar wind velocities, f p is the magnetic field expansion factor, and α is an additional numerical parameter.The Sobol' sensitivity indices described in Eqns.( 5)-( 6) for the WS model parameters v 0 , v 1 , α can be computed analytically (symbolically) if we assume the model parameters are independent and have uniform priors.In contrast, for the PFSS, WSA, and HUX models, the sensitivity indices can only be approximated numerically via MC integration, see Section 3.3.We set the priors to be uniform with ranges listed in Table 1.Table A.3 shows the Sobol' sensitivity indices of the three model parameters v 0 , v 1 , α for f p = 10, 10 2 , 10 4 .Larger f p corresponds to slower solar wind velocity, in which case v 0 becomes more influential, and v 1 becomes less influential.The second-order indices show that v 0 and v 1 do not interact and that α's interaction with v 0 and v 1 is minor compared to the first-order indices.By the first-and total-order indices of α, we can conclude that it is the most influential parameter independent of f p (in comparison to v 0 and v 1 ) which agrees with the ordering in the WSA model, see Section 3.4.

Appendix B. Affine Invariant Ensemble Sampler (AIES) and Convergence Assessment
In this study, we use the affine invariant ensemble sampler (AIES) developed by [10], which is an adaptive ensemble extension of the original Metropolis-Hastings sampler [28,13].Instead of evolving a single Markov chain, AIES evolves an ensemble of chains in parameter space called walkers.It is computationally advantageous to evolve an ensemble of chains instead of a single Markov chain since one can exploit the parallelism of the ensemble method and reach convergence significantly faster (as measured by the integrated autocorrelation time, see Eq. (B.2)).AIES is invariant under an affine transformation of the parameter space, which is particularly appealing for problems where the parameter scales vary by several orders of magnitude, i.e. highly anisotropic target pdfs.AIES can transform anisotropic pdfs to isotropic pdfs with an affine transformation, which is much easier to sample from.Additionally, other MCMC samplers typically require tuning many sampler hyperparameters; for example, Metropolis-Hastings has d 2 hyperparameters where d is the number of uncertain parameters (entries of the Gaussian proposal distribution covariance).Such tuning is often infeasible when the posterior evaluations are computationally demanding, as is the case in many space weather applications.AIES addresses this challenge by having only two hyperparameters in the stretch move [10].One hyperparameter in AIES is the number of walkers L, which is required to be greater than double the number of uncertain parameters L ≥ 2d + 1, and the other hyperparameter denoted by a is related to the stretch move, which we explain next.
The AIES stretch move is described as follows.Consider an ensemble of walkers {Υ 1 (ℓ), . . ., Υ L (ℓ)}, where ℓ = 1, . . ., M is the iteration index and L is the number of walkers.The proposed next step for an arbitrary walker Υ k (ℓ) is given by Υ k (ℓ + 1) = Υ j (ℓ) + S (Υ k (ℓ) − Υ j (ℓ)) where Υ j (ℓ) is a complementary walker in the ensemble chosen at random (where j ̸ = k), S is a random variable with density g(s) that satisfies g 1 s = sg (s).An example of such a density, proposed by [10] and implemented in the emcee Python package [9], is where a > 1 can be adjusted to improve the sampler's performance and is typically set to a = 2. Thus, the proposed next step for a given walker lies on a straight line connecting the walker's current location and another random walker in the ensemble.The acceptance probability of the next proposed step is where S is the random variable with density defined in Eq. (B.1), d is the number of uncertain parameters, and π is the target pdf.If the proposal is rejected, then Υ k (ℓ + 1) = Υ k (ℓ).
In this study, we use the Python implementation of AIES, i.e. the emcee package (version 3.1.4)developed by [9], with the stretch move, a = 2, and L = 250 walkers.We initialize the walkers by randomly sampling a Gaussian density with the mean set to the prior mean and standard deviation set to 10 −2 times the prior range.
Markov Chain Monte Carlo Burn-in.MCMC burn-in refers to the period when a Markov chain exhibits initial transient behavior unrepresentative of the target pdf.It is therefore recommended to disregard the first few iterations at the beginning of the Markov chain [57, §8.4].Burn-in is typically an artifact of selecting a low-probability initial condition and can also be thought of as a way to find a better initial condition.The burn-in length can be chosen by detecting the iteration where the target pdf evaluations start to plateau, which can be assessed visually (or statistically) by monitoring the likelihood evaluations and the marginal paths associated with each parameter as a function of MCMC iterations.We found that after 10 3 iterations, the likelihood evaluations began to plateau, meaning the Markov chains entered a region of high probability.We, therefore, disregard the first 10 3 samples in each walker, which we consider as the burn-in period.
Markov Chain Monte Carlo Convergence Assessment.Estimating the mean of a Markov chain (or an ensemble of Markov chains) is challenging since its samples are not independent and identically distributed (i.i.d.).This is because-by definition-each sample depends on the previous sample in a Markov chain.Therefore, samples drawn close to each other tend to be correlated.The MC mean estimator of an ensemble of Markov chains with L walkers and M iterations is an unbiased estimator, i.e.where τ is the integrated autocorrelation time (IAT) and C(ℓ) = lim h→∞ Cov[Υ(ℓ + h), Υ(h)] is the lag-ℓ autocovariance function.In practice, the IAT and the autocovariance function are estimated using a finite Markov chain of length M , see [9] for a more detailed discussion.The larger the IAT, the more samples are needed to converge to the target pdf.
In this study, we run the chains until their length M is at least 50 times the maximum IAT (which is computed for each parameter) as suggested by [9] and compute the estimated IAT using the Python emcee package [9].

Open Research
The public repository https://github.com/opaliss/Parameter_Estimation_Solar_Windcontains a collection of Jupyter notebooks in Python 3.9 containing the code and data used in this study.

Figure 2 :
Figure 2: A flowchart of the models (blue panels: PFSS, WSA, and HUX) and observational data (green panels: GONG and ACE) utilized in this study.The GONG and ACE images are adapted from NSO and NASA, respectively.

Figure 3 :
Figure 3: An illustration of different components in PFSS→WSA→HUX model chain during CR 2053.The PFSS radial magnetic field at the lower and upper boundaries is shown in (a) and (b).The lower boundary condition is obtained from GONG synoptic maps and the extrapolated upper boundary is shown at the source surface, which is set to r SS = 2.5R S in this example.The WSA inputs, i.e. the magnetic field expansion factor and coronal hole map, are shown in (c) and (d).In subfigure (d), the red (blue) coronal hole areas show the negative inward (positive outward) fields and the black dashed lines show the magnetic field lines traced from ACE's projected trajectory at the source surface back to the photosphere.Lastly, (e) shows a comparison of the model chain solar wind radial velocity predictions at L1 with ACE in-situ observations.In this example, we set the WSA parameters to v 0 = 250 km s , v 1 = 945 km s , α = 0.16, β = 1, γ = 0.6, w = 0.02rad, δ = 1.75, ψ = 3 and the HUX parameters to αacc = 0.15, r h = 50R S .

Figure 4 :
Figure 4: The total-order indices T i of the RMSE between the model chain and ACE observations are shown for (a) CR 2048, (b) CR 2053, and (c) CR 2058.The box plot for each index shows the uncertainty in the index estimate using bootstrapping with N = 3 × 10 3 samples and 100 replications.The box plots display the range between the first and third quartiles, with a middle line indicating the median.The whiskers represent the span from the minimum to maximum estimates.The results show that r SS , ψ, δ, v 0 , r h , αacc are non-influential as their total-order indices are lower than 0.05 (shown in dashed black horizontal line).

Figure 5 :
Figure 5: (a) Ensemble generated from prior samples of the global sensitivity analysis model evaluations for CR 2053.The credible interval shows that parametric uncertainty in the model chain results in very high uncertainty in the solar wind radial velocity predictions at L1.A histogram of the ensemble RMSE and PCC are shown in (b) and (c), respectively.

Figure 6 :
Figure 6: Longitude-dependent (top) first-order S i and (bottom) total-order indices T i for (a) CR 2048, (b) CR 2053, and (c) CR 2058.We do not plot the indices of ψ, δ, αacc, r h since their maximum first-order and total-order indices (in longitude) are less than 0.05.The parameters β, γ, α, v 1 are the most influential across all longitudinal locations, whereas w seems to be more longitudinal (or time) dependent.For example, during CR 2048, w is only influential from approximately 280 • to 360 • in longitude.

Figure 7 :
Figure 7: A corner plot of the posterior density of the five most influential parameters v 1 , α, β, w, γ during CR 2052 (blue) and CR 2053 (red).The corner plot shows the MCMC samples in two-dimensional and one-dimensional projected parameter space.The dashed line shows the estimated MAP of each parameter.

Figure 8 :
Figure 8: The marginal posterior densities of the five most influential parameters v 1 , α, β, w, γ from CR 2048 to CR 2058 (excluding CR 2051).The solid line shows the estimated MAP of each parameter.The marginal posteriors change substantially from one CR to the next.

Figure 9 (
b) and Figure 10(b) show a histogram of the RMSE of the ensemble members for each CR.

Figure 9 (
c) and Figure 10(c) show a histogram of the PCC of the ensemble members for each CR.By comparing Figures 5(b/c) to Figures 9(b/c

Figure 9 :
Figure 9: (a) Ensemble prediction with 5 × 10 3 ensemble members generated from posterior samples for CR 2053.The figure shows the ensemble statistics (median and prediction intervals), a model chain prediction when all parameters are set to their deterministic values (from Table1), which we label as 'deterministic estimate', and ACE observations.Figures (b) and (c) show the RMSE and PCC of the ensemble in comparison to ACE observations, respectively.
Figure 9: (a) Ensemble prediction with 5 × 10 3 ensemble members generated from posterior samples for CR 2053.The figure shows the ensemble statistics (median and prediction intervals), a model chain prediction when all parameters are set to their deterministic values (from Table1), which we label as 'deterministic estimate', and ACE observations.Figures (b) and (c) show the RMSE and PCC of the ensemble in comparison to ACE observations, respectively.
[2] a "breathing" effect of which the canonical 2.5R S source surface, originally suggested by[2], matches measured interplanetary magnetic field (IMF) open flux near Earth during solar maximum, yet extends up to 4R S during solar minimum of solar cycle 23 and the start of cycle 24.A similar study by

Table 2 :
The RMSE and PCC of ACE observations in comparison to the model chain prediction with all parameters set to their deterministic values before UQ (see Table1) vs. the ensemble mean after UQ during CR2052 and CR2053.− α) × 100% prediction interval for a fixed but unknown new observation Z i at time instance t i is the interval [Z l , Z u ] such that

Table A .
3: The analytically computed Sobol' sensitivity indices of the WS model for fp = 10, 10 2 , 10 4 .The results show that α is the most influential parameter (in comparison to v 0 and v 1 ).Additionally, the indices indicate that v 0 is more influential when fp is high and that v 1 is more influential when fp is low.