Community-wide validation of geospace model ground magnetic field perturbation predictions to support model transition to operations

Authors


Corresponding author: A. Pulkkinen, The Catholic University of America, Physics Department, 620 Michigan Ave, NE, Washington, DC 20064, USA. (antti.a.pulkkinen@nasa.gov)

Abstract

[1] In this paper we continue the community-wide rigorous modern space weather model validation efforts carried out within GEM, CEDAR and SHINE programs. In this particular effort, in coordination among the Community Coordinated Modeling Center (CCMC), NOAA Space Weather Prediction Center (SWPC), modelers, and science community, we focus on studying the models' capability to reproduce observed ground magnetic field fluctuations, which are closely related to geomagnetically induced current phenomenon. One of the primary motivations of the work is to support NOAA SWPC in their selection of the next numerical model that will be transitioned into operations. Six geomagnetic events and 12 geomagnetic observatories were selected for validation. While modeled and observed magnetic field time series are available for all 12 stations, the primary metrics analysis is based on six stations that were selected to represent the high-latitude and mid-latitude locations. Events-based analysis and the corresponding contingency tables were built for each event and each station. The elements in the contingency table were then used to calculate Probability of Detection (POD), Probability of False Detection (POFD) and Heidke Skill Score (HSS) for rigorous quantification of the models' performance. In this paper the summary results of the metrics analyses are reported in terms of POD, POFD and HSS. More detailed analyses can be carried out using the event by event contingency tables provided as an online appendix. An online interface built at CCMC and described in the supporting information is also available for more detailed time series analyses.

1 Introduction

[2] The geomagnetically induced current (GIC) problem [e.g., Boteler et al., 1998; Pirjola, 2005] has received elevated international interest over the past 3–4 years, especially in terms of the potential impact on high-voltage power transmission systems. The current worst-case scenarios range from wide-scale voltage and system collapses [North American Electric Reliability Corporation, 2012] to catastrophic loss of a large number of high-voltage power transformers [National Research Council, 2008]. While better quantification of the hazard will require additional interdisciplinary science and power engineering investigations, it is commonly accepted that the problem is serious enough that actions need to be taken for mitigating the impact. Consequently, the space weather modeling and forecasting community is responding to this elevated need by supporting the operational utilization of the latest advancements in science. More specifically, the community needs to work on new regional or even local predictions of the geomagnetic environment pertaining to GIC. Initial steps toward this goals have been taken both on the empirical and first-principles based modeling fronts [e.g., Weigel et al., 2003; Wintoft, 2005; Weimer et al., 2010; Pulkkinen et al., 2009; Zhang et al., 2012]. The next logical step is to transition the new scientific capability into an operational setting, which is the primary motivation of the work at hand.

[3] Understanding model capabilities to reproduce observed features in the signal of interest is a key element of space weather monitoring and forecasting. Quantification of the model performance becomes critical as one moves from the research to operational environment where inaccurate model estimates and misleading error bars can potentially lead to poor and costly decisions by the end user. Consequently, detailed model validation tests for model robustness and general quality checks (e.g., model response to bad input data) are a central part of model transition to operations and of general interest to operational space weather forecasting entities.

[4] Acknowledging the importance of rigorous model validation and building on the earlier reports on the topic [Pulkkinen et al., 2010, 2011; Rastätter et al., 2011], as well as the excellent work on geospace model validation conducted under the auspices of the Geospace Environment Modeling (GEM) Metrics and Validation Focus Group, NOAA's Space Weather Prediction Center (SWPC) requested the Community Coordinated Modeling Center (CCMC) to evaluate geospace models available at the CCMC for possible transition to operations. This effort included the participation of model developers, as well as the CCMC, SWPC, and through GEM, the broader scientific community. Planning and discussions with modelers and the scientific community were held at GEM-sponsored meetings, the annual Space Weather Workshop in Boulder, and at meetings of the American Geophysical Union. One benefit of building on previous work done by the GEM Geospace Environment Modeling Challenge is that, over time, we will be able to track model improvements, as new and improved versions of existing models or new models are delivered to the CCMC.

[5] The definition of the validation setting, selection of metrics, and the general validation process were discussed comprehensively and agreed as the work progressed over the past approximate 2 years. All intermediate results of the analyses carried out by CCMC were communicated to the community and modelers, and it was made certain that the model installations and tools at CCMC were acceptable to all participating groups. Generally, the validation process was made as transparent as possible including early communication of NOAA SWPC criteria for selecting models entering the transition process.

[6] In contrast to earlier GEM efforts on the topic, the focus of the latest model validation effort was to study the models' capability to reproduce the observed “dB/dt events,” i.e., rapid fluctuation of the ground magnetic field. The primary argumentation for studying dB/dt is that the time derivative of the ground magnetic field (referred to as “dB/dt”) can be used as an indicator for the level of geomagnetically induced electric field or geoelectric field, on the surface of the Earth [e.g., Viljanen et al., 2001]. The geoelectric field, in turn, is the primary physical quantity driving GIC. Consequently, although numerous additional complexities such as ground conductivity, conductor system configuration, and other engineering details including high-voltage power transformer design are critical for more detailed assessment of the threat, dB/dt can be used as an indicator for a potential GIC hazard. Further, if data from an upstream monitor such as NASA's Advanced Composition Explorer (ACE) is used to produce dB/dt, one can generate short lead time (15–30 min) forecast estimates of the potential hazard.

[7] The structure of the paper is as follows. In section 2 of the paper, we will describe the setting used in the validation effort. Section 3 details the metrics used in the quantification of the model performance, and in section 4, each participating model is summarized. The main results of the validation effort are reported in section 5. Finally, section 6 provides a brief discussion of our findings.

2 Validation Setting

[8] Six geospace storm events listed in Table 1 were chosen for the study. We note that although the number of events may seem small, the amount of effort required for analysis of individual events including verification of good quality simulations and processing of observational data did not allow larger sample size. Four of the events (events 1–4 in Table 1) were used in the earlier GEM Challenges [Pulkkinen et al., 2010, 2011; Rastätter et al., 2011] and two new “surprise events” not communicated to the modelers prior to the model and model setup delivery to CCMC were added to the list. The two new events were selected jointly by CCMC and NOAA SWPC scientists. Solar wind bulk plasma and the interplanetary magnetic field observations carried out by Solar Wind Electron, Proton, and Alpha Monitor (SWEPAM) and MAG instruments onboard Advanced Composition Explorer (ACE) for the events are shown in Figure 1. Level 2 ACE data were used in the analyses. Due to limitations of the SWEPAM instrument during the October 2003 event (event 1), only low temporal resolution plasma velocity data could be constructed [Skoug et al., 2004]. Further, the plasma density data for the event were obtained from the Geotail Plasma Wave Instrument. Earlier, GEM Challenge events 1 and 2 are well-known coronal mass ejection-related major storm events, and events 3 and 4 are less active periods associated with much more subtle changes in the solar wind driving. Events 1–4 are from the solar cycle 23. The new surprise event 5 is one of the first CME-related events of the cycle 24 and was of special interest due to the very large substorm event that was associated with the storm. Event 6 in turn was the first severe storm of the cycle 24.

Table 1. Geospace Events Studied in the Validation Activitya
Event #Date and TimeMin (Dst)Max (Kp)
  1. aThe last two columns give the minimum Dst index and the maximum Kp index of the event, respectively.
129 October 2003 06:00 UT–30 October 06:00 UT−353 nT9
214 December 2006 12:00 UT–16 December 00:00 UT−139 nT8
331 August 2001 00:00 UT–1 September 00:00 UT−40 nT4
431 August 2005 10:00 UT–1 September 12:00 UT−131 nT7
55 April 2010 00:00 UT–6 April 00:00 UT−73 nT8−
65 August 2011 09:00 UT–6 Aug 09:00 UT−113 nT8−
Figure 1.

Solar wind bulk plasma and the interplanetary magnetic field observations (in each panel from top to bottom: plasma density, plasma temperature, x-component of the plasma flow velocity, y-component of the plasma flow velocity, z-component of the plasma flow velocity, x-component of the magnetic field, y-component of the magnetic field, z-component of the magnetic field) for the studied storm events (Figures 1a–1f corresponding to events 1–6, respectively) given in Table 1. Data in Geocentric Solar Magnetospheric coordinates. See the text for details.

[9] Solar wind observations were propagated to model inflow boundaries by ballistic propagation and the x-component (Geocentric Solar Magnetospheric coordinate system) of the interplanetary magnetic field was set to zero. While solar wind propagation constitutes a source for modeling errors, identical uncertainties were introduced for all models in the specification of the inflow boundaries.

[10] For each event in Table 1, the model performance was evaluated by comparing the observed versus predicted ground dB/dt. Throughout the paper, dB/dt is defined as follows:

display math(1)

where Bx and By indicate the two horizontal components of the magnetic field (geomagnetic dipole coordinates). 60 s geomagnetic observatory recordings were used to provide the observed signal. It is noted that while optimally higher temporal resolution data such as 1 s (not available for the selected stations and events) should be used, mapping from dB/dt into the geoelectric field driving GIC is typically a low-pass filtering operation. Consequently, 60 s sampling rate of the ground magnetic field is able to capture mostly the same features of the surface geoelectric field variations as that of the 1 s sampling rate [Pulkkinen et al., 2006]. Again, following the earlier GEM Challenges, 12 geomagnetic observatories (magnetometer stations) listed in Table 2 and shown in Figure 2 were selected based on the global spatial and temporal coverage. Station PBQ was discontinued November 2007 and replaced by station SNK. Consequently, for events 5 and 6, station SNK was used in place of PBQ. One minute temporal resolution magnetic field recordings were downloaded via INTERMAGNET (www.intermagnet.org). The data were transformed from geographic coordinates, as provided by INTERMAGNET, into geomagnetic dipole coordinates. IGRF 2000 coefficients were used to compute the coordinate transformation matrices as given by Hapgood [1992]. The quiet-time baseline level was determined visually for each station and for each event, and the baseline was removed from the magnetic field data to obtain the disturbance field. Small data gaps with length of no more than few minutes were patched by means of linear interpolation. The modeled magnetic field data were resampled by means of spline interpolation to match the time stamps of the observations.

Table 2. The Locations of the Geomagnetic Observatories Used in the Studya
Station NameStation CodeGeomagnetic LatitudeGeomagnetic Longitude
  1. aBold typeface stations indicate the six stations (stations PBQ and SNK are alternates, see the text for details) used in the final analyses discussed in section 5.
YellowknifeYKC68.9299.4
MeanookMEA61.6306.2
NewportNEW54.9304.7
FresnoFRN43.5305.3
IqaluitIQA74.05.2
Poste-de-la-BaleinePBQ65.5351.8
SanikiluaqSNK66.4356.1
OttawaOTT55.6355.3
FredericksburgFRD48.4353.4
HornsundHRN73.9126.0
AbiskoABK66.1114.7
WingstWNG54.195.0
FurstenfeldbruckFUR48.494.6
Figure 2.

The locations and the station codes of the geomagnetic observatories used in the study. Geomagnetic dipole coordinates are used. Thick and thin circles indicate high-latitude and mid-latitude stations, respectively, used in the final analyses in section 5.

[11] In contrast to the earlier GEM Challenges on the topic, modeled ground magnetic field variations were computed by integrating over the global model electric current sources both in the ionosphere and magnetosphere. The integration includes magnetic field perturbations generated by all major sources including high-latitude ionosphere, magnetopause, magnetotail, and the ring current. The CCMC tool used for integration was validated, for example, against a comparable Space Weather Modeling Framework tool [Yu and Ridley, 2008] and is described in detail in a companion paper by Rastätter et al. (Community-wide geospace model validation: Computation of δB at the CCMC, submitted to Space Weather, 2013). The two empirical models (see Table 3) provided direct predictions of the magnetic field at the used station locations. All model runs and ground magnetic field calculations were carried out at CCMC.

Table 3. Models Analyzed in the Validation Efforta
Identifier(Model Version) ModelGrid (No. of Cells, Min. Res.)
  1. aEach model is assigned a unique model identifier given by the leftmost column of the table. The table indicates the model setting, and if applicable, the number of cells and the minimum spatial resolution used in the global MHD part of the mode. See text in section 4 for details.
2_LFM-MIX(LTR-2.1.1) LFM coupled with ionospheric electrodynamics163,000, 0.4 RE
3_WEIGELempirical modelN/A
4_OPENGGCM(OpenGGCM 4.0) global MHD coupled with CTIM3.9 million, 0.25 RE
5_WEIMERempirical modelN/A
9_SWMF(SWMF 2011-01-31) BATS-R-US coupled with RIM and RCM1 million, 0.25 RE

[12] The ground magnetic field is generally a sum of internal (induced) and external components. While empirical models take also the internal component implicitly into account, the first-principles models applied here do not. The magnitude of the magnetic signal originating from the induced currents is dependent on a number of factors such as the ground conductivity structure, the source structure, and the distance from the source. However, as a very rough rule of thumb, about 20–40% of the ground magnetic field can be of internal origin during active periods [Tanskanen et al., 2001; Pulkkinen and Engels, 2005]. While ideally also the internal contribution should be taken into account, more detailed geomagnetic induction calculations require information about the local ground conductivity structure not known for the stations used in this study. As the majority of the observed dB/dt (horizontal components) come from the external sources, we do not believe exclusion of the internal part of the field for fist-principles models affects the central results of this paper.

[13] Six stations were selected out of the original 12 GEM Challenge stations to represent the high-latitude and mid-latitude locations. The selected high-latitude stations are PBQ/SNK, ABK and YKC and mid-latitude stations WNG, NEW, OTT (see Table 1 and Figure 2). The selected six stations represent all three meridional chains used in the earlier Challenges and have equal weight on mid- and high-latitude locations. No observed data was available for station ABK for event 5. Although all 12 stations are available for both observation and model prediction data sets and can be viewed via CCMC's online model validation interface, only the above six stations are used in the results discussed in section 5.

3 Selected Metrics

[14] Based on the earlier GEM Challenge experiences and operational needs in terms of dB/dt prediction capability, it was agreed that the model validation should be built on event-based analyses. An event is defined here as follows: within a forecast window 0≤ttf, the absolute value of the parameter of interest exceeds an event threshold |xthres| (here dB/dt). The windows are moved over the time series in non-overlapping segments, and events for the given tf and |xthres| are recorded for both the measured and the modeled x. By comparing threshold crossing for both observed and modeled time series, one can then build a four-element matrix known as contingency table. The table reports the number of correct hits, false alarms, missed events, and correct no events [e.g., Lopez et al., 2007].

[15] In this work the length of the analysis window tf was selected to be 20 min, and the thresholds dB/dt0.3, 0.7, 1.1 and 1.5 nT/s were used. The selected thresholds represent values that both span lower and higher ranges of rates of change and are also in the “mid-range” in a sense that enough threshold crossing could be detected for good statistics. We carried out systematic sensitivity analyses to study the impact of the selected forecast window length. While predictability of events gets somewhat poorer with shorter window lengths, the ranking of the models did not change significantly as a function of the analysis window length (not shown). Consequently, it was concluded that varying the analysis window length between 10 and 45 min did not change the central results notably.

[16] The elements of the contingency table contain the number of correctly predicted threshold crossings H (hits), the number of false alarms F, the number of missed crossings M, and the number of correctly predicted no crossings N. The set {H,F,M,N} can be used to compute a number of different metrics quantifying the performance of individual models. In this study, three metrics proposed by NOAA SWPC were selected for use in the final analyses. The selected metrics are Probability of Detection (POD), Probability of False Detection (POFD) and Heidke Skill Score (HSS). We describe each metric more in detail in the following subsections.

3.1 Probability of Detection

[17] POD is defined for the set {H,F,M,N} as

display math(2)

The metric measures the fraction of observed threshold crossings which were correctly forecast. It ranges from 0 to 1 with 1 being a perfect score. Since a model providing artificially large signal amplitudes will tend to generate large H and large POD, the metric should be used in conjunction with POFD defined below.

3.2 Probability of False Detection

[18] POFD is defined for the set {H,F,M,N} as

display math(3)

The metric measures the fraction of correctly predicted no crossings that were incorrectly forecast as crossings. POFD ranges from 0 to 1, with 0 being a perfect score. Similar to POD, a model predicting artificially low-signal amplitudes will provide low F and small PODF, and thus, the metric should be used in conjunction with POD.

3.3 Heidke Skill Score

[19] HSS is defined for the set {H,F,M,N} as

display math(4)

The metric measures the fraction of correctly predicted threshold crossings after eliminating those predictions that would be correct purely by random chance. It ranges from negative infinity to 1. Negative values indicate that random forecast is better than the model prediction, 0 indicates no skill (as good as random), and 1 indicates a perfect score.

[20] It is noted that for HSS to be meaningful measure of the model performance a variety of states of the system should be studied. For example, perfect prediction of no 1.5 nT/s crossings for a weak event (H=M=F=0) is reported as HSS=0/0, which is not defined. Consequently, to guarantee well-defined HSS, one should be careful to pick thresholds that are crossed for the selected sets of events.

4 Models

[21] Five models were evaluated in the model validation activity. These included empirical models by D. Weimer (Virginia Polytechnic Institute) and R. Weigel (George Mason University) and major US global magnetohydrodynamic (MHD) models from University of Michigan, Center for Integrated Space Weather Modeling (CISM) and University of New Hampshire. Also, the Finnish Meteorological Institute's Grand Unified Ionosphere-Magnetosphere Coupling Simulation (GUMICS) global MHD group participated in the discussions associated with the validation work. However, while GUMICS is available for runs-on-request at CCMC, due to the current serial implementation of the model, the group did not participate in the validation of the model itself. The full parallel version of GUMICS is expected to be available early 2013.

[22] All models that participated in the validation effort were delivered to CCMC by January 2011. CCMC had extensive communications with the model developers to guarantee correct installation and to ensure the usage of appropriate settings for each model. Based on a variety of tests, such as code robustness carried out at CCMC, model developers provided revisions to the model settings. The final selection of all model settings was accomplished by mid-August 2011. To allow for simulations in a realistic real-time computational environment, it was required that settings for all models were such that the simulations would run not slower than twice the modeled physical time on 64 Beowulf cluster processors. All simulations were performed at CCMC using the same computational architecture.

[23] Below, each model and settings pertaining to the validation activity are described. Table 3 summarizes some of the key features of each individual model. A version of the Weimer model and all global MHD models discussed in this work are available at CCMC for runs on request.

4.1 Weimer Empirical Ground Magnetic Field Prediction Model

[24] The empirical model supplied by Daniel Weimer provides values of the magnetic perturbations at the surface of the Earth, for vector components in the North, East, and Vertical (down) directions. Vectors are returned as a function of location in either geographic latitude/longitude coordinates or corrected geomagnetic latitude/magnetic local time coordinates. More specifically, the model internally uses “Modified Apex” coordinates [Richmond, 1995; Emmert et al., 2010; VanZandt et al., 1972]. If geographic locations are specified on input, then the output vectors are also in geographic coordinates such that positive North is toward the geographic pole; otherwise, North is toward the corrected geomagnetic apex pole. For the purpose of the challenge, the output vectors are rotated into geomagnetic coordinates. The data needed to drive the model are the Geocentric Solar Magnetospheric (GSM) y- and z-components of the interplanetary magnetic field (IMF), the solar wind velocity, the dipole tilt angle of the Earth's magnetic field, and the solar F10.7 index. Ideally, 25 min mean values of the IMF and solar wind velocity should be used, with a 20 min delay after propagation to the magnetosphere's bow shock [Weimer and King, 2008].

[25] The model uses Spherical Cap Harmonic Analysis (SCHA) [Haines, 1985] within a cap that extends down to 33.4° apex latitude. SCHA coefficients up to order m=3 and degree l=16 are used. Measurements of the IMF and solar wind on the ACE spacecraft, from February 1998 through December 2005, were used to generate the model, as well as measurements from over 120 magnetometer stations. Details of data preparation and initial tests are provided by Weimer et al. [2010]. A least error fit was used to find how each of the SCHA coefficients varies as a linear function of the input values, with 17 terms for each coefficient. In order to handle the nonlinear “saturation” response of the ionosphere, the fits are derived separately within 23 bins, divided according to the magnitude of the IMF.

[26] More recently, an improved version of this model has been developed, but it was not available for the validation work discussed in this paper [Weimer, 2013]. The latest version extends down to the geomagnetic equator, using data from 143 magnetometer stations. It uses spherical harmonics up to degree l=31, and divides the IMF measurements into 29 bins.

4.2 Weigel Empirical Ground Magnetic Field Prediction Model

[27] Three models were used for this study. All of the models were developed using available 1 min ground magnetometer measurements from World Data Centre for Geomagnetism (Edinburgh) (http://www.wdc.bgs.ac.uk/) in the time interval 1 January 2000 through 31 December 2006, excluding event intervals that occurred in this time range.

[28] The first model is the climatological average of ΔB(disturbance field) for each component computed by taking the average of ΔB in 48 local times.

[29] The second and third models are linear impulse response filters using the method of Weigel [2007]. Both models predict geomagnetic disturbance GB or dB/dt) in vector direction i using

display math(5)

where the solar wind velocity v, and the rectified −z component of the interplanetary magnetic field, Bs=1/2(|Bz|−Bz), are from the OMNI high-resolution data set (http://omniweb.gsfc.nasa.gov/ow$_$min.html), and the h coefficients depend on local time, LT. The model coefficients are determined using a least-squares minimization of the prediction error.

[30] The model for GiBi has Nc=12 and predicts Gi at 48 local times. Physically, this model predicts the ground magnetic field given the past 6 h of solar wind measurements. Data were resampled to place the predictions on a 1 min time grid.

[31] The model that predicts G=dB/dt has Nc=4 and predicts G at 1440 local times. The model G=dB/dt was used in the metrics analyses carried out in this paper.

4.3 Space Weather Modeling Framework

[32] The Space Weather Modeling Framework (SWMF) [Tóth et al., 2005, 2012] (http://csem.engin.umich.edu/swmf) is a flexible software framework designed to model a variety of space physics phenomena. The SWMF divides the complex space physics systems into physics domains. The domains used in the SWPC modeling challenge are the Global Magnetosphere (GM), the inner magnetosphere (IM), and the Ionosphere Electrodynamics (IE).

[33] The GM model is the Block-Adaptive Tree Solarwind Roe-type Upwind Scheme (BATS-R-US) [Powell et al., 1999; Gombosi et al., 2004]. In the work described in this paper, the semi-relativistic MHD equations [Gombosi et al., 2002] are solved. We use an explicit/implicit time stepping scheme [Tóth et al., 2006] with a 5 s time step (potentially reduced by the adaptive time step control scheme if necessary). The computational domain extends from 32RE upstream to −224RE downstream in the x direction and ±128RE in the y and z coordinates (GSM). The inner boundary is at 2.5RE distance from the center of the Earth. The domain is discretized with a block-adaptive Cartesian grid. The roughly 1 million grid cells vary in size from 1/4RE near the inner boundary to 8RE in the distant tail. The boundary conditions are the usual(see referred publications for details), except for the density at the inner boundary, which is set as ρinner=28+0.1CPCP, where CPCP is the average of the northern and southern cross polar cap potentials measured in keV, and the density is measured in amu/cm3. In these runs, we used the artificial wind scheme [Sokolov et al., 2002] with Koren's limiter [Koren, 1993] (β=1.2) and the eight-wave scheme [Powell et al., 1999].

[34] The IM domain is represented by the Rice Convection Model (RCM) [Wolf et al., 1982; Toffoletto et al., 2003]. RCM solves for the bounce averaged and isotropic but energy resolved particle distribution of electrons and various ions. We used the standard RCM settings except for one modification: we added an exponential decay term to the RCM equations, so that the phase space density decays toward zero with 10 h e-folding rate. With this modification the Dst index of the coupled model recovers better after large storms.

[35] The IE domain is represented by the Ridley Ionosphere Model (RIM) [Ridley et al., 2004]. RIM uses the field-aligned currents obtained from GM and the F10.7 flux (set as an input parameter for each event) to calculate particle precipitation and conductances based on empirical relationships. RIM solves a Poisson-type equation for the electric potential on a 2-D spherical grid. We set the lower latitude boundary to 10°.

[36] In the work described in this paper, the BATS-R-US and RIM models are coupled every 5 s, while the BATS-R-US with RCM as well as the RIM to RCM couplings are done every 10 s. In the BATS-R-US-RIM coupling, the MHD model calculates the field-aligned currents (FAC) at 3RE and maps it down to the ionospheric grid. The electric field obtained by RIM, is mapped back to the inner boundary of GM, where the E×B/B2 velocity is calculated. The cross polar cap potentials are also sent to GM, and are used to set the density at the inner boundary. In the RIM to RCM coupling, the electric potential is passed and interpolated onto the RCM grid. In the BATS-R-US to RCM coupling, BATS-R-US finds the closed field line region and calculates field volume integrals with an efficient parallel field line tracing algorithm [Glocer et al., 2009a]. The integrated GM density and pressure are applied as outer boundary conditions for the IM model assuming a 90% H+ to 10% O+ number density ratio. In the RCM to BATS-R-US coupling, the GM grid cell centers are traced to the inner boundary along the magnetic field lines with an efficient parallel algorithm [De Zeeuw et al., 2004]. The BATS-R-US pressure and density are nudged toward the RCM values with a 20 s relaxation time.

[37] In addition to the basic variables used in the various models, the SWMF can also calculate various plasma parameters along satellite trajectories, ionospheric foot-points of satellites, integrated line-of-sight images, various geomagnetic indexes (Dst,Kp), as well as local magnetic perturbations [Yu and Ridley, 2008]. The SWMF can model space weather events starting from the Sun all the way to the Earth [Tóth et al., 2007]. The magnetospheric components of the SWMF have been validated in several studies [Ridley et al., 2002; Zhang et al., 2007; Glocer et al., 2009c; Welling and Ridley, 2010]. In addition to the magnetospheric models used in this study, the SWMF also contains the radiation belt and the polar wind components [Glocer et al., 2009a, 2009b], and the CRCM and RAM-SCB inner magnetosphere models [Buzulukova et al., 2010; Zaharia et al., 2010].

4.4 Lyon-Fedder-Mobarry Model With Magnetosphere-Ionosphere Coupler and Electrodynamics Solver

[38] The Coupled Magnetosphere Ionosphere Thermosphere Model (CMIT) [Wang et al., 2004; Wiltberger et al., 2004] couples the Lyon-Fedder-Mobarry global magnetosphere model (LFM) [Lyon et al., 2004] with the Thermosphere-Ionosphere-Electrodynamic Global Circulation Model (TIEGCM) [Roble and Ridley, 1994] via the Magnetosphere Ionosphere Coupler Solver (MIX) [Merkin and Lyon, 2010] to provide a comprehensive global simulation of geospace response to solar and solar wind drivers. The LFM portion of the model solves the ideal magnetohydrodynamic equations to describe the interaction of the solar wind plasma with the plasma in geospace. This portion of the model requires that the solar wind and IMF conditions be specified typically from ACE or WIND spacecraft observations. These conditions are assumed to be constant along planar fronts propagating through the computational domain. The LFM is electrodynamically coupled to the ionosphere through the MIX model. MIX solves for the cross polar cap potential taking currents from the magnetospheric domain and conductance from the ionosphere. In order to obtain the conductance information, MIX uses a series of empirical relationships described in Wiltberger et al. [2009] to transform the MHD parameters at the inner boundary into a characteristic energy and flux of precipitating electrons. The ionospheric component uses the electron flux information along with a parameterization of the solar extreme ultraviolet (EUV) flux driven by the F10.7 index to compute the conductance. It is noted that, in this validation work, TIEGCM was not used. Instead of the full ionosphere-thermosphere system, only the ionospheric electrodynamics was treated via MIX.

[39] For the validation work discussed in this paper, we used a modest resolution version of the model that provides reliable performance on small amount of computational resources. In the LFM, the simulation grid was 53(radial)×48(azimuthal)×64(polar) points allowing for a typical resolution in the inner magnetosphere of roughly 0.4 RE. The electrodynamic grid used in MIX was 2°×2° covering the high latitude region down to a magnetic colatitudes of 45°. The model runs faster than real time on 24 processors. In the magnetosphere, the typical time step is approximately 0.1 s. The electrodynamic coupling between the ionosphere and magnetosphere is updated every 5 s. The computational model can provide a vast array of information relevant to space weather ranging from the magnetic fields at geosynchronous orbit to the ground magnetic field perturbations.

[40] CMIT and its component models have been used to study a variety of processes in geospace ranging from magnetic storms [Goodrich et al., 1998] to substorms [Lopez et al., 1998; Wiltberger et al., 2000] including driving by CMEs [Baker et al., 2004] and CIRs [Wiltberger et al., 2012]. We have validated the model against numerous measurements of magnetopause crossings [Lopez et al., 2006; Garcia and Hughes, 2007], ground magnetometer observations [Wiltberger et al., 2003], and climatology data from Geotail [Guild et al., 2008a, 2008b]. The version of the model used in this study does not include an inner magnetosphere model, but coupling with the Rice Convection Model has recently been completed [Pembroke et al., 2012] and will be part of a future release to the CCMC.

4.5 Open General Geospace Circulation Model

[41] The Open General Geospace Circulation Model (OpenGGCM) global MHD model simulates the interaction of the solar wind with the magnetosphere-ionosphere-thermosphere system. Besides numerically solving the MHD equations with high spatial resolution in a large volume containing the magnetosphere, the model also includes ionospheric processes and their electrodynamic coupling with the magnetosphere. The coupling between the magnetosphere and the ionosphere is an essential part of the model because the ionosphere controls, to a large extent, magnetospheric convection, by providing the resistive closure of the field-aligned currents that are generated from the interaction of the solar wind with the magnetosphere [Raeder et al., 1996, 1998]. Processes that occur in the near-Earth region on polar cap and auroral field lines and that are inherently kinetic have been parameterized in the model using empirical relationships. These processes include the field-aligned potential drops that are associated with upward field-aligned currents, electron precipitation caused by the field-aligned potential drops [Knight, 1972], and the diffuse electron precipitation that is caused by pitch angle scattering of plasma sheet electrons [Lyons et al., 1979; Robinson et al., 1987; Weimer et al., 1987; Kennel and Petschek, 1966]. The electron precipitation parameters and the ionosphere potential are then passed to the Coupled Thermosphere Ionosphere Model (CTIM), which is coupled to the MHD part of the code. CTIM [Fuller-Rowell et al., 1996] is a dynamic model of the ionosphere and thermosphere with a long heritage, covering the globe from 80 km to several 1000 km altitude, and following several neutral and ionic species and their photochemical interactions. CTIM computes self-consistently the ionospheric Pedersen and Hall conductances, which are then used to solve the ionospheric potential equation [see Raeder, 2003, for details].

[42] The OpenGGCM requires as input solar wind and IMF data, and the F10.7 solar radio flux as a proxy for solar UV/EUV radiation. Solar wind and IMF data are ballistically propagated from the monitor location to the upstream boundary of the simulation. Furthermore, we calculate the normal direction of the solar wind fronts using the minimum variance method [Sonnerup and Cahill, 1967, 1968] in order to extrapolate the single point measurements to the entire inflow surface. Without such treatment, the IMF Bx component cannot change in time without violating ∇·B=0 [Raeder et al., 2001c].

[43] The OpenGGCM computes all magnetospheric and ionospheric quantities that are necessary to determine GICs [Raeder et al., 2001a]. Other quantities of interest to space weather can also be derived, such as the total and equivalent ionospheric current, ionosphere electron content, and neutral density affecting low-Earth orbit satellites [Li et al., 2011]. The model has been used for a variety of studies, for example, for the study of substorms [Raeder et al., 2001c, 2008, 2010; Ge et al., 2011; Gilson et al., 2012], storms [Raeder et al., 2001a, 2001b], flux transfer events and dayside reconnection [Raeder, 2006; Dorelli et al., 2004; Muhlbacher et al., 2005; Connor et al., 2012], ionospheric convection [Vennerstrom et al., 2005, 2006; Siscoe et al., 2004; Lu et al., 2011], and plasma entry under northward IMF [Li et al., 2005, 2008, 2009, 2011]. A more detailed description of the code and the methods used can be found in Raeder [2003] and Raeder et al. [2008].

5 Results

[44] To demonstrate a typical storm-time situation, Figures 3 and 4 show example time series of the observed versus modeled dB/dt for the event 2 (Table 1). Again, all data are viewable via CCMC's online visualization interface accessible at http://ccmc.gsfc.nasa.gov/challenges/dBdt/metrics\_results.php. As is seen from the figures, models do not generally capture the dB/dt fluctuations point-by-point, which is not surprising considering the complex waveform of the signal. However, the amplitudes of high-latitude dB/dt fluctuation especially in the beginning of the event are reproduced to a degree by the models. Although all models miss some of the observed activity, for example, at station ABK (Figure 3) around 26–32 MLT, the capability, at times, to reproduce comparable dB/dt amplitudes indicates that the models may provide utility in capturing events within given forecast windows. We will quantify this capability to capture the events using metrics discussed in section 3.

Figure 3.

Time series of the observed (blue curves) versus modeled (black curves) dB/dt at the three high-latitude stations indicated in Table 2 for event 2 indicated in Table 1. The time is magnetic local time (MLT) and the dashed lines indicate the dB/dt thresholds of 0.3, 0.7, 1.1 and 1.5 nT/s. Results are shown for the models as follows: (a) 2_LFM-MIX, (b) 3_WEIGEL, (c) 4_OPENGGCM, (d) 5_WEIMER, and (e) 9_SWMF (see Table 3).

Figure 4.

Time series of the observed (blue curves) versus modeled (black curves) dB/dt at the three mid-latitude stations indicated in Table 2 for event 2 indicated in Table 1. The time is magnetic local time (MLT) and the dashed lines indicate the dB/dt thresholds of 0.3, 0.7, 1.1 and 1.5 nT/s. Results are shown for the models as follows: (a) 2_LFM-MIX, (b) 3_WEIGEL, (c) 4_OPENGGCM, (d) 5_WEIMER, and (e) 9_SWMF (see Table 3).

[45] The final metrics-based analyses were carried out for each individual model using events and stations described in section 2, and the corresponding contingency tables with elements {H,F,M,N} were generated for each model for each event and station for dB/dt thresholds of 0.3, 0.7, 1.1 and 1.5 nT/s. Here we will report only the results integrated, i.e., summed contingency table elements, over all events. The summary results are integrated also separately over high-latitude (PBQ/SNK, ABK, YKC) and mid-latitude stations (WNG, NEW, OTT). Figures 5 and 6 show the corresponding POD, POFD, and HSS for all participating models.

Figure 5.

Probability of Detection (POD) (blue curve) and Probability of False Detection (POFD) (black curve) defined in section 3 for the dB/dt thresholds (a) 0.3 nT/s, (b) 0.7 nT/s, (c) 1.1 nT/s, and (d) 1.5 nT/s. In Figures 5a–5d, the top panel shows POD and POFD obtained by integrating over the three mid-latitude stations, and the bottom panel shows POD and POFD obtained by integrating over the three high-latitude stations. The models (see Table 3) are ordered according to their POD. The model with the largest POD is the leftmost in all panels.

Figure 6.

Heidke Skill Score (HSS) defined in section 3 for the dB/dt thresholds (a) 0.3 nT/s, (b) 0.7 nT/s, (c) 1.1 nT/s, and (d) 1.5 nT/s. In Figures 6a–6d, the top panel shows HSS obtained by integrating over the three mid-latitude stations, and the bottom panel shows HSS obtained by integrating over the three high-latitude stations. The models (see Table 3) are ordered according to their HSS. The model with the largest HSS is the leftmost in all panels.

[46] The focus of this paper is to report on the process, metrics, and initial results from the evaluation of physics-based and empirical models that predict regional ground-based dB/dt variations during strong geomagnetic activity. Future work is needed to understand where model improvements are needed to better represent observations. At this stage of the work, it is important to quantify model capabilities and to provide information that will be used to assess whether or not these models provide useful guidance for improved forecasts of regional ground-based magnetic field perturbations. While refraining from further interpretations due to the nature of the paper, it is quite clear that for a given set of stations, events, and metrics, the model 9_SWMF provides the highest POD and HSS for most of the thresholds. As an indication that large dB/dt events are still a challenge to capture accurately, for threshold 1.5 nT/s, none of the models is capable of providing POD or HSS greater than 0.5.

[47] We emphasize that for optimal statistics the summary results reported here are obtained by integrating over selected stations and all events. The results and the ranking of the models vary from station to station and event to event (see Figure 7). Further, event 1 dominates the statistics for larger dB/dt thresholds due to the strength of the Halloween storm event. The breakdown of the results (contingency tables) for individual events are available in the supporting information and should be used for any further interpretations.

Figure 7.

Metrics results for dB/dt threshold of 0.3 nT/s individually for events 5 and 6. (a) POD and POFD for event 5, (b) POD and POFD for event 6, (c) HSS for event 5, and (d) HSS for event 6. Compare to Figures 5 and 6.

6 Discussion

[48] In this work, coordination among the CCMC, NOAA SWPC, modelers, and science community has resulted in the evaluation of several geospace models capable of predicting the fluctuation of the ground magnetic field. The work was a continuation of earlier GEM modeling challenges and was designed to support model transition into operations at NOAA SWPC. The primary NOAA interest in this specific effort was to study models' capability to reproduce the observed dB/dt, which can be used as an indicator for GIC activity.

[49] The supporting information by Rastätter et al. (submitted manuscript, 2013) describes the development of the global (integration over global MHD, gap region, and the ionosphere) Biot-Savart integration tool used in the activity and CCMC's online interface that can be used for further station by station and event by event science analyses. We demonstrated here how the online tool can be used to study, for example, models' capability to capture substorm-related ground magnetic field signatures.

[50] We reported here the metrics results integrated over selected stations and all events. Model 9_SWMF provided the highest POD and HSS for all dB/dt thresholds used to build the event detection-based contingency tables. However, we emphasize again that the metrics results vary from station to station and event to event. One should thus be cautious in making general interpretations without studying the more detailed breakdown of the analysis. For this purpose, along with CCMC's online analysis interface, contingency tables for each individual event are available in the supporting information of this paper.

[51] Finally, the key question is “are the models good enough to provide tangible value for the end user in need to mitigate GIC?” This is a multifaceted complex question and the answer most likely varies from user to user. Based on the summary results for POD, POFD, and HSS with the dB/dt threshold of 1.5 nT/s, it is clear that predicting large dB/dt is still a challenge. For example, if one picks, say, 50% chance of detecting an event as the threshold (this threshold will be user and application dependent) for identifying a good and not good enough prediction, one can see that POD and HSS were below 0.5 for all models for the dB/dt threshold of 1.5 nT/s. Users requiring localized predictions for large dB/dt with high likelihood of event detection may not be satisfied with the current state of the art. However, we saw that models have the capability to capture the general level of enhanced activity. Consequently, users satisfied with more rough characterization of dB/dt activation over the storm periods may be able to use the models for generating actionable information.

[52] The models validated in this paper can provide short lead-time dB/dt predictions. The meaning of “short” will vary as a function of the speed of transient structures in the solar wind and the computational capacity available for model execution. Lead times of 15–30 min at best can be expected for fast coronal mass ejection events. Obviously, continuous high-quality upstream solar wind plasma and magnetic field monitoring used to drive the models are also required. It is important to acknowledge that while providing 24/7 data stream, ACE SWEPAM plasma experiment has limitations during strong solar energetic particle events [Skoug et al., 2004] often associated with major Earth-directed coronal mass ejections. The Deep Space Climate Observatory (DSCOVR) mission that will replace ACE as the primary upstream monitor is expected to launch in 2014.

[53] Finally, one of the results of this effort to evaluate geospace models for transition from the research environment to operations is that it has accelerated the delivery of new versions of models to the CCMC for use by the science community. It has also resulted in the rigorous validation of models and initiated feedback from the operations to research that will ultimately result in a better understanding of where model improvements are most needed.

Acknowledgments

[54] The results presented in this paper rely on data collected at magnetic observatories. We thank the national institutes that support them and INTERMAGNET for promoting high standards of magnetic observatory practice (www.intermagnet.org). The Dst index data was provided by the World Data Center for Geomagnetism, Kyoto, Japan. CCMC staff is acknowledged for their support throughout the work reported in this paper. Terry Onsager and Kent Doggett of NOAA SWPC are acknowledged for their support throughout the work carried out in this paper. Minna Palmroth and Ilja Honkonen of Finnish Meteorological Institute (GUMICS group) are acknowledged for their valuable input that helped to define the validation work.

Ancillary