A Large Ensemble Simulation of Geomagnetic Storms – Can Simulations Predict Ground Magnetometer Station Observations of Magnetic Field Perturbations?

We use the Space Weather Modeling Framework (SWMF) Geospace conﬁguration to simulate a total of 122 storms from the period 2010-2019. With the focus on the storm main phase, each storm period was run for 54 hours starting from 6 hours prior to the start of the Dst depression. The simulation output of ground magnetic variations were compared with ground magnetometer station data provided by SuperMAG to statistically assess the Geospace model regional magnetic perturbation prediction performance. Our results show that the regional predictions at mid-latitudes are quite accurate, but the high-latitude regional disturbances are still diﬃcult to predict.


Introduction
The Earth's surface magnetic field varies in response to currents in the near-Earth space (Chapman & Bartels, 1941) and in the ionosphere (Zmuda & Armstrong, 1974).The ground signal from the high-altitude currents is dominated by the ring current, but during geomagnetic storms, the magnetopause compression especially at the storm onset can cause substantial disturbances due to enhanced dayside magnetopause currents (Villante & Piersanti, 2008), while the tail currents can significantly contribute to the signal during the storm main and recovery phases (Ganushkina et al., 2010).The high-altitude currents are best detected by sub-auroral magnetometers where there are weaker ionospheric currents overhead, and thus the horizontal B H component of the magnetic field provides a measure of the intensity of the current flowing parallel to the equatorial plane (Huang et al., 2004).At the auroral latitudes, the signal is dominated by the ionospheric currents at about 100-km altitude (Richmond et al., 1990).
Geomagnetic indices are used as a measure of the level of geomagnetic activity.The Auroral Electrojet indices are composed of 12 high-latitude (northern hemisphere) stations as extrema of the horizontal component at the stations at each time step.Thus, the auroral upper (AU) index is a measure of the peak eastward current in the ionosphere, while the auroral lower (AL) index is proportional to the peak westward current (Davis & Sugiura, 1966).The stormtime disturbance index Dst, and its high-cadence version SYM-H record the average of the magnetic variation at four (Dst) or six (SYM-H) midlatitude stations respectively around the Earth, weighted by the average of the station colatitudes (Sugiura & Poros, 1971).The peak value of the Dst is used as a measure of storm intensity, and is often interpreted as being proportional to the intensity of the ring current encircling the Earth (Siscoe & Crooker, 1974).Prediction of the magnetic disturbances caused by the stormtime currents is important for space weather, and one of the main motivations of this study.
Rapid variations in the geomagnetic field induce a geoelectric field at Earth's surface, which in turn drive geomagnetically induced currents (GIC), which can have harmful effects on power grids, natural gas pipelines, or other technological systems (Pirjola et al., 2000).The geoelectric field depends on the ground conductivity structure, which means that the local geology influences the formation and intensity of the GIC (Zheng et al., 2013).As the power spectrum of the geoelectric field is dominated by frequencies below 1 Hz, the GICs act as a direct current (DC) component on top of the 50 or 60 Hz alternating current (AC) power system (A.Pulkkinen et al., 2017).The effects in the power systems include saturation of transformers, which can generate equipment damage and/or system-wide disturbances and power outages (Bolduc, 2002;Lanzerotti, 2001).
In the natural gas pipelines, the DC currents can cause corrosion and thus shorten the lifetime of the system (Boteler et al., 1998;Pirjola et al., 2005).As the GIC and their effects on the systems depend on the regional rather than global level of disturbances, the global activity indices are not well suited for serving the system operators wishing to get advance warnings and nowcasts of the intensity of the disturbances (Abt Associates Inc., 2019  Weather Prediction Center, 2022).This study addresses the capability of the SWMF Geospace to provide accurate regional predictions of the ground geomagentic disturbances.As an operational model, the Geospace model has been comprehensively validated with regard to its predictions of the global indices (Cash et al., 2018;Liemohn et al., 2018).
A. Pulkkinen et al. (2013) set the ground work for validating a model's performance in predicting dB H /dt. The study included SWMF as well as other models, with a focus on 6 storms and 12 stations in the northern hemisphere.Welling et al. (2017) extended that study by assessing the model performance as a function of the solar wind driver and SYM-H.It is important to note that in this study, we examine two latitude bands, midlatitude stations between -50 • and 50 • , while the previous studies focus on the northern hemisphere in the latitude band between 50 • and 65 • .This study is in the same vein as the previous studies of its kind, but focuses on the localized disturbances recorded at individual ground magnetometer stations to maximize the usability of the results in practical applications.The exponential increase in computer power and storage have allowed the use of methods like ensemble modeling employed for example by Morley et al. (2018).
They simulated a storm with several variations of the input solar wind, which allowed them to evaluate a quantifiable uncertainty.Our approach is to use a wealth of solar wind data during a large number of storms to obtain a statistical database of storm simulations, from which we can draw statistical conclusions.A similar study, but with fewer storms , and without including coupling of the inner magnetosphere model that allows the development of an intense ring current during the storm main phase, was carried out by (Kwagala, Norah Kaggwa et al., 2020).In the analysis, we make use of best practices solidified by the community (Welling et al., 2018), such as the use of contingency tables and Heidke Skill Scores (HSS) as well as the horizontal component of magnetic perturbations (but not dB/dt in this case).
In this paper we describe a statistical database of storm simulations and use that together with the SuperMAG ground magnetometer observations database to assess the model performance at individual stations, comparing results in different latitude bands.
-3-manuscript submitted to Space Weather Section 2 describes the methodology, and the results, Section 3 summarizes the results and concludes with discussion.

Observations
Following the often-used definition of a geomagnetic storm as an event with a peak Dst value below −50 nT, we examined all periods during 2010-2019 fulfilling that condition.A small subset of the periods were discarded due to lack of clear signature of storm onset or main phase development or significant data gaps in solar wind and IMF records.
A total of 122 such periods with minimum SYM-H below -50 nT were included in the study.The storm onset time was selected to be the time when the SYM-H index started to decrease (often following a compression caused by the ICME-associated shock).Each interval had a duration of 54 hours starting from 6 hours prior to the storm onset.While this duration captures all storm main phases in the dataset, it does not always extend far enough to capture the entire storm recovery phase.
The solar wind measurements were obtained from the OMNI database (Goddard Space Flight Center, 2021), which provides the interplanetary magnetic field (IMF) and the solar wind plasma parameters propagated to the upstream bow shock allowing for direct association with the geomagnetic activity indices (Papitashvili et al., 2014).The solar wind driver intensity was assessed using the Newell coupling parameter (Newell et al., 2007), which is proportional to the rate of change of magnetic flux at the nose of the magnetopause, and can be written in the form (2) where θ = tan −1 (B Y /B Z ) is the IMF clock angle and B T = (B 2 Y + B 2 Z ) 1/2 denotes the transverse component of the magnetic field perpendicular to the Sun-Earth line and is given in nT, V is the solar wind speed in km/s, and α is a scaling factor of the order of 10 3 scaling it to dimensional units of Wb/s (Cai & Clauer, 2013).The ground magnetic field variations were analyzed using the SuperMAG (Gjerloev, 2012) database comprising 1-min magnetic measurements from several hundred magnetic stations over the globe through the INTERMAGNET et al. (2021) network and others.While the total number of stations is large, at any given time instance, the number of available stations ranges from about 50 to 150.We categorize the set of stations into three latitude ranges, the mid-latitudes (-50 • to 50 • magnetic latitude), high-latitudes (50 • to 90 • magnetic latitude) and auroral region stations (60 • to 70 • latitude).For each event, we used all stations that had continuous data throughout the event interval.For each panel, the root mean square (rms) error and standard deviation (σ) are given in the description at the top.

Space Weather Modeling Framework Geospace Configuration
The Space Weather Modeling Framework (SWMF) is a combination of numerical models to simulate space physics processes from the Sun to the Earth's upper atmosphere and outer heliosphere (Tóth et al., 2012;Gombosi et al., 2021).The core of the SWMF is the Block-Adaptive-Tree-Solarwind-Roe-Upwind-Scheme (BATS-R-US), a 3D model solving the magnetohydrodynamic equations.The Ridley Ionosphere Model (RIM) is a potential field solver for the ionosphere, and the Rice Convection Model (RCM) is a kinetic model of the ring currrent and inner magnetosphere.The SWMF couplers tie these three components together to simulate the space weather effects in space and on ground.
The Geospace configuration used for this study mimics the one used operationally at the NOAA Space Weather Prediction Center (SWPC).
The The model output contains a regular 360×180 grid of magnetic disturbances between 0-360 • longitude and ±90 • latitude stored at 1-minute cadence.The ground magnetic disturbances predicted by the simulation are computed by Biot-Savart integration of the currents external to the Earth, using both the BATS-R-US (for the magnetospheric currents) and RIM (for the ionospheric currents) models (Yu & Ridley, 2008;Gombosi et al., 2021).The contribution of the RCM is to describe the strong ring current that develops in the inner magnetosphere, and the RCM plasma pressure is coupled back to the BATS-R-US and thereby creates a significant effect on the ground magnetic variations.The resulting model values are rotated to obtain the B N orth , B East and B Down components that are directly comparable with the ground magnetic stations.
Figure 1 shows the simulated values (in blue) over the observed values as well as thethe magnitude of the difference between the observed and simulated values normalized by the mean of the observed values shown(red, scale to the right).While the simulated SYM-H follows the observed one quite well, it can be seen that the AL index is not as well reproduced by the simulation: the simulated AL follows the observations early in the storm when the AL is not very large, but the simulation does not catch the very strong currents associated with the substorm activity near the peak of the storm.This is typical of the simulations in the data set (T. Pulkkinen et al., 2022).The two bottom panels show representative local magnetic measurements from YKC and BOU stations.
In both latitude bands, the simulation tends to underestimate the magnitude of the disturbance.

Statistical Storm Simulation Dataset
The full dataset (Al Shidi, 2022) comprises 122 storms during years 2010-2019.Figure 2 shows the superposed epoch of the observed, simulated and error (difference) of the SYM-H for all storm time periods used in this study.The time's origin has been shifted from simulation start time to the storm onset time.The blue lines show the mean of the entire dataset, the dotted line is the median and the upper and lower gray curves are the inter-quartile range of the dataset.Throughout this study, we define the error as (with y as the time-dependent quantity) The storms included in the study are shown in the appendix.

Error Statistics
The one-minute-data from the simulation were compared with the one minute perturbation observations that were available from the SuperMAG database.The error magnitude differences between a mid-latitude and auroral station reflects the different drivers and magnitudes of the disturbances: Under most conditions, the midlatitude stations record the variations in the ring current, with typical signal intensity of few tens of nT, or during storm times up to −100 nT and even below.Other currents that contribute are the magnetopause current and field-aligned currents (Ganushkina et al., 2018).On the other hand, the auroral stations recording the strong substorm activity, the disturbances are of the order of several hundred nT, at times reaching to −1000 nT and even more.
The right panels of Figure 4 show the error statistics for all stations in the latitude band above 50 • magnetic latitude in both northern and southern hemispheres (top panel) and in the latitude band −50 • < Mlat < 50 • , representative of auroral/polar cap stations and mid/low latitude stations, respectively.While the distributions are slightly wider than those for individual stations, they share the same features of relatively symmetric distributions with long tails in the negative error direction.

Error Statistics at Individual Stations
Next we examine the errors at individual stations, to resolve the spatial distribution of the errors over the globe.To that end, we examine the horizontal magnetic field    is due to the fact that the simulation does not model the equatorial electrojets that may contribute to the measured disturbance and hence to larger errors near the magnetic equator (Forbes, 1981;Rastogi, 1989;Onwumechili, 2019).We note that the opposite signs of the medians (negative for the horizontal component and positive for the north component) arise from the fact that the horizontal component is a positive quantity while the north component is a negative quantity, and hence the errors have different signs for the model underpredicting both components.

Station above Station below
Simulation above H (hit) F (false positive) Simulation below M (miss) N (true negative)

Heidke Skill Score Analysis
To quantify the ability to forecast measurements at the individual stations, we assign Heidke skill score values (HSS) (Heidke, 1926) to each station.The Heidke Skill score is defined in the typical skill score format with skill given by the ratio of the model's score and a random chance score difference by the perfect score and random chance score difference.The HSS can be obtained through a 2 × 2 contingency table where the simulation and observations are compared to a threshold value (Table 1), and obtained using the definition (Jolliffe & Stephenson, 2012) which shows that the HSS maximum value for no misses and no false positives is 1, value of zero indicates no skill, and negative values indicate skill worse than chance.
A key for the usability of the HSS to assess the prediction quality for operational customers lies in a correct selection of the threshold value separating "Hits" (True positives) and "True negatives" (Quiet time).As the typical signal intensity varies with latitude, to gain meaningful results, the thresholds for "True positives" should likely be different for stations at different latitudes.On the other hand, for practical purposes, the threshold should reflect the level of disturbance requiring action from a space weather user.
For the lower latitude stations, the storm limit −50 nT can be argued to be a suitable event threshold.In this database consisting of simulations of storm periods, reaching below the 50 nT threshold should occur for a substantial portion of the time, and such disturbances are likely to drive currents that are of concern e.g. to the power system operators.
Figure 7 shows a geographic map of the HSS values for each of the stations available through the SuperMAG network using the 50 nT event threshold.The skill scores vary from very low (especially in the polar regions) to above 0.6, with best HSS values at the low and mid-latitudes.
Interestingly, there is a band of lower skill scores in the range 0.3-0.4 in the latitude band around 50-60 • latitude.This may be due to the fact that during storms, this latitude band is often underneath the ionospheric currents, which create very strong disturbances.If the model auroral oval is not able to accurately track the real one, motion of the equatorward edge can cause the station to be on one side of the boundary in the model and on the other side in the simulation.
Note that the skill scores with the 50 nT threshold are quite good in the auroral oval region around 60 • -70 • geographic latitude.This indicates a high number of hits, highlighting the fact that the model -even if not capturing the intensity of the perturbations -is often able to capture the event occurrence.
The bottom panel of Figure 7 shows the HSS results with a threshold of 200 nT more corresponding to the magnitude of the auroral electrojet currents during geomag-   netic activity (Klumpar, 1979;Akasofu et al., 1980;Waters et al., 2001).Obviously, the skill scores at lower latitudes are low, as the disturbances rarely reach such high values.
The auroral oval region shows a good skill score values with a median HSS value of 0.45 for the threshold value of 200 nT.
Figure 8 shows the HSS values for each individual station as function of magnetic latitude.The HSS 95% confidence interval for each station is calculated using bootstrap resampling (Efron, 1979).The confidence interval lengths are relatively small with an average of 0.018 for the threshold of 50 nT and 0.024 for the threshold of 200 nT.The plot clearly demonstrates the dip in HSS around the magnetic latitude of the equatorward edge of the auroral region (shaded gray) as discussed above.
A quantitative summary of the HSS values and their inter-quartile ranges is given in Table 2.The table demonstrates the conclusion discussed above that the performance of the model in the mid-latitudes, with a median HSS of 0.51, is better than that of highlatitudes, with a median HSS of 0.32.Furthermore, the higher-latitude stations have a higher inter-quartile range of 0.24 as compared to the mid-latitude error range of 0.11.

Discussion
In this paper, we have addressed the capability of the SWMF Geospace model to predict magnetic disturbances at individual stations.In general, the results are encouraging with positive HSS skill scores in the inter-quartile range of 0.36-0.53for most stations.
The magnetic field disturbance intensity has a tendency to be underpredicted in the simulation (Figure 6).The errors have a longer tail in the direction of underprediction (Figure 4).This indicates that for a sufficiently low threshold for event occurrence is required for the model not to produce an overly large number of misses.
Lower-latitude stations (−50...50 • magnetic latitude) generally have a higher skill score than their higher latitude counterparts.This is expected, as the highly variable ionospheric currents arise from disruptions of the tail current associated with substorms and/or the field-aligned currents associated with Earthward propagating flow bursts, both of which are difficult to model to high accuracy in time and location.Furthermore, the model is optimized for computing the Dst and SYM-H indices (Liemohn et al., 2018).
Further developing the ionospheric response and the coupled magnetotail processes offers an opportunity to improve the accuracy of the higher-latitude responses.In this study, we coupled the BATS-R-US global magnetosphere with 0.125 R E maximum resolution to the RIM ionospheric module.Both increasing resolution of the MHD model (Welling et al., 2019) and improving the description of the conductances in the ionosphere module (Mukhopadhyay et al., 2020) can influence the model accuracy and performance as measured by skill scores.
Also, as shown by Ridley et al. (2010), MHD simulations of the magnetosphere come with inherent limitations.The RIM model is a potential solver that forces a zero potential at 10 • latitude.The conductances come from an empirical model, driven by fieldaligned currents from the magnetospheric module.The relationship between the conductances and the field-aligned currents is a key factor contributing to the intensity of the ground magnetic perturbations (Mukhopadhyay et al., 2020).
The magnetic perturbations in the mid-latitude and equatorial regions are due to magnetospheric currents, while the perturbations in the auroral regions are caused by currents flowing in the ionosphere, which create an order of magnitude larger ground magnetic perturbation than the magnetospheric (ring and tail) currents.As the auroral ionospheric currents have a sharp equatorward boundary, errors near that boundary may be large if the model is not able to predict the location of the auroral oval correctly.
Generally, the dayside currents are largely directly driven by the solar wind, while the nightside involves more processes arising from the magnetotail plasma sheet, which complicates the relationship between the solar wind and the magnetosphere -ionosphere coupling processes.
Using a similar SWMF configuration as in this study, and a 2-year interval of the operational model, Liemohn et al. (2018) obtained a Heidke skill score of 0.51 for the hourly Dst index and −50 nT event threshold.Figure 7 shows that, indeed, stations at the midlatitudes (typically used to derive Dst) have an HSS of 0.5 or higher.It is also important to note that the two-year interval is dominated by quiet time , while our data set is strongly biased towards storm times.As this leads to more balanced number of hits and true negatives, the skill scores are higher -even for the individual stations.
The observed Dst is calculated as a weighted average of several stations at midlatitudes, attempting to capture the currents flowing near the equatorial plane in the magnetosphere.On the other hand, the Dst index is derived from the SWMF results by Biot-Savart integration of the magnetic disturbance at the center of the Earth Yu and Rid-ley (2008).Thus, there might be small differences between the simulation Dst as it is calculated now, and that computed from the simulated station disturbances at the actual Dst stations.However, the good correspondence between the simulated and observed Dst shows that this difference in methodology is not a major source of error.

Summary and Conclusion
In this study, we performed a comprehensive study of over 100 simulations of geomagnetic storms and the resulting predictions of disturbances measured at over 300 ground magnetometer stations covering a wide range of latitudes and longitudes.Using event threshold values of 50 nT below 50 • latitude and 200 nT at high latitudes above 50 • , the SWMF reproduced ground magnetometer signals to accuracy (HSS > 0.6) that is useful for the space weather operators.This leads to the possibility of using the model operationally to predict hazardous levels of geomagnetic activity in a more localized sense, giving regional predictions for auroral and subauroral regions.
We have shown that the model has a good ability to predict ground magnetic perturbations in the mid-latitude (equatorial) range with a median HSS of around 0.51.We also show that, although the predictive power at higher latitudes near the auroral oval is lower (median HSS of 0.45), the individual stations still provide results that are meaningful for space weather forecasts.Finally, we note that the region near the equatorward edge of the auroral oval presents challenges for the simulation, which will require further work.

Figure 1
Figure1illustrates a sample storm in the data set.The major storm occurred on May 7-9, 2014, and had a peak SYM-H intensity below −100 nT.The storm was run using the Space Weather Modeling Framework's Geospace configuration with virtual ground magnetometers in the locations where real magnetometers are found around the globe.The panels show the Newell coupling function illustrating the solar wind driving intensity as well as a summary of the solar wind input to the model, the SYM-H index, the AL index, and the Cross-Polar Cap Potential (CPCP) estimated using a model driven by solar wind parameters and the Polar Cap Index (PCI)(Ridley & Kihn, 2004).As an example, the two bottom panels show observations from two stations, from Boulder, Colorado, in the mid-latitudes, and from Yellowknife, Northwest Territories, Canada, within the auroral region (note that the observations are shown in different scales).

Figure 1 .
Figure 1.A sample storm on May 7-9, 2014.From top to bottom: The Newell et al. (2007) function, the SYM-H index, the AL index, the CPCP using the Ridley and Kihn (2004) model, magnetic north components from Yellowknife, Canada and Boulder, CO.Observed values are shown in black, while the Geospace simulation results are shown in blue.The red lines show the error normalized to the mean of the observed values (error = |observed-model|/mean(observed)).
solar wind and the magnetosphere are modeled by BATS-R-US in ideal MHD mode with the adaptive grid resolution changing between 0.125 R E in the near-Earth region and 8R E in the distant tail.The simulation box in the Geocentric Solar Magnetospheric (GSM) coordinates covers the region from 32 R E to −224 R E in the X direction and ±128 R E in the Y and Z directions.The inner boundary is a spherical surface at radial distance R = 2.5 R E .A steady state is found for each solar wind initial conditions while the simulation grid resolution varies spatially, with the resolution increasing in areas that are highly dynamic like the magnetosphere.The RIM model solves the Poisson equation for the electrostatic potential at a twodimensional ionospheric surface(Ridley et al., 2006).BATS-R-US feeds the RIM the fieldaligned currents from the simulation inner boundary, and the ionospheric conductances are derived using the field-aligned current intensity and location combined with background dayside and night-side conductances.The potential is set to zero at the lower latitude boundary at 10 • .The RIM then solves theVasyliunas (1970) equation for the electric potential, and gives the velocity boundary condition by using the electric field produced to derive velocity using the MHD equations.The ionosphere and magnetosphere models are coupled at a cadence of 5 seconds of simulated time.

Figure 3
Figure3shows a 2-dimensional historgram for the entire data set similar in format to those shown inCamporeale et al. (2020) which compare dB/dt.Each column is normalized by its maximum value for easier comparability.The left panels show two individual magnetomter stations in Yellowknife, Canada (YKC) and Boulder, Colorado (BOU) as representatives for high-latitude (50 • -90 • magnetic latitude) and mid-latitude (-50 • -50 • magnetic latitude) regions.We note again that this latitude binning is different from previous work such as A.Pulkkinen et al. (2013), as we consider all available stations in the SuperMAG database(Gjerloev, 2012) that cover all latitudes.The panels confirm previous results that indicate that SWMF typically underestimates the the observed magnetic disturbances, especially at high latitudes.The errors were calculated as a simple difference which produces positive error values when the simulation shows lower activity (less negative values) than the observations (i.e., underestimates the disturbance intensity), and negative error values when the simulation overestimates the (negative) disturbance intensity.The left panels of Figure4show the distribution of errors in the magnetic field components averaged over all simulated storms for two representative stations, Yellowknife (YKC) at the auroral latitudes (magnetic latitude of 69 • ) and Boulder (BOU) in the midlatitudes (magnetic latitude of 48 • ).The gray-shaded region in the figures indicates the inter-quartile range of the horizontal component errors.(Note that the two stations are shown in different disturbance magnitude scale in the horizontal axis).These two stations were chosen as they have the largest datasets of magnetic perturbations for the time periods covering the storms.The error statistics has roughly a normal distribution.The Fisher-Pearson skewness coefficient values also shown in the figures show that the errors mostly fall in the category of nearly symmetric (|g 1 | < 0.5) or moderately skewed (0.5 < |g 1 | < 1) distributions.However, there is a systematic tendency for the mean of the negative values to be larger (in absolute values) than the mean of the positive values, indicating that there is a longer tail in the negative error distribution or in the degree of underestimation of the disturbance magnitude.

Figure 2 .
Figure 2. Superposed Epoch analysis of the simulated 122 storms.The panels show (top) simulated SYM-H, (middle) observed SYM-H, (bottom) error in SYM-H as the difference between model and observed values.The blue curves show the mean and the black curves show the inter-quartile range while the dashed black curve is the median.This is using the start of the Dst decrease as zero epoch.

Figure 3 .
Figure 3.A 2d histogram of the magnetometer station horizontal magnetic field values.The histograms are normalized by each column.The left panels show the magnetometer stations in Yellowknife, Canada (top) and Boulder, Colorado (bottom).Right panels show an aggregated version with all high-latitudes (50 • -90 • , top), and mid-latitudes (-50 • -50 • , bottom).Red line shows the line of perfect match.

Figure 4 .
Figure 4. Left panels: Error distributions for magnetometer station in Yellowknife, Canada (top) and Boulder, Colorado (bottom) Right panels: Error distributions for stations in the highlatitudes (50-90 • magnetic latitude, top), and mid-latitudes (-50-50 • magnetic latitude, bottom).The grey shading shows the inter-quartile range of the horizontal errors.The magnetic field components shown are: North (blue), East (orange) and Horizontal (Green).The Fisher-Pearson skewness values (g1) for each component are given in the legend.

Figure 5 .
Figure 5. Geographical map of station median horizontal magnetic field component errors.The color scale shows values under-predicted by the model in blue colors, and values overpredicted by the model in red colors.

Figure 5
Figure5gives a geographical map of the median error of the horizontal component.Consistent with the overall average, the errors are smaller for the lower-latitude stations and larger for the higher-latitude stations.The errors are largest at stations poleward of the auroral oval, within the region typically in the open field-line polar cap region.The origin of the larger errors found in the stations on the coast of Greenland is uncertain.

Figure 6
Figure 6 shows the median and inter-quartile ranges of the errors for all available magnetometer stations.The four plots show the North and Horizontal components, and the two latitude bands above and below 50 • separately.The top panels showing the northern high-latitude stations also indicate the typical auroral region with gray shading for reference.The left panels showing the horizontal component results show that the median errors are quite small, especially in the mid-latitudes (note that the op and bottom figures have different vertical scales).The errors grow in the auroral region while having a consistent positive offset towards the polar regions above.Interestingly, the median errors

Figure 6 .
Figure 6.From top to bottom, horizontal component errors in high-latitudes (50 • -90 • magnetic latitude) and mid-latitude (-50 • -50 • magnetic latitude) of individual magnetometer stations.The bars show the inter-quartile ranges of the errors in the stations found on those latitudes.The top is the third-quartile, the cross is the second-quartile (median), and the bottom is the first-quartile.In the top panels, the grey-shaded regions show the typical auroral zones.

Figure 7 .
Figure 7. Station Heidke Skill Scores with a threshold of 50 nT (top and middle) and 200 nT (bottom).The top, middle, and bottom plots are in projections equal-area, equirectangular, and conic.The middle plot is in a qualitative color scheme to accentuate the change in HSS at auroral latitudes.

Figure 8 .
Figure 8. Heidke Skill Score plot for individual magnetometer stations by magnetic latitude.The black lines show the median HSS for the station in that magnetic latitude, the blue lines show the lower and upper bounds of the 95% confidence interval of the HSS for that individual station.The shaded region shows the auroral oval region.
Haiducek et al. (2017) studied statistics of individual storms using the global indices during the month of January 2005.As these are storms not included in this study, they provide an independent comparison.They found an error probability density for SYM-H similar to the errors for mid-latitude stations shown in the bottom right panel of Figure4.Furthermore, they also assert that increasing the simulation resolution does not necessarily improve the accuracy of SYM-H predictions.Camporeale et al. (2020) examined statistics of individual stations, specifically regarding dB H /dt over a 2-year interval, focusing on three stations FRN, OTT, IQA at low, (sub-)auroral, and high latitudes.They showed that SWMF operational Geospace configuration predicts the changes in perturbation better at mid-latitudes than at highlatitudes, consistent with results in this study.Furthermore, they proposed a machine learning algorithm combined with SWMF, which was shown to increase skill scores significantly.
).The Federal Emergency Management Agency National Threat and Hazard Identification and Risk Assessment report (Federal Emergency Management Agency, (Congress, 2020) space weather-associated power outage as one of two natural hazards (besides a pandemic) that can have nation-wide impacts.Furthermore, the Promoting Research and Observations of Space Weather to Improve the Forecasting of Tomorrow Act, or the PROSWIFT Act,(Congress, 2020)recognizes addressing the GIC risk to power systems as critical for the nation's safety and security.The NOAA Space Weather Prediction Center (SWPC) in Boulder, CO, produces ground magnetic perturbation maps based on the University of Michigan Space Weather Modeling Framework (SWMF) Geospace model for the ground infrastructure operators (Space It is clear that the superposed epoch curves follow each other very well, and the superposed epoch error is quite small, while showing a trend toward negative errors (model values recovering faster than the observed ones).Although not shown here, this negative trend is largely caused by large storms with maxima below −100 nT.It is also notable that the scatter of the errors is larger during the storm main phase and reduces toward the storm recovery.This is indicative of the highly varying SYM-H values that can lead to large errors e.g. by a small time shift between the model and the simulation.

Table 1 .
2 × 2 contingency table describing the HSS."Above" and "Below" indicate field values above and below the threshold.

Table 2 .
A table with a quantitative summary of the Heidke Skill Scores of the stations in certain regions of magnetic latitude.Given are the threshold values used to calculate the HSS, the median HSS of that region, the 25-th percentile (p(25)) and 75-th percentile (p(75)) which show the inter-quartile range.