## 1 Introduction

In this second paper of a series of three, we study the relationship between geomagnetic interdiurnal range indices and sunspot numbers in detail. We investigate the degree to which sunspot data are consistent with the geomagnetic data and aim to define sunspot data sequences to use as inputs to the open solar flux model prediction in Paper 3 [*Lockwood and Owens*, 2014].

As discussed in Paper 1 [*Lockwood et al*., 2014c], there are two main long-term indices in common use to quantify sunspot activity: the sunspot number and the group sunspot number data sequences, which extend back to 1749 and 1610, respectively, in monthly means. We here start from the sunspot number *R* sequence published by the World Data Centre for the Sunspot Index and Long-term Solar Observations (WDC-SILSO) of the Royal Observatory of Belgium, Brussels, which is a composite of the international, Zürich, and Wolf sunspot numbers. The international number has been compiled from data generated by a number of observers (currently numbering 86 in 29 different countries) from 1 January 1981 onward. This uses the same algorithm as, and is a continuation of, the daily Zürich sunspot numbers, *R*_{Z}, which were based upon observations made at Zürich and its two branch stations in Arosa and Locarno. Only one observer was used to make each daily *R*_{Z} estimate (the highest ranked observer available on that day in a hierarchy ordered by perceived reliability). The “Waldmeier discontinuity” discussed in Paper 1 relates to a putative change in observing and processing practices for compiling *R*_{Z} around 1945.

As discussed by *Clette et al*. [2007], the basic sunspot number formulation was first used in 1849 by Rudolf Wolf, Director of the Zürich Observatory. He was succeeded by his former assistant Alfred Wolfer, who introduced an important change in 1882. In generating his sunspot number, *R*_{W}, Wolf did not count the smallest spots when quantifying the number of individual spots, *N*_{S}, in order to try to maintain compatibility with earlier observers. However, this requires a subjective choice of which spots to include. Wolfer abandoned this practice and his new procedure was calibrated against Wolf's sunspot numbers over a 16 year interval (1877–1892). This yielded a seemingly constant scaling factor of 0.6 that is still used today to scale the sunspot observations to the pre-1882 Wolf sunspot number, *R*_{W}. In addition, Wolf had become so convinced of a linear correlation between his geomagnetic and sunspot data that he recalibrated his sunspot numbers for before 1849 upward by about 20%. An additional problem is that the data before 1849 are sparse, leading to greater uncertainties [*Usoskin*, 2013]. Another inhomogeneity in the data series arises in 1818: before then only monthly values have been compiled, whereas afterward the basic data available are daily. Annual means of *R*_{W} are available from 1700 [*Chernosky and Hagan*, 1958], but they are not generally regarded as reliable because the data are so sparse and highly interpolated, and consequently these are rarely used [*Usoskin*, 2013].

*Leussu et al*. [2013] have recently used a homogeneous series of recently digitized sunspot drawings by Schwabe [*Arlt et al*., 2013] to study the calibration of the Wolf sunspot numbers, *R*_{W}. They find a systematic error in the scaling of *R*_{W} before 1849, calling for a 20% reduction in values of *R*_{W} for all years before 1849. This almost certainly is reversing Wolf's 20% upward recalibration of sunspot numbers to agree with geomagnetic activity data, a practice that is now generally thought to be both unreliable and undesirable [*Mursula et al*., 2009; *Lockwood et al*., 2014c]. Hence, we will refer to the correction called for by the work of Leussu et al. as the “Wolf discontinuity.”

The group sunspot number series was introduced by *Hoyt and Schatten* [1994, 1998] and is extremely valuable because it extends back to before the Maunder minimum in sunspot activity. However, it has to be compiled from observations with sparse availability for the early years (*N* in equation (2) of Paper 1 is small and some observers may not have been active throughout any 1 year). *Usoskin et al*. [2003a] studied the effect of this and made some minor corrections, and other corrections have been made, based on additional historic observations that have come to light [*Vaquero et al*., 2011; *Vaquero and Trigo*, 2014]. The Royal Greenwich Observatory (RGO)/Solar Observing Optical Network (SOON) sunspot group data discussed in Paper 1 cover 1874 to the present and form the backbone of the *R*_{G} data sequence, and the correlation demonstrated in Paper 1 is used to ensure that it is very similar to the international sunspot number since 1874. However, it is well known that international/Zürich/Wolf composite and group sunspot numbers diverge as one goes back in time before this date, as indeed was noted by *Hoyt and Schatten* [1994, 1998] when they first derived the *R*_{G} sequence [e.g., *Hathaway et al*., 2002].

Paper 1 quantified the magnitude of the Waldmeier discontinuity to be 11.6%. In addition, *Leussu et al*. [2013] have quantified the Wolf discontinuity to be 20%. We here implement both these corrections before establishing the relationship between the geomagnetic variation and sunspot number and how it varies over the solar cycle. To ensure that there is no confusion, we here call the corrected international/Zürich/Wolf sunspot number composite *R*_{C}, where *R*_{C} = *R* for 1946 and after, *R*_{C} = 1.116*R* for 1849–1945, and *R*_{C} = (1.116 × 0.8)*R* for 1848 and before, and where *R* is the sunspot number composite as published by WDC-SILSO. In the present paper we also suggest a simple extension of the resulting *R*_{C} back to 1610 using 1.3*R*_{G}. (Note that this is not, in any sense, a recalibration of *R*_{G}; rather, it is a means of extending the *R*_{C} data series back to the Maunder minimum that is greatly preferable to using the sparse, highly interpolated early annual means of the Wolf number.)
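The piecewise definition of *R*_{C} given above can be written out as a short sketch. This is an illustration only, not code from this series of papers: the function names are hypothetical, the year boundaries and factors are those quoted in the text, and the choice of where to switch from the corrected composite to the 1.3*R*_{G} extension is left to the caller because the text does not fix it here.

```python
def corrected_sunspot_number(year: int, r: float) -> float:
    """Return R_C for a calendar year, given the WDC-SILSO composite value R.

    Hypothetical helper: the factors are the ones quoted in the text.
    """
    waldmeier = 1.116  # 11.6% Waldmeier discontinuity (Paper 1)
    wolf = 0.8         # 20% reduction before 1849 (Leussu et al., 2013)
    if year >= 1946:
        return r                      # 1946 and after: unchanged
    if year >= 1849:
        return waldmeier * r          # 1849-1945
    return waldmeier * wolf * r       # 1848 and before


def extension_from_group_number(r_g: float) -> float:
    """Suggested extension of R_C from the group sunspot number R_G."""
    return 1.3 * r_g


# Example with made-up input values (not real sunspot data).
for yr in (1800, 1900, 2000):
    print(yr, corrected_sunspot_number(yr, 100.0))
```

Note that the combined factor for 1848 and before is 1.116 × 0.8 = 0.8928, so the two corrections partly offset each other in the earliest part of the composite.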

*Svalgaard and Cliver* [2005] introduced the *IDV* index based on the interdiurnal variability *u* index of *Bartels* [1932]. The major difference between *u* and *IDV* is that the latter is based on near-midnight values only, whereas the *u* index used whole-day means. *Svalgaard and Cliver* [2007] did not see this as a significant difference because they used Bartels' *u* index data to extend the *IDV* sequence back to 1835, which is just 3 years after Gauss' establishment of the first magnetometer station in Göttingen. The data used by *Svalgaard and Cliver* [2007] to extend *IDV* to before 1880 were compiled by Bartels, but it is not a homogeneously constructed index, being compiled differently after 1872 than before. Bartels notes that before 1872, no proper data to generate an interdiurnal index were available to him and so he used other correlated measures of the diurnal variation as proxies. Bartels himself stresses that his *u* values before 1872 are “more for illustration than for actual use” and describes data for 1835–1847 as “least reliable,” 1847–1872 as “better,” and 1872–1930 as “satisfactory.” Given that Bartels does not include his data before 1872 in his satisfactory classification, it is not just a semantic point that he regarded the data before 1872 as “unsatisfactory.” *Svalgaard and Cliver* [2007] carried out some tests to justify employing the *u* proxy data for before 1872, which enabled them to make a reconstruction of geomagnetic activity back to 1835. However, Figure 5 of Paper 1 [*Lockwood et al*., 2014c] shows that the early *IDV* data have different characteristics to the later data. Specifically, the early *IDV* index was much more similar to the sunspot number data than at later times, strongly suggesting that the independence of the two data sets had not been maintained and that at some point calibration of one using the other had taken place. This may partially have come about through Wolf's use of geomagnetic data to recalibrate *R*_{W}, but as shown in Paper 1, other geomagnetic data do not show the same error as *IDV*. This is a pitfall that we are careful to avoid in this paper.

*Lockwood et al*. [2013a] introduced the *IDV*(1d) index for two reasons: (1) they were concerned about the overstrong correlation between *u* (and hence early *IDV*) and sunspot numbers in the early years, and (2) they found that the dependence of interdiurnal variation data on interplanetary parameters depended on station latitude (at all latitudes, not just near the auroral oval); because *IDV* was compiled from an evolving mix of stations, its dependence on interplanetary parameters in the past would be different from that in the space age. These authors also returned to Bartels' original concept of using whole-day means, which has advantages in suppressing both instrumental and geophysical noise: indeed, the range of correlations for stations at different latitudes is found to be as great as that for one station at different UTs, and hence, 24 stations of *IDV* data are required to achieve the same noise suppression as one station giving *IDV*(1d). This has the advantage of allowing *IDV*(1d) to be compiled from just one magnetic observatory in the same geographical region, avoiding artifacts caused by the response to interplanetary variations varying with the location of the observatory. *Lockwood et al*. [2013a] were able to extend the *IDV*(1d) data sequence back to 1845 using data from the Helsinki observatory. *Svalgaard* [2014] pointed out a calibration error in the Helsinki data that applied to 7 years during cycle 11, and *Lockwood et al*. [2014a] have studied the optimum correction method and implemented it to allow for this. It is important to note that although *Svalgaard* [2014] used *R*_{G} to identify the calibration change in the Helsinki data, the correction implemented by *Lockwood et al*. [2014a] did not employ any form of sunspot number and so the full independence of the geomagnetic and sunspot number data sequences was maintained.
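The contrast between an interdiurnal measure built from whole-day means and one built from a single near-midnight value can be illustrated with a toy calculation. The sketch below is not the published *IDV* or *IDV*(1d) algorithm: the function names are hypothetical, the data are synthetic, and the hourly noise is assumed to be independent from hour to hour, which real magnetometer data will not satisfy exactly.

```python
import numpy as np


def interdiurnal_from_daily_means(hourly: np.ndarray) -> np.ndarray:
    """Absolute difference between successive whole-day means.

    `hourly` has shape (days, 24); the result has length days - 1.
    """
    daily = hourly.mean(axis=1)
    return np.abs(np.diff(daily))


def interdiurnal_from_single_hour(hourly: np.ndarray, hour: int) -> np.ndarray:
    """Absolute difference between values at one fixed hour on successive days."""
    return np.abs(np.diff(hourly[:, hour]))


# Synthetic data (assumption): a slowly wandering daily level plus
# independent hourly noise of larger amplitude.
rng = np.random.default_rng(0)
days = 2000
level = np.cumsum(rng.normal(0.0, 1.0, size=days))
noise = rng.normal(0.0, 5.0, size=(days, 24))
hourly = level[:, None] + noise

# Compare each estimate with the noise-free day-to-day change.
truth = np.abs(np.diff(level))
err_daily = np.mean(np.abs(interdiurnal_from_daily_means(hourly) - truth))
err_hour = np.mean(np.abs(interdiurnal_from_single_hour(hourly, 0) - truth))
print(f"mean error, whole-day means: {err_daily:.2f}")
print(f"mean error, single hour:     {err_hour:.2f}")
```

Under these assumptions, averaging 24 hourly values reduces the noise in each daily value by a factor of about √24, so the whole-day estimate tracks the underlying day-to-day change more closely than the single-hour estimate.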

From modeling using *R* to quantify emergence rate, *Wang et al*. [2005] found an approximate relationship between open solar flux and *R*^{n}, with an exponent of *n* near 0.5. However, these authors also noted that the correlation was high for photospheric flux but poor for open solar flux. *Svalgaard and Cliver* [2005] found that their *IDV* index correlated with *R*^{n}. They also noted that *IDV* correlates with the near-Earth interplanetary magnetic field (IMF) *B* with very little influence of the solar wind speed (see *Lockwood* [2013]) so *Svalgaard and Cliver* [2005] concluded that *B* also varies with *R*^{n}. As noted by *Lockwood et al*. [2014a], the peak correlation coefficient between the sunspot number to the power *n*, *R*^{n}, and the observed IMF is *r* = 0.84 for *n* = 0.4, with significance (computed against the AR-1 autoregressive “red-noise” model) of *S* = 96.3%. Hence, the simultaneous *R*^{n} “explains” about 70% of the overall observed IMF variation. Paper 1 shows the peak correlations for *IDV* and *IDV*(1d) are 0.85 and 0.83 for *n* = 0.54 and *n* = 0.69, respectively. Both are more statistically significant than the correlation with the IMF data (*S* > 99.9%) because of the larger number of data samples. We here investigate these correlations of *IDV* and *IDV*(1d) with sunspot numbers in more detail.
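The idea of a peak correlation with *R*^{n} can be illustrated by scanning the exponent *n* and recording the Pearson correlation coefficient at each value. The following is a minimal sketch on synthetic data, not the analysis used in these papers: the function name `best_exponent`, the exponent grid, and the made-up input series are all assumptions, and the AR-1 red-noise significance test mentioned above is not implemented.

```python
import numpy as np


def best_exponent(r, y, exponents):
    """Return (n, corr) maximizing the Pearson correlation of r**n with y."""
    r = np.asarray(r, dtype=float)
    y = np.asarray(y, dtype=float)
    corrs = np.array([np.corrcoef(r ** n, y)[0, 1] for n in exponents])
    i = int(np.argmax(corrs))
    return float(exponents[i]), float(corrs[i])


# Synthetic series (assumption): an "index" built from R**0.5 plus noise.
rng = np.random.default_rng(1)
r_syn = rng.uniform(1.0, 200.0, size=500)
y_syn = r_syn ** 0.5 + rng.normal(0.0, 0.5, size=500)

grid = np.round(np.arange(0.1, 1.51, 0.01), 2)
n_best, r_best = best_exponent(r_syn, y_syn, grid)
print(f"peak correlation r = {r_best:.3f} at n = {n_best:.2f}")

# Fraction of variance "explained" is r**2; for the r = 0.84 quoted in the
# text this gives 0.84**2 = 0.7056, i.e. about 70%.
explained_quoted = 0.84 ** 2
```

The variation of the correlation with *n* is typically rather flat near its maximum, so the exponent recovered by such a scan is less tightly constrained than the correlation coefficient itself.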