#### Data

We assembled 38 time series of carabid catches with associated weather data from the Czech Republic, the Netherlands, the UK and the USA. All data sets comprised at least 15 consecutive pitfall samples at a single location. Most data sets (28) were from arable crop systems, one was from a field edge, four were from perennial grassland and five were from apple orchards (details in Tables S1 and S2 in Supporting Information). The data sets were collected from 1974 to 2010. Data originating from the same area and year but from different types of vegetation were analysed separately because differences in vegetation structure affect microclimate and trap catch (Crist & Ahern 1999; Hatten *et al*. 2007). Data sets were standardized by calculating the rate of catch as numbers caught per trap per day. As the analysis entails taking logarithms, we added 1 to all data points to account for zeros.
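The standardization described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name and the sample numbers are hypothetical, and it assumes that 1 is added to the catch rate (rather than to the raw count) before taking the natural logarithm.

```python
import math

def log_catch_rate(count, n_traps, n_days):
    """Return log(rate + 1), where rate is the catch per trap per day.

    Assumption of this sketch: the +1 is applied to the standardized rate.
    """
    if n_traps <= 0 or n_days <= 0:
        raise ValueError("n_traps and n_days must be positive")
    rate = count / (n_traps * n_days)
    return math.log(rate + 1.0)

# Hypothetical sample: 42 beetles caught in 6 traps over 7 days (rate = 1).
print(log_catch_rate(42, 6, 7))
# A sample with zero catch stays defined because of the +1.
print(log_catch_rate(0, 6, 7))
```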

Pitfall traps are usually placed and emptied in the morning. Accordingly, the mean minimum and maximum temperatures experienced during a sample interval were calculated from the first day of the sampling period until (and including) the day before emptying (Hemerik & Brussaard 2002).
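As an illustration of this averaging window, the sketch below computes the mean minimum and maximum temperature from the day of placement up to and including the day before emptying. The data layout (a dictionary keyed by date) and all names and values are assumptions made for the example.

```python
from datetime import date, timedelta

def interval_mean_temps(daily, placed, emptied):
    """Mean daily minimum and maximum temperature for one sample interval.

    daily: dict mapping date -> (tmin, tmax) in degrees C (hypothetical layout).
    The window runs from the placement day through the day before emptying.
    """
    n_days = (emptied - placed).days
    if n_days <= 0:
        raise ValueError("emptying date must be after placement date")
    days = [placed + timedelta(days=k) for k in range(n_days)]
    mean_min = sum(daily[d][0] for d in days) / n_days
    mean_max = sum(daily[d][1] for d in days) / n_days
    return mean_min, mean_max

# Hypothetical weather records; the emptying day itself is not used.
weather = {
    date(2005, 6, 1): (10.0, 20.0),
    date(2005, 6, 2): (12.0, 22.0),
    date(2005, 6, 3): (14.0, 24.0),
    date(2005, 6, 4): (30.0, 40.0),
}
print(interval_mean_temps(weather, date(2005, 6, 1), date(2005, 6, 4)))
```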

#### Model for the relationship between temperature and catch rate

As a basis for our analysis, we postulate that an absolute change in temperature will result in a relative change in daily catch and that this relative change in daily catch per unit of temperature is constant over an ecologically relevant range of temperature (Williams 1940). Mathematically:

$$\frac{1}{n}\,\frac{\mathrm{d}n}{\mathrm{d}T} = r \qquad \text{(eqn 1)}$$

where *T* is temperature in °C, *n* is the daily catch, that is, the number of individuals caught daily at temperature *T*, and the estimated parameter *r* is the relative rate of change in catch per °C. As an example, if *r* = 0·04, an increase of 1 °C multiplies the catch by exp(*r*) = 1·0408, that is, an increase of 4·08%.

The relative character of the parameter *r* with respect to the measurement of catch is critical because details of pitfall-trapping method vary by study (i.e. they differ in size, material, liquid in the pitfall, cover, et cetera; see Table S1). If the effect of temperature was expressed as an absolute change in the catch, effects of the pitfall design would enter into the estimate of the parameter *r* and make the result less generic. Moreover, the use of a relative change in the catch implies an exponential relationship, which is characteristic of temperature-dependent rates in biological systems (Williams 1940; Logan *et al*. 1976).

The solution to equation (1) is an exponential relationship between the catch and temperature during any two sampling periods, with a multiplication factor of exp(*r*) per °C:

$$n_2 = n_1 \exp\bigl(r\,(T_2 - T_1)\bigr) \qquad \text{(eqn 2)}$$

where *n*_{1} and *n*_{2} are catch samples from the same data series at any two times 1 and 2, and *T*_{1} and *T*_{2} are the average temperatures during the catch intervals for both catches. Formula (2) can also be expressed as

$$\log(n_2) = \log(n_1) + r\,(T_2 - T_1) \qquad \text{(eqn 3)}$$

where log denotes natural logarithm. Thus, *r* can be estimated from the data, using the relationship:

$$\log(n_2) - \log(n_1) = r\,(T_2 - T_1) \qquad \text{(eqn 4)}$$

that is, by regressing the difference in natural logarithm of two catches on the temperature difference between two subsequent catch periods, which estimates how an increase or decrease in log(catch) between two dates is related to the difference in temperature. Both minimum and maximum daily temperatures were tested as a predictor of catch, considering that the catch may contain both diurnal and nocturnal species. Maximum temperature data were not available for data set #33; therefore, this data set was analysed for minimum temperature only. Wherever we discuss the relationship between catch rate and temperature in the remainder of this study, this was effectively studied by regressing the difference in the log of the catch rate (+1) on the difference in temperature.
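For readers who want the intermediate step, the relationship used in this regression follows from separating variables in the differential equation and integrating between the two sampling periods. This is a standard derivation, written here with the symbols defined above:

$$\frac{1}{n}\,\frac{\mathrm{d}n}{\mathrm{d}T} = r \;\Rightarrow\; \int_{n_1}^{n_2} \frac{\mathrm{d}n}{n} = r \int_{T_1}^{T_2} \mathrm{d}T \;\Rightarrow\; \log(n_2) - \log(n_1) = r\,(T_2 - T_1) \;\Rightarrow\; n_2 = n_1 \exp\bigl(r\,(T_2 - T_1)\bigr)$$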

#### Estimation of the effect of temperature in individual data sets by regression with differences

Time series are prone to showing autocorrelations that may be corrected by detrending. The need to detrend the time series was shown by conducting an autoregression analysis on the catch and temperature data (Table S3). Autoregression coefficients, *ar*_{k}, were calculated using the **ar** function in the programming language R, version 2.8.0 (R Development Core Team 2010), where, for example, *ar*_{1} indicates a linear trend, *ar*_{2} a quadratic trend, etc.

Equation 4 was fitted to the data by taking first-order differences of the log of the catch and of the temperature records through time and regressing one on the other (Cormac & Ord 1979). A difference in catch rate between two periods is therefore compared with the difference in temperature between the same two periods. In the process of taking differences, the effect of seasonal trends in temperature and catch is removed, avoiding the risk of spurious correlation when unrelated time series are regressed against one another (Cormac & Ord 1979). We also tested two other methods for estimating the local (i.e. one point in time) response of catch rate to temperature. These are called ‘two-point piece-wise detrending’ and ‘four-point piece-wise detrending’, based on the number of time points that are considered in addition to the focal time point (see Supporting Information: Appendices S1 and S2). The key difference between the methods is the width of the time interval over which reference data are used to estimate the temperature response at a given point in time: two or four time points. Appendix S1 gives the theory and Appendix S2 shows an example data analysis. As the three methods of parameter estimation yielded similar results, we focus on results from first-order differencing, a well-established statistical method (Cormac & Ord 1979; Shumway & Stoffer 2006).
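The first-order differencing approach can be sketched in a few lines. This is a minimal illustration rather than the analysis code used in the study: the function name and the synthetic series are hypothetical, and the regression is forced through the origin, which is an assumption of this sketch based on the form of equation 4 (a fit with an intercept would be an equally plausible reading).

```python
import math

def estimate_rate(catch_rates, temps):
    """Estimate r by regressing differences in log(catch rate + 1) on
    differences in temperature between consecutive samples.

    Assumption of this sketch: least squares through the origin.
    Returns (r, estimated variance of r).
    """
    if len(catch_rates) != len(temps) or len(temps) < 3:
        raise ValueError("need two series of equal length with at least 3 samples")
    logs = [math.log(c + 1.0) for c in catch_rates]
    d_log = [b - a for a, b in zip(logs, logs[1:])]
    d_temp = [b - a for a, b in zip(temps, temps[1:])]
    sxx = sum(x * x for x in d_temp)
    if sxx == 0.0:
        raise ValueError("temperature does not vary between samples")
    r = sum(x * y for x, y in zip(d_temp, d_log)) / sxx
    residual_ss = sum((y - r * x) ** 2 for x, y in zip(d_temp, d_log))
    dof = len(d_temp) - 1  # one parameter estimated
    return r, residual_ss / dof / sxx

# Synthetic series built so that log(catch + 1) rises by 0.04 per degree C.
temps = [12.0, 15.0, 11.0, 18.0, 20.0, 16.0]
catch = [3.0 * math.exp(0.04 * t) - 1.0 for t in temps]
r_hat, var_r = estimate_rate(catch, temps)
print(round(r_hat, 4), var_r)
```

Because the synthetic data follow the exponential model exactly, the estimate recovers the rate used to generate them and the residual variance is essentially zero. Real catch series would give a non-zero variance, which is the quantity that enters the meta-analysis as the within-study variance.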

#### Synthesizing regression results in individual data sets to an overarching relationship, using meta-analysis

Following the estimation of the slope of the relationship between Δlog(catch) and Δtemperature in 38 data sets, the overall effect of temperature was assessed by combining, in a meta-analysis, the 37 estimated rate coefficients for maximum temperature and the 38 estimated rate coefficients for minimum temperature. In meta-analysis, a weighted mean rate is calculated taking into account the variability of the rate estimates in each study. In the first step, it is assumed that all studies are essentially estimating the same rate, and variability among the studies (between-study variance) is assumed to be due to sampling error only. This is the fixed-effects model (Rosenberg *et al*. 2004; Madden & Paul 2011). In contrast, the random-effects model accounts for the possibility that different studies estimate different rates, due to uncontrolled differences in the study designs, for example, the vegetation, the type or size of the trap, duration of sampling interval, the collection fluid, etc.

In the fixed-effects model, the weight for each study is inversely proportional to the variance of the rate estimate:

$$w_i = \frac{1}{v_i} \qquad \text{(eqn 5)}$$

where *v*_{i} is the variance of the estimated rate in study *i*. In the random-effects model, the weights are calculated as:

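$$w_i = \frac{1}{v_i + \sigma^2_{\text{pooled}}} \qquad \text{(eqn 6)}$$

with the pooled (between-study) variance

$$\sigma^2_{\text{pooled}} = \frac{Q_T - (n - 1)}{\sum_i w_i - \sum_i w_i^2 \big/ \sum_i w_i} \qquad \text{(eqn 7)}$$

in which the weights *w*_{i} on the right-hand side are the fixed-effects weights of eqn 5, *n* is the number of studies, and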

where *Q*_{T} is the total heterogeneity determined from a fixed-effects model meta-analysis (Rosenberg *et al*. 2004; see below). Although the weights are defined differently for the fixed- and random-effects model, the average rate is calculated for both with the same formula:

$$\bar{r} = \frac{\sum_i w_i\, r_i}{\sum_i w_i} \qquad \text{(eqn 8)}$$

where *r*_{i} is the rate estimate in study *i*. If the pooled variance in eqn. 7 is very large compared to the variance of single study estimates (i.e. large heterogeneity), then all studies have approximately the same weight, and meta-analysis yields the simple arithmetic average as the overall rate estimate. If the pooled variance is small, the studies are weighted according to the precision (as measured by the inverse of the variance) of the estimate of each *r*_{i}. The average rate has variance (= squared standard error):

$$\operatorname{var}(\bar{r}) = \frac{1}{\sum_i w_i} \qquad \text{(eqn 9)}$$

Significance of this average rate (as compared to a value of 0 under the null hypothesis of no relationship between temperature and the catch) is determined by constructing a confidence interval based on the t-distribution, and determining whether zero is included.
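The weighted average and its variance can be illustrated with a short sketch. The function name, the `between_var` argument and the numbers are hypothetical; setting `between_var` to zero gives the fixed-effects weights, and a positive value gives random-effects weights. The confidence interval itself is not computed here because it needs a quantile of the t-distribution.

```python
import math

def pooled_rate(rates, variances, between_var=0.0):
    """Inverse-variance weighted mean rate and its variance.

    between_var = 0 corresponds to the fixed-effects model; a positive value
    is added to every within-study variance (random-effects weights).
    """
    if len(rates) != len(variances) or not rates:
        raise ValueError("rates and variances must be non-empty and equal in length")
    weights = [1.0 / (v + between_var) for v in variances]
    total = sum(weights)
    mean = sum(w * r for w, r in zip(weights, rates)) / total
    return mean, 1.0 / total

# Hypothetical rate estimates and their variances from two studies.
mean_rate, var_mean = pooled_rate([0.02, 0.05], [0.0001, 0.0004])
print(mean_rate, math.sqrt(var_mean))
```

In this example the more precise study receives four times the weight of the other one, so the pooled rate lies closer to 0·02 than to 0·05.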

The need for using the random-effects model is assessed by calculating a measure of heterogeneity between studies in the fixed-effects model:

$$Q_T = \sum_i w_i\,(r_i - \bar{r})^2 \qquad \text{(eqn 10)}$$

*Q*_{T} is tested against a χ^{2} distribution with *n*-1 degrees of freedom, where *n* is the number of studies (Madden & Paul 2011). If there is significant heterogeneity, the random-effects model is supported, and estimates from the fixed-effects model are not statistically valid.
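A minimal sketch of the heterogeneity calculation is given below. The function names and input values are hypothetical. The between-study variance is computed with the commonly used moment estimator, truncated at zero; that this matches the calculation in MetaWin is an assumption of the sketch. The comparison of *Q*_{T} with the χ^{2} distribution is left out because it needs a statistical library.

```python
def total_heterogeneity(rates, variances):
    """Q_T: weighted sum of squared deviations from the fixed-effects mean."""
    weights = [1.0 / v for v in variances]
    mean = sum(w * r for w, r in zip(weights, rates)) / sum(weights)
    return sum(w * (r - mean) ** 2 for w, r in zip(weights, rates))

def between_study_variance(rates, variances):
    """Moment estimate of the between-study variance (assumed estimator)."""
    weights = [1.0 / v for v in variances]
    sum_w = sum(weights)
    sum_w2 = sum(w * w for w in weights)
    q_t = total_heterogeneity(rates, variances)
    estimate = (q_t - (len(rates) - 1)) / (sum_w - sum_w2 / sum_w)
    return max(0.0, estimate)  # negative estimates are set to zero

# Hypothetical rate estimates from two equally precise studies.
rates = [0.02, 0.06]
variances = [0.0001, 0.0001]
print(total_heterogeneity(rates, variances))
print(between_study_variance(rates, variances))
```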

The meta-analysis was performed in MetaWin 2.0 (Rosenberg, Adams & Gurevitch 2000).

#### Correction of time series for temperature bias

After the size of the temperature effect is estimated from the data, this effect may be corrected for to obtain a standardized catch rate, with all temperature influence removed. The correction can be done using Equation 2, taking *n*_{1} as the corrected catch at reference temperature *T*_{1}, while the observed catch is *n*_{2} and the observed temperature *T*_{2}. A whole data series can be corrected in this way, where *n*_{2} and *T*_{2} vary according to the chosen time point in the data series, while *T*_{1} is a constant reference temperature. As a result, *n*_{1} is a time series corrected for temperature bias. In this study, we used either the average maximum temperature during an experiment or a constant temperature of 20 °C as reference temperature *T*_{1}.
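Rearranging equation 2 for the corrected catch gives *n*_{1} = *n*_{2} exp(*r* (*T*_{1} − *T*_{2})), which the sketch below applies to a whole series. The function name and the data are hypothetical, and the rate of 0·04 is only the illustrative value used earlier in the text, not an estimate.

```python
import math

def correct_series(catches, temps, rate, reference_temp=20.0):
    """Rescale each observed catch to the catch expected at reference_temp.

    Uses n1 = n2 * exp(rate * (T1 - T2)), with T1 the reference temperature
    and (n2, T2) the observed catch and temperature.
    """
    if len(catches) != len(temps):
        raise ValueError("catches and temps must have the same length")
    return [n * math.exp(rate * (reference_temp - t))
            for n, t in zip(catches, temps)]

# Hypothetical catch rates and mean maximum temperatures (degrees C).
observed = [10.0, 10.0, 10.0]
temperatures = [15.0, 20.0, 25.0]
corrected = correct_series(observed, temperatures, rate=0.04)
print([round(c, 2) for c in corrected])
```

With a positive rate, catches taken in cooler periods are scaled up and catches taken in warmer periods are scaled down, while a catch taken at the reference temperature is unchanged.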

A salient question is whether the rate estimate for bias correction can be taken from the meta-analysis in the current study (see 'Results') or should be estimated from an analysis of the relationship between log(catch) and temperature within the time series that is under consideration. The rate estimate from our study (see 'Results') would be preferable if it has lower uncertainty than the rate estimate from a new study. In the case of the random-effects model, the standard error of the rate estimate for a new study *r*_{new} (i.e. prediction error) comprises two components: the variance of the average rate estimate obtained in this study (eqn. 9) and the between-study variance (eqn. 7). These are combined as:

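$$\mathrm{SE}(r_{\text{new}}) = \sqrt{\operatorname{var}(\bar{r}) + \sigma^2_{\text{pooled}}} \qquad \text{(eqn 11)}$$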

This prediction error can be directly compared to the standard error of a single estimate *r*_{i} in a study, and to the overall mean error of individual rate estimates (= square root of the mean within-study variance). We make these comparisons to assess whether it is advisable to use the average rate from the meta-analysis for bias correction in future work.
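The comparison described here can be sketched as follows. All names and numbers are hypothetical; the sketch only combines the two variance components into a prediction error and sets it beside the mean error of the individual estimates.

```python
import math

def prediction_error(var_mean, between_var):
    """Standard error of the rate expected in a new study (random effects)."""
    return math.sqrt(var_mean + between_var)

def mean_individual_error(within_variances):
    """Square root of the mean within-study variance."""
    return math.sqrt(sum(within_variances) / len(within_variances))

# Hypothetical variance components and within-study variances.
pred_se = prediction_error(var_mean=0.0009, between_var=0.0016)
indiv_se = mean_individual_error([0.0036, 0.0064])
print(pred_se, indiv_se, pred_se < indiv_se)
```

In this made-up case the prediction error is smaller than the mean error of the individual estimates, which is the situation in which the pooled rate would be the better choice for bias correction.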

#### Species-specific responses

To determine whether we could identify temperature effects in catches of single species, we conducted the analysis for catch series of ‘dominant’ species, that is, species that constituted more than 5% of the total catch in a data set (Table S4). Each species was classified according to its diel activity, if known (e.g. Thiele 1977; Luff 1978; Kegel 1990). A total of 165 data sets (maximum temperature) and 168 data sets (minimum temperature), representing 37 species, were analysed.

#### Multiple climatic factors

Multivariable effects were investigated using multiple linear regression in 19 data sets (#1–#17, #37, #38). Before analysis, we first excluded variables showing strong collinearity. For instance, daily heat sum is strongly correlated with irradiation (Crawley 2005). Five weather variables showing minimal collinearity were selected: maximum temperature, daily precipitation, air pressure, air humidity and wind speed. These variables were calculated first as a daily value, and then averaged over the sampling interval. First-order differences were taken before analysis. We started out by fitting all variables in a full regression model without interactions and then reduced the model by stepwise removal of non-significant variables on the basis of F-tests, until a parsimonious model with only significant terms was obtained (Crawley 2005).