Philosophy of Climate Science Part I: Observing Climate Change

This is the first of three parts of an introduction to the philosophy of climate science. In this first part about observing climate change, the topics of definitions of climate and climate change, data sets and data models, detection of climate change, and attribution of climate change will be discussed.


Introduction
In his speech to the 2014 Climate Summit, Secretary-General of the United Nations Ban Ki-Moon called climate change the 'defining issue of our time'. This is the first part of a three-piece article in which we review epistemological and decision-theoretic issues arising in connection with climate change. In Part I, we discuss the empirical observation of climate change; in Part II, we investigate climate modelling; Part III examines what principles guide decision-making in climate policy.
In this first part, we start by reviewing different definitions of climate and climate change (Section 2). We then turn to the nature of climate data sets (Section 3) and discuss how these data sets are used to detect climate change (Section 4). This leads to discussion of attribution, the question of what the causes of climate change are (Section 5). We end with some brief conclusions (Section 6).

Definitions of Climate and Climate Change
Intuitively speaking, the weather is the state of the atmosphere at a certain point of time. The climate, by contrast, is the distribution of certain variables (called the climate variables) arising for a certain configuration (i.e. certain greenhouse gas concentrations and certain aerosol emissions) of the climate system. The climate variables include those that describe the state of the atmosphere and the ocean and sometimes also other variables such as those describing the state of glaciers and ice sheets (IPCC 2013). So the climate is not about the exact values of the surface air temperature, ocean temperature etc. at a certain point of time, but about the surface air temperature, ocean temperature etc. that one can expect when the climate system is in a certain configuration.
How can this intuitive idea be made more precise? It turns out that there are several competing ways to do this. In particular, there are two very different kinds of definitions of climate. Climate as distribution over time roughly corresponds to what one learns about climate in geography education at school. Here, climate is about the distribution of the climate variables that arises when the climate system evolves over a certain time period. Climate as ensemble distribution is popular with scientists who are concerned with predicting the climate. It is about the distribution quantifying how likely it is that certain values of the climate variables are expected at a certain point of time in the future given our current uncertainty about the climate variables.
Let us first consider the IPCC's definition: The problem with these definitions is that they are vague. 'Climate in a narrow sense' seems to refer to a distribution over time, but, as we will see, there are several definitions of climate as distributions over time. 'Climate in a wider sense' is even vaguer and seems to be compatible with any definition of climate (since any definition corresponds to a distribution and hence offers, in some sense, a statistical description). To see how climate can be defined more precisely, we will now review five popular definitions of climate (three of them are distributions over time and two of them are ensemble distributions). It will become clear that how to define climate and climate change is both nontrivial and contentious.
Let us start with distributions over time. The climate is influenced by the external conditions of the climate system (such as volcanic activity and the amount of solar energy reaching the Earth). Suppose that the external conditions are small fluctuations around a mean value c over a certain time period, where the climate variables are in a certain initial state at the beginning of the time period. According to Definition 1, the climate over this time period is the finite distribution over time of the climate variables given the initial states, which arises when the climate system is subject to constant external conditions c (e.g. Dymnikov and Gritsoun 2001; Lorenz 1995). (The value assigned to a certain set A by a finite distribution over time, assuming, say, that time is measured in days, is given by the number of days where the values are in A divided by the total number of days in the time period.) Climate change then amounts to different distributions over two successive time periods. However, in reality, the external conditions are not constant, and even when there are just small fluctuations in external conditions around a mean value, this can lead to altogether different distributions (Werndl 2015). Hence, there is the problem that this definition might not have anything to do with the distributions of the actual climate system. Thus, the varying external conditions need to be taken into account.
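The parenthetical recipe for a finite distribution over time (days with values in A divided by the total number of days) can be sketched in a few lines. This is only an illustration: the function name `time_distribution`, the daily temperature values and the threshold defining the set A are all made up for the example and do not come from any data set.

```python
from typing import Callable, Sequence


def time_distribution(values: Sequence[float], in_set: Callable[[float], bool]) -> float:
    """Fraction of time steps (e.g. days) whose value lies in the set A."""
    if not values:
        raise ValueError("need at least one observation")
    return sum(1 for v in values if in_set(v)) / len(values)


# Hypothetical daily temperatures (degrees C) for two successive time periods.
period_1 = [14.0, 15.5, 13.0, 16.0, 15.0]
period_2 = [16.5, 17.0, 15.5, 18.0, 16.0]


def warm(t: float) -> bool:
    """The set A (an assumption for this example): days at or above 15.5 C."""
    return t >= 15.5


p1 = time_distribution(period_1, warm)  # 2 of 5 days are in A
p2 = time_distribution(period_2, warm)  # 5 of 5 days are in A

# On the distribution-over-time view, climate change amounts to the two
# successive periods assigning different values to (some) sets A.
print(p1, p2, p1 != p2)
```

A full distribution would assign such a value to every set A of interest; a single set is used here only to keep the example short.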
The most direct way to do this is to adopt Definition 2, that the climate is the finite distribution over time of the actual evolution of the climate variables given the initial states (i.e. when the external conditions vary as in reality). Again, climate change amounts to different distributions for successive time periods. If good observations are available, climate, thus defined, can be readily estimated from the observations and hence this definition is very popular. For instance, this is the definition endorsed by the World Meteorological Organization (2015), when they write: climate, sometimes understood as the "average weather", is defined as the measurement of the mean and variability of relevant quantities of certain variables (such as temperature, precipitation or wind) over a period of time (cf. Hulme et al. 2009). However, this definition suffers from a serious problem which is best illustrated with an example. Suppose that the time period from $R_0$ to $R_1$ is marked by two different regimes because at $R_M = R_0 + (R_1 - R_0)/2$, the Earth was hit by a meteorite and thus became much colder. Clearly, the climates before and after $R_M$ differ. Yet Definition 2 does not imply this: there is nothing that forbids one to say that the climate is the distribution over time from $R_0$ to $R_1$ because, according to this definition, the climate is just a distribution over a certain time period.
To avoid this problem, Werndl (2015) proposes a slight modification of the second definition by introducing the idea of regimes of varying external conditions. According to Definition 3, the climate is the finite distribution over time of the climate variables arising under a certain regime of varying external conditions (given the initial states). Spelling out in detail what a regime of varying external conditions amounts to is not easy, but a reasonable requirement is that the mean of the external conditions is approximately constant over different sub-periods within the period. This implies that there are different regimes and hence different climates before and after the hit of the meteorite. Thus, Definition 3 avoids the problems of Definition 2. Again, there is climate change when there are different climates for two successive time periods. As for all other definitions of climate as distributions over time, for Definition 3 the question arises over which finite time period the distributions should be taken. A pragmatic approach as adopted by Lorenz (1995) seems most promising: the choice of the time interval should be influenced by the purpose of research and should be short enough to ensure that changes which are conceived as climatic are classified as different climates but long enough so that no specific predictions can be made.
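The requirement that the mean of the external conditions be approximately constant over sub-periods can be turned into a simple check. The sketch below is one possible reading, not Werndl's formal criterion: the function name `is_single_regime`, the number of sub-periods, the tolerance `tol` and the forcing values are all assumptions chosen for illustration.

```python
from statistics import mean
from typing import Sequence


def is_single_regime(forcing: Sequence[float], n_sub: int = 2, tol: float = 0.1) -> bool:
    """True if every sub-period mean of the forcing lies within tol of the overall mean.

    The series is cut into n_sub equal sub-periods; any leftover values at the
    end (when the length is not divisible by n_sub) are ignored.
    """
    size = len(forcing) // n_sub
    if size == 0:
        raise ValueError("series is too short for the requested sub-periods")
    overall = mean(forcing)
    subs = [forcing[i * size:(i + 1) * size] for i in range(n_sub)]
    return all(abs(mean(s) - overall) <= tol for s in subs)


# Hypothetical external conditions (arbitrary units) before and after the meteorite hit.
before = [1.0, 1.05, 0.95, 1.0]
after = [0.5, 0.55, 0.45, 0.5]

print(is_single_regime(before))          # small fluctuations around one mean
print(is_single_regime(before + after))  # two different means within one period
```

On this reading, the period spanning the meteorite impact fails the check, so it would not count as a single regime and hence not as a single climate.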
Let us now turn to the very different second class of definitions where climate is an ensemble distribution (an ensemble consists of a collection of simulations; in what follows, these are predictive simulations arising from different initial conditions). Suppose one wants to make predictions at $t_1$ in the future and that from the present to $t_1$ the external conditions take the form of small fluctuations around a mean value c. Let the present measurement of the climate variables be represented by an initial conditions ensemble (describing all the possible initial states of the system given our measurement accuracy). According to Definition 4, the climate at time $t_1$ is the distribution of the possible values of the climate variables at $t_1$ assuming that the external conditions were constant. That is, the climate is the distribution of the climate variables that arises when the initial conditions ensemble is evolved forward under the climate model until $t_1$ under constant external conditions c (e.g. Lorenz 1995; Stone and Knutti 2010). 1 Many philosophers have expressed their concerns that this definition is rather peculiar and has little, if anything, to do with our intuitive idea of climate. 2 We will come back to these concerns when discussing Definition 5. Right now it is important to point out that the requirement of constant external conditions again causes problems. In reality, the external conditions are not constant, and even when there are just small fluctuations around a mean value this can lead to altogether different distributions (Werndl 2015). Therefore, Definition 4 may not tell us anything about the actual future possible values of the climate variables. So this again shows that the varying external conditions need to be taken into account.
The most direct way to achieve this is to define climate as the actual ensemble distribution. That is, suppose again that one wants to make predictions at $t_1$ in the future, and let the present measurement of the climate variables be represented by an initial conditions ensemble. According to the widely endorsed Definition 5, the climate at time $t_1$ is the distribution of the possible states of the climate variables at $t_1$. That is, the climate is the distribution of the climate variables that arises when the initial conditions ensemble is evolved forward under the climate model until $t_1$ for the actual path taken by the external conditions. This is what Daron and Stainforth (2013, 2) have in mind when they write: 'For the purposes of climate prediction, therefore, it is most useful to view climate as a distribution conditioned on our knowledge of the system's state at some point in time'.
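The contrast between the two ensemble definitions can be made concrete with a toy example. The one-variable "model" below (a linear relaxation toward twice the forcing) is purely hypothetical and is not a climate model; the initial conditions ensemble and the forcing paths are likewise invented. The sketch only shows the procedure: evolve every member of the initial conditions ensemble forward, once under constant external conditions (Definition 4) and once under a varying path with the same mean (Definition 5).

```python
from typing import List, Sequence


def evolve(x0: float, forcing_path: Sequence[float]) -> float:
    """Toy model (hypothetical): each step moves x halfway toward twice the forcing c."""
    x = x0
    for c in forcing_path:
        x = 0.5 * x + c
    return x


def ensemble_distribution(initial_ensemble: Sequence[float],
                          forcing_path: Sequence[float]) -> List[float]:
    """Values at the final time for every member of the initial conditions ensemble."""
    return [evolve(x0, forcing_path) for x0 in initial_ensemble]


# Hypothetical initial conditions ensemble reflecting measurement uncertainty.
initial_ensemble = [0.9, 1.0, 1.1]

actual_path = [0.5, 1.0, 1.5]    # varying external conditions (mean 1.0)
constant_path = [1.0, 1.0, 1.0]  # constant external conditions at that mean

climate_def4 = ensemble_distribution(initial_ensemble, constant_path)
climate_def5 = ensemble_distribution(initial_ensemble, actual_path)

print(climate_def4)
print(climate_def5)
```

Even in this simple linear toy the two ensembles end up in different places, which illustrates why a distribution computed under constant conditions need not describe the possible future values under the actual conditions.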
While this definition is predictively very useful (because when making predictions, one is often interested in the expected value of the climate variables, e.g. the temperature, at a certain point of time), there are several problems, which also arise for Definition 4. First, one usually thinks of climate as something objective that is independent of our knowledge. Yet, Definition 5 depends on our present knowledge about the climate variables! Second, Definition 5 defines the future climate, but it is difficult to see how the present and past climate should be defined. Yet without a notion of the present and past climate, we cannot define climate change. Also, we talk about the present climate and past climates, and hence, a definition of climate that cannot make sense of this is unsatisfactory. Third, it should be possible to estimate the climate from high-quality observational records. Yet it can be shown that the ensemble distribution of Definition 5 does not relate to records of observations in any way (Werndl 2015).
It should be noted that there are also infinite versions of Definitions 1-3 (which arise when the time period goes to infinity) and Definitions 4-5 (which arise when the predictive lead time goes to infinity). However, they seem to be inferior to the finite versions because they suffer from the additional problems that the relevant limits may not exist and that the infinite distributions may not be empirically relevant because they may not approximate the finite distributions for time periods or prediction lead times of interest (Lorenz 1995; Werndl 2015). To sum up, defining climate is nontrivial and there is no definition of climate that is uncontroversial or does not raise many questions. Overall, we prefer Definition 3 (climate as a distribution over time for a certain regime of varying conditions) as it suffers from the fewest of the problems mentioned above.

Data Sets and Data Models
There are various ways data are obtained in climate science. Meteorological ground stations measure the temperature of the air near the surface of the Earth using thermometers. A network of free-floating buoys, which repeatedly sink down to a depth of about 2000 m and then come back up to transmit their data, provides measurements of the temperature of the ocean. Satellites record concentrations of greenhouse gases, aerosols, cloud coverage, etc.
Often raw data, the unprocessed data received directly from the measurement instruments, contain errors, are irregularly spaced and are incomplete in various ways. For example, records of surface temperatures further in the past are available only for certain locations (and the further one goes into the past, the fewer the records are). A standard technique is to use either climate or weather models to interpolate and fill in missing data. In particular, so-called reanalyses are estimates of the historical atmospheric temperature and other quantities, which are usually created by algorithms combining information from models and observations (IPCC 2013). Furthermore, particularly in the field of paleoclimate, often no direct measurements are available.
These various ways of obtaining data raise a host of philosophical problems. First of all, the use of instruments to obtain data, e.g., of thermometers to measure the temperature, raises the question of theory-ladenness of observation. This is a classical topic that has been extensively discussed in philosophy (e.g. Kitcher 1995). One way of taming the problem is to require that the working of the instruments has been independently tested and confirmed. Parker (2014) argues that this is indeed the case for many relevant observations.
The controversy about satellite measurements of global mean temperature trends illustrates the problems with theory-ladenness of observations (Lloyd 2012). Models predicted that the tropical troposphere will warm faster than the surface of the Earth as the enhanced greenhouse effect takes hold. However, data from satellite measurements indicated no tropospheric warming. This discrepancy triggered a controversy over the reality of global warming and even led to an investigation by a National Academy of Sciences panel. Satellites collected microwave raw data, which were converted into temperatures by complicated algorithms. Radiosonde data were used to validate the satellite-derived temperature trends. The radiosonde measurements, however, were then found to be unsuited to produce long-term trends and hence provided an unsuitable source to validate satellite data. In the end, Lloyd concludes, it now appears that the models were mostly right and the early data were mostly wrong, and therein lies an interesting story about data and their relations to scientists, models, and reality (ibid., 391).
This episode illustrates the level of complexity which the problem of theory-ladenness reaches in the climate case.
Second, as mentioned above, raw data in climate science often contain errors or are incomplete. Hence, models are applied to filter and to correct the raw data and to extend the raw data to global data sets. Edwards (1999) speaks in this context about model-filtered data and a symbiotic relationship between data and models. It may be that the use of models to correct and extend data in climate science is more widespread than in other sciences. Yet, as Edwards (1999) and Norton and Suppe (2001) acknowledge, model-filtered data are not specific to climate science and they equally occur in other fields such as biology and physics. In effect, model-filtering is a special case of theory-ladenness and can be dealt with by applying the same maxim: model-filtered data can be trusted as long as the models used to correct and extend the data have been independently tested and are confirmed. The important clause here is 'independently': the models that are used to filter and correct the data have to be confirmed by other data. In other words, if the models were just tested by the data that they are supposed to be correcting and filtering, there would be a confirmatory circle, and whether such model-filtered data could be trusted is doubtful.
Third, often no direct measurements, say of surface temperature changes several thousand years ago, are available. For this reason, scientists gather proxy data of surface temperature changes which are derived from natural recorders such as ocean sediments and tree rings. The quality of proxy data depends on two factors. First, the availability of the proxy itself: for example, ancient tree rings, mud cores and ice cores can be collected only in a small number of locations. Second, the reliability of statistical methods used to process raw data and turn them into the data one is interested in (e.g. to turn a tree ring growth pattern into a temperature record) must also be assessed, and in some cases, there is little data available for the calibration of these relationships.
Both factors have given rise to heated debate, for instance in the so-called hockey stick controversy. From the 1990s onwards, proxy indicators were used to arrive at a quantitative estimate of the Northern Hemisphere temperature record of the past 1000-1400 years. These graphs took the shape of a hockey stick and indicated that recent warming is exceptional: they were relatively flat until 1900, like a hockey stick's shaft, and were followed by a sharp increase in the 20th century, like a hockey stick's blade. The methods used to arrive at these temperature reconstructions were disputed by politicians, policy makers and some scientists. Referring to the hockey stick graph and casting doubt on the methods used to create it, Republican Jim Inhofe in a Senate speech claimed: 'could it be that man-made global warming is the greatest hoax ever perpetrated on the American people? It sure sounds like it.' (Inhofe 2003) Nowadays there are more than two dozen reconstructions of the temperature record using various statistical methods and proxy records. They lead to the broad consensus that temperatures during the late 20th century are likely to have been the warmest in the past 1400 years (Frank et al. 2010). 3 It is nevertheless interesting to have a closer look at the arguments used in the debate. First, there is still considerable uncertainty about some details of the temperature record, due to the lack of direct measurement and the uncertainty in calibrating proxy data. 'Sceptics' 4 use these uncertainties to argue that the overall shape of the graph cannot be trusted. However, research is ongoing on the temperature reconstruction methods, directly incorporating uncertainty assessments into the calculations. Consequently, the hockey stick itself is now presented as a range rather than a single time series, showing the results of multiple studies using different lines of evidence (Frank et al. 2010; IPCC 2013).
Although there is a small probability that warmer periods have occurred, the balance of available evidence weighs against this. A second tactic used by the 'sceptics' is to argue that the statistical methods used in the reconstruction would produce the hockey stick shape from almost any data (e.g. McIntyre and McKitrick 2003). This argument, however, did not stand up to statistical scrutiny (cf. Frank et al. 2010). Furthermore, recent studies have added to the range of evidence about temperature reconstructions using other statistical methods.
Finally, it should also be mentioned that data in climate science are extensively used in the construction of models: models in climate science contain many observationally derived approximations and heuristics. In particular, parameterizations in climate models represent processes that cannot be explicitly resolved at the spatial or temporal resolution of the model and are thus replaced by simplified processes which are data-driven and usually in part also physically motivated. Edwards (1999) speaks in this context about data-laden models. A concrete example is the aerosol forcing (the parameter that describes the cooling of the Earth arising from a certain concentration of aerosols). In many climate models, the aerosol forcing is an unknown free parameter, and hence data about the past temperature changes have been used to constrain and estimate it. This data-ladenness is widely acknowledged (Edwards 1999;Norton and Suppe 2001). The interesting question with data-ladenness is whether data that are used in the construction of models can also confirm the very same model or whether this is ruled out because it would amount to a confirmatory circle. We return to this question in Section 3 of Part II.

Detection of Climate Change
Do rising temperatures indicate that there is climate change, and if so, can the change be attributed to human action? These two problems are known as the problems of detection and attribution. Intuitively, detection of climate change is the process of determining that some significant change has occurred in the observed variables of the climate system without providing a reason for that change. A typical detection statement can be found in Working Group 1 Summary for Policymakers: 'The globally averaged combined land and ocean surface temperature data as calculated by a linear trend, show a warming of 0.85°C, over the period 1880 to 2012' (IPCC 2013, 5).
Turning an intuitive characterisation of detection into a workable definition turns out to be a task saddled with difficulties, many of which are closely related to the discussion above regarding the definition of what we mean by climate. The Intergovernmental Panel on Climate Change (IPCC) defines detection and attribution as follows: "Detection of change is defined as the process of demonstrating that climate or a system affected by climate has changed in some defined statistical sense without providing a reason for that change. An identified change is detected in observations if its likelihood of occurrence by chance due to internal variability alone is determined to be small" […]. Attribution is defined as "the process of evaluating the relative contributions of multiple causal factors to a change or event with an assignment of statistical confidence". (IPCC 2013, 872)
These definitions raise a host of issues. The root cause of the difficulties is the clause that climate change has been detected only if an observed change in the climate is unlikely to be due to internal variability. Internal variability is the phenomenon that climate variables such as temperature and precipitation would change over time due to the internal dynamics of the climate system even in the absence of climate change: there have been (and will be) hotter and colder years irrespective of human action; indeed irrespective of the existence of humans.
Taken at face value, this definition of detection has the consequence that there is no internal climate change. The ice ages, for instance, would not count as climate change if they occurred because of internal variability. This is at odds with basic intuitions about climate and with the most common definitions of climate as a finite distribution over a relatively short time period (where internal climate change is possible). Furthermore, it seems to blur the boundary between detection and attribution: if detected, climate change is ipso facto change not due to internal variability, all factors pertaining to the internal climate dynamics are a priori excluded from being drivers of climate change. Not only does this seem counterintuitive, it also seems to prejudge answers in a less than helpful way.
A solution to this problem points to the distinction between internal variability and natural variability. The onset of ice ages, for instance, is commonly attributed to orbital changes of the earth. These changes, however, are not internal variations; they are natural variations. This move solves the above issues, but it does so at the cost of incurring a new problem: where should one draw the line between natural and internal factors? In fact, there does not seem to be a generally accepted way to draw a line, and the same factor is sometimes classified as internal and sometimes as external. Glaciation processes, for instance, are sometimes treated as internal factors and sometimes as prescribed external factors. Likewise, sometimes the biosphere is treated as an external factor, but sometimes it is considered as part of the internal dynamics of the system. One could even go so far as to ask whether human activity is an external forcing on the climate system or an internally generated earth system process. Research studies usually treat human activity as an external forcing, but it could consistently be argued that human activities are an internal dynamical process. The appropriate definition simply depends on the question of interest.
Even if definitional questions are resolved, estimating the effect of internal variability is a difficult problem. The effects of internal variability are present on all time scales, from the sub-daily fluctuations experienced as weather to the long-term changes due to cycles of glaciation. Since internal variability results from the dynamics of a highly complex nonlinear system, it is unlikely that the statistical properties of internal variability are constant over time. So ideally, a study of internal variability would be based on thousands of years of detailed observations. Unfortunately such observations are not available, and so scientists turn to climate models to estimate the magnitude of the variability (for discussion of climate models see Part II).
Before we can understand the role of climate models in detection studies, a comment about the nature of these studies is in order. Detection studies rely on statistical tests, and the results of such studies are often phrased in terms of the likelihood of a certain event or sequence of events happening in the absence of climate change. In practice, the challenge is to define an appropriate null hypothesis (the expected behaviour of the system in the absence of changing external influences), against which the observed outcomes can be tested. Because the climate system is a dynamical system with processes and feedbacks operating on all scales, this is a nontrivial exercise. An indication of the importance of the null hypothesis is given by the results of Cohn and Lins (2005), who compare the same data against alternate null hypotheses, with results differing by 25 orders of magnitude of significance! This does not in itself show that a particular null hypothesis is more appropriate than the others nor does it show that they are all on par; but it demonstrates the sensitivity of the result to the null hypothesis chosen.
In practice, the best available null hypothesis is often formulated on the basis of the best available model of the behaviour of the climate system, including internal variability, which for most climate variables usually means a state-of-the-art global climate model (GCM). This model is then used to perform long control runs with constant forcings in order to quantify the internal variability of the model. Climate change is then said to have been detected if the measured values fall outside a predefined range of the internal variability of the model. Hence, estimates of internal variability in the climate system are produced from climate models themselves (Hegerl et al. 2010).
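The control-run procedure can be illustrated schematically. The sketch below assumes one simple choice of 'predefined range', namely k sample standard deviations around the control-run mean; actual detection studies use more elaborate statistical tests. The function name `detected`, the threshold `k` and the control-run anomalies are all assumptions made for the example.

```python
from statistics import mean, stdev
from typing import Sequence


def detected(control_run: Sequence[float], observed: float, k: float = 2.0) -> bool:
    """True if the observed value lies outside mean +/- k standard deviations
    of the control run, i.e. outside the model's estimated internal variability."""
    mu = mean(control_run)
    sigma = stdev(control_run)  # sample standard deviation of the control run
    return abs(observed - mu) > k * sigma


# Hypothetical temperature anomalies (degrees C) from a control run with constant forcings.
control_run = [0.1, -0.2, 0.0, 0.2, -0.1]

print(detected(control_run, 0.85))  # far outside the control-run spread
print(detected(control_run, 0.2))   # within the control-run spread
```

Because the range is computed from a model's control run, a different model with different internal variability could give a different verdict for the same observation, which is the difficulty taken up next.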
The difficulty with this method is that there is no single 'best' model to choose: many such models exist, and state-of-the-art climate models run with constant forcing show significant disagreements both on the magnitude of internal variability and the time scale of variations. 5 This underscores the difficulties in making detection statements based on the above definition, which recognises an observed change as climate change only if it is unlikely to be due to internal variability.
This raises the question of how important this choice is. The differences between different models are relatively unimportant for the clearest detection results such as recent increases in global mean temperature. As stressed by Parker (2010), detection is robust across different models (for a discussion of robustness, see Section 6 of Part II). Moreover, there is a variety of different pieces of evidence all pointing to the conclusion that the global mean temperature has increased. However, the issues of which null hypothesis to use and how to quantify internal variability are usually much more important for the detection of subtler local climate change.
Common counter-arguments by 'sceptics' tend to reject the use of computational models as a null hypothesis and return to approaches based on trend-fitting and statistical analysis of residuals, focusing on the hiatus rather than on the acknowledged upward trend of the 20th century (e.g. McKitrick 2014). Even taking this extreme view about the relative capabilities of statistical models, it has been shown that detection of climate change is also robust to changes in trend-fit model specifications (e.g. Rybski et al. 2006;Imbers et al. 2014). 6

Attribution of Climate Change
Once climate change has been detected, the question of attribution arises. This might be an attribution of any particular change (either a direct climatic change such as increased global mean temperature or an impact such as the area burnt by forest fires) to any identified cause or multiple causal factors (such as increased CO2 in the atmosphere, volcanic eruptions or human population density), with an assignment of statistical confidence. Where an indirect impact is considered, a two-step (or even multi-step) approach may be appropriate, first attributing an intermediate change to some forcing agent and then an indirect impact to the intermediate change. An example of this, taken from the IPCC Good Practice Guidance paper (Hegerl et al. 2010), is the attribution of coral reef calcification impacts to rising CO2 levels, in which an intermediate stage is used by first attributing changes in the carbonate ion concentration to rising CO2 levels, then attributing calcification processes to changes in the carbonate ion concentration. This also illustrates the need for a clear understanding of the physical mechanisms involved, in order to perform a reliable multi-step attribution in the presence of many potential confounding factors.
Statistical analysis quantifies the strength of relationships, given the simplifying assumptions of the attribution framework, but the level of confidence in the simplifying assumptions must be assessed outside that framework. This level of confidence is standardised by the IPCC into discrete (though subjective) categories: 'virtually certain' (>99%), 'extremely likely' (>95%), 'very likely' (>90%), 'likely' (>66%), etc., which aim to take account of the process knowledge, data limitations, adequacy of models used and the presence of potential confounding factors (see Section 5 of Part II for a short discussion of the IPCC's uncertainty framework). The conclusion that is reached will then have a form similar to the IPCC's headline attribution statement: 'It is extremely likely [>95% probability] that more than half of the observed increase in global average surface temperature from 1951 to 2010 was caused by the anthropogenic increase in greenhouse gas concentrations and other anthropogenic forcings together.' (IPCC 2013, 17)
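The discrete categories listed above amount to a lookup from a probability to a verbal label. A minimal sketch follows, using only the four thresholds quoted in the text; the categories covered by 'etc.' are deliberately left out rather than guessed, and the names `IPCC_LIKELIHOOD` and `likelihood_label` are invented for this example.

```python
from typing import Optional

# Thresholds as quoted in the text, from strongest to weakest.
IPCC_LIKELIHOOD = [
    (0.99, "virtually certain"),
    (0.95, "extremely likely"),
    (0.90, "very likely"),
    (0.66, "likely"),
]


def likelihood_label(p: float) -> Optional[str]:
    """Return the strongest label whose threshold p exceeds, or None if p is
    not above any of the thresholds quoted in the text."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("probability must lie between 0 and 1")
    for threshold, label in IPCC_LIKELIHOOD:
        if p > threshold:
            return label
    return None  # lower categories are elided ('etc.') in the source text


print(likelihood_label(0.97))  # extremely likely
print(likelihood_label(0.50))  # None
```

The lookup itself is mechanical; the subjective element lies in arriving at the probability, which depends on the assessment of process knowledge, data limitations and model adequacy.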
One method to reach such results is optimal fingerprinting. The method seeks to define a spatio-temporal pattern of change (fingerprint) associated with each potential driver (such as the effect of greenhouse gases or of changes in solar radiation), normalised relative to the internal variability, and then to perform a statistical regression of observed data with respect to linear combinations of these patterns. The residual variability after observations have been attributed to each factor should then be consistent with the internal variability; if not, this suggests that an important source of variability remains unaccounted for. Parker (2010) argues that climate change fingerprint studies are similar to Steel's (2008) streamlined comparative process tracing. That is, in fingerprint studies computer simulation models are used to quantify and characterise the expected effects of mechanisms (for instance, the greenhouse gas effect or the cooling of aerosols when they scatter radiation). These effects then serve as fingerprints that are used to test claims about the causes of climate change.
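The regression step just described can be illustrated with a minimal sketch. It is not the procedure used in any particular study: it assumes that internal variability is independent across data points (a diagonal covariance), whereas actual fingerprint studies estimate a full covariance from model control runs. The function names, the two synthetic fingerprints and all numerical values are hypothetical and chosen only for illustration.

```python
import math
import random


def solve_linear_system(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - tail) / M[i][i]
    return x


def fit_scaling_factors(obs, fingerprints, noise_var):
    """Regress observations on fingerprints, weighting each data point
    by 1 / (internal-variability variance). Assumes diagonal covariance."""
    w = [1.0 / v for v in noise_var]
    m = len(fingerprints)
    A = [[sum(wi * fj * fk
              for wi, fj, fk in zip(w, fingerprints[j], fingerprints[k]))
          for k in range(m)] for j in range(m)]
    b = [sum(wi * fj * yi for wi, fj, yi in zip(w, fingerprints[j], obs))
         for j in range(m)]
    return solve_linear_system(A, b)


def residual_consistency(obs, fingerprints, betas, noise_var):
    """Weighted residual sum of squares per degree of freedom.
    Values near 1 are consistent with the assumed internal variability."""
    n, m = len(obs), len(fingerprints)
    fitted = [sum(beta * f[i] for beta, f in zip(betas, fingerprints))
              for i in range(n)]
    chi2 = sum((y - yhat) ** 2 / v
               for y, yhat, v in zip(obs, fitted, noise_var))
    return chi2 / (n - m)


# Synthetic illustration (all numbers hypothetical).
random.seed(0)
n = 60
ghg = [i / n for i in range(n)]                                   # slow trend
solar = [0.2 * math.sin(2 * math.pi * i / 11) for i in range(n)]  # cycle
noise_var = [0.01] * n
obs = [0.9 * g + 0.4 * s + random.gauss(0.0, 0.1)
       for g, s in zip(ghg, solar)]

betas = fit_scaling_factors(obs, [ghg, solar], noise_var)
ratio = residual_consistency(obs, [ghg, solar], betas, noise_var)
print("scaling factors:", betas)
print("residual variance ratio:", ratio)
```

In this toy setting a scaling factor whose uncertainty range excludes zero would count as detection of the corresponding driver, and a residual ratio far above 1 would suggest, as noted above, that some source of variability has been left out.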
As emphasised by Parker (2010), fingerprint studies rely on several assumptions. The first one is linearity, i.e., that the response of the climate system when several forcing factors are present is equal to a linear combination of the responses to the individual factors. Because the climate system is nonlinear, this is clearly a source of methodological difficulty, although for global-scale responses (in contrast to regional-scale responses) additivity has been shown to be a good approximation. Another assumption is that climate models simulate the causal processes that are at work in Earth's climate system accurately enough. As Parker argues, the very success of fingerprint studies (which is nontrivial) can help to put aside worries about this assumption. A further problem is, once more, the need to define internal variability characteristics (see also the discussion in IPCC (2013, §10.2.3)).
Levels of confidence in these attribution statements are primarily dependent on physical understanding of the processes involved. Where there is a clear, simple, well-understood mechanism, there should be greater confidence in the statistical result; where the mechanisms are loose, multi-factored or multi-step, or where a complex model is used as an intermediary, confidence is correspondingly lower. The Guidance Paper cautions that 'Where models are used in attribution, a model's ability to properly represent the relevant causal link should be assessed. This should include an assessment of model biases and the model's ability to capture the relevant processes and scales of interest' (Hegerl et al. 2010, 5). As Parker (2010) argues, there is also higher confidence in attribution results when the results are robust and there is a variety of evidence. For instance, the finding that late 20th century temperature increase was mainly caused by greenhouse gas forcing is found to be robust given a wide range of different models, different analysis techniques and different forcings, and there is a variety of evidence all supporting this claim. Thus, our confidence that greenhouse gases explain global warming is high. (For further useful extended discussion of detection and attribution methods in climate science, see pages 872-878 of IPCC (2013) and the Good Practice Guidance paper by Hegerl et al. (2010).)
In the interpretation of attribution results, in particular those framed as a question of whether human activity has influenced a particular climatic change or event, there is a tendency to focus on whether the confidence interval of the estimated anthropogenic effect crosses zero rather than looking at the best estimate. The absence of such a crossing indicates that the change is unlikely to be due to non-human factors. This results in conservative attribution statements, but it reflects the focus of the present debate where, in the eyes of the public and media, 'attribution' is often understood as confidence in ruling out non-human factors, rather than as giving a best estimate or relative contributions of different factors. This contrasts with the interest of the scientific community, where researchers do attempt to quantify the anthropogenic contribution. In section D3 of the summary for policy makers of IPCC (2013), for instance, it is stated that it is extremely likely that more than half of the observed increase in global average surface temperature from 1951 to 2010 was caused by the anthropogenic increase in greenhouse gas concentrations and other anthropogenic forcings together. So it is primarily the media and sceptics who want to focus on ruling out other causes rather than coming to a best estimate.
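The difference between asking whether a confidence interval crosses zero and asking for a best estimate can be made concrete with a small sketch. It assumes, purely for illustration, that the estimated anthropogenic effect is normally distributed with a known standard error; the function name and the numbers are hypothetical and are not taken from any attribution study.

```python
from statistics import NormalDist


def summarise_attribution(best_estimate, std_error, level=0.90):
    """Return the best estimate, its two-sided confidence interval at the
    given level, and whether that interval excludes zero.
    Assumes a normally distributed estimate (an illustrative simplification)."""
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    lower = best_estimate - z * std_error
    upper = best_estimate + z * std_error
    return {
        "best_estimate": best_estimate,
        "interval": (lower, upper),
        "excludes_zero": lower > 0.0 or upper < 0.0,
    }


# Two hypothetical results with the same best estimate but different precision.
precise = summarise_attribution(0.6, 0.1)
imprecise = summarise_attribution(0.6, 0.5)
print(precise)
print(imprecise)
```

Both hypothetical results have the same best estimate, yet only the first would support a statement that rules out a zero anthropogenic effect. A reading that attends only to the zero crossing treats the two very differently, whereas a reading that attends to the best estimate does not.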
There is an interesting question concerning the status of attribution methodologies like fingerprinting. The greenhouse effect is well-documented and indeed easily observable in laboratory experiments. This, some argue, provides a good qualitative physical understanding of the climate system, which is enough to say with confidence that global warming is real and that anthropogenic CO2 has been identified as a cause of the increase in global mean temperature (e.g. Betz 2013). This would render statistical debates about attribution methodologies second-order in terms of the key finding that anthropogenic CO2 emissions cause global warming. However, this line of argument is not universally accepted, and many would insist that fingerprinting is crucial in attributing an increase in global mean temperature to anthropogenic CO2 emissions. In IPCC (2013) fingerprinting methods provide important support for the attribution of an increase in global mean temperature to anthropogenic CO2 emissions (see the extended discussion in Chapter 10).

Conclusion
This paper reviewed issues and questions relating to observation in climate science. In particular, the topics of different definitions of climate and climate change, the nature of climate data sets, and the detection and attribution of climate change were discussed. The discussion was from a philosophy of science perspective. Much more could be said from other viewpoints, e.g. from those of history, science studies, or sociology of science. For want of space, we have not been able to review contributions from these perspectives. For an interesting historical perspective, the reader might consult Edwards (2010).
and serves on a number of editorial and advisory boards. Professor Frigg is a member of the European Society for the Philosophy of Science, the Philosophy of Science Association, and the British Society for the Philosophy of Science.
Erica Thompson was born in Scotland. She has a PhD in physics from Imperial College, London (2013), and an MMath from the University of Cambridge (2007). She is presently a Research Officer in the Centre for the Analysis of Time Series within the Department of Statistics at the London School of Economics and Political Science. She has previously worked for the Grantham Institute for Climate Change at Imperial College and the UK Energy Research Centre. Her current research interests include quantification of scientific uncertainty, the use and utility of climate information for real-world decision-making, and the interpretation and communication of climate model output. Dr Thompson is or has previously been a member of the Royal Meteorological Society, the American Geophysical Union, Scientists for Global Responsibility, and the British Beekeepers' Association.
Charlotte Werndl was born in Salzburg. She completed a PhD in Philosophy at the University of Cambridge in 2010 and master's degrees both in mathematics and in philosophy at the University of Salzburg in 2006. She is Professor of Logic and Philosophy of Science at the Department of Philosophy at the University of Salzburg, Austria, a Visiting Professor at the Centre for Philosophy of Natural and Social Science (CPNSS) at the London School of Economics, and an affiliate of the Grantham Research Institute on Climate Change and the Environment at the London School of Economics. Previously, she was an Associate Professor at the Department of Philosophy, Logic and Scientific Method at the London School of Economics and before that a research fellow at the University of Oxford. She is an editor of the Review of Symbolic Logic and an associate editor of the European Journal for the Philosophy of Science and serves on a number of editorial and advisory boards. She has published papers on climate change, statistical mechanics, mathematical knowledge, chaos theory, predictability, confirmation, evidence, determinism, indeterminism, observational equivalence and underdetermination. Her current work focuses on the philosophy of climate science, evidence and the philosophy of statistics and the foundation of statistical mechanics. Professor Werndl is a member of the Aristotelian Society, the British Society for the Philosophy of Science, the European Society for the Philosophy of Science, and the Philosophy of Science Association.

Notes
* Correspondence: Department of Philosophy, University of Salzburg, Austria, and Department of Philosophy, Logic and Scientific Method, London School of Economics, UK. Email: charlotte.werndl@sbg.ac.at