Corresponding author: H. Q. Wen, State Key Laboratory of Numerical Modeling for Atmospheric Sciences and Geophysical Fluid Dynamics, Institute of Atmospheric Physics, Chinese Academy of Sciences, Beijing, 100029, China. (firstname.lastname@example.org)
 This study compares observed and model-simulated spatiotemporal patterns of changes in Chinese extreme temperatures during 1961–2007 using an optimal detection method. Four extreme indices, namely annual maximum daily maximum (TXx) and daily minimum (TNx) temperatures and annual minimum daily maximum (TXn) and daily minimum (TNn) temperatures, are studied. Model simulations are conducted with CanESM2 and comprise six 5-member ensembles under different historical forcings, i.e., four individual external forcings (greenhouse gases, anthropogenic aerosol, land use change, and solar irradiance), the combined effect of natural forcings (solar irradiance and volcanic activity), and the combined effect of all external forcings (both natural and anthropogenic forcings). We find that anthropogenic influence is clearly detectable in extreme temperatures over China. Additionally, anthropogenic forcing can be separated from natural forcing in two-signal analyses. The influence of natural forcings cannot be detected in any analysis. Moreover, there are indications that the effects of greenhouse gases and/or land use change may be separated from other anthropogenic forcings in warm extremes TXx and TNx in joint two-signal analyses. These results suggest that further investigations of the roles of individual anthropogenic forcings are justified, particularly in studies of extremely warm temperatures over China.
 Human influence has been detected in extreme temperatures during the second half of the 20th century at the global, continental, and subcontinental scales [Christidis et al., 2005, 2011; Jones et al., 2008; Stott et al., 2011; Morak et al., 2011, 2012; Zwiers et al., 2011]. Christidis et al. [2005] made the first attempt to attribute changes in temperature extremes to external forcings based on HadCM3 simulations. They detected human influence in the changes of warm nights as well as cold days and nights. Using a sophisticated measure of extremely warm days, Christidis et al. [2011] detected human influence on warm days. They showed that at the global scale the influence of anthropogenic forcings on extremely warm temperatures may be separated from that of natural forcings. Using an approach that is based on extreme value theory, Zwiers et al. [2011] found some early evidence of detectable human influence on both warm and cold temperature extremes during the period 1961–2000 in most subcontinental regions. However, separation of the influence of anthropogenic forcing from that of natural forcing and separation of the effect of greenhouse gases from that of other anthropogenic forcings have not been attempted at continental or smaller scales.
 Regarding China, many studies have analyzed the changes in extreme temperatures in observational records that typically start in the 1950s/1960s [Ma and Fu, 2003; Gong et al., 2004; Yang et al., 2008]. They have reported an increased frequency of warm extreme events as well as a decrease in cold extremes. However, work on detecting and attributing the influence of external forcing on extreme temperatures over China remains limited [e.g., Zwiers et al., 2011; Christidis et al., 2012]. More work is needed to understand whether anthropogenic influence, especially human emissions of greenhouse gases, has been the main contributor to the observed changes in extreme temperatures over China.
 Here, we use a newly compiled and quality-controlled extensive Chinese daily temperature data set and ensembles of model simulations under different forcings, conducted with the second-generation Canadian Earth System Model (CanESM2) [Arora et al., 2011], to investigate possible causes of the observed changes in extreme temperatures. The remainder of the paper is structured as follows: We describe observed and simulated data in section 2. Methods and data processing are detailed in section 3, followed by the main results in section 4. We conclude the paper with discussions in section 5.
 China's National Climate Center has recently compiled and quality controlled an extensive daily temperature data set [Wu and Gao, 2012]. Records of daily maximum, daily minimum, and daily mean temperatures were collected from 2416 observation stations from 1961 to 2007. The station coverage is quite dense in eastern China, but it is relatively sparse in western China, especially over the Tibetan Plateau (see Figure S1 in the Supporting Information). The station data have been gridded onto a 0.5° × 0.5° latitude-longitude grid using a two-step procedure (for more details, see Supporting Information). Following Zwiers et al. [2011], we extract the annual maxima of daily maximum (TXx) and daily minimum (TNx) temperatures and annual minima of daily maximum (TXn) and daily minimum (TNn) temperatures for every grid cell. TXx and TNx typically occur as the highest afternoon and nighttime temperatures in summer, respectively. They have significant impacts on human health and have been used to construct indices for heat wave intensity [Karl and Knight, 1997; Meehl et al., 2007; Fischer and Schär, 2010; Kuglitsch et al., 2010]. They are referred to as warm extremes. TNn and TXn typically occur as the lowest nighttime and daytime temperatures in winter, respectively, and will be referred to as cold extremes.
 Linear trends in annual series of TXx, TNx, TXn, and TNn during the period 1961–2007 have been estimated for each 0.5° × 0.5° grid point by using ordinary least squares regression (Figure 1). Increasing trends appear almost everywhere in China in both the warm and cold extremes, which is consistent with earlier findings [e.g., Yan et al., 2002; Meehl et al., 2007; Alexander and Arblaster, 2009]. In addition, increasing trends in cold extremes are much stronger and more spatially uniform. Cooling trends are observed in TXx in some areas of the central China region. As a result, the averaged warming trend in TXx for central China is weak (see Figure S2).
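 The per-grid-point trend estimate can be illustrated with a minimal sketch. It assumes a complete (gap-free) array of annual index values with dimensions (year, latitude, longitude); the synthetic field, its 0.02 per year trend, and the function name `ols_trend` are illustrative assumptions, not the data or code used in this study.

```python
import numpy as np

def ols_trend(series, years):
    """Return the OLS slope per grid cell, in units per year.

    series: array of shape (n_years, n_lat, n_lon); years: array of shape (n_years,).
    """
    t = years - years.mean()                      # centered time axis
    anomalies = series - series.mean(axis=0)      # remove the time mean per cell
    # slope = sum(t * y') / sum(t^2), evaluated for every cell at once
    return np.tensordot(t, anomalies, axes=(0, 0)) / np.sum(t ** 2)

years = np.arange(1961, 2008)                     # 47 years, 1961-2007
rng = np.random.default_rng(0)
# Synthetic stand-in for a gridded index such as TXx: 0.02 per year plus noise
field = 0.02 * (years - 1961)[:, None, None] + rng.normal(0, 0.1, (47, 3, 4))
trend_per_decade = 10 * ols_trend(field, years)
```

 Grid cells with missing years would need separate handling (for example, a per-cell fit on the available years), which this sketch does not attempt.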
2.2 Model Simulations
 We use simulations conducted with the earth system model of the Canadian Centre for Climate Modelling and Analysis (CanESM2) to estimate climate responses (or signals) to external forcings and to estimate internal variability of the climate system. The atmospheric component of the CanESM2 is a spectral model employing T63 triangular truncation with physical processes calculated on a 128 × 64 (~2.81°) horizontal linear grid. It has a climate sensitivity of 3.7 K, and its transient climate response is 2.4 K (N. Gillett, 2013, personal communication). The CanESM2 contains land carbon cycle components. Land use change is interactively modeled on the basis of changes in land cover [Arora and Boer, 2010]. As a part of the Canadian contributions to the Coupled Model Intercomparison Project Phase 5 (CMIP5) [Taylor et al., 2012], this model has produced six ensembles for 1850–2012 forced with (1) historical forcing that includes both anthropogenic and natural external forcings (ALL), (2) natural forcings that represent changes in solar irradiance and volcanic activity (NAT), and individual forcings, namely (3) greenhouse gases (GHG), (4) anthropogenic aerosol (AA), (5) land use change (LU), and (6) solar radiation (SL). Each ensemble has five member runs. Detailed information on forcing data can be found at the CMIP5 website (http://cmip-pcmdi.llnl.gov/cmip5/forcing.html). The model also produced a 1096 year preindustrial control simulation. Values of annual extreme temperatures are extracted from the daily output of these simulations.
 Model-simulated responses (or signals) are represented by the ensemble mean of each forcing run. The signal for the combined effect of anthropogenic forcings (ANT) is estimated as the sum of the relevant individual anthropogenic forcing runs (see Supporting Information for details). Figure 2 shows the annual time series of extreme temperatures averaged across China from observations and model simulations, presented as anomalies relative to the 1961–1990 climatology. Visual inspection suggests a good match in the long-term changes of temperature extremes between observations and ALL, but not between observations and NAT. Estimated long-term trends for observations and model simulations are presented in Figure S8.
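 As an illustration of how such signals might be assembled, the sketch below forms anomalies relative to 1961–1990, averages over ensemble members, and sums the GHG, AA, and LU signals into ANT. The random input arrays and the helper name `anomalies` are assumptions made for the example; the exact procedure used here is described in the Supporting Information.

```python
import numpy as np

def anomalies(series, years, base=(1961, 1990)):
    """Subtract the base-period mean along the last (time) axis."""
    in_base = (years >= base[0]) & (years <= base[1])
    return series - series[..., in_base].mean(axis=-1, keepdims=True)

years = np.arange(1961, 2008)
rng = np.random.default_rng(1)
# Synthetic 5-member ensembles, shape (member, year), one per individual forcing
runs = {name: rng.normal(size=(5, years.size)) for name in ("GHG", "AA", "LU")}

# Signal = ensemble mean of the anomalies for each forcing
signals = {name: anomalies(r, years).mean(axis=0) for name, r in runs.items()}
# Combined anthropogenic signal as the sum of the individual anthropogenic signals
signals["ANT"] = signals["GHG"] + signals["AA"] + signals["LU"]
```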
3 Methods and Data Processing
3.1 Detection Method
 A typical detection and attribution analysis uses an optimal fingerprint method based on generalized linear regression [e.g., Allen and Tett, 1999; Allen and Stott, 2003] (a brief introduction is given in the Supporting Information). A difficulty arises when applying this optimal detection method to extreme values because the method assumes that the residual term follows a Gaussian distribution. One way to mitigate this problem is to fit a generalized extreme value (GEV) distribution combined with a regression model on the location parameter [Christidis et al., 2011]. Another method is to fit GEV distributions to extreme values at individual grid boxes with a signal as a covariate of the location parameter [Zwiers et al., 2011]. Here, we consider aggregated extreme temperature indices over seven large regions in China. Since finite-range dependence can be reasonably assumed for the spatiotemporal correlation structure of extreme temperatures, a central limit theorem holds [Lehmann, 1999, pp. 106–118]. As a result, the averages of extreme values over a large number of locations will asymptotically follow a Gaussian distribution (see Supporting Information for details). This makes it possible to apply the optimal detection method to the regionally averaged series.
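 The Gaussian approximation for regional averages can be illustrated numerically. The sketch below is a simplification: it draws independent Gumbel-distributed maxima (a GEV with zero shape parameter) at 100 hypothetical locations, whereas real extremes are spatially dependent, and it compares the skewness of a single location with that of the spatial average.

```python
import numpy as np

def skewness(x, axis=0):
    """Sample skewness (third standardized moment) along an axis."""
    d = x - x.mean(axis=axis, keepdims=True)
    return (d ** 3).mean(axis=axis) / (d ** 2).mean(axis=axis) ** 1.5

rng = np.random.default_rng(3)
# 10000 synthetic "years" of annual maxima at 100 independent locations
maxima = rng.gumbel(loc=30.0, scale=2.0, size=(10000, 100))

skew_single = skewness(maxima[:, 0])            # one location: clearly skewed
skew_regional = skewness(maxima.mean(axis=1))   # spatial average: much closer to symmetric
```

 With dependent locations the convergence would be slower, since the effective number of independent samples is smaller than the number of grid cells.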
 We conduct one-signal analyses involving only one signal vector at a time. Detection of the signal indicates the presence of a response to a particular forcing (or combination of forcings) in the observations. As the observations may be influenced by multiple forcing factors, regression models with multiple predictors would provide a better fit. Therefore, we also conduct two-signal analyses in which X has two signal vectors. A two-signal analysis may allow the responses to different forcings to be separated, and thereby a clearer attribution to individual forcings to be obtained.
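 A minimal sketch of the regression step follows, assuming the ordinary (generalized) least squares form of optimal fingerprinting with a known noise covariance and Gaussian confidence intervals. A full analysis would use independent covariance estimates for optimization and testing and may use a total least squares fit. The function name `scaling_factors`, the synthetic covariance, and the signal columns are illustrative assumptions.

```python
import numpy as np

def scaling_factors(y, X, C):
    """Generalized least squares fit of y = X @ beta + noise.

    y: observations (n,); X: signals, one column per forcing (n, m);
    C: covariance of internal variability (n, n).
    Returns beta and 90% confidence half-widths, assuming Gaussian noise and known C.
    """
    Ci = np.linalg.inv(C)
    A = X.T @ Ci @ X
    beta = np.linalg.solve(A, X.T @ Ci @ y)
    half_width = 1.645 * np.sqrt(np.diag(np.linalg.inv(A)))
    return beta, half_width

rng = np.random.default_rng(2)
n = 30
idx = np.arange(n)
C = 0.05 * 0.5 ** np.abs(idx[:, None] - idx[None, :])   # synthetic AR(1)-like covariance
X = rng.normal(size=(n, 2))                              # columns: hypothetical "ANT", "NAT"
y = X @ np.array([1.0, 0.0]) + rng.multivariate_normal(np.zeros(n), C)

beta, hw = scaling_factors(y, X, C)                      # two-signal analysis
detected = beta - hw > 0                                 # interval lies above zero
beta_one, hw_one = scaling_factors(y, X[:, :1], C)       # one-signal analysis
```

 In this sketch a signal counts as detected when its confidence interval excludes zero, and the model response is consistent with the observations when the interval also includes one.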
3.2 Data Processing
 The detection analysis is conducted based on temperature evolution over both space and time. For both observations and model simulations, we compute regional time series of temperature anomalies for each of the extreme temperature indices over seven regions in China using area weighting. These regions were defined according to administrative boundaries and societal and geographical conditions and were used in China's National Assessment Report on Climate Change. Details of the regional geographical boundaries, marked as red rectangles in Figure S1, are given in the Supporting Information. Each region has its own weight represented by the fractional area of the region in China.
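 A regional average with area weighting might look like the sketch below. It assumes cosine-of-latitude weights on a regular 0.5° grid and a rectangular region; the box coordinates and the function name `regional_mean` are hypothetical and are not the region definitions used in this study.

```python
import numpy as np

def regional_mean(field, lats, lons, box):
    """Area-weighted mean of field (time, lat, lon) over box = (lat0, lat1, lon0, lon1).

    Cells holding NaN are excluded from both the weighted sum and the weight total.
    """
    lat0, lat1, lon0, lon1 = box
    in_lat = (lats >= lat0) & (lats <= lat1)
    in_lon = (lons >= lon0) & (lons <= lon1)
    sub = field[:, in_lat][:, :, in_lon]
    # Cell area on a regular lat-lon grid is proportional to cos(latitude)
    w = np.cos(np.deg2rad(lats[in_lat]))[:, None] * np.ones(in_lon.sum())
    valid = ~np.isnan(sub)
    return np.nansum(sub * w, axis=(1, 2)) / (w * valid).sum(axis=(1, 2))

lats = np.arange(20.25, 50, 0.5)
lons = np.arange(75.25, 135, 0.5)
field = np.full((4, lats.size, lons.size), 3.0)   # synthetic constant field
field[0, 25, 70] = np.nan                         # one missing cell inside the box
series = regional_mean(field, lats, lons, box=(30, 35, 105, 115))
```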
 Detection and attribution analysis usually needs to be conducted in a reduced space, either because too few samples are available to estimate the noise covariance or because the estimated noise covariance matrix is not of full rank. We use two approaches to reduce the dimension. (1) We use multiyear nonoverlapping means to reduce the time dimension. (2) We further reduce the dimension by projecting the space-time series onto the leading empirical orthogonal functions (EOFs) of the model-simulated natural variability. The use of shorter time averages may give a better chance of detecting the climate response to short-lived forcing such as volcanic activity. However, shorter time averages also require estimating a larger covariance matrix, which would contain larger estimation error given the limited data sample. We conduct our analysis on 1, 2, 5, and 10 year nonoverlapping mean series. We found that the detection results are not sensitive to the choice of averaging period, though the 10 year mean series does give more robust results in general. For this reason, we report results based on the analysis of 10 year mean series. The number of EOFs retained is determined based on a residual consistency check [Allen and Tett, 1999].
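 The two dimension-reduction steps can be sketched as follows. The sketch assumes that a trailing partial block of years is dropped when forming nonoverlapping means (how the 47 year record is divided is not specified in the text) and that the EOFs are computed from a matrix of noise samples by singular value decomposition. All arrays are synthetic, and the function names are illustrative.

```python
import numpy as np

def block_means(series, width):
    """Nonoverlapping means along the last axis; a trailing partial block is dropped."""
    n = (series.shape[-1] // width) * width
    return series[..., :n].reshape(*series.shape[:-1], n // width, width).mean(axis=-1)

def leading_eofs(noise, k):
    """Leading k EOFs (as rows) of noise samples with shape (sample, dimension)."""
    centered = noise - noise.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:k]

rng = np.random.default_rng(4)
regional = rng.normal(size=(7, 47))            # 7 regions x 47 years (synthetic)
y = block_means(regional, 10).ravel()          # 7 regions x 4 decades -> 28 values
noise = rng.normal(size=(56, y.size))          # synthetic internal-variability samples
eofs = leading_eofs(noise, k=20)
y_reduced = eofs @ y                           # observations in the truncated EOF space
```

 Signals and noise samples would be projected onto the same EOFs before the regression is carried out.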
 Two independent estimates of the covariance of internal variability, needed for the optimization and for the testing of the scaling factors, are obtained using the 1096 year preindustrial simulation and intraensemble differences from the six 5-member ensembles. In total, 56 samples are used for optimization, with an additional 56 for testing (see Supporting Information).
4.1 One-Signal Analyses
 Figure 3 displays the scaling factors and their 90% confidence intervals for ALL, ANT, and GHG for the four indices. These are based on 20 EOFs for warm extremes (TXx and TNx) and 15 EOFs for cold extremes (TXn and TNn) determined from residual consistency tests (see Figure S3 in the Supporting Information). Model-simulated variability is quite consistent with the residual for warm extremes. On the other hand, the model simulates smaller variability in cold extremes, which is comparable with the residual only when the first few EOFs are retained. However, when the model-simulated variability is doubled, it becomes comparable with the residual even when a larger number of EOFs is retained. Morak et al. also found smaller variability in model-simulated cold extremes in Asia. They suspected that this might be related to more complex changes in regional circulation and/or forcings. Therefore, although cold extremes have increased more during the past 47 years, the larger internal variability in wintertime temperatures means that the signal-to-noise ratio for cold extremes (details in the Supporting Information) is smaller than that for warm extremes.
 Overall, detection results are not sensitive to the number of EOFs retained (see Figure S4 in the Supporting Information). Both ALL and ANT are robustly detected in all four extreme indices across a wide range of EOF truncations. The magnitudes and the confidence intervals of the scaling factors for ALL and ANT are comparable. NAT is not detected in the one-signal analysis, even when the time averaging is done over 2 or 5 years (not shown), for which the volcanic signal is less smoothed out.
 It appears that the model may overestimate response in TXx as the scaling factors are much smaller than 1 and may underestimate changes in TNn as the scaling factors are larger than 1 in general. This is also the case in other studies [e.g., Zwiers et al., 2011; Morak et al., 2012]. GHG is detected in TNx with scaling factors similar to ALL and ANT and in TNn with a much larger scaling factor. The scaling factor of GHG for TNn is larger than those of ANT and ALL, suggesting that the model-simulated increase in TNn due to GHG may be smaller than that due to ANT or ALL. In order to understand this, we estimated linear trends in the TNn signal. It appears that CanESM2 simulated larger trends in ANT and ALL than in GHG over China during the years 1961–2007. However, TNn trends computed from a 47 year period within the 163 year historical simulation would in general be larger in GHG than in ANT or ALL. Therefore, the smaller trend in GHG simulation during the years 1961–2007 may be a reflection of uncertainty in GHG signal estimation in cold extremes. We expect much reduced uncertainty when multimodel ensembles are used in a future study.
 Changes in land cover can have substantial impacts on extreme temperatures [Avila et al., 2012] due to land-atmosphere interactions [Seneviratne et al., 2006]. Christidis et al. showed that land use changes may have a cooling effect on temperature extremes at the global scale, especially on extremely warm days. Here, we also found a detectable effect of LU in TXx (not shown), implying that the impact of land use changes on extremely warm days might be detectable even at the regional scale. However, the signal pattern of LU over China as simulated by CanESM2 is a weak warming (see Figure S2), which is in contrast to the results of Christidis et al. Further analysis with more detailed regional land use change data and multimodel simulations would be required to improve our understanding of LU's influence on extremely warm days in China.
4.2 Two-Signal Analyses
 Figure 4 displays 90% confidence regions and marginal confidence intervals for the two-signal detection analysis using ANT and NAT for the 10 year nonoverlapping mean of warm and cold extremes. The origin (0, 0) is outside the 90% confidence regions, suggesting that ANT and NAT are jointly detected at the 90% confidence level in all four types of extreme temperatures. The marginal 90% confidence intervals for ANT are all above 0, but the marginal 90% confidence intervals for NAT all include 0. These results suggest that ANT is clearly detected in all temperature extremes and that NAT is not detected in any of them. They also indicate that the effects of ANT on extreme temperatures can be separated from those of NAT. The magnitudes of the scaling factors and their 90% marginal confidence intervals for ANT are also comparable to those from the one-signal analyses. These two-signal analysis results are also very robust to different EOF truncations (Figure S5 in the Supporting Information). Note that two-signal joint analyses on 2 or 5 year nonoverlapping means do not detect the NAT signal either. Taken together, these results (ALL and ANT are clearly detectable but NAT is not in one-signal analyses; ANT and NAT are jointly detected and the effects of ANT can be separated from those of NAT in two-signal analyses) suggest that only anthropogenic forcing can explain the observed changes in China's extreme temperatures from 1961 to 2007. Furthermore, if we quantify the ANT contribution to the observed 47 year trends in extreme temperatures using model-simulated trends scaled by the scaling factor obtained from the ANT-NAT two-signal analysis, we estimate that ANT contributed 0.92°C (88%), 1.70°C (91%), 2.83°C (99%), and 4.44°C (92%) to the changes in TXx, TNx, TXn, and TNn, respectively, subject to the uncertainty inherited from the estimated scaling factors.
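 The arithmetic behind such an attributable-change estimate can be sketched as below, assuming that the contribution is the simulated trend multiplied by the scaling factor and accumulated over the 47 year period. The input values are hypothetical and are not the trends or scaling factors obtained in this study.

```python
def attributable_change(beta, simulated_trend, observed_trend, n_decades=4.7):
    """Scale a simulated trend by beta and express it as a share of the observed change.

    Trends are in degC per decade; returns (attributable change in degC, fraction).
    """
    attributable = beta * simulated_trend * n_decades
    observed = observed_trend * n_decades
    return attributable, attributable / observed

# Hypothetical inputs, not values from the paper
change, fraction = attributable_change(beta=0.5, simulated_trend=0.4, observed_trend=0.25)
low, _ = attributable_change(0.3, 0.4, 0.25)    # lower bound of the scaling factor
high, _ = attributable_change(0.7, 0.4, 0.25)   # upper bound of the scaling factor
```

 Applying the same scaling to the lower and upper bounds of the scaling factor gives the corresponding uncertainty range on the attributable change.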
 Two-signal analyses conducted on combinations of individual anthropogenic forcings suggest that the effect of GHG may be separated from those of other anthropogenic forcings, especially for warm extremes. Analysis for TXx conducted with GHG against ANT-GHG (Figure S6a) or LU against ANT-LU (Figure S6b) suggests that the effect of GHG may be separated from the combined effect of anthropogenic aerosol and land use change and that the effect of land use change may be separated from the combined effect of GHG and anthropogenic aerosol. Two-signal analysis for AA and GHG indicates that the effect of GHG may be separated from that of AA, though the separate detection of AA from GHG is not very robust. Two-signal analysis for GHG and LU failed to separate the GHG contribution from that of LU, suggesting that simulated responses to GHG and LU may be highly correlated. This can also be seen in simulated extreme temperature trends (Figure S2).
 The influence of AA on TXx deserves some discussion. We failed to detect AA. However, the influence of AA at the regional and local scales may not be negligible. Observations indicate a small cooling in TXx in parts of the central China region (Figure 1). This is in accord with the rapid development of the region since the 1980s, when AA emissions increased. However, the model simulated positive TXx anomalies under AA forcing for the 1980s and 1990s (Figure S7). It is unclear whether this positive TXx anomaly reflects multidecadal variability in the simulations or whether the global AA forcing field used in the simulation does not represent China's AA emissions in sufficient local/regional detail. Two-signal analyses on TNx, TNn, and TXn all suggest that GHG can be separated from other forcings, especially from AA (not shown). However, results for cold extremes TNn and TXn are less robust.
5 Conclusions and Discussion
 We used an optimal detection method to compare the observed China annual extreme temperatures for 1961–2007 with those simulated by the CanESM2 under different external forcings. Our analyses include one-signal analyses using climate responses to ALL, NAT, ANT, and individual anthropogenic forcings, and two-signal analyses using various combinations of responses to different forcings. We found evidence of human influence on China's extreme temperatures including the warmest daytime and nighttime temperatures and the coldest daytime and nighttime temperatures. We also found that the influence of anthropogenic forcing can be detected separately from that of natural forcings. These results clearly indicate that among known external forcings, only anthropogenic influence can explain observed changes in China's extreme temperatures. Furthermore, there are indications that the influence of greenhouse gases may be separated from that of other anthropogenic forcings and may be the dominant contributor to the observed increase in extreme temperatures in China. For warm extremes, GHG may have contributed approximately 89% to the observed warming in the warmest day temperature and 95% to the observed warming in the warmest night temperature, with the estimated 90% confidence intervals being (22%, 155%) and (72%, 119%), respectively. Land use change may have also contributed to the warmest day temperature increase. Future work that involves multimodel ensembles is needed to verify these first indications on singling out effects of individual anthropogenic forcings, especially on warm extremes. Our analyses have significantly advanced the regional analysis of Zwiers et al. [2011] for China and greatly improved our confidence in understanding causes of observed changes in extreme temperature in this country.
 We thank Gabi Hegerl and Nikolaos Christidis for their insightful comments and an anonymous reviewer for his helpful comments. Q.H.W. is supported by the National Basic Research Program of China (2012CB955303). This work benefited from a collaborative project under the JWG-XII between the Meteorological Service Canada and the China Meteorological Administration.
 Noah S. Diffenbaugh thanks Nikolaos Christidis and an anonymous reviewer for their assistance in evaluating this paper.