A one-year (1998) experimental Arctic reanalysis was produced using an experimental Arctic reanalysis system (EARS), which is based on the MM5 model and 3DVAR data assimilation, implemented in combination with an intermittent nudging scheme. EARS assimilates TOVS retrieval data, conventional surface observations, and upper-air sounding data, and is driven by the ERA-40 reanalysis. The domain covers a pan-Arctic region at a horizontal resolution of 30 km. The EARS reanalysis results, as well as the ERA-40 and NCEP/NCAR (NNRP) reanalyses, are verified against station observations. Comparisons show that the ERA-40 analysis is significantly better than NNRP for the metrics of root-mean-square error and bias. EARS performed significantly better than both ERA-40 and NNRP at lower levels; it produced especially good results for surface wind and upper-air humidity. For the surface temperature, dew point, relative humidity, sea level pressure, as well as upper-air variables, the yearly average of the EARS results lies in between those of the ERA-40 and NNRP, closer to those of ERA-40.
 The Arctic region has been a focal point in global climate change studies, many of which have revealed evidence that it plays an important role in global climate change [e.g., Arctic Climate Impact Assessment, 2005; Serreze et al., 1997, 2000]. However, the scarcity of conventional observations has hindered progress in understanding the Arctic.
 Global reanalysis projects, such as the European Centre for Medium-Range Weather Forecasts (ECMWF) 40-year reanalysis project (ERA-40) [Uppala et al., 2005] and the National Centers for Environmental Prediction (NCEP)/National Center for Atmospheric Research (NCAR) reanalysis project (NNRP) [Kalnay et al., 1996], have provided systematic, spatially and temporally continuous datasets covering the Arctic, which have enabled further progress in Arctic study. However, evaluations of these global reanalyses have revealed that they continue to erroneously depict some features found in observations. The error is more severe in the Arctic region where extreme conditions exist, e.g., low temperatures and less solar input, as well as the presence of fewer observations. The coarse resolution of the existing global reanalyses also causes coarse sampling of Arctic climate and degraded accuracy. A high-resolution reanalysis for Arctic climate study is therefore needed, and the National Oceanic and Atmospheric Administration (NOAA) has initiated a project to address this need. An experimental Arctic reanalysis system (EARS) has been set up and tested through case studies. A one-year experimental reanalysis has been produced for the year 1998 and compared with ERA-40 and NNRP data. We present the preliminary results of verification and comparison in this paper.
2. EARS System
 The EARS system is based on the Fifth-Generation Penn State University/NCAR Mesoscale Model (MM5) [Grell et al., 1994]. The three-dimensional variational (3DVAR) data assimilation package [Barker et al., 2004] developed at NCAR is implemented in combination with an intermittent nudging scheme [Stauffer and Seaman, 1990] to assimilate TOVS (TIROS Operational Vertical Sounder) retrieval data, including three-dimensional temperature and dew point, as well as total precipitable water [Francis and Schweiger, 1999]. Conventional surface observations and upper-air sounding data are also assimilated. The system is driven by the ERA-40 reanalysis, which was found to produce superior results to those generated by utilizing the NNRP reanalysis. A set of customized seasonal MM5 model background errors was produced from a one-year integration of the model over the same domain for use with the 3DVAR approach.
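 For orientation, the role of the background errors can be seen in the generic textbook form of the 3DVAR cost function. The notation below is assumed here for illustration and is not taken from the EARS configuration: x is the analysis state, x_b the background (the MM5 model state or ERA-40), B the background error covariance, y_o the observation vector, H the observation operator, and R the observation error covariance.

```latex
J(\mathbf{x}) =
  \tfrac{1}{2}\,(\mathbf{x}-\mathbf{x}_b)^{\mathrm{T}}\,\mathbf{B}^{-1}\,(\mathbf{x}-\mathbf{x}_b)
  + \tfrac{1}{2}\,\bigl(\mathbf{y}_o - H(\mathbf{x})\bigr)^{\mathrm{T}}\,\mathbf{R}^{-1}\,\bigl(\mathbf{y}_o - H(\mathbf{x})\bigr)
```

 In this form, the customized seasonal background errors would enter through B, which controls how far, and over what spatial scales, the analysis is allowed to depart from the background to fit the observations.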
 The EARS domain covers the pan-Arctic region at a horizontal resolution of 30 km (Figure 1) with 41 vertical terrain-following sigma levels and a model top pressure of 100 hPa. Beginning every 6 hours, the EARS system performs a 12-hr assimilation period followed by a 12-hr free-forecast period. During the assimilation period, all the observations are grouped in one-hour windows. The MM5 model is integrated to the observation time at the top of the hour and a 3DVAR analysis is performed using the model state as the background. The model is then restarted half an hour before this observation/assimilation time and nudged to the new 3DVAR analysis, after which it runs without nudging for an hour, and the cycle is repeated. At the 6-hourly points when the ERA-40 data are available, the 3DVAR analysis instead uses the ERA-40 reanalysis as the background. After 12 hours of assimilation, the model is run for a 12-hr free-forecast period without assimilation. This setup ensures that the model integration during the entire 24-hr period is continuous and is constrained by the large-scale, 1.125°-resolution fields from ERA-40. The model output at the end of the assimilation period, i.e., the 0-hr free forecast, thus becomes the EARS reanalysis.
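 The hourly cycling described above can be sketched as a simple schedule. This is only an illustrative reading of the text, not the EARS code: the names `AnalysisStep` and `analysis_plan` are hypothetical, and the sketch assumes that a cycle is initialized at offset 0, that hourly analyses follow at offsets 1 to 12 h, and that ERA-40 fields fall on clock hours divisible by 6.

```python
from dataclasses import dataclass

ERA40_INTERVAL_H = 6   # ERA-40 fields are available every 6 hours
ASSIM_HOURS = 12       # length of the assimilation period
NUDGE_LEAD_H = 0.5     # model restarts this long before each analysis time


@dataclass(frozen=True)
class AnalysisStep:
    offset_h: int            # analysis time, in hours since the cycle start
    background: str          # "ERA-40" or "MM5" (background for the 3DVAR analysis)
    restart_offset_h: float  # where the nudged rerun toward this analysis begins


def analysis_plan(cycle_start_hour: int) -> list[AnalysisStep]:
    """List the hourly 3DVAR analyses in one 12-hr assimilation period.

    Assumption (not stated in the paper): analyses occur at offsets 1..12 h.
    """
    steps = []
    for offset in range(1, ASSIM_HOURS + 1):
        clock_hour = (cycle_start_hour + offset) % 24
        on_era40_time = clock_hour % ERA40_INTERVAL_H == 0
        steps.append(
            AnalysisStep(
                offset_h=offset,
                background="ERA-40" if on_era40_time else "MM5",
                restart_offset_h=offset - NUDGE_LEAD_H,
            )
        )
    return steps


if __name__ == "__main__":
    for step in analysis_plan(cycle_start_hour=0):
        print(step)
```

 Under these assumptions, a cycle started at 00 UTC takes its background from the MM5 model state at every hour except offsets 6 and 12, where ERA-40 is used instead.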
 The EARS reanalysis, i.e., the 0-hr free forecast, as well as the 6-hr and 12-hr free-forecast results, are verified against station observations. ERA-40 and NNRP data that are available as either reanalyses (e.g., upper-air variables) or 6-hr forecasts (e.g., precipitation and other ERA-40 surface variables) are also verified in the same manner for comparisons with the EARS reanalysis. As they use different models and background errors, the three reanalysis systems assimilate observations at different spatial scales with observational error taken into account. The analysis errors are dependent on both model and observational errors; verifying the analyses against the same surface and sounding observation dataset within the Arctic domain, based on the same set of statistical metrics, thus provides intercomparisons that are as fair as possible. Additionally, the verification of precipitation gives robust comparative results since it is not an assimilated variable.
 On average, at each observation time there are about 1050 surface observations of temperature (T), dew point (Td), relative humidity (RH), wind components (U and V), sea level pressure (SLP), and 6-hr accumulated precipitation, and about 100 soundings of T, Td, RH, U, V, and geopotential height (Z) across the entire domain. All verifications presented here are based on 6-hourly domain-wide averages, which are then averaged in time. Precipitation values are verified using the equitable threat score (ETS) and categorical bias (BIAS) based on a contingency table [Wilks, 1995]. Both the ETS and BIAS scores measure the model accuracy based on the frequency of occurrence at or above a given threshold, with higher ETS indicating greater skill and a perfect forecast having a BIAS of 1. Five thresholds (0.2, 1.0, 2.5, 5.0, and 8.0 mm) are examined for the 6-hr accumulated precipitation. For all other variables, the root-mean-square error (RMSE) and bias are examined.
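 The verification metrics named above follow standard definitions, which can be sketched as follows. The function names and sample values are hypothetical, bias is taken here as forecast minus observation (so a warm bias is positive), and the order of averaging (domain-wide per observation time, then in time) is our reading of the text rather than the actual verification code.

```python
import math


def contingency(forecast, observed, threshold):
    """Count hits, misses, false alarms, and correct negatives at a threshold."""
    hits = misses = false_alarms = correct_negatives = 0
    for f, o in zip(forecast, observed):
        f_yes, o_yes = f >= threshold, o >= threshold
        if f_yes and o_yes:
            hits += 1
        elif o_yes:
            misses += 1
        elif f_yes:
            false_alarms += 1
        else:
            correct_negatives += 1
    return hits, misses, false_alarms, correct_negatives


def equitable_threat_score(hits, misses, false_alarms, correct_negatives):
    """ETS: threat score corrected for hits expected by chance."""
    n = hits + misses + false_alarms + correct_negatives
    hits_random = (hits + misses) * (hits + false_alarms) / n
    denom = hits + misses + false_alarms - hits_random
    return (hits - hits_random) / denom if denom else float("nan")


def categorical_bias(hits, misses, false_alarms):
    """BIAS: forecast event frequency over observed event frequency (1 is ideal)."""
    observed_yes = hits + misses
    return (hits + false_alarms) / observed_yes if observed_yes else float("nan")


def rmse_and_bias(forecast, observed):
    """Domain-wide RMSE and mean error for one observation time."""
    errors = [f - o for f, o in zip(forecast, observed)]
    n = len(errors)
    return math.sqrt(sum(e * e for e in errors) / n), sum(errors) / n


if __name__ == "__main__":
    # Made-up 6-hr precipitation amounts (mm) at eight stations.
    fcst = [0.0, 0.5, 1.2, 3.0, 6.0, 0.1, 2.6, 9.0]
    obs = [0.0, 0.3, 0.8, 2.7, 4.0, 1.5, 2.5, 8.5]
    for thr in (0.2, 1.0, 2.5, 5.0, 8.0):
        h, m, fa, cn = contingency(fcst, obs, thr)
        print(thr, equitable_threat_score(h, m, fa, cn), categorical_bias(h, m, fa))
```

 For the non-precipitation variables, `rmse_and_bias` would be applied at each 6-hourly observation time and the resulting values averaged over the year.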
 Verification of EARS analyses and forecasts of precipitation shows that the 6-hr forecasts are significantly better than the analyses, producing higher ETS and closer-to-one BIAS scores for all five thresholds (Figure 2). This implies that the EARS analysis is providing a better initial condition for the free forecast. EARS has very similar skill for 12-hr forecasts but overestimates the small- to mid-thresholds.
 Results from comparing 6-hr precipitation forecasts among EARS, ERA-40, and NNRP (Figure 2) indicate that EARS has greater skill than NNRP for all five thresholds. EARS has slightly higher skill in producing large (5.0–8.0 mm/6 hr) precipitation amounts than ERA-40, but is not as skillful for small amounts; however, EARS does show improved BIAS scores, implying that it does a better job at not overestimating small precipitation amounts. ERA-40 has greater skill than NNRP for all precipitation amounts, while NNRP has better BIAS scores for larger precipitation totals. A paired, two-sided Student's t-test [Hamill, 1999], used to compare adjacent EARS forecast intervals, as well as 6-hr forecasts of EARS with either ERA-40 or NNRP, indicates that the above differences, except for the difference in ETS at 5.0 mm between 6-hr forecasts of EARS and ERA-40, are significant.
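 The paired t statistic behind such a significance test can be sketched as below. The function name and the sample scores are hypothetical, and the choice of what constitutes a pair (for example, the scores of two systems at the same verification time) is an assumption made for illustration.

```python
import math
import statistics


def paired_t_statistic(scores_a, scores_b):
    """Return (t, degrees of freedom) for paired samples of equal length."""
    if len(scores_a) != len(scores_b):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    # Standard error of the mean difference (sample standard deviation, n - 1).
    std_err = statistics.stdev(diffs) / math.sqrt(n)
    return mean_diff / std_err, n - 1


if __name__ == "__main__":
    # Made-up scores for two systems at the same four verification times.
    system_a = [3.0, 5.0, 7.0, 9.0]
    system_b = [1.0, 2.0, 3.0, 4.0]
    t_value, dof = paired_t_statistic(system_a, system_b)
    print(f"t = {t_value:.3f} with {dof} degrees of freedom")
```

 For a two-sided test, the absolute value of t is compared with the Student's t distribution with n - 1 degrees of freedom to obtain the significance level.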
 Verification of surface variables is shown in Figure 3; all differences among the reanalyses, as well as among the 6-hr forecasts of EARS, ERA-40, and NNRP, are significant at a confidence level above 99.99%. For all available comparisons between ERA-40 and NNRP, i.e., analysis of SLP and 6-hr forecasts of other variables, ERA-40 is significantly better than NNRP for both RMSE and bias. EARS produces significantly better 6-hr forecasts of surface T, Td, U, V, and RH than NNRP, with only SLP (analysis only) having similar RMSE and a slightly larger absolute bias. EARS produced significantly better wind reanalyses than ERA-40. The yearly average of the EARS 6-hr forecast results for all variables lies in between those of the ERA-40 and NNRP.
 In addition to the yearly-averaged results, seasonal variation in the verification of surface variables has also been investigated. Based on the conclusions drawn above, both EARS and ERA-40 are superior to NNRP; due to the absence of NNRP reanalyses of the surface variables (except SLP), Figure 4 shows only the seasonal variation in RMSE for the EARS and ERA-40 reanalyses. EARS performed better in the warm season (May through September) than in the cold season (November through March) for all variables except RH. The strongest seasonal variation in RMSE is seen for T and Td; all other variables show similar seasonal variations in the two reanalyses, although ERA-40 exhibits less variation. The large warm bias of the EARS reanalysis shown in Figure 3 is primarily due to its poor performance in the cold season, which is principally due to the fact that the current EARS system focuses solely on the atmosphere. Land surface, ocean, and sea ice need further consideration in the model, and the corresponding data need to be assimilated.
Figure 5 shows the yearly-averaged RMSE and bias of T, U, and RH from EARS, ERA-40, and NNRP reanalyses at 12 pressure levels. Results of Td are similar to T; those of V are similar to U (figures not shown). Both the EARS and ERA-40 produced a less erroneous temperature analysis than did NNRP. The EARS performed better at lower levels, though worse at high levels, than ERA-40. All three reanalyses show similar vertical variations in RMSE, demonstrating better performance at mid-levels (850–400 hPa) and worse near the surface and at upper levels (300–200 hPa). However, EARS and ERA-40 have a warm bias at higher levels, where the NNRP has a cold bias. The EARS exhibits smaller error for all the variables at lower levels. While it has a small easterly bias, EARS has a much smaller RMSE than NNRP for U at higher levels, although it is slightly worse than ERA-40. The EARS produces better RH than both ERA-40 and NNRP at all levels, indicated by smaller RMSEs and biases.
5. Summary and Discussion
 Based on the preliminary evaluation of EARS, ERA-40, and NNRP reanalyses for 1998, we conclude the following: (1) ERA-40 is consistently and significantly better than NNRP over the Arctic; (2) EARS performance mostly lies in between that of ERA-40 and NNRP and closer to that of ERA-40; (3) EARS produced the best relative humidity analysis at all upper levels, produced better precipitation forecasts at large thresholds than ERA-40, and produced better wind and temperature analyses at lower levels than ERA-40; (4) EARS surface winds are the best overall, and the upper-level winds are comparable to ERA-40; both are better than NNRP winds; and (5) EARS reanalysis provides improved initial conditions for forecasting.
 The present study has implications for future data assimilation experiments using the Weather Research and Forecasting (WRF) model, which is becoming increasingly prominent in activities such as the Arctic System Reanalysis (see http://polarmet.mps.ohio-state.edu/PolarMet/ASR.html). Development of a polar version of WRF has been accelerated by previous work on a polar version of MM5 [Bromwich et al., 2001]. In the same manner, the results of data assimilation experiments with MM5 as described in the present paper will inform and guide future observing system experiments with the polar version of WRF. The across-model robustness of conclusions about impacts of various types of observations must be established if the results are to guide the design of future observing systems.
 This work was supported in part by the National Oceanic and Atmospheric Administration through grant NA17RJ1224. The TOVS retrieval data used for assimilation were provided by Axel Schweiger at the University of Washington. Computational support was provided by the Arctic Region Supercomputing Center.