We evaluate the effects of spatial resolution on the ability of a regional climate model to reproduce observed extreme precipitation for a region in the Southwestern United States. A total of 73 National Climatic Data Center observational sites spread throughout Arizona and New Mexico are compared with regional climate simulations at spatial resolutions of 50 km and 10 km for a 31 year period from 1980 to 2010. We analyze mean precipitation and 3-hourly and 24-hourly extreme precipitation events using WRF regional model simulations driven by NCEP-2 reanalysis. The mean climatological spatial structure of precipitation in the Southwest is well represented by the 10 km resolution but missing in the coarse (50 km resolution) simulation. However, the fine grid has a larger positive bias in mean summer precipitation than the coarse-resolution grid. The large overestimation in the simulation is in part due to scale-dependent deficiencies in the Kain-Fritsch convective parameterization scheme that generate excessive precipitation and induce a slow eastward propagation of the moist convective summer systems in the high-resolution simulation. Despite this overestimation in the mean, the 10 km simulation captures individual extreme summer precipitation events better than the 50 km simulation. In winter, however, the two simulations perform similarly in simulating extremes.
Observational and modeling studies suggest that extreme precipitation intensity and frequency are expected to increase under global warming conditions as mean water vapor content in the atmosphere increases [Groisman et al., 2005; Dai, 2006; Huntington, 2006; Trenberth et al., 2007; Dominguez et al., 2012]. However, the change in extremes may not be uniform across all regions [Trenberth et al., 2007] or proportional to the increase in mean global water vapor content [Pall et al., 2007]. Understanding the regional projected changes is crucial for future water management at the local and regional scales. With a changing climate, water managers may not be able to rely on past precipitation time series for future planning [Milly et al., 2008]. Global climate models (GCMs) have, in general, reasonably good skill in simulating large-scale time-averaged precipitation patterns, but they still lack the ability to accurately reproduce the spatiotemporal distribution of precipitation and small-scale regional features. One of the reasons for this shortcoming is the coarse spatial resolution of GCMs. Prior studies have shown better agreement of seasonal mean precipitation with observations when simulations were performed with a higher-resolution global model [Duffy et al., 2003; Iorio et al., 2004], but high-resolution long-term multimodel simulations with GCMs are currently not computationally feasible.
The process of downscaling GCMs to a few tens of kilometers for a particular region using regional climate models (RCMs) has proven to be useful in simulating precipitation due to the models' improved representation of local topographic variability and small-scale atmospheric processes [Kanamaru and Kanamitsu, 2007; Castro et al., 2007a; Trenberth et al., 2007]. The impact of model resolution on the simulation of monsoon precipitation over East Asia is reported by Gao et al. [2006] and Gao and Shi [2012]. For the southwest USA summer monsoon, Castro et al. [2007a, 2007b] have shown that an RCM at 35 km horizontal resolution can better simulate the diurnal cycle of convection and hence the interannual variability of summer precipitation. However, Gutowski et al. indicated that a grid spacing of 15 km or less is required to simulate subdaily scale precipitation intensity and phase over the central United States (US). Recently Yamada et al. found a realistic representation of the diurnal cycle of precipitation in the downscaled NCEP/NCAR reanalysis-1 using the Regional Spectral Model at a 10 km resolution during a 5 year simulation. The European ENSEMBLES project was designed to study regional weather and climate with horizontal resolution of 25 km and coarser. Using these models, Herrera et al. found that the models in general capture the main features of precipitation, but most of them overpredict annual precipitation. Heikkilä et al. used WRF as an RCM at 30 km and 10 km resolution in Norway and concluded that the higher resolution added value and should be considered for the simulation of extreme precipitation. Fernández et al. used MM5 at 45 km resolution over the Iberian Peninsula and emphasized the sensitivity of model results to the choice of physical parameterizations. Using nested grid WRF simulations at 27 km and 9 km horizontal resolution, Cardoso et al. reported a significant improvement in the representation of precipitation in the Iberian Peninsula at all time scales in the high-resolution simulation. Our focus in this work is the effect of model resolution on daily to subdaily scale precipitation events in the topographically complex region of the Southwestern US, particularly the states of Arizona (AZ) and New Mexico (NM).
Summertime precipitation in the Southwestern US is associated with monsoonal convective activity, characterized by intense precipitation events of short duration (usually events of less than 2 h with total accumulated precipitation of 10–100 mm depending on location). Monsoonal convection is the result of a complex interaction between synoptic to mesoscale atmospheric circulation features and local topographic variations. While the distribution of water vapor and atmospheric stability may be controlled by large-scale atmospheric motions, small-scale local topography plays an important role in the development of convection, which is highly localized in space and time. East of the Rockies, once deep convection develops in the mountainous terrain, it moves eastward and distributes precipitation across the Rockies' slopes and the Great Plains [Lee et al., 2007]. Winter precipitation, on the other hand, though influenced by local topography, is predominantly dependent on the synoptic-scale patterns. These patterns define how midlatitude storms from the Pacific Ocean affect the region. Extreme winter precipitation events are usually associated with quasi-stationary trough patterns off the coast of California and individual storms that move into the Southwest US [Sheppard et al., 2002]. As a result of the very different physical mechanisms responsible for summer and winter precipitation in the region, accurately representing precipitation in the Southwest US is a challenge for any numerical atmospheric model.
We perform a nested dynamical downscaling of NCEP-DOE Reanalysis-2 for a period of 32 years (1979–2010) over a region that encompasses the entire North American Regional Climate Change Assessment Program (NARCCAP) [Mearns et al., 2012] domain using the Weather Research and Forecasting (WRF) RCM. The coarse domain is simulated at a 50 km resolution, with four nests at a resolution of 10 km covering the western US for the continuous period 1979 to 2010. The NARCCAP domain is the same as the domain required for the Coordinated RCM Downscaling Experiment (CORDEX) North American domain. Our coarse-scale simulation with the 50 km grid is also intended to be part of the CORDEX database. In this work, we present only the results for the coarse-resolution (50 km) domain and the high-resolution (10 km) nest that covers AZ and NM. The downscaled precipitation is compared with hourly station observations throughout the two states to evaluate simulated seasonal mean, daily, and subdaily extreme precipitation accumulations. In the next sections, we provide a brief description of the observational record, model specifications, and the comparisons with observations.
2 Observational Data
We use station observations as well as gridded observations and analysis products to evaluate our simulations. These data sets have their own limitations, particularly for the representation of precipitation. Over complex topography, measurements at high elevations are either absent or highly underrepresented, which affects the analyzed product. Station observations suffer from limitations such as wind-driven under-catch [Adam and Lettenmaier, 2003]. Despite these limitations, these products provide rich data sets for model evaluation, and we use them without a detailed product evaluation, which is beyond the scope of this work.
Hourly precipitation records are obtained from the National Climatic Data Center (NCDC) for the period 1979 to 2010. All stations in AZ and NM are considered if their records between 1979 and 2010 are found in the data archive maintained by NCDC and if there are no missing years. Figure 1 shows the station locations and the topography of the region. A total of 73 stations (20 in AZ and 53 in NM) are found to have a consistent observational record for this period. Station observations are point measurements, whereas the model's grid point precipitation values represent the average over the grid area, so the observations need to be converted to an areal representation before they are compared with model output. We use the Areal Reduction Factor (ARF) approach [Asquith, 1999] to convert point precipitation into area-averaged precipitation. Based on empirical as well as analytical approaches, there are several different methods of ARF calculation [see, e.g., Leclerc and Schaake, 1972; Omolayo, 1993; Allen and DeGaetano, 2005; Desbordes et al., 1984; Myers and Zehr, 1980; Zehr and Myers, 1984; Asquith and Famiglietti, 2000; Rodriguez-Iturbe and Mejía, 1974; Sivapalan and Blöschl, 1998]. A detailed review of the requirements of ARF and the different approaches to calculating them is given by Svensson and Jones. As pointed out by Mishra et al., despite variations in the approaches to estimating ARF, the ARF values from various sources were similar, as reported in Asquith. Therefore, we decided to use the method of Leclerc and Schaake,
where ZE is the areal average effective precipitation, ZT is the total point precipitation, t is the temporal resolution of the observational data in hours (here 1 for hourly data), and A is the area in square kilometers (e.g., 100 for the 10 km RCM resolution). More details of this method can be found in Mishra et al. and references therein.
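As a rough numerical illustration of an areal reduction factor of this kind, the sketch below assumes the form in which the Leclerc and Schaake relation is commonly quoted, ZE/ZT = 1 − exp(−1.1 t^0.25) + exp(−1.1 t^0.25 − 0.01 A). The coefficients 1.1, 0.25, and 0.01, the function names, and the use of A in the units stated above are assumptions made for illustration and should be checked against Mishra et al. before any quantitative use.

```python
import math

def areal_reduction_factor(t_hours, area, a=1.1, b=0.25, c=0.01):
    """Return the ratio Z_E / Z_T for duration t_hours and averaging area `area`.

    The defaults a, b, c follow a commonly quoted form of the relation and
    are assumptions here, not values confirmed by the text.
    """
    decay = math.exp(-a * t_hours ** b)
    return 1.0 - decay + decay * math.exp(-c * area)

def point_to_areal(z_point, t_hours, area):
    """Scale a point accumulation Z_T to an area-averaged value Z_E."""
    return z_point * areal_reduction_factor(t_hours, area)

if __name__ == "__main__":
    # Hourly data (t = 1) on a 10 km grid cell (A = 100), as in the text's example.
    print(round(areal_reduction_factor(1, 100), 3))
```

Under these assumed coefficients the factor equals 1 for a vanishing area, decreases as the area grows, and moves closer to 1 for longer accumulation periods.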
To evaluate the simulated seasonal mean precipitation, we use gridded Parameter-elevation Regressions on Independent Slopes Model (PRISM) data [Daly et al., 2005]. PRISM data are available at the monthly scale and thus cannot be used to validate hourly-scale simulations. The PRISM model uses point measurements of precipitation to produce continuous gridded estimates of monthly mean precipitation at a resolution of 4 km. The gridded PRISM data are provided by the PRISM Climate Group, Oregon State University (http://prism.oregonstate.edu). As noted above, station observations are very limited in complex mountainous terrain; PRISM data take orographic effects into account. PRISM uses a digital elevation model to partition the topography into facets of constant slope. Grid cell precipitation is calculated by linear regression of station precipitation and station elevation against the elevation of the grid cell facets [see Daly et al., 2005, for more details].
The North American Regional Reanalysis (NARR) data from the National Centers for Environmental Prediction (NCEP) Environmental Modeling Center are used to evaluate the gridded diurnal cycle of the model simulations. NARR provides 3-hourly gridded precipitation by assimilating daily data from NCDC cooperative stations and hourly precipitation data from observing stations into its atmospheric model.
3 Simulation Specifications
The RCM WRF version 3.3 is used for downscaling the NCEP-DOE reanalysis. The coarse domain is a regular 155 by 130 Lambert projection horizontal grid with a spacing of 50 km (WRF50 henceforth). The finer nest grid is 116 by 101 with a 10 km horizontal resolution (WRF10) and is focused on the Southwestern states of AZ and NM. The 28 vertical levels range from the surface to 50 hPa. Vertical grid spacing follows the terrain and varies with height. We use the Yonsei University PBL scheme with an explicit entrainment process at the top of the PBL [Hong et al., 2006]. The WRF Single Moment-5 class scheme [Hong et al., 2004] is used for microphysics, and the RRTMG scheme with the McICA method of random cloud overlap [Mlawer et al., 1997] is used for radiation. The McICA scheme is recommended by WRF for long simulations. The Noah Land Surface Model (LSM) scheme with four layers of soil moisture and soil temperature is implemented for surface physics [Chen and Dudhia, 2001]. NCEP-DOE reanalysis data have only two layers of soil moisture, so we use four-layer ERA-Interim (ERAI) soil moisture and soil temperature climatology data as initial conditions for the LSM. The Kain-Fritsch (KF) scheme is used for cumulus parameterization [Kain, 2004]. All the physical parameterizations are the same for the two resolutions; the only differences are the geographical data and nudging (spectral nudging is used only in the coarser grid).
Nudging is the process by which a model field is corrected (relaxed) toward that of the driving model (in this case, the reanalysis). Previous studies [Miguez-Macho et al., 2004; Meinke et al., 2006] have shown larger discrepancies between observations and models, particularly for rainfall near coastlines, when grid nudging was implemented. In order to minimize this numerical effect and to preserve large-scale circulations and small-scale gradients and variability, spectral nudging is implemented in the coarser domain. This form of nudging was developed for RCMs [Von Storch et al., 2000] and has been used extensively in different studies [Heikkilä et al., 2011; Cha et al., 2011; Separovic et al., 2011; Wi et al., 2012]. Separovic et al. showed that spectral nudging significantly reduces the internal variability and noise in RCM simulations. The nudging technique uses a correction term that depends on the difference between the model values and those of the driving model. This correction term controls the strength of the nudging. Very strong nudging may destroy the mesoscale features generated by the fine-scale model, and the result will be predominantly controlled by, and tend toward, the driving model. If there is no nudging, the phase and amplitude of the model's internal variability tend to grow, particularly over large domains (like the 50 km one we use here) where the boundary forcing loses its impact in the interior of the domain. As a compromise between these two extremes, spectral nudging is performed only for the 50 km domain. There is no need for nudging in the 10 km domain because it is already inside the nudged domain. We use very weak nudging over the coarse domain to maintain mesoscale features generated by the model. Nudging strength in WRF is controlled by the nudging coefficient, and we reduced its value by an order of magnitude relative to the value recommended by WRF (0.00003 against 0.0003).
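To give a sense of what the weaker coefficient means, the correction can be read as a Newtonian relaxation term, G × (driving value − model value), whose e-folding time is 1/G. The sketch below is illustrative only: it assumes the coefficient is expressed in s⁻¹, uses a single scalar in place of a model field, ignores the model's own dynamics, and omits the spectral filtering that restricts the nudging to large scales. The function names are hypothetical.

```python
import math

def efolding_hours(coeff_per_s):
    """Time (hours) for a model-minus-driver difference to decay by a factor e."""
    return 1.0 / coeff_per_s / 3600.0

def relax(model_value, driver_value, coeff_per_s, seconds):
    """Exact solution of dX/dt = G * (X_driver - X) after `seconds`.

    Only the relaxation term is represented; no model physics is included.
    """
    weight = math.exp(-coeff_per_s * seconds)
    return driver_value + (model_value - driver_value) * weight

if __name__ == "__main__":
    for g in (0.0003, 0.00003):
        print(g, round(efolding_hours(g), 2), "h")
```

Under these assumptions, a coefficient of 0.0003 corresponds to a relaxation time of a little under 1 h, and 0.00003 to a little over 9 h, which leaves the model far more freedom to develop its own mesoscale features.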
 Precipitation in the model, particularly in summer, is controlled by the convective parameterization scheme. We use the KF parameterization scheme [Kain, 2004]. This scheme has three components, namely the trigger, updraft, and closure functions. The trigger function increases the temperature of the air parcel as a result of local perturbations and is a function of horizontal resolution. The value of temperature increment is higher for higher resolutions—which may lead to excess rainfall. The updraft property of clouds such as cloud radius controls the mixing. The closure scheme in KF assumes that the convection process consumes 90% of the convective available potential energy (CAPE).
Our objective is to evaluate how WRF-simulated extreme precipitation changes when the horizontal resolution is increased from 50 km (WRF50) to 10 km (WRF10) in the complex terrain of AZ and NM. We focus on the summer monsoon season (June–September) and the winter season (November–February) and analyze 32 years (1979–2010) of a continuous nested simulation. The year 1979 is excluded to give the model sufficient spin-up time. The following results are based on the simulated data for the 31 year period from 1980 to 2010.
4.1 Summer Precipitation
More than 50% of the precipitation in the Southwestern US falls during the North American Monsoon (NAM) season in the months of June, July, August, and September [Report to the Nation, 2004]. During the NAM, most of the moisture at upper levels originates in the Gulf of Mexico and enters the region from the east, while lower-level moisture from oceanic sources originates predominantly in the tropical Pacific Ocean and the Gulf of California [Schmitz and Mullen, 1996; Adams and Comrie, 1997; Higgins et al., 2003]. The diurnal cycle of convective rainfall is a key component of monsoonal precipitation and largely controls the transitions of rainfall and the surface energy budget [Castro et al., 2007a]. For this reason, before we analyze extreme events in the region, we evaluate the performance of the model in capturing the diurnal cycle.
4.1.1 Mean Summer Precipitation Diurnal Cycle
One of the main problems in most General Circulation Models (GCMs) is the misrepresentation of the diurnal cycle of summer rainfall, with an early and weak onset of convective development [Trenberth et al., 2003]. The summer diurnal cycle in the Southwestern US is very pronounced, with afternoon peaks associated with monsoon thunderstorms. Here we calculate the mean hourly precipitation for each hour of the day over the 31 years of simulation at the selected stations and compare it with similarly calculated observations. The WRF50 simulation has 3-hourly output, while WRF10 has hourly output. Every hour is included, whether it is wet or dry. This creates a 31 year climatological diurnal cycle for the WRF50 and WRF10 simulations. In the same way, we generate the mean diurnal precipitation from the hourly observations. The diurnal cycle for four stations (two from AZ and two from NM) is shown in Figure 2. The “Hours” axis in Figure 2 represents local time (UTC − 7 h) to emphasize the local diurnal cycle of precipitation. The simulated diurnal cycles for the stations in AZ, Tucson and Flagstaff, are in close agreement with the observed cycle, with peak intensity in the early afternoon. WRF10 is slightly closer to observations in intensity as well as in timing. In Tucson, WRF50 shows a slightly early onset of convection in comparison to WRF10 and observations. The simulated diurnal cycle in NM shows a development of storms similar to that in the observations, with a peak in the late afternoon and evening, a little later than in AZ. The simulated onset of convection is slightly early and significantly more intense, particularly in WRF10.
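The climatological diurnal cycle described above amounts to averaging all values that fall in the same local hour, with dry hours counted as zeros. The following is a minimal sketch under stated assumptions: the input is a list of (UTC hour, precipitation) pairs, the local offset of −7 h corresponds to Mountain Standard Time, and the function name is hypothetical.

```python
from collections import defaultdict

def mean_diurnal_cycle(records, utc_offset_hours=-7):
    """Mean precipitation per local hour of day.

    records: iterable of (utc_hour, precip_mm) pairs, one per output time,
    with dry times included as 0.0 so they lower the mean.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for utc_hour, precip in records:
        local_hour = (utc_hour + utc_offset_hours) % 24
        totals[local_hour] += precip
        counts[local_hour] += 1
    return {hour: totals[hour] / counts[hour] for hour in sorted(counts)}

if __name__ == "__main__":
    sample = [(20, 2.0), (20, 0.0), (21, 1.0), (3, 4.0)]
    print(mean_diurnal_cycle(sample))
```

For 3-hourly output such as WRF50, the same function yields values only at the local hours that are present in the input.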
We compare the spatial distribution of mean precipitation with the gridded PRISM data product, available as monthly accumulations for each month from 1980 to 2010. The simulated climatological mean precipitation fields from WRF10 and WRF50 are interpolated onto the PRISM grid. Figure 3 shows the mean precipitation values from PRISM, WRF10, and WRF50. The fine spatial structure in mean JJAS precipitation observed in the PRISM data, particularly in mountainous regions, is well represented by WRF10, whereas these structures are missing in WRF50. The increased detail comes at the cost of a strong positive precipitation bias over NM. The excessive overestimation in WRF10 (unlike in WRF50) is likely due to several factors, including excessive precipitable water and CAPE and deficiencies in the convective parameterization scheme. Figure 4 shows an example of a 4 day period between 5 July 2006 and 8 July 2006 in Albuquerque, NM, when WRF10 triggered repeated convection, whereas observations show only three distinct events (Figure 4a). Excessive precipitable water enters from the south and east and penetrates deep into the valley and the eastern plains in the 10 km simulation, while observations show less precipitable water over NM. The anomalously high precipitable water generates a 4 day period of high CAPE and very active convection that is not present in the observations. However, we do not find a positive bias in precipitable water in the 30 year climatology (not shown), so this points to additional factors accounting for the positive precipitation bias, including issues with the convective parameterization scheme in higher-resolution models.
 Deficiencies in the convective parameterization scheme can cause a slow propagation of mesoscale convective systems. Summer convection develops in the Rocky Mountains in the early afternoon, and then the systems move eastward toward the Great Plains [Carbone et al., 2002; Tian et al., 2005; Janowiak et al., 2007]. The speed of propagation of these systems determines the distribution of daily precipitation in the mountains, eastern slopes, and Great Plains. The WRF10 simulation generates excess precipitation over the mountainous region and eastern slopes and fails to propagate the systems further east. This stagnant nature of summer convective systems in our simulations is shown in Figure 5, where the climatological diurnal cycle of WRF10 precipitation is compared to that of NARR (which assimilates observations). Note that the purpose of the figure is to show the convective system core at different points in time for the simulation and observations, so WRF10 and NARR are plotted in different color scales. As shown in this figure, the core of the NARR convective system has almost left the domain by 2 am, whereas in WRF10 it is persistent until 5 am. Also, the development starts early in WRF10 at around 8 am, while in NARR it starts at around 11 am, and by this time WRF10 already developed a strong system. In addition to the problem with propagation speed, we clearly see strong overestimation of precipitation in the core of the system by the WRF10 simulation. Both the slow propagation of the system and the overestimation of precipitation are likely due to the convective parameterization in the simulations, as discussed later.
4.1.2 Extreme Summer Precipitation
In this section, we analyze 3 h and 24 h precipitation events over AZ and NM. We create accumulation time series using sliding 3 h and 24 h windows from the hourly observations. By doing this, we do not miss any extreme events that might otherwise be split between two nonoverlapping periods. This procedure is applied to the ARF-corrected observations at all 73 stations and to the corresponding simulated values interpolated to each station's latitude and longitude, for all JJAS seasons during the 31 year period.
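The benefit of a sliding window over fixed, nonoverlapping blocks can be seen in a small example. The sketch below is a generic running-sum routine, not the authors' code, and the sample values are invented for illustration.

```python
def sliding_accumulation(hourly, window):
    """Running sums over `window` consecutive hours, one per window position."""
    if window <= 0 or window > len(hourly):
        return []
    total = sum(hourly[:window])
    sums = [total]
    for i in range(window, len(hourly)):
        # Slide forward one hour: add the new hour, drop the oldest one.
        total += hourly[i] - hourly[i - window]
        sums.append(total)
    return sums

if __name__ == "__main__":
    hourly = [0, 0, 5, 6, 4, 0, 0, 0]  # invented hourly totals in mm
    sliding = sliding_accumulation(hourly, 3)
    blocks = [sum(hourly[i:i + 3]) for i in range(0, len(hourly) - 2, 3)]
    print(max(sliding), max(blocks))
```

In this example the storm straddles two fixed 3 h blocks, so the largest block total is 10 mm while the sliding window recovers the full 15 mm event.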
When comparing model simulations to observations, we find a clear variation of model bias with precipitation percentile (Figure 6). Both WRF50 and WRF10 capture extreme events more realistically than lower percentile events, which tend to be overestimated. However, extreme events in AZ are underestimated, particularly 3 h events. There is a higher positive bias in the WRF10 than in the WRF50 simulations and larger overestimations over NM than over AZ (Figure 6). The regional differences in extreme precipitation events are seen more clearly when mapping the 99th percentile of observations and the corresponding model biases for WRF10 and WRF50 (Figure 7). A bias between −0.08 and +0.08 is considered as model agreement with observations and is shown in green. Three-hour extreme events are more realistically captured by WRF10 than by WRF50, but 24 h events have a significant positive bias over NM, particularly in WRF10. This indicates that WRF10 has a tendency toward excessive precipitation during heavy storms, which accumulates over the 24 h period. This artificial precipitation bias is related to the convective parameterization scheme and model resolution.
To evaluate the ability of the simulations to capture individual extreme events at the observing stations, we identify observed extreme precipitation events in AZ and NM and the corresponding temporal evolution in WRF10 and WRF50. We calculate the hit rates of extreme events (storms) in WRF50 and WRF10 by examining whether the observed extreme events are present in the simulations with accurate intensity and timing. To include a larger number of events and improve the statistics, we relax the 99th percentile criterion used above and instead use the 90th percentile of the ARF-corrected observed precipitation (pr-90) as the reference level. We calculate pr-90 from the observational time series and record the dates on which it was reached or exceeded. In a similar way, we find the dates in the WRF50 and WRF10 series when the accumulation exceeds the reference value (pr-90). These dates are compared with the dates in the observational time series. We count a “hit” when an event date in the model matches an event date in the observations to within plus or minus one day (the choice of this window makes only a small difference in hit rates). The remaining dates in the observations, not found in the simulations, are misses. The percentage hit rate is calculated relative to the total number of events found in the observations. This process is repeated for all 73 observing stations. Note that there may be some extreme events (exceeding pr-90 on some dates) in the simulation time series that are not present in the observations, which we refer to as false alarms. A detailed analysis revealed that many of the false alarm dates of extreme precipitation did have precipitation in the observations (analysis not shown), but the observed precipitation intensity was much lower and did not qualify as an extreme event. This is consistent with the positive bias in the 10 km simulation.
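The date-matching step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the event dates (days on which the accumulation reaches pr-90) have already been extracted for the observations and for one simulation, and the function names and sample dates are hypothetical.

```python
from datetime import date, timedelta

def _near(day, others, tolerance_days):
    """True if any date in `others` lies within +/- tolerance_days of `day`."""
    return any(day + timedelta(days=k) in others
               for k in range(-tolerance_days, tolerance_days + 1))

def hit_rate(obs_event_dates, model_event_dates, tolerance_days=1):
    """Return (percent hits, hits, misses) relative to the observed events."""
    model = set(model_event_dates)
    hits = sum(1 for d in obs_event_dates if _near(d, model, tolerance_days))
    misses = len(obs_event_dates) - hits
    percent = 100.0 * hits / len(obs_event_dates) if obs_event_dates else 0.0
    return percent, hits, misses

def false_alarms(obs_event_dates, model_event_dates, tolerance_days=1):
    """Simulated event dates with no observed event within the tolerance."""
    obs = set(obs_event_dates)
    return [d for d in model_event_dates if not _near(d, obs, tolerance_days)]

if __name__ == "__main__":
    obs = [date(2006, 7, 5), date(2006, 7, 20), date(2006, 8, 1)]
    sim = [date(2006, 7, 6), date(2006, 8, 10)]
    print(hit_rate(obs, sim), false_alarms(obs, sim))
```

Normalizing by the number of observed events, as in the text, means that false alarms do not lower the hit rate; they are tracked separately.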
Figure 8 shows maps of percentage hit rates for 3-hourly and 24-hourly accumulations. In general, we find that WRF10 has a greater ability to capture individual 3 h and 24 h storm events than WRF50. Simulation of mesoscale storms in complex mountainous terrain at coarse resolution is difficult because of the inadequate representation of the surface boundary and limitations in the convective parameterization [Dai et al., 1999; Zhang, 2003; Liang et al., 2004]. On the other hand, WRF10 is able to resolve storms individually but presents a time shift of 1 to 3 h. Missed events and some false alarms are also problems in both the WRF10 and WRF50 simulations. We also found a very strong regional difference in hit rate statistics, with more hits in NM than in AZ (Figure 8). WRF10 consistently captured more events than WRF50 for both 3-hourly and 24-hourly accumulations. However, there is still a problem in resolving extreme events in AZ. The storm hit rates remain well below 50%, except at the Flagstaff airport, where about 56% of 24-hourly events were captured by WRF10. The 3-hourly event hit rates in AZ are higher in WRF10 than in WRF50 but remain well below 50%. In NM, on the other hand, the hit rates from WRF10 for 24-hourly extremes were above 50% for almost all stations, with a maximum hit rate of 83% out of a total of 167 events at Dilia. WRF10 even captures 70% of 3-hourly events at this station out of a total of 120 events. The number of events and hit rates for individual stations, along with station coordinates, are shown in Tables 1 and 2 for stations in AZ and NM, respectively.
Table 1. 24-Hourly and 3-Hourly Extremes Hit Rates (%) Calculated From 50 km (WRF50) and 10 km (WRF10) Simulations for Stations in AZ for Summer (JJAS) and Winter (NDJF)a
aAlso included are the total number of events (Events) during 1980–2010 in all cases. Hit rates of more than 50% are indicated as bold.
4.2 Winter Precipitation
 During the winter, strong westerly winds deliver moisture from the Pacific Ocean into the western US with strong storm-driven advection [Hirschboeck, 1991]. Winter precipitation in the Southwest originates from occasional cyclonic storms that attain a large size and/or shift southward in their flow patterns [Sheppard et al., 2002].
4.2.1 Mean Winter Precipitation
Winter precipitation in the region is primarily focused on the high-elevation terrain of the Mogollon Rim in AZ and the Rockies in NM (Figure 9). Both WRF50 and WRF10 realistically capture this orographic precipitation, with greater detail in WRF10. There is still overestimation by WRF10 at the highest elevations, but it is not nearly as pronounced as the overestimation in the summer season (Figure 3). Orographically induced overestimation of precipitation in models has also been reported in other studies [Giorgi and Marinucci, 1996; Gao and Giorgi, 2008]. In line with this study, Gao and Giorgi also found that their 20 km resolution simulations have a larger precipitation bias than their 50 km resolution runs. They partially attributed this bias to the absence of any orographic and gauge under-catch correction in the analyzed product they used for the evaluation, which is also the case for our study area.
Median to extreme winter precipitation is overestimated throughout the region by both the coarse- and high-resolution simulations (Figure 10). Much like in the summer season simulations, extreme precipitation has a smaller bias than lower percentile events. While the biases are slightly larger over NM than over AZ, the regional difference is not as strong as during the summer. Extreme (99th percentile) precipitation is realistically captured throughout the region, although 24 h events are consistently overestimated over NM (Figure 11).
As expected, the model does a better job of capturing individual large-scale winter precipitation events than summer convective events. Hit rates are higher over NM than over AZ and increase significantly for 24 h events compared to 3 h events (Figure 12). We find that 24 h hit rates range from about 40% to close to 100% in much of NM. In the winter season, WRF10 and WRF50 perform very similarly, and we cannot find consistent improvements in the higher-resolution simulation. Actual values of hit rates and the number of extreme events for winter are shown in Tables 1 and 2 for the stations in AZ and NM, respectively.
5 Discussion and Conclusion
 The focus of this study was to evaluate the effect of increased RCM spatial resolution in the simulation of mean seasonal, daily, and subdaily scale precipitation in the Southwestern US. To do this, we perform a 32 year (1979–2010) continuous nested downscaling of NCEP-2 using the WRF RCM at 50 km and 10 km resolutions and compare with station observations and gridded data.
As expected, the fine spatial structure of precipitation is more clearly simulated at the 10 km resolution (WRF10) than at 50 km (WRF50). However, the increase in resolution introduces excess precipitation, with significant overestimation of mean precipitation over NM during the summer season, which is reflected in the amplitude of the mean diurnal cycle. We find that several factors contribute to the overestimation, including excessive precipitable water, excessive CAPE, and deficiencies in the convective parameterization scheme. We analyze one particular event over NM in which excessive precipitable water in the WRF simulations penetrates deep into the valleys, generating anomalously high CAPE and triggering repeated convection that was not present in the observations. However, we find no positive bias in the 30 year climatology of precipitable water in the simulations, so we conclude that other factors are also contributing to the positive precipitation bias.
A similar overestimation of mean precipitation when increasing the spatial resolution of the numerical model has been found in previous studies [Lee et al., 2007; Yamada et al., 2012]. Lee et al. find that higher-resolution GCM simulations tend to lock summer precipitation to the Rocky Mountain eastern slope region and consequently have a dry bias further downstream in the Great Plains and a strong wet bias over the southwestern US mountains. This bias becomes worse with increasing model resolution in high terrain. Although the mechanism is unclear, they suspected that this strong locking of convective rainfall to the mountains is related to convective instability of the second kind induced by strong convection. The compensating strong subsidence suppresses convection around the local maximum at the top of the mountains. A strong precipitation bias in the presence of topography in higher-resolution simulations is also reported by Gao and Giorgi. In our simulations, the positive bias is also linked in part to the propagation of the mesoscale convective systems that develop in the mountainous terrain. These biases are linked to problems with the convective parameterization scheme in the high-resolution simulation.
 As stated before, we use the KF convective parameterization scheme. The mixing rate in the KF scheme is inversely proportional to the subgrid-scale updraft cloud radius. In the old KF scheme [Kain and Fritsch, 1990], the radius of the cloud remained constant. In the new KF scheme [Kain, 2004] used in this study, this constraint is relaxed, but the cloud radius remains bounded between 1000 and 2000 m depending on the vertical velocity. This bound on the cloud radius limits the mixing and causes excessively developed convection, particularly in finer grids, where there might have been no convection at all had the mixing process been less constrained. Note that mixing dilutes the atmosphere and plays an important role in convection development, as stronger mixing creates weaker convection. For coarser resolutions, the subgrid-scale minimum cloud radius of 1000 m may be suitable, but for higher resolutions it might inhibit the mixing process in smaller clouds and therefore generate stronger convection. Narita [2010] modified the KF scheme for higher resolution by modifying the cloud radius calculated by the scheme. This modification reduces excessive orographic precipitation in the Japan Meteorological Agency mesoscale model (MSM) [Narita, 2010; Moriyasu and Narita, 2011]. In another effort to improve the KF parameterization, Troung et al. introduce a new diagnostic equation to compute updraft velocity, a new closure assumption, and a new trigger function within the Regional Atmospheric Modeling System. The new scheme was shown to simulate precipitation much better than the original KF scheme when tested on an extreme rainfall event in the mountainous provinces of central Vietnam, mainly because it provided a better representation of the stratiform rainfall associated with a mesoscale convective system. This modified scheme is currently being incorporated into WRF at the University of Arizona.
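 The dependence of mixing on cloud radius described above can be illustrated with a minimal sketch. It is not the WRF implementation. The function names, the vertical velocity thresholds (0 and 10 cm/s), and the 0.03 rate constant are assumptions based on our reading of Kain [2004]; only the 1000 to 2000 m bounds and the inverse proportionality between mixing rate and radius are taken from the text.

```python
def kf_cloud_radius(w_cm_s: float) -> float:
    """Updraft radius in meters, bounded between 1000 and 2000 m.

    w_cm_s is a grid-scale vertical velocity in cm/s. The 0 and 10 cm/s
    thresholds are assumed here for illustration.
    """
    if w_cm_s < 0.0:
        return 1000.0
    if w_cm_s > 10.0:
        return 2000.0
    return 1000.0 + 1000.0 * w_cm_s / 10.0


def mixing_rate(m_u0: float, dp_pa: float, radius_m: float) -> float:
    """Environmental mass mixed into the updraft across one model layer.

    m_u0 is the updraft mass flux at cloud base, dp_pa the layer thickness
    in Pa (taken as positive), and 0.03 an assumed rate constant. The key
    property is that the result scales as 1 / radius_m.
    """
    return m_u0 * 0.03 * dp_pa / radius_m


if __name__ == "__main__":
    # The scheme never allows a radius below 1000 m, whatever the grid spacing.
    bounded = mixing_rate(1.0, 1000.0, kf_cloud_radius(-5.0))
    # A hypothetical 500 m cloud, which the bound excludes, would mix twice as much.
    unbounded = mixing_rate(1.0, 1000.0, 500.0)
    print(f"ratio of mixing rates: {unbounded / bounded:.1f}")
```

 In this sketch, a cloud narrower than the lower bound would entrain more environmental air per layer than the scheme permits, which is the sense in which the bound limits mixing at fine grid spacing.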
 Interestingly, WRF10 and WRF50 are better able to capture extreme precipitation than lower percentile events for both regions. This is perhaps due to the different physical mechanisms driving the variability of mean and extreme precipitation. Mean precipitation is constrained, not by the availability of moisture, but by the availability of energy (see Allen and Ingram [2002] for a detailed analysis). On the other hand, the intensity of individual extreme precipitation events is more closely related to the available moisture in the atmosphere because, in these heaviest events, most of the moisture is rained out of the column [Trenberth, 1999; Allen and Ingram, 2002].
 We find that increasing the horizontal resolution from 50 to 10 km improves the ability of the model to capture individual extreme (above 90th percentile) summer storm events of 24 h and 3 h duration. The regional discrepancy in hit rates is consistent in both simulations, with more success in capturing individual storms in NM than in AZ.
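 One way such a hit rate could be computed is sketched below. The definition used here is an assumption for illustration and may differ from the one used in our analysis: an observed event is extreme if it exceeds the 90th percentile of observed wet periods, and it counts as a hit if the simulated value for the same period exceeds the 90th percentile of simulated wet periods. The function names and the wet-period threshold are hypothetical.

```python
from typing import Sequence


def percentile(values: Sequence[float], q: float) -> float:
    """Percentile (q in 0-100) of a non-empty sequence, by linear interpolation."""
    data = sorted(values)
    pos = (len(data) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)


def extreme_hit_rate(obs: Sequence[float], sim: Sequence[float],
                     q: float = 90.0, wet_threshold: float = 0.0) -> float:
    """Percentage of observed extreme periods that are also extreme in the simulation.

    obs and sim are time-aligned accumulations (for example 3 h or 24 h totals).
    Each series is compared against its own percentile, so a mean bias in the
    simulation does not by itself change the score.
    """
    if len(obs) != len(sim):
        raise ValueError("obs and sim must cover the same periods")
    obs_wet = [v for v in obs if v > wet_threshold]
    sim_wet = [v for v in sim if v > wet_threshold]
    if not obs_wet or not sim_wet:
        return 0.0
    obs_cut = percentile(obs_wet, q)
    sim_cut = percentile(sim_wet, q)
    events = [i for i, v in enumerate(obs) if v > obs_cut]
    if not events:
        return 0.0
    hits = sum(1 for i in events if sim[i] > sim_cut)
    return 100.0 * hits / len(events)


if __name__ == "__main__":
    observed = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 50]
    simulated = [50, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0]  # extreme placed in the wrong period
    print(extreme_hit_rate(observed, observed))
    print(extreme_hit_rate(observed, simulated))
```

 Because each series is ranked against its own distribution, this score rewards correct timing of extremes rather than correct magnitude.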
 The model's overall performance improves during the winter, particularly in terms of mean precipitation and the percentage hit rates in the region. However, there is still consistent overestimation of precipitation at both resolutions. Previous studies have found that 50 km RCM simulations consistently overestimate precipitation in the Western US [Wang et al., 2009; Dominguez et al., 2012; Mearns et al., 2012]. In this work, we find that increasing the horizontal resolution does not alleviate the problem. In fact, both WRF10 and WRF50 perform similarly in terms of 99th percentile precipitation bias and hit-rate statistics. This indicates that there is not much value added by increased spatial resolution during the winter months.
 We acknowledge the Department of Energy (DOE) for its support through funding the project (DE-SC0001172). Support for this study has also been provided in part by the National Science Foundation (NSF) Grant 1038938. The authors thank the National Climate Data Center (NCDC) for providing hourly precipitation data for the monitoring sites used in this work. We also thank three anonymous reviewers for evaluating the manuscript and providing constructive feedback.