Using Large Ensembles to Examine Historical and Projected Changes in Record‐Breaking Summertime Temperatures Over the Contiguous United States

The frequency and intensity of heat extremes over the United States have increased since the mid-20th century and are projected to increase with additional anthropogenic greenhouse gas forcing. We define heat extremes as summertime (June–August) daily maximum 2 m temperatures that exceed historical records. We examine characteristics of historical and near-future heat extremes using observations together with historical simulations and future projections from 100 ensemble members drawn from three coupled global climate model large ensembles. We find that the large ensembles capture the trend and variability of heat extremes over the period 2006–2020 relative to the 1991–2005 climatology but overestimate the frequency at which the heat extremes occur. In future warming scenarios, heat extremes continue to increase over the next 30 years, with high amplitude records in the Northwest and Central US. After 2050, we find a spread in the frequency of heat extremes that depends on the emissions scenario, with a scenario of high emissions until mid-century followed by strong mitigation showing a decrease in heat extremes by the end of the century. Although the frequency of future heat extremes is likely overestimated in the large ensembles, they are still a powerful tool for researching extreme temperatures in the climate system.

Previous studies have used various definitions of heat extremes, including record breaking temperatures (Abatzoglou & Barbero, 2014; Fischer et al., 2021; Rahmstorf & Coumou, 2011), the ratio of record high temperatures to record low temperatures (Meehl et al., 2016), the number of days exceeding the NWS heat warning thresholds (Dahl et al., 2019), and the number of heat waves (Peterson et al., 2013). Despite the many definitions of heat extremes, there is consensus that heat extremes have increased, although not linearly over time and with regional differences across the CONUS. The 1930s had the highest number of heat waves in history (Peterson et al., 2013) and half of the absolute highest temperature records (Abatzoglou & Barbero, 2014) due to the unprecedented summer heat exacerbated by land-surface feedbacks during the Dust Bowl. Since the mid-1960s, there has been an increase in heat waves and the warmest daily temperature of the year, and since the 1970s, an increase in the number of record high temperatures (Vose et al., 2017; Wuebbles et al., 2017). The annual average temperature is increasing most in the Western U.S. and increasing least in the Southeast U.S. This is also seen when analyzing the warmest daily temperature of the year. Compared to the 1901-1960 average, the 1986-2016 average has increased in the west, but decreased in almost all locations east of the Rockies (Vose et al., 2017; Wuebbles et al., 2017). The southeastern U.S. has experienced summertime cooling known as the "warming hole" (Partridge et al., 2018; Rogers, 2013), although there is recent evidence that the "warming hole" has been reduced (Hu et al., 2020; Meehl et al., 2015).
Of previous studies that have investigated record-breaking temperatures, Rahmstorf and Coumou (2011) find that the effect of a long-term warming trend on record-breaking is dependent on the ratio of the warming trend to the short-term standard deviation. They find that in data with large variability compared to the trend, climate-related increases in extremes will be relatively small; for example, daily data from a single location may not yet show a major change in temperature due to external forcing. However, they note that for extremes that exceed a predefined threshold, the dependence on record-breaks is highly non-linear and attribute most of the recent extremes in monthly and annual temperature data to increasing global warming. Fischer et al. (2021) examine record-shattering events using large ensembles and conclude that week-long record-shattering heat extremes are two to seven times more probable in 2021-2050 than in the last 30 years in high emissions scenarios, with lower probabilities in lower emissions scenarios. They find that the large ensembles simulate events that shatter previous records and that are much larger than recent heat extremes; however, they find that the driving physical mechanisms behind the events are consistent with real-world examples.
This study examines the historical and projected changes in record-breaking summertime (June-August) daily maximum 2 m temperature over the CONUS using large ensembles of climate model simulations, including the recently developed Geophysical Fluid Dynamics Lab (GFDL) Seamless system for Prediction and Earth system Research Large Ensemble (SPEAR-LE). We use daily maximum record-breaking temperatures as a metric of heat extremes to investigate the historical and future properties of the warmest annual temperatures. Other metrics of heat extremes may be more suitable to study impacts, such as compound temperature and humidity metrics (Fischer & Schär, 2010), heat extremes that occur over a prolonged period of time (Meehl & Tebaldi, 2004), or extreme daily minimum temperature (Karl & Knight, 1997), but we believe the daily record-breaking metric will give insight as to how the most extreme future summer temperatures will evolve. We find that SPEAR-LE and other large ensembles used in this study overestimate the historical heat extremes over the CONUS. With future projected warming, summertime heat extremes increase over the next 30 years, and the frequency of heat extremes toward the end of the century is dependent on the emissions scenario and subsequent warming.

Large Ensemble Data Sets
In this study, we use output from the 30-member initial-condition large ensemble from the SPEAR model recently developed at GFDL (Delworth et al., 2020). SPEAR shares many components with GFDL CM4 (Held et al., 2019) but with configuration and physical parameterization choices geared toward climate prediction and projection on seasonal to decadal time scales for use in real-time seasonal (Kirtman et al., 2014) and decadal predictions (Yang et al., 2021). SPEAR has demonstrated skill in seasonal prediction of North American temperature, including summertime heat extremes (Jia et al., 2022), wintertime cold extremes (Jia et al., 2023), and wintertime temperature swings (Yang et al., 2022), and has been used in climate change studies of various systems (Delworth et al., 2022; Murakami et al., 2020; Pascale et al., 2020). We use the 30-member large ensemble output from the medium resolution (SPEAR_MED) model with 50 km horizontal global atmosphere/land resolution (AM4-LM4, Zhao et al., 2018a, 2018b) and an approximate 1° horizontal resolution for ocean and ice components (OM4, Adcroft et al., 2019). The SPEAR large ensemble simulations extend from year 1921 to 2100 and include runs forced with natural forcings over the period 1921-2100, historical forcings over the period 1921-2014, and future projections from 2015 to 2100 with SSP5-8.5, SSP2-4.5, and SSP5-3.4OS radiative forcings (O'Neill et al., 2016; van Vuuren et al., 2014). Output from the SPEAR large ensemble is available at https://www.gfdl.noaa.gov/spear_large_ensembles/.
We compare the results from SPEAR_MED with two other large ensembles, one from the GFDL Forecast-oriented Low Ocean Resolution (FLOR) model (Vecchi et al., 2014) and the other from the National Center for Atmospheric Research Community Earth System Model version 1 (CESM) (Kay et al., 2015). The FLOR large ensemble data set contains 30 ensemble members from year 1921-2100, with historical forcing from 1921 to 2005 and RCP8.5 forcing (Riahi et al., 2011) from 2006 to 2100. FLOR has horizontal atmosphere/land and ocean/ice resolutions similar to those of SPEAR_MED but uses the atmosphere/land components from the GFDL CM2.5 model (Delworth et al., 2012) and the ocean/ice components from the GFDL CM2.1 model (Delworth et al., 2006). The CESM large ensemble data set contains 40 ensemble members from year 1920-2100, with historical forcings from 1920 to 2005 and RCP8.5 forcings from 2006 to 2100 (Deser et al., 2020). The horizontal resolution is 1° for all model components. The CESM large ensemble has been used in many heat extremes studies, including investigations of land-atmosphere interactions (Merrifield et al., 2017), the impact of aerosols on heat extremes (Xu et al., 2018), and the impacts of mitigation on heat extremes (Oleson et al., 2018; Tebaldi & Wehner, 2018).

Observational Data
The Global Historical Climatology Network (GHCN) gridded data is used in this study. The gridded data is available over the CONUS for the period 1951-present and has a 1/24° resolution. The GHCN data has been regridded to the same horizontal grid as SPEAR/FLOR before analysis. Data can be accessed and downloaded from https://www.ncei.noaa.gov/pub/data/daily-grids/beta/by-month/.

Methods
We use the daily maximum 2 m temperature from each data set described above to examine temperature extremes during the summer season, defined as June 1 to August 31 (JJA). Since we want to identify records only over land, we exclude any atmosphere grid cells that have associated land areas less than 80% and lake areas that are greater than 30%. After filtering out grid points with insufficient land area, we establish record high 2 m maximum temperatures (t_max) for each day in JJA by computing the maximum daily t_max for each grid point over the CONUS from 1991 to 2005. We then compute records for 2006-2020 as daily t_max values that are greater than the 1991-2005 historical records at each grid point. Each time a new record is set, the previous record is replaced by the new value to simulate how record-setting occurs in practice. For the large ensemble data sets, each ensemble member is treated as a separate realization, so that each ensemble member has a separate historical record and separate results for 2006-2020. This gives 100 realizations across three large ensemble data sets.
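The record-setting procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in the study: the function name `running_records` and the array layout (year, calendar day, then any grid dimensions) are hypothetical, and the inputs are assumed to be already restricted to JJA and to land grid points.

```python
import numpy as np

def running_records(baseline, later):
    """Flag record-breaking days when records are updated as they fall.

    baseline: t_max for the reference years, shape (n_years_ref, n_days, ...).
    later:    t_max for the evaluation years, shape (n_years_eval, n_days, ...).
    Both layouts are assumptions made for this sketch.
    Returns a boolean array (True where a record is broken) and the record
    amplitude (new record minus previous record, NaN where no record is set).
    """
    # Highest value seen for each calendar day and grid point in the baseline.
    record = baseline.max(axis=0)
    is_record = np.zeros(later.shape, dtype=bool)
    amplitude = np.full(later.shape, np.nan)
    for y in range(later.shape[0]):
        broke = later[y] > record
        is_record[y] = broke
        amplitude[y][broke] = (later[y] - record)[broke]
        # Replace the old record so that later years must beat the new value.
        record = np.where(broke, later[y], record)
    return is_record, amplitude
```

For a large ensemble, the same function would be applied to each member separately, so that every member keeps its own record history.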
To examine how summertime heat extremes will change in the near future, we use model data from the high emissions SSP5-8.5 scenario for SPEAR and RCP8.5 scenario for FLOR and CESM for years 2021-2050. We first recalibrate the historical records by computing the maximum daily t_max values for each grid point in JJA from 1991 to 2020. We then calculate new records for 2021-2050 as the daily t_max values that exceed the 1991-2020 historical record values, without replacing the previous record. For this near future period, the historical records remain fixed so that we can examine how heat extremes will change relative to recent historical heat extremes. As previously done, each ensemble member was treated as a separate realization to give 100 realizations for the future period. We then extend the analysis to 2100 using multiple emission scenarios to examine the impact on heat extremes, including the impact of mitigation on extremes.
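For contrast with the updating records used for 2006-2020, the fixed-record comparison for the future period can be sketched as below. The function names `fixed_record_exceedance` and `exceedance_probability` and the (year, day, ...) array layout are assumptions made for illustration only.

```python
import numpy as np

def fixed_record_exceedance(baseline, future):
    """Flag days whose t_max exceeds a fixed historical record.

    baseline: shape (n_years_ref, n_days, ...), e.g. JJA days of the reference years.
    future:   shape (n_years_fut, n_days, ...), e.g. JJA days of the future years.
    The record is computed once and never updated.
    """
    record = baseline.max(axis=0)   # per calendar day and grid point
    exceeds = future > record       # broadcasts over the year axis
    return exceeds, record

def exceedance_probability(exceeds, axes=(0, 1)):
    """Fraction of (year, day) samples above the fixed record."""
    return exceeds.mean(axis=axes)
```

Because the threshold is never raised, a warm year late in the period still counts every day above the historical record, which is why this count can only match or exceed the count obtained with updating records on the same data.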

Historical Heat Extremes
We first evaluate the ability of the models to simulate historical summertime daily t_max records over the CONUS by analyzing the ability of the large ensembles to capture the observed variability and trend of records. Figure 1 illustrates the percent land area where daily records are broken in JJA from 2006 to 2020. The 2006-2020 average large ensemble means are higher than the observed mean of 5.43%, ranging from 5.58% in FLOR to 7.47% in SPEAR. Most of the observed variability falls within the ensemble spread of all models, except for five days in June 2011 and June-July 2012 that lie above the daily ensemble spread of all 100 ensemble members. The observed values for these days range from 27% to 32%, which lie well within the maximum values simulated over the historical period for each model. There is a large daily ensemble spread for each model, where the maximum values for each model are above 50% and minimum values for each model are 0%.
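One way to turn a map of record flags into a daily percent land area is sketched below. The text does not state how grid cells are weighted, so the cosine-of-latitude area weighting is an assumption, as are the function name `percent_record_area` and the requirement that `land_mask` has already been built from the land and lake fraction criteria.

```python
import numpy as np

def percent_record_area(is_record, lat_deg, land_mask):
    """Percent of land area with a record on each day.

    is_record: boolean array, shape (..., n_lat, n_lon).
    lat_deg:   grid-cell center latitudes in degrees, shape (n_lat,).
    land_mask: boolean array, shape (n_lat, n_lon), True for retained land cells.
    Cell area is approximated as proportional to cos(latitude) (assumption).
    """
    weights = np.cos(np.deg2rad(lat_deg))[:, None] * land_mask
    record_area = (is_record * weights).sum(axis=(-2, -1))
    return 100.0 * record_area / weights.sum()
```

On a regular latitude-longitude grid this reduces to a simple count of record cells only if all cells are given equal weight, which would overstate the contribution of higher latitudes.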
We next investigate the ability of the models to capture the observed amplitude of record-breaking daily t_max in JJA, where the amplitude of records is defined as the difference between the new and old daily t_max records. We then calculate the amplitude of new records that are set from 2006 to 2020, separate them into 1 K bins, and calculate the frequency with which they occur. The results in Figure 2 show that about 53% of new records have an amplitude of 0-1 K. The occurrence of higher amplitude records decreases exponentially in the observations and in the large ensembles. The 100-member ensemble spread does not capture the observed values for amplitude bins 5-6 K through 10-11 K, where the observed values are slightly below the ensemble spread. For example, in the 5-6 K amplitude bin, the observed value is 0.917%, which is outside the ensemble spread of 0.938%-2.57%. The highest observed amplitude record is 11.02 K, whereas the large ensembles simulate record amplitudes as high as 16-17 K. Most ensemble members simulate high amplitude records above the observed maximum record amplitude of 11.02 K, except for two ensemble members in SPEAR, one in FLOR, and three in CESM. However, it is worth noting that these high amplitude records over 11 K are still rare in the large ensembles. For example, in the SPEAR large ensemble, the average frequency of record amplitudes above 16 K accounts for 8.8 × 10^-5 % of the total records set across all 30 members.
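The binning of record amplitudes into 1 K bins can be illustrated with a short sketch. The function name `amplitude_frequency` is hypothetical, and the input is assumed to be an array of record amplitudes in kelvin with NaN wherever no record was set.

```python
import numpy as np

def amplitude_frequency(amplitudes, bin_width=1.0):
    """Percent of records falling in each amplitude bin.

    amplitudes: record amplitudes in K; NaN entries (no record) are ignored.
    Returns the bin edges and the percent of all records in each bin.
    Assumes at least one finite, non-negative amplitude is present.
    """
    amps = np.asarray(amplitudes, dtype=float)
    amps = amps[np.isfinite(amps)]
    # Enough bins to cover the largest amplitude, starting at 0 K.
    n_bins = int(np.floor(amps.max() / bin_width)) + 1
    edges = np.arange(n_bins + 1) * bin_width
    counts, _ = np.histogram(amps, bins=edges)
    return edges, 100.0 * counts / amps.size
```

Applied per ensemble member, the minimum and maximum of the resulting percentages in each bin would give an ensemble spread comparable in spirit to the error bars described for Figure 2.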
In general, the regions that have the highest amplitude records are consistent in the models and observations. Figure 3 illustrates the spatial pattern of the frequency of high amplitude records greater than 6 K, where Figures 3b-3d represent the ensemble mean frequency for each model. In the observations (Figure 3a), the regions with the highest amplitude records include the Northwest, West Coast, the central U.S., and parts of the Ohio River Valley. The regions with the lowest amplitude records are the Northeast and Southwest. The models show many more grid points where high amplitude records occur. The areas that have a high frequency of high amplitude records are generally consistent with the areas of high amplitude records in the observations. For example, in all three models the North-Central U.S. and Pacific Northwest are high frequency areas of high amplitude records, which is consistent with the locations in the observations. However, there are some differences; for example, in SPEAR (Figure 3b) there are few high amplitude records in the Ohio River Valley area, which is inconsistent with the observations, and CESM (Figure 3c) and FLOR (Figure 3d) show more of a signal across the Southeast and East Coast compared to the observations. More information on the ensemble spread of the record amplitudes for each model is shown in Figure S1 in Supporting Information S1, where the largest ensemble spread coincides with the same general areas of high amplitude records. Previous studies have found that climate models have a warm and dry bias over cropland, including in the central U.S. and Midwest, where historically a summertime cooling is seen in areas of agricultural intensification due to increased evapotranspiration in these areas (Coffel et al., 2022; Mueller et al., 2016). This bias may explain some of the high frequency areas of high amplitude records in the central U.S. and Midwest in SPEAR, CESM, and to some extent FLOR, given that the spatial pattern overlaps with areas of cropland and with the warm and dry bias areas from Coffel et al. (2022).
The observed frequency with which new records are set from 2006 to 2020 is shown in Figure 4a, with a maximum of 7%-12% of days in JJA in the South and Southeast. The SPEAR ensemble mean (Figure 4b) and CESM ensemble mean (Figure 4d) show higher frequencies than the observations in most places, with two maxima in the Mountain West and in an area centered over West Virginia and Kentucky, with frequencies of 8%-12%. The ensemble spread does not contain the observed values in stippled areas in Figures 4b-4d, with much of the mountain west and northeast being overestimated in the SPEAR and CESM ensembles, as the observations lie below these ensemble spreads. Figure S2 in Supporting Information S1 shows a more detailed comparison between the rate of record setting in the observations and the three models, where the overestimation of records in SPEAR and CESM in the mountain west and northwest is seen in Figures S2a and S2c in Supporting Information S1. For these two models, the observations fall within the middle and upper tercile of the ensemble spreads in the south, where the highest observed frequencies are located. The FLOR ensemble mean (Figure 4c) has lower frequencies than SPEAR and CESM in all regions and is in general more comparable to the observed pattern, with far fewer grid points where the observations are outside the ensemble spread. The ensemble spread also better encompasses the observations (Figure S2b in Supporting Information S1), with a more balanced number of grid points in each tercile of the ensemble spread. However, the maximum frequency in FLOR is mainly concentrated in the southwest with values of 7%-8%, which is lower than the observed maximum and displaced to the west. A range of values for each model is shown in Figure S3 in Supporting Information S1, where the lower 10th percentile values range from 1.67% to 9.20% and the upper 90th percentile values range from 5.82% to 14.42%.
The overestimation of the frequency of record setting in SPEAR and CESM is expected due to the higher rate of overall warming seen in these models compared to FLOR and the observations. This conclusion is also consistent with Fischer et al. (2021), who found the warming rate is the main factor in determining the probability of record-shattering events when the events are calculated in the context of previous record events rather than a reference climatology as we are doing here. Figure 5 shows the observed and simulated t_max trends in JJA from 1991 to 2020 in the observations and in the SPEAR, FLOR, and CESM ensemble means. The observations (Figure 5a) show a warming trend in most areas, except for a slight cooling trend in parts of the Southeast and Midwest, especially in Louisiana and Arkansas. SPEAR (Figure 5b) shows the highest warming trends, with the largest trends in the Northwest. FLOR (Figure 5c) shows significantly less warming than SPEAR, but with the maximum warming in the Southwest, and CESM (Figure 5d) shows uniform warming at a lower rate than SPEAR. No model ensemble mean captures the cooling trend seen in the observations; however, some individual ensemble members do simulate cooling trends in the south and southeast. The cooling trends in some locations can also be seen in the 10th percentile of t_max trends across all ensemble members in Figure S4 in Supporting Information S1, where SPEAR has slight cooling trends in the southeast (Figure S4a in Supporting Information S1), FLOR has more widespread cooling (Figure S4c in Supporting Information S1), and the CESM cooling trends are concentrated in the South (Figure S4e in Supporting Information S1). The pattern correlation between the observations and each ensemble mean is 0.489 for SPEAR and 0.380 for FLOR.
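For readers who want to reproduce this kind of comparison, the sketch below shows one common way to compute a per-grid-point linear trend and a pattern correlation between two trend maps. The text does not specify how either quantity was computed, so the ordinary least-squares slope and the centered, unweighted Pearson correlation used here are assumptions, as are the function names.

```python
import numpy as np

def linear_trend(years, field):
    """Least-squares slope of `field` against `years` at every grid point.

    years: shape (n_years,); field: shape (n_years, ...), e.g. JJA-mean t_max.
    Returns the slope in field units per year, shape field.shape[1:].
    """
    t = np.asarray(years, dtype=float)
    t_anom = t - t.mean()
    f_anom = field - field.mean(axis=0)
    return np.tensordot(t_anom, f_anom, axes=(0, 0)) / (t_anom ** 2).sum()

def pattern_correlation(a, b, mask=None):
    """Centered, unweighted Pearson correlation between two maps on one grid."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    valid = np.isfinite(a) & np.isfinite(b)
    if mask is not None:
        valid &= np.asarray(mask, dtype=bool).ravel()
    return np.corrcoef(a[valid], b[valid])[0, 1]
```

Both maps must share a grid before the correlation is taken, and an area-weighted variant would be a reasonable alternative on grids spanning a wide range of latitudes.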
In conclusion, both the models and observations show an increasing occurrence of extreme heat events over the United States.While the SPEAR and CESM models appear to overestimate the frequency of such records at most grid points, the FLOR model is in better agreement with the observations.This is consistent with the lower climate sensitivity of the FLOR model relative to the SPEAR model.

Projected Changes in Near-Future Heat Extremes
In the second part of this paper, we examine how models project that summertime heat extremes will evolve over the next 30 years, in terms of frequency, spatial pattern, and amplitude. As described in Section 3, we define the projected heat extremes as the daily JJA t_max values that exceed historical records without replacing the previous records to evaluate how extreme temperatures will change relative to the fixed historical records.
First, we examine the daily percent area of heat extremes in the three large ensembles shown in Figure 6. All three models show a projected increase in daily JJA heat extremes from 2021 to 2050, with CESM showing the largest increase of 0.42% per year and FLOR the lowest at 0.21% per year. By the 2040s, the percent area of record-breaking heat ranges from 9.72% per day in the FLOR ensemble mean to 13.21% per day in the CESM ensemble mean. This can be compared to corresponding values for 2006-2020 of 5.33% in the observations, 7.47% in SPEAR, 7.01% in CESM, and 5.58% in FLOR. There is an increase in variability throughout the 2021-2050 period, where the maximum percent record area, which typically occurs in August, is increasing at a faster rate than the minimum record area, which typically occurs in June. This is illustrated in the ensemble means in Figure 6, especially in CESM and SPEAR after year 2045, with most seasons starting at a minimum in extremes in June and reaching a maximum in heat extremes in August. Since this is reflected in the ensemble mean of the three models, it is most likely not due to internal variability. This is also apparent in the t_max monthly means in all three models, where the August temperatures are rising at a faster rate than June and July. We hypothesize that this could be the result of a soil moisture feedback, with increasingly dry soil conditions in future summers leading to an amplification of heat. This has been explored in observational and modeling studies and in a case study of the 2011-2012 summertime heat extremes (Karl et al., 2012) and the 2003 European summer heat wave (Fischer et al., 2007). Both studies found that drought in the months leading up to the heat waves led to anomalously dry soils and enhanced sensible heat flux due to reduced evapotranspiration and latent cooling. This can further amplify heat, which can impact circulation through enhanced ridging. Fischer et al. (2007) found that in the case of the 2003 European heat wave, both the anomalous atmospheric circulation during the summer and the anomalously dry continental-scale soils played important roles. However, without the dry soil moisture anomaly the summer heat anomalies could have been reduced by around 40%. More work needs to be done to examine if this is the cause here.
We then calculate the probability that the daily t_max values in JJA from 2041 to 2050 will exceed the daily historical records from 1991 to 2020. Results for years 2021-2030 and 2031-2040 can be found in Figures S6 and S7 in Supporting Information S1. The results in Figure 7 show that each model's maximum probability is in the Southwest and Mountain West regions, closely resembling the results illustrated in Figure 4. This implies that the areas where the models are frequently setting records in the historical period are also frequently setting records in the 2041-2050 time period. SPEAR SSP585 (Figure 7a) and CESM (Figure 7d) show higher probabilities than FLOR (Figure 7c), with a 24%-28% chance that any day in JJA will have maximum temperatures above the historical records in the Mountain West. In contrast, FLOR has lower probabilities, with a 16%-18% chance that daily temperatures will exceed historical records in the Southwest. SPEAR SSP245 (Figure 7b) has lower probabilities than SPEAR SSP585 and CESM, consistent with less warming, but has the same general spatial pattern as SPEAR SSP585. Both CESM and SPEAR SSP585 and SSP245 also show a second maximum in the Ohio River Valley, where FLOR does not. In all three models, there are lower probabilities in the Southeast and, to some extent, the Northern Plains, especially in SPEAR and FLOR. The range of outcomes from the ensembles is shown in Figure S5 in Supporting Information S1, with the probabilities ranging from about 20% in the lower 10th percentile to 38% in the upper 90th percentile in SPEAR SSP585 (Figures S5a and S5b in Supporting Information S1) and CESM (Figures S5g and S5h in Supporting Information S1) over the Mountain West.
The average amplitude of heat extremes compared to the 1991-2020 historical records is shown in Figure 8 for the 2041-2050 period. Results for years 2021-2030 and 2031-2040 can be found in Figures S6 and S7 in Supporting Information S1. All models show maximum amplitude extremes in the Northern and Central Plains, with average maximum amplitudes 2.25-3 K above historical records. These high amplitude extremes also extend east to the Ohio River Valley and west to the Northwest. Combining the results from Figures 7 and 8, this implies that in these high amplitude extreme areas, there are lower probabilities that any day in JJA in 2041-2050 will have maximum temperatures higher than historical records; however, there may be a large amplitude of extreme heat when it does occur. In areas like the Mountain West and Southwest where models are projecting a higher probability of extreme heat, there are lower amplitudes, meaning that there may be more days with extreme heat but with lower amplitudes of 1-1.5 K. Many areas of high amplitude heat extremes coincide with areas of larger variance in seasonal t_max values, which are mainly confined to the higher latitudes. However, the amplitude pattern cannot be totally explained by the variance; for example, in the Mountain West, where there are relatively low amplitude extremes, models show a relatively high variance. A range of values for the amplitude records is shown in Figure S8 in Supporting Information S1, where the values range from a uniform 0-0.5 K in the 10th percentile minimum across all three models to 90th percentile values of about 5.5-6.5 K above historical records.
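The distinction drawn here between how often the fixed records are exceeded and by how much can be made concrete with a small sketch. It assumes arrays laid out as (year, JJA day, grid point) and averages the exceedance only over days that actually exceed the record; the function name `mean_exceedance_amplitude` is hypothetical.

```python
import numpy as np

def mean_exceedance_amplitude(baseline, future):
    """Mean amount by which future t_max exceeds the fixed historical record.

    baseline, future: shape (n_years, n_days, n_points) (assumed layout).
    Returns one value per grid point, NaN where the record is never exceeded.
    """
    record = baseline.max(axis=0)                  # (n_days, n_points)
    excess = future - record                       # broadcasts over years
    excess = np.where(excess > 0, excess, np.nan)  # keep exceedances only
    count = np.isfinite(excess).sum(axis=(0, 1))
    total = np.nansum(excess, axis=(0, 1))
    return np.where(count > 0, total / np.maximum(count, 1), np.nan)
```

A grid point can therefore have a low exceedance probability and a high mean amplitude at the same time, which is the combination described for the Northern and Central Plains.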

Heat Extremes Toward the End of the Century
As shown above, the SPEAR SSP585 and SSP245 scenarios are similar in the daily percent area of heat extremes from 2021 to 2050, with a similar spatial pattern but lower probabilities of daily maximum temperatures exceeding the historical records in SSP245. Extending the analysis out to 2100, we see a clear difference in the daily percent area of heat extremes, with the SSP585 scenario increasing from 6% in 2021 to over 60% by the end of the century, whereas the SSP245 scenario increases more slowly from about 6% in 2021 to 20% by the end of the century (Figure 9). Two additional scenarios have been added to the analysis: the SPEAR Natural experiment, which reflects a lower bound on heat extremes (blue line), and SPEAR SSP534OS, which shows how mitigation impacts future heat extremes (green line). These experiments were not included earlier in the analysis because we found that the SSP scenarios show similar results in the near-future period until about 2050. After 2050, we find that the results are highly dependent on the emissions scenario, and that the SSP534OS scenario gives interesting insight as to how heat extremes will evolve when there is a rapid decline in greenhouse gas emissions. The natural experiment reflects a stationary climate with atmospheric composition, land use, and natural radiative forcings held constant at year 1920 values. This simulation shows a near constant value in the ensemble mean of 3.15%.
The SSP534OS experiment uses the SSP5-3.4OS forcing, which is similar to SSP5-8.5 until 2040, with a rapid decline in greenhouse gas emissions after 2040, maximum atmospheric CO2 concentrations in the late 2050s, and negative emissions after 2070. The green line in Figure 9 shows that the heat extremes in SSP534OS track the SSP585 and SSP245 scenarios until around 2050, when the SSP585 scenario starts to increase more rapidly than SSP534OS and SSP245. The SSP534OS scenario then reaches a peak of 22.42% in 2054, after which the heat extremes begin to slowly decline until reaching a mean of 6.75% in 2100, which is comparable to but still higher than the 2021 seasonal mean of 5.89%. Toward the end of the century, the ensemble spread reaches a maximum of 95.93% in the SSP585 experiment, indicating that nearly all areas of the CONUS would reach temperatures above the daily historical records for that given day. The intraseasonal variability of extremes, or the difference between the maximum record setting area in August and minimum in June, is also dependent on the future warming scenario. There is a large increase in variability in the SSP585 experiment and a decrease in variability in the SSP534OS experiment toward the end of the century with less warming.

Conclusions
This study analyzed the ability of large ensembles to simulate observed characteristics of extreme summertime heat over the CONUS and the projection of extreme heat in the next 30 years. We find that the large ensembles capture the observed variability and trend of daily JJA t_max records, but overestimate the frequency that records are set at most grid points, especially in the Mountain West and Northeast in the SPEAR and CESM large ensembles. This is consistent with previous work that found some models overestimate warming in heat indices compared to observations (Abatzoglou & Barbero, 2014; Hu et al., 2020; Meehl et al., 2016). In particular, Meehl et al. (2016) find that models show a greater number of record highs compared with the observations due to decreased evapotranspiration, which leads to higher daytime maximum temperatures. The relationship between decreased evapotranspiration and extreme heat was also explored as a feedback mechanism during the record-breaking 2011-2012 summertime heat (Karl et al., 2012) and the 2003 European summer heat wave (Fischer et al., 2007). Hu et al. (2020) found an overestimation in warming in CMIP6 models and suggested this may be related to the models' high climate sensitivity, and there is recent research that suggests climate models have a warm and dry bias over areas of cropland (Coffel et al., 2022). Despite this overestimation of the frequency of records, we find that in general, the large ensembles capture the observed pattern of large amplitude records.
To analyze future projections of heat extremes over the next 30 years, we keep the historical records from 1991 to 2020 fixed and compare future heat extremes relative to these historical values. We find that the spatial extent of heat extremes increases each year in the three large ensembles with fixed daily JJA historical records from 1991 to 2020 as the threshold for record heat. Additionally, we find that more records are being set in August than in June and July after year 2045, indicating that temperatures are rising at a faster rate in August compared to the daily historical records. The regions with the highest probability that simulated daily t_max values in JJA will be above the historical records are in the Desert Southwest, Mountain West, and Ohio River Basin. These areas are similar to the high frequency record areas from 2006 to 2020, indicating that models project that areas where records have been frequently set in the past 15 years will be similar to the high frequency areas over the next 30 years. We note that since there is an overestimation of the frequency of historical records compared to observations, there is likely an overestimation in the future period as well.
The region of high amplitude extremes remains concentrated in the Pacific Northwest, Central U.S., and the Ohio River Valley, with averages in these areas reaching up to 2.25-3 K above historical records. The frequency of extremes past year 2050 is highly dependent on how much future warming occurs based on the emissions scenario. We find that heat extremes increase the most in the SSP585 scenario, increasing from a 6% record area to over 60% in the ensemble mean by the end of the century, with maximum values up to 95% in individual ensemble members. The scenario of high emissions followed by high mitigation, SSP534OS, shows a positive response to a decline in emissions and warming, with a decrease in heat extremes beginning in the mid-2050s and ending with values comparable to the 2021 heat extremes.
A crucial challenge in projecting future heat extremes will be to better understand the reasons why the SPEAR and CESM models used here simulate excessive heat extremes over the last 30 years. Such understanding will help to improve models and reduce model biases, thereby helping to provide more robust projections of regional changes in heat extremes over the coming decades. Future analysis of the physical mechanisms of the historical and future projections of heat extremes will also increase our understanding of the mechanisms that may drive future heat extremes.

Figure 1. The percent land area where daily records are broken over the contiguous United States in JJA for observations (black), SPEAR (blue), FLOR (orange), and CESM (green). The shading represents the multi-model ensemble spread and bold lines represent the ensemble means for each model.

Figure 2. Amplitudes of new records set over the contiguous United States in JJA from 2006 to 2020 and the frequency that they occur. Record amplitudes are separated into 1 K bins for observations (black), and ensemble means for SPEAR (blue), CESM (green), and FLOR (orange). Gray error bars indicate the ensemble spread of the frequency of records set for each bin.

Figure 3. Grid points where the amplitude of new records exceeds 6 K in (a) observations, (b) SPEAR, (c) CESM, and (d) FLOR in JJA from 2006 to 2020. Shading denotes the number of times a record with an amplitude above 6 K is set, with the ensemble mean values for panels (b-d).

Figure 4. The percent of days that a new daily record is set at each grid point in JJA from 2006 to 2020 in (a) observations, (b) SPEAR ensemble mean, (c) FLOR ensemble mean, and (d) CESM ensemble mean. Stippling in panels (b-d) indicates grid points where the ensemble spread does not encompass the observational values.

Figure 5. Trends of t_max in JJA from 1991 to 2020 in Kelvin per year for (a) observations, (b) SPEAR ensemble mean, (c) FLOR ensemble mean, and (d) CESM ensemble mean. Results are shown on the native atmospheric grid for each data set.

Figure 6. The percent area that daily t_max values exceed the daily records relative to 1991-2020 over the contiguous United States in JJA for SPEAR SSP585 (red), SPEAR SSP245 (blue), FLOR (orange), and CESM (green) from 2021 to 2050. The shading represents the ensemble spread for each model and the bold lines are ensemble means.

Figure 7. The ensemble mean probability of daily temperature in JJA from 2041 to 2050 exceeding the daily historical records from 1991 to 2020 in (a) SPEAR SSP585, (b) SPEAR SSP245, (c) FLOR, and (d) CESM.

Figure 8. The average amplitude of heat extremes in JJA from 2041 to 2050 relative to 1991-2020 daily records in ensemble means of (a) SPEAR SSP585, (b) SPEAR SSP245, (c) FLOR, and (d) CESM.

Figure 9. The percent area that daily JJA t_max values exceed the daily records relative to 1991-2020 over the contiguous United States for SPEAR experiments SSP585 (red), SSP245 (orange), SSP534OS (green), and Natural (blue) from 2021 to 2100. The shading represents the ensemble spread and the bold lines are ensemble means.