Km‐Scale Simulations of Mesoscale Convective Systems Over South America—A Feature Tracker Intercomparison

Mesoscale convective systems (MCSs) are clusters of thunderstorms that are important in Earth's water and energy cycle. Additionally, they are responsible for extreme events such as large hail, strong winds, and extreme precipitation. Automated object‐based analyses that track MCSs have become popular since they allow us to identify and follow MCSs over their entire life cycle in a Lagrangian framework. This rise in popularity was accompanied by an increasing number of MCS tracking algorithms; however, little is known about how sensitive analyses are to the MCS tracker formulation. Here, we assess how differences between six MCS tracking algorithms affect South American MCS characteristics and evaluate MCSs in kilometer‐scale simulations against observation‐based MCSs over 3 years. All trackers are run with a common set of MCS classification criteria to isolate tracker formulation differences. The tracker formulation substantially impacts MCS characteristics such as frequency, size, duration, and contribution to total precipitation. The evaluation of simulated MCS characteristics is less sensitive to the tracker formulation, and all trackers agree that the model can capture MCS characteristics well across different South American climate zones. Dominant sources of uncertainty are the segmentation of cloud systems in space and time and the treatment of how MCSs are linked in time. Our results highlight that comparing MCS analyses that use different tracking algorithms is challenging. We provide general guidelines on how MCS characteristics compare between trackers to facilitate a more robust assessment of MCS statistics in future studies.


Introduction
Deep convective systems (DCSs) have lifetimes that span less than 1 hr to several days and spatial scales that span 10 to 1,000 km. They are an integral component of the global atmospheric circulation and water cycle (Cotton & Anthes, 1992). Mesoscale convective systems (MCSs) form from clusters of deep convective storms with horizontal scales over 100 km (Houze, 2014), and the "organization" of deep convective clusters is often categorized into types such as squall lines, bow echoes, line echo wave patterns, and mesoscale convective complexes (MCCs) (Markowski & Richardson, 2011) based on cloud and precipitation spatial patterns. MCSs play a crucial role in regulating rainfall patterns and moisture distribution throughout the tropics and midlatitude regions downstream of mountain ranges, contributing up to 90% of the annual precipitation in these regions (Feng et al., 2021; Nesbitt et al., 2006; Nesbitt & Zipser, 2003; Schumacher & Rasmussen, 2020), though the value of this contribution varies substantially based on how MCSs are defined. MCSs also produce a majority of extreme precipitation events in many regions of the world (Prein, Mooney, & Done, 2023; Rasmussen et al., 2016; Roca & Fiolleau, 2020; Stevenson & Schumacher, 2014).
In this study, we focus on MCSs in South America since it is home to various climate zones that promote MCS development with a wide range of characteristics. Focusing on this region additionally allows us to leverage existing kilometer-scale climate simulations performed within the South America Affinity Group (SAAG) (Dominguez et al., 2024). Mesoscale and synoptic processes both have a role in the formation of MCSs but differ regionally, which leads to differences in MCS properties depending on geographical location. Tropical MCSs across the Amazon are organized synoptically by the seasonally migrating intertropical convergence zone (Rehbein et al., 2018) and equatorial waves (Anselmo et al., 2021; Serra et al., 2020) with key mesoscale circulation controls from the sea breeze (Cohen et al., 1995), low-level jet (Alcântara et al., 2011; Anselmo et al., 2020) and complex terrain including mountains, rivers, and vegetation (Rehbein et al., 2018; Silva Dias et al., 2002). Many MCSs in subtropical South America are related to the presence of the South American low-level jet (SALLJ) (Salio et al., 2007), which forms after the deflection of the northeasterly trade winds crossing the Amazon as they encounter the Andes mountains (Zhou & Lau, 1998). The SALLJ advects low-level moisture from the Amazon basin to subtropical South America (Jones, 2019; Marengo et al., 2004; Vera et al., 2006; Zhou & Lau, 1998) and supports the growth of MCSs through overnight hours much like the Great Plains low-level jet over the U.S. (Velasco & Fritsch, 1987). During the monsoon period in South America south of the equator (October to April), upper-level large-scale circulations such as the Bolivian High, resultant from the latent heat release from the Amazon deep convective activity (Dias et al., 1983), can affect the SALLJ moisture flux and strength of the South Atlantic Convergence Zone (Carvalho et al., 2004), which increases convection and precipitation in subtropical regions.
The Andes intersect the westerly upper-level flow, leading to the formation of surface low-pressure regions in the lee of the Andes such as the Northwestern Argentinean Low and Chaco Low (Seluchi et al., 2003). These lows redirect the SALLJ south and even southwestward (Salio et al., 2002). As troughs pass over the Andes, northward propagating cold fronts in the lee are also produced, where they interact with the SALLJ and mountainous terrain to initiate deep convection (Marquis et al., 2021; Rasmussen & Houze, 2016) that grows into MCSs (Feng et al., 2022; Mulholland et al., 2018; Zhang et al., 2021). In mid-latitudes, subsidence in the lee of the Andes creates steep free tropospheric lapse rates with temperature inversions capping the low-level moisture, which helps build up high convective available potential energy and convective inhibition (Rasmussen & Houze, 2016; Ribeiro & Bosart, 2018). The combination of these thermodynamic conditions and multi-scale circulations as modulated by the complex terrain of South America produces some of the deepest and most intense storms (Nesbitt et al., 2021; Varble et al., 2021; Zipser et al., 2006) with the most prolific lightning (Cecil et al., 2015) and hail (Cecil & Blankenship, 2012; Kumjian et al., 2020) in the world. Tornadoes, on the other hand, are more common over North America due to differences in low-level wind shear (Schumacher et al., 2021). These conditions also support MCCs (Maddox, 1980), the largest form of MCSs, which are larger and longer-lived and produce greater rainfall volume over South America than over North America (Durkee & Mote, 2010; Velasco & Fritsch, 1987). How much larger and longer-lived South American MCCs are remains debated due to sensitivities to how MCCs are defined. Thus, a combination of many different multiscale circulations coupled with thermodynamic conditions affects the life cycle of MCSs.
The contribution of MCSs to total precipitation is potentially increasing globally as temperatures rise (Tan et al., 2015), though large uncertainties exist since global models struggle to simulate MCSs and their changes. In the U.S., the frequency and intensity of MCS precipitation have increased over the past three decades during the warm season and are projected to accelerate further under future warming (Feng et al., 2016; Hu et al., 2020; Prein et al., 2017). Over the Amazon basin, in contrast, MCSs have decreased from October to March and increased from June to August, with increased precipitation in both seasons (Rehbein & Ambrizzi, 2023b). Thus, regional sensitivities of MCSs to a changing climate vary, but such assessments remain uncertain. This is at least partly the case because there is no agreed-upon definition of an MCS but also because their representation in weather and climate models is imperfect (Prein et al., 2021; Zhang et al., 2021).
Detection and tracking of convective systems including MCSs is vital to better understanding the mechanisms that control their properties and their role in energy and water transport such that predictive models can be improved for informing critical societal decisions regarding water resources and other environmental issues. The increasing availability of sub-hourly, kilometer-scale satellite cloud and precipitation retrievals and the development of similar-scale models has improved the ability to monitor and track MCS life cycles. Automatic tracking algorithms are an indispensable tool for such large data sets because they enable us to understand the full lifecycle of MCSs and allow for a process-oriented model evaluation by focusing on dynamic features rather than atmospheric mean states.
Over the past 40 years, tracking algorithms have been developed to automatically and objectively detect and track convective systems from infrared (IR) geostationary satellites and more recently from high-resolution model data. The most common algorithms are based on a convective cluster detection step from the IR imagery, and on a tracking step linking cloud clusters identified from one time step to the next. An MCS is then defined as the succession of convective clusters in a time sequence of IR images. The detection step is generally based on the application of a single brightness temperature threshold on the IR images, to identify anvil clouds associated with convective clusters (Machado et al., 1998; Vila et al., 2008; Williams & Houze, 1987). Over the years, a number of refinements have been implemented to better describe features of the convective clusters. By applying different brightness temperature thresholds at several levels between 213 and 253 K, we can access a volumetric analysis of convective systems, with their convective cores detected with a cold temperature threshold embedded in cloud anvils detected with a warmer brightness temperature threshold (Mathon & Laurent, 2001; Núñez Ocasio et al., 2020a). To go further and based on the principle that brightness temperature increases from the convective core to the edges of the anvil, detect and spread techniques have been implemented and applied to the infrared imagery to decompose the high cold cloud shield into cloud clusters (Boer & Ramanathan, 1997; Feng et al., 2023; Heikenfeld et al., 2019; Roca & Ramanathan, 2000; Wilcox et al., 2023). The tracking step, for its part, is often based on an area-overlapping technique to link one cluster detected at one time step to another one at the next time step (Feng et al., 2023; Machado et al., 1998; Williams & Houze, 1987). Some studies have added cloud movement projection techniques to increase the area-overlapping accuracy (Núñez Ocasio et al., 2020a). Other methods use a search radius method and predict the position of the cluster's center of mass to match cloud clusters between two time steps (Heikenfeld et al., 2019; Sokolowsky et al., 2023). Another branch of algorithms considers that convective systems can be tracked only if they are contiguous in their space-time domain. These algorithms then work in a volume of IR images in three dimensions (longitude, latitude, time), and identify and track MCSs in a single step, by applying single brightness temperature thresholds (Prein, Mooney, & Done, 2023), or by applying more complex techniques derived from the detect and spread method (Fiolleau & Roca, 2013). The six MCS trackers participating in this study represent the wide variety of methodologies introduced above.
Merging and splitting of convective cells is common (Bluestein et al., 1990), making it an important process in MCS dynamics. MCSs grow upscale from multiple individual cells and can decay into multiple individual cloud systems as they weaken (Ćurić et al., 2009; Rotunno et al., 1988). At a larger scale, the high cold cloud shields can be shared between several MCSs, whose anvil clouds can split and merge with each other over time, which is a challenging process to depict in tracking schemes.
There are several approaches to verify if tracking algorithms correctly detect the convective systems of interest. For example, Machado et al. (1998) validated MCS tracking results through subjective examination by forecasters, who determined whether or not the detected outcome looks reasonable. This is, however, very time-consuming and requires a lot of expertise. Tracking results can also be evaluated based on our physical understanding of MCSs, that is, by validating if the MCS lifecycle exhibits a realistic evolution of precipitation formation or by setting upper bounds for the expected spatial and temporal extent given physical constraints such as the Rossby radius of deformation (Cotton et al., 1989). In addition to lifecycle characteristics, this paper also assesses tracker formulation uncertainty by comparing the results of different trackers given a common set of MCS criteria.
The goals of this paper are to understand the impact of feature identification and tracker formulation on the analysis of simulated MCSs in South America and to evaluate and compare model and observed MCSs in terms of size, duration, and intensity. The paper is organized as follows: Section 2 describes the satellite and observational data, the MCS tracker methodological developments, and evaluation metrics; feature tracker comparison results and discussions are in Section 3; Section 4 provides the summary of the key findings and conclusions.

Data and Methods
We focus on data from 3 years that were selected based on different phases of the El Niño-Southern Oscillation (ENSO): (a) June 2010 to May 2011, which was a strong La Niña event; (b) June 2015 to May 2016, which was a very strong El Niño event; and (c) June 2018 to May 2019, which was a weak El Niño event. Active ENSO phases generate Rossby waves that affect South American deep convection intensity, phase, and seasonality (Rehbein & Ambrizzi, 2023a). We decided to focus our analyses on multi-year average statistics and leave assessments of ENSO impacts on MCS statistics for future studies to limit the amount of presented information.

Data
The GPM Integrated Multi-satellitE Retrievals (IMERG) precipitation data is a combined multi-satellite precipitation retrieval data set from a network of low-orbit passive microwave sensors (Huffman et al., 2015). A quasi-Lagrangian interpolation technique is applied to the passive microwave precipitation retrievals to fill in the gaps between microwave overpasses using motion vectors derived from numerical model-derived precipitable water (Tan et al., 2019). The precipitation retrievals used in GPM-IMERG differ depending on the surface types due to changes in emissivity and differences in land/ocean precipitation characteristics (Huffman et al., 2015). Derin et al. (2021) found that GPM-IMERG has better skill in detecting precipitation and representing its intensity over oceans, while it has higher false alarm rates over land.
The IMERG data used in the study is the Final Precipitation L3 Half Hourly 0.1° × 0.1° V06B data, which is corrected with monthly surface rain gauge measurements (Huffman et al., 2019). We average the half-hourly GPM IMERG precipitation data to hourly to match the model simulation output frequency. Despite the relatively fine spatiotemporal spacing of IMERG, its actual resolution is significantly coarser than its grid spacing (Guilloteau & Foufoula-Georgiou, 2020). It is worth noting that gridded precipitation data sets may not fully capture the most intense precipitation events measured by rain gauges (Rozante et al., 2018). However, Feng et al. (2021) demonstrate that when tracking MCSs across the United States, using IMERG precipitation yields comparable results to using radar-based precipitation estimates from hourly Stage-IV data (Lin & Mitchell, 2005).
The NASA Global Merged IR V1 infrared brightness temperature (Tb) data (Janowiak et al., 2017) is a merged data set combining all available operational geostationary meteorological satellite data. Viewing angle and parallax corrections have been applied to the data set. The Merged IR Tb product covers 60°S to 60°N and has a spatial resolution of 0.04° and a temporal resolution of 30 min. The Merged IR product was conservatively regridded to the IMERG 0.1° grid using the Earth System Modeling Framework (ESMF) software (Collins et al., 2005). One of the 30-min Tb snapshots in each hour is used to represent convective clouds in that hour for tracking. Hourly data has been frequently used for MCS tracking (Feng et al., 2021; Kukulies et al., 2021; Núñez Ocasio et al., 2020a; Prein et al., 2020) and we follow this protocol mainly because of the availability of hourly modeled data. Future work will investigate the impact of using higher-frequency data on the presented conclusions.
We use the Weather Research and Forecasting (WRF) model version 4.1.5 (Powers et al., 2017; Skamarock & Klemp, 2008) to downscale hourly data from the fifth generation of the European Centre for Medium-Range Weather Forecasts (ECMWF) Reanalysis (ERA5) (Hersbach et al., 2020) over the region shown in Figure 1. The simulation uses ∼4 km horizontal grid spacing with 1,471 × 2,028 grid cells in the horizontal and 61 stretched vertical levels. Each of the three simulations that focus on different ENSO states is initiated in May to allow for model spin-up. We use the Thompson microphysics scheme (Thompson et al., 2008), the Yonsei University (YSU) planetary boundary layer scheme (Hong et al., 2006), the RRTMG shortwave and longwave schemes (Iacono et al., 2008), and the Noah-MP land surface model (Niu et al., 2011; Yang et al., 2011) including the Miguez-Macho and Fan groundwater scheme (Miguez-Macho & Fan, 2012). We use an empirical equation from Wu and Yan (2011) that estimates Tb from the modeled outgoing longwave radiation (OLR) at the top of the atmosphere. The estimated Tb as well as the hourly precipitation rates are conservatively regridded to the GPM IMERG grid by using the ESMF software (Collins et al., 2005). We analyze MCS characteristics in regions that are used in the IPCC sixth assessment report (Iturbide et al., 2020). Some of the selected regions are cropped to fit within the analysis domain (black polygon in Figure 1). The resulting analysis regions are displaced from the computational boundaries to allow for the spin-up of fine-scale structures from the coarser resolution ERA5 boundary conditions, except for the Equatorial Atlantic Ocean (EAO) and Northwest South America (NWS) regions.

Tracking Thresholds
Several definitions using various cloud system parameters and thresholds have been used in the literature to provide an objective classification of MCSs, preventing actual quantitative comparisons (e.g., recent reviews by Kukulies et al. (2023)). Houze (2004) broadly defined MCSs as "…a cumulonimbus cloud system that produces a contiguous precipitation area ∼100 km or more in at least one direction." We translate this definition into objective rules, which can be used in tracking algorithms, and can be applied to satellite-based observations. The following four criteria are used in all tracking algorithms that participate in this study:
1. The contiguous Tb ≤ 241 K area must be at least 40,000 km² for at least four continuous hours.
2. The maximum hourly precipitation underneath the ≤241 K Tb area must be larger than 10 mm hr⁻¹ for at least four continuous hours.
3. The hourly precipitation volume must exceed 20,000 km² mm hr⁻¹ (e.g., 100 km × 100 km × 2 mm hr⁻¹) at least once in the lifetime of the MCS.
4. The minimum Tb must be <225 K during the MCS lifetime to account for overshooting tops.

[Figure 1 caption fragment: the analysis regions follow the IPCC sixth assessment report (Iturbide et al., 2020) and include Northwest South America (NWS), Northern South America (NSA), Equatorial Atlantic Ocean (EAO), South American Monsoon region (SAM), Northeast South America (NES), Southeast South America (SES), and the South Atlantic Ocean (SAO).]
The minimum anvil cloud area (criterion 1) and the overshoot thresholds (criterion 4) are included to select clusters of cumulonimbus clouds that live longer than single-cell deep convective storms. The maximum hourly rainfall threshold (criterion 2) helps to select storms that produce heavy convective rainfall while the precipitation volume threshold (criterion 3) excludes storms whose precipitation footprint is too small or too weak. We have used similar criteria in previous work (e.g., Feng et al., 2021; Prein, Mooney, & Done, 2023) and found good agreement between objectively selected MCSs and MCSs selected by trained meteorologists. Classical definitions of MCSs, such as by Houze (2004, 2018), focus on the extent and duration of convective precipitation. Using precipitation as the primary classifier for MCSs is only feasible in regions with high-quality and high-resolution precipitation data sets that are typically based on radar observations. Such data sets are not available over South America since state-of-the-art satellite-based precipitation data sets such as GPM-IMERG have well-known biases (see Section 2.1) that prohibit their use for MCS classifications. We, therefore, translate Houze's definition into rules that are objective and can be accounted for by using satellite-based observations. We decided to use Tb observations from geostationary satellites and track MCSs according to their cloud shield characteristics, which is common practice in MCS tracking studies (Feng et al., 2021; Hartman, 2021; Núñez Ocasio et al., 2020). All participating tracking schemes identify MCS candidates according to the Tb field and use GPM-IMERG precipitation in a second step to assess if the Tb objects qualify as MCSs.
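The four criteria can be read as a filter on the hourly time series of a tracked cloud object. The following minimal Python sketch illustrates that reading; it is not taken from any of the participating trackers. The class `HourlyState`, the function names, and the choice to evaluate criteria 1 and 2 independently of each other (rather than over the same four hours) are assumptions made for illustration. The inclusive comparison for criterion 3 is also an assumption, chosen because the worked example in the text (100 km × 100 km × 2 mm hr⁻¹) sits exactly at the threshold.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class HourlyState:
    """Hypothetical per-hour summary of one tracked cloud object."""
    cold_area_km2: float            # contiguous area with Tb <= 241 K
    min_tb_k: float                 # coldest Tb inside the object
    max_precip_mm_hr: float         # peak hourly precipitation under the shield
    precip_volume_km2_mm_hr: float  # area-integrated hourly precipitation


def longest_run(flags: Iterable[bool]) -> int:
    """Length of the longest run of consecutive True values."""
    best = run = 0
    for flag in flags:
        run = run + 1 if flag else 0
        best = max(best, run)
    return best


def is_mcs(track: List[HourlyState], min_hours: int = 4) -> bool:
    """Apply the four criteria to one track (assumed hourly sampling)."""
    area_ok = longest_run(s.cold_area_km2 >= 40_000 for s in track) >= min_hours
    rain_ok = longest_run(s.max_precip_mm_hr > 10 for s in track) >= min_hours
    # Inclusive threshold is an assumption (see lead-in).
    volume_ok = any(s.precip_volume_km2_mm_hr >= 20_000 for s in track)
    overshoot_ok = any(s.min_tb_k < 225 for s in track)
    return area_ok and rain_ok and volume_ok and overshoot_ok


# Illustrative values only, not observations.
strong = [HourlyState(55_000, 215, 18, 30_000) for _ in range(5)]
warm_top = [HourlyState(55_000, 230, 18, 30_000) for _ in range(5)]
print(is_mcs(strong), is_mcs(warm_top))
```

The second track fails only criterion 4, which shows how a large, rainy cloud shield without overshooting tops is excluded.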

Python FLEXible object TRacKeR (PyFLEXTRKR)
PyFLEXTRKR (Feng et al., 2023) is an open-source Python package for tracking any 2D atmospheric features, with specific capabilities to track convective clouds from observations and model simulations. PyFLEXTRKR has a collection of multi-object identification algorithms, handles merging and splitting explicitly, and has been optimized for large data sets such as global kilometer-scale data. The package has a modular design that is easy to update and provides a suite of visualization, post-processing, and statistical analysis tools to facilitate scientific analysis of the tracking outputs.
The MCS tracking capability in PyFLEXTRKR jointly uses cloud top IR Tb and surface precipitation to identify and track convective systems and subsequently identifies MCSs. In this study, the detect-and-spread approach was used to identify individual deep convective systems: (a) A 10-grid (~100 km) box filter smoothing was applied to the Tb field, and contiguous areas with Tb < 225 K larger than four grid cells (~400 km²) were labeled as cold cores (individual convective cloud objects). (b) Each cold core was then spread outward to surrounding grid points until Tb reached 241 K. The grid cells with the closest distance to a nearby cold core were assigned the same label. Objects with area >800 km² were retained as candidate cloud systems. (c) Contiguous areas with smoothed precipitation (5-grid box filter) >3 mm hr⁻¹ larger than six grid cells were defined as a precipitation feature (PF). Candidate cloud systems that share the same PF were combined to retain coherent PFs within a single convective system for tracking.
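Steps (a) and (b) can be sketched with a simple breadth-first search. The code below is a dependency-free illustration of the detect-and-spread idea, not PyFLEXTRKR's implementation: the function name `detect_and_spread` is hypothetical, the smoothing and minimum-area filters are omitted, neighbors are 4-connected, and "closest cold core" is approximated by the number of grid steps through the cold cloud shield. All of these are assumptions.

```python
from collections import deque

NEIGHBORS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # 4-connectivity (assumption)


def detect_and_spread(tb, core_k=225.0, edge_k=241.0):
    """Label cold cores (Tb < core_k), then grow them out to Tb <= edge_k."""
    ny, nx = len(tb), len(tb[0])
    labels = [[0] * nx for _ in range(ny)]

    def inside(y, x):
        return 0 <= y < ny and 0 <= x < nx

    # Step (a): give each contiguous cold core its own label.
    n_cores = 0
    for j in range(ny):
        for i in range(nx):
            if tb[j][i] < core_k and labels[j][i] == 0:
                n_cores += 1
                labels[j][i] = n_cores
                queue = deque([(j, i)])
                while queue:
                    y, x = queue.popleft()
                    for dy, dx in NEIGHBORS:
                        yy, xx = y + dy, x + dx
                        if inside(yy, xx) and labels[yy][xx] == 0 and tb[yy][xx] < core_k:
                            labels[yy][xx] = n_cores
                            queue.append((yy, xx))

    # Step (b): spread all cores outward at the same rate, so each shield
    # cell takes the label of the core that reaches it first.
    frontier = deque((j, i) for j in range(ny) for i in range(nx) if labels[j][i])
    while frontier:
        y, x = frontier.popleft()
        for dy, dx in NEIGHBORS:
            yy, xx = y + dy, x + dx
            if inside(yy, xx) and labels[yy][xx] == 0 and tb[yy][xx] <= edge_k:
                labels[yy][xx] = labels[y][x]
                frontier.append((yy, xx))
    return labels, n_cores


# Two cold cores embedded in one contiguous <= 241 K shield (illustrative values).
tb_example = [
    [250, 240, 238, 240, 250],
    [238, 220, 235, 222, 239],
    [250, 239, 237, 240, 250],
]
labels, n_cores = detect_and_spread(tb_example)
for row in labels:
    print(row)
```

In this example a single-threshold method would return one object for the whole shield, whereas the spread step divides it between the two cores.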
PyFLEXTRKR then tracks these convective systems based on their area overlap. Objects from two adjacent hours that have an overlap area fraction exceeding 0.5 were considered the same object. If more than one object exceeded the overlap fraction, the largest one was considered continuous and the smaller ones were labeled as merging or splitting. All convective systems exceeding 2-hr duration were tracked and saved. If a tracked system meets the MCS criteria (Section 2.2.1), the entire track is labeled as MCS, including the convection initiation and upscale growth period and the decay period when the cloud shield area is below the minimum MCS area threshold. In addition, non-MCS cloud objects that merge with or split from an MCS are included as part of that MCS. The unique track numbers for each MCS were written to the pixel grid as masks, including the small merge/split cloud objects. Tracking was run continuously for each water year (from June to May) to obtain MCS tracks.
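The overlap rule can be illustrated with objects stored as sets of grid-cell indices. This is a minimal sketch, not PyFLEXTRKR code: the names `overlap_fraction`, `link_objects`, and `block` are hypothetical, only merges (several earlier objects overlapping one later object) are handled, and normalizing the overlap by the smaller object's area is an assumption because the text does not state the denominator.

```python
def overlap_fraction(cells_a, cells_b):
    """Shared area divided by the smaller object's area (assumed denominator)."""
    return len(cells_a & cells_b) / min(len(cells_a), len(cells_b))


def link_objects(previous, current, min_fraction=0.5):
    """Link current-hour objects to previous-hour tracks by area overlap.

    previous: dict track_id -> set of (row, col) cells
    current:  dict object_id -> set of (row, col) cells
    """
    links = {}
    for obj_id, cells in current.items():
        matches = [
            track_id
            for track_id, prev_cells in previous.items()
            if overlap_fraction(prev_cells, cells) > min_fraction
        ]
        # The largest overlapping track continues; smaller ones are merges.
        matches.sort(key=lambda track_id: len(previous[track_id]), reverse=True)
        links[obj_id] = {
            "continues": matches[0] if matches else None,
            "merged": matches[1:],
        }
    return links


def block(rows, cols):
    """Helper: rectangular set of grid cells."""
    return {(r, c) for r in rows for c in cols}


previous = {1: block(range(3), range(0, 3)), 2: block(range(2), range(4, 5))}
current = {"a": block(range(3), range(1, 4)) | block(range(2), range(4, 5))}
print(link_objects(previous, current))
```

Here track 1 overlaps the new object by 6 of its 9 cells and track 2 by both of its cells, so track 1 continues and track 2 is recorded as merging into it.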

Tracking and Object-based Analysis of Clouds (tobac)
tobac (Heikenfeld et al., 2019; Sokolowsky et al., 2023) is a community-developed Python package for detecting, tracking, and analyzing clouds and other atmospheric phenomena. Due to its modular and flexible design, it can be used with user-defined tracking criteria on any atmospheric field (e.g., brightness temperatures or radar reflectivity) and on any gridded data set with two or three dimensions. In this study, we use tobac version 1.4.2 to track MCSs based on the above-defined criteria.
The three main modules of tobac are feature detection, segmentation, and linking. In the feature detection, tobac identifies objects above or below a user-defined threshold over a minimum area. In this study, we used brightness temperature fields for the feature detection and required that a cloud object be <241 K over at least 40,000 km². In addition, we require that at least one feature during the MCS lifetime contains a cold core of <225 K with no minimum area. For each identified feature, a center point is defined (in this study: the center of mass). In the segmentation procedure, these center points are used to identify all contiguous pixels around them below/above a specified threshold (here: 241 K). This is done using watershedding, an image processing method that treats the input data as topographic maps and extends the area around a feature center point the same way water would flow until it meets a topographic barrier (the threshold). The segmented cloud features were colocated with the precipitation data to apply the additional precipitation-based criteria for MCS identification. It should be noted that the segmentation technique in tobac can result in time steps with detected cloud features that do not have an associated segmented area at their center location (see tobac documentation for details), which in turn influences the MCS lifetime when the latter is calculated based on the segmentation output.
The detected cloud features are linked over time using a search radius and their predicted propagation speed. In contrast to area-overlapping methods for the linking of features, tobac is based on particle tracking principles where the center points of features are assigned to a common track when they fall within the predicted radius of motion. The search radius for potential features can be adjusted by the user (we used a maximum propagation speed of 100 m s⁻¹) and if multiple features fall within the search radius, the feature with the path that is closest to the preceding motion direction is selected. While tobac has a postprocessing tool that identifies merges and splits based on the output from the feature detection, it has no explicit treatment of merging and splitting during the linking procedure.
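The contrast with overlap-based linking can be made concrete with a small point-tracking sketch. This is not tobac's API or its linking code: the functions `predict_position` and `link_feature` are hypothetical, positions are assumed to be in kilometers on a flat plane, the prediction is a linear extrapolation of the last displacement, and picking the candidate nearest to the predicted position is a simplification of the direction-based selection described above.

```python
import math


def predict_position(track):
    """Extrapolate the last displacement one step ahead (linear motion assumed)."""
    if len(track) < 2:
        return track[-1]
    (x0, y0), (x1, y1) = track[-2], track[-1]
    return (2 * x1 - x0, 2 * y1 - y0)


def link_feature(track, candidates, v_max_m_s=100.0, dt_s=3600.0):
    """Return the index of the candidate closest to the predicted position.

    track:      list of past (x_km, y_km) center points of one track
    candidates: list of (x_km, y_km) feature centers at the next time step
    Returns None if no candidate lies within the search radius.
    """
    radius_km = v_max_m_s * dt_s / 1000.0  # 100 m/s over 1 hr gives 360 km
    px, py = predict_position(track)
    best_index, best_distance = None, radius_km
    for index, (x, y) in enumerate(candidates):
        distance = math.hypot(x - px, y - py)
        if distance <= best_distance:
            best_index, best_distance = index, distance
    return best_index


track = [(0.0, 0.0), (50.0, 10.0)]  # predicted next position: (100, 20)
candidates = [(40.0, 60.0), (110.0, 25.0), (600.0, 0.0)]
print(link_feature(track, candidates))
```

With hourly data, a maximum propagation speed of 100 m s⁻¹ corresponds to a 360 km search radius, so only the third candidate in this example is out of reach.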

Forecasting and Tracking the evolution of Cloud Clusters (ForTraCC)
ForTraCC's development started in the 1990s (Machado et al., 1998), making it one of the longest-standing cloud object tracking algorithms that is still actively used (Vila et al., 2008). Currently, it is being used operationally for nowcasting at the Brazilian Center for Weather Forecasting and Climate Studies of the National Institute for Space Research (CPTEC/INPE; http://pindara.cptec.inpe.br/fortracc/). ForTraCC can work with radar reflectivity, precipitation, or OLR. However, ForTraCC cannot apply all of the MCS criteria defined here within its own code, which required the development of a post-processing tool (see https://github.com/salvatirehbein/percolator).
In the current study, ForTraCC was set to identify and track all objects with one or more contiguous pixels with brightness temperature equal to or below 241 K. The algorithm takes into account the potential occurrence of missing data or input failures (Vila et al., 2008). ForTraCC relies on the overlap (in our case 5%) between consecutive images. The initiation of a system can be (a) spontaneous, (b) a merge, or (c) a split. In cases of merging, the larger system, or the first one identified if they have the same size, will be tracked. If an MCS splits, the larger resultant system will continue to be tracked, while the smaller systems will become new individual systems.
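The merge and split rules above reduce to a choice of which system keeps its identity. The sketch below illustrates only that bookkeeping; it is not ForTraCC code, and the function names, the tuple layout, and the way new identifiers are issued are assumptions.

```python
def system_to_follow(systems):
    """Pick the system that continues after a merge.

    systems: list of (system_id, area_km2) in order of identification.
    Python's max returns the first maximal item, which reproduces the
    tie rule "the first one identified if they have the same size".
    """
    return max(systems, key=lambda item: item[1])[0]


def relabel_after_split(parent_id, fragment_areas_km2, next_id):
    """Assign IDs after a split: the largest fragment keeps the parent ID.

    Returns one ID per fragment, in the same order as fragment_areas_km2.
    Smaller fragments receive new IDs starting at next_id (assumed scheme).
    """
    keep = max(range(len(fragment_areas_km2)), key=lambda i: fragment_areas_km2[i])
    ids = []
    for index in range(len(fragment_areas_km2)):
        if index == keep:
            ids.append(parent_id)
        else:
            ids.append(next_id)
            next_id += 1
    return ids


print(system_to_follow([(7, 50_000.0), (3, 80_000.0)]))
print(relabel_after_split(7, [1_000.0, 60_000.0, 2_000.0], next_id=100))
```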
ForTraCC tracks and stores all cold clouds, which increases its runtime (see Table 1). A post-processing program is used to filter out storms that do not meet our MCS criteria. First, we ensure that each cloud cluster defined by ForTraCC has a minimum area of 40,000 km² for at least four continuous hours, along with at least one pixel during the system's lifecycle with a brightness temperature below 225 K. Next, the mask files generated by ForTraCC are overlapped with the corresponding precipitation field. This process verifies if the maximum hourly precipitation and the precipitation volume under the mask (i.e., cloud shield) meet the desired criteria.

TAMS
The Tracking Algorithm for Mesoscale Convective Systems (TAMS) is an open-source MCS tracking and classifying algorithm and Python package. One novelty of TAMS is its grid independence. Grid-independent tracking allows for the identification and tracking of both observed (satellite data) and simulated (model data) systems regardless of the type of grid and data resolution. The package includes a set of visualization and postprocessing tools including functionality that allows matching a desired variable or atmospheric field to each MCS and calculating corresponding statistics. TAMS was initially developed to track and analyze tropical MCSs over Africa associated with African easterly waves (Núñez Ocasio et al., 2020a, 2020b), and a description of this initial version can be found in Núñez Ocasio et al. (2020a). The identification step consists of identifying regions of cloud top IR Tb within the 241 K contour that contain overshoots (colder than 225 K) larger than 4,000 km². (In the default version of TAMS, 235 and 219 K are used as thresholds.) TAMS uses this criterion in addition to the four MCS criteria described in Section 2.2.1. These regions that are potential candidates to be MCSs are called Cloud Elements (CEs). Although these identification criteria may cause a late detection of initiation, they ensure that the system is an MCS and not a convective cell and that raining clouds are targeted. The tracking is done on stored convex hull polygon shapes based on the CE shapes using the overlapping method. For this study, the overlap threshold was 50% and the optional cloud projection or background flow was turned off. In the current simplified linking scheme, each CE at the current time step is matched with the maximum overlap "parent" CE from the previous time step, if the overlap condition is satisfied. This creates a list of "parents" and "kids" that then become one single family/MCS. Based on default criteria considering shape, size, and duration, each MCS can be classified into one of four possible categories: MCCs, Convective Cloud Clusters (CCCs), Disorganized Long-Lived, and Disorganized Short-Lived. However, TAMS was configured in this study to follow the set of criteria defined here. Tracks were filtered to remove MCSs that did not meet the criteria for this study. Parquet files were converted to gridded mask NetCDF files.
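The parent and kid linking can be sketched as a single pass over CEs in time order. This is an illustration rather than TAMS code: the function `assign_families` and its input layout are hypothetical, overlap fractions are assumed to be precomputed (TAMS derives them from convex hull polygons), and treating the 50% threshold as inclusive is an assumption.

```python
def assign_families(overlaps_by_ce, min_overlap=0.5):
    """Group cloud elements (CEs) into families via their best-overlap parent.

    overlaps_by_ce: dict mapping each CE id to a dict
        {previous_time_step_ce_id: overlap_fraction}, inserted in time order
        so that every parent is processed before its kids.
    Returns a dict CE id -> family (MCS) number.
    """
    family = {}
    n_families = 0
    for ce_id, overlaps in overlaps_by_ce.items():
        # Only the maximum-overlap parent is considered.
        parent = max(overlaps, key=overlaps.get, default=None)
        if parent is not None and overlaps[parent] >= min_overlap and parent in family:
            family[ce_id] = family[parent]
        else:
            n_families += 1
            family[ce_id] = n_families
    return family


# Illustrative overlap fractions for three time steps.
overlaps = {
    "t0_a": {},
    "t0_b": {},
    "t1_a": {"t0_a": 0.8, "t0_b": 0.55},
    "t1_b": {"t0_b": 0.3},
    "t2_a": {"t1_a": 0.6},
}
print(assign_families(overlaps))
```

Because only the maximum-overlap parent is used, `t0_b` stays in its own family even though it also overlaps `t1_a` by more than 50%. This is one way in which the linking formulation alone can change MCS counts and durations.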

MOAAP
The Multi-Object Analysis of Atmospheric Phenomenon (MOAAP) algorithm (Prein, Mooney, & Done, 2023) is a Python-based MCS tracker previously used in Prein et al. (2021) and Poujol et al. (2020). The algorithm is similar to the Method for Object-Based Diagnostic Evaluation (MODE) Time Domain (MTD) (Clark et al., 2014; Davis et al., 2009; Prein et al., 2020). MOAAP is based on the connectedness of objects, meaning that objects must be adjacent in space and time (no minimum overlap criterion is used). It is designed to track multiple atmospheric features, such as cyclones, jet streaks, and atmospheric rivers but can also track single features such as MCSs. MOAAP operates through the following five steps to track MCSs.
1. The three-dimensional Tb field (time, latitude, longitude) is thresholded. This process produces a binary field where cells below the threshold are set to one (objects of interest), while all other cells are set to zero.
2. The binary field is passed to the label function of the multidimensional image processing module (ndimage) from the SciPy package (Virtanen et al., 2020). This function identifies objects connected in space and time (grid cells that are adjacent horizontally or diagonally in a three-dimensional latitude × longitude × time matrix) and assigns them a unique label/index, resulting in a feature matrix.
3. MOAAP uses a merging and splitting function on the feature matrix. This function merges or breaks up objects connected in time but not in space. For instance, if two objects merge, the smaller object ends at the previous timestep and is assimilated into the larger object. Conversely, when an object splits into two, the larger object continues while the smaller one is treated as a new feature. The merging and splitting function incorporates a temporal threshold (we use 4 hr here) to ensure that only longer-lived merged and split objects are relabeled.
4. From the entire population of identified objects, we select a subset that satisfies specific criteria tailored to the atmospheric phenomena under consideration. This is the step where we account for the four MCS criteria defined in Section 2.2.1.
5. Once all objects qualifying as a specific phenomenon are identified, their characteristics are calculated.
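Steps 1 and 2 above can be sketched with `scipy.ndimage.label`. This is an illustration rather than the MOAAP source code: the function name `label_cold_cloud_objects`, the 241 K default, and the choice of a full 3 × 3 × 3 connectivity structure (our reading of "horizontally or diagonally") are assumptions.

```python
import numpy as np
from scipy import ndimage

def label_cold_cloud_objects(tb, threshold=241.0):
    """Threshold a (time, lat, lon) Tb field and label space-time connected objects."""
    binary = tb <= threshold                        # step 1: binary field
    structure = np.ones((3, 3, 3), dtype=bool)      # faces, edges, and corners connect
    labels, n_objects = ndimage.label(binary, structure=structure)  # step 2
    return labels, n_objects

# Toy field: one object that moves diagonally between two time steps,
# and a second, unconnected object at a later time.
tb = np.full((3, 4, 6), 280.0)
tb[0, 1, 1] = 230.0
tb[1, 2, 2] = 230.0   # diagonal neighbor in space and time of the first cell
tb[2, 0, 5] = 230.0   # not adjacent to either of the cells above
labels, n_objects = label_cold_cloud_objects(tb)
print(n_objects)
```

Because connectivity alone links objects in time, no overlap threshold appears anywhere in this sketch, which matches the description of MOAAP as a connectedness-based tracker.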

TOOCAN
The TOOCAN algorithm (Tracking Of Organized Convection Algorithm through a 3-D segmentation) (Fiolleau & Roca, 2013) relies on a conceptual model of a convective system consisting of a 3D (longitude, latitude, time) cloud cluster made up of a convective core associated with its stratiform anvil evolving in the space-time domain. To identify such spatio-temporal cloud clusters, the algorithm works within a volume of IR images and applies a 3-D region growing technique to decompose the cold cloud shield, initially delineated by a 235 K threshold in the spatio-temporal domain, into component MCSs. This technique consists of an iterative process of detection and dilation of convective seeds in the spatio-temporal domain.
Convective seeds are first detected with a 190 K threshold. Note that this is a TOOCAN-specific setting that is stricter than the common criteria for overshoots of <225 K. Then, an intermediate cold cloud shield mask is identified in 3D at a 5 K warmer threshold. Only convective seeds with a minimum lifetime of three images and exceeding 625 km² per image are kept. The selected seeds are then spread in the spatio-temporal domain until they reach the edges of the intermediate cold cloud shield. This step consists of adding edge pixels belonging to the intermediate cold cloud shield to all already detected seeds. This iterative process of detection and dilation is repeated every 5 K from 190 to 235 K and is stopped when all the pixels below 235 K are associated with an MCS. Note that this means that TOOCAN continues to track clouds after they lose their overshoot. To fit with the MCS criteria defined in this study, the cold cloud shield boundaries have been set at 241 K. The multi-stage, multi-threshold technique allows an MCS identification independent of a single detection threshold. Also, because the TOOCAN algorithm operates in 3D without the traditional detection and tracking steps, the stratiform anvil associated with the MCS can be tracked continuously after its convective activity has ended. Isolated convective cells in the MCS initiation stages and scattered cirriform clouds in the MCS dissipation stages, disconnected on a single IR image, may be part of the same MCS, allowing a coherent life cycle. With this methodology, the unphysical split and merge issues are resolved and all MCSs can be analyzed without filtering on merging and splitting. Finally, the method identifies the full spectrum of the convective organization, from small and short-lived systems to more organized systems lasting several days. For this study, we focus on convective systems that meet the criteria defined previously.
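The iterative detection and dilation described above can be illustrated with a strongly simplified sketch. It is not the TOOCAN code: it omits the minimum seed lifetime and the 625 km² area filter, it assigns contested pixels to the seed with the larger label (an arbitrary tie-break), and the function name `region_grow` is hypothetical. It works on a Tb array of any dimensionality, so a (time, lat, lon) volume is handled the same way as the small 2-D example below.

```python
import numpy as np
from scipy import ndimage

def region_grow(tb, t_start=190.0, t_end=235.0, step=5.0):
    """Grow cold seeds outward through successively warmer Tb thresholds."""
    footprint = np.ones((3,) * tb.ndim, dtype=bool)   # full connectivity
    labels = np.zeros(tb.shape, dtype=np.int64)
    n_labels = 0
    for thr in np.arange(t_start, t_end + step / 2, step):
        mask = tb <= thr
        # Dilation: spread existing labels into unlabeled pixels of the mask.
        while True:
            spread = ndimage.maximum_filter(labels, footprint=footprint)
            grown = np.where(mask & (labels == 0), spread, labels)
            if np.array_equal(grown, labels):
                break
            labels = grown
        # Detection: pixels still unlabeled at this threshold become new seeds.
        new, n_new = ndimage.label(mask & (labels == 0), structure=footprint)
        labels = np.where(new > 0, new + n_labels, labels)
        n_labels += n_new
    return labels

# Two cold cores joined by a warmer (but still <= 235 K) saddle.
tb = np.array([[250.0, 230.0, 188.0, 230.0, 234.0, 230.0, 188.0, 230.0, 250.0]])
labels = region_grow(tb)
print(labels)
```

A single 235 K threshold would return one connected object for this field, whereas the multi-threshold growth keeps the two cores as separate systems. This is consistent with TOOCAN identifying more and smaller MCSs than the other trackers.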
It is important to highlight that although all trackers use the same four criteria to identify MCSs, each tracker has many additional assumptions incorporated into its source code that affect the classification. It is infeasible to homogenize all of these assumptions across the trackers, and testing the effects of these assumptions on MCS classification is the prime motivation of this study. Table 1 summarizes high-level tracker characteristics and shows a comparison of memory needs and the speed of each tracker when identifying MCSs in a week-long period. All trackers, except for PyFLEXTRKR, used serial processing.

Evaluation Metrics
We ensure consistency in the evaluation of MCS characteristics by running the same analysis code on MCS mask files from each tracker (i.e., matrices with dimensions time, latitude, longitude that label individual MCSs with a unique integer). All of the statistics are based on sampling over hourly MCS data except for the MCS duration, which integrates over the MCS lifetime. We introduce the evaluation metrics in the relevant locations in the results section to simplify the interpretation of results.
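As an illustration of how a statistic can be derived from such a mask file, the sketch below computes the duration of every MCS in a labeled (time, latitude, longitude) array. It is not the analysis code used in this study. It assumes that 0 marks the background, that the time step is hourly, and that duration is the number of time steps in which a label is present; the function name `mcs_durations` is hypothetical.

```python
import numpy as np

def mcs_durations(mask, dt_hours=1.0):
    """Return {label: duration in hours} for a (time, lat, lon) integer mask."""
    durations = {}
    for label in np.unique(mask):
        if label == 0:          # assumed background value
            continue
        present = (mask == label).any(axis=(1, 2))   # is the MCS present at each time?
        durations[int(label)] = float(present.sum() * dt_hours)
    return durations

mask = np.zeros((5, 3, 3), dtype=int)
mask[0:3, 0, 0] = 1     # MCS 1 exists during the first three hours
mask[1:5, 2, 2] = 2     # MCS 2 exists during the last four hours
print(mcs_durations(mask))
```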

Tracking of Idealized MCS Cases
We start by comparing the MCS trackers on four highly idealized test cases to more easily identify commonalities and differences among them (Figure 2). The first case (Figures 2a-2j) features three individual eastward-moving cloud objects with overshooting tops (Tb ≤ 221 K). These clouds are growing and merging 8 hr after initialization (t = 8 hr). At this time, the northern and southern cells are losing their overshoots. The cells continue to move eastward until t = 19 hr when the northern cell splits off from the two southern cells. MOAAP (Figure 2e), ForTraCC (Figure 2g), and TAMS (Figure 2h) identify a single MCS that starts at t = 8 hr (t = 0 hr in ForTraCC). These trackers exclude the northern cell from the MCS system once it splits off at t = 19 hr, which results in a sudden southward shift in the MCS track. ForTraCC also has a discontinuity in its track at t = 8 hr because, when multiple cells merge at the same time, it uses the northernmost cell as the initiation point after the systems have merged. PyFLEXTRKR (Figure 2i) also identifies a single MCS but has an initiation at t = 0 hr and keeps the northern cell attached to the system after it splits off. Two MCSs are identified by tobac (Figure 2f). One system initiates at t = 6 hr and terminates at the time when the northern cell splits off. At this time, a new MCS is identified that consists of the central and southern cells. Finally, TOOCAN continues to track the three initial cells as separate systems and therefore identifies three MCSs that all start at t = 0 hr and terminate at t = 25 hr. It is important to mention that all of the found solutions satisfy our MCS criteria.
The second idealized case is identical to the first one, but each of the three cells maintains its overshoot, meaning the northern cell classifies as an MCS after it separates at t = 19 hr (Figures 2a-2d and 2k-2p). TAMS continues to identify one MCS with the northern cell remaining part of the system after it splits off. MOAAP, ForTraCC, and tobac identify the northern cell as a new MCS after it separates from the main system, but in contrast to most other trackers, tobac does not continue any of the previous track(s) after the split. This is most likely explained by the relatively large distance between the feature center points and the search radius that is used in tobac to connect the latter. PyFLEXTRKR identifies three MCSs instead of one due to the maintenance of the overshoots in this case, agreeing with TOOCAN whose classification is unchanged. Since PyFLEXTRKR uses the detect and spread method to segment cloud systems, the three separate overshoots that each satisfy the MCS criteria on their own result in three separately tracked MCSs (see Figure 8 and associated text in Feng et al. (2023)).
The third idealized case is similar to the first two except for only having 3 hr before the cells merge and 3 hr after the northern cell separates (Figures 2a-2d and 2q-2v). The 3-hr threshold is selected to test how the trackers deal with individual cells that are shorter-lived than the 4-hr minimum MCS lifetime. PyFLEXTRKR and TOOCAN find three systems similar to the second idealized case while MOAAP, tobac, ForTraCC, and TAMS each identify one MCS. However, there are differences between the start and end of the one identified MCS. MOAAP has a smooth track that initiates the MCS at t = 0 hr and follows it until the end of the simulation (t = 18 hr). tobac identifies the MCS at t = 1 hr and stops the system when the northern cell separates at t = 15 hr. This can be explained by a default smoothing procedure of the input data in tobac that can lead to some objects not being identified as an MCS cloud object if they have sizes close to the minimum area required. ForTraCC initiates the MCS at t = 0 hr but only follows the northern cell until the three cells merge at t = 3 hr, resulting in a discontinuity in the track. It also stops following the northern cell after it splits off at t = 15 hr, resulting in a second discontinuity in the track. TAMS starts identifying an MCS after the three cells merge at t = 3 hr and keeps all cells connected (similar to MOAAP) until the end of the simulation.
The fourth idealized case features an asynchronous development of the three cells with interactions (overlapping cloud shields) during their lifetimes (Figures 2w-2ah). MOAAP, tobac, and ForTraCC identify one MCS with identical tracks starting at t = 0 hr and ending at the end of the simulation at t = 23 hr. TAMS also identifies one system but with a shorter track (initiation happens at t = 5 hr and termination at t = 20 hr). TOOCAN keeps the three cells separated as individual MCSs during their entire lifetime with tracks predominantly moving eastward. PyFLEXTRKR also identifies three MCSs but features interactions between the individual cells when the northern cell terminates and the southern cell initiates, which results in small discontinuities in the MCS tracks.
Based on these idealized cases, we can expect that TOOCAN will identify more frequent and smaller MCSs followed by PyFLEXTRKR while TAMS and MOAAP might have the fewest and biggest systems with the other trackers being in between. TAMS likely produces larger MCSs due to its use of convex hulls when identifying anvil clouds (see Figure 3 for an example). Additionally, ForTraCC, PyFLEXTRKR, and TOOCAN were able to follow the cells from t = 0 hr to the end of the simulations for all cases, while other trackers missed some of the early or late stages of MCS development for some cases. tobac tends to initiate new tracks instead of preserving one of the previous tracks when splitting occurs, which should result in higher initiation frequencies and shorter lifetimes. TAMS consistently detects MCSs hours after convection initiation. This late initiation detection is mainly due to its identification criteria of being a convective area with an embedded cold core area size threshold.

Figure 2. Mesoscale convective system (MCS) tracker intercomparison for four idealized cases. The first three cases initiate with three individual cells that all contain an overshooting top (a) and move eastward, grow, and merge (b). Afterward, the northern cell splits off (c) by moving toward the northeast (d). Case 1 differs from Case 2 because the northern and southern cells lose their overshoot after merging with the center cell. Case 3 differs from Case 2 by only having 3 hr before the cells merge and 3 hr after the cells split instead of 8 hr. The fourth case (w-ab) explores how trackers deal with three splitting and merging cells that develop asynchronously. The northern cell initiates at t = 0 hr, the central cell at t = 6 hr, and the southern cell at t = 12 hr. Each cell moves eastward at the same speed, growing for the first 7 hr and shrinking during the following 7 hr. The resulting MCS tracks (connected circles) and MCS footprints (contours) from the different trackers are shown in panels (e-v) for cases 1 to 3 and in panels (ac-ah) for Case 4. Individual MCSs have different colors. The vertical lines indicate the time of initiation (solid), merging of the cells (dashed), splitting off of the northern cell (dashed), and the end of the case study (dashed-dotted line).

Tracking of MCSs During a Deep Convective Outbreak in Argentina
The idealized cases discussed in Section 3.1 capture some of the variability of MCS evolutions but certainly do not cover all possibilities that can occur in real cases. While it is impossible to analyze the thousands of MCSs that were identified during the 3-year analysis period, we want to highlight similarities and differences between the idealized cases and an observed deep convective outbreak that occurred during the Cloud, Aerosol, and Complex Terrain Interactions (CACTI) (Varble et al., 2021) and Remote sensing of Electrification, Lightning, And Mesoscale/microscale Processes with Adaptive Ground Observations (RELAMPAGO) (Nesbitt et al., 2021) field campaigns in Argentina in December 2018 (Figure 3).
Deep convection was triggered around 38°S and 67°W (red circle in Figures 3a-3h) on 8 December 2018 at 16:00 UTC and rapidly grew upscale, moving toward the southeast during the subsequent hours. While the initial deep convection decayed after about 8 hr, new deep convection formed to the north and continued until 10 December 2018 16:00 UTC when it eventually decayed over the South Atlantic.
Similar to the idealized cases, MOAAP and TAMS have the fewest systems with one identified MCS, while tobac, PyFLEXTRKR, and TOOCAN identify the most systems with four MCSs each. MOAAP has the largest total MCS area extent (total area under the tracked cloud shield) while TAMS has the smallest total extent since it identifies only one fairly short-lived MCS. The four MCSs that were identified by tobac, PyFLEXTRKR, and TOOCAN have different extents and tracks, highlighting the complexity of cloud field decomposition including splitting and merging of systems within the MCS trackers.

Annual Cycle of Monthly MCS Frequencies
Moving on from investigating single cases, we now analyze the observed and simulated 3-year average monthly frequency of MCSs in different subregions. An MCS is assigned to a region if at least half of its track centroids (geometric centers of the ≤241 K cloud shield) lie within the region during the MCS's lifetime. The NWS region exhibits a double peak in MCS occurrence during September and March, which is captured by all trackers (Figure 4a). The main difference between the trackers is the average number of MCSs per year, which varies between 493 in MOAAP and 919 in tobac. These frequency differences between trackers are similar for simulated MCSs (Figure 4b). Comparing simulated to observed MCS frequencies shows that there are only 2 months where all trackers agree on the sign of the differences (i.e., the model has too many MCSs in January and too few in May (Figure 4c)). The SAM region has a pronounced dry period during winter and a long period with high MCS activities from September to February (Figures 4d and 4e). All trackers agree that the simulations have too few MCSs during the dry season and too many from September to February. However, large differences exist in the magnitude of the overestimation, ranging from close to zero in PyFLEXTRKR to more than 100% in January when using TOOCAN. Similar results are found for the SES region, while larger differences exist for the SAO region where all trackers agree on a low bias in simulated MCS frequencies. Results for all subregions are shown in Figure S1 of Supporting Information S1.
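The region assignment rule above can be sketched as follows. This is an illustration only: the rectangular latitude/longitude boxes, the region names and bounds in the example, and the function names are hypothetical and do not reproduce the subregion definitions used in this study. When a track has exactly half of its centroids in each of two regions, this sketch returns the first matching region.

```python
def in_box(lat, lon, box):
    """Check whether a point lies inside a (lat_min, lat_max, lon_min, lon_max) box."""
    lat_min, lat_max, lon_min, lon_max = box
    return lat_min <= lat <= lat_max and lon_min <= lon <= lon_max

def assign_region(track, regions):
    """Return the first region containing at least half of the track centroids."""
    for name, box in regions.items():
        inside = sum(in_box(lat, lon, box) for lat, lon in track)
        if 2 * inside >= len(track):
            return name
    return None

# Hypothetical boxes, for illustration only.
regions = {"A": (-10.0, 0.0, -60.0, -50.0), "B": (-25.0, -15.0, -60.0, -50.0)}
track = [(-5.0, -55.0), (-6.0, -54.0), (-12.0, -53.0), (-7.0, -52.0)]
print(assign_region(track, regions))   # three of four centroids lie in box "A"
```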

MCS Characteristics
Next, we investigate how MCS characteristics depend on the tracker formulation and how this uncertainty affects model evaluation. We show results from the NSA region as a representative example here but show other regions in Figures S2-S9 of Supporting Information S1 and discuss them further below.
By definition, peak MCS cloud shield sizes during the MCS lifetime have to be larger than 40,000 km², a requirement that is met by most trackers in the NSA region except for a few MCSs in PyFLEXTRKR, TOOCAN, and tobac (Figure 5a). In PyFLEXTRKR this might in part be related to the assumption of a constant grid cell area. The smallest MCSs are identified when using TOOCAN while the largest systems are found when using TAMS. ForTraCC, TAMS, and tobac suggest that the simulated MCSs are smaller than observed systems while the other trackers have similar observed and simulated size distributions. MCSs in NSA move slowest when using TOOCAN and are fastest when using tobac (Figure 5b). The smaller speed in TOOCAN is likely related to not having any mergers and splits in the tracking. All trackers agree that simulated systems move slightly slower than observed MCSs. Large model-observation differences occur for the MCS lifetime-maximum 95th-percentile (P95) hourly precipitation rate (Figure 5c) and mean precipitation rate (Figure 5d) with simulated rates being significantly higher than observed. This is at least partly related to deficiencies in accurately capturing precipitation frequencies and intensities in GPM-IMERG (Dominguez et al., 2024; Guilloteau & Foufoula-Georgiou, 2020; Rozante et al., 2018; Zhang et al., 2021), and simulated convective updrafts being too large and strong at 4-km grid spacing (Fan et al., 2017; Varble et al., 2020; Wang et al., 2020). Tracker-dependent model-observation differences are smaller for the mean and P95 precipitation rate characteristics than for other MCS characteristics, though TOOCAN produces higher P95 values compared to the other trackers. MCS lifetime-average precipitation volumes are smallest in TOOCAN and similar across the other trackers (Figure 5e), in agreement with MCS maximum size statistics. Interestingly, differences between modeled and observed precipitation volumes are much smaller than the differences in MCS precipitation rates. This is caused by GPM-IMERG having larger precipitation areas than simulated, which offsets lower precipitation intensities compared to the simulations (Dominguez et al., 2024; Zhang et al., 2021). Finally, there are large tracker formulation differences concerning MCS duration. MOAAP has the longest-lived systems with median values of 17 hr while median MCSs in tobac only live for ∼8 hr. Also, the interquartile range of the duration distribution varies from ∼10 hr in MOAAP to 3 hr in TOOCAN. Most trackers produce little difference between observed and simulated MCS lifetimes, though ForTraCC, TOOCAN, and tobac suggest MCSs may be longer lived in the observations. The duration of MCSs in tobac can be influenced by the applied segmentation technique that can result in time steps with detected cloud features that do not have an associated segmented area with their center location (see Section 2.3). The results for other regions are shown in Figures S2-S9 of Supporting Information S1.
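The relation between precipitation rates and precipitation volume discussed above can be made concrete with a small sketch. It is not the evaluation code of this study. It assumes hourly precipitation in mm/hr on the same grid as the MCS mask, a constant grid cell area of 16 km² (4-km grid spacing), and that the hourly P95 and mean are taken over all grid cells under the MCS mask, including dry cells. The function name `mcs_precip_stats` is hypothetical.

```python
import numpy as np

def mcs_precip_stats(precip, mask, label, cell_area_km2=16.0):
    """Lifetime precipitation statistics for one MCS.

    precip: (time, lat, lon) hourly precipitation rate in mm/hr.
    mask:   (time, lat, lon) integer MCS labels on the same grid.
    """
    p95_hourly, mean_hourly, volume_hourly = [], [], []
    for t in range(mask.shape[0]):
        inside = mask[t] == label
        if not inside.any():
            continue
        rates = precip[t][inside]
        p95_hourly.append(np.percentile(rates, 95))
        mean_hourly.append(rates.mean())
        # 1 mm over 1 km^2 equals 1,000 m^3 of water.
        volume_hourly.append(rates.sum() * cell_area_km2 * 1.0e3)
    return {
        "lifetime_max_p95_mm_hr": float(np.max(p95_hourly)),
        "lifetime_mean_rate_mm_hr": float(np.mean(mean_hourly)),
        "lifetime_mean_volume_m3_hr": float(np.mean(volume_hourly)),
    }

precip = np.stack([np.full((2, 2), 1.0), np.full((2, 2), 3.0)])
mask = np.ones((2, 2, 2), dtype=int)
stats = mcs_precip_stats(precip, mask, label=1)
print(stats)
```

Because the volume is the product of rate and raining area, a data set with lower intensities but larger precipitation areas can yield a similar volume, which is the compensation described in the text.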
Figure 6 shows an overview of observed (x-axis) and modeled (y-axis) median MCS characteristics for different trackers (colors) and all subregions (symbols). For lifetime-maximum MCS size (Figure 6a), regional differences are similar between trackers with most trackers having the smallest MCSs in NAO and the largest MCSs in SAO, SES, and NES. MCS speeds are similar between observations and modeled systems with most data points lying close to the one-to-one line (Figure 6b). Most trackers simulate the fastest storms in SAO and the slowest in the EAO. There are large regional dependencies in the simulation of P95 precipitation rate with small differences in the SAO region and the largest differences in NWS, NES, and SES (Figure 6c). TOOCAN MCSs feature the heaviest P95 precipitation rates in most regions while systems identified by ForTraCC generally have the lowest rates (Figure 6c). Mean precipitation rates also agree better between simulated and observed storms in the SAO region (Figure 6d). In all other regions, simulated WRF MCSs have much higher mean precipitation rates than observed systems with small uncertainties due to tracker formulation. Simulated MCS average precipitation volumes are systematically smaller than observed volumes when tobac, ForTraCC, or TAMS are used, while the sign of differences is regionally dependent when the other trackers are used (Figure 6e). Finally, all trackers except MOAAP and PyFLEXTRKR feature slightly shorter-lived modeled MCSs in all regions relative to observed (Figure 6f). MOAAP systematically detects the longest-lived MCSs while tobac has the shortest-lived systems as exemplified using the idealized case studies of Section 3.1.

MCS Life Cycles
MCSs are known to go through different life cycle stages that are frequently differentiated into (a) a growth stage, where individual storms deepen with merging anvils to create a larger CCC, (b) a mature stage in which the MCS reaches maximum size, convective regions are the most spatially connected, and stratiform precipitation is maximized, and (c) a decay stage where the MCS precipitation decreases and the system becomes downdraft dominated (Machado et al., 1998).
Here we investigate how different trackers depict the MCS life cycle and what impact the tracker formulation has on the model evaluation. We focus the analysis on short-lived (duration ≤12 hr) and long-lived (duration >16 hr and ≤20 hr) MCSs. We only consider MCSs that initiate with a ≤241 K Tb area of less than 40,000 km² to minimize the effect of MCSs that split off from an existing MCS. We chose the NSA region as a representative example (Figure 7) while results for other regions are shown in Figures S10-S16 of Supporting Information S1. All trackers show a rapid expansion of the anvil area after MCS initiation, followed by a stabilization of the area, and a decay of the anvil size (Figures 7a-7f). However, the shapes of these curves vary depending on the tracker, being close to bell-shaped in TOOCAN but skewed in tobac and TAMS (especially visible in short-lived MCSs). Note that the bend in the tail of some of the long-lived MCS distributions at hour 16 is artificial since we include MCSs that live between 16 and 20 hr in the statistics. The differences between the peak size in the short-lived and long-lived storms are also noteworthy. In MOAAP and ForTraCC, these two categories of storms reach similar peak sizes while TAMS has much smaller short-lived storms than long-lived ones. All trackers show similar MCS size evolution in the observations and the simulations. Larger differences between observed and simulated MCS life cycles exist for P95 precipitation with modeled systems producing much heavier rainfall (Figures 7g-7l; similar to what is shown in Figures 5c and 6c). TOOCAN features P95 precipitation that rapidly intensifies within hours after initiation and that has a long decay period afterward. MOAAP, PyFLEXTRKR, and ForTraCC feature similar behavior but with a much less pronounced increase and decay, while tobac and TAMS do not show the initial intensification of P95 precipitation. There are likely two reasons for these differences. First, the results include MCSs that initiate by splitting from other systems. Second, TAMS, tobac, and MOAAP might miss the earliest few hours of the MCS life cycle as is shown in Section 3.1.
MCS life cycles of precipitation volume (Figures 7m-7r) are similar to those of MCS anvil size since these two properties are closely connected. The differences between tracking schemes are also similar. Most trackers show good agreement between simulated and observed precipitation volumes although peak volumes of short-lived systems can differ depending on the tracker.
For the long-lasting systems, three trackers (TOOCAN, PyFLEXTRKR, and MOAAP) show similar bell-shaped life cycles of rain volume and further reveal a peak in the simulated precipitation volume that is ∼2 hr earlier than that in the observations, possibly indicative of a systematic bias in simulated life cycles. However, the other trackers do not exhibit a noticeable time lag between the modeled and observed precipitation volume peaks. Lastly, MCS speed slightly increases over time in most trackers (particularly in TAMS), though PyFLEXTRKR and TOOCAN exhibit a decrease followed by an increase (Figures 7s-7x). This is likely due to how the splitting and merging are handled in these two trackers and due to the usage of MCS cloud shield geometric center displacements to calculate movement speed. All trackers show that observed and simulated MCS movement speeds are in good agreement.
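A movement speed based on geometric center displacements, as mentioned above, can be sketched with the haversine great-circle distance between consecutive centroids. The spherical Earth radius of 6,371 km, the hourly time step, and the function names are assumptions made for this illustration.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * radius_km * math.asin(math.sqrt(a))

def track_speeds_kmh(centroids, dt_hours=1.0):
    """Speeds in km/hr between consecutive (lat, lon) cloud shield centroids."""
    return [
        haversine_km(*p, *q) / dt_hours
        for p, q in zip(centroids, centroids[1:])
    ]

# One degree of eastward motion along the equator, then a stationary hour.
speeds = track_speeds_kmh([(0.0, -60.0), (0.0, -59.0), (0.0, -59.0)])
print(speeds)
```

With this definition, a merger or split that abruptly shifts the geometric center appears as a jump in speed even if the cloud system itself has not accelerated, which is one reason why speed statistics depend on how trackers handle such events.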
We emphasize that the model-observation differences of the composite MCS life cycle characteristics are more consistent among the trackers than the evolution of the composite values themselves. Except for simulating approximately twice as high P95 precipitation, the model is able to simulate the evolution of the observed MCS cloud size, rainfall volume, and movement speed well, regardless of which tracker is used. This supports these metrics as being robust for evaluating the performance of the simulations.

MCS Initiation by Location
Regional hotspots of MCS initiation are identified by most trackers over the northeast Brazilian coast, a few hundred kilometers inland from this coastline, over the Guiana Highlands, the western slopes of the Colombian Andes, and the eastern slopes of the Peruvian, Bolivian, and Argentinian Andes (Figure 8). However, the frequency of MCS initiation at these hotspots can vary by an order of magnitude. TAMS has the lowest MCS initiation frequency while TOOCAN has the highest, which is true for observed and simulated MCSs. The general spatial pattern of MCS initiation is similar between the observations and simulations but there are differences in initiation frequency (Figure 8 bottom row). These model-observation differences strongly depend on the tracker formulation with generally smaller absolute differences when MOAAP and PyFLEXTRKR are used (relative differences are largest in TAMS; not shown) and mostly positive differences (more modeled initiations) when using tobac, TAMS, and ForTraCC. There are only a few regions where all trackers agree on the sign of the difference. Systematically higher model frequencies are found along the eastern slopes of the Bolivian Andes, in northwestern Colombia, and over the southern Amazon Basin. Consistent model underestimation of MCS initiation frequency among trackers is rarer and only occurs off the coast of northeastern Brazil.

MCS Frequency by Location
The frequency of MCSs is less sensitive to the tracker formulation than the frequency of MCS initiation (compare Figure 9 with Figure 8). Initiation frequencies only consider the grid cell with the geometric center of the MCS cloud shield during its first detection, which is highly dependent on the tracker formulation as shown in Figures 4 and 8. In contrast, MCS frequencies consider the ≤241 K Tb footprint over the entire MCS life cycle where each MCS is only counted once in each grid cell, eliminating the double counting of long-lived slow-moving systems. For instance, a tracker that produces many small and short-lived MCSs can result in the same MCS frequency as a tracker that produces few, large-scale, and long-lived MCSs. TOOCAN has the lowest MCS frequencies, likely because of the smaller systems that are identified, while TAMS has the highest frequencies, which is probably related to its use of convex hulls. The differences between modeled and observed frequencies are more similar between trackers than those for MCS initiation with all trackers agreeing on lower simulated MCS frequencies over ocean regions, southern South America, and northeastern South America. Larger uncertainties exist in the Amazon basin. Only initiations that start with cloud shields smaller than 40,000 km² are incorporated to reduce the effect of MCS splits on the statistics. We consider an initiation to be the geometric center of the MCS cloud shield at the time of its first detection.
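The counting rule for MCS frequency (each MCS contributes at most once per grid cell, no matter how long it stays there) can be sketched as follows. The mask layout (time, latitude, longitude) with 0 as background follows the mask files described in the evaluation metrics section; the function name `mcs_frequency` is hypothetical and this is not the analysis code of this study.

```python
import numpy as np

def mcs_frequency(mask):
    """Number of distinct MCSs whose footprint touched each grid cell."""
    freq = np.zeros(mask.shape[1:], dtype=int)
    for label in np.unique(mask):
        if label == 0:                              # assumed background value
            continue
        footprint = (mask == label).any(axis=0)     # lifetime footprint of this MCS
        freq += footprint                           # counted once per grid cell
    return freq

mask = np.zeros((3, 2, 2), dtype=int)
mask[:, 0, 0] = 1       # stationary MCS 1 sits over one cell for three hours
mask[0, 0, 1] = 2       # MCS 2 moves from cell (0, 1) ...
mask[1, 1, 1] = 2       # ... to cell (1, 1)
print(mcs_frequency(mask))
```

The stationary system adds one count to its cell rather than three, so slow-moving long-lived systems are not over-weighted relative to fast-moving ones.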

MCS Contribution to Total Precipitation by Location
Lastly, we analyze the contribution of MCSs to total annual rainfall (Figure 10). This analysis is affected by a large range of MCS characteristics including frequency, size, longevity, and precipitation rates. Applying different trackers results in a wide range of MCS contributions to total precipitation with PyFLEXTRKR producing the highest contributions while TOOCAN and ForTraCC produce the lowest. There is agreement on the continental maximum of MCS contribution over the La Plata basin, which ranges from ∼60% in ForTraCC and TOOCAN to more than 80% in PyFLEXTRKR. There is also agreement among trackers that the simulation produces a smaller fraction of precipitation from MCSs over large parts of the study region, particularly over Southern Argentina and Chile and over the equatorial and southern Atlantic. The differences over Patagonia are influenced by extratropical cyclones and atmospheric rivers that may produce erroneous MCS identifications in observations with the tracker definitions used since MCSs are not expected frequently in this region.
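The MCS contribution to total precipitation at each grid cell can be sketched as the ratio of precipitation accumulated under MCS masks to the total accumulation. This is an illustration under stated assumptions, not the code used here: precipitation and mask share the same (time, latitude, longitude) grid, 0 marks non-MCS grid cells, and cells without any precipitation are returned as NaN. The function name `mcs_precip_fraction` is hypothetical.

```python
import numpy as np

def mcs_precip_fraction(precip, mask):
    """Fraction of accumulated precipitation that fell under an MCS mask."""
    total = precip.sum(axis=0)
    mcs = np.where(mask > 0, precip, 0.0).sum(axis=0)
    frac = np.full(total.shape, np.nan)
    # Leave cells without precipitation as NaN to avoid dividing by zero.
    np.divide(mcs, total, out=frac, where=total > 0)
    return frac

precip = np.array([[[2.0, 0.0]], [[6.0, 0.0]]])   # (time, lat, lon)
mask = np.array([[[1, 0]], [[0, 0]]])             # MCS present only in the first hour
frac = mcs_precip_fraction(precip, mask)
print(frac)
```

Since this ratio depends on which grid cells and time steps a tracker attributes to an MCS, it integrates the differences in frequency, size, and duration discussed in the previous sections.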

Conclusions
We compared the results of six MCS trackers to understand how sensitive MCS statistics are to the formulation of the tracking algorithm and what impact this has on the evaluation of km-scale regional climate model simulations over South America. We performed this analysis for three water years (June to May) over South America, each differing in its El Niño phase, but focused only on multi-year average statistics to limit the presented information.
Uncertainties in observed precipitation present difficulties in interpreting model-observation comparisons. There are documented high biases in the occurrence frequencies of light precipitation rates in GPM-IMERG over the Amazon basin (Dominguez et al., 2024; Rozante et al., 2018) and La Plata basin (Zhang et al., 2021), though heavy precipitation rates have also been shown to be biased high in a kilometer-scale season-long WRF simulation over the La Plata basin (Zhang et al., 2021). Particularly striking are the large land-ocean contrasts in simulated versus observed MCS statistics, which are partly related to differences in the GPM-IMERG precipitation retrieval over ocean and land.
The following points summarize our findings concerning the two leading questions about how MCS tracker formulation affects MCS statistics and how this uncertainty impacts the evaluation of km-scale climate models.
• MCS frequencies, as well as certain MCS characteristics, strongly depend on the tracker formulation, even when using the same MCS criteria. This means that statistics on the frequency, size, duration, or contribution to total precipitation of MCSs are susceptible to the tracker algorithm in use and should be interpreted accordingly. A main source of uncertainty is the treatment of cloud system segmentation including splitting and merging in different tracking algorithms, in agreement with previous findings (Müller et al., 2022).
• The dependence of MCS characteristics on the tracker formulation is fairly systematic across geographical locations although some regional differences exist. Table 2 provides an overview of tracker MCS characteristics relative to average characteristics across the tracker ensemble. This should not be interpreted as a ranking of tracking schemes since no reference data set could be used to infer a quantitative assessment of derived MCS characteristics.
• The tracker formulation can affect the evaluation of model performance in profound ways. Agreements amongst the tracking schemes on the sign of model-observational differences are typically the exception, which is in part caused by the good performance (i.e., small differences) of the 4 km simulation in capturing many observed MCS characteristics. Statistics that are highly sensitive to the tracker formulation are the MCS frequency including the initiation frequency, the ratio of MCS to total precipitation, and MCS size and duration.
• Comparisons of observed and modeled MCS lifecycle characteristics (e.g., the development of cloud shield size, movement speed, and rain volume) are more robust and less dependent on the tracker used. Comparisons of MCS frequency differences by location and differences in MCS contributions to total precipitation are generally more robust, though disagreement exists for some locations such as the southern Amazon basin.
This study focused only on MCS tracker formulation uncertainty, neglecting uncertainties that stem from differences in how MCSs are defined. We use an arbitrary definition of MCSs that results in statistics similar to those in the published literature (Feng et al., 2021), but a modified definition could be warranted depending on the research question being asked and should be the focus of future assessments. Additionally, users of a particular tracker should configure it to fit the purpose of their work.
It is important to mention that while all trackers use a common definition of MCSs, there are settings in the tracking algorithms that have a substantial impact on the presented results. The segmentation treatment, which handles the splitting and merging of MCSs, is one of the most important procedures in MCS tracking and has a significant impact on MCS statistics such as frequency, size, and duration. Additionally, the method used to track MCSs in time (point-tracking: tobac; 2D overlap: ForTraCC, PyFLEXTRKR, TAMS, MOAAP; 3D segmentation: TOOCAN) can also have a substantial effect. Importantly, we find that the model versus observation differences are more consistent among the trackers than the MCS metrics themselves, meaning that model evaluation studies are less prone to MCS tracker formulation uncertainties.
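To illustrate why the linking step interacts with splitting and merging, the following minimal sketch links labeled cloud objects between two consecutive time steps by 2D area overlap. It is not the implementation of any of the trackers compared here: the function name, the overlap measure (overlap area divided by the area of the smaller object), and the 0.5 threshold are assumptions chosen for illustration.

```python
import numpy as np


def link_by_overlap(labels_prev, labels_next, min_overlap_frac=0.5):
    """Link labeled objects between two time steps by 2D area overlap.

    labels_prev, labels_next: integer arrays of the same shape, 0 = background.
    Returns a dict mapping each label in labels_prev to the sorted list of
    labels in labels_next whose overlap covers at least min_overlap_frac of
    the smaller of the two objects (an assumed, illustrative criterion).
    """
    links = {}
    for p in np.unique(labels_prev):
        if p == 0:
            continue
        mask_p = labels_prev == p
        area_p = mask_p.sum()
        links[int(p)] = []
        # Only objects that touch the previous object's footprint are candidates.
        for n in np.unique(labels_next[mask_p]):
            if n == 0:
                continue
            mask_n = labels_next == n
            overlap = np.logical_and(mask_p, mask_n).sum()
            if overlap / min(area_p, mask_n.sum()) >= min_overlap_frac:
                links[int(p)].append(int(n))
    return links


# One cloud object at time t0 that splits into two objects at time t1.
t0 = np.zeros((4, 8), dtype=int)
t0[1:3, 0:6] = 1
t1 = np.zeros((4, 8), dtype=int)
t1[1:3, 0:3] = 1
t1[1:3, 4:7] = 2

print(link_by_overlap(t0, t1))
```

In this example the single object at the first time step is linked to both objects at the second time step, which is a split. Whether a tracker continues one track and starts a new one, or treats both as the same system, changes the resulting MCS counts, sizes, and durations.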
Future work could also expand the present study to global scales to improve our understanding of tracker formulation uncertainties in regions that have different atmospheric conditions than those found in South America. Additionally, a better understanding of tracker formulation impacts on inter-annual variability and long-term trends in MCS statistics would be valuable.
Performing the analysis over a regional domain introduces uncertainties about the impact of the domain boundary on the MCS statistics. Most analysis regions are far away from the lateral boundaries and should not be affected, except for EAO, NWS, and SAO. In particular, the low difference in simulated MCS frequencies in the EAO region might be related to a lack of MCSs that enter this region through the eastern boundary. Additionally, boundary effects on our results should be small since most MCSs initiate over land regions and the vast majority of those MCSs do not live long enough to reach the boundary.
While we do not recommend that all MCS tracking analyses use multiple tracking schemes because of the complexity this would introduce, we stress that studies using a single scheme in isolation, or in comparison with another study with a different tracker, have to be interpreted with caution. Context from many complementary methods can improve understanding of any single method's strengths and weaknesses for a specific application, providing robust support for generalized scientific conclusions.

Figure 1. Simulation domain (outline of filled contours), model topography (colored contour), analysis region (black polygon), and outlines of sub-regions (gray lines). The regions are the same as in the IPCC Sixth Assessment Report (Iturbide et al., 2020) and include Northwest South America (NWS), Northern South America (NSA), Equatorial Atlantic Ocean (EAO), South American Monsoon region (SAM), Northeast South America (NES), Southeast South America (SES), and the South Atlantic Ocean (SAO).
The new in-development version of TAMS used in this study follows the same main four steps as its predecessor: (a) Identify, (b) Track, (c) Classify, and (d) Assign variable(s).

Note. For ForTraCC and TAMS, two numbers are provided for the memory and runtime, where the first number shows the demands of the tracking algorithm and the second number in brackets the demands of the post-processing program that creates an MCS mask file. The other trackers include the post-processing step in their algorithm. Temporal linking refers to the procedure of connecting MCSs in time, while spatial segmentation refers to how a cloud field is segmented into individual MCSs.
a Numbers correspond to tracking MCSs to the completion of writing an MCS mask file. The common period is 1-7 November 2018 considering the entire domain (Figure 1) on the 0.1° GPM-IMERG grid.

Figure 3. Similar to Figure 2 but for tracking an observed deep convective outbreak that occurred during December 2018 in southeastern South America. Panels (a-h) show the evolution of the cloud shield (gray shading), precipitation rates (colored shadings), the outline of the detected mesoscale convective system (MCS) (blue contour), and the MCS track (red contour) based on results from MOAAP. The red circle shows the initiation point of the MCS. Panels (i-n) show the tracks and outline of the identified MCSs based on (i) MOAAP, (j) tobac, (k) ForTraCC, (l) TAMS, (m) PyFLEXTRKR, and (n) TOOCAN. Different colors indicate individual MCSs.

Figure 4. Monthly mean mesoscale convective system (MCS) frequencies (averaged over the 3 years) for observed MCSs (first column), modeled MCSs (center column), and their relative differences (right column). Results are shown for the NWS, SAM, SES, and SAO regions (top-down).

Figure 5. Observed (gray) and simulated (red) mesoscale convective system (MCS) characteristics in the NSA region. Shown are MCS (a) peak size, (b) median speed, (c) lifetime-maximum 95th percentile precipitation rate (P95), (d) mean precipitation rate, (e) mean precipitation volume, and (f) duration distributions. Precipitation statistics only consider grid-scale precipitation rates larger than 2 mm hr⁻¹. The box width shows the interquartile range with the median indicated as a horizontal line within the box. The whiskers extend to the maximum or minimum data point or to 1.5 times the interquartile range, whichever is smaller.

Figure 6. Observed (x-axis) and modeled (y-axis) median mesoscale convective system (MCS) characteristics for (a) peak size, (b) median speed, (c) lifetime-maximum 95th percentile precipitation rate (P95), (d) mean precipitation rate, (e) mean precipitation volume, and (f) duration. Results from different trackers are shown in different colors (see legend in panel b). Results for different regions are shown with varying symbols (see legend in panel a). The dotted lines are convex hulls that incorporate all regional variations of MCS characteristics (polygons that incorporate all same-colored points). The diagonal line represents a 1:1 relationship between observed and simulated MCS characteristics.

Figure 7. Evolution of short-lived (≥4 hr and ≤12 hr; dashed lines) and long-lived (≥16 hr and ≤20 hr) mean mesoscale convective system (MCS) size (first row), 95th percentile precipitation rate (second row), precipitation volume (third row), and speed (bottom row) in the NSA region. Mean observed and simulated characteristics are shown with red and black lines, respectively. Results from different trackers are shown in columns. The number of MCSs in each analysis is shown in the legend. Only initiations that start with cloud shields smaller than 40,000 km² are incorporated to reduce the effect of MCS splits on the statistics.

Figure 8. Initiation frequency of mesoscale convective systems (MCSs) in 2° × 2° regions based on observations (top row), the simulation (middle row), and their difference (model minus observed; bottom row). Only initiations that start with cloud shields smaller than 40,000 km² are incorporated to reduce the effect of MCS splits on the statistics. We consider an initiation to be the geometric center of the MCS cloud shield at the time of its first detection.

Figure 9. Frequency of mesoscale convective systems (MCSs) in 0.1° × 0.1° regions based on observations (top row), the simulation (middle row), and their difference (model minus observed; bottom row). We use the extent of the MCS cloud shield in this calculation and each MCS is only counted once in each cell (i.e., if a slow-moving MCS occupies a grid cell for 10 hr, it is only counted once).

Figure 10. (a) Average observed and (b) simulated precipitation. Fraction of mesoscale convective system to total precipitation based on observations (second row from top), simulations (third row from top), and their differences (model minus observed; bottom row).

Table 1
Key Characteristics and Computational Demands of Mesoscale Convective System (MCS) Tracking Algorithms