A Comprehensive Northern Hemisphere Particle Microphysics Data Set From the Precipitation Imaging Package

Microphysical observations of precipitating particles are critical data sources for numerical weather prediction models and remote sensing retrieval algorithms. However, obtaining coherent data sets of particle microphysics is challenging as they are often unindexed, distributed across disparate institutions, and have not undergone a uniform quality control process. This work introduces a unified, comprehensive Northern Hemisphere particle microphysical data set from the National Aeronautics and Space Administration precipitation imaging package (PIP), accessible in a standardized data format and stored in a centralized, public repository. Data is collected from 10 measurement sites spanning 34° latitude (37°N–71°N) over 10 years (2014–2023), which comprise a set of 1,070,000 precipitating minutes. The provided data set includes measurements of a suite of microphysical attributes for both rain and snow, including distributions of particle size, vertical velocity, and effective density, along with higher‐order products including an approximation of volume‐weighted equivalent particle densities, liquid equivalent snowfall, and rainfall rate estimates. The data underwent a rigorous standardization and quality assurance process to filter out erroneous observations to produce a self‐describing, scalable, and archivable data set. Case study analyses demonstrate the capabilities of the data set in identifying physical processes like precipitation phase‐changes at high temporal resolution. Bulk precipitation characteristics from a multi‐site intercomparison also highlight distinct microphysical properties unique to each location. This curated PIP data set is a robust database of high‐quality particle microphysical observations for constraining future precipitation retrieval algorithms, and offers new insights toward better understanding regional and seasonal differences in bulk precipitation characteristics.


Introduction
Accurate precipitation estimates are crucial for monitoring the global water-energy balance, influencing agricultural productivity, informing economic decisions, and fostering ecosystem growth and sustainability (Breugem et al., 2020; Calzadilla et al., 2013; Dagan & Stier, 2020; Gergel et al., 2017; Meyghani et al., 2023; Pörtner et al., 2019). As global temperatures continue to rise (Arias et al., 2021), Northern Hemisphere (NH) precipitation patterns are expected to respond in a nonlinear manner, driven by increased poleward atmospheric moisture transport and modulated by complex atmospheric dynamics (Bintanja & Andry, 2017). While future model projections agree that total precipitation will increase across high latitude NH regions (with marked enhancements in interannual variability of 40% by 2100), there exists substantial uncertainty in the distribution and frequency of rainfall and snowfall events, reinforcing the need for accurate observational techniques to monitor these processes (Bintanja, 2018; Bintanja et al., 2020). In situ precipitation measurements are high-quality observational references commonly used for these purposes; however, manual measurement techniques are time-consuming (Cauteruccio et al., 2021), and the high installation and maintenance costs of automated precipitation gauges result in a sparse measurement network with large unobserved gaps between sites (Kochendorfer et al., 2022; Mekis et al., 2018).
Satellite-based precipitation measurement systems can be used to fill these gaps (e.g., the Tropical Rainfall Measuring Mission, Kummerow et al., 2000; CloudSat, Stephens et al., 2008; Global Precipitation Measurement [GPM], Hou et al., 2014; Earth Cloud, Aerosol and Radiation Explorer [EarthCARE], Illingworth et al., 2015). These systems are able to retrieve estimates of rainfall and/or snowfall over large swaths of the globe due to their orbit. However, current remote sensing-based precipitation retrievals strongly rely on assumptions of particle microphysical properties (e.g., particle size, shape, fall speed, and density) which do not necessarily generalize well across different regional climates (King et al., 2022; Pettersen, Bliven, et al., 2020; Wood et al., 2013). Biases in these physical assumptions result in large uncertainties in precipitation rates (Chase et al., 2020; Duffy et al., 2021; Gilmore et al., 2004; Morrison et al., 2020), with substantial hydrologic consequences to surface processes as errors propagate through model simulations (Biemans et al., 2009; Falck et al., 2015; King et al., 2020).
Bayesian retrievals, such as optimal estimation, employ a statistical approach to retrieve precipitation rates from satellite radar observations through the use of a priori databases of known particle microphysical properties (L'Ecuyer & Stephens, 2002; Maahn et al., 2020; Munchak & Kummerow, 2011; Rapp et al., 2009). However, the precision of these retrievals is greatly influenced by the quality and robustness of available a priori training data sets commonly developed from in situ microphysical observations during ground validation campaigns (Junkins & John, 2004). A comprehensive database of particle microphysics is therefore a powerful tool to facilitate future research toward developing more robust precipitation retrievals through an examination of snowfall and rainfall patterns across multiple years and throughout varying regional climates. Additionally, as demonstrated by Dolan et al. (2018), studying the spatiotemporal variability of precipitation in these data sets can objectively separate events by underlying physical and thermodynamic processes (e.g., convective or stratiform precipitation), and further characterize the dominant precipitating mechanisms within each group (e.g., particle riming, aggregation, vapor deposition, collision) to identify regional modes of variability.
In this paper, we present a comprehensive particle microphysics data set derived from a series of video disdrometers developed and built by the National Aeronautics and Space Administration (NASA) called precipitation imaging packages (PIPs). The PIP instruments examined here were deployed at 10 locations across the NH with observations beginning in 2014, to provide high-quality estimates of particle microphysics at minute-timescales (Cooper et al., 2022; Houze et al., 2017; Lerber et al., 2017; Mariani et al., 2022; Munchak et al., 2022; Pettersen, Bliven, et al., 2020; Pettersen, Kulie, et al., 2020; Pettersen et al., 2021; Shates et al., 2021; Tiira et al., 2016). The resulting data set is: (a) packaged into a common, Climate and Forecast (CF)-compliant, accessible data format using the network Common Data Form (NetCDF-4) which is underlain by the Hierarchical Data Format version 5 (HDF5) for storing scientific data in a tabular form; (b) temporally standardized with minute-scale observations of particle size distributions (PSDs), vertical velocity distributions (VVDs), effective density distributions (rho), an equivalent density particle mass retrieval (eD), and derived snowfall and rainfall rates in daily files; and (c) quality controlled to remove erroneous data points, with improved alignment between PIP product levels.
The paper is organized as follows:
1. Introduce PIP study sites, along with the measurement capabilities of the PIP and its associated data products
2. Describe how PIP data was processed, quality controlled and standardized when converted into NetCDF-4 files
3. Analyze case studies using the PIP and ancillary data to highlight physical processes, and examine bulk precipitation characteristics to illustrate regional differences
4. Discuss how this data can be used in a handful of research and operational applications
5. Summarize the data curation methodology and highlight the strengths and limitations of the curated PIP data set

Study Sites
PIP measurements are collected from 10 different locations in six countries across the NH, spanning 34° of latitude from 37°N to 71°N (Figure 1a). Observations are retrieved from instruments installed at a combination of long-term measurement sites and temporary field campaigns. Each study site is briefly discussed in this section, including descriptions of their regional topography and climate. Temporal coverage across all sites spans 14 January 2014 to 31 August 2023, with all observational periods illustrated in Figure 1b for intercomparison. Additionally, Table 1 provides a summary of site-specific details including their respective data coverage periods, elevation, latitude and longitude, and additional reference sources.
[...] et al., 2016). This campaign is dedicated to continuously gathering detailed data on regional atmospheric fluxes, storage, and concentrations within the land ecosystem-atmosphere interface (Hari et al., 2013). Positioned at roughly 150 m.a.s.l., the station is located in the middle of a forest clearing sheltered by the surrounding trees approximately 20 m from the PIP (Aaltonen et al., 2012; Lerber et al., 2017). Due to the influence of the treeline, the wind conditions at Hyytiälä are typically moderate or low (the median wind speed (WS) for snowfall events spanning 2014-2022 is 1.3 m/s). Adjacent forests predominantly feature boreal mixed-coniferous trees, interspersed with small lakes and wetlands. The area's long-term average yearly temperature stands at +3.5°C, with February as the coldest month (−7.7°C) and July being the hottest (+16°C). From 1981 to 2010, the annual precipitation averaged 71 cm, comprising rain during warm periods and snow in winter. The 30-year mean winter maximum snow depth at this location is approximately 47 cm (Drebs et al., 2002). Since its installation in January 2014, the FIN PIP has been in continuous operation and observations are ongoing (Lerber et al., 2017; Tiira et al., 2016).
PIP data from the Marquette, Michigan (MQT) site was sourced from the National Weather Service (NWS) Marquette office, located in Michigan's Laurentian Great Lakes region (Kulie et al., 2021; Pettersen, Kulie, et al., 2020). This NWS office is positioned 13 km southwest of Lake Superior, set on a gently rising slope at 426 m.a.s.l. and surrounded by a mixed northern hardwood-conifer forest (46.532°N, 87.548°W; Shates et al., 2023). The PIP is situated in a flat, open field adjacent to the office, in an area specifically maintained by the NWS for monitoring snow accumulation. The Great Lakes region is known for its consistent cold-season snowfalls, typically resulting from broad, vertically deep synoptic-scale storms, or localized convective lake effect snow processes (Kulie et al., 2021). The site also frequently experiences precipitation driven by atmospheric rivers moving across the region, leading to enhanced precipitation rates and cold-season rain events (Mateling et al., 2021). Average winter lows are −6°C, summer highs average 19°C, and the site records a winter snow accumulation ranging from 250 to 500 cm (Pettersen, Kulie, et al., 2020). The PIP was installed at MQT in 2014 and has been operating continuously through the present (Pettersen, Bliven, et al., 2020; Pettersen et al., 2021). While the nearby Gaylord, Michigan (APX) data is not collected from a long-term installation, it is sourced from a PIP at another, more inland Michigan NWS site approximately 100 km to the southeast of Marquette in the lower peninsula (44.908°N, 84.719°W, 446 m.a.s.l.), in an area that experiences an average of 378 cm of accumulated snowfall each winter. The APX PIP has been installed seasonally from November to April, starting in 2021 and continuing through the present.
Data from Iqaluit (YFB) were sourced from the Canadian Arctic Weather Science (CAWS) super-site (Joe et al., 2020), operated by Environment and Climate Change Canada in Iqaluit, Nunavut's capital (63.747°N, 68.542°W, 12 m.a.s.l.). The primary goal of CAWS is to enhance meteorological observations in the Canadian Arctic, aiding in forecasting and the evaluation of numerical weather prediction models. The measurement site is located in a valley overlooking Frobisher Bay, approximately 200 m from the city's airport runway on flat, permafrost terrain (Chou et al., 2022). YFB is influenced by various synoptic storms that originate across the Arctic, with the most common storm tracks emerging over the western Arctic or the Prairies (Mariani et al., 2022). Throughout the year, Iqaluit undergoes significant temperature variations, typically ranging from −35°C to +20°C, and experiences nearly 21 hr of sunlight or darkness during polar day or night periods. Being coastal, YFB is set within an Arctic-tundra setting, marked by icy terrains, rolling hills, and a dry, desert-like climate, receiving 20 cm of rainfall and 229 cm of snowfall annually (Joe et al., 2020). The YFB PIP was installed in September 2014 and was updated to the same software version used by the other PIPs (i.e., v.1701) in May 2017, under which it operated until August 2019.
The North Slope Alaska (NSA) site, situated in Utqiaġvik along Alaska's northern coast adjacent to the Arctic Ocean, is a high Arctic research facility under the Atmospheric Radiation Measurement program of the U.S. Department of Energy (DOE; Wendler et al., 2017). Positioned north of the Arctic Circle, Utqiaġvik is among the world's northernmost settlements and the farthest north in the United States. NSA's mission is to offer detailed observations of high latitude cloud and radiative processes, making it a hub for Arctic atmospheric and ecological studies (Verlinde et al., 2016). As one of the cloudiest places on Earth, the site hosts a range of instruments focusing on cloud processes (Stamnes et al., 1999), and maintains a vast data archive of precipitation observations from the PIP. Utqiaġvik's tundra climate is predominantly cold and dry with short, cool summers, and prolonged, freezing winters. Throughout the year, temperatures usually range between −28 and 9°C, seldom dropping below −38°C or exceeding 15°C. Despite its arid nature, with less than 15 cm of rainfall annually, Utqiaġvik's snowfall has been increasing, averaging 120 cm annually based on the 1991-2020 records. The NSA PIP was installed in October 2018 and has been operational through the present.

Limited Field Campaigns
PIP data were also collected from the International Collaborative Experiments for Pyeongchang 2018 Olympic and Paralympic Winter Games (ICE-POP or ICP) campaign from January to April 2018 in South Korea (Helms et al., 2022). ICP was a field validation campaign aiming to generate comprehensive ground and airborne precipitation data sets to support the physical verification of precipitation retrieval algorithms used by NASA's GPM satellite constellation (Skofronick-Jackson et al., 2015). ICP data were sourced from two South Korean sites (denoted KO1 and KO2 in the PIP data set) with the objective of studying severe winter weather patterns across complex terrain and improving short-term weather predictions for these events (Petersen et al., 2016). The two sites, situated roughly 12 km apart, were (a) KO1: the BKC (Bokwang-ri Community center; 37.738°N, 128.756°E, 175 m.a.s.l.) positioned 15 km from the eastern coast, and (b) KO2: the MHS (Mayhills Supersite; 37.665°N, 128.7°E, 789 m.a.s.l.), situated in a mountainous region further inland (Kim et al., 2021). Given its coastal proximity and humid continental climate, the area experiences high temperatures that range from 1°C in January to 25.8°C in August, while the lows vary from −4.6°C in January to 20.5°C in August. The region receives 131 cm of precipitation on average annually, with the majority falling during the winter as snow (Chandrasekar et al., 2019).
The Olympic Mountains Experiment (OLYMPEX or OLY) is another GPM ground validation (GV) campaign that provided PIP observations for this data set. The campaign was conducted in Washington State's Olympic Peninsula from November 2015 to February 2016, and data from OLY were sourced from the Hurricane Ridge site (47.97°N, 123.58°W, 1,603 m.a.s.l.), located roughly 18 km south of the Salish Sea coastline in an alpine environment (Houze et al., 2017). Characterized by an active winter storm season, the area experiences moisture-laden systems progressing from the nearby Pacific Ocean, sweeping over the coast, and moving into the Olympic Mountains (Houze et al., 2017; Purnell & Kirshbaum, 2018; Zagrodnik et al., 2021). Annually, the region accumulates precipitation varying from 250 cm along the coast to 450 cm within its forested mountainous zones, with the bulk of this precipitation falling between November and April. While temperatures at lower elevations are generally cool to moderate, they can occasionally fall below freezing to produce solid precipitation. Higher terrains get blanketed with significant snow, with Hurricane Ridge receiving 30-35 feet of snow in years when strong storm systems are moving across the region (NPS, 2018).
The Haukeliseter (HAK) and Kiruna (KIS) sites played integral roles in the High-Latitude Measurement of Snowfall (HiLaMS) campaign (Cooper et al., 2022). This campaign aimed to harness snowflake microphysics observations to refine surface snow accumulation estimates during the winters of 2016/2017 and 2017/2018 in Scandinavia (Cooper et al., 2022; Schirle et al., 2019; Shates et al., 2021). Located in Norway's Telemark region at Haukeliseter on a mountain plateau, the HAK site (59.81°N, 7.21°E, 991 m.a.s.l.) was managed by the Norwegian Meteorological Institute (Met Norway; Wolff et al., 2015). HAK's isolated alpine tundra region is characterized by low scrub and mossy vegetation. During its winter season, spanning October to May, HAK predominantly experiences snow and sleet accompanied by wind speeds reaching 20 m/s and temperatures dropping to −30°C. Conversely, the second HiLaMS site, KIS (67.84°N, 20.41°E, 425 m.a.s.l.), is situated atop a single-story building in Kiruna, Sweden, amid a forested landscape and surrounded by proglacial lakes. Operated by the Luleå University of Technology, the research emphasis at KIS was on delineating snowfall attributes within a subarctic taiga forest (Schirle et al., 2019). This location was chosen for its frequent, intense snowfall from September to May, and its stark climatic contrast to Haukeliseter (Cooper et al., 2022). Notably, the influence of the warmer Atlantic Ocean on this inland site is mitigated by Sweden's tallest mountains, situated roughly 75 km southwest of Kiruna.
[...]). The area features a surrounding mixed deciduous forest, small lakes and streams, and slowly rolling terrain. In Storrs, summers are comfortably warm, while winters can be particularly cold and snow laden. Annually, temperatures typically fluctuate between −8 and 28°C, seldom falling below −16°C or exceeding 32°C. The site experiences a thorough mix of rain and snow throughout the year, averaging 125 cm of rainfall and 86 cm of snowfall, attributable to the pronounced seasonal temperature variations.

Precipitation Imager
The NASA PIP is a video disdrometer that was developed to succeed the Snowflake Video Imager (SVI) (Newman et al., 2009). As a disdrometer, the PIP measures PSDs and the velocity of falling hydrometeors, and is capable of observing both rain and snow with a high degree of accuracy (Pettersen, Bliven, et al., 2020; Pettersen et al., 2021). Additionally, compared to other similar disdrometers, the PIP is relatively inexpensive (approximately 7 thousand USD worth of equipment) and easy to deploy, facilitating its use in remote field campaigns. Images recorded by the instrument can be used to derive microphysical and bulk characteristics of rain and snow at minute-scale temporal resolution (Helms et al., 2022).
The PIP instrument (shown in Figure 2a) consists of a high-speed video camera (shooting at 380 frames per second at 640 × 480 resolution), aimed directly at a 150-W halogen lamp positioned 2 m in front of the camera. The camera has a 64 × 48 mm field of view (FOV) and a focal plane located 1.33 m from the lens. The image resolution of the device is 0.1 by 0.1 mm, with a minimum particle detection threshold of 0.3 mm equivalent area diameter. Each PIP is calibrated to the same specifications before being shipped to its study site, to ensure that all instrument settings are standardized and comparable with one another. The PIPs used in this work were all running the same custom software version (v.1701) for processing the raw images from the device into higher-level derived products.
One advantage the PIP has over other comparable disdrometers is the wide, 2-m observation path between the camera and bulb, which allows hydrometeors to fall unimpeded by wind turbulence caused by the presence of the camera equipment in the scene. As hydrometeors fall between the camera and bulb, their shadows are observed by the camera falling in front of the bright halogen light, allowing particle shape, size distributions, and fall speeds to be derived from consecutive frames. A composite of hydrometeors observed by the PIP at IMP is shown in Figures 2b and 2c, including both solid precipitation and sleet.
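As a rough illustration of how a fall speed follows from consecutive frames, the sketch below converts a frame-to-frame vertical displacement (in pixels) into a fall speed using the camera figures quoted in this section (64 × 48 mm FOV, 640 × 480 pixels, 380 frames per second). The function names are hypothetical, and the actual PIP software tracks particles with its own matching logic, which is not reproduced here.

```python
# Camera figures taken from the text; everything else is illustrative.
FOV_MM = (64.0, 48.0)    # field of view (width, height) in mm
FRAME_PX = (640, 480)    # image size (width, height) in pixels
FRAME_RATE_HZ = 380.0    # frames per second


def pixel_size_mm(fov_mm, n_pixels):
    """Physical size of one pixel along one image axis."""
    return fov_mm / n_pixels


def fall_speed_m_s(displacement_px, frames_elapsed=1):
    """Fall speed implied by a vertical shadow displacement between frames.

    displacement_px: vertical displacement of the particle shadow in pixels
    frames_elapsed:  number of frame intervals over which it was measured
    """
    if frames_elapsed < 1:
        raise ValueError("frames_elapsed must be at least 1")
    distance_m = displacement_px * pixel_size_mm(FOV_MM[1], FRAME_PX[1]) / 1000.0
    elapsed_s = frames_elapsed / FRAME_RATE_HZ
    return distance_m / elapsed_s


if __name__ == "__main__":
    print(pixel_size_mm(*[FOV_MM[0], FRAME_PX[0]]))  # horizontal pixel size in mm
    print(fall_speed_m_s(26))                        # speed for a 26-pixel step
```

Under these figures, a shadow that moves 26 pixels between two consecutive frames corresponds to a fall speed of roughly 0.99 m/s.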
The PIP software retrieves the mass of each falling particle by coupling particle microphysical observations with an empirically determined equivalent density relation. This equivalent density relation is determined using a parameterization that includes boundary conditions of raindrop terminal fall speed theory (Atlas & Ulbrich, 1977), and empirically derived snowfall properties (Pettersen, Bliven, et al., 2020). The PIP observations of PSDs and vertical fall speeds in conjunction with the parameterization are used to retrieve the volume equivalent density (additional details of this parameterization are provided in Pettersen, Bliven, et al., 2020, Section 2.2.1, and Pettersen et al., 2021, Section 2.4). The mean density value (i.e., eD) is the volume-weighted average of the equivalent density distribution of all particles that fall over a one-minute period. This mean density can then be used to classify the hydrometeor phase (Pettersen et al., 2021), as well as obtain liquid water equivalent surface precipitation rates.
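To make the notion of a volume-weighted average concrete, the following minimal sketch computes a mean density from per-bin diameters, particle counts, and equivalent densities for a single minute. It assumes each particle's volume scales as D³ (a volume-equivalent sphere) and uses hypothetical argument names; it does not reproduce the PIP density parameterization itself, which is described in the references cited above.

```python
import math


def volume_weighted_density(diameters_mm, counts, densities):
    """Volume-weighted mean of per-bin equivalent densities.

    Each bin is weighted by count * D**3. The sphere-volume constant pi/6
    cancels in the ratio, so it is omitted. Bins with a zero count or a NaN
    density are skipped. Returns NaN if no bin contributes.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for diameter, count, density in zip(diameters_mm, counts, densities):
        if count <= 0 or math.isnan(density):
            continue
        weight = count * diameter ** 3
        weighted_sum += weight * density
        weight_total += weight
    if weight_total == 0.0:
        return math.nan
    return weighted_sum / weight_total


if __name__ == "__main__":
    # Eight 1-mm particles and one 2-mm particle carry equal total volume.
    print(volume_weighted_density([1.0, 2.0], [8, 1], [1.0, 0.2]))
```

In the toy example, the single 2-mm particle carries as much volume as the eight 1-mm particles, so the two densities contribute equally to the mean even though the counts differ by a factor of eight.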

Surface Meteorology
Observations of 2-m air temperature (°C), air pressure (hPa), relative humidity (%), WS (m s⁻¹) and wind direction (degrees) have also been collected and made available from each of these study sites alongside the PIP data. These ancillary meteorologic variables were collected from nearby weather stations operating at each site, and were converted from their original data formats into NetCDF files with the same metadata conventions and standards as those used in the PIP products. Packaged into similarly formatted daily files, these observations can then be analyzed in combination with the PIP data to provide additional context regarding local weather conditions (e.g., Section 4.1). Note that observations of pressure and wind direction were not recorded at Gaylord, and observations of relative humidity and pressure were not recorded at Haukeliseter. For additional data set details, including the temporal resolution and data coverage periods for each of these MET products, please see Table 2.

Data Conversion
To facilitate the efficient and accessible dissemination of PIP observations, we first parse the derived particle observations from the device and standardize them from a proprietary ASCII format into the more universally recognized NetCDF-4 format with associated metadata descriptions of each variable. Developed by the University Corporation for Atmospheric Research, NetCDF is an open standard set of software libraries which allows for improved sharing of array-oriented scientific data through enhanced documentation, compression, and distribution (Rew et al., 2006). This standardization process allows for broader compatibility and easier data sharing within the academic community. However, to perform this conversion, we must first understand the format of the raw PIP data and its derived, higher-order products.
PIP data is provided across four primary levels. The lowest level product (Level 1; L1) includes the raw video data recorded by the high-speed camera, where 8-bit gray-scale frames from the video are saved in a compressed .piv video format in 10 minute intervals. The Level 2 product (L2) ingests the compressed L1 videos to produce timestamped particle tables of 36 particle characteristics (containing attributes such as particle position, diameter, shape properties, and timestamp), for each falling hydrometeor that enters the camera's FOV. The Level 3 product (L3) ingests the L2 particle tables to track particle movement and, in turn, derive vertical velocity and PSD tables for each minute. Finally, the Level 4 product (L4) uses the information in the L3 tables to produce estimates of volume-weighted particle density, phase classification, and snowfall and rainfall rate estimates. Each of these products is highlighted in red on the left side of the Figure 3 data conversion pipeline.
Following a quality assurance (QA) procedure (elaborated further in Section 3.2), the data is transformed into daily NetCDF-4 files adhering to the standard CF conventions (version 1.10). Additionally, the files are compressed using a level 2 deflation flag to optimize for a smaller, chunked file. These converted files are 70% smaller, on average, when compared to their corresponding unprocessed L3 and L4 data files. For more details regarding the CF-1.10 conventions, please see Eaton and Gregory (2022). The conversion process was applied to all files at all sites using a combination of bash and Python (version 3.11).
The internal structure of each converted NetCDF file is identical, with latitude, longitude and time variables containing the spatiotemporal information, a data variable containing one of the L3/L4 PIP products, and bin size information (i.e., bin_centers, bin_edges) representing different particle diameter bins. A list of all derived PIP variable names and their descriptions is shown in Table 3. Each daily file has exactly 1,440 time steps (1,440 min in a day), with 131 bins (up to 26 mm diameter particles) for two-dimensional (2D) variables. While the vast majority of observed particles at these locations are much smaller than 26 mm in diameter, we note that large ice-based aggregates above 26 mm can sometimes occur and are saturated to the maximum bin size due to PIP camera visibility limitations. Missing data is marked as NaN. An illustration of the aforementioned 2D distribution variables for the MQT, FIN, and YFB sites, encompassing PSD, VVD, and rho, is shown in Figure 4.
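The layout described above can be pictured with a short sketch that builds an empty 1,440 × 131 daily array and assigns a particle diameter to a bin, placing oversize particles in the last bin. This is one reading of the saturation behavior, not a description of the PIP software. The 131-bin and 1,440-step figures come from the text; the uniform 0.2 mm bin width and the function names are assumptions made purely for illustration.

```python
import math

N_TIME = 1440        # minutes per daily file (from the text)
N_BINS = 131         # diameter bins per 2D variable (from the text)
BIN_WIDTH_MM = 0.2   # ASSUMPTION: uniform bin width, not stated in the text


def empty_daily_array(n_time=N_TIME, n_bins=N_BINS):
    """Daily 2D array (time x diameter bin) pre-filled with NaN."""
    return [[math.nan] * n_bins for _ in range(n_time)]


def diameter_to_bin(diameter_mm, n_bins=N_BINS, width_mm=BIN_WIDTH_MM):
    """0-based bin index for a diameter; oversize particles go to the last bin."""
    if math.isnan(diameter_mm) or diameter_mm < 0:
        raise ValueError("diameter must be a non-negative number")
    return min(int(diameter_mm / width_mm), n_bins - 1)


if __name__ == "__main__":
    day = empty_daily_array()
    print(len(day), len(day[0]))   # dimensions of the daily array
    print(diameter_to_bin(0.35))   # a small particle
    print(diameter_to_bin(30.0))   # an oversize aggregate lands in the last bin
```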
The naming conventions for the converted daily files are delineated below for each site-year combination. Here, XXX symbolizes the PIP instrument number allocated to the equipment, while YYYYMMDD denotes the date. Each filename culminates with the designation: min, rho, psd, or vvd, corresponding to: the one-dimensional minute-scale derived precipitation products, effective density distributions, PSDs, and VVDs, respectively.

Quality Assurance
To produce a high-quality, error-free data set, an intermediate QA analysis is performed at each site before converting the ASCII data into NetCDF. This QA phase consists of three primary steps including (a) temporal alignment, (b) L4 equivalent density adjustment, and (c) outlier removal.
The first QA step, temporal alignment, ensures that each daily file is time consistent with 1,440 time steps, and with each day beginning at midnight and ending at 23:59. The raw ASCII files produced by the PIP software only contain entries for minutes in which hydrometeor activity was detected, while this new format ensures a consistent time step of 1 min for all files, filled with NaN where data does not exist. Days with no detected precipitation from the PIP are not included in the final data set.
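The temporal alignment step can be sketched as follows. The example takes sparse (minute-of-day, value) records, as a stand-in for the raw ASCII entries, and places them on a fixed 1,440-step grid filled with NaN elsewhere. The record layout and function names are assumptions for illustration rather than the actual conversion code.

```python
import math

MINUTES_PER_DAY = 1440


def align_to_daily_grid(records):
    """Place sparse (minute_of_day, value) records on a 1,440-step grid.

    Minutes without a record are filled with NaN, so every day has the same
    length and starts at 00:00 (index 0) and ends at 23:59 (index 1439).
    """
    grid = [math.nan] * MINUTES_PER_DAY
    for minute, value in records:
        if not 0 <= minute < MINUTES_PER_DAY:
            raise ValueError(f"minute out of range: {minute}")
        grid[minute] = value
    return grid


def has_precipitation(grid):
    """True if at least one minute of the day holds a non-NaN value."""
    return any(not math.isnan(value) for value in grid)


if __name__ == "__main__":
    # Two detections: one at 00:00 and one at 23:59.
    day = align_to_daily_grid([(0, 0.5), (1439, 1.2)])
    print(len(day), has_precipitation(day))
    # A day with no detections would be skipped rather than written out.
    print(has_precipitation(align_to_daily_grid([])))
```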
The second step, the equivalent density adjustment, was applied to the L4 edensity_lwe_rate product that contains derived estimates of volume-weighted particle density, rainfall rates and snowfall rates (liquid water equivalent or LWE) from the PIP. In the automated conversion process used by the PIP software, which converts information from the L3 particle tables to produce the L4 estimates, we identified a timing issue where gaps in detected hydrometeors in the L3 product tables resulted in an off-by-one-minute shift in the derived L4 products. Over time, for cases with multiple precipitation gaps, this timing issue leads to a drift of 10-20 min and ill-positioned volume-weighted density, snowfall and rainfall rates by the end of a given day. This timing offset was corrected using a greedy cross-correlation timing shift that was applied in 6-hourly chunks to each daily file to produce an adjusted_edensity_lwe_rate product. This technique is commonly used in signal analysis applications, with the goal of finding an optimal offset which maximizes the signal-to-noise ratio between two data sets (Yoo & Han, 2009). This adjustment process was shown to improve overall Pearson correlations between the L3 and L4 product density estimates by more than 0.1 on average (Figure 5), and produce more realistic peaks and troughs in snowfall and rainfall rates throughout the day when compared to independent observations at sites with a collocated Micro Rain Radar (MRR) system (Kneifel et al., 2011; Peters et al., 2002).
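A chunked cross-correlation shift of this kind might look like the sketch below. For each 6-hr (360-min) chunk, it searches a range of integer lags and keeps the one that maximizes the Pearson correlation between a reference series (standing in for an L3-derived estimate) and the shifted target series (standing in for the L4 product). The lag search range, the per-chunk independence, the NaN handling, and all function names are assumptions; the exact procedure used to build adjusted_edensity_lwe_rate is not specified in the text.

```python
import math


def pearson(x, y):
    """Pearson correlation over index pairs where both values are non-NaN."""
    pairs = [(a, b) for a, b in zip(x, y)
             if not (math.isnan(a) or math.isnan(b))]
    n = len(pairs)
    if n < 3:
        return math.nan
    mean_x = sum(a for a, _ in pairs) / n
    mean_y = sum(b for _, b in pairs) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in pairs)
    sxx = sum((a - mean_x) ** 2 for a, _ in pairs)
    syy = sum((b - mean_y) ** 2 for _, b in pairs)
    if sxx == 0.0 or syy == 0.0:
        return math.nan
    return sxy / math.sqrt(sxx * syy)


def shifted(series, lag):
    """Delay a series by `lag` steps (negative lag advances it), NaN-padded."""
    out = [math.nan] * len(series)
    for i, value in enumerate(series):
        j = i + lag
        if 0 <= j < len(series):
            out[j] = value
    return out


def greedy_chunk_align(reference, target, chunk=360, max_lag=30):
    """Pick, chunk by chunk, the lag that best aligns target with reference.

    Each chunk is settled once, in order, and never revisited (the greedy
    part). Returns the adjusted series and the lag chosen for each chunk.
    """
    n = len(reference)
    adjusted = [math.nan] * n
    lags = []
    for start in range(0, n, chunk):
        stop = min(start + chunk, n)
        best_lag, best_r = 0, -math.inf
        for lag in range(-max_lag, max_lag + 1):
            r = pearson(reference[start:stop], shifted(target, lag)[start:stop])
            if not math.isnan(r) and r > best_r:
                best_lag, best_r = lag, r
        lags.append(best_lag)
        adjusted[start:stop] = shifted(target, best_lag)[start:stop]
    return adjusted, lags
```

Allowing a separate lag per chunk lets the correction follow an offset that grows through the day, which matches the drift behavior described above, at the cost of possible discontinuities at chunk boundaries.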
In the third step, outlier removal, we correct each file by masking erroneous observations (e.g., minutes with negative equivalent density values, equivalent density >1, or unphysical negative snowfall/rainfall rates), and check that each daily file has at least one non-NaN entry in it to ensure we are not providing empty data files. A manual inspection of each site's summary statistics is also performed to visually identify and remove erroneous observations from the final data set. Additionally, as was needed in the case of the NSA data, we perform a check for measurement artifacts in the PIP observations. We found that on some days at NSA, due to assumed external interference with the device, the PIP would display unphysically large particle counts in the lowest diameter bin, with tens to hundreds of thousands of particles observed in a single minute. To address this issue, these particle bin counts were examined for each daily file, and cases where there were more than 2,500 particles counted in a single minute (with this value calculated via a sensitivity test) were flagged as outliers. Isolating and masking these cases

Analysis
To demonstrate the physical consistency of the PIP data set with independent data sources, we have provided an analysis of select phase-transition case studies at MQT, showcasing various PIP-observed L3 and L4 products alongside collocated vertical radar measurements, surface meteorologic observations (MET), and reanalysis estimates from ERA-5 (Hersbach et al., 2020). Furthermore, we perform a comprehensive site intercomparison of PIP bulk precipitation features to discern the principal variations in precipitation characteristics across distinct regional climates.

Single Phase Transition Event (21 November 2019)
The first phase transition event is a rain-to-snow transition that took place 21-22 November 2019 (Figure 6). From 09:40 UTC until 13:00 UTC, the vertically pointing K-band MRR detected a strong bright band in reflectivity and enhanced reflectivity values below 2 km, suggesting a melting layer, consistent with an increase in particle fall speeds observed in the Doppler velocity field at this altitude. Following this period, the vertical extent of enhanced reflectivity descends toward the surface until it completely disappears at around 16:00 UTC (first dashed black line). This period also corresponds to observations of rainfall in the PIP, as measured by the large particle VVDs and small PSDs with a narrow width, and is consistent with the warm surface/atmospheric temperatures reported by both the MET and ERA5.
Between 16:00 UTC and 19:30 UTC (second vertical black dashed line), a deep, high-intensity cell marks the beginning of the phase transition event. Here, we note a broader distribution of PIP PSDs with larger particles (snow and ice crystals), lower fall speeds and reduced effective density values. Accordingly, there exists a clear shift from rainfall to snowfall in the PIP L4 products during this time (i.e., a shift from non-zero rain-rate values to non-zero non-rain-rate values). This period also displays a decreasing surface temperature to 0°C, and a similar change in the ERA5 atmospheric temperature profile as relatively high wind speeds move a cool air mass over the measurement site. Following 19:30 UTC until around 09:30 UTC on November 22, we note a relative uniformity in the PIP PSD, VVD and rho estimates as surface temperatures continue to decrease until reaching −5°C, and the precipitation continues to fall as snow until the storm moves away from the site.

Multi-Phase Transition Event (17 November 2017)
The second phase transition event was a multi-phase, snow-to-rain-to-snow transition that took place 17-18 November 2017 (Figure 7). Beginning at around 13:00 UTC on the 17th, a storm system characterized by low reflectivity values and low fall speeds in the MRR passes over the measurement site, accompanied by cold temperatures and by broad PSDs (0.1-5 mm in diameter) and low VVDs (0-1 m/s) observed by the PIP. These conditions suggest the presence of falling snow, which is also identified in the L4 PIP product during this period.
At 16:30 UTC on the 17th (the first black dashed line), the MRR profile displays a bright band of reflectivity just below 2 km, with streaks of enhanced reflectivity values extending down toward the surface. The MRR also displays a pattern similar to the previous case, with increased fall speeds in this region and surface temperatures above 0°C. As noted in the ERA5 atmospheric temperature profiles, a pocket of warm air is advected across the region (0°C isotherm between 750 hPa and 900 hPa), which triggers the atmospheric phase transition. This transition is clearly captured by the PIP via the narrow PSDs with small particles and large VVDs (along with the non-zero rainfall rate noted in the L4 product).
After midnight on the night of the 17th-18th (the second dashed black line), the warm air mass moves away from the site and surface temperatures drop back below 0°C. This temperature change triggers the second phase transition (rain-to-snow), as noted by the broader PSDs (0.1-10 mm in diameter), lower VVD values (0-2 m/s), and reduced effective density estimates (<0.4) after this period. In instances of complex phase transitions, the PIP data not only aligns well with independent, ancillary data sets (e.g., profiling radar, surface measurements, and reanalysis products), but also offers a more comprehensive view of the fine-scale particle microphysical processes occurring during these events at very high temporal resolution.

Bulk Characteristics
By gathering data over several years and across different continents, we have created a data set that offers a comprehensive collection of observations revealing diverse precipitation regimes in various regional climates. To highlight the broad differences in particle microphysical properties across these sites, we compare L3 and L4 PIP-derived characteristics across all years.
First, we examine differences in the shape of each site's snowfall PSDs, modeled using the inverse exponential function from Equation 1, where $N(D)$ is the particle concentration per unit particle size, $N_0$ is the intercept parameter, and λ is the slope. Similar to Pettersen, Bliven, et al. (2020), these values are calculated over contiguous 5-min intervals throughout each day (a similar temporal scale to the time it takes precipitation processes to change) to more easily find a well-defined solution to the curve. While the inverse exponential fitting method may not capture all possible snowfall PSDs at each site, as indicated by Duffy and Posselt (2022), who noted enhancements with a modified gamma function for snowfall aggregates, it remains the most commonly used technique in current snowfall studies (Cooper et al., 2017, 2021; Pettersen, Kulie, et al., 2020). Snowfall cases were selected by constraining the data set to periods with an average rho value below 0.4 (i.e., a low-density threshold consistent with snowfall observations; Pettersen et al., 2021) over the 5-min interval.
$$N(D) = N_0 \, e^{-\lambda D} \tag{1}$$

The resulting log-scaled $N_0$ and λ parameters are plotted for each site in the normalized two-dimensional histograms in Figure 8, where we note similar distributions at the geographically adjacent MQT and APX sites (note that the MQT distribution is smoother as it has a larger sample), with the highest density around $\log_{10}(\lambda) = 0.2$ (mm$^{-1}$), $\log_{10}(N_0) = 2.5$ (m$^{-3}$ mm$^{-1}$), and a wide range in values ($-0.4 < \log_{10}(\lambda) < 0.15$ (mm$^{-1}$), $1 < \log_{10}(N_0) < 4$ (m$^{-3}$ mm$^{-1}$)). However, large differences exist in the shape of these distributions across sites. OLY, for instance, displays a bimodal $N_0$-λ relationship, IMP has a tighter λ distribution and higher slope, and NSA displays a concentration of small intercept terms. Interestingly, many sites display similar concentrated density wedges of values around $\log_{10}(\lambda) = 0.1$ (mm$^{-1}$) and $\log_{10}(N_0) = 1$ (m$^{-3}$ mm$^{-1}$) (e.g., at MQT, APX, FIN, NSA, and KIS), and $\log_{10}(\lambda) = 0.2$ (mm$^{-1}$) and $\log_{10}(N_0) = 2.5$ (m$^{-3}$ mm$^{-1}$) (e.g., at YFB, FIN, KIS, and MQT).
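To make the fitting step concrete, the following Python sketch estimates $N_0$ and λ for the inverse exponential form $N(D) = N_0 e^{-\lambda D}$ from a single 5-min averaged PSD, and applies the mean rho < 0.4 snowfall criterion. It assumes a least-squares fit on $\ln N(D)$ against $D$, which is one common way to fit this form; the text does not state which solver was used, and the function names and synthetic bin values are hypothetical.

```python
import math


def is_snowfall_interval(rho_values, threshold=0.4):
    """Apply the low-density snowfall criterion: mean rho below the threshold."""
    return sum(rho_values) / len(rho_values) < threshold


def fit_inverse_exponential(diameters_mm, concentrations):
    """Fit N(D) = N0 * exp(-lam * D) by least squares on ln N(D) versus D.

    Empty bins (N <= 0) are skipped because their logarithm is undefined.
    Returns (N0, lam) in the units of the inputs.
    """
    points = [(d, math.log(n)) for d, n in zip(diameters_mm, concentrations) if n > 0]
    if len(points) < 2:
        raise ValueError("need at least two non-empty size bins")

    m = len(points)
    mean_d = sum(d for d, _ in points) / m
    mean_y = sum(y for _, y in points) / m
    sxx = sum((d - mean_d) ** 2 for d, _ in points)
    sxy = sum((d - mean_d) * (y - mean_y) for d, y in points)

    slope = sxy / sxx                  # equals -lam
    intercept = mean_y - slope * mean_d  # equals ln(N0)
    return math.exp(intercept), -slope


# Synthetic 5-min mean PSD generated from known parameters (illustrative values).
true_n0, true_lam = 316.0, 0.75
bins = [0.5 * k for k in range(1, 11)]  # bin centres, 0.5 to 5.0 mm
psd = [true_n0 * math.exp(-true_lam * d) for d in bins]

if is_snowfall_interval([0.12, 0.18, 0.15, 0.20, 0.10]):
    n0, lam = fit_inverse_exponential(bins, psd)
    print(math.log10(n0), math.log10(lam))
```

A log-space fit weights all size bins equally, so sparsely populated large-diameter bins can pull the slope; a weighted or nonlinear fit would be a reasonable alternative.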
In examining the kernel density estimates for these two parameters across all sites (Figure 9), we can conduct a more direct comparative analysis of their distributions (Chen, 2017). For instance, a site that commonly observes both large $N_0$ and λ values often experiences snowfall events with numerous fine-grained snowflakes, while a site with both small $N_0$ and λ would more commonly experience events with fewer, but larger particles.
In panel a, we note a general peak in λ between 0.7 and 0.8 (mm$^{-1}$) for all sites. However, there are differences between locations. For instance, OLY displays a bimodal λ distribution with a peak at 0.8 (mm$^{-1}$) and another around 1.3 (mm$^{-1}$), likely stemming from the diverse forms of precipitation found in a high-altitude, mountainous setting adjacent to the ocean (which primarily experienced sleet and rain). HAK also displays a slightly lower peak and a wider distribution shifted to the right, which may be the result of an increased frequency of high-intensity, intermittent snowfall events enhanced by the local topography, which consists of nearby fjords (Schirle et al., 2019). We note a similar pattern for the $N_0$ distributions in panel b, with a mostly log-normal distribution across all sites centered around $\log_{10}(N_0) = 2.5$ (m$^{-3}$ mm$^{-1}$). NSA shows a slightly lower concentration of smaller particles, and OLY once again has the widest distribution. ICP displays a minor right-shifted distribution with a higher number of smaller particles.
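For readers unfamiliar with kernel density estimation, the following Python sketch shows how a density curve like those in Figure 9 can be built from a sample of fitted λ values. It assumes a Gaussian kernel with a normal-reference rule-of-thumb bandwidth; the kernel and bandwidth actually used for Figure 9 are not stated in the text, and the synthetic bimodal sample only mimics the two-peak shape described for OLY.

```python
import math
import random
import statistics


def gaussian_kde(samples, grid, bandwidth=None):
    """Evaluate a Gaussian kernel density estimate of `samples` on `grid`."""
    n = len(samples)
    if bandwidth is None:
        # Normal-reference rule of thumb (an assumption, not the paper's choice).
        bandwidth = 1.06 * statistics.stdev(samples) * n ** (-1 / 5)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)
        for x in grid
    ]


# Synthetic lambda sample (mm^-1) with modes near 0.8 and 1.3; illustrative only.
rng = random.Random(0)
lam_samples = [rng.gauss(0.8, 0.05) for _ in range(200)]
lam_samples += [rng.gauss(1.3, 0.05) for _ in range(200)]

step = 0.01
lam_grid = [i * step for i in range(211)]  # 0.00 to 2.10 mm^-1
density = gaussian_kde(lam_samples, lam_grid)

# A proper density integrates to approximately one over the grid.
print(sum(density) * step)
```

Because the rule-of-thumb bandwidth is tuned for unimodal, near-normal samples, it tends to oversmooth multimodal data, so a narrower bandwidth may be needed to resolve closely spaced peaks.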
Upon analyzing the full set of L4 products for both rain and snow (comprising minute-scale equivalent density, rainfall rate, and LWE snowfall rate) depicted in Figure 10, it is evident that the majority of density values observed at each site are below 20%, suggesting primarily snowfall occurrences. MQT, FIN, and OLY display a bimodality in equivalent density below 20%, with one peak just above zero and another at 10%, while the other sites display a gamma distribution with peaks around 10%. Further, OLY, IMP, MQT, and ICP also exhibit an increase in frequency at around 60% and 100% from mixed-phase and liquid precipitation events that occurred at each site. In Figure 10b, the frequency distributions of rainfall rates indicate that sites such as ICP and IMP often experienced intense rainfall events, whereas sites like KIS and NSA rarely experienced rainfall events exceeding 1 mm per hour. In Figure 10c, we note that the snowfall rate frequencies group sites into three main categories based on (a) high-intensity events (OLY and ICP); (b) medium-intensity events (HAK, MQT, IMP, and YFB); and (c) low-intensity events (FIN, KIS, APX, and NSA). The increased variability in KIS and NSA above 2 mm for both snowfall and rainfall is a consequence of the infrequency of intense precipitation events of either phase at these locations.

Applications
We posit that this comprehensive NH PIP data set has great potential for advancing atmospheric precipitation research in subsequent studies. Incorporating detailed surface observations of macro- and microphysical properties for both rain and snow can notably improve weather prediction models (Morrison et al., 2020; Stoelinga et al., 2003; Wilson & Ballard, 1999). Additionally, the high temporal resolution in the observed PSDs, VVDs, and effective density distributions can inform model microphysical parameterizations, thereby improving the precision of short-term weather forecasts (Straka, 2009).
Further, this data set could be leveraged for the calibration and validation of remote sensing instruments, and the development of more robust remote sensing retrieval algorithms. For instance, the precursor systems to the PIP (i.e., the SVI and Precipitation Video Imager) have previously been used effectively in this context as part of the GPM Cold Season Precipitation Experiment (GCPEx; Skofronick-Jackson et al., 2015). Remote sensing instruments onboard satellites or located at ground-based stations rely on algorithms to make assumptions about precipitation phase and subsequent microphysical properties. The PIP data set described here is a robust observational repository covering diverse geographic and environmental conditions that will serve as a comprehensive a priori reference for fine-tuning these algorithms (Cooper et al., 2017; Noh et al., 2011; Wood & L'Ecuyer, 2021; Wood et al., 2014). Additionally, each PIP site has complementary, collocated instrumentation (including surface MET observations and vertical profiling radars), which can be leveraged with the PIP data set for additional environmental context.
This comprehensive PIP data set will also offer new insights into the bulk characteristics of microphysical properties that govern the formation and evolution of different types of precipitation under varying environmental and thermodynamic conditions. Understanding these microphysical properties is critically important, as they impact global precipitation processes and drive the overarching hydrological cycle. For instance, we include an example in Figure 11 of how the PIP data set can be interrogated to identify modes of precipitation variability in a manner similar to Dolan et al. (2018), but for snowfall (as opposed to rain).
Passing a set of variables (i.e., $N_0$, λ, effective density (rho), fall speed (Fs), snowfall rate (Sr), and total particle counts (Nt)) from the PIP data set through a simple principal component analysis, we can extract the principal components (PCs), which represent the lower-dimensional embeddings of relationships between the inputs. In this case, the first three PCs account for 95% of the variability in the entire data set (55%, 24%, and 16%, respectively), with distinct density clusters forming in 2D histograms of each PC in Figures 11a-11c. Examining the Empirical Orthogonal Functions in Figure 11d, we can evaluate the contributions to explained variability between different inputs and cluster similar events together. These clusters allow us to characterize the dominant precipitating mechanisms at different locations by defining groupings that can then be tied back to physical processes using ancillary data.
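The decomposition described above can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions: each input is converted to standard anomalies before the analysis, the leading eigenvectors of the resulting covariance matrix are found by power iteration with deflation, and the six synthetic columns stand in for the PIP variables. None of the function names or the synthetic data come from the data set itself.

```python
import math
import random


def standardize(rows):
    """Convert each column to standard anomalies (zero mean, unit variance)."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / n) for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def covariance(z):
    """Covariance matrix of rows that are already centred."""
    n, p = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / n for j in range(p)] for i in range(p)]


def leading_eigenpair(c, iters=500):
    """Power iteration for the dominant eigenpair of a symmetric matrix."""
    p = len(c)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iters):
        w = [sum(c[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    lam = sum(v[i] * sum(c[i][j] * v[j] for j in range(p)) for i in range(p))
    return lam, v


def pca(rows, n_components=3):
    """Return (PC scores, EOFs, explained-variance fractions)."""
    z = standardize(rows)
    c = covariance(z)
    p = len(c)
    total = sum(c[i][i] for i in range(p))
    eofs, explained = [], []
    for _ in range(n_components):
        lam, v = leading_eigenpair(c)
        eofs.append(v)
        explained.append(lam / total)
        # Deflate: remove the component just found before extracting the next.
        c = [[c[i][j] - lam * v[i] * v[j] for j in range(p)] for i in range(p)]
    pcs = [[sum(a * b for a, b in zip(row, v)) for v in eofs] for row in z]
    return pcs, eofs, explained


# Synthetic stand-in for six per-minute features driven by three latent factors.
rng = random.Random(0)


def noisy(x):
    return x + rng.gauss(0, 0.1)


rows = []
for _ in range(300):
    f1, f2, f3 = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
    rows.append([noisy(f1), noisy(f1), noisy(-f1), noisy(f2), noisy(f2), noisy(f3)])

pcs, eofs, explained = pca(rows)
print([round(e, 2) for e in explained])
```

For a data set of this size, a library routine such as NumPy's `numpy.linalg.svd` applied to the standardized matrix would be the more typical choice; the pure-Python version is used here only to keep the example self-contained. As in Figure 11, the sign of each EOF is arbitrary.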

Conclusion
In this work, we present a comprehensive particle microphysical data set spanning 10 study sites over 10 years.
The data set has been carefully curated and packaged into a widely accessible standardized format, with a common time-step and a consistent, CF-compliant naming pattern. The data set comprises a set of PIP L3 products including PSDs, VVDs, and effective density distributions, as well as their corresponding derived PIP L4 products: minute-scale volume-weighted density, rainfall, and LWE snowfall rate estimates. The QA procedure masked a variety of outlier data points from PIP observation errors, and the temporal alignment step fixed a timing issue between the L3 and L4 products. The resulting data set displays more physically consistent distributions of microphysical properties with fewer outliers, and exhibits a consistent one-minute time step across all days.
The case studies presented here demonstrate the alignment of microphysical properties in this data set with independent, ancillary variables from collocated profiling radar, surface MET observations, and ERA5 reanalysis data products. Preliminary analysis underscored that while overarching microphysical distributions are similar, notable variations exist across sites. Such variability is anticipated, given the distinct regional climates observed across different continents and the wide latitudinal range, leading to a comprehensive data set that encapsulates diverse snowfall and rainfall patterns. This curated PIP data set acts as a high-quality reference of over 1 million precipitating minutes (equivalent to two consecutive years of continuous precipitation) that can be used in future studies as training data for machine learning models, as an a priori reference data set for Bayesian retrievals, or as a diverse observational reference to compare modes of precipitation variability at various spatiotemporal scales.

Acknowledgments
This study was primarily supported by a NASA New (Early Career) Investigator Program (NIP) Grant (80NSSC22K0789), NASA Precipitation Measurement Mission Science Team Grants (80NSSC19K0712 and 80NSSC22K0789), and the Global Precipitation Mission Ground Validation data initiative (80NSSC18K0701), with additional support provided by the Natural Sciences and Engineering Research Council of Canada (577912). The scientific results and conclusions, as well as any views or opinions expressed herein, are those of the authors and do not necessarily reflect those of NOAA or the Department of Commerce.

Figure 1. (a) Northern Hemisphere map showing the location of each study site; and (b) Gantt chart of the final available observational sample from each location.
The NASA Investigation of Microphysics and Precipitation for Atlantic Coast-Threatening Snowstorms (IMPACTS or IMP) is the final field campaign used as a source of PIP observations in this study. The objective of IMPACTS is to analyze wintertime snowstorms and East Coast cyclones, with the specific goal of enhancing remote sensing capabilities and snowfall forecasts from the observations collected over the winter periods (December to March) of 2020-2023 (L. A. McMurdie et al., 2022). The IMP campaign incorporated diverse observations from sources such as aircraft, satellites, computer simulations, and direct in situ measurements (L. McMurdie, 2020). Specifically, the in situ PIP data were gathered in an open field near the University of Connecticut in Storrs, Connecticut (41.807°N, 72.294°W, 150 m.a.s.l.

Figure 2. (a) Photo of the precipitation imaging package (PIP) deployed at Marquette, Michigan; (b) A composite of solid precipitation observed by the PIP installed at IMP on 24 December 2021; and (c) A composite of sleet particles observed by the IMP PIP on 17 January 2022.

Figure 3. Precipitation imaging package (PIP) data conversion pipeline. PIP Level 1-4 data in red on the left, converted network Common Data Form files in blue on the right, and intermediate processing steps in gray (far right). * Encapsulates additional standardization steps (described in Section 3.2) for improving the consistency of the final converted data set.

Figure 4. Composite normalized 2D histograms of precipitation imaging package observations from Marquette, Michigan, Finland, and YFB, including particle size distributions, vertical velocity distributions, and effective density distributions, all plotted as a function of particle mean diameter.
days in 2019 at NSA) further enhanced the quality of the final data set and produced much more physically consistent PSDs, and corresponding L4 products.

Figure 5. Density scatterplots showing the impact of the particle-density timing correction when applied to the derived L4 volume-weighted equivalent density values (eD) compared to their respective L3 effective density distributions (rho).(a) Original eD v. rho; (b) adjusted eD v. rho; and (c) their difference (adjusted-original).

Figure 6. Multipanel showing a phase transitioning event observed by the precipitation imaging package (PIP) at Marquette, Michigan spanning 21-22 November 2019. This event is highlighted by the two dashed vertical black lines. Panels depict (a) Micro Rain Radar (MRR) reflectivity; (b) MRR Doppler velocity; (c) PIP PSDs; (d) PIP VVDs; (e) PIP rho distributions; (f) PIP-derived snowfall and rainfall rates; (g) ERA5 atmospheric temperature profiles (dashed contours showing the 0-degree isotherm); and (h) surface MET observations of 2 m temperature (T), dew point, and wind speed.

Figure 7. Similar to Figure 6 for an example multi-phase-transition event at Marquette, Michigan spanning 17-18 November 2017. The dashed vertical black lines depict the locations of the two phase transition events.

Figure 8. Composite 2D histograms of log-scaled particle size distribution inverse exponential function parameters $N_0$ and λ for each site from Figure 1.

Figure 9. Normalized kernel density estimates of particle size distribution parameters (a) λ; and (b) $\log_{10}(N_0)$, for the entire spatiotemporal domain of observations.

Figure 10. Similar to Figure 9, except for L4 products including: (a) the equivalent density mass retrieval estimates; (b) rainfall rates; and (c) snowfall rates.

Figure 11. Example application of the curated precipitation imaging package data set, where a principal component analysis is applied to all snowing minutes from all sites, and the derived principal components and empirical orthogonal functions (EOFs) are plotted (in standard anomalies). (a) PC1 v. PC2; (b) PC2 v. PC3; (c) PC3 v. PC1; and (d) the EOFs for each normalized input feature (note that the sign of each anomaly is arbitrary).

Table 1
Summary Descriptions of the Precipitation Imaging Package Study Sites Incorporated Into This Data Set

Data from the Finland (FIN) site was sourced from the Hyytiälä Forest Research Station (61.845°N, 24.287°E) in southern Finland. The station was established between 1994 and 1996, and the University of Helsinki's Department of Forest Sciences has overseen it as part of the Station for Measuring Ecosystem Atmosphere Relations (SMEAR II) campaign (Petäjä

Table 2
Summary of Available Surface Meteorological Observations Across All Sites Including Measurements of Temperature (t; °C), Pressure (p; hPa), Relative Humidity (rh; %), Wind Speed (ws; m s$^{-1}$) and Wind Direction (wd; Degrees)

Table 3
Summary Descriptions of the Derived Precipitation Imaging Package Variables