Statistical decision analysis for flight decision support: The SPartICus campaign



[1] Field campaigns in atmospheric science typically require making challenging decisions about how best to deploy limited resources, especially aircraft flight hours. Algorithmic decision tools have shown the potential to outperform traditional heuristic approaches to allocating limited flight hours in field campaigns. The present study examines the utility of algorithmic decision tools in an application to the Atmospheric Radiation Measurement (ARM) Small Particles in Cirrus (SPartICus) campaign, which sampled cirrus clouds over the ARM Southern Great Plains (SGP) site between January and June 2010. Probabilistic forecasts of suitable data collection conditions were generated using relative humidity forecasts from the Global Forecast System (GFS) and self-organizing maps. An optimization procedure based on dynamic programming was then used to generate day-ahead fly/no-fly decisions for research flights over the SGP site. The quality of flight decisions thus generated was compared with that of decisions made by the SPartICus science team. Results showed that the algorithmic decision tool would have delivered 11% more optimal data while shortening the length of the campaign season by 29 days and reducing the per-day expenditure of investigator time on forecasting and decision-making.

1 Introduction

[2] A meteorological field campaign is an expensive undertaking. The challenge of optimally allocating and distributing resources impacts every step of a meteorological field campaign from the proposal stage to the post-experiment analysis. Before the campaign, the funding agency and infrastructure personnel must determine the amount of resources to allocate. During the campaign, decision makers are expected to distribute the allocated resources in a way that optimizes data collection. Finally, the allocation and distribution of resources impacts the amount of optimal data available for post-experiment analysis.

[3] These challenges are especially acute in field campaigns using aircraft to collect data in situ. Often these field campaigns require data to be collected under low-probability, imperfectly forecastable atmospheric conditions. Limited budgets dictate a finite number of available flight hours and a finite number of days in the field. The use of aircraft prevents instantaneous decision-making: resource deployment decisions must be made with some lead time. The constraints of atmospheric conditions and limited resources combine to challenge researchers seeking to maximize the amount of optimal data collected.

[4] Specifically, researchers in these field campaigns are faced with two challenges when evaluating atmospheric conditions on a particular day. First, researchers must use weather forecasting techniques to assess the expected conditions of the atmosphere for each day. Second, researchers must decide, based on factors such as the number of available flight hours and number of days remaining in the campaign, whether conditions are favorable enough to justify using one of the scarce flights from the budget. Researchers working on these field campaigns are well trained in evaluating atmospheric conditions, but typically not in evaluating the opportunity costs associated with flying or not flying.

[5] The use of a decision algorithm has shown potential as a tool to optimize resource allocation and distribution. Small et al. [2011] integrated probabilistic forecasting methods with optimization techniques adapted from operations research to create an automated weather decision system that made decision recommendations of whether to fly or not on any given day in the RACORO campaign [Vogelmann et al., 2012]. The system was then tested retrospectively, comparing the decisions of the automated system to those taken by investigators. Results from that work showed that the algorithmic decision procedure achieved a 66% improvement in skill over climatology compared with traditional heuristic decision techniques, leading to a 21% improvement in data yield from a fixed resource budget.

[6] The SPartICus campaign [Mace et al., 2009] offers a second opportunity to test the algorithmic approach. The SPartICus campaign sought measurements of cirrus clouds concurrently using aircraft and surface-based instruments. To obtain the surface-based measurements, researchers preferred cirrus clouds aloft with a cloud-free lower troposphere. Like the RACORO campaign, the SPartICus campaign required specific conditions that, when combined, were characterized by a low probability of occurrence. A decision algorithm analogous to the one used by Small et al. was used to make fly/no-fly recommendations for the SPartICus campaign. Model forecasts, translated according to a historical conditional probability analysis, were used to produce probabilistic forecasts of favorable conditions at the Atmospheric Radiation Measurement (ARM) Southern Great Plains (SGP) site. The minimum probability of favorable conditions required to fly—the “hurdle probability”—was calculated as a function of the number of days left in the campaign, the number of flights remaining in the budget, the climatological probability of favorable conditions, and flight crew limitations. The algorithm recommends flying if and only if the forecast probability of favorable conditions exceeds this hurdle probability. During January–June 2010, concurrently with the SPartICus campaign, this algorithm was used to generate daily fly/no-fly recommendations (the recommendations were generated but not used). This paper assesses the performance of the decision algorithm and compares it with the performance of the actual decisions used in the SPartICus campaign.

2 Methodology

2.1 Problem Definition

[7] The Small Particles in Cirrus (SPartICus) project required data collection flights into cirrus clouds over the ARM Southern Great Plains (SGP) site near Lamont, Oklahoma. To best meet project objectives, SPartICus investigators sought cirrus clouds without underlying low-level clouds. Under these ideal conditions, clouds could be observed remotely by surface-based instruments in addition to the in situ airplane sampling.

[8] To calibrate and deploy the algorithmic decision rule, the type of conditions suitable for data collection must be defined rigorously. For the purposes of our application, ideal cloud conditions were defined using an interpretation of the parameters specified in the SPartICus Science and Operations Plan [Mace et al., 2009].

[9] The following conditions at the SGP site were requisite for an “optimal hour”: maximum cloud fraction between 6 km and 13.3 km greater than 20% and maximum cloud fraction below 6 km less than 20%, according to the ARM Climate Modeling Best Estimate (CMBE) cloud fraction data [Xie et al., 2010]. An “optimal day” is defined as a day with at least 4 optimal hours in the 9 h range between 16Z and 01Z. The range between 16Z and 01Z corresponds to the range between 10 A.M. and 7 P.M. CST or between 11 A.M. and 8 P.M. CDT. Figure 1 shows one example each of the time-height section of cloud fraction for an “optimal” day (Figure 1a) and a “suboptimal” day (Figure 1b).
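The hour- and day-level criteria above can be expressed compactly in code. The sketch below is illustrative only: it assumes hourly maximum cloud fraction profiles on a fixed height grid, and the function and variable names are ours, not part of the CMBE product.

```python
import numpy as np

def optimal_hours(cf, z_km, threshold=0.20):
    """Flag 'optimal hours': cirrus aloft with a clear lower troposphere.

    cf:   (n_hours, n_levels) array of hourly maximum cloud fraction.
    z_km: (n_levels,) array of level heights in km.
    An hour is optimal when max cloud fraction between 6 and 13.3 km
    exceeds 20% and max cloud fraction below 6 km is under 20%."""
    upper = (z_km >= 6.0) & (z_km <= 13.3)
    lower = z_km < 6.0
    cirrus_aloft = cf[:, upper].max(axis=1) > threshold
    clear_below = cf[:, lower].max(axis=1) < threshold
    return cirrus_aloft & clear_below

def is_optimal_day(cf, z_km, min_hours=4):
    """cf holds the 9 hourly profiles from 16Z through 01Z (rows in order).
    A day is optimal with at least min_hours optimal hours in that window."""
    return bool(optimal_hours(cf, z_km).sum() >= min_hours)
```

For example, a day with cirrus over a clear lower troposphere for 5 of the 9 hours classifies as optimal, while a day where low clouds block the surface-based line of sight (as in Figure 1b) does not.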

Figure 1.

Cloud fraction data between 16Z and 00Z on 2 days during the SPartICus campaign. (a) 23 March 2010, an example of a day classified as “optimal,” based on the selection criteria. For the period from 17Z through 00Z, thick cirrus clouds were present over the SGP site with little to no low-level clouds. (b) 6 June 2010, an example of a day classified as “suboptimal.” There were 4 h with sufficient upper-level cirrus clouds: 17Z, 19Z, 20Z, and 22Z. However, of those 4 h, only 22Z had a clear enough lower troposphere to be classified as an optimal hour. At 17Z, 19Z, and 20Z, the low-level clouds were too thick (cloud fractions above 20%).

[10] Building on the success of the forecasting method used to predict boundary-layer clouds during the RACORO campaign [Stefik, 2010], relative humidity (RH) profiles were used as the sole indicator for cloud conditions. The ARM Merged Sounding Value Added Product RH data [Troyan, 2012], available from 1998 to 2007 over the SGP site, were employed as the source of historical data used to calibrate the probabilistic forecasting system.

2.2 Translating Weather Forecasts Into Probabilities of Success: The Self-Organizing Map (SOM) Method

[11] The self-organizing map (SOM) [Kohonen, 2001; Johnson et al., 2008] method was used to reduce the large, high-dimensional RH data set into a manageable, low-dimensional number of representative profiles. Using a neural network and iterative training, the SOM routine distilled the large set of 20Z RH data into 24 canonical RH profiles, oriented in a 4-by-6 grid with similar profiles in close proximity. The 4-by-6 SOM was chosen to emulate the decision algorithm implemented for the RACORO experiment, which used a 4-by-6 SOM. A sensitivity analysis [Stefik, 2010] suggested that algorithm performance is not sensitive to SOM size between 9 profiles and 64 profiles. For each day in the period from 1998 to 2007, the best-fit SOM profile was determined for the 20Z RH profile.
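As a rough illustration of this step, the following minimal trainer reduces a set of RH profiles to a 4-by-6 grid of canonical profiles. It is a generic Kohonen-style sketch with assumed learning-rate and neighborhood schedules, not the routine used in the study.

```python
import numpy as np

def train_som(data, rows=4, cols=6, n_iter=2000, seed=0):
    """Minimal SOM trainer. data: (n_samples, n_levels) RH profiles.
    Returns a (rows*cols, n_levels) array of canonical profiles arranged
    on a 2-D grid, with similar profiles in close grid proximity."""
    rng = np.random.default_rng(seed)
    n, _ = data.shape
    nodes = data[rng.choice(n, rows * cols, replace=False)].astype(float)
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)])
    for t in range(n_iter):
        frac = t / n_iter
        lr = 0.5 * (1 - frac)                            # decaying learning rate
        sigma = max(rows, cols) / 2 * (1 - frac) + 0.5   # shrinking neighborhood
        x = data[rng.integers(n)]                        # one random sample
        bmu = np.argmin(((nodes - x) ** 2).sum(axis=1))  # best-matching unit
        d2 = ((grid - grid[bmu]) ** 2).sum(axis=1)
        h = np.exp(-d2 / (2 * sigma ** 2))               # Gaussian neighborhood
        nodes += lr * h[:, None] * (x - nodes)
    return nodes

def classify(profile, nodes):
    """Index of the best-fit canonical SOM profile for one RH profile."""
    return int(np.argmin(((nodes - profile) ** 2).sum(axis=1)))
```

The `classify` step is what assigns each day's 20Z RH profile to one of the 24 patterns.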

[12] To determine the probability of an “optimal” day conditional on the SOM pattern present at 20Z, the RH data were compared to actual cloud conditions for each day. The favorability of cloud conditions for each day was determined from ARM Climate Modeling Best Estimate (CMBE) cloud fraction data [Xie et al., 2010]. By classifying the 20Z RH profile for each day into one of the SOM categories and determining whether each day is an “optimal day,” the probability of favorable cloud conditions was determined for each of the 24 SOM profiles. Conditional probabilities for “optimal” conditions varied from 83% to 0% among the 24 SOM profiles, indicating that the system successfully discriminates among RH profiles and that these profiles are effective indicators of cloud conditions. A chi-squared test against a Monte Carlo simulation of a theoretically uniform set of SOM-pattern conditional probabilities rejected uniformity at extremely low p-values. This non-uniformity indicates that some profiles carry high probability and some low probability, allowing for the generation of sharp probabilistic forecasts.
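The per-pattern conditional probabilities amount to a tally over the paired historical records of best-fit SOM index and day classification. A generic sketch (names are ours), assuming integer pattern indices and boolean day labels:

```python
import numpy as np

def conditional_probs(som_index, optimal, n_patterns=24):
    """Estimate P(optimal day | SOM pattern j) from the historical record.

    som_index: array of per-day best-fit SOM pattern indices (0..n_patterns-1).
    optimal:   boolean array, True where the day met the 'optimal day' criteria.
    Patterns that never occur in the record are assigned probability 0."""
    counts = np.bincount(som_index, minlength=n_patterns)
    hits = np.bincount(som_index, weights=optimal.astype(float),
                       minlength=n_patterns)
    return np.where(counts > 0, hits / np.maximum(counts, 1), 0.0)
```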

[13] Finally, the conditional probability of “optimal” conditions given a certain GFS forecast RH profile was calculated. For each day from 2001 to 2007, the 33 h ahead RH forecast [Yang et al., 2006] from the 12Z GFS was classified as one of the 24 characteristic SOM patterns. The forecast RH profiles were then compared to the observed profiles to determine the joint distribution of 33 h ahead SOM patterns predicted and actual SOM patterns realized. In general, P(SOMr=j|SOMp=i) was calculated for i, j = 1, …, 24, where SOMr is the realized SOM pattern and SOMp is the predicted SOM pattern. From the joint distribution of SOM patterns predicted and SOM patterns realized, the conditional probability of favorable cloud conditions given a specific GFS prediction was calculated as a weighted average using the equation

P(optimal | SOMp = i) = Σ_{j=1}^{24} P(optimal | SOMr = j) · P(SOMr = j | SOMp = i)    (1)

This calibration of the GFS forecasts facilitates a characterization of uncertainty and accounts for the model's limited ability to accurately forecast relative humidity. Because of limitations in the GFS RH forecasting system, the probability of realization of a particular SOM pattern conditional on the simulation of that SOM pattern can be quite small. For example, given the forecast of SOM #1 by the 33 h ahead GFS forecast, SOM #1 is realized with only 30% probability. If the GFS were a crystal ball, this value would be 100%. If SOM #1 is rarely simulated by the GFS but occurs frequently, the weighted average will account for this. Perhaps, for example, SOM #1 occurs a significant amount of the time when the GFS simulates SOM #2 or SOM #5, which are similar to SOM #1. Equation (1) accounts for these inaccuracies in the GFS and the time-delay associated with using a 21Z simulation for 20Z conditions. Figure 2 shows the 24 canonical SOM patterns and their associated probability of optimal conditions.
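Numerically, the weighted average in equation (1) is just a matrix-vector product. The sketch below (our notation, reduced to 3 patterns for brevity) shows how a noisy forecast-to-realization matrix blunts the sharpness of the raw conditional probabilities, as described above.

```python
import numpy as np

def calibrate(p_opt_given_realized, p_realized_given_predicted):
    """Equation (1): P(optimal | SOMp = i) as a weighted average.

    p_opt_given_realized:       vector of P(optimal | SOMr = j).
    p_realized_given_predicted: matrix whose row i is P(SOMr = j | SOMp = i);
                                each row sums to 1.
    Returns the vector of P(optimal | SOMp = i)."""
    return p_realized_given_predicted @ p_opt_given_realized

# A perfect ("crystal ball") forecast leaves the probabilities unchanged:
p = np.array([0.8, 0.1, 0.0])
perfect = calibrate(p, np.eye(3))

# A noisy forecast pulls them toward the middle, reducing sharpness:
T = np.array([[0.6, 0.3, 0.1],
              [0.2, 0.6, 0.2],
              [0.1, 0.3, 0.6]])
noisy = calibrate(p, T)
```

Here `noisy` is less sharp than `p`: its largest probability shrinks and its smallest grows, mirroring the loss of sharpness seen after conditioning on the GFS forecast (Figure 2).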

Figure 2.

The 24 canonical SOM patterns. The first number above the pattern indicates the probability of a day being categorized as optimal, given a 20Z RH profile classified as that SOM pattern. The second number above the pattern indicates the probability of a day being classified as optimal, given a 21Z simulation of that pattern from the previous day's 12Z GFS. The decrease in the sharpness of the probability distribution after conditioning on the GFS forecast is indicative of the model's difficulty in accurately predicting the RH profile.

[14] Henceforth, the term “forecast” is used to refer to the postcalibration forecast used by the decision system, and the term “simulation” is used to refer to the RH forecast from the GFS.

2.3 Dynamic Programming: Introducing Decision Theory

[15] Dynamic programming is a technique that solves complex problems by breaking them into simpler, solvable sub-problems [Dasgupta et al., 2006]. Dynamic programming is a statistically robust method that can solve complex stochastic optimization problems. The technique has been used across diverse sets of fields, from the management of pension funds [Haberman and Sung, 1994] to determining whether or not to punt on fourth down in American football [Romer, 2006]. Previous applications of dynamic programming to weather uncertainty include problems in agricultural planning [Katz et al., 1987; Easterling and Mjelde, 1987; Wilks et al., 1993; Wilks and Wolfe, 1998], hurricane evacuation [Regnier and Harr, 2006], and reservoir planning [Raman and Chandramouli, 1996]. Using dynamic programming, the implications of any flight decision can be broken into two parts: the implications for today and the implications for the rest of the experiment.

[16] On any specific day in the experiment, the optimal decision can be described as a cost-benefit comparison: given the forecast, does the benefit of flying today and using one flight outweigh the expected cost as measured in terms of opportunity lost for future successes? As in the retrospective analysis of the RACORO campaign [Small et al., 2011], dynamic programming techniques were used to calculate the hurdle probability for each day, defined as the minimum probability of suitable conditions needed to justify the expenditure of one flight from the limited resource budget.

[17] Calculating the hurdle probability for each day requires calculating the “value” of each possible state of the experiment. Let V(d,f) be defined as the expected number of successful flights to be launched with a combination of d remaining days and f remaining flights, conditioned on following an optimal future path. Given an uncertain forecast, the climatological probability of occurrence of each SOM pattern and the climatological probability of “optimal” conditions given the occurrence of each SOM pattern, the expected number of successful flights can be determined for any combination of remaining days and flights using dynamic programming.

[18] In general, according to Bellman's principle of optimality [Bellman, 1957],

V(d, f) = E_sd[ max_{ad ∈ {0,1}} ( ad · P(xd = 1 | sd) + V(d-1, f-ad) ) ]    (2)

where sd is the forecast signal on day d, ad is a binary variable that takes the value of 1 if a flight is made on day d and the value of 0 if a flight is not made, and xd is a binary variable that takes the value of 1 if conditions are optimal on day d and the value of 0 if conditions are not optimal.

[19] Using a flight on a given day is only worthwhile if the probability of success on that day exceeds the decrease in expected future success associated with having one fewer flight in the budget. Thus, the difference in expected future success as a result of having one fewer flight is the hurdle probability, HP; the HP must be exceeded for a flight to occur. The hurdle probability at any experiment state (d,f) is calculated as

HP(d, f) = V(d-1, f) - V(d-1, f-1)    (3)

and the probability of a flight being made on any day given the forecast signal sd is

P(flight on day d | sd) = 1 if Pd > HPd, and 0 otherwise    (4)

where Pd is the probability of optimal conditions on day d, and HPd is the hurdle probability on day d.

[20] Two boundary conditions can be used to recursively calculate V for all combinations of d and f:

V(d, 0) = 0 for all d    (5)

Boundary condition (5) holds because if there are no flights left, there can be no more successful flights.

V(d, d) = d · Pclim    (6)

where Pclim is the climatological probability of optimal conditions.

Boundary condition (6) holds because if the number of flights equals the number of days, researchers will fly every day regardless of the forecast. In this case, the probability of success on each day is equal to the climatological probability of optimal conditions. Using these boundary conditions, backwards propagation can be used to calculate V for all possible combinations of d and f under the condition that optimal decisions were made [Stefik, 2010]. This backwards propagation amounts to solving the large decision tree in reverse time.
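The recursion in equations (2)–(6) can be sketched as a small backward-induction routine. The version below is an illustrative reimplementation under our own naming, uses a discrete forecast-signal distribution, and omits the flight-crew constraint discussed next.

```python
import numpy as np

def backward_induction(n_days, n_flights, p_signal, p_opt_given_signal):
    """V(d, f): expected number of successful flights with d days and f
    flights remaining, under optimal decisions (equations (2), (5), (6)).

    p_signal:           climatological probability of each forecast signal.
    p_opt_given_signal: P(optimal day | signal), the calibrated forecast.
    Flight-crew rest rules are ignored in this sketch."""
    p_clim = float(np.dot(p_signal, p_opt_given_signal))
    V = np.zeros((n_days + 1, n_flights + 1))  # V[d, 0] = 0: boundary (5)
    for d in range(1, n_days + 1):
        for f in range(1, n_flights + 1):
            if f >= d:
                V[d, f] = d * p_clim           # boundary (6): fly every day
            else:
                # Equation (2): per signal, take the better of flying
                # (earn P(opt | s), spend a flight) and holding the flight.
                fly = p_opt_given_signal + V[d - 1, f - 1]
                hold = V[d - 1, f]
                V[d, f] = float(np.dot(p_signal, np.maximum(fly, hold)))
    return V

def hurdle(V, d, f):
    """Equation (3): the drop in future value from spending one flight."""
    return V[d - 1, f] - V[d - 1, f - 1]
```

With this table, the day-d decision rule of equation (4) is simply to fly when the calibrated forecast probability exceeds `hurdle(V, d, f)`.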

[21] The calculation of the value of each day is complicated by Department of Energy (DOE) flight crew regulations. A restriction was implemented to the decision algorithm, preventing flight crews from flying on more than 5 days in any 7 day period (this restriction is more strict than the prevailing DOE regulation, which required a continuous 24 h period off during any 7 day period). This restriction affects the expected value calculations: if there have been several flights in the last 7 days, the expected value going forward is slightly lower than it would be if there had been no flights in the last 7 days because some future options are no longer available. This restriction serves to increase the hurdle probability if several flights have been made recently. In the event that the restriction precludes a flight on a given day, the hurdle probability will be infinite.

2.4 The SPartICus Campaign

[22] During the SPartICus experiment, the research team made 43 research flights, not including flights devoted to tests of aircraft or instruments. The SPartICus Science and Operations Plan indicated that 60% of their available flight hours would be used on flights to the SGP site. Of the 43 flights, 26 were over the SGP site (60%), with the other 17 flights occurring over different parts of the West, often located to coincide with the passage of the CALIPSO satellite. When feasible, flights over the SGP site were timed to coincide with satellite overpasses. Such consideration may have affected the success rate of the forecast team. Because the collection of data over the SGP site was the primary objective of the campaign, the algorithm and its evaluation only considered the flight hours used over the SGP site.

[23] The SPartICus team's flight decisions did not include recommendations from the algorithm (Mace, personal communication, 2010). This independence allows us to retrospectively generate daily recommendations based on the number of flights that would have remained had the algorithm's recommendations been followed throughout the experiment.

[24] After a period of test flights, the SPartICus field campaign was set to run from 9 January through 28 April (110 days). Having not used all of their flight hours, the SPartICus team received permission to extend the experiment to include a second period from 2 June through 30 June after a period of aircraft maintenance from 29 April through 1 June.

3 Results and Discussion

3.1 Results: 110 Day Experiment

[25] Each day in the field campaign was categorized post-experiment as an “optimal day” or a “suboptimal day” using the same criteria used in the development of the algorithm. As of the original end date of 28 April, the recommendations from the algorithm would have consumed all 26 flights, yielding 10 flights on optimal days. The SPartICus forecast team had used 20 flights, yielding 8 optimal days.

[26] Figure 3a shows the decisions made and results achieved by the algorithm as a function of the SOM profile present in the atmosphere at 20Z on each day during the 110 day experiment. Figure 3b shows the same information but for the decisions made by the SPartICus team. During the 110 day experiment, 22 days had RH profiles characterized in the top quartile of rank-ordered SOM patterns. In expectation, 11.7 of these days would be optimal days; in realization, 12 of these days were optimal days. The algorithm used 16 of its 26 flights on these days, collecting 9 optimal days. By the experiment's original end date, the SPartICus team had only used 7 of their 20 flights on these days, collecting 6 optimal days. In this context, 9 successes from 16 flights is better than 6 successes from 7: despite the forecast team's high percentage of successes on these top-quartile days, flying on so few of them meant that the balance of their flights had to be used on days with a lower probability of optimal conditions.

Figure 3.

(a) The decisions made by the algorithm during the 110 day experiment. The 20Z RH profile from each day during the campaign is used to classify each day in one of the 24 SOM patterns. The 24 patterns are then rank-ordered by probability of optimal conditions before conditioning on the GFS simulation (the first number above each SOM pattern in Figure 2); this probability is shown by the black line. The bars show the number of times each SOM pattern occurred, with colors to denote hits (optimal day, decision to fly); misses (suboptimal day, decision to fly); no-fly, optimal days (optimal day, decision to not fly); and no-fly, suboptimal days (suboptimal day, decision to not fly). Relative to the SPartICus team, the decision algorithm shifts flights to days with a higher probability of success. (b) The same information for the decisions made by the SPartICus team during the 110 day experiment. (c) The same information for the decisions made by the SPartICus team during the 139 day experiment (including the 29 day extension).

[27] Each of the 13 worst of the 24 SOM patterns had probabilities of optimal conditions below 10%. There were 73 days in this category in the 110 day experiment. Of these days, 2.1 optimal days were expected; in realization, 3 of these days were optimal. Because model simulations are imperfect, the use of some flights on low-probability days was difficult to avoid. Minimizing the number of these low-probability flights is crucial, however, because any flight used on a low-probability day is a flight that cannot be used on a high-probability day. The algorithm used 5 flights on days in this category, collecting 0 optimal days. The SPartICus team, however, used 11 of its 20 flights on days in this category, collecting 1 optimal day. A consequence of not flying enough on medium-to-high-probability days is that those flights are instead used on low-probability days.

3.2 Results: 139 Day Experiment

[28] Figure 3c shows the same information as Figure 3b but includes the 29 day experiment extension, for which the team had six flights remaining. During the 29 day extension, 7 days had RH profiles characterized in the top quartile of rank-ordered SOM patterns. Of these days, 3.2 optimal days were expected; 2 optimal days were realized. The SPartICus team used none of their remaining six flights on these days.

[29] During the 29 day extension, 16 days had RH profiles characterized as one of the 13 worst SOM patterns with probabilities of optimal conditions below 10%. Of these days, 0.7 optimal days were expected. In realization, one of these days was optimal. The SPartICus team used three flights on days in this category during the extension, collecting 0 optimal days. In total, the SPartICus team used 14 flights (more than half its budget) on days in this category. In particular, the SPartICus team flew on many low-probability days near the end of the experiment, when a surplus of flight hours remained in the budget. Of the SPartICus team's last 11 flights, seven were used on days where the actual RH profile implied a probability of optimal conditions of less than 7%. The expected number of optimal flights from these seven flights was 0.3; in realization, these seven flights yielded 0 optimal days.

3.3 Discussion of Results

[30] The decisions made by the SPartICus team suggested that they were reluctant to use their flights early in the experiment. Figure 4 shows the hurdle probability faced by the algorithm and the SPartICus team throughout the experiment. The hurdle probability for the forecast team is a function of the flight decisions actually made in the field, while the hurdle probability for the algorithm is a function of the flight decisions the algorithm would have made, had it been used throughout the experiment. The hurdle probability faced by the SPartICus team decreases throughout the experiment, indicating that they were not using enough of their flights. Such a pattern is common in field experiments, with decision makers placing the bar for a “go” decision too high early in the experiment, only to lower it drastically towards the end. This reluctance to fly early in the experiment had consequences, most notably the extension of the field campaign and an apparent relaxing of the criteria for an “optimal day.”

Figure 4.

The hurdle probability (equation (3)) faced by the algorithm and the SPartICus team during the field campaign. For the SPartICus team, the hurdle probability shown assumes the experiment will end on 28 April and is calculated according to the flight decisions made by the SPartICus team. The two breaks in the green line representing the algorithm's hurdle probability correspond to days where the algorithm would not have been allowed to use a flight without breaking Department of Energy flight crew restrictions. The green line ends before the end of the experiment because the algorithm would have used its last flight on 19 April.

[31] The SPartICus team's apparent reluctance to fly early in the experiment left the team with six flights at the experiment's original end date, leading the team to extend the experiment. According to an ARM press release, the SPartICus team “decided to extend the campaign for at least two more months based on the amount of remaining flight hours. This reserve was due to suboptimal conditions throughout February, when low clouds and winter storm systems persisted over the Southern Great Plains (SGP) area, preventing the desired cirrus measurements.”

[32] Under an efficient decision system, periods of anomalously good or bad weather are implicit in the future hurdle probability. During a period of bad weather with no flights expended, the hurdle probability will continually fall, progressively directing the science team to use flight hours more liberally. Likewise, during an extended period of exceptionally good conditions, many flights should be used, resulting in an increase to the hurdle probability to entice the science team to be more conservative with their flight hours. The optimizing decision algorithm automatically adjusts the hurdle probability to try to prevent a situation in which many flight hours are left at the end of an experiment.

[33] The construction of the SPartICus experiment was such that the principal investigators and flight crew were based at their home institutions. For these scientists, an extra month “in the field” was not physically in the field. However, during the extension period, SPartICus scientists had to devote time to forecasting and decision-making. Even for an experiment like SPartICus where decisions are made remotely, the human-effort costs of extending the campaign are significant.

3.4 Sensitivity Analyses

3.4.1 “Acceptable” Days

[34] A consequence of the SPartICus team's conservative use of flight hours early in the experiment is that with many flight hours and few days remaining, the decision team seemed to relax the criteria defining an “optimal day.” In particular, the SPartICus team seemed to be more likely towards the end of the experiment to fly on any day when cirrus clouds were present, regardless of whether the lower troposphere was clear. We defined an “acceptable” day as a day with sufficient cirrus clouds but with low-level clouds that prevented a clear line of sight from some of the surface-based instruments to the cirrus clouds. We assumed that a successful research flight on an “acceptable” day partially addresses experiment science objectives.

[35] SOM #4 (Figure 2) offers an example of an RH profile characteristic of an acceptable day: moisture throughout the column. When SOM #4 is simulated for 21Z by the previous day's 12Z GFS, optimal days are expected only 6% of the time, but acceptable days are expected 77% of the time (not shown). During the experiment, including the June extension, SOM #4 was simulated 18 times. From the beginning of the experiment through 15 April, the SPartICus team flew on only 1 of the 12 days on which SOM #4 was simulated. From 16 April through the end of the experiment, the SPartICus team flew on 5 of the 6 days on which SOM #4 was simulated. The shift in decision-making on these acceptable days suggests a shift in criteria by the SPartICus team. The SPartICus team, having many flights left in their budget and few days remaining, began flying on days that only partially met the optimal conditions specified in the science and operations plan.

[36] This relaxing of criteria changes the statistics of the experiment. The decision-making pattern of the SPartICus team suggested that flights on acceptable days had some value, but an objective definition of this value was not specified a priori in the science and operations plan.

[37] With an objective definition of the value of days with cirrus but without a clear lower troposphere, the decision algorithm can be modified to account for these days. Retrospectively, we let β be a parameter equal to the value of an acceptable day, represented as a percentage of the value of an optimal day. β was then varied from 0 to 1 in increments of 0.1. β = 0 represents the original case where acceptable days are regarded as worthless, while β = 1 represents the case that completely ignores the low-level cloud criterion. The algorithm was run for each of the 11 values of β ∈ {0.0, 0.1, …, 1.0}. In each case, the algorithm was set to maximize total value, where total value is defined as (1)·(# of flights on optimal days) + (β)·(# of flights on acceptable days). Regardless of the value of β set before the experiment, the value obtained by the automated algorithm is greater than or equal to the value obtained by the SPartICus team (Figure 5).
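With a value β assigned to acceptable days, the recursion in equation (2) generalizes directly: the per-flight reward under a forecast signal becomes P(optimal | s) + β·P(acceptable | s). A minimal sketch of this generalization, under our own naming and not the campaign code:

```python
import numpy as np

def value_with_beta(n_days, n_flights, p_signal, p_opt, p_acc, beta):
    """Expected total value with d days and f flights remaining, where a
    flight on an optimal day is worth 1 and a flight on an acceptable day
    is worth beta. beta = 0 recovers the original algorithm; beta = 1
    ignores the low-level cloud criterion entirely."""
    reward = p_opt + beta * p_acc              # expected value of one flight
    r_clim = float(np.dot(p_signal, reward))
    V = np.zeros((n_days + 1, n_flights + 1))
    for d in range(1, n_days + 1):
        for f in range(1, n_flights + 1):
            if f >= d:
                V[d, f] = d * r_clim           # fly every remaining day
            else:
                fly = reward + V[d - 1, f - 1]
                hold = V[d - 1, f]
                V[d, f] = float(np.dot(p_signal, np.maximum(fly, hold)))
    return V
```

Sweeping β over {0.0, 0.1, …, 1.0} and replaying the season's forecasts against the resulting hurdle probabilities is how a comparison like Figure 5 can be constructed.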

Figure 5.

The total value achieved by the algorithm (green line) and the forecast team (red line) as a function of β, where β is a parameter that measures the value of a flight on an acceptable day relative to the value of a flight on an optimal day. The forecast team launched eight flights on optimal days and eight flights on acceptable days, while the recommendations of the algorithm vary depending on the value of β. For every value of β, the value achieved by the algorithm is greater than or equal to the value achieved by the forecast team.

3.4.2 SOM Size

[38] In order to test the sensitivity of the algorithm to the size of the SOM, the original algorithm was run with a 12-member SOM rather than the original 24-member SOM. Decreasing the number of SOM members increases statistical robustness by increasing the number of profiles per cluster, but also increases the mean quantization error: fewer SOM members means the best-fit SOM pattern for a particular RH profile is, on average, less characteristic of that profile. The optimal SOM size balances these two factors. Stefik [2010] showed that the algorithm used in the RACORO campaign was minimally sensitive to the number of SOM members. The adapted algorithm, run with a 12-member SOM, would have recommended flights on the same number of optimal days (10) as the original algorithm run with a 24-member SOM. Seven of the 26 flights would have been on different days, resulting in the loss of one optimal day and the gain of another. The minimal difference in performance suggests that the days for which the recommendations differed between the algorithms were borderline days.

3.4.3 Effects of Interseasonal Variability

[39] One possible vulnerability of the model is an inability to account for differences in seasonal probability related to ENSO or other long-term atmospheric patterns. Interseasonal variability is not expected to affect the probability of optimal conditions given a particular SOM profile. Interseasonal variability, however, could alter the hurdle probability by altering future expectations: if optimal days are expected to be sparse, the hurdle probability should be lower, pushing experimenters to be less picky when expending resources.

[40] The data sets used did not offer enough data to properly evaluate the effect of ENSO on cloud climatology. The ENSO state in spring 2010 was a moderate El Niño. In the CMBE period of record from 1998 through 2009, only 2003 and 2005 qualified as moderate El Niño springs according to MEI (Multivariate ENSO Index) data [Wolter and Timlin, 1993]. More observations are needed to properly account for the effect of ENSO in this algorithm; however, the algorithm was able to adjust to the adverse conditions during the early part of the experiment (1 optimal day in the 31-day period from 27 January through 26 February) and match the expectation by the end of the experiment. Such a recovery from an adverse period is possible in a long experiment but may not be possible in a short-duration experiment, which can be more severely affected by an extended period of poor conditions.

3.4.4 Seasonal Effects

[41] Another sensitivity test was performed by allowing the hurdle probabilities to account for variations in seasonal climatology. Figure 6 shows the monthly variation in the climatological probability of optimal conditions: optimal days are approximately twice as common during the winter months as during the summer months. An algorithm that accounts for these monthly differences in climatology will encourage the use of more flights at the beginning of the field campaign, expecting that fewer optimal days will be available at the end of the field campaign. Upon retrospectively testing this adapted algorithm, we found that it performed slightly worse than the original algorithm, recommending flights on 9 optimal days compared to the 10 optimal days captured by the original algorithm. While climatological norms suggested conditions would be better in January and February, the actual conditions during spring 2010 contradicted this expectation. During the SPartICus campaign, the algorithm that did not adjust for seasonal differences performed better than the algorithm that did because the realized conditions followed the overall climatology (the black line in Figure 6) more closely than the month-by-month climatology (the blue line in Figure 6).

Figure 6.

The climatological probability of “optimal” conditions at the SGP site for each month between January and June (blue line), the realized percentage of days with “optimal” conditions during each month (red line), and the overall climatological probability of “optimal” conditions (black line).

3.5 Other Considerations

[42] The SPartICus campaign was originally slated to start during fall 2009. The start date was pushed to January 2010 to complete aircraft preparations. This shortening of the experiment affected the initial hurdle probability and the expected final number of successes but did not affect the mechanics of the algorithm.

[43] The GFS was chosen in part to demonstrate that this forecasting-and-decision method succeeds even with a low-resolution model. A higher-resolution model could yield a SOM with better differentiation between the best and worst profiles, but much of the dependence on model quality is removed by the conditional probability calibration procedure.

[44] One possible source of error for the algorithm arises from the need to quantitatively define conditions suitable for data collection. Discrepancies between the definition used by the algorithm and the desires of investigators could produce sub-optimal decision recommendations. This problem could be addressed via a quantitative definition of optimal flight conditions during the proposal stage before the field campaign.

[45] Field campaign experience has indicated that in some cases, instrument maintenance needs can preclude flights on days where conditions appear to be favorable. For example, investigators may be unable or unwilling to fly 3 days in a row without a nonflight day designated for maintenance. If specified prior to the field campaign, restrictions like this could in principle be incorporated into a decision algorithm.

[46] Boundary condition (6) states that if the number of flights and the number of days are equal, investigators will fly every day: the hurdle probability will be zero. However, if the probability of flight success is judged to be very low, investigators might be hesitant to fly, so as not to jeopardize credibility with flight crew and sponsors. In practice, therefore, investigators might rationally elect to not use all of their flights at the end of an experiment. The use of this algorithm, however, would encourage the use of flight hours earlier in the experiment, avoiding a scenario with many flights and few days.
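A minimal sketch of the hurdle-probability recursion, including this boundary condition, is shown below. The discretized forecast-probability distribution (`P`, `W`) is hypothetical; the actual algorithm conditions these probabilities on SOM membership, and the recursion here is a generic optimal-stopping formulation rather than the paper's exact equations.

```python
from functools import lru_cache

# Hypothetical discretized climatology of day-ahead forecast probabilities of
# "optimal" conditions (values and weights are illustrative, not SPartICus data).
P = [0.02, 0.10, 0.30, 0.60]   # possible forecast probabilities
W = [0.55, 0.25, 0.15, 0.05]   # climatological frequency of each
E_P = sum(p * w for p, w in zip(P, W))

@lru_cache(maxsize=None)
def value(flights, days):
    """Expected number of flights on optimal days under the optimal policy."""
    if flights == 0 or days == 0:
        return 0.0
    if flights >= days:        # boundary condition: fly every remaining day
        return days * E_P
    # Fly today iff p + value(flights-1, days-1) beats holding the flight.
    return sum(w * max(p + value(flights - 1, days - 1),   # fly
                       value(flights, days - 1))           # hold
               for p, w in zip(P, W))

def hurdle(flights, days):
    """Forecast probability above which flying today is optimal."""
    return value(flights, days - 1) - value(flights - 1, days - 1)

print(round(value(26, 110), 2))  # expected optimal flights, 26-flight/110-day budget
print(hurdle(26, 26))            # flights == days remaining -> hurdle is 0.0
```

The hurdle is exactly the opportunity cost of a flight: the value of holding it minus the value after spending it, which is why it collapses to zero once the number of flights equals the number of days remaining.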

[47] This algorithm assumes that the 33 h forecast is the only forecast available. Under this assumption, no forecast information is available at longer lead times, and any forecast information available at shorter lead times is too late to impact decision-making. This assumption could prove problematic in a late-experiment situation where long-range forecast guidance suggests that future chances for suitable conditions are likely to be well above or below the climatological probability. Theoretically, the algorithm could be modified to include longer-range forecasts.

4 Conclusions

[48] The decision model used for the SPartICus campaign produced results consistent with those produced using an analogous model for the RACORO campaign. Both experiments were long-duration experiments seeking low-probability atmospheric conditions. The automated algorithm would have used the 26 flights allocated for measurements over the SGP site by the original April end date, finding conditions optimal for data collection on 10 days. Even after extending the experiment for an extra month to employ unused flight hours, the SPartICus forecast team was only able to collect 9 optimal days with their 26 flights. Alternate algorithms that assign value to “acceptable” days also outperform the forecast team.

[49] Traditional methods of decision-making for field campaigns are often inefficient: scientists are trained to forecast atmospheric conditions but not to evaluate the opportunity costs of deciding whether or not to expend scarce resources. In a contest of pure forecasting skill, the SPartICus forecast team might well outperform the GFS forecasting model, given their access to additional models and human expertise. Yet despite employing a relatively simple forecast system (together with its calibration procedure), the automated algorithm outperformed the SPartICus team in data collection, underscoring the importance of optimal decision-making. The widespread implementation of decision algorithms like the one used for the SPartICus campaign can be expected to increase the amount of data collected with available flight resources while reducing the human effort spent on forecasting and resource deployment.

[50] In addition to being a valuable operational tool during field campaigns, the algorithm can be used during the planning phase of an experiment to guide decision makers with questions of resource allocation. Given 26 flights, 110 days, and the climatology of cloud conditions over central Oklahoma, the algorithm generated an expected value of 9.86 optimal flights. The algorithm captured 10 optimal flights. With their budget of flight hours and number of days, the SPartICus team, using an optimizing decision system, could have reasonably expected to get approximately 10 days' worth of optimal data. To expect more than 10 optimal days, the team would have to extend the experiment, relax the criteria, or improve the forecasting system.

[51] Researchers and funding agencies could be tasked with determining the amount of “optimal data” needed to answer an experiment's science questions. An algorithm like this one can output the combinations of days and flights needed to obtain the desired amount of optimal data in expectation. Rather than subjectively estimating the number of flights and days needed to achieve scientific objectives, experiment planners could define the amount of data needed and calculate the resources required to obtain those data at a given confidence level. By quantitatively allocating and using resources, a decision algorithm like the one used for the SPartICus campaign can increase scientific value while decreasing both monetary cost and human-effort cost.

Acknowledgments

[52] Data were obtained from the Atmospheric Radiation Measurement (ARM) Program sponsored by the U.S. Department of Energy, Office of Science, Office of Biological and Environmental Research, Climate and Environmental Sciences Division. We would also like to thank Fanglin Yang from NOAA who generated the GFS profiles, Shaocheng Xie and David Trojan who expedited the generation of the CMBE and Merged Sounding data, and the SPartICus investigators (Jay Mace, P.I.) for their openness to discuss their project. Stefik and Verlinde acknowledge their support through the Office of Science (BER), U.S. Department of Energy Grant DE-FG02-05ER64058. Small gratefully acknowledges support from the Human and Social Dynamics and Decision Making under Uncertainty Programs of the U.S. National Science Foundation under Grant NSF SES-0729413 and the Atmospheric and Geospace Science Grants NSF AGS-1063692 and 1063733. The authors are grateful for helpful input from three anonymous reviewers.