The Lagrangian separation distance between the endpoints of simulated and observed drifter trajectories is often used to assess the performance of numerical particle trajectory models. However, the separation distance fails to indicate relative model performance in weak and strong current regions, such as a continental shelf and its adjacent deep ocean. A new skill score is proposed based on the cumulative Lagrangian separation distances normalized by the associated cumulative trajectory lengths. This skill score is used to evaluate surface trajectories implied by Global HYCOM hindcast surface currents as gauged against actual satellite-tracked drifter trajectories in the eastern Gulf of Mexico during the 2010 Deepwater Horizon oil spill. It is found that the new skill score correctly indicates the relative performance of the Global HYCOM in modeling the strong currents of the Gulf of Mexico Loop Current and the Gulf Stream and the weaker currents of the West Florida Shelf. In contrast, the Lagrangian separation distance alone gives a misleading result. The proposed dimensionless skill score is particularly useful when the number of drifter trajectories is limited and neither a conventional Eulerian-based velocity nor a Lagrangian-based probability density function may be estimated.
 In rapid response to the Deepwater Horizon oil spill in the northeastern Gulf of Mexico, a system for tracking the oil [Liu et al., 2011a, 2011b] was immediately implemented by marshaling numerical modeling, in situ observing and satellite resources available from existing University of South Florida (USF) coastal ocean observing system activities [e.g., Weisberg et al., 2009]. A limited number of surface drifters were also deployed in summer 2010 for tracking the Gulf of Mexico Loop Current, its eddies and the currents on the West Florida Shelf.
 Such Lagrangian velocity statistics, computed over a large ensemble of particles, also require a large number of drifter trajectories. Recently, Ohlmann and Mitarai [2010] proposed a purely Lagrangian validation of coastal dispersal simulations based on Lagrangian probability distribution functions (PDFs) [Pope, 1994; Mitarai et al., 2009]. The agreement between the Lagrangian PDFs for actual and simulated drifters is measured using the Kolmogorov-Smirnov (K-S) test [Massey, 1951; Bracco et al., 2000; LaCasce, 2005], which employs the maximum difference between the cumulative distribution functions. By not requiring the binning of drifter data, this statistical approach may be used with smaller sample sizes [Ohlmann and Mitarai, 2010]. Nevertheless, K-S test statistical inference still requires more than 10 independent drifter observations [Peacock, 1983]. Hence there are no shortcuts to sufficient drifter coverage in either space or time.
 Model simulated drifter trajectories may be directly compared with corresponding independent drifter observations [e.g., Vastano and Barron, 1994; Thompson et al., 2003; Barron et al., 2007]. Virtual drifters are seeded at the locations where satellite-tracked drifters are observed, and the separation distances between the endpoints of these simulated and observed drifters are then computed as a function of time. The separation distance is a direct measure of trajectory model skill: the smaller the separation distance, the better the model skill, and conversely. Such model assessments made over relatively short time scales, e.g., tidal to synoptic weather, are useful for assessing applications to oil spill trajectories [e.g., Price et al., 2006; Abascal et al., 2009], search and rescue [e.g., Smith et al., 1998; Jordi et al., 2006], and river plume spreading [e.g., McCabe et al., 2009]. By not requiring a large number of drifter observations (as needed for statistical inference), such applications remain useful even when Lagrangian observations are limited.
 Lagrangian trajectory evolution is a subject of many investigations [e.g., Özgökmen et al., 2000, 2001; Chu et al., 2004], one finding being that the prediction error tends to grow with time at a rate proportional to the square root of the velocity variance. Piterbarg , in a study on short-term Lagrangian trajectory prediction, shows that the prediction error is most sensitive to the ratio of the velocity correlation radius and the initial cluster radius. Özgökmen et al.  also argue that model performance evaluation should consider dynamically different flow regimes separately, such as interior gyres, western boundary currents and regions of mid latitude zonal jets. Thus, a priori knowledge of the ocean circulation is required. Moreover, the actual ocean circulation at any given time may be quite different from that inferred from climatological mean patterns.
 Here we propose a new method (based on Lagrangian separation distance) to evaluate surface trajectory models with a limited number of drifter observations that are spread over both deep ocean and continental shelf regions where the currents may be faster and slower, respectively. Prior knowledge of the ocean circulation is not required. Such an assessment is necessary in a rapid response mode (as was the case for the Deepwater Horizon oil spill in spring/summer 2010), when there may not be much data available within a short period of time and the analyst may lack familiarity with the region.
 Thus our paper introduces a new skill score for evaluating trajectory model performance. The drifter observations and model simulations used over the course of the Deepwater Horizon oil spill are described in Section 2. Section 3 presents the performance evaluation technique, and application is made in Section 4. The newly proposed model skill score for trajectory assessment is presented in Section 5, followed by a summary and discussion in Section 6.
 Beginning in May 2010, and in response to the Deepwater Horizon oil spill, the Ocean Circulation Group (OCG) within the USF College of Marine Science seeded drifters in the Loop Current, its shed eddy and on the West Florida Shelf to help monitor the evolution of the regional flow fields. Such information further served in assessing the trajectories as estimated by the models that we employed to track the spilled oil (e.g., http://ocgweb.marine.usf.edu). Six drifters were initially deployed during a 19–24 May 2010 R/V Bellows cruise conducted jointly by the USF OCG, the USF Optical Oceanography Laboratory, the Florida Department of Environmental Protection (FDEP), the U.S. Coast Guard (USCG), and the Florida Wildlife Research Institute (FWRI). Three drifters were subsequently deployed during a 2–14 June 2010 R/V Weatherbird II cruise by the USF OCG assisted by the Florida Institute of Technology (FIT). Nine more drifters were then added during a 22–25 June 2010 R/V Weatherbird II cruise, in a joint effort by the USF OCG, the Woods Hole Oceanographic Institution (WHOI), and the Northeast Fisheries Science Center (NEFSC). The drifters, drogued at 1 m depth, transmitted data via satellite in real time. The drifter locations were binned at hourly time steps and archived. Figure 1 shows the trajectories for May–August 2010.
 The Global HYbrid Coordinate Ocean Model (HYCOM) [e.g., Bleck, 2002; Chassignet et al., 2003] is configured to simulate global ocean circulation on a Mercator grid with 1/12° equatorial resolution [e.g., Chassignet et al., 2007, 2009]. The horizontal resolution in the Gulf of Mexico is about 9 km. Surface forcing is from the Navy Operational Global Atmospheric Prediction System (NOGAPS) [Hogan and Rosmond, 1991; Rosmond, 1992] and includes wind stress, wind speed, heat flux (using bulk formulae), and precipitation. Data assimilation is via the Navy Coupled Ocean Data Assimilation (NCODA) system [Cummings, 2005], which uses the Modular Ocean Data Assimilation System (MODAS) synthetic data product [Fox et al., 2002]. The Global HYCOM and NCODA hindcast experiment output is available as daily snapshots via the HYCOM Consortium website [http://www.hycom.org/]. This study uses the surface velocity field.
3. Model Performance
3.1. Trajectory Model
 Lagrangian particles are often used in numerical models to track fish larvae [e.g., Werner et al., 1999; Epifanio and Garvine, 2001] and oil spills [e.g., Spaulding, 1988; Reed et al., 1999; Aamo et al., 1997; Daniel et al., 2004]. In rapid response to the Deepwater Horizon oil spill, the USF OCG implemented an oil trajectory nowcast/forecast system using the surface velocity fields output from six numerical circulation models, including the Global HYCOM. Surface oil locations, inferred from satellite images, were used to seed virtual drifters in these surface trajectory models [Liu et al., 2011a, 2011b]. The satellite-tracked drifters deployed in the eastern Gulf of Mexico during May–August 2010 provide an opportunity for assessing the veracity of the modeled trajectories.
 For particle tracking (as in the work of Price et al. [2006]), the daily surface velocity fields from the Global HYCOM are interpolated into 3-hourly time series. A fourth-order Runge-Kutta scheme is used for integration, similar to many Lagrangian-tracking models [e.g., Edwards et al., 2006; Alvera-Azcárate et al., 2009b]. For each satellite-tracked drifter, the trajectory model is initialized daily from the observed drifter locations at 0 h UTC (Figure 2), and the virtual particle is tracked for the next 5 days, a procedure similar to that of Sotillo et al. [2008]. Because the main purpose of our paper is not drifter trajectory simulation, we do not take into account the errors in drifter observations [e.g., O'Donnell et al., 1997] or technical issues in drifter modeling [e.g., Lee et al., 2005; Edwards et al., 2006; Furnans et al., 2008; Kako et al., 2010]. Instead, emphasis is on the separation distance, d, between simulated and observed drifter locations at a particular time after initiation (Figure 3) as a measure of model performance. Smaller d indicates better model performance, with d = 0 being a perfect trajectory model, i.e., the virtual drifter is at the same location as the actual drifter. In previous studies [e.g., Barron et al., 2007; Price et al., 2006], d (or its mean value) was used to evaluate trajectory models along with other performance evaluation methods.
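The integration step described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function names (`rk4_step`, `track_particle`, `rotating_flow`) are hypothetical, the flow is an idealized steady eddy in Cartesian kilometers rather than interpolated HYCOM output, and only the 3-hourly step and the 5 day tracking window are taken from the text.

```python
import numpy as np

def rk4_step(velocity, t, pos, dt):
    """Advance a particle position by one time step with classical 4th-order Runge-Kutta."""
    k1 = velocity(t, pos)
    k2 = velocity(t + 0.5 * dt, pos + 0.5 * dt * k1)
    k3 = velocity(t + 0.5 * dt, pos + 0.5 * dt * k2)
    k4 = velocity(t + dt, pos + dt * k3)
    return pos + (dt / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)

def track_particle(velocity, pos0, t0, duration, dt):
    """Return the particle path, including the release point, as an (n_steps + 1, 2) array."""
    n_steps = int(round(duration / dt))
    path = np.empty((n_steps + 1, 2))
    path[0] = pos0
    for i in range(n_steps):
        path[i + 1] = rk4_step(velocity, t0 + i * dt, path[i], dt)
    return path

def rotating_flow(t, pos, omega=2.0 * np.pi / 10.0):
    """Assumed test flow: solid-body rotation with a 10 day period (km, days)."""
    x, y = pos
    return omega * np.array([-y, x])

# Release at (50 km, 0 km), track for 5 days with a 3-hourly (0.125 day) step.
path = track_particle(rotating_flow, np.array([50.0, 0.0]), t0=0.0, duration=5.0, dt=0.125)
```

In a real application the `velocity` callable would interpolate the gridded model currents in space and time; the analytic eddy is used here only so that the result can be checked (after half a rotation period the particle should sit close to the opposite side of the eddy).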
3.2. The Evaluation Problem
 Drifter # 87798 was deployed in the eastern Gulf of Mexico on 05/24/2010 (Figure 2). After circulating around the Loop Current eddy during the next 10 days, it was transported to the southern portion of the outer West Florida Shelf on 06/09/2010, where it stayed for about 2 weeks. It was then entrained into the Florida Current and transported through the Florida Straits to the South Atlantic in mid July 2010. Thus, during its 2-month journey, this drifter flowed within regions characterized by either fast currents (the Loop Current, Florida Current, and Gulf Stream) or slow currents (the outer West Florida Shelf and the Florida Keys).
Figure 2 compares the simulated drifter trajectories (initialized daily from the actual drifter # 87798 locations) with the observed trajectories. It can be seen that the Global HYCOM successfully simulated both the slower currents on the outer continental shelf and the faster deep water currents. While the virtual drifter trajectories generally align with the observed drifter path within the first three days of simulation, deviations occur, with the separation distance (d) tending to increase with simulation time. The average values of d after 1, 3 and 5 days of Lagrangian tracking are 29, 64 and 104 km, respectively (Figure 4a). Large d values generally occur during the first 10 days (05/24/2010 – 06/15/2010) of transit within the Loop Current eddy and the Gulf of Mexico and then in the later stage (07/08/2010 – 07/19/2010) within the Gulf Stream (Figures 1 and 2). These intervals correspond to times of fast ocean currents, as is also evident from the longer lengths (lo) of the observed Lagrangian trajectories (Figure 4b). Small d values occur from 06/16/2010 to 07/05/2010, when the drifter was on the outer West Florida Shelf and in close proximity to the Florida Keys (Figure 2). These d values alone might indicate that the Global HYCOM performs worse in the fast current (deep ocean) regions than in the slow current (continental shelf) regions. We suggest, however, that such an interpretation is incorrect.
 Configured to simulate the large scale currents of the global ocean circulation [e.g., Chassignet et al., 2007], HYCOM is traditionally a deep ocean application model [e.g., Chassignet et al., 2003; Shaji et al., 2005]. Its data assimilation system (NCODA) [Cummings, 2005] relies on along-track satellite altimeter data as one of the main data sets assimilated into the model. Because satellite altimetry is less reliable on the continental shelf for a number of reasons [e.g., Vignudelli et al., 2011], HYCOM does not assimilate these data there. Additionally, He et al.  show that a major limitation to coastal ocean circulation modeling is the wind field used to force the model because the shelf currents are largely locally forced. Thus, it is generally accepted that the Global HYCOM results are more reliable in the deep ocean than on the continental shelf. This is one reason why the Global HYCOM is used to supply open boundary values for smaller domain, coastal ocean models such as those in use on the West Florida Shelf [e.g., Barth et al., 2008; Weisberg et al., 2009] or as applied to the Cariaco Basin [e.g., Alvera-Azcárate et al., 2009a].
3.3. The Normalized Cumulative Separation Distance
 When evaluating trajectory models, errors inherent in predicting Lagrangian trajectories are compounded by errors in the model velocity field [Barron et al., 2007]. Such errors may be reduced by using more drifter trajectories (higher particle densities) [e.g., Özgökmen et al., 2000]. It was also suggested that a priori knowledge of the climatological mean circulation might serve as a useful estimate of such errors. Evaluations of ocean model performance using Lagrangian drifter records were thus performed regionally for dynamically different flow regimes [e.g., Özgökmen et al., 2000; Barron et al., 2007]. These previous studies relied on dense particle seedings and/or other auxiliary oceanographic data [e.g., Paldor et al., 2004].
 In an attempt to overcome the above mentioned evaluation difficulties, a purely Lagrangian trajectory-based non-dimensional index is defined as

$$ s = \frac{\sum_{i=1}^{N} d_i}{\sum_{i=1}^{N} l_{oi}} \qquad (1) $$

where di is the separation distance between the modeled and observed endpoints of the Lagrangian trajectories at time step i after the initialization (virtual particle release), loi is the length of the observed trajectory up to time step i, and N is the total number of time steps. These definitions are illustrated in Figure 3. The smaller the s value, the better the performance, with s = 0 implying a perfect fit between observation and simulation. Note that both the separation (d) and the along-path (l) distances in equation (1) accumulate in time. The index is calculated from the cumulative sums of the d and l values over all time steps i between the re-initialization and the end point, rather than from a single division of d by l at the end point [Toner et al., 2001]. Such a weighted average tends to reduce the errors due to Lagrangian uncertainties. A typical example is shown in Figure 3, where d2 > d3: if only d3 and l3 were used, s would be underestimated.
 The basic idea of this index is to normalize the Lagrangian separation distance between the modeled and observed trajectories with the length of the trajectory, both in a cumulative manner. It is purely based on drifter trajectories, and does not need any other information.
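The normalized cumulative separation can be computed directly from a pair of trajectories. The following Python sketch assumes that both tracks are given as (latitude, longitude) pairs sampled at the same times, with the shared release point as the first row; the great-circle (haversine) distance, the Earth radius value, and the function names are illustrative choices, not details given in the text.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between points given in degrees."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = (np.sin((lat2 - lat1) / 2.0) ** 2
         + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2.0) ** 2)
    return 2.0 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))

def normalized_cumulative_separation(obs, sim):
    """Index s: sum of separations d_i divided by sum of cumulative observed lengths l_oi.

    obs, sim: array-likes of shape (N + 1, 2) holding (lat, lon); row 0 is the release point.
    The result is undefined if the observed drifter does not move at all.
    """
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    # d_i: separation between simulated and observed positions at steps 1..N
    d = haversine_km(obs[1:, 0], obs[1:, 1], sim[1:, 0], sim[1:, 1])
    # l_oi: observed along-path length accumulated from the release point to step i
    segments = haversine_km(obs[:-1, 0], obs[:-1, 1], obs[1:, 0], obs[1:, 1])
    l_o = np.cumsum(segments)
    return d.sum() / l_o.sum()

# Constructed case: the simulated drifter moves along the equator at half the observed speed.
obs_track = [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0), (0.0, 3.0)]
sim_track = [(0.0, 0.0), (0.0, 0.5), (0.0, 1.0), (0.0, 1.5)]
s_example = normalized_cumulative_separation(obs_track, sim_track)  # approximately 0.5
```

Because the denominator uses the cumulative observed path length rather than the net displacement, a drifter that loops back toward its release point is still normalized by the distance it actually traveled.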
 As an example, time series of the normalized Lagrangian separation distances are shown in Figure 4d for drifter # 87798 after 1, 3 and 5 days of tracking, respectively. The model performs well (smaller s) during the first 10 days when the drifter was circulating around the Loop Current eddy and then during the later stage (07/08/2010 – 07/19/2010) when it flowed within the Gulf Stream, both being times/locations of fast, deep ocean currents. The model performs relatively worse (larger s) on the shelf and close to the Florida Keys where the currents are slower. Thus, in contrast to the original Lagrangian separation distance, this non-dimensional index provides an evaluation of the trajectory model that is more consistent with expectations for a data-assimilative model designed for deep water, such as the Global HYCOM.
 The trajectory model performs about equally well for the 1, 3 and 5 day simulations (Figure 4d), because hindcast data are used for particle tracking; we would expect model performance to decrease with simulation time for forecast model output. The cumulative form also reduces spurious variability. For instance, on 06/30/2010 and 07/04/2010, spikes are seen in the s values of the 1-day simulations due to the short trajectories observed, but these are not seen in the cumulatively averaged values for the longer simulations (Figure 4d). Thus a cumulative separation distance normalized by the associated cumulative length of the drifter trajectory, as defined in equation (1), is more robust than single day results.
4. Application to the Eastern Gulf of Mexico
 The same procedure is applied to all the drifter trajectories obtained in the eastern Gulf of Mexico during May–August 2010. All of the 18 drifters are roughly classified into two categories: deep ocean and West Florida Shelf (Figure 5). This deep versus shelf separation is an approximate delineation for regions of fast versus slow currents. Some drifter trajectories (e.g., # 87798) are divided into two or more stages, treated as trajectories of two or more drifters, and classified into the two categories (deep and shelf) according to their geographical locations. For each category of drifter trajectories, virtual particles are released at the observed drifter locations daily at 0 h UTC in the trajectory model and tracked for 5 days. That is to say, each day, the same satellite-tracked drifter is treated as a new particle released into the trajectory model. The simulated particle trajectories are then compared with the observed ones. The performance of the trajectory model is quantified using both the original and the normalized Lagrangian separation distances, d and s, respectively.
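The daily re-initialization can be expressed as a set of index windows into an hourly drifter record. The Python sketch below is a hypothetical helper, not the authors' procedure in code form: it assumes the record starts at 0 h UTC, is gap free, and that only windows with a full tracking period of observations are kept.

```python
def release_windows(n_samples, samples_per_day=24, release_interval_days=1, track_days=5):
    """Return (start, end) index pairs, one per virtual particle release.

    n_samples: number of hourly observed positions, assumed to start at 0 h UTC.
    Each window spans track_days of observations; windows that would run past
    the end of the record are dropped.
    """
    step = release_interval_days * samples_per_day
    length = track_days * samples_per_day
    last_start = n_samples - 1 - length
    return [(start, start + length) for start in range(0, last_start + 1, step)]

# A 10 day hourly record (241 positions) with daily releases and 5 day tracking.
daily = release_windows(241)
# The same record with a release every 3 days.
every_third_day = release_windows(241, release_interval_days=3)
```

Each window would then be paired with a virtual particle released at the observed position `start` and compared against the observed positions up to `end`. Changing `release_interval_days` gives a simple way to vary how often virtual particles are released.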
 Statistical results of the original Lagrangian separation distance, d, are shown as histograms (Figure 6). After a 1 day simulation, over 80% of the virtual drifters on the shelf have d values less than 20 km, while that percentage drops to about 35% in the deep ocean. Mean d values are 34 and 13 km for the ocean and shelf drifters, respectively. The population of large d values increases with simulation time. After a 5 day simulation, about 15% and 5% of the virtual drifters have d values less than 20 km on the shelf and in the deep ocean, respectively (with mean d values of 58 and 177 km, respectively). Because smaller d values would indicate better model performance, these results, taken at face value, lead to an implausible conclusion: that the Global HYCOM based trajectory model performs better on the shelf than in the deep ocean for 1–5 day simulations.
 In contrast with the original separation distance results of the last paragraph, when d is normalized by its associated length of Lagrangian trajectory, lo, the resulting normalized separation distance, d/lo, provides the opposite result (Figure 7). After a 1 day simulation, about 51% and 26% of d1/lo1 values are smaller than 0.6 in the ocean and over the shelf, respectively. After a 5 day simulation, these percentages increase to 64% and 54% for the ocean and shelf drifters, respectively. For all of the cases, larger populations of small d/lo values (d/lo < 0.6) are seen in deep water than on the shelf. This indicates better performance for the Global HYCOM based trajectory model in the deep ocean than on the shelf.
 There are more uncertainties in d/lo for shorter time simulations (e.g., 1 day simulation) because of smaller values of lo, especially for weak current regions (shelf). The cumulative skill score, s, helps to mitigate this effect. After a 3 day simulation, the mean s values are 0.74 and 0.89 for the deep and shelf regions, respectively (Figure 8). Again, the smaller s value for the deep ocean area indicates better model performance there. The mean s values after a 5 day simulation are close to those of the 3 day simulation, considering their large standard deviations.
5. Skill Score
 As a measure of trajectory model performance, the normalized cumulative separation distance, s, runs in the opposite sense to conventional model skill scores [e.g., Willmott, 1981; Liu et al., 2009]: the smaller the s value, the better the performance of a trajectory model, whereas in conventional model skill scores a higher value means better model performance. Thus, we propose a skill score for trajectory models, based on s, that follows the conventional sense:

$$ ss = \begin{cases} 1 - \dfrac{s}{n}, & s \le n \\ 0, & s > n \end{cases} \qquad (2) $$
where n is a non-dimensional, positive number that defines the threshold of no skill (ss = 0). Larger n values correspond to more lenient requirements on the model. For example, with n = 2, model results whose cumulative separation exceeds twice the cumulative trajectory length (s > 2) are flagged as having no skill (ss = 0), while results with a smaller cumulative separation are considered acceptable and enter the skill score calculation. Conversely, smaller n values impose stricter requirements on the model. For example, with n = 0.5, model results whose cumulative separation exceeds half the cumulative trajectory length (s > 0.5) are flagged as having no skill (ss = 0). So, n is a tolerance threshold. For n = 1, the skill score reduces to

$$ ss = \begin{cases} 1 - s, & s \le 1 \\ 0, & s > 1 \end{cases} \qquad (3) $$
In this case, model simulations with s > 1 are flagged as having no skill (ss = 0). This corresponds to the criterion that the cumulative separation distance should not be larger than the associated cumulative length of the drifter trajectory, i.e., Σdi ≤ Σloi; otherwise the model is considered to have no skill. In this way, the skill score, ss, ranges from 0 (no skill) to 1 (perfect simulation), as is common for skill scores.
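The thresholded skill score is a short piecewise function of s. The Python sketch below is one possible vectorized form using NumPy; the function name `skill_score` and the input validation are illustrative assumptions.

```python
import numpy as np

def skill_score(s, n=1.0):
    """Skill score ss = 1 - s/n for s <= n, and 0 (no skill) for s > n.

    s: normalized cumulative separation (scalar or array-like, non-negative).
    n: positive tolerance threshold; n = 1 gives ss = 1 - s for s <= 1.
    """
    if n <= 0:
        raise ValueError("tolerance threshold n must be positive")
    s = np.asarray(s, dtype=float)
    return np.where(s <= n, 1.0 - s / n, 0.0)

# Illustrative values only: a perfect fit, a moderate error, the threshold, and a no-skill case.
scores = skill_score([0.0, 0.5, 1.0, 2.5])
lenient = skill_score(1.0, n=2.0)
```

Because values of s above the threshold are clipped to ss = 0, the mean of ss over many simulations is in general not equal to one minus the mean of s, so the two statistics should be averaged separately.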
 Based on the entire drifter data set (both deep and shelf drifters), about 72% of the cumulative separation distances after 3 days are smaller than the cumulative lengths of drifter trajectories, i.e., s3 < 1. It would be reasonable to consider the remaining 28% of model simulations as unacceptable, since those modeled drifters are too far away from the observed positions. Thus, n = 1 may be a good choice, and the skill score in equation (3) is used to quantify the model performance. The skill scores are calculated daily for 3 day simulations based on all the drifters obtained in the eastern Gulf of Mexico during May–August 2010 (Figure 9). Larger skill scores (ss = 0.5–0.9) are generally seen in the deep ocean areas of the Gulf of Mexico corresponding to the Loop Current eddy, the Loop Current, and the Florida Current to Gulf Stream region of the Florida Straits. Smaller skill scores (ss < 0.2) are mostly found on the West Florida Shelf. Some larger skill scores (ss > 0.8) are also found on the outer West Florida Shelf, and these are related to the influence of the Loop Current eddy on the West Florida Shelf (Figure 1). The mean ss value is 0.33 based on the entire drifter data set evaluation of the 3 day simulations. The mean ss values are 0.41 and 0.30 for the deep and shelf regions, respectively. This again shows that, as expected, the Global HYCOM based surface trajectory model generally performs better in the deep ocean than on the shelf.
 Throughout this paper we considered drifter release intervals of one day regardless of the potential Lagrangian decorrelation time scale because, for spatially inhomogeneous velocity fields, the decorrelation scale itself is ill-defined. As a sensitivity test we now consider different drifter release intervals, with the results summarized in Table 1. Changing the release interval to every two days does not change the skill scores. When it is changed to every 3 days, the mean skill scores are 0.37 and 0.28 for the ocean and the shelf regions, respectively. Even with a drifter release interval of 5 days, the average skill scores remain about the same. Regardless of interval (1–5 days), their standard deviations are also about the same, i.e., 0.31 and 0.28 for the ocean and shelf regions, respectively.
Table 1. Sensitivity of the Skill Scores, ss, to the Virtual Drifter Release Interval
Drifter Release Interval (days); Skill Score, ss
 Given cumulative distance as the denominator in equation (1), are there instances when this skill score may fail due to very weak currents and hence small cumulative distance? As an extreme example, if the observed drifter moves very little during a time period (e.g., 3 days), the cumulative distance will be close to zero, and the normalized cumulative distance s could be a very large value. However, by using a proper tolerance threshold n, this large s case is flagged as having no skill (ss = 0) according to equation (2). So, while arbitrary, the choice of n is important. As shown in our analysis, n = 1 provides a good choice to begin with.
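A worked example may make the weak-current case concrete. The numbers below are hypothetical and chosen only for illustration: a nearly stationary observed drifter and a simulated drifter that wanders away from it.

```latex
% Hypothetical values, not taken from the drifter data set
\sum_{i=1}^{N} d_i = 12~\text{km}, \qquad \sum_{i=1}^{N} l_{oi} = 1.5~\text{km}
\;\Rightarrow\; s = \frac{12}{1.5} = 8 > n = 1
\;\Rightarrow\; ss = 0 .
```

The very large s is thus mapped to the bounded value ss = 0 rather than dominating any average of the skill scores.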
6. Summary and Discussion
 A new skill score, based on the cumulative Lagrangian separation distance normalized by the associated cumulative trajectory length, was proposed to evaluate the performance of trajectory modeling in different dynamic regions. Application was made to the evaluation of surface trajectories implied by Global HYCOM hindcast surface currents as gauged against actual satellite-tracked drifter trajectories in the eastern Gulf of Mexico in summer 2010. The skill score matched expectations for the relative performance of the Global HYCOM in modeling the fast currents of the Gulf of Mexico Loop Current, its eddies and the Gulf Stream, versus the slower currents of the West Florida Shelf, whereas a non-normalized Lagrangian separation method failed at this expectation.
 The proposed non-dimensional skill score is particularly useful when the number of drifter trajectories is limited and neither conventional Eulerian-based velocity estimation nor Lagrangian-based probability density functions are possible. The normalized Lagrangian skill assessment proposed is based solely on the drifter trajectories, and thus neither prior knowledge of the ocean circulation in the region of interest nor additional climatological data on the mean circulation patterns is required. These features make the normalized cumulative Lagrangian separation a practical index for trajectory model evaluation in situations of rapid response to maritime incidents, such as oil spills [e.g., Ji et al., 2003; Sotillo et al., 2008] and search and rescue operations.
 Although the proposed skill score is useful in quantifying trajectory model performance, it is but one measure of performance, and it does not gauge all aspects of model performance. The limited data set used in this study did not allow for more extensive Lagrangian statistical analyses [e.g., Garraffo et al., 2001a]. Combining multiple skill score metrics for more complete model performance evaluations [e.g., Liu et al., 2009] may be useful in the future.
 Support was by the Office of Naval Research, grants N00014-05-1-0483, N00014-10-0785, and N000014-10-1-0794; National Oceanic and Atmospheric Administration (NOAA) EcoHAB grant NA06NOS4780246; South Carolina Sea grant as pass through from the NOAA IOOS Program Office; NOAA grant NA07NOS4730409; National Science Foundation grant OCE-0741705; and British Petroleum (BP) through the Florida Institute of Oceanography (FIO), FIO grant 4710-1101-05. The success of drifter deployments from USF OCG is mainly attributed to J. Law, with assistance from USF Optical Oceanography Laboratory (C. Hu), FDEP (C. Kovach), FWC/FWRI, FIT, WHOI, and NEFSC (J. Manning). USF OCG staff J. Donovan, P. Smith, and D. A. Mayer assisted with drifter data processing. Additional drifter data were provided by USCG. The Global HYCOM + NCODA analysis is provided by the HYCOM Consortium. This is CPR contribution 16.