Selectivity metrics for fisheries management and advice

Fisheries management typically aims at controlling exploitation rate (e.g., Fbar) to ensure sustainable levels of stock size in accordance with established reference points (e.g., FMSY, BMSY). Population selectivity (“selectivity” hereafter), that is the distribution of fishing mortality over the different demographic components of an exploited fish stock, is also important because it affects both Maximum Sustainable Yield (MSY) and FMSY, as well as stock resilience to overfishing. The development of an appropriate metric could make selectivity operational as an additional lever for fisheries managers to achieve desirable outcomes. Additionally, such a selectivity metric could inform managers on the uptake by fleets and effects on stocks of various technical measures. Here, we introduce three criteria for selectivity metrics: (a) sensitivity to selectivity changes, (b) robustness to recruitment variability and (c) robustness to changes in Fbar. Subsequently, we test a range of different selectivity metrics against these three criteria to identify the optimal metric. First, we simulate changes in selectivity, recruitment and Fbar on a virtual fish stock to study the metrics under controlled conditions. We then apply two shortlisted selectivity metrics to six European fish stocks with a known history of technical measures to explore the metrics’ response in real-world situations. This process identified the ratio of F of the first recruited age–class to Fbar (Frec/Fbar) as an informative selectivity metric for fisheries management and advice.

contact selectivity, which is the differential retention probability of fish that encounter the gear; available selectivity, which is the differential availability of different fish to the gear; and population selectivity, which is the combination of the previous two forms (Millar & Fryer, 1999; Sampson, 2014). Population selectivity is the focus of this study.
Population selectivity ("selectivity" hereafter) describes the differential vulnerability to fishing of the demographic components of an exploited fish population, as a result of the gear used (e.g., choice of mesh size) and availability (e.g., choice of fishing timing and location) (Millar & Fryer, 1999; Quinn & Deriso, 1999; Sampson, 2014; Sampson & Scott, 2012; Scott & Sampson, 2011). Selectivity can be inferred from the age/size composition of the catch (Froese, 2004; Froese, Stern-Pirlot, Winker, & Gascuel, 2008), but in age-structured stock assessment models, it is usually expressed as the standardized vector of age-specific fishing mortalities (F-at-age) divided by the maximum F observed for any age-class (Sampson & Scott, 2012; Scott & Sampson, 2011).
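The standardization described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code; the function name `selectivity_at_age` and the example F-at-age values are hypothetical.

```python
import numpy as np

def selectivity_at_age(f_at_age):
    """Scale an F-at-age vector by its maximum, so the most selected age equals 1."""
    f = np.asarray(f_at_age, dtype=float)
    f_max = f.max()
    if f_max <= 0:
        raise ValueError("F-at-age must contain at least one positive value")
    return f / f_max

# Hypothetical F-at-age for ages 1-5 (illustrative values only)
f_example = [0.08, 0.24, 0.40, 0.40, 0.40]
sel_example = selectivity_at_age(f_example)  # age 1 is selected at 20% of the maximum F
```

Because the vector is divided by its own maximum, multiplying all F-at-age values by a constant leaves the standardized selectivity unchanged.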
Selectivity is of paramount importance for fisheries management. Beverton and Holt (1957) first suggested that for any fixed rate of F, there is an optimal size/age-at-first-capture at which cohort biomass (i.e., the weight of fish captured from this cohort over its lifetime) is maximized. In the following decades, numerous studies illustrated the benefits in terms of increased yields and lower collapse probability from improving selectivity by adjusting (usually increasing) the size/age-at-first-capture. Such studies have been based mostly on simulations (e.g., Froese et al., 2008; Froese, Winker, Gascuel, Sumaila, & Pauly, 2016; Myers & Mertz, 1998; Prince & Hordyk, 2019; Scott & Sampson, 2011; Vasilakopoulos, O'Neill, & Marshall, 2016), but also on meta-analyses of empirical datasets (Vasilakopoulos, O'Neill, & Marshall, 2011). Improved selectivity promotes sustainability by preventing growth overfishing (Beverton & Holt, 1957; Froese et al., 2008), as well as recruitment overfishing (Myers & Barrowman, 1996; Myers & Mertz, 1998). Selectivity studies illustrate that there is a trade-off between F and selectivity; catching larger fish allows stocks to sustain a higher F without collapsing, while catching too many small fish can lead to stock depletion even at moderate levels of F (Prince & Hordyk, 2019; Scott & Sampson, 2011). Additionally, both Maximum Sustainable Yield (MSY) and FMSY are selectivity-dependent; for most exploited stocks, catching larger fish would lead to a higher MSY (Froese et al., 2008; Scott & Sampson, 2011; Vasilakopoulos, Maravelias, & Tserpes, 2014). The potential gains for a specific fish stock from protecting juveniles can be quantified by checking the interplay between selectivity and F, and the resulting long-term yield (e.g., Froese et al., 2008; Prince & Hordyk, 2019; Scott & Sampson, 2011).
For the F levels occurring in the majority of commercial fish stocks (especially demersal ones), an "improved" selectivity would come from a lesser exploitation of juveniles (e.g., Colloca et al., 2013; Froese et al., 2008; Vasilakopoulos et al., 2014). Accordingly, technical measures typically aim to modify selectivity to protect juveniles (Armstrong, Ferro, MacLennan, & Reeves, 1990; European Commission, 2019; Suuronen & Sarda, 2007). Hence, in this study, we refer to "improved selectivity" as the one resulting in a lesser exploitation of juveniles.
Despite the importance of selectivity, fisheries management and the provision of fisheries advice have traditionally focused on the total rate of deaths of fish due to fishing, that is the overall F (FAO, 2019). This is usually approximated by the average F of the fully exploited age-classes (i.e., Fbar). Regulating Fbar to achieve the maximum sustainable yield (FMSY) is a common management objective, while fisheries advice is tailored to the observed levels of Fbar and stock size (usually spawning stock biomass, SSB) in relation to agreed reference points (Wakeford, Agnew, & Mees, 2009; Worm et al., 2009; ICES, 2018a). The incorporation in fisheries management of a metric summarizing selectivity, analogous to Fbar summarizing fishing mortality, would capture an additional dimension of the exploitation regime, hence allowing an advanced understanding of the exploitation dynamics and potential for higher yields. Such a selectivity metric could be used as an additional lever for fisheries managers to achieve desirable outcomes for both the fishery and the stock through gear regulations and spatio-temporal restrictions of fishing, whose effects would be quantified. In this context, improved selectivity could even allow an increase in Fbar (Froese et al., 2008; Scott & Sampson, 2011; Vasilakopoulos et al., 2014). A selectivity metric could also be used to track the impact on the stock of the introduction of various regulations, such as technical measures and discard bans.
Previous simulation-based studies of selectivity that have incorporated selectivity metrics (e.g., Froese et al., 2008; Myers & Mertz, 1998; Scott & Sampson, 2011) have typically focused on the effect of selectivity on equilibrium yield and equilibrium stock size under fixed levels of recruitment and Fbar. However, the response of such selectivity metrics in the face of real-time changes in recruitment and Fbar remains unknown.
The investigation of this response is crucial, given that any selectivity metric that is too sensitive to factors unrelated to fishing practices would be of limited use for fisheries managers. Consequently, to assess the utility of any selectivity metric for fisheries management and advice, the following three criteria should be considered:
1. Ability to track selectivity changes in the fishery. The metric should clearly capture changes happening in the distribution of F across the age-classes due to changes in fishing practices (e.g., gear characteristics, spatio-temporal allocation of fishing).
2. Robustness to recruitment variability. The metric should not be influenced by changes in the distribution of the population numbers across the different age-classes caused by natural fluctuations, such as recruitment pulses. The utility of a selectivity metric for management relies on its ability to capture changes in selectivity happening due to changes in the fishery, rather than through natural processes.
3. Robustness to changes in Fbar. A metric that is highly correlated to Fbar would not fully capture changes in selectivity; hence, it would have limited value.
It should be noted that criteria 2 (robustness to recruitment variability) and 3 (robustness to changes in Fbar) do not imply that a good selectivity metric must remain always unchanged in the face of recruitment variability and/or changing fishing mortality. Rather, these criteria refer to the correlation between selectivity metrics and recruitment or Fbar while the fishing practices remain unchanged, which may occur due to the numerical configuration of the metrics. However, changing recruitment or Fbar may result in the fishers changing their operation in a way that affects selectivity. For example, recruitment pulses in quota fisheries may affect selectivity due to fishers changing their effort allocation to retain a desirable catch profile while adhering to the quotas. Therefore, a selectivity metric may still respond indirectly to recruitment variability and/or changes in Fbar if these result in fishers changing their differential allocation of F over the age-classes.
Here, we examine candidate selectivity metrics that can be calculated using standard age-based stock assessments, and test them against the three aforementioned criteria. Seven key metrics are presented in the core text of this study, while seven more are included in the Supporting Information. First, we use simulations to identify those metrics that fulfil our three criteria under controlled conditions. Second, we apply two shortlisted metrics to six fish stocks from the NE Atlantic and the Mediterranean Sea with a known history of technical measures to explore the metrics' response in real-world situations. Combined, this process allowed the identification of the most informative selectivity metrics for fisheries management and advice.

| Selectivity metrics
Seven age-based selectivity metrics have been analysed in the core text of this study (S1-S7; Table 1). These metrics differ in complexity and data requirements but can all be calculated using common stock assessment inputs and outputs with annual resolution, that is age-structured catch, F, maturity and stock numbers (abundance).
Most metrics quantified the relative exploitation of juvenile/undersized fish, given that this is the most common target of selectivity changes. Age-based metrics were preferred here, because most of the stock assessments currently used for management and advice are also age-based. Still, these metrics could also be used with length-based datasets, either by adjusting them to length or by calculating the age from length using appropriate growth equations (as is done here for S2). For all the metrics, lower values indicated an improvement in selectivity, that is less pressure on juveniles, with the exception of S2 and S5 where the opposite was true.
Seven additional selectivity metrics have been examined within the Supporting Information (S8-S14; Table S1). These metrics were variations of the seven metrics included in the core text and had similar performance; hence, it was decided to present them separately so as to reduce the complexity of the core results.

| Simulation analysis
The simulation analysis aimed to test all selectivity metrics against the three criteria described in the Introduction. Accordingly, the selectivity metrics were tested in a controlled environment for their ability to track selectivity changes, as well as their robustness to recruitment pulses and Fbar changes. The simulation analysis was performed within the Fisheries Library in R (FLR) framework (FLR http://flr-project.org; Kell et al., 2007) using R v3.5.2 (R Core Team, 2018). The theoretical stocks were generated using packages FLife and FLasher.
A stock was simulated using the approach developed by Gislason and implemented in FLife (Gislason et al., 2008). The simulated stock made use of a von Bertalanffy growth model (Linf = 90 cm, K = 0.77 year⁻¹, t0 = −0.019 year). The stock was set up to contain 10 age-classes, with the first class being age 1 and the tenth class being a plus group. 50% of the fish were mature at age 2 and all fish were mature from age 4 onwards. Natural mortality was age-dependent using M(age) = 0.2 + 1.64 * exp(-age), which has an asymptote at M = 0.2 with a much higher juvenile mortality. Recruitment at age 1 was simulated by a Beverton-Holt model; in scenarios with recruitment pulses, the timing and the amplitude of the pulses were fixed across scenarios. The basic recruitment in the absence of recruitment pulses or changes in stock size was set to 154,000 individuals.
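The growth and natural mortality settings above can be reproduced with a short Python sketch. The parameter values are those quoted in the text; the function names are hypothetical, the study itself used the FLR packages in R, and treating the plus group as age 10 when computing length is a simplifying assumption.

```python
import numpy as np

# Parameter values quoted in the text
LINF = 90.0    # asymptotic length (cm)
K = 0.77       # growth coefficient (per year)
T0 = -0.019    # theoretical age at zero length (years)

ages = np.arange(1, 11)  # ages 1-10; age 10 stands in for the plus group (assumption)

def vb_length(age):
    """von Bertalanffy length-at-age in cm."""
    return LINF * (1.0 - np.exp(-K * (np.asarray(age, dtype=float) - T0)))

def natural_mortality(age):
    """Age-dependent natural mortality, M(age) = 0.2 + 1.64 * exp(-age)."""
    return 0.2 + 1.64 * np.exp(-np.asarray(age, dtype=float))

length_at_age = vb_length(ages)
m_at_age = natural_mortality(ages)
```

With these values, M is about 0.80 at age 1 and falls towards the asymptote of 0.2 for the oldest ages, which is the "much higher juvenile mortality" noted above.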
The fishery was characterized by a base selectivity, where fish were fully selected at age 3 (Figure 1). In this context, selectivity was the outcome of the fishing activity of all individual fisheries exploiting the stock with their different gears and fishing efforts (F-at-age). Fbar was set to a level just above the Fmsy reference point [...] Builder (ADMB) as the optimization engine (Jardim et al., 2014). This process generated both structural uncertainty (through the models chosen for fishing mortality, recruitment and survey catchability) and estimation uncertainty (through the MCMC fit). As a result, the response of the selectivity metrics followed the stock assessment outputs rather than the true unobserved process, as would have happened in a real-world situation. One survey index was used by the stock assessment in each time step, and this survey index was generated by applying the survey catchability (same for all age-classes) on the population numbers at age. The survey index's observation error distribution was log-normal with a standard deviation of 0.2. To avoid introducing stock assessment model artefacts associated with the edges of the time series, the first and last fifty time steps were cut off, leaving a time series of 100 years.
[Figure 1 legend: base selectivity; new selectivity (scenarios 2-4); new selectivity (scenario 5)]
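The survey index generation described above (a single catchability applied to numbers-at-age, with log-normal observation error of standard deviation 0.2) could look roughly as follows. This is an illustrative sketch only: the function name, the catchability value and the numbers-at-age are hypothetical, and the standard deviation is assumed to apply on the log scale.

```python
import numpy as np

def survey_index(numbers_at_age, catchability, sd_log=0.2, rng=None):
    """Apply one catchability to all ages and add multiplicative log-normal error."""
    rng = np.random.default_rng() if rng is None else rng
    n = np.asarray(numbers_at_age, dtype=float)
    # exp(Normal(0, sd_log)) noise, i.e. log-normal with sd_log on the log scale
    noise = rng.lognormal(mean=0.0, sigma=sd_log, size=n.shape)
    return catchability * n * noise

# Hypothetical numbers-at-age for ages 1-5 and a hypothetical catchability
n_example = np.array([154000.0, 70000.0, 40000.0, 25000.0, 15000.0])
index_example = survey_index(n_example, catchability=1e-3, rng=np.random.default_rng(42))
```

Passing a seeded generator keeps the simulated index reproducible across runs.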
In total, five increasingly complex scenarios were tested:
Scenario 1: No changes in selectivity, Fbar or recruitment. This scenario was used to check whether selectivity metrics fluctuate in the absence of changes in selectivity, recruitment or Fbar.
Scenario 2: Selectivity, taken as the F-at-age scaled by the maximum F-at-age, was reduced at ages 1 (from 0.2 to 0.05) and 2 (from 0.6 to 0.4) (Figure 1) in year 51, while Fbar and recruitment were unchanged. This change in selectivity is a common result of changes in the fishery aiming to protect juveniles (e.g., mesh size increase, protection of nurseries). This scenario aimed to investigate the ability of the metrics to detect a change in selectivity when recruitment and Fbar were kept stable.
Scenario 3: [...] for 15 years before decreasing again. This scenario was used to test the selectivity metrics against the three criteria (i.e., ability to track selectivity changes, robustness to recruitment variability, robustness to changes in Fbar) in more realistic conditions, when there is both recruitment variability and changes in Fbar. The improvement in selectivity (i.e., reduced F at ages 1 and 2) occurred while the Fbar was decreasing, to mimic an overfished stock where both overall fishing pressure and selectivity improve in response to management actions.

Scenario 4: The same selectivity change as in the previous two scenarios was simulated in year 45. It coincided with a recruitment pulse and occurred while Fbar was high but stable, unlike scenario 3 where the selectivity change occurred while Fbar was also changing. The amplitude of the recruitment pulses was modelled similarly to scenario 3. This scenario, besides testing the selectivity metrics in the face of recruitment variability and changing Fbar, also aimed to investigate a situation where a recruitment pulse could potentially mask an improvement in selectivity, given that this pulse would result in more juveniles in the catch.
Scenario 5: A more subtle selectivity change compared to scenarios 2-4 at ages 1 (from 0.2 to 0.1) and 2 (from 0.6 to 0.5) (Figure 1) was simulated in year 45. As in scenario 4, the selectivity change coincided with a recruitment pulse and occurred while Fbar was stable. The amplitude of the recruitment pulses was modelled similarly to scenario 3. This scenario aimed to test the differential sensitivity of the selectivity metrics when more subtle selectivity changes occur, while recruitment and Fbar also changed.
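The selectivity changes in the scenarios can be written down as simple vectors. The sketch below builds a year-by-age F matrix for a scenario-2-like case, assuming a flat-topped curve (selectivity of 1 from age 3 onwards, so that F at fully selected ages equals Fbar) and a hypothetical constant Fbar of 0.3; neither the curve above age 3 nor the Fbar value is stated in the text, and all names are illustrative.

```python
import numpy as np

def selectivity_curve(sel_age1, sel_age2, n_ages=10):
    """Selectivity for ages 1..n_ages, assumed fully selected (1.0) from age 3 onwards."""
    sel = np.ones(n_ages)
    sel[0], sel[1] = sel_age1, sel_age2
    return sel

BASE = selectivity_curve(0.2, 0.6)          # base selectivity
NEW_STRONG = selectivity_curve(0.05, 0.4)   # change simulated in scenarios 2-4
NEW_SUBTLE = selectivity_curve(0.1, 0.5)    # change simulated in scenario 5

def f_at_age_series(fbar_by_year, change_year, sel_before, sel_after, first_year=1):
    """Return (years, F matrix of shape years x ages) with a selectivity switch."""
    fbar = np.asarray(fbar_by_year, dtype=float)
    years = first_year + np.arange(fbar.size)
    changed = (years >= change_year)[:, None]          # column vector for broadcasting
    sel = np.where(changed, sel_after, sel_before)     # one selectivity row per year
    return years, sel * fbar[:, None]

# Scenario-2-like example: constant (hypothetical) Fbar, selectivity change in year 51
years, f_mat = f_at_age_series(np.full(100, 0.3), 51, BASE, NEW_STRONG)
```

Swapping `NEW_STRONG` for `NEW_SUBTLE` and moving `change_year` to 45 gives the selectivity component of a scenario-5-like case; recruitment pulses and a time-varying Fbar would need to be supplied separately.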
In all scenarios, the temporal development of all metrics, recruitment and Fbar was plotted in order to visualize the response of the metrics to change. Furthermore, a linear model was fitted to each selectivity metric (S, being S1-S7), with the selectivity state (Sel; being either base selectivity or new selectivity, see Figure 1), the occurrence of a recruitment pulse (Rec), Fbar (Fbar) and an interaction term Sel:Fbar as explanatory variables (Equation 1). Sel and Rec were modelled as binary variables, while Fbar was modelled as a continuous variable. By modelling Rec as a binary variable, we aimed to test whether the metrics changed when there was a modelled pulse in recruitment, since our interest was to assess whether the metrics were sensitive to strong recruitment classes.
The interaction between Sel and Fbar was included to ensure that
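A linear model with the explanatory variables named above (Sel, Rec, Fbar and Sel:Fbar) can be fitted by ordinary least squares. The sketch below uses NumPy on synthetic data and is not the authors' code: the function name, the synthetic series and the coefficient values are hypothetical, and the lagged Rec terms mentioned in the caption of Table 2 are deliberately left out.

```python
import numpy as np

def fit_metric_model(metric, sel, rec, fbar):
    """Least-squares fit of metric ~ 1 + Sel + Rec + Fbar + Sel:Fbar."""
    metric = np.asarray(metric, dtype=float)
    sel = np.asarray(sel, dtype=float)
    rec = np.asarray(rec, dtype=float)
    fbar = np.asarray(fbar, dtype=float)
    design = np.column_stack([np.ones_like(fbar), sel, rec, fbar, sel * fbar])
    coef, *_ = np.linalg.lstsq(design, metric, rcond=None)
    return dict(zip(["Intercept", "Sel", "Rec", "Fbar", "Sel:Fbar"], coef))

# Synthetic 100-year example (all values hypothetical)
yrs = np.arange(1, 101)
sel_state = (yrs >= 45).astype(float)              # 0 = base, 1 = new selectivity
rec_pulse = np.isin(yrs, [20, 45, 70]).astype(float)
fbar_series = 0.3 + 0.1 * np.sin(yrs / 8.0)
# A metric that responds only to the selectivity state
metric_series = 0.2 - 0.15 * sel_state
fit = fit_metric_model(metric_series, sel_state, rec_pulse, fbar_series)
```

For a metric that satisfies the three criteria, the Sel coefficient would be clearly non-zero while the Rec, Fbar and Sel:Fbar coefficients would be close to zero, as in this synthetic case.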

| Analysis of empirical stocks
The results of the simulation analysis refer to a simplified version of reality designed specifically to test the metrics against the three criteria set in the Introduction. Therefore, the simulations should not be considered in isolation, but in combination with an analysis of empirical stocks that entail a much higher level of complexity.
One F-based (S4) and one catch-based (S1) selectivity metric, representing cases with different explanatory power and data requirements, were shortlisted after the simulation analysis. S4 tracked the selectivity change and was robust to changes in recruitment and Fbar, while S1 did not perform very well, but had a simpler calculation and interpretation. The temporal development of these metrics was examined together with that of recruitment and Fbar in six European commercial stocks for which age-structured stock assessments were available. These stocks [...]
[...] The use of more selective gears was made mandatory for UK vessels in certain fisheries with haddock bycatch.
• 2015-The stock comes under the LO, meaning all haddock must be landed and counted against quota. This introduced an incentive to target larger fish.
West of Scotland whiting (Merlangius merlangus, Gadidae): Data for this stock were taken from its latest stock assessment (ICES, 2018c) and ranged from 1981 to 2017. Reported age-classes were 1 to 7+, and Fbar was calculated over age-classes 2-4. This stock was selected because it has been through a recent depletion and is currently rebuilding after going through some emergency technical measures that greatly reduced Fbar. The timeline of the technical measures and other changes that could have potentially affected the selectivity of this stock is as follows:
• 2002-Requirement to increase mesh size (from 80 mm to 100 mm) and use of square mesh panels (90 mm).
• 2004-Introduction of the first cod effort management plan, which linked the use of selective gears to increased effort allocations. Limited uptake.
• 2006-SSB fell to very low levels (only began to increase after 2011).
• 2009-Introduction of the second cod effort management plan which incentivized the use of selective gears in return for increased effort allocations or removal from the effort regime altogether.
• 2009-Introduction of emergency technical measures. Mesh size increased to 120 mm with 120 mm square mesh panels in bottom trawl TR1 fisheries (i.e., towed gears with a codend mesh size greater than or equal to 100 mm) and 80 mm plus 120 mm square mesh panels or sorting grid in TR2 fisheries (i.e., towed gears with a codend mesh size in the range of 79-99 mm).
• 2010 onwards-Fbar has decreased significantly. Majority of catches now in TR2 fisheries but mostly discarded (undersized). Catches in TR1 fisheries were significantly reduced following the latest mesh size increases.
West Baltic cod (Gadus morhua, Gadidae): Data for this stock were taken from its latest stock assessment (ICES, 2018d) and ranged from 1994 to 2017. Reported age-classes were originally 0 to 7+ and Fbar was calculated over age-classes 3-5. F-at-age 0 was zero in all years, so prior to quantifying the selectivity metrics, age 0 was trimmed, and age 1 was considered the first recruited age. This stock was selected because of a well-documented history of technical measures, some of which were expected to improve selectivity (e.g., increasing mesh size), while others were expected to deteriorate it (e.g., a reduction in Minimum Conservation Reference Size, MCRS). The timeline of the technical measures and other changes that could have potentially affected the selectivity of this stock is as follows:
• 1986-Minimum Landing Size (MLS) for cod set at 30 cm and codend mesh size at 95 mm.
• 1998-Mesh size increased to 105 mm with an unspecified escape window or 120 mm codend mesh size.

| Simulation analysis
For scenarios 2-5, the temporal development of seven selectivity metrics (S1-S7) is presented here (Figures 2-5), while the temporal development of all 14 metrics in all scenarios is presented in the Supporting Information (Figures S1-S5).
Scenario 1, featuring no changes in selectivity, Fbar and recruitment, confirmed that when Fbar and recruitment were stable, the selectivity metrics did not pick up any change in selectivity (Figure S1).
Scenario 2, where selectivity changed while Fbar and recruitment were stable, suggested that all metrics were able to track changes in selectivity when Fbar and recruitment were kept stable (Figure 2; Figure S2). However, there were certain metrics with higher contrast before and after the selectivity change than others (e.g., S4 and S5; Figure 2). The seemingly earlier onset of the selectivity change captured by the tested metrics came from the smoothing introduced by the stock assessment fitting process.
In scenario 3, where selectivity changed while Fbar and recruitment were fluctuating, the differences in performance between metrics became more apparent. Visual inspection of the metrics' trends suggested that while most metrics responded to the selectivity change, the F-based metrics S4 and S5 performed the best with regard to their robustness to changing Fbar and recruitment (Figure 3). F-based metrics similar to S4 (S11-S13) exhibited similarly good performance (Figure S3). Metrics incorporating catch and/or abundance data (S1, S2, S3 and S7) exhibited trends with abrupt changes of various magnitudes in response to the modelled recruitment pulses (Figure 3). Notably S3, which was both catch- and N-based, responded to recruitment pulses with a dip in the year of the pulse followed by an increase the year after (Figure 3). This was because the high abundance of juveniles in the sea generated by a recruitment pulse gave a signal of "good" selectivity in the same year followed by a signal of "bad" selectivity the year after, when many fish from the previous year's pulse were caught. Regarding the effect of the changing Fbar, S6 mistook the increase of Fbar as worsening selectivity (i.e., higher targeting of juveniles) (Figure 3).
In scenario 4, Fbar and recruitment were fluctuating and the selectivity changes coincided with a recruitment pulse. As in scenario 3, F-based metrics such as S4 exhibited a better performance than the others (Figure 4). Notably, the synchronization of the selectivity change with a recruitment pulse led to the perception of a worsening selectivity in the same year by S1, a year later by S3 and S7, and four [...] metrics (S1-S2), as well as S6, were no longer able to pick up the selectivity change (Table 2) (see Figures S13-S19 for the diagnostic plots). In other words, these metrics responded with a false negative signal to the selectivity change. This was because the effects of recruitment pulses and changes in Fbar were more pronounced than the effect of the selectivity change (Figure 5). By contrast, S4 and S5 were still able to pick up the effect of the selectivity change, without being affected by changes in Fbar and Rec (Table 2).
The simulation analysis indicated that metrics based on catch and/or abundance data (S1, S2, S3 and S7) can be representative of selectivity, but they are also sensitive to fluctuations in Fbar and/or recruitment (Table 2). Moreover, the simple catch-based metrics S1 (i.e., proportion of juveniles in the catch) and S2 (i.e., proportion of fish at optimal length in the catch) were able to track the pronounced selectivity change in scenario 4, but not the more subtle one in scenario 5 (Table 2). This means that S1, S2, S3 and S7 fail at least one of the three criteria set in the Introduction; hence, they are sub-optimal for use in empirical fish stocks. S4 met all three criteria as it was able to track selectivity changes, and it was also robust to changes in recruitment and Fbar (Table 2). F-based metrics similar to S4 also performed similarly well (S11-S13; Figures S4-S5). Therefore, these types of metrics are the most suitable to be further examined in empirical stocks. S5 (i.e., difference between A50 of selectivity and A50 of maturity) also performed well, while S6 (i.e., Fbar of juveniles) was sensitive to fluctuations in Fbar and could track only the pronounced selectivity change (Table 2).
Based on the results of the simulation analysis, two metrics were shortlisted to be tested on empirical stocks. These were S4 and S1. S4 (i.e., Frec/Fbar) was chosen as representative of the group of best performing F-based metrics (S4, S11-S13) which all tracked the selectivity change and were robust to changes in recruitment and Fbar. S4 was considered to have the most straightforward calculation and interpretation among these metrics expressing different ratios of F of recruits (Frec) to some measure of F of non-recruits, given that Fbar is the most common measure of overall fishing pressure. S5 (i.e., difference between A50 of selectivity and A50 of maturity) was not shortlisted because it is known to be unsuitable for saddle-shaped or multi-peak selectivity curves that make the calculation of A50 problematic. S1 (i.e., proportion of juveniles in the catch) was shortlisted mainly to provide a contrast to S4 in the sense of exhibiting how a sub-optimal selectivity metric performs. In addition, S1 was chosen because it may be an option for data-limited fisheries (i.e., those without an analytical stock assessment), given that its calculation only requires data on catch and maturity (Table 1). S1 has been suggested as a useful indicator in data-rich stocks as well (Froese, 2004).
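The two shortlisted metrics are simple enough to sketch directly. The Python below is an illustration under stated assumptions, not the definitions in Table 1 (which is not reproduced here): S4 is taken as F of the first age in the vector divided by the mean F over a given Fbar age range, and S1 as the share of immature fish in the catch by numbers. The function names, the Fbar age range of 3-5 and all example values are hypothetical.

```python
import numpy as np

def s4_frec_over_fbar(f_at_age, ages, fbar_ages):
    """Frec/Fbar: F of the first recruited age (first element) over mean F in fbar_ages."""
    f = np.asarray(f_at_age, dtype=float)
    ages = np.asarray(ages)
    in_range = (ages >= fbar_ages[0]) & (ages <= fbar_ages[1])
    return float(f[0] / f[in_range].mean())

def s1_juvenile_share(catch_at_age, maturity_at_age):
    """Proportion of juveniles in the catch, assuming catch in numbers-at-age."""
    catch = np.asarray(catch_at_age, dtype=float)
    mature = np.asarray(maturity_at_age, dtype=float)
    return float((catch * (1.0 - mature)).sum() / catch.sum())

# Hypothetical example for ages 1-5
ages_ex = np.arange(1, 6)
f_ex = np.array([0.06, 0.18, 0.30, 0.30, 0.30])
s4_ex = s4_frec_over_fbar(f_ex, ages_ex, fbar_ages=(3, 5))
# Doubling every F-at-age value leaves the ratio unchanged
s4_doubled = s4_frec_over_fbar(2.0 * f_ex, ages_ex, fbar_ages=(3, 5))

catch_ex = np.array([100.0, 80.0, 60.0, 40.0, 20.0])
maturity_ex = np.array([0.0, 0.5, 0.9, 1.0, 1.0])
s1_ex = s1_juvenile_share(catch_ex, maturity_ex)
```

The comparison of `s4_ex` and `s4_doubled` illustrates why a ratio of F values is insensitive to a proportional change in fishing pressure, whereas S1 depends on the catch composition and therefore also on how many juveniles are present in the population.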

| North Sea haddock: A case of recruitment pulses
The shape of the selectivity curve in this stock ranged from asymptotic to domed, with the highest selection (i.e., highest F) occurring typically at age 3, 4 or 5 (Figure S20). North Sea haddock had a history of strong recruitment pulses and a big Fbar reduction after 2000. This caused the catch-based selectivity metric (S1) to exhibit a rather unstable trend (Figure 6). This was to be expected from the results of the simulation analysis, which indicated a sensitivity of this metric to both recruitment pulses and Fbar. S1 ranged from 10% to 94%, and recruitment pulses were associated with high values of S1 within 0-2 years. Meanwhile, S4 had a relatively smooth decreasing trend (Figure 6). This trend of [...] technical measures and improvements in selectivity, possibly due to the gradual implementation of the technical measures by the fishing fleets.
TABLE 2 Summary of the multiple linear regression models for the seven selectivity metrics (S1-S7) against Sel, Fbar and Rec (zero to four lags) in scenario 4 (big selectivity change) and scenario 5 (small selectivity change)

| West of Scotland whiting: A case of stock depletion
The shape of the selectivity curve in this stock was asymptotic, with the highest selection (i.e., highest F) occurring at age 4-6, with the exception of the last five years of the time series when selection peaked at age 1 (Figure S21). West of Scotland whiting went through a period of very low SSB and recruitment in the early 2000s. As in the case of North Sea haddock, S4 exhibited a much smoother trend than S1 (Figure 7). Up until the early 2000s, peaks in recruitment were associated with peaks in S1, but not in S4 (Figure 7), in agreement with the findings of the simulation analysis. S1 continued to exhibit high variability after the early 2000s when recruitment was more stable, owing to high variability in catches at age 1. From 1995 onwards, a slow deteriorating trend in selectivity (i.e., higher targeting of juveniles) was captured by S4, which was further accelerated from the early 2000s onwards, when the SSB was depleted. This was because technical measures taken from 2002 onwards aimed to reduce the catch of whiting by the TR1 fisheries (gadoid fisheries), which target adult whiting. As a result, Fbar gradually decreased. However, there was little change in the bycatch of juvenile whiting by the TR2 fishery (small mesh fishery for Nephrops). This led to a perception of deteriorating selectivity picked up by S4, and less so by S1.

| West Baltic cod: A case of technical measures reversal
The shape of the selectivity curve in this stock was asymptotic, with the highest selection (i.e., highest F) occurring at age 5 (Figure S22). West Baltic cod went through a series of technical measures from 1998 onwards aiming to reduce the catch of small fish; however, in 2014 MCRS was reduced, allowing the capture of smaller fish. S4 proved to be sensitive to these technical measures, capturing a gradual improvement in selectivity during 1998-2013, followed by a deterioration of selectivity from 2014 onwards (Figure 8). The signals coming from S1 were more obscure, as S1 tended to follow recruitment fluctuations within 0-1 year, in agreement with the findings of the simulation analysis. Notably, the large recruitment pulse in the last year of the time series had a clear impact on S1, but not S4 (Figure 8).

| North Sea sole: A case of introducing a new gear
The selectivity curve in this stock was domed, with the highest selection (i.e., highest F) occurring typically at age 3, 4 or 5 (Figure S23). Notably, S1 did not pick up this signal of deteriorating selectivity (i.e., higher targeting of juveniles) at the end of the time series and exhibited high variability (Figure 9). S1 tended to follow recruitment pulses with a one-year lag, in agreement with the findings of the simulation analysis (Figure 9). The increasing use of pulse trawl did not translate into a higher Fbar in 2011-2017, despite the higher efficiency of that gear, because fishing effort decreased during the same period (ICES, 2018b).

| North Sea herring: A case of collapse and rebuilding
North Sea herring was the stock with the longest time series among the ones examined. During this time it went through a stock collapse and a rebuilding. The shape of the selectivity curve in this stock was highly variable from year to year, with the highest selection (i.e., highest F) occurring anywhere between age 1 and age 7 (Figure S24). Selectivity, as captured by S4, started deteriorating (i.e., increasing targeting of juveniles) in the early 1970s (Figure 10), coinciding with the development of the sprat fishery which caught juvenile herring. In the late 1970s the herring stock collapsed and the directed fishery for herring stopped, resulting in a decrease in Fbar and a perceived deterioration of selectivity because juvenile herring was being caught as bycatch. In 1981-1982 there was a peak in both S4 and S1 (Figure 10), corresponding to the highest proportional representation of age 0 herring in the catch over the entire time series (ICES, 2018f). This was due to the opening of the fishery in these years only in the southern North Sea areas, which host more juveniles than the central and northern ones. Both S4 and S1 also captured a selectivity improvement in the late 1990s (Figure 10), coinciding with the bycatch regulation enforcement that reduced the bycatch of juvenile herring by the sprat fishery.

| Gulf of Lion hake: A case of a Mediterranean demersal fishery
Gulf of Lion hake has been chronically overexploited, as is the case for most Mediterranean hake stocks (Vasilakopoulos et al., 2014). The shape of the selectivity curve in this stock ranged from asymptotic to domed, with the highest selection (i.e., highest F) occurring typically at age 2 or 3 (Figure S25). Fbar levels have been high and they even exhibited an increasing trend during the studied time period (Figure 11). S1 did not exhibit any particular trend, but S4 indicated an improvement in selectivity over time coinciding with the reduction in the number of

| DISCUSSION
This study offers a comprehensive analysis of selectivity metrics, with a focus on their utility for fisheries management and advice. The simulation analysis suggested that F-based metrics provide more adequate quantifications of selectivity than metrics incorporating catch and/or abundance data. The latter may track selectivity changes but they are also sensitive to changes in recruitment and Fbar, meaning that they are not suitable to use as representative measures of fisheries selectivity in real-world fisheries. Selectivity metrics using only F data, such as the ratio of the F of recruits to Fbar (Frec/Fbar; S4), respond clearly to selectivity changes, while they are also robust to both recruitment variability and changes in fishing pressure. For example, Fbar in the case of Gulf of Lion hake, which was originally calculated over age-classes 0-2, had to be recalculated over age-classes 1-3. Other adjustments may also be useful depending on stock-specific needs: Frec could be assigned to any juvenile age-class, if that age-class is the focus of a specific management measure, or it could even be taken as the average of multiple age-classes in the case of stocks with many juvenile age-classes (e.g., NE Arctic cod).
FIGURE 10 The temporal development of selectivity metrics S4 and S1, Fbar and Recruitment (thousands) in North Sea herring. S1 values in 1978 and 1979 could not be calculated due to the lack of catch-at-age data
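The Frec/Fbar calculation described above can be sketched in a few lines. This is only an illustrative sketch, not the code used in this study: the function name `frec_over_fbar`, its arguments and the F-at-age values are all hypothetical. It assumes that Frec is the mean F over one or more user-chosen recruit age-classes and that Fbar is the arithmetic mean of F over a user-chosen age range.

```python
from statistics import mean

def frec_over_fbar(f_at_age, rec_ages, fbar_ages):
    """Ratio of mean F over the recruit age-class(es) to mean F over the Fbar ages.

    f_at_age: dict mapping age -> fishing mortality for a single year.
    rec_ages: ages treated as recruits (one age, or several to be averaged).
    fbar_ages: ages over which Fbar is averaged.
    """
    frec = mean(f_at_age[a] for a in rec_ages)
    fbar = mean(f_at_age[a] for a in fbar_ages)
    if fbar <= 0:
        raise ValueError("Fbar must be positive")
    return frec / fbar

# Hypothetical F-at-age vector (illustrative numbers, not from any assessment)
f_at_age = {0: 0.05, 1: 0.20, 2: 0.60, 3: 0.80, 4: 0.70}

# Recruits at age 0, Fbar averaged over ages 1-3 (both choices are assumptions)
s4 = frec_over_fbar(f_at_age, rec_ages=[0], fbar_ages=[1, 2, 3])

# Doubling F at every age doubles Fbar but leaves the ratio unchanged
doubled = {age: 2 * f for age, f in f_at_age.items()}
s4_doubled = frec_over_fbar(doubled, rec_ages=[0], fbar_ages=[1, 2, 3])
```

Passing several ages in `rec_ages` gives the multi-age average of Frec mentioned for stocks with many juvenile age-classes. The invariance under a uniform scaling of F is a property of any ratio of F values; it illustrates robustness to Fbar only for the idealized case where the shape of the selectivity curve is unchanged.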
This study focused on data-rich stocks with age-structured analytical assessments. In this context, only two of the metrics examined can be calculated without a stock assessment, namely S1 (i.e., proportion of juveniles in the catch) and S2 (i.e., proportion of fish at optimal length, Lopt, in the catch). F-based selectivity metrics were found to work much better than these simple catch-based metrics. The latter tend to be more "noisy" and sensitive to changes in population structure, because when there are more juveniles in the sea, there tend to be more juveniles in the catch. Also, simple catch-based metrics respond more strongly to variability in the catch of juveniles than F-based ones. Nevertheless, in some of the empirical stocks (North Sea haddock, West Baltic cod, West of Scotland whiting, North Sea herring), S1 was able to track some of the selectivity changes similarly to Frec/Fbar, albeit with much higher inter-annual variability. In other cases (North Sea sole, Gulf of Lion hake), S1 failed to do so. Consequently, simple catch-based selectivity metrics should be avoided if F-at-age is available. In data-limited situations where F-at-age is absent, simple catch-based selectivity metrics such as S1 and S2 could potentially be used as an indication of selectivity if Fbar and recruitment are assumed to be stable. Still, as indicated by the simulation analysis of strong versus subtle changes in selectivity, these sub-optimal catch-based metrics would be useful only if strong changes in selectivity have occurred. Nevertheless, selectivity-based management, whereby minimum fish size limits and size selectivity of fishing gears are set above the mean size at maturity, is considered a sound strategy in data-limited fisheries (e.g., Prince & Hordyk, 2019; Vasilakopoulos & Maravelias, 2016).
In other words, even in the absence of information on F, minimizing the catches of juveniles while increasing the catches of fish at Lopt is a good rule of thumb (Froese, 2004; Froese et al., 2008, 2016). [...] is also the case for the perception of Fbar and SSB. Stock assessment specifications may also lead to selectivity changes not being exactly synchronized with changes in the selectivity metric, as is the case in the simulation analysis here. Some delay between the legislation of technical measures and the response of a selectivity metric may also be due to the gradual implementation of the technical measures by the fishing fleets. While our investigation of selectivity metrics has been quite exhaustive, the existence of additional good selectivity metrics cannot be ruled out. Protecting juveniles is a major priority for European fisheries management, but other selectivity concerns may exist in different fisheries around the world, for example related to the protection of other age-classes, or to changing the shape of the selectivity curve. In such cases, other selectivity metrics may need to be developed. Some of these metrics could be constructed by modifying metrics tested here. For example, where managers want to track changes in the relative exploitation of an adult population component (e.g., big old fish), the F of the relevant age-class(es) could be used as the numerator in the Frec/Fbar metric. In any case, any alternative selectivity metric should satisfy the three criteria proposed in this study: ability to track selectivity changes, robustness to recruitment variability, and robustness to changing Fbar.
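As a rough illustration of the recruitment-robustness criterion, the sketch below compares an S1-like metric with an F-only ratio when a recruitment pulse enters the population while F-at-age is held fixed. It is not the simulation framework of this study. The abundance, F and natural mortality values are hypothetical, catches are generated with the standard Baranov catch equation, S1 is assumed to be computed in numbers, and age 0 is assumed to be the only juvenile age-class.

```python
import math

def baranov_catch(n, f, m):
    """Catch in numbers for one age-class (Baranov catch equation)."""
    z = f + m
    return n * (f / z) * (1.0 - math.exp(-z))

def juvenile_share(n_at_age, f_at_age, m, juvenile_ages):
    """S1-like metric: proportion of juveniles in the catch, in numbers."""
    catch = {a: baranov_catch(n_at_age[a], f_at_age[a], m) for a in n_at_age}
    return sum(catch[a] for a in juvenile_ages) / sum(catch.values())

# Hypothetical values for illustration only
f_at_age = {0: 0.05, 1: 0.20, 2: 0.60, 3: 0.80}  # selectivity and Fbar held fixed
m = 0.2                                           # natural mortality, assumed constant
base_n = {0: 1000.0, 1: 700.0, 2: 400.0, 3: 150.0}
pulse_n = {**base_n, 0: 3000.0}                   # recruitment pulse at age 0

s1_base = juvenile_share(base_n, f_at_age, m, juvenile_ages=[0])
s1_pulse = juvenile_share(pulse_n, f_at_age, m, juvenile_ages=[0])

# An F-only ratio does not involve abundance, so it is the same in both scenarios
frec_fbar = f_at_age[0] / ((f_at_age[1] + f_at_age[2] + f_at_age[3]) / 3)
```

Here `s1_pulse` exceeds `s1_base` although selectivity has not changed, whereas `frec_fbar` is unaffected by construction. In a real assessment F-at-age is itself estimated from catch data, so this independence is an idealization of the sketch.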
Improving selectivity, albeit important, is not a panacea. Within the ranges of selectivity and Fbar commonly observed in commercial fish stocks, the latter is the most important driver of stock status (Vasilakopoulos, O'Neill, & Marshall, 2011). Also, to fully understand the effects of selectivity on stock size and fisheries yield, selectivity and Fbar need to be considered together (Froese et al., 2008; Prince & Hordyk, 2019). Therefore, a deteriorating selectivity (i.e., higher targeting of juveniles) is not necessarily alarming. For example, in the cases of West of Scotland whiting and North Sea sole, selectivity has been deteriorating in recent years but Fbar has been decreasing and SSB has been increasing (ICES, 2018b; 2018c). Conversely, selectivity has been slightly improving in the Gulf of Lion hake due to the trawling effort reduction, but the stock is still severely depleted due to a very high Fbar (GFCM, 2017).
A robust selectivity metric, such as Frec/Fbar, with a proven ability to track selectivity changes in response to technical measures could be a powerful tool for managers to promote fisheries sustainability. For example, by breaking down F into partial F-at-age from different fleet segments and using them to infer partial selectivity, managers could identify the fleet segments that are fishing less sustainably than others and focus on improving them. Additionally, by calculating partial selectivities, managers would be able to identify the fleet segments that are affected or should have been affected by specific management measures, such as the Landing Obligation. A lower-than-expected selectivity improvement of a fleet segment could either mean a limited uptake of the management measure, or changes in fleet behaviour to circumvent the effects of that management measure. In both cases, managers could use the selectivity metric as an objective measure of fleet performance.
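One possible way to obtain partial selectivities is sketched below. It assumes, as a simplification that is not prescribed in this study, that total F-at-age can be split among fleet segments in proportion to each segment's share of the catch-at-age in numbers. The fleet names, catches and F values are hypothetical.

```python
def partial_f_at_age(total_f, catch_by_fleet):
    """Split total F-at-age among fleets in proportion to their catch-at-age share."""
    ages = list(total_f)
    total_catch = {a: sum(c[a] for c in catch_by_fleet.values()) for a in ages}
    return {
        fleet: {
            a: total_f[a] * c[a] / total_catch[a] if total_catch[a] > 0 else 0.0
            for a in ages
        }
        for fleet, c in catch_by_fleet.items()
    }

def frec_fbar(f_at_age, rec_age, fbar_ages):
    """Frec/Fbar for one (total or partial) F-at-age vector."""
    fbar = sum(f_at_age[a] for a in fbar_ages) / len(fbar_ages)
    return f_at_age[rec_age] / fbar

# Hypothetical inputs for illustration only
total_f = {0: 0.10, 1: 0.40, 2: 0.60}
catch_by_fleet = {
    "small_mesh": {0: 80.0, 1: 50.0, 2: 20.0},
    "large_mesh": {0: 20.0, 1: 50.0, 2: 80.0},
}

partial = partial_f_at_age(total_f, catch_by_fleet)
metric_by_fleet = {
    fleet: frec_fbar(f, rec_age=0, fbar_ages=[1, 2]) for fleet, f in partial.items()
}
```

With these numbers the hypothetical small-mesh segment has a much higher partial Frec/Fbar than the large-mesh segment, which is the kind of contrast that could flag a segment for closer attention. Partial F values sum to the total F at each age under this splitting rule.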
Quantifying selectivity not only allows the effect of management measures to be tracked, but also enables the systematic exploration of the stock-specific potential for higher sustainable yields, which has been predicted by simulation studies (e.g., Froese et al., 2008; Froese et al., 2016; Prince & Hordyk, 2019; Scott & Sampson, 2011; Vasilakopoulos et al., 2014). The latter is an area where future research needs to focus; the expected yields from different combinations of F and selectivity need to be studied on a stock-by-stock basis. Subsequently, to fully integrate selectivity into fisheries management and advice, stock-specific selectivity reference points analogous to FMSY and BMSY need to be investigated. This would operationalize selectivity as a second lever, complementary to Fbar, for managers to achieve desirable states and outcomes for exploited fisheries resources.

ACKNOWLEDGEMENTS
We thank Kenneth Patterson and Norman Graham for their useful comments on an earlier version of this paper. We thank two anonymous reviewers and David Sampson for their constructive comments for the improvement of this manuscript. We also thank the scientists involved in the STECF Working Group on Technical Measures for the stimulating discussions on fisheries selectivity.

CONFLICT OF INTERESTS
The authors confirm that there is no conflict of interest to declare in this paper.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study were derived from the following resources available in the public domain: the