Competitive performance as a discriminator of doping status in elite athletes

As the aim of any doping regime is to improve sporting performance, it has been suggested that analysis of athlete competitive results might be informative in identifying those at greater risk of doping. This research study aimed to investigate the utility of a statistical performance model to discriminate between athletes who have a previous anti-doping rule violation (ADRV) and those who do not. We analysed performances of male and female 100 and 800 m runners obtained from the World Athletics database using a Bayesian spline model. Measures of unusual improvement in performance were quantified by comparing the yearly change in athlete's performance (delta excess performance) to quantiles of performance in their age-matched peers from the database population. The discriminative ability of these measures was investigated using the area under the ROC curve (AUC) with the 55%, 75% and 90% quantiles of the population performance. The highest AUC values across age were identified for the model with a 75% quantile (AUC = 0.78 – 0.80). The results of this study demonstrate that delta excess performance was able to discriminate between athletes with and without ADRVs and therefore could be used to assist in the risk stratification of athletes for anti-doping purposes.

testing them at the right time. As a consequence, there is a need to gather additional information on athletes to provide a forensic style intelligence-led approach to anti-doping.5 Such an approach would allow ADOs to make more informed decisions about assigning athletes to registered testing pools, to better target individual athlete tests and ultimately to distribute their anti-doping resources more efficiently. Indeed, anti-doping authorities such as the World Anti-Doping Agency and the Athletics Integrity Unit highlight the importance of an intelligence-led approach to anti-doping involving risk stratification of athletes based upon their athlete biological passport (ABP) profile and performance.6,7

Many factors can affect performance, such as maturation,8 improved training9 and technological advances.10 However, as the primary reason for an athlete to dope is to artificially enhance their performance, it is intuitive to consider the analysis of their sporting performance as important information for ADOs to inform their anti-doping activities. To this effect, the most recent version of the International Standard for Testing and Investigations7 highlights the use of sport performance history, including sudden major improvements and/or sustained periods of high performance, as relevant factors indicating possible doping or increased risk of doping. Indeed, athletic performance has been shown to be sensitive to new anti-doping practices, such as the introduction of the ABP and out-of-competition doping tests in a range of sporting disciplines,11-13 suggesting that longitudinal monitoring of athlete performance is a viable method to inform anti-doping practice.

The main objective of what we have previously termed 'the athlete performance passport' (APP)14 is to distinguish between expected changes in sporting performance and disproportionate improvements which may be indicative of doping. We have previously developed a Bayesian hierarchical model to investigate both population- and individual-level longitudinal performance trajectories over time adjusted for age-related changes.15 Our work illustrated how individual performance progression could be modelled while allowing for confounders, such as atmospheric conditions, and could be fitted using Markov chain Monte Carlo. We calculate a term called excess performance by subtracting the population performance trajectory from the individual performance trajectory to show whether an athlete is performing better or worse than their age-matched counterparts. As suggested above, sudden or unexpected changes in an athlete's level of excess performance might therefore be indicative of doping. Indeed, using this logic, we have previously demonstrated the potential for distinguishing between the career performance trajectories of clean and doped athletes.16 However, for use in targeted anti-doping efforts, it is necessary to identify athletes using a probability risk stratification approach. The objective of this study was therefore to validate the use of performance data to discriminate between athletes with and without previous anti-doping rule violations (ADRV). First, competitive performance results over 11 years were used to construct longitudinal profiles for individual athletes with and without ADRVs during this period; then, the performance of our Bayesian model was tested using these profiles.

| Data
We extracted 100 and 800 m results for both male and female athletes from publicly available results databases of World Athletics, including athlete ID number, date of birth, sex, country of birth, country of representation, event details, performance result (time [s]) and finishing position. The 100 m data contained results from both male and female sprinters who had at least five competition results between 8 January 2011 and 28 August 2021. The database contained 2834 male athletes who have a personal best below 10.5 s and 1297 female athletes who have a personal best below 11.6 s. The male data set had 95,376 observed performances, with the female data set having 48,999 observations. The ages of male athletes ranged from 12 to 47 years, whereas those of female athletes ranged from 12 to 42 years. The 800 m data set contained results from both male and female middle distance runners who had at least five competition results between 1 January 2011 and 10 April 2022. The database contained 4382 male athletes (104,594 performance results) and 3760 female athletes (92,606 performance results). We also accessed publicly available sanction data to identify athletes with a previous ADRV. These data comprise the date of and reason for each sanction.
Only sanctions imposed for the use of substances that have been shown to have a performance enhancing effect relevant to the discipline (i.e., 100 or 800 m) were included within the subsequent analysis.

| Modelling performance
Our methodology for modelling performance has been developed over several years (see previous studies14-17). We use the specification of a Bayesian spline model documented in Griffin et al.15 to construct performance trajectories for individual athletes. In brief, our model assumes individual performances can be represented as the sum of an individual performance trajectory, the effects of sport/discipline-specific confounders and an observation error. The model is summarised by the equation below for $M$ athletes, with $y_{i,j}$ indicating the $j$th performance for athlete $i$ at age $t_{i,j}$ (measured in years) and $x_{i,j}$ representing any observed confounders (e.g., atmospheric conditions) for that performance. We use $n_i$ to denote the number of performances for individual $i$. The model is

$$y_{i,j} = h_i(t_{i,j}) + x_{i,j}^{\top}\zeta + \epsilon_{i,j}, \qquad i = 1, \dots, M, \quad j = 1, \dots, n_i,$$

where $h_i$ is the individual performance trajectory for the $i$th individual, $\zeta$ are population-level regression coefficients for the effects of confounders and $\epsilon_{i,j}$ are observation errors that are assumed to follow a standard skew-t distribution.18 This error distribution, rather than the usual normal distribution, allows for the skewness and heavy tails observed in sporting performance data (i.e., poor performances lie much further from the median performance than exceptionally good performances). We express the individual performance trajectory $h_i(t)$ as the sum of two parts: the population performance trajectory $g(t)$ and the excess performance trajectory $f_i(t)$ of the $i$th athlete, so that $h_i(t) = g(t) + f_i(t)$. The excess performance trajectory represents individual performances adjusted for the average performance of athletes within the population at the same age and for any confounders, and forms the basis of our risk stratification measure. The population performance trajectory $g(t)$ is modelled as a fourth-degree polynomial, which Griffin et al.15 find is sufficiently flexible for sporting performance, and $f_i(t)$ is flexibly modelled by a separate Bayesian linear spline model for each athlete. The model is identified by assuming that the prior mean of $f_i(a)$ is 0, where $a$ is the smallest integer age in the database.
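The decomposition of an individual trajectory into a population part and an excess part can be illustrated with a minimal Python sketch. It is not the fitted Bayesian model: the polynomial coefficients, the centring of age at 25 years and the function names are all hypothetical and chosen only to make the arithmetic visible.

```python
# Illustrative only: hypothetical coefficients for a fourth-degree polynomial
# population trajectory g(t), written in centred age u = t - centre.
G_COEFFS = (10.30, 0.0, 0.004, 0.0, 0.00001)  # assumed values, in seconds


def population_trajectory(age, coeffs=G_COEFFS, centre=25.0):
    """g(t): fourth-degree polynomial in (age - centre)."""
    u = age - centre
    return sum(c * u**k for k, c in enumerate(coeffs))


def excess_performance(individual_value, age, coeffs=G_COEFFS, centre=25.0):
    """f_i(t) = h_i(t) - g(t): individual trajectory minus population trajectory."""
    return individual_value - population_trajectory(age, coeffs, centre)


# A sprinter whose trajectory value is 10.05 s at age 25 sits 0.25 s below
# (i.e., faster than) the hypothetical population trajectory at that age.
print(round(excess_performance(10.05, 25.0), 3))
```

For a timed event, a negative excess performance under this sketch means the athlete is faster than the age-matched population curve.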

| Athlete risk stratification
We develop an athlete risk stratification measure using excess performance, which adjusts individual performance for the expected effects of age and confounders and therefore does not depend on absolute level of performance (which will be heavily influenced by physiological factors). To understand risk, we consider yearly changes (which we term delta excess performance) and assume that an athlete who increases their level of competitive performance more rapidly than seen in the comparator population is likely to be at greater risk of doping and therefore warrants closer scrutiny by ADOs. Specifically, at each age, we consider the 55th, 75th and 90th percentiles of delta excess performance derived from the wider population in our analysis and denote the corresponding risk scores as M1, M2 and M3, respectively. Further details of this calculation using output from a Markov chain Monte Carlo algorithm are given in Appendix A.1. Under these risk scores, athletes with larger values will have a greater risk of doping.
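The risk score calculation can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the input is taken to be a set of posterior samples of delta excess performance per athlete at one age, a larger delta is assumed to mean faster improvement, and the helper names (`percentile`, `risk_scores`) and the linear-interpolation percentile rule are hypothetical choices.

```python
from statistics import median


def percentile(values, pct):
    """Percentile with linear interpolation between order statistics (pct in [0, 100])."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * pct / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def risk_scores(delta_samples, pct=75.0):
    """Map athlete -> posterior probability that delta exceeds the population percentile.

    delta_samples: dict of athlete id -> list of posterior samples of delta
    excess performance at a single age (hypothetical input format).
    """
    # Posterior median of delta for each athlete.
    medians = {athlete: median(s) for athlete, s in delta_samples.items()}
    # Population threshold: the chosen percentile of those medians.
    q = percentile(list(medians.values()), pct)
    # Risk score: share of each athlete's posterior samples above the threshold.
    return {a: sum(d > q for d in s) / len(s) for a, s in delta_samples.items()}


samples = {
    "A": [0.30, 0.35, 0.40, 0.45],
    "B": [0.00, 0.05, -0.05, 0.02],
    "C": [0.10, 0.12, 0.08, 0.11],
    "D": [-0.10, -0.12, -0.05, -0.08],
    "E": [0.02, 0.03, 0.01, 0.00],
}
print(risk_scores(samples, pct=75.0))
```

Using the posterior probability rather than a single point estimate keeps the uncertainty of each athlete's trajectory in the score.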
ROC analysis was used to evaluate the ability of the risk scores to discriminate performance profiles as either leading or not leading to an ADRV in the next x years. We treat this as a binary classification problem for each age and use the standard area under the ROC curve (AUC) as our metric of classification ability. This metric takes values between 0 and 1, with larger values associated with better discrimination. A value of 1 implies perfect discrimination, and 0.5 is the same as guessing at random. The use of ADRVs rather than the (unobserved) true doping statuses of athletes has some important implications. We can only consider whether an athlete receives an ADRV over a specific number of years, and so we will also define the doping status of an athlete over the same period. We define the 'doping' group to contain athletes who are, at some time during the period, involved with a doping regime that is designed to increase their performance over time, rather than those involved in 'one-off' instances of doping. We will refer to all athletes not in this doping group as 'clean'. The period doping prevalence levels discussed in Section 1 imply that many doping athletes will never receive an ADRV and so the group without an ADRV will contain both doping and clean athletes. As a consequence, even if our risk stratification measure was successful at discriminating between doping and clean athletes, we could still achieve a low AUC measure because many doping athletes do not receive an ADRV in the corresponding period. For example, if the risk measure could perfectly discriminate between doping and clean athletes, then athletes who are doping but have not received an ADRV will be recorded as misclassified. This will lead to an AUC value below 1 (potentially far below 1).
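As a reminder of what the AUC measures, the sketch below computes it directly as the probability that a randomly chosen athlete with an ADRV has a higher risk score than a randomly chosen athlete without one (ties counted as one half). The function name and the example scores are hypothetical; this pairwise form is the standard rank-based equivalent of the area under the empirical ROC curve.

```python
def auc(scores_adrv, scores_no_adrv):
    """Area under the ROC curve via pairwise comparisons (ties count 0.5)."""
    wins = 0.0
    for p in scores_adrv:
        for n in scores_no_adrv:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(scores_adrv) * len(scores_no_adrv))


# Hypothetical risk scores at a single age.
print(auc([0.9, 0.4], [0.5, 0.1, 0.3]))
```

A value of 0.5 arises when the two groups' scores are indistinguishable, and 1 when every ADRV athlete outranks every non-ADRV athlete.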
We quantify how the level of mislabelling of doping athletes as without an ADRV affects the AUC metric in Section 2.4, with further details provided in Appendix A.1.1. The difference between the group of athletes with ADRVs and the group of doping athletes (without ADRVs) also leads to the following trade-off in the choice of doping observation period. First, the doping group contains athletes who are not doping at a given age but subsequently start doping. For these athletes, our risk stratification measures will be small because the performances before the athlete starts doping will be unaffected by doping. As the observation period increases, the number of such athletes will tend to increase and so increasingly affect our estimate of the AUC, implying that a shorter observation period is preferable. Second, because the number of athletes with ADRVs will be small relative to the total number of athletes, the accuracy of the ROC (and the AUC measure) deteriorates as the observation period becomes smaller, implying that a longer observation period will be preferable to avoid a very small doped group. In our analysis, we subsequently consider 3-, 5- and 8-year observation periods to investigate this trade-off. In order to maximise the number of ADRVs recorded for a given period, we combined data across combinations of discipline and sex (i.e., 100 m males and females and 800 m males and females). Table 1 shows the number of 'doped' athletes under this definition for different observation periods at a range of ages.

| The effects on the AUC of the ROC curve of doping athletes without an ADRV
As we discussed in Section 2.3, some doping athletes will not receive an ADRV, which will affect estimation of the AUC of the ROC curve.
To investigate this effect further, we will distinguish between the true status of an athlete (which we will call either truly clean or truly doping) and the observed status of an athlete determined by ADRVs (which we will call either observed clean or observed doping). The true doping status could correspond to the one described in the previous section, but the analysis can be used with any definition of doping over a period. The approach makes several assumptions:
• There are no false positives, and so a truly clean athlete will never have an ADRV.
• The probability that a truly doped athlete has an ADRV (the prevalence of ADRVs in the truly doping group), that is, the doping detection rate, is the same for all doped athletes.

Under these assumptions, the prevalence of ADRVs is the prevalence of doping multiplied by the detection rate. To understand these two values, consider the following example. Suppose that the prevalence over a period of 1 year is 21.2%.19 If all doping athletes only take part in a doping regime for 4 weeks randomly distributed throughout the year, every athlete was tested once at random throughout the year, and the test was perfectly accurate (i.e., the test result was positive if the athlete was doping), then the doping detection rate would be 4/52 = 1/13 and the probability of an athlete receiving an ADRV would be 1/13 × 21.2% = 1.6%. This is just an example, and in practice, there are several potential confounders, such as the presence of false negatives at the testing stage, variation in doping regimes, time between doping and anti-doping test and variations in testing times. As a consequence, it is difficult to identify the size of the athlete population sub-group who are doping but do not have an ADRV. Within our model, we therefore assume both the prevalence and proportion of doping athletes within the sub-group to be relatively stable over time, and the probability of detection to increase over the observation time period (i.e., 3, 5 or 8 years) as more athletes will test positive. This approach allows us to accommodate the aforementioned uncertainties in identification of truly doping athletes. The probability of an athlete receiving an ADRV (if doping) would then simply be calculated by dividing the number of athletes with ADRVs by the number of athletes who are defined as doping but without ADRVs; that is, having established the size of the ADRV group, the choice of prevalence can be used to establish the rate of doping detection (i.e., the probability of an athlete receiving an ADRV if doping).
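The worked example (4 weeks of doping, one random test per year, a perfectly accurate test, 21.2% prevalence) can be checked with a short Python sketch. The function name and argument names are hypothetical; the numbers are those quoted in the example.

```python
def adrv_probability(prevalence, weeks_doping, weeks_per_year=52):
    """Return (detection rate, probability of an ADRV) for the simplified example.

    Assumes one perfectly accurate test per athlete at a random time of year.
    """
    detection_rate = weeks_doping / weeks_per_year
    return detection_rate, prevalence * detection_rate


rate, p_adrv = adrv_probability(prevalence=0.212, weeks_doping=4)
print(f"detection rate = {rate:.4f}, ADRV probability = {100 * p_adrv:.1f}%")
```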
We distinguish between the AUC calculated using the true labels (either truly clean or truly doped), which we will call $\mathrm{AUC}_{\mathrm{true}}$, and the AUC calculated using the observed labels (either observed clean or observed doped), which we will call $\mathrm{AUC}_{\mathrm{observed}}$.
To illustrate the effect of doping athletes without an ADRV on the value of the AUC metric, we used estimates of doping prevalence from the work of Petróczi et al.19 These researchers used a randomised response technique to estimate a doping probability in the previous 12-month period of 21.2% from athletes participating at the World Athletics Championship in Daegu, South Korea. In Table 2, we demonstrate the impact of changes in doping detection on the ability of our performance model to discriminate between doped and non-doped athletes considering Petróczi et al.'s19 probability of doping. Given the WADA 2021 Testing Figures Report,4 the total percentage of adverse findings (0.65%) suggests detection is low, assuming the prevalence is as high as documented in research analysing both point prevalence from abnormal blood profiles (15% to 18%3) and period prevalence from anonymous athlete self-reports (21.2%19). As such, assuming a period prevalence of 21.2% and a low detection rate ($q = 0.1$), an $\mathrm{AUC}_{\mathrm{observed}}$ of 0.75 equates to an AUC without mislabelling, $\mathrm{AUC}_{\mathrm{true}}$, of 0.81. Although this difference seems quite small, the AUC metric will usually only take values between 0.5 and 1, and so if interpreted in this context, the observed change from 0.75 to 0.81 is relatively large, with values close to 0.80 suggesting very good performance.
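The quoted figures (prevalence 21.2%, detection rate 0.1, observed AUC 0.75, true AUC 0.81) can be reproduced with a minimal Python sketch. The closed form used here is an assumption rather than a formula quoted from the text: it treats the observed clean group as a mixture of truly clean and undetected doping athletes with weight r = (1 − w)/(1 − wq), and additionally assumes that detection is unrelated to the risk score. The function names are hypothetical.

```python
def mixing_weight(prevalence, detection_rate):
    """r = (1 - w) / (1 - w q): share of truly clean athletes in the observed clean group."""
    return (1.0 - prevalence) / (1.0 - prevalence * detection_rate)


def auc_true_from_observed(auc_observed, prevalence, detection_rate):
    """Invert AUC_observed = r * AUC_true + (1 - r) / 2 (assumed relationship)."""
    r = mixing_weight(prevalence, detection_rate)
    return (auc_observed - 0.5 * (1.0 - r)) / r


# Values from the text: w = 21.2%, q = 0.1, AUC_observed = 0.75.
print(round(auc_true_from_observed(0.75, 0.212, 0.1), 2))
```

Under this sketch, a higher detection rate moves r closer to 1 and shrinks the gap between the observed and true AUC.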

| RESULTS AND DISCUSSION
We considered the ability of the risk measures described in Section 2.3 to correctly classify performance profiles as receiving or not receiving an ADRV over x years. Figure 1 shows how the discriminatory performance of the risk measures (as measured by the AUC metric) changes depending upon whether we consider athletes receiving an ADRV in the following 3, 5 or 8 years.
For example, when considering the model performance over a 3-year period, the AUC value for age 19 quantifies the ability of the risk measures to classify an athlete who is 19 years of age as either having or not having an ADRV in the subsequent 3 years (i.e., between ages 20 and 22). For a 5-year period, we consider ages 20 to 24, and for an 8-year period, ages 20 to 27. As can be seen from Figure 1, the AUC values are fairly stable across the different measures and across the 3-, 5- and 8-year observation periods. The risk measure M2 (which uses a threshold of 75%) and the 5- and 8-year periods give slightly higher AUC values on average than other choices. Therefore, we recommend the use of this risk measure. All risk measures perform better for the ages 19 to 23 than 24 to 29. For ages 19 to 23, the AUC metric is between 0.65 and 0.70, which suggests that the risk measures can discriminate between athletes with and without an ADRV, particularly since, as discussed in Section 2.4, this underestimates the AUC that would be obtained if we had access to the true doping status of athletes. For ages 24 to 29, the AUC metric is stable between 0.55 and 0.65, which suggests that the risk metrics are not able to consistently discriminate between ADRV and non-ADRV athletes for these ages. However, as shown in Table 1, it is important to acknowledge that there is a much greater number of ADRVs for ages 19 to 23 compared with ages 24 to 29. This may also reflect that the detection probability is lower between ages 24 and 29 and so the underestimation of the AUC is larger for these ages.

FIGURE 1 AUC values for model performance at different ages over periods of (a) 3, (b) 5 and (c) 8 years at thresholds of 55% (M1 = x), 75% (M2 = △) and 90% (M3 = ▽) using delta excess performance in 100 and 800 m athletes. Only age points that have more than five ADRV athletes are considered within the AUC analysis.
ROC analysis allows us to consider the overall ability of a risk measure to discriminate between doped and clean athletes. It is also interesting to consider how we can choose a threshold for a given risk measure above which an athlete is considered at particularly high risk of doping based upon their delta excess performance. To provide an example, we will concentrate on risk measure M2 for the 100 m. We want to choose a threshold for the posterior probability that an athlete's delta excess performance exceeds the 75% quantile across all athletes, above which the athlete is considered to be at greater risk of doping.
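We used false positive and true positive rates to identify the posterior probability level which minimised false positives and maximised identification of athletes with ADRVs. In order to assess the specificity of the model using the 75% quantile at different age points, we assessed the false positive rate across different probability levels for delta excess performance. The true positive rate ranges between 0.20 and 0.67 across the ages due to the changes in the number of observed true positives (i.e., ADRVs) recorded at each age and athletes within the database. As an example, at the age of 21 years using a period of 3 years and a false positive rate of 0.1, a posterior probability threshold of 0.8 results in a true positive rate of 0.57. Incorporating all athletes' performance profiles in our sample (across years 2011 to 2022) would result in approximately 10% of 100 m sprinters being flagged per year for delta excess performance. This level of prevalence is based upon our observation of athletes who receive an ADRV over a fixed number of years, which will be an underestimate of the true doping prevalence, and is lower than has been reported by previous self-report and randomised response studies,1,2 due to the assumed high rate of false negatives.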
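The trade-off between false positive and true positive rates when choosing a posterior probability threshold can be sketched as follows. This is an illustration under assumptions, not the procedure used in the study: the risk scores, the candidate grid and the rule of taking the lowest threshold whose false positive rate does not exceed a target are all hypothetical.

```python
def rates_at_threshold(scores_adrv, scores_no_adrv, threshold):
    """True and false positive rates when athletes with score > threshold are flagged."""
    tpr = sum(s > threshold for s in scores_adrv) / len(scores_adrv)
    fpr = sum(s > threshold for s in scores_no_adrv) / len(scores_no_adrv)
    return tpr, fpr


def lowest_threshold_within_fpr(scores_adrv, scores_no_adrv, max_fpr, grid):
    """Smallest threshold on the grid whose false positive rate is at most max_fpr."""
    for t in sorted(grid):
        tpr, fpr = rates_at_threshold(scores_adrv, scores_no_adrv, t)
        if fpr <= max_fpr:
            return t, tpr, fpr
    return None  # no grid value satisfies the constraint


# Hypothetical posterior probabilities of exceeding the 75% quantile at one age.
adrv = [0.95, 0.85, 0.60, 0.30]
no_adrv = [0.90, 0.70, 0.50, 0.40, 0.30, 0.20, 0.20, 0.10, 0.10, 0.05]
print(lowest_threshold_within_fpr(adrv, no_adrv, max_fpr=0.1,
                                  grid=[0.5, 0.6, 0.7, 0.8, 0.9]))
```

Taking the lowest admissible threshold keeps the true positive rate as high as the false positive constraint allows.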

| Application to the individual athlete
The output from our model is in the form of individual performance trajectories (adjusted for covariates such as seasonality and wind effects) and is presented across four different sets of analysis. As can be seen from Figure 2, the athlete with an ADRV (top row) demonstrates a negative excess performance (panel b), suggesting that their performance is better than anticipated given the performance level of age-matched peers. Similarly, the delta excess performance in this example has a high probability of exceeding the 75% quantile (panel d), suggesting that their performance is evolving at a faster rate than anticipated at the time of their ADRV (shown by the vertical dashed line), and appears to be unabated, even after returning to competition following their doping ban. The athlete's level of excess performance (panel b) continues to increase as they age, reaching 0.6 s by the age of 34 years; that is, their performance decline with age is much slower compared with their age-matched peers.

FIGURE 2 Illustrative performance model plots from a male sprinter with an ADRV (top row) and a male sprinter without an ADRV (bottom row). (a) Athlete raw performance with median (solid line) and confidence intervals (dashed lines), (b) athlete excess performance with median (solid line) and 95% credible interval (dashed lines), (c) yearly delta excess performance and (d) probability of yearly delta excess performance to exceed the 75th percentile of the population. The dashed vertical line illustrates the timing of athlete A's ADRV.
Linked with this, there is a high probability that the athlete has exceeded the 75% quantile for delta excess performance at the time of their ADRV, acknowledging the uncertainty within the model estimates. Specifically, setting the probability of delta excess at 0.9 would flag Athlete A's performances at the ages of 21-23 and 26-27 years.
By comparison, the athlete without the ADRV (Figure 2, bottom row), who has a similar absolute performance level, still demonstrates excess performance suggesting that their performance is consistently about 0.3 s better than their age-matched counterparts (panel b), but their delta excess performance is 0 s (panel c), which indicates that their career evolves at the anticipated rate for their age. As a consequence, there is a very low probability that the athlete would exceed the 75% quantile delta excess performance (panel d). Therefore, we would conclude that the athlete is a high-level sprinter who is performing better than their age-matched counterparts but is at a low risk of doping.
Our retrospective analysis of competitive performance data in athletes with and without ADRVs provides an indication that longitudinal monitoring of competition results has a valuable role to play in the fight against doping in sports. Specifically, by combining this type of performance monitoring with other sources of data (e.g., biological, whereabouts and social networks), there is the potential to improve the effectiveness and efficiency of anti-doping programs and bring greater certainty to the process of athlete risk stratification. In turn, athletes with a higher probability of doping risk would therefore be subject to closer scrutiny by ADOs. Moreover, given the longitudinal nature of our modelling approach and comparison to the age-matched population performance trajectory, even though an athlete may have been 'clean' for many years, it is possible to 'detect' an abnormal change in excess performance when doping occurs at the latter part of a career to sustain a given performance level. Even though our method could be applied to any sport in which the performance is determined by a measurable outcome (e.g., in seconds, grams or centimetres), it is important to acknowledge that the model currently only considers athletic competition results in isolated disciplines. As a consequence, there is potential to miss important performance-related information where an athlete competes over multiple events (e.g., 100 and 200 m or 800 and 1500 m).

| CONCLUSIONS
This study demonstrates the utility of performance monitoring to discriminate between athletes with historical ADRVs and those without. Specifically, we demonstrate how our model could be utilised to identify athletes who are at greater risk of doping. However, it is important to recognise that high levels of delta excess performance are not sufficient to prove an athlete is doping and that information obtained from this type of analysis should be integrated with other data as part of a wider intelligence gathering approach to anti-doping.

Future research is needed to consider how performance-related information can be shared across different events to construct a complete performance profile for individual athletes. Moreover, future research should consider the efficacy of using longitudinal performance profiling for anti-doping purposes in team sport. The availability of large databases capturing all events generated during team sport at both match and individual player level provides an opportunity to quantify excellent performance and what separates a top player from others. The challenge in team sport is to account for the influence of tactical confounders (e.g., team formation, style of play and player role) on the physiological performance capacity of the individual (e.g., total distance covered, total distance covered within specific running speed zones, number of high speed runs, number of sprints, top speed and work:recovery times measured via time-motion analysis). Unique combinations of the above physical and tactical parameters can be used to develop appropriate age-related physiological performance trajectories for players and thereby inform talent ID, development of training programmes and/or anti-doping activities.

Figure 2 illustrates the performance trajectories for two 100 m athletes, one with and one without an ADRV. The data points in the first column represent the raw performance times of each individual adjusted for covariates. The second column represents the data adjusted by the posterior mean population performance trajectory, month and wind effects. The third column shows the delta excess performance, and the fourth column is the probability that the delta excess performance exceeds the 75% quantile of the population distribution.

TABLE 1 The number of identified ADRV cases across age intervals.

TABLE 2 The effect of mislabelling of doping status on AUC values for a doping prevalence (p) of 21.2% (Petróczi et al.19) with high to low doping detection rates (d).

In the Markov chain Monte Carlo algorithm in Griffin et al.,15 we will use $\theta^{(s)}$ to represent the $s$th posterior sample of a parameter $\theta$ and assume that there are $S$ samples. We define $a$ and $b$ to be the smallest and largest integer ages in the database, respectively. We can calculate the risk measures in the following way:

1. For $i = 1, \dots, M$ and $j = a, \dots, b$, calculate a posterior sample for $\Delta_{i,j}$ by
$$\Delta_{i,j}^{(s)} = f_i^{(s)}(j) - f_i^{(s)}(j + 1)$$
for $s = 1, \dots, S$.
2. For $i = 1, \dots, M$ and $j = a, \dots, b$, calculate the posterior median of $\Delta_{i,j}$, $\mathrm{med}(\Delta_{i,j})$.
3. We calculate the percentile of the $\mathrm{med}(\Delta_{i,j})$ restricted to athletes with performances in the period from $j$ to $j + 1$. Let $i_1, \dots, i_j$ be the indices of the athletes with a performance in that period and calculate the given percentile (50% for $M^1_{i,j}$, 75% for $M^2_{i,j}$ and 90% for $M^3_{i,j}$) of $\mathrm{med}(\Delta_{i_1,j}), \dots, \mathrm{med}(\Delta_{i_j,j})$, which is written $q_j$.
4. For $i = 1, \dots, M$ and $j = a, \dots, b$, calculate the risk measure for the $i$th athlete in period $j$ as the posterior probability that $\Delta_{i,j}$ is greater than $q_j$, which can be calculated by
$$\frac{1}{S} \sum_{s=1}^{S} I\left(\Delta_{i,j}^{(s)} > q_j\right),$$
where $I(x) = 1$ if $x$ is true and 0 otherwise.

A.1.1 | The effects on the AUC of the ROC curve of doping athletes without an ADRV

In this appendix, we provide more details on understanding the effect of doping athletes without ADRVs on the ROC curve and the AUC metric, including mathematical details. For a randomly chosen athlete, we define the random variables $O$ to represent the observed status of that athlete (clean/doped) and $Y$ to represent the true status of that athlete (clean/doped). We define $O = 1$ if the athlete is observed doped and $O = 0$ if the athlete is observed clean (and similarly for $Y$). The assumptions in Section 2.4 can be expressed as follows:

• There are no false positives, and so a truly clean athlete will never have an ADRV, implying that $\Pr(O = 0 \mid Y = 0) = 1$.
• The probability that a doped athlete has an ADRV is $q$ and is the same for all doped athletes. This implies that $\Pr(O = 1 \mid Y = 1) = q$ and so $\Pr(O = 0 \mid Y = 1) = 1 - q$.
• The prevalence of doping is $w$, which implies that $\Pr(Y = 1) = w$ or $\Pr(Y = 0) = 1 - w$.

Gneiting and Vogel20 show how the theoretical ROC curve can be written in terms of the probability distributions of the risk measure for the clean and doped groups. If we consider the truly clean and truly doped groups, the distributions of the risk measure for these groups are denoted $F_{\mathrm{true}}$ and $G_{\mathrm{true}}$. The ROC curve for these true groupings can be written as
$$\mathrm{ROC}_{\mathrm{true}}(p) = 1 - G_{\mathrm{true}}\left(F_{\mathrm{true}}^{-1}(1 - p)\right), \qquad 0 \le p \le 1.$$
Similarly, we can define a theoretical ROC curve under the observed groupings. This involves the distributions of the risk measure for the observed clean and observed doped groups, which are denoted $F_{\mathrm{observed}}$ and $G_{\mathrm{observed}}$. We can link these distributions to $F_{\mathrm{true}}$ and $G_{\mathrm{true}}$ using the assumptions above: an observed doped athlete is always truly doped, whereas an observed clean athlete is truly clean with probability $\Pr(Y = 0 \mid O = 0) = (1 - w)/(1 - wq)$. This allows us to calculate $F_{\mathrm{observed}}$ and $G_{\mathrm{observed}}$ as
$$F_{\mathrm{observed}}(x) = r\,F_{\mathrm{true}}(x) + (1 - r)\,G_{\mathrm{true}}(x), \qquad G_{\mathrm{observed}}(x) = G_{\mathrm{true}}(x),$$
where $r = \frac{1 - w}{1 - wq}$. This could be used to express the theoretical ROC curve for the observed groups, which is in terms of $F_{\mathrm{observed}}$ and $G_{\mathrm{observed}}$ (although this does not lead to a simple expression).

We now consider how $\mathrm{AUC}_{\mathrm{observed}}$ is related to $\mathrm{AUC}_{\mathrm{true}}$. First, we can show that $\mathrm{AUC}_{\mathrm{observed}}$ can be expressed as
$$\mathrm{AUC}_{\mathrm{observed}} = \int_0^1 \left[1 - G_{\mathrm{observed}}\left(F_{\mathrm{observed}}^{-1}(1 - p)\right)\right] dp.$$
Making the change of variable $(1 - p) \to p$ leads to
$$\mathrm{AUC}_{\mathrm{observed}} = 1 - \int_0^1 G_{\mathrm{observed}}\left(F_{\mathrm{observed}}^{-1}(p)\right) dp.$$
Assuming that $F_{\mathrm{observed}}$ and $G_{\mathrm{observed}}$ are continuous implies that the composition of the functions $G_{\mathrm{observed}}$ and $F_{\mathrm{observed}}^{-1}$ is invertible and, due to the properties of the inverse of a function composition,
$$\left(G_{\mathrm{observed}} \circ F_{\mathrm{observed}}^{-1}\right)^{-1} = F_{\mathrm{observed}} \circ G_{\mathrm{observed}}^{-1}.$$
This final equation implies that
$$\mathrm{AUC}_{\mathrm{observed}} = \int_0^1 F_{\mathrm{observed}}\left(G_{\mathrm{observed}}^{-1}(p)\right) dp.$$
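The appendix text ends at an integral form for the observed AUC in terms of F observed and G observed. As a hedged sketch of how that form might be connected to the true AUC, the derivation below assumes, in addition to the listed assumptions, that detection among doped athletes is unrelated to the risk measure, so that the observed doped group has the same distribution as the truly doped group and the observed clean group is a mixture with weight r = (1 − w)/(1 − wq). The resulting closed form is not quoted from the text; it is offered only because it reproduces the figures given in Section 2.4.

```latex
% Assumed mixture structure (see lead-in):
%   G_observed = G_true,  F_observed = r F_true + (1 - r) G_true,
%   r = (1 - w) / (1 - w q).
\begin{align*}
\mathrm{AUC}_{\mathrm{observed}}
  &= \int_0^1 F_{\mathrm{observed}}\!\left(G_{\mathrm{observed}}^{-1}(p)\right) dp \\
  &= \int_0^1 \left[ r\,F_{\mathrm{true}}\!\left(G_{\mathrm{true}}^{-1}(p)\right)
       + (1 - r)\,G_{\mathrm{true}}\!\left(G_{\mathrm{true}}^{-1}(p)\right) \right] dp \\
  &= r \int_0^1 F_{\mathrm{true}}\!\left(G_{\mathrm{true}}^{-1}(p)\right) dp
       + (1 - r) \int_0^1 p \, dp \\
  &= r\,\mathrm{AUC}_{\mathrm{true}} + \frac{1 - r}{2}.
\end{align*}
% Numerical check with the values used in the main text:
%   w = 0.212, q = 0.1  =>  r = 0.788 / 0.9788 \approx 0.805,
%   AUC_true = (0.75 - (1 - 0.805)/2) / 0.805 \approx 0.81.
```

Under this reading, the observed AUC is pulled towards 0.5 in proportion to the share of undetected doping athletes in the observed clean group.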