Dissecting the cognitive phenotype of post-stroke fatigue using computerized assessment and computational modeling of sustained attention

Post-stroke fatigue (PSF) is prevalent among stroke patients, but its mechanisms are poorly understood. Many patients with PSF experience cognitive difficulties, but studies aiming to identify cognitive correlates of PSF have been largely inconclusive. With the aim of characterizing the relationship between subjective fatigue and attentional function, we collected behavioral data using the attention network test (ANT) and self-reported fatigue scores using the fatigue severity scale (FSS) from 53 stroke patients. In order to evaluate the utility and added value of computational modeling for delineating specific underpinnings of response time (RT) distributions, we fitted a hierarchical drift diffusion model (hDDM) to the ANT data. Results revealed a relationship between fatigue and RT distributions. Specifically, there was a positive interaction between FSS score and elapsed time on RT. Group analyses suggested that patients without PSF increased speed during the course of the session, while patients with PSF did not. In line with the conventional analyses based on observed RT, the best fitting hDD model identified an interaction between elapsed time and fatigue on non-decision time, suggesting an increase in time needed for stimulus encoding and response execution rather than cognitive information processing and evidence accumulation. These novel results demonstrate the significance of considering the sustained nature of effort when defining the cognitive phenotype of PSF, intuitively indicating that the cognitive phenotype of fatigue entails an increased vulnerability to sustained effort, and suggest that the use of computational approaches offers a further characterization of specific processes underlying behavioral differences.

With the assumption that a critical characteristic of cognitive fatigue is the failure to maintain or sustain cognitive effort over time, monitoring performance over time should increase sensitivity to cognitive manifestations of fatigue (Holtzer et al., 2010) and would also be closer in line with the conceptual definition of cognitive fatigue as "decreased performance during acute but sustained mental effort" (DeLuca, 2005). Accordingly, the attention network test (ANT; Fan, McCandliss, Sommer, Raz, & Posner, 2002) appears to be appropriate for examining the relationship between self-reported fatigue and attentional function over time in stroke patients. The ANT combines a flanker test (Eriksen & Eriksen, 1974) and a cued reaction time task (Posner, 1980) in a computerized behavioral paradigm requiring sustained attention over time. The full version lasts for about 20 min, during which accuracy and response times (RT) are tracked over 288 trials with varying cognitive demands. The ANT allows for estimation of individual-level attention network scores for the alerting, orienting and executive components, defined as relative differences in average RTs between different flanker and cue conditions (Fan et al., 2002).
ANT has been applied in studies of fatigue and attention in other neurological patient groups, such as Parkinson's disease, where fatigue was associated with reduced efficiency in the executive attentional network (Pauletti et al., 2017) and chronic fatigue syndrome, associated with higher RT in the most cognitively demanding condition (Togo, Lange, Natelson, & Quigley, 2015).
Although representing a widely applied and valuable contribution to theories on attentional function, analytical approaches based on mean RTs are vulnerable to tradeoffs between speed and accuracy which are not accounted for in the model (Miller & Ulrich, 2013), and they do not provide information about which underlying mechanisms give rise to observed RT differences. In contrast, computational approaches such as the drift diffusion model (DDM; Ratcliff, 1978) simultaneously model the full distribution of RTs and accuracies to estimate parameters reflecting specific theoretical cognitive constituents of the decision process. DDMs are frequently applied to simple and speeded decision-making tasks (Ratcliff & McKoon, 2008; Ratcliff, Smith, Brown, & McKoon, 2016), offering both a theoretical framework to understand basic cognitive processes, and a psychometric tool to translate behavioral data into subcomponents of cognitive processing (Ratcliff & McKoon, 2008). DDMs conceptualize decision-making as a noisy process where information is accumulated over time, continuing until a decision threshold is reached and a response is initiated (Ratcliff & McKoon, 2008). Four parameters are postulated in the original model (Ratcliff, 1978): drift rate (v), describing the rate or the speed of information accumulation, reflecting processing efficiency; non-decision time (t), representing time needed for stimulus encoding and response execution; decision boundary separation (a), indicating how much evidence is needed before a decision is made; and the starting point (z), reflecting any bias toward one of the two responses (Ratcliff & McKoon, 2008). The parameters have been validated in various experimental paradigms (Lerche & Voss, 2017; Voss, Rothermund, & Voss, 2004).
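To make the four parameters concrete, the following is a minimal simulation sketch of a single diffusion trial, not the estimation procedure used in this study. The function name, the step size `dt`, the unit noise scale and the parameter values in the example are all illustrative assumptions.

```python
import math
import random

def simulate_ddm_trial(v, a, t, rng, z=0.5, dt=0.001, noise_sd=1.0, max_time=5.0):
    """Simulate one trial of a basic drift diffusion process.

    v: drift rate, a: boundary separation, t: non-decision time (seconds),
    z: relative starting point (0.5 means no bias toward either boundary).
    Returns (rt_in_seconds, hit_upper_boundary), or (None, None) on timeout.
    """
    x = z * a                          # evidence starts between boundaries 0 and a
    elapsed = 0.0
    step_sd = noise_sd * math.sqrt(dt)
    while 0.0 < x < a:
        x += v * dt + rng.gauss(0.0, step_sd)   # drift plus Gaussian noise
        elapsed += dt
        if elapsed > max_time:
            return None, None
    # Observed RT is decision time plus encoding/motor (non-decision) time.
    return t + elapsed, x >= a

# Hypothetical parameter values, chosen only for illustration.
rng = random.Random(1)
trials = [simulate_ddm_trial(v=2.0, a=1.5, t=0.3, rng=rng) for _ in range(500)]
finished = [(rt, hit) for rt, hit in trials if rt is not None]
accuracy = sum(hit for _, hit in finished) / len(finished)
mean_rt = sum(rt for rt, _ in finished) / len(finished)
```

In this sketch, raising `t` shifts every simulated RT by the same amount without changing accuracy, whereas lowering `v` both slows responses and increases errors, which is why the two parameters can be separated from joint RT and accuracy data.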
Applying computational models such as the DDM in clinical research may allow for a dissection of specific cognitive processes underlying observed group and individual differences in RT patterns. For example, assessing young and older subjects with a signal detection task, Ratcliff, Thapar, and McKoon (2001) found that the prolonged RTs often observed in older individuals were not explained by slower drift rates but rather longer non-decision times and higher decision thresholds, which provided a relevant adjustment to the long-held notion of a general slowing in cognitive aging (Brinley, 1965; Salthouse, 1985). In the context of stroke patients and PSF, such computational approaches may provide a valuable, supplementary tool to expand our understanding of cognitive function beyond conventional methods of neuropsychological assessment and statistical analysis.
In sum, a large number of stroke patients suffer from PSF, and many experience cognitive difficulties and cognitive fatigue. Attentional deficits may be particularly involved. The ANT paradigm allows us to determine whether and how subjective fatigue manifests cognitively during prolonged effort, and assess associations between subjective fatigue and efficiency of the attentional networks. With the aim of characterizing the relationship between subjective fatigue and attentional function, we collected behavioral data using the ANT and self-reported symptoms of fatigue using the fatigue severity scale (FSS; Krupp, LaRocca, Muir-Nash, & Steinberg, 1989) from 53 chronic stroke patients (>6 months since hospital admission). We hypothesized that self-reported symptoms of fatigue as measured by the FSS would interact with time on task, manifesting in an increase in RT for patients with high fatigue levels relative to patients with low levels of fatigue. Further, we expected to find a negative association between fatigue and executive network functioning, in line with previously mentioned literature. Main analyses were conducted with FSS score as a continuous predictor, and follow-up sensitivity analyses were conducted with PSF group (high/low PSF) as a factor predictor, or separately for patients with high/low PSF to assess manifestation of group differences. Lastly, to evaluate whether DDM modeling can elaborate our understanding further by characterizing the specific cognitive processes underlying observed differences in RT patterns, we performed an exploratory analysis where we fitted a hDDM to the ANT behavioral data and tested for associations between the model parameters (drift rate (v), non-decision time (t) and boundary separation (a)) and fatigue (FSS) score. To account for the temporal aspects of task performance, we specifically tested for interactions between FSS, trial number and performance.
In line with our first hypothesis, we hypothesized that any associations between subjective fatigue and model parameters will interact with time, with increasing associations between fatigue and model parameters with more sustained performance.

| Sample
Stroke patients who had been previously admitted with acute stroke to the Stroke Unit, Oslo University Hospital, or the Geriatric Department, Diakonhjemmet Hospital, between 2013 and 2016, were invited by letter. Patients had to be in a chronic phase, defined as minimum 6 months post-stroke, with no other severe neurological, psychiatric or neurodevelopmental conditions. Among the approximately 900 invitation letters, 250 patients responded to decline or obtain more information. Seventy-seven were interested and eligible for inclusion and provided informed consent. Nineteen of the 77 patients withdrew during the course of the study and before the data for the current paper were collected. Four additional patients were excluded because of medical conditions. One patient was excluded due to behavioral criteria for the ANT (see below), resulting in a final sample of n = 53 stroke patients. Table 1 summarizes relevant demographic and clinical information of the patient group, and Figure 1 shows the age distribution. This work was part of an intervention study on cognitive rehabilitation after stroke with a double baseline, randomized controlled design (see (Kolskaar et al., 2019) for more details, including a description of overall study design).
All data for the current study were collected from the baseline assessments prior to the intervention, starting 6-45 months after the acute stroke.

ULRICHSEN et al.
Mini-Mental State Examination scores < 24 may indicate cognitive impairment and warrant further examination (Strobel & Engedal, 2008). One patient scored below 24, but further neuropsychological assessments done by a clinical psychologist indicated that cognitive function was sufficient for participation and that the inclusion criteria were not violated. The study was approved by the Regional Committee for Medical and Health Research Ethics, south-east Norway. All participants provided their written informed consent prior to inclusion.

| FSS
Fatigue was measured by the FSS (Krupp et al., 1989), which is a one-dimensional, 9-item self-report scale, and one of the most frequently used measures to assess fatigue after stroke and other neurological conditions (Cumming et al., 2016; Lerdal et al., 2009; Whitehead, 2009). The nine items are statements about the impact of fatigue on different areas of daily life, and responses are given on a seven-point Likert scale reflecting degree of agreement (minimum mean score 1, maximum mean score 7). A review of 22 fatigue measures concluded that FSS was among the three scales that demonstrated good psychometric properties, as well as sensitivity to change in fatigue over time (Whitehead, 2009). Figure 1 shows the distribution of mean FSS scores by sex. Average FSS score was 3.53 (SD = 1.46), and 35% of the patients reported mean FSS > 4, which is a commonly adopted threshold for clinical fatigue in stroke studies (Krupp et al., 1989; Schepers et al., 2006; Tang et al., 2010). Table S1 shows the mean scores per item for patients with and without PSF according to this cutoff value, offering a more detailed characterization of fatigue complaints in the sample. The PSF group scored significantly higher on all items.
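As an illustration of the scoring described above, the sketch below computes a mean FSS score from nine item responses and applies the mean FSS > 4 cutoff. The function names and the example responses are hypothetical, and the 1 to 7 item range is taken from the stated minimum and maximum mean scores.

```python
def fss_mean_score(item_responses):
    """Return the mean of the nine FSS items, each rated from 1 to 7."""
    if len(item_responses) != 9:
        raise ValueError("the FSS has nine items")
    if any(not 1 <= r <= 7 for r in item_responses):
        raise ValueError("each item must be rated from 1 to 7")
    return sum(item_responses) / 9

def has_psf(mean_score, cutoff=4.0):
    """Classify post-stroke fatigue using the mean FSS > cutoff rule."""
    return mean_score > cutoff

# Hypothetical responses for one patient (not study data).
example_items = [5, 4, 6, 5, 3, 4, 5, 6, 4]
example_score = fss_mean_score(example_items)
example_is_psf = has_psf(example_score)
```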

| PHQ-9
Depressive symptoms were measured by the self-report scale Patient Health Questionnaire (PHQ-9; Spitzer, Kroenke, Williams, & Patient Health Questionnaire Primary Care Study, 1999). PHQ-9 consists of nine items based on the DSM-IV criteria for depression. These are scored 0-3, providing severity scores ranging from 0 to 27. Briefly, sum scores of 5, 10, 15 and 20 represent mild, moderate, moderately severe and severe symptom levels. Average PHQ score in the patient sample was 4.79.

| Attention network test
A conventional version of the ANT was applied, as previously described (Fan et al., 2002). In the ANT, accuracy and response times (RT) are tracked over time in trials with varying cognitive demands in a computerized paradigm. By combining a flanker test (Eriksen & Eriksen, 1974) and a cued reaction time task (Posner, 1980), the ANT estimates network scores as relative differences in mean RTs between different flanker and cue conditions (Fan et al., 2002). Figure 2 depicts the details of the task. Briefly, participants were instructed to direct their gaze at a fixation cross that was presented with a duration of 400, 800, 1,200 or 1,600 milliseconds. Immediately following the fixation cross, one out of four cue conditions would appear for 100 milliseconds: no cue, a center cue (temporal cue only), a double cue (temporal cue only), or a spatial cue (temporal and spatial cue), alerting the attention toward the stimulus about to appear. Then, five small arrows or lines were presented for 1,700 milliseconds, and the task was to decide, as quickly and correctly as possible, whether the middle arrow (target arrow) pointed left or right. Participants responded by pressing the left or the right mouse button. The four flanker arrows/lines surrounding the target arrow could point in either the same direction as the target (congruent flankers) or the opposite direction (incongruent flankers), or they could simply be lines without direction, constituting neutral flankers. The flanker arrows/lines represent the different stimulus conditions associated with different cognitive demands, where incongruent flankers typically result in the highest error rates and RTs (Westlye, Grydeland, Walhovd, & Fjell, 2010).
Starting with a practice run of 24 trials, the full test consisted of 288 trials, divided into three rounds (96 trials per round), lasting about 20 min. Participants were instructed to take a short break between rounds. For setting up the experiment and collecting responses, E-prime software (Psychology Software Tools, Pittsburgh, PA) was applied.

| Outlier exclusion and data cleaning
Trials with RT < 200 ms, thought to reflect fast guesses, were removed from the analysis, in line with previous ANT reports (Chang, Pesce, Chiang, Kuo, & Fong, 2015; Westlye et al., 2010). In total, 2% of the responses were removed due to this criterion. Participants having more than 50% incorrect responses within any of the flanker conditions were discarded. One participant was removed due to this criterion.
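The two cleaning rules can be sketched as follows. This is an illustrative reading of the criteria rather than the authors' pipeline: the field names are hypothetical, and computing the error rate after removing fast trials is an assumption, since the order of the two steps is not stated.

```python
from collections import defaultdict

MIN_RT_MS = 200        # faster responses are treated as fast guesses
MAX_ERROR_RATE = 0.5   # applied within each flanker condition

def clean_ant_trials(trials):
    """Apply the trial-level and participant-level exclusion rules.

    trials: list of dicts with the (hypothetical) keys
            'id', 'flanker', 'rt_ms' and 'correct'.
    Returns (cleaned_trials, excluded_participant_ids).
    """
    # Step 1: drop trials with RT below 200 ms.
    kept = [t for t in trials if t["rt_ms"] >= MIN_RT_MS]

    # Step 2: count errors per participant and flanker condition
    # (assumption: computed on the trials that survived step 1).
    counts = defaultdict(lambda: [0, 0])   # (id, flanker) -> [errors, n_trials]
    for t in kept:
        cell = counts[(t["id"], t["flanker"])]
        cell[0] += 0 if t["correct"] else 1
        cell[1] += 1

    excluded = {pid for (pid, _), (errors, n) in counts.items()
                if errors / n > MAX_ERROR_RATE}
    cleaned = [t for t in kept if t["id"] not in excluded]
    return cleaned, excluded

# Hypothetical toy data (not study data).
toy_trials = [
    {"id": "p1", "flanker": "congruent", "rt_ms": 150, "correct": True},
    {"id": "p1", "flanker": "congruent", "rt_ms": 520, "correct": True},
    {"id": "p1", "flanker": "incongruent", "rt_ms": 640, "correct": True},
    {"id": "p2", "flanker": "incongruent", "rt_ms": 700, "correct": False},
    {"id": "p2", "flanker": "incongruent", "rt_ms": 710, "correct": False},
    {"id": "p2", "flanker": "incongruent", "rt_ms": 690, "correct": True},
]
cleaned_trials, excluded_ids = clean_ant_trials(toy_trials)
```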

| Associations between FSS, time and RT
In order to characterize the relationship between subjective fatigue (FSS_z), time (trial 1-288) and RT, we applied linear mixed-effects models using the lme function from the nlme package in R (Pinheiro, Bates, DebRoy, & Sarkar, 2013). Following the recommendations from Barr, Levy, Scheepers, and Tily (2013), we started with a maximal model, including by-subject random slopes for FSS_z * time at the subject level, in addition to random intercepts, and all fixed effects or covariates of potential interest. These were z-normalized FSS scores × time, age, sex, flanker condition, stroke topography (left or right hemisphere, brainstem/cerebellum), lesion volume (defined by number of voxels affected), TOAST classification for stroke etiology (large artery atherosclerosis, small vessel occlusion, cardioembolism or "other known or unknown factors"), NIHSS scores and z-normalized PHQ scores. Non-converging models were dealt with by sequentially simplifying the fixed effect structure until reaching convergence. The full model did not converge, and we dropped NIHSS, on the basis that the variability in NIHSS scores was small (mean = 1.4, median = 1, SD = 1.23), reflecting the fairly highly functioning patient sample. Next, we removed TOAST classification of stroke etiology, due to a large number of cases in the "not specified/unknown" category, and then excluded PHQ scores because of high correlations with FSS.
The most complex converging model was specified as follows: lme(RT ~ FSS_z * time + age + sex + flanker + lesion volume + lesion location, random = ~ 1 + FSS_z * time | id, data = data, method = "REML"). As a formal test of whether random slope effects were warranted, we used the anova function in R to compare model fit between this model and a similar model without a random slope term, and results indicated that random slopes should be included. To further refine the model, we tested whether removing independent variables that did not provide predictive value improved model fit. Model fit improved marginally by removing lesion volume and lesion location. As an indication of FSS effect size, we compared the final model with a model that did not include FSS score. Model formulae and notes on model selection are provided in Table S2.
Assessing whether PSF status (PSF defined by mean FSS score > 4, in line with common practice (Krupp et al., 1989;Schepers et al., 2006)) interacted with the effect of time/trial number, we reran the above-specified regression model with PSF status included in the model instead of FSS score as a continuous measure. Additionally, to test whether effects varied between flanker conditions, we estimated the full regression model separately for each flanker condition. In these follow-up models, random slopes were not estimated in order to secure convergence. To explore whether the relationship between time, fatigue and performance manifested differently according to PSF status, we repeated the above-described within-flanker linear mixed-effects models within patients with PSF and patients without PSF.
Importantly, to test whether potential effects were specific for fatigue or could be explained by depressive symptoms, the full linear mixed-effects model was repeated with PHQ instead of FSS, keeping all other model specifications constant.
In all analyses, the time variable refers to trial number (1-288).

| Associations between conventional ANT network scores and FSS
Based on a previous definition (Westlye et al., 2010), we computed the conventional ANT network scores (orienting, alerting and executive control) from median RTs. To assess the association between estimated attentional networks and subjective fatigue, we ran a linear model for each attentional network and tested for main effects of FSS, covarying for age and sex. We then estimated change in network efficiency over time (network slope) for each network and entered it as the dependent variable in a linear model with FSS, age and sex as predictors, to test for interactions between attentional networks, time and FSS. Network slopes were created in two steps. First, we ran linear models for each patient within each flanker and cue condition separately, predicting RT by trial number. Then, change in network efficiency (network slope) was calculated for each patient by subtracting the betas from these models, using the same condition contrasts as for the network scores.
Following the same procedure as in the RT models above, we reran the network analyses replacing FSS with PSF status as independent variable, to investigate whether attentional networks were differently affected by time depending on PSF status.
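The two-step construction of a network slope can be sketched as below for a single patient. The exact condition contrasts are not reproduced in the text, so the executive contrast used here (incongruent slope minus congruent slope) is an assumption made for illustration, as are the function names and the toy data.

```python
def ols_slope(x, y):
    """Ordinary least squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx

def condition_slopes(trials):
    """Step 1: slope of RT over trial number within each condition.

    trials: list of (trial_number, condition, rt) tuples for one patient.
    """
    by_condition = {}
    for trial_number, condition, rt in trials:
        xs, ys = by_condition.setdefault(condition, ([], []))
        xs.append(trial_number)
        ys.append(rt)
    return {cond: ols_slope(xs, ys) for cond, (xs, ys) in by_condition.items()}

# Hypothetical patient: congruent RTs speed up, incongruent RTs slow down.
toy = []
for n in range(1, 289):
    if n % 2:
        toy.append((n, "congruent", 500.0 - 0.2 * n))
    else:
        toy.append((n, "incongruent", 620.0 + 0.1 * n))

slopes = condition_slopes(toy)
# Step 2 (assumed contrast): difference between the condition slopes.
executive_slope = slopes["incongruent"] - slopes["congruent"]
```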
As an additional test of potential associations between subjective fatigue and stroke-related variables, we estimated the correlations between FSS score, NIHSS score, lesion volume and months since stroke, respectively.

| Hierarchical drift diffusion modeling
Cleaned RT and accuracy data were submitted to hierarchical drift diffusion modeling by use of the Python toolbox HDDM (Wiecki et al., 2013). HDDM uses hierarchical Bayesian parameter estimation, which provides enhanced statistical power and allows for estimation of both individual and group parameters simultaneously (Wiecki et al., 2013). We applied mildly informative priors and starting points as predefined in the toolbox (Wiecki et al., 2013). We did not estimate any bias in starting point. The data were accuracy-coded (accurate responses = 1, erroneous responses = 0). In addition to the data cleaning described above, an outlier mixture model included in the HDDM was applied, which assumes that a fixed proportion (5%) of trials are outliers that come from a uniform distribution not generated by a diffusion process (Wiecki et al., 2013). A mixture model allowing for some outliers has been shown to provide a better fit in likelihood models than models not allowing for any outliers at all (Wiecki et al., 2013).

| Model selection/ defining parameters
When parametrizing the hDDM, we tested different cognitively plausible models to identify the model that best explained data, guided by the theoretical assumption that drift rate (v) should be allowed to vary as a function of stimulus difficulty condition (Ratcliff, Smith, & McKoon, 2015). Further, decision threshold (a) was assumed to be constant across stimulus conditions, following the logic that if a varies with stimulus conditions, the participant would have to first identify the condition, before adjusting threshold and then start accumulating information from the stimulus, a sequence of events that does not seem plausible (Thapar, Ratcliff, & McKoon, 2003). Non-decision time (t, stimulus encoding and motor responses) was not expected to be affected by flanker condition, given that the visual stimuli were highly similar across flanker conditions and motor responses were simple and identical across conditions (simple button press).
Building on the above-mentioned assumptions, we estimated different models and tested which combination of parameter fixations provided the best model fit. See Table 2 for an overview of models tested.
Variability estimates were included in the preliminary models, but were discarded as they failed to converge adequately and slightly worsened model fit. Variability parameters are often estimated poorly, and less complex models may improve estimates of the parameters of interest (Lerche & Voss, 2016).
We estimated a regression model where all three parameters (a, t and v) were allowed to vary by the FSS*time interaction term. To further explore which parameter fixations provided the best model fit, we ran nine simple models with (a) the main effect of time on either a, t or v; (b) the main effect of FSS on either a, t or v; and (c) the FSS*time interaction on a, t or v separately. Finally, we estimated the best model with individual regressors and group only regressors. Model fit was assessed by comparing the deviance information criterion (relative DIC values) between models. In Bayesian analyses, the DIC provides an estimation of fit of the model to the data, where lower DIC values indicate that the model has better support (François & Laval, 2011). In models where individual regressors were estimated, we simulated data from the respective models and performed posterior predictive checks (PPC) to evaluate whether the model was able to reproduce central patterns in the observed data (Wiecki, 2016). We simulated 500 data sets by drawing 500 samples for each parameter from the estimated posterior distribution. The simulations thus capture the uncertainty in the estimated model and allow for comparisons with the observed data.
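For readers unfamiliar with the DIC, the sketch below shows the standard definition (mean posterior deviance plus the effective number of parameters) and the "lower is better" comparison rule. The function names and all numbers are hypothetical and are not values from the fitted models.

```python
def dic(deviance_samples, deviance_at_posterior_mean):
    """Deviance information criterion from posterior deviance samples.

    DIC = mean deviance + pD, where the effective number of parameters
    pD = mean deviance - deviance evaluated at the posterior mean.
    """
    mean_deviance = sum(deviance_samples) / len(deviance_samples)
    p_d = mean_deviance - deviance_at_posterior_mean
    return mean_deviance + p_d

def preferred_model(dic_by_model):
    """Return the name of the model with the lowest DIC."""
    return min(dic_by_model, key=dic_by_model.get)

# Hypothetical deviance values for two candidate models.
dic_a = dic([100.0, 102.0, 98.0], deviance_at_posterior_mean=97.0)
dic_b = dic([108.0, 110.0, 106.0], deviance_at_posterior_mean=106.0)
best = preferred_model({"model_a": dic_a, "model_b": dic_b})
```

Because pD penalizes complexity, a model with more free parameters needs a correspondingly larger drop in mean deviance before it is preferred.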
Final choice of model was based on a combination of model fit and convergence (see below).

| Estimating the posterior distributions and assessing convergence (model diagnostics)
We used a Bayesian framework and Markov chain Monte Carlo sampling (MCMC) to estimate the posterior distributions (Kruschke, 2014). In the preliminary models, when testing and comparing parameter fixations, models were estimated on 1,500 or 6,000 samples. The final model was run on 12,000 samples. To improve convergence, the first 4,000 samples were discarded, and thinning was set to 2 (keeping only every second sample).
A valid model should demonstrate convergence of the MCMC chains (Wiecki, 2016). Convergence was assessed by plotting and visually inspecting traces and autocorrelation plots for each estimated parameter. As a more formal test of convergence, the Gelman-Rubin statistic (R̂; Gelman & Rubin, 1992) was calculated. These values should be close to 1 and not exceed 1.1 if the chains have converged successfully, that is, if the samples of the different chains are similar (Wiecki et al., 2013).
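The sampling settings and the convergence check can be illustrated with a small, self-contained sketch. It applies the discard-and-thin step to simulated chains and computes the classic Gelman-Rubin statistic. The number of chains (three) and the simulated draws are assumptions for illustration; the toolbox's own implementation may differ in detail.

```python
import random
import statistics

def discard_and_thin(chain, burn, thin):
    """Drop the first `burn` samples, then keep every `thin`-th sample."""
    return chain[burn::thin]

def gelman_rubin(chains):
    """Classic Gelman-Rubin statistic for equally long chains."""
    m = len(chains)
    n = len(chains[0])
    if m < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least two chains of equal length")
    chain_means = [statistics.fmean(c) for c in chains]
    within = statistics.fmean([statistics.variance(c) for c in chains])
    between = n * statistics.variance(chain_means)
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5

# Three hypothetical chains of 12,000 draws from the same distribution.
rng = random.Random(0)
raw_chains = [[rng.gauss(0.0, 1.0) for _ in range(12000)] for _ in range(3)]
kept_chains = [discard_and_thin(c, burn=4000, thin=2) for c in raw_chains]
rhat_converged = gelman_rubin(kept_chains)

# Chains stuck in different regions give a value well above 1.1.
shifted = [[x + 3.0 * i for x in c] for i, c in enumerate(kept_chains)]
rhat_not_converged = gelman_rubin(shifted)
```

With 12,000 samples, a burn-in of 4,000 and thinning of 2, each chain retains 4,000 samples for inference.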

| Hypothesis testing within the hDDM
Effects of task and cue conditions, as well as the effects of time and fatigue status, were determined by Bayesian hypothesis testing, by assessing the degree of overlap between posterior distributions. If less than 5 percent of the posterior distributions of two parameters overlap, the difference is said to be credible; likewise, an effect is credibly different from null when at least 95 percent of its posterior distribution lies on one side of zero.
Table 3 shows mean RT and error rates for each flanker condition. Two-tailed, one-sample t tests revealed significant differences in RT between the incongruent and congruent conditions, M = 111, CI = 101-122, t(52) = 20, p < .001, between the incongruent and neutral conditions, M = 124, CI = 112-136, t(52) = 20, p < .001, and between the congruent and neutral conditions, M = 12, CI = 3.7, t(52) = 3.6, p < .001.

| ANT behavioral results
There was no significant association between FSS and mean RT across (r = .09, p = .48) or within conditions (incongruent flanker: r = .05, p = .67, congruent flanker: r = .11, p = .47, neutral flanker: r = .12, p = .37). There was no association between FSS and error rate (r = −.09, p = .48). Table 4 shows the summary statistics from a linear mixed-effects model testing for associations between RT and FSS, time, sex, age and flanker condition for all conditions simultaneously. The model including FSS score performed significantly better than the model not including FSS score as indicated by ANOVA model comparison, supporting the predictive value of FSS (L.ratio(1) = 19.09, p < .001, see also Table S2).
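The reported likelihood ratio can be converted to a p value by hand. The sketch below uses the closed-form chi-square survival function for one degree of freedom, matching the single degree of freedom given in parentheses; the function names are hypothetical and this is a check on the arithmetic, not the authors' code.

```python
import math

def likelihood_ratio(loglik_full, loglik_reduced):
    """Likelihood ratio statistic for two nested models."""
    return 2.0 * (loglik_full - loglik_reduced)

def chi2_sf_df1(x):
    """Upper-tail probability of a chi-square variable with 1 df.

    For 1 df, P(X > x) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2.0))

# Likelihood ratio reported for the model with vs. without FSS.
p_value = chi2_sf_df1(19.09)
reported_as_below_001 = p_value < 0.001
```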

| Associations between FSS, time and RT
The model presented in Table 4 was also run with lesion volume and lesion location as independent variables to control for effects related to lesion characteristics. As both volume and location displayed low predictive value and did not improve model fit, they were not included in the final analyses. Results from the linear mixed model including lesion volume and lesion location are presented in Table S3. Table 5 presents summary statistics for models estimated for each flanker condition separately, estimating the effect of FSS score, time (trial number) and the interaction effect between time and FSS. Figure 3 shows the estimated RT (output from Table 5) plotted by group (PSF vs. non-PSF patients, based on mean FSS score ≥ 4). Briefly, after Bonferroni correction for multiple comparisons (corrected alpha 0.05/8 = .006), the interaction between time and FSS was significant in the neutral and incongruent condition, as was the association between age and RT, indicating that age was associated with increased RT across conditions. There was no significant main effect of FSS on RT in any condition.
The corresponding linear mixed model with group (PSF status) instead of FSS score revealed similar associations with the various independent factors as presented in Table 4, except for identifying a negative main effect of time (β = −0.05, SE = .01, t = −3.31, p < .001). The interaction effect between PSF status and time was comparable to that of FSS score and time, albeit smaller (β = 0.06, SE = .01, t = 2.27, p = .022), and only nominally significant. All results from the model with PSF status as predictor are presented in Table S4. Mixed-effects models with PHQ score included instead of FSS did not indicate significant interaction effects between depressive symptoms and time on RT in any flanker condition. Table 6 shows summary statistics from linear mixed models estimating the main effect of sustained performance (time) on RT in the various flanker conditions, conducted separately for patients with and without PSF. In these models, which included only trial number (time) and not FSS score as predictor, the results suggested that patients without PSF demonstrated more speeded RTs in the incongruent condition during the course of the experiment, while patients with PSF did not show any significant changes in RT in any condition.

| Associations between FSS and other clinical measures
There was no correlation between FSS score and months since stroke (r = .00, p = .97), between FSS score and lesion volume, indicated by number of voxels affected (r = −.14, p = .30) or FSS score and stroke severity, indicated by NIHSS score (r = .10, p = .46). FSS score was positively correlated with PHQ score (r = .47, p < .001).

| Associations between ANT network scores and FSS
One-sample t tests revealed significant group-level network score effects for executive control network (M = 0.18, CI = 0.17-0.20, t = 21.57, p < .001), orienting network (M = 0.06, CI = 0.05-0.08, t = 11.27, p < .001) and alerting network (M = 0.04, CI = 0.03-0.06, t = 7.47, p < .001). Table 7 shows summary statistics from linear models estimating the associations between ANT network scores and FSS. Whereas the analyses revealed a nominally significant negative association between FSS and the executive network score (t = −2.23, p = .03) and a negative effect of age on the alerting network (t = −2.17, p = .03), no associations remained significant after correction for multiple comparisons. Table 8 shows linear models testing associations between ANT network efficiency change over time (network slope) and FSS score, age and sex. Results suggested a nominally significant association between executive slope (network efficiency change over time, where positive scores indicate efficiency) and FSS (t = 2.24, p = .029). No associations remained significant after correction for multiple comparisons. See Figure 4 for network slopes plotted against FSS scores. Follow-up linear models with PSF status as predictor instead of FSS did not support a significant main effect of PSF status (t = −0.73, p = .466) on executive network slope.

| hDDM regression models
The best fitting model that showed adequate convergence allowed drift rate (v) to vary across flanker conditions, non-decision time (t) to vary across warning cue conditions and time while boundary separation (a) was kept constant ("v ~ flanker," "t ~ warningcue + time"). In this group-level model, no Gelman-Rubin statistics (R-hat values) were > 1.1, and chains and autocorrelations confirmed adequate convergence for all parameters.
A less restricted model where all parameters were allowed to vary by the FSS*time interaction term generated a slightly better model fit (DIC value −16,608 vs. −16,602), but worse convergence in terms of R-hat values (> 1.1), chains and autocorrelations. This model was therefore discarded as not sufficiently valid. Estimations of the best fitting model ("v ~ flanker," "t ~ warningcue + time") on the individual level produced the best fit in terms of DIC values, but posterior predictive checks indicated that the models did not sufficiently reproduce observed patterns in the data and the standard deviations for t:FSS and t_time:FSS showed suboptimal convergence.

| Effect of time and FSS on t, non-decision time
Figure 5 shows the posterior distributions for non-decision time, t. hDDM provided support for a negative main effect of time (P(t_time < 0) = 0.98) on non-decision time, indicating that time needed for stimulus encoding and response execution decreased during the course of the test. hDDM did not identify a main effect of FSS on non-decision time (P(t_FSS > 0) = 0.77). In contrast, the model provided evidence for a positive interaction effect between time on task and FSS on non-decision time (P(t_time:FSS > 0) = 1.00), suggesting that the association between FSS and non-decision time increased during the course of the experiment, so that patients with high levels of fatigue were more negatively affected by time on task (resulting in higher non-decision times) than patients low on fatigue. The interaction effect is small, but robust (posterior distribution not overlapping the null, model displaying good convergence), and it is in the opposite direction of the main effect of time when FSS is not accounted for.
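Posterior probabilities of the form P(t_time < 0) can be read as the proportion of posterior samples that fall on one side of zero. The sketch below illustrates this with simulated draws; the sample values, function names and the 0.95 threshold argument are illustrative assumptions and do not reproduce the fitted posteriors.

```python
import random

def posterior_probability(samples, predicate):
    """Proportion of posterior samples satisfying a condition."""
    return sum(1 for s in samples if predicate(s)) / len(samples)

def is_credible(samples, threshold=0.95):
    """True if at least `threshold` of the posterior lies on one side of zero."""
    p_positive = posterior_probability(samples, lambda s: s > 0)
    p_negative = posterior_probability(samples, lambda s: s < 0)
    return max(p_positive, p_negative) >= threshold

# Hypothetical posterior draws for a small positive regression coefficient.
rng = random.Random(42)
effect_samples = [rng.gauss(0.02, 0.008) for _ in range(4000)]
p_effect_positive = posterior_probability(effect_samples, lambda s: s > 0)
effect_is_credible = is_credible(effect_samples)

# Draws centred on zero should not pass the criterion.
null_samples = [rng.gauss(0.0, 1.0) for _ in range(4000)]
null_is_credible = is_credible(null_samples)
```

Under this reading, a coefficient can be small in magnitude and still credible, as long as nearly all of its posterior mass has the same sign.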

| Effect of warning cue on t, non-decision time
Figure 6 (left) shows the posterior probability plot for non-decision time (t) as a function of warning cue (intercept: center cue). Non-decision time was lowest for cue conditions "up" and "down". "No cue" resulted in the highest non-decision time out of all cue conditions. Thus, model evidence suggests that the presence of cues facilitated the process of stimulus encoding and response execution, and most efficiently so when the cues provided both temporal and spatial information ("up" and "down").

| Effect of flanker conditions on drift rate
Figure 6 (right) shows the posterior probability plot for the drift rate (v) estimated by flanker condition (intercept: congruent condition). The model provided strong evidence that drift rate was lower in the incongruent condition compared to both the congruent and neutral conditions (P(v_Incongruent < v_Congruent) = 1.0, and P(v_Incongruent < v_Neutral) = 1.0), suggesting lower rates of evidence accumulation in the cognitively most demanding condition (incongruent flanker with cognitive conflict). Drift rate was highest in the neutral condition (P(v_Neutral > v_Congruent) = 1.0, P(v_Neutral > v_Incongruent) = 1.0).

| DISCUSSION
Post-stroke fatigue is a common and debilitating symptom in stroke patients, yet its mechanisms are poorly understood. Many patients suffering from PSF report increased fatigue and cognitive difficulties when engaging in cognitive tasks, but previous studies have largely failed to establish robust associations between subjective fatigue and cognitive performance. The scarcity of evidence may be due to the use of instruments lacking cognitive sensitivity and specificity, and tests that do not account for the effect of time on task.
In the current study, we aimed to characterize the relationship between subjective fatigue and attentional function, taking duration of effort into account. To this end, we collected behavioral data using ANT and self-reported fatigue using FSS from 53 chronic stroke patients. First, we tested the assumption that FSS scores would interact negatively with time on task, manifesting in a performance decline for patients with high fatigue relative to patients with low fatigue. Results from linear mixed models provided support for this hypothesis, identifying significant interactions between FSS score and time on RT in the neutral and incongruent flanker conditions. In these whole sample models, no significant main effects of time or FSS were identified. Interestingly, when examining the main effect of time separately for patients with and without PSF, results revealed that non-PSF patients significantly improved RTs over time in the most cognitively demanding condition, while the PSF group did not demonstrate significant improvement.
These findings underscore the relevance of taking time on task into account and measuring sustained performance when addressing fatigue. Because the study design does not allow causal inference, the observed interaction between subjective fatigue and RT may be either a manifestation of fatigue, a cause of fatigue (i.e., that attentional difficulties give rise to fatigue), or both. Providing a speculative theoretical context, the coping hypothesis (Van Zomeren, Brouwer, & Deelman, 1984; Van Zomeren & Van den Burg, 1985) offers one explanatory framework for the observed interaction. Originally articulated in relation to traumatic brain injury patients, this view suggests that the chronic effort needed to compensate for subtle cognitive deficiencies gives rise to secondary symptoms, including fatigue. Hence, subtle cognitive deficits associated with stroke may be temporarily disguised by a compensatory and temporary increase in cognitive effort. However, this compensation comes at the cost of an increased feeling of fatigue, in particular during sustained effort. In line with this, the current interaction between time on task and fatigue may be understood as a result of increased cognitive effort, producing increased tiredness over time, resulting in suboptimal performance. The concept of "cognitive compensation" also ties well with evidence from the split sample analysis indicating that non-PSF patients' performance benefitted from practice (sustained performance) in the most cognitively demanding condition, while the PSF group did not improve with practice. This may reflect a weakening of learning effects due to cognitive compensation costs as described above, or, alternatively, a failure to benefit from practice due to increasing fatigue.
The interaction between FSS and time can also be mediated by motivation, with high levels of fatigue leading to reduced motivation and suboptimal performance. Accordingly, the role of motivation is implied by the high scores on the FSS item reflecting reduced motivation when feeling fatigued.
Regardless of the specific theoretical account, the results can be understood as lending support to Holtzer's definition of cognitive fatigue as "an executive failure to monitor and optimize performance over acute but sustained cognitive effort resulting in performance that is lower and more variable than the individual's optimal ability" (Holtzer et al., 2010, p. 123).
It should be noted that the interaction between time on task, self-reported fatigue and RT did not change when depressive symptoms were added to the model. Moreover, when testing the model with PHQ score on the interaction term instead of FSS, we did not find any interaction effects between depressive symptoms and time. These results suggest that although fatigue and depression are overlapping and correlated clinical phenomena, the specific characteristics of fatigue may be more strongly associated with sustained attentional performance during the course of a demanding cognitive task.
Results did not reveal any significant association between stroke location/laterality or lesion volume and outcome variables (RT or FSS), suggesting that, in this sample, lesion location and volume are not strong predictors of subjective fatigue or attentional function as measured by ANT. Whereas the lack of a robust relationship between lesion location/lesion volume and FSS score is in line with previous reports (Choi-Kwon, Han, Kwon, & Kim, 2005; Mead et al., 2011), the literature is not conclusive, and right hemispheric lesions are frequently associated with attentional dysfunction and neglect (Robertson, Ridgeway, Greenfield, & Parr, 1997; Spaccavento et al., 2019; Vallar & Perani, 1986). The current lack of predictive value of stroke location highlights the complex etiology of attentional function in chronic stroke patients. However, we cannot rule out that different operationalizations of attentional dysfunction or alternative categorizations of lesion location could reveal stronger associations.
Our hypothesis that FSS scores would be associated with overall reduced executive network efficiency was not supported, and no associations between fatigue and attentional networks remained after correcting for multiple comparisons. This finding does not support previous studies on fatigue in neurological conditions, linking fatigue to reduced efficiency of the ANT executive network (Holtzer et al., 2010; Togo et al., 2015). There was, however, a nominally significant association between change in executive network efficiency over time (network slope) and fatigue, indicating that patients with higher levels of fatigue exhibited a larger decline in executive network efficiency with sustained effort than patients reporting lower levels of fatigue. Although these findings did not remain after corrections for multiple comparisons, they may suggest that subjective fatigue is less associated with reduced executive attention per se, and more with an increased susceptibility to distractors when the attentive system is put under sustained pressure.
Suggestive FSS network effects were only observed in the executive network. It is unclear whether the lack of alerting and orienting network effects reflects that subjective fatigue is related to executive attention exclusively, or rather reflects psychometric properties of the ANT networks. A psychometric evaluation of the ANT networks based on 15 previous studies (MacLeod et al., 2010) reported that the power to identify significant effects varied across networks, while network reliability was consistently highest for executive network effects, and low to medium for alerting and orienting network effects.
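To make the notions of executive network score and network slope concrete, the following sketch computes the executive score as the difference in mean RT between incongruent and congruent flanker trials (following Fan et al., 2002) and a slope as the least-squares change in that score across blocks. The block-wise slope, the trial layout, and the function names are assumptions made for illustration; the exact slope computation used in the study is not described in this section, and the RT values are made up.

```python
from statistics import mean


def executive_score(trials):
    """Mean RT (incongruent) minus mean RT (congruent); larger = less efficient."""
    incongruent = [t["rt"] for t in trials if t["flanker"] == "incongruent"]
    congruent = [t["rt"] for t in trials if t["flanker"] == "congruent"]
    return mean(incongruent) - mean(congruent)


def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    mx, my = mean(xs), mean(ys)
    numerator = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return numerator / sum((x - mx) ** 2 for x in xs)


def executive_slope(trials):
    """Assumed operationalization: change in executive score per block."""
    blocks = sorted({t["block"] for t in trials})
    scores = [executive_score([t for t in trials if t["block"] == b])
              for b in blocks]
    return ols_slope(blocks, scores)


# Made-up RTs in milliseconds for one hypothetical participant.
trials = [
    {"block": 1, "flanker": "congruent", "rt": 500},
    {"block": 1, "flanker": "incongruent", "rt": 600},
    {"block": 2, "flanker": "congruent", "rt": 500},
    {"block": 2, "flanker": "incongruent", "rt": 620},
    {"block": 3, "flanker": "congruent", "rt": 500},
    {"block": 3, "flanker": "incongruent", "rt": 640},
]

print("Executive score (ms):", executive_score(trials))
print("Executive slope (ms per block):", executive_slope(trials))
```

A positive slope in this sketch corresponds to a growing flanker conflict cost over the session, that is, declining executive network efficiency with sustained effort.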
Because traditional analyses based on observed data alone do not allow for any inference regarding the specific cognitive processes that may underpin differences in RT, we performed an exploratory analysis where we fitted a hierarchical drift diffusion model (hDDM) to the ANT behavioral data.

This computational dissection of the ANT data indicated that the interaction between fatigue and time on RT was best explained by non-decision time, and not the speed of evidence accumulation (drift rate) or response style (boundary separation). hDDM revealed no main effect of FSS on any of the model parameters, but provided evidence of an interaction between time and FSS on non-decision time, indicating increasing effects of FSS during the course of the experiment. In this respect, the results concurred with the linear mixed-effects models on RT data, suggesting stronger associations between fatigue and hDDM parameters with more sustained performance, and indicate that hDDM is sensitive to fatigue in a cognitive context when explicitly modeling the interactions with time.
Non-decision time (t) comprises both sensory encoding and motor response output (Ratcliff & Smith, 2010). The fact that model evidence was stronger for the models where the interaction between FSS and time was estimated on non-decision time, rather than on drift rate or boundary separation, indicates that fatigue may be specifically associated with non-decision aspects of the response process, such as stimulus encoding or response execution rather than with the speed or efficiency of the evidence accumulation or with the decision threshold (i.e., how much information is required before making a decision). Previous studies have reported higher non-decision times in older compared to younger individuals (Ratcliff et al., 2001), and in this respect, patients reporting high fatigue are responding more like elderly individuals, but only after sustained exertion.
It is also interesting to note that the negative main effect of time (in non-decision time) suggested by the current model is in line with previous drift diffusion research on practice effects, identifying a reduction in the non-decision component across trials (Dutilh, Vandekerckhove, Tuerlinckx, & Wagenmakers, 2009). In this context, the current positive interaction between time and fatigue on non-decision time may be understood as fatigue counteracting the otherwise beneficial effects of practice.
This ties well with results from the linear mixed-effects models, suggesting that patients with PSF did not improve performance over time, in contrast to patients without PSF, who got faster during the course of the session. However, this effect was found in the incongruent condition, defined by flankers, while in the reported hDDM results, non-decision time primarily accounts for variance introduced by cues. One explanation could be that responses in the incongruent condition require an inhibition of the dominant motor response after the decision is made and that the identified interaction between fatigue and time on non-decision time is driven by a stronger slowing of these inhibitory responses in patients with higher levels of fatigue.
Research aiming to delineate the nervous system pathophysiology of PSF may further inform hypotheses about this apparent link between non-decision time and fatigue. Applying transcranial magnetic stimulation (TMS), a previous study (Kuppuswamy, Clark, Turner, Rothwell, & Ward, 2014) reported higher motor thresholds in stroke patients with high fatigue and suggested that patients with PSF experience diminished excitability of motor pathways, regarding both corticospinal outputs and facilitatory inputs. In this respect, the current observation of an interaction between time and fatigue on non-decision time might reflect altered neuronal excitability. However, how such neurophysiological mechanisms would translate into the subjective perception of fatigue remains unclear. Here, the perception of effort might be central, in the sense that subjective fatigue may manifest when volitional motor cortex input no longer produces the expected output due to reduced excitability (Kuppuswamy et al., 2014).
These explorative results based on computational modeling provide a novel account of the specific cognitive underpinnings of PSF. When the task context is appropriate, DDM parameters can be interpreted directly (Froehlich et al., 2016) and thus provide insight into the modular and temporal evolution of the decision process. Decision boundary separation (a) adjusts the trade-off between speed and accuracy (Pedersen, Frank, & Biele, 2017). Large estimates of (a) are typically interpreted as indicative of a conservative decision style, associated with higher RTs but more accurate responses (Pedersen et al., 2017). Larger estimates of drift rate (v) are typically interpreted as more efficient information processing and are expected to vary by "the quality of the information extracted from the stimulus" (Ratcliff & McKoon, 2008, p. 3), implying that experimental conditions varying in difficulty should produce different drift rates (Ratcliff & McKoon, 2008). In line with this, and in agreement with previous studies estimating the effect of stimulus difficulty on drift rate (Voss et al., 2004), hDDM identified a credible effect of flanker type on drift rate, with the more cognitively demanding incongruent condition resulting in the lowest drift rate, while neutral flankers yielded the highest drift rate.
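The interpretation of the parameters described above can be illustrated with a small simulation. This is a minimal sketch of a basic drift diffusion process with an unbiased starting point and unit noise, not the hierarchical Bayesian model fitted in the study; the function names and all parameter values are illustrative assumptions.

```python
import random
from statistics import mean


def simulate_ddm(v, a, t, n_trials=400, dt=0.001, seed=1):
    """Simulate (rt_seconds, correct) pairs from a basic drift diffusion process.

    v: drift rate, a: boundary separation, t: non-decision time (seconds).
    Evidence starts midway between the boundaries (unbiased start at a / 2).
    """
    rng = random.Random(seed)
    step_sd = dt ** 0.5                  # noise per step for unit diffusion
    trials = []
    for _ in range(n_trials):
        evidence, steps = a / 2, 0
        while 0 < evidence < a:          # accumulate until a boundary is hit
            evidence += v * dt + rng.gauss(0, step_sd)
            steps += 1
        trials.append((steps * dt + t, evidence >= a))
    return trials


def summarize(trials):
    """Return (mean RT, proportion of upper-boundary, i.e. correct, responses)."""
    return (mean(rt for rt, _ in trials),
            mean(1 if correct else 0 for _, correct in trials))


baseline = simulate_ddm(v=1.0, a=1.0, t=0.30)
longer_nondecision = simulate_ddm(v=1.0, a=1.0, t=0.40)   # only t changes
wider_boundary = simulate_ddm(v=1.0, a=2.0, t=0.30)       # only a changes

for label, trials in [("baseline", baseline),
                      ("longer non-decision time", longer_nondecision),
                      ("wider boundary", wider_boundary)]:
    mean_rt, accuracy = summarize(trials)
    print(f"{label}: mean RT = {mean_rt:.3f} s, accuracy = {accuracy:.2f}")
```

In this sketch, increasing t shifts every response time by a constant without altering which boundary is reached, whereas widening a slows responses and raises accuracy, mirroring the speed-accuracy trade-off attributed to boundary separation.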
To summarize, results from linear mixed-effects models suggested that subjective fatigue interacts with time on task, possibly counteracting practice effects, in particular in the most cognitively demanding incongruent flanker condition. Group analyses revealed that patients without PSF improved performance over time in the incongruent condition, while the PSF group did not. Additionally, higher FSS scores were associated with declining efficiency in the executive network over time. However, the effect was small and was not associated with PSF status. Lastly, hDDM modeling identified an interaction between fatigue scores and time on non-decision time.
Some limitations should be considered when interpreting the results of the current study. In line with most clinical studies, the study design does not allow for causal inference. Still, the findings may pave the way for future clinical or experimental studies examining possible causal mechanisms and subsequent interventions. Moreover, as subjective fatigue can manifest as both a normal and a pathological phenomenon, and no universally accepted definition or criteria of PSF exist, we adopted an explorative approach, aiming to characterize the relationship between subjective fatigue and sustained attentional performance using a continuous symptom scale. While our main objective was not to identify case-control differences between stroke patients and healthy controls, but rather to characterize the cognitive correlates of post-stroke fatigue using computational modeling of response patterns, future studies adding a healthy control group may provide stronger interpretations regarding the clinical sensitivity of the computational behavioral parameters.
The distribution of NIHSS scores indicates that the current patients were sampled from a relatively healthy part of the full population of stroke patients. It is possible that a higher fatigue symptom burden on the group level could reveal associations that were not expressed in this relatively well-functioning patient sample. Reported fatigue levels are, however, comparable with what has been reported in other studies (Wang, Wang, Wang, & Chen, 2014) and higher than what is reported in healthy control samples (Valko, Bassetti, Bloch, Held, & Baumann, 2008). Further studies are needed to test the generalizability of the findings to different and more severely affected patient samples.
Although the classical version of ANT (Fan et al., 2002) appears to be a suitable paradigm to target cognitive aspects of PSF, as performance requires sustained attentional and executive resources (Holtzer et al., 2010), other versions of the test, like the ANT-I Vigilance task (Roca, Castro, López-Ramón, & Lupiánez, 2011), could offer a more comprehensive account of relevant, associated processes like vigilance. It should also be noted that the error rate in the sample was low. This might have implications for the validity of the results from the hDDM model, because the model estimates parameters based on distributions of both RT and accuracy and assumes different RT distributions for correct versus erroneous responses. Moreover, ANT is not frequently applied in hDDM modeling and may not be ideal for this purpose owing to the presence of flankers and cues. However, our model displayed adequate convergence, and a recent hDDM study reported encouraging results for ANT data with no error responses (O'Callaghan et al., 2017).
In conclusion, the current study represents a novel approach to assess the cognitive phenotype of fatigue in stroke patients. The results indicate a relationship between the subjective experience of fatigue and response time distributions from a sustained attention task and demonstrate the significance of considering the sustained nature of the task when targeting fatigue in a neuropsychological context, intuitively indicating that the cognitive phenotype of fatigue entails an increased vulnerability to sustained effort. It is encouraging that the evidence suggests a link between self-reported fatigue and performance in a computerized, standardized paradigm, as it may contribute to bridging the gap between subjective experience and behavioral performance in this complex and prevalent stroke sequela. The explorative application of an advanced computational model to the temporal evolution of response times made it possible to parse the observed response time patterns into specific cognitive processes. In general, the use of computational approaches in the neuropsychological workup may offer a clinically relevant dissection of the specific cognitive processes underlying observed behavioral differences.