Five‐year outcomes of ADHD diagnosed in adulthood

There is a dearth of long‐term follow‐up studies of adults diagnosed with ADHD. Here, the aim was to evaluate long‐term outcomes in a group of ADHD patients diagnosed in adulthood and receiving routine psychiatric health care. Adults diagnosed with any type of ADHD (n = 52) and healthy controls (n = 73) were assessed at baseline and at a 5‐year follow‐up, using Global Assessment of Functioning (GAF), Clinical Global Impression (CGI), Brown ADD Scale (BADDS) and Adult ADHD Self‐Report Scale (ASRS). A multivariate regression method was used to identify factors predicting 5‐year outcomes, including baseline ratings, medication intensity, comorbidity, intelligence quotient (IQ), age, and sex. After 5 years, ADHD patients reported fewer and/or less severe symptoms compared to baseline, but remained at clinically significant symptom levels and with functional deficits. Baseline self‐reports of ADHD symptoms predicted their own 5‐year outcome and low baseline functioning level predicted improved global functioning at follow‐up. Factors previously reported to predict short‐term outcomes (i.e., medication, comorbidity, IQ, age, and sex) did not predict long‐term outcomes in the present study.

Adult ADHD is associated with poor functioning in everyday life (Asherson et al., 2016). For example, a Swedish study of several thousand adult ADHD patients showed that only around one third were employed, that the mean income was substantially reduced compared to the general population, and that as many as 64% of the men had been convicted of crime (Chang, Lichtenstein, D'Onofrio, Sjölander & Larsson, 2014). Similarly bleak functional outcomes in adult ADHD have been reported in other studies (reviewed by Hervey, Epstein & Curry, 2004). Adult ADHD is also commonly associated with other psychiatric disorders (Giacobini, Medin, Ahnemark, Russo & Carlqvist, 2018), such as anxiety, depression, or substance abuse (Chen, Hartman, Haavik et al., 2018).
In children and adolescents, studies have shown clear short-term beneficial effects of psychostimulant medication but its long-term consequences are less clear. For example, in the MTA-study, treatments including psychostimulant medication were superior to non-pharmacological treatment options at a 14-month follow-up but these beneficial effects were not discernible one year later (reviewed by Hinshaw, Arnold & MTA Cooperative Group, 2015).
For adult ADHD, psychostimulant drugs are likewise regarded as first-line treatments, at least in the short run, and as part of a multimodal approach that includes psychoeducation (Kooij, Bijlenga, Salerno et al., 2019). However, they appear less efficacious and less well tolerated in adults than in children/adolescents (Cortese, Adamo, Del Giovane et al., 2018). Medication in adults has been shown to confer several beneficial effects on functioning, beyond alleviating symptoms, including reducing the rate of serious traffic accidents and criminal behavior (Chang et al., 2014; Lichtenstein, Halldner, Zetterqvist et al., 2012). However, discontinuation or stop/start patterns of ADHD medication are common. For example, Bejerot, Rydén and Arlinde (2010) found that only 50% of adult ADHD patients remained on medication 2 years after commencement. On the other hand, adult ADHD patients can be made to adhere more closely to stimulant treatment during more than 3 years by close

Aims
In the present study, we followed 52 persons diagnosed with ADHD in adulthood over 5 years. Our aim was to evaluate long-term outcomes in a group of carefully diagnosed ADHD patients, since ADHD is a life-long impairment associated with poor functioning in daily life. We compared self-report symptom ratings and clinicians' ratings of symptom severity at baseline and at the 5-year follow-up. Employing multivariate regression methods, we attempted to identify outcome (symptom severity and real-life functioning) predictors using rating scores at baseline along with measures of medication intensity, psychiatric comorbidity, cognitive ability, age, and sex.

Patients
The present study sample is part of a project within the Northern Stockholm Mental Health Service, the St Göran project, which assesses patients with ADHD (and bipolar disorder, see Pålsson, Sellgren, Rydén et al., 2017) over several years. Patients with ADHD were enrolled from a tertiary outpatient clinic specialized in assessment and treatment of ADHD. Experienced board-certified psychiatrists (ER or OF) conducted structured anamnestic interviews with the patients. The interview structure relies upon the clinical assessment instrument Affective Disorder Evaluation (ADE) with its origin in a bipolar disorder study (Sachs, Thase, Otto et al., 2003). For the purpose of diagnosing ADHD in the project, the protocol was complemented with a section covering the DSM-IV diagnostic criteria for ADHD (American Psychiatric Association, 2000). The ADE also includes a social anamnesis, medical history, and family history. In addition, the MINI International Neuropsychiatric Interview (MINI; Sheehan, Lecrubier, Sheehan et al., 1998) was used to screen for psychiatric diagnoses other than ADHD and bipolar disorder, which are covered in the ADE. The Wender Utah Rating Scale (WURS; Ward, Wender & Reimherr, 1993) was used to assess childhood ADHD symptoms. The Adult ADHD Self-report Scale (ASRS; Kessler et al., 2005) and the Brown Attention-Deficit Disorder Scales (BADDS; Brown ADD Scales; Rucklidge & Tannock, 2002) were used to assess current ADHD symptoms. Clinicians used the Global Assessment of Functioning (GAF) and Clinical Global Impression-Severity (CGI-S) to rate the patients' functioning and symptoms, respectively. All available sources of information, encompassing patient interview, case records and, if available, interview with next of kin, were utilized in the diagnostic assessment.
The present data were extracted from the St. Göran research database, a Structured Query Language (SQL) based database hosted by the University of Gothenburg, in January 2016. By that time it contained 91 patients with ADHD and 116 controls, all recruited in Stockholm. In this study, we only included patients with complete or near-complete data from two time-points (psychiatric interview and self-reports at baseline and 5-year follow-up). This left 52 patients with ADHD and 73 controls for participation in the present study. For background characteristics see Table 1 in the Results section.
The control group consisted of population-based controls that were randomly selected through Statistics Sweden (SCB) and contacted by mail. A research nurse contacted individuals who volunteered to participate in the study. Controls were scheduled for a one-day comprehensive assessment comprising a psychiatric interview by experienced clinicians using selected parts of the ADE and the MINI to exclude psychiatric disorders. Control persons were screened for substance abuse in several ways: during the telephone interview, during the psychiatric in-person interview, through the self-report Alcohol Use Disorders Identification Test (AUDIT) and Drug Use Disorders Identification Test (DUDIT), and also by determining serum concentrations of carbohydrate-deficient transferrin (CDT). Overconsumption of alcohol or other drug abuse led to exclusion. Other exclusion criteria were neurological conditions (apart from mild migraines), untreated endocrinological disorders, pregnancy, dementia, recurrent depressive disorder, personality disorders (based on the psychiatric interview and assessment with the Structured Clinical Interview for DSM-IV Axis II Personality Disorders, screen questionnaire [SCID-II-SQ]), and a family history of schizophrenia or bipolar disorder in first-degree relatives.
The attrition was 39 patients, of whom nine lacked self-report data at baseline. For the remaining 30 patients lost to follow-up, baseline scores were compared with those of the patients who were re-evaluated at follow-up (the study sample). The baseline averages did not differ between the study sample and those lost to follow-up: BADDS M = 65.9; SD = 23.3 (attrition) vs. M = 63.1; SD = 20.2 (study sample); ASRS M = 42.9; SD = 9.7 (attrition) vs. M = 41.0; SD = 10.4 (study sample); GAF M = 64.5; SD = 8.6 (attrition) vs. M = 65.8; SD = 9.7 (study sample); CGI-S M = 4.0; SD = 0.6 (attrition) vs. M = 3.8; SD = 0.7 (study sample). None of these differences were statistically significant by the t-test (t's = 0.57-1.32 and all p-values > 0.05).
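The attrition comparison can be retraced from the reported summary statistics. The sketch below is only an illustration: it assumes an equal-variance (Student) two-sample t-test and group sizes of 30 (attrition) and 52 (study sample) for every scale, neither of which is stated explicitly in the text. The function name `pooled_t` is hypothetical.

```python
import math

def pooled_t(m1, sd1, n1, m2, sd2, n2):
    """Student's two-sample t statistic from means, SDs and group sizes.

    Assumes equal variances in both groups (an assumption, not stated
    in the text). Returns (t, degrees of freedom).
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / se, df

# BADDS at baseline: attrition group vs. study sample (values from the text;
# the group sizes 30 and 52 are assumed to apply to this scale).
t_badds, df = pooled_t(65.9, 23.3, 30, 63.1, 20.2, 52)
print(f"BADDS: t({df}) = {t_badds:.2f}")
```

Under these assumptions the BADDS comparison gives a t-value of about 0.57, which agrees with the lower end of the reported range.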

Psychometric instruments
Brown ADD scale (BADDS) is a 40-item self-report scale that assesses executive functioning. Individual items are rated on a scale from 0 to 3 (never to almost daily). The items are clustered into five subscales. BADDS is primarily designed to measure the inattentive part of the ADHD symptomatology. Total score can range from 0 to 120. The clinical cutoff score 50 indicates 'probable ADHD' (Brown et al., 2011).
The WHO Adult ADHD Self-Report Scale (ASRS) has 18 items, which correspond to the 18 diagnostic criteria of ADHD symptoms in the diagnostic manual DSM 5 (American Psychiatric Association, 2013), and includes questions about both the inattentive and the hyperactive/impulsive symptoms. The responses are given on a five-point Likert scale from 1 (never) to 5 (always). The ASRS has shown good reliability and validity for evaluation of ADHD in adults (Adler, Spencer, Faraone et al., 2006). The clinical cutoff scores for this full version (ASRS-18), as proposed by Yeh, Gau, Kessler, and Wu (2008), were adopted in the present study: a score of 24 or more (for either inattention or hyperactivity/impulsivity) indicates 'highly likely ADHD', 17-23 points indicate 'likely ADHD', and 0-16 points indicate 'unlikely ADHD'.
Global Assessment of Functioning (GAF; GAF functioning and GAF Symptom) ranges from 100 (extremely well-functioning/no symptoms) to 1 (severely impaired/severe psychiatric symptoms) and is used to rate overall psychological functioning plus social and occupational functioning (how well the patient is handling various everyday problems) and psychiatric symptoms. The GAF has some reliability and validity issues but is widely used in routine clinical settings (Monrad-Aas, 2010; Piersma & Boes, 1997; Söderberg, Tungström & Armelius, 2005).
The Clinical Global Impression Scale (CGI) is a 3-item scale measuring symptom severity, global improvement and therapeutic response. In the present study, only the symptom severity item (CGI-S) was included, which summarizes the clinician's global impression of symptom severity. The CGI-S is rated on a 7-point scale, from 1 (not ill at all) to 7 (extremely ill).
Wender Utah ADHD Rating Scale (WURS) is a 61-item retrospective self-report scale, based on DSM-criteria, used to estimate childhood ADHD symptoms in adults (Ward et al., 1993). Twenty-five questions are directly related to ADHD and add up to a summary ADHD score, which was used in the present study. The participant recalls symptoms from his/her childhood and responds on a five-point Likert scale. The Swedish version of WURS self-report has good psychometric properties (Kouros, Hörberg, Ekselius & Ramklint, 2018).

Statistics
Descriptive statistics are presented as means/medians and 95% confidence intervals/interquartile ranges, unless noted otherwise. To evaluate temporal patterns and treatment effects a series of paired t-tests (IBM SPSS Statistics for Mac, Version 22.0. Armonk, NY) were conducted; statistical significance for BADDS total and its five subscales was adjusted according to the sequential Bonferroni-Holm method to avoid Type I errors (see Holm, 1979). Effect sizes are expressed as partial eta-squared (η²); the computation of η² following pairwise t-tests employed an online calculator (http://www.psychometrica.de/effect_size.html). The definitions of η² magnitude are 0.01 (small), 0.06 (medium), and 0.14 (large) according to Cohen (1988).
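The sequential Bonferroni-Holm adjustment can be illustrated with a minimal sketch. The p-values below are hypothetical placeholders for six tests (one total score plus five subscales), not the study's results, and the function name `holm_reject` is an assumption made for this example.

```python
def holm_reject(p_values, alpha=0.05):
    """Holm's step-down procedure.

    Sort the p-values in ascending order and compare the smallest with
    alpha/m, the next with alpha/(m-1), and so on; stop at the first
    p-value that exceeds its threshold. Returns one True/False flag per
    test, in the original order.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # all larger p-values are retained as non-significant
    return reject

# Hypothetical p-values for six tests.
example_p = [0.001, 0.006, 0.0004, 0.001, 0.20, 0.35]
print(holm_reject(example_p))
```

Because the thresholds relax step by step, the procedure is less conservative than a plain Bonferroni correction while still controlling the family-wise Type I error rate.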
We used a multivariate regression technique, Orthogonal Partial Least Squares (OPLS; Eriksson, Byrne, Johansson, Trygg & Vikström, 2013), in order to identify factors at baseline predicting outcomes 5 years later. The OPLS regression procedure (SIMCA-P 13.0, Sartorius Stedim Biotech, Göttingen, Germany) forms a latent component composed of that portion of the systematic variation in the predictor set (i.e., baseline data in the present case) that is specifically related to the variation in the outcome variable (i.e., follow-up data in the present case). It does this by leaving out the systematic variation among the predictors that is uncorrelated (i.e., orthogonal) to the outcome. In this way, the OPLS regression procedure filters away irrelevant information in the predictor data set and maximizes the explained covariance between predictors and outcome (Eriksson et al., 2013).
By default the SIMCA software transforms the data by unit variance scaling and mean centering. Similarly, the solidity of the predictive component is determined according to the software's default cross-validation significance test, in which all data are left out once in a seven leave-out series. In this way, a number of parallel models are developed; if they are sufficiently similar the model is deemed significant (Eriksson et al., 2013).
The relationship between the dependent variable and the predictive component is described by a number of parameters, such as each predictor's scaled and centered regression coefficient. The Variable Influence on Projection (VIP) summarizes the importance of the various independent variables for the predictive component. Variables with VIPs ≥ 1 are considered very significant and important for the model (Eriksson et al., 2013), and accordingly this VIP-criterion was used for interpreting the present OPLS results. In addition, there are two important measures describing the quality of a particular OPLS model: R²X is the fraction of the variation of the predictors modeled by the component; R² is the fraction of the variation in the dependent variable modeled by the predictors.
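To make the VIP criterion more concrete, the sketch below computes VIP values for a simplified case: a single predictive component and a single outcome, after unit variance scaling and mean centering. This is not SIMCA's OPLS implementation, which also accounts for orthogonal components; the function names and the toy data are hypothetical. With one component, the weight of each predictor is proportional to its covariance with the outcome, and VIP reduces to the square root of the number of predictors times the absolute normalized weight. The squared VIPs therefore average 1, which is why 1 serves as a natural threshold.

```python
import math

def autoscale(columns):
    """Mean-center each column and scale it to unit variance (sample SD)."""
    scaled = []
    for col in columns:
        n = len(col)
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        scaled.append([(v - mean) / sd for v in col])
    return scaled

def single_component_vip(x_columns, y):
    """VIP per predictor for a one-component, one-outcome PLS model."""
    xs = autoscale(x_columns)
    ys = autoscale([y])[0]
    # Weight vector is proportional to X'y (covariance with the outcome).
    raw = [sum(xv * yv for xv, yv in zip(col, ys)) for col in xs]
    norm = math.sqrt(sum(r * r for r in raw))
    weights = [r / norm for r in raw]
    k = len(weights)
    return [math.sqrt(k) * abs(w) for w in weights]

# Toy data (hypothetical): the first predictor tracks the outcome closely,
# the other two are only weakly related to it.
outcome = [1, 2, 3, 4, 5, 6]
predictors = [
    [1.1, 1.9, 3.2, 3.9, 5.1, 6.0],
    [3, 1, 4, 1, 5, 2],
    [5, 3, 4, 6, 2, 4],
]
vip = single_component_vip(predictors, outcome)
print([round(v, 2) for v in vip])
```

In this toy example only the first predictor exceeds the threshold of 1, mirroring how the criterion singles out the variables that dominate the predictive component.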
In comparison to regular multiple linear regression, OPLS deals well with collinearity and missing data. Importantly, it (and related techniques) was developed to handle data sets with many variables relative to the number of observations/participants ('short-and-wide' matrices) and it is also robust to noise in both the predictor and the dependent data sets (Eriksson et al., 2013). Accordingly, OPLS was considered well suited for the present type of clinical data with a large number of inter-correlated variables and relatively few participants. OPLS models become more robust when predictors overlap (Eriksson et al., 2013), which is why we did not discard any of the overlapping symptom rating scales employed in the present research.

Ethics approval and consent to participate
The Regional Ethics Committee in Stockholm approved this study (2005/554-31/3), which was conducted in accordance with the latest Helsinki Protocol. All patients and controls consented both orally and in writing to participation in this study.
Table 1 presents background information on the participants. Fifty-two adults diagnosed with ADHD were included (21 females; 40.4% and 31 males; 59.6%). At baseline, 19 (37%) patients had comorbid depression or anxiety disorder, three (6%) had comorbid developmental disorder (i.e., autism spectrum disorder), and two (3%) patients a comorbid personality disorder. No patients reported symptoms of present substance abuse. Thirty-one patients (60%) scored 46 or higher on the WURS rating scale, indicating recalled impairing symptoms of childhood ADHD.

Background characteristics
As our study was a routine clinical practice observational study, we had no control over medication type, discontinuation, doses, or visits over the course of 5 years. However, according to the medical records, 44 (84.6%) of the patients had medicated with central stimulants during at least one prescription period. Thirty-four (65.4%) were on medication at both baseline and follow-up. Although we recognize that this does not necessarily imply continuous medication, we nevertheless assumed that this group were on ADHD medication more regularly than the rest of the patients. They therefore formed the 'medicated' group in the statistical analyses. However, for 22 in this group of 34, the exact number of months being on medication was available: the median was 48 months but with considerable variability (interquartile range: 26 months). For the remaining patients, forming the 'non-medicated' group, eight (15.4%) did not use medication at any of the two time points, eight (15.4%) medicated at baseline but not at follow-up, while two (3.8%) medicated only at follow-up.
There were no apparent relationships between the length of medical treatment and outcomes according to the GAF and BADDS within the group of 22 patients for which the exact number of months on medication was known (r's = −0.19 and 0.15, respectively). A power estimation indicates that a group size of 40 to 60 would have been required to ascertain statistical significance of an association of this size at α of 0.05 and β of 0.80. There were significant relationships between self-rated ADHD symptoms (BADDS and ASRS) on the one hand and clinician-rated functioning level (GAF functioning) on the other, at both baseline and follow-up. The Spearman correlations were: baseline BADDS-GAF-F: r = −0.316, p = 0.024; 5-year follow-up BADDS-GAF-F: r = −0.434, p = 0.001; 5-year follow-up ASRS-GAF-F: r = −0.325, p = 0.019; the exception was the non-significant ASRS-GAF-F correlation at baseline: r = 0.011, p = 0.936.
For comparative purposes a group of 73 healthy controls was included (see Table 1). Figure 1 shows each individual patient's baseline and follow-up scores over the course of five years. Figure 2 summarizes these data along with data from the healthy controls. The difference between the scores within the ADHD group was assessed using paired t-tests (see Table 2 for statistical details). As to the BADDS, the patients' total score was significantly lower at follow-up (p = 0.001), but the effect size was small (η² = 0.05) and the average patient scored above the clinical cut-off for BADDS also at follow-up. Among the constituent BADDS subscales, Activation (p = 0.006, η² = 0.04), Attention (p < 0.001, η² = 0.06), and Effort (p = 0.001, η² = 0.05) were significantly improved at follow-up compared with baseline (Table 2). Scores on the remaining subscales remained unchanged (Table 2).

Change in ADHD symptoms over time
The patients also reported clinical ADHD symptom levels on the ASRS at both time points, but the ASRS scores were lower (i.e., improved) at follow-up compared with baseline (p = 0.006, η² = 0.04). As to the CGI-S scale, running from 1 (healthy) to 7 (extremely ill), the patients significantly improved over time (p < 0.001), and the effect size was higher (medium) than for the other scales (η² = 0.16). Concerning the GAF functioning/symptom scales, the patients' baseline and follow-up scores did not differ statistically (Table 2).
Controls showed a significant improvement in the ASRS self-report [t (61)
As seen in Fig. 1, quite a few patients had CGI-S scores ≤ 3, indicating that they were judged to be only mildly affected by the disorder. We analyzed treatment effect separately within the mildly and severely affected groups, but we did not detect any differences (data not shown).

Baseline scores in relation to outcome
To test if characteristics at baseline predict follow-up scores, the scale scores at baseline, plus sex, age, comorbidity (0, 1), and full scale IQ (WAIS-III) at baseline were used as predictors in a series of OPLS models. ADHD medication was coded as 1 for patients receiving medication at both baseline and follow-up, and as 0 for the rest. In all, 17 predictors were used to model BADDS total score, ASRS total score, CGI-S score, and GAF function score at the 5-year follow-up. Table 3 shows a correlation matrix of the variables included in the modeling. The resulting model for the self-report BADDS at follow-up, significant by cross-validation, used approximately 22% of the variation in the predictor set to form a component explaining 51% of the variation in the BADDS total scores at follow-up. Table 4 shows that high BADDS scores were associated with worse outcome in BADDS at follow-up. No other predictor had VIPs on or above threshold. Similarly, baseline BADDS also predicted ASRS self-report scores along with ASRS baseline scores (data not shown).
As to the clinicians' CGI-S ratings at follow-up, approximately 19% of the variation in the predictor set related to 35% of the outcome. In this case, the two GAF scales (functioning and symptoms) constituted the strongest predictors (Table 4). Clinicians' ratings on the GAF functioning scale at follow-up could not be related to the predictor set (data not shown).
In a complementary approach, we investigated predictors of improvement over the course of 5 years. For the BADDS self-report scales and the CGI-S, the attempts were unsuccessful, in the former case because of non-significance of the model, and in the latter because of limited variation in the baseline/follow-up scores. However, with regard to the GAF functioning scale, OPLS used 14% of the predictive variance to explain 44% of the GAF functioning improvement scores (Table 4). The GAF functioning baseline score was the strongest predictor for improvement in GAF functioning score, followed by GAF symptom and the number of sick leave days. Figure 3 illustrates the relationship between baseline GAF function scores and improvement/impairment over 5 years: lower (worse) GAF functioning scores were associated with a larger improvement whilst the reverse was true for patients with higher (better) GAF scores (Spearman r = −0.47).

DISCUSSION
We followed 52 individuals diagnosed with ADHD in adulthood over 5 years using clinical interviews, self-reports (BADDS, ASRS, WURS), and clinical ratings (CGI-S, GAF). The main findings were that the ADHD symptom burden decreased over the course of 5 years according to both self-reports and clinicians' judgements. ADHD symptom rating scales (ASRS and BADDS) predicted their own 5-year outcome, such that high scores at baseline predicted worse outcome. Lower (worse) clinician-rated functioning scores (GAF) were associated with a larger improvement. Retrospective WURS childhood ADHD ratings, comorbidity and medical treatment had no bearing on the 5-year outcomes.
With respect to patients' self-reports, the decrease over the course of 5 years was significant in the statistical sense, but the averages remained at clinical levels and the effect sizes were weak. Clinicians' ratings (CGI-S) indicated a more robust improvement with a higher effect size. The discrepancy between clinicians' and patients' reports provides yet another example of partial patient-informant discordance (De Los Reyes, Augenstein, Wang et al., 2015). Previous studies have noted that patients with ADHD tend to underestimate their symptoms in comparison with other informants (Swanson, Arnold, Molina et al., 2017). The modest but significant decrease in ADHD symptoms documented here might reflect a true symptom reduction and/or an increased ability to cope with the difficulties, making the symptoms and functional deficits less obvious and impairing; indeed, symptom severity and functional deficits were inversely related to one another in the present study. Alternatively, the decrease might be due to a replacement of more overt symptoms such as hyperactivity and impulsivity by more subtle symptoms like mental restlessness and excessive mind wandering as described by Kooij et al. (2019). Such inner symptoms of adult ADHD may be insufficiently covered in the instruments used in the present study. Finally, milder ADHD symptoms at follow-up compared with baseline could also simply be due to regression-to-the-mean phenomena. In line with our findings, age-dependent declines in ADHD symptoms have been demonstrated in earlier studies (Biederman, Mick & Faraone, 2000; Faraone et al., 2006; Srebnicki, Kolakowski & Wolanczyk, 2013). Importantly, however, symptoms at follow-up remained at clinical levels despite improvement over 5 years, confirming the well-known persistence of ADHD symptoms (Roy et al., 2016; Sibley et al., 2016).
Interestingly, self-rated ADHD symptoms (ASRS scores) and clinician-rated functional impairment (GAF scores) improved also in the healthy controls during the study period, despite ASRS being subclinical at baseline and the controls' GAF functioning level being high (in the span 71-80). Speculatively, and among several possibilities, these changes might be reflections of the positive personality development documented in adult healthy people, involving higher levels of conscientiousness and lower levels of neuroticism on average (reviewed by Roberts & Mroczek, 2008). Thus, as adults mature they tend to get better at impulse control/delayed gratification (increased conscientiousness) and to become more emotionally stable (less neurotic), changes that would be expected to be beneficial when dealing with everyday hassles and work demands. In this context it is interesting to note that the personality of adults with ADHD is characterized by low levels of conscientiousness (hard-working/control impulses/delay gratification while working towards goals) and high levels of neuroticism (Nigg, John, Blaskey et al., 2002; Parker, Majeski & Collin, 2004).
The regression models showed that the baseline self-rated ADHD symptom scale BADDS had broader predictive value than the other self-rated ADHD symptom scale (ASRS; Table 4). Thus, baseline BADDS scores not only predicted BADDS scores 5 years later, but were also relevant for the understanding of the follow-up clinician-rated symptomatic (CGI-S) and functional impairment (GAF-F) ratings. A similar pattern emerges when one studies the correlation matrix presented in Table 3: BADDS correlates with multiple variables, including ASRS at baseline, whereas ASRS only correlates with itself 5 years later and with BADDS at baseline. This difference in importance might be due to the fact that BADDS captures a wider range of symptoms than ASRS, which only includes the 18 diagnostic criteria for ADHD. Another difference is that BADDS includes the inattentive symptoms of ADHD only, which are more common in adults. ASRS, by contrast, concerns both inattention and hyperactivity.
The present study concerned patients diagnosed with ADHD in adulthood only. Today, diagnosing adult ADHD is based on the assumption of a disorder emerging in childhood. Indeed, in order to meet the diagnostic criteria for ADHD, symptoms need to be present before the age of 12 (DSM 5; American Psychiatric Association, 2013). There is an ongoing discussion in the research literature about the possibility of a late-onset ADHD with its onset in adulthood, besides the typical childhood-onset ADHD (Moffitt, Houts, Asherson et al., 2015; Kooij et al., 2019). However, this assumption remains untested because there are as yet no longitudinal studies of the childhoods of individuals diagnosed with ADHD in adulthood. Thus, this issue remains unsettled and should be the focus of further research.
As such, it is important to examine progression of symptoms in adulthood. According to a meta-analysis by Faraone et al. (2006) adult ADHD is more common than usually believed, especially with regard to patients with subclinical levels of symptoms (e.g., patients not fulfilling all criteria for the ADHD diagnosis according to the diagnostic manuals). The majority of adults with ADHD continue to struggle with substantial functional deficits related to their ADHD symptoms, especially when the ADHD diagnosis is combined with executive dysfunctions (Mattfield et al., 2014), and even in the subsyndromatic cases (Uchida, Spencer, Faraone & Biederman, 2018).
Concerning global functioning, we found at the group level that clinician-rated functional impairment (GAF) scores remained unchanged over the course of 5 years. Being ≥ 60 on average, the results also indicate that these patients had relatively mild symptoms and experienced relatively minor impairments in daily living. Yet, more fine-grained analyses showed that patients with the lowest functioning scores at baseline had the largest improvement at follow-up. As noted by Brod, Pohlman, Lasser, and Hodgkins (2012), a lifetime of ADHD accumulates a number of functional problems that may be hard to correct even if the ADHD symptoms get milder or change in character with age. Most ADHD treatments are oriented towards targeting core symptoms; our findings suggest that treatments also need to be focused on increasing daily functioning for ADHD patients.
A clinically important question is whether there are predictors of long-term outcome in adult ADHD patients. According to our OPLS models, high self-reported levels of symptoms/functioning on a given scale at baseline predicted high levels on the same scale at follow-up; no other factor was of importance when predicting outcome using self-reported ADHD symptoms (BADDS and ASRS). However, the clinician-rated functional impairment scores (GAF) at baseline were the strongest predictors of the 5-year score on the clinician-rated symptomatic impairment score (CGI-S). GAF baseline scores along with the sick leave factor also predicted the GAF improvement score. Because sick leave status is one of the indicators of global functioning, this result was not unexpected. Our results are in line with previous studies on this topic (Biederman et al., 2011; Karam et al., 2015; Lara et al., 2009; Lensing et al., 2013), and improvement in self-rated ADHD was expected.
Five factors may account for the fact that the self-rated ADHD symptoms improved (measured with ASRS and BADDS) whereas the overall level of clinician-rated functioning (GAF) did not. First, many ADHD patients continue to be symptomatic as adults, but fewer continue to meet full diagnostic criteria (For example Karam et al., 2015). Second, we do not know to which extent patients adhered to treatment in present sample; it is known that compliant patients fare better than those who are not (Bejerot et al., 2010;Edvinsson & Ekselius, 2017;Lensing et al., 2013), even though the improvement might not reach the extent of normalization or reach levels of healthy controls . Third, the GAFs in present sample was at baseline rated as 'quite well functioning' with only mild functional difficulties on average. Thus, the present sample was already at an adequate functioning level at baseline. Fourth, there are often clinically relevant symptoms accompanying ADHD that are not included as diagnostic criteria, such as sleep problems, executive dysfunction,  or mood-swings (Asherson et al., 2016). These problems may impact on overall functioning and life quality. Fifth, the fact that the patient rated ADHD symptoms and the clinician rated GAFs might be important: it is difficult for someone with lifelong ADHD to compare his or her own situation to that of someone without ADHD (Kooij, 2010). Three potential predictors were conspicuous by their lack of importance for 5-year outcomes. First, the retrospective WURS ratings of childhood ADHD symptoms turned out non-significant. An earlier Swedish study reported clear links between high WURS scores and current ADHD symptoms in elderly people (Guldberg-Kj€ ar, Sehlin & Johansson, 2013). However, in that study comparisons were made between extremes: from a sample of almost 1,600 WURS ratings, the 30 lowest-scorers were compared to the 30 highest-scorers. 
In the present study, we attempted to relate WURS scores to current symptoms and functioning within a group of well-defined ADHD patients. This proved unsuccessful, perhaps due to the difficulty in recalling symptoms from long ago, or because this sample was diagnosed in adulthood and might not have had explicit ADHD difficulties as children. For example, Agnew-Blais, Polanczyk, Danese, Wertz, Moffitt and Arseneault (2016) found an adult-onset ADHD prevalence of 5.5% in their UK cohort, and that 67.5% of their sample of adults diagnosed with ADHD would not have met diagnostic criteria for ADHD as children. Their group of 112 patients showed lower levels of externalizing problems and higher IQ in childhood compared to the group with persistent childhood ADHD, results comparable to the present findings.
Second, the presence of comorbid psychiatric problems was not associated with outcome. This is surprising given that many other studies show that psychiatric comorbidity worsens the prognosis of ADHD (Roy et al., 2016). Comorbidity is clinically important: it contributes to persistence in adulthood (Faraone et al., 2015; Kooij et al., 2019; Roy et al., 2016) and bears on finding the most effective treatment (Instanes, Haavik & Halmoy, 2016; Kooij et al., 2019). We used only registered comorbid diagnoses, which excluded any potential contribution from subthreshold comorbid psychiatric symptoms, a possible explanatory factor. Another possible reason for the lack of impact from comorbid psychiatric diagnoses is the small sample and thereby less power to detect differences, since comorbid ADHD is more common than clean-cut ADHD in clinical samples (77.1% prevalence of psychiatric lifetime comorbidity; Sobanski et al., 2007).
Third, ADHD medical treatment did not appear important for the 5-year symptomatic or functional outcomes. According to the medical records, 35 (65.4%) patients were medicated at both baseline and at follow-up. We have no data on how compliant these patients were, but for 22 of them the median number of months on medication was 48 (i.e., approximately 80% of the examined time span). The fact that this was a naturalistic study lowers the quality of the medical data, and Shaw et al. (2012) have described the poor systems for follow-ups and difficulties in maintaining long-term medical administration in the healthcare systems as examples of contributors to the sparse number of existing long-term, naturalistic follow-ups in ADHD. Even so, the absence of effect on 5-year outcome is notable since medication is considered first-line treatment in adult ADHD (Kooij et al., 2019), and multiple studies confirm the value of ADHD medication in terms of alleviating core symptoms (Kooij et al., 2019). However, as remarked by Cortese et al. (2018), these positive outcome studies are mostly short-term and seldom longer than 12 weeks. Likewise, naturalistic studies on children and adolescents reveal beneficial effects of ADHD medications in the short term but not in the long term (Asherson, Chen, Craddock & Taylor, 2007; Jensen, Arnold, Swanson et al., 2007; Molina, Hinshaw, Swanson et al., 2009; Nylander, 2018; Storebø, Ramstad, Krogh et al., 2015; The MTA Cooperative Group, 1999; van Lieshout, Luman, Twisk et al., 2016). Thus, the present 5-year study provides yet another example of a possible failure to detect favourable effects of ADHD medication in the long run. This failure might reflect a true dissipation of the symptom-reducing effects of the drug, but complementary possibilities include issues related to drug discontinuation (Zetterqvist, Asherson, Halldner, Långström & Larsson, 2013) and to adherence to medication (Bejerot et al., 2010). As emphasized by Cortese et al. (2018), there is an urgent need for assessing the long-term effectiveness of ADHD medications to refine treatment choices and clinical management for adult ADHD patients (Arnett & Stein, 2018; Asherson et al., 2016).

Limitations
First and most important, the lack of detailed information about type of central stimulants, dosage, individual stop/start patterns or co-medication might weaken the conclusions reached in the present study. This is a common difficulty in naturalistic, uncontrolled studies of real-life patients in real-life settings, where the information rests upon the patient's compliance and self-reporting skills. Second, the relatively small sample size might have impeded our ability to detect somewhat less powerful predictors. Third, in the present study the number of patients reporting comorbid substance abuse was surprisingly low, as was the overall rate of comorbidity, which might indicate that the present sample was not entirely representative in all aspects (Capusan, Bendtsen, Marteinsdottir & Larsson, 2019). The primary advantages of the present study are its length of follow-up and its high ecological validity. Another advantage is the use of statistics designed to handle multicollinear datasets with few observations relative to the number of variables.

CONCLUSIONS
Patients diagnosed with ADHD in adulthood showed a decrease in ADHD symptom burden over the course of 5 years according to both self-reports and clinicians' judgements. However, at case closure the ADHD patients, as a group, remained impaired compared to controls. Medication, comorbidity, IQ, age and sex are all factors known to predict short-term outcomes, but did not anticipate the long-term outcomes in the present study.
This research was supported by grants from the Swedish Research Council (2018-02653) and the Swedish Federal Government under the LUA/ALF agreement (ALF 20170019 and ALFGBG-716801).
We are deeply grateful for the participation of all patients contributing to this research, and we are indebted to the collection team at the Northern Stockholm Psychiatry Clinic that worked to recruit them. We especially wish to thank study nurse Lena Lundberg and data manager Mathias Kardell. ML declares that, over the past 36 months, he has received lecture honoraria from Lundbeck pharmaceutical, and that he has no other equity ownership, profit-sharing agreements, royalties, or patents. The other authors have no competing interests to declare.

DATA AVAILABILITY STATEMENT
The datasets generated and/or analysed for the current study are not publicly available due to the Swedish law for register data, but data will be available from the corresponding author on reasonable request.