Clinical validation of the parent‐report Toronto Obsessive–Compulsive Scale (TOCS): A pediatric open‐source rating scale

Abstract

Background: There is a need to develop a multipurpose obsessive–compulsive disorder (OCD) measure that is useful both for cross‐disorder research and as a reliable clinical rating scale. The current study examined the psychometric properties of the parent‐report version of the Toronto Obsessive–Compulsive Scale (TOCS), a 21‐item rating scale of obsessive–compulsive traits, and established clinical cutoffs for it.

Method: Participants ranged in age from 6 to 21 years and had a primary diagnosis of OCD (n = 350, 50% female), attention‐deficit/hyperactivity disorder (ADHD) (n = 820, 25% female), or autism spectrum disorder (ASD) (n = 794, 22% female), or were typically developing controls (n = 391, 51% female). Confirmatory factor analyses, internal consistency reliability, and convergent and divergent validity of the TOCS were examined in the OCD group. Receiver operating characteristic (ROC) analyses across several scoring approaches were used to establish clinical cutoffs after splitting the OCD and control groups into a discovery sample (166 OCD cases, 164 controls) and a validation sample (184 OCD cases, 227 controls). Classification accuracy and TOCS scores were compared across the OCD, ADHD, and ASD groups.

Results: The psychometric properties of the TOCS were confirmed. ROC analyses across TOCS scoring approaches in the discovery sample indicated excellent diagnostic discrimination (AUC ≥0.95, sensitivity 77%–92%, specificity 92%–98%). When applied in the independent validation sample of OCD cases and controls, the established cutoffs showed an overall classification accuracy of 85%–90%. The TOCS total score and symptom count showed good discrimination of OCD from ADHD (AUC ≥0.86) and ASD (AUC ≥0.81). The OCD group scored significantly higher than the ADHD and ASD groups on all TOCS dimensions except Hoarding.

Conclusion: The TOCS is a reliable and valid rating scale with strong sensitivity and specificity in discriminating OCD cases from controls, as well as from ASD and ADHD. It is a quantitative OCD measure with important clinical and research applications, particularly for cross‐disorder phenotyping and population‐based studies.


INTRODUCTION
Obsessive-compulsive disorder (OCD) is a psychiatric disorder characterized by recurrent intrusive thoughts or impulses that cause marked distress (obsessions) and by repetitive behaviors or mental acts (compulsions). Approximately 50% of individuals with OCD first develop symptoms during childhood (Geller, 2006), and pediatric OCD has a particularly strong genetic component (45%-65%; van Grootheest et al., 2005). Like other mental illnesses, OCD may reflect the extremes of quantitative traits that are normally distributed within the population (Abramowitz et al., 2014; Plomin et al., 2009). This trait-based conceptualization allows researchers to harness the power of large populations to study the genetic underpinnings of OCD by focusing on quantitative obsessive-compulsive (OC) traits.
The Toronto Obsessive-Compulsive Scale (TOCS; Burton et al., 2018; Park et al., 2016) is a multidimensional measure designed to assess OC traits in the general population that also serves as an easily administered rating scale for clinicians.
Rating scales are commonly used in clinical settings to identify the individuals in greatest need and triage them toward the most appropriate services (e.g., comprehensive assessment and treatment). Existing rating scales for pediatric OCD (e.g., the Leyton Obsessional Inventory-Child Version [LOI-CV; Berg et al., 1986], Child Behavior Checklist-Obsessive-Compulsive Scale [CBCL-OCS; Nelson et al., 2001], Obsessive-Compulsive Inventory-Child Version [OCI-CV; Foa et al., 2010], Children's Obsessional Compulsive Inventory-Revised-Parent [ChOCI; Uher et al., 2008], and Children's Florida Obsessive-Compulsive Inventory [C-FOCI; Storch et al., 2009]) are typically scored based on the presence or absence of symptoms.
Specifically, the LOI-CV and C-FOCI have binary "yes"/"no" responses, while the CBCL-OCS, OCI-CV, and ChOCI are scored on a three-point Likert scale (perceived impairment is scored separately on the LOI-CV, C-FOCI, and ChOCI). While clinically useful, these scales have limited response options that restrict variance and/or produce severely skewed distributions, particularly in general population-based or cross-disorder samples. For example, the OCI-CV subscales have skewed distributions in community samples, and only the obsessing subscale adequately discriminated between clinical and nonclinical youth, with 62% classification accuracy (Rodríguez-Jiménez et al., 2016). Similarly, the CBCL-OCS has a skewed distribution in nonclinical youth and psychiatric controls (Nelson et al., 2001). The TOCS overcomes this truncated distribution because it is specifically designed to capture a wide distribution of OC trait scores using a strengths-and-weaknesses design on a seven-point scale: raters score each item from strengths (−3; far less often than average) to weaknesses (+3; far more often than average). The TOCS has a six-factor structure with strong psychometric properties in a community sample of children and adolescents, and scores on each factor are approximately normally distributed (Burton et al., 2018; Park et al., 2016). Thus, the TOCS captures more variance in OC traits than existing measures, with scores approximating a normal distribution, particularly in population samples.
In turn, this design makes the TOCS uniquely suited to genetic research and contributes to its strong psychometric properties.
Quantitative OC trait research using the TOCS indicates that the TOCS dimensions and total score are moderately to strongly heritable (30%-77%; Burton et al., 2018). Recently, in a large community sample, the TOCS was used to identify the first genome-wide significant variant for OC traits that was also associated with diagnosed OCD, and to show shared polygenic risk between OC traits and OCD. Furthermore, in this same study, we demonstrated that genome-wide significance was lost when using OCD measures with the usual skewed distribution (e.g., CBCL-OCS).
We have also shown that, in a community sample, a cut-off score of 0 on the TOCS adequately discriminated self- and parent-reported OCD cases from non-cases (Park et al., 2016). These findings highlight the power of rating scales that capture the full distribution of OC traits in community samples. What is currently  (Stewart et al., 2004), with a lifetime diagnosis not necessarily reflecting the presence of current symptomatology (i.e., patients may be asymptomatic). Second, the diagnostic accuracy of the TOCS for identifying OCD rather than co-occurring disorders (e.g., autism spectrum disorder [ASD] and attention-deficit/hyperactivity disorder [ADHD]; Burton et al., 2016; Kushki et al., 2019; van der Plas et al., 2016) needs to be established.
For example, participants with community-reported ASD had higher scores on the hoarding and symmetry/order dimensions of the TOCS than those with community-reported OCD (Park et al., 2016). Ideally, in a clinical sample, the TOCS would discriminate OCD cases from typically developing controls and distinguish OC traits from symptoms of ADHD and ASD. A measure with both strong diagnostic accuracy and the potential to inform quantitative research (including genetics) benefits clinicians and researchers alike by minimizing redundancy and burden for patients and their families.
The goal of this study was to confirm the psychometric properties of the TOCS in a clinical OCD sample and to examine its sensitivity and specificity in discriminating OCD cases from controls, as well as from other neurodevelopmental disorders (ASD and ADHD). Specifically, we evaluated the factor structure, internal-consistency reliability, convergent and divergent validity, and age and gender associations of the TOCS in a clinical sample of pediatric OCD. We examined the ability of the TOCS to discriminate OCD cases from controls by splitting the OCD and control groups into two subsamples (discovery and validation); a clinical cutoff was derived in the discovery sample and confirmed in the validation sample. Lastly, we examined the ability of the TOCS to discriminate OCD cases from clinical ASD and ADHD cases.

Participants and procedure
Participants in the discovery sample (OCD cases and healthy controls) were recruited from two tertiary-care children's mental health clinics (SickKids, Toronto, Canada, and University of Michigan Medical Center, Ann Arbor, Michigan, USA). Participants in the validation sample with OCD, ADHD, or ASD diagnoses, and typically developing controls, were recruited from tertiary-care clinics as part of the Province of Ontario Neurodevelopmental Disorders (POND) Network (Ontario, Canada). TOCS total scores did not vary significantly by recruitment site within the OCD group, F(3, 346) = 2.46, p = .062. Informed consent (and verbal assent when applicable) was obtained before research participation. Ethics approval was obtained from the relevant institutions listed above.
Participants ranged in age from 6 to 21 years and had a primary diagnosis of OCD (n = 350), ADHD (n = 820), or ASD (n = 794), or were typically developing controls (n = 391). Diagnoses for the clinical groups were based on DSM-IV or DSM-5 criteria following a rigorous, semi-structured clinical assessment by a psychologist or psychiatrist. The following gold-standard assessment tools were used to confirm the primary diagnosis in each clinical group: for OCD, the Kiddie Schedule for Affective Disorders and Schizophrenia (K-SADS; Kaufman et al., 1997), the Children's Yale-Brown Obsessive-Compulsive Scale (CY-BOCS; Scahill et al., 1997), and/or the Schedule for Obsessive-Compulsive and Other Behavioral Syndromes (SOCOBS; Rough et al., 2020); for ADHD, the Parent Interview for Child Symptoms (PICS; Ickowicz et al., 2006); and for ASD, the Autism Diagnostic Observation Schedule-2 (ADOS-2; Lord et al., 2000) and/or the Autism Diagnostic Interview-Revised (ADI-R; Rutter et al., 2003). Clinical group membership was determined by the primary diagnosis assigned during the assessment (i.e., participants were not excluded based on comorbid diagnoses¹).
Additionally, participants were included in the OCD group only if they currently met criteria for OCD at the time of assessment (i.e., excluded participants with lifetime symptoms only). Participant demographics are presented in Table 1.

Toronto Obsessive-Compulsive Scale
The TOCS is a 21-item measure of OC traits over the past 6 months that is specifically designed to capture a wide distribution of responses (Park et al., 2016). Parents report the extent to which their child engages in each of the 21 OC thoughts or behaviors using a seven-point Likert scale (−3 = far less often than average, 0 = average, +3 = far more often than average). The TOCS assesses a range of common OC symptoms and includes six subscales derived from a factor analysis conducted by our group in a large pediatric community sample: counting/checking, cleaning/contamination, hoarding, symmetry/order, rumination, and superstition (Burton et al., 2018; Park et al., 2016). The scale has strong reliability and validity in community-based samples of children and adolescents (Burton et al., 2018; Park et al., 2016). The TOCS is freely accessible online: https://lab.research.

TABLE 1 Participant demographics by diagnosis
Abbreviations: ADHD, attention-deficit/hyperactivity disorder; ASD, autism spectrum disorder; OCD, obsessive-compulsive disorder.

Child Behavior Checklist-Obsessive-Compulsive Scale
The CBCL-OCS (Nelson et al., 2001) was used to examine the convergent validity of the TOCS. The CBCL-OCS is an eight-item parent-report screening tool used to assess symptoms of OCD. Parents report the extent to which their child experiences each symptom on a 0-2 scale, with higher scores indicating greater symptoms. The CBCL-OCS is reliable and valid as a screening tool for pediatric OCD (Hudziak et al., 2006; Nelson et al., 2001).

Strengths and Weaknesses of ADHD Symptoms and Normal Behavior Rating Scale
The Strengths and Weaknesses of ADHD Symptoms and Normal Behavior Rating Scale (SWAN; Swanson et al., 2012) measures ADHD symptoms and was used to assess the divergent validity of the TOCS. The SWAN is an 18-item scale on which parents rate the extent to which their child engages in each behavior on a −3 to +3 Likert scale. Scores are reversed so that higher scores reflect greater ADHD traits. The SWAN is reliable and valid in pediatric samples, and high SWAN scores are associated with polygenic risk for ADHD (Burton et al., 2019).

Social Communication Questionnaire
The Social Communication Questionnaire (SCQ; Rutter et al., 2003) measures social communication and behaviors associated with ASD and was also used to assess the divergent validity of the TOCS. The SCQ is a 40-item scale on which parents rate the extent to which their child engages in each behavior, with higher scores indicating greater symptom severity. The current-symptoms (rather than lifetime) version of the SCQ was used in this study.

Data analysis
All data analyses were conducted in R v4.0.2. Confirmatory factor analysis (CFA; Rosseel, 2012) was used to examine the factor structure of the TOCS in the OCD group. Two reliability indices (Jorgensen et al., 2020) were examined: Cronbach's alpha and the omega coefficient. Measurement invariance (Rosseel, 2012) was tested to determine whether the same factor structure held across groups, and was accepted if the change in model fit was <0.01 (Putnick & Bornstein, 2016).
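Of the two reliability indices named above, Cronbach's alpha has a simple closed form for k items: α = k/(k − 1) × (1 − Σ item variances / variance of the total score). The Python sketch below computes it from a respondents-by-items matrix using only the standard library. It is an illustration of that textbook definition, not the semTools routine cited above, and the function name `cronbach_alpha` is hypothetical. Omega is not shown because it requires the loadings from a fitted factor model.

```python
from statistics import pvariance
from typing import Sequence


def cronbach_alpha(scores: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items score matrix.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(total scores)).
    Population variances are used; the ratio is identical if sample
    variances are used consistently instead.
    """
    n_items = len(scores[0])
    if n_items < 2:
        raise ValueError("alpha requires at least two items")
    if any(len(row) != n_items for row in scores):
        raise ValueError("every respondent must have a score for every item")
    # Variance of each item (column) across respondents.
    item_vars = [pvariance([row[j] for row in scores]) for j in range(n_items)]
    # Variance of each respondent's summed total score.
    total_var = pvariance([sum(row) for row in scores])
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    return n_items / (n_items - 1) * (1 - sum(item_vars) / total_var)


if __name__ == "__main__":
    # Toy 3-respondent, 2-item matrix (not study data).
    print(cronbach_alpha([[1, 2], [2, 1], [3, 3]]))
```

Because alpha depends only on variances, the −3 to +3 TOCS item coding can be used as-is: adding a constant to every item leaves the result unchanged.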
Given the unique distribution of the TOCS (i.e., it includes negative scores and has a much wider range of scores than many other screening tools), we examined the performance of three composite scores: the TOCS total score (sum of all item scores), the TOCS symptom count (number of items scored ≥2), and the TOCS maximum average (the highest of the six subscale average scores). Average scores were computed for all six TOCS subscales.
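The three composite scores defined above can be computed from a single parent report as follows. This Python sketch assumes ratings are stored in item order on the −3 to +3 scale and that the caller supplies the item-to-subscale key; the key in the usage example is a hypothetical 6-item toy, not the published TOCS scoring key, and the function name `score_tocs` is likewise hypothetical.

```python
from statistics import mean
from typing import Dict, Mapping, Sequence

# Item ratings run from -3 (far less often than average) to +3 (far more often).
VALID_RATINGS = range(-3, 4)


def score_tocs(
    ratings: Sequence[int],
    subscales: Mapping[str, Sequence[int]],
    symptom_threshold: int = 2,
) -> Dict[str, float]:
    """Return total score, symptom count, maximum subscale average, and subscale means.

    `subscales` maps each subscale name to the 0-based indices of its items.
    The real TOCS item-to-subscale key is not reproduced here; callers supply it.
    """
    if any(r not in VALID_RATINGS for r in ratings):
        raise ValueError("ratings must be integers from -3 to +3")
    subscale_means = {
        name: mean(ratings[i] for i in items) for name, items in subscales.items()
    }
    scores: Dict[str, float] = {
        "total": sum(ratings),  # summed total; may be negative
        "symptom_count": sum(r >= symptom_threshold for r in ratings),
        "max_average": max(subscale_means.values()),  # highest subscale mean
    }
    scores.update({f"mean_{name}": m for name, m in subscale_means.items()})
    return scores


if __name__ == "__main__":
    # Hypothetical 6-item, 3-subscale key used only to exercise the function.
    toy_key = {"a": [0, 1], "b": [2, 3], "c": [4, 5]}
    print(score_tocs([3, 2, 0, -1, 1, 1], toy_key))
```

For the toy call shown, the total is 6, two items reach the ≥2 threshold, and the highest subscale mean is 2.5 (subscale "a"). Keeping the key as a parameter rather than hard-coding it avoids presenting an invented item assignment as the real instrument.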
Pearson correlations were used to examine convergent and divergent validity using these TOCS indices within the OCD sample.
ROC analyses (Khan & Brandenburger, 2020) were used to identify the optimal cut-points for discriminating participants with a clinical diagnosis of OCD from controls. The area under the ROC curve (AUC) indicates overall discrimination accuracy, with higher values indicating better discrimination of cases from controls (AUC ≥0.90 indicates excellent discrimination and ≥0.80 good discrimination; Zhu et al., 2010). The optimal cut-point was the one on the ROC curve that maximized the Youden index (sensitivity + specificity − 1). ROC analyses were conducted for the TOCS total score, symptom count, and maximum average score. Cutoffs established in the discovery sample were then applied in the independent validation sample. For both samples, we report the sensitivity, specificity, and accuracy (overall probability of correct classification) of the TOCS at these cutoffs. We also examined whether the ROC results differed by age and gender, and whether they changed when the Hoarding factor was excluded. Additionally, we used ROC analyses to examine how well the TOCS discriminated between clinical groups (OCD, ADHD, ASD).
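The cut-point procedure described above can be sketched in a few lines. The Python example below is a minimal illustration, not the cutpointr package cited above: it computes a rank-based (Mann-Whitney) AUC, scans every observed score as a candidate cutoff in a discovery sample, keeps the cutoff with the largest Youden index, and then reports sensitivity, specificity, and accuracy for that cutoff in a validation sample. It assumes that a score at or above the cutoff is classified as OCD and that ties in the Youden index keep the lowest cutoff; both choices, the function names, and the toy scores are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def _check(cases: Sequence[float], controls: Sequence[float]) -> None:
    if not cases or not controls:
        raise ValueError("need at least one case and one control")


def auc(cases: Sequence[float], controls: Sequence[float]) -> float:
    """Rank-based AUC: probability that a random case outscores a random control.

    Ties count as half a win, which equals the Mann-Whitney U divided by n1 * n2.
    """
    _check(cases, controls)
    wins = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(cases) * len(controls))


def classify_metrics(
    cases: Sequence[float], controls: Sequence[float], cutoff: float
) -> Tuple[float, float, float]:
    """Sensitivity, specificity, and accuracy when score >= cutoff is called positive."""
    _check(cases, controls)
    tp = sum(s >= cutoff for s in cases)  # cases correctly flagged
    tn = sum(s < cutoff for s in controls)  # controls correctly not flagged
    sensitivity = tp / len(cases)
    specificity = tn / len(controls)
    accuracy = (tp + tn) / (len(cases) + len(controls))
    return sensitivity, specificity, accuracy


def youden_cutoff(cases: Sequence[float], controls: Sequence[float]) -> float:
    """Observed score that maximises J = sensitivity + specificity - 1.

    Ties in J keep the lowest cutoff (an arbitrary choice in this sketch).
    """
    _check(cases, controls)
    best_cut: Optional[float] = None
    best_j = -1.0
    for cut in sorted(set(cases) | set(controls)):
        sens, spec, _ = classify_metrics(cases, controls, cut)
        j = sens + spec - 1.0
        if j > best_j:
            best_cut, best_j = cut, j
    assert best_cut is not None  # guaranteed by _check
    return best_cut


if __name__ == "__main__":
    # Toy scores only; these are not data from the study.
    disc_cases, disc_controls = [5, 7, 8, 9, 3], [0, 1, 2, 4, -2]
    val_cases, val_controls = [4, 2, 6], [1, 3, 0]

    cut = youden_cutoff(disc_cases, disc_controls)
    print("discovery AUC:", auc(disc_cases, disc_controls))
    print("cutoff:", cut)
    print("validation sens/spec/acc:", classify_metrics(val_cases, val_controls, cut))
```

On the toy data, the discovery AUC is 0.96 and the selected cutoff is 3 (J = 0.8); a cutoff of 5 also reaches J = 0.8 but is not chosen because ties keep the lower value. Applied unchanged to the toy validation scores, the cutoff gives sensitivity, specificity, and accuracy of 2/3 each, mirroring the discovery-then-validation design described above.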
We report the AUC, which reflects the overall ability of the TOCS to discriminate OCD from ADHD and OCD from ASD. We also report the classification accuracy (overall probability of correct classification as OCD or non-OCD) at the previously identified cutoff scores. Lastly, we tested whether TOCS scores differed significantly by clinical group. Tukey HSD tests were used for post-hoc comparisons, and effect sizes are reported as Cohen's d.
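The text does not state which standard deviation was used for Cohen's d. The sketch below assumes the common pooled-SD form, d = (mean_a − mean_b) / s_pooled, where s_pooled combines the two groups' sample variances weighted by their degrees of freedom. The function name `cohens_d` and the toy data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance
from typing import Sequence


def cohens_d(group_a: Sequence[float], group_b: Sequence[float]) -> float:
    """Cohen's d using the pooled sample standard deviation (n - 1 denominators)."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (
        n_a + n_b - 2
    )
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


if __name__ == "__main__":
    # Toy TOCS-like totals for two groups (not study data).
    print(cohens_d([2, 4, 6], [1, 3, 5]))
```

For the toy groups, both sample variances are 4, so the pooled SD is 2 and a one-point mean difference gives d = 0.5. A positive d here means the first group scored higher.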

Confirmatory factor analysis in OCD sample
The previously identified six-factor model (Park et al., 2016) was Factor loadings are presented in Table S1. Model fit improved significantly when the model's constraints were relaxed using an exploratory structural equation modeling (ESEM) approach, which allows cross-loadings (see Appendix S1 and Table S2).
Latent factor correlations are presented in Table S3. Overall, all factors were positively correlated with one another. The strongest correlations were observed between the counting/checking, symmetry/order, superstition, and rumination factors. Cleaning/contamination and hoarding tended to have weaker correlations with the other factors.

Reliability in OCD sample
All TOCS subscales demonstrated acceptable to strong levels of internal consistency (Table S1). The TOCS total score demonstrated strong internal consistency, α = 0.91, ω = 0.95.

Convergent validity
TOCS total scores and subscales demonstrated small to moderate positive correlations with the CBCL-OCS, with the exception of hoarding which was not significantly related to the CBCL-OCS (

Age and gender differences in OCD sample
There were small negative correlations between age and TOCS scores but no significant gender differences in TOCS scores (Table 2).
We also explored whether age and gender interacted to predict TOCS scores. A significant age by gender interaction was found only for the TOCS max average score, β = −0.08, SE = 0.04, p = .03. For girls only, there was a negative association between age and TOCS max average scores, β = −0.09, SE = 0.03, p = .001. The association between age and TOCS max average scores among boys was nonsignificant, p = .61. A similar pattern was observed using the CBCL-OCS: for girls only, there was a negative association between age and CBCL-OCS scores, β = −0.27, SE = 0.09, p = .003.

ROC analyses in OCD sample
For the ROC analyses, the OCD and control samples were divided into a discovery sample (i.e., used to determine a cut-point) and a   (Table 3). Age and gender did not affect the cutoffs, nor did the exclusion of the hoarding dimension (see Appendix S1 and Table S4). The TOCS maintained an excellent level of diagnostic discrimination in the validation sample, with overall classification accuracy ≥85% across scoring approaches (Table 3).

Group comparisons across TOCS scores
Measurement invariance of the TOCS was examined before making group comparisons. The clinical groups and control sample were combined for invariance testing for a total sample size of N = 2362.
The TOCS demonstrated measurement invariance at the configural, metric, and scalar levels, indicating an equivalent factor structure, item loadings, and intercepts across groups (Table S5).
We examined whether TOCS total scores differed by clinical group using ANOVAs (Bonferroni-corrected p = .016). There was a significant effect of group on TOCS total raw score, F(2, subscales (except hoarding) the OCD group reported the highest subscale scores, followed by the ASD group, and then by the ADHD group, respectively (Table 4). As shown in Figure 1

All TOCS composite scores (i.e., TOCS total score, symptom count, maximum subscale average) demonstrated excellent ability to discriminate OCD cases from controls, highlighting the robustness of this measure. Indeed, the sensitivity and specificity of the TOCS are as strong as, if not somewhat stronger than, those of existing screening tools for OCD such as the CBCL-OCS (Nelson et al., 2001). Moreover, the TOCS captures a wider variety of OC traits than the CBCL-OCS, which was derived from a broader symptom measure and contains items that were not designed to assess OC symptoms specifically. Interestingly, the TOCS symptom count performed as well as the TOCS total score in discriminating OCD cases.
It is important to note that this does not negate the strengths/weaknesses design of the TOCS, as a wide distribution of scores remains important for genetic and cross-disorder research. Rather, it shows that the same measure can readily be used by both researchers and clinicians. Clinicians may wish to use the TOCS symptom count cutoff (two items scored ≥2) to screen for pediatric OCD, as this indicator is quick to score and is a highly sensitive and specific indicator of clinical levels of OC traits.
Moreover, the TOCS symptom count did not differ significantly by age or gender, which simplifies interpretation because the same cutoff of two symptoms can be used for all patients.

Note: Standard deviations are presented in brackets. All group means are significantly different at p < .001, with the exception of Hoarding scores, which did not significantly differ between the OCD, ADHD, and ASD groups.
Abbreviations: ADHD, attention-deficit/hyperactivity disorder; ASD, autism spectrum disorder; TOCS, Toronto Obsessive-Compulsive Scale.

OCD, ADHD, and ASD groups. Elevations on the hoarding factor alone may indicate that further assessment for OCD, ADHD, and ASD is warranted. Together, these results further support differentiating hoarding from OCD and are consistent with previous research highlighting the comorbidity between hoarding, ADHD, and ASD (Morris et al., 2016). Future research should consider the clinical utility of the TOCS hoarding dimension for discriminating clinical cases of hoarding disorder.
The current research is the first to evaluate the TOCS in a large pediatric clinic sample, and its findings should be considered in light of the following limitations. The TOCS has both parent- and self-report formats; however, we evaluated only the parent-report version.
Additional research is needed to establish the psychometric properties and clinical cut-offs of the self-report TOCS, which may be especially useful for older adolescents. Although we report several indices of diagnostic accuracy (e.g., sensitivity, specificity, AUC), cut-points for the TOCS were selected using the Youden index, which weights sensitivity and specificity equally and is therefore not sensitive to differences between them (Šimundić, 2009). Additionally, our research examined the discriminant validity of the TOCS only across OCD, ADHD, and ASD clinic samples. The degree to which the TOCS and its dimensions can discriminate between OCD and other comorbid psychiatric disorders, such as anxiety disorders or eating disorders, is unknown.
Moreover, the clinical groups in the current study were defined only by primary diagnosis, that is, the diagnosis that the assessing clinician felt best captured the patient's presenting concerns. In other words, we did not exclude participants from the ADHD or ASD groups who may have had comorbid OCD. While we believe this adds to the ecological validity of our findings, the discriminant validity of the TOCS may in fact be stronger with "pure" clinical groups. Additionally, we did not include very young children (i.e., <6 years old) in our sample. Given that many ASD diagnoses are assigned during the preschool years, the discrimination of ASD from OCD cases may differ in this age group. Lastly, the current research is cross-sectional and specifically examined OCD cases with current symptomatology (i.e., cases with a past history of OCD but no current symptoms were excluded). Future research may examine how TOCS scores vary over time.
In conclusion, our results build on previous research with community samples (Burton et al., 2018; Park et al., 2016) by demonstrating the strong psychometric properties of the TOCS in a clinical OCD sample. The parent-report TOCS is a reliable and valid measure with excellent diagnostic accuracy for identifying clinical levels of OC traits in pediatric clinic samples. Because the TOCS is both a meaningful tool for genetic research and a clinically valid rating scale, researchers and clinicians may be able to gather data simultaneously to understand the etiology of OCD and to inform clinical assessments.