The Autism Symptom Dimensions Questionnaire: Development and psychometric evaluation of a new, open‐source measure of autism symptomatology

To describe the development and initial psychometric evaluation of a new, freely available measure, the Autism Symptom Dimensions Questionnaire (ASDQ).

Autism spectrum disorder (ASD) is a heterogeneous set of conditions characterized by qualitative impairment in social interaction and communication and the presence of restricted and repetitive behaviors and interests. 1 Comprehensive and reliable measurement of autism symptoms is crucial for early identification, entry into clinical services, and ongoing longitudinal tracking, including evaluation of response to intervention. Informant report measures are also a key component of longitudinal tracking in many clinical and research settings. 2-4 However, the available informant report instruments measuring core autism symptoms are commercial measures (e.g. Social Responsiveness Scale, Second Edition; Autism Impact Measure), 2,5 are intended for screening within a narrow age band (e.g. Modified Checklist for Autism in Toddlers), 6 or focus on only one symptom domain (e.g. Repetitive Behavior Scale-Revised). 7 Commercial measures can add significant expense in clinical and research applications and are therefore ill-suited for frequent longitudinal tracking or the development of large sample cohorts. This problem particularly disadvantages research and clinical practice in low- and middle-income countries. 8 Measures such as the Autism Spectrum Quotient, 3 which is applicable only to cognitively able individuals, are difficult to implement in practice settings in which informant measures are completed before the appointment and estimates of the child's cognitive level are not available. Finally, many existing informant-completed measures have unclear construct validity, 9,10 require responses to a large number of items, which can be burdensome for parent caregivers (e.g. the Pervasive Developmental Disorder Behavior Inventory has 124 items in the standard set), use dichotomous responses (e.g. Social Communication Questionnaire [SCQ]), or were not expressly designed to assess Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-5) symptom criteria, which results in limited coverage of sensory symptoms.
This article describes the development and psychometric evaluation of a new open-source, freely available informant report measure, the Autism Symptom Dimensions Questionnaire (ASDQ). The ASDQ was created to address the limitations of existing measures by covering core autism symptoms in children and adolescents based on the DSM-5 symptom criteria. Initial work focused on the development and factor structure of a 33-item version. Using results from this initial work, the ASDQ was expanded to 39 items, with the goal of identifying specific social communication/interaction and restricted/repetitive behavior factors corresponding to DSM-5 criteria and recent factor analyses of other measures. 10-12 Using the revised 39-item version, the present psychometric evaluation aimed to (1) examine factor structure and (2) evaluate the psychometric properties of the revised scales, including measurement invariance, overall and conditional reliability, and convergent and discriminant validity. These analyses also explored the screening efficiency of the revised ASDQ against parent-reported ASD diagnoses, to determine the potential value of the measure in screening and diagnostic contexts.

Initial development
Initial development of the ASDQ focused on a 33-item version (see Appendix S1 for an extensive description of this work). The development process found that the 33-item measure had good scale reliability and strong convergent validity with an established autism symptom measure, but that additional work was needed to update the factor structure and further examine key psychometric characteristics. Based on the results of this work, the 33-item ASDQ was revised by adding six items intended to yield separate factors for the nine autism symptom dimensions identified in previous analyses of other screening measures, 10,11,13,14 including those shown in factor analyses to form separate factors within the DSM-5 criteria. Two of the newly added items assessed relationships (items 15 and 17), one evaluated sensory sensitivity (item 27), two focused on sensory interests (items 29 and 31), and one examined restricted interests (item 34) (Appendix S2). Psychometric evaluation focused on this revised instrument. Based on our initial factor analyses and the specific item content additions, a bifactor model with nine specific symptom factors plus a general ASD factor was hypothesized a priori to provide optimal fit.

Participants
Informants were recruited using the Prolific online data collection service (https://prolific.co/). Inclusion criteria for the Prolific panel were residence in the USA, having a dependent child aged 2 to 17 years, and informant proficiency in English. Adults were not included in this study because the intention was to develop and validate a measure for children and adolescents, and a narrower age range was desired for evaluating measurement invariance across age. Data were collected from 23rd June 2021 to 7th July 2021. Additional methodological details are included in Appendix S3.

Procedure
After completing informed consent electronically, caregiver informants completed demographic information and all questionnaires for the identified child. Compensation was provided after study completion based on the average expected completion time (US$10).

Statistical analyses
Factor structure
To identify the factor structure of the revised 39-item ASDQ, the sample was randomly divided into approximately equal exploratory (n = 734) and confirmatory (n = 733) subsamples. A series of exploratory structural equation models (ESEMs) were estimated. 18 The first model was a single-factor model. The second model included four specific factors and a general ASD bifactor, consistent with the initial analyses of the 33-item ASDQ. The next three models were 8-, 9-, and 10-factor ESEMs corresponding to variations in the a priori hypothesized model based on item content. The nine-factor model (model 9a), hypothesized a priori from the content of the items added to the revised instrument, included nine specific factors, one for each item content area, and a general ASD bifactor. Once the best-fitting ESEM was identified in the exploratory subsample and replicated in the confirmatory subsample, an equivalent confirmatory factor analysis (CFA) model for ordinal data 19 (model 9b) was estimated in both subsamples and the total sample. For this CFA model, each item was assigned to the specific factor on which it had its largest loading. Items 36 to 39 were assigned only to the general ASD factor because these items did not have a clear loading pattern in our preliminary analyses of the 33-item version; these items emphasize symptoms that are most relevant for older children with average or above-average cognitive ability, and they have the highest rates of nonresponse. Models were evaluated using standard structural equation model fit statistics together with considerations of interpretability and parsimony (Appendix S4).
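The random split and the item-to-factor assignment rule described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' code: the helper names `split_sample` and `assign_items`, the seed, and the loading values are hypothetical, and "largest loading" is taken here to mean the largest absolute loading.

```python
import random


def split_sample(ids, seed=2021):
    """Randomly split participant IDs into two near-equal halves.

    The first (exploratory) half receives the extra case when the count is odd.
    The seed is an arbitrary, hypothetical choice for reproducibility.
    """
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    half = (len(shuffled) + 1) // 2
    return shuffled[:half], shuffled[half:]


def assign_items(loadings, general_only=()):
    """Assign each item to the specific factor with its largest absolute loading.

    loadings: dict mapping item number -> list of specific-factor loadings
              (general-factor loadings excluded).
    general_only: items kept on the general factor only (e.g. items 36-39).
    Returns a dict mapping item number -> factor index, or None if general only.
    """
    mapping = {}
    for item, row in loadings.items():
        if item in general_only:
            mapping[item] = None
        else:
            mapping[item] = max(range(len(row)), key=lambda k: abs(row[k]))
    return mapping


# Hypothetical loadings on three specific factors, for illustration only.
example = {1: [0.62, 0.05, 0.10], 2: [0.08, 0.55, -0.12], 36: [0.20, 0.18, 0.15]}
print(assign_items(example, general_only={36}))  # {1: 0, 2: 1, 36: None}
explore, confirm = split_sample(range(1467))
print(len(explore), len(confirm))  # 734 733
```

With 1467 participants, this split reproduces the subsample sizes reported in the text (734 and 733).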

Measurement invariance
The CFA model derived from the best-fitting ESEM was used as the basis for the evaluation of measurement invariance across age groups (ages 2-4 years, 5-7 years, 8-12 years, and 13-17 years), sex (male, female), race (White, other), and ethnicity (Hispanic, non-Hispanic). Standard procedures were followed to evaluate fit across successively more restrictive models (see Appendix S4 for additional information). 20
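The actual decision rules are given in Appendix S4 and are not reproduced here. As a hedged illustration only, one widely used heuristic retains a more constrained model when the comparative fit index (CFI) drops by no more than 0.01 relative to the preceding, less constrained model. The sketch below applies that assumed rule to hypothetical CFI values; the function name and numbers are illustrative and are not results from this study.

```python
def highest_supported_level(fits, max_cfi_drop=0.01):
    """Walk a sequence of increasingly constrained models and stop at the first failure.

    fits: list of (label, cfi) ordered from least to most restrictive,
          e.g. configural -> loadings -> thresholds -> residual variances.
    max_cfi_drop: assumed heuristic cutoff for an acceptable drop in CFI.
    Returns the label of the most restrictive model that is still supported.
    """
    supported = fits[0][0]
    for (_, prev_cfi), (label, cfi) in zip(fits, fits[1:]):
        if prev_cfi - cfi > max_cfi_drop:
            break  # this constraint worsened fit too much; keep the previous level
        supported = label
    return supported


# Hypothetical CFI values, for illustration only.
fits = [("configural", 0.972), ("loadings", 0.970),
        ("thresholds", 0.966), ("residual variances", 0.962)]
print(highest_supported_level(fits))  # residual variances
```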

Reliability
Using the optimal factor solution, classical test theory reliability (internal consistency), 21 factor reliability, 22 and item response theory analyses 23 were conducted for each scale (Appendix S4). Reliability estimates of 0.70 to 0.79, 0.80 to 0.89, and 0.90 or greater were considered fair, good, and excellent, respectively. 24 Test-retest reliability and sensitivity-to-change data were not available.
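For readers less familiar with the internal consistency index, a minimal sketch of Cronbach's alpha and the descriptive bands above follows. It assumes complete item data; how the authors handled missing items is described in the appendices and is not modeled here, and the function names are hypothetical.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for complete data.

    rows: one list of k item scores per respondent.
    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(rows[0])
    item_variances = sum(variance(column) for column in zip(*rows))
    total_variance = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - item_variances / total_variance)


def label_reliability(value):
    """Descriptive bands from the text; a value of exactly 0.90 is treated as excellent."""
    if value >= 0.90:
        return "excellent"
    if value >= 0.80:
        return "good"
    if value >= 0.70:
        return "fair"
    return "below fair"


print(label_reliability(0.95), label_reliability(0.75))  # excellent fair
```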

Convergent and discriminant validity
Convergent and discriminant validity were evaluated using bivariate correlations. Specific predictions for strong and weak convergent and discriminant validity are presented in Appendix S4.
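A bivariate Pearson correlation and an approximate confidence interval based on the Fisher z transformation can be computed as below. This is a generic sketch; the function names are hypothetical, and the value of 0.45 is an illustrative correlation, not a specific result from this study.

```python
import math
from statistics import NormalDist


def pearson_r(x, y):
    """Pearson product-moment correlation for two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fisher_ci(r, n, level=0.95):
    """Approximate confidence interval for r via the Fisher z transformation."""
    z = math.atanh(r)
    se = 1 / math.sqrt(n - 3)
    crit = NormalDist().inv_cdf(0.5 + level / 2)
    return math.tanh(z - crit * se), math.tanh(z + crit * se)


# Illustrative r of 0.45 at the total sample size of 1467.
low, high = fisher_ci(0.45, 1467)
print(round(low, 2), round(high, 2))
```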

Preliminary screening efficiency
To explore possible screening efficiency, the sample was split into training (approximately 60% of the sample, n = 880) and testing (approximately 40% of the sample, n = 587) subsamples. 25 Two criteria were evaluated: ASD versus neurotypical plus other developmental disability (total sample) and ASD versus other developmental disability (at-risk-only sample). The first criterion estimates potential screening accuracy in primary care or population settings, and the second estimates potential screening accuracy in tertiary care settings where only at-risk cases are evaluated. These criteria were derived from informant-reported diagnoses, which showed very good correspondence (80.0% overall accuracy) with classifications based on an SCQ cut score of 12, a threshold previously shown to have good sensitivity and specificity. 26 To estimate potential screening value, nonparametric estimates of the area under the receiver operating characteristic curve quantified screening efficiency, using the ASDQ total item average score as the predictor. The predictive validity of the ASDQ total item average score was also evaluated by identifying the optimal cut score in the training subsample using Youden's J. These cut scores were then applied to the testing subsample to determine whether validity was maintained in a holdout sample. The total sample cut score was also examined in relevant subgroups (ages 2-4 years, female, other race, and Hispanic ethnicity) to determine whether predictive accuracy was maintained.
A rough guideline for evaluating area under the curve values is: less than 0.60, poor; 0.60 to 0.69, fair; 0.70 to 0.79, good; 0.80 to 0.89, excellent if the comparison group is clinically meaningful; and 0.90 to 1.00, exceptional only if the design and comparison are appropriate. 27
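The screening analysis above can be sketched as follows: a nonparametric (Mann-Whitney) area under the curve, a cut score chosen by maximizing Youden's J in training data, and that cut applied unchanged to holdout data. The scores, the function names, and the convention that scores at or above the cut screen positive are assumptions for illustration, not the authors' code or data.

```python
def auc_nonparametric(cases, noncases):
    """Nonparametric (Mann-Whitney) AUC: probability that a randomly chosen case
    scores higher than a randomly chosen noncase, counting ties as one half."""
    wins = 0.0
    for c in cases:
        for n in noncases:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cases) * len(noncases))


def sens_spec(cut, cases, noncases):
    """Sensitivity and specificity when scores >= cut are classified as positive."""
    sensitivity = sum(s >= cut for s in cases) / len(cases)
    specificity = sum(s < cut for s in noncases) / len(noncases)
    return sensitivity, specificity


def youden_cut(cases, noncases):
    """Observed score maximizing Youden's J = sensitivity + specificity - 1.
    Ties keep the lowest such cut."""
    best_cut, best_j = None, float("-inf")
    for cut in sorted(set(cases) | set(noncases)):
        sensitivity, specificity = sens_spec(cut, cases, noncases)
        j = sensitivity + specificity - 1
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j


# Hypothetical item-average scores, for illustration only.
train_cases, train_noncases = [3.0, 3.4, 2.8, 3.6], [1.5, 2.0, 2.9, 1.8]
cut, j = youden_cut(train_cases, train_noncases)
print(auc_nonparametric(train_cases, train_noncases), cut)  # 0.9375 2.8
# The training cut is then applied unchanged to holdout (testing) scores.
print(sens_spec(cut, [3.1, 2.6, 3.3], [2.2, 2.9, 1.7, 2.4]))
```

The double loop is quadratic in sample size, which is acceptable for samples of this size; a rank-based formula would scale better.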

Statistical power
The exploratory and confirmatory sample sizes (n = 734 and 733) were expected to be adequate for factor analyses with up to 39 indicators (>18 cases per indicator) 28 and for item response theory analyses of the total score and subscales. 29 For convergent and discriminant validity analyses, the total sample was expected to have excellent power (1−β > 0.96) for detecting at least small Pearson bivariate correlations (r ≥ 0.10, α = 0.05, two-tailed) and statistical power was expected to be excellent (1−β > 0.94) for detecting differences in dependent correlations as small as Δr = 0.09. Analyses estimating screening efficiency had excellent power (1−β ≥ 0.95) for detecting areas under the curve ≥ 0.660.
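The power figure for correlations can be approximated with the Fisher z normal approximation. This is a hedged sketch; dedicated power software may use exact methods and give slightly different values. For r = 0.10 and n = 1467, it gives power of approximately 0.97, consistent with the stated 1−β > 0.96. The function name is hypothetical.

```python
import math
from statistics import NormalDist


def power_correlation(r, n, alpha=0.05):
    """Approximate two-tailed power to detect a population correlation r
    against H0: rho = 0, using the Fisher z normal approximation."""
    normal = NormalDist()
    shift = math.atanh(r) * math.sqrt(n - 3)  # expected z statistic under H1
    crit = normal.inv_cdf(1 - alpha / 2)
    return normal.cdf(shift - crit) + normal.cdf(-shift - crit)


print(round(power_correlation(0.10, 1467), 3))
```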

Participant characteristics
The sample included 1467 children (aged 2-17 years; Table 1). Biological mothers predominated as informants. Informants were older and slightly less educated in the groups with developmental disability and ASD. Household income was lower in the group with developmental disability and lowest in the group with ASD relative to neurotypical children. Children were older in the groups with developmental disability and ASD and, consistent with ASD prevalence, the sex ratio was approximately 3:1 in the group with ASD but more balanced in the neurotypical and developmental disability groups. The sample was predominantly White (85%), but Black (12.0%) and Hispanic (11.5%) groups were well represented. The most frequently reported non-ASD diagnoses in the groups with ASD and developmental disability were speech/language disorder, attention-deficit/hyperactivity disorder (ADHD), and anxiety. Depression and specific learning disability were also common in the group with developmental disability. Details on the age and language level distribution and missing data handling are provided in Appendix S5.

Factor structure
In the exploratory and confirmatory samples, optimal fit was identified for the a priori hypothesized ESEM solution with nine specific symptom factors plus a general bifactor (Table S1). Specific factors were readily interpreted as basic social communication (seven items), affiliation (three items), perspective taking (four items), peer relationships (three items), repetitive motor behavior (four items), sensory interests (three items), insistence on sameness (four items), sensory sensitivity (three items), and restricted interests (four items) (Table 2 and Figure S1). For this model, the nine specific symptom factors had primary loadings that were consistent with the model hypothesized a priori (Table S2). All items had significant loadings on the general ASD factor. Factor correlations indicated mostly small-to-medium residual relationships, with the strongest relationships being between repetitive motor behavior and sensory interests (r = 0.59) and between insistence on sameness and sensory sensitivity (r = 0.48). Correlations based on scale scores followed the same pattern but were larger in magnitude because these reflect relationships without partitioning variance due to the general ASD factor (Table S3).
The CFA model (model 9b) fitted adequately even though minor item cross-loadings were not estimated (Table S1). This model was used to evaluate measurement invariance, model reliability, and variance accounted for by the specific and general ASD factors. Using this model, the general ASD factor explained 42% of the common variance, whereas the specific factors accounted for smaller amounts of common variance (basic social communication = 6%, affiliation = 6%, perspective taking = 2%, peer relationships = 5%, repetitive motor behavior = 9%, sensory interests = 7%, insistence on sameness = 7%, sensory sensitivity = 6%, restricted interests = 10%).
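The common variance percentages above follow the usual bifactor bookkeeping: each factor's share equals its sum of squared standardized loadings divided by the total across all factors, and the general factor's share is the explained common variance (ECV). The sketch below assumes that definition and uses hypothetical loadings rather than the model 9b estimates; the function name is illustrative.

```python
def common_variance_shares(general, specific):
    """Share of common variance explained by each factor in a bifactor model.

    general: standardized general-factor loadings for all items.
    specific: dict mapping specific-factor name -> standardized loadings of its items.
    Each factor's share = its sum of squared loadings / total sum of squared loadings.
    The general factor's share is the explained common variance (ECV).
    """
    general_ss = sum(loading ** 2 for loading in general)
    specific_ss = {name: sum(loading ** 2 for loading in ls) for name, ls in specific.items()}
    total = general_ss + sum(specific_ss.values())
    shares = {"general": general_ss / total}
    shares.update({name: ss / total for name, ss in specific_ss.items()})
    return shares


# Hypothetical loadings for four items and two specific factors, for illustration only.
shares = common_variance_shares([0.7, 0.7, 0.6, 0.6], {"A": [0.4, 0.4], "B": [0.3, 0.3]})
print({name: round(v, 2) for name, v in shares.items()})
```

As in the reported results, the shares sum to 1 across the general and specific factors.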

Measurement invariance
The a priori hypothesized bifactor CFA model, with nine specific symptom factors and a general ASD factor, 33 showed evidence of measurement invariance of factor loadings, thresholds, and residual variances (strict invariance) across sex, age, race, and ethnicity (Table S4).

Reliability
Model reliability was high for the general ASD factor (ω = 0.97) and specific factors (ω ≥ 0.85). Internal consistency reliability was excellent for the total scale (α = 0.95) and at least adequate for all subscale scores (α ≥ 0.75; Table S5). Conditional reliability estimates indicated excellent reliability (≥ 0.90) for the total ASD scale from very low (θ [trait level] = −2.1) to extremely high (θ = +4.0) scores. Adequate or better reliability (≥ 0.70) was present for subscale scores in the range from low average (θ = −1.0) to very high scores (θ = +2.6) (Figure 1). Theta scores showed strong correlations with ASDQ total scores (r = 0.97) and subscales (r = 0.92-0.98). The revised 39-item ASDQ showed a relatively normal score distribution (Figure S2). Coupled with strong reliability through very low scores, this suggests that the ASDQ total score adequately reflects continuous variation in autism symptom levels through much of the general population.

TABLE 2 ASDQ factor-item mapping and relationship to the DSM-5 criteria
a The ASDQ factor basic social communication includes both verbal and nonverbal elements of communication, which is consistent with previous factor analyses suggesting that this DSM-5 symptom criterion (A2) may be too narrow.
b Affiliation item content is consistent with previous factor analyses identifying a social motivation factor.
c The perspective taking factor includes some overlap in content with the DSM-5 A1 reciprocity criterion, such as the back and forth of interaction/offering comfort, but also focuses more on the underlying cognitive problem that likely drives problems with reciprocity.
d The peer relationships factor is narrower than the DSM-5 A3 relationships criterion because it focuses mostly on peer relationships and interactions, although item 16 also includes content regarding seeking any playful interactions.
e ASDQ restricted/repetitive behavior factors tend to show strong correspondence with the DSM-5 restricted/repetitive behavior symptom criteria, with the notable exception that DSM-5 criterion B4 is split into the two types of sensory symptoms.
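The conditional reliability curves reported above come from item response theory analyses of ordinal items. As a hedged sketch, assuming Samejima's graded response model and the common conversion of test information I(θ) to reliability as I / (I + 1) for a standard normal trait, the calculation looks like this. The item parameters are hypothetical, and the authors' software and exact conversion may differ (another convention uses 1 − 1/I).

```python
import math


def cumulative_prob(theta, a, b):
    """Graded response model boundary curve: P(X >= category) at trait level theta."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def grm_item_information(theta, a, thresholds):
    """Samejima graded response model item information at theta.

    a: discrimination; thresholds: increasing boundary locations b_1..b_{K-1}
    for an item with K ordered response categories.
    """
    # Boundary probabilities padded with 1 (lowest category) and 0 (above highest).
    cum = [1.0] + [cumulative_prob(theta, a, b) for b in thresholds] + [0.0]
    info = 0.0
    for k in range(len(cum) - 1):
        p_k = cum[k] - cum[k + 1]  # probability of responding in category k
        # Derivative of p_k with respect to theta; the padded ends contribute 0.
        dp_k = a * (cum[k] * (1 - cum[k]) - cum[k + 1] * (1 - cum[k + 1]))
        if p_k > 0:
            info += dp_k ** 2 / p_k
    return info


def conditional_reliability(theta, items):
    """Test information converted to reliability as I / (I + 1),
    assuming a standard normal latent trait (one common convention)."""
    total_info = sum(grm_item_information(theta, a, b) for a, b in items)
    return total_info / (total_info + 1.0)


# Hypothetical 5-category items (a = 2.0, four thresholds each), for illustration only.
items = [(2.0, [-1.0, 0.0, 1.0, 2.0])] * 39
for theta in (-2.0, 0.0, 2.0):
    print(theta, round(conditional_reliability(theta, items), 3))
```

With a single threshold, the item information reduces to the familiar two-parameter logistic form a²P(1 − P), which is a useful sanity check.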

Convergent and discriminant validity
ASDQ total scores showed strong correlations with SCQ and Strengths and Difficulties Questionnaire total scores and with reported ASD diagnosis (Table 3; see Table S6 for the SCQ sample characteristics). Moderate correlations (r = 0.39-0.50) were observed between the ASDQ and Strengths and Difficulties Questionnaire internalizing and externalizing, executive functioning, and adaptive functioning measures. The expected patterns for convergent and discriminant validity were observed.

Potential screening accuracy
Potential screening accuracy was very good (89.6%-94.9%; Table S8), with areas under the curve of 0.927 in all cases and 0.876 in at-risk cases (Figure 2). The optimal cut-off point (Youden's J) was 2.7 for screening in a primary care or population context (sensitivity = 0.88, specificity = 0.91) and 2.9 for screening in a tertiary care setting (sensitivity = 0.83, specificity = 0.89). Scores greater than 3.2 had high specificity: they occurred in just over half of cases with ASD (sensitivity = 55.8%) but in only 5.4% of participants with developmental disability and 0.3% of neurotypical participants (specificity = 98.4%; positive likelihood ratio = 34.9). Accuracy remained high across the ages 2 to 4 years, female, other race, and Hispanic subgroups, with some variation across subgroups (87.3%-92.9%). Potential screening accuracy in the at-risk sample was also good (80.2%).
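The likelihood ratio reported for scores above 3.2 follows directly from the sensitivity and specificity. A minimal check, with a hypothetical helper name:

```python
def likelihood_ratios(sensitivity, specificity):
    """Positive and negative likelihood ratios for a dichotomous screening result."""
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


# Values reported above for item-average scores greater than 3.2.
lr_pos, lr_neg = likelihood_ratios(0.558, 0.984)
print(round(lr_pos, 1))  # 34.9
```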

DISCUSSION
The ASDQ is a brief (39-item) informant report measure of core autism symptoms informed by the DSM-5 symptom criteria and recent factor analyses of autism symptom measures. Initial work developed a 33-item version with good reliability and construct validity. However, coverage of specific autism symptom constructs was limited. Based on this, six items were added to improve construct coverage, and a comprehensive psychometric evaluation of the revised instrument was conducted. This evaluation focused on factor structure, measurement invariance, classical test theory scale reliability and item response theory conditional reliability, convergent and discriminant validity, and exploration of potential screening validity.

FIGURE 1 Item response theory-derived conditional reliability across the latent trait for the total ASD scale and subscales. Abbreviations: ASD, autism spectrum disorder; BSC, basic social communication; AF, affiliation; PT, perspective taking; PR, peer relationships; RM, repetitive motor behavior; SI, sensory interests; SS, sensory sensitivities; RI, restricted interests.
Results indicated that the 39-item ASDQ is a psychometrically sound instrument with several positive psychometric properties, suggesting that it is a highly promising measure for use in both research and clinical practice. Specifically, the ASDQ had a clear and replicable factor structure, consistent with previous multi-instrument analyses; 10,11 good measurement equivalence across age, sex, race, and ethnicity; convergent validity with established measures of autism symptoms and traits; and discriminant validity evidence among ASDQ subscales and with measures of related but distinct constructs and other developmental diagnoses. The ASDQ also showed preliminary evidence of utility in informing the screening process. If replicated in future work, the ASDQ would represent a major advance for the measurement of autism symptoms because no freely available, open-source informant report measure with comparable coverage of ages and cognitive levels currently exists. However, it is crucial that future replication studies over-recruit at younger ages and include a sufficient sample of individuals with cognitive and speech/language impairment to ensure that factor structure and screening efficiency are maintained in these subgroups.
The revised 39-item ASDQ had a highly differentiated factor structure that replicated well across exploratory and confirmatory subsamples. The final model included four social communication/interaction symptom factors, five restricted and repetitive behavior symptom factors, and a general ASD factor. These factors represent specific types of social communication/interaction and restricted and repetitive behavior symptoms, and these distinctions have been found in previous research. 10,11,34,35 To gain greater understanding of the well-recognized variety and heterogeneity of autism presentations, it is crucial to have a measure that can capture these subdomains. For example, using the ASDQ, future etiological studies will be better equipped to identify which etiological factors relate to specific core ASD symptoms. Having nine specific symptom factors that correspond to the DSM-5 criteria (with affiliation and peer relationships assessing A3 and sensory interests and sensory sensitivities evaluating B4) is also useful for helping clinicians understand whether a child might be exhibiting symptoms that meet the diagnostic threshold. Fine-grained measurement of specific symptom areas may also facilitate identification of symptom subgroups, enable more detailed longitudinal tracking, and provide useful secondary outcomes for clinical trials (see Appendix S6 for additional discussion of the ASDQ findings), although it will be important to examine factor structure in narrower age bands and functional levels to avoid inflation of inter-item correlations that may occur in very broad samples.
Convergent and discriminant validity evidence indicates that the ASDQ measures core autism symptoms, with lower correlations with measures of other psychopathologies. This finding is important because it suggests that ASDQ scores reflect only the anticipated influence of other forms of psychopathology, which is expected given the level of comorbidity seen in ASD. 36 Future research is needed with other autism measures and with measures of other types of psychopathology and clinical factors (e.g. IQ, language) to better understand ASDQ measurement specificity, determine whether adjustment for other factors will produce a more distinct measure of autism symptoms, and further explore convergent and discriminant validity.
Although an exploratory aspect of this study, the present results suggest that the ASDQ has good potential to inform screening. Accuracy was maintained across relevant subsamples, including young children (aged 2-4 years), females, other (non-White) races, and Hispanic ethnicity. Intriguingly, the ASDQ item average also performed comparably to statistical learning methods, producing roughly equivalent accuracy in most analyses. Although additional work is needed in larger and more diverse samples of young children, including children who are referred but not yet diagnosed, these findings suggest that the ASDQ total item average may be a sufficient approach to inform screening and diagnostic differentiation across demographic groups.

Limitations and future directions
The primary limitations of this study were having only one validated measure of psychopathology for evaluating discriminant validity, a limited number of items per ASD subscale, reliance on informant-reported clinical diagnoses of ASD and other developmental conditions from previously diagnosed individuals, and small subsamples across age and other demographic and clinical factors, with a very limited sample size below age 5 years, which precluded more detailed exploration of the potential screening utility of the instrument. Future work should include a larger sample of undiagnosed children aged 2 to 4 years, including stratification according to speech and language level and intellectual disability, to better assess the potential of the ASDQ as a screening tool. This will provide a more stringent test by focusing on undiagnosed individuals and will assist in determining whether modifications are needed to accurately differentiate cases with and without ASD in younger and more impaired samples. However, it is important to note that decreased reliability of informant-reported ASD diagnoses is more likely to attenuate validity estimates than to inflate them. If sufficiently validated in future investigations, the ASDQ could be used in an evidence-based assessment capacity by a developmental pediatrician or child neurologist to screen for possible risk for ASD. For example, a clinician seeing a 4-year-old male child with no family history of ASD, but with concerns of social deficits, could use the ASDQ to inform the risk of ASD. A total item average score greater than 3.2 would warrant priority referral for tertiary care evaluation, with the probability of ASD moving from 3.7% (population estimate) to approximately 57% (elevated concern). Additionally, assuming good test-retest reliability and sensitivity to change are demonstrated in future research, the ASDQ could also be used after diagnosis to monitor autism symptom improvement during intervention.
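The change in probability in this clinical example follows from Bayes' theorem in odds form: pretest odds multiplied by the likelihood ratio give post-test odds. The sketch below reproduces the arithmetic using the 3.7% population estimate and the likelihood ratio of 34.9 reported for scores above 3.2; the function name is hypothetical.

```python
def post_test_probability(pretest_probability, likelihood_ratio):
    """Update a pretest probability with a likelihood ratio via Bayes' theorem in odds form."""
    pretest_odds = pretest_probability / (1 - pretest_probability)
    post_odds = pretest_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# 3.7% population estimate and the likelihood ratio of 34.9 for scores above 3.2.
print(round(post_test_probability(0.037, 34.9), 2))  # 0.57
```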
The ASDQ factor structure requires replication in large ASD and developmental disability samples; several specific factors and scales had only a few items with high loadings. Although the resulting scales showed good overall and conditional reliability, content coverage of these domains is limited. Future instrument revisions may consider adding a small number of items to enhance content coverage for the weakest scales, particularly affiliation and sensory sensitivity. In the meantime, researchers investigating these domains with the ASDQ may supplement it with additional measures, such as the Strengths and Difficulties Questionnaire 14 and the Dimensional Assessment of Repetitive Behavior, 37 that enhance coverage of key symptom domains.
Additional studies of convergent and discriminant validity with established criterion standard autism diagnostic instruments and informant report questionnaires are also needed, as well as studies of test-retest reliability and sensitivity to change. Investigations are also needed to examine different time intervals for recall to assess whether specific, shorter intervals maintain good validity for screening efficiency while also maximizing sensitivity to change.
A few ASDQ items are applicable only to individuals who use speech or to older children. For example, items on restricted interests and high-level perspective taking are often not relevant for children younger than 3 years. The approach of allowing informants to decide when a rating could not be provided is useful because most items were responded to; when items were rated not applicable, the instrument could still be scored for nearly all participants. Future large-sample studies are needed to confirm this observation and identify whether loss of validity occurs for the total score or specific subscales due to nonresponse. Given the reading grade level of 7.9 and the potential screening use of the ASDQ, future instrument revisions may also consider attempting to decrease the reading level to ensure that a broader swathe of parent informants can interpret the items and provide accurate ratings.

FIGURE 2 Receiver operating characteristic curves depicting the prediction of ASD diagnosis in the test subsample using the revised 39-item ASDQ total item average score, in all cases (a) and in at-risk (b) cases. Abbreviations: ASD, autism spectrum disorder; ASDQ, Autism Symptom Dimensions Questionnaire; AUC, area under the curve.
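Because "not applicable" ratings are allowed, the total item average can be computed over answered items only. The sketch below illustrates that idea under an explicit assumption: the minimum number of answered items (`min_answered`) is a hypothetical threshold, not a published ASDQ scoring rule, and the function name is illustrative.

```python
def item_average(responses, min_answered=30):
    """Mean of the answered items, ignoring 'not applicable' responses (None).

    min_answered is a hypothetical minimum; the ASDQ's actual scoring rule for
    nonresponse is not specified here. Returns None when too few items were answered.
    """
    answered = [r for r in responses if r is not None]
    if len(answered) < min_answered:
        return None
    return sum(answered) / len(answered)


# 39 hypothetical ratings with four items marked not applicable.
ratings = [3] * 20 + [4] * 15 + [None] * 4
print(item_average(ratings))
```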
Modestly sized developmental disability and neurotypical control samples are a further limitation of this study. Future studies with well-characterized neurotypical and developmental disability controls are needed to better understand ASDQ score distributions in these groups and provide more accurate estimates of screening and diagnostic validity. Another important next step will be to collect normative data on the ASDQ to compute standard scores and inform clinical interpretation. The present data suggest that very few typically developing and developmental disability controls have item average scores above 3.2 and very few participants with ASD have item average scores below 2.7 (5-point Likert scale), further reinforcing the screening potential of the ASDQ. However, large-sample normative data coupled with a large ASD sample are needed to develop multilevel likelihood ratios and related algorithms to use the ASDQ in an evidence-based medicine fashion to inform screening and clinical diagnosis.
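Multilevel (stratum-specific) likelihood ratios divide the proportion of cases falling in each score band by the proportion of noncases in the same band. The sketch below shows the calculation with bands built around the 2.7 and 3.2 cut points; the counts are entirely hypothetical and stand in for the normative and ASD data that the text notes are still needed. The function name is illustrative.

```python
def multilevel_likelihood_ratios(case_counts, noncase_counts):
    """Stratum-specific likelihood ratios for score bands.

    Each band's LR = (proportion of cases in the band) / (proportion of noncases in the band).
    Both dicts must share the same band labels; a band with no noncases gets inf.
    """
    n_cases = sum(case_counts.values())
    n_noncases = sum(noncase_counts.values())
    ratios = {}
    for band in case_counts:
        case_share = case_counts[band] / n_cases
        noncase_share = noncase_counts[band] / n_noncases
        ratios[band] = case_share / noncase_share if noncase_share > 0 else float("inf")
    return ratios


# Hypothetical counts per item-average band, for illustration only.
cases = {"< 2.7": 10, "2.7-3.2": 40, "> 3.2": 50}
noncases = {"< 2.7": 180, "2.7-3.2": 15, "> 3.2": 5}
lrs = multilevel_likelihood_ratios(cases, noncases)
print({band: round(lr, 2) for band, lr in lrs.items()})
```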
In summary, the present data provide preliminary evidence that the ASDQ, a free informant report measure of core autism symptoms, may be useful for screening and treatment monitoring, particularly if validated in larger at-risk and ASD samples, including those where ASD has not yet been diagnosed or where concern for ASD is not a predominant presenting problem. Despite its relative brevity, the ASDQ offers comprehensive and reliable assessment of key domains of the core ASD phenotype, consistent with the DSM-5 ASD criteria. The instrument not only has potential to inform identification but also has sufficient measurement precision to track individual differences from very low to very high symptom levels. If the present results are replicated, the ASDQ has strong potential to be widely adopted in future research and clinical practice.

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available from the corresponding author upon reasonable request.

SUPPORTING INFORMATION
The following additional material may be found online:
Table S1: Model fit indices from factor analyses of the revised 39-item ASDQ in the exploratory and confirmatory subsamples
Table S2: Standardized factor loadings for the nine-factor exploratory structural equation model with a general ASD bifactor computed using all 39 items from the revised ASDQ
Table S3: Factor and scale correlations from the ASDQ in the total revised 39-item ASDQ sample
Table S4: Measurement invariance model comparisons for the revised 39-item ASDQ across sex, age, race, and ethnicity
Table S5: Reliability statistics for the revised 39-item ASDQ scales
Table S6: Demographic and clinical characteristics of individuals from the KidsFirst subsample who completed the revised 39-item ASDQ and SCQ
Table S7: Bivariate correlations between ASDQ and SCQ subscales
Table S8: Prediction of ASD in the total and the at-risk samples using all 39 of the revised ASDQ items
Figure S1: Exploratory bifactor model with nine specific factors and a general ASD factor in the revised 39-item ASDQ sample
Figure S2: Revised 39-item ASDQ total item average distribution by diagnostic group (neurotypical, developmental disability, and ASD)