Sensory reactivity assessment in children: A systematic review

To identify sensory reactivity assessments published in the literature for children aged 3 to 12 years and evaluate their psychometric properties to select the most appropriate one for adaptation to South Africa, with implications for other low‐ and middle‐income countries.

0][21][22] High prevalences of sensory reactivity difficulties have been reported in ASD 20 and in fetal alcohol syndrome, which is a common condition in South Africa. 23 Prevalence rates of sensory reactivity difficulties for these two conditions can be up to 90%. 17,20
Although sensory reactivity difficulties were initially identified in children, they have increasingly been reported in adults, adversely affecting mental health and well-being, and have been linked to several mental health issues such as schizophrenia, depression, and attention-deficit/hyperactivity disorder. 24,25
Many terms are currently used in the literature to refer to sensory reactivity, such as sensory modulation and sensory processing. In this review, the term sensory reactivity will be used for the following reasons. First, sensory reactivity difficulties were included for the first time as a diagnostic criterion in the Diagnostic and Statistical Manual of Mental Disorders, Fifth Edition (DSM-V), 5 with the terms sensory hyporeactivity and sensory hyperreactivity used as part of the diagnostic criteria for ASD, under the domain of Restricted, Repetitive Patterns of Behaviour, Interests, or Activities. This has led to increased use of the term sensory reactivity by professionals in the field and in the more recent literature. 4,26 Second, the sensory reactivity assessment identified through this systematic review will be adapted for use in South Africa, where sensory reactivity is the recommended term.
Sensory reactivity assessments for children most frequently take the form of proxy-reported questionnaires completed by a parent, caregiver, or teacher. 27 Respondents rate aspects of the child's behaviour that may relate to sensory reactivity difficulties. The most frequently used questionnaires that evaluate sensory reactivity are the Sensory Profile suite of tests 28 and the Sensory Processing Measure (SPM), 29 both of which comprise caregiver and teacher questionnaires. 30 Another form of assessment is the performance-based assessment. There are, however, few of these available; they are not easily accessible and are seldom used by practitioners. Several performance-based assessments are in the developmental stage. 3][34] Neurophysiological measures have also been recommended. 30,357][38][39] These are costly, require expensive and specialized equipment and technical staff not readily available to occupational therapists, and have therefore been used predominantly for research purposes. Most sensory reactivity questionnaires have been developed and normed in the USA. 30]40
Several scoping and systematic reviews have been published in the field of assessment of sensory processing, although a variety of definitions were used for the construct. The reviews have all been published in the past 11 years, with the first being a systematic review published in 2012. 41 The population in four studies had a diagnosis of ASD. 35,42,44,45 One used a population of children born preterm. 45 The age of the population of interest varied between reviews, ranging from birth to adulthood. One focused on birth to 2 years 41 and one on adolescents and adults. 44 Two studies included children covering a similar age range to this systematic review, with one exploring the population 3 to 11 years old 30 and the second birth to 14 years. 46 Both reviews explored the construct of sensory processing, which they defined to include sensory perception, sensory motor skills, praxis, and sensory reactivity. Four reviews included an evaluation of psychometric properties, although many provided limited information on these. None used the COSMIN (COnsensus-based standards for the Selection of health Measurement INstruments) methodology and none was rigorous in its evaluation of measurement properties. There was thus no systematic review specifically examining the measurement properties of sensory reactivity assessments in childhood in the general population.
This systematic review was the first stage of a larger, three-stage study. The overarching aim of the study was to identify an existing assessment of sensory reactivity with high reliability and diagnostic validity for adaptation and evaluation for clinical use in South Africa. The aim of this systematic review was to identify sensory reactivity assessments published in the literature for children aged 3 to 12 years and evaluate their psychometric properties to select the most appropriate one for adaptation to South Africa, with implications for other low- and middle-income countries. The review questions were as follows. What tests have been published in the literature in the past 30 years to evaluate sensory reactivity with children aged 3 to 12 years? What is the validity and reliability of the tests reported in the studies? This information then informed the choice of a sensory reactivity assessment for adaptation so that it was contextually appropriate for South Africa.

What this paper adds
• Nineteen tests for assessing sensory reactivity in children were identified.
• An extensive evaluation of the assessments' psychometric properties is provided.
• There was limited evidence on psychometric properties in the identified studies.
• The Sensory Processing 3-Dimensions Scale was identified as the most reliable and valid assessment.

METHOD
The review was registered on the international prospective register of systematic reviews (PROSPERO, number CRD42021234460). The three-stage study received ethical approval from Stellenbosch University Human Research Ethics Committee, reference number S21/02/029 (PhD). 8][49][50] This methodology was used as it provided not only for the systematic review process specific to assessments, but also for a rigorous and detailed system of evaluating measurement properties, which supported the aim of this review. The COSMIN developers suggest the methodology can also be used with some adaptations for evaluating other types of assessment such as performance-based ones. 50 The 10 steps for conducting a systematic review as outlined by the COSMIN methodology are represented in the flow chart in Figure S1, and these informed the methodological framework of the systematic review.

Search terms
The COSMIN guidelines were used to aid the selection of search terms. 50,51 These consisted of three elements (Table 1). Two filters developed by the COSMIN researchers were added to these three fields: a search filter for finding studies on measurement properties of measurement instruments and an exclusion filter. 51

Eligibility criteria
The only limiter set was for publication date, specifying 1990 to 2020. The start date was selected for two reasons. First, most research studies on measurement properties of measurement instruments have only been published since 1990. 51 Second, a literature review identified no articles on the subject in the 1980s. 52 Another literature review identified the first published use of the term sensory modulation (previously used to refer to sensory reactivity) in 2009. 53 No limits were set for countries or contexts, to identify all articles potentially relevant for adaptation. In addition, no limits were set on language, as recommended by the COSMIN researchers. 50 No limits were set for the types of children (typically and atypically developing) to be included, so as not to exclude potentially useful information. Animals were not specified as an exclusion, as the population field (child OR preschooler) implied humans.

Literature search
The literature search was conducted between 10th March 2021 and 28th June 2021 with a follow-up final search done on 11th November 2022. The selection of databases to be searched was guided by the aim of the review to identify sensory reactivity assessments used with children. An electronic search was conducted of the following databases: PubMed, Scopus, OTseeker, Web of Science, EBSCOhost, SACat, WorldCat, and Cochrane Library. This list of databases was comprehensive, and likely to identify all the relevant articles to achieve the aim of the systematic review. The search terms specific to PubMed can be found in Appendix S1. These were adapted where necessary for other databases. Reference lists from the identified articles were followed to source additional relevant publications. Peer-reviewed articles from the Sensory Processing Disorder Foundation and the Collaborative Leadership in Ayres Sensory Integration were sourced from their websites. All identified articles were imported into Mendeley (https://www.mendeley.com/) for deduplication. The remaining articles were then uploaded to Rayyan, an online software systematic review screening programme (https://www.rayyan.ai/). A further de-duplication in Rayyan was conducted. All articles in a foreign language were identified, totalling 40 publications. Most of these articles had English abstracts available online. For those that did not have an English version, the abstract was translated using the online translation tool Google Translate for level one screening. There was one foreign language article in level two (full-text) screening that required a full-text translation. Google Translate was unable to translate this, and it was translated using another online translation tool, Reverso (https://www.reverso.net/text-translation).

TABLE 1 COSMIN guidelines for search terms.

Conceptual framework of sensory reactivity
Various terms are used in the literature to refer to sensory reactivity, and all these terms were included in the search.

Population: Children 3 years 0 months-12 years 11 months
Most caregiver sensory reactivity questionnaires for children cover a similar age range. 28,29 In infants and toddlers, sensory reactivity is harder to distinguish from other regulatory disorders, such as eating and sleeping disorders; however, by 3 years there has been sufficient maturation of these systems, making it easier to identify sensory reactivity. 38,99,100 Sensory reactivity difficulties tend to become more evident when children enter a structured educational environment such as a pre-primary school, where behavioural expectations are higher and there are frequently more children and more environmental stimuli. This makes sensory reactivity difficulties easier to identify. 100 Assessments with an age range greater than specified were included if the specified age range of this review was part of the age range of the identified assessment.

Caregiver questionnaire and performance-based assessment
Best-practice recommendation is to use both types to assess sensory reactivity. 4,31,32 Neurophysiological tests were excluded as costly and impractical.

Abbreviation: COSMIN, COnsensus-based standards for the Selection of health Measurement INstruments.

Exclusion criteria
The selection of articles was guided by the following criteria.
(1) The aim of the study should be the evaluation of one or more measurement properties, or the test development (to rate content validity), or the evaluation of test interpretability. 50
(2) Studies where the assessment was used only as an outcome measure were excluded, including where the assessment was used in a validation study of another assessment, as these provide limited, indirect evidence of measurement properties of the assessment. 50
(3) Only articles evaluating assessments in their original form were included. Studies of assessments that had been adapted for another country or culture were excluded so as not to duplicate results. Furthermore, the purpose of our study was a test adaptation that needed to be done from the original version, not an adapted version.
(4) Systematic reviews and meta-analyses were excluded, as they would duplicate results and inflate the evidence base. 44
(5) Only peer-reviewed articles were selected to ensure high-quality studies. Grey literature, reports, conference proceedings, test manuals, and information on websites, etc. were excluded, as they had not been peer-reviewed.
(6) Only full-text articles were included, as abstracts would not contain sufficient information on the quality of the study and the measurement properties being evaluated. 50

Screening of articles
The articles were screened in two steps: a title/abstract screening (level one) and a full-text screening (level two). Level one screening of the article titles and abstracts was done by the first author (AFW). After this was completed, a random selection of 50 of the identified articles was made and these were reviewed independently by the second author (LGC). The percentage of agreement was calculated to check for interrater reliability between the two reviewers. There was 92% agreement. The disagreement on the screening decisions for four articles was discussed and agreement reached. Where the screening decision was not clear from the titles and abstracts alone, these articles were included in the level two screening for a full-text review. For each article that was excluded, AFW indicated the reason for exclusion.
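The agreement figure reported above can be reproduced with a short calculation. The following is a minimal sketch in Python, not the authors' procedure: the function name `percentage_agreement` and the two decision lists are hypothetical, constructed only so that 50 articles contain four disagreements, as described in the text.

```python
def percentage_agreement(decisions_a, decisions_b):
    """Percentage of items on which two reviewers made the same decision."""
    if len(decisions_a) != len(decisions_b):
        raise ValueError("Both reviewers must rate the same articles")
    if not decisions_a:
        raise ValueError("At least one article is required")
    agreed = sum(a == b for a, b in zip(decisions_a, decisions_b))
    return 100 * agreed / len(decisions_a)


# Hypothetical screening decisions for 50 articles (illustrative data only).
reviewer_1 = ["include"] * 20 + ["exclude"] * 30
reviewer_2 = (
    ["include"] * 18 + ["exclude"] * 2      # 2 disagreements on reviewer 1's inclusions
    + ["exclude"] * 28 + ["include"] * 2    # 2 disagreements on reviewer 1's exclusions
)

print(percentage_agreement(reviewer_1, reviewer_2))  # 92.0
```

Percentage agreement does not correct for agreement expected by chance, which is one reason it is usually read as a simple consistency check rather than a full reliability coefficient.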
In level two screening, the first and third authors (AFW and LDP) independently reviewed the full texts of the selected articles to determine their eligibility. Where the decision was unclear or there was disagreement, they came to a joint decision about eligibility, bearing in mind the stated eligibility criteria. Reasons were recorded for each exclusion. To avoid any bias in the review process, the review of three assessments that LDP was involved in developing, namely the Evaluation in Ayres Sensory Integration, the Evaluation of Sensory Processing (ESP), and the SPM, was performed by AFW and an independent reviewer who was not involved in the design and development of these three assessments.

Data extraction and evaluation of measurement properties
COSMIN developed rating scales for methodological quality of studies, measurement properties of the assessments reported in the studies, and quality of evidence for each measurement property. 50 Methodological qualities of the studies are ranked according to specified criteria with four ratings: very good, adequate, doubtful, or inadequate. Scores obtained for measurement properties are allocated one of four ratings: sufficient (+), insufficient (−), indeterminate (?), or inconsistent (±). The overall quality of evidence is rated as high, moderate, low, or very low.
Both data extraction and data analysis, including risk of bias assessment, quality of the studies, and evaluation of the measurement properties of the assessments, were done independently by two reviewers: the first author and a research assistant. The research assistant was an occupational therapist with 7 years' experience in the field of paediatric sensory integration and a master's degree in the field of paediatrics. She was provided with training by the first author on the latest theory about the construct of sensory reactivity, the COSMIN methodology, and measurement properties. Data charting forms and a Microsoft Excel spreadsheet developed by the COSMIN researchers were used. 54 Where there was disagreement on ratings allocated, the two reviewers discussed the issue until it was resolved. A final rating for each aspect evaluated in the review was obtained by taking the score of the item in that category with the lowest rating as the final score, using the principle of 'lowest score counts'. 55 Copies of the assessments were required for the reviewers to complete independent ratings of the test items as part of the evaluation of content validity. The test developers or publishers were contacted to obtain copies of any assessments that were not already in the possession of the authors or provided in the published articles. A time limit of 3 weeks was allowed for their response. Where copies of the assessment were not provided, the reviewer's rating was excluded from the evaluation of content validity. 55 The final summary of findings tables were then completed.
The percentage of agreement used to check interrater reliability in the level one screening was calculated as:
$$\text{Percentage of agreement} = 100 \times \frac{\text{Number of articles for which both reviewers agreed on the inclusion (exclusion) decision}}{\text{Total number of articles}}$$

RESULTS
Forty-one studies were identified in the literature search and screening, reflecting 20 different assessments. Nine of the included articles, [56][57][58][59][60][61][62][63] including one assessment, the Sense and Self-Regulation Checklist, 64 were excluded during data extraction, when it became apparent that they met the exclusion criteria. Therefore 19 assessments were considered for further analysis. One foreign language article was in the level two screening but was excluded. The search results and screening selection process are presented in Figure S2. Fifteen of the assessments were developed in the USA. The Sensory Sensitivity Questionnaire-Revised (SSQ-R) was developed in Australia, the Sensory Behaviour Questionnaire (SBQ) in the UK, the Thai Sensory Patterns Assessment (TSPA) in Thailand, and an unnamed Arabic sensory processing questionnaire in Egypt.
The most common type of assessment was the proxy-reported questionnaire completed by the child's caregiver, used in 13 of the assessments. Other types were the performance-based assessment used in six, a proxy-reported questionnaire completed by the teacher used in two (the Sensory Processing Measure-School [SPM-S] and Sensory Profile-School), a caregiver interview used in the Sensory Assessment for Neurodevelopmental Disorders (SAND), and a child self-report used in the Touch Inventory for Elementary-school-aged children (TIE). Four assessments relied on information provided by both a caregiver (either a questionnaire or an interview) and a performance-based assessment. These were the SAND, the Sensory Processing 3-Dimensions Scale (SP-3D), the Sensory Processing Scales (SPS), and the Sensory Over-Responsivity Scales (SensOR).
Most of the studies evaluated four assessments: the SPM, the Sensory Profile, the Sensory Experiences Questionnaire (SEQ), and the SP-3D and its precursors. Many of the remaining tests were only evaluated in one study, and most of their psychometric properties were only evaluated in that one study. The second editions of the Sensory Profile (published in 2014) and the SPM (published in 2021) had no studies evaluating their psychometric properties. The Sensory Profile and Sensory Profile, Second Edition were based on Dunn's conceptual framework of neurological thresholds and behavioural self-regulation patterns. 58 The SPM and the SPM, Second Edition were based on the conceptual framework of Ayres sensory integration theory. 29
There were several different versions of some tests. The SPM was a later version of the Evaluation of Sensory Processing (ESP). 65 The SEQ had version 1.0 and a later version 3.0. The SensOR was an earlier version of the SPS, in turn a precursor of the SP-3D. The Short Sensory Profile (SSP) was an abbreviated version of the Sensory Profile, using the 38 most discriminating of the 125 items of the latter.
Five assessments, namely the Gravitational Insecurity Assessment, TIE, Auditory Behaviour Questionnaire (ABQ), SAND, and SensOR, evaluated only part of the construct of interest.Three of these evaluated one sensory system each: the Gravitational Insecurity Assessment only evaluated the vestibular system, the TIE the tactile system, and the ABQ only the auditory system.The SensOR only evaluated sensory hyperreactivity.The SAND only evaluated three sensory systems (visual, auditory, and tactile).The sensory systems most frequently assessed were the visual, tactile, auditory, and vestibular systems.
Six of the tests were developed specifically for the population with ASD: the ABQ, SBQ, SAND, SSQ-R, SEQ 1.0, and SEQ 3.0. A further six studies included participants with ASD. Five studies used participants with sensory integration or sensory processing difficulties. Other atypical study populations had participants with sensory hyperreactivity, developmental delays, gravitational insecurity, and attention-deficit/hyperactivity disorder. Most tests used child participants in the 2- to 14-year age band. The exceptions with a higher upper age limit were the SensOR (55 years), the SPS (18 years), the ABQ (21 years), and the SBQ (17 years). Only the SEQ 1.0 had an age limit that was lower than 2 years, at 5 months. A summary of the characteristics of the studies included in the review is presented in Table S1.

Evaluation of measurement properties
Formal content validity studies were reported in only two studies, which examined the ESP and the SensOR. 66,67 However, these two studies did not meet the criteria for eliciting feedback from patients and professionals in a content validity study as outlined by COSMIN, which requires feedback on relevance, comprehensiveness, and comprehensibility. The SensOR study 66 only reported on interviews with professionals on relevance, and the ESP study 67 reported on interviews with professionals and patients on relevance. Some other studies did not state that a content validity study had been done, but did report on some aspects considered in a content validity study. Table 2 summarizes eight studies that reported on aspects of content validity related to feedback from patients and professionals.
Information on test development is considered by COSMIN to be an element of content validity. Test development studies were done on all assessments, except the Arabic questionnaire and the SSP. A test development study of the SSP was not necessary, as it was an abbreviated version of the Sensory Profile. The SPM-S, a teacher questionnaire, and the Evaluation in Ayres Sensory Integration, a performance-based assessment, reported in detail on test development. 68,69 There were some aspects of test development, such as the recall period, that were only relevant to questionnaires and not to performance-based assessments. A recommended recall period was only provided in the test instructions for three questionnaires: the SBQ, SPM-Home, and SPM-S. 29,70 In all three, the stated recall period was 1 month. Test items for 13 of the tests were available for rating by the reviewers as part of the evaluation of content validity. For the six tests where reviewers' ratings were not available, the methodological quality of the study was downgraded by one level. The results of the test development studies and the reviewers' consensus ratings can be viewed in Table S2.
Structural validity was evaluated using factor analysis for eight assessments and Rasch analysis for one, the SP-3D (Table S3). Two factor analyses were performed on the SSP, both with children with ASD. Six studies reported on exploratory factor analyses, with one of these studies on the SSP also performing a confirmatory factor analysis. Two studies reported on confirmatory factor analyses for the ESP and SEQ 3.0. Six of the studies had a large sample size (400 or above) and four were smaller (between 103 and 261). Methodological quality was adequate or very good for all studies except those evaluating the ABQ and SensOR. These were rated as inadequate because the sample size was not large enough in relation to the number of test items. 54
Fifteen assessments had internal consistency scores reported, with the Gravitational Insecurity Assessment, SSP, and SEQ 1.0 having two studies evaluating this. Cronbach's α was used in all studies except the SensOR and SPS, which used the α coefficient. The studies on the Gravitational Insecurity Assessment by May-Benson, 71 Sensory Profile, SSP, and Sensory Processing Scale: Assessment (SPS:A) were the only ones with ratings of sufficient. The second Gravitational Insecurity Assessment study 72 assessed internal consistency but did not report the scores. Nor did the study of the SSP report the scores, 73 simply saying that they were 'reliable'. Of the 15 subscales of the SP-3D, six had Cronbach's α scores lower than 0.50 and 12 had Cronbach's α scores below 0.70, 74 resulting in an insufficient rating for this study, even though the methodological quality was very good. The detailed results are presented in Table 3.
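To make the internal consistency figures above concrete, the sketch below computes Cronbach's α for one subscale using the standard formula α = k/(k − 1) × (1 − Σ item variances / variance of total scores) and compares the result with the 0.70 level discussed in this review. It is an illustration only: the function name `cronbach_alpha` and the score matrix are hypothetical and are not data from any of the included studies, and a full COSMIN rating also takes structural validity evidence into account, which is not modelled here.

```python
from statistics import variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for a subscale.

    item_scores: one row per respondent, each row holding that
    respondent's scores on the k items of the subscale.
    """
    if len(item_scores) < 2:
        raise ValueError("At least two respondents are required")
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("At least two items are required")
    # Sample variance of each item across respondents.
    item_variances = [variance(column) for column in zip(*item_scores)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in item_scores])
    if total_variance == 0:
        raise ValueError("Total scores have zero variance")
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical ratings: 5 respondents x 3 items (illustrative data only).
scores = [
    [2, 3, 3],
    [4, 4, 5],
    [1, 2, 1],
    [5, 4, 4],
    [3, 3, 2],
]

alpha = cronbach_alpha(scores)
print(f"alpha = {alpha:.2f}, meets 0.70 level: {alpha >= 0.70}")
```

For a multidimensional test, the function would be applied to each subscale separately rather than to all items pooled.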
Seven assessments had evidence of reliability (Table 4). Two only evaluated interrater reliability and three only test-retest reliability. The SAND evaluated both interrater and test-retest reliability, with ratings of sufficient for both these measurement properties. Only one, the TSPA, evaluated intrarater reliability; it is not included in Table 4. The intraclass correlation coefficient was used in the SAND, SEQ 1.0, Sensory Profile, and TSPA. The TIE and SensOR used Pearson's product-moment correlations, and the SP-3D used percentage agreement and kappa, with very good ratings. Most scores were reported as above 0.7, with isolated subscales below 0.7 in the SensOR, SEQ 1.0, Sensory Profile, and TSPA.
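For dichotomous scores such as those of the SP-3D, agreement between two raters can be summarized with kappa. The sketch below shows the standard Cohen's kappa calculation, κ = (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The function name `cohens_kappa` and the ratings are hypothetical; this is not the analysis performed in the SP-3D study.

```python
from collections import Counter


def cohens_kappa(rater_1, rater_2):
    """Cohen's kappa for two raters scoring the same items categorically."""
    if len(rater_1) != len(rater_2) or not rater_1:
        raise ValueError("Both raters must score the same non-empty set of items")
    n = len(rater_1)
    # Observed proportion of items with identical scores.
    p_observed = sum(a == b for a, b in zip(rater_1, rater_2)) / n
    # Chance agreement from each rater's marginal category frequencies.
    counts_1, counts_2 = Counter(rater_1), Counter(rater_2)
    categories = set(rater_1) | set(rater_2)
    p_expected = sum(counts_1[c] * counts_2[c] for c in categories) / n**2
    if p_expected == 1:
        raise ValueError("Kappa is undefined when chance agreement is 1")
    return (p_observed - p_expected) / (1 - p_expected)


# Hypothetical dichotomous scores (1 = behaviour observed, 0 = not observed).
rater_a = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
rater_b = [1, 1, 1, 0, 0, 0, 0, 1, 1, 0]

print(round(cohens_kappa(rater_a, rater_b), 2))  # 0.6
```

Here the raters agree on 8 of 10 items (80%), but because half of that agreement would be expected by chance, kappa is lower than the raw percentage agreement.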

Selection of an assessment for adaptation
Four of the identified assessments drew on dual sources of information, namely a caregiver questionnaire or interview, and a performance-based assessment.These were the SAND, the SP-3D, the SPS (which was an earlier version of the SP-3D), and the SensOR (which was in turn an earlier version of the SPS).Pertinent characteristics are summarized in Table 6.
All tests were developed in the USA and originally published in English. The SensOR is considerably older than the other tests, having been published in 2008. The SAND and SensOR both had limitations related to their construct, with the SensOR only evaluating hyperreactivity, and the SAND only evaluating three sensory systems. The SPS and SP-3D evaluated all sensory systems. The SP-3D evaluated the three components of sensory processing described by Miller et al., 78 one being the sensory reactivity component. Internal consistency, reliability, and construct validity (both convergent and discriminant) are reported in this review for the sensory reactivity component. Test development of the SP-3D had very good methodological quality, with the other tests all having a doubtful rating. Test items were not available for the reviewers to rate for the SensOR, SPS:A, and SP-3D. The SP-3D used Rasch analysis to determine the structural validity with adequate methodological quality, whereas the SensOR had inadequate and the Sensory Processing Scale: Inventory (SPS:I) doubtful methodological quality. Only the SP-3D had very good methodological quality for internal consistency. There were some low Cronbach's α scores for the SP-3D, with 6 of the 15 subscales having Cronbach's α scores below 0.50. The SP-3D was the only one of the tests with very good methodological quality for reliability. Construct validity, both convergent and discriminant validity, was evaluated for all assessments. All had very good methodological quality except for discriminant validity on the SAND and SPS:A, which were both adequate.
Table notes: Rating according to criteria of good measurement properties: sufficient (+); insufficient (−); indeterminate (?). ESP was the precursor to the SPM-Home. All tests developed in the USA in English except the SBQ (UK), TSPA (Thailand; language not specified), and Arabic questionnaire (Egypt; Arabic). All studies conducted in the USA except Neil et al. 70 (UK), Sankar and Prema 72 (India), Sutthachai et al. 40 (Thailand), and Sobhy et al. 97

DISCUSSION
Sensory reactivity difficulties have been an area of increased research in recent years. 44,79 This is reflected in this review, where more than 64% of the included studies have been published since 2010. The purpose of this systematic review was to identify sensory reactivity assessments used with children aged 3 to 12 years and to evaluate the measurement properties of each test to select the most appropriate one for adaptation to the South African population. As far as we are aware, this is the first systematic review of the literature to identify the tests used to assess sensory reactivity in children 3 to 12 years of age and to perform an extensive evaluation of the measurement properties of the assessments. A strength of this systematic review was the use of the COSMIN methodology to guide the process and the evaluation of measurement properties, 50 with a well-researched set of search filters which ensured an accurate and broad literature search. 51
Nineteen assessments evaluating sensory reactivity were identified in the review. Most used a proxy-reported questionnaire completed by either the parent or caregiver for home- and community-based information on the child's behaviour, or the teacher for information about the child's behaviour in school. This finding was consistent with other systematic reviews on sensory reactivity assessments. 35,41,42,45,46 The SPM and Sensory Profile had questionnaires both for caregivers and for teachers to provide data from home and school environments. Although the caregiver and teacher questionnaires were designed to complement each other, there was not a high correlation between the results from the different contexts. 80 This indicated the different viewpoints of parents and teachers, as well as the different expectations and behaviours of children in these two contexts. 80
Fewer performance-based assessments than questionnaires were identified, with most performance-based assessments developed recently, or still in development. This seems to be an area where attention has been focused in recent years. Six assessments have been developed specifically for the population with ASD, and an additional six included participants with ASD, indicating a particular focus on identifying and understanding sensory reactivity in this disorder. This reflects the research focus on sensory reactivity since its addition as part of the diagnostic criteria for ASD in the DSM-V. 5 It also reflects the high prevalence of sensory reactivity in children with ASD, reported to be between 45% and 96%. 73,81,82 Three criteria that form part of the COSMIN methodology could not be evaluated for performance-based assessments, as they were only pertinent to questionnaires. 50 These related to the recall period that the caregiver had to consider when rating the child's behaviours, the Likert scale response options, and the wording of questionnaire test items read by the caregiver.

Measurement properties
Content validity is considered one of the most important measurement properties, because it evaluates whether the content of the test measures what it purports to measure. 49 It is therefore the first measurement property to be evaluated. 49 The criteria were met in very few of the studies in the review. An example was the requirement that a semistructured interview guide should be used for obtaining input from patients, and that these interviews should be recorded and transcribed verbatim. 55 There was no evidence in any of the content validity studies that this was done.
The COSMIN group consider patient feedback to be the most important aspect of content validity, 49 because patients are deemed the primary experts on their condition.However, only four tests reported requesting feedback from patients.This lack of input from patients has serious implications for content validity, because if the patient feels that the test is not relevant, or understandable, or that it omits areas which they feel are important, they may be less invested in their participation, reducing their motivation and possibly introducing error. 49There are also significant research implications, as analyses that are based on inaccurate test results may lead to erroneous conclusions.
Test development studies are considered an aspect of content validity, as they provide evidence of generation of the items for evaluation of the construct. 49 Test development was reported in detail in only two studies, the SPM-S and the Evaluation in Ayres Sensory Integration. 68,69 The COSMIN methodology requires, as a factor of test development, that the study provides a justification for both the recall period used and the response options provided. However, no studies provided this. 49
Structural validity examines the factor structure of an assessment and how items group together into factors. 54,83 This was evaluated by 10 studies in the review. Factor analyses were used in all instances except the SP-3D, 74 which used Rasch analysis. 83 Exploratory factor analysis and principal component analysis are used in the early stages of test development when there is no clarity on the dimensionality of the assessment. 54,83 A confirmatory factor analysis is used to confirm the scales suggested in a hypothesis. 54,83 COSMIN consider the confirmatory factor analysis to be the preferred method for performing a factor analysis at the confirmatory stage. 54 However, this was only performed for the SSP, 84 ESP, 85 and SEQ 3.0. 86 Results of factor analyses are only applicable for the population used in the study. 83 Of the 10 studies in the review, four were specific to the population with ASD, 84,[86][87][88] thus limiting the generalizability of the findings to the general population. Eight studies used child participants of a similar age range to the population in this review. The remaining two studies included adolescents 87 and adults, 66 also limiting the generalizability of these results. Factor analyses require large sample sizes, 47 which was a strength of these studies as all except two 66,87 met the minimum COSMIN criteria for minimum sample size. 54
Internal consistency reflects the interrelatedness between the items in a unidimensional scale or the subscales in the case of a multidimensional test. 41,50 The factor structure of a test needs to be clear so that if the test is multidimensional, internal consistency can be reported for each subscale. However, this was not always the case. Calculations of internal consistency were performed on subscales identified by the test developers of the SEQ 1.0, SBQ, and SPS without a report of factor analysis having been done to confirm that these were actually the subscales. 32,70,89 Internal consistency for the SAND was calculated as a total score, combining all items in the questionnaire and the performance-based assessment, thus reducing the interpretability of the score. 81 The preferred statistical method for calculations of internal consistency is Cronbach's α. 50 This was used in all studies except the SensOR and SPS, which used the α coefficient, 32,66,90 resulting in a downgrading of methodological quality ratings from very good to doubtful for internal consistency of these two tests. 54 A score of at least 0.70 for each unidimensional scale or subscale is recommended to achieve a sufficient result rating. 50][91][92]
The preferred statistical method for evaluating reliability (interrater, intrarater, and test-retest) where tests have continuous scores was the intraclass correlation coefficient. 54 This was used in all studies except the TIE and the SensOR, 66,93 resulting in a downgrading by one level of the methodological quality rating for these two tests. 54 The other assessments had interrater, intrarater, and test-retest reliability scores in the moderate to high range, with very good or adequate methodological quality. The SP-3D had dichotomous scores, for which kappa was the preferred statistical method. 54 The test used percentage agreement on an item level and kappa for total scores, 74 resulting in a very good rating for methodological quality of the study. The SensOR, SEQ 1.0, Sensory Profile, and TSPA reported isolated subscales below the COSMIN criteria cutoff of 0.7, 54 which resulted in the rating being downgraded to negative, even though most of the subscale scores were reported as above 0.7. This was an example of the application of the COSMIN principle of the 'worst score counts' in this review, 54 which resulted in a low rating that did not reflect the generally good rating of most subscale scores.
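The internal consistency criterion described above can be illustrated with a minimal Python sketch. It computes Cronbach's α for each subscale from item scores and then assigns a single result rating, on the assumption that one subscale below the 0.70 cutoff makes the overall rating insufficient, as happened for the tests with isolated low subscales. The function names, subscale names, and item scores are hypothetical and are not taken from any of the reviewed assessments.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for one subscale.

    rows: one list of item scores per respondent (all lists the same length).
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("a subscale needs at least two items")
    # Sample variance of each item across respondents.
    item_vars = [variance([r[i] for r in rows]) for i in range(k)]
    # Sample variance of the respondents' total scores.
    total_var = variance([sum(r) for r in rows])
    if total_var == 0:
        raise ValueError("total scores do not vary")
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def rate_internal_consistency(alphas, cutoff=0.70):
    """Return '+' only if every subscale reaches the cutoff, else '-'."""
    return "+" if all(a >= cutoff for a in alphas.values()) else "-"


if __name__ == "__main__":
    # Hypothetical item scores: four respondents, three items per subscale.
    subscales = {
        "tactile": [[1, 2, 1], [2, 3, 2], [3, 4, 4], [4, 4, 5]],
        "auditory": [[1, 4, 2], [2, 1, 4], [3, 3, 1], [4, 2, 3]],
    }
    alphas = {name: cronbach_alpha(rows) for name, rows in subscales.items()}
    for name, alpha in alphas.items():
        print(f"{name}: alpha = {alpha:.2f}")
    print("overall rating:", rate_internal_consistency(alphas))
```

Reporting the per-subscale values alongside the single rating, as the sketch does, keeps visible the information that a summary rating alone conceals.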
Construct validity refers to the degree to which the scores of an assessment are consistent with a hypothesis about the relationship between the scores of the assessment and the scores of another instrument (convergent validity), or to differences between specified groups (discriminant validity). 54 The results are rated sufficient if they are in accordance with the hypothesis, and insufficient if they are not. The SSP was the most frequently used comparator instrument for examining convergent validity, and was used in four studies where it was compared with the SBQ, 70 SAND, 81 SensOR, 66 and SPS:A. 34 Other studies of convergent validity involved the Sensory Profile, SPM, and SP-3D: Assessment. Comparator studies between typical and ASD groups for discriminant validity were reported using the SAND, SSQ-R, SEQ 1.0, Sensory Profile, and an unnamed Arabic questionnaire. Typical and sensory integration groups were compared using the Evaluation in Ayres Sensory Integration, ESP, and SPM-S. Typical groups and various aspects of sensory reactivity were compared using the SensOR, Gravitational Insecurity Assessment, SPS:A, and SPS:I. All studies of discriminant validity had very good or adequate methodological quality except for the SPM-S and SSQ-R, which did not provide an adequate description of the relevant characteristics of the comparator groups. 69,94 There were no studies evaluating criterion validity. This was because there is no acknowledged criterion standard for assessing sensory reactivity, nor is there consensus on the best measuring tool. 34

Considerations for selection of an assessment for adaptation
The SAND, SP-3D, SPS, and SensOR were the strongest contenders for adaptation because they combined information provided by both a caregiver, in the form of a questionnaire or an interview, and assessment by a clinician, as recommended by experts in the field, 4,32,41 and they provided a multifaceted picture of the child's sensory reactivity profile. 35,90 We recommend the SP-3D 74,95 as the assessment most suitable for adaptation.
The SP-3D had superior psychometric properties on many of the reported measures. It was the only test with a 'very good' methodological quality rating for the test development aspect of content validity, with the other three tests rated as doubtful owing to methodological flaws. The reliability of the SP-3D had the highest rating for methodological quality. The methodological quality of the SP-3D: Assessment for construct validity was also very good, with mild to moderate reported values for convergent validity and an ability to discriminate between typically developing children and those with sensory processing disorder (discriminant validity).
Additional factors besides psychometric properties also favoured the SP-3D. The SP-3D is the newest assessment, with test development completed, and the publishing date set for late 2023. It evaluates the construct of sensory reactivity as one of the three components evaluated, and this component can be administered separately. The sensory reactivity construct is defined for the SP-3D as hyperreactivity, hyporeactivity, and sensory seeking, which is very similar to how it was defined for this review. The omission of the vestibular system in the SAND was problematic, as it is considered a foundational system in Ayres sensory integration theory, and one of the first systems, after the tactile system, where Ayres identified reactivity difficulties. 96 The SensOR only evaluates sensory hyperreactivity, omitting hyporeactivity. The omission of important elements of the construct in evaluating the SAND and SensOR was judged to be a particularly important consideration which ruled out these two tests from further consideration for possible adaptation. The considerable number of years since publication of the SensOR and the fact that it has been superseded by the same developers with the SPS and then the SP-3D were also significant disadvantages to its recommendation. For the same reason, the SP-3D was preferred over the SPS. The SAND and SP-3D are the most recently developed tests and are therefore more in line with the latest theory and developments in the field of sensory reactivity. Insufficient information was provided in the studies on the SP-3D administration, test materials, and scoring methods to determine the appropriateness of these aspects for the South African context. Nor were the test items provided by the publishers, which would have helped further evaluation of the test's suitability.
There were several limitations to this systematic review. There was considerable inconsistency in the use of terminology, the construct, and the conceptual framework. The lack of consistency in terminology related to sensory reactivity involved the use of terms such as sensory processing, sensory modulation, and sensory integration. This has been identified as an obstacle to clear reporting by other researchers. 4 The meaning of the terms domains, subtests, factors, and subscales was frequently unclear and undefined. Some assessments evaluated sensory reactivity only, both hyper- and hyporeactivity. Others included sensory seeking, although there is a lack of clarity on whether this is a separate subtype. 82 Yet others included the evaluation of additional aspects of sensory integration; for example, the SPM included praxis and social participation in addition to sensory reactivity. 29 The construct being measured was not clearly defined in many of the studies. Where the constructs were defined, there were slight differences in the components that were hypothesized to reflect the sensory aspects being measured. This lack of agreement on the construct has been identified as problematic by other authors. 41 Different conceptual frameworks underlie some assessments of sensory reactivity. The Sensory Profile and SPM both evaluate sensory reactivity; however, there are differences between them based on their different conceptual frameworks. Comparison of assessments and data synthesis requires consideration of the homogeneity of the study characteristics. Inconsistencies of terminology, construct, and conceptual framework made direct comparisons between assessments difficult, a concern also noted by other authors. 41 Inconsistencies in their definitions made it difficult to compare psychometric properties in different studies.
A second limitation was the paucity of information on psychometric properties provided in the studies and their generally low ratings. None of the assessments had all the measurement properties specified in the COSMIN taxonomy evaluated or reported on. 54 The TIE, 93 ABQ, 87 SBQ, 70 SAND, 81 SSQ-R, 94 TSPA, 40 and an unnamed Arabic questionnaire 97 had only one study that evaluated psychometric properties of the assessment, although the SAND, TSPA, and the Arabic questionnaire are recently developed tests that are likely to have further studies published. Pooling and synthesis of the data were not possible where only one study evaluated measurement properties of an assessment. In addition, few of the studies had good methodological quality or high ratings for measurement properties. These factors made recommending and selecting an assessment for adaptation difficult. This was consistent with findings of two other systematic reviews, which also reported insufficient information on psychometric properties and that, where this was provided, the ratings were moderate to poor. 35,43
A third limitation was the COSMIN principle of the 'worst score counts' used for summarizing ratings. The reasoning of the COSMIN researchers was that poor methodological aspects of a study cannot be compensated for by some good methodological qualities. 55 Although this argument carries some weight, the principle was at times problematic, and has been questioned by other researchers 98 because it could result in significant downgrading of findings that may not reflect the overall weighting and range of scores obtained. This limitation was addressed in this review by the reviewers not only considering the summary scores but also looking at individual rating scores and their weighting.
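The effect of the 'worst score counts' principle on methodological quality ratings, and the reviewers' response of also inspecting the individual ratings, can be sketched in a few lines of Python. This is an illustrative sketch only: the four-point scale (very good, adequate, doubtful, inadequate) follows the COSMIN rating levels, while the function names and the example ratings are hypothetical and do not correspond to any study in the review.

```python
from collections import Counter

# COSMIN methodological quality levels, ordered from best to worst.
LEVELS = ["very good", "adequate", "doubtful", "inadequate"]


def worst_score_counts(item_ratings):
    """Summary rating for one study: the lowest rating among its checklist items."""
    return max(item_ratings, key=LEVELS.index)


def rating_profile(item_ratings):
    """Count of items at each level, kept alongside the summary rating."""
    counts = Counter(item_ratings)
    return {level: counts.get(level, 0) for level in LEVELS}


if __name__ == "__main__":
    # Hypothetical checklist ratings for a single study.
    ratings = ["very good", "very good", "doubtful", "adequate"]
    print("summary:", worst_score_counts(ratings))
    print("profile:", rating_profile(ratings))
```

In this hypothetical case a single doubtful item sets the summary rating, although three of the four items are rated adequate or better; the profile makes that imbalance visible.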
A fourth limitation was the COSMIN requirement that the reviewers rate each assessment for content validity, which required access to the assessments. Despite attempts made to obtain all the assessments, six could not be sourced. This was, however, not considered a significant limitation as the excluded assessments represented less than one-third of the total number in the review. In addition, the raters' review was only one of four factors considered when evaluating content validity.

CONCLUSION
This systematic review identified the SP-3D as the most appropriate sensory reactivity assessment to adapt for use with children 3 to 12 years old in South Africa. This test had fairly good psychometric properties and methodological quality, has been recently developed, and comprises a caregiver questionnaire and a performance-based assessment. The findings outlined in this review have generalizability to other low- and middle-income countries.
Many factors affected the ability to effectively compare the assessments identified in this review. These included a lack of information on psychometric properties, the generally low ratings of properties evaluated and methodological quality of the studies, and the inconsistencies of terminology, construct, and conceptual framework. These factors may cause clinicians and researchers to question whether assessment results obtained are meaningful. 35,43 In line with recommendations of other studies, 42,45 further research into the psychometric properties of sensory reactivity assessments is recommended to enable clinicians to make more informed choices about assessments.

ACKNOWLEDGEMENTS
We thank Ingrid van der Westhuizen for her assistance in developing the search terms for the database search.

FUNDING INFORMATION
The South African Institute for Sensory Integration, the South African Occupational Therapy Association, and the Harry Crossley Foundation.

CONFLICT OF INTEREST STATEMENT
LDP was also one of the developers of three of the included assessments: the Evaluation in Ayres Sensory Integration, ESP, and SPM.

DATA AVAILABILITY STATEMENT
Data sharing is not applicable to this article as no new data were created or analysed in this study.

REFERENCES
Tables (table bodies not recoverable; captions, column headings, footnotes, and abbreviations retained)
Captions: Content validity: patients and professionals. Internal consistency. Interrater and test-retest reliability. Construct validity, comprising convergent and discriminant validity (Table 5). Characteristics of dual-source assessments.
Table 5 column headings: Hypothesis testing for construct validity (convergent validity; discriminant validity); test number and name; reference; n; methodological quality; rating (results).
Footnotes: Rating according to criteria of good measurement properties: sufficient (+); insufficient (−); indeterminate (?). The shaded cells indicate validity measures were not evaluated in the study. a No, was not investigated. All studies conducted in the USA except Neil et al. 70 All tests developed in the USA in English except SBQ (UK), TSPA (Thailand; language not specified), and Arabic questionnaire (Egypt; Arabic). For the reliability table, all tests included in the table were developed and studies conducted in the USA in English except the Thai Sensory Patterns Assessment (Sutthachai et al.; 40 not in table; Thailand; language not specified).
Abbreviations: ABQ, Auditory Behaviour Questionnaire; ANOVA, analysis of variance; EASI, Evaluation in Ayres Sensory Integration; EFA, exploratory factor analysis; ESP, Evaluation of Sensory Processing; GI Ax, Gravitational Insecurity Assessment; ICC, intraclass correlation coefficient; MANOVA, multivariate analysis of variance; PCA, principal component analysis; SAND, Sensory Assessment for Neurodevelopmental Disorders; SBQ, Sensory Behaviour Questionnaire; SensOR, Sensory Over-Responsivity Scales; SEQ, Sensory Experiences Questionnaire; SEQ 1.0, Sensory Experiences Questionnaire Version 1.0; SP, Sensory Profile; SP-3D, Sensory Processing 3-Dimensions; SP-3D:A, Sensory Processing 3-Dimensions Scale: Assessment; SPM, Sensory Processing Measure; SPM-S, Sensory Processing Measure-School; SPS, Sensory Processing Scale; SPS:A, Sensory Processing Scale: Assessment; SPS:I, Sensory Processing Scale: Inventory; SPSC, Sensory Profile School Companion; SSP, Short Sensory Profile; SSQ-R, Sensory Sensitivities Questionnaire-Revised; TIE, Touch Inventory for Elementary-school-aged children; TSPA, Thai Sensory Patterns Assessment.