Validation of a touchscreen assessment tool to screen for cognitive delay at 24 months

Aim: To validate a touchscreen assessment as a screening tool for mild cognitive delay in typically developing children aged 24 months. Method: Secondary analysis of data was completed from an observational birth cohort study (The Cork Nutrition & Microbiome Maternal–Infant Cohort Study [COMBINE]), with children born between 2015 and 2017. Outcome data were collected at 24 months of age, at the INFANT Research Centre, Ireland. Outcomes were the Bayley Scales of Infant and Toddler Development, Third Edition cognitive composite score and a language-free, touchscreen-based cognitive measure (Babyscreen). Results: A total of 101 children (47 females, 54 males) aged 24 months (mean = 24.25, SD = 0.22) were included. Cognitive composite scores correlated with the total number of Babyscreen tasks completed, with moderate concurrent validity (r = 0.358, p < 0.001). Children with cognitive composite scores lower than 90 (1 SD below the mean, defined as mild cognitive delay) had lower mean Babyscreen scores than those with cognitive scores equal to or greater than 90 (8.50 [SD = 4.89] vs 12.61 [SD = 3.68], p = 0.001). The area under the receiver operating characteristic curve for the prediction of a cognitive composite score less than 90 was 0.75 (95

Our previous research indicated the feasibility of the Babyscreen touchscreen tool (versions 1.0 and 1.5; Hello Labs, Guildford, UK) with children aged between 18 and 36 months. 4,10 Importantly, these early data included children identified as at risk due to complications at birth. 10 However, the predictive ability of developmental delay is higher in high-risk groups. Our current research focuses on using this new assessment tool as a screening tool for cognitive delay in a typically developing, low-risk cohort. The Babyscreen tool is a valuable opportunity for clinicians to screen children for cognitive delay using fewer resources and in a more time-efficient way. In a clinical setting, the use of Babyscreen as a screening assessment would allow more children to be screened for cognitive delay and inform the decision to conduct a full developmental assessment or not.
To prepare Babyscreen for clinical use as a screening tool, further data are needed to validate the application at the age range commonly used for high-risk developmental follow-up in clinical practice, that is, 24 months of age. 11,12 In the UK, National Institute for Health and Care Excellence guidance recommends a face-to-face assessment at 2 years for children born preterm 13 but also using the Parent Report of Children's Abilities, Revised 1 or a suitable alternative parent-report tool, which tend to be highly language dependent. This reinforces the importance of developing a language-free cognitive screening tool at this age.
Our aims were to validate the Babyscreen application as a screening tool for mild cognitive delay. We wished to evaluate its concurrent validity with the Bayley-III cognitive composite score in a low-risk cohort at 24 months of age. We investigated the optimum cut-off for Babyscreen as a cognitive screening tool for children deemed at risk of mild cognitive delay, as indexed by children's Bayley-III cognitive scores, and present normative reference ranges for performance using the application. Finally, we explored the relationship between the Babyscreen score and the neurodevelopmental constructs assessed by the Bayley-III, that is, the cognitive, language, and motor domains.

METHOD

Participants
This study represents a cross-sectional analysis of a sample of 24-month-old children from a birth cohort study: the Cork Nutrition and Microbiome Maternal-Infant Cohort Study (COMBINE). This was a longitudinal, prospective birth cohort study based in Cork, Ireland, running from early pregnancy to 24 months of age, as described in previous research. 14 Inclusion criteria referred to females who were part of the IMPROvED pregnancy cohort study, which included low-risk, nulliparous females with a singleton pregnancy, who attended antenatal care at Cork University Maternity Hospital. Exclusion criteria referred to infants who were admitted to the neonatal intensive care unit for more than 2 weeks, or infants with severe metabolic or congenital anomalies requiring ongoing specialist care during the neonatal period. The study complied with good clinical practice and had ethical approval from the Cork Research Ethics Committee (ECM4[hh]06/01/15 and ECM3[mmm]25/07/19), with the approved protocol followed. Written informed consent was obtained from all parents. This resulted in an initial sample of 456 mother-infant dyads who participated in the COMBINE study.
The initial sample for this study's secondary data analyses consisted of 134 typically developing children attending for COMBINE neurodevelopmental assessment at 24 months of age. Children were recruited at Cork University Maternity Hospital and were born between 6th January 2016 and 30th November 2017. Mean gestational age was 40.31 weeks (range = 35.29-42.14 weeks). Inclusion criteria for the current study were children aged 24 months who completed both Babyscreen and Bayley-III assessments at the final 24-month appointment (n = 134). Children with disabilities (e.g. motor, visual, hearing, neurodevelopmental, neurological) were not to be excluded from the study; however, it is worth noting that none of the children in the final sample had an identified disability, based on parent report before the 24-month COMBINE appointment. No parents refused the touchscreen assessment, but we excluded children who did not engage independently with the touchscreen (e.g. no interest in assessment, evidence of parent prompting; n = 9), children who did not have a Bayley-III cognitive composite score (n = 3), and children who came from a non-English-speaking household (n = 21). The latter criterion was applied to exclude language effects on the validity of the Bayley-III cognitive score. This left a final sample of 101 children for our non-randomized case cohort study. For details of participant flow, please refer to Figure S1. The demographic characteristics of the Babyscreen study sample were largely reflective of the initial COMBINE cohort (Table S1).

Assessment
Neurodevelopmental follow-up was conducted at the INFANT Research Centre, Cork University Hospital, when the child was aged from 24 months 0 days to 24 months 31 days. Data were collected between 11th January 2018 and 3rd December 2019. Evaluation included a Bayley-III and Babyscreen touchscreen assessment, both described in the following sections.
Assessments were performed by study site professionals proficient in the administration of the respective tools.

What this paper adds
• Babyscreen scores were moderately correlated with Bayley Scales of Infant and Toddler Development, Third Edition cognitive composite scores in typically developing, 24-month-old children.
• A Babyscreen score lower than 7 is the optimal cut-off for identifying mild cognitive delay.
• Babyscreen has the potential to be developed into a 15-minute language-free cognitive screening tool in young children.

Instruments
The Babyscreen software application v1.89 (Hello Games, Guildford, UK) is an assessment tool designed to tap into toddlers' fundamental cognitive abilities. Previous work showed that typically developing and at-risk children can interact meaningfully with Babyscreen. 4,10 The Babyscreen application was used on a fully charged Apple iPad 2.0 (241.2 × 185.7 × 8.8 mm), with protective rubber covering to prevent damage. The iPad screen was cleaned with child-friendly disinfectant wipes pre- and post-assessment. Given that the results of the Babyscreen assessment were transferred from the iPad to the lab computer directly, no paper was needed, thereby removing the cost of printing materials. For this study, we focused on one Babyscreen performance variable: the number of items completed successfully (range 0-18), based on the total number of tasks completed independently by the child, without needing verbal instruction or a visual demonstration from the administrator, within a time limit of 30 seconds per task. No receptive or expressive language is required by the child to complete the Babyscreen assessment. Figure S2 displays the schematic of the 18 tasks used in the Babyscreen assessment, categorized under the respective cognitive constructs.
The Bayley-III was developed and validated in a US population and is designed to quantify the developmental functioning of infants and children from 1 to 42 months of age. It consists of cognitive, language (receptive and expressive), and motor (fine and gross) scales. Each scale is converted into a composite score that is standardized to a mean of 100 and an SD of 15. Scores of 85 to 115 are within the average range and scores below 85 are indicative of cognitive delay. 16-21 The present study followed the recommendation of using a cut-off for mild cognitive delay of less than 1 SD below the mean of a geographically relevant control group, to improve the detection of developmental delay. 15,22 Previous researchers reported this cut-off to fall around 90, based on a mean cognitive composite score of approximately 105 and an SD of 15. 10,22

Statistical analysis
All data analysis was carried out using SPSS v26 (IBM Corp., Armonk, NY, USA). Descriptive data analyses are presented as the mean (SD) as appropriate. An alpha value of 0.05 was used to determine statistical significance. Because Babyscreen and Bayley-III scores were normally distributed, the relationship between Babyscreen total and Bayley-III cognitive composite scores was assessed using a Pearson's correlation coefficient. A Student's t-test was used to investigate the difference in Babyscreen and Bayley-III cognitive composite scores for those scoring above and below the cognitive cut-off. The ability of the Babyscreen score to predict a child's cognitive composite score below the cut-off was evaluated using receiver operating characteristic curves. The sensitivity, specificity, and predictive values corresponding to the optimal Babyscreen cut-off obtained from the receiver operating characteristic curve analysis were calculated. The intercorrelations between the Bayley-III subscales and the two target variables (Babyscreen score and Bayley-III cognitive composite score) were calculated using the Pearson's correlation coefficient; t-tests and analysis of variance (ANOVA) were used to test for the effect of confounding variables on the Babyscreen score. Observed centiles for performance on the Babyscreen were produced to indicate normative reference ranges (using scores of all children aged 24 months who engaged with the assessment tool, including those from non-English-speaking households). Power analyses were completed with MedCalc v20.116 (MedCalc Software, Ostend, Belgium), 23 based on a two-tailed alpha of 0.05 and beta of 0.20. Our sample size of 101 fulfilled all criteria for sample size and ratio of positive to negative cases for cognitive delay. Missing data for the Babyscreen score or Bayley-III cognitive composite scores were managed via pairwise exclusion.
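As an illustration of how an optimal cut-off can be read off a receiver operating characteristic analysis, the following minimal Python sketch selects the threshold that maximizes the Youden index (sensitivity + specificity − 1). It is not the SPSS procedure used in the study; the function name `youden_optimal_cutoff` and the example scores are hypothetical, and the sketch assumes that a child screens positive when the score falls below the cut-off.

```python
from typing import List, Tuple


def youden_optimal_cutoff(
    scores: List[float], delayed: List[bool]
) -> Tuple[float, float, float, float]:
    """Return (cutoff, sensitivity, specificity, youden_j) with maximal J.

    A child screens positive when score < cutoff (low scores flag risk).
    Candidate cut-offs are midpoints between adjacent distinct scores.
    """
    n_pos = sum(delayed)
    n_neg = len(delayed) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one delayed and one non-delayed case")
    distinct = sorted(set(scores))
    candidates = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    if not candidates:
        raise ValueError("need at least two distinct scores")
    best = None
    for cut in candidates:
        true_pos = sum(1 for s, d in zip(scores, delayed) if d and s < cut)
        true_neg = sum(1 for s, d in zip(scores, delayed) if not d and s >= cut)
        sens = true_pos / n_pos
        spec = true_neg / n_neg
        j = sens + spec - 1
        if best is None or j > best[3]:
            best = (cut, sens, spec, j)
    return best


# Hypothetical screening scores (0-18) and reference-standard labels.
example_scores = [3, 5, 6, 9, 4, 8, 10, 12, 13, 15, 16, 11]
example_delayed = [True, True, True, True] + [False] * 8
cutoff, sens, spec, j = youden_optimal_cutoff(example_scores, example_delayed)
print(f"cut-off={cutoff}, sensitivity={sens:.2f}, specificity={spec:.2f}, J={j:.2f}")
```

Using midpoints between adjacent observed scores as candidate thresholds is one common convention; it yields half-point cut-offs on an integer scale.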

RESULTS

Sample
One hundred and thirty-four children were assessed using both Bayley-III and Babyscreen assessments at 24 months of age. From these, 33 cases were excluded, as described previously. This left a final study sample of 101 children aged 24 months. The 101 children (47 females, 54 males) were born with a mean gestational age of 40.27 weeks (range 35.29-42.14 weeks) and a mean birth weight of 3632 g (SD = 459). Children were followed up at 24 months of age (mean = 24.25, SD = 0.22). Ninety-nine mothers self-identified their nationality as Irish (two British), with 99 self-identifying as White Irish (two White non-Irish). The three children with missing data for the cognitive composite score were excluded. Some children had no language composite (6) or motor composite (9) scores due to tiredness on the day of testing. However, these were deemed to be secondary variables; thus, analyses proceeded as planned.

Confounding variables
Several variables were tested for potential effects on the Babyscreen score, including the sex of the child, touchscreen use, and several demographic and socioeconomic proxy indicators (Table 1). There were no significant differences in Babyscreen scores for any of the potential confounding variables. Therefore, we did not control for these variables in further analyses.

Concurrent validity of Babyscreen and Bayley-III cognitive composite score
There was a significant moderate positive correlation between the Bayley-III cognitive composite score and the number of Babyscreen tasks completed (n = 101, r = 0.358, p < 0.001) as categorized by Cohen's criteria. 24
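For readers who wish to check the reported significance, a Pearson correlation can be converted to a t-statistic on n − 2 degrees of freedom. The sketch below is illustrative only: the function names are hypothetical, and the size labels use Cohen's conventional thresholds of 0.10, 0.30, and 0.50 (Cohen's "medium" band corresponds to the "moderate" wording used here).

```python
import math


def pearson_t_statistic(r: float, n: int) -> float:
    """t-statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r * r)


def cohen_label(r: float) -> str:
    """Label |r| using Cohen's conventional 0.10 / 0.30 / 0.50 thresholds."""
    magnitude = abs(r)
    if magnitude >= 0.5:
        return "large"
    if magnitude >= 0.3:
        return "medium"  # described as "moderate" in the text
    if magnitude >= 0.1:
        return "small"
    return "negligible"


# Values reported above: r = 0.358 with n = 101 children.
t_value = pearson_t_statistic(0.358, 101)
print(f"t({101 - 2}) = {t_value:.2f}, effect size: {cohen_label(0.358)}")
```

With the reported values the statistic is roughly 3.8 on 99 degrees of freedom, which is in line with the reported p < 0.001.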

Babyscreen scores of children with and without cognitive delay
Our sample had a mean Babyscreen total score of 12 (SD = 4) items and a mean Bayley-III cognitive composite score of 104 (SD = 15.7). A score lower than 90 was used to represent mild cognitive delay, corresponding to approximately 1 SD below our sample mean. Eighty-nine children had a cognitive composite score equal to or greater than 90 and 12 scored less than 90. Children with a cognitive score lower than 90 completed an average of 8.5 (SD = 4.9) Babyscreen tasks compared with those with a cognitive score equal to or greater than 90 who completed an average of 12.6 (SD = 3.7) Babyscreen tasks (t[99] = −3.484, p = 0.001). The magnitude of the difference in the means was moderate (mean difference = −4.11; 95% confidence interval [CI] = −6.445 to −1.768; η² = 0.109). The distribution of Babyscreen scores for children with and without mild cognitive delay is shown in Figure 1.
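The reported test statistic and effect size can be approximately reproduced from the summary statistics alone. The following sketch is an illustrative check rather than the original SPSS output: it assumes an equal-variance (Student's) t-test, uses the group means and SDs given in the abstract (8.50 [SD = 4.89], n = 12; 12.61 [SD = 3.68], n = 89), and computes eta squared as t² / (t² + df). Function names are hypothetical.

```python
import math
from typing import Tuple


def pooled_t(
    mean1: float, sd1: float, n1: int, mean2: float, sd2: float, n2: int
) -> Tuple[float, int]:
    """Student's t and degrees of freedom from summary statistics.

    Assumes equal variances in the two groups (pooled variance estimate).
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error, df


def eta_squared_from_t(t: float, df: int) -> float:
    """Effect size for an independent-samples t-test: t^2 / (t^2 + df)."""
    return t * t / (t * t + df)


# Group with cognitive score < 90 versus group with score >= 90.
t_stat, df = pooled_t(8.50, 4.89, 12, 12.61, 3.68, 89)
eta_sq = eta_squared_from_t(-3.484, 99)
print(f"t({df}) = {t_stat:.2f}, eta squared = {eta_sq:.3f}")
```

Small discrepancies from the reported t value are expected because the inputs are rounded to two decimal places.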

Predictive ability of Babyscreen tool in identifying the risk of cognitive delay
Receiver operating characteristic curve analyses indicated that the Babyscreen score could predict the cognitive delay indicated by a Bayley-III cognitive composite score lower than 90 (p = 0.006, area under the curve = 0.746, 95% CI = 0.59-0.91). The optimal Babyscreen cut-off score for maximizing sensitivity and specificity was 6.5, with a Youden index of 0.433. This cut-off yielded a sensitivity of 50% and a specificity of 93% in predicting a cognitive score lower than 90. The positive predictive value of the Babyscreen cut-off was 50% (95% CI = 27.73-72.27), while the negative predictive value was 93.3% (95% CI = 88.68-96.07). Of the 89 children with a Babyscreen score equal to or greater than 7, six (6.7%) had a Bayley-III cognitive composite score lower than 90 compared with the 12 children with a Babyscreen total lower than 7, of whom six (50%) had a cognitive composite score lower than 90.
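The screening metrics above follow directly from the 2 × 2 classification implied by these counts: six children below the cut-off with a cognitive score lower than 90, six below the cut-off without, six at or above the cut-off with a cognitive score lower than 90, and 83 at or above the cut-off without. A minimal sketch of the arithmetic, with a hypothetical `ScreeningTable` class introduced purely for illustration:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScreeningTable:
    """Counts from a 2 x 2 table of screen result versus reference standard."""

    true_pos: int   # screen positive, delay present
    false_pos: int  # screen positive, delay absent
    false_neg: int  # screen negative, delay present
    true_neg: int   # screen negative, delay absent

    @property
    def sensitivity(self) -> float:
        return self.true_pos / (self.true_pos + self.false_neg)

    @property
    def specificity(self) -> float:
        return self.true_neg / (self.true_neg + self.false_pos)

    @property
    def ppv(self) -> float:
        return self.true_pos / (self.true_pos + self.false_pos)

    @property
    def npv(self) -> float:
        return self.true_neg / (self.true_neg + self.false_neg)

    @property
    def youden_index(self) -> float:
        return self.sensitivity + self.specificity - 1


# Counts taken from the paragraph above (Babyscreen < 7 versus Bayley-III < 90).
table = ScreeningTable(true_pos=6, false_pos=6, false_neg=6, true_neg=83)
print(f"sensitivity = {table.sensitivity:.1%}")
print(f"specificity = {table.specificity:.1%}")
print(f"PPV = {table.ppv:.1%}, NPV = {table.npv:.1%}")
print(f"Youden index = {table.youden_index:.3f}")
```

In this sample the number of screen-positive children (12) happens to equal the number with a cognitive score lower than 90 (12), which is why sensitivity equals the positive predictive value and specificity equals the negative predictive value.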

Relationships of the Babyscreen and Bayley-III cognitive scores with the Bayley-III developmental subscales
We calculated the correlations between the Babyscreen score and remaining composite and scaled scores (Table 2). The Babyscreen score did not correlate with the language composite score (n = 95, r = 0.144, p = 0.163) and had a weak correlation with the motor composite score (n = 92, r = 0.249, p = 0.016). The latter correlation was driven by the fine motor subscale, which showed a weak positive correlation with the Babyscreen score (n = 94, r = 0.253, p = 0.014). We also examined the intercorrelations between Bayley-III subscales.
The cognitive composite score correlated strongly with both language (n = 95, r = 0.722, p < 0.001) and motor (n = 92, r = 0.568, p < 0.001) composites. The relationship between cognitive and motor composite scores was driven by the fine motor subtest, which recorded a strong correlation with the cognitive composite score (n = 94, r = 0.655, p < 0.001).

Reference values
The overall histogram and calculated centiles for performance on the Babyscreen assessment for all children tested at 24 months are shown in Figure 2. Note that this normative data set was extended from 101 to 125 children (i.e. including 21 children from non-English-speaking households and three children without a Bayley-III cognitive composite score but excluding nine children who did not interact independently with the application). The estimated cut-off for the detection of cognitive delay (<7) is equivalent to the 10th centile for the Babyscreen performance.

DISCUSSION
In a typically developing cohort aged 24 months, the Babyscreen total score showed moderate concurrent validity with the Bayley-III cognitive composite score and provided moderate predictive value for cognitive delay. Babyscreen performance was independent of a range of socioeconomic measures and previous touchscreen use.
Our results build on previous evidence that demonstrated similarly reasonable concurrent validity between Babyscreen and the Bayley-III cognitive subscale in a high-risk cohort aged between 18 and 24 months. 10 Our previous study suggested that Babyscreen had potential as a cognitive screening tool for children deemed at risk of cognitive delay. We extended this research by validating Babyscreen as a screening tool in a low-risk cohort aged 24 months. Using the screening cut-off identified with these data, children who achieved a Babyscreen score lower than 7 were more likely to attain a low cognitive composite score on the Bayley-III and thus would be recommended for a full developmental assessment. This research holds important translational implications for developmental follow-up of children. Babyscreen is a quick and practical option for clinical practitioners to screen large numbers of children aged 24 months for cognitive delay. This could reduce clinical caseloads by excluding typically developing children and highlighting those at risk, enabling quicker access to full developmental assessment and earlier intervention for the latter.
The study also explored the neurodevelopmental constructs being tapped into by the Babyscreen and Bayley-III assessments (cognitive, language, and motor domains). There was no association between the Babyscreen score and Bayley-III language composite score. This contrasts with the strong association between Bayley-III cognitive and language composite scores. The strong overlap between language and cognitive scales in the Bayley-III assessment puts children from non-English-speaking households at a disadvantage. The Babyscreen score had a weak association with the Bayley-III motor composite score, which was driven by the fine motor subscale. This is to be expected, given the link between cognitive and fine motor coordination as seen in the strong relationship between Bayley-III cognitive and fine motor scores in the current sample and as previously reported by the Bayley-III developers. 25 Importantly, Babyscreen scores were most strongly correlated with Bayley-III cognitive composite scores. This indicates that the Babyscreen assessment targets the cognitive neurodevelopmental domain independently from the language domain, thus underlining its value as a cognitive screening tool. We compared it to the Bayley-III in this study because it was conducted before the introduction of the Bayley Scales of Infant and Toddler Development, Fourth Edition in 2019. However, we expect that similar findings will apply because of the strong correlation reported between the Bayley-III and Bayley Scales of Infant and Toddler Development, Fourth Edition. 26

Feasibility of the Babyscreen assessment
The feasibility of using Babyscreen as a cognitive screening tool is highly relevant given the benefits of using touchscreen assessment tools with children. While the Bayley-III is the most popular developmental assessment, it is not without faults, including a lengthy administration time and heavy reliance on the child's language ability. A lightweight, transportable touchscreen device makes assessments more feasible by eliminating large testing kits and potentially improving test accessibility for children with mobility issues. 6 Furthermore, by the age of 24 months, children have developed an array of motor skills required to use touchscreen technology. 7,8 The use of touchscreen assessments would also promote standardized testing due to reduced administrator interaction, thus decreasing tester bias and measurement error. 6 Our results support the call for greater use of non-verbal cognitive assessments to gain a more accurate reflection of young children's cognitive abilities, especially for children with speech delay or difficulties and for whom English is not their first language. 4,5 The use of this language-free, touchscreen-based assessment offers a feasible option for cognitive screening by experienced psychologists and medical practitioners, particularly in our increasingly multicultural societies.
Given the ubiquity of touchscreen devices today, it is important to make the distinction between using touchscreens as an assessment tool and for other uses. Although our analyses did not find any difference in the scores of children with different reported levels of touchscreen use, most children in our geographical area are exposed to touchscreen devices from a young age. 8 Conflicting evidence exists regarding the long-term developmental effect of children's exposure to touchscreen devices, 27,28 which is probably related to the quality of content, interactivity and engagement, and quantity. 29,30 It is essential to use Babyscreen solely as an assessment tool, not for entertainment or other purposes.

Limitations
Our analyses are restricted to a 24-month sample to validate this cognitive screening tool at a commonly used time point of developmental follow-up. Further testing is required to establish normative data and cut-off scores for other age ranges. Our sample included predominately White Irish participants, with relatively high education and income compared to other countries. Our newly reported normative data provide a starting point to validate Babyscreen with multinational, multi-ethnic, and multilingual and other-lingual participants in Ireland and further afield.
Our sample was derived from a typically developing cohort of children, with the final sample having no children with identified physical, visual, or hearing difficulties, or with any known neurodevelopmental or neurological conditions. However, our study relied on parent report before the 24-month assessment to indicate if their child had a disability, instead of using criterion standard clinical or medical examinations to assess for the presence or absence of disabilities. It was not possible to assess the accessibility of Babyscreen for children with specific disabilities (e.g. visuomotor, fine motor, somatosensory processing, or social communication difficulties). Further research should explore the feasibility of Babyscreen as a cognitive screening tool for children with different physical, sensory, or neurological disabilities.
A relatively small number of children were identified as having mild cognitive delay within this typically developing sample. Therefore, while the specificity of the Babyscreen tool is relatively strong, it was not possible to achieve better sensitivity. For example, children with Bayley-III cognitive scores lower than 90 who completed seven or more tasks on the Babyscreen assessment may have cognitive skills within the normal range but may have underperformed on the Bayley-III due to attentional or behavioural difficulties on the day. Furthermore, while we confirmed the ability of Babyscreen as a screening tool for cognitive delay with a binary outcome recommendation for developmental follow-up, we do not yet have the data to recommend its use as an independent cognitive assessment.
Finally, we acknowledge that our results have limited generalizability. Although our sample is relatively similar demographically to the parent birth cohort study, the generalizability of findings is hampered because of the non-randomized design of our study.

Future research
Future research should assess larger numbers of children from typically developing and at-risk cohorts. A larger sample size may improve the overall strength of the moderate correlation found between Babyscreen and Bayley-III, while also allowing for more accurate comparison of results between participant subgroups (e.g. previous touchscreen use, ethnicity, disability). Future testing and analyses would thus guide future development as a complete cognitive assessment encompassing relevant neuropsychological domains (Figure S1). Future research could compare the accuracy of Babyscreen as a cognitive screening tool with parent-report questionnaires, 1,2 given that Babyscreen may be less vulnerable to parental, language, and cultural bias. It is also important to assess Babyscreen's predictive validity for identifying children who have a definitive disability or neurodevelopmental condition at school age. Given the current cross-sectional analyses, evaluation of Babyscreen's ability to predict children's future cognitive scores should be investigated through further longitudinal studies.
To assess the generalizability of Babyscreen, future studies should consider the acceptability and validity of this touchscreen assessment if implemented in a general population (e.g. alongside public health developmental check-ups). Additional testing of Babyscreen across different low- and high-income countries is required to gain more cross-cultural relevance, given that rates of touchscreen use are likely to differ. 4,6 There is already promising research that indicates the reliability and validity of using touchscreen assessments to assess cognitive skills in school-aged children in Malawi and in the UK. 6 Future research will need to address the potential effect of touchscreen exposure and cross-cultural variances on the viability of Babyscreen with similarly aged cohorts in different cultural settings.

Conclusions
We have shown that the Babyscreen touchscreen assessment has potential to be developed as a screening tool for early mild cognitive delay, defined as 1 SD below the sample mean on the Bayley-III cognitive composite score. This language-independent tool, previously tested with high-risk children, was suitable for use with typically developing 24-month-old children. We have reported reference values for performance in a typically developing cohort.

ACKNOWLEDGMENTS
The authors thank all the families and members of the COMBINE research team for their participation and support. MEK designed the COMBINE cohort study and is the principal investigator. DMM, NM, and MDH contributed to the development of the Babyscreen application. TC and DMM formulated the research questions. TC conducted the Bayley-III neurodevelopmental and touchscreen assessments. DMM provided clinical advice and governance. TC and AJT prepared the data for analysis. TC, DMM, and VH analysed the data. TC and DMM prepared and drafted the manuscript. All authors read and approved the final version of the manuscript. Open access funding provided by IReL. This work was funded through a grant from Science Foundation Ireland (SFI) to MEK for the COMBINE project (grant no. INFANT/B3067), which was cofunded by the European Regional Development Fund under Ireland's European Structural and Investment Funds Programmes 2014-2020. SFI had no role in the design, analysis, or writing of this article.
DMM is the founding director of Liltoda, an academic spin out company of University College Cork launched in August 2021 to develop technology-based solutions for early cognitive assessment. NM holds consulting agreements with Novartis and InfanDx outside the published work. All other authors have no interests that could be perceived as posing a conflict or bias.
TC and DMM had full access to all the data in the study and take responsibility for the integrity of the data and the accuracy of the data analysis. Data are available upon reasonable request from the corresponding author DMM (d.murray@ucc.ie).

FIGURE 1
Box plot of Babyscreen scores for children with normal Bayley Scales of Infant and Toddler Development, Third Edition cognitive composite scores equal to or greater than 90 (n = 89) (green) and children with a cognitive score lower than 90 (n = 12) (blue). The Babyscreen score reflects the number of tasks completed independently by the child within 30 seconds, without assessor instruction or visual demonstration (score range 0-18).

TABLE 2
Pearson's correlations (p) between Bayley-III subtests and the two performance variables, the Babyscreen score and Bayley-III cognitive score.
a The score reflects the number of Babyscreen tasks completed independently by the child within 30 seconds, without assessor instruction or visual demonstration (score range 0-18). b The score reflects the Bayley-III cognitive composite score (score range 55-145). Abbreviation: Bayley-III, Bayley Scales of Infant and Toddler Development, Third Edition.

FIGURE 2
Histogram and centile ranges of Babyscreen scores for all children aged 24 months who engaged independently with the assessment tool (n = 125). Centile ranges are indicated on the graph (5% = 5th centile). The Babyscreen score reflects the number of tasks completed independently by the child within 30 seconds, without assessor instruction or visual demonstration (score range 0-18).

14698749, 0, Downloaded from https://onlinelibrary.wiley.com/doi/10.1111/dmcn.15555 by University College London UCL Library Services, Wiley Online Library on [22/03/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License
TABLE 1
Comparison a of mean (SD) b Babyscreen scores based on demographics and socioeconomic indicators.
a Based on the t-test or analysis of variance results. b Mean number of tasks completed independently by the child within 30 seconds, without assessor instruction or visual demonstration (score range 0-18). c Mean difference and 95% confidence interval (CI) reported for t-tests.