Validation of Oxford Cognitive Screen: Executive Function (OCS-EF), a tablet-based executive function assessment tool amongst adolescent females in rural South Africa

Short, reliable, easily administered executive function (EF) assessment tools are needed to measure EF in low- and middle-income countries, particularly in sub-Saharan Africa given the prevalence of human immunodeficiency virus (HIV)-associated neurocognitive disorder. We administered the Oxford Cognitive Screen-Executive Function (OCS-EF) to 932 rural South African females (mean age 19.7 years). OCS-EF includes seven tasks: two hot inhibition tasks (a modified Iowa Gambling Task, emotional go/no-go) and five cool EF tasks, comprising two switching tasks (visuospatial rule-finding, geometric trails) and three working memory tasks (digit recall, selection and figure drawing). We performed confirmatory factor analysis testing whether a three-factor, two-factor hot-cool, two-factor working memory and inhibition/switching, or one-factor EF model fitted the data best. The three-factor (switching, inhibition and working memory) model had the best local and global fit (χ²(11) = 24.21, p = 0.012; RMSEA 0.036; CFI 0.920; CD 0.617). We demonstrated the feasibility of OCS-EF administration by trained laypeople, the tripartite structure of EF amongst adolescent females and the factorial validity of OCS-EF in this population and context. The OCS-EF tablet-based cognitive assessment tool can be administered by trained laypeople and is a valid tool for assessing cognition at scale amongst adolescents in rural South Africa and similar environments.

Research on executive function (EF) factor structure in adolescents from low- and middle-income countries (LMICs) in Africa is limited. EF is important for concentration and self-control. EF is associated with various important outcomes, for example, occupational performance, physical health, mental health and relational health (Diamond, 2013). EF is still developing during adolescence. Research within high-income countries suggests that EF differentiates with age, starting as a unidimensional ability in pre-schoolers then differentiating into two or three distinct abilities by adolescence or adulthood; however, models across the lifespan are inconsistent (Hughes et al., 2010; Karr et al., 2018; Shing et al., 2010). Re-analysis of factor-analytic studies found greater unidimensionality amongst child/adolescent samples, with unity and diversity in adults; three-factor and nested-factor (bifactor without inhibition) models were most common amongst adolescent/adult samples (Karr et al., 2018).
Three core EFs described in the three-factor model are working memory (maintaining and manipulating retained information), switching (shifting from one learned rule/pattern to another) and inhibition (inhibiting automatic responses or ignoring distractors); more complex EFs include planning (Friedman et al., 2008). Much of existing EF research focuses on cool EF (no affective/emotional component). An alternative model posits a broader EF conceptualisation including hot/affective EF. Hot EF involves cognitive control in affective contexts with choices between instant gratification and longer-term rewards (e.g., gambling tasks). Factor-analytic work in younger children found a two-factor hot-cool EF model superior to a one-factor model (Willoughby et al., 2011), with a single-factor model being superior at the transition to adolescence (Prencipe et al., 2011). During adolescence, developmental trajectories differ between dorsolateral prefrontal cortex-mediated cool EF (ascends linearly), and orbitofrontal cortex-mediated hot EF (U-shaped curve) (Poon, 2017). There is limited work examining a two-factor hot-cool model in adolescence. This study only included females; however, the literature shows that there are no sex-specific differences in overall EF (Grissom & Reyes, 2019).
Research on EF in adolescents in Africa is limited. Studies have examined intelligence or cognition broadly, rather than EF, with validation in small samples (Shuttleworth-Edwards et al., 2013). Minimal research on the validation of cognitive assessment tools has occurred in large, local samples. The Siyakhula cohort found an acceptable fit of a four-factor cognitive model amongst 7-11-year-olds including two non-EF factors (learning, simultaneous processing) and two EF-related factors (planning/inhibition/switching, working memory) (Rochat et al., 2017). Measuring EF is particularly relevant in human immunodeficiency virus (HIV)-prevalent settings, such as southern Africa, where HIV-associated neurocognitive disorder (HAND) is common. Despite antiretroviral therapy availability, milder forms of HAND affecting EF (including working memory, inhibition and switching) remain common in HIV-infected adolescents and adults (Hoare et al., 2016; Walker & Brown, 2018). Women with HIV tend to have greater neurocognitive impairment than men (Rubin et al., 2019). Young women in southern Africa also have a heightened risk of acquiring HIV compared to young men (Harrison et al., 2015). To determine the cognitive profile associated with HAND, it is important to establish the validity and local population norms of sensitive neuropsychological tasks to measure EF accurately. It is also vital to develop platforms for simple standardised administration by non-specialists in LMICs where there is limited access to formal neuropsychological testing. A task-shifting approach, involving the delegation of cognitive assessment from highly qualified medical doctors or psychologists to trained lay health workers with fewer qualifications, is necessary to enable widespread testing and diagnosis.
Past paper-based screening by trained community health workers in South Africa (SA) highlighted the pitfalls of this approach; some of which were mitigated using highly automated tablet-based cognitive screening (Robbins et al., 2018).
The original intention with the Oxford Cognitive Screen (OCS) was to develop an intuitive tablet-based cognitive screening platform for use globally across the lifespan. OCS was initially piloted in stroke populations (Demeyere et al., 2015). Oxford Cognitive Screen Plus (OCS-Plus) was then developed and tested more widely, including in ageing rural South African populations within the Medical Research Council (MRC)/Witwatersrand (Wits) Agincourt Health and Demographic Surveillance System (HDSS) (Demeyere et al., 2021; Humphreys et al., 2017). This demonstrated that large-scale fieldworker-administered tablet-based cognitive testing was feasible in rural SA. Three paper-based OCS-Plus EF tasks (trails, rule-finding, figure drawing) were then piloted amongst adolescents in this setting (Rosenberg et al., 2018). Challenges included standardising implementation, as well as increased time, cost and risk of data-capturing errors. A tablet-based version was then developed to help standardise administration through on-screen instructions, pre-programmed duration cut-offs and automation of most scoring, which minimises examiner interpretation. The current paper introduces this tablet-based version, the Oxford Cognitive Screen-Executive Function (OCS-EF), which is OCS-Plus adapted for adolescents. A verbal working memory task (digit recall) and two hot EF tasks (Iowa Gambling Task; emotional go/no-go) were added to the OCS-Plus EF tasks (trails, figure drawing, rule-finding, selection).
The OCS-EF was then included in the Year 4 wave of assessments within the HIV Prevention Trials Network (HPTN) 068 study, focused on HIV prevention in young women in rural SA. This enabled OCS-EF validation amongst adolescent girls and young women (reported here), and examination of associations between EF, risk-taking and HIV (Rowe et al., 2020).
Objectives of this paper were to determine the:
1. feasibility of large-scale fieldworker-administered adolescent cognitive assessment using OCS-EF in rural SA.
2. OCS-EF task-by-task performance by a community-based adolescent sample in rural SA.
3. OCS-EF factorial validity in rural SA by confirming the factor structure.

Study design
This validation study utilised cross-sectional data collected during HPTN 068, a longitudinal cohort post-randomised controlled trial in the MRC/Wits Rural (Agincourt) Research Unit in SA. This region has high youth unemployment (∼ 75%) and poverty, and lacks quality education and work opportunities. Most households rely on government social grants and have limited access to water and basic sanitation. The population is predominantly black Tsonga-speaking Africans. Adolescent HIV prevalence is high, affecting females disproportionately (more than a quarter of 20-24-year-olds) (Gomez-Olive et al., 2013). HPTN 068 trial results are reported elsewhere (Pettifor et al., 2016). The aim of the trial was to test whether providing cash transfers to young women and their households, conditional on school attendance, reduced young women's risk of acquiring HIV, compared to a control group. Eligibility criteria for trial participation included: 13-20-year-old females in grades 8-11 at government schools; not married or pregnant; able to complete tablet-based questionnaires alone; having documentation to open a bank account (to receive cash transfers safely); having a parent/guardian at home with similar documentation; currently living and intending to reside in the region until study completion. Eligibility criteria for this study (which occurred 5 years post-trial enrolment) included: 17-25-year-old females; confirmed HIV-negative status; a complete, single cognitive dataset.
Compliance with Ethical Standards: Ethical approval was obtained for the study from the University of Witwatersrand Human Research Ethics Committee, the University of North Carolina Institutional Review Board and the Oxford Tropical Research Ethics Committee. All study procedures were performed in accordance with the ethical standards of the 1964 Helsinki Declaration and its amendments.
Informed consent was obtained from all individual adult participants included in the study; caregiver consent with participant assent was obtained for minors under 18-years-old.

Sample characteristics
See Table 1 for sample sociodemographic characteristics.

Materials and administration
Socio-demographic data were obtained with Audio Computer-Assisted Self-Interview (ACASI) (self-administration being useful for sensitive data) and fieldworker-administered Computer-Assisted Personal Interview (CAPI), both allowing immediate tablet-based data-capturing. Questions and task instructions were translated into Tsonga by research staff. HIV screening, including pre- and post-test counselling, was performed at each visit.
OCS-EF, developed using MATLAB and Psychophysics Toolbox, runs as a stand-alone Windows/Android application with a fixed task order and full written on-screen task instructions to standardise administration. OCS-EF aims to be culturally unbiased, brief (∼ 30 minutes) and easy for trained laypeople to administer. We administered OCS-EF on Windows Surface Pro tablets using dedicated styluses (touch input disabled).
An overview of the seven tasks follows. Four tasks have easier baseline conditions before test conditions: digit recall; trails; selection; figure drawing.
Trails is a geometric trail-making task assessing planning and switching. A practice round precedes two baseline conditions (connecting circles in ascending order of size, then connecting squares in descending order of size) and the test condition (switching between circles and squares, with squares in descending order of size and circles in ascending order).
Figure drawing assesses visuospatial constructional ability, visuospatial working memory and planning. The baseline condition involves copying a 20-component composite drawing and the test condition (immediate recall) involves drawing the same figure from memory after seeing it again briefly.
Rule-finding assesses visuospatial problem-solving and switching. A red dot moves from one shape to another in a matrix of 24 shapes (three columns: squares-triangles-squares) following one of five predefined rules. Participants predict the dot's next move; the preceding position is highlighted. Switches to a new rule are unsignalled.
Selection assesses selective attention (visible baseline condition), and visuospatial working memory and planning (invisible test condition). Participants are presented with 60 items: 30 vegetables (10 of each type); 30 fruit (10 of each type). They have to select fruit (targets) only, ignoring vegetables (distractors). In the visible condition, selected items remain highlighted throughout, whereas in the invisible condition, they only remain highlighted for a few seconds.
Digit recall is a classic, widely used task. Examiners read progressively longer digit sequences (two to nine digits long) out loud. Participants recite the sequences forwards or backwards. Correct recital results in progression to the next level. Three failed attempts (different sequences) at a level terminate the task condition. Digit span is the longest sequence length recalled correctly in each condition. Forwards (baseline) span assesses verbal short-term memory (storage); backwards (test) span assesses verbal working memory.
Emotional go/no-go is based on Hare's task implementation (Hare et al., 2008). Participants are presented sequentially with facial visual stimuli (three women with happy, neutral and fearful facial expressions). Participants have to tap neutral faces (targets), ignoring happy/fearful faces (distractors). There is a practice block (six trials) and four test blocks (36 trials each).
The Iowa Gambling Task is based on the Children's Gambling Task implementation (Kerr & Zelazo, 2004) assessing risk-taking in the context of uncertain reward. Participants are presented with two decks of cards: the advantageous, low-risk deck (low rewards, low penalties with net gain); and the disadvantageous, high-risk deck (high reward, high loss with net loss). Participants need to maximise their total score by learning the nature of the decks. They select a card from each deck for 50 trials and see their updated total score on-screen after each selection.
See Supplementary Material S1 for full task and outcome measure descriptions.
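The digit recall stopping rule described above (progression after a correct recital, termination after three failed attempts at a level, span as the longest sequence recalled correctly) can be illustrated with a minimal Python sketch. The attempt log format, the function name and the constant are hypothetical and chosen for illustration only; this is not the OCS-EF scoring code.

```python
from typing import Iterable, Tuple

# Assumption from the task description: three failed attempts at one
# sequence length (level) terminate the task condition.
MAX_FAILS_PER_LEVEL = 3


def digit_span(attempts: Iterable[Tuple[int, bool]]) -> int:
    """Return the longest sequence length recalled correctly.

    `attempts` is a hypothetical log of (sequence_length, correct) pairs
    in administration order; it is not the OCS-EF data format.
    """
    span = 0
    fails_at_level = 0
    current_level = None
    for length, correct in attempts:
        if length != current_level:
            # A new level starts with a clean failure count.
            current_level = length
            fails_at_level = 0
        if correct:
            span = max(span, length)
        else:
            fails_at_level += 1
            if fails_at_level >= MAX_FAILS_PER_LEVEL:
                break  # condition terminated; later attempts are ignored
    return span


# Illustrative log: passes level 2, passes level 3 on the second attempt,
# then fails level 4 three times, which ends the condition.
example_log = [(2, True), (3, False), (3, True),
               (4, False), (4, False), (4, False), (5, True)]
example_span = digit_span(example_log)
```

In this illustrative log the final (5, True) entry is never reached, because the third failure at level 4 ends the condition.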
Stimuli were adapted for the cultural context (selection using common local fruit and vegetables; go/no-go using grayscale photographs of three NimStim African American women's faces) (Tottenham et al., 2009). Task instructions were read in Tsonga. Testing was performed in testing rooms or at participants' homes (< 10%) by locally employed, trained female fieldworkers, fluent in English and Tsonga. Fieldworkers required a 12th grade formal education and preferably previous research fieldwork experience. None had tertiary qualifications. Quality control measures included fieldworker self-checks, cross-checks and random supervisor checks.
After each task, examiners were prompted to select a single test condition:
• no issue
• participant issues: no speech, visual problems, motor problems, auditory problems, fatigue, refused, ran out of time
• environmental/technical: technical problems, interruptions, other
• examiner error.
Selection of anything other than "no issue" resulted in participant task invalidation. Task timing was recorded automatically. Accuracy was scored automatically except for tasks requiring interpretation of drawings (figure) or speech (digits). The figures were marked blindly by a researcher using a standardised figure-scoring application with cross-checks. Each drawing component was scored for presence, accuracy and position.
In 2018, a post-study fieldworker feedback session with anonymous questionnaires containing both general and task-specific open-ended and Likert-style questions obtained qualitative data about OCS-EF administration.

Data analysis
Data analysis was performed in Stata 14. Participants with invalidated data due to test condition errors had data points estimated using maximum likelihood estimation. Analyses of variance (ANOVAs) by age, education level and socio-economic status (asset index) were performed to assess relationships between all the EF task outcome measures and these sociodemographic variables. Comparison between baseline and test condition accuracy was performed graphically for trails, figure, selection and digit to see if the expected shifts-to-the-left occurred. Confirmatory factor analyses compared four different EF models:
1. three-factor (Miyake et al., 2000): switching (indicators: trails, rule-finding), inhibition (indicators: go/no-go, Iowa), working memory (indicators: digit, selection, figure).
2. hot-cool (Zelazo & Müller, 2002): hot (indicators: go/no-go, Iowa), cool (indicators: trails, rule-finding, digit, selection, figure).
3. two-factor (switching/inhibition, working memory): working memory (indicators: digit, selection, figure) and inhibition/switching (indicators: trails, rule-finding, go/no-go, Iowa).
4. one-factor: same indicators all loading onto a single factor.
A single reflective indicator per task was included. For trails, other indicators were tested with similar results (not reported). We selected at least two indicators per latent variable (factor) so the models could be identified. We assumed latent variables have a standardised measurement unit, fixing variances to 1 (Schumacker & Lomax, 2004). These indicators were selected:
• Trails: switch accuracy cost (alternative scores tested: trails switch accuracy; trails switch duration cost).
• Iowa: net score.
Other task scores derived to assess estimates of local cognitive performance:
• Trails: baseline combined accuracy.
- Mean go reaction time.
Extreme collinearity (squared multiple correlation, R² > 0.90) was not present. Models were estimated using maximum likelihood with missing values and the observed information matrix.
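As a consistency check on the model specifications above, the degrees of freedom of each model can be obtained by counting parameters. The sketch below is illustrative Python, not the Stata code used in the study. It assumes that each of the seven indicators loads on exactly one factor, that factor variances are fixed to 1, that all factor covariances are estimated, and that there are no cross-loadings or correlated residuals; the last two points are assumptions not stated in the text. Indicator means are treated as saturated, so they do not affect the count. The function name is hypothetical.

```python
def cfa_degrees_of_freedom(n_indicators: int, n_factors: int) -> int:
    """Degrees of freedom for a simple-structure CFA with factor variances fixed to 1."""
    # Unique variances and covariances among the observed indicators.
    moments = n_indicators * (n_indicators + 1) // 2
    loadings = n_indicators            # one free loading per indicator
    residual_variances = n_indicators  # one residual variance per indicator
    factor_covariances = n_factors * (n_factors - 1) // 2
    return moments - (loadings + residual_variances + factor_covariances)


# Number of factors in each of the four models compared in the paper.
models = {
    "three-factor": 3,
    "hot-cool": 2,
    "two-factor (working memory, inhibition/switching)": 2,
    "one-factor": 1,
}
df_by_model = {name: cfa_degrees_of_freedom(7, k) for name, k in models.items()}
```

Under these assumptions the three-factor model has 11 degrees of freedom, which agrees with the χ²(11) reported for that model, and each two-factor model has 13, which agrees with the 2 degrees of freedom of the χ² difference test between the three-factor and two-factor models.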
The model goodness-of-fit was assessed locally by examining parameter estimate sizes, standardised residual variances and squared multiple correlation coefficients, a measure of indicator reliability. Standardised regression coefficients were interpreted as Pearson's correlations using Cohen's rule: large r > |0.5|; medium r > |0.3|; small r > |0.1| (Cohen, 1988). Standardised residual variances were interpreted like standardised z scores (>1.96 suggesting model misspecification) (Schumacker & Lomax, 2004). Global model fit was assessed using these fit indices: likelihood-ratio χ² test, Steiger-Lind root mean squared error of approximation (RMSEA), comparative fit index (CFI) and coefficient of determination (CD) (close to 1 indicating good fit). Comparative model fit was assessed using a χ² difference test. We used these cut-offs for good model fit: RMSEA < 0.06; CFI > 0.95 (Hu & Bentler, 1999). The saturated model is delineated in Supplementary Material S2.
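Two of the rules above can be made concrete with a short Python sketch: the RMSEA point estimate computed from the model χ², its degrees of freedom and the sample size, and Cohen's size labels for standardised coefficients. The RMSEA formula shown uses N - 1 in the denominator, which is one common convention and is an assumption here (some software uses N); the "negligible" label for |r| of 0.1 or less is also an assumption, as the text does not name that band. Function names are hypothetical.

```python
import math


def rmsea(chi2: float, df: int, n: int) -> float:
    """RMSEA point estimate: sqrt(max(chi2 - df, 0) / (df * (n - 1)))."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cohen_label(r: float) -> str:
    """Label a standardised coefficient using Cohen's (1988) thresholds."""
    size = abs(r)
    if size > 0.5:
        return "large"
    if size > 0.3:
        return "medium"
    if size > 0.1:
        return "small"
    return "negligible"  # assumed label; not used in the paper


# Values reported for the three-factor model: chi2(11) = 24.21, N = 932.
three_factor_rmsea = rmsea(24.21, 11, 932)
meets_cutoff = three_factor_rmsea < 0.06  # Hu & Bentler (1999) cut-off
```

With the values reported for the three-factor model, this gives approximately 0.036, matching the reported RMSEA; using N instead of N - 1 gives the same value to three decimal places.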
Qualitative data from fieldworker questionnaires were summarised by the cognitive researcher and short responses analysed using a semantic, inductive approach to thematic analysis.

Sample characteristics and feasibility
A total of 954 eligible 17-25-year-old females completed the OCS-EF; 22 were excluded due to missing/duplicate files; 932 (98%) participants were included (see Figure 1). Participants with invalid task data (see Figure 2) (n = 144; 15%) were included using maximum likelihood estimates. All tasks had > 90% valid data. The most common technical problem was tablet freezing during hot weather. This sometimes resulted in examiners double-clicking (due to tablets being unresponsive) and accidentally skipping trails/selection task sections (most examiner errors). Tasks with baseline and test conditions (i.e., multiple task sections) tended to have more issues and resultant invalidation. Trails had the most issues (7.8%: 5% task-skipping examiner error; 2% tablet freezing; 0.8% other with two interruptions). Selection also had a number of issues (3.6%: 2% task-skipping examiner error; 1% tablet freezing; 0.6% other including one participant refusal and one interruption). Digit recall had a few participant issues (2.6%: 1.2% participant fatigue; 0.8% technical issue; 0.6% other including three interruptions). Figure drawing also had > 1% invalidated data (1.1%: 0.8% technical issue; 0.3% other with one participant refusal). The remaining tasks had no or minimal invalidation: Iowa 0.2% technical issues; Rule-finding 0.1% examiner error; Go/no-go no issues. See Table 2 for task durations (all positively skewed).

Fieldworker feedback
Eight of the 19 fieldworkers (42%) completed post-study feedback questionnaires. Questionnaire data analysis indicated that most fieldworkers found OCS-EF easy to administer, citing the app's similarity to a mobile phone menu and the clear task instructions in the local language as reasons. They felt it was more meaningful to administer the tasks when they understood the rationale and the abilities being assessed by each task. Fieldworkers suggested improving tablets to withstand the hot climate and adapting software to minimise/allow correction of resulting examiner error, particularly during the trails task. Fieldworkers felt participants found the trails baseline condition easy and the switching test condition more challenging. They described varying perceptions of rule-finding and Iowa Gambling Task difficulty; some participants learned the rules/patterns while others did not. They reported that most participants found selection and figure easy. Emotional go/no-go was challenging to administer; its long duration sometimes resulted in participant fatigue. Digit recall was the most intensive task to administer. The task duration was long if participants reached the lengthier sequences, sometimes resulting in participant fatigue. It also required fieldworkers to concentrate intensely while listening to and capturing participant responses on screen. The main OCS-EF administration challenge was interruptions, particularly at participant homes. Interruptions made it harder for participants to concentrate and perform well, particularly on the more challenging tasks. Fieldworkers reported that interruptions resulted in most participants struggling to complete the task. Easier tasks, specifically trails baseline and selection, were sometimes associated with participant boredom, according to the fieldworkers.

Task performance
Summary statistics are presented in Table 3. A pairwise correlation matrix is presented in Supplementary Material S3. There were ceiling effects (negative skewness) for …
[Figure 6. Model 4, one-factor, confirmatory factor analysis estimates. EF, executive function.]
Coefficient and covariance estimates are also presented with standard errors, 95% confidence intervals, z statistics, probability values and squared multiple correlation coefficients in Tables 4-7 for Models 1-4, respectively.
With regard to local fit, no standardised residuals were high enough to suggest model misspecification. Correlations between indicators and latent variables varied in size. They were medium in all four models for the rule-finding, go/no-go and Iowa Gambling Task indicators, with models explaining 10-25% of each indicator's variance. The correlations were small for the digit and selection indicators, with models explaining <10% of each indicator's variance. For other indicators, they varied between models. They were large for figure in the three-factor and two-factor models and small in the one-factor and hot-cool models. They were medium for trails in the two- and one-factor models, and small in the three-factor and hot-cool models. In the three-factor model, switching was significantly and positively correlated with both inhibition and working memory; however, inhibition and working memory were uncorrelated. Global goodness-of-fit indices are presented in Table 8. RMSEA, CFI and CD were all most favourable in model 1 (three-factor), with model 3 (working memory inhibition/switching) having the next best fit (RMSEA < 0.06 for both). Models 2 (hot-cool) and 4 (one-factor) had a higher RMSEA (> 0.06), suggesting worse fit. No model had CFI > 0.95 but model 1 (three-factor) came closest (CFI 0.92). Likelihood-ratio χ² tests revealed that models 2-4 were more significantly different to saturated models (p's < 0.001) than the three-factor model (p = 0.012). The χ² difference test revealed a significant difference between the two best models, the three-factor model and the two-factor working memory inhibition/switching model (χ² difference = 25.28; df = 2; p < 0.0001). We retained the three-factor model as it had the best fit on multiple indices.
Note (Tables 4-7): Coefficient = standardised coefficient; EF = executive function; SE = standard error; z = z statistic; SMC = squared multiple correlation coefficient.
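The probability values attached to the χ² statistics above can be reproduced approximately with the closed-form upper-tail probability of the χ² distribution for integer degrees of freedom. The following Python sketch uses only the standard library and is an illustration, not the Stata routine used in the study; the function name is hypothetical. For even degrees of freedom the tail is a finite sum multiplied by exp(-x/2); for odd degrees of freedom it is a complementary error function term plus a finite sum.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with integer df."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        term, total = 1.0, 1.0
        for j in range(1, df // 2):
            term *= (x / 2) / j
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) + 2 * phi(chi) * sum_{r=1}^{(df-1)/2}
    #         chi^(2r-1) / (1 * 3 * ... * (2r-1)), where chi = sqrt(x)
    #         and phi is the standard normal density.
    chi = math.sqrt(x)
    density = math.exp(-x / 2) / math.sqrt(2 * math.pi)
    term, total = chi, 0.0  # first term (r = 1) is chi / 1
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return math.erfc(chi / math.sqrt(2)) + 2 * density * total


p_three_factor = chi2_sf(24.21, 11)  # likelihood-ratio test, three-factor model
p_difference = chi2_sf(25.28, 2)     # chi-square difference test, df = 2
```

Under this formula, the three-factor likelihood-ratio test gives a probability of roughly 0.012 and the difference test gives a value well below 0.0001, consistent with the values reported above.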

Main findings
We demonstrated the feasibility of large-scale fieldworker-administered adolescent cognitive testing with the OCS-EF in rural SA. Fieldworker feedback regarding OCS-EF administration and data integrity were generally good. All tasks had > 90% valid data (most tasks > 98% valid data); however, there were two tasks, trails and selection, where invalidation was more common, due to tablets freezing during these tasks in hot weather, resulting in fieldworkers sometimes double-clicking and accidentally skipping task sections. This problem was likely due to a combination of technical (both hardware and software), climate and fieldworker factors; however, the solution would primarily be technical as described in the recommendations section below. Optimising the testing environment to minimise interruptions and further automating the digit recall task could reduce data loss and the potential for human error, and thus improve data quality further. The feasibility of large-scale cognitive testing and research in these rural, underdeveloped villages was likely facilitated by the existing HDSS platform of the MRC/Wits Agincourt Research Unit with its skills development ethos, experience with training fieldworkers, experience with electronic tablet-based data collection with the support of a skilled data team, existing quality control standard operating procedures, and well-established relationships with community leaders and members.
This study, to our knowledge, represents the first attempt to confirm EF factor structure in a large community-based sample of late adolescents in Africa. The large sample size means parameter estimates are likely stable and that testing across a wide age range was possible. Similar to studies performed elsewhere amongst adolescent/adult samples, we found a differentiated EF model fit the data better than a single-factor or undifferentiated EF model (Karr et al., 2018); however, our models were unique because they included both hot and cool EF. We found a differentiated three-factor model including hot (inhibition) and cool (working memory, switching) EF factors fitting best, followed by the two-factor (working memory, combined inhibition/switching) model. This differentiated three-factor model incorporating hot and cool EF found switching and inhibition were highly correlated; however, collapsing them into a single switching/inhibition factor (Model 3) decreased global model fit. Switching and working memory were also correlated; however, collapsing them into a single cool factor in the two-factor hot-cool model also decreased global model fit. Inhibition and working memory factors were uncorrelated, possibly because the inhibition factor only included hot EF tasks. All the differentiated models fit better than an undifferentiated model. The three-factor model fitting better than the hot-cool model suggests that the differentiation of global EF into switching, inhibition and working memory may be more important than the hot-cool EF differentiation; however, the final three-factor model incorporates the hot-cool EF differentiation, with inhibition being the hot EF factor, so this hot-cool differentiation is likely also relevant.
These findings suggest that, like in high-income countries, EF differentiates into distinct, related factors by late adolescence (Miyake & Friedman, 2012). In contrast to Miyake's updated EF model, we did not assess cool inhibition as a common, underlying factor because we only used hot inhibition tasks. Although the emotional go/no-go task measures inhibition comparably to the non-emotional analogue (Schulz et al., 2007), Iowa Gambling Task performance has generally not been significantly associated with cool EF (Toplak et al., 2010). Yet, in this study, the inhibition factor onto which the Iowa Gambling Task loads was correlated with the cool EF switching factor. This commonality may represent the pattern-recognition/learning component that is also present in both the Iowa Gambling (inhibition) and rule-finding (switching) tasks. The lack of correlation between inhibition and working memory factors probably explains why a one-factor model fitted poorly. Low digit factor loading was surprising given its frequent use globally. Measurement error may be responsible. It required fieldworker input to capture responses correctly while participants verbalised responses.
Rule-finding, Iowa and go/no-go tasks explained greater proportions of the shared variance than selection, digit and trails tasks, with mixed findings for figure drawing. Some tasks may have explained lower proportions because they also measure other aspects of cognition. Digit's low loading on the working memory factor may be because the visuospatial working memory tasks (invisible selection; figure recall) reflected planning more than visuospatial working memory. This might explain why factor loadings were higher for selection and figure than for digits backwards, a traditional working memory task. The three-factor model fit was likely good because the figure factor loading was high in this model; it decreased substantially when the working memory and switching factors were collapsed into a single factor (hot-cool and one-factor models). The three-factor model thus explained a higher proportion of the overall variance. There were varying findings when examining the associations between EF and sociodemographic factors. First, in terms of age associations, late adolescence being associated with poorer EF in this study does not fit with the broader literature but may be due to unintentional selection bias. Older participants were by definition educationally delayed at recruitment (18-20-year-olds still in grades 8-11 despite the expected age range for these grades being 13-17-years-old). Educational delay is more common in rural South Africa; exclusion of these older participants would not have been representative of a typical rural secondary school population (Romero et al., 2018). Second, education level was significantly associated with better performance on trails baseline accuracy and switch duration cost, figure recall accuracy and rule-finding accuracy. It was not associated with Iowa Gambling, go/no-go or digit tasks. These findings generally fit the pattern of cool EF, rather than hot EF, being associated with academic performance (Poon, 2017). 
Education level was negatively associated with selection, perhaps due to perceived task ease resulting in boredom and carelessness. Finally, socio-economic status was likely not associated with cognitive performance as these stratifications were fine gradations within a relatively homogenous low-income population.

Limitations
There are several study limitations. It was difficult to ensure standardised laboratory-like test conditions in this context at scale. A further limitation is sample composition: it is likely not fully representative of the local population as it excluded some of the most vulnerable women given the trial's eligibility criteria. We only tested cognition at one time-point so we could not assess test-retest reliability or predictive validity.
Finally, task order meant the two inhibition tasks (similarly the two working memory tasks) were administered sequentially. This may have introduced error due to participant-by-order interaction and caused poorer performance on the second task each time (emotional go/no-go; digit recall) (Miyake et al., 2000). Fatigue was common in the last task, digit recall.

Recommendations
There are improvements that could be made at various levels. At implementation level, ongoing fieldworker training with open communication is vital for quality assurance. Test conditions should be optimised within the contextual constraints (e.g., mobile testing van). In terms of hardware, "rugged" tablets would be suitable for this environment. Software can be improved by: programming lock-in periods to prevent accidental section skipping or inserting a "back" button; amending task order to avoid sequential administration of two tasks assessing the same EF. Examiner digit recital can be pre-recorded and played off-screen to standardise administration further. Participant input of digit sequences could also minimise potential examiner input error. Full scoring automation with immediate on-screen results could be developed to increase clinical utility.

Research gaps
Although this study presents proof-of-concept for the feasibility and factorial validity of OCS-EF, there are many gaps to be explored in future research, including further work on implementation and quality control in rural African contexts, and assessing different types of validity. There is scope for more in-depth qualitative research with cognitive fieldworkers and participants to assess the tasks' face validity, and the impact of environmental factors (e.g., testing environment) on data quality. In terms of factorial validity, replication of the three-factor structure in other African adolescent samples (especially urban and/or males) will increase confidence in result stability. Given that HAND is a common cause of cognitive deficits in young people in Africa, the utility and diagnostic validity of OCS-EF in identifying and describing cognitive profiles in people living with HIV needs to be explored. Predictive validity should also be assessed by examining associations with relevant occupational (e.g., employment) and behavioural outcomes (e.g., risky sexual behaviour) (Rosenberg et al., 2018).

CONCLUSION
This study confirmed the factorial validity of the OCS-EF tablet-based EF assessment tool in a large community-based sample of adolescent females in rural South Africa. It is likely the largest study to examine EF factor structure in adolescents in Africa. Vitally, it has established informed estimates of cognitive status of rural adolescent females for future comparative work. OCS-EF will provide clinicians and researchers with a platform for measuring EF quickly and easily in LMICs with limited access to formal neuropsychological assessment.
Manuscript received May 2020. Revised manuscript accepted March 2021.

SUPPORTING INFORMATION
Additional supporting information may be found online in the Supporting Information section at the end of the article.