Global Examination of Mental State: An open tool for the brief evaluation of cognition

Abstract
Background: The aim of this paper is to present a new, freely accessible instrument for the evaluation of cognition: the Global Examination of Mental State (GEMS).
Methods: It is made up of 11 items tapping into a range of skills, such as Orientation in time and space, Memory, Working memory, Visuo‐spatial, Visuo‐constructional and Planning abilities, Perceptual and visual Attention, Language (Naming, Comprehension, and Verbal fluency), and Pragmatics.
Results: The psychometric strengths of this screening are: (1) extensive and updated normative data on the adult Italian population (from 18 to 100 years old); (2) absence of a ceiling effect in healthy individuals, which allows better detection of interindividual variability; (3) comparison of the global scores with normative data taking into account Cognitive Reserve rather than only education, thus increasing diagnostic accuracy; (4) thresholds for significant change over time and the possibility to use parallel versions (GEMS‐A/GEMS‐B) for test‐retest; (5) solid psychometric properties and data on discriminant validity; and (6) free access to all materials (record forms, instructions, and cut‐off scores) on the web under a Creative Commons license.
Conclusions: With all these characteristics, GEMS could be a very useful paper‐and‐pencil instrument for cognitive screening.

Another reason to prefer screenings is the fact that patients might be unable to cope with the demands of a whole neuropsychological assessment, which may last very long and sometimes require a high amount of resources (Plass et al., 2010).
Moreover, screenings may provide a concise picture of cognition, rather than a patchy frame of different cognitive functions (Riello et al., 2021). In many cases, for example, it is more useful to identify the general cognitive functioning of a patient rather than their specific disorders in order to evaluate day-to-day capacities (Block et al., 2017).
Furthermore, screenings are useful in many fields of research especially to verify inclusion/exclusion criteria or to evaluate the effect of experimental variables on cognition.
However, a limitation of screenings is that, although commonly used with different pathologies, they are developed for specific clinical populations. For example, Mini-Mental State Examination (Folstein et al., 1975) (MMSE) as well as Addenbrooke's Cognitive Examination-Revised (Mioshi et al., 2006) were originally focused on Alzheimer's type dementia; Montreal Cognitive Assessment (Nasreddine et al., 2005) was developed specifically to detect mild cognitive impairment (MCI, the predementia stage); Oxford Cognitive Screen (Demeyere et al., 2015) is tuned specifically for stroke disorders; Edinburgh Cognitive and Behavioural ALS Screen (Abrahams et al., 2014) considers patients with amyotrophic lateral sclerosis; Brief Assessment of Cognition in Schizophrenia (Keefe et al., 2004) applies to patients with schizophrenia; and Rao's Brief Repeatable Battery (Rao et al., 1991) to patients with multiple sclerosis.
Conversely, other screenings are suitable for a wide range of pathologies but detect only specific deficits; for example, the Frontal Assessment Battery focuses on executive dysfunctions.
A common issue with screening tests is that their global score is obtained by summing all the items of each subtask, so that the cognitive subtasks do not carry equal weight in the global score.
For example, in MMSE, the total score is 30, where 10 points (one third) are allotted to orientation (in space and time), whereas only 1 point is for constructional apraxia. Another example is in the Montreal Cognitive Assessment (MoCA): 6 points are attributed to the Visuospatial/Executive section, while 3 points are attributed to the Naming section.
Moreover, screening tests sometimes lack data on important psychometric properties, which limits their potential as assessment tools.
A recent systematic review on Italian tests has shown that validity and reliability are often neglected properties, while thresholds for significant change are hardly ever reported (Aiello et al., 2022).
Another limitation is that they are often used to draw general inferences on cognition even if there is no validation in this sense: often, there are no data showing that the screenings correlate satisfactorily with wider batteries or with general tests of cognition.
Another important point is that scores of screenings, as is the case for all cognitive tests, have so far only been adjusted for age, education, and sex (Strauss et al., 2006), although other variables may be at play and influence cognition (e.g., sociobehavioral or socioeconomic factors; Fratiglioni et al., 2004; Livingston et al., 2017; Mondini et al., 2022; Ward et al., 2015). From this perspective, education is only one component of the concept of Cognitive Reserve (CR; Stern, 2003), which is well recognized as a comprehensive measure of abilities and knowledge acquired during life (Stern & Barulli, 2009) and as a modulator of cognitive performance (Lojo-Seoane et al., 2018; Mitchell et al., 2012; Montemurro et al., 2019; Steffener & Stern, 2012). CR is considered a protective factor against major neurocognitive decline, whereas in other pathologies, CR shows a positive effect on recovery after brain damage (Hindle et al., 2014; Menardi et al., 2020; Nunnari et al., 2014). Thus, to obtain a clearer picture of the examinees' performance, test scores should be adjusted for CR rather than merely for age and education.
Finally, following Open Science principles and sharing tools under appropriate licenses (such as Creative Commons) could be a way to make neuropsychological instruments available to professionals and researchers.
With this in mind, we present a new screening test, the Global Examination of Mental State (GEMS), which provides a fast measure of global cognition in approximately 10 min. This screening is easy to administer and takes into account many psychometric and methodological aspects often neglected in screening tests.

Participants
Healthy Italian volunteers (635; 396 females) were recruited in different social groups, organizations, and in other environments without connections with clinical settings. Inclusion criteria were: age over 18, Italian mother tongue, and autonomy in principal daily living activities.
Persons with neurological or psychiatric diseases were excluded.
All participants took part voluntarily in this study, signed the informed consent, were aware they could stop and withdraw from the

Study design
GEMS-A and CRIq were administered to 616 participants, while another 29 were tested with GEMS-B and CRIq. From the 616 group, 60 participants also underwent the MoCA (Nasreddine et al., 2005), and 50 the ENB-2 (Mondini et al., 2011). The assignment to one of these three groups was random. After 1-3 months, 52 individuals were retested with GEMS-A, while 59 were retested with GEMS-B. See Figure 1 for more details about the data collection design.
[...] and Metaphor Comprehension.

Materials
Each of the 11 items gives a raw score, which is then converted to a proportion so that each item (representing mainly one cognitive function) weighs as much as any other in the final score. The total score aims to represent a cognitive profile without assigning a priori relevance to any function.
Thus, GEMS does not address any specific diagnosis or disorder.
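The equal-weight scoring described above can be illustrated with a minimal sketch. It assumes that each item's raw score is divided by that item's maximum and that the resulting proportions are averaged and rescaled to 0-100. The item names, maxima, and raw scores below are hypothetical and are not taken from the GEMS record form.

```python
from statistics import mean


def composite_score(raw, max_scores):
    """Equal-weight composite on a 0-100 scale.

    Each raw score is converted to a proportion of its item maximum,
    so that every item contributes equally regardless of its raw range.
    """
    if raw.keys() != max_scores.keys():
        raise ValueError("raw and max_scores must cover the same items")
    proportions = [raw[item] / max_scores[item] for item in raw]
    return 100 * mean(proportions)


# Hypothetical item maxima and raw scores (illustration only).
max_scores = {"Orientation": 10, "Naming": 4, "Fluency": 20}
raw = {"Orientation": 10, "Naming": 2, "Fluency": 10}

score = composite_score(raw, max_scores)

# For contrast: a plain sum would let items with larger raw ranges dominate.
summed_percentage = 100 * sum(raw.values()) / sum(max_scores.values())
```

In this toy case the two approaches give different totals, because in the plain sum the 20-point item outweighs the 4-point item.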

Statistical analysis
An initial item analysis aimed to identify and select satisfactory tasks

RESULTS
The mean of the 635 GEMS total scores was 83.41/100 (SD = 12.8; range 21-99). The distribution was left-skewed, with no ceiling effect (see Table 1 for descriptive statistics of each item and total scores).

Internal consistency
Results showed high internal consistency (alpha = 0.81) and each item showed a high correlation with the global score: 0.80 for Orientation,

Construct validity
To verify the capacity of GEMS to measure global cognition, a subsample of 60 participants was further assessed with MoCA and ENB-2. GEMS correlated with MoCA (r = 0.723, p < .001) and with ENB-2 (r = 0.811, p < .001). To corroborate these results, we ran two additional regressions with the GEMS total score as the dependent variable: in the first, MoCA was the predictor (Adj. R² = 0.514, p < .001), while in the second, ENB-2 was the predictor (Adj. R² = 0.651, p < .001). Both models supported satisfactory construct validity.
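As a consistency check, the sketch below shows how the adjusted R² of a regression with a single predictor follows from the Pearson correlation and the sample size, since in that case R² equals r². The sample sizes are assumptions taken from the Study design section (60 participants for MoCA, 50 for ENB-2). Small differences in the third decimal are expected because the reported r values are rounded.

```python
def adjusted_r2_single_predictor(r, n):
    """Adjusted R^2 for a simple linear regression (one predictor).

    With one predictor, R^2 = r^2, and the adjustment is
    1 - (1 - R^2) * (n - 1) / (n - 2).
    """
    if n <= 2:
        raise ValueError("n must be greater than 2")
    r_squared = r ** 2
    return 1 - (1 - r_squared) * (n - 1) / (n - 2)


# Reported correlations; sample sizes are assumed from the Study design.
adj_r2_moca = adjusted_r2_single_predictor(r=0.723, n=60)
adj_r2_enb2 = adjusted_r2_single_predictor(r=0.811, n=50)

print(f"MoCA:  adjusted R^2 = {adj_r2_moca:.3f}")
print(f"ENB-2: adjusted R^2 = {adj_r2_enb2:.3f}")
```

Under these assumptions, both values land within rounding distance of the reported adjusted R² values.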

Test-retest reliability, practice effect, and parallel forms
GEMS test-retest reliability measured on 52 participants was very good (test-retest: r = 0.845, p < .001) ranging from good to excellent for each task, except for Verbal Comprehension. Table 2 shows more details.
Practice effect was calculated with a series of paired t-tests: a significant practice effect was found in 4 out of 11 tasks (i.e., Orientation, Immediate Memory, Naming, and Fluency) and in the GEMS total scores.
To account for practice effect, GEMS (hence GEMS-A) was paired with a second form: GEMS-B. A subsample of 59 participants was assessed with both versions and results showed good correlation (Pearson's r = 0.774, p < .001). The correlation coefficients deriving from GEMS-A/GEMS-A and GEMS-A/GEMS-B were verified using the Fisher r-to-z transformation, and results showed no significant differences (two-tailed p = .289).
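The comparison of the two correlation coefficients can be sketched as follows. The sketch assumes the standard Fisher r-to-z test for two independent samples, with n = 52 for GEMS-A/GEMS-A and n = 59 for GEMS-A/GEMS-B. The function name is illustrative, and this is not necessarily the exact procedure or software used by the authors.

```python
import math


def fisher_r_to_z_test(r1, n1, r2, n2):
    """Two-tailed test for the difference between two independent correlations."""
    # Fisher transformation: z = atanh(r)
    z1, z2 = math.atanh(r1), math.atanh(r2)
    # Standard error of the difference between the transformed values
    se = math.sqrt(1 / (n1 - 3) + 1 / (n2 - 3))
    z = (z1 - z2) / se
    # Two-tailed p-value from the standard normal distribution
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p


z, p = fisher_r_to_z_test(r1=0.845, n1=52, r2=0.774, n2=59)
print(f"z = {z:.2f}, two-tailed p = {p:.3f}")
```

Under these assumptions the p-value comes out close to the reported .289, well above conventional significance thresholds.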

Paired t-tests between GEMS-A and GEMS-B showed no practice effect (Table 2).

Threshold for significant change
A regression-based approach was used to calculate significant change (Crawford & Garthwaite, 2007) between two measurements, which allows

TABLE 2 The table shows the values of test-retest reliability (Pearson's r) and practice effect (paired t-tests) of scores of single tasks and global score after the administration of GEMS-A followed by a second administration of GEMS-A (GEMS A-A) compared with the administration of

Effect of demographic variables
The effect of age, sex, education, and CRI was assessed in multiple regressions to derive clinical cut-offs (see below). Results show that age, education, and CRI are significant predictors of GEMS, whereas sex has no effect. In particular, age has a negative relationship with GEMS, whereas education and CRI have a positive one. Interestingly, the main effect of CRI was stronger than that of education.
Visual inspection of the partial residuals indicated that age, education, and CRI were nonlinearly related with GEMS. Models 4-6 were thus built to check whether including nonlinear terms would improve the model fit. We used the modified version of the Akaike Information Criterion for model comparison and Adjusted R² to select the best fitting model. Model 6, including age, sex, and CRI, and the quadratic terms for age (age²), education (education²), and CRI (CRI²), showed the minimum loss of information (Burnham et al., 2011) and was used for generating cut-offs. For more details, see Figure 2 and Table S3.
[...] showing a very good discriminant validity (Figure 3).
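The model comparison can be illustrated with a minimal sketch. It assumes that the "modified version" of the Akaike Information Criterion is the small-sample corrected AICc and that the models are fitted by least squares. The residual sums of squares and parameter counts below are hypothetical and are not values from this study.

```python
import math


def aicc(rss, n, k):
    """Corrected AIC for a least-squares model.

    rss: residual sum of squares
    n:   number of observations
    k:   number of estimated parameters (coefficients, intercept,
         and the error variance)
    """
    if n - k - 1 <= 0:
        raise ValueError("n must exceed k + 1")
    aic = n * math.log(rss / n) + 2 * k
    return aic + (2 * k * (k + 1)) / (n - k - 1)


n = 635  # sample size reported in the Results

# Hypothetical fits: a linear model and one with added quadratic terms.
linear = aicc(rss=62000.0, n=n, k=6)
quadratic = aicc(rss=55000.0, n=n, k=9)

# The model with the lowest AICc loses the least information.
best = "quadratic" if quadratic < linear else "linear"
print(f"AICc linear = {linear:.1f}, quadratic = {quadratic:.1f}, best = {best}")
```

The correction term penalizes extra parameters more heavily when the sample is small relative to the number of parameters; with several hundred observations it is nearly negligible.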

DISCUSSION
In this work, we have presented GEMS, a new paper-and-pencil screening test to investigate global cognition and impairments of any origin/etiology. GEMS psychometric properties, normative data, and cut-offs based on a sample that represents the current Italian population well are reported. The GEMS total score was obtained by transforming the raw scores of the 11 tasks into proportions and then averaging them, so that each task contributes with equal weight to the final composite score, which ranges from 0 to 100 (see the same approach in Arcara & Bambini, 2016).

FIGURE 3 ROC curve for GEMS scores in discriminating between healthy participants and patients with Parkinson's disease
GEMS showed high internal consistency, indicating that each task correlates highly with the global score. Furthermore, GEMS correlated strongly with a complete and extensive neuropsychological battery (ENB-2; Mondini et al., 2011) and with a well-known cognitive screening (MoCA; Nasreddine et al., 2005). This highlights the potential of GEMS to measure the underlying construct (good convergent validity). Test-retest reliability (GEMS-A and GEMS-A) was very good, demonstrating score stability across repeated measurements.
A practice effect was also found, but this was reduced using the parallel version GEMS-B. Indeed, the two versions showed high correlation and no practice effect. Thresholds for significant change are also reported, allowing the detection of a significant improvement or decrement over time.
Although parallel forms can certainly reduce practice effect, a significant change approach to detect possible meaningful changes over time is important to monitor cognition (see Aiello et al., 2022 for further considerations on the psychometric properties of cognitive screening).
Younger individuals performed better than older ones and those with higher education and/or higher CR performed better than those with lower education and lower CR; no sex difference was found.
GEMS cut-offs are generated considering not only age and education but also the more comprehensive score of cognitive reserve, which provides a more precise expectation of performance and a better understanding of the possible evolution of the profile. Indeed, we found that cognitive reserve is a stronger predictor of cognitive performance than education.
Comparison with a clinical population showed that GEMS has high sensitivity and high specificity in discriminating healthy individuals from individuals with Parkinson's disease.
In addition to its psychometric properties, this cognitive screening has other strengths.
In the spirit of Open Science, GEMS record forms, instructions, and cut-off scores are freely available on the web under a Creative Commons license, and interested neuropsychologists can use them in different clinical or research settings. Furthermore, access to all the materials will allow authors from different countries to easily translate and adapt GEMS to specific cultures and languages and to proceed with the collection of normative data in different populations.
GEMS is not exempt from limitations. For example, information about inter-rater reliability could not be collected because data were gathered during the recent pandemic restrictions. Furthermore, discriminant validity was measured with Parkinson's patients, but other clinical populations should be considered in future studies.
Despite the above limitations, and although the diagnostic capacity of any screening may not be comparable to a comprehensive test battery (Roebuck-Spencer et al., 2017), GEMS could significantly contribute to and enhance the quality of the neuropsychologist's toolkit.

ACKNOWLEDGMENTS
We thank all participants who took part in this research.

DATA AVAILABILITY STATEMENT
The datasets generated during and/or analyzed during the current study are not publicly available due to information that could compromise participant privacy but are available from the corresponding author on reasonable request.

PEER REVIEW
The peer review history for this article is available at https://publons.