Psychometric properties of measures designed to assess common mental health problems and wellbeing in adults with intellectual disabilities: a systematic review

Background Multiple measures of mental health problems and mental wellbeing for adults with intellectual disabilities (IDs) are available, but investigations into their reliability and validity are still in the early stages. The aim of this systematic review was to provide an update to previous evaluations of measures of common mental health problems and wellbeing in adults with mild to moderate IDs. Methods A systematic search was performed across three databases (MEDLINE, PsycINFO and SCOPUS). The literature search was limited to the years from 2009 to 2021 and to the original English versions. Ten papers evaluating nine measures were reviewed, and the psychometric properties of these measures were discussed using the Characteristics of Assessment Instruments for Psychiatric Disorders in Persons with Intellectual Developmental Disorders as a framework. Results Four measures had at least one rating of 'good' across both dimensions of reliability and at least one dimension of validity and were deemed to have promising psychometric properties: the Clinical Outcomes in Routine Evaluation-Learning Disabilities, Impact of Events Scale-Intellectual Disabilities, Lancaster and Northgate Trauma Scales and Self-Assessment and Intervention (self-report section). Additionally, these measures were developed through consultations with mental health professionals and/or people with IDs and thus were deemed to have good content validity. Conclusions This review informs measurement choice for researchers and clinicians while highlighting a need for continued research efforts into the quality of measures available for people with IDs. The results were limited by incomplete psychometric evaluations of the measures available. A paucity of psychometrically robust measures of mental wellbeing was observed.


Introduction
People with IDs may experience higher rates of mental health problems compared with the general population (Cooper et al. 2007; Dunn et al. 2020), although estimates of prevalence differ by sampling method and population studied, ranging from 10% to 39% (Emerson & Hatton 2007; Pouls et al. 2021). The most common mental health problems, depression and anxiety, have been estimated to affect between 2.2% and 15.8%, and 3.8% and 17.4%, respectively, of this population (Reid et al. 2011; Hsieh et al. 2020).
Although there is a range of both global and symptom-specific measures of mental health problems for adults with intellectual disabilities (IDs) available, investigations into their reliability and validity are still in the early stages (Hatton & Taylor 2013). To date, two systematic reviews of measures of depressive symptoms in people with IDs have been conducted (Perez-Achiaga et al. 2009; Hermans & Evenhuis 2010). The earlier review concluded that the Reiss Screen for Maladaptive Behaviour (RSMB; Reiss 1988) and the Psychiatric Assessment Schedule for Adults with Developmental Disabilities Checklist (PAS-ADD; Moss et al. 1993), two global mental health measures, demonstrated robust psychometric properties. However, Hermans & Evenhuis (2010) disagreed on the utility of the PAS-ADD Checklist and the RSMB for screening for depression, as sensitivity and specificity had not been measured, although they agreed that the psychometric properties of the RSMB were promising. They concluded that the Glasgow Depression Scale for people with a Learning Disability (GDS-LD; Cuthill et al. 2003) was the most promising self-report instrument, while the Assessment of Dual Diagnosis (ADD; Matson & Bamburg 1998), the RSMB and the Children's Depression Inventory (CDI; Kovacs 1985) were promising informant-report measures. However, they noted that none of these informant-report measures had yet been satisfactorily assessed with regard to their psychometric properties when used with this population. Furthermore, Hermans et al. (2011) conducted a systematic review of measures of anxiety for people with IDs. They concluded that the Glasgow Anxiety Scale for people with an Intellectual Disability (GAS-ID; Mindham & Espie 2003) was the most robust self-report instrument, whereas the Anxiety, Depression And Mood Scale (ADAMS; Esbensen et al. 2003) was the most promising informant-report instrument.
In the field of mental health, there is a growing interest in promoting 'positive mental health', or mental wellbeing. Several conceptualisations of mental wellbeing have been debated, although the consensus is that wellbeing encompasses 'feeling well' (hedonia) and 'functioning well' (eudaimonia), as opposed to a mere absence of symptoms of mental illness (Keyes 2002; Deci & Ryan 2008; Stewart-Brown et al. 2015; Cooke et al. 2016).
Evidence suggests that mental wellbeing may protect against psychopathology (Trompetter et al. 2017). Compared with the general population, there is less research pertaining to individuals with IDs in this area (Raczka et al. 2020).
The term 'mental wellbeing' has often been used interchangeably with 'quality of life' (QoL) in the literature (Cooke et al. 2016), although it has been argued that they refer to different theoretical concepts (Skevington & Böhnke 2018), with QoL referring to 'an individual's perception of their position in life in the context of the culture and value systems in which they live and in relation to their goals, expectations, standards and concerns' (WHOQOL Group 1995, p. 1404). Two systematic reviews to date (Townsend-White et al. 2012; Li et al. 2013) have explored the measurement of QoL in adults with IDs, and these reviews included search terms related to 'wellbeing'. In the first review, the authors concluded that the Choice Questionnaire (CQ; Stancliffe & Parmenter 1999) and the Quality of Life Questionnaire (QOLQ; Schalock & Keith 1993) were the most psychometrically robust measures, while the authors of the subsequent review concluded that 6 out of the 24 measures of QoL they evaluated were psychometrically sound, although they did not express a preference for a particular measure. Flynn et al. (2017) recently conducted a systematic review of measures of mental health problems and mental wellbeing in children and adults with severe or profound IDs. The Aberrant Behaviour Checklist (ABC; Aman et al. 1985), the Diagnostic Assessment for the Severely Handicapped Scale-II (DASH-II; Matson 1995) and the Mood, Interest and Pleasure Questionnaire (MIPQ; Ross & Oliver 2002, 2003) were rated as having good methodological quality for use with individuals who had severe to profound IDs. The authors noted that tools measuring mental wellbeing in this population were lacking.
The aim of this paper is to extend the results of the aforementioned systematic reviews and provide an update to previous psychometric evaluations of measures of mental health problems and wellbeing in adults with IDs. This will inform choice for clinicians and researchers interested in assessing mental health problems and mental wellbeing in this population.
As Flynn et al. (2017) recently evaluated measures of mental health problems and mental wellbeing for people with severe or profound IDs, this paper will evaluate measures used for people with mild to moderate IDs. In addition to mental wellbeing, the present review will focus on the measurement of anxiety disorders or depression, described by NICE (2011) as 'common mental health problems', because when combined, they affect more people than other mental health problems.
The review sets out to answer the following questions:
1. Which measures have been used to assess common mental health problems and mental wellbeing in adults with mild to moderate intellectual disabilities?
2. What are the psychometric properties of these measurement tools?

Design
The protocol for the present review was registered with Prospero (https://www.crd.york.ac.uk/prospero/; registration number: CRD42021270069).
The PICO (Population, Intervention, Comparator, Outcome) framework was considered to guide the search strategy. The population was adults (aged 18+ years) with a mild to moderate ID. The intervention was the psychometric evaluation of measures. With regard to the comparator, an evaluative and descriptive tool, the Characteristics of Assessment Instruments for Psychiatric Disorders in Persons with Intellectual Developmental Disorders (CAPs-IDD; Zeilinger et al. 2013), was used to allow comparison between the measures. The outcomes of interest were symptoms of anxiety disorders and depression, and mental wellbeing.

Search strategy
The Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines were used to inform the methodology employed. Electronic searches of the databases MEDLINE, PsycINFO and SCOPUS were conducted on 13 September 2021. The searches were limited to papers published from January 2009 to September 2021, in order to minimise overlap with previous reviews (Hermans & Evenhuis 2010; Hermans et al. 2011; Li et al. 2013), while providing an update to the literature. The full list of search terms used is summarised in Table 1. Search terms were identified based on previous similar reviews (Hermans & Evenhuis 2010; Flynn et al. 2017) and related to four headings, truncated where appropriate and combined using the Boolean terms 'OR' and 'AND'. These were as follows: (1) psychometric properties (e.g. validity, reliability and quality); (2) measurement (e.g. assessment, outcome, screening and questionnaire); (3) common mental health problems and mental wellbeing (e.g. anxiety, depression, mood and quality of life); (4) IDs (e.g. learning disability and intellectual developmental disorder). As the focus of the present review was measures used with adults, an additional term, 'NOT', was used to exclude papers related to children and adolescents. Additional synonyms of these headings were also used, and search terms accounted for both British English and American English spelling.
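The way the four headings are combined can be sketched in a few lines of Python. This is only an illustration: the terms are the examples quoted in the paragraph above, not the full list in Table 1; the truncated exclusion terms (`child*`, `adolescen*`) and the generic query syntax are assumptions, and each database has its own field tags and operators.

```python
# Illustrative sketch: example terms from the text, not the full Table 1 search.
HEADINGS = {
    "psychometric properties": ["validity", "reliability", "quality"],
    "measurement": ["assessment", "outcome", "screening", "questionnaire"],
    "mental health and wellbeing": ["anxiety", "depression", "mood", "quality of life"],
    "intellectual disabilities": ["learning disability", "intellectual developmental disorder"],
}
# Hypothetical truncated terms for the child/adolescent exclusion.
EXCLUDE = ["child*", "adolescen*"]


def or_group(terms):
    """Join synonyms with OR, quoting multi-word phrases."""
    quoted = [f'"{t}"' if " " in t else t for t in terms]
    return "(" + " OR ".join(quoted) + ")"


def build_query(headings, exclude):
    """AND the four heading groups together, then NOT the excluded group."""
    included = " AND ".join(or_group(terms) for terms in headings.values())
    return f"{included} NOT {or_group(exclude)}"


QUERY = build_query(HEADINGS, EXCLUDE)
print(QUERY)
```

Terms within a heading are alternatives (OR), whereas a record must match every heading (AND), which is why broad synonym lists widen the search without loosening the requirement that all four concepts are present.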

Inclusion and exclusion criteria
Papers were included if they met the following criteria:
1. The participants in the study were adults aged 18+ years old. If a study included any participants who were aged 17 years or below, the paper was included if the results for the participants aged above and below 18 years were reported separately.
2. At least 50% of the sample were reported to have a mild to moderate ID, defined using administrative definitions (e.g. use of specialist services for people with ID). This was to ensure that there was a majority of people with mild to moderate IDs in the study sample.
3. [...] (Cooke et al. 2016). Therefore, to ensure that all of the relevant papers were included, studies that referred to measures of either mental wellbeing or QoL were included.
4. The main aim of the study was to evaluate the psychometric properties of a measure.
5. The article was published in a peer-reviewed journal in the English language.
6. The measure was administered in the English language. This was so that the review may inform measurement choice for fellow English-speaking researchers and clinicians.
Papers were excluded if the study sample included individuals aged 17 years or below and the results for these participants were not reported separately. Additionally, measures of health-related QoL were excluded due to their narrow focus on physical health-based constructs, which does not capture broader aspects of QoL or mental wellbeing.

Screening process
See Figure 1 for the PRISMA flow diagram, which summarises the systematic review screening process.
The initial search yielded 3936 papers, which reduced to 2434 following the removal of duplicates. The titles were initially screened by the first author (M.P.), and if these were vague, abstracts were also reviewed. Additionally, citation searches and an inspection of reference lists were undertaken to ensure that no further eligible studies were missed. The full text of potentially eligible articles (n = 121) was reviewed against the inclusion criteria. The third author (K.S.) independently reviewed all 121 articles against the criteria and agreed with the inclusion/exclusion of 120 papers, with one paper requiring further careful discussion with M.P. before its inclusion. Ten eligible articles, which evaluated nine measures, were included in the review.
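The counts in this paragraph imply the following arithmetic, sketched here in Python. The variable names are illustrative, and the title/abstract exclusion figure assumes that no records were added through citation searching, which the text does not state; Figure 1 remains the authoritative breakdown.

```python
# Counts as reported in the screening paragraph.
identified = 3936
after_deduplication = 2434
full_text_reviewed = 121
included = 10
agreed_decisions = 120  # full-text decisions on which both reviewers agreed

duplicates_removed = identified - after_deduplication
# Assumption: no additional records entered via citation searching.
excluded_at_title_abstract = after_deduplication - full_text_reviewed
excluded_at_full_text = full_text_reviewed - included
# Simple percentage agreement between the two reviewers at full-text stage.
percent_agreement = 100 * agreed_decisions / full_text_reviewed

print(duplicates_removed, excluded_at_title_abstract, excluded_at_full_text)
print(f"{percent_agreement:.1f}% agreement")
```

Percentage agreement is the crudest index of reviewer concordance; a chance-corrected statistic would need the full cross-tabulation of include/exclude decisions, which is not reported here.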

Quality appraisal
The CAPs-IDD is a comprehensive framework for evaluating and describing measures of psychiatric disorders in people with IDs. The CAPs-IDD does not produce a total score; however, it was used in the present review to summarise the psychometric properties of the included measures. There are two parts to the CAPs-IDD. Part 1 relates to the conceptual and measurement model of an instrument, describing basic information about the measure, how the measure was developed and measurement characteristics. Part 2 pertains to psychometric properties and summarises information about the validity (criterion, content, construct and face), reliability (internal consistency, reliability and measurement error), objectivity of application, objectivity of interpretation and feasibility of a measure. The review focussed on Part 2 of the CAPs-IDD framework (pertaining to psychometric properties) to discuss the findings of the quality appraisal. The results for Part 1 of the framework are reported in Tables S7-S15 in the supporting information. For the purpose of this review, measurement error was not discussed as no information pertaining to this was identified in the reviewed papers. Furthermore, face validity was not reviewed separately, as it overlapped with content validity in the CAPs-IDD framework, which encompassed both the relevance and comprehensiveness of items in the measures.
In order to determine the psychometric quality of included measures, they were subsequently rated on the four-point scale used by Flynn et al. (2017) (excellent, good, fair or poor). Further information can be found in Table 2. With regard to interpreting results of factor analyses, a root mean square error of approximation of ≤ 0.06 and a comparative fit index of ≥ 0.95 were rated as a good fit (Hu & Bentler 1999). Samples of 100-200 participants are generally considered adequate for examining the reliability and validity of instruments (EFPA 2013), and therefore, samples with fewer than 100 participants were considered 'inadequate'.
The second author (J.Y.L.) independently extracted data from the included papers and rated the quality of the measures. Differences in ratings were discussed until a consensus was reached.
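The two numeric rules in this paragraph can be written as simple checks. The sketch below is an illustration only: the function names are invented, and the fit-index values passed in are hypothetical rather than taken from any reviewed paper.

```python
RMSEA_MAX = 0.06   # root mean square error of approximation cut-off (Hu & Bentler 1999)
CFI_MIN = 0.95     # comparative fit index cut-off (Hu & Bentler 1999)
MIN_SAMPLE = 100   # below this, a sample was considered 'inadequate' (EFPA 2013)


def good_fit(rmsea, cfi):
    """True when both fit indices meet the cut-offs used in this review."""
    return rmsea <= RMSEA_MAX and cfi >= CFI_MIN


def sample_adequate(n):
    """True when the sample has at least 100 participants."""
    return n >= MIN_SAMPLE


# Hypothetical fit indices, for illustration only.
print(good_fit(rmsea=0.05, cfi=0.96))
print(good_fit(rmsea=0.08, cfi=0.96))
print(sample_adequate(98), sample_adequate(114))
```

Both fit indices must meet their cut-off for a model to count as a good fit under this rule; a model that satisfies only one of them does not.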

Results
The screening process yielded 10 papers, which reviewed nine measures. The measures included in the review are listed in Table 3. Five papers presented initial studies of the psychometric properties of a measure (McGillivray et al. 2009; Wigham et al. 2011; Brooks et al. 2013; Chaplin et al. 2013; Raczka et al. 2020), and four papers detailed further analyses of measures that had been previously validated in other studies (Devine et al. 2010; Rojahn et al. 2011; Briscoe et al. 2019; Wigham et al. 2021). Hall et al. (2014) reported both initial psychometric data of a measure and additional data from analyses of another measure. A description of each measure is presented in Table S5.
In summary, five measures included items that pertained to a broader spectrum of disorders or mental health difficulties (ADAMS, ADD, CORE-LD, Mini PAS-ADD and SAINT), two measured PTSD (IES-IDs and LANTS), and two measured QoL (Mini-MANS-LD and PWI-ID). Five of the measures reviewed were self-report (CORE-LD, IES-IDs, Mini-MANS-LD, PWI-ID and SAINT), and three were designed to be used with an informant (ADAMS, ADD and Mini PAS-ADD). The LANTS included both a self-report and an informant-report scale. Three measures were recommended to detect changes over time and/or in response to an intervention (e.g. as a routine outcome measure; CORE-LD, Mini-MANS-LD and PWI-ID).
A summary of the psychometric data is presented in Table 4. The CAPs-IDD tables that provide a comprehensive description of the conceptual and measurement model and psychometric properties of the nine measures are presented in Tables S7-S15.

Internal consistency
Internal consistency is the extent to which items in a questionnaire are correlated and therefore measure the same concept (Terwee et al. 2007). Internal consistency was assessed for eight measures and was generally high. Three measures had excellent total score internal consistencies (ADAMS, ADD and IES-IDs), three measures were rated as having 'good' internal consistency (CORE-LD, LANTS and SAINT) and two were rated as 'fair' (Mini-MANS-LD and PWI-ID). However, subscale internal consistencies across the measures were generally lower, as detailed in Table 4.
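As an illustration of how internal consistency is commonly quantified, the sketch below computes Cronbach's alpha, one widely used index. This is an assumption for illustration: the paragraph above does not state which coefficient each reviewed paper reported, and the item scores are invented.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores.

    alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores)
    """
    k = len(scores[0])                      # number of items
    items = list(zip(*scores))              # transpose: one tuple per item
    item_variances = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - item_variances / total_variance)


# Hypothetical data: five respondents answering a three-item scale.
responses = [
    [3, 4, 3],
    [2, 2, 3],
    [4, 4, 4],
    [1, 2, 1],
    [3, 3, 2],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 2))
```

Alpha rises with the number of items as well as with the strength of the inter-item correlations, which is one reason short subscales often show lower values than the total score.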

Test-retest
Test-retest reliability refers to the degree to which repeated administrations of a measure provide similar responses (Terwee et al. 2007). The time period between administrations is often 1 or 2 weeks, to prevent recall while ensuring that clinical change has not occurred (Terwee et al. 2007). Test-retest reliability was reported for five measures (CORE-LD, IES-IDs, LANTS, SAINT and PWI-ID). The time period between administrations of the measure ranged from 1 to 6 weeks. The coefficients ranged between good (CORE-LD, LANTS informant scale and PWI-ID) and excellent (IES-IDs, LANTS self-report scale and SAINT).
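A test-retest coefficient can be illustrated with a Pearson correlation between total scores at two administrations. This is a minimal sketch with invented scores; the reviewed papers may have used other coefficients (for example, an intraclass correlation), so the choice of statistic here is an assumption.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical total scores for six respondents, two weeks apart.
time_1 = [12, 18, 25, 9, 30, 21]
time_2 = [14, 17, 27, 10, 28, 20]
retest_r = pearson_r(time_1, time_2)
print(round(retest_r, 2))
```

A correlation reflects how well respondents keep their rank order across administrations; it does not detect a uniform shift in scores, which is why the interval between administrations matters.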

Criterion validity
Criterion validity refers to the extent to which scores on an instrument relate to a 'gold standard' measure (Terwee et al. 2007). Hatton & Taylor (2013) noted that an issue with testing validity stringently in this field is the lack of 'gold standard' measures of mental health problems for people with IDs. Clinical opinion is still preferred by many researchers as the gold standard method (Perez-Achiaga et al. 2009). In the present review, criterion validity was evaluated for only one measure (Mini PAS-ADD), by examining the sensitivity and specificity of the measure compared with an assessment by a psychiatrist. Sensitivity was found to be perfect; however, specificity was lower.
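Sensitivity and specificity follow directly from a two-by-two table of screening result against the reference assessment. The counts below are hypothetical and chosen only to mirror the pattern described (perfect sensitivity, lower specificity); they are not the figures from the Mini PAS-ADD study.

```python
def sensitivity(true_positives, false_negatives):
    """Proportion of reference-positive cases that the measure flags."""
    return true_positives / (true_positives + false_negatives)


def specificity(true_negatives, false_positives):
    """Proportion of reference-negative cases that the measure clears."""
    return true_negatives / (true_negatives + false_positives)


# Hypothetical counts against a psychiatrist's assessment.
tp, fn = 12, 0    # every clinically identified case was flagged
tn, fp = 20, 10   # some people without a diagnosis were also flagged

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
print(sens, round(spec, 2))
```

A screening tool with this profile misses no cases but generates false positives, so a positive result would need confirmation by clinical assessment.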

Content validity
Content validity refers to the extent to which concepts are represented by the items in the measure (Terwee et al. 2007). Content validity is deemed to be good if a clear description of the concept being measured and item selection is provided, in addition to the target population and experts being involved in the measure development process (Terwee et al. 2007). In the present review, six measures addressed at least one aspect of content validity (CORE-LD, IES-IDs, LANTS, Mini-MANS-LD, Mini PAS-ADD and SAINT). The items in four of these (CORE-LD, IES-IDs, Mini-MANS-LD and Mini PAS-ADD) were derived from established measures of mental health or wellbeing. Four measures were reported to be developed through consultation with both people with IDs and mental health experts (CORE-LD, LANTS, Mini-MANS-LD and SAINT). Mental health professionals were consulted during the development of the IES-IDs.

Construct validity
Construct validity is the extent to which an instrument measures the distinct construct it is intending to measure (Markus & Lin 2010).
Convergent validity, a subset of construct validity, refers to the extent to which measures of theoretically related constructs converge, therefore suggesting that they capture a common construct (Carlson & Herdman 2012). Construct validity also refers to the structural validity of a measure and whether different dimensions within the measure correlate.
Convergent validity was examined in all measures except for the Mini PAS-ADD. The CORE-LD and Mini-MANS-LD were correlated with only one other measure, which may limit assessment of convergent validity, while the other measures were correlated with more than one measure. There was a broad range in the strength of significant correlations with other measures, and five measures had a minimum rating of 'good' (ADAMS, ADD, CORE-LD, Mini-MANS-LD and SAINT). Six measures demonstrated excellent convergent validity with at least one other measure (ADAMS, ADD, IES-IDs, LANTS and SAINT). It appeared that the relationship between self-report and informant-report measures was poorer, as the correlation between the IES-IDs and the GDS informant scale was not significant, while the correlations between the IES-IDs and the LANTS informant subscales ranged from poor to fair. The correlation between the LANTS informant subscales and PAS-ADD ranged from poor to good. Furthermore, the convergence between the LANTS self-report and informant-report subscales was also poor, as the magnitude of the correlation with the behavioural changes subscale was low, while the correlations with the frequency and severity subscales were not significant.
There were insufficient investigations into the factorial structures of included measures. A factor analysis was attempted for four of the measures (ADAMS, ADD, LANTS self-report scale and PWI-ID). This was rated as 'poor' for the ADAMS, while the model would not converge for the ADD. The anticipated factor structures were confirmed for the PWI-ID and LANTS, although seven items were removed from the analysis of the LANTS due to particularly high skewness and/or kurtosis. There are different recommendations in the literature for the number of participants required for factor analytic techniques; for example, Guadagnoli & Velicer (1988) suggested n = 100-200, whereas a minimum of 1:5 item:case ratio was recommended by Floyd & Widaman (1995). With regard to the adequacy of sample sizes in the included studies that examined factor structure, this was considered acceptable for the ADAMS, ADD (both n = 263) and PWI-ID (n = 114). The sample size for the LANTS self-report study was slightly below the recommendation (n = 98).
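The 1:5 item:case recommendation can be expressed as a short check. The sample sizes below are those reported in the paragraph above, but the item count of 30 is a hypothetical placeholder, since the number of items analysed for each measure is not given here.

```python
def cases_required(n_items, cases_per_item=5):
    """Minimum sample size under a 1:5 item:case ratio."""
    return n_items * cases_per_item


def meets_ratio(n_cases, n_items, cases_per_item=5):
    """True when the sample provides at least five cases per item."""
    return n_cases >= cases_required(n_items, cases_per_item)


# Hypothetical 30-item scale checked against sample sizes from the text.
HYPOTHETICAL_ITEMS = 30
for n_cases in (263, 114, 98):
    print(n_cases, meets_ratio(n_cases, HYPOTHETICAL_ITEMS))
```

Because the ratio rule scales with questionnaire length, a sample that is adequate under an absolute criterion such as n = 100-200 can still fall short for a long instrument, and the reverse holds for very short scales.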

Objectivity of application, interpretation, norming and fairness
With regard to the objectivity of application and interpretation, some instructions for the administration of the CORE-LD and IES-IDs were reported in the published papers, although instructions for coding were not. A short manual was available for administration and coding for the Mini-MANS-LD. A more comprehensive manual is available for the Mini PAS-ADD and PWI-ID. Guidelines for administration and coding were not found through a web search for the remaining measures. With regard to the SAINT, the authors reported that it was not intended to be coded as it has not been designed as a diagnostic tool, and its purpose was to help individuals with IDs to recognise symptoms of mental distress.
In the included studies, little to no information was reported with regard to normative or comparative data from the general population. McGillivray et al. (2009) compared PWI-ID ratings with PWI ratings from general population samples and found that the total scores did not differ significantly. For included measures that were adapted from measures designed for the general population (for example, the CORE-LD and IES-IDs), normative data may be found in the published papers assessing the psychometric properties of the general population measures, for example the Clinical Outcomes in Routine Evaluation-Outcome Measure (CORE-OM; Evans et al. 2002) and the Impact of Events Scale-Revised (IES-R; Weiss & Marmar 1997).
In terms of representativeness and generalisability, convenience samples were used in all of the included studies, which may have limited fairness concerning culture, gender and age. Data on ethnicity and age were not reported consistently in the studies. Further information may be found in the 'Sample characteristics' section of this review.

Feasibility
No information on the percentage of missing values was reported in the included papers. No information regarding the ease of administration, burden of completing the measure or acceptability was reported for the ADAMS or ADD. Briscoe et al. (2019) reported that participants indicated that they found completing the CORE-LD easier than the CORE-OM. In terms of the IES-IDs, professionals in an Adult Community Learning Disability Team were consulted to modify the language of the IES-R in order to ensure acceptability. An interviewer script was developed to increase the ease of administration so that the IES-IDs could be administered as a semi-structured interview. The LANTS was developed via consultation with a clinical sample, carers, advocates and clinicians, to ensure acceptability and inclusiveness. The authors reported that administration took between 10 and 20 min to complete. The Mini-MANS-LD was also developed following consultation with a group of experts by experience. Accessibility was enhanced by using pictures and colour-coded faces as prompts. The authors reported that it was rated by participants as 'easy to use' and acceptable to people with IDs and that administration took on average less than 12 min to complete. Information on acceptability was not reported for the Mini PAS-ADD; however, the authors reported that interviewers were provided with training on how to administer the measure and that finding an appropriate time to administer the tool was identified as a difficulty by informants. The PWI-ID included a pre-testing protocol to enhance ease of administration, by identifying the level of scale complexity that respondents were able to use. Participants were given the choice of an 11-, 5-, 3- and 2-point scale. Administration took on average between 10 and 20 min. Finally, the developers of the SAINT consulted with professional experts and service user experts to enhance the acceptability and feasibility of the measure.

Discussion
The psychometric properties of nine measures were examined across the 10 papers considered. Although internal consistency was examined in eight of the nine measures, test-retest reliability was only assessed for five measures. Furthermore, criterion validity was only assessed for one measure. Six measures addressed at least one aspect of content validity. Convergent validity was examined for eight measures, although two measures were correlated with only one other measure. Factor analyses were attempted for four of the measures. Only three studies were considered to have an adequate sample size of more than 100 participants.
Based on the results of the present review, the CORE-LD, IES-IDs, LANTS and SAINT were deemed to have promising psychometric properties, as these measures had at least one rating of 'good' across both dimensions of reliability and at least one dimension of validity. Additionally, these measures were developed through consultation with mental health professionals and/or people with IDs; thus, they were deemed to have good content validity. Although Hermans & Evenhuis (2010) previously suggested that the ADD was a promising informant-report measure, the present review indicates a lack of evidence on the quality of this measure for the mild to moderate adult ID population.
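The decision rule for a 'promising' measure can be sketched as follows. This is one reading of the rule, assumed for illustration: both reliability dimensions (internal consistency and test-retest) must be rated at least 'good', and at least one validity dimension must be rated at least 'good'. The ratings are paraphrased from the Results text; `None` marks a property that was not reported, and an empty list marks validity ratings that the text does not state.

```python
ORDER = {"poor": 0, "fair": 1, "good": 2, "excellent": 3}


def at_least_good(rating):
    """True for 'good' or 'excellent'; unreported ratings (None) fail."""
    return rating is not None and ORDER[rating] >= ORDER["good"]


def is_promising(internal_consistency, test_retest, validity_ratings):
    """Assumed reading of the rule: both reliability ratings and at least
    one validity rating must reach 'good'."""
    reliability_ok = at_least_good(internal_consistency) and at_least_good(test_retest)
    validity_ok = any(at_least_good(r) for r in validity_ratings)
    return reliability_ok and validity_ok


# Ratings paraphrased from the Results text (illustrative, not Table 4).
measures = {
    "CORE-LD": ("good", "good", ["good"]),
    "SAINT": ("good", "excellent", ["good"]),
    "ADAMS": ("excellent", None, ["good"]),   # test-retest not reported
    "PWI-ID": ("fair", "good", []),           # validity rating not stated in the text
}
promising = [name for name, ratings in measures.items() if is_promising(*ratings)]
print(promising)
```

Under this reading, a measure with strong internal consistency still fails the rule if test-retest reliability was never assessed, which reflects gaps in the evidence rather than a demonstrated weakness of the measure.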
Several limitations and strengths of the studies that validated these four measures should be noted. With regard to the CORE-LD, Brooks et al. (2013) noted that the sample size meant that it was not possible to establish a cutoff score or to investigate whether the measure was more appropriate for some groups of people with ID and not others. Briscoe et al. (2019) reported that the strength of the correlation coefficient between the CORE-LD and CORE-OM was lower compared with the correlation between other related measures, for example the GAS-ID and the Beck Anxiety Inventory (Beck et al. 1988; Mindham & Espie 2003) or the GDS-LD and the Beck Depression Inventory (Beck et al. 1996; Cuthill et al. 2003). A strength of the CORE-LD was the emphasis on inclusivity and collaboration in the development of the measure. Brooks et al. (2013) reported receiving feedback from the individuals with ID who were involved in developing the measure, such as 'I have enjoyed every minute of this research' and 'I felt valued' (p. 328). This is particularly important given the barriers to participation in research that individuals with IDs face (Lennox et al. 2005).
In terms of the IES-IDs, Hall et al. (2014) reported that a limitation of their study was the small sample size, which meant that the factor structure of the measure could not be examined. However, the authors reported that a study strength was that the sample had experienced at least one traumatic event in their lives and were at risk of experiencing PTSD. This allowed an investigation into whether there was a relationship between trauma frequency and symptomatology as measured by the IES-IDs, so that convergent validity could be assessed. The authors also contrasted the IES-IDs with the LANTS and noted that conceptually, the IES-IDs specifically assessed PTSD symptomatology in response to a specific trauma, whereas the LANTS assessed more general trauma-related psychopathology, in addition to symptoms of anxiety and depression, which are comorbid with PTSD.
Regarding the LANTS, Wigham et al. (2011, 2021) considered the inclusion of participants from both inpatient and community settings to be a study strength, as this suggests that the LANTS may be utilised in both settings. Furthermore, the LANTS was developed via consultation with individuals with IDs, carers and clinicians, which supports its content validity. However, the sample recruited was 99% White British, which may limit the applicability for other ethnic groups. Wigham et al. (2011) also highlighted that the self-report version of the LANTS was only significantly correlated with one of the informant LANTS subscales and that the strength of this correlation was low. They posited that this may be because the two scales measured different aspects of trauma; the self-report version measured internal states whereas the informant scale measured observable behaviours. They suggested that construct validity was not compromised as both scales correlated significantly with the number of adverse life events experienced. This limitation is unlikely to be specific to this particular measure, and it has been suggested that the degree of convergence between self-report and informant-report scales may not reflect validity (Stancliffe 1995). This may be because informants cannot directly access the subjective experiences of individuals with IDs (Hartley & MacLean 2006). Although informant scales offer valuable information, it has been suggested that a mental health assessment of an individual with IDs should also include self-report questionnaires, which may add unique information about affective and cognitive symptoms that may not be apparent to caregivers (Mileviciute & Hartley 2015). A combination of these assessment approaches would enable clinicians to gain a more holistic understanding of an individual's presentation.
Chaplin et al. (2013) reported that a limitation of the SAINT was that test-retest reliability was assessed on a small proportion (37%) of the participants (n = 20) and that retest data were collected on the same day. A strength of the study was that the convergent validity was examined using the GDS-LD and GAS-ID, which were reported in previous systematic reviews to have promising psychometric properties. Furthermore, the SAINT was developed through consultation with experts and service users. Chaplin et al. (2013) reported that an advantage of the SAINT was that it measured psychological distress more generally, rather than specific symptoms of depression and anxiety, which may present similarly in people with IDs. It is also unique, as the SAINT is not only a measure of distress but also covers specific coping strategies to reduce distress.
When considering the generalisability of the findings in the present review, it is important to note that the majority of participants were recruited from clinical or inpatient settings. For example, Briscoe et al. (2019) commented that their sample comprised forensic inpatients with IDs and other comorbidities. These samples may be unrepresentative of individuals with IDs in the general population, as it is likely that these individuals experienced a higher level of distress compared with a community sample.

Limitations of the present review
Only studies that administered measures in the English language were included, and therefore, 33 articles that assessed the quality of measures administered in other languages were excluded. The decision was made to only include measures that were validated in the English language, to inform measurement choice for fellow English-speaking researchers and clinicians. When selecting measures for people with IDs in English-speaking countries, one may choose to translate measures that were validated in other languages into English. However, it is recognised that cross-cultural adaptation of measures may be problematic for several reasons, such as the two languages having non-equivalent words, or items having very different meanings based on the specific cultural context (Epstein et al. 2015). Future researchers may wish to complete a review including reports on measures administered in languages other than English, to inform clinicians and researchers interested in selecting an appropriate measure in another language.
A further limitation was that the quality appraisal was somewhat limited by the lack of comprehensive evaluations of the psychometric properties of the included measures. There was very little information available on criterion and structural validity. Furthermore, little to no normative data from the general population were reported and there was a lack of information on the time taken to complete the measures or how they were scored. It was therefore difficult to make comparisons between measures as not all aspects of reliability and validity were assessed for each measure. This highlights the need for continued research efforts into the quality of measures available for people with IDs.
Additionally, the agreement between reviewers on the quality of measures, which may have provided support for the validity of the quality assessments, was not calculated.
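As an illustration of how such inter-reviewer agreement could be quantified, the following is a minimal sketch of Cohen's kappa, kappa = (p_o - p_e) / (1 - p_e), for two reviewers assigning categorical quality ratings. The function name `cohens_kappa`, the rating labels and the example ratings are hypothetical and are not data from this review.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who rated the same items categorically."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must rate the same non-empty set of items.")
    n = len(rater_a)
    # Observed agreement: proportion of items given the same rating.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of each rater's marginal proportions,
    # summed over all rating categories.
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical category throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical ratings of six psychometric properties by two reviewers.
reviewer_1 = ["good", "good", "adequate", "poor", "good", "adequate"]
reviewer_2 = ["good", "adequate", "adequate", "poor", "good", "good"]
print(round(cohens_kappa(reviewer_1, reviewer_2), 3))
```

Kappa corrects raw percentage agreement for the agreement expected by chance, which is why it is often preferred when only a few rating categories are in use.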

Implications for research and practice
Researchers and clinicians may use the findings of this review to make informed decisions when choosing a mental health or wellbeing measure for adults with mild to moderate IDs. Mileviciute & Hartley (2015) reported that self-report questionnaires may capture internalised experiences of people with IDs, which may not be apparent to carers. Therefore, although informant questionnaires offer valuable information, assessments should also include self-reported information. The CORE-LD, IES-IDs, LANTS and SAINT all include self-report scales. The CORE-LD may be used by clinicians and researchers interested in measuring the wellbeing, psychosocial functioning and emotional difficulties experienced by adults with mild to moderate IDs. The SAINT self-report section forms part of a guided self-help tool for people with IDs and may be used to assist people with IDs in recognising and reporting symptoms that indicate mental distress. Finally, the IES-IDs and LANTS may be used to screen for symptoms of PTSD. The LANTS also assesses for comorbid symptoms of anxiety and depression and, additionally, allows informants to provide information based on their observations of individuals with IDs.
Although two measures of QoL were identified in the present review, a lack of psychometrically sound measures of mental wellbeing, encompassing dimensions of hedonia and eudaimonia, for adults with mild to moderate IDs, was observed. The CORE-LD only included one positively worded item, 'Have you felt happy with the things you have done?', and so this tool may not be sufficient for those interested in measuring positive aspects of mental wellbeing. The results from this review therefore have implications for research as they highlight a need to develop and validate measures of positive mental health, or mental wellbeing, in adults with mild to moderate IDs.
The present review also has implications for researchers who may be interested in adapting or evaluating measures for the ID population, as it has highlighted the impact of missing information on limiting psychometric research. Future researchers may wish to use the CAPs-IDD framework to guide their reporting and ensure that information pertaining to the development, administration, interpretation and feasibility of the measure is provided, in addition to the psychometric data. For example, many studies did not report how long a measure took to administer, which would be an important factor to consider when selecting a measure for this population. Additionally, providing a comprehensive description of the study sample, including a breakdown of the severity of ID, age range and ethnicity of participants, will enable readers to consider whether a measure may be suitable for a particular individual. Based on the findings of the present review, although internal consistency and convergent validity were often reported, fewer studies assessed test-retest reliability, criterion validity, content validity and structural validity. It is therefore recommended that future researchers endeavour to assess these dimensions of reliability and validity.
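For readers less familiar with how internal consistency is typically computed, the following is a minimal sketch of Cronbach's alpha, alpha = k / (k - 1) * (1 - sum of item variances / variance of total scores), where k is the number of items. The function name `cronbach_alpha` and the example scores are hypothetical; they are not taken from any of the reviewed studies, and this is not presented as the procedure those studies used.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    if len(scores) < 2:
        raise ValueError("At least two respondents are required.")
    k = len(scores[0])
    if k < 2 or any(len(row) != k for row in scores):
        raise ValueError("Each respondent needs the same number (>= 2) of items.")
    # Sample variance of each item across respondents.
    item_variances = [variance([row[i] for row in scores]) for i in range(k)]
    # Sample variance of each respondent's total score.
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("Total scores have zero variance; alpha is undefined.")
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses: five respondents, four items scored 0 to 3.
responses = [
    [0, 1, 0, 1],
    [1, 1, 2, 1],
    [2, 2, 2, 3],
    [3, 2, 3, 3],
    [1, 0, 1, 1],
]
print(round(cronbach_alpha(responses), 3))
```

Alpha rises when items covary strongly relative to their individual variances, so it is sensitive to the number of items as well as to their homogeneity. This is one reason why reporting it alongside test-retest and validity evidence, as recommended above, gives a fuller picture.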

Conclusions
This review evaluated the psychometric properties of nine measures of common mental health problems and mental wellbeing in adults with mild to moderate IDs, administered in English. Four of these (CORE-LD, IES-IDs, LANTS and SAINT) were deemed to have promising psychometric properties. The results were limited by incomplete psychometric evaluations of measures, which made it difficult to compare measures. A paucity of psychometrically robust measures of mental wellbeing was observed. This review informs measurement choice for researchers and clinicians while highlighting a need for continued research efforts into the quality of measures available for people with IDs.

Figure 1
PRISMA flow diagram of study selection for review

Table 2
Criteria used to interpret the results of psychometric evaluations

A summary of the sample characteristics is presented in Table S6. Most of the included studies were conducted in the UK (n = 7; 77.8%), in addition to one study that was conducted in Australia and another in the USA. Sample sizes ranged from 33 to 324. The percentage of male participants in each sample ranged from 40.7% to 85.9%. Seven studies (77.8%) reported the mean age of the participants, and this ranged between 33.0 and 45.6 years. The age range for participants was not reported in all of the papers, although they specified that only adults aged 18+ years were recruited. Participants were recruited from a range of sources, including clinical services for people with IDs, residential services and day centres.

Table 3
List of measures included in the present review.
© 2023 The Authors. Journal of Intellectual Disability Research published by MENCAP and International Association of the Scientific Study of Intellectual and Developmental Disabilities and John Wiley & Sons Ltd.

Table 4
Data on the reliability and validity of measures included in the present review

Table 4 (Continued)
Abbreviations: CFI, comparative fit index; ICC, intraclass correlation coefficient; ND, no data; RMSEA, root mean square error of approximation. Measures: BLESID, Bangor Life Events Schedule for Intellectual Disabilities (Hulbert-Williams et al. 2011); BSI, Brief Symptom Inventory (Derogatis 1993); CORE-OM, Clinical Outcomes in Routine Evaluation-Outcome Measure (Evans et al. 2002); GAS, Glasgow Anxiety Scale; GDS, Glasgow Depression Scale; IES, Impact of Events Scale (Horowitz et al.