This protocol is based on ‘Neuropsychological tests for the diagnosis of Alzheimer’s disease dementia and other dementias: a generic protocol for cross-sectional and delayed-verification studies’ (Davis 2013).
Target condition being diagnosed
Dementia is a progressive syndrome of global cognitive impairment. In the UK, it affects 5% of the population over 65 years of age and at least 25% of those over 85 years of age (Alzheimer's Society 2007). Worldwide, 36 million people were estimated to be living with dementia in 2010 (Wimo 2010), and this number is projected to increase to more than 115 million by 2050, with the greatest increases in prevalence predicted to occur in the developing regions. Dementia encompasses a group of disorders characterised by progressive loss of both cognitive function and ability to perform activities of daily living that can be accompanied by neuropsychiatric symptoms and challenging behaviours of varying type and severity. The underlying pathology is usually degenerative, and subtypes of dementia include Alzheimer’s disease dementia (ADD), vascular dementia, dementia with Lewy bodies, and frontotemporal dementia.
The target conditions in this review will be dementia (all-cause) and any dementia subtype (e.g. Alzheimer, vascular, Lewy body dementia).
The index test is the Montreal Cognitive Assessment (MoCA) (Nasreddine 2005) for all-cause dementia and its subtypes.
The MoCA takes 10 minutes to administer (Ismail 2010). It assesses short-term memory, visuospatial function, executive function, attention, concentration and working memory, language, and orientation. The MoCA is regarded as an alternative to the Mini-Mental State Examination (MMSE) (Folstein 1975) since the latter is now copyrighted and there is a charge for its use. It is also considered to offer more detailed testing of executive function than the MMSE.
The MoCA is scored out of 30 points. The raw score is adjusted by educational attainment (1 extra point for 10 to 12 years of formal education; 2 points added for 4 to 9 years of formal education). A score of 26 or above is considered normal. A score below 26 indicates possible dementia. Any application of a different threshold will be noted in the analyses.
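As an illustration of the scoring rule just described, the following Python sketch applies the education adjustment and the default threshold of 26. The function names are hypothetical. Two points are assumptions not stated in this protocol: that the adjusted score is capped at 30, and that no adjustment applies outside the two education bands given above.

```python
def adjusted_moca_score(raw_score: int, years_of_education: int) -> int:
    """Apply the education adjustment described in this protocol.

    Assumptions (not stated in the protocol): the adjusted score is capped
    at 30, and no points are added outside the 4-9 and 10-12 year bands.
    """
    if not 0 <= raw_score <= 30:
        raise ValueError("raw MoCA score must be between 0 and 30")
    if 4 <= years_of_education <= 9:
        bonus = 2
    elif 10 <= years_of_education <= 12:
        bonus = 1
    else:
        bonus = 0
    return min(raw_score + bonus, 30)


def screens_positive(adjusted_score: int, threshold: int = 26) -> bool:
    """A score below the threshold indicates possible dementia."""
    return adjusted_score < threshold


if __name__ == "__main__":
    score = adjusted_moca_score(raw_score=24, years_of_education=11)
    print(score, screens_positive(score))
```

Passing a different `threshold` corresponds to the case where a study applies a cut-off other than 26.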
Three versions of the MoCA exist in English to minimise practice effects. Multiple translations are also available, as is a version for visually impaired persons (Wittich 2010). The test is available online (www.mocatest.org).
Dementia usually develops over several years. Individuals, or their relatives, may notice subtle impairments of recent memory. Gradually, more cognitive domains become involved, and difficulty in planning complex tasks becomes increasingly apparent. Figure 1 gives an overview of a range of pathways through which individuals may present.
|Figure 1. Diagnostic pathways in dementia|
It is vital to appreciate that the pathway to dementia diagnosis influences the diagnostic test accuracy of the MoCA. This protocol will therefore separate the DTA analyses by population tested. Presenting the findings of the review by population emphasises that the utility of the test differs across different patient settings, and guidance is needed to decide where and how the test would best be used.
The MoCA was developed in 2005, and evidence for its utility in different settings is still emerging. Most diagnostic accuracy studies are likely to have been conducted in specialist clinics, and evidence is required for its use in hospital inpatient settings, in general outpatient settings, in primary care, and as a population screening tool. This protocol therefore stratifies all analyses based on the following four populations:
- Those with no memory complaints (population screening);
- Those presenting to primary care practitioners with subjective memory problems that have not been previously assessed;
- Those referred to a secondary care clinic for the specialist assessment of memory difficulties;
- Those tested during acute admission to a general hospital.
The severity (stage) of dementia at diagnosis will influence the utility of the MoCA. In later stages of the disorder, scores will be more specific for dementia, and MoCA may be used to aid diagnosis. In the earlier stages, lower scores will be more sensitive to dementia and MoCA may be used as a screening test.
In the UK, people usually first present to their general practitioner (Figure 1). One or more brief cognitive tests (including the index test) may be administered, and might result in a referral to a memory clinic for specialist diagnosis (Boustani 2003; Cordell 2013; Alzheimer's Society 2013).
However, many people with dementia may not present until later in the disorder and may follow a different pathway to diagnosis, e.g. referral to a community mental health team for individuals with complex problems otherwise unable to attend a memory clinic. Others may be identified during an assessment for an unrelated physical illness, e.g. during an outpatient appointment or an inpatient hospital admission.
In general, the role of non-specialist community services in dementia diagnosis is to recognise possible dementia and to refer on to appropriate care providers, though this may vary geographically (Greening 2009; Greaves 2010). Some community settings have a higher prevalence of dementia than others. For example, the pretest probability of prevalent dementia among residents in care homes is much higher than in the general population (Matthews 2002; Plassman 2007). This has led some to suggest that a cognitive assessment is made routinely for every person resident or admitted to a care home (Alzheimer's Society 2013). Through such an active case-finding strategy, a dementia diagnosis might be made outside the usual pathway.
Diagnostic assessment pathways vary across different countries, and diagnoses may be made by a variety of healthcare professionals including general practitioners, neurologists, psychiatrists, and geriatricians; thus we shall describe the target populations rather than the exact setting in order to facilitate generalisability of the results.
How might the index test improve diagnoses, treatments and patient outcomes?
The MoCA test may help identify people requiring specialist assessment and treatment for dementia. Some symptomatic treatments and cognitive-behavioural interventions are available (Birks 2006; McShane 2006; Bahar-Fuchs 2013). Furthermore, dementia diagnosed at an early stage can help patients, their families and potential carers access appropriate services and make timely plans for the future. Improved diagnostic accuracy might also reduce false positive diagnoses, which carry risk of significant costs (in the form of further unnecessary investigations or treatment) and harm (from side effects of investigations or treatment, or anxiety).
Outcomes for people with dementia in secondary care general hospital settings, including survival, length of stay and discharge to institutional care, are poor (RCPsych 2005; Sampson 2009; Zekry 2009). Accurate diagnosis may have specific benefits in addressing these adverse outcomes, in addition to facilitating access to the most suitable care and the use of non-pharmacological methods to manage behavioural and psychological symptoms of dementia.
The general rationale for the accurate diagnosis of dementia is detailed in the generic protocol (Davis 2013). Since the publication of the generic protocol, wide-ranging changes in policy with respect to case-finding and/or screening for dementia have resulted in an urgent need to evaluate the utility of the commonly-used cognitive tests for dementia.
Although dementia screening itself is not recommended by the United States Preventive Services Task Force (Boustani 2003) or the UK National Screening Committee, there appears to be a drift towards the opportunistic testing of older primary care attenders who have presented for reasons other than a memory complaint (Brunet 2012). The UK government has already incentivised screening for dementia on acute admission to secondary care services, and has proposed incentivising the identification of dementia in people in primary care settings (Dementia CQUIN 2012). In the USA, the Patient Protection and Affordable Care Act (2010) added an Annual Wellness Visit, which includes a mandatory assessment of cognitive impairment (Cordell 2013). With such strong policy drivers in play, the diagnostic test accuracy of the MoCA must be established, both in people who have presented with a memory problem and in those who are tested as part of a general ‘check-up’.
- To determine the diagnostic accuracy of the Montreal Cognitive Assessment (MoCA) at various thresholds for dementia and its subtypes, against a concurrently applied reference standard.
- To highlight the quality and quantity of research evidence available about the accuracy of the index test in the target population;
- To investigate the heterogeneity of test accuracy in the included studies;
- To identify gaps in the evidence and where further research is required.
Criteria for considering studies for this review
Types of studies
Inclusion criteria for studies in this review are based on the generic protocol (Davis 2013). We will consider studies that use cross-sectional study designs administering the index test to all study participants who are concurrently assessed for diagnosis either by an expert or by a trained researcher using a standardised diagnostic interview (or when application of the reference standard occurs within three months). These study designs include classic case-control studies, nested case-control studies and cohort studies. If no relevant nested case-control or cohort studies are found with these designs (that is, if only classic case-control studies are available), then we will report our findings without a meta-analysis to avoid a biased estimate, acknowledging that case-control studies provide the lowest quality evidence.
We will include all participants who meet criteria for inclusion (or registration) in memory clinic, general hospital, primary care and community populations, and are thus considered representative of the four target populations.
We will exclude studies carried out in selected populations, e.g. studies that include only people with Parkinson's disease or only post-stroke patients, or studies investigating early-onset dementia. We will also exclude studies of participants with a secondary cause for cognitive impairment, e.g. current or past alcohol or drug abuse, central nervous system trauma (e.g. subdural haematoma), tumour or infection.
Any form of the Montreal Cognitive Assessment (details in Background).
Dementia, and any dementia subtype.
We will include studies that have used a reference standard for all-cause dementia or any standardised definition of subtype as set out in the generic protocol. As detailed in the generic protocol (Davis 2013), a number of clinical reference standards exist for dementia (e.g. International Classification of Diseases (ICD), Diagnostic and Statistical Manual of the American Psychiatric Association (DSM)) and its subtypes (National Institute of Neurological and Communicative Disorders and Stroke and the Alzheimer's Disease and Related Disorders Association (NINCDS-ADRDA), National Institute of Neurological Disorders and Stroke and Association Internationale pour la Recherche et l'Enseignement en Neurosciences (NINDS-AIREN), McKeith, Lund criteria). This is a potential source of heterogeneity, which may be explored quantitatively if sufficient studies are identified (see below ‘Investigations of heterogeneity’). Specifically, studies applying criteria using biomarkers to support a diagnostic classification (National Institute on Aging-Alzheimer's Association (NIA-AA) criteria, DSM-5) must be examined separately.
Neuropathological diagnoses can only be made in the context of delayed-verification studies and therefore will not form part of this protocol.
Search methods for identification of studies
We will search MEDLINE (OvidSP), EMBASE (OvidSP), BIOSIS previews (Web of Knowledge), Science Citation Index (ISI Web of Knowledge), PsycINFO (OvidSP) and LILACS (Bireme). See Appendix 1 for a proposed draft strategy to be run in MEDLINE (OvidSP). We will design similarly structured search strategies using search terms appropriate for each database. We will use controlled vocabulary such as MeSH terms and EMTREE where appropriate. There will be no attempt to restrict studies based on sampling frame or setting in the searches developed. We will not use search filters (collections of terms aimed at reducing the number needed to screen) as an overall limiter because those published have not proved sensitive enough (Whiting 2011b). We will not apply any language restriction to the electronic searches, and will use translation services as necessary.
A single researcher with extensive experience of systematic review will conduct the initial searches.
Searching other resources
We will check the reference lists of all relevant papers for additional studies. We will also search:
- MEDION database (Meta-analyses van Diagnostisch Onderzoek: www.york.ac.uk/inst/crd/crddatabases.html);
- DARE (Database of Abstracts of Reviews of Effects: www.york.ac.uk/inst/crd/crddatabases.html);
- HTA Database (Health Technology Assessments Database, via The Cochrane Library);
- ARIF database (Aggressive Research Intelligence Facility: www.arif.bham.ac.uk).
We will use relevant studies in PubMed to search for additional studies using the 'related articles' feature. We will examine key studies in citation databases such as Science Citation Index and Scopus to ascertain any further relevant studies. We will identify some grey literature through Web of Science Conference Proceedings. We will aim to access theses or PhD abstracts from institutions known to be involved in prospective dementia studies. We will also attempt to contact researchers involved in studies with possibly relevant but unpublished data. We will not perform handsearching as there is little published evidence of the benefits of handsearching for reports of DTA studies at present (Glanville 2012).
Data collection and analysis
Selection of studies
Figure 2 shows a flowchart for inclusion of studies in the review.
|Figure 2. Flow diagram for inclusion of articles in review|
Inclusion criteria are:
- Participants as described above;
- Reference standard as described above;
- MoCA is used as an index test (may be as one of many tests).
Exclusion criteria are:
- Participants from selected populations (e.g. post-head injury, stroke patients, people with Parkinson's disease only);
- Studies in which diagnostic accuracy cannot be calculated because the index test and reference standard are not applied to both cases and controls.
Data extraction and management
We will extract data on study characteristics to a study-specific pro forma and will include data for the assessment of quality and for the investigation of heterogeneity (details are given in Appendix 2). We will pilot the pro forma against 10 primary diagnostic studies.
Two review authors will extract data. The results will be dichotomised if necessary and cross-tabulated in two-by-two tables of index test result (positive or negative) against target disorder (positive or negative), and we will transfer the results directly into Review Manager 5 (RevMan) tables.
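The dichotomisation and cross-tabulation step can be sketched as follows. This is a minimal Python illustration, not the review's extraction procedure: the record format (an adjusted MoCA score paired with the reference standard result) and the names `TwoByTwo` and `cross_tabulate` are hypothetical, and the example data are invented purely to show the mechanics.

```python
from typing import Iterable, NamedTuple, Tuple


class TwoByTwo(NamedTuple):
    tp: int  # index test positive, target disorder present
    fp: int  # index test positive, target disorder absent
    fn: int  # index test negative, target disorder present
    tn: int  # index test negative, target disorder absent


def cross_tabulate(records: Iterable[Tuple[int, bool]], threshold: int = 26) -> TwoByTwo:
    """Dichotomise scores at `threshold` and cross-tabulate against the reference standard.

    Each record is (adjusted MoCA score, dementia present per reference standard).
    A score below the threshold counts as a positive index test.
    """
    tp = fp = fn = tn = 0
    for score, has_dementia in records:
        test_positive = score < threshold
        if test_positive and has_dementia:
            tp += 1
        elif test_positive and not has_dementia:
            fp += 1
        elif not test_positive and has_dementia:
            fn += 1
        else:
            tn += 1
    return TwoByTwo(tp, fp, fn, tn)


if __name__ == "__main__":
    # Invented example data, for illustration only.
    example = [(20, True), (27, True), (25, False), (28, False), (22, True)]
    print(cross_tabulate(example))
```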
Assessment of methodological quality
We will assess the methodological quality of each study using the QUADAS-2 tool (Whiting 2011a) as recommended by The Cochrane Collaboration.
Operational definitions describing the use of QUADAS-2 for MoCA are detailed in Appendix 3.
Statistical analysis and data synthesis
The target condition comprises two categories: (1) All-cause dementia, and (2) Dementia subtype. Studies may detail one or both outcomes.
For all included studies we will use the data in the two-by-two tables (showing the binary test results cross-classified with the binary reference standard) to calculate the sensitivities and specificities, with their 95% confidence intervals. We will present individual study results graphically by plotting estimates of sensitivities and specificities in both a forest plot and in receiver operating characteristic (ROC) space. We will consider these findings in the light of the previous systematic assessment (using QUADAS-2) of the methodological quality of individual studies. We will use RevMan software for these descriptive analyses, and to produce summary ROC curves. If more than one threshold is reported in an individual study, then we will present the graphical findings for all thresholds reported. However, we will avoid the study data being included in the calculation of a summary statistic on more than one occasion (in the same setting), by using only the threshold which is considered to be 'standard practice' for the target population in question. We will not pool studies across settings. If there is no agreed standard practice for the index test and target population in question then we will use the optimal threshold (the threshold nearest to the upper left corner of the ROC curve) in the calculation of the summary ROC curve in RevMan and for any subsequent meta-analysis; we recognise that this may lead to an overestimate of diagnostic accuracy (Leeflang 2008).
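The descriptive calculations in this paragraph can be illustrated with a short Python sketch. RevMan will be used for the actual analyses; the code below only shows the arithmetic. The protocol does not state which confidence interval method is used, so the Wilson score interval here is an assumption, as are the function names and the example numbers.

```python
import math
from typing import Dict, Tuple


def sensitivity_specificity(tp: int, fp: int, fn: int, tn: int) -> Tuple[float, float]:
    """Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


def wilson_interval(successes: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Approximate 95% Wilson score interval for a proportion (assumed method)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z * z / total
    centre = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return centre - half, centre + half


def nearest_to_top_left(results: Dict[int, Tuple[float, float]]) -> int:
    """Return the threshold whose (sensitivity, specificity) pair lies closest
    to the upper left corner of ROC space, i.e. the point where
    1 - specificity = 0 and sensitivity = 1."""
    return min(
        results,
        key=lambda t: math.hypot(1 - results[t][1], 1 - results[t][0]),
    )


if __name__ == "__main__":
    # Invented counts, for illustration only.
    sens, spec = sensitivity_specificity(tp=90, fp=20, fn=10, tn=80)
    print(sens, wilson_interval(90, 100))
    print(spec, wilson_interval(80, 100))
    print(nearest_to_top_left({24: (0.80, 0.90), 25: (0.88, 0.87), 26: (0.90, 0.80)}))
```

Selecting the threshold in this data-driven way is what may lead to the overestimate of accuracy noted above (Leeflang 2008).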
We will perform meta-analysis on pairs of sensitivity and specificity if it is appropriate to pool the data. Once the relevant studies have been identified, it will be clear if the majority of the studies report results with consistent thresholds. If so, a bivariate random-effects model based on pairs of sensitivity and specificity may be appropriate (Reitsma 2005). This approach enables us to calculate summary estimates of sensitivity and specificity, while correctly dealing with the different sources of variation: (i) imprecision by which sensitivity and specificity have been measured within each study; (ii) variation beyond chance in sensitivity and specificity between studies; (iii) any correlation that might exist between sensitivity and specificity. Categorised covariates can be incorporated in the bivariate model to examine the effect of potential sources of bias and variation across subgroups of studies as outlined in the Cochrane DTA Handbook Chapter 10. Because of the bivariate nature of the model, effects on sensitivity and specificity can be modelled separately. The results of the bivariate model can be processed to calculate likelihood ratios. If appropriate, we will use these to calculate post-test probabilities for different pretest probabilities derived from observed population prevalence estimates.
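The final step, converting summary sensitivity and specificity into likelihood ratios and post-test probabilities, follows standard formulae and can be sketched in Python. The summary estimates used in the example (sensitivity 0.90, specificity 0.80) are invented for illustration and are not results of this review; the pretest probabilities of 5% and 25% echo the UK prevalence figures quoted in the Background. Function names are hypothetical.

```python
from typing import Tuple


def likelihood_ratios(sensitivity: float, specificity: float) -> Tuple[float, float]:
    """LR+ = sens / (1 - spec); LR- = (1 - sens) / spec."""
    if not (0 < specificity < 1):
        raise ValueError("specificity must lie strictly between 0 and 1")
    lr_positive = sensitivity / (1 - specificity)
    lr_negative = (1 - sensitivity) / specificity
    return lr_positive, lr_negative


def post_test_probability(pretest_probability: float, likelihood_ratio: float) -> float:
    """Convert pretest probability to odds, multiply by the LR, convert back."""
    if not (0 <= pretest_probability < 1):
        raise ValueError("pretest probability must be in [0, 1)")
    pretest_odds = pretest_probability / (1 - pretest_probability)
    post_test_odds = pretest_odds * likelihood_ratio
    return post_test_odds / (1 + post_test_odds)


if __name__ == "__main__":
    # Hypothetical summary estimates, for illustration only.
    lr_pos, lr_neg = likelihood_ratios(sensitivity=0.90, specificity=0.80)
    for pretest in (0.05, 0.25):
        print(pretest, post_test_probability(pretest, lr_pos), post_test_probability(pretest, lr_neg))
```

The same positive result carries a very different post-test probability at the two pretest probabilities, which is one reason the analyses are separated by population.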
If different thresholds are reported, we will use hierarchical summary ROC models (Cochrane DTA Handbook Chapter 10). We will assess model fit by using likelihood ratio tests.
We will use Stata software, version 12.1 (StataCorp, Texas), to carry out the additional analyses using either the bivariate or HSROC approaches.
Investigations of heterogeneity
There are many potential sources of heterogeneity for this review (see Davis 2013 for a more extensive account). We intend to investigate heterogeneity due to the MoCA threshold score used to diagnose possible all-cause dementia or its subtypes, the reference standard used and the severity of the target disorder. It may be that heterogeneity due to disease severity is addressed principally in the QUADAS assessments of spectrum bias. Though likely to be important, training required for test administration may not be well reported or easily operationalised.
Differences in test utility are expected a priori in the four identified target populations described above and will be presented separately.
We will investigate potential sources of heterogeneity in the meta-analyses of high quality studies (cross-sectional cohort or nested case-control studies) for each of the target populations. It is likely that there will be only a handful of studies that are sufficiently robust to be included in the meta-analyses, which will allow only one or two sources of heterogeneity to be explored (due to insufficient data). Potential sources of heterogeneity include the following:
In addition to the four identified target populations (for which all analyses will be separated), differences may occur in studies performed in different countries where the background levels of education and prevalence of disease may affect the performance of the MoCA test. Furthermore, we will also examine for effects of differences in age and sex distribution in the target populations.
Another source of heterogeneity will be differences in the use and application of the index test. We will also analyse use of different language versions as a source of variation. The characteristics of the examiner may also result in differences in test application and outcome, such as whether there was training prior to the test.
We will also examine differences in thresholds used for inclusion in the study; some studies may use higher or lower thresholds for inclusion. We will address heterogeneity due to disease severity principally in the QUADAS assessments of spectrum bias.
We will only consider studies rated as high quality design (not classic case-control) by the QUADAS-2 tool for meta-analysis. We will explicitly consider incorporation bias in studies where examiners had prior knowledge of the results of the index test (not ‘blinded’) and may be more likely to diagnose the participant with or without dementia.
We will perform sensitivity analysis to determine the effect of excluding studies that are deemed to be at high risk of bias according to the QUADAS-2 checklist. Additionally, we will perform sensitivity analyses to determine the effect of excluding studies that were flagged as possibly being less appropriate for inclusion (when disagreement between authors could not be resolved). Primary analysis will include all studies; sensitivity analysis will exclude studies of low quality (high likelihood of bias) to determine whether the results are influenced by inclusion of lower-quality studies.
Assessment of reporting bias
Quantitative methods for exploring reporting bias are not well established for studies of DTA. Specifically, funnel plots of the diagnostic odds ratio (DOR) versus the standard error of this estimate will not be considered.
Appendix 1. Search strategy (Medline OvidSP)
Search narrative: this is a single concept search using only the index test. This was felt to be the simplest and most sensitive approach.
1. "montreal cognitive assessment*".mp.
3. 1 or 2
In addition to the above single concept search based on the Index test, the Cochrane Dementia and Cognitive Improvement Group run a more complex, multi-concept search each month primarily for the identification of diagnostic test accuracy studies of neuropsychological tests. Where possible the full texts of the studies identified are obtained. This approach is expected to help identify those papers where the index test of interest (in this case MoCA) is used and the paper contains usable data but where MoCA was not alluded to in the report's citation.
The strategy used is below:
The MEDLINE search uses the following concepts:
A Specific neuropsychological tests
B General terms (both free text and MeSH) for tests/testing/screening
C Outcome: dementia diagnosis (unfocused MeSH with diagnostic sub-headings)
D Condition of interest: Dementia (general dementia terms both free text and MeSH – exploded and unfocused)
E Methodological filter: NOT used to limit all search
1. (A OR B) AND C
2. (A OR B) AND D AND E
3. A AND E
= 1 OR 2 OR 3
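The way the five concepts are combined can be illustrated with Python sets, where each set stands for the records retrieved by one concept. This is only a sketch of the Boolean logic above; the record identifiers are invented and the function name is hypothetical.

```python
from typing import Set


def combine_concepts(a: Set[int], b: Set[int], c: Set[int], d: Set[int], e: Set[int]) -> Set[int]:
    """Combine concept result sets as: 1 OR 2 OR 3, where
    1 = (A OR B) AND C, 2 = (A OR B) AND D AND E, 3 = A AND E."""
    line1 = (a | b) & c
    line2 = (a | b) & d & e
    line3 = a & e
    return line1 | line2 | line3


if __name__ == "__main__":
    # Invented record identifiers, for illustration only.
    print(combine_concepts(a={1, 2}, b={3}, c={2, 3, 9}, d={1, 3, 4}, e={1, 5}))
```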
Setting is not included as a concept in the MEDLINE search because these terms are generally not indexed well or consistently. This means that the search has been kept deliberately sensitive by not restricting it to a particular setting.
The search strategy
1. "word recall".ti,ab.
2. ("7-minute screen" OR "seven-minute screen").ti,ab.
3. ("6 item cognitive impairment test" OR "six-item cognitive impairment test").ti,ab.
4. "6 CIT".ti,ab.
5. "AB cognitive screen".ti,ab.
6. "abbreviated mental test".ti,ab.
9. "inform* interview".ti,ab.
10. "animal fluency test".ti,ab.
11. "brief alzheimer* screen".ti,ab.
12. "brief cognitive scale".ti,ab.
13. "clinical dementia rating scale".ti,ab.
14. "clinical dementia test".ti,ab.
15. "community screening interview for dementia".ti,ab.
16. "cognitive abilities screening instrument".ti,ab.
17. "cognitive assessment screening test".ti,ab.
18. "cognitive capacity screening examination".ti,ab.
19. "clock drawing test".ti,ab.
20. "deterioration cognitive observee".ti,ab.
21. ("Dem Tect" OR DemTect).ti,ab.
22. "object memory evaluation".ti,ab.
24. "mattis dementia rating scale".ti,ab.
25. "memory impairment screen".ti,ab.
26. "minnesota cognitive acuity screen".ti,ab.
28. "mini-mental state exam*".ti,ab.
30. "modified mini-mental state exam".ti,ab.
32. "neurobehavio?ral cognitive status exam*".ti,ab.
34. "quick cognitive screening test".ti,ab.
36. "rapid dementia screening test".ti,ab.
38. "repeatable battery for the assessment of neuropsychological status".ti,ab.
40. "rowland universal dementia assessment scale".ti,ab.
42. "self-administered gerocognitive exam*".ti,ab.
43. ("self-administered" and "SAGE").ti,ab.
44. "self-administered computerized screening test for dementia".ti,ab.
45. "short and sweet screening instrument".ti,ab.
47. "short cognitive performance test".ti,ab.
48. "syndrome kurztest".ti,ab.
49. ("six item screener" OR "6-item screener").ti,ab.
50. "short memory questionnaire".ti,ab.
51. ("short memory questionnaire" and "SMQ").ti,ab.
52. "short orientation memory concentration test".ti,ab.
54. "short blessed test".ti,ab.
55. "short portable mental status questionnaire".ti,ab.
57. "short test of mental status".ti,ab.
58. "telephone interview of cognitive status modified".ti,ab.
60. "trail making test".ti,ab.
61. "verbal fluency categories".ti,ab.
62. "WORLD test".ti,ab.
63. "general practitioner assessment of cognition".ti,ab.
65. "Hopkins verbal learning test".ti,ab.
67. "time and change test".ti,ab.
68. "modified world test".ti,ab.
69. "symptoms of dementia screener".ti,ab.
70. "dementia questionnaire".ti,ab.
72. ("concord informant dementia scale" or CIDS).ti,ab.
73. (SAPH or "dementia screening and perceived harm*").ti,ab.
75. exp Dementia/
76. Delirium, Dementia, Amnestic, Cognitive Disorders/
80. ("lewy bod*" or DLB or LBD or FTD or FTLD or "frontotemporal lobar degeneration" or "frontaltemporal dement*").ti,ab.
81. "cognit* impair*".ti,ab.
82. (cognit* adj4 (disorder* or declin* or fail* or function* or degenerat* or deteriorat*)).ti,ab.
83. (memory adj3 (complain* or declin* or function* or disorder*)).ti,ab.
85. exp "sensitivity and specificity"/
86. "reproducibility of results"/
87. (predict* adj3 (dement* or AD or alzheimer*)).ti,ab.
88. (identif* adj3 (dement* or AD or alzheimer*)).ti,ab.
89. (discriminat* adj3 (dement* or AD or alzheimer*)).ti,ab.
90. (distinguish* adj3 (dement* or AD or alzheimer*)).ti,ab.
91. (differenti* adj3 (dement* or AD or alzheimer*)).ti,ab.
96. (ROC or "receiver operat*").ab.
97. Area under curve/
98. ("Area under curve" or AUC).ab.
99. (detect* adj3 (dement* or AD or alzheimer*)).ti,ab.
102. (likelihood adj3 (ratio* or function*)).ab.
103. (conver* adj3 (dement* or AD or alzheimer*)).ti,ab.
104. ((true or false) adj3 (positive* or negative*)).ab.
105. ((positive* or negative* or false or true) adj3 rate*).ti,ab.
107. exp dementia/di
108. Cognition Disorders/di [Diagnosis]
109. Memory Disorders/di
111. *Neuropsychological Tests/
113. Geriatric Assessment/mt
114. *Geriatric Assessment/
115. Neuropsychological Tests/mt, st
116. "neuropsychological test*".ti,ab.
117. (neuropsychological adj (assess* or evaluat* or test*)).ti,ab.
118. (neuropsychological adj (assess* or evaluat* or test* or exam* or battery)).ti,ab.
119. Self report/
120. self-assessment/ or diagnostic self evaluation/
121. Mass Screening/
122. early diagnosis/
124. 74 or 123
125. 110 and 124
126. 74 or 123
127. 84 and 106 and 126
128. 74 and 106
129. 125 or 127 or 128
130. exp Animals/ not Humans.sh.
131. 129 not 130
Appendix 2. Information for extraction to pro forma
Bibliographic details of primary paper:
- Author, title of study, year and journal
Details of index test:
- Method of [index test] administration, including who administered and interpreted the test, and their training
- Thresholds used to define positive and negative tests
- Reference standard used
- Method of [reference standard] administration, including who administered the test and their training
- Number of participants
- Other characteristics e.g. ApoE status
- Settings: i) community; ii) primary care; iii) secondary care outpatients; iv) secondary care inpatients and residential care
- Participant recruitment
- Sampling procedures
- Time between index test and reference standard
- Proportion of people with dementia in sample
- Subtype and stage of dementia if available
- MCI definition used (if applicable)
- Duration of follow-up in delayed verification studies
- Attrition and missing data
Appendix 3. Assessment of methodological quality QUADAS-2
Appendix 4. Anchoring statements for quality assessment of MoCA diagnostic studies
We provide some core anchoring statements for quality assessment of diagnostic test accuracy reviews of the MoCA in dementia. These statements are designed for use with the QUADAS-2 tool and were derived during a two-day, multidisciplinary focus group in 2010. If a QUADAS-2 signalling question for a specific domain is answered 'yes' then the risk of bias can be judged to be 'low'. If a question is answered 'no' this indicates a potential risk of bias. The focus group was tasked with judging the extent of the bias for each domain. During this process it became clear that certain issues were key to assessing quality, whilst others were important to record but less important for assessing overall quality. To assist, we describe a 'weighting' system. Where an item is weighted 'high risk' then that section of the QUADAS-2 results table is judged to have a high potential for bias if a signalling question is answered 'no'. For example, in dementia diagnostic test accuracy studies, ensuring that clinicians performing dementia assessment are blinded to results of index test is fundamental. If this blinding was not present then the item on reference standard should be scored 'high risk of bias', regardless of the other contributory elements. Where an item is weighted 'low risk' then it is judged to have a low potential for bias if a signalling question for that section of the QUADAS-2 results table is answered 'no'. Overall bias will be judged on whether other signalling questions (with a high risk of bias) for the same domain are also answered 'no'.
In assessing individual items, the score of unclear should only be given if there is genuine uncertainty. In these situations review authors will contact the relevant study teams for additional information.
Anchoring statements to assist with assessment for risk of bias
Domain 1: Participant selection
Risk of bias: could the selection of participants have introduced bias? (high/low/unclear)
Was a consecutive or random sample of participants enrolled?
Where sampling is used, the methods least likely to cause bias are consecutive sampling or random sampling, which should be stated and/or described. Non-random sampling or sampling based on volunteers is more likely to be at high risk of bias.
Weighting: High risk of bias
Was a case-control design avoided?
Case-control study designs have a high risk of bias, but sometimes they are the only studies available, especially if the index test is expensive and/or invasive. Nested case-control designs (systematically selected from a defined population cohort) are less prone to bias but they will still narrow the spectrum of participants that receive the index test. Study designs (both cohort and case-control) that may also increase bias are those designs where the study team deliberately increase or decrease the proportion of participants with the target condition, for example a population study may be enriched with extra dementia participants from a secondary care setting.
Weighting: High risk of bias
Did the study avoid inappropriate exclusions?
The study will be automatically graded as unclear if exclusions are not detailed (pending contact with study authors). Where exclusions are detailed, the study will be graded as 'low risk' if exclusions are felt to be appropriate by the review authors. Exclusions common to many studies of dementia include: medical instability; terminal disease; alcohol/substance misuse; concomitant psychiatric diagnosis; other neurodegenerative condition. However, if 'difficult to diagnose' groups are excluded this may introduce bias, so exclusion criteria must be justified. For a community sample we would expect relatively few exclusions. Post hoc exclusions will be labelled 'high risk' of bias.
Weighting: High risk of bias
Applicability: are there concerns that the included patients do not match the review question? (high/low/unclear)
The included patients should match the intended population as described in the review question. If not already specified in the review inclusion criteria, setting will be particularly important – the review authors should consider the population in terms of symptoms, pre-testing, and potential disease prevalence. Studies that use very selected subjects or subgroups will be classified as having low applicability (that is, high concern), unless they are intended to represent a defined target population, for example, people with memory problems referred to a specialist and investigated by lumbar puncture.
Domain 2: Index Test
Risk of bias: could the conduct or interpretation of the index test have introduced bias? (high/low/unclear)
Were the index test results interpreted without knowledge of the reference standard?
Terms such as 'blinded' or 'independently and without knowledge of' are sufficient, and full details of the blinding procedure are not required. This item may be scored as 'low risk' if explicitly described or if there is a clear temporal pattern to the order of testing that precludes the need for formal blinding, i.e. all [neuropsychological test] assessments were performed before the dementia assessment. As most neuropsychological tests are administered by a third party, knowledge of dementia diagnosis may influence their ratings; tests that are self-administered, for example using a computerised version, may have less risk of bias.
Weighting: High risk
Were the index test thresholds prespecified?
For neuropsychological scales there is usually a threshold beyond which participants are classified as 'test positive' (for the MoCA, a score below the threshold); this may be referred to as the threshold, clinical cut-off or dichotomisation point. Different thresholds are used in different populations. A study is classified at higher risk of bias if the authors define the optimal cut-off post hoc based on their own study data. Certain papers may use an alternative methodology for analysis that does not use thresholds and these papers should be classified as not applicable.
Weighting: Low risk
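As an illustration of a prespecified threshold, the following minimal Python sketch dichotomises an education-adjusted MoCA score using the conventional cut-off described in this protocol (a score below 26 indicates possible dementia). The function name is hypothetical, and the range check assumes the adjusted score stays within the 0 to 30 scale.

```python
def is_test_positive(adjusted_score: int, threshold: int = 26) -> bool:
    """Dichotomise an education-adjusted MoCA score.

    Scores below the threshold count as 'test positive'; the default of 26
    is the cut-off stated in this protocol. Any other threshold should be
    fixed before the data are analysed, not chosen post hoc.
    """
    # Assumption: the adjusted score remains on the 0 to 30 scale.
    if not 0 <= adjusted_score <= 30:
        raise ValueError("MoCA scores range from 0 to 30")
    return adjusted_score < threshold


# A score of 25 is test positive at the default cut-off; 26 is not.
print(is_test_positive(25), is_test_positive(26))
```

Passing the threshold as an explicit argument makes it easy to record which cut-off a study used when it differs from 26.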
Were sufficient data on [neuropsychological test] application given for the test to be repeated in an independent study?
Particular points of interest include method of administration (for example self-completed questionnaire versus direct questioning interview); nature of informant; language of assessment. If a novel form of the index test is used, for example a translated questionnaire, details of the scale should be included and a reference given to an appropriate descriptive text, and there should be evidence of validation.
Weighting: Low risk
Applicability: are there concerns that the index test, its conduct, or interpretation differ from the review question? (high/low/unclear)
Variations in the length, structure, language and/or administration of the index test may all affect applicability if they vary from those specified in the review question.
Domain 3: Reference Standard
Risk of bias: could the reference standard, its conduct, or its interpretation have introduced bias? (high/low/unclear)
Is the reference standard likely to correctly classify the target condition?
Commonly-used international criteria to assist with clinical diagnosis of dementia include those detailed in DSM-IV and ICD-10. Criteria specific to dementia subtypes include but are not limited to NINCDS-ADRDA criteria for Alzheimer’s dementia; McKeith criteria for Lewy Body dementia; Lund criteria for frontotemporal dementias; and the NINDS-AIREN criteria for vascular dementia. Where the criteria used for assessment are not familiar to the review authors and the Cochrane Dementia and Cognitive Improvement Group, this item should be classified as 'high risk of bias'.
Weighting: High risk
Were the reference standard results interpreted without knowledge of the results of the index test?
Terms such as 'blinded' or 'independent' are sufficient, and full details of the blinding procedure are not required. This may be scored as 'low risk' if explicitly described or if there is a clear temporal pattern to the order of testing, i.e. all dementia assessments were performed before [neuropsychological test] testing.
Informant rating scales and direct cognitive tests present certain problems. It is accepted that informant interview and cognitive testing are usual components of clinical assessment for dementia; however, specific use of the scale under review in the clinical dementia assessment should be scored as high risk of bias.
Weighting: High risk
Was sufficient information on the method of dementia assessment given for the assessment to be repeated in an independent study?
Particular points of interest for dementia assessment include the training/expertise of the assessor; and whether additional information was available to inform the diagnosis (e.g. neuroimaging; other neuropsychological test results), and whether this was available for all participants.
Weighting: Variable risk, but high risk if method of dementia assessment not described.
Applicability: are there concerns that the target condition as defined by the reference standard does not match the review question? (high/low/unclear)
Some methods of dementia assessment, although valid, may diagnose a far smaller or larger proportion of participants with disease than in usual clinical practice. In this instance the item should be rated as having poor applicability.
Domain 4: Patient flow and timing (n.b. refer to, or construct, a flow diagram)
Risk of bias: could the patient flow have introduced bias? (high/low/unclear)
Was there an appropriate interval between the index test and reference standard?
For a cross-sectional study design, there is potential for the subject to change between assessments; however, dementia is a slowly progressive disease that is not reversible. The ideal scenario would be a same-day assessment, but longer periods of time (for example, several weeks or months) are unlikely to lead to a high risk of bias. For delayed-verification studies the index and reference tests are necessarily separated in time given the nature of the condition.
Weighting: Low risk
Did all subjects receive the same reference standard?
There may be scenarios where those who score 'test positive' on the index test have a more detailed assessment for the target condition. Where dementia assessment (or reference standard) differs between participants this should be classified as high risk of bias.
Weighting: High risk
Were all participants included in the final analysis?
Attrition will vary with study design. Delayed-verification studies will have higher attrition than cross-sectional studies due to mortality, and it is likely to be greater in participants with the target condition. Drop-outs (and missing data) should be accounted for. Attrition that is higher than expected (compared to other similar studies) should be treated as at high risk of bias. We have defined a cut-off of greater than 20% attrition as being high risk, but this will be highly dependent on the length of follow-up in individual studies.
Weighting: High risk
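The attrition cut-off can be checked with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, and attrition is assumed to be the proportion of enrolled participants who are missing from the final analysis. As noted above, the 20% default should be reconsidered for studies with long follow-up.

```python
def attrition_proportion(enrolled: int, analysed: int) -> float:
    """Proportion of enrolled participants missing from the final analysis."""
    if enrolled <= 0 or not 0 <= analysed <= enrolled:
        raise ValueError("need enrolled > 0 and 0 <= analysed <= enrolled")
    return (enrolled - analysed) / enrolled


def attrition_risk(enrolled: int, analysed: int, cutoff: float = 0.20) -> str:
    """Return 'high' if attrition is strictly greater than the cut-off.

    'low' here refers only to this signalling question, not to the
    overall judgement for the patient flow domain.
    """
    return "high" if attrition_proportion(enrolled, analysed) > cutoff else "low"


# Hypothetical study: 200 enrolled, 150 in the final analysis (25% attrition).
print(attrition_risk(200, 150))
```

Because the cut-off is defined as greater than 20%, a study with exactly 20% attrition is not flagged by this check.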
Declarations of interest