The most common cause of dementia in the general population is Alzheimer’s disease (AD). It is useful to distinguish between the term Alzheimer's disease, which refers to the underlying pathology, and Alzheimer's disease dementia (ADD), which is the final stage of the clinical syndrome associated with that pathology.
Alzheimer’s disease dementia afflicts 5% of men and 6% of women over the age of 60 worldwide (WHO 2010). Its prevalence increases exponentially with age: Alzheimer’s disease dementia affects fewer than 1% of people aged 60 to 64 years, but 24% to 33% of those over 85 (Ferri 2005). The earliest symptoms of Alzheimer's disease dementia include short-term memory loss, a gradual decline in other cognitive abilities and behavioural changes. Cortical intracellular neurofibrillary tangles (NFT) and extracellular β-amyloid (Aβ) plaques (Braak 1991) represent the neuropathological features of Alzheimer's disease dementia and are responsible for synapse dysfunction, neuronal cell loss and consequent brain atrophy (Ballard 2011). According to the National Institute of Neurological and Communicative Disorders and Stroke and the Alzheimer's Disease and Related Disorders Association (NINCDS-ADRDA) criteria, definite Alzheimer's disease dementia can only be diagnosed following neuropathological examination of brain tissue, obtained by biopsy or autopsy.
Mild cognitive impairment (MCI) represents a possible intermediary condition between normal cognition and dementia (Morris 2001; Petersen 2009). Currently, 16 different classifications are used to define MCI (Matthews 2008). The different definitions of MCI are based on general criteria that include a cognitive complaint (self-reported and/or informant), preserved basic activities of daily living, cognitive impairment (not normal for age and education) or decline in cognition evidenced by performance on objective cognitive tasks, and absence of dementia (Petersen 2004; Winblad 2004). In this protocol MCI refers to the clinical criteria defined by Petersen and Winblad (Petersen 1999; Petersen 2001; Petersen 2004; Winblad 2004) or the Clinical Dementia Rating (CDR = 0.5) scale (Morris 1993) or any of the 16 descriptions of MCI reported by Matthews (Matthews 2008).
There are four outcomes for those within an MCI population: progression to Alzheimer's disease dementia, progression to another dementia, maintaining stable MCI or recovery. Early identification of those subjects who would convert from MCI to Alzheimer’s disease dementia and other forms of dementia may improve the opportunities for early intervention and might help their carers to plan for the future. However, current data in the medical literature are still not adequate to guide clinicians and researchers in understanding the progression of dementia. There is no clinical method to predict the possible conversion of subjects with MCI to Alzheimer's disease dementia or other dementias. Studies (Bruscoli 2004; Mattsson 2009; Petersen 1999; Petersen 2009) indicate that an annual average of 10% to 15% of MCI patients progress to Alzheimer's disease dementia. This rate depends on clinical profile, settings and investigation for vascular disease.
Thus, the improvement of diagnostic accuracy is critical for management and treatment of Alzheimer’s disease dementia and other dementias. Research suggests that measurable change in positron emission tomography (PET), magnetic resonance imaging (MRI) and cerebrospinal fluid (CSF) biomarkers occurs years in advance of the onset of clinical symptoms (Beckett 2010).
This protocol focuses on the relation between the
Target condition being diagnosed
The primary target condition is Alzheimer's disease dementia. The diagnosis is based on the exclusion of other causes of dementia through clinical, paraclinical and neuropsychological investigations criteria as indicated in the NINCDS-ADRDA guidelines (McKhann 1984). Exclusion of other diseases such as depression, hypothyroidism, and non-AD brain lesions is a fundamental part of the diagnostic process (McKhann 1984). A standard diagnostic practice is based on clinical examinations and neurological and mental status examination of the patient. Moreover, the standard diagnostic practice includes caregiver or family member interviews, focusing on progressive cognitive impairments and behavioural changes associated with the disease.
The secondary target condition is any other form of dementia, including all-cause dementia (APA 1987; APA 1994), vascular dementia (Román 1993), dementia with Lewy bodies (McKeith 2006) and frontotemporal dementia (Lund Manchester 1994; Neary 1998).
PET represents a unique diagnostic nuclear medicine modality of well-documented accuracy. It assesses pathophysiologic and chemical processes by using radiopharmaceuticals that mimic endogenous molecules. Depending on the distribution of the radiotracer in the human body, images are produced and diagnostic information acquired. Kinetic information may also be available.
The FDG-PET pattern for MCI is not so consistent, which is unsurprising, due to the variable physical history of the disorder. However, MCI patients usually present on PET with mild global and regional hypometabolism (Mosconi 2009). FDG-PET studies have found characteristic and progressive cerebral metabolic rate for glucose (CMRgl) reductions in posterior cingulate, precuneus, parietal, temporal and frontal regions in both ADD and MCI patients, with the findings being more pronounced in MCI patients who eventually converted to ADD (Chen 2010; Morbelli 2010; Patterson 2010). Moreover, a growing body of FDG-PET studies have been carried out specifically in order to evaluate the correlation between glucose metabolism impairment and the progression from MCI to ADD and other dementias. These studies suggest that certain findings on brain PET scans can potentially predict the decline of MCI to ADD. In agreement with this, a recent meta-analysis pointed out that MCI-converter patients, in comparison with subjects who did not convert to ADD, showed hypometabolism/hypoperfusion in the parietal lobe (Schroeter 2009).
The development and use in recent years of new software tools for image analysis have facilitated many brain FDG-PET studies. These software applications have enabled the quantification of brain PET scans, allowing objective evaluation and thus increasing the physicians’ interpretative confidence. Although subjective (visual) interpretation of the brain scan is usually the standard in clinical practice, the addition of quantitative information is crucial in such studies since it improves the diagnostic accuracy (Patterson 2010).
Dementia develops over a trajectory of several years. There is a presumed period when people are asymptomatic, and when pathology is accumulating. Individuals or their relatives may then notice subtle impairments of recent memory. Gradually, more cognitive domains become involved, and difficulty planning complex tasks becomes increasingly apparent. In the UK, people usually present to their general practitioner, who may administer some neuropsychological tests, and will potentially refer them to a hospital memory clinic. However many people with dementia do not present until much later in the disorder and will follow a different pathway to diagnosis, for example being identified during an admission to general hospital for a physical illness. Thus the pathway influences the accuracy of the diagnostic test. The accuracy of the test will vary with the experience of the administrator and the accuracy of the subsequent diagnosis will vary with the history of referrals to the particular healthcare setting. Diagnostic assessment pathways may vary in other countries and diagnoses may be made by a variety of specialists including neurologists and geriatricians.
We will not include alternative tests in this review because there are currently no standard practice tests available for the diagnosis of dementia.
The Cochrane Dementia and Cognitive Improvement Group (CDCIG) is in the process of conducting a series of diagnostic test accuracy reviews of biomarkers and scales (see list below). Although we are conducting reviews on individual tests compared to a reference standard, we plan to compare our results in an overview.
- 11C-PIB-PET (Positron emission tomography Pittsburgh Compound-B)
- CSF (Cerebrospinal fluid analysis of abeta and tau)
- sMRI (structural magnetic resonance imaging)
- Neuropsychological tests (MMSE; MiniCOG; MoCA)
- Informant interviews (IQCODE; AD8)
- APOE e4
- FP-CIT SPECT (Fluoropropyl-Carbomethoxy-Iodophenyl-Tropane Single-photon emission computed tomography)
According to the latest revised NINCDS-ADRDA diagnostic criteria for Alzheimer's disease dementia of the National Institute on Aging and Alzheimer's Association (Albert 2011; Dubois 2010; McKhann 2011; Sperling 2011), the confidence in diagnosing MCI due to Alzheimer's disease dementia is raised with the application of biomarkers based on imaging or CSF measures. These tests, added to core clinical criteria, might increase the sensitivity or specificity of a testing strategy. However, it is crucial that each of these biomarkers is assessed for its diagnostic accuracy before it is adopted as a routine add-on test in clinical practice.
- To determine the diagnostic accuracy of the 18F-FDG-PET index test for detecting people with MCI at baseline who would clinically convert to Alzheimer’s disease or other forms of dementia at follow-up.
- To investigate heterogeneity of test accuracy in the included studies.
We expect that heterogeneity will be likely and that it will be an important component of the review. The potential sources of heterogeneity, which will be used as a framework for the investigation of heterogeneity, include target population, index test, target disorder and study quality and are detailed in the analysis section.
Criteria for considering studies for this review
Types of studies
We will consider longitudinal cohort studies in which index test results are obtained at baseline and the reference standard results at follow-up (see below for detail about the nature of the index test and reference standard). These studies necessarily employ delayed verification of conversion to dementia and are sometimes labelled as ‘delayed verification cross-sectional studies’ (Bossuyt 2004; Bossuyt 2008; Knottnerus 2002).
We will include case-control studies if they incorporate a delayed verification design. We believe this can only occur in the context of a cohort study, so these studies are invariably diagnostic nested case-control studies.
Participants recruited and clinically classified as those with mild cognitive impairment (MCI) at baseline will be eligible for this review. Studies using the Petersen or revised Petersen criteria (Petersen 1999; Petersen 2004; Winblad 2004) or the Clinical Dementia Rating (CDR = 0.5) scale (Morris 1993) or any of the 16 different classifications of MCI described by Matthews 2008 will be included. The diagnostic criteria for MCI are presented in the Additional tables (Table 1 and Table 2).
We will exclude those studies that involved people with MCI possibly caused by: i) current or past alcohol/drug abuse; ii) Central Nervous System trauma (e.g. subdural haematoma), tumour or infection; iii) other neurological conditions, e.g. Parkinson’s or Huntington’s diseases.
There are currently no generally accepted standards for FDG positivity threshold, and therefore we will use the criteria which were applied in each included primary study to classify participants as either
A range of thresholds have been used in primary research, for instance: i) 'the regional cerebral glucose metabolism ratio (rCGM-r) is lower than 80% of whole brain mean of control subjects' (Chételat 2003); ii) 'the rCGM-r of temporoparietal and posterior cingulate < 1.3 - 8' (Anchisi 2005).
Studies will be included regardless of the image analysis technique, the FDG injection dose, the time between FDG injection and PET acquisition, and the FDG reduction regions (e.g. parietal, temporal, frontal lobes, posterior cingulate, precuneus). The exact administered FDG activity does not affect the PET examination (as long as it ranges between the accepted limits for acquiring proper images) as this can be compensated by the duration of the scan; the number of counts detected by the scanner is the key finding.
The accepted limits of administered activity are defined by guidelines published by the Nuclear Medicine Societies. The two major ones are the Society of Nuclear Medicine (SNM, USA) (Waxman 2009) and the European Association of Nuclear Medicine (EANM, Europe) (Varrone 2009). According to SNM, the recommended FDG activity in adults for brain PET is 185 - 740 MBq (or 5 - 20 mCi). According to EANM, the recommended administered activity for adults is 300 – 600 MBq (typically 370 MBq) in 2-D mode and 125 – 250 MBq (typically 150 MBq) in 3-D mode.
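The two unit systems quoted above are related by the definition of the curie (1 mCi = 37 MBq). As a minimal sketch, the following checks the quoted SNM conversion and tests an administered activity against each guideline range; the function and constant names are illustrative assumptions and are not part of either guideline.

```python
# Sketch only: names below are hypothetical helpers, not guideline terminology.
MBQ_PER_MCI = 37.0  # 1 mCi = 37 MBq by definition of the curie

# Ranges as quoted in the text (MBq).
SNM_RANGE_MBQ = (185.0, 740.0)      # Waxman 2009
EANM_2D_RANGE_MBQ = (300.0, 600.0)  # Varrone 2009, 2-D mode
EANM_3D_RANGE_MBQ = (125.0, 250.0)  # Varrone 2009, 3-D mode


def mbq_to_mci(activity_mbq: float) -> float:
    """Convert an activity from megabecquerel to millicurie."""
    return activity_mbq / MBQ_PER_MCI


def within_range(activity_mbq: float, limits_mbq: tuple) -> bool:
    """Return True if the activity lies inside the inclusive range."""
    low, high = limits_mbq
    return low <= activity_mbq <= high


if __name__ == "__main__":
    # The SNM limits of 185 and 740 MBq correspond to 5 and 20 mCi.
    print(mbq_to_mci(185.0), mbq_to_mci(740.0))
    # A typical 2-D dose of 370 MBq is inside the SNM and EANM 2-D ranges,
    # but above the EANM 3-D range.
    for label, limits in [("SNM", SNM_RANGE_MBQ),
                          ("EANM 2-D", EANM_2D_RANGE_MBQ),
                          ("EANM 3-D", EANM_3D_RANGE_MBQ)]:
        print(label, within_range(370.0, limits))
```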
Differences in the exact timing of image acquisition also do not influence the study, as long as the acquisition does not start earlier than 30 minutes after FDG injection. It is recommended, however, that each department follow a standard protocol with a fixed time for starting the acquisition (e.g. 30 or 60 minutes after injection) (Varrone 2009; Waxman 2009). The aim of the acquisition is to obtain good contrast between grey and white matter.
A comparator test will not be included because there are currently no standard practice tests available for the diagnosis of dementia. We will compare the index tests with a reference standard.
There are two target conditions in this review:
1. Alzheimer’s disease dementia (conversion from MCI to Alzheimer’s disease dementia);
2. Other forms of dementia (conversion from MCI to other forms of dementia, i.e. vascular dementia and/or dementia with Lewy bodies and /or frontotemporal dementia).
For the purpose of this review, several definitions of Alzheimer’s disease dementia are acceptable. Included studies may apply probable or possible NINCDS-ADRDA criteria (McKhann 1984). The Diagnostic and Statistical manual of Mental Disorders (DSM) (APA 1987; APA 1994) and International Classification of Diseases (ICD) (ICD 10) definitions for Alzheimer’s disease dementia will also be acceptable.
Similarly, differing clinical definitions of other dementias are acceptable. For Lewy Body Dementia the reference standard is the McKeith criteria (McKeith 1996; McKeith 2006). For frontotemporal dementia the reference standards are the Lund criteria (Lund Manchester 1994), Neary 1998, Boxer 2005, DSM-III (APA 1987), DSM-IV (APA 1994), ICD-9 (WHO 2006), ICD-10 (WHO 2010). For vascular dementia the reference standards are the NINDS-AIREN criteria (Román 1993), DSM-III (APA 1987), DSM-IV (APA 1994), ICD-9 (WHO 2006) and ICD-10 (WHO 2010).
The time scale over which progression from MCI to Alzheimer’s disease dementia or other forms of dementia happens is also important. The minimum period of delay in the verification of the diagnosis (i.e. the time between the assessment at which a diagnosis of MCI is made and the assessment at which the diagnosis of dementia is made) is one year. Where a mean duration is specified, we will exclude the study if the mean minus one standard deviation is less than one year, which will ensure that no more than 16% of participants were followed up for less than one year if the times are normally distributed.
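The 16% figure follows from the normal distribution: the probability of falling more than one standard deviation below the mean is about 0.159. The sketch below reproduces that number and applies the exclusion rule to two hypothetical studies; the function names and the example follow-up values are illustrative assumptions, not data from any included study.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def meets_minimum_follow_up(mean_years: float, sd_years: float,
                            minimum_years: float = 1.0) -> bool:
    """Apply the rule: keep the study if mean minus one SD is at least the minimum."""
    return mean_years - sd_years >= minimum_years


def share_followed_less_than(minimum_years: float, mean_years: float,
                             sd_years: float) -> float:
    """Expected share of participants below the minimum, assuming normal follow-up times."""
    return normal_cdf((minimum_years - mean_years) / sd_years)


if __name__ == "__main__":
    # Boundary case: mean minus one SD equals one year exactly.
    print(round(normal_cdf(-1.0), 4))  # share below mean - 1 SD, roughly 16%
    # Hypothetical studies (mean, SD of follow-up in years).
    print(meets_minimum_follow_up(2.0, 0.8))  # mean - SD = 1.2 years
    print(meets_minimum_follow_up(1.5, 0.8))  # mean - SD = 0.7 years
```

The rule is conservative only under the stated normality assumption; follow-up times are often skewed, in which case the share below one year may differ.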
If possible, we will segment analyses into separate follow-up mean periods for the delay in verification: one year to two years; two to four years; and more than four years. In this event we will clearly note where the same included studies contribute to the analysis for more than one reference standard.
Search methods for identification of studies
We will use a variety of information sources to ensure all relevant studies are included. The Trials Search Co-ordinator of the Cochrane Dementia and Cognitive Improvement Group will devise search strategies for electronic database searching.
We will search MEDLINE (OvidSP), EMBASE (OvidSP), Science Citation Index (ISI Web of Knowledge), PsycINFO (OvidSP), BIOSIS previews (ISI Web of Knowledge) and LILACS (Bireme). See Appendix 1 for a proposed draft strategy to be run in MEDLINE (OvidSP). We will design similarly structured search strategies using search terms and syntax appropriate for each database listed above. We will request a search of the Cochrane Register of Diagnostic Test Accuracy Studies (maintained by the Cochrane Renal Group).
There will be no restrictions based on the language of the study reports, and we will use translation services as necessary.
A single review author with extensive experience of systematic reviewing will conduct the initial searches.
Searching other resources
Grey literature: chosen electronic databases will include assessments of conference proceedings.
Handsearching: we will not perform handsearching as there is little published evidence of the benefits of handsearching for reports of diagnostic test accuracy (DTA) studies (Glanville 2010).
Reference lists: we will scan reference lists of all eligible studies and reviews in the field for further possible titles, and will repeat the process until no new titles are found (Greenhalgh 2005).
Correspondence: we will contact research groups who have published or are conducting work on FDG-PET tests for dementia diagnosis.
Data collection and analysis
Selection of studies
One review author will screen all titles and abstracts generated by electronic database searches for relevance.
Two review authors will independently assess the remaining abstracts of selected titles, and will select all potentially eligible studies for full paper review. Two review authors will independently assess full manuscripts against the inclusion criteria. Where necessary, a third arbitrator will resolve disagreements that the two review authors cannot resolve through discussion.
Where a study may include usable data but these are not presented in the published manuscript, we will contact the authors directly to request further information. If the same data set is presented in more than one paper we will include the primary paper.
We will detail the numbers of studies selected at each point, using a PRISMA flow diagram.
Data extraction and management
We will extract the following data on study characteristics:
Bibliographic details of primary paper:
- Author, title of study, year and journal
Basic clinical and demographic details:
- Number of subjects
- MCI clinical criteria
- Referral centre(s)
- Participant recruitment
- Sampling procedures
Details of the index test:
- Method of the 18F-FDG-PET index test administration, including who administered the test
- Thresholds used to define positive and negative tests
- Other technical aspects as seems relevant to the review, e.g. brain areas
Details of the reference standard:
- Definition of Alzheimer's disease dementia and other dementias used in reference standard
- Duration of follow-up from time of index test used to define ADD and other dementias in reference standard: 1 to < 2 years; 2 to < 4 years; and > 4 years; if participants have been followed for varied amounts of time we will record a mean follow-up period for each included study; if possible, we will group those data into minimum, maximum and median follow-up periods; these may then become the subject of subgroup analyses
- Prevalence or proportion of population developing Alzheimer's disease dementia and other dementias, with severity, if described
The results of the two-by-two tables cross-relating index test results with the reference standards
Table 1: Conversion from MCI to Alzheimer’s disease dementia
Table 2: Conversion from MCI to non-Alzheimer’s disease dementia
Table 3: Conversion from MCI to any form of dementia
The numbers of lost-to-follow-up
We will also extract data necessary for the assessment of quality as defined below.
In general the data extraction proforma will be piloted against two included papers. Two review authors will extract the data independently. Where necessary, a third arbitrator will resolve disagreements about data extraction that the two review authors cannot resolve through discussion.
Assessment of methodological quality
We will assess the methodological quality of each study using QUADAS-2 (Whiting 2011) as recommended by The Cochrane Collaboration. The tool is made up of four domains: patient selection; index test; reference standard; patient flow. Each domain is assessed in terms of risk of bias, with the first three domains also considered in terms of applicability (Appendix 2). The components of each domain and a rubric which details how judgements concerning risk of bias are made are detailed in Appendix 3. Key areas important to quality assessment are participant selection, blinding and missing data.
We will pilot a QUADAS-2 assessment on two studies. If agreement is poor, we will refine the signalling questions. We will not use QUADAS-2 data to form a summary quality score, but will produce a narrative summary describing numbers of studies that found high/low/unclear risk of bias as well as concerns regarding applicability.
Statistical analysis and data synthesis
We will apply the DTA framework for the analysis of a single test and extract the data from a study into a two-by-two table, showing the binary test results cross-classified with the binary reference standard and ignoring any censoring that might have occurred. We acknowledge that such a reduction in the data may represent a significant oversimplification. We will therefore also adopt an Intention-to-diagnose (ITD) approach. If possible, we will present what the result would be if all drop-outs had or had not developed dementia. We may also need to assume that the proportion of positive and negative test results is the same in the unknown as in the known participants in order to do this.
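One way to picture the two extreme scenarios described above is to reallocate drop-outs into the two-by-two table. The sketch below assumes, as the paragraph allows, that drop-outs have the same proportion of positive index tests as participants with known outcomes; the class, the function names and the example counts are hypothetical and do not describe the review's actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Index test result cross-classified with the reference standard."""
    tp: float
    fp: float
    fn: float
    tn: float

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)


def split_dropouts(table: TwoByTwo, n_dropouts: float) -> tuple:
    """Split drop-outs into test-positive and test-negative using the
    positivity proportion seen in participants with known outcomes (assumption)."""
    known = table.tp + table.fp + table.fn + table.tn
    positive_share = (table.tp + table.fp) / known
    lost_positive = n_dropouts * positive_share
    return lost_positive, n_dropouts - lost_positive


def all_dropouts_converted(table: TwoByTwo, lost_pos: float, lost_neg: float) -> TwoByTwo:
    """Scenario: every drop-out developed dementia."""
    return TwoByTwo(table.tp + lost_pos, table.fp, table.fn + lost_neg, table.tn)


def no_dropouts_converted(table: TwoByTwo, lost_pos: float, lost_neg: float) -> TwoByTwo:
    """Scenario: no drop-out developed dementia."""
    return TwoByTwo(table.tp, table.fp + lost_pos, table.fn, table.tn + lost_neg)


if __name__ == "__main__":
    observed = TwoByTwo(tp=30, fp=10, fn=10, tn=50)  # hypothetical counts
    lost_pos, lost_neg = split_dropouts(observed, n_dropouts=20)
    for label, table in [
        ("observed", observed),
        ("all drop-outs converted", all_dropouts_converted(observed, lost_pos, lost_neg)),
        ("no drop-outs converted", no_dropouts_converted(observed, lost_pos, lost_neg)),
    ]:
        print(label, round(table.sensitivity(), 3), round(table.specificity(), 3))
```

Under this proportional split, each scenario changes only one of the two measures: assuming all drop-outs converted alters sensitivity, and assuming none converted alters specificity.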
We will use data from the two-by-two tables abstracted from the included studies (TP, FN, FP, TN) and entered into Review Manager 5 to calculate the sensitivities, specificities and their 95% confidence intervals. We will also present individual study results graphically by plotting estimates of sensitivities and specificities in both a forest plot and a receiver operating characteristic (ROC) space. If more than one threshold is published in primary studies we will report accuracy estimates for all thresholds.
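For orientation, the per-study quantities can be sketched as follows. The protocol does not state which interval method Review Manager 5 applies, so the Wilson score interval used here is an assumption chosen for illustration, and the counts are hypothetical.

```python
import math


def wilson_interval(successes: float, total: float, z: float = 1.959964) -> tuple:
    """Wilson score interval for a binomial proportion (95% by default)."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    z2 = z * z
    denom = 1.0 + z2 / total
    centre = (p + z2 / (2.0 * total)) / denom
    half_width = z * math.sqrt(p * (1.0 - p) / total + z2 / (4.0 * total * total)) / denom
    return centre - half_width, centre + half_width


def accuracy_estimates(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Sensitivity and specificity with confidence intervals from a two-by-two table."""
    return {
        "sensitivity": (tp / (tp + fn), wilson_interval(tp, tp + fn)),
        "specificity": (tn / (tn + fp), wilson_interval(tn, tn + fp)),
    }


if __name__ == "__main__":
    # Hypothetical study: 30 TP, 10 FP, 10 FN, 50 TN.
    for name, (estimate, (low, high)) in accuracy_estimates(30, 10, 10, 50).items():
        print(f"{name}: {estimate:.3f} (95% CI {low:.3f} to {high:.3f})")
```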
If there are sufficient data we will meta-analyse the pairs of sensitivity and specificity. The preferred approach would be the hierarchical summary ROC curve (HSROC) method proposed by Rutter 2001 and Macaskill 2010, because implicit thresholds are expected in primary studies. We will conduct these analyses in SAS software with support from the UK DTA Support Unit. Particularly if there are common thresholds across included studies, we will also consider the bivariate random-effects approach (Reitsma 2005). When a primary study reports more than one threshold result, we will only select the threshold nearer to the upper left point of the ROC curve for the meta-analysis. We are aware that this data-driven method for threshold selection could lead to an overestimate of diagnostic accuracy (Leeflang 2008). However, there are no accepted thresholds to define positive
We will explore the implications of any credible summary accuracy estimates emerging by considering the numbers of false positives and false negatives in populations with different prevalence of dementia subtypes, and by presenting the results as natural frequencies and using alternative metrics such as likelihood ratios and predictive values.
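The alternative metrics mentioned above follow directly from a summary sensitivity, specificity and an assumed prevalence. The sketch below is illustrative only: the input values are hypothetical, the cohort size of 1000 is an arbitrary convention for natural frequencies, and the function names are assumptions.

```python
def likelihood_ratios(sensitivity: float, specificity: float) -> tuple:
    """Positive and negative likelihood ratios."""
    lr_positive = sensitivity / (1.0 - specificity)
    lr_negative = (1.0 - sensitivity) / specificity
    return lr_positive, lr_negative


def natural_frequencies(sensitivity: float, specificity: float,
                        prevalence: float, cohort_size: int = 1000) -> dict:
    """Expected counts in a notional cohort of people with MCI."""
    converters = prevalence * cohort_size
    non_converters = cohort_size - converters
    tp = sensitivity * converters
    tn = specificity * non_converters
    return {"tp": tp, "fn": converters - tp, "tn": tn, "fp": non_converters - tn}


def predictive_values(sensitivity: float, specificity: float, prevalence: float) -> tuple:
    """Positive and negative predictive values at a given prevalence."""
    counts = natural_frequencies(sensitivity, specificity, prevalence)
    ppv = counts["tp"] / (counts["tp"] + counts["fp"])
    npv = counts["tn"] / (counts["tn"] + counts["fn"])
    return ppv, npv


if __name__ == "__main__":
    sens, spec = 0.80, 0.70  # hypothetical summary estimates
    for prevalence in (0.10, 0.30, 0.50):  # hypothetical conversion rates
        counts = natural_frequencies(sens, spec, prevalence)
        ppv, npv = predictive_values(sens, spec, prevalence)
        print(prevalence,
              {key: round(value) for key, value in counts.items()},
              round(ppv, 3), round(npv, 3))
    print(likelihood_ratios(sens, spec))
```

Likelihood ratios do not depend on prevalence, whereas the false positive and false negative counts and the predictive values shift markedly across settings, which is why both views are informative.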
We will prepare a Summary of Results table.
Investigations of heterogeneity
The framework for the investigation includes the following factors:
- Sociodemographic characteristics. For age, we will examine any studies that include 30% or more patients below the age of 65 years separately.
- Different clinical criteria of MCI: Petersen criteria versus revised Petersen criteria versus CDR = 0.5 criteria versus different MCI classification (Matthews 2008).
- Other characteristics (e.g. ApoE status, Mini-Mental State Examination (MMSE))
- Different referral centres (primary care versus memory clinic versus hospital): Although the 18F-FDG-PET test is carried out only in tertiary care, sources of referrals might differ in this setting. We will investigate the potential influence of different referral centre practices on diagnostic performance of the index test.
- Threshold: if different thresholds used in included studies.
- Technical features (including different versions of the test): time between 18F-FDG injection and PET acquisition (acquisition starting less than 30 minutes after FDG injection). 18F-FDG reduction regions: not prespecified (e.g. parietal, temporal, frontal lobes, posterior cingulate, precuneus).
- Image analysis: variety of image analysis techniques.
- Operator characteristics e.g. training.
- Reference standard/s used: DSM definition, ICD definition, NINCDS-ADRDA, or other classification, including pathological definitions; and operationalisation of these classifications (e.g. individual clinician/algorithm/consensus group).
- Spectrum of target disorder (Alzheimer’s disease dementia and any other dementia subtypes).
- Types of studies: longitudinal cohort studies or diagnostic nested case-control studies.
- Blinding. Prior clinical information will increase accuracy of the index test.
- Duration of follow-up: If possible, we will segment analyses into separate follow-up mean periods for the delay in verification: one to two years versus two to four years versus more than four years. In this case we will clearly note where the same included studies contribute to the analysis for more than one reference standard. Where a mean of duration is specified, we will exclude the study if the mean minus one standard deviation is less than one year, which will ensure that no more than 16% of participants were followed up for less than one year if the times are normally distributed.
- Loss to follow-up: we will consider separately those studies that have more than 20% drop-outs.
We will investigate heterogeneity in the first instance (informally) through visual examination of forest plots of sensitivities and specificities and through visual examination of the ROC plot of the raw data. Depending on the number of studies available, we will include as many covariates in the regression analyses as possible, up to 10 studies per covariate. We recognise that it is likely that power will be insufficient to allow formal investigation of all possible sources of heterogeneity. However, if we identify further likely sources of heterogeneity, we will investigate these by subgroup analyses and, if data allow, will include them as covariates in the regression model.
If not already explored as part of the investigation of heterogeneity above, we will perform a sensitivity analysis, for example in order to investigate the influence of limiting permitted time between index test and dementia diagnosis on overall diagnostic accuracy of the FDG-PET biomarker.
We will perform a sensitivity analysis with and without the intention-to-diagnose approach.
Assessment of reporting bias
We will not investigate reporting bias because of current uncertainty about how it operates in test accuracy studies and the interpretation of existing analytical tools such as funnel plots.
Appendix 1. Appendix: Search strategy for use with Medline electronic database
1 exp Dementia/
2 Cognition Disorders/
3 Mild Cognitive Impairment/
4 (alzheimer$ or dement$).ti,ab.
5 ((cognit$ or memory or cerebr$ or mental$) adj3 (declin$ or impair$ or los$ or deteriorat$ or degenerat$ or complain$ or disturb$ or disorder$)).ti,ab.
6 (forgetful$ or confused or confusion).ti,ab.
19 "Positron emission tomography".ti,ab.
20 exp Tomography, Emission-Computed/
25 ("18f-fdg" or 18fdg or fdg18).ti,ab.
27 Fluorodeoxyglucose F18/
29 glucose metabol*.ti,ab.
30 cerebral metabolic rate.ti,ab.
31 (CMRgl or rCMRGlu).ti,ab.
33 18 and 23 and 32
34 exp Dementia/di
35 34 AND 32
36 33 OR 35
Appendix 2. Appendix: Assessment of methodological quality table QUADAS-2 tool
Appendix 3. Appendix: Anchoring statements for quality assessment of 18F-FDG-PET biomarker diagnostic studies
Table 1: Review question and inclusion criteria
Anchoring statements for quality assessment of
We provide some core anchoring statements for quality assessment of diagnostic test accuracy review of
During the two-day, multidisciplinary focus group and the piloting/validation of the guidance, it was clear that certain issues were key to assessing quality, while other issues were important to record but less important for assessing overall quality. To assist, we describe a 'weighting' system. Where an item is weighted 'high risk' then that section of the QUADAS-2 results table is likely to be scored as at high risk of bias. For example in dementia diagnostic test accuracy studies, ensuring that clinicians performing dementia assessment are blinded to results of index test is fundamental. If this blinding was not present then the item on reference standard should be scored 'high risk of bias', regardless of the other contributory elements.
In assessing individual items, the score of 'Unclear' should only be given if there is genuine uncertainty. In these situations review authors will contact the relevant study teams for additional information.
Table 2: Anchoring statements to assist with assessment for risk of bias
Contributions of authors
All authors contributed to the drafting of the protocol.
Declarations of interest